DB2 - Problem description

Problem IC90996	Status: Closed
SQL0952N : INCORRECT TIMEOUT VALUE OF -1 LEADS TO NODE FAILURES AND INTERMITTENT "LOG STATE MARKED BAD" ERRORS
product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
- This problem happens intermittently in DPF (multi-partition) environments.
- You will notice INTERRUPTS (SQLCODE -952) on the non-catalog nodes and ROLLBACKs (SQLCODE -1229) on the catalog node, accompanied by the following db2diag.log messages.

On non-catalog nodes:

    2013-02-27-19.42.XXX XXXX LEVEL: Error
    PID     : 23330818   TID  : 140509   PROC : db2sysc 22
    INSTANCE: db2inst1   NODE : 015      DB   : SAMPLE
    APPHDL  : 0-22       APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
    AUTHID  : user       HOSTNAME: AAAAAA
    EDUID   : 140509     EDUNAME: db2agntp (SAMPLE) 15
    FUNCTION: DB2 UDB, data protection services, SQLP_DBCB::setLogState, probe:5005
    DATA #1 : <preformatted>
    Error detected during initialization. As a result, for precautionary
    reasons the database log state has been marked bad.

    2013-02-27-19.42.XXX XXXX LEVEL: Severe
    PID     : 23330818   TID  : 140509   PROC : db2sysc 22
    INSTANCE: db2inst1   NODE : 015      DB   : SAMPLE
    APPHDL  : 0-22       APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
    AUTHID  : user       HOSTNAME: AAAAAA
    EDUID   : 140509     EDUNAME: db2agntp (SAMPLE) 15
    FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:8721
    DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
     sqlcaid : SQLCA   sqlcabc: 136   sqlcode: -952   sqlerrml: 0
     sqlerrmc:
     sqlerrp : SQLEDINT
     sqlerrd : (1) 0x00000000   (2) 0x00000000   (3) 0x00000000
               (4) 0x00000000   (5) 0x00000000   (6) 0x00000016
     sqlwarn : (1)   (2)   (3)   (4)   (5)   (6)
               (7)   (8)   (9)   (10)  (11)
     sqlstate:

- The first trigger of the problem can be found in db2diag.log, where the catalog node detects an FCM connection failure due to a TIMEOUT while trying to communicate with the non-catalog node:

    2013-02-27-19.42.XXX XXXX LEVEL: Error
    PID     : 23330818   TID  : 140509   PROC : db2sysc 22
    INSTANCE: db2inst1   NODE : 0        DB   : SAMPLE
    APPHDL  : 0-22       APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
    AUTHID  : user       HOSTNAME: AAAAAA
    EDUID   : 1800       EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, fast comm manager, sqkfNetworkServices::detectNodeFailure, probe:15
    DATA #1 : <preformatted>
    Detected failure for node 15 - time elapsed: 4294967295; max timeout: 500; link state: 4

The max timeout is 500 by default (the default values of 10 secs for CONN_ELAPSE and 5 for MAX_CONNRETRIES convert to 500 seconds). So in the above example, node 0 could not reach node 15 for more than 500 secs. The time elapsed is reported as 4294967295, which is 0xFFFFFFFF in hex, that is, -1 when read as a signed 32-bit value. This incorrect value is the trigger of the FCM failures, resulting in INTERRUPTS on the non-catalog nodes, -1229s on the catalog node, and the log state being marked bad. In this way the node becomes unreachable due to a timing problem in DB2.
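The conversion of the elapsed time to -1 can be checked with a short Python sketch. It is only an illustration and not part of DB2: the helper names `as_signed32` and `parse_node_failure` are hypothetical, the regular expression is written against the single detectNodeFailure message quoted in this APAR, and treating any elapsed value that is negative as a signed 32-bit integer as suspect is an assumption.

```python
import re

UINT32_MASK = 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= UINT32_MASK
    return value - (1 << 32) if value & 0x80000000 else value


# Matches the message text quoted in this APAR (assumed to be stable).
FAILURE_RE = re.compile(
    r"Detected failure for node (\d+) - time elapsed: (\d+); max timeout: (\d+)"
)


def parse_node_failure(message: str):
    """Return (node, elapsed, max_timeout, suspect), or None if the text does not match.

    'suspect' is True when the elapsed value is negative as a signed 32-bit
    integer, as with the 4294967295 (-1) value described above.
    """
    match = FAILURE_RE.search(message)
    if match is None:
        return None
    node, elapsed, max_timeout = (int(group) for group in match.groups())
    return node, elapsed, max_timeout, as_signed32(elapsed) < 0


sample = (
    "Detected failure for node 15 - time elapsed: 4294967295; "
    "max timeout: 500; link state: 4"
)
print(hex(4294967295), as_signed32(4294967295))  # 0xffffffff -1
print(parse_node_failure(sample))                # (15, 4294967295, 500, True)
```

A check like this could be run over the message text of db2diag.log entries to tell timeouts with an invalid elapsed value apart from genuine ones.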
Problem Summary:
****************************************************************
* USERS AFFECTED:
* ALL
****************************************************************
* PROBLEM DESCRIPTION:
* See Problem Description above.
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version V10.1 Fix Pack 3.
****************************************************************
Local Fix:
N/A.
available fix packs:
DB2 Version 10.1 Fix Pack 3 for Linux, UNIX, and Windows
DB2 Version 10.1 Fix Pack 4 for Linux, UNIX, and Windows
DB2 Version 10.1 Fix Pack 3a for Linux, UNIX, and Windows
DB2 Version 10.1 Fix Pack 6 for Linux, UNIX, and Windows
Solution
First fixed in DB2 Version 10.1 Fix Pack 3.
Workaround
not known / see Local fix
BUG-Tracking
forerunner : APAR is sysrouted TO one or more of the following: IC95228
follow-up :
Timestamps
Date - problem reported : 20.03.2013
Date - problem closed : 19.11.2013
Date - last modified : 19.11.2013
Problem solved at the following versions (IBM BugInfos)

Problem solved according to the fixlist(s) of the following version(s)
10.1.0.3