
DB2 - Problem description

Problem IC71429 Status: Closed

ON LARGE DPF SYSTEMS WITH MANY NODES, DB2STOP CAN TAKE A LONG TIME TO
COMPLETE (TOO LONG IN NODE RECOVERY)

product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
In a DPF environment with many nodes, e.g. more than 90, db2stop 
can exceed the default START_STOP_TIME timeout of 10 minutes. When 
the timeout is reached, DB2 internally issues a kill to all nodes. 
 
Symptom: 
 
1.) db2diag.log contains entries like the following: 
 
2010-08-19-03.20.45.041184-300 I109662A299        LEVEL: Event 
PID     : 5910764              TID  : 1           PROC : 
db2stop2 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 1 
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:240 
DATA #1 : String, 26 bytes 
Stop phase is in progress. 
 
2010-08-19-03.20.45.041799-300 I109962A314        LEVEL: Event 
PID     : 5910764              TID  : 1           PROC : 
db2stop2 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 1 
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:250 
DATA #1 : String, 41 bytes 
Requesting system controller termination. 
 
 
and then many occurrences of the following messages: 
 
 
2010-08-19-03.21.20.554971-300 I115791A390        LEVEL: Error 
PID     : 389258               TID  : 772         PROC : db2sysc 
0 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 772                  EDUNAME: db2fcms 0 
FUNCTION: DB2 UDB, fast comm manager, 
sqkfSendConduit::ValidateConnectedLinks, probe:100 
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM 
connection closed" 
 
2010-08-19-03.21.39.244694-300 I116936A362        LEVEL: Error 
PID     : 389258               TID  : 772         PROC : db2sysc 
0 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 772                  EDUNAME: db2fcms 0 
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, 
probe:25 
MESSAGE : Link info: node 14; type 4; state 5; session 
0;activated 1 
 
2010-08-19-03.21.39.244867-300 I117299A390        LEVEL: Error 
PID     : 389258               TID  : 772         PROC : db2sysc 
0 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 772                  EDUNAME: db2fcms 0 
FUNCTION: DB2 UDB, fast comm manager, 
sqkfSendConduit::ValidateConnectedLinks, probe:100 
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM 
connection closed" 
 
2010-08-19-03.21.50.657934-300 I118444A362        LEVEL: Error 
PID     : 389258               TID  : 772         PROC : db2sysc 
0 
INSTANCE: XXXXXX              NODE : 000 
EDUID   : 772                  EDUNAME: db2fcms 0 
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, 
probe:25 
MESSAGE : Link info: node 15; type 4; state 5; session 
0;activated 1 
 
...... 
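
As a rough way to judge whether a shutdown is heading for the timeout, the closeConn messages above can be timed against each other. The following Python sketch is illustrative only: it assumes the db2diag.log record layout shown in the excerpt, and it assumes that links are closed at a roughly constant rate, which this APAR does not state. The function names and the SAMPLE text are hypothetical.

```python
import re
from datetime import datetime

# Header line of a db2diag.log record, e.g.
# 2010-08-19-03.21.39.244694-300 I116936A362        LEVEL: Error
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+\s+\S+\s+LEVEL:"
)
# "Link info: node 14; ..." line written by sqkfTcpLink::closeConn
LINK = re.compile(r"Link info: node (\d+);")

# Two records taken from the excerpt above (shortened to the relevant lines).
SAMPLE = """\
2010-08-19-03.21.39.244694-300 I116936A362        LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn,
probe:25
MESSAGE : Link info: node 14; type 4; state 5; session
0;activated 1

2010-08-19-03.21.50.657934-300 I118444A362        LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn,
probe:25
MESSAGE : Link info: node 15; type 4; state 5; session
0;activated 1
"""


def close_conn_events(lines):
    """Return (timestamp, node) pairs for each 'Link info' message."""
    events = []
    current = None  # timestamp of the record currently being read
    for line in lines:
        header = HEADER.match(line)
        if header:
            current = datetime.strptime(header.group(1), "%Y-%m-%d-%H.%M.%S.%f")
            continue
        link = LINK.search(line)
        if link and current is not None:
            events.append((current, int(link.group(1))))
    return events


def estimate_total_seconds(events, node_count):
    """Extrapolate the mean gap between consecutive events to node_count nodes."""
    if len(events) < 2:
        raise ValueError("need at least two closeConn events")
    span = (events[-1][0] - events[0][0]).total_seconds()
    mean_gap = span / (len(events) - 1)
    return mean_gap * node_count


events = close_conn_events(SAMPLE.splitlines())
estimate = estimate_total_seconds(events, node_count=90)
print(f"{len(events)} closeConn events, estimated {estimate:.0f} s for 90 nodes")
```

For the two closeConn records in the excerpt (nodes 14 and 15), the gap is about 11.4 seconds. Extrapolated to 90 nodes, that is about 1027 seconds, well over the 600 seconds that the default START_STOP_TIME allows.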
 
 
2.) The stack trace shows the following pattern: 
 
<StackTrace> 
-------Frame------ ------Function + Offset------ 
0x09000000001174D4 __fd_poll + 0x98 
0x09000000000A9AE0 poll + 0xC 
0x09000000000A8968 res_nsend + 0xDA4 
0x0900000000100D0C res_nquery + 0x130 
0x0900000000100370 res_nquerydomain + 0x180 
0x090000000010063C res_nsearch + 0x228 
0x09000000000B3D1C res_search + 0xA8 
0x0900000000106088 ho_byname2 + 0x13C 
0x09000000001210E0 ho_byname2 + 0x1AC 
0x09000000000A6550 gethostbyname2 + 0x190 
0x09000000000A9E98 getaddrinfo2 + 0x384 
0x09000000000AB2C4 getaddrinfo + 0x36C 
0x0900000009515220 sqloPdbTcpIpGetAddrInfo + 0x13C 
0x090000000C2DACE0 sqloPdbTcpIpResolveHostName + 0x1C4 
0x090000000C2DB030 sqloPdbTcpIpResolveHostName@glue557 + 0x7C 
0x09000000086A7324 sqloReadDb2nodes + 0x8C4 
0x090000000836A978 RefreshDb2nodesCache__19sqkfFastCommManagerFv 
+ 0x210 
0x090000000834FF48 
RefreshNodesInfo__15sqkfSendConduitFP14sqkfDataTargetiPb + 0x74 
0x090000000928A560 
CheckForFailoverConnectRetry__15sqkfSendConduitFsPi + 0x328 
0x090000000927F640 HandleConnectLostEvent__15sqkfSendConduitFUl 
+ 0x160 
0x090000000927D6D0 RunEDU__15sqkfSendConduitFv + 0x264 
0x0900000008A46E2C EDUDriver__9sqzEDUObjFv + 0xF8 
0x0900000008A4C778 sqloEDUEntry + 0x278 
</StackTrace>
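
The frames above show the FCM send conduit EDU inside RefreshDb2nodesCache and sqloReadDb2nodes, waiting in poll under getaddrinfo. This suggests that time is being spent on host name resolution while the node list is re-read. To see how long lookups take on a given host, a sketch like the following times one lookup per host name in db2nodes.cfg. It assumes the usual whitespace-separated layout (partition number, host name, optional logical port and netname). The function names are hypothetical, and this is a diagnostic aid, not the fix delivered in Fix Pack 7.

```python
import socket
import time


def parse_db2nodes(text):
    """Return the unique host names from db2nodes.cfg content, in file order.

    Assumed layout per line: nodenum hostname [logical_port [netname]]
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] not in hosts:
            hosts.append(fields[1])
    return hosts


def time_lookups(hosts, resolver=socket.getaddrinfo):
    """Time one name lookup per host.

    Returns {host: seconds}, with None for hosts that fail to resolve.
    The resolver argument exists so the function can be exercised
    without touching the network.
    """
    timings = {}
    for host in hosts:
        start = time.perf_counter()
        try:
            resolver(host, None)
            timings[host] = time.perf_counter() - start
        except socket.gaierror:
            timings[host] = None
    return timings


# Example use on a database server (the path is instance-specific):
#   with open("/path/to/sqllib/db2nodes.cfg") as cfg:
#       for host, secs in time_lookups(parse_db2nodes(cfg.read())).items():
#           print(host, "unresolved" if secs is None else f"{secs:.3f} s")
```

Lookups that take seconds rather than milliseconds would be consistent with the resolver frames in the trace, although the trace alone does not prove that DNS is the only cause of the delay.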
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Large DPF systems                                            * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* If there are many nodes, e.g. more than 90 nodes, in a       * 
* DPF environment, then it is possible for db2stop to hit      * 
* the default START_STOP_TIME timeout of 10 minutes, which     * 
* would cause DB2 to issue a kill underneath to all nodes.     * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Update to Version 9.5 Fix Pack 7                             * 
****************************************************************
Local Fix:
Solution
Problem was first fixed in Version 9.5 Fix Pack 7
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 23.09.2010
Date  - problem closed      : 25.10.2010
Date  - last modified       : 25.10.2010
Problem solved at the following versions (IBM BugInfos)
9.5.FP7
Problem solved according to the fixlist(s) of the following version(s)