DB2 - Problem description
Problem IC71429 | Status: Closed
ON LARGE DPF SYSTEMS WITH MANY NODES, DB2STOP CAN TAKE A LONG TIME TO COMPLETE (TOO LONG IN NODE RECOVERY)
Product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
On DPF (Database Partitioning Feature) systems with many nodes, for example more than 90, db2stop can exceed the default START_STOP_TIME timeout of 10 minutes. When that happens, DB2 issues a kill to all nodes.

Symptoms:

1.) db2diag.log contains entries like the following:

    2010-08-19-03.20.45.041184-300 I109662A299 LEVEL: Event
    PID : 5910764 TID : 1 PROC : db2stop2
    INSTANCE: XXXXXX NODE : 000
    EDUID : 1
    FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:240
    DATA #1 : String, 26 bytes
    Stop phase is in progress.

    2010-08-19-03.20.45.041799-300 I109962A314 LEVEL: Event
    PID : 5910764 TID : 1 PROC : db2stop2
    INSTANCE: XXXXXX NODE : 000
    EDUID : 1
    FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:250
    DATA #1 : String, 41 bytes
    Requesting system controller termination.

followed by many occurrences of the following messages:

    2010-08-19-03.21.20.554971-300 I115791A390 LEVEL: Error
    PID : 389258 TID : 772 PROC : db2sysc 0
    INSTANCE: XXXXXX NODE : 000
    EDUID : 772 EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
    RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM connection closed"

    2010-08-19-03.21.39.244694-300 I116936A362 LEVEL: Error
    PID : 389258 TID : 772 PROC : db2sysc 0
    INSTANCE: XXXXXX NODE : 000
    EDUID : 772 EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
    MESSAGE : Link info: node 14; type 4; state 5; session 0;activated 1

    2010-08-19-03.21.39.244867-300 I117299A390 LEVEL: Error
    PID : 389258 TID : 772 PROC : db2sysc 0
    INSTANCE: XXXXXX NODE : 000
    EDUID : 772 EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
    RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM connection closed"

    2010-08-19-03.21.50.657934-300 I118444A362 LEVEL: Error
    PID : 389258 TID : 772 PROC : db2sysc 0
    INSTANCE: XXXXXX NODE : 000
    EDUID : 772 EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
    MESSAGE : Link info: node 15; type 4; state 5; session 0;activated 1
    ......

2.) The stack trace has the following pattern:

    <StackTrace>
    -------Frame------ ------Function + Offset------
    0x09000000001174D4 __fd_poll + 0x98
    0x09000000000A9AE0 poll + 0xC
    0x09000000000A8968 res_nsend + 0xDA4
    0x0900000000100D0C res_nquery + 0x130
    0x0900000000100370 res_nquerydomain + 0x180
    0x090000000010063C res_nsearch + 0x228
    0x09000000000B3D1C res_search + 0xA8
    0x0900000000106088 ho_byname2 + 0x13C
    0x09000000001210E0 ho_byname2 + 0x1AC
    0x09000000000A6550 gethostbyname2 + 0x190
    0x09000000000A9E98 getaddrinfo2 + 0x384
    0x09000000000AB2C4 getaddrinfo + 0x36C
    0x0900000009515220 sqloPdbTcpIpGetAddrInfo + 0x13C
    0x090000000C2DACE0 sqloPdbTcpIpResolveHostName + 0x1C4
    0x090000000C2DB030 sqloPdbTcpIpResolveHostName@glue557 + 0x7C
    0x09000000086A7324 sqloReadDb2nodes + 0x8C4
    0x090000000836A978 RefreshDb2nodesCache__19sqkfFastCommManagerFv + 0x210
    0x090000000834FF48 RefreshNodesInfo__15sqkfSendConduitFP14sqkfDataTargetiPb + 0x74
    0x090000000928A560 CheckForFailoverConnectRetry__15sqkfSendConduitFsPi + 0x328
    0x090000000927F640 HandleConnectLostEvent__15sqkfSendConduitFUl + 0x160
    0x090000000927D6D0 RunEDU__15sqkfSendConduitFv + 0x264
    0x0900000008A46E2C EDUDriver__9sqzEDUObjFv + 0xF8
    0x0900000008A4C778 sqloEDUEntry + 0x278
    </StackTrace>

The frames show the FCM sender EDU (sqkfSendConduit) handling a lost connection by refreshing its node cache: it rereads db2nodes.cfg (sqloReadDb2nodes) and resolves node host names through getaddrinfo, and at the time of the trace it is waiting in poll() under res_nsend, that is, for a DNS reply.
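To check a db2diag.log for the symptom pattern above, a short script can count the relevant entries. This is a minimal sketch, assuming the log text matches the excerpts quoted in this APAR; the function name scan_db2diag and its result keys are hypothetical and not part of any DB2 tool.

```python
import re
from collections import Counter

# Text fragments copied from the db2diag.log excerpts quoted in this APAR.
STOP_REQUEST = "Requesting system controller termination."
CONN_CLOSED = "SQLKF_CONN_CLOSED"
LINK_INFO = re.compile(r"Link info: node (\d+);")


def scan_db2diag(text: str) -> dict:
    """Summarize IC71429-style symptoms found in db2diag.log text (hypothetical helper)."""
    # Count sqkfTcpLink::closeConn "Link info" messages per remote node number.
    closed_by_node = Counter(int(n) for n in LINK_INFO.findall(text))
    return {
        "stop_requested": STOP_REQUEST in text,
        "conn_closed_errors": text.count(CONN_CLOSED),
        "closed_links_by_node": dict(closed_by_node),
    }
```

For example, scan_db2diag(open("db2diag.log", errors="replace").read()) on a log from an affected db2stop would be expected to report the stop request together with many connection-closed errors spread across many node numbers.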
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* Large DPF systems                                            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* If there are many nodes, e.g. more than 90 nodes, in a       *
* DPF environment, then it is possible for db2stop to hit      *
* the default START_STOP_TIME timeout of 10 minutes, which     *
* would cause DB2 to issue a kill underneath to all nodes.     *
****************************************************************
* RECOMMENDATION:                                              *
* Update to Version 9.5 Fix Pack 7                             *
****************************************************************
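The stack trace in the problem description shows the FCM sender resolving host names read from db2nodes.cfg through getaddrinfo, which suggests that slow name resolution repeated across many nodes contributes to db2stop approaching the 10-minute limit. That is an interpretation of the trace, not a statement made by the APAR. The sketch below, assuming the usual db2nodes.cfg layout (node number, host name, logical port, optional netname), times one resolution pass per distinct host and expresses it as a share of the START_STOP_TIME window. All function names are hypothetical, and how many such passes DB2 actually performs during db2stop is not known from this APAR.

```python
import socket
import time
from typing import Callable, Dict, List

DEFAULT_START_STOP_TIME_MIN = 10  # default timeout quoted in this APAR


def parse_db2nodes(text: str) -> List[str]:
    """Return distinct host names from db2nodes.cfg text, in file order.

    Assumes lines of the form: node_number hostname logical_port [netname].
    """
    hosts: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; keep the first occurrence of each host.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] not in hosts:
            hosts.append(fields[1])
    return hosts


def time_lookups(
    hosts: List[str],
    resolve: Callable[[str], object] = lambda h: socket.getaddrinfo(h, None),
) -> Dict[str, float]:
    """Wall-clock seconds spent resolving each host; failed lookups still count."""
    timings: Dict[str, float] = {}
    for host in hosts:
        start = time.monotonic()
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            pass
        timings[host] = time.monotonic() - start
    return timings


def share_of_timeout(
    total_seconds: float, start_stop_min: int = DEFAULT_START_STOP_TIME_MIN
) -> float:
    """Fraction of the START_STOP_TIME window used by one resolution pass."""
    return total_seconds / (start_stop_min * 60)
```

On a live system, parse_db2nodes could be given the instance's db2nodes.cfg (typically under the instance owner's sqllib directory), and the sum of time_lookups(...).values() passed to share_of_timeout. A result that is a sizable fraction of 1.0 would point at name resolution as one contributor; the injectable resolve parameter exists only so the timing logic can be exercised without a network.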
Local Fix:
Solution
Problem was first fixed in Version 9.5 Fix Pack 7.
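To tell whether an instance already includes the fix, the level reported by db2level can be compared with Version 9.5 Fix Pack 7. This is a minimal sketch, assuming the level token has the form "DB2 v9.5.0.7" (version.release.modification.fix pack); the helper names are hypothetical. Levels outside 9.5 return None because this APAR only names the 9.5 fix.

```python
import re
from typing import Optional, Tuple

# First level where IC71429 is fixed, per this APAR: Version 9.5 Fix Pack 7.
FIXED_IN = (9, 5, 0, 7)

# Assumption: the level token reads like "DB2 v9.5.0.7".
LEVEL_TOKEN = re.compile(r"DB2 v(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_level(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric level from db2level-style text, or None if absent."""
    m = LEVEL_TOKEN.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


def has_ic71429_fix(text: str) -> Optional[bool]:
    """True/False for a 9.5 level; None if the level is unknown or not 9.5."""
    level = parse_level(text)
    if level is None or level[:2] != (9, 5):
        return None
    # Tuple comparison orders levels component by component.
    return level >= FIXED_IN
```

Comparing integer tuples rather than raw strings avoids mis-ordering levels such as 9.5.0.10 and 9.5.0.7.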
Workaround
Not known; see Local Fix.
Timestamps
Date - problem reported: 23.09.2010
Date - problem closed: 25.10.2010
Date - last modified: 25.10.2010
Problem solved at the following versions (IBM BugInfos)
9.5.FP7
Problem solved according to the fixlist(s) of the following version(s)