DB2 - Problem description
Problem IC75700 | Status: Closed |
NODE FAILURE RECOVERY MAY BE MISSED DURING MULTIPLE NODE FAILURES LEADING TO APPLICATION OR DATABASE HANG | |
product: | |
DB2 FOR LUW / DB2FORLUW / 950 - DB2 | |
Problem description: | |
A node failure recovery may be missed when multiple physical nodes are failed over to become logical nodes. If this happens, an application may hang, and the hung application may in turn cause the whole database to hang.

A typical hung application may look like this:

FODC_Hang_2010-11-09-10.58.38.885013/60162180.32957.007.stack.txt -- db2stmm(DBNAME) -- 2010-11-09-11.04.41.130607(Signal #30)
Stack:
0x09000000000EE2F8 thread_wait + 0x98
0x090000000908EF70 sqloWaitEDUWaitPost + 0x1C4
0x09000000052B42E0 WaitRecvReady__11sqkfChannelFiT1 + 0x3E0
0x09000000052B33FC ReceiveBuffer__11sqkfChannelFPP10sqkfBufferi + 0x77C
0x09000000052B270C getNextBuffer__18sqkdBdsBufferTableFPP10sqkfBufferP8SQLKD_CB + 0x114
0x09000000052B2544 @129@sqlkd_rcv_buffer__FP8SQLKD_CBPP10sqkfBuffer + 0x100
0x09000000052B21D8 @129@sqlkd_rcv_get_next_buffer__FP8SQLKD_CB + 0x4C
0x09000000064293DC @129@sqlkd_rcv_init__FP8SQLKD_CBiT2 + 0xC8
0x0900000006F94A4C sqlkdReceiveReply__FP16sqlkdRqstRplyFmt + 0x27C
0x0900000008C17A2C sqleReceiveAndMergeReplies__FP24SQLE_RECEIVE_MERGE_INPUTP25SQLE_RECEIVE_MERGE_OUTPUTP5sqlcaP8sqlrr_cb + 0x7C4
0x0900000006511DB0 sqlkdInterrupt__FP22SQLKD_INTERRUPT_FORMATP5sqlcaP8sqlrr_cb + 0x4C0
0x0900000005665000 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x780
0x09000000056FA314 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA@glue5EC + 0x7C
0x09000000056FA9B8 ForwardStopRequest__14sqeApplicationFP8sqeAgentUcP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x45C
0x0900000009287690 AppStopUsing__14sqeApplicationFP8sqeAgentUcP5sqlca + 0x6A8
0x09000000092917B4 @73@sqleIndDBConnTerm__FP8sqeAgentP5sqlcai + 0x1DC
0x090000000635B26C @73@sqleIndCoordTerm__FP8sqeAgentP5sqlcaiT3 + 0xB8
0x0900000009291BF4 sqleIndCoordProcessRequest__FP8sqeAgent + 0x11C

Look for the following db2diag.log entries with no node recovery logged after this point.
2010-11-09-10.40.52.182769+540 I4693614A518 LEVEL: Severe
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:21
MESSAGE : OLD: nodenum: 2; lineno: 3; port: 1; hostname: hostname1; netname: netname1; computer: computer1
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002 ....

2010-11-09-10.40.52.183095+540 I4694133A518 LEVEL: Severe
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:22
MESSAGE : NEW: nodenum: 2; lineno: 3; port: 7; hostname: hostname2; netname: netname2; computer: computer2
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002 ....

2010-11-09-10.40.52.184112+540 I4694652A388 LEVEL: Event
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:666
DATA #1 : <preformatted>
Attempt to Syncing up krcb cache with fcm cache-physical
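
The check described above can be partly automated. The following Python sketch is not an IBM tool; it only assumes that the db2diag.log is in its usual multi-line layout, with each record starting on a timestamp line, and that the OLD/NEW messages look like the entries shown above. The function names are hypothetical. It lists the node cache changes logged by sqkfNodeManager::refreshNodesCache and returns every record written after the last such change, so that those records can be inspected for signs of node recovery. The wording of a node recovery message is not given in this description, so the script does not try to match one.

```python
import re

# A db2diag.log record is assumed to begin with a timestamp such as
# 2010-11-09-10.40.52.182769+540 at the start of a line.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+", re.MULTILINE
)

# OLD/NEW messages from sqkfNodeManager::refreshNodesCache, in the layout
# of the sample entries in this problem description (an assumption).
CACHE_MSG = re.compile(
    r"MESSAGE\s*:\s*(OLD|NEW):\s*nodenum:\s*(\d+);.*?port:\s*(\d+);"
    r"\s*hostname:\s*([^;\s]+)",
    re.DOTALL,
)


def split_records(text):
    """Split raw db2diag.log text into one string per record."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]


def parse_cache_change(record):
    """Return a dict for an OLD/NEW node cache message, else None."""
    if "sqkfNodeManager::refreshNodesCache" not in record:
        return None
    m = CACHE_MSG.search(record)
    if m is None:
        return None
    return {
        "timestamp": record.split()[0],
        "kind": m.group(1),
        "nodenum": int(m.group(2)),
        "port": int(m.group(3)),
        "hostname": m.group(4),
    }


def node_cache_changes(text):
    """List all OLD/NEW node cache changes in log order."""
    return [c for c in map(parse_cache_change, split_records(text)) if c]


def records_after_last_change(text):
    """Return the records logged after the last OLD/NEW cache change."""
    records = split_records(text)
    last = None
    for i, rec in enumerate(records):
        if parse_cache_change(rec):
            last = i
    return [] if last is None else records[last + 1:]
```

To use it, read the db2diag.log of the affected node into a string and pass it to node_cache_changes() and records_after_last_change(). If the records returned by the second call contain nothing that indicates node recovery, the log matches the pattern described here.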
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* All                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Node failure recovery may be missed during multiple node     *
* failures leading to application or database hang.            *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to a DB2 Version 9.7 Fix Pack that includes the      *
* IC73109 fix.                                                 *
****************************************************************
Local Fix: | |
Solution | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : 08.04.2011
Date - problem closed : 09.05.2011
Date - last modified : 09.05.2011
Problem solved at the following versions (IBM BugInfos) | |
9.7. | |
Problem solved according to the fixlist(s) of the following version(s) |