DB2 - Problem Description
Problem IC73109 | Status: Closed |
NODE FAILURE RECOVERY MAY BE MISSED DURING MULTIPLE NODE FAILURES LEADING TO APPLICATION OR DATABASE HANG | |
Product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
A node failure recovery may be missed when multiple physical nodes are failed over to become logical nodes. If this happens, an application may hang, and the hung application may in turn cause the whole database to hang. The stack of a typical hung application may look like this:

FODC_Hang_2010-11-09-10.58.38.885013/60162180.32957.007.stack.txt
-- db2stmm(DBNAME) -- 2010-11-09-11.04.41.130607(Signal #30)
Stack:
0x09000000000EE2F8 thread_wait + 0x98
0x090000000908EF70 sqloWaitEDUWaitPost + 0x1C4
0x09000000052B42E0 WaitRecvReady__11sqkfChannelFiT1 + 0x3E0
0x09000000052B33FC ReceiveBuffer__11sqkfChannelFPP10sqkfBufferi + 0x77C
0x09000000052B270C getNextBuffer__18sqkdBdsBufferTableFPP10sqkfBufferP8SQLKD_CB + 0x114
0x09000000052B2544 @129@sqlkd_rcv_buffer__FP8SQLKD_CBPP10sqkfBuffer + 0x100
0x09000000052B21D8 @129@sqlkd_rcv_get_next_buffer__FP8SQLKD_CB + 0x4C
0x09000000064293DC @129@sqlkd_rcv_init__FP8SQLKD_CBiT2 + 0xC8
0x0900000006F94A4C sqlkdReceiveReply__FP16sqlkdRqstRplyFmt + 0x27C
0x0900000008C17A2C sqleReceiveAndMergeReplies__FP24SQLE_RECEIVE_MERGE_INPUTP25SQLE_RECEIVE_MERGE_OUTPUTP5sqlcaP8sqlrr_cb + 0x7C4
0x0900000006511DB0 sqlkdInterrupt__FP22SQLKD_INTERRUPT_FORMATP5sqlcaP8sqlrr_cb + 0x4C0
0x0900000005665000 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x780
0x09000000056FA314 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA@glue5EC + 0x7C
0x09000000056FA9B8 ForwardStopRequest__14sqeApplicationFP8sqeAgentUcP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x45C
0x0900000009287690 AppStopUsing__14sqeApplicationFP8sqeAgentUcP5sqlca + 0x6A8
0x09000000092917B4 @73@sqleIndDBConnTerm__FP8sqeAgentP5sqlcai + 0x1DC
0x090000000635B26C @73@sqleIndCoordTerm__FP8sqeAgentP5sqlcaiT3 + 0xB8
0x0900000009291BF4 sqleIndCoordProcessRequest__FP8sqeAgent + 0x11C

Look for the following db2diag.log entries with no node recovery logged after this point.

2010-11-09-10.40.52.182769+540 I4693614A518 LEVEL: Severe
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:21
MESSAGE : OLD: nodenum: 2; lineno: 3; port: 1; hostname: hostname1; netname: netname1; computer: computer1
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002 ....

2010-11-09-10.40.52.183095+540 I4694133A518 LEVEL: Severe
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:22
MESSAGE : NEW: nodenum: 2; lineno: 3; port: 7; hostname: hostname2; netname: netname2; computer: computer2
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002 ....

2010-11-09-10.40.52.184112+540 I4694652A388 LEVEL: Event
PID : 60162180 TID : 1029 PROC : db2sysc 7
INSTANCE: db2inst1 NODE : 007
EDUID : 1029 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:666
DATA #1 : <preformatted>
Attempt to Syncing up krcb cache with fcm cache-physical | |
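
To make the db2diag.log check above less manual, the following is a minimal sketch that pairs the OLD and NEW `refreshNodesCache` messages per node number, so that the failover timestamps are easy to locate. It assumes the log is in its normal multi-line layout (each record starting with a timestamp at the beginning of a line) and that the MESSAGE text has the same shape as the entries shown above. The function names `split_records` and `find_failovers` are made up for this illustration and are not part of any DB2 tool.

```python
import re
from typing import Dict, List, Tuple

# Assumption: every db2diag.log record begins with a timestamp at the start
# of a line, e.g. 2010-11-09-10.40.52.182769+540
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+", re.M
)

# Assumption: the MESSAGE line looks like the OLD/NEW entries in this APAR.
MAPPING = re.compile(
    r"MESSAGE\s*:\s*(OLD|NEW): nodenum: (\d+); lineno: (\d+); "
    r"port: (\d+); hostname: ([^;\s]+)"
)


def split_records(text: str) -> List[str]:
    """Split the raw log text into one string per record."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    return [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]


def find_failovers(text: str) -> List[Dict[str, object]]:
    """Pair OLD/NEW refreshNodesCache messages by node number."""
    pending: Dict[str, Tuple[str, int]] = {}
    result: List[Dict[str, object]] = []
    for rec in split_records(text):
        if "sqkfNodeManager::refreshNodesCache" not in rec:
            continue
        m = MAPPING.search(rec)
        if not m:
            continue  # e.g. the probe:666 event, which has no mapping
        kind, nodenum, _lineno, port, host = m.groups()
        if kind == "OLD":
            pending[nodenum] = (host, int(port))
        elif nodenum in pending:
            result.append({
                "node": int(nodenum),
                "timestamp": rec.split()[0],  # timestamp of the NEW record
                "old": pending.pop(nodenum),
                "new": (host, int(port)),
            })
    return result
```

Each returned entry gives the timestamp after which the log still has to be reviewed by hand. This APAR does not show what a node recovery entry looks like, so the sketch deliberately does not try to detect one.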
Problem summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* DPF feature with multiple physicals                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* A node failure recovery may be missed when multiple physical *
* nodes are failed over to become logical nodes. If this       *
* happens then an application may hang. The hung application   *
* may in fact cause the whole database to hang.                *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to db2 Version 9.7 FixPak 5                          *
**************************************************************** | |
Local fix: | |
Available fix packs: | |
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows | |
Solution | |
Problem was first fixed in Version 9.7 FixPak 5 | |
Workaround | |
none known / see Local fix | |
Bug tracking | |
Predecessor: APAR is sysrouted TO one or more of the following: IC75700
Successor: | |
Additional data | |
Date problem reported: | 2010-12-08 |
Date problem closed: | 2011-12-15 |
Date of last change: | 2011-12-15 |
Problem fixed as of the following versions (IBM BugInfos) | |
9.7. | |
Problem fixed according to FixList in version | |
9.7.0.5 |
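
As a rough illustration of how the fix level 9.7.0.5 listed above could be compared against an installed level, here is a minimal sketch. It assumes the level is available as a token of the form `DB2 v9.7.0.4`, which is how `db2level` is generally understood to report it. The helper names `parse_level` and `has_fix` are hypothetical. Release streams other than 9.7 are not covered by this APAR text, so the sketch returns `None` for them instead of guessing.

```python
import re
from typing import Optional, Tuple

# Fix level taken from this APAR's FixList entry.
FIXED_IN = (9, 7, 0, 5)


def parse_level(token: str) -> Tuple[int, ...]:
    """Extract (version, release, modification, fix pack) from a token
    such as 'DB2 v9.7.0.4' (token format is an assumption)."""
    m = re.search(r"v(\d+)\.(\d+)\.(\d+)\.(\d+)", token)
    if not m:
        raise ValueError(f"unrecognized level token: {token!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(token: str) -> Optional[bool]:
    """True/False for the 9.7 stream, None for any other release stream."""
    level = parse_level(token)
    if level[:2] != FIXED_IN[:2]:
        return None  # this APAR only states a fix level for 9.7
    return level >= FIXED_IN
```

For example, `has_fix("DB2 v9.7.0.4")` evaluates to `False`, while `has_fix("DB2 v9.7.0.5")` evaluates to `True`.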