DB2 - Problem description

Problem IC73109 Status: Closed

NODE FAILURE RECOVERY MAY BE MISSED DURING MULTIPLE NODE FAILURES LEADING
TO APPLICATION OR DATABASE HANG

product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
A node failure recovery may be missed when multiple physical
nodes are failed over to become logical nodes. If this happens
then an application may hang. The hung application may in fact
cause the whole database to hang.
 
The stack of a typical hung application may look like this:
 
FODC_Hang_2010-11-09-10.58.38.885013/60162180.32957.007.stack.txt
-- db2stmm(DBNAME) -- 2010-11-09-11.04.41.130607(Signal #30)

Stack:

  0x09000000000EE2F8 thread_wait + 0x98
  0x090000000908EF70 sqloWaitEDUWaitPost + 0x1C4
  0x09000000052B42E0 WaitRecvReady__11sqkfChannelFiT1 + 0x3E0
  0x09000000052B33FC ReceiveBuffer__11sqkfChannelFPP10sqkfBufferi + 0x77C
  0x09000000052B270C getNextBuffer__18sqkdBdsBufferTableFPP10sqkfBufferP8SQLKD_CB + 0x114
  0x09000000052B2544 @129@sqlkd_rcv_buffer__FP8SQLKD_CBPP10sqkfBuffer + 0x100
  0x09000000052B21D8 @129@sqlkd_rcv_get_next_buffer__FP8SQLKD_CB + 0x4C
  0x09000000064293DC @129@sqlkd_rcv_init__FP8SQLKD_CBiT2 + 0xC8
  0x0900000006F94A4C sqlkdReceiveReply__FP16sqlkdRqstRplyFmt + 0x27C
  0x0900000008C17A2C sqleReceiveAndMergeReplies__FP24SQLE_RECEIVE_MERGE_INPUTP25SQLE_RECEIVE_MERGE_OUTPUTP5sqlcaP8sqlrr_cb + 0x7C4
  0x0900000006511DB0 sqlkdInterrupt__FP22SQLKD_INTERRUPT_FORMATP5sqlcaP8sqlrr_cb + 0x4C0
  0x0900000005665000 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x780
  0x09000000056FA314 sqleDssStopUsing__FUcsP8sqeAgentP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA@glue5EC + 0x7C
  0x09000000056FA9B8 ForwardStopRequest__14sqeApplicationFP8sqeAgentUcP5sqlcaP14sqeApplicationP16sqeLocalDatabaseP8SQLE_BWA + 0x45C
  0x0900000009287690 AppStopUsing__14sqeApplicationFP8sqeAgentUcP5sqlca + 0x6A8
  0x09000000092917B4 @73@sqleIndDBConnTerm__FP8sqeAgentP5sqlcai + 0x1DC
  0x090000000635B26C @73@sqleIndCoordTerm__FP8sqeAgentP5sqlcaiT3 + 0xB8
  0x0900000009291BF4 sqleIndCoordProcessRequest__FP8sqeAgent + 0x11C
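
To compare other stack files against the one above, a small script can help. The following is only a sketch, not an IBM tool: the function name matches_hang_signature and the choice of marker frames are assumptions made for illustration. It checks that a handful of frame names from this stack appear in the same order (innermost first), and it ignores line wrapping. A match only suggests a similar wait in the receive path; it does not prove that this APAR is the cause.

```python
import re

# Frame-name fragments copied from the stack in this report, innermost first.
# Which frames to treat as the "signature" is an assumption for illustration.
HANG_MARKERS = (
    "sqloWaitEDUWaitPost",
    "WaitRecvReady",
    "sqleReceiveAndMergeReplies",
    "sqlkdInterrupt",
    "sqleDssStopUsing",
)


def matches_hang_signature(stack_text, markers=HANG_MARKERS):
    """Return True if every marker occurs in stack_text in the given order."""
    # Drop all whitespace so that symbols wrapped across lines still match.
    compact = re.sub(r"\s+", "", stack_text)
    pos = 0
    for marker in markers:
        found = compact.find(marker, pos)
        if found < 0:
            return False
        pos = found + len(marker)
    return True
```

The text of a *.stack.txt file from the FODC_Hang directory can be read and passed to the function as a single string.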
 
 
Look for the following db2diag.log entries with no node recovery 
logged after this point. 
 
2010-11-09-10.40.52.182769+540 I4693614A518       LEVEL: Severe
PID     : 60162180             TID  : 1029        PROC : db2sysc 7
INSTANCE: db2inst1             NODE : 007
EDUID   : 1029                 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:21
MESSAGE : OLD: nodenum: 2; lineno: 3; port: 1; hostname: hostname1; netname: netname1; computer: computer1
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002                                  ....

2010-11-09-10.40.52.183095+540 I4694133A518       LEVEL: Severe
PID     : 60162180             TID  : 1029        PROC : db2sysc 7
INSTANCE: db2inst1             NODE : 007
EDUID   : 1029                 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:22
MESSAGE : NEW: nodenum: 2; lineno: 3; port: 7; hostname: hostname2; netname: netname2; computer: computer2
DATA #1 : Hexdump, 4 bytes
0x070000000F3FB434 : 0000 0002                                  ....

2010-11-09-10.40.52.184112+540 I4694652A388       LEVEL: Event
PID     : 60162180             TID  : 1029        PROC : db2sysc 7
INSTANCE: db2inst1             NODE : 007
EDUID   : 1029                 EDUNAME: db2fcms 7
FUNCTION: DB2 UDB, fast comm manager, sqkfNodeManager::refreshNodesCache, probe:666
DATA #1 : <preformatted>
Attempt to Syncing up krcb cache with fcm cache-physical
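
To locate such entries in a large db2diag.log, the OLD/NEW message pairs can be extracted with a short script. This is a sketch under assumptions: the function name node_moves is hypothetical, and the regular expression is modelled only on the two MESSAGE lines shown above. It reports which node number moved from which host and port to which host and port. It does not detect whether node recovery was logged afterwards, because this report does not show what that entry looks like; that check remains manual.

```python
import re

# Pattern modelled on the OLD/NEW MESSAGE lines quoted in this report.
ENTRY = re.compile(
    r"(OLD|NEW): nodenum: (\d+); lineno: (\d+); port: (\d+); "
    r"hostname: ([^;\s]+); netname: ([^;\s]+); computer: (\S+)"
)


def node_moves(diag_text):
    """Return (nodenum, (old_host, old_port), (new_host, new_port)) tuples."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(diag_text.split())
    pending = {}  # nodenum -> (hostname, port) from the last OLD entry
    moves = []
    for kind, nodenum, _lineno, port, host, _net, _comp in ENTRY.findall(flat):
        node = int(nodenum)
        if kind == "OLD":
            pending[node] = (host, int(port))
        elif node in pending:
            moves.append((node, pending.pop(node), (host, int(port))))
    return moves
```

For the two entries above, the result describes node 2 moving from hostname1 port 1 to hostname2 port 7, that is, from a physical node to a logical node on another host.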
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Users of the DPF feature with multiple physical nodes        *
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* A node failure recovery may be missed when multiple physical * 
* nodes are failed over to become logical nodes. If this       * 
* happens then an application may hang. The hung application   * 
* may in fact cause the whole database to hang.                * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to DB2 Version 9.7 Fix Pack 5                        *
****************************************************************
Local Fix:
available fix packs:
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
Problem was first fixed in Version 9.7 Fix Pack 5
Workaround
not known / see Local fix
BUG-Tracking
forerunner  : APAR is sysrouted TO one or more of the following: IC75700 
follow-up : 
Timestamps
Date  - problem reported    : 08.12.2010
Date  - problem closed      : 15.12.2011
Date  - last modified       : 15.12.2011
Problem solved at the following versions (IBM BugInfos)
9.7.
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.5 FixList