DB2 - Problem description

Problem IC75211 Status: Closed

IN DPF, CONNECT OR CONNECT RESET HANGS DUE TO MISSING REPLY AFTER NODE
FAILURE.

product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
During a failed connect (which performs an implicit connect
reset) or during connect reset processing in a multi-node
environment, a node failure can cause an expected reply from a
remote node to the connect reset to be missed. The coordinator
agent then hangs in the following stack:
 
sqloWaitEDUWaitPost 
WaitRecvReady 
ReceiveBuffer 
getNextBuffer 
sqlkd_rcv_buffer 
sqlkd_rcv_get_next_buffer 
sqlkd_rcv_init 
sqlkdReceiveReply 
sqleReceiveAndMergeReplies 
sqlkdInterrupt 
sqleDssStopUsing 
ForwardStopRequest 
AppStopUsing 
sqlesrspWrp 
sqleUCagentConnectReset 
sqljsCleanup 
sqljsDrdaAsInnerDriver 
sqljsDrdaAsDriver 
RunEDU 
 
An entry similar to the following should be logged in
db2diag.log on the coordinator node:
 
2011-03-02-04.15.40.706078+540 I601932A472        LEVEL: Error
PID     : 4841666              TID  : 4885        PROC : db2sysc 1
INSTANCE: db2inst              NODE : 001         DB   : P64816
APPHDL  : 1-51                 APPID: *N1.dpfv971.110301191344
AUTHID  : DB2INST
EDUID   : 4885                 EDUNAME: db2agent (sample) 1
FUNCTION: DB2 UDB, buffer dist serv, sqlkdReceiveReply, probe:10
RETCODE : ZRC=0x81590016=-2124873706=SQLKF_NODE_FAILED "Node Recovery"
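
As a rough aid for spotting this signature, the following Python sketch scans db2diag.log text for entries that combine sqlkdReceiveReply, probe:10 and the SQLKF_NODE_FAILED return code shown above. It also shows why the hexadecimal and decimal forms of the ZRC agree: the decimal value is the same 32-bit pattern read as a signed integer. The function names and the assumption that each entry starts with a timestamp line are illustrative only and are not part of any DB2 tooling.

```python
import re

# ZRC value taken from the db2diag.log entry in this problem description.
SQLKF_NODE_FAILED = 0x81590016


def zrc_to_signed(zrc: int) -> int:
    """Interpret a 32-bit ZRC value as a signed integer."""
    return zrc - (1 << 32) if zrc & 0x80000000 else zrc


def find_node_failed_entries(diag_text: str) -> list[str]:
    """Return the db2diag.log entries that match the hang signature.

    Assumption: every entry begins with a timestamp line such as
    2011-03-02-04.15.40.706078+540, as in the sample entry.
    """
    # Split in front of each timestamp without consuming it (Python 3.7+).
    entries = re.split(
        r"(?m)^(?=\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})", diag_text
    )
    zrc_token = f"ZRC=0x{SQLKF_NODE_FAILED:08X}"
    return [
        entry
        for entry in entries
        if "sqlkdReceiveReply" in entry
        and "probe:10" in entry
        and zrc_token in entry
    ]
```

A match only suggests that a node failure was seen while waiting for a reply; it does not by itself prove that the agent is hung.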
 
 
Another indication of this hang is one or more subagents of the
coordinator that is in stop-using processing, stuck in
termination log sync (sqlpTermLogSync) on a non-coordinator
node with this call stack:
 
sqloWaitEDUWaitPost 
WaitRecvReady 
ReceiveBuffer 
getNextBuffer 
sqlkd_rcv_buffer 
sqlkd_rcv_get_next_buffer 
sqlkd_rcv_init 
sqlkdReceiveReply 
sqlpLSrequestor 
sqlpPerformTermLogSync 
sqlpTermLogSync 
sqlpterm 
CleanDB 
TermDbConnect 
AppStopUsing 
sqleSubAgentStopUsing 
sqleSubRequestRouter 
 
As a result of the hang problem, a connection attempt to the 
node will fail with SQL1229N.
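
The two call stacks above can be told apart by a handful of distinctive frames. The following Python sketch checks whether a captured stack contains those frames in order. It assumes the stack is available as plain text with one bare function name per line, as printed in this description. Real stack dumps usually carry offsets and addresses and would need to be cleaned first. The frame selections and function names are illustrative choices, not an IBM-defined signature.

```python
# Distinctive frames, innermost first, chosen from the two stacks above.
COORD_FRAMES = (
    "sqlkdReceiveReply",
    "sqleReceiveAndMergeReplies",
    "sqlkdInterrupt",
    "sqleDssStopUsing",
    "sqleUCagentConnectReset",
)
SUBAGENT_FRAMES = (
    "sqlkdReceiveReply",
    "sqlpLSrequestor",
    "sqlpPerformTermLogSync",
    "sqlpTermLogSync",
    "sqleSubAgentStopUsing",
)


def contains_in_order(stack, frames):
    """True if every frame appears in the stack, in the given order."""
    remaining = iter(stack)
    # "frame in remaining" consumes the iterator up to the match,
    # so later frames must appear after earlier ones.
    return all(frame in remaining for frame in frames)


def classify_stack(stack_text):
    """Return "coordinator", "subagent", or None for an unrelated stack."""
    stack = [line.strip() for line in stack_text.splitlines() if line.strip()]
    if contains_in_order(stack, COORD_FRAMES):
        return "coordinator"
    if contains_in_order(stack, SUBAGENT_FRAMES):
        return "subagent"
    return None
```

A "coordinator" result on the coordinator node together with "subagent" results on a non-coordinator node would be consistent with the hang described here.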
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Users using DPF environment                                  * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* During a failed connect (which does an implicit connect      *
* reset), or connect reset processing in a multi-node          *
* environment, if a node failure occurs, an expected reply     *
* from a remote node to the connect reset can be missed and    *
* the coordinator agent hangs. The call stacks and the         *
* db2diag.log entry are given in the problem description       *
* above.                                                       *
*                                                              *
* As a result of the hang problem, a connection attempt to the * 
* node will fail with SQL1229N.                                * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to Version 9.5 FixPack 8.                            * 
****************************************************************
Local Fix:
available fix packs:
DB2 Version 9.5 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.5 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.5 Fix Pack 10 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
Problem was first fixed in DB2 UDB Version 9.5 FixPack 8.
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 23.03.2011
Date  - problem closed      : 30.06.2011
Date  - last modified       : 30.06.2011
Problem solved at the following versions (IBM BugInfos)
9.5.FP8
Problem solved according to the fixlist(s) of the following version(s)
9.5.0.8 FixList
9.7.0.5 FixList
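
To check an installed level against the fixlist levels named above, a comparison along the following lines may help. This is a minimal Python sketch that takes the level as a dotted string such as "9.5.0.7" and compares it with 9.5.0.8 and 9.7.0.5. Releases not mentioned in this description return None, because nothing here says whether they are affected. The function name and the input format are assumptions for illustration.

```python
# First fixed levels per release, taken from the fixlists named above.
FIRST_FIXED = {
    (9, 5): (9, 5, 0, 8),
    (9, 7): (9, 7, 0, 5),
}


def has_fix(level: str):
    """Return True/False for 9.5 and 9.7 levels, None for other releases."""
    parts = tuple(int(p) for p in level.split("."))
    first_fixed = FIRST_FIXED.get(parts[:2])
    if first_fixed is None:
        return None  # release not covered by this problem description
    # Tuples compare element by element, so 9.7.0.11 sorts after 9.7.0.5.
    return parts >= first_fixed
```

For example, has_fix("9.5.0.7") is False and has_fix("9.7.0.11") is True. Comparing integer tuples rather than strings avoids the pitfall that "9.7.0.11" sorts before "9.7.0.5" as text.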