DB2 - Problem description
Problem IC75211 | Status: Closed
IN DPF, CONNECT OR CONNECT RESET HANGS DUE TO MISSING REPLY AFTER NODE FAILURE.
Product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
During a failed connect (which does an implicit connect reset), or during connect reset processing in a multi-node environment, if a node failure occurs, an expected reply from a remote node to the connect reset can be missed. The coordinator agent will then hang in the following stack:

    sqloWaitEDUWaitPost
    WaitRecvReady
    ReceiveBuffer
    getNextBuffer
    sqlkd_rcv_buffer
    sqlkd_rcv_get_next_buffer
    sqlkd_rcv_init
    sqlkdReceiveReply
    sqleReceiveAndMergeReplies
    sqlkdInterrupt
    sqleDssStopUsing
    ForwardStopRequest
    AppStopUsing
    sqlesrspWrp
    sqleUCagentConnectReset
    sqljsCleanup
    sqljsDrdaAsInnerDriver
    sqljsDrdaAsDriver
    RunEDU

An entry similar to the following should be logged in the db2diag.log on the coordinator node:

    2011-03-02-04.15.40.706078+540 I601932A472        LEVEL: Error
    PID     : 4841666              TID  : 4885        PROC : db2sysc 1
    INSTANCE: db2inst              NODE : 001         DB   : P64816
    APPHDL  : 1-51                 APPID: *N1.dpfv971.110301191344
    AUTHID  : DB2INST
    EDUID   : 4885                 EDUNAME: db2agent (sample) 1
    FUNCTION: DB2 UDB, buffer dist serv, sqlkdReceiveReply, probe:10
    RETCODE : ZRC=0x81590016=-2124873706=SQLKF_NODE_FAILED "Node Recovery"

Another indication of this hang is one or more subagents for the stop-using coordinator stuck in log term sync on a non-coordinator node, with this call stack:

    sqloWaitEDUWaitPost
    WaitRecvReady
    ReceiveBuffer
    getNextBuffer
    sqlkd_rcv_buffer
    sqlkd_rcv_get_next_buffer
    sqlkd_rcv_init
    sqlkdReceiveReply
    sqlpLSrequestor
    sqlpPerformTermLogSync
    sqlpTermLogSync
    sqlpterm
    CleanDB
    TermDbConnect
    AppStopUsing
    sqleSubAgentStopUsing
    sqleSubRequestRouter

As a result of the hang, a connection attempt to the node fails with SQL1229N.
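The two symptoms described above (the db2diag.log entry and the coordinator stack) can be checked mechanically. The following is a minimal Python sketch, not an IBM tool. The function names, the choice of "signature" frames, and the assumption that db2diag.log records are separated by blank lines are all illustrative assumptions rather than anything stated in this APAR.

```python
import re

# Frames taken from the coordinator-agent stack in the problem description,
# listed innermost first. Using this subset as a "signature" is an
# assumption made for illustration only.
COORD_HANG_SIGNATURE = [
    "sqlkdReceiveReply",
    "sqleReceiveAndMergeReplies",
    "sqlkdInterrupt",
    "sqleDssStopUsing",
    "sqleUCagentConnectReset",
]


def find_node_failed_records(diag_text):
    """Return db2diag.log records that resemble the entry in this APAR.

    Assumes records are separated by one or more blank lines.
    """
    hits = []
    for record in re.split(r"\n\s*\n", diag_text):
        if (
            "sqlkdReceiveReply" in record
            and re.search(r"probe:10\b", record)
            and "SQLKF_NODE_FAILED" in record
        ):
            hits.append(record.strip())
    return hits


def matches_hang_stack(frames, signature=COORD_HANG_SIGNATURE):
    """True if every signature frame occurs in `frames`, in the same order."""
    remaining = iter(frames)
    # Each any() consumes the iterator up to the next match, so this is an
    # ordered-subsequence test rather than a plain membership test.
    return all(any(sig in frame for frame in remaining) for sig in signature)
```

A match from either function is only a hint that this APAR may apply. The subagent stack shown above does not contain the coordinator-side frames, so it would need its own signature list.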
Problem Summary:

USERS AFFECTED:
Users of DPF environments.

PROBLEM DESCRIPTION:
See the problem description above.

RECOMMENDATION:
Upgrade to Version 9.5 FixPack 8.
Local Fix:
Available fix packs:
DB2 Version 9.5 Fix Pack 8 for Linux, UNIX, and Windows
Solution
Problem was first fixed in DB2 UDB Version 9.5 FixPack 8.
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 23.03.2011
Date - problem closed   : 30.06.2011
Date - last modified    : 30.06.2011
Problem solved at the following versions (IBM BugInfos)
9.5.FP8
Problem solved according to the fixlist(s) of the following version(s)
9.5.0.8
9.7.0.5
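
The fixlist levels above can be compared against an installed level to judge whether the fix is present. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the installed level is available as a dotted string in the same form as the fixlist entries. It deliberately returns None for any release not listed in this APAR, because nothing here says whether other releases are affected or fixed.

```python
# Fix levels copied from the fixlist entries above, keyed by (version, release).
FIXED_LEVELS = {
    (9, 5): (9, 5, 0, 8),
    (9, 7): (9, 7, 0, 5),
}


def parse_level(level):
    """Turn a dotted level string such as '9.5.0.8' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().split("."))


def contains_fix(level):
    """Return True or False for releases listed in FIXED_LEVELS, else None."""
    parsed = parse_level(level)
    fixed = FIXED_LEVELS.get(parsed[:2])
    if fixed is None:
        return None  # release not covered by this APAR's fixlist entries
    return parsed >= fixed
```

For example, `contains_fix("9.5.0.7")` is False and `contains_fix("9.7.0.5")` is True under these assumptions.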