DB2 - Problem description
Problem IC71654 | Status: Closed |
TAKEOVER HADR COMMAND HANGS UP ON STANDBY WHEN A TRAP HAS BEEN PREVIOUSLY SUSTAINED IN PRIMARY DATABASE | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
The hang occurs if a takeover is issued on an HADR standby after the HADR primary has sustained a trap.

On the HADR standby: the takeover command hangs, and other commands such as 'db2stop force' either hang or do not work.
On the HADR primary: clients are unable to connect.

If the HADR primary has previously sustained a trap, you will see both of the following:
1) ADM14012C or ADM14013C messages in the administration notification log ({instance_name}.nfy)
2) A suspended db2agent in 'db2pd -EDUs' output

Even after the fix for APAR IC69960 is applied, the takeover command still hangs under the conditions above. In that case, db2diag.log on the primary contains Severe messages such as ADM14013C, which indicate that db2agents have been suspended on the primary, as shown below.

2010-09-27-14.35.38.415495+540 I1781400A564 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID: 10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, RAS/PD component, pdResilienceIsSafeToSustain, probe:800
DATA #1 : String, 37 bytes
Trap Sustainability Criteria Checking
DATA #2 : Hex integer, 8 bytes
0x0000000000021000
DATA #3 : Boolean, 1 bytes
true
...
2010-09-27-14.35.38.625896+540 E1813735A941 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID: 10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0 (suspended) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsTrapResilience, probe:800
MESSAGE : ADM14013C The following type of critical error occurred: "Trap".
This error occurred because one or more threads that are associated with
the current DB2 instance have been suspended, but the instance process is
still running. First Occurrence Data Capture (FODC) was invoked in the
following mode: "Automatic". FODC diagnostic information is located in the
following directory: "/var/log/db2/FODC_Trap_2010-09-27-14.35.38.031284/".

For more information on sustained traps, see:
* Enhanced resilience to errors and traps reduces outages
  http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/c0054512.html | |
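The two symptoms listed in the problem description (an ADM14012C or ADM14013C message in the notification log, and a suspended db2agent in 'db2pd -EDUs' output) can be checked together before a takeover is attempted. The following is a minimal sketch in Python, not an IBM-provided tool. It works on text that has already been captured from the log and from db2pd. The function names are hypothetical, and the "(suspended)" marker is an assumption taken from the EDUNAME field in the db2diag.log excerpt; actual 'db2pd -EDUs' output may be laid out differently.

```python
import re
from typing import List

# Message IDs named in this APAR as evidence of a sustained trap.
TRAP_MESSAGE_IDS = ("ADM14012C", "ADM14013C")

# Assumption: a suspended agent is marked "(suspended)" after its EDU name,
# as in the EDUNAME field of the db2diag.log excerpt in this APAR.
SUSPENDED_AGENT = re.compile(r"db2agent\b.*\(suspended\)")


def notify_log_shows_trap(log_text: str) -> bool:
    """Return True if either sustained-trap message ID appears in the text."""
    return any(msg_id in log_text for msg_id in TRAP_MESSAGE_IDS)


def suspended_agents(edus_text: str) -> List[str]:
    """Return the lines that name a db2agent carrying the suspended marker."""
    return [
        line.strip()
        for line in edus_text.splitlines()
        if SUSPENDED_AGENT.search(line)
    ]


def primary_sustained_trap(log_text: str, edus_text: str) -> bool:
    """Both conditions from the problem description must hold."""
    return notify_log_shows_trap(log_text) and bool(suspended_agents(edus_text))
```

A caller would pass the contents of {instance_name}.nfy (or db2diag.log) as the first argument and the captured 'db2pd -EDUs' output as the second; a True result suggests the primary is in the state in which this APAR's hang can occur.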
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* All                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* "takeover hadr" command hangs up when a trap has been        *
* sustained.                                                   *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to db2 Version 9.7 FixPak 4                          *
**************************************************************** | |
Local Fix: | |
If db2_kill is issued on the HADR primary system to break the HADR connection, the hung TAKEOVER HADR command on the standby should end with an error.

For more information on recovering from sustained traps, see:
* Recovering from sustained traps
  http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/t0055494.html | |
available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 04.10.2010 |
Date - problem closed : | 09.05.2011 |
Date - last modified : | 09.05.2011 |
Problem solved at the following versions (IBM BugInfos) | |
9.7. | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.4 |
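
The fixlist level above (9.7.0.4) can be compared against an installed level to judge whether a given 9.7 system already carries this fix. The following is a minimal sketch in Python under stated assumptions: the level is supplied as a dotted string in the same form as the fixlist entry (for example, as read manually from db2level output), and the function names are hypothetical. Because this record only lists Version 9.7, the sketch returns None for any other release rather than guessing.

```python
from typing import Optional, Tuple

# First level listed in this record's fixlist as containing the fix.
FIRST_FIXED: Tuple[int, ...] = (9, 7, 0, 4)


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted level such as '9.7.0.4' or 'v9.7.0.4' into integers."""
    parts = level.strip().lstrip("vV").split(".")
    return tuple(int(p) for p in parts if p)


def includes_ic71654_fix(level: str) -> Optional[bool]:
    """Return True/False for 9.7 levels, None when the release is not 9.7."""
    parsed = parse_level(level)
    if parsed[:2] != FIRST_FIXED[:2]:
        # This record says nothing about other releases.
        return None
    # Pad short levels such as '9.7' with zeros before comparing.
    padded = parsed + (0,) * (len(FIRST_FIXED) - len(parsed))
    return padded >= FIRST_FIXED
```

For example, `includes_ic71654_fix("9.7.0.3")` evaluates to False and `includes_ic71654_fix("9.7.0.4")` to True.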