DB2 - Problem description
Problem IC77640 | Status: Closed
STANDBY SHUTDOWN AFTER LOG RETRIEVE ATTEMPT FAILURE
product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
When a storage manager is set up on the standby, database activation fails if a log file required for recovery cannot be retrieved. This behavior is expected on a standard database, but not on a HADR standby. The standby should instead move from local catchup to remote catchup in order to fetch all the log files. When the problem is hit, the return code from the userexit program is usually 4 or 8.

Here are the related db2diag.log entries:

2011-05-21-15.22.48.976692-300 I227718A364 LEVEL: Warning
PID : 26476788 TID : 5142 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 5142 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgRetrieveLogFile, probe:4130
MESSAGE : Started retrieve for log file S0249205.LOG.

2011-05-21-15.27.09.530887-300 E229599A511 LEVEL: Error
PID : 26476788 TID : 5142 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 5142 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgUserexitLogAdminMsg, probe:1180
MESSAGE : ADM1835E The user exit program returned an error when retrieving log file "S0249205.LOG" to "/db2/SAMPLE/log_dir/NODE0000/" for database "SAMPLE". The error code was "8".

2011-05-21-15.27.09.544174-300 E230111A431 LEVEL: Warning
PID : 26476788 TID : 5142 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 5142 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgRetrieveLogFile, probe:4165
MESSAGE : ADM1847W Failed to retrieve log file "S0249205.LOG" on chain "23" to "/db2/SAMPLE/log_dir/NODE0000/".

2011-05-21-15.27.10.055514-300 I230543A469 LEVEL: Error
PID : 26476788 TID : 5913 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 5913 EDUNAME: db2lfr (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlplfrOpenExtentRetrieve, probe:225
MESSAGE : Received error from db2logmgr on retrieve of log 249205, rc:
DATA #1 : Hexdump, 4 bytes
0x0700000019FFD890 : 0000 0008 ....

2011-05-21-15.27.10.057751-300 I231013A478 LEVEL: Error
PID : 26476788 TID : 16193 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.DB2.110521202116
EDUID : 16193 EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpPRecReadLog, probe:1275
RETCODE : ZRC=0x82100016=-2112880618=SQLPLFR_RC_RETRIEVE_FAILED "Log could not be retrieved"

2011-05-21-15.27.10.437046-300 E233417A922 LEVEL: Critical
PID : 26476788 TID : 4370 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.DB2.110521202116
EDUID : 4370 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred: "DBMarkedBad". The instance may have been shutdown as a result. "Automatic" FODC (First Occurrence Data Capture) has been invoked and diagnostic information has been recorded in directory "/db2/SAMPLE/db2dump/FODC_DBMarkedBad_2011-05-21-15.27.10.432900/". Please look in this directory for detailed evidence about what happened and contact IBM support if necessary to diagnose the problem.
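To check whether a standby is hitting this problem, the userexit failures can be pulled out of db2diag.log text. The following is a minimal sketch, not an IBM tool: the function name `find_userexit_failures` is hypothetical, and the pattern assumes the ADM1834E and ADM1835E message wording is exactly as quoted in this report.

```python
import re

# Assumption: message wording matches the ADM1834E / ADM1835E text quoted
# in this report. \s+ tolerates line wrapping inside the MESSAGE field.
USEREXIT_ERROR = re.compile(
    r'(?P<msg>ADM183[45]E)\b.*?'
    r'retrieving\s+log\s+file\s+"(?P<log>S\d+\.LOG)"'
    r'.*?database\s+"(?P<db>[^"]+)"'
    r'.*?error\s+code\s+was\s+"(?P<rc>\d+)"',
    re.DOTALL,
)


def find_userexit_failures(diag_text):
    """Return (message_id, log_file, database, return_code) tuples
    for each userexit retrieve failure found in diag_text."""
    return [
        (m.group("msg"), m.group("log"), m.group("db"), int(m.group("rc")))
        for m in USEREXIT_ERROR.finditer(diag_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python script.py /path/to/db2diag.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for failure in find_userexit_failures(fh.read()):
            print(failure)
```

A return code of 4 or 8 reported for ADM1835E on a HADR standby, followed by a DBMarkedBad entry, matches the symptom described above.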
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to 9.7 FP6                                           *
****************************************************************
Local Fix:
1. Move the userexit program (the file should be sqllib/bin/db2uext2) to another path, or rename it.
2. Start the standby. Because the userexit program is not available, a message similar to the following is logged:

2011-07-05-20.49.52.653962-420 E123698E549 LEVEL: Error
PID : 12496 TID : 47739391961408 PROC : db2sysc
INSTANCE: sfbao NODE : 000
EDUID : 60 EDUNAME: db2logmgr (HADRDB)
FUNCTION: DB2 UDB, data protection services, sqlpgUserexitLogAdminMsg, probe:1170
MESSAGE : ADM1834E DB2 was unable to find the user exit program when retrieving log file "S0003169.LOG" to "/u/sfbao/sfbao/NODE0000/SQL00001/SQLOGDIR/" for database "HADRDB". The error code was "24".

However, the log manager should return to lfr immediately after it detects this error, and the standby will eventually start up, i.e., move into remote catchup state.
3. Once HADR goes into peer state, move or rename the userexit program back.
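Steps 1 and 3 of the local fix amount to renaming the userexit program out of the way and later restoring it. The sketch below illustrates only that file handling, under stated assumptions: the `.disabled` suffix and both function names are hypothetical, and starting the standby and waiting for HADR to reach peer state (steps 2 and 3) are left to the operator.

```python
from pathlib import Path

# Hypothetical suffix used to park the userexit program; not a DB2 convention.
DISABLED_SUFFIX = ".disabled"


def set_userexit_aside(userexit):
    """Step 1: rename the userexit program so DB2 cannot find it.

    Returns the path the file was renamed to.
    """
    src = Path(userexit)
    dst = src.with_name(src.name + DISABLED_SUFFIX)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    src.rename(dst)
    return dst


def restore_userexit(userexit):
    """Step 3: rename the parked file back to its original name.

    Call this only after HADR has reached peer state.
    """
    dst = Path(userexit)
    src = dst.with_name(dst.name + DISABLED_SUFFIX)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite")
    src.rename(dst)
    return dst
```

For example, `set_userexit_aside("sqllib/bin/db2uext2")` would be run before starting the standby, and `restore_userexit("sqllib/bin/db2uext2")` after peer state is reached, with the path adjusted to the instance owner's sqllib directory.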
available fix packs:
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
Solution
Problem first fixed on DB2 Version 9.7 Fix Pack 6
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 20.07.2011
Date - problem closed : 05.12.2012
Date - last modified : 05.12.2012
Problem solved at the following versions (IBM BugInfos)
9.7.FP6
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.6