DB2 - Problem description
Problem IT08542 | Status: Closed
HADR STANDBY DATABASE IS TERMINATED ABNORMALLY AFTER RETRIEVING A BAD LOG FILE FROM ARCHIVE
Product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
If the standby database retrieves a log file from the archive but the retrieved file cannot be used for any reason, the error causes the standby database to terminate abnormally. In addition, the original copy of this log file on the standby, which might contain valid log data, has already been replaced by the retrieved file.

First, the standby successfully retrieves the extent from the archive and then deletes the log file from the active log path. db2diag.log records that the retrieval completed and that the stale file was deleted:

2015-04-24-12.12.26.897103-240 I70566E601 LEVEL: Info
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 131               EDUNAME: db2logmgr (HADRDB)
FUNCTION: DB2 UDB, data protection services, sqlpgRetrieveLogFile, probe:4148
DATA #1 : <preformatted>
Completed retrieve for log file S0000034.LOG on chain 1 to /home/hotellnx93/yafanhu/yafanhu/NODE0000/SQL00001/LOGSTREAM0000/LOGSTREAM0000/.
...

2015-04-24-12.12.26.937103-240 I70566E601 LEVEL: Warning
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 299               EDUNAME: db2lfr.0 (SAMPLE)
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog, probe:622
DATA #1 : SQLPLFR_SCAN_ID, PD_TYPE_SQLPLFR_SCAN_ID, 8 bytes
LFR Scan Num = 6
LFR Scan Caller's EDUID = 307
MESSAGE : Deleted stale log file from active log path. staleExtNum:
DATA #2 : SQLPG_EXTENT_NUM, PD_TYPE_SQLPG_EXTENT_NUM, 4 bytes
34

Second, the standby reports that the log file retrieved from the archive is not part of the valid log chain. By then it is too late: the log file in the active log path has already been deleted.

2015-04-24-12.12.26.937849-240 I71869E2830 LEVEL: Info
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 299               EDUNAME: db2lfr.0 (SAMPLE)
FUNCTION: DB2 UDB, recovery manager, sqlplfrIsLogFromValidChain, probe:9999
MESSAGE : ZRC=0x071000D7=118489303=SQLP_EXT_NOT_IN_CHAIN
          "This extent is not a successor of the previous extent. Fwd Recovery can not continue."
...

Third, HADR detects the abnormal condition caused by the bad log file:

2015-04-24-12.12.31.212006-240 E112205E450 LEVEL: Error
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:21580
MESSAGE : ADM12509E HADR encountered an abnormal condition. Reason code: "1"

2015-04-24-12.12.31.212826-240 I112656E593 LEVEL: Warning
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:21580
MESSAGE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG
          "HADR standby found bad log"
DATA #1 : String, 99 bytes
HADR standby error handling: will close connection to primary, then reconnect, and perform a retry.

After a few retries, the standby is shut down.
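To check whether a standby has hit this problem, one could scan db2diag.log for the three messages above in the order they occur. The following is a minimal sketch, not an IBM tool: it assumes every record begins with a header like the ones in the excerpts (timestamp, record ID, `LEVEL:` field), and it matches the message text by plain substring. The names `split_records`, `find_bad_log_sequence`, `HEADER`, and `STEPS` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed record header, e.g.
# "2015-04-24-12.12.26.937103-240 I70566E601 LEVEL: Warning"
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+)\s+(\S+)\s+LEVEL:\s*(\w+)",
    re.MULTILINE,
)

# Substrings for the three steps of the failure, taken from the excerpts:
# 1) stale log deleted, 2) extent not in chain, 3) HADR reports a bad log.
STEPS = (
    "Deleted stale log file from active log path",
    "SQLP_EXT_NOT_IN_CHAIN",
    "HDR_ZRC_BAD_LOG",
)


def split_records(text: str) -> List[Tuple[str, str]]:
    """Split db2diag.log text into (timestamp, full record text) pairs."""
    matches = list(HEADER.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # A record runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append((m.group(1), text[m.start():end]))
    return records


def find_bad_log_sequence(text: str) -> Optional[List[str]]:
    """Return timestamps of the first in-order match of all steps, else None."""
    stamps: List[str] = []
    step = 0
    for stamp, body in split_records(text):
        if STEPS[step] in body:
            stamps.append(stamp)
            step += 1
            if step == len(STEPS):
                return stamps
    return None


if __name__ == "__main__":
    sample = (
        "2015-04-24-12.12.26.937103-240 I70566E601 LEVEL: Warning\n"
        "MESSAGE : Deleted stale log file from active log path.\n"
        "2015-04-24-12.12.26.937849-240 I71869E2830 LEVEL: Info\n"
        "MESSAGE : ZRC=0x071000D7=118489303=SQLP_EXT_NOT_IN_CHAIN\n"
        "2015-04-24-12.12.31.212826-240 I112656E593 LEVEL: Warning\n"
        "MESSAGE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG\n"
    )
    # Prints the timestamps of the three matching records.
    print(find_bad_log_sequence(sample))
```

To run it against a real file, pass the file contents, for example `open(path, encoding="utf-8", errors="replace").read()`; the db2diag.log location is site-specific. Requiring the steps in order reduces false positives from unrelated HDR_ZRC_BAD_LOG errors that were not preceded by a stale-log deletion.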
Problem Summary:
****************************************************************
USERS AFFECTED:
DB2 LUW
****************************************************************
PROBLEM DESCRIPTION:
See Error Description.

The main reason for this fix is not so much that the standby reports an error. It is that, after retrieving a bad log file from the archive, the standby deletes the partially written but good log file in the active log path.

In addition, the db2diag.log file should contain the following messages.

First, the standby successfully retrieves the extent from the archive and then deletes the log file from the active log path.

2015-04-24-12.12.26.936351-240 I70095E470 LEVEL: Warning
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 299               EDUNAME: db2lfr.0 (SAMPLE)
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog, probe:620
DATA #1 : <preformatted>
LFR Scan Num = 6
LFR Scan Caller's EDUID = 307
Extent 0: Using extent from archive.

2015-04-24-12.12.26.937103-240 I70566E601 LEVEL: Warning
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 299               EDUNAME: db2lfr.0 (SAMPLE)
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog, probe:622
DATA #1 : SQLPLFR_SCAN_ID, PD_TYPE_SQLPLFR_SCAN_ID, 8 bytes
LFR Scan Num = 6
LFR Scan Caller's EDUID = 307
MESSAGE : Deleted stale log file from active log path. staleExtNum:
DATA #2 : SQLPG_EXTENT_NUM, PD_TYPE_SQLPG_EXTENT_NUM, 4 bytes
0

Second, the standby reports that the log file retrieved from the archive is not part of the valid log chain.

2015-04-24-12.12.26.937849-240 I71869E2830 LEVEL: Info
PID     : 3002              TID : 46914804377920   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 299               EDUNAME: db2lfr.0 (SAMPLE)
FUNCTION: DB2 UDB, recovery manager, sqlplfrIsLogFromValidChain, probe:9999
MESSAGE : ZRC=0x071000D7=118489303=SQLP_EXT_NOT_IN_CHAIN
          "This extent is not a successor of the previous extent. Fwd Recovery can not continue."
DATA #1 : <preformatted>
LFR Scan Num = 6
LFR Scan Caller's EDUID = 307
Log chain validation failed for extent 0 on log stream 0 and log chain 1
Specific errors were previously logged.
logAtBkp Extent: 0, bkpEndMrkr Extent: 0
lastRecLfsLsn: 7179/000000000003ADE6, LastRecLfsLsnLFH: 7179/000000000003ADE6
backupEndLso: 40760001, lastRecLso: 38444548, lso: 40760001, firstLso: 40760001
firstLFSInExtent: 7180, lsnBase: 000000000003ADE7, firstLFSInNextExtent: 0
lastLfsLsnInExtent: 7289/000000000003B0AE
OCID: 1429891562, CID: 1429891562, PID: 4294967295
lfrCurExtNum: 0. lfrPrevExtNum: 4294967295, lfrPrevExtCId: 4294967295
Prev ext chain: 18446744073709551615/FFFFFFFFFFFFFFFF/4294967295/0
logChainInfo[0]: 0/0000000000000000/0/0
logChainInfo[1]: 0/0000000000000000/0/0
logChainInfo[2]: 0/0000000000000000/1429891304/1039050948
logChainInfo[3]: 18446744073709551615/FFFFFFFFFFFFFFFF/4294967295/0
lastRecLogChain: 0/0000000000000000/1429891304/1039050948
currentLogChain: 0/0000000000000000/1429891304/1039050948
backupLogChain: 0/0000000000000000/0/0
gta[+2]: 18446744073709551615/FFFFFFFFFFFFFFFF/4294967295/0
gta[+1]: 7179/000000000003ADE7/1429891816/21610855
gta[+0]: 0/0000000000000000/1429891304/1039050948
gta[-1]: 0/0000000000000000/0/0
gta[-2]: 0/0000000000000000/0/0
logChainId: 1, rfwdHeadChainId: 0, logFileChainId: 1
Code path: 380a3ba, lfrScanOpenFlagsIn: c0
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00002AAAAEE2A3B2 pdLogVPrintf + 0x2F2
  [1] 0x00002AAAAEE2A0AC pdLogPrintf + 0x8C
  [2] 0x00002AAAAE22B111 _Z26sqlplfrIsLogFromValidChainP12SQLPLFR_DBCBPK9SQLP_LECBP21SQLPLFR_REQ_SCAN_NEXTP17SQLPLFR_SCAN_DATA + 0x1EC1
  [3] 0x00002AAAAE22CCBE _Z16sqlplfrFMReadLogP12SQLPLFR_DBCBP21SQLPLFR_REQ_SCAN_NEXTP17SQLPLFR_SCAN_DATA + 0x68E
  [4] 0x00002AAAAE22FFC8 _Z17sqlplfrDoScanNextP12SQLPLFR_DBCBP11SQLPLFR_REQ + 0x2D8
  [5] 0x00002AAAACABFAFC _Z10sqlplfrEduP9sqpLfrEdu + 0x44C
  [6] 0x00002AAAACAF1BDE _ZN9sqpLfrEdu6RunEDUEv + 0x2E
  [7] 0x00002AAAACFDB063 _ZN9sqzEDUObj9EDUDriverEv + 0xF3
  [8] 0x00002AAAACFDAF69 _Z10sqlzRunEDUPcj + 0x9
  [9] 0x00002AAAACA42EC1 sqloEDUEntry + 0x2A1
  [10] 0x00002AAAAABCE2A3 /lib64/libpthread.so.0 + 0x62A3
  [11] 0x00002AAAB30D514D __clone + 0x6D

Third, HADR detects the abnormal condition caused by the bad log file.

2015-04-24-12.12.31.212006-240 E112205E450 LEVEL: Error
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:21580
MESSAGE : ADM12509E HADR encountered an abnormal condition. Reason code: "1"

2015-04-24-12.12.31.212826-240 I112656E593 LEVEL: Warning
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduS, probe:21580
MESSAGE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG
          "HADR standby found bad log"
DATA #1 : String, 99 bytes
HADR standby error handling: will close connection to primary, then reconnect, and perform a retry.

After a few retries, the standby stops the replay master and terminates.

2015-04-24-12.21.40.066122-240 I2746137E412 LEVEL: Info
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrStopReplayMaster, probe:21272
MESSAGE : Replaymaster request done.

2015-04-24-12.21.40.066443-240 I2746550E468 LEVEL: Error
PID     : 3002              TID : 46914791795008   PROC : db2sysc
INSTANCE: yafanhu           NODE : 000             DB   : SAMPLE
HOSTNAME: hotellnx101
EDUID   : 307               EDUNAME: db2hadrs.0.0 (SAMPLE)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduEntry, probe:21150
MESSAGE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG
          "HADR standby found bad log"
****************************************************************
RECOMMENDATION:
Apply V101 Fix Pack 5.
****************************************************************
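Each ZRC in the records above is printed three ways: hex, decimal, and symbolic name (for example `ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG`). The decimal form is the hex value read as a signed 32-bit two's-complement integer, which is why codes with the high bit set appear negative. The sketch below reproduces that conversion. It also maps an extent number to a log file name, assuming (as the excerpts suggest, where extent 34 corresponds to S0000034.LOG) that the extent number and the file number are the same. The function names `zrc_signed` and `log_file_name` are hypothetical.

```python
def zrc_signed(hex_code: str) -> int:
    """Read a ZRC hex string (e.g. "0x87800148") as a signed 32-bit integer."""
    value = int(hex_code, 16)  # int() accepts the "0x" prefix with base 16
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("ZRC must fit in 32 bits")
    # Two's complement: if the sign bit is set, subtract 2**32.
    return value - (1 << 32) if value & 0x80000000 else value


def log_file_name(extent_num: int) -> str:
    """Build an S<7-digit>.LOG name (assumes extent number == file number)."""
    return f"S{extent_num:07d}.LOG"


for code in ("0x071000D7", "0x87800148"):
    print(code, zrc_signed(code))
print(log_file_name(34))
```

Running it prints `0x071000D7 118489303`, `0x87800148 -2021654200`, and `S0000034.LOG`, matching the decimal values shown in the db2diag.log records.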
Local Fix:
Prevent the standby database from accessing the archive.
Solution
First fixed in DB2 V101 Fix Pack 5.
Workaround
Not known; see Local Fix.
Timestamps
Date - problem reported: 24.04.2015
Date - problem closed: 22.07.2015
Date - last modified: 22.07.2015
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.1.0.5