DB2 - Problem description
Problem IC65817 | Status: Closed |
SLOW LOG RETRIEVAL ON A HADR STANDBY CAUSES IT TO MOVE TO REMOTE CATCHUP | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
When a HADR standby opens a log file from the archive, DB2 does not wait long enough for the retrieval if an older version of the file exists in the log path. If an incomplete version of a log file exists in the log path, the standby tries to open the newer (and final) version from the archive. If retrieval is slow, DB2 may end up using the version in the log path, which triggers the end of Local Catchup because the file is incomplete (as a result of a previous Remote Catchup).

When a HADR standby database is activated, it goes through Local Catchup. If the standby can retrieve log files from the storage manager, it sends a request and waits until it gets the log file to process. Starting from 9.1 FP5, the standby does not wait for log files to be retrieved during Local Catchup, even after sending a retrieve request; it just moves to Remote Catchup in order to get the same log files from the primary. This is not expected behavior.

The related APAR is IZ12262 (Log files created by Remote Catchup prevent Local Catchup from using archived log files). Basically, the fix for IZ12262 does not handle slow archive retrieval well.

Here are sample db2diag.log messages on the standby:

2009-05-15-05.10.32.041290+000 I1147488A310       LEVEL: Warning
PID     : 685220               TID  : 1           PROC : db2lfr (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog, probe:600
MESSAGE : Extent 347223 in log path may be stale. Trying archive.

2009-05-15-05.10.32.287650+000 I1147799A314       LEVEL: Warning
PID     : 557070               TID  : 1           PROC : db2logmgr (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, data protection services, sqlpgRetrieveLogFile, probe:4130
MESSAGE : Started retrieve for log file S0347223.LOG.

2009-05-15-05.10.33.042008+000 I1148114A316       LEVEL: Warning
PID     : 685220               TID  : 1           PROC : db2lfr (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog, probe:630
MESSAGE : Extent 347223 not found in archive. Using extent in log path.

2009-05-15-05.10.33.054443+000 I1148431A362       LEVEL: Warning
PID     : 418570               TID  : 1           PROC : db2shred (DB1) 0
INSTANCE: db2inst1             NODE : 000         DB   : DB1
APPHDL  : 0-9                  APPID: *LOCAL.DB2.090515045953
FUNCTION: DB2 UDB, recovery manager, sqlpshrEdu, probe:18300
MESSAGE : Maxing hdrLCUEndLsnRequested

2009-05-15-05.10.33.065118+000 E1148794A335       LEVEL: Event
PID     : 790946               TID  : 1           PROC : db2hadrs (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE  : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

2009-05-15-05.10.33.165711+000 E1149130A336       LEVEL: Event
PID     : 790946               TID  : 1           PROC : db2hadrs (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE  : HADR state set to S-RemoteCatchup (was S-RemoteCatchupPending)

2009-05-15-05.10.33.165853+000 I1149467A309       LEVEL: Warning
PID     : 790946               TID  : 1           PROC : db2hadrs (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSPrepareLogWrite, probe:10260
MESSAGE : RCUStartLsn 00008A54F6924C41

2009-05-15-05.10.57.143329+000 I1149777A368       LEVEL: Warning
PID     : 557070               TID  : 1           PROC : db2logmgr (DB1) 0
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, data protection services, sqlpgRetrieveLogFile, probe:4148
MESSAGE : Completed retrieve for log file S0347223.LOG on chain 7 to /db2/DB1/log_dir/NODE0000/.

Here, the DB2 log manager (db2logmgr) started the log retrieval from the storage manager at 05.10.32, but less than one second later the db2lfr process reported the log extent as missing and db2shred reported that Local Catchup was done. The retrieval itself did not complete until 05.10.57, after the standby had already moved to Remote Catchup. | |
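To check whether a standby's db2diag.log shows the same pattern, the message sequence above can be scanned for retrievals that completed only after the extent had already been reported as "not found in archive". The following is a minimal sketch, not an IBM tool. It assumes each entry starts with a timestamp line in the format shown in the sample, and that the extent number is the numeric part of the log file name (S0347223.LOG for extent 347223, as in the sample). The function name `find_premature_fallbacks` is hypothetical.

```python
import re
from datetime import datetime

# Patterns taken from the sample db2diag.log messages in this APAR.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})")
STARTED = re.compile(r"Started retrieve for log file (S\d+\.LOG)")
COMPLETED = re.compile(r"Completed retrieve for log file (S\d+\.LOG)")
GAVE_UP = re.compile(r"Extent (\d+) not found in archive")


def extent_of(log_file):
    # Assumption based on the sample: "S0347223.LOG" -> extent 347223.
    return int(log_file[1:-4])


def find_premature_fallbacks(lines):
    """Return (log_file, retrieval_seconds) for every retrieval that
    completed after the extent was already reported as not found."""
    current = None   # timestamp of the entry being read
    started = {}     # extent -> time the retrieve started
    gave_up = set()  # extents reported missing while a retrieve was running
    results = []
    for line in lines:
        ts = TS.match(line)
        if ts:
            current = datetime.strptime(ts.group(1), "%Y-%m-%d-%H.%M.%S.%f")
            continue
        if current is None:
            continue  # ignore text before the first timestamp line
        if (m := STARTED.search(line)):
            started[extent_of(m.group(1))] = current
        elif (m := GAVE_UP.search(line)):
            if int(m.group(1)) in started:
                gave_up.add(int(m.group(1)))
        elif (m := COMPLETED.search(line)):
            extent = extent_of(m.group(1))
            if extent in gave_up:
                seconds = (current - started[extent]).total_seconds()
                results.append((m.group(1), seconds))
    return results
```

A possible use is `find_premature_fallbacks(open("db2diag.log"))`; any result whose duration exceeds about one second matches the situation described in this APAR.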
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* HADR users with a shared archive where log archive retrieval *
* on Standby takes more than 1 second                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* HADR Standby moves from Local Catchup to Remote Catchup      *
* Pending when there are still available log files in the log  *
* archive.                                                     *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 Version 9.7 fixpack 2                         *
**************************************************************** | |
Local Fix: | |
available fix packs: | |
DB2 Version 9.7 Fix Pack 2 for Linux, UNIX, and Windows | |
Solution | |
In DB2 Version 9.7 fixpack 2, the HADR Standby will wait for log archive retrieval requests to either complete or fail before it moves to Remote Catchup Pending. | |
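The difference between the two behaviors can be illustrated with a small model. This is only a sketch of the waiting policy, not DB2's implementation: the function `open_log_extent`, the helper `slow_retrieve`, and the return values are hypothetical, and the delays are scaled down so the example runs quickly. A finite `wait_limit` models the old behavior (give up early and use the copy in the log path); `wait_limit=None` models the fixed behavior (wait until the retrieval completes or fails).

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_retrieve(delay=0.3):
    # Hypothetical stand-in for a slow archive retrieval.
    time.sleep(delay)
    return "S0347223.LOG"


def open_log_extent(retrieve, wait_limit):
    """Decide which copy of a log extent the standby would use.

    retrieve   -- callable that returns on success and raises on failure
    wait_limit -- seconds to wait for it; None waits until it
                  either completes or fails
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(retrieve)
    try:
        future.result(timeout=wait_limit)
        return ("archive", "retrieved")
    except TimeoutError:
        # Old behavior: the incomplete copy in the log path is used,
        # which ends Local Catchup too early.
        return ("log_path", "timed out")
    except Exception:
        # Retrieval really failed, so falling back is legitimate.
        return ("log_path", "retrieval failed")
    finally:
        pool.shutdown(wait=False)


print(open_log_extent(slow_retrieve, wait_limit=0.05))  # short wait
print(open_log_extent(slow_retrieve, wait_limit=None))  # wait to the end
```

In this model, only the second call uses the archived file; the first falls back to the log path even though the retrieval would have succeeded a moment later.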
Workaround | |
not known / see Local fix | |
BUG-Tracking | |
forerunner : APAR is sysrouted TO one or more of the following: IC67078 follow-up : | |
Timestamps | |
Date - problem reported : | 25.01.2010 |
Date - problem closed : | 25.05.2010 |
Date - last modified : | 25.05.2010 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.FP2 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.2 |