DB2 - Problem description
Problem IT09301 | Status: Closed
IN A HADR/TSA ENVIRONMENT, TSA MAY UNEXPECTEDLY FAILBACK THE HADR DATABASE FOLLOWING A MANUAL TAKEOVER
Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
If the primary database on serverA fails, TSA initiates an automated failover to the standby on serverB. When the DB2 instance on serverA is restarted, the HADR database is reintegrated as the standby. If a manual takeover is then issued immediately from the new standby on serverA, TSA may fail the HADR database back to serverB. The cause is a small timing hole in the DB2 automation logic, and the issue is unlikely to be hit. To avoid it, wait approximately one minute before issuing a manual takeover request on the standby after it has completed automatic reintegration following a DB2 instance-level failure.

When this issue is encountered, the manual takeover request falls between the time the db2V105_start.ksh script was run and the first run of the hadrV105_monitor.ksh script after instance startup. For example, in one occurrence of this issue, the manual takeover was received at this timestamp:

2015-05-15-10.03.45.004971+480 E1512997E516 LEVEL: Event
PID : 5564 TID : 140386626430720 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HADRDB
APPHDL : 0-9 APPID: *LOCAL.db2inst1.150515020345
AUTHID : DB2INST1 HOSTNAME: HOST01
EDUID : 23 EDUNAME: db2agent (HADRDB) 0
FUNCTION: DB2 UDB, base sys utilities, sqeDBMgr::StartUsingLocalDatabase, probe:13
START : Received TAKEOVER HADR command.
The syslog shows the following timestamps for the db2V105_start.ksh and hadrV105_monitor.ksh script runs:

May 15 10:03:11 HOST01 db2V105_start.ksh[4057]: Entered /usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh, db2inst1, 0
May 15 10:03:11 HOST01 db2V105_start.ksh[4057]: Able to cd to /db2/db2home/sqllib : /usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh, db2inst1, 0
May 15 10:03:11 HOST01 db2V105_start.ksh[4057]: 1 partitions total: /usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh, db2inst1, 0
May 15 10:03:17 HOST01 db2V105_start.ksh[4057]: Forcing apps off before reintegration
May 15 10:03:18 HOST01 db2V105_start.ksh[4057]: After forcing, starting to reintegrate
May 15 10:03:30 HOST01 db2V105_start.ksh[4057]: Returning 0 from /usr/sbin/rsct/sapolicies/db2/db2V105_start.ksh ( db2inst1, 0)

First hadr monitor run after reintegration is complete:

May 15 10:03:52 HOST01 hadrV105_monitor.ksh[7234]: Reintegration memory file created under /tmp for db HADRDB
May 15 10:03:53 HOST01 hadrV105_monitor.ksh[7234]: Must try reading with db2gcf
May 15 10:03:53 HOST01 hadrV105_monitor.ksh[7234]: auto mode, and state flags or db2pd fails, use db2gcf: manual:0, usepd:3, flagFile:0, grc:3
May 15 10:03:57 HOST01 hadrV105_monitor.ksh[7234]: su - db2inst1 -c /db2/db2home/sqllib/bin/db2gcf -t 14 -s -i db2inst1 -i db2inst1 -h HADRDB returns 1
May 15 10:03:57 HOST01 hadrV105_monitor.ksh[7234]: Returning 2 : db2inst1 db2inst1 HADRDB
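The timestamp comparison described above can be sketched in a few lines of Python. This is an illustrative helper, not part of DB2 or TSA: the function names are hypothetical, the syslog year has to be supplied by the caller because syslog lines omit it, and the trailing `+480` of the db2diag.log timestamp (the UTC offset in minutes) is ignored on the assumption that both logs record the same local time. Month-name parsing assumes an English/C locale.

```python
from datetime import datetime


def parse_db2diag_ts(ts: str) -> datetime:
    """Parse a db2diag.log timestamp such as '2015-05-15-10.03.45.004971+480'.

    Only the first 26 characters (local date and time with microseconds)
    are used; the trailing UTC offset in minutes is dropped.
    """
    return datetime.strptime(ts[:26], "%Y-%m-%d-%H.%M.%S.%f")


def parse_syslog_ts(ts: str, year: int) -> datetime:
    """Parse a syslog timestamp such as 'May 15 10:03:11' for a given year."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def takeover_in_timing_window(
    takeover: datetime,
    start_script_run: datetime,
    first_monitor_run: datetime,
) -> bool:
    """Return True if the takeover falls between the start script run
    and the first monitor script run after instance startup."""
    return start_script_run <= takeover <= first_monitor_run


# Values taken from the log excerpts in this report.
takeover = parse_db2diag_ts("2015-05-15-10.03.45.004971+480")
start_run = parse_syslog_ts("May 15 10:03:11", 2015)
monitor_run = parse_syslog_ts("May 15 10:03:52", 2015)
hit_window = takeover_in_timing_window(takeover, start_run, monitor_run)
```

With the values from this occurrence, the takeover at 10:03:45 lies between the start script run at 10:03:11 and the first monitor run at 10:03:52, so `hit_window` is true.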
Problem Summary:
USERS AFFECTED: N/A
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Upgrade to DB2 V10.5 FIXPACK 7.
Local Fix:
Solution
The problem was first fixed in DB2 V10.5 Fix Pack 7.
Workaround
not known / see Local fix
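The problem description advises waiting approximately one minute after automatic reintegration completes before issuing a manual takeover. A minimal sketch of that wait calculation follows. The function name and the 60-second constant are assumptions made for illustration ("approximately one minute" is the only figure the report gives), and the sketch does not call any DB2 or TSA command.

```python
from datetime import datetime, timedelta

# Assumed settle time, taken from "approximately one minute" in the report.
SETTLE_SECONDS = 60


def seconds_until_safe_takeover(
    reintegration_done: datetime,
    now: datetime,
    settle_seconds: int = SETTLE_SECONDS,
) -> float:
    """Return how many seconds remain before a manual takeover should be issued.

    Returns 0.0 once the settle time since reintegration has already elapsed.
    """
    earliest = reintegration_done + timedelta(seconds=settle_seconds)
    return max(0.0, (earliest - now).total_seconds())


# Example with the times from this report: the start script returned at
# 10:03:30 and the takeover was attempted at 10:03:45.
remaining = seconds_until_safe_takeover(
    datetime(2015, 5, 15, 10, 3, 30),
    datetime(2015, 5, 15, 10, 3, 45),
)
```

In the reported occurrence the takeover was issued 15 seconds after the start script returned, so under this assumed 60-second settle time `remaining` would be 45 seconds.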
Timestamps
Date - problem reported : 05.06.2015
Date - problem closed : 20.01.2016
Date - last modified : 20.01.2016
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.5.0.7