DB2 - Problem description

Problem IC74257	Status: Closed
A CONTROLLED TAKEOVER IN A TSA / HADR ENV MAY BE FOLLOWED BY AN IMMEDIATE FAILBACK IF THERE IS LATENCY IN CHANGES TO RESOURCES
product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
In a TSA / HADR environment, the state of the database is monitored by scripts located under /usr/sbin/rsct/sapolicies/db2/. During the course of a user initiated HADR takeover, DB2 issues requests to TSA to create, lock, and unlock resources. If those operations are not propagated quickly enough, there is a chance the monitor scripts will report the database is down on both servers when there are no locks or flags in place. If that happens, TSA will issue a second takeover by force. A manual reintegration may be required.

Note: this issue only affects controlled takeover and it is expected to happen only in rare cases.

Node 1 syslogs:

## The database is reporting as online (return code 1):
Jan 25 12:23:42 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[3735742]: Returning 1 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:04 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[3735792]: Returning 1 : db2inst1 db2inst1 SAMPLE

## The takeover is issued and has started several seconds ago, but the takeover is not finished and the monitor script runs and reports offline (return code 2):
Jan 25 12:24:26 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[983346]: Returning 2 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:48 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[2752700]: Returning 2 : db2inst1 db2inst1 SAMPLE
Jan 25 12:25:10 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[3866888]: Returning 2 : db2inst1 db2inst1 SAMPLE
Jan 25 12:25:32 server-a user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[2228742]: Returning 2 : db2inst1 db2inst1 SAMPLE

Node 2 syslogs:

## This was the standby, so reporting offline is expected
Jan 25 12:23:45 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[2294478]: Returning 2 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:07 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[1508116]: Returning 2 : db2inst1 db2inst1 SAMPLE

## The takeover starts and resource changes are made
Jan 25 12:24:07 server-b user:debug root[2687334]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock
Jan 25 12:24:13 server-b user:debug root[2294526]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock: 1
Jan 25 12:24:23 server-b user:debug root[1769768]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock
Jan 25 12:24:23 server-b user:debug root[3211824]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock: 0
Jan 25 12:24:26 server-b user:debug root[2687360]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock

## The monitor on node 1 ran at this point and returned offline. The last status for node 2 is also offline and we are still modifying the resources:
Jan 25 12:24:29 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[2294596]: Returning 2 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:32 server-b user:debug root[3473642]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock: 1
Jan 25 12:24:32 server-b user:debug root[3146076]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock
Jan 25 12:24:32 server-b user:notice /usr/sbin/rsct/sapolicies/db2/hadrV95_start.ksh[3211836]: Entering : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:32 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_start.ksh[2949514]: su - db2inst1 -c db2gcf -t 3600 -u -i db2inst1 -i db2inst1 -h SAMPLE -L
Jan 25 12:24:32 server-b user:debug root[2163396]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock: 0
Jan 25 12:24:33 server-b user:notice /usr/sbin/rsct/sapolicies/db2/hadrV95_start.ksh[2163400]: Returning 0 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:33 server-b user:debug root[3211870]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock
Jan 25 12:24:34 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[2687424]: Returning 1 : db2inst1 db2inst1 SAMPLE
Jan 25 12:24:39 server-b user:debug root[2687428]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg lock: 1
Jan 25 12:24:40 server-b user:debug root[3211884]: Entering /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock
Jan 25 12:24:40 server-b user:debug root[2753468]: Exiting /usr/sbin/rsct/sapolicies/db2/lockreqprocessed db2_db2inst1_db2inst1_SAMPLE-rg unlock: 0
Jan 25 12:24:56 server-b user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh[6553680]: Returning 1 : db2inst1 db2inst1 SAMPLE
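The condition described above (the last monitor status of both nodes being offline at the same moment) can be looked for in collected syslogs. The following is a minimal sketch, not part of the APAR or of any DB2 / TSA tooling: the function names and the regular expression are illustrative, the return code meanings (1 = online, 2 = offline) are taken from the log annotations above, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches hadrV95_monitor.ksh result lines in the syslog format quoted in this APAR.
MONITOR_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) \S+ "
    r"\S*hadrV95_monitor\.ksh\[\d+\]: Returning (\d+)"
)

ONLINE, OFFLINE = 1, 2  # meanings as annotated in the APAR log excerpts


def parse_monitor_events(lines, year=2011):
    """Return a time-sorted list of (timestamp, host, return_code).

    The year is an assumption; syslog lines do not include it.
    """
    events = []
    for line in lines:
        m = MONITOR_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2), int(m.group(3))))
    return sorted(events)


def both_offline_times(events):
    """Return timestamps at which the last known status of every host is OFFLINE."""
    last = {}
    hits = []
    for ts, host, rc in events:
        last[host] = rc
        if len(last) >= 2 and all(code == OFFLINE for code in last.values()):
            hits.append(ts)
    return hits


MON = "user:debug /usr/sbin/rsct/sapolicies/db2/hadrV95_monitor.ksh"
TAIL = "db2inst1 db2inst1 SAMPLE"
SAMPLE_LINES = [
    f"Jan 25 12:24:04 server-a {MON}[3735792]: Returning 1 : {TAIL}",
    f"Jan 25 12:24:26 server-a {MON}[983346]: Returning 2 : {TAIL}",
    f"Jan 25 12:24:48 server-a {MON}[2752700]: Returning 2 : {TAIL}",
    f"Jan 25 12:24:07 server-b {MON}[1508116]: Returning 2 : {TAIL}",
    f"Jan 25 12:24:29 server-b {MON}[2294596]: Returning 2 : {TAIL}",
    f"Jan 25 12:24:34 server-b {MON}[2687424]: Returning 1 : {TAIL}",
]

if __name__ == "__main__":
    for ts in both_offline_times(parse_monitor_events(SAMPLE_LINES)):
        print(ts.strftime("%H:%M:%S"))  # prints 12:24:26 then 12:24:29
```

With the excerpts above, the sketch flags 12:24:26 and 12:24:29, which is the window in which the logs show both nodes reporting offline while resources were still being modified. It only reflects monitor output; it does not know whether locks or flags were in place at that time.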
Problem Summary:
USERS AFFECTED: DB2 HADR / SA MP integrated solution users
PROBLEM DESCRIPTION: See error description
RECOMMENDATION: Apply DB2 V9.5.0.8
Local Fix:
Verify the current OpState of the TSA resources and the actual DB2 HADR roles. If manual reintegration is required, issue "db2 start hadr on db <dbname> as standby". If the issue is readily reproducible, there may be an underlying latency problem. Resolve the latency problem to reduce the chance that the monitor scripts run in the middle of resource management.
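To get a rough idea of whether resource changes are slow, one option is to time the lockreqprocessed calls recorded in syslog. The sketch below is an illustration only and is not an IBM-provided tool: the function name and regular expression are hypothetical, it assumes the "Entering" / "Exiting" line format shown in the problem description, it pairs each exit with the oldest unmatched entry for the same host, resource group, and operation (the PIDs differ between the two lines, so they cannot be used for pairing), and the year is an assumption.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Matches lockreqprocessed entry/exit lines in the syslog format quoted in this APAR.
LOCKREQ_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) \S+ root\[\d+\]: "
    r"(Entering|Exiting) \S*lockreqprocessed (\S+) (lock|unlock)"
)


def lock_request_durations(lines, year=2011):
    """Return (start_time, operation, seconds) for each completed request.

    Lines are expected in chronological order, as written by syslog.
    The year is an assumption; syslog lines do not include it.
    """
    pending = defaultdict(deque)  # (host, resource group, op) -> entry timestamps
    results = []
    for line in lines:
        m = LOCKREQ_RE.match(line.strip())
        if not m:
            continue
        stamp, host, phase, group, op = m.groups()
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (host, group, op)
        if phase == "Entering":
            pending[key].append(ts)
        elif pending[key]:
            start = pending[key].popleft()
            seconds = (ts - start).total_seconds()
            results.append((start.strftime("%H:%M:%S"), op, seconds))
    return results


CMD = "/usr/sbin/rsct/sapolicies/db2/lockreqprocessed"
RG = "db2_db2inst1_db2inst1_SAMPLE-rg"
SAMPLE_LINES = [
    f"Jan 25 12:24:07 server-b user:debug root[2687334]: Entering {CMD} {RG} lock",
    f"Jan 25 12:24:13 server-b user:debug root[2294526]: Exiting {CMD} {RG} lock: 1",
    f"Jan 25 12:24:23 server-b user:debug root[1769768]: Entering {CMD} {RG} unlock",
    f"Jan 25 12:24:23 server-b user:debug root[3211824]: Exiting {CMD} {RG} unlock: 0",
    f"Jan 25 12:24:26 server-b user:debug root[2687360]: Entering {CMD} {RG} lock",
    f"Jan 25 12:24:32 server-b user:debug root[3473642]: Exiting {CMD} {RG} lock: 1",
]

if __name__ == "__main__":
    for start, op, seconds in lock_request_durations(SAMPLE_LINES):
        print(start, op, seconds)
```

On the excerpts from this APAR, each lock request takes about 6 seconds and the unlock completes within the same second. What counts as too slow depends on the monitor interval in the environment, so the numbers are a starting point for investigation rather than a threshold.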
available fix packs:
DB2 Version 9.5 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.5 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.5 Fix Pack 10 for Linux, UNIX, and Windows
Solution
Apply DB2 V9.5.0.8
Workaround
not known / see Local fix
BUG-Tracking
forerunner : APAR is sysrouted TO one or more of the following: IC75196
follow-up :
Timestamps
Date - problem reported : 02.02.2011
Date - problem closed : 14.07.2011
Date - last modified : 14.07.2011
Problem solved at the following versions (IBM BugInfos)
9.5.0.8
Problem solved according to the fixlist(s) of the following version(s)
9.5.0.8