DB2 - Problem description
Problem IC65580 | Status: Closed
HADRV97_MONITOR NOT REPORTING CORRECT HADR STATE AFTER TAKEOVER WHILE TSAMP IN MANUAL MODE
Product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
Starting with primary in a peer state:

Initial State

    $ db2pd -hadr -db hadrdb
    Role    State  SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer   Sync     0                0

    $ db2pd -hadr -db hadrdb
    HADR Information:
    Role    State  SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer   Sync     0                0

    root:# lssam -g db2_db2inst1_db2inst1_HADRDB-rg
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
            '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                    |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                    '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2

Re-create the problem by placing TSAMP in Manual mode:

    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Automation=Manual Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
            '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                    |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                    '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2

and issue 'db2 takeover hadr on database hadrdb' on node "samp2"

    root:# lsrsrc -Ab IBM.Test
    Resource Persistent and Dynamic Attributes for IBM.Test
    resource 1:
            Name              = "db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1"
            ResourceType      = 0
            AggregateResource = "0x3fff 0xffff 0x00000000 0x00000000 0x00000000 0x00000000"
            ForceOpState      = 0
            TimeToStart       = 0
            TimeToStop        = 0
            WriteToSyslog     = 0
            MoveTime          = 0
            MoveFail          = 0
            ForceMoveState    = 0
            ActivePeerDomain  = "hadrdom"
            NodeNameList      = {"samp1"}
            OpState           = 2
            ConfigChanged     = 0
            ChangedAttributes = {}
            MoveState         = [0,{}]
            OpQuorumState     = 0

Check HADR roles after 'db2 takeover hadr ...' command completed successfully:

    $ db2pd -hadr -db hadrdb
    Role    State  SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby Peer   Sync     0                0

    $ db2pd -hadr -db hadrdb
    HADR Information:
    Role    State  SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Peer   Sync     0                0

But 'lssam' doesn't change:

    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Automation=Manual Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                    '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
            '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                    |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                    '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2

This shows that the Online/Offline states for the HADR database do not match the Primary/Standby role reported by 'db2pd -hadr ...' command.

The mismatch exists because TSA is running in manual mode and so the temporary TSA IBM.Test resource remains set:

    root:# lsrsrc -Ab IBM.Test
    Resource Persistent and Dynamic Attributes for IBM.Test
    resource 1:
            Name              = "db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1"
            ResourceType      = 0
            AggregateResource = "0x3fff 0xffff 0x00000000 0x00000000 0x00000000 0x00000000"
            ForceOpState      = 0
            TimeToStart       = 0
            TimeToStop        = 0
            WriteToSyslog     = 0
            MoveTime          = 0
            MoveFail          = 0
            ForceMoveState    = 0
            ActivePeerDomain  = "hadrdom"
            NodeNameList      = {"samp1"}
            OpState           = 2
            ConfigChanged     = 0
            ChangedAttributes = {}
            MoveState         = [0,{}]
            OpQuorumState     = 0

Once TSA is switched back to auto mode, the resource state will be updated correctly and no action is needed.
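
The comparison made above, the role reported by 'db2pd -hadr ...' against the node that 'lssam' shows as Online, can be sketched as a small shell check. This is only an illustration: online_hadr_node, hadr_role and check_hadr_view are hypothetical helper names, not part of DB2 or TSAMP, and the parsing assumes the output layouts shown in this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the parsing assumes
# the lssam and db2pd output layouts quoted in this report.

# Read 'lssam' output on stdin; print the node whose HADR
# IBM.Application member resource is Online. Aggregate lines (no
# trailing ":<node>") and Offline members do not match.
online_hadr_node() {
    sed -n 's/^.*Online IBM\.Application:[^:]*:\([^: ]*\).*$/\1/p'
}

# Read 'db2pd -hadr -db <db>' output on stdin; print the HADR role,
# taken from the first field of the line after the "Role ..." header.
hadr_role() {
    awk '$1 == "Role" { getline; print $1; exit }'
}

# Usage: check_hadr_view <role> <online_node> <local_node>
# The two views agree when the local database is Primary exactly when
# lssam shows the local node as Online.
check_hadr_view() {
    role=$1; online_node=$2; local_node=$3
    if [ "$role" = "Primary" ] && [ "$online_node" = "$local_node" ]; then
        echo match
    elif [ "$role" != "Primary" ] && [ "$online_node" != "$local_node" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Example wiring on one cluster node (commands as used in this report):
#   role=$(db2pd -hadr -db hadrdb | hadr_role)
#   node=$(lssam -g db2_db2inst1_db2inst1_HADRDB-rg | online_hadr_node)
#   check_hadr_view "$role" "$node" "$(hostname)"
```

With the post-takeover output above, run on samp2, the role is Primary while lssam still lists samp1 as Online, so the check would report a mismatch. The wiring assumes that 'hostname' returns the same node name that lssam prints.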
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Problem Description above.                               *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 Version 9.7 Fix Pack 2                        *
****************************************************************
Local Fix:
If you wish to stay in manual mode, the mismatch can be resolved by removing the IBM.Test resource:

    rmrsrc -s "Name = 'db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1'" IBM.Test

After the next HADR monitor is run for the associated HADR database, 'lssam' reflects the true Online/Offline state based on the location of the primary/standby roles:

    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
            |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                    '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
            '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                    |- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                    '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2
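
When more than one database is affected, the resource names for the local fix can be pulled from the 'lsrsrc -Ab IBM.Test' output instead of being typed by hand. The following is a minimal sketch under stated assumptions: leftover_move_flags and print_cleanup_commands are hypothetical helper names, and matching on "UserInitiatedMove" is an assumption based on the single resource name shown in this report. It only prints the rmrsrc commands for review and does not run them.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. The "UserInitiatedMove"
# name pattern is an assumption taken from the one example above.

# Read 'lsrsrc -Ab IBM.Test' output on stdin; print the Name value of
# each resource whose name contains "UserInitiatedMove".
leftover_move_flags() {
    sed -n 's/^[[:space:]]*Name[[:space:]]*=[[:space:]]*"\([^"]*UserInitiatedMove[^"]*\)".*$/\1/p'
}

# Read resource names on stdin; print (do not execute) the rmrsrc
# command from the local fix for each name.
print_cleanup_commands() {
    while IFS= read -r name; do
        printf "rmrsrc -s \"Name = '%s'\" IBM.Test\n" "$name"
    done
}

# Example wiring, as root on a cluster node:
#   lsrsrc -Ab IBM.Test | leftover_move_flags | print_cleanup_commands
```

Printing rather than executing keeps the removal a deliberate manual step, which suits a cluster that is intentionally held in manual mode.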
Available fix packs:
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
Solution
First fixed in DB2 Version 9.7 Fix Pack 2
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 14.01.2010
Date - problem closed   : 21.03.2011
Date - last modified    : 08.03.2012
Problem solved at the following versions (IBM BugInfos)
9.7.FP2
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.2