DB2 - Problem description
Problem IT03630 | Status: Closed
TAKEOVER HADR COMMAND SUCCEEDS BUT RESOURCE GROUP REPORTS FAILED OFFLINE IN LSSAM OUTPUT
Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
In a TSA/HADR high availability environment configured using db2haicu, if the public network adapters of the standby and primary nodes are defined within two separate network equivalency groupings, as seen in this "lssam -V" output:

Online IBM.ResourceGroup:db2_db2inst1_host03_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_host03_0-rs                         -.
                '- Online IBM.Application:db2_db2inst1_host03_0-rs:host03          |
Online IBM.ResourceGroup:db2_db2inst1_host04_0-rg Nominal=Online                   |
        '- Online IBM.Application:db2_db2inst1_host04_0-rs                         |  -.
                '- Online IBM.Application:db2_db2inst1_host04_0-rs:host04          |  |
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online            |  |
        '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs                  |  |  -. -.
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host03   |  |  |  |
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host04  |  |  |  |
Online IBM.Equivalency:db2_db2inst1_host03_0-rg_group-equ                          |  |  |  |
        '- Online IBM.PeerNode:host03:host03                                       |  |  |  |
Online IBM.Equivalency:db2_db2inst1_host04_0-rg_group-equ                          |  |  |  |
        '- Online IBM.PeerNode:host04:host04                                       |  |  |  |
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ                   |  |  |  |
        |- Online IBM.PeerNode:host03:host03                                       |  |  |  |
        '- Online IBM.PeerNode:host04:host04                                       DO |  |  DO
Online IBM.Equivalency:db2_public_network_0                                        <' |  |  <'
        '- Online IBM.NetworkInterface:eth0:host03                                    DO DO
Online IBM.Equivalency:db2_public_network_1                                           <' <'
        '- Online IBM.NetworkInterface:eth1:host04

then issuing a takeover HADR command on the database will succeed, but it will leave the resource model in the following state, as seen by "lssam":

Online IBM.ResourceGroup:db2_db2inst1_host03_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_host03_0-rs
                '- Online IBM.Application:db2_db2inst1_host03_0-rs:host03
Online IBM.ResourceGroup:db2_db2inst1_host04_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_host04_0-rs
                '- Online IBM.Application:db2_db2inst1_host04_0-rs:host04
Failed offline IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Binding=Sacrificed Nominal=Online
        '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host03
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host04
Online IBM.Equivalency:db2_db2inst1_host03_0-rg_group-equ
        '- Online IBM.PeerNode:host03:host03
Online IBM.Equivalency:db2_db2inst1_host04_0-rg_group-equ
        '- Online IBM.PeerNode:host04:host04
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ
        |- Online IBM.PeerNode:host03:host03
        '- Online IBM.PeerNode:host04:host04
Online IBM.Equivalency:db2_public_network_0
        '- Online IBM.NetworkInterface:eth0:host03
Online IBM.Equivalency:db2_public_network_1
        '- Online IBM.NetworkInterface:eth1:host04

As seen above, the HADR resource group is left in a "Failed Offline" state. To recover from this state, log in as root on either node and issue the following sequence of commands:

1) export CT_MANAGEMENT_SCOPE=2
2) rgreq -o lock db2_db2inst1_db2inst1_HADRDB-rg
3) rmrel -s "Name like 'db2_db2inst1_db2inst1_HADRDB-rs%'"
4) repeat steps 2 and 3 for every affected HADR database, where HADRDB is the database name
5) rgreq -o unlock db2_db2inst1_db2inst1_HADRDB-rg
6) repeat step 5 for every HADR database, where HADRDB is the database name

After these steps, the resource model as shown by "lssam -V" should look like this:

Online IBM.ResourceGroup:db2_db2inst1_host03_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_host03_0-rs                         -.
                '- Online IBM.Application:db2_db2inst1_host03_0-rs:host03          |
Online IBM.ResourceGroup:db2_db2inst1_host04_0-rg Nominal=Online                   |
        '- Online IBM.Application:db2_db2inst1_host04_0-rs                         |  -.
                '- Online IBM.Application:db2_db2inst1_host04_0-rs:host04          |  |
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online            |  |
        '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs                  |  |
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host03   |  |
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:host04  |  |
Online IBM.Equivalency:db2_db2inst1_host03_0-rg_group-equ                          |  |
        '- Online IBM.PeerNode:host03:host03                                       |  |
Online IBM.Equivalency:db2_db2inst1_host04_0-rg_group-equ                          |  |
        '- Online IBM.PeerNode:host04:host04                                       |  |
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ                   |  |
        |- Online IBM.PeerNode:host04:host04                                       |  |
        '- Online IBM.PeerNode:host03:host03                                       DO |
Online IBM.Equivalency:db2_public_network_0                                        <' |
        '- Online IBM.NetworkInterface:eth0:host03                                    DO
Online IBM.Equivalency:db2_public_network_1                                           <'
        '- Online IBM.NetworkInterface:eth1:host04

A takeover HADR command issued after this should now succeed.
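When several HADR databases are affected, the recovery steps above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function names recover_hadr_rgs and run, and the DRY_RUN switch, are illustrative assumptions. It takes the full resource group names as arguments and assumes each resource name is the group name with "-rg" replaced by "-rs", as in the example above. Only the rgreq and rmrel invocations come from the problem description.

```shell
#!/bin/sh
# Sketch only: wraps the manual recovery steps from the problem description.
# Must be run as root on one of the cluster nodes.
# recover_hadr_rgs, run and DRY_RUN are hypothetical names, not part of DB2 or TSA.

# Run a command, or only print it when DRY_RUN=1.
# The printed form is for display; quoting of arguments is not preserved.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Arguments: one or more HADR resource group names ending in "-rg",
# for example db2_db2inst1_db2inst1_HADRDB-rg.
recover_hadr_rgs() {
    # Step 1: set the management scope used by the commands below.
    CT_MANAGEMENT_SCOPE=2
    export CT_MANAGEMENT_SCOPE

    # Steps 2 to 4: lock each resource group, then remove the
    # relationships whose name starts with the matching "-rs" resource name.
    for rg in "$@"; do
        rs="${rg%-rg}-rs"   # assumed naming: same prefix, "-rs" suffix
        run rgreq -o lock "$rg" || return 1
        run rmrel -s "Name like '${rs}%'" || return 1
    done

    # Steps 5 and 6: unlock every resource group again.
    for rg in "$@"; do
        run rgreq -o unlock "$rg" || return 1
    done
}
```

Setting DRY_RUN=1 before calling recover_hadr_rgs db2_db2inst1_db2inst1_HADRDB-rg prints the three commands without executing them, which allows the generated names to be checked against the "lssam" output before anything is changed.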
Problem Summary:
USERS AFFECTED: DB2 HADR/TSA users
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Upgrade to DB2 V10.5 FP5
Local Fix:
Manually delete HADR dependencies against the public network equivalency grouping. See Error Description for more details.
Solution
Fixed in DB2 V10.5 FP5
Workaround
not known / see Local fix
Timestamps
Date - problem reported: 06.08.2014
Date - problem closed: 13.03.2015
Date - last modified: 13.03.2015
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.5.0.5