DB2 - Problem description
Problem IC91816 | Status: Closed
TSA AUTOMATED HADR DATABASE DOES NOT FAILOVER AFTER UNPLUGGING PUBLIC NETWORK CABLE FROM THE PRIMARY SERVER
product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
In a TSA-MP managed HADR environment, if the public network cable is unplugged from the HADR primary server, the HADR database is unable to fail over to the standby server. See the following example for more details:

- lssam output prior to unplugging the network cable:

Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
        '- Online IBM.ServiceIP:db2ip_10_10_3_111-rs
                |- Online IBM.ServiceIP:db2ip_10_10_3_111-rs:node01
                '- Offline IBM.ServiceIP:db2ip_10_10_3_111-rs:node02
Online IBM.ResourceGroup:db2_db2inst1_node01_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_node01_0-rs
                '- Online IBM.Application:db2_db2inst1_hostA_0-rs:node01
Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_node02_0-rs
                '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ
        |- Online IBM.PeerNode:node01:node01
        '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_db2inst1_node01_0-rg_group-equ
        '- Online IBM.PeerNode:node01:node01
Online IBM.Equivalency:db2_db2inst1_node02_0-rg_group-equ
        '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_private_network_0
        |- Online IBM.NetworkInterface:en1:node01
        '- Online IBM.NetworkInterface:en1:node02
Online IBM.Equivalency:db2_public_network_0
        |- Online IBM.NetworkInterface:en2:node02
        '- Online IBM.NetworkInterface:en2:node01

- lssam output after the network cable is unplugged:

Pending Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Request=Lock Nominal=Online
        |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs Control=StartInhibitedBecauseSuspended
                |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
        '- Online IBM.ServiceIP:db2ip_10_10_3_111-rs
                |- Online IBM.ServiceIP:db2ip_10_10_3_111-rs:node01
                '- Offline IBM.ServiceIP:db2ip_10_10_3_111-rs:node02
Failed offline IBM.ResourceGroup:db2_db2inst1_node01_0-rg Binding=Sacrificed Nominal=Online
        '- Offline IBM.Application:db2_db2inst1_node01_0-rs
                '- Offline IBM.Application:db2_db2inst1_hostA_0-rs:node01
Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_node02_0-rs
                '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ
        |- Online IBM.PeerNode:node01:node01
        '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_db2inst1_node01_0-rg_group-equ
        '- Online IBM.PeerNode:node01:node01
Online IBM.Equivalency:db2_db2inst1_node02_0-rg_group-equ
        '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_private_network_0
        |- Online IBM.NetworkInterface:en1:node01
        '- Online IBM.NetworkInterface:en1:node02
Online IBM.Equivalency:db2_public_network_0
        |- Online IBM.NetworkInterface:en2:node02
        '- Offline IBM.NetworkInterface:en2:node01

As displayed in the above lssam output, HADR is stopped (the resource is "Offline") on the original primary (node01), but node02 does not take over the primary HADR role; that is, the HADR resource for node02 is also "Offline". In addition, the virtual IP address (IBM.ServiceIP resource) remains bound to the original primary server (node01).

----------------------------------------------------------------

In the above scenario, where the public network cable is unplugged, TSA does not bring the IBM.ServiceIP resource offline on the primary node (node01). An additional dependency from the HADR resource to the public network equivalency is needed so that the HADR failover process is initiated when the public network cable is pulled. With this additional dependency in place, the HADR resource can fail over from the primary to the standby in the event of a public network cable pull.
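
The symptom in the second lssam listing (a public network interface reported "Offline" while the service IP stays on node01) can be spotted with a small filter. The following is a minimal sketch, not part of the APAR: the function name offline_public_interfaces is hypothetical, the equivalency name defaults to the one used in the example above, and the input is assumed to be plain, uncolored lssam text.

```shell
#!/bin/sh
# Sketch only. Reads lssam output on stdin and prints every member of the
# public network equivalency that is reported as Offline.
# $1 (optional): equivalency name; defaults to the name from this APAR.
offline_public_interfaces() {
    awk -v equ="${1:-db2_public_network_0}" '
        # Each equivalency header line opens a new section; remember
        # whether it is the public network equivalency.
        /IBM\.Equivalency:/ {
            in_pub = (index($0, "IBM.Equivalency:" equ) > 0)
            next
        }
        # A resource group header ends any equivalency section.
        /IBM\.ResourceGroup:/ { in_pub = 0; next }
        # Inside the public section, report offline interfaces as
        # "<interface>:<node>".
        in_pub && /Offline IBM\.NetworkInterface:/ {
            sub(/.*IBM\.NetworkInterface:/, "")
            print
        }
    '
}

# Usage on a cluster node (assumption: lssam is in PATH):
#   lssam | offline_public_interfaces
```

For the second listing above, the filter would report en2:node01, the interface whose cable was pulled.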
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 Version 10.1 Fix Pack 3.                      *
****************************************************************
Local Fix:
Check whether a dependency from the HADR resource to the public network exists by issuing the 'lsrel -Ab' command as the DB2 instance owner. If the dependency exists, it is displayed as follows:

Managed Relationship 1:
        Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
        Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
        Relationship                = DependsOn
        Conditional                 = NoCondition
        Name                        = db2_db2inst1_db2inst1_HADRDB-rs_DependsOn_db2_public_network_0-rel
        ActivePeerDomain            = hadr_dom
        ConfigValidity              =

If this dependency does not exist, create it as follows:

1) Bring the cluster into maintenance mode by running the "db2haicu -disable" command as the DB2 instance owner.
2) As root from either node, run the following:
        export CT_MANAGEMENT_SCOPE=2
        mkrel -p dependson -S IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs -G IBM.Equivalency:db2_public_network_0 db2_db2inst1_db2inst1_HADRDB-rs_DependsOn_db2_public_network_0-rel
3) Verify that the dependency now exists by running the "lsrel -Ab" command.
4) Once the dependency is verified, exit cluster maintenance mode by running the "db2haicu" command as the DB2 instance owner.
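
The check and the mkrel invocation from the local fix can be sketched in shell as follows. This is an illustration under stated assumptions, not an IBM-supplied script: the helper names has_public_dependency and print_mkrel_command are hypothetical, the resource and equivalency names are the ones from this APAR's example and must be replaced with the names on the actual cluster, and the relationship name simply follows the pattern shown in the lsrel output above. The sketch only inspects text and prints the command; it does not modify the cluster.

```shell
#!/bin/sh
# Sketch only. Names below come from the example in this APAR (assumption:
# your cluster uses different instance, database, and network names).
HADR_RS="IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs"
PUBLIC_EQU="IBM.Equivalency:db2_public_network_0"

# Reads `lsrel -Ab` output on stdin. Returns 0 if some managed relationship
# has HADR_RS as source, PUBLIC_EQU as target, and type DependsOn;
# returns 1 otherwise.
has_public_dependency() {
    awk -v src="$HADR_RS" -v tgt="$PUBLIC_EQU" '
        # A new relationship block resets the per-block flags.
        /Managed Relationship [0-9]+:/ { s = 0; t = 0; r = 0 }
        /\[Source\]/ && index($0, src) { s = 1 }
        /\[Target\]/ && index($0, tgt) { t = 1 }
        /Relationship[ \t]*=[ \t]*DependsOn[ \t]*$/ { r = 1 }
        s && t && r { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Prints (does not run) the mkrel command from step 2 so that it can be
# reviewed before root executes it with CT_MANAGEMENT_SCOPE=2 exported.
print_mkrel_command() {
    rel_name="${HADR_RS#IBM.Application:}_DependsOn_${PUBLIC_EQU#IBM.Equivalency:}-rel"
    printf 'mkrel -p dependson -S %s -G %s %s\n' \
        "$HADR_RS" "$PUBLIC_EQU" "$rel_name"
}

# Usage on a cluster node, as the DB2 instance owner:
#   lsrel -Ab | has_public_dependency || print_mkrel_command
```

Printing the command instead of running it keeps the sketch safe to try: the maintenance-mode steps ("db2haicu -disable" before, "db2haicu" after) still have to be carried out by hand as described in the numbered steps.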
available fix packs:
DB2 Version 10.1 Fix Pack 3 for Linux, UNIX, and Windows
Solution
First fixed in Version 10.1 Fix Pack 3.
Workaround
not known / see Local fix
BUG-Tracking
forerunner : APAR is sysrouted TO one or more of the following: IC94057 IC94071 IC95313
follow-up :
Timestamps | |
Date - problem reported : 22.04.2013
Date - problem closed : 22.10.2013
Date - last modified : 22.10.2013
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.1.0.3
10.1.0.3