DB2 - Problem description
Problem IC87040 | Status: Closed |
DB2 MEMBER FAILS TO COME ONLINE IF IN RESTART LIGHT MODE AND THE RESOURCE NOMINAL STATE IS OFFLINE | |
product: | |
DB2 FOR LUW / DB2FORLUW / A10 - DB2 | |
Problem description: | |
In a DB2 pureScale setup, a member's resource group might remain in a Nominal State of 'Offline' while the member is in restart light mode if either of the following occurs:

a) The resources were repaired or recreated while the member was waiting to fail back to its home host, i.e. one of the following two commands was issued:

    'db2cluster -cm -repair -resources'
    'db2cluster -cm -create -resources'

OR

b) The db2stop command was issued while the member was waiting to fail back to its home host, and it failed. In this case, you can expect to see the following db2diag.log message:

    2013-03-12-11.01.24.555727-240 E325857A610          LEVEL: Event
    PID     : 13172902             TID : 1              PROC : db2stop
    INSTANCE: db2imdb1             NODE : 000
    HOSTNAME: coralm224
    EDUID   : 1
    FUNCTION: DB2 UDB, base sys utilities, sqleCheckStateAndOfflineRG, probe:2075
    MESSAGE : A member's resource group was set to OFFLINE after a failure to stop.
    DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
    0
    DATA #2 : String, 0 bytes
    Object not dumped: Address: 0x00000001100CCEC0 Size: 0 Reason: Zero-length data
    DATA #3 : String, 15 bytes
    db2_db2imdb1_1-rg
    DATA #4 : signed integer, 4 bytes
    0

In a setup such as this one:

    *** db2nodes.cfg ***
    0   coralm223 0 coralm223-ib0 - MEMBER
    1   coralm224 0 coralm224-ib0 - MEMBER
    128 coralm223 0 coralm223-ib0 - CF
    129 coralm224 0 coralm224-ib0 - CF

If member 1 is in restart light mode on another host, and either scenario a) or b) takes place, then the db2nodes.cfg file will look as follows:

    *** db2nodes.cfg ***
    0   coralm223 0 coralm223-ib0 - MEMBER
    1   coralm223 2 coralm223-ib0 - MEMBER   <---
    128 coralm223 0 coralm223-ib0 - CF
    129 coralm224 0 coralm224-ib0 - CF

and the 'db2instance -list' output will show:

    ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
    --   ----    -----    ---------  ------------  -----  ----------------  ------------  -------
    0    MEMBER  STARTED  coralm223  coralm223     NO     0                 0             coralm223-ib0
    1    MEMBER  STOPPED  coralm224  coralm223     NO     0                 2             coralm223-ib0
    128  CF      PRIMARY  coralm223  coralm223     NO     -                 0             coralm223-ib0
    129  CF      CATCHUP  coralm224  coralm224     NO     -                 0             coralm224-ib0

The lssam output will show that member 1's resource group is in a Nominal state of 'Offline':

    Offline IBM.ResourceGroup:db2_db2imdb1_1-rg Nominal=Offline
            '- Offline IBM.Application:db2_db2imdb1_1-rs
                    |- Offline IBM.Application:db2_db2imdb1_1-rs:coralm223
                    '- Offline IBM.Application:db2_db2imdb1_1-rs:coralm224 | |
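The symptom in the 'db2instance -list' output can be spotted mechanically. The following is a minimal Python sketch, not an IBM-supplied tool: the function name find_stranded_members is hypothetical, the sample text is copied from this report, and the check (a MEMBER row that is STOPPED on a host other than its home host) is only a heuristic based on the listing shown here.

```python
# Sample text copied from the 'db2instance -list' output in this report.
SAMPLE = """\
ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
--   ----    -----    ---------  ------------  -----  ----------------  ------------  -------
0    MEMBER  STARTED  coralm223  coralm223     NO     0                 0             coralm223-ib0
1    MEMBER  STOPPED  coralm224  coralm223     NO     0                 2             coralm223-ib0
128  CF      PRIMARY  coralm223  coralm223     NO     -                 0             coralm223-ib0
129  CF      CATCHUP  coralm224  coralm224     NO     -                 0             coralm224-ib0
"""


def find_stranded_members(listing):
    """Return members that are STOPPED on a host other than their home host.

    Hypothetical helper: assumes the nine-column layout shown in this report.
    """
    stranded = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have nine whitespace-separated columns and a numeric ID;
        # this skips the header and the dashed separator row.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        member_id, kind, state, home, current = fields[:5]
        if kind == "MEMBER" and state == "STOPPED" and home != current:
            stranded.append(
                {"id": int(member_id), "home_host": home, "current_host": current}
            )
    return stranded


if __name__ == "__main__":
    for member in find_stranded_members(SAMPLE):
        print(member)
```

A match is only a hint that the member may be affected; the lssam output remains the place to confirm that the resource group's nominal state is 'Offline'.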
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* DB2 v10.1 FP1 and below                                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See the problem description above.                           *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 10.1 FP2                                      *
**************************************************************** | |
Local Fix: | |
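As root, run the "chrg -o online <resource_group_name>" command to set the nominal state of the member's resource group back to Online. For the resource group in the problem description, this would be "chrg -o online db2_db2imdb1_1-rg". You can then verify the state of the member's resource group in the lssam output. | |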
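The local fix can be scripted as a dry run. The following is a minimal Python sketch under stated assumptions: the helper names offline_resource_groups and chrg_online_command are hypothetical, the lssam sample is the one from this report, and real lssam output may contain other line types or terminal color codes that this pattern does not handle. The script only prints the command and does not run it.

```python
import re

# Sample text copied from the lssam output in this report.
LSSAM_SAMPLE = """\
Offline IBM.ResourceGroup:db2_db2imdb1_1-rg Nominal=Offline
        '- Offline IBM.Application:db2_db2imdb1_1-rs
                |- Offline IBM.Application:db2_db2imdb1_1-rs:coralm223
                '- Offline IBM.Application:db2_db2imdb1_1-rs:coralm224
"""

# Matches resource group lines such as
# "IBM.ResourceGroup:<name> Nominal=<state>".
RG_LINE = re.compile(r"IBM\.ResourceGroup:(\S+)\s+Nominal=(\S+)")


def offline_resource_groups(lssam_text):
    """Return names of resource groups whose nominal state is Offline."""
    return [
        match.group(1)
        for match in RG_LINE.finditer(lssam_text)
        if match.group(2) == "Offline"
    ]


def chrg_online_command(resource_group):
    """Build (but do not run) the local-fix command for one resource group."""
    return ["chrg", "-o", "online", resource_group]


if __name__ == "__main__":
    # Dry run: print the command an administrator would run as root.
    for name in offline_resource_groups(LSSAM_SAMPLE):
        print(" ".join(chrg_online_command(name)))
```

Review each reported resource group before running the command as root, because a nominal state of Offline can also be intentional, for example for a member that was stopped deliberately.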
available fix packs: | |
DB2 Version 10.1 Fix Pack 2 for Linux, UNIX, and Windows | |
Solution | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 06.10.2012 |
Date - problem closed : | 03.10.2013 |
Date - last modified : | 03.10.2013 |
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) | |
10.1.0.2 | |
10.5.0.2 |