DB2 - Problem description
Problem IC95596 | Status: Closed
IN A LARGE DPF/HA CLUSTER, IF ALL RESOURCES ARE STARTED AT ONCE IT IS POSSIBLE THAT CERTAIN RESOURCES ARE NOT BROUGHT ONLINE
Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
In a large DB2 DPF environment configured for high availability, if all resources are ordered online simultaneously, some DB2 resources may not start successfully. In the lssam output, the affected DB2 partition resources are displayed as "Offline" while the overlying Resource Group (RG) is displayed as "Pending Online".

Example lssam output of a DB2 partition resource group in this state:

    |- Pending online IBM.ResourceGroup:db2_db2inst1_115-rg Nominal=Online
       |- Offline IBM.Application:db2_db2inst1_115-rs
          |- Offline IBM.Application:db2_db2inst1_115-rs:bcu_node17
          '- Offline IBM.Application:db2_db2inst1_115-rs:bcu_node21

Examples of commands that attempt to bring all resources online simultaneously:

    db2start
    chrg -o online -s 1=1
    hastartdb2 (applicable only to ISAS)

When the start orders are issued, TSA-MP calls the "/usr/sbin/rsct/sapolicies/db2/db2V97_start.ksh" script for each partition in order to start its corresponding resource. As each resource is started, the script issues a status query that forces all other nodes in the cluster to run a monitor command against all other partition resources. The start activity therefore causes multiple monitors to run in parallel, which creates race conditions that interfere with TSA-MP's ability to capture all the needed return codes. As a result, TSA-MP is not able to send start orders for all partitions as expected.
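To check whether a cluster is in the state described above, the lssam output can be filtered for resource groups reported as "Pending online". The following is a minimal sketch, not an IBM-supplied tool: the function name pending_online_groups is hypothetical, and it assumes the line layout matches the example in this record. If the output contains terminal color codes, they would need to be stripped first.

```shell
#!/bin/sh
# Sketch: print the names of resource groups that lssam reports as
# "Pending online". Reads lssam-style text on stdin.
# Assumption: each resource group line looks like the example in this
# record, e.g.
#   |- Pending online IBM.ResourceGroup:db2_db2inst1_115-rg Nominal=Online
pending_online_groups() {
    # Keep only matching lines and reduce them to the group name,
    # which runs from the colon up to the next space.
    sed -n 's/.*Pending online IBM\.ResourceGroup:\([^ ]*\).*/\1/p'
}

# Hypothetical usage on a cluster node (not executed here):
#   lssam | pending_online_groups
```

A group name printed by this filter only indicates the "Pending online" state; confirming that its member resources are "Offline" still requires looking at the lines beneath it in the lssam output.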
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* TSAMP enabled environments                                   *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to 10.5 Fix Pack 3.                                  *
****************************************************************
Local Fix:
As root, comment out (or delete) the following line in the /usr/sbin/rsct/sapolicies/db2/db2V97_start.ksh file:

    runact -s "Name like 'db2_%${DB2INSTANCE?}%'" IBM.Application refreshOpState 2> /dev/null

This call is not needed and is the source of the problem.
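The local fix can also be applied with a small script instead of a manual edit. The sketch below is one possible way to do it, under stated assumptions: the function name comment_out_refresh and the ".bak" backup suffix are hypothetical choices, and the match relies on the line containing the text "IBM.Application refreshOpState" as quoted above. It avoids "sed -i" because that option is not available in every sed implementation.

```shell
#!/bin/sh
# Sketch: comment out the refreshOpState call in a start script.
# Assumption: the target line contains "IBM.Application refreshOpState".
comment_out_refresh() {
    file=$1
    # Keep a one-time backup of the original script (hypothetical .bak suffix).
    if [ ! -e "$file.bak" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    # Rewrite the script from the backup: prefix the matching line with "# "
    # unless it is already a comment. Redirecting into the existing file
    # keeps its ownership and permissions.
    sed '/^[[:space:]]*#/!s/.*IBM\.Application refreshOpState.*/# &/' \
        "$file.bak" > "$file"
}

# Hypothetical usage, as root (not executed here):
#   comment_out_refresh /usr/sbin/rsct/sapolicies/db2/db2V97_start.ksh
```

Because the rewrite always starts from the backup copy, running the function again produces the same result, and restoring the original script is a matter of copying the backup back over it.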
Available fix packs:
DB2 Cancun Release 10.5.0.4 (also known as Fix Pack 4) for Linux, UNIX, and Windows
Solution
First fixed in 10.5 Fix Pack 3.
Workaround
Not known / see Local Fix
Timestamps
Date - problem reported : 29.08.2013
Date - problem closed : 02.04.2015
Date - last modified : 02.04.2015
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.5.0.4