DB2 - Problem description
Problem IC66748 | Status: Closed
DB2START RESTART NEEDS SERIAL FAILOVER CODE RE-USE FROM V91 IN V97 (Parallel failover getting Time out errors or SQL6031N )
product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
Customers migrating from v9.1 to v9.7 are having difficulty setting up high availability (HA) environments. They are accustomed to the v8 and v9.1 methods for node failover.

The failover functionality introduced in v9.5 and later works as follows.

If no port 0 exists on the target machine (that is, no nodes are currently associated with it), failover takes two steps:
1. Serially move any node from the source machine that is not at port 0 to port 0 on the target.
2. In parallel, move all remaining nodes (or one by one if so desired, with the node at port 0 on the source being last).

If nodes already exist on the target machine, the serial step is not required. All nodes can be moved in parallel.

To move nodes in parallel, the following is necessary.

Given:
    0 one.ca.ibm.com 0
    1 one.ca.ibm.com 1
    2 one.ca.ibm.com 2

Port 0 non-existent on target machine
1. Serial
    $ db2start dbpartitionnum 2 restart hostname two.ca.ibm.com port 0
2. In Parallel
    $ db2start dbpartitionnum 1 restart hostname two.ca.ibm.com port 1 &
    $ db2start dbpartitionnum 0 restart hostname two.ca.ibm.com port 2 &

Port 0 exists on target machine
In Parallel
    $ db2start dbpartitionnum 2 restart hostname two.ca.ibm.com port 1 &
    $ db2start dbpartitionnum 1 restart hostname two.ca.ibm.com port 2 &
    $ db2start dbpartitionnum 0 restart hostname two.ca.ibm.com port 3 &

Otherwise the following error will occur:
    SQL6031N Error in the db2nodes.cfg file at line number "3". Reason code "13".

Also, when the node that is at port 0 on the source machine is the first one started on the target machine, the failover might appear to hang, showing timeout errors in the db2diag.log:

    FUNCTION: DB2 UDB, base sys utilities, sqleGetSpinLock, probe:40
    DATA #1 : String, 64 bytes
    Waited for other mlns to complete(mlns,elapsed time in seconds):
    DATA #2 : Hexdump, 4 bytes
    0x0FFFFFFFFFFF74C8 : 0000 000A    ....
    DATA #3 : Hexdump, 4 bytes
    0x0FFFFFFFFFFF74CC : 0000 000C    ....
    ...
    FUNCTION: DB2 UDB, base sys utilities, sqleGetSpinLock, probe:50
    DATA #1 : String, 88 bytes
    Time out in port zero host restart,while waiting for other mlns on sam host to complete
    DATA #2 : Hexdump, 4 bytes
    0x0FFFFFFFFFFF74C8 : 0000 000A    ....
    DATA #3 : Hexdump, 4 bytes
    0x0FFFFFFFFFFF74CC : 0000 00F0

v9.5 and later introduced a parallel start feature, which means that logical port 0 has to be moved last, not first. This APAR makes the old v9.1 behaviour available in v9.5 and later.
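The ordering rule described above can be sketched as a small dry-run planner. This is an illustration only, not part of the APAR: the function name plan_failover is hypothetical, the script assumes a db2nodes.cfg whose first three columns are partition number, hostname and logical port (as in the "Given" listing), and it assumes the ports already used on the target are contiguous from 0. It only prints db2start commands in the form shown in this record; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the db2start commands needed
# to move every partition on SOURCE_HOST to TARGET_HOST.
# Usage: plan_failover NODES_FILE SOURCE_HOST TARGET_HOST
plan_failover() {
    nodes_file=$1 source_host=$2 target_host=$3

    # Highest logical port already defined on the target (-1 if none).
    max_port=$(awk -v h="$target_host" \
        'BEGIN { m = -1 } $2 == h && $3 + 0 > m { m = $3 + 0 } END { print m }' \
        "$nodes_file")

    # Source partitions ordered by descending logical port, so the
    # partition at source port 0 is always moved last.
    parts=$(awk -v h="$source_host" '$2 == h { print $3, $1 }' "$nodes_file" \
        | sort -rn | awk '{ print $2 }')

    next_port=$(($max_port + 1))
    for p in $parts; do
        if [ "$next_port" -eq 0 ]; then
            # Serial step: target has no port 0 yet, so this command
            # must finish before anything else is started.
            echo "db2start dbpartitionnum $p restart hostname $target_host port 0"
        else
            # Remaining partitions can be started in the background.
            echo "db2start dbpartitionnum $p restart hostname $target_host port $next_port &"
        fi
        next_port=$(($next_port + 1))
    done
    echo "wait"
}
```

Run against the three-line listing above with two.ca.ibm.com as the target, the planner prints the serial command for partition 2 at port 0 first, then the backgrounded commands for partitions 1 and 0. Reviewing the printed plan before running it keeps the sketch safe to try on a live instance.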
Problem Summary:
USERS AFFECTED: ALL
PROBLEM DESCRIPTION:
The default behaviour will be Parallel.
Use the following to turn on the Parallel feature:
    db2set DB2_PMODEL_SETTINGS=SERIAL_RESTART:FALSE
Serial restart (the v9.1 behaviour) is turned on by:
    db2set DB2_PMODEL_SETTINGS=SERIAL_RESTART:TRUE
RECOMMENDATION: Upgrade to DB2 Version 9.7 Fixpack 2
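Because the two registry values are easy to confuse, a tiny wrapper can make the intent explicit. This is a sketch only: restart_mode_cmd is a hypothetical name, and the function just prints the db2set command quoted in this record for the requested mode rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the db2set command for the requested
# restart mode. The two commands are the ones given in this APAR.
# Usage: restart_mode_cmd serial|parallel
restart_mode_cmd() {
    case $1 in
        serial)
            # Old v9.1 behaviour: nodes are restarted serially.
            echo "db2set DB2_PMODEL_SETTINGS=SERIAL_RESTART:TRUE"
            ;;
        parallel)
            # Default behaviour in v9.5 and later.
            echo "db2set DB2_PMODEL_SETTINGS=SERIAL_RESTART:FALSE"
            ;;
        *)
            echo "usage: restart_mode_cmd serial|parallel" >&2
            return 1
            ;;
    esac
}
```

For example, `restart_mode_cmd serial` prints the command that re-enables serial restart. Whether an instance restart is needed for the setting to take effect is not stated in this record and should be checked against the product documentation.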
Local Fix:
available fix packs:
DB2 Version 9.7 Fix Pack 2 for Linux, UNIX, and Windows
Solution
Problem was first fixed in DB2 Version 9.7 Fixpack 2
Workaround
not known / see Local fix
BUG-Tracking
forerunner : APAR is sysrouted TO one or more of the following: IC66750 IC67253
follow-up :
Timestamps
Date - problem reported : 01.03.2010
Date - problem closed : 23.04.2010
Date - last modified : 18.02.2011
Problem solved at the following versions (IBM BugInfos)
9.7.FP2
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.2
9.7.0.3