
DB2 - Problem description

Problem IC65460 Status: Closed

DB2 HA FAILS MONITORING FILESYSTEMS WHEN I/O ERRORS PRESENT.

product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
In an integrated HA solution environment, when an I/O problem 
occurs in the system, the mount may remain in the Unknown state 
and no failover occurs. 
 
The scenario is as follows: 
 
1.  I/O problem occurs in the system. 
2.  TSA calls the monitor script for the filesystem (registered 
as an IBM.Application). 
3.  The monitor script (provided by DB2 HA) attempts to touch a 
file on the filesystem (after verifying the fs is mounted.) 
4.  The touch generates an I/O error. 
5.  On an I/O error, the monitor script issues a call to the 
stop script to try to make sure the fs is unmounted. 
6.  The stop script attempts to umount the fs.  But in this 
case, there is a PID accessing the filesystem, preventing the 
umount. 
7.  The stop script retries up to 9 more times (sleeping 
for 10 seconds between tries). 
8.  After the third try (29 seconds after TSA kicked off the 
monitor script), TSA kills the monitor script for exceeding the 
monitor script timeout period as registered for the resource. 
This also kills off the child process (stop script) before it 
can get through its 10 tries to umount. 
9.  Since TSA killed the monitor script, the resource state is 
'Unknown'. 
10.  TSA takes no action on a resource with an unknown state. 
Instead it will start the cycle again by calling the monitor 
script. 
11.  On the affected node, this continues until the machine is 
rebooted.
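
The timing in steps 7 and 8 can be checked with a small model. This is only a sketch, not the DB2 HA or TSA code: the function names are hypothetical, the 30-second timeout is taken from the Solution section of this record, and each umount attempt is assumed to take no time of its own.

```python
def attempts_before_timeout(max_tries=10, sleep_s=10, timeout_s=30):
    """Return the start times (in seconds) of the umount attempts
    that begin before the monitor script is killed."""
    started = []
    elapsed = 0
    for _ in range(max_tries):
        if elapsed >= timeout_s:
            break  # the monitor (and its child stop script) is gone by now
        started.append(elapsed)
        elapsed += sleep_s  # sleep between tries
    return started


def soft_unmount_worst_case(max_tries=10, sleep_s=10):
    """Seconds spent sleeping if every try fails.
    Sleeps happen between tries, so there are max_tries - 1 of them."""
    return (max_tries - 1) * sleep_s


print(attempts_before_timeout())   # [0, 10, 20]
print(soft_unmount_worst_case())   # 90
```

Under these assumptions only three of the ten tries start before the timeout, and the full retry loop needs at least 90 seconds, so the stop script can never finish inside the monitor's window.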
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* DB2/TSA user                                                 * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         *
* In an integrated HA solution environment, when an I/O        *
* problem occurs in the system, the mount may remain in the    *
* Unknown state and no failover occurs. The scenario is as     *
* follows:                                                     *
* 1.  An I/O problem occurs in the system.                     *
* 2.  TSA calls the monitor script for the filesystem          *
*     (registered as an IBM.Application).                      *
* 3.  The monitor script (provided by DB2 HA) attempts to      *
*     touch a file on the filesystem (after verifying that     *
*     the fs is mounted).                                      *
* 4.  The touch generates an I/O error.                        *
* 5.  On an I/O error, the monitor script calls the stop       *
*     script to try to make sure the fs is unmounted.          *
* 6.  The stop script attempts to umount the fs, but in this   *
*     case a PID is accessing the filesystem, preventing       *
*     the umount.                                              *
* 7.  The stop script retries up to 9 more times, sleeping     *
*     for 10 seconds between tries.                            *
* 8.  After the third try (29 seconds after TSA kicked off     *
*     the monitor script), TSA kills the monitor script for    *
*     exceeding the monitor script timeout period              *
*     registered for the resource. This also kills the         *
*     child process (the stop script) before it can get        *
*     through its 10 tries to umount.                          *
* 9.  Since TSA killed the monitor script, the resource        *
*     state is 'Unknown'.                                      *
* 10. TSA takes no action on a resource in an unknown          *
*     state. Instead, it starts the cycle again by calling     *
*     the monitor script.                                      *
* 11. On the affected node, this continues until the           *
*     machine is rebooted.                                     *
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to v97fp2.                                           * 
****************************************************************
Local Fix:
available fix packs:
DB2 Version 9.7 Fix Pack 2 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 3 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 3a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
The monitor timeout is only 30 seconds. There are two possible 
fixes: either increase the mount monitor timeout to a value 
larger than 30 so that the soft unmount can complete, or add a 
force option to mountV95_stop.ksh that bypasses the soft unmount 
when the mount monitor encounters an I/O error and calls the 
mount stop script. 
 
The fix in v97fp2 will contain this new "force" option.
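
The effect of a force option can be illustrated with a hedged model of a stop routine. This is not the contents of mountV95_stop.ksh, which this record does not show. The `stop_mount` function, its parameters, and the `umount` callable are hypothetical names introduced only for illustration, and the sleep function is injected so the example runs without waiting.

```python
import time


def stop_mount(umount, force=False, max_tries=10, sleep_s=10, sleep=time.sleep):
    """Hypothetical model of a mount stop routine.

    umount: callable taking a bool (True = forced unmount) and
            returning True when the unmount succeeded.
    Returns the seconds spent sleeping, or None if the unmount
    never succeeded.
    """
    if force:
        # Forced path: skip the soft retry loop entirely.
        return 0 if umount(True) else None
    slept = 0
    for attempt in range(max_tries):
        if umount(False):
            return slept
        if attempt < max_tries - 1:
            sleep(sleep_s)  # wait before the next soft attempt
            slept += sleep_s
    return None


# Assumed scenario: a busy filesystem where only a forced unmount works.
def busy(forced):
    return forced


print(stop_mount(busy, force=True))              # 0
print(stop_mount(busy, sleep=lambda s: None))    # None
```

In this model the forced path returns after a single attempt, well inside a 30-second monitor timeout, while the soft path on a busy filesystem exhausts all of its retries without success.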
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 07.01.2010
Date  - problem closed      : 25.05.2010
Date  - last modified       : 25.05.2010
Problem solved at the following versions (IBM BugInfos)
9.7.FP2
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.2 FixList