DB2 - Problem description

Problem IT03035 Status: Closed

RELAX CLUSTER COMMUNICATION GROUP SETTINGS IN A TSA/HADR ENVIRONMENT
CONFIGURED USING DB2HAICU

product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
In a TSA/HADR environment configured via db2haicu, the default 
tolerable network latency between the two nodes is 8 seconds. 
After 8 seconds without communication between the nodes, RSCT 
declares a loss of communication and recovery actions follow. 
The default of 8 seconds was found to be too restrictive, and 
raising it to 30 seconds is recommended. This relaxed value 
ensures that unnecessary recovery actions do not take place 
during periods of high latency between the cluster nodes. 
 
Bullet 1 of the following technote has more details on this: 
http://www-01.ibm.com/support/docview.wss?uid=swg21624179 
 
"1. Relaxing Heartbeat Sensitivity settings 
 
The default values of 4 (sensitivity) and 1 (period) allow for 8 
seconds of network latency before RSCT decides that the 
heartbeat attempt between two nodes is unsuccessful and thus 
recovery actions are necessary. We have found that in clusters 
where the servers are heavily utilized the default 
heartbeat values are too stringent and need to be relaxed. 
Relaxing these settings can prevent unwanted behavior such as an 
unexpected reboot. We recommend changing the Sensitivity to 5 
and the Period to 3 which will allow for 30 seconds before RSCT 
declares a problem. 
 
To determine your cluster's "CommGroup Name" issue the "lscomg" 
command. To modify the settings to our recommended values, issue 
the following from any node: 
 
chcomg -s 5 -p 3 <CommGroup_Name> 
 
Apply the change to all configured communication groups listed 
in the "lscomg" output." 
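
The heartbeat arithmetic and the "apply to every communication group" step quoted above can be sketched in shell. This is only an illustration under stated assumptions: the helper names tolerance_seconds and print_chcomg_cmds are hypothetical, the formula 2 x sensitivity x period is inferred from the two value pairs given in the technote (4 and 1 give 8 seconds, 5 and 3 give 30 seconds), and the parser assumes that "lscomg" prints one header line followed by one row per group with the group name in the first column.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of RSCT.

# Seconds without a heartbeat before RSCT declares a problem.
# Assumed formula: 2 * sensitivity * period
# (consistent with 4,1 -> 8 and 5,3 -> 30 from the technote).
tolerance_seconds() {
    echo $(( 2 * $1 * $2 ))
}

# Read "lscomg" output on stdin and print one chcomg command per
# communication group. Assumes a single header line, then rows whose
# first column is the group name. Nothing is executed here.
print_chcomg_cmds() {
    awk 'NR > 1 && NF > 0 { print "chcomg -s 5 -p 3 " $1 }'
}
```

A possible use is "lscomg | print_chcomg_cmds", which only prints the commands so that they can be reviewed before being run on a cluster node.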
 
In addition to relaxing the cluster communication group 
settings, the CritRsrcProtMethod is being updated from 1 to 3 in 
order to allow a sync to disk from memory before a machine is 
rebooted for critical resource protection reasons. 
 
Bullet 3 of the following technote has more details on this: 
http://www-01.ibm.com/support/docview.wss?uid=swg21624179 
 
"3. Change CritRsrcProtMethod setting from 1 to 3 
 
By default, whenever RSCT invokes CritRsrcProtMethod it issues a 
kernel panic that causes a hard reset and reboot of the OS. 
Often, with DB2 clusters this happens when there is an extreme 
load on a server causing heartbeats to be missed making RSCT 
think that it is no longer communicating with the rest of the 
cluster and ending up with a reboot. When this happens, any 
in-memory log/trace data is lost because there is no opportunity 
to flush it to disk with the default CritRsrcProtMethod setting 
of 1. Changing this value to 3 allows for a sync of what is in 
memory to be written to the disk prior to the reboot occurring 
... this means that valuable syslog, error report, trace and 
db2diag.log messages will be saved. 
 
chrsrc -c IBM.PeerNode CritRsrcProtMethod=3"
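
Before changing CritRsrcProtMethod it may help to check the current value first. The following is a minimal sketch, not a definitive procedure: the helper name crit_rsrc_fix_cmd is hypothetical, and it assumes that "lsrsrc -c IBM.PeerNode CritRsrcProtMethod" prints a line of the form "CritRsrcProtMethod = <value>". It prints the chrsrc command from the technote only when the reported value is not already 3.

```shell
#!/bin/sh
# Sketch only: crit_rsrc_fix_cmd is a hypothetical helper name.

# Read "lsrsrc -c IBM.PeerNode CritRsrcProtMethod" output on stdin.
# Assumed line format: "CritRsrcProtMethod = <value>".
# Prints the chrsrc command only if a value was found and it is not 3;
# nothing is executed here.
crit_rsrc_fix_cmd() {
    current=$(awk -F'=' '/CritRsrcProtMethod/ {
        gsub(/[ \t]/, "", $2); print $2; exit }')
    if [ -n "$current" ] && [ "$current" != "3" ]; then
        echo "chrsrc -c IBM.PeerNode CritRsrcProtMethod=3"
    fi
}
```

A possible use is "lsrsrc -c IBM.PeerNode CritRsrcProtMethod | crit_rsrc_fix_cmd"; an empty result means either that the value is already 3 or that the attribute line was not found in the assumed format.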
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Users using db2haicu in a TSA/HADR setup                     * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* See Error Description                                        * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Users can upgrade to DB2 Version 10.1 fix pack 5 or higher   * 
* to avoid this defect                                         * 
****************************************************************
Local Fix:
Refer to bullets 1 and 3 of this technote: 
http://www-01.ibm.com/support/docview.wss?uid=swg21624179
Solution
First fixed in DB2 Version 10.1 fix pack 5
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 08.07.2014
Date  - problem closed      : 12.08.2015
Date  - last modified       : 12.08.2015
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.1.0.5 FixList