
DB2 - Problem description

Problem IC81335 Status: Closed

THE PARALLEL MODE OF DB2_ALL MAY HANG IN A VERY LARGE DPF ENVIRONMENT

product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
When running db2_all in parallel mode (i.e. with the ';' option), db2_all
sends the user's command to the target partitions and spawns
waiter processes on the partition where the command was run
(i.e. the sender partition).  After the target partitions complete
the user's command, they send a remote shell command back to the
sender partition to report that the command has completed.
 
Prior to the fix for this APAR, should this remote shell command
fail for any reason, the sender partition appears to hang,
because it assumes that the user's command has not completed
on the target partitions, when in fact it has.
 
The hang is caused by the lack of error handling for the failed
remote shell command.  This APAR addresses the missing error handling.
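
To illustrate the kind of error handling described above, here is a minimal
Python sketch. It is not the db2_all implementation; the function name
notify_completion, its parameters, and the retry policy are hypothetical.
It only shows the general idea: check the exit status of the notification
command, retry a few times, and report a failure to the caller instead of
silently leaving the other side waiting.

```python
import subprocess
import time


def notify_completion(command, attempts=3, delay=0.0, run=subprocess.run):
    """Run a notification command and report whether it succeeded.

    Hypothetical helper for illustration only. `command` is an argument
    list such as one that would invoke a remote shell; `run` is injectable
    so the logic can be exercised without a real remote host.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = run(command, capture_output=True, text=True)
        except OSError:
            # The command could not be started at all.
            result = None
        if result is not None and result.returncode == 0:
            return True
        if attempt < attempts:
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
    # All attempts failed: tell the caller so it can act on the failure.
    return False
```

A caller would treat a False return as a failed notification and handle it
explicitly, rather than assuming the notification was delivered.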
 
The cause of the remote shell command failure varies, but a
common known cause is an excessive number of remote shells
running at the same time.  For example, starting too many ssh
sessions at the same time may cause some of them to fail with the
following error:
 
ssh_exchange_identification: Connection closed by remote host 
 
Running too many rsh sessions at the same time can fail with the
following errors:
 
socket: protocol failure in circuit setup. 
socket: All ports in use 
 
These types of capacity-related failures may happen in a very
large DPF environment, e.g. one with several hundred partitions.
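
As a small aid for recognising these failures in captured remote shell
output, the following Python sketch checks a piece of error text against
the messages quoted above. The names CAPACITY_ERROR_MARKERS and
is_capacity_failure are hypothetical, and the list covers only the three
messages given in this description; other causes of failure are not
detected.

```python
# Substrings taken from the error messages quoted in the description above.
CAPACITY_ERROR_MARKERS = (
    "ssh_exchange_identification: Connection closed by remote host",
    "protocol failure in circuit setup",
    "All ports in use",
)


def is_capacity_failure(stderr_text):
    """Return True if the text contains one of the known capacity errors."""
    return any(marker in stderr_text for marker in CAPACITY_ERROR_MARKERS)
```

For example, is_capacity_failure("socket: All ports in use") is true, while
an unrelated message such as an authentication error is not matched.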
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* Very large DPF environment                                   * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to DB2 9.7 Fixpack 6.                                * 
****************************************************************
Local Fix:
Running db2_all in serial mode (i.e. without ';') does not have 
this problem.
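
The difference between the two modes can be sketched in Python as follows.
This is an illustration only: the helper build_db2_all_args is
hypothetical, and it assumes that the ';' option is written as a prefix of
the command string passed to db2_all. The command "date" is just a
placeholder for the user's command.

```python
def build_db2_all_args(command, parallel=False):
    """Build an argument list for db2_all (hypothetical helper).

    Assumption: parallel mode is selected by prefixing the command
    string with ';', and serial mode by omitting that prefix.
    """
    prefix = ";" if parallel else ""
    return ["db2_all", prefix + command]


# Serial mode, the local fix described above: no ';' prefix.
serial_args = build_db2_all_args("date")

# Parallel mode, the mode affected by this APAR.
parallel_args = build_db2_all_args("date", parallel=True)
```

The resulting lists could be passed to subprocess.run on a system where
db2_all is installed.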
available fix packs:
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
First fixed in DB2 9.7 Fixpack 6.
Workaround
not known / see Local fix
BUG-Tracking
forerunner  : APAR is sysrouted TO one or more of the following: IC87826 
follow-up : 
Timestamps
Date  - problem reported    : 10.02.2012
Date  - problem closed      : 04.06.2012
Date  - last modified       : 04.06.2012
Problem solved at the following versions (IBM BugInfos)
9.7.FP6
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.6 FixList