home clear 64x64
en blue 200x116 de orange 200x116 info letter User
suche 36x36
Latest versionsfixlist
11.1.0.7 FixList
10.5.0.9 FixList
10.1.0.6 FixList
9.8.0.5 FixList
9.7.0.11 FixList
9.5.0.10 FixList
9.1.0.12 FixList
Have problems? - contact us.
Register for free anmeldung-x26
Contact form kontakt-x26

DB2 - Problem description

Problem IC71743 Status: Closed

DB2 HANGS DUE TO INDIRECT DEAD LATCH BETWEEN THE AGENT EXECUTING A QUERY
AND AN INDEPENDENT COORDINATOR AGENT SPAWNED

product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
When running out of package cache memory space, DB2 will spawn 
an independent coordinator agent to clean up package cache  This 
might cause an indirect dead latch between the agent executing a 
query and the spawned independent coordinator agent. The 
scenario of the dead latch situation can be illustrated as 
below. 
 
1) the agent executing a query spawns an independent coordinator 
agent to clean up package cache and is waiting for the new 
agent's post back while itself is holding a latch. 
 
For example, you might see the agent 8244 waiting for the post 
had the stacks similar to below: 
 
  semtimedop + 0x000a 
  sqloWaitEDUWaitPost + 0x019d 
  sqeIcoordCB18WaitForSanityCheckEP5sqlca + 0x003d 
  sqeIcoordCB26DispatchIndependentDBAgent+ 0x0281 
 
  ABPIcoordAgent16spawnIcoordAgent + 0x00cb 
 
  ABPDispatcher10spawnAgent + 0x006f 
 
  ABPDispatcher23activateTaskProContexts + 0x0165 
 
  abpActivateTaskProContextsPK8sqeAgent + 0x0069 
  sqlra_cache_del_var + 0x0a26 
 
  sqlra_csm_drop_var+ 0x01a8 
 
  sqlra_csm_clean_lru + 0x0e03 
  sqlra_cache_space_mgmtP8sqlrr_cbm + 0x00ec 
  sqlra_cache_reserve_memoryP8sqlrr_cbm + 0x02d8 
  sqlra_rollup_dyn_stmtPK11sqlmon_stmt + 0x04d6 
  sqlmon_conn8stmt_end + 0x0320 
 
  sqlmon_acb14agent_stmt_endEjbP8sqlrr_cb + 0x0668 
  sqlrr_rds_common_postP14db2UCinterfaceiil + 0x1350 
  sqlrr_prepareP14db2UCinterfaceP16db2UCprepareInfo + 0x02af 
 
 
 
and db2pd -latches show one of the latches it's holding was an 
RDS latch for the monitor: 
 
Address            Holder    Waiter    Filename            LOC 
     LatchType            HoldCount 
0x0000000200DBBB68 8244       25         Unknown 
681        SQLO_LT_sqlmon_conn__rds_latch 1 
 
 
2) On the other hand, the newly spawned independent coordinator 
agent is waiting for a database latch being hold by a third 
agent, so it never got a chance to post back to the agent which 
has spawned it. 
 
For example, you might see the spawned independent agent 8399 
had the stacks similar to below. 
 
sqloSpinLockConflict + 0x0240 
sqeLocalDatabase29ShouldWeStartBackgroundAgents + 0x042a 
 
sqlm_a_initP8sqeAgent + 0x023a 
sqeApplication20InitEngineComponents+ 0x06c9 
 
sqeApplication13AppStartUsing  + 0x04f2 
 
sqleSubAgentStartUsingP8sqeAgentP16SQLE_CLIENT_INFO + 0x035a 
sqeApplication22AppSecondaryStartUsing + 0x0202 
 
sqleIndCoordProcessRequestP8sqeAgent + 0x059f 
sqleIndCoordProcessRequestP8sqeAgent + 0x02d8 
sqleIndCoordProcessRequestP8sqeAgent + 0x0084 
sqeAgent6RunEDUEv + 0x0381 
 
 
and db2pd -latches show it's waiting for the database latch hold 
by the third agent 25: 
 
Address            Holder    Waiter    Filename            LOC 
     LatchType            HoldCount 
0x00000002004D7F48 25         8339       Unknown 
395        SQLO_LT_sqeLocalDatabase__dblatch 1 
 
 
3) The third agent is in turn waiting for the latch being hold 
by the original agent executing the query as in 1). 
 
For example, you might see the third agent 25 had the stacks 
similar to below. 
 
sqloSpinLockConflict + 0x0240 
sqloxult_trackP11sqlo_xlatch14SQLO_LT_VALUES + 0x005f 
sqm_collect_db_bp_data + 0x03b0 
 
sqm_snap_dbase + 0x013e 
 
sqlmonssagnt + 0x0466 
 
sqlmonssbackendP12SQLE_DB2RA_T + 0x0500 
sqlesrvrP14db2UCinterface + 0x05bd 
sqleMappingFnServerP5sqldaP5sqlca + 0x049d 
sqlerKnownProcedure + 0x0270 
 
sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x0492 
 
and db2pd -latches show it's waiting for the RDS latch hold by 
the original agent 8244. 
 
Address            Holder    Waiter    Filename            LOC 
     LatchType            HoldCount 
 
0x0000000200DBBB68 8244       25         Unknown 
681        SQLO_LT_sqlmon_conn__rds_latch 1 
 
 
 
As a result, the dead latch occurs. Recycling the instance will 
solve the dead latch. 
 
This issue doesn't occur on DB2 version prior to DB2 UDB V9.7 
for LUW.
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* The issue doesn't occur on DB2 versions prior to DB2 UDB     * 
* V9.7 for LUW                                                 * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* See above Error Description                                  * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to DB2 Version 9.7 Fix Pack 4                        * 
****************************************************************
Local Fix:
1) turn off the default statement switch (DFT_MON_STMT)  in the 
Database Manager Configuration. Also need to make sure there is 
no application turn on the database manager  statement switch. 
For example, db2 Governor tool will turn on the statement 
switch. If db2 Governor is used, it need to be disabled as well. 
 
2) increasing the package cache size(PCKCACHESZ) in the Database 
Configuration may also reduce the chance of getting the dead 
latch
available fix packs:
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
Problem was first fixed in Version 9.7 Fix Pack 4
Workaround
See above Local Fix
Timestamps
Date  - problem reported    :
Date  - problem closed      :
Date  - last modified       :
07.10.2010
02.05.2011
02.05.2011
Problem solved at the following versions (IBM BugInfos)
9.7.FP4
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.4 FixList