DB2 - Problem description
Problem IC71743 | Status: Closed |
DB2 HANGS DUE TO INDIRECT DEAD LATCH BETWEEN THE AGENT EXECUTING A QUERY AND AN INDEPENDENT COORDINATOR AGENT SPAWNED | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
When running out of package cache memory space, DB2 will spawn an independent coordinator agent to clean up package cache This might cause an indirect dead latch between the agent executing a query and the spawned independent coordinator agent. The scenario of the dead latch situation can be illustrated as below. 1) the agent executing a query spawns an independent coordinator agent to clean up package cache and is waiting for the new agent's post back while itself is holding a latch. For example, you might see the agent 8244 waiting for the post had the stacks similar to below: semtimedop + 0x000a sqloWaitEDUWaitPost + 0x019d sqeIcoordCB18WaitForSanityCheckEP5sqlca + 0x003d sqeIcoordCB26DispatchIndependentDBAgent+ 0x0281 ABPIcoordAgent16spawnIcoordAgent + 0x00cb ABPDispatcher10spawnAgent + 0x006f ABPDispatcher23activateTaskProContexts + 0x0165 abpActivateTaskProContextsPK8sqeAgent + 0x0069 sqlra_cache_del_var + 0x0a26 sqlra_csm_drop_var+ 0x01a8 sqlra_csm_clean_lru + 0x0e03 sqlra_cache_space_mgmtP8sqlrr_cbm + 0x00ec sqlra_cache_reserve_memoryP8sqlrr_cbm + 0x02d8 sqlra_rollup_dyn_stmtPK11sqlmon_stmt + 0x04d6 sqlmon_conn8stmt_end + 0x0320 sqlmon_acb14agent_stmt_endEjbP8sqlrr_cb + 0x0668 sqlrr_rds_common_postP14db2UCinterfaceiil + 0x1350 sqlrr_prepareP14db2UCinterfaceP16db2UCprepareInfo + 0x02af and db2pd -latches show one of the latches it's holding was an RDS latch for the monitor: Address Holder Waiter Filename LOC LatchType HoldCount 0x0000000200DBBB68 8244 25 Unknown 681 SQLO_LT_sqlmon_conn__rds_latch 1 2) On the other hand, the newly spawned independent coordinator agent is waiting for a database latch being hold by a third agent, so it never got a chance to post back to the agent which has spawned it. For example, you might see the spawned independent agent 8399 had the stacks similar to below. sqloSpinLockConflict + 0x0240 sqeLocalDatabase29ShouldWeStartBackgroundAgents + 0x042a sqlm_a_initP8sqeAgent + 0x023a sqeApplication20InitEngineComponents+ 0x06c9 sqeApplication13AppStartUsing + 0x04f2 sqleSubAgentStartUsingP8sqeAgentP16SQLE_CLIENT_INFO + 0x035a sqeApplication22AppSecondaryStartUsing + 0x0202 sqleIndCoordProcessRequestP8sqeAgent + 0x059f sqleIndCoordProcessRequestP8sqeAgent + 0x02d8 sqleIndCoordProcessRequestP8sqeAgent + 0x0084 sqeAgent6RunEDUEv + 0x0381 and db2pd -latches show it's waiting for the database latch hold by the third agent 25: Address Holder Waiter Filename LOC LatchType HoldCount 0x00000002004D7F48 25 8339 Unknown 395 SQLO_LT_sqeLocalDatabase__dblatch 1 3) The third agent is in turn waiting for the latch being hold by the original agent executing the query as in 1). For example, you might see the third agent 25 had the stacks similar to below. sqloSpinLockConflict + 0x0240 sqloxult_trackP11sqlo_xlatch14SQLO_LT_VALUES + 0x005f sqm_collect_db_bp_data + 0x03b0 sqm_snap_dbase + 0x013e sqlmonssagnt + 0x0466 sqlmonssbackendP12SQLE_DB2RA_T + 0x0500 sqlesrvrP14db2UCinterface + 0x05bd sqleMappingFnServerP5sqldaP5sqlca + 0x049d sqlerKnownProcedure + 0x0270 sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x0492 and db2pd -latches show it's waiting for the RDS latch hold by the original agent 8244. Address Holder Waiter Filename LOC LatchType HoldCount 0x0000000200DBBB68 8244 25 Unknown 681 SQLO_LT_sqlmon_conn__rds_latch 1 As a result, the dead latch occurs. Recycling the instance will solve the dead latch. This issue doesn't occur on DB2 version prior to DB2 UDB V9.7 for LUW. | |
Problem Summary: | |
**************************************************************** * USERS AFFECTED: * * The issue doesn't occur on DB2 versions prior to DB2 UDB * * V9.7 for LUW * **************************************************************** * PROBLEM DESCRIPTION: * * See above Error Description * **************************************************************** * RECOMMENDATION: * * Upgrade to DB2 Version 9.7 Fix Pack 4 * **************************************************************** | |
Local Fix: | |
1) turn off the default statement switch (DFT_MON_STMT) in the Database Manager Configuration. Also need to make sure there is no application turn on the database manager statement switch. For example, db2 Governor tool will turn on the statement switch. If db2 Governor is used, it need to be disabled as well. 2) increasing the package cache size(PCKCACHESZ) in the Database Configuration may also reduce the chance of getting the dead latch | |
available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
Workaround | |
See above Local Fix | |
Timestamps | |
Date - problem reported : Date - problem closed : Date - last modified : | 07.10.2010 02.05.2011 02.05.2011 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.FP4 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.4 |