DB2 - Problem description
Problem IT01914 | Status: Closed
HASH LATCH CONTENTION CAUSES POOR PERFORMANCE
Product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
This APAR applies to all platforms when the following conditions occur together:

1) A LOCKLIST greater than 8000 pages is used.
2) Many applications access one specific table concurrently via SELECT or DELETE queries, or via searched DELETE or CLOSE CURSOR operations against a cursor opened FOR READ ONLY.

If you suspect that you are hitting this problem, collect "db2pd -latches" output at several intervals. The relevant columns are column 2 (Holder), column 3 (Waiter) and column 6 (LatchType). If you see many lines with a LatchType of SQLO_LT_SQLP_LHSH__hshlatch (the lock manager hash table latch) that share the same Holder value (the EDU ID of the EDU holding the latch) but have various Waiter values (the EDU IDs of EDUs waiting on the latch), you might be hitting this issue. Note that multiple unique holders may be present, as is the case in this example:

  Address            Holder  Waiter  Filename           LOC LatchType                   HoldCount
  0x07000046EBBE0B58 798762  213934  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  129857  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  140132  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  ... repeats many times ...
  ... with the same Holder value ...
  ... with varying Waiter values ...
  0x07000046EBBE0B58 798762  1186579 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  1190691 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  1200514 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  1202054 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  800313  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  0x07000046EBBE0B58 798762  467072  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
  ... skip some other entries ...
  0x07000046EBBE0B58 1156509 213934  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  0x07000046EBBE0B58 1156509 129857  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  0x07000046EBBE0B58 1156509 140132  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  ... repeats many times ...
  ... with the same Holder value (but different from the Holder value above) ...
  ... with varying Waiter values ...
  0x07000046EBBE0B58 1156509 148094  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  0x07000046EBBE0B58 1156509 149379  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  0x07000046EBBE0B58 1156509 160167  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
  ... skip other entries ...

Once the "db2pd -latches" output suggests that your environment might be suffering from this issue, you can collect additional information from the agents to confirm that this specific problem is the cause. For each Holder value in the "db2pd -latches" output, collect "db2pd -stacks <holder_EDU_ID>" to dump the stack trace of the EDU holding the hash latch. You may need to collect this multiple times to capture an instance when the EDU is actively holding the latch.
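The "db2pd -latches" check described above can be scripted. The following Python sketch is a minimal illustration, assuming each snapshot has been saved as text with one latch row per line in the column order of the sample (Address Holder Waiter Filename LOC LatchType HoldCount). The function names and the three-waiter threshold are hypothetical choices, not part of DB2 or this APAR. It groups Waiter EDU IDs by Holder for SQLO_LT_SQLP_LHSH__hshlatch rows and lists the holders whose stacks are worth collecting.

```python
from collections import defaultdict

# LatchType named in this APAR: the lock manager hash table latch.
HASH_LATCH = "SQLO_LT_SQLP_LHSH__hshlatch"


def waiters_by_holder(latches_output: str) -> dict[str, set[str]]:
    """Group Waiter EDU IDs by Holder EDU ID for hash-latch rows.

    Assumes whitespace-separated rows in the column order of the sample:
    Address Holder Waiter Filename LOC LatchType HoldCount.
    Header lines and "..." annotations are skipped.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for line in latches_output.splitlines():
        fields = line.split()
        # A data row has seven fields and starts with a hex latch address.
        if len(fields) != 7 or not fields[0].startswith("0x"):
            continue
        _addr, holder, waiter, _file, _loc, latch_type, _count = fields
        if latch_type == HASH_LATCH:
            groups[holder].add(waiter)
    return dict(groups)


def suspect_holders(latches_output: str, min_waiters: int = 3) -> list[str]:
    """Holder EDU IDs blocking at least `min_waiters` distinct waiters.

    The threshold is a hypothetical heuristic, not an IBM-documented value.
    Results are sorted numerically by EDU ID.
    """
    groups = waiters_by_holder(latches_output)
    return sorted((h for h, w in groups.items() if len(w) >= min_waiters), key=int)


if __name__ == "__main__":
    # A few rows copied from the sample output above.
    sample = "\n".join([
        "Address Holder Waiter Filename LOC LatchType HoldCount",
        "0x07000046EBBE0B58 798762 213934 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0",
        "0x07000046EBBE0B58 798762 129857 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0",
        "0x07000046EBBE0B58 798762 140132 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0",
        "... repeats many times ...",
    ])
    for holder in suspect_holders(sample):
        print(f"db2pd -stacks {holder}")  # prints: db2pd -stacks 798762
```

Running the same function over snapshots taken at different times shows whether a single holder keeps many waiters queued, which is the pattern this APAR describes.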
The holder EDU stack that indicates the problem scenario looks like this:

  -------Frame------ ------Function + Offset------
  0x09000000000F7A94 thread_wait + 0x94
  0x09000000397D1960 getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x2A8
  0x0900000039818BB8 getConflict__17SQLO_SLATCH_CAS64FCUl + 0x78
  0x09000000397F8760 sqlplrl__FP9sqeBsuEduP14SQLP_LOCK_INFOCUl + 0xDE4
  0x09000000397F6F50 sqldmclo__FP8sqeAgentPP8SQLD_CCBi + 0x3CC
  0x090000003979CB20 sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0xA0
  0x090000003979C960 sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x8A4
  0x09000000396DCC2C sqlricls_simple__FP8sqlrr_cbil + 0xB5C
  0x09000000396D6B84 sqlrr_process_close_request__FP8sqlrr_cbiN32 + 0x154
  0x09000000396D6560 sqlrr_close__FP14db2UCinterfaceP15db2UCCursorInfo + 0xD64

In addition, for several of the Waiter values in the "db2pd -latches" output, collect "db2pd -stacks <waiter_EDU_ID>". Again, you may need to collect this multiple times to capture an instance when the EDU is actively waiting on the latch.
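Comparing a collected holder stack against the frames shown above can also be automated. This Python sketch assumes the stack dump is available as text whose frames have the "<address> <function> + <offset>" form shown above; the signature list and the helper names are hypothetical choices. It checks that the key frames appear in the same innermost-to-outermost order, matching on name substrings so that addresses and offsets may differ between systems.

```python
import re

# One stack frame: "<hex address> <function> + <hex offset>".
# No line anchor, so frames flattened onto one line are also found.
FRAME_RE = re.compile(r"0x[0-9A-Fa-f]+\s+(\S+)\s+\+\s+0x[0-9A-Fa-f]+")

# Innermost-to-outermost frames of the holder stack in this APAR (substrings).
HOLDER_SIGNATURE = [
    "getConflictComplex__17SQLO_SLATCH_CAS64",
    "sqlplrl__",
    "sqldmclo__",
    "sqlrr_close__",
]


def frame_functions(stack_text: str) -> list[str]:
    """Return the function names of a stack dump, innermost frame first."""
    return FRAME_RE.findall(stack_text)


def matches_signature(stack_text: str, signature: list[str]) -> bool:
    """True if each signature entry occurs in some frame name, in order."""
    frames = iter(frame_functions(stack_text))
    # Each `any` resumes the shared iterator, so a later signature entry
    # must be found in a frame below the one matching the previous entry.
    return all(any(sig in name for name in frames) for sig in signature)
```

A match is only supporting evidence; as the text notes, the stack has to be captured while the EDU actually holds the latch, so several dumps may be needed before one matches.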
The waiter EDU stack that indicates the problem scenario looks like this:

  -------Frame------ ------Function + Offset------
  0x09000000000F7A94 thread_wait + 0x94
  0x09000000397D1960 getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x2A8
  0x0900000039818BB8 getConflict__17SQLO_SLATCH_CAS64FCUl + 0x78
  0x09000000398189E0 sqlplrq__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x98
  0x090000003978AFD8 .sqldLockTable.fdpr.clone.161__FP8sqeAgentP14SQLP_LOCK_INFOUiUsi + 0xC4
  0x090000003978B5E0 sqldScanOpen__FP8sqeAgentP14SQLD_SCANINFO1P14SQLD_SCANINFO2PPv + 0x544
  0x0900000039798DF4 sqlriopn__FP8sqlrr_cbP9sqlri_taoPi + 0x484
  0x09000000397AE240 sqlrita__FP8sqlrr_cb + 0x58
  0x09000000397AE0C8 sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x34
  0x09000000396D37EC sqlrr_process_fetch_request__FP14db2UCinterface + 0xF08
  0x09000000396D17C4 sqlrr_open__FP14db2UCinterfaceP15db2UCCursorInfo + 0x12E8

If the conditions listed above are met, and the holder and waiter EDU stacks match those shown above, you might obtain relief by applying the local fix or by upgrading to a DB2 level that contains the fix for this APAR.

Local Fix
Apply the following registry setting and restart DB2:

  DB2_KEEPTABLELOCK=TRANSACTION
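Whether a collected waiter stack matches the waiter frames listed above can be checked with an ordered-frame comparison. This Python sketch assumes frames in the "<address> <function> + <offset>" form; the signature entries are taken from the waiter stack in this APAR, and the names WAITER_SIGNATURE and is_waiter_stack are hypothetical. Because the lock-table frame appears as a compiler clone (.sqldLockTable.fdpr.clone.161__...), the sketch matches only on the base name sqldLockTable, on the assumption that the clone suffix can differ between builds.

```python
import re

# One stack frame: "<hex address> <function> + <hex offset>".
FRAME_RE = re.compile(r"0x[0-9A-Fa-f]+\s+(\S+)\s+\+\s+0x[0-9A-Fa-f]+")

# Innermost-to-outermost frames of the waiter stack in this APAR (substrings).
WAITER_SIGNATURE = [
    "getConflictComplex__17SQLO_SLATCH_CAS64",
    "sqlplrq__",
    "sqldLockTable",  # appears as a clone: .sqldLockTable.fdpr.clone.161__...
    "sqldScanOpen__",
    "sqlrr_open__",
]


def is_waiter_stack(stack_text: str) -> bool:
    """True if the waiter signature frames appear in order in the dump."""
    frames = iter(FRAME_RE.findall(stack_text))
    # Each `any` resumes the shared iterator, enforcing frame order.
    return all(any(sig in name for name in frames) for sig in WAITER_SIGNATURE)
```

The sqlplrq__ entry is what separates a waiter from a holder here: the holder stack in this APAR reaches the same latch wait through sqlplrl__ during cursor close instead.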
Problem Summary:
USERS AFFECTED: All users
PROBLEM DESCRIPTION: See the problem description above.
RECOMMENDATION: Upgrade to DB2 V10.1 Fix Pack 5 (V101FP5) or a later version.
Local Fix:
DB2_KEEPTABLELOCK=TRANSACTION
Solution
Fixed in DB2 V10.1 Fix Pack 5 (V101FP5) and later versions.
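To check whether an installed level is at or above the fixed level, a small Python sketch can compare version tuples. It assumes the level is available as a dotted version.release.modification.fixpack string, for example "10.1.0.4" or a token such as "DB2 v10.1.0.4"; the helper names parse_level and has_it01914_fix are hypothetical. Because this page lists only 10.1.0.5 as a fixed level, the sketch answers only for the 10.1 stream and returns None for other releases, whose own fix lists would need to be checked.

```python
import re
from typing import Optional

# First fixed level on the 10.1 stream, per the fix list on this page.
FIXED_10_1 = (10, 1, 0, 5)


def parse_level(level: str) -> tuple[int, ...]:
    """Extract a dotted level such as '10.1.0.5' from a string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", level)
    if match is None:
        raise ValueError(f"no v.r.m.f level found in {level!r}")
    return tuple(int(part) for part in match.groups())


def has_it01914_fix(level: str) -> Optional[bool]:
    """True/False on the 10.1 stream; None where this page gives no answer."""
    version = parse_level(level)
    if version[:2] != FIXED_10_1[:2]:
        return None  # other releases: consult their own fix lists
    # Tuple comparison orders levels field by field.
    return version >= FIXED_10_1
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "10.1.0.10" sorting before "10.1.0.5".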
Workaround
DB2_KEEPTABLELOCK=TRANSACTION
Timestamps
Date - problem reported: 20.05.2014
Date - problem closed: 18.11.2014
Date - last modified: 18.11.2014
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s):
10.1.0.5