DB2 - Problem description
Problem IC98789 | Status: Closed |
HASH LATCH CONTENTION CAUSES POOR PERFORMANCE | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
This APAR applies to all platforms under the combination of the following conditions:
1) A LOCKLIST greater than 8000 pages is used.
2) Many applications are accessing one specific table concurrently via SELECT or DELETE queries, or via searched DELETE or CLOSE CURSOR operations against cursors opened FOR READ ONLY.

If you suspect that you are hitting this problem, collect "db2pd -latches" at various intervals. The relevant output columns are column 2 (Holder), column 3 (Waiter) and column 6 (LatchType). If many lines have a LatchType of SQLO_LT_SQLP_LHSH__hshlatch (lock manager hash table latch) and share the same Holder value (EDU ID of the EDU holding the latch) with various Waiter values (EDU ID of an EDU waiting on the latch), you might be hitting this issue. Note that multiple unique holders may be present, as is the case in this example.

Address            Holder  Waiter  Filename           LOC LatchType                   HoldCount
0x07000046EBBE0B58 798762  213934  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  129857  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  140132  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
... repeats many times ...
... with the same Holder value ...
... with varying Waiter values ...
0x07000046EBBE0B58 798762  1186579 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  1190691 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  1200514 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  1202054 sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  800313  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  467072  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
... skip some other entries ...
0x07000046EBBE0B58 1156509 213934  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
0x07000046EBBE0B58 1156509 129857  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
0x07000046EBBE0B58 1156509 140132  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
... repeats many times ...
... with the same Holder value (but different than the Holder value above) ...
... with varying Waiter values ...
0x07000046EBBE0B58 1156509 148094  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
0x07000046EBBE0B58 1156509 149379  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
0x07000046EBBE0B58 1156509 160167  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
... skip other entries ...

Once the "db2pd -latches" output suggests that your environment might be suffering from this issue, collect additional information from the agents to confirm it. For each of the holder values in the "db2pd -latches" output, collect "db2pd -stacks <holder_EDU_ID>" to dump the stack trace of each holder EDU. You may need to collect this multiple times in order to capture an instance when the EDU is actively holding the latch.
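The "db2pd -latches" check described above (many rows with the hash latch type, one Holder, varying Waiters) can be approximated with a short script. This is only a sketch: it assumes the seven whitespace-separated columns shown in the sample output, and the names `waiters_by_holder`, `suspicious_holders` and the `min_waiters` threshold are hypothetical, since the APAR only says "many lines".

```python
from collections import defaultdict

HASH_LATCH = "SQLO_LT_SQLP_LHSH__hshlatch"  # lock manager hash table latch


def waiters_by_holder(latches_text, latch_type=HASH_LATCH):
    """Map each Holder EDU ID to the set of Waiter EDU IDs for latch_type."""
    waiters = defaultdict(set)
    for line in latches_text.splitlines():
        fields = line.split()
        # Assumed layout: Address Holder Waiter Filename LOC LatchType HoldCount
        if len(fields) != 7 or fields[5] != latch_type:
            continue
        holder, waiter = fields[1], fields[2]
        if holder.isdigit() and waiter.isdigit():
            waiters[holder].add(waiter)
    return dict(waiters)


def suspicious_holders(latches_text, min_waiters=3):
    """Return {holder: distinct waiter count} for holders with many waiters.

    min_waiters is an arbitrary illustrative threshold, not an IBM value.
    """
    return {
        holder: len(ws)
        for holder, ws in waiters_by_holder(latches_text).items()
        if len(ws) >= min_waiters
    }


sample = """\
Address            Holder  Waiter  Filename           LOC LatchType                   HoldCount
0x07000046EBBE0B58 798762  213934  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  129857  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 798762  140132  sqlpLockInternal.h 554 SQLO_LT_SQLP_LHSH__hshlatch 0
0x07000046EBBE0B58 1156509 213934  sqlpLockInternal.h 520 SQLO_LT_SQLP_LHSH__hshlatch 1
"""

print(suspicious_holders(sample))
```

Running the same function over several snapshots taken at intervals shows whether the same holders keep accumulating waiters, which is the pattern this APAR describes.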
The holder EDU stack that indicates the problem scenario looks like this:

-------Frame------ ------Function + Offset------
0x09000000000FF858 thread_wait + 0x98
0x0900000045849C34 getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x3D4
0x090000004584A258 getConflict__17SQLO_SLATCH_CAS64FCUl + 0xD8
0x0900000045EEA990 sqlplrl__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x3F0
0x090000004685F3A0 sqldmclo__FP8sqeAgentPP8SQLD_CCBi + 0x1BA0
0x09000000468545F0 sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0x550
0x0900000045B9BA98 sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x2B8
0x0900000045B9B4C8 sqlricls_simple__FP8sqlrr_cbil + 0x1488
0x09000000476F15AC sqlrr_process_close_request__FP8sqlrr_cbiN32 + 0x20C
0x0900000046EFEEE8 sqlrr_close__FP14db2UCinterfaceP15db2UCCursorInfo + 0x208

In addition, for various waiter values in the "db2pd -latches" output, collect "db2pd -stacks <waiter_EDU_ID>". Again, you may need to collect this multiple times in order to capture an instance when the EDU is actively waiting on the latch.
The waiter EDU stack that indicates the problem scenario looks like this:

-------Frame------ ------Function + Offset------
0x09000000000F7A94 thread_wait + 0x94
0x0900000037F7314C getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x318
0x0900000037F7392C getConflict__17SQLO_SLATCH_CAS64FCUl + 0x68
0x0900000037F255A4 getConflict__17SQLO_SLATCH_CAS64FCUl@glueFB@clone1 + 0x74
0x0900000037F254CC sqlpLatchHashEntryForTableLockExclusive__FP9SQLP_LHSH@glue13EF + 0x20
0x0900000037F927BC sqlplrq__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x104
0x0900000037FCBFC8 sqldLockTable__FP8sqeAgentP14SQLP_LOCK_INFOUiUsi + 0xF8
0x0900000037FCC650 sqldScanOpen__FP8sqeAgentP14SQLD_SCANINFO1P14SQLD_SCANINFO2PPv + 0x57C
0x0900000037FCB8BC sqlriopn__FP8sqlrr_cbP9sqlri_taoPi + 0x29C
0x0900000037FCB564 sqlriopn__FP8sqlrr_cbP9sqlri_taoPi@glue271 + 0x74
0x0900000037FA1D1C sqlrita__FP8sqlrr_cb + 0x6C
0x0900000037F9FEA4 sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x24
0x0900000038015A2C sqlrr_process_fetch_request__FP14db2UCinterface + 0x1E4
0x0900000037FF02E0 sqlrr_open__FP14db2UCinterfaceP15db2UCCursorInfo + 0xB0C

If the primary conditions listed above are met, and the holder EDU and waiter EDU stacks match those listed above, then you might obtain relief by applying the local fix or by upgrading to a newer level of DB2 that contains the fix for this APAR.

Local Fix
Apply the following registry setting and restart DB2.
DB2_KEEPTABLELOCK=TRANSACTION | |
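The comparison of collected stacks against the holder and waiter stacks in the problem description can be sketched as follows. It assumes each frame line has the form "address function + offset" as shown in this APAR. The choice of which functions form a "signature" is an assumption made for illustration (the APAR only says the stacks should match), and `frame_functions` and `matches` are hypothetical helper names.

```python
# Function names taken from the stacks in this APAR. Treating these three
# frames as the distinguishing signature is an assumption, not IBM guidance.
HOLDER_SIGNATURE = ("sqlplrl", "sqldmclo", "sqlriclo")
WAITER_SIGNATURE = (
    "sqlpLatchHashEntryForTableLockExclusive",
    "sqlplrq",
    "sqldLockTable",
)


def frame_functions(stack_text):
    """Return function names, innermost frame first, without mangling suffixes."""
    funcs = []
    for line in stack_text.splitlines():
        fields = line.split()
        # Assumed frame layout: <0xADDRESS> <function> + <0xOFFSET>
        if len(fields) == 4 and fields[0].startswith("0x") and fields[2] == "+":
            # Keep the part before the first "__" (drops the mangled signature).
            funcs.append(fields[1].split("__", 1)[0])
    return funcs


def matches(stack_text, signature):
    """True if the signature functions occur in the stack in the given order."""
    remaining = iter(frame_functions(stack_text))
    # Each any() consumes the iterator up to the match, enforcing frame order.
    return all(any(f == wanted for f in remaining) for wanted in signature)


holder_sample = """\
-------Frame------ ------Function + Offset------
0x09000000000FF858 thread_wait + 0x98
0x0900000045849C34 getConflictComplex__17SQLO_SLATCH_CAS64FCUl + 0x3D4
0x090000004584A258 getConflict__17SQLO_SLATCH_CAS64FCUl + 0xD8
0x0900000045EEA990 sqlplrl__FP9sqeBsuEduP14SQLP_LOCK_INFO + 0x3F0
0x090000004685F3A0 sqldmclo__FP8sqeAgentPP8SQLD_CCBi + 0x1BA0
0x09000000468545F0 sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0x550
"""

print("holder pattern:", matches(holder_sample, HOLDER_SIGNATURE))
print("waiter pattern:", matches(holder_sample, WAITER_SIGNATURE))
```

Matching on function names rather than addresses or offsets is deliberate: addresses and offsets differ between the two stacks in this APAR even for the same functions, so they are unlikely to be stable across builds.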
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* All users                                                    *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 V97FP10 or higher version.                    *
**************************************************************** | |
Local Fix: | |
DB2_KEEPTABLELOCK=TRANSACTION | |
Solution | |
Fixed in DB2 Version 9.7 Fix Pack 10 (V97FP10) and later. | |
Workaround | |
DB2_KEEPTABLELOCK=TRANSACTION | |
Timestamps | |
Date - problem reported : | 17.01.2014 |
Date - problem closed : | 18.11.2014 |
Date - last modified : | 18.11.2014 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.FP10 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.10 |