DB2 - Problem description
Problem IC92924 | Status: Closed
DB2 HANGS WHEN STMM RESIZING BUFFERPOOL AFTER "SORT LIST SERVICES PROGRAMMING ERROR"
Product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
DB2 might hang with the following symptoms:

1. There is a lot of latch contention on the following latch:
   SQLO_LT_sqeDBMgr__dbMgrLatch

2. STMM shows a stack similar to the following:
   0x090000001520BD88 sqloWaitEDUWaitPost + 0x218
   0x0900000016D68120 sqlbRemInvalidPagesFromBufferPool__FP15SQLB_BufferPoolUiN32P12SQLB_GLOBALS + 0x334
   0x0900000016D64DA0 sqlbDecreaseBufferpoolSize__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x554
   0x0900000016D63FE4 sqlbResizeBufferPool__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x1E8
   0x09000000165EBBA4 sqlbAlterAutomaticBufferPool__FUiiP8sqeAgent + 0x654
   0x0900000015F3B058 sqlrlStmmAlterBufferPool__FP8sqeAgentPciT3P14db2UCinterfaceP5sqlca + 0x414
   0x0900000015F88784 stmmAlterBufferPool__FP8sqeAgentPciT3 + 0x1F4
   0x0900000015F85924 stmmResizeRecord__FP21stmmCostBenefitRecordP16sqeLocalDatabase + 0xEB4
   0x0900000015F838B8 stmmDecreaseEntriesAndRemoveFromList__FP16sqeLocalDatabasePP21stmmCostBenefitRecordPUi + 0xBE0
   0x0900000015F820F0 stmmResizeEntriesAndRemoveFromList__FPP21stmmCostBenefitRecordP16sqeLocalDatabase + 0xCC
   0x0900000015F81E94 stmmTuneMemory__FPP21stmmCostBenefitRecordP16sqeLocalDatabase + 0x11C
   0x0900000015566824 stmmMemoryTunerMain + 0x488
   0x09000000153AA25C sqleIndCoordProcessRequest__FP8sqeAgent + 0x198
   0x09000000150F1B94 RunEDU__8sqeAgentFv + 0x16C
   0x09000000150EE418 EDUDriver__9sqzEDUObjFv + 0xF4
   0x09000000150E51CC sqloEDUEntry + 0x264
   There are no entries in the STMM logs showing this activity.

3. The last agent trying to deactivate the database is on the following stack:
   0x090000001520BD88 sqloWaitEDUWaitPost + 0x218
   0x09000000150D75C4 sqloWaitEDUWaitPost@glue113 + 0x78
   0x09000000150D6FBC TermDbConnect__16sqeLocalDatabaseFP8sqeAgentP5sqlcai + 0x620
   0x09000000150D282C AppStopUsing__14sqeApplicationFP8sqeAgentUcP5sqlca + 0xD88
   0x09000000153E40F4 sqlesrspWrp__FP14db2UCinterface + 0xA8
   0x09000000153E4368 sqleUCagentConnectReset + 0xF8
   0x0900000015429340 @63@sqljsCleanup__FP8sqeAgentP14db2UCconHandle + 0x910
   0x090000001542A2F0 @63@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x330
   0x0900000015429D1C sqljsDrdaAsDriver__FP18SQLCC_INITSTRUCT_T + 0x100
   0x09000000150F1D1C RunEDU__8sqeAgentFv + 0x2F4
   0x09000000150EE418 EDUDriver__9sqzEDUObjFv + 0xF4
   0x09000000150E51CC sqloEDUEntry + 0x264

4. At some point in time there are entries in the db2diag.log that have the message "Sort Error. Failed sanity check before unfixing page, fixount is 0, aborting sort" as key:
   FUNCTION: DB2 UDB, sort/list services, sqlsSanityCheckPageAlreadyUnfixed, probe:4099
   MESSAGE : ZRC=0x82130001=-2112684031=SQLS_NONSEVERE_PE
             "Sort List Services programming error."
             DIA8532C An internal processing error has occurred.
   DATA #1 : String, 81 bytes
   Sort Error. Failed sanity check before unfixing page, fixount is 0, aborting sort
   DATA #2 : Fix control block, PD_TYPE_SQLB_FIX_CB, 168 bytes
   accessMethod: SQLB_POOL_RELATIVE
   fixMode: 2 SQLBOLD/SQLBOLDS
   buffptr: 0x0000000000000000
   bpdPtr: 0x0770000024973170
   dmDebugHdl: 0
   objectPageNum: 2280
   empDiskPageNum: 4294967295
   unfixFlags: 6 SQLB_UFIX_PURGE_MODE | SQLB_UFIX_DEFERRED_MODE
   dirtyState: SQLBCLEAN
   fixInfoFlags:
   regEDUid: 0
   Pagekey: {pool:1;obj:2;type:128} PPNum:2280

   And there is a matching trap file with a stack like:
   pthread_kill + 0x88
   sqloDumpEDU + 0x34
   sqldDumpContext__FP9sqeBsuEduiN42PCcPvT2 + 0xC4
   sqldDumpContext__FP9sqeBsuEduiN42PCcPvT2@glue5AE + 0x98
   sqlrr_dump_ffdc__FP8sqlrr_cbiT2 + 0x388
   sqlzeDumpFFDC__FP8sqeAgentUiP5sqlcai + 0x30
   sqlzeDumpFFDC__FP8sqeAgentUiP5sqlcai@glue534 + 0x80
   sqlzeMapZrc__FP8sqeAgentUiUlT2P5sqlcaiPC12sqlzeContextb + 0x1F8
   sqlrrMapZrc__FP8sqlrr_cbUiUli@glue3C3 + 0x80
   sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0xA8
   sqlriclo__FP8sqlrr_cbP9sqlri_taoi@glueBA6 + 0x78
   sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x30
   sqlricls_simple__FP8sqlrr_cbil + 0x170
   sqlrr_process_close_request__FP8sqlrr_cbiN32 + 0x18C
   sqlrr_close__FP14db2UCinterfaceP15db2UCCursorInfo + 0x304
   sqljs_ddm_clsqry__FP14db2UCinterfaceP13sqljDDMObject + 0x760
   sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x180
   .sqljsParse.fdpr.clone.0__FP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x6DC
   @63@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x2D4
   @63@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xB4
   @63@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x2D0

   This might also show up with entries similar to:
   FUNCTION: DB2 UDB, sort/list services, sqlsSanityCheckPageAlreadyUnfixed, probe:4099
   MESSAGE : ZRC=0x82130001=-2112684031=SQLS_NONSEVERE_PE
             "Sort List Services programming error."
             DIA8532C An internal processing error has occurred.
   DATA #1 : String, 81 bytes
   Sort Error. Failed sanity check before unfixing page, fixount is 0, aborting sort
   DATA #2 : Fix control block, PD_TYPE_SQLB_FIX_CB, 168 bytes
   accessMethod: SQLB_POOL_RELATIVE
   fixMode: 2 SQLBOLD/SQLBOLDS
   buffptr: 0x0000000000000000
   bpdPtr: 0x07700000ca4af6a0
   dmDebugHdl: 0
   objectPageNum: 4
   empDiskPageNum: 4294967295
   unfixFlags: 2 SQLB_UFIX_PURGE_MODE
   dirtyState: SQLBCLEAN
   fixInfoFlags:
   regEDUid: 0
   Pagekey: {pool:1;obj:2;type:128} PPNum:4
   ....
   CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
   [0] 0x0900000012A79A84 pdLog + 0xF4
   [1] 0x090000001410A168 pdLog@glue421 + 0x12C
   [2] 0x09000000133A8E6C sqlsmergerec__FP8sqeAgentP10SQLS_SLDESP10SQLS_MBUFSUl + 0x260
   [3] 0x09000000128E0BA0 sqlsfetc__FP8sqeAgentP8SQLD_CCBiP10SQLD_DPREDPP10SQLD_VALUEP8SQLZ_RIDPc + 0x438
   [4] 0x0900000012A5F288 sqlriPrefetchRIDs__FP8sqlrr_cbP8sqlri_lfl + 0x260
   [5] 0x09000000128E1BD0 sqlriListFetch__FP8sqlrr_cb + 0x4C
   [6] 0x090000001293B16C sqlriNljnPiped__FP8sqlrr_cb + 0x26C
   [7] 0x090000001293C624 sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x30
   [8] 0x0900000012B6F024 sqlrr_process_fetch_request__FP14db2UCinterface + 0x1C0
   [9] 0x09000000129A68D0 sqlrr_fetch__FP14db2UCinterfaceP15db2UCCursorInfo + 0x38C

The hang is due to the sort problem documented in point 4 above. STMM will hang on the page after the sort failure, making it impossible to deactivate the database or to allow new connections.
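The db2diag.log signature from symptom 4 can be searched for with a short script. The following Python is only a minimal sketch: the function name `count_sort_sanity_errors` is hypothetical, the three markers are copied from the diagnostic entries quoted in this record, and splitting the log text on "FUNCTION:" is an assumed simplification of the real record layout.

```python
# Markers copied from the db2diag.log entries quoted in this APAR.
SIGNATURE = (
    "sqlsSanityCheckPageAlreadyUnfixed",
    "probe:4099",
    "ZRC=0x82130001",
)


def count_sort_sanity_errors(diag_text: str) -> int:
    """Count log chunks that contain every marker in SIGNATURE.

    Assumption: each diagnostic record has one "FUNCTION:" line that is
    followed by its "MESSAGE :" line, so splitting on "FUNCTION:" keeps
    the markers of a single record together in one chunk.
    """
    chunks = diag_text.split("FUNCTION:")[1:]  # text before the first record is ignored
    return sum(all(marker in chunk for marker in SIGNATURE) for chunk in chunks)
```

To use it, read the contents of db2diag.log into a string and pass it to the function. A non-zero count only indicates that the sort error was logged; it does not by itself prove that STMM is hung.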
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 Version 9.7 Fix Pack 9.                       *
****************************************************************
Local Fix:
Explicitly activate the database with "db2 activate database..." to avoid hangs when the last agent disconnects. Note that if you hit this issue, STMM will still be stuck even after the database has been activated explicitly, and you might need to restart the database for STMM to continue working.
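The explicit activation described in the local fix could be scripted along these lines. This is a hedged sketch, not an IBM-provided procedure: the helper names `activate_command` and `activate_database` are hypothetical, `SAMPLE` is a placeholder database name, and the code assumes the `db2` command line processor is on the PATH of the instance owner's environment.

```python
import subprocess
from typing import List


def activate_command(db_name: str) -> List[str]:
    """Build the argument list for 'db2 activate database <name>'."""
    # Reject empty names or names with whitespace; this is a minimal
    # sanity check, not a full validation of DB2 naming rules.
    if not db_name or any(ch.isspace() for ch in db_name):
        raise ValueError("database name must be non-empty without whitespace")
    return ["db2", "activate", "database", db_name]


def activate_database(db_name: str) -> int:
    """Run the activation command and return the process exit code.

    Assumption: the db2 executable is available in the current environment.
    """
    return subprocess.run(activate_command(db_name), check=False).returncode
```

For example, `activate_database("SAMPLE")` would run `db2 activate database SAMPLE`. As noted above, this only avoids the hang on the last disconnect; it does not release an STMM that is already stuck.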
Available fix packs:
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
Solution
Problem was first fixed in DB2 Version 9.7 Fix Pack 9.
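Whether a given 9.7 installation already contains the fix can be checked by comparing its level with 9.7.0.9, the level named in the fixlist of this record. The sketch below is illustrative only: the function names are hypothetical, and it assumes the level is available as a dotted four-part string such as "9.7.0.9" (optionally prefixed with "v"). Because this record only documents the 9.7 stream, other release streams are rejected rather than guessed at.

```python
from typing import Tuple

# Fix level taken from the fixlist entry in this APAR (9.7 Fix Pack 9).
FIX_LEVEL = (9, 7, 0, 9)


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted level string such as 'v9.7.0.9' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().lstrip("vV").split("."))


def has_ic92924_fix(level: str) -> bool:
    """Return True if a 9.7 level is at or above 9.7.0.9."""
    parsed = parse_level(level)
    # Only four-part 9.7.x.y levels are handled; this record does not
    # state fix levels for other release streams.
    if len(parsed) != 4 or parsed[:2] != FIX_LEVEL[:2]:
        raise ValueError("only four-part 9.7 levels are covered by this sketch")
    return parsed >= FIX_LEVEL
```

Comparing integer tuples rather than strings keeps a level such as 9.7.0.10 ordered after 9.7.0.9.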
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 07.06.2013
Date - problem closed : 17.12.2013
Date - last modified : 17.12.2013
Problem solved at the following versions (IBM BugInfos)
9.0., 9.7.
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.9