DB2 - Problem description
Problem IC76395 | Status: Closed |
DB2 crash when index is modified during concurrent table updates/inserts on zLinux | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
Due to a timing issue, DB2 may crash in addkeyToLeaf when an index is being modified by multiple concurrent updates or inserts. This problem only appears on systems running zLinux. The stack trace should look like this:

<StackTrace>
-----FUNC-ADDR---- ------FUNCTION + OFFSET------
0x0000020005132D52 ossDumpStackTrace + 0x0102
        (/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x000002000512DF8E _ZN11OSSTrapFile4dumpEmiP7siginfoPv + 0x00d2
        (/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x000002000284DFFA sqlo_trce + 0x0702
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200028E6E6E sqloEDUCodeTrapHandler + 0x0166
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002016DC09DF0 address: 0x2016dc09df0
0x00000200039C5C0A address: 0x00000200039C5C0A ; dladdress: 0x0000020000056000 ; offset in lib: 0x000000000396FC0A ;
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200039CC03E _Z8sqlischaP7SQLI_CBP11SQLI_SAGLOBij + 0x139a
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200039CAFA6 _Z8sqlischaP7SQLI_CBP11SQLI_SAGLOBij + 0x0302
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200039CAFA6 _Z8sqlischaP7SQLI_CBP11SQLI_SAGLOBij + 0x0302
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000393E478 _Z8sqliaddkP8sqeAgentP9SQLD_IXCBP8SQLD_KEYP12SQLI_KEYDATAP14SQLP_LOCK_INFOP8SQLP_LRBmP10SQLI_IXPCR + 0x0670
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000126F8C0 _Z13sqldKeyInsertP13SQLD_DFM_WORKP16SQLD_TABLE_CACHES2_P13SQLD_TDATARECim + 0x03d0
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000127312A _Z13sqldRowInsertP8sqeAgenttthmiPP10SQLD_VALUEP13SQLD_TDATARECP8SQLZ_RID + 0x0682
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020002EEAFE6 _Z8sqlrinsrP8sqlrr_cbttitPP10SQLD_VALUEmP8SQLZ_RID + 0x00c2
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020002F4E2F4 _Z8sqlriisrP8sqlrr_cb + 0x0268
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020002F4B7DE _Z15sqlriSectInvokeP8sqlrr_cbP12sqlri_opparm + 0x01d2
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020002BF3D92 _Z29sqlrr_process_execute_requestP8sqlrr_cbi + 0x2182
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020002BD1EBC _Z13sqlrr_executeP14db2UCinterfaceP9UCstpInfo + 0x0b84
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000177F3B0 _Z19sqljs_ddm_excsqlsttP14db2UCinterfaceP13sqljDDMObject + 0x0418
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000174ECC2 _Z21sqljsParseRdbAccessedP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x0302
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000174F13A _Z10sqljsParseP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x0316
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020001745EC4 address: 0x0000020001745EC4 ; dladdress: 0x0000020000056000 ; offset in lib: 0x00000000016EFEC4 ;
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000174949A address: 0x000002000174949A ; dladdress: 0x0000020000056000 ; offset in lib: 0x00000000016F349A ;
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x000002000174A040 _Z17sqljsDrdaAsDriverP18SQLCC_INITSTRUCT_T + 0x0134
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020001486FD4 _ZN8sqeAgent6RunEDUEv + 0x050c
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x0000020003A6BE92 _ZN9sqzEDUObj9EDUDriverEv + 0x01be
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200028E828E sqloEDUEntry + 0x02f6
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00000200000410C2 address: 0x00000200000410C2 ; dladdress: 0x000002000003A000 ; offset in lib: 0x00000000000070C2 ;
        (/lib64/libpthread.so.0)
0x0000020005647EC6 address: 0x0000020005647EC6 ; dladdress: 0x000002000556B000 ; offset in lib: 0x00000000000DCEC6 ;
        (/lib64/libc.so.6)
</StackTrace>

And you should see entries in the db2diag.log similar to the following:

2011-01-01-10.29.30.129152-180 I152730159A579 LEVEL: Severe
PID : 8632 TID : 2205163710800PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DATABASE
APPHDL : 0-52207 APPID: 111.111.111.11.51100.1104271323
AUTHID : AUTH1
EDUID : 596 EDUNAME: db2agent (DATABASE) 0
FUNCTION: DB2 UDB, relation data serv, sqlrr_dump_ffdc, probe:250
RETCODE : ZRC=0x87120007=-2028863481=SQLR_SEVERE_PGM_ERROR
          "Severe programming error"
          DIA8516C A severe internal processing error has occurred.
. . . .
2011-01-01-10.30.00.165822-180 I152737720A1349 LEVEL: Severe
PID : 8632 TID : 2205163710800PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DATABASE
APPHDL : 0-52207 APPID: 111.111.111.11.51100.1104271323
AUTHID : AUTH1
EDUID : 596 EDUNAME: db2agent (DATABASE) 0
FUNCTION: DB2 UDB, relation data serv, sqlrr_dump_sibling, probe:140
MESSAGE : section stmt
DATA #1 : Hexdump, 164 bytes
0x000002003F624760 : 494E 5345 5254 2049 4E54 4F20 5347 4246 INSERT INTO AUTH1
0x000002003F624770 : 2E52 4C5F 4841 4249 4C49 5441 4341 4F5F .TAB123456
0x000002003F624780 : 4D4F 5449 564F 2020 2843 4F5F 4348 565F (COL1
0x000002003F624790 : 4E41 5455 5241 4C5F 5052 4546 4549 5455
0x000002003F6247A0 : 5241 2C20 434F 5F46 414D 494C 4941 525F , COL2
0x000002003F6247B0 : 4641 4D2C 2043 4F5F 4348 565F 4E41 5455 , COL3
0x000002003F6247C0 : 5241 4C5F 5045 5353 4F41 2C20 2044 545F , COL4
0x000002003F6247D0 : 4841 4249 4C49 5441 4341 4F2C 2043 4F5F , COL5
0x000002003F6247E0 : 5345 515F 4D4F 5449 564F 2920 5641 4C55 ) VALU
0x000002003F6247F0 : 4553 2028 3F2C 3F2C 3F2C 7379 7364 6174 ES (?,?,?,sysdat
0x000002003F624800 : 652C 3F29 e,?)
. . .
2011-01-01-11.00.00.150952-180 E152793289A1582 LEVEL: Critical
PID : 8632 TID : 2205163710800PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DATABASE
APPHDL : 0-52207 APPID: 111.111.111.11.51100.1104271323
AUTHID : AUTH1
EDUID : 596 EDUNAME: db2agent (DATABASE) 0
FUNCTION: DB2 UDB, oper system services, sqloEDUCodeTrapHandler, probe:90
MESSAGE : ADM14011C A critical failure has caused the following type of
          error: "Trap". The DB2 database manager cannot recover from the
          failure. First Occurrence Data Capture (FODC) was invoked in the
          following mode: "Automatic". FODC diagnostic information is
          located in the following directory:
          "/home/db2inst1/sqllib/db2dump/FODC_Trap_2011-01-01-10.30.00.509763/".
DATA #1 : Signal Number Recieved, 4 bytes
11
DATA #2 : Siginfo, 128 bytes
0x000002016DC09DF8 : 0000 000B 2FED 3D68 0000 0002 0000 0000 ..../.=h........
0x000002016DC09E08 : 0000 0200 17C0 A000 0000 0000 0000 0000 ................
0x000002016DC09E18 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x000002016DC09E28 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x000002016DC09E38 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x000002016DC09E48 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x000002016DC09E58 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x000002016DC09E68 : 0000 0000 0000 0000 0000 0000 0000 0000 ................

A different symptom that could be caused by this APAR is a crash with the following message in the db2diag.log:

2011-08-08-19.47.13.394127-180 I10853865A5731 LEVEL: Severe
PID : 3812 TID : 2203964139856PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : COMP_TBP
APPHDL : 0-45977 APPID: 10.210.60.43.34469.110927222302
AUTHID : USRCMP
EDUID : 1171 EDUNAME: db2agent (COMP_TBP) 0
FUNCTION: DB2 UDB, index manager, sqliNormalAddKey, probe:776
MESSAGE : ZRC=0x87090054=-2029453228=SQLI_PRG_ERR "Program error"
          DIA8575C An index manager programming error occurred.

--

As a further symptom, DB2 may crash with SEGV (Signal #11) in delkey when trying to call sqliLatchIndexIfNecessary() during an online table REORG cleanup on the zLinux platform. Here is a sample of the stack dump:

Stack #1 Signal #11 Timestamp 2011-08-31-01.12.23.735832
0 ------FUNCTION
1 OSSTrapFile::dump
2 sqlo_trce
3 sqloEDUCodeTrapHandler
4 delkey
5 procLeaf2Del
6 sqlischd
7 sqlischd
8 sqlischd
9 sqlischd
10 sqliOLRCleanup
11 sqldOLRCleanup
12 sqldDoCleanupLogEndMovePhase
13 sqldOnlineTableReorg
14 sqleIndCoordProcessRequest
15 sqeAgent::RunEDU
16 sqzEDUObj::EDUDriver
17 sqloEDUEntry
18 start_thread
19 msgrcv

Here are the corresponding db2diag.log entries:

2011-08-31-01.12.24.698785-420 I1954496A485 LEVEL: Warning
PID : 5215 TID : 2199207799120PROC : db2sysc 0
INSTANCE: xxxxxxx NODE : 000 DB : xxxxxx
APPHDL : 0-729 APPID: *LOCAL.DB2.xxxxxxxxxxxx
AUTHID : xxxxxxx
EDUID : 87 EDUNAME: db2reorg (xxxxxx) 0
FUNCTION: DB2 UDB, RAS/PD component, pdEDUIsInDB2KernelOperation, probe:600
DATA #1 : String, 12 bytes
_Z8sqlischd
DATA #2 : String, 4 bytes
sqli

2011-08-31-01.12.24.699157-420 I1954982A561 LEVEL: Severe
PID : 5215 TID : 2199207799120PROC : db2sysc 0
INSTANCE: xxxxxxx NODE : 000 DB : xxxxxx
APPHDL : 0-729 APPID: *LOCAL.DB2.xxxxxxxxxxxx
AUTHID : xxxxxxx
EDUID : 87 EDUNAME: db2reorg (xxxxxx) 0
FUNCTION: DB2 UDB, RAS/PD component, pdResilienceIsSafeToSustain, probe:800
DATA #1 : String, 37 bytes
Trap Sustainability Criteria Checking
DATA #2 : Hex integer, 8 bytes
0x0000000400021810
DATA #3 : Boolean, 1 bytes
false

2011-08-31-01.12.24.699357-420 I1955544A559 LEVEL: Error
PID : 5215 TID : 2199207799120PROC : db2sysc 0
INSTANCE: xxxxxxx NODE : 000 DB : xxxxxx
APPHDL : 0-729 APPID: *LOCAL.DB2.xxxxxxxxxxxx
AUTHID : xxxxxxx
EDUID : 87 EDUNAME: db2reorg (xxxxxx) 0
FUNCTION: DB2 UDB, base sys utilities, sqleagnt_sigsegvh, probe:1
MESSAGE : Error in agent servicing application with coor_node:
DATA #1 : Hexdump, 2 bytes
0x000002000AC0963C : 0000 .. | |
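The db2diag.log entries quoted in the problem description share a few recognizable FUNCTION/probe strings. As a rough aid, the following Python sketch splits a db2diag.log text into records on the leading timestamp and reports which records contain those strings. The function name `find_symptoms`, the labels in `SIGNATURES`, and the record-splitting regular expression are assumptions made for illustration; they are not part of DB2 or of this APAR, and a match is only a hint, since a trap or SIGSEGV can have other causes.

```python
import re
from typing import Dict, List

# A db2diag.log record is assumed to start with a timestamp such as
# "2011-08-08-19.47.13.394127-180" at the beginning of a line.
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s",
    re.MULTILINE,
)

# FUNCTION/probe strings copied from the log excerpts in this record.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES: Dict[str, str] = {
    "trap_handler": "sqloEDUCodeTrapHandler, probe:90",
    "index_add_key_error": "sqliNormalAddKey, probe:776",
    "agent_sigsegv": "sqleagnt_sigsegvh, probe:1",
}


def find_symptoms(log_text: str) -> Dict[str, List[str]]:
    """Return, per signature label, the timestamps of matching records."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    starts = list(RECORD_START.finditer(log_text))
    for i, match in enumerate(starts):
        # A record runs from its timestamp to the next timestamp (or EOF).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        record = log_text[match.start():end]
        for name, needle in SIGNATURES.items():
            if needle in record:
                hits[name].append(match.group(1))
    return hits
```

To use it, read the log file into a string and pass it to `find_symptoms`; any non-empty list only indicates that the record is worth comparing against the stack traces above.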
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* zlinux users                                                 *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 UDB version 9.7 fp 5                          *
**************************************************************** | |
Local Fix: | |
available fix packs: | |
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows | |
Solution | |
Problem is first fixed in DB2 UDB version 9.7 fix pack 5 | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 13.05.2011 |
Date - problem closed : | 13.06.2012 |
Date - last modified : | 13.06.2012 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.FP5 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.5 |
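The fix level listed above can be compared against an installed level with a simple tuple comparison. The sketch below is illustrative only: the helper names `parse_level` and `has_fix` are hypothetical, the input is assumed to be a dotted level string such as "9.7.0.4" (optionally prefixed with "v"), and levels outside the 9.7 stream return None because this record says nothing about them.

```python
from typing import Optional, Tuple

# "9.7.0.5" is the level named in the fix list of this record.
FIXED_LEVEL: Tuple[int, ...] = (9, 7, 0, 5)


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted level string such as 'v9.7.0.4' into a tuple of ints."""
    return tuple(int(part) for part in level.strip().lstrip("vV").split("."))


def has_fix(level: str) -> Optional[bool]:
    """True/False within the 9.7 stream; None for any other release stream."""
    parsed = parse_level(level)
    if parsed[:2] != FIXED_LEVEL[:2]:
        return None  # this record only covers the 9.7 stream
    return parsed >= FIXED_LEVEL
```

Comparing integer tuples rather than strings avoids ordering mistakes once a component reaches two digits.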