DB2 - Problem description
Problem IT05793
Status: Closed
SQLO_LATCH_ERROR_EXPECTED_HELD ERROR LEADS TO DB2 CRASH IN WINDOWS
Product:
DB2 FOR LUW / DB2FORLUW / A10 - DB2
Problem description:
The database crashes when DB2 tries to unlock a latch that is either invalid or has already been unlocked. The following error is logged in db2diag.log:

yyyy-mm-dd-23.30.00.450000+480 xxxxxxx            LEVEL: Severe (OS)
PID     : 28436                TID : 22152        PROC : db2fmp64.exe
INSTANCE: Instance_name        NODE : 000         DB   : Database_name
APPID   : *LOCAL.<dbname>.131017145330
EDUID   : 22152
FUNCTION: DB2 UDB, SQO Latch Tracing, SQLO_SLATCH_CAS32::releaseConflictComplex, probe:360
MESSAGE : ZRC=0x870F011E=-2029059810=SQLO_LATCH_ERROR_EXPECTED_HELD
          "expected latch to be held."
CALLED  : OS, -, unspecified_system_function
DATA #1 : String, 39 bytes
Attempting to unlock an invalid latch:
DATA #2 : File name, 16 bytes
sqloLatchCAS32.C
DATA #3 : Source file line number, 8 bytes
1503
DATA #4 : Codepath, 8 bytes
0
DATA #5 : String, 114 bytes
0x00000000: {
   held X: 0
   reserved for X: 0
   shared holders: 0
   shared waiter: 0
   exclusive waiter: 0
}
DATA #6 : LatchMode, PD_TYPE_LATCH_MODE, 8 bytes
0x0 (invalid mode)
DATA #7 : String, 560 bytes
{
   state = 0x00000000 = {
      held X: 0
      reserved for X: 0
      shared holders: 0
      shared waiter: 0
      exclusive waiter: 0
   }
   flags = 32768 (No X Starvation)
   numXPostsPending = 0
   m_pSemPoolElement = 0x0000000000000000
   cs = {
      lock = { 0x80 [ locked ] }
      flags = 0x0000
      identity = SLATCH_DEFAULT::critical_section (26)
   }
}
DATA #8 : Hexdump, 24 bytes
0x00000001804DEA58 : 0000 0000 2283 0000 8000 1A00 0000 0000    ...."...........
0x00000001804DEA68 : 0000 0000 0000 0000                        ........
CALLSTCK:
  [0] 0x000000018012B8C3 pdLogSysRC + 0x393
  [1] 0x0000000180011FA4 SQLO_SLATCH_CAS32::dumpDiagInfoAndPanic + 0x1E2
  [2] 0x00000001800124A0 SQLO_SLATCH_CAS32::releaseConflictComplex + 0x3BC
  [3] 0x0000000180012097 SQLO_SLATCH_CAS32::releaseConflict + 0x39
  [4] 0x0000000180063BB8 sqlogmt2 + 0x24E
  [5] 0x000000000226A760 sqm_timestamp + 0x2C
  [6] 0x000000000244EA37 sqlmon_conn::stmt_start + 0x49D
  [7] 0x00000000023E52B8 sqlmon_acb::agent_stmt_start + 0x51E

...and then the database shutdown (crash) message is logged.
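To check an existing db2diag.log for this signature, a scan along the following lines may help. This is a minimal sketch, not an IBM tool: the function name find_latch_errors is hypothetical, and it assumes each record starts with a timestamp line of the form shown in the record above.

```python
import re

# Signature strings taken from the db2diag.log record in this description.
LATCH_ZRC_NAME = "SQLO_LATCH_ERROR_EXPECTED_HELD"
LATCH_FUNCTION = "SQLO_SLATCH_CAS32::releaseConflictComplex"

# Assumption: every record begins with a timestamp such as
# 2014-11-26-23.30.00.450000+480 at the start of a line.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+", re.MULTILINE
)


def find_latch_errors(log_text):
    """Return the first line of every record that carries the latch error."""
    starts = [m.start() for m in RECORD_START.finditer(log_text)]
    hits = []
    for i, begin in enumerate(starts):
        # A record runs up to the next timestamp line, or to the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else len(log_text)
        record = log_text[begin:end]
        if LATCH_ZRC_NAME in record and LATCH_FUNCTION in record:
            hits.append(record.splitlines()[0].strip())
    return hits
```

Calling find_latch_errors on the contents of db2diag.log returns one header line per matching record, which gives the time of each occurrence.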
Analysis: In most cases the latch error above is logged after a procedure that runs in the Administrative Task Scheduler (ATS) has failed. The failed procedure may also log an error similar to the following in db2diag.log:

yyyy-mm-dd-23.15.06.504000+480 XXXX               LEVEL: Error
PID     : 28436                TID : 22736        PROC : db2fmp64.exe
INSTANCE: Instance_name        NODE : 000         DB   : Database_name
APPID   : *LOCAL.<database_name>.131017151501
EDUID   : 22736
FUNCTION: DB2 UDB, Administrative Task Scheduler, AtsTask::executeTask, probe:600
MESSAGE : ZRC=0xFFFFFE4A=-438
DATA #1 : <preformatted>
[IBM][CLI Driver][DB2/NT64] SQL0438N Application raised error or warning with diagnostic text: "SQL0438N Application raised error or warning with diagnostic text: "C". SQLSTATE=55555

Analysis shows that this type of latch problem is caused by an unwanted latch re-initialization in a timestamp-related function in DB2. The fix corrects the latch re-initialization in the failing function so that the crash no longer occurs.
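The MESSAGE lines in both records show the same return code twice, once in hexadecimal and once in decimal. The decimal form is the hexadecimal value read as a signed 32-bit integer. The sketch below reproduces that arithmetic; the helper name zrc_to_signed is hypothetical and not part of DB2.

```python
def zrc_to_signed(zrc_hex):
    """Interpret a 32-bit ZRC given in hex as a signed (two's complement) integer."""
    value = int(zrc_hex, 16) & 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


print(zrc_to_signed("0x870F011E"))  # -2029059810, as in the latch error record
print(zrc_to_signed("0xFFFFFE4A"))  # -438, as in the ATS record (SQL0438N)
```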
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* Users on DB2 V10.1 Fixpak 4 and below                        *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 v10.1 Fixpak 5                                *
****************************************************************
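The affected range can be written as a simple check on the product level. This is only a sketch: the function name is_affected is hypothetical, and it assumes the level is given as a four-part string in the form used by the fixlist entry for this problem (10.1.0.5), with the last part being the Fixpak number. The summary covers only V10.1, so other releases are reported as unknown rather than unaffected.

```python
def is_affected(level):
    """Return True/False for V10.1 levels, or None for releases not covered here."""
    parts = level.split(".")
    if len(parts) != 4:
        raise ValueError("expected a level such as 10.1.0.4")
    version, release, _modification, fixpack = (int(p) for p in parts)
    if (version, release) != (10, 1):
        return None  # the problem summary makes no statement about this release
    # Affected: Fixpak 4 and below. Fixed: Fixpak 5 and above.
    return fixpack <= 4


print(is_affected("10.1.0.4"))  # True
print(is_affected("10.1.0.5"))  # False
```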
Local Fix:
There are 3 possible workarounds for the problem:
1. DB2_FMP_COMM_HEAPSZ=0
2. DB2_ATS_ENABLE=NO
3. Run the stored procedure in question in a single-threaded FMP process, i.e. NOT THREADSAFE.
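Workarounds 1 and 2 are DB2 registry variables, which are normally set with the db2set command. The sketch below only builds the command lines and does not run them. The helper name db2set_command is hypothetical, and it is an assumption here that the instance has to be restarted before the new values take effect. Workaround 3 is a change to the procedure definition and is not covered.

```python
# Registry settings listed as workarounds 1 and 2 in the Local Fix text.
WORKAROUNDS = {
    1: ("DB2_FMP_COMM_HEAPSZ", "0"),
    2: ("DB2_ATS_ENABLE", "NO"),
}


def db2set_command(workaround):
    """Build the db2set command line for workaround 1 or 2."""
    name, value = WORKAROUNDS[workaround]
    return f"db2set {name}={value}"


for number in sorted(WORKAROUNDS):
    print(db2set_command(number))
```

Only one of the workarounds should be needed. Note that DB2_ATS_ENABLE=NO turns off the Administrative Task Scheduler, so scheduled tasks will not run while it is set.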
Solution
First fixed in DB2 v10.1 Fixpak 5
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 26.11.2014
Date - problem closed   : 15.07.2015
Date - last modified    : 15.07.2015
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
10.1.0.5