DB2 - Problem description
Problem IC70080 | Status: Closed |
Tablespace corruption due to IN-MEMORY POOL CONTROL BLOCK OUT OF SYNCH WITH POOL PAGE 0 IN REGARDS TO LAST INITIALIZED SMP EXTENT | |
product: | |
DB2 FOR LUW / DB2FORLUW / 950 - DB2 | |
Problem description: | |
DB2 has SMP (Space Map Page) extents at predefined locations in DMS tablespaces to record which extents are free, in use, or pending to be freed. SMP extents are allocated as needed, and DB2 attempts to get rid of them once they are no longer needed.

In the in-memory pool control block (sometimes called the poolCB or pdef), DB2 keeps track of various pieces of information related to the tablespace. One such item is the last initialized SMP extent. DB2 also writes that value to disk on the header page of the tablespace (this page is also known as pool page 0, since it is page 0 in the tablespace). Database startup uses the value on the header page to calculate the tablespace's high-water mark: the value points to the last SMP extent in the tablespace, so startup navigates there to determine the last allocated extent.

Tablespace corruption can happen when these conditions are met:

o Extents are freed from the last SMP extent, leaving the SMP extent empty (so that DB2 gets rid of it, since it is no longer needed). Extents can be freed when objects are dropped, truncated, etc.
o After doing this, the last used extent is now the one right before that SMP extent (i.e. the new high-water mark points after that extent).

Example: If we have an SMP extent at page 512,000 in the tablespace and it has some extents allocated from it, then some action would be required to free those extents, leaving the SMP extent empty. If the extent at page 511,992 (which includes pages 511,992 to 511,999) is in use and is not being removed as part of the same operation, then our new high-water mark is going to be 512,000. The chances of actually emptying the last SMP extent and not freeing anything from the previous SMP extent at the same time are quite low.

In this case, we can end up setting the last initialized extent value in the pdef back to the previous SMP extent (which is correct), but we aren't setting it back correctly on the header page. As a result, these two values are out of sync.
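The page arithmetic in the example, and the mismatch between the pdef and the header page, can be sketched in a few lines of Python. This is only an illustrative model, not DB2 code: the names PoolState and free_last_smp_extent are hypothetical, the extent size of 8 pages is taken from the example above, and PREV_SMP_PAGE is a made-up placeholder because the APAR does not say where the previous SMP extent lives.

```python
from dataclasses import dataclass

EXTENT_SIZE = 8            # pages per extent, as in the example (511,992 to 511,999)
LAST_SMP_PAGE = 512_000    # SMP extent from the example
PREV_SMP_PAGE = 256_000    # HYPOTHETICAL location of the previous SMP extent


def extent_pages(first_page: int, extent_size: int = EXTENT_SIZE) -> tuple:
    """Return the (first, last) page numbers covered by one extent."""
    return first_page, first_page + extent_size - 1


def high_water_mark(last_used_extent_first_page: int,
                    extent_size: int = EXTENT_SIZE) -> int:
    """The high-water mark points just after the last extent in use."""
    return last_used_extent_first_page + extent_size


@dataclass
class PoolState:
    pdef_last_smp: int     # value in the in-memory pool control block
    header_last_smp: int   # value stored on pool page 0 (header page)


def free_last_smp_extent(state: PoolState, fixed: bool) -> PoolState:
    """Model getting rid of the emptied last SMP extent.

    Without the fix only the in-memory value is set back; with the fix
    the header page value is set back as well.
    """
    state.pdef_last_smp = PREV_SMP_PAGE
    if fixed:
        state.header_last_smp = PREV_SMP_PAGE
    return state


if __name__ == "__main__":
    print(extent_pages(511_992))
    print(high_water_mark(511_992))
    print(free_last_smp_extent(PoolState(LAST_SMP_PAGE, LAST_SMP_PAGE), fixed=False))
    print(free_last_smp_extent(PoolState(LAST_SMP_PAGE, LAST_SMP_PAGE), fixed=True))
```

With fixed=False the two fields end up different, which is the out-of-sync state this APAR describes. With fixed=True both fields point to the previous SMP extent.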
There are three scenarios to consider here:

1. If you were to shut down the database at that point and start it again, our code to calculate the high-water mark would attempt to read that SMP extent at page 512,000. Technically, it's not supposed to exist, but it is still there on disk (because we didn't wipe it out). Therefore, we won't see the problem there. However, if the tablespace were to grow later such that we needed to start allocating a new SMP extent, we'd fail with a logic error at that point.

2. If you were to instead reduce the size of the tablespace at this point, then you would lose the space where that SMP extent used to exist. If you shut down the database at this point and start it again, the attempt to recalculate the high-water mark will try to read past the end of the tablespace and fail. This problem has been addressed by APAR IC67502.

3. If the tablespace size was reduced and then extended past where the old SMP extent existed, and the database was then shut down and started again, the result will be an invalid page. This occurs because the code will try to recalculate the high-water mark and will arrive at the point where the SMP extent used to be.

The fix for this APAR ensures that we update the value on the header page as well when we get rid of that unneeded SMP extent.

When we get corruption due to this issue, the entries in the db2diag.log will show as follows:

FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
          DIA8400C A bad page was encountered.
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 23 bytes
Page verification error
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes
11776007
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes
Obj: {pool:9;obj:65534;type:14} Parent={9;65534}
lifeLSN: 000000000000
tid: 0 0 0
extentAnchor: 0
initEmpPages: 0
poolPage0: 0
poolflags: 2122
objectState: 0
lastSMP: 0
pageSize: 4096
extentSize: 8
bufferPoolID: 1
partialHash: 4059955209
bufferPool: 0x7ffffff9a713bc00
DATA #5 : Bitmask, 4 bytes
0x00000002
DATA #6 : Page header, PD_TYPE_SQLB_PAGE_HEAD, 48 bytes
pageHead: {pool:0;obj:0;type:0} PPNum:0 OPNum:0
begoff: 0
datlen: 0
pagebinx: 0
revnum: 0
pagelsn: 000000000000
flag: 0
signature: 0
cbits1to31: 0
cbits32to63: 0
CALLSTCK:
[0] 0x7FFFFFFF7B017F30 __1cZsqlbLogReadAttemptFailure6FIpnQSQdDLB_OBJECT_DESC_IpnJSQdDLB_PAGE_ibLIpcpnMSQdDLB_GLOBALS__v_ + 0x150
[1] 0x7FFFFFFF7B01C8B8 __1cQsqlb_verify_page6FpnJSQdDLB_PAGE_pnQSQdDLB_OBJECT_DESC_IIpnMSQdDLB_GLOBALS_pL_i_ + 0x598
[2] 0x7FFFFFFF7B01965C sqlbReadPage + 0xE84
[3] 0x7FFFFFFF7AFF9334 __1cTsqlbGetPageFromDisk6FpnLSQdDLB_FIX_CB_i_i_ + 0x2EC
[4] 0x7FFFFFFF7AF42404 __1cHsqlbfix6FpnLSQdDLB_FIX_CB__i_ + 0xA1C
[5] 0x7FFFFFFF7B07AFA0 __1cYsqlbFindNewHighWaterMark6FHIpnJSQdDLP_LSN8_LpnMSQdDLB_GLOBALS__i_ + 0xC38
[6] 0x7FFFFFFF7B06F4F8 __1cQsqlbDMSStartPool6FpnMSQdDLB_GLOBALS_pnMSQdDLB_POOL_CB__i_ + 0x7B8
[7] 0x7FFFFFFF7AF45B80 __1cOsqlbStartPools6FpnMSQdDLB_GLOBALS__i_ + 0x950
[8] 0x7FFFFFFF7AFDBA60 sqlbinit + 0xAB0
[9] 0x7FFFFFFF7B45AA70 __1cbBsqlePrepareForSerialization6FpnISQdDLE_BWA_pnIsqeAgent_pnKSQdDLER_GLOB_pnFsqlca_7_l_ + 0x2FA8

Other messages that point to this type of corruption can be:

FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
          DIA8400C A bad page was encountered.
DATA #1 : String, 64 bytes
Error encountered trying to read a page - information follows :
DATA #2 : String, 23 bytes
Page verification error
DATA #3 : Page ID, PD_TYPE_SQLB_PAGE_ID, 4 bytes
11776007
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes
Obj: {pool:9;obj:65534;type:14} Parent={9;65534}
lifeLSN: 000000000000
tid: 0 0 0
extentAnchor: 0
initEmpPages: 0
poolPage0: 0
poolflags: 2122
objectState: 0
lastSMP: 0
pageSize: 4096
extentSize: 8
bufferPoolID: 1
partialHash: 4059955209
bufferPool: 0x7ffffff9b713bb40
DATA #5 : Bitmask, 4 bytes
0x00000002
DATA #6 : Page header, PD_TYPE_SQLB_PAGE_HEAD, 48 bytes
pageHead: {pool:0;obj:0;type:0} PPNum:0 OPNum:0
begoff: 0
datlen: 0
pagebinx: 0
revnum: 0
pagelsn: 000000000000
flag: 0
signature: 0
cbits1to31: 0
cbits32to63: 0
CALLSTCK:
[0] 0x7FFFFFFF7B017F30 __1cZsqlbLogReadAttemptFailure6FIpnQSQdDLB_OBJECT_DESC_IpnJSQdDLB_PAGE_ibLIpcpnMSQdDLB_GLOBALS__v_ + 0x150
[1] 0x7FFFFFFF7B01C8B8 __1cQsqlb_verify_page6FpnJSQdDLB_PAGE_pnQSQdDLB_OBJECT_DESC_IIpnMSQdDLB_GLOBALS_pL_i_ + 0x598
[2] 0x7FFFFFFF7B01965C sqlbReadPage + 0xE84
[3] 0x7FFFFFFF7AFF9334 __1cTsqlbGetPageFromDisk6FpnLSQdDLB_FIX_CB_i_i_ + 0x2EC
[4] 0x7FFFFFFF7AF42404 __1cHsqlbfix6FpnLSQdDLB_FIX_CB__i_ + 0xA1C
[5] 0x7FFFFFFF7B07AFA0 __1cYsqlbFindNewHighWaterMark6FHIpnJSQdDLP_LSN8_LpnMSQdDLB_GLOBALS__i_ + 0xC38
[6] 0x7FFFFFFF7B06F4F8 __1cQsqlbDMSStartPool6FpnMSQdDLB_GLOBALS_pnMSQdDLB_POOL_CB__i_ + 0x7B8
[7] 0x7FFFFFFF7AF773E4 __1cRsqlbStartPoolRFwd6FpnMSQdDLB_GLOBALS_i_i_ + 0x214
[8] 0x7FFFFFFF7C8B2420 __1cQsqlpRfwFillSQdDLCA6Fipc000ipnFsqlca_i00h_v_ + 0x1A18
[9] 0x7FFFFFFF7C8B6800 __1cQsqlpRfwFillSQdDLCA6Fipc000ipnFsqlca_i00h_v_ + 0x5DF8 | |
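As a rough aid for spotting the entries quoted above, the following Python sketch checks a db2diag.log entry for the three markers that both quoted entries share (the sqlb_verify_page function, the SQLB_BADP return code, and a sqlbFindNewHighWaterMark call-stack frame) and pulls the pool number out of the object descriptor. The function names matches_signature and pool_id are hypothetical helpers, not part of any DB2 tool, and the sample entry is an abbreviated copy of the one in this APAR. A match only suggests this APAR as a candidate; other problems can also produce a bad page at this point.

```python
import re

# Markers taken from the db2diag.log entries quoted in this APAR.
SIGNATURE_MARKERS = (
    "sqlb_verify_page",          # FUNCTION that reported the bad page
    "SQLB_BADP",                 # return code name in the MESSAGE line
    "sqlbFindNewHighWaterMark",  # frame showing the high-water mark recalculation
)

# Matches the "Obj: {pool:9;obj:65534;type:14}" part of DATA #4.
OBJ_PATTERN = re.compile(r"Obj:\s*\{pool:(\d+);obj:(\d+);type:(\d+)\}")


def matches_signature(entry: str) -> bool:
    """Return True if one log entry contains all three markers."""
    return all(marker in entry for marker in SIGNATURE_MARKERS)


def pool_id(entry: str):
    """Return the pool number from the object descriptor, or None if absent."""
    match = OBJ_PATTERN.search(entry)
    return int(match.group(1)) if match else None


# Abbreviated copy of the first entry quoted in this APAR.
SAMPLE_ENTRY = """\
FUNCTION: DB2 UDB, buffer pool services, sqlb_verify_page, probe:3
MESSAGE : ZRC=0x86020001=-2046689279=SQLB_BADP "page is bad"
DATA #4 : Object descriptor, PD_TYPE_SQLB_OBJECT_DESC, 72 bytes
Obj: {pool:9;obj:65534;type:14} Parent={9;65534}
CALLSTCK:
[5] 0x7FFFFFFF7B07AFA0 __1cYsqlbFindNewHighWaterMark6FHIpnJSQdDLP_LSN8_LpnMSQdDLB_GLOBALS__i_ + 0xC38
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE_ENTRY), pool_id(SAMPLE_ENTRY))
```

Plain substring checks are used deliberately: the frame name appears inside a mangled symbol, so an exact-token match would miss it.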
Problem Summary: | |
****************************************************************
* USERS AFFECTED:
* ALL
****************************************************************
* PROBLEM DESCRIPTION:
* Tablespace corruption due to IN-MEMORY POOL CONTROL BLOCK OUT OF SYNCH WITH POOL PAGE 0 IN REGARDS TO LAST INITIALIZED SMP EXTENT.
* The remaining text of this field repeats the Problem description section above, including both db2diag.log entries.
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version 9.5 Fix Pack 6
**************************************************************** | |
Local Fix: | |
Restore the corrupt tablespace from a valid backup | |
available fix packs: | |
DB2 Version 9.5 Fix Pack 6a for Linux, UNIX, and Windows | |
Solution | |
Problem was first fixed in DB2 Version 9.5 Fix Pack 6 | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 21.07.2010 |
Date - problem closed : | 20.09.2010 |
Date - last modified : | 20.09.2010 |
Problem solved at the following versions (IBM BugInfos) | |
9.5. | |
Problem solved according to the fixlist(s) of the following version(s) |