DB2 - Problem description
Problem IC82898 | Status: Closed
HADR STANDBY may run out of receive buffer space during intensive XML-dependent insert/update operations on PRIMARY.
product:
DB2 FOR LUW / DB2FORLUW / 950 - DB2
Problem description:
As a result, you can observe high utilization of the HADR receive buffer. In the output of the command:

   db2pd -d <DB_NAME> -hadr

the receive buffer usage is close to 100%:

   StandByRcvBufUsed: 100%

The cause of the issue is that XML log records do not set the correct log record blocking level. As a consequence, the database-wide blocking level is used, which causes congestion on the HADR STANDBY. During congestion, the redo master is either processing a database-wide blocking log record or waiting for the redo workers to complete the work in their queues so that it can process a database-wide blocking log record.

The problem can be further identified by analyzing stacks on the STANDBY side. After issuing the command:

   db2pd -stack all

you can see one of two situations in the stacks created:

1. The redo master is waiting for the redo workers to finish the work left in the queues so that it (the redo master) can proceed to replay the database-wide blocking log records.

   Stack of db2redom(DB_NAME):
   0x09000000000EF1F8 thread_wait + 0x98
   0x090000000A53A7A4 sqloWaitEDUWaitPost + 0x0
   0x090000000858BF90 sqlprWaitDuringPRec__FP8sqeAgentP16SQLO_EDUWAITPOST + 0xD8
   0x09000000098B52F4 sqlpPRecReadLog__FP8sqeAgentP8SQLP_ACBP9SQLP_DBCB + 0x1B54
   ...

   Stack of one redo worker from the pool, db2redow(DB_NAME):
   0x090000000002D538 pread64 + 0x38
   0x090000000A527350 sqloReadBlocks - 0x94
   0x090000000A5274DC sqlbReadBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x28
   0x090000000A52AD80 sqlbReadPage + 0x31C
   0x090000000A5264FC .sqlbGetPageFromDisk__FP11SQLB_FIX_CBi_fdprpro_clone_135 + 0x330
   ...
   or:
   0x090000000002A358 pwrite64 + 0x38
   0x090000000A53385C sqloseekwrite64 + 0xF0
   0x090000000A5336C4 sqloWriteBlocks + 0x9C
   0x090000000A532F68 sqlbWriteBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x38
   0x0900000008D4EEAC @71@sqlbDMSWriteContainerData__FP20SQLB_DIRECT_WRITE_CBP13SQLB_MAP_INFOP16SqlbOpenFileInfoPcP12SQLB_GLOBALS + 0x154
   ...

   Stack of the rest of the redo workers in the pool:
   0x09000000000EF1F8 thread_wait + 0x98
   0x090000000A53A7A4 sqloWaitEDUWaitPost + 0x0
   0x09000000098B1658 sqlprFindQueue__FP9SQLP_DBCBUlT2PUl + 0x620
   0x09000000098B0F88 sqlpPRecProcLog__FP8sqeAgentP8SQLP_ACBP9SQLP_DBCB + 0xC50
   ...
   or:
   0x0900000000BBF1F0 _p_nsleep + 0x10
   0x090000000002B644 nsleep + 0xE4
   0x0900000000144288 nanosleep + 0x188
   0x09000000021DFDA0 ossSleep + 0x80
   0x090000000A6D777C sqlorest + 0x40
   0x09000000098B1510 sqlprFindQueue__FP9SQLP_DBCBUlT2PUl + 0x4D8
   0x09000000098B0F88 sqlpPRecProcLog__FP8sqeAgentP8SQLP_ACBP9SQLP_DBCB + 0xC50
   ...

2. The redo master is processing a database-wide blocking log record, so all redo worker queues are empty (hence there is no work for the redo workers).

   Stack of db2redom(DB_NAME):
   0x090000000002D538 pread64 + 0x38
   0x090000000A527350 sqloReadBlocks - 0x94
   0x090000000A5274DC sqlbReadBlocks__FP16SqlbOpenFileInfoPvlUlUiPUlP12SQLB_GLOBALS + 0x28
   0x090000000A52AD80 sqlbReadPage + 0x31C
   0x090000000A5264FC .sqlbGetPageFromDisk__FP11SQLB_FIX_CBi_fdprpro_clone_135 + 0x330

   Stack of db2redow(DB_NAME) EDUs:
   0x09000000000EF1F8 thread_wait + 0x98
   0x090000000A53A7A4 sqloWaitEDUWaitPost + 0x0
   0x09000000098B1658 sqlprFindQueue__FP9SQLP_DBCBUlT2PUl + 0x620
   0x09000000098B0F88 sqlpPRecProcLog__FP8sqeAgentP8SQLP_ACBP9SQLP_DBCB + 0xC50
   ...
   or:
   0x0900000000BBF1F0 _p_nsleep + 0x10
   0x090000000002B644 nsleep + 0xE4
   0x0900000000144288 nanosleep + 0x188
   0x09000000021DFDA0 ossSleep + 0x80
   0x090000000A6D777C sqlorest + 0x40
   0x09000000098B1510 sqlprFindQueue__FP9SQLP_DBCBUlT2PUl + 0x4D8
   0x09000000098B0F88 sqlpPRecProcLog__FP8sqeAgentP8SQLP_ACBP9SQLP_DBCB + 0xC50
   ...
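The two diagnostic steps in the problem description (checking the receive buffer usage and deciding which situation a redo master stack shows) can be sketched in Python. This is a minimal illustration, not an IBM tool. The function names and the returned labels are hypothetical. The pattern assumes the value appears in the form quoted in this APAR ("StandByRcvBufUsed: 100%"), and the stack heuristic relies only on the frame names quoted in the description, so it may not recognize other stacks. The text is expected to be captured beforehand from db2pd.

```python
import re
from typing import Optional

# Assumption: the value follows the field name as quoted in this APAR,
# e.g. "StandByRcvBufUsed: 100%". The colon is treated as optional.
_RCV_BUF = re.compile(r"StandByRcvBufUsed\s*:?\s*(\d+)\s*%")


def receive_buffer_used(db2pd_hadr_output: str) -> Optional[int]:
    """Return the standby receive buffer usage in percent, or None if absent."""
    match = _RCV_BUF.search(db2pd_hadr_output)
    return int(match.group(1)) if match else None


def classify_redo_master(stack_text: str) -> str:
    """Heuristically label a db2redom stack using frames quoted in the APAR.

    The labels are hypothetical names chosen for this sketch:
      - "waiting-for-workers": situation 1 (sqlprWaitDuringPRec frame present)
      - "processing-blocking-record": situation 2 (page-read frames present)
      - "unknown": none of the quoted frames were found
    """
    if "sqlprWaitDuringPRec" in stack_text:
        return "waiting-for-workers"
    if "sqlbGetPageFromDisk" in stack_text or "sqlbReadPage" in stack_text:
        return "processing-blocking-record"
    return "unknown"


if __name__ == "__main__":
    sample = "StandByRcvBufUsed: 100%"
    print(receive_buffer_used(sample))
    print(classify_redo_master("sqlprWaitDuringPRec__FP8sqeAgentP16SQLO_EDUWAITPOST + 0xD8"))
```

A value close to 100 from receive_buffer_used, combined with either of the two recognized labels, matches the symptoms described in this APAR. It does not confirm the defect on its own.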
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* All                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 version 9.5 fixpack 10                        *
****************************************************************
Local Fix:
No workaround available
Solution
Issue was first fixed in DB2 version 9.5 fixpack 10
Workaround
not known / see Local fix
BUG-Tracking
forerunner : APAR is sysrouted TO one or more of the following: IC83953 follow-up : | |
Timestamps
Date - problem reported : 25.04.2012
Date - problem closed : 20.10.2012
Date - last modified : 09.12.2012
Problem solved at the following versions (IBM BugInfos)
9.5.FP10
Problem solved according to the fixlist(s) of the following version(s)
9.5.0.10