DB2 - Problem description
Problem IC71861 | Status: Closed |
DB2 HADR PAIR CAN HANG WHILE PROCESSING AN INFORMATIONAL LOG RECORD ON STANDBY | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
A DB2 HADR pair can hang showing connect status "Congested" in the db2pd -hadr output:

    Database Partition 0 -- Database SAMPLE -- Active --

    HADR Information:
    Role     State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
    Primary  Peer   Nearsync  0                 991669

    ConnectStatus  ConnectTime                           Timeout
    Congested      Wed Sep 8 20:31:26 2010 (1283970686)  120

The output on the standby will show that the buffer is 100% full. The problem occurs while an informational log record is being processed on the STANDBY system.

Note: The 'Congested' state is only an external symptom. A 'Congested' state does not always indicate a hang.

A typical stack of db2redom in this situation is:

    Thread 51 (Thread 0x2aaac17fe940 (LWP 12900)):
    #0 0x000000333a4d517a in semtimedop () from /lib64/libc.so.6
    #1 0x00002aaaabca8d8b in sqloWaitEDUWaitPost () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #2 0x00002aaaad25ed66 in sqlprWaitDuringPRec(sqeAgent*, SQLO_EDUWAITPOST*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #3 0x00002aaaad25c6c6 in sqlpPRecReadLog(sqeAgent*, SQLP_ACB*, SQLP_DBCB*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #4 0x00002aaaad24e388 in sqlpParallelRecovery(sqeAgent*, sqlca*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #5 0x00002aaaac5ec2b4 in sqleSubCoordProcessRequest(sqeAgent*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #6 0x00002aaaab8d3d8e in sqeAgent::RunEDU() () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #7 0x00002aaaabf7af94 in sqzEDUObj::EDUDriver() () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #8 0x00002aaaabf7aeeb in sqlzRunEDU(char*, unsigned int) () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #9 0x00002aaaabcf6d62 in sqloEDUEntry () from /home/inst01/sqllib/lib64/libdb2e.so.1
    #10 0x000000333b00673d in start_thread () from /lib64/libpthread.so.0
    #11 0x000000333a4d3d1d in clone () from /lib64/libc.so.6

A normal idle db2redom would look like:

    sqlpPRecReadLog -> sqlpshrScanNext -> sqlorest (etc.)

whereas the hang shows:

    sqlpPRecReadLog -> sqlprWaitDuringPRec -> sqloWaitEDUWaitPost
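Because 'Congested' alone does not prove a hang, the call chain of db2redom is the distinguishing evidence. The following is a minimal sketch, not an IBM tool, of how a captured gdb-style stack could be checked against the two call chains quoted above. The function names `frames_outermost_first`, `contains_chain`, and `classify_redo_master_stack`, and the result labels, are hypothetical and exist only for this illustration.

```python
import re

# Call chains quoted in the problem description (caller first).
HANG_CHAIN = ("sqlpPRecReadLog", "sqlprWaitDuringPRec", "sqloWaitEDUWaitPost")
IDLE_CHAIN = ("sqlpPRecReadLog", "sqlpshrScanNext")

# Matches gdb-style frames such as
#   "#3 0x00002aaaad25c6c6 in sqlpPRecReadLog(sqeAgent*, ...) () from ..."
# and captures only the function name.
_FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-fA-F]+\s+in\s+([A-Za-z_][\w:]*)")

# First four frames of the stack shown in the problem description.
SAMPLE_STACK = """
#0 0x000000333a4d517a in semtimedop () from /lib64/libc.so.6
#1 0x00002aaaabca8d8b in sqloWaitEDUWaitPost () from /home/inst01/sqllib/lib64/libdb2e.so.1
#2 0x00002aaaad25ed66 in sqlprWaitDuringPRec(sqeAgent*, SQLO_EDUWAITPOST*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
#3 0x00002aaaad25c6c6 in sqlpPRecReadLog(sqeAgent*, SQLP_ACB*, SQLP_DBCB*) () from /home/inst01/sqllib/lib64/libdb2e.so.1
"""


def frames_outermost_first(stack_text):
    """Return function names from a gdb-style stack, caller before callee."""
    names = _FRAME_RE.findall(stack_text)
    # gdb prints the innermost frame (#0) first, so reverse the order.
    return list(reversed(names))


def contains_chain(frames, chain):
    """True if `chain` appears as consecutive entries in `frames`."""
    n = len(chain)
    return any(tuple(frames[i:i + n]) == chain
               for i in range(len(frames) - n + 1))


def classify_redo_master_stack(stack_text):
    """Label a db2redom stack using the two chains from this report."""
    frames = frames_outermost_first(stack_text)
    if contains_chain(frames, HANG_CHAIN):
        return "hang-signature"
    if contains_chain(frames, IDLE_CHAIN):
        return "normal-idle"
    return "unknown"


if __name__ == "__main__":
    print(classify_redo_master_stack(SAMPLE_STACK))
```

A single matching sample only shows that the thread was waiting at that instant; comparing several stacks taken some time apart gives a more reliable picture.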
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Problem Description above.                               *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 Version 9.7 Fix Pack 4.                       *
****************************************************************
Local Fix: | |
The fewer redo workers you have, the more likely this problem is to be hit. You can use DB2BPVARS to configure the number of redo workers as described below.

Step 1: Set DB2BPVARS to point to the file that contains the new value (you can use whatever filename you want):

    db2set DB2BPVARS=/home/userid/bpvars.txt

Step 2: Add one line to this file:

    PREC_NUM_AGENTS=5

NOTE: The value '5' includes 4 workers and a master. If you want to try 6 (or 8) workers, set this value to 7 (or 9).

The file then looks like this:

    $ cat /home/userid/bpvars.txt
    PREC_NUM_AGENTS=5

NOTE: The database needs to be recycled (restarted) for this value to be picked up.
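The rule in the note above (configured value = number of workers + 1 master) is easy to get wrong by one. The following sketch, under the assumption that the file contains only the single PREC_NUM_AGENTS line as shown, computes the value and writes the file. The helper names `prec_num_agents` and `write_bpvars` are hypothetical; the db2set step and the database recycle still have to be done separately.

```python
import os
import tempfile


def prec_num_agents(workers):
    """Value for PREC_NUM_AGENTS: the redo workers plus one master."""
    if workers < 1:
        raise ValueError("need at least one redo worker")
    return workers + 1


def write_bpvars(path, workers):
    """Write a one-line bpvars file and return the line written.

    Assumption: the file holds nothing else. Any existing content at
    `path` is overwritten.
    """
    line = f"PREC_NUM_AGENTS={prec_num_agents(workers)}\n"
    with open(path, "w") as handle:
        handle.write(line)
    return line


if __name__ == "__main__":
    # Demonstration in a temporary directory instead of a real instance path.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bpvars.txt")
        write_bpvars(target, workers=6)
        with open(target) as handle:
            print(handle.read(), end="")
```

If the file referenced by DB2BPVARS already carries other settings, edit it by hand instead of overwriting it.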
available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
Solution | |
First fixed in DB2 Version 9.7 Fix Pack 4. | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : 13.10.2010
Date - problem closed : 03.05.2011
Date - last modified : 03.05.2011
Problem solved at the following versions (IBM BugInfos) | |
9.7.FP4 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.4 |
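To check whether a given 9.7 installation is at or above the level listed above, the dotted level can be compared component by component. This is a rough sketch: it assumes the level is available as a dotted string such as "9.7.0.4", and it deliberately returns None for releases other than 9.7, because this report only states the fix level for the 9.7 stream. The names `parse_level` and `includes_ic71861_fix` are hypothetical.

```python
FIX_LEVEL = (9, 7, 0, 4)  # "9.7.0.4" from the fix list in this report


def parse_level(text):
    """Turn a dotted level such as '9.7.0.4' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def includes_ic71861_fix(level_text):
    """True or False for 9.7 levels; None for any other release."""
    level = parse_level(level_text)
    if level[:2] != FIX_LEVEL[:2]:
        return None  # not covered by this report
    # Pad short levels such as '9.7' with zeros before comparing.
    level = level + (0,) * (4 - len(level))
    return level >= FIX_LEVEL


if __name__ == "__main__":
    for candidate in ("9.7.0.2", "9.7.0.4", "10.1.0.0"):
        print(candidate, includes_ic71861_fix(candidate))
```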