DB2 - Problem description
Problem IC82472 | Status: Closed |
WHEN DB MEMBER FAILURE OCCURS, READ ONLY TRANSACTIONS THROUGHPUT MAY DEGRADE FOR SEVERAL SECONDS. | |
product: | |
DB2 FOR LUW / DB2FORLUW / 980 - DB2 | |
Problem description: | |
When a DB member failure occurs, one of the surviving DB members may not process transactions for several seconds. It may occur when there is a large number of transactions. A few seconds after the DB member failure, the following messages appear in cfdiag.128.log.

2012-03-13-19.14.04.0292198000+540 E123456789A310 LEVEL : Error
PID : 9633960 TID : 3599
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_ignore_msg() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.14.05.0159808000+540 E123456789A318 LEVEL : Error
PID : 9633960 TID : 3599
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_reject_msg_phase_1() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.39.59.0468850000+540 E123456789A310 LEVEL : Error
PID : 9633960 TID : 1800
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_ignore_msg() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.40.00.0398972000+540 E123456789A318 LEVEL : Error
PID : 9633960 TID : 1800
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_reject_msg_phase_1() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e | |
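To check whether a given cfdiag log shows this symptom, the entries can be scanned for the "Multi-HCA invocation failed" text. The following is a minimal sketch, not an IBM tool: the function name `find_multi_hca_failures` and the regular expressions are assumptions derived only from the four entries quoted in this report, and other DB2 levels may format these lines differently.

```python
import re
from datetime import datetime

# Start of a diagnostic entry, e.g. 2012-03-13-19.14.04.0292198000+540
# (pattern inferred from the entries quoted in this report).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d+[+-]\d+")

# The failure line reported in DATA #1 of each entry.
FAILURE = re.compile(
    r"(?P<function>\w+)\(\)\s+Multi-HCA invocation failed on HCA ID:\s*"
    r"(?P<hca_id>\d+) of (?P<hca_count>\d+) HCAs\.\s*"
    r"status:\s*(?P<status>0x[0-9a-fA-F]+)"
)


def find_multi_hca_failures(text):
    """Return one dict per log entry that reports a Multi-HCA failure."""
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    records = []
    for i, start in enumerate(starts):
        # An entry runs from its timestamp up to the next timestamp.
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        entry = text[start:end]
        hit = FAILURE.search(entry)
        if hit is None:
            continue
        stamp = TIMESTAMP.match(entry).group(0)
        records.append({
            "timestamp": stamp,
            # Whole-second part only; the fraction and offset are left raw.
            "when": datetime.strptime(stamp[:19], "%Y-%m-%d-%H.%M.%S"),
            "function": hit.group("function"),
            "hca_id": int(hit.group("hca_id")),
            "hca_count": int(hit.group("hca_count")),
            "status": hit.group("status"),
        })
    return records


# The first two entries from this report, in flattened form.
SAMPLE = (
    "2012-03-13-19.14.04.0292198000+540 E123456789A310 LEVEL : Error "
    "PID : 9633960 TID : 3599 HOSTNAME : host22 "
    "FUNCTION : CA trace, log_error "
    "MESSAGE : CA server has encountered an error. "
    "DATA #1 : process_ignore_msg() Multi-HCA invocation failed on "
    "HCA ID: 0 of 2 HCAs. status: 0x8006002e "
    "2012-03-13-19.14.05.0159808000+540 E123456789A318 LEVEL : Error "
    "PID : 9633960 TID : 3599 HOSTNAME : host22 "
    "FUNCTION : CA trace, log_error "
    "MESSAGE : CA server has encountered an error. "
    "DATA #1 : process_reject_msg_phase_1() Multi-HCA invocation failed on "
    "HCA ID: 0 of 2 HCAs. status: 0x8006002e"
)

if __name__ == "__main__":
    for record in find_multi_hca_failures(SAMPLE):
        print(record["timestamp"], record["function"], record["status"])
```

In practice the text would be read from the cfdiag log file rather than from `SAMPLE`. Splitting on timestamps instead of on newlines lets the same function handle both the multi-line layout and a flattened copy of the log.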
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* Pure Scale                                                   *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to v9.8.5.                                           *
**************************************************************** | |
Local Fix: | |
Solution | |
After the node failure, the EDU continues to wait to receive the IGNORE notification for the failed member, which arrives only about 8 seconds later. All transactions are blocked during this wait, which causes the throughput degradation. | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 03.04.2012 |
Date - problem closed : | 13.06.2012 |
Date - last modified : | 13.06.2012 |
Problem solved at the following versions (IBM BugInfos) | |
9.8.5 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.8.0.5 |