DB2 - Problem description

Problem IC82472	Status: Closed
WHEN DB MEMBER FAILURE OCCURS, READ ONLY TRANSACTIONS THROUGHPUT MAY DEGRADE FOR SEVERAL SECONDS.
product:
DB2 FOR LUW / DB2FORLUW / 980 - DB2
Problem description:
When a DB member failure occurs, one of the surviving DB members may not process transactions for several seconds. It may occur when there is a large number of transactions. A few seconds after the DB member failure, the following messages appear in cfdiag.128.log.

2012-03-13-19.14.04.0292198000+540 E123456789A310 LEVEL : Error
PID : 9633960 TID : 3599
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_ignore_msg() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.14.05.0159808000+540 E123456789A318 LEVEL : Error
PID : 9633960 TID : 3599
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_reject_msg_phase_1() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.39.59.0468850000+540 E123456789A310 LEVEL : Error
PID : 9633960 TID : 1800
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_ignore_msg() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e

2012-03-13-19.40.00.0398972000+540 E123456789A318 LEVEL : Error
PID : 9633960 TID : 1800
HOSTNAME : host22
FUNCTION : CA trace, log_error
MESSAGE : CA server has encountered an error.
DATA #1 :
process_reject_msg_phase_1() Multi-HCA invocation failed on HCA ID: 0 of 2 HCAs. status: 0x8006002e
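To check whether a cfdiag log contains the messages quoted above, a small script can scan the file for them. The following is a minimal sketch, not an IBM tool: the function name `find_multi_hca_failures`, the regular expressions, and the `SAMPLE` text (copied from the first two entries of the excerpt) are illustrative assumptions, and the record layout is inferred only from the excerpt in this report.

```python
import re
from datetime import datetime

# Record header timestamp as seen in the excerpt,
# e.g. 2012-03-13-19.14.04.0292198000+540 (layout inferred from this report only).
TS = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2})\.\d+[+-]\d+")

# The failure text quoted in the problem description.
FAIL = re.compile(
    r"(\w+)\(\)\s+Multi-HCA invocation failed on HCA ID: (\d+) of (\d+) HCAs\."
    r"\s+status: (0x[0-9a-fA-F]+)"
)
TID = re.compile(r"TID\s*:\s*(\d+)")


def find_multi_hca_failures(text):
    """Return one dict per log record that reports a Multi-HCA failure."""
    starts = [m.start() for m in TS.finditer(text)]
    hits = []
    for i, start in enumerate(starts):
        # A record runs from its timestamp to the next timestamp (or end of text).
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        record = text[start:end]
        fail = FAIL.search(record)
        if fail is None:
            continue
        stamp = TS.match(record).group(1)
        tid = TID.search(record)
        hits.append({
            # Parsed to whole seconds; fraction and UTC offset are ignored here.
            "time": datetime.strptime(stamp, "%Y-%m-%d-%H.%M.%S"),
            "tid": int(tid.group(1)) if tid else None,
            "function": fail.group(1),
            "hca_id": int(fail.group(2)),
            "hca_count": int(fail.group(3)),
            "status": fail.group(4),
        })
    return hits


# First two entries of the excerpt above, used as sample input.
SAMPLE = (
    "2012-03-13-19.14.04.0292198000+540 E123456789A310 LEVEL : Error "
    "PID : 9633960 TID : 3599 HOSTNAME : host22 "
    "FUNCTION : CA trace, log_error "
    "MESSAGE : CA server has encountered an error. "
    "DATA #1 : process_ignore_msg() Multi-HCA invocation failed on "
    "HCA ID: 0 of 2 HCAs. status: 0x8006002e "
    "2012-03-13-19.14.05.0159808000+540 E123456789A318 LEVEL : Error "
    "PID : 9633960 TID : 3599 HOSTNAME : host22 "
    "FUNCTION : CA trace, log_error "
    "MESSAGE : CA server has encountered an error. "
    "DATA #1 : process_reject_msg_phase_1() Multi-HCA invocation failed on "
    "HCA ID: 0 of 2 HCAs. status: 0x8006002e"
)

for hit in find_multi_hca_failures(SAMPLE):
    print(hit["time"], hit["tid"], hit["function"], hit["status"])
```

To scan a real file, read its contents into a string and pass that to `find_multi_hca_failures` in place of `SAMPLE`. The parser splits on timestamps rather than on line breaks, so it should tolerate both the wrapped and the flattened layout of the log text.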
Problem Summary:
USERS AFFECTED: pureScale
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Upgrade to v9.8.5.
Local Fix:

Solution
After the node failure, the EDU continues to wait to receive the IGNORE notification for the failed member, which arrives only about 8 seconds later. All transactions are blocked during this wait, which causes the throughput degradation.
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 03.04.2012
Date - problem closed : 13.06.2012
Date - last modified : 13.06.2012
Problem solved at the following versions (IBM BugInfos)
9.8.5
Problem solved according to the fixlist(s) of the following version(s)
9.8.0.5
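
The fix level listed above can be compared against an installed level with a few lines of code. This is a rough sketch under stated assumptions: the level is given as a four-part dotted string such as 9.8.0.5, the helper names `parse_level` and `has_fix` are hypothetical, and the check only answers for the 9.8 stream because this report names no other release.

```python
# Fix level taken from the fixlist entry in this report.
FIX_LEVEL = (9, 8, 0, 5)


def parse_level(level):
    """Turn a dotted level such as '9.8.0.5' into a tuple of four integers."""
    parts = tuple(int(p) for p in level.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four dotted fields, got {level!r}")
    return parts


def has_fix(level):
    """Return True/False for a 9.8 level, or None for any other release stream.

    None means this report does not say whether that stream contains the fix.
    """
    parts = parse_level(level)
    if parts[:2] != FIX_LEVEL[:2]:
        return None
    return parts >= FIX_LEVEL


for candidate in ("9.8.0.4", "9.8.0.5"):
    print(candidate, has_fix(candidate))
```

Tuples compare element by element in Python, so no per-field comparison logic is needed once the string is parsed.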