DB2 - Problem description

Problem IT35712 Status: Closed

PURESCALE MAY HANG WHEN THERE ARE 2 OR MORE CONCURRENT NODE FAILURES AND
ONE OF THE NODE FAILURES CAUSES A DATABASE DEACTIVATION.

product:
DB2 FOR LUW / DB2FORLUW / B10 - DB2
Problem description:
pureScale may hang when there are 2 or more concurrent node
failures and a database deactivation is driven by one of the
node failures. Under certain timing conditions the database
deactivation may block indefinitely waiting for the termination
of a system application such as the db2periodic daemon.
This will in turn block incoming connections to the member which
are waiting for the database deactivation to complete.

The db2diag.log will have messages indicating that node recovery
was completed for 2 or more members at close to the same time.

For example:

2020-12-02-14.06.32.476438+480 E210897765E384        LEVEL: Info
PID     : 36218                TID : 46913088907008  PROC : db2sysc 1
...
EDUID   : 22                   EDUNAME: db2pdbc 1
FUNCTION: DB2 UDB, base sys utilities, sqleExecuteNodeRecovery, probe:200
DATA #1 : String, 34 bytes
Node recovery completed for node 0

and

2020-12-02-14.06.32.476438+480 E210897765E384        LEVEL: Info
PID     : 36218                TID : 46913088907008  PROC : db2sysc 1
...
EDUID   : 22                   EDUNAME: db2pdbc 1
FUNCTION: DB2 UDB, base sys utilities, sqleExecuteNodeRecovery, probe:200
DATA #1 : String, 34 bytes
Node recovery completed for node 2
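
The check described above (two or more "Node recovery completed"
messages for different nodes close together in time) can be sketched as
a small script. This is a minimal illustration, not an IBM tool: the
function names and the 60-second window are assumptions made for this
example, and the record-header pattern is based only on the sample
records shown here.

```python
import re
from datetime import datetime, timedelta

# Record header as seen in the samples above, e.g.
# "2020-12-02-14.06.32.476438+480 E210897765E384        LEVEL: Info"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+\s")
RECOVERY_RE = re.compile(r"Node recovery completed for node (\d+)")


def node_recoveries(lines):
    """Yield (timestamp, node) for each node-recovery-completed message."""
    current_ts = None
    for line in lines:
        header = TS_RE.match(line)
        if header:
            # Remember the timestamp of the record we are currently inside.
            current_ts = datetime.strptime(
                header.group(1), "%Y-%m-%d-%H.%M.%S.%f"
            )
            continue
        found = RECOVERY_RE.search(line)
        if found and current_ts is not None:
            yield current_ts, int(found.group(1))


def concurrent_recoveries(lines, window_seconds=60):
    """Return (node_a, node_b, ts_a, ts_b) for recoveries of different
    nodes that completed within window_seconds of each other.
    The default window is an arbitrary assumption, not an IBM value."""
    events = sorted(node_recoveries(lines))
    window = timedelta(seconds=window_seconds)
    hits = []
    for i, (ts_a, node_a) in enumerate(events):
        for ts_b, node_b in events[i + 1:]:
            if ts_b - ts_a > window:
                break  # events are sorted, later ones are further away
            if node_a != node_b:
                hits.append((node_a, node_b, ts_a, ts_b))
    return hits
```

Passing the lines of a diagnostic log to concurrent_recoveries() returns
an empty list when no two different nodes completed recovery within the
window. A non-empty result only shows that this one symptom is present;
the agent and call stack checks below are still needed.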

db2pd -agents shows only system applications (for example
db2periodic) and one other agent which is driving a database
deactivation. For example, this output shows only the
db2periodic daemon and one other agent:

0x00002AC0E64F7680 78951    [001-13415] 52450      0
Coord    Inst-Active 0                   db2perio 0          0
NotSet SAMPLE*N1.DB2.200708193047
Thu Jul  9 03:30:45
0x00002AB6860BAF00 64778    [000-64778] 34251      0
SubAgent Inst-Active 0                   db2jcc_a 0          0
NotSet SAMPLE 10.134.83.81.64901.201111085015
n/a

The call stack of the system agent(s) will be blocked waiting to
receive an RPC reply. For example, the db2periodic daemon may be
blocked in a call stack that looks like this:

0x00002AAAAE42C97D _ZN11sqkfChannel13WaitRecvReadyEii + 0x02fd
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE429C28
_ZN11sqkfChannel13ReceiveBufferEPP10sqkfBufferi + 0x0678
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE404897
_ZN18sqkdBdsBufferTable12getNextReplyEP8SQLKD_CB + 0x0077
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE404420
_ZN18sqkdBdsBufferTable13getNextBufferEPP10sqkfBufferP8SQLKD_CB
+ 0x0a00
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE3F8671 address: 0x00002AAAAE3F8671 ; dladdress:
0x00002AAAAAEEA000 ; offset in lib: 0x000000000350E671 ;
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE3F82E1 address: 0x00002AAAAE3F82E1 ; dladdress:
0x00002AAAAAEEA000 ; offset in lib: 0x000000000350E2E1 ;
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE3F3AC0 address: 0x00002AAAAE3F3AC0 ; dladdress:
0x00002AAAAAEEA000 ; offset in lib: 0x0000000003509AC0 ;
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE3F4BDF
_Z17sqlkdReceiveReplyP23SQLKD_RQST_REPLY_FORMAT + 0x04cf
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF27C907
_Z11sqlrkrpc_nlP8sqlrr_cbiiiPKsP15SQLR_RPCMESSAGEP13SQLO_MEM_POO
LP18SQLR_RPC_REPLY_HDRPbPlmP17SQLR_WLM_BDSREPLY + 0x1827
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF27A7E2
_Z12sqlrkrpc_allP8sqlrr_cbiP15SQLR_RPCMESSAGEP13SQLO_MEM_POOLPP1
8SQLR_RPC_REPLY_HDRimP17SQLR_WLM_BDSREPLY + 0x1262
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE1382EA sqleRPCSync + 0x039a
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE1312D1
_Z16sqlePeriodicMainP16sqeLocalDatabaseP8sqeAgent + 0x10e1
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE07741F _Z26sqleIndCoordProcessRequestP8sqeAgent +
0x180f

The call stack of the other agent shows that it is performing a
database deactivation and is blocked waiting on the completion of
system applications:

0x00002AAAAEFD9435 sqloWaitEDUWaitPost + 0x02a5
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE11DAB8
_ZN16sqeLocalDatabase13TermDbConnectEP8sqeAgentP5sqlcai + 0x2388
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE0ADF1B
_ZN14sqeApplication12AppStopUsingEP8sqeAgenthP5sqlca + 0x0c3b
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAB04AF0DC _Z24sqleSubAgentNodeRecoveryP8sqeAgent +
0x00bc
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE06C36E address: 0x00002AAAAE06C36E ; dladdress:
0x00002AAAAAEEA000 ; offset in lib: 0x000000000318236E ;
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE06A4BC _Z21sqleProcessSubRequestP8sqeAgent + 0x02ec
                (/home/db2sdin1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE086162 _ZN8sqeAgent6RunEDUEv + 0x04c2

Other agents attempting to connect to the database will be
blocked in StartUsingLocalDatabase, looping and waiting for
database deactivation to complete. For example:

0x00002AAAAE0FCDAB
_ZN8sqeDBMgr23StartUsingLocalDatabaseEP8SQLE_BWAP8sqeAgentRccP8s
qlo_gmtPb + 0x0e7b
0x00002AAAAE0A1F2F
_ZN14sqeApplication13AppStartUsingEP8SQLE_BWAP8sqeAgentccP5sqlca
Pc + 0x043f
0x00002AAAAE0A123A
_Z22sqleSubAgentStartUsingP8sqeAgentP16SQLE_CLIENT_INFO + 0x038a

0x00002AAAAE0B2353
_ZN14sqeApplication22AppSecondaryStartUsingEP8sqeAgentP16SQLE_CL
IENT_INFOP5sqlca + 0x0923
0x00002AAAAE08CFC7 _ZN8sqeAgent12initSubAgentEPi + 0x1f57
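
The three call stack patterns above can be told apart by the function
names they contain. The following is a minimal sketch under stated
assumptions: the role labels and function names are invented for this
example, the stack text is assumed to be available as plain strings, and
the system-application signature covers only the db2periodic stack shown
above (other system applications may look different).

```python
# Substrings taken from the call stacks in this problem description.
# Every substring listed for a role must appear in the stack text.
SIGNATURES = [
    ("deactivation_waiting", ("TermDbConnect", "sqloWaitEDUWaitPost")),
    ("system_app_waiting_rpc", ("sqlePeriodicMain", "WaitRecvReady")),
    ("connect_waiting", ("StartUsingLocalDatabase",)),
]


def classify_stack(stack_text):
    """Return the hypothetical role label for one call stack, or None."""
    for role, needles in SIGNATURES:
        if all(needle in stack_text for needle in needles):
            return role
    return None


def matches_hang_pattern(stacks):
    """True if the stacks include both a deactivating agent waiting on
    system applications and a system application waiting on an RPC reply."""
    roles = {classify_stack(stack) for stack in stacks}
    return {"deactivation_waiting", "system_app_waiting_rpc"} <= roles
```

Substring matching is used instead of demangling because the symbol
names appear in mangled form in the stack output. A match only suggests
that the stacks resemble the ones in this description; it does not
confirm the defect.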
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* pureScale users                                              *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 v11.1 Mod 4 FIXPACK 7.                        *
****************************************************************
Local Fix:
Solution
Workaround
Comment
The problem was first fixed in Db2 v11.1 Mod 4 FIXPACK 7.
Timestamps
Date  - problem reported    : 27.01.2021
Date  - problem closed      : 27.10.2021
Date  - last modified       : 27.10.2021