DB2 - Problem description

Problem IT37638 Status: Closed

HADR TAKEOVER FAILED WITH SQL1770 RC 7 DUE TO EVENT MONITOR BLOCKS FORCING
OFF ONLINE REORG ON PRIMARY

product:
DB2 FOR LUW / DB2FORLUW / B10 - DB2
Problem description:
On an HADR standby database, a graceful TAKEOVER command might fail
with SQL1770N reason code 7.

$ db2 takeover hadr on db hadrdb
SQL1770N  Takeover HADR cannot complete. Reason code = "7".

This error is returned after the TAKEOVER command has been
running for a significant time, typically 10 minutes.  The
following message can be found in db2diag.log on the standby.

2020-07-17-01.59.58.216931-240 I466139E592           LEVEL: Error
PID     : 18218                TID : 140069126006528 PROC : db2sysc
INSTANCE: db2inst1              NODE : 000            DB   : HADRDB
HOSTNAME: host1
EDUID   : 393                  EDUNAME: db2hadrs.0.0 (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20240
DATA #1 : 
Standby has not received data from primary for 601 seconds.
Check the status of the primary. Aborting TAKEOVER.
hdrCurrentTime 1594965598 hdrLastLogRecvTime 1594964997
hdrGracefulTkTimeout 600
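
The numbers in this message can be cross-checked by hand. The sketch below is illustrative only: it parses the three hdr* values from the text above and compares the gap between them with the timeout. The helper name parse_fields is hypothetical, and the strict "greater than" comparison is an assumption inferred from the values shown (601 versus 600), not a statement of how DB2 implements the check.

```python
import re

# DATA section of the db2diag.log entry, copied from the message above.
message = """\
Standby has not received data from primary for 601 seconds.
Check the status of the primary. Aborting TAKEOVER.
hdrCurrentTime 1594965598 hdrLastLogRecvTime 1594964997
hdrGracefulTkTimeout 600
"""


def parse_fields(text):
    """Collect 'name value' pairs such as 'hdrCurrentTime 1594965598'."""
    pairs = re.findall(r"\b(hdr\w+)\s+(\d+)", text)
    return {name: int(value) for name, value in pairs}


fields = parse_fields(message)

# Seconds since the standby last received log data from the primary.
elapsed = fields["hdrCurrentTime"] - fields["hdrLastLogRecvTime"]

# Assumption: the takeover is aborted once the gap exceeds the timeout.
timed_out = elapsed > fields["hdrGracefulTkTimeout"]

print(elapsed, timed_out)  # 601 True
```

The difference of 601 seconds matches the figure in the message text and is one second past the 600-second hdrGracefulTkTimeout, which is consistent with the roughly 10 minute wait before SQL1770N is returned.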


This failure occurs because the primary database is unable to
complete the takeover operation.  There can be different root
causes.  One particular cause has been identified and addressed:
an online reorg operation on the primary database that has not
finished blocks the primary from completing the takeover
operation.  If stacks were collected on the primary database
while the TAKEOVER command was blocked, the following stack from
the db2reorg thread can be seen:

0x00007F62DBFE68FF sqloWaitEDUWaitPost + 0x03bf
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DC616FE5 _Z21sqlplWaitForLockGrantP9sqeBsuEduP8SQLP_AWBPjl + 0x02e5
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DC605CCE _Z13sqlplWaitOnWPP9sqeBsuEduP14SQLP_LOCK_INFOP8SQLP_LRBP15SQLP_LTRN_CHAINbbb + 0x147e
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DC5FD56D _Z24sqlplMakeNewRequestNonSDP9sqeBsuEduP14SQLP_LOCK_INFOP11SQLP_TENTRYP8SQLP_LRBS6_P15SQLP_LTRN_CHAINbbb + 0x070d
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DC4314C5 _Z7sqlplrqP9sqeBsuEduP14SQLP_LOCK_INFO + 0x0ee5
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DC436DA0 _Z19sqlplDrainOldAccessP8sqeAgentP13SQLP_LOCKNAMEmbb + 0x0990
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62D4DB50C0 _Z20sqldOnlineTableReorgP8sqeAgenttthmittPciS1_iP9SQLP_LSN8S3_si + 0x3560
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62D4DB1A8E _Z13sqldOLRInvokeP8sqeAgentPc + 0x00be
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007F62DA0268EA _Z26sqleIndCoordProcessRequestP8sqeAgent + 0x15aa
        (/home/db2inst1/sqllib/lib64/libdb2e.so.1)


The above stack identifies the reorg as the application that is
blocking the completion of the TAKEOVER.  The stack file also
shows the lock that the reorg is waiting on:


Waiting on lock name: 0049000F000000000000000054 SQLP_TABLE (obj={73;15})
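
The two numbers in obj={73;15} appear to correspond to the leading hex digits of the lock name. The sketch below shows that relationship. The field layout (first four hex digits and next four hex digits as two IDs) is an assumption inferred only from this one example, not a documented lock name format, and the function name decode_table_lock_name is hypothetical.

```python
def decode_table_lock_name(lock_name):
    """Decode the first two 16-bit hex fields of a table lock name.

    Assumed layout, inferred from the single example in this report:
    hex digits 0-3 hold the first ID and hex digits 4-7 hold the second.
    """
    first_id = int(lock_name[0:4], 16)
    second_id = int(lock_name[4:8], 16)
    return first_id, second_id


# Lock name taken from the stack file output above.
print(decode_table_lock_name("0049000F000000000000000054"))  # (73, 15)
```

Hex 0049 is 73 and hex 000F is 15, which matches the obj={73;15} value printed next to the lock name.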


If lock information is also collected (e.g. db2pd -lock) while
the TAKEOVER is hanging, the lock holder can be identified.
Even without this information, the table being reorganized is
identified by the ID (73;15).  This ID can be used to confirm
that the table being reorganized is the target table of an
event monitor, and that the lock is held by an active event
monitor fast writer thread.

In fact, such a reorg operation is blocked by the active event
monitor and cannot complete until the event monitor is
deactivated.  This happens even without a TAKEOVER command.
Therefore, it is recommended that users deactivate the event
monitor before initiating the reorg operation.
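
The check and the deactivation described above can be sketched as three statements. The example below only builds the statement text and does not connect to a database. It assumes that the pair (73;15) maps to the TBSPACEID and TABLEID columns of SYSCAT.TABLES, which this report does not state explicitly. The schema, table, and event monitor names (MYSCHEMA, LOCK_EVMON_TAB, EVMON1) are placeholders, and the helper function names are hypothetical.

```python
import re


def _check_identifier(name):
    """Allow only simple unquoted identifiers to keep the sketch safe."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def lookup_table_sql(tbspace_id, table_id):
    """Query that resolves a (tablespace ID; table ID) pair to a table name."""
    return (
        "SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES "
        f"WHERE TBSPACEID = {int(tbspace_id)} AND TABLEID = {int(table_id)}"
    )


def event_monitors_for_table_sql(schema, table):
    """Query that lists event monitors writing to the given target table."""
    return (
        "SELECT EVMONNAME FROM SYSCAT.EVENTTABLES "
        f"WHERE TABSCHEMA = '{_check_identifier(schema)}' "
        f"AND TABNAME = '{_check_identifier(table)}'"
    )


def deactivate_event_monitor_sql(evmon_name):
    """Statement that deactivates an event monitor (STATE 0)."""
    return f"SET EVENT MONITOR {_check_identifier(evmon_name)} STATE 0"


# IDs from the lock output above; the names below are placeholders only.
print(lookup_table_sql(73, 15))
print(event_monitors_for_table_sql("MYSCHEMA", "LOCK_EVMON_TAB"))
print(deactivate_event_monitor_sql("EVMON1"))
```

Each printed statement could be run on the primary, for example through the db2 command line processor, substituting the names returned by the first two queries for the placeholders.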

It is still undesirable for such a condition to cause the
TAKEOVER to fail.  The TAKEOVER should detect and deactivate the
event monitor and force off the reorg operation to ensure
successful completion of the HADR role switch.
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* all                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to 11.5.6                                            *
****************************************************************
Local Fix:
Deactivate the event monitor for the table being reorganized on
the primary before running takeover on the standby.
Solution
Workaround
Timestamps
Date  - problem reported    : 15.07.2021
Date  - problem closed      : 15.07.2021
Date  - last modified       : 15.07.2021