DB2 - Problem description

Problem IC66646 Status: Closed

HADR PRIMARY REINTEGRATION WILL FAIL WITH PRIMARY/STANDBY MISMATCH
AFTER THE PAIR REACHES PEER STATE

Product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
The problem can occur after a takeover by force is issued and the old primary is then handled in one of these ways:
 a) the old primary is deactivated and then started as a standby, or
 b) the old primary is killed and first started as a primary instead of as a standby (which fails), and is then reintegrated as a standby.

Either sequence causes a primary/standby LSN mismatch. When the old primary is deactivated, or is first started as a primary (which eventually fails with a timeout), its last (current) log file is truncated, and minbufflsn, lowtranlsn and the remote catchup start LSN are moved to the start of the next log file. On the new primary, that same log file is NOT truncated, so it continues to be used for writing more log records.

If the new primary writes no log records before the old primary is reintegrated as a standby, a peer connection is established between the primary and the standby. Once peer state is reached, as soon as the new primary writes log records and sends them to the standby, the standby detects a primary/standby LSN mismatch and shuts down. The error message "SQL1768N unable to start HADR. Reason code='7'" is issued.
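The divergence described above can be illustrated with a deliberately simplified model. It assumes that LSNs are plain byte offsets into a single log stream and that every log file covers a fixed, hypothetical 4096-byte range; the helper names and record sizes are invented for illustration, and this is not how DB2 actually implements log management or HADR validation. The old primary's catch-up start jumps to the beginning of the next file, while the new primary keeps appending records right after the last shared position.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy model only: LSNs are plain byte offsets into a single log stream and
# every log file covers a fixed-size range of it. The size is hypothetical.
LOG_FILE_BYTES = 4096


@dataclass
class LogRecord:
    lsn: int     # offset where the record starts
    length: int  # record size in bytes


def next_file_start(lsn: int) -> int:
    """Offset at which the log file after the one containing `lsn` begins."""
    return (lsn // LOG_FILE_BYTES + 1) * LOG_FILE_BYTES


def write_records(start: int, lengths: list[int]) -> list[LogRecord]:
    """Append records back to back, beginning at offset `start`."""
    records: list[LogRecord] = []
    pos = start
    for n in lengths:
        records.append(LogRecord(pos, n))
        pos += n
    return records


def on_record_boundary(lsn: int, records: list[LogRecord]) -> bool:
    """True if some shipped record starts exactly at `lsn`."""
    return any(r.lsn == lsn for r in records)


# Both servers hold identical log data up to this offset at takeover time.
shared_end = 1000

# Old primary (reintegrated as standby): its current log file was truncated,
# so its remote catchup start position moved to the start of the next file.
standby_start = next_file_start(shared_end)

# New primary: nothing was truncated, so once peer state is reached it keeps
# appending records directly after shared_end, inside the same log file.
shipped = write_records(shared_end, [1200, 1500, 900, 800])

print("standby expects offset", standby_start)
print("record starts shipped:", [r.lsn for r in shipped])
print("on record boundary:", on_record_boundary(standby_start, shipped))
```

With these numbers the standby expects to resume at offset 4096, but the new primary's records start at 1000, 2200, 3700 and 4600, so offset 4096 lies inside the record that begins at 3700. In miniature, this is the situation reported as "RCUStartLSN ... not on record boundary" in the db2diag.log entries.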
 
You may see the following log entries in the db2diag.log file. 
 
2010-02-10-10.36.47.166177-360 E121063953A371     LEVEL: Event
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE  : HADR state set to S-Peer (was S-NearlyPeer)

2010-02-10-10.36.51.574186-360 I121079812A498     LEVEL: Error
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrAddDataBlock, probe:40012
MESSAGE : Primary/standby mismatch. RCUStartLSN 0000000224D4000C not on record
          boundary. RCU first page bytecount 4080, firstindex 16, pagelsn
          0002230BCFFB.

2010-02-10-10.36.51.574321-360 I121080311A438     LEVEL: Severe
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrAddDataBlock, probe:40012
RETCODE : ZRC=0x87800145=-2021654203=HDR_ZRC_VALIDATION_REJECT
          "HADR shuts down due to validation rejection"
Local Fix:
Back up the new primary database, restore it on the standby machine, and enable HADR on it to bring it up as a standby.
If the system runs in an HA (TSA) environment, applying the fix for APAR IC65836 may avoid hitting this APAR.
Available fix packs:
DB2 Version 9.7 Fix Pack 3 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 3a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
This issue was first fixed in DB2 V9.7 Fix Pack 3.
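Because the fix first shipped in Version 9.7 Fix Pack 3, a quick check is to compare the fix pack number of a 9.7 level against 3. The sketch is hypothetical: it assumes levels are written in the V.R.M.F form used by the fix-list names on this page (for example 9.7.0.3), optionally followed by a letter for lettered fix packs such as 3a, and it only evaluates the 9.7 stream, returning None for any other release because this page only documents the fix for 9.7.

```python
from __future__ import annotations

import re

# Hypothetical helper. Assumes levels in the V.R.M.F form used by the fix-list
# names on this page (e.g. "9.7.0.3"), optionally followed by one lowercase
# letter for lettered fix packs such as 3a. This does not parse db2level output.
LEVEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)([a-z]?)$")


def includes_ic66646_fix(level: str) -> bool | None:
    """True/False for 9.7 levels; None for releases this page does not cover."""
    m = LEVEL.match(level.strip())
    if m is None:
        raise ValueError(f"unrecognized level: {level!r}")
    version, release, _modification, fixpack = (int(g) for g in m.groups()[:4])
    if (version, release) != (9, 7):
        return None
    # The fix first shipped in 9.7 Fix Pack 3; 3a and later contain it as well.
    return fixpack >= 3
```

For example, "9.7.0.2" returns False, "9.7.0.3a" returns True, and "10.1.0.6" returns None.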
Timestamps
Date  - problem reported    : 25.02.2010
Date  - problem closed      : 23.09.2010
Date  - last modified       : 23.09.2010
Problem solved at the following versions (IBM BugInfos)
9.7.FP3
Problem solved according to the fixlist(s) of the following version(s)
9.7.0.3 FixList