Informix - Problem description

Problem IT27709	Status: Closed
PRIMARY AND SECONDARY UNABLE TO RECONNECT AFTER NETWORK FAILURE
Product:
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10
Problem description:
In some cases, a network interruption can leave the primary and the HDR secondary unable to reconnect unless the HDR secondary is restarted (bounced). This may only be encountered on HDR pairs where the secondary is an updatable secondary, or where SMX_PING_INTERVAL/SMX_PING_RETRY are configured differently on the primary and secondary servers.

In this specific case, the issue appears to be that HDR cannot shut itself down properly after detecting the network problem. Because it cannot shut down properly, it never reaches the code that attempts to reconnect.

The symptoms of this problem can be identified by checking the state and stack of both the dr_prsend thread and the dr_prping thread. At the point where the teardown appears to be stuck, onstat -g ath shows the two threads in the following states:

    Threads:
    tid       tcb        rstcb      prty  status              vp-class  name
    159       112258d48  10feee060  3     join wait 32846355  14cpu     dr_prsend
    ...
    32846355  1d22fdc58  2c9555520  3     yield time          1cpu      dr_prping

The stacks look like this:

    Stack for thread: 159 dr_prsend
    ...
    0x000000001118a62c (oninit)mt_join
    0x0000000010ea5030 (oninit)dr_session_thread
    0x00000000111ca69c (oninit)startup

    Stack for thread: 32846355 dr_prping
    ...
    0x00000000111831a0 (oninit)mt_yield
    0x00000000112ed520 (oninit)smx_recv
    0x0000000010e9b7ec (oninit)dr_isSecondaryInCheckpoint
    0x0000000010e86e90 (oninit)dr_primary_ping
    0x00000000111ca69c (oninit)startup

Another key element is the following sequence of events, based on messages in the MSGPATH file:

1. On the PRIMARY server, smx messages report that connections were closed because the other server was unresponsive.
2. The primary then reports that smx has created a new transport to the HDR secondary.
3. The HDR secondary then reports that its own smx connections were closed because the other server was unresponsive.

It is important that the secondary's messages in step 3 appear at some point after the primary has reported its smx connections closed and has created the new transport. Sample message sequences:

PRIMARY MSGPATH file:

    23:40:37 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (120 seconds times the number of retries).
    23:40:46 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (120 seconds times the number of retries).
    23:40:56 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (120 seconds times the number of retries).
    23:41:00 smx creates 1 transports to server allende3
    23:42:55 WARNING: Detected slow or failing DNS service response 101 time(s).
    23:54:30 DR: Receive error
    23:54:30 dr_prsend thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.
    23:54:30 DR_ERR set to -1

SECONDARY MSGPATH file:

    23:43:22 DR: ping timeout
    23:43:22 DR: Receive error
    23:43:22 dr_secrcv thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.
    23:43:22 DR_ERR set to -1
    23:43:23 DR: Terminating redirected write subsystem due to server disconnect. All open redirected transactions will be rolled back.
    23:43:24 Updates from secondary currently not allowed
    23:43:24 ERROR: Mach11 proxyWritePostPBlobCmdSync failed
    23:43:24 DR: Turned off on secondary server
    23:45:16 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (360 seconds times the number of retries).
    23:45:18 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (360 seconds times the number of retries).
    23:45:25 The SMX connection between high availability servers was closed because the peer server was unresponsive for the timeout period (360 seconds times the number of retries).

So the relative timing of the messages matters: in this sample, the primary creates the new transport at 23:41:00, and the secondary reports its SMX connections closed later, starting at 23:45:16.
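The thread-state symptom described above can be checked from saved `onstat -g ath` output. The following is a minimal sketch, not an IBM tool: the helper names `parse_ath` and `hdr_teardown_looks_stuck` are hypothetical, and the parser assumes the column order shown in the APAR text (tid, tcb, rstcb, prty, status, vp-class, name), with a status column that may span several words and a single-word thread name (true for dr_prsend and dr_prping).

```python
"""Sketch: flag the IT27709 teardown symptom in `onstat -g ath` output."""


def parse_ath(text):
    """Parse `onstat -g ath` rows into dicts (hypothetical helper).

    Assumes the column order shown in the APAR text
    (tid tcb rstcb prty status vp-class name), where the status column
    may span several words, e.g. "join wait 32846355", and the thread
    name is a single word.
    """
    threads = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows begin with a numeric thread id; skip headers and "...".
        if len(tokens) < 7 or not tokens[0].isdigit():
            continue
        threads.append({
            "tid": int(tokens[0]),
            "status": " ".join(tokens[4:-2]),
            "vp_class": tokens[-2],
            "name": tokens[-1],
        })
    return threads


def hdr_teardown_looks_stuck(threads):
    """True if dr_prsend is in 'join wait' while dr_prping is in 'yield time'.

    A heuristic only: one snapshot proves little, so compare several
    snapshots taken some time apart before drawing conclusions.
    """
    by_name = {t["name"]: t for t in threads}  # last row wins on duplicate names
    send = by_name.get("dr_prsend")
    ping = by_name.get("dr_prping")
    if send is None or ping is None:
        return False
    return (send["status"].startswith("join wait")
            and ping["status"].startswith("yield time"))


if __name__ == "__main__":
    sample = """Threads:
 tid      tcb       rstcb     prty status             vp-class name
 159      112258d48 10feee060 3    join wait 32846355 14cpu    dr_prsend
 32846355 1d22fdc58 2c9555520 3    yield time         1cpu     dr_prping
"""
    print(hdr_teardown_looks_stuck(parse_ath(sample)))  # True
```

Matching on the status prefix rather than the full string keeps the check independent of the wait-target id that follows "join wait".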
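The MSGPATH ordering described above (primary reports SMX connections closed, then creates a new transport, and only afterwards the secondary reports its SMX connections closed) can be tested mechanically. This is a hedged sketch: the function names are hypothetical, and it assumes each message line starts with an HH:MM:SS timestamp as in the sample logs, that both logs cover the same day with no midnight rollover, and that the two hosts' clocks are synchronized.

```python
"""Sketch: check the IT27709 MSGPATH ordering between primary and secondary."""
import re
from datetime import time

# Assumptions: lines start with HH:MM:SS, both logs cover the same day,
# and the two hosts' clocks agree closely enough to compare timestamps.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")
SMX_CLOSED = "The SMX connection between high availability servers was closed"
SMX_CREATES = "smx creates"


def times_of(log_text, marker):
    """Return the timestamps of all lines whose message contains marker."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and marker in m.group(2):
            found.append(time.fromisoformat(m.group(1)))
    return found


def matches_it27709_sequence(primary_log, secondary_log):
    """True if the primary logs SMX-closed, then a new smx transport, and the
    secondary logs an SMX-closed message after that transport was created."""
    primary_closed = times_of(primary_log, SMX_CLOSED)
    if not primary_closed:
        return False
    first_closed = min(primary_closed)
    # Only a transport created after the first SMX-closed message counts.
    new_transports = [t for t in times_of(primary_log, SMX_CREATES) if t > first_closed]
    if not new_transports:
        return False
    transport_time = min(new_transports)
    return any(t > transport_time for t in times_of(secondary_log, SMX_CLOSED))


if __name__ == "__main__":
    primary = (
        "23:40:37 The SMX connection between high availability servers was closed ...\n"
        "23:41:00 smx creates 1 transports to server allende3\n"
    )
    secondary = (
        "23:43:22 DR: ping timeout\n"
        "23:45:16 The SMX connection between high availability servers was closed ...\n"
    )
    print(matches_it27709_sequence(primary, secondary))  # True
```

A positive result only says the logs fit the pattern described in this APAR; it does not by itself confirm the defect.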
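Since the description names differing SMX_PING_INTERVAL/SMX_PING_RETRY settings on the two servers as one condition for the problem, a quick consistency check of the two onconfig files may help. A minimal sketch, assuming the usual onconfig layout of one "NAME value" pair per line with "#" comments, and assuming the files reflect the running configuration; the helper names and the sample values are hypothetical.

```python
"""Sketch: compare SMX ping settings between two onconfig files."""


def read_onconfig(text):
    """Parse onconfig-style 'NAME value' lines, ignoring '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def smx_ping_mismatches(primary_text, secondary_text,
                        names=("SMX_PING_INTERVAL", "SMX_PING_RETRY")):
    """Return {name: (primary_value, secondary_value)} for differing settings.

    A parameter missing from a file shows up as None, meaning that server
    falls back to the default value.
    """
    p = read_onconfig(primary_text)
    s = read_onconfig(secondary_text)
    return {n: (p.get(n), s.get(n)) for n in names if p.get(n) != s.get(n)}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    primary = "SMX_PING_INTERVAL 10  # hypothetical\nSMX_PING_RETRY 6\n"
    secondary = "SMX_PING_INTERVAL 30\nSMX_PING_RETRY 6\n"
    print(smx_ping_mismatches(primary, secondary))
    # {'SMX_PING_INTERVAL': ('10', '30')}
```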
Problem Summary:
****************************************************************
* USERS AFFECTED:
* Users of IDS prior to 12.10.xC13.
****************************************************************
* PROBLEM DESCRIPTION:
* Primary and Secondary unable to reconnect after network
* failure.
****************************************************************
* RECOMMENDATION:
****************************************************************
Local Fix:

Solution

Workaround
not known / see Local fix
Timestamps
Date - problem reported:  09.01.2019
Date - problem closed:    24.09.2019
Date - last modified:     24.09.2019
Problem solved at the following versions (IBM BugInfos)
12.10.xC13
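Whether a given server falls in the affected range ("Users of IDS prior to 12.10.xC13") can be decided from its version string. A minimal sketch, assuming the version format MAJOR.MINOR.&lt;platform letter&gt;C&lt;fixpack&gt; with an optional suffix (e.g. "12.10.FC12", where "x" in "12.10.xC13" stands for the platform letter); the function name is hypothetical. It returns None outside the 12.10 line, because the fix level listed here is for 12.10 only.

```python
"""Sketch: is an IDS 12.10 version string earlier than 12.10.xC13?"""
import re

# Assumed format: MAJOR.MINOR.<platform letter>C<fixpack>[optional suffix],
# e.g. "12.10.FC12" or "12.10.xC16.X5"; suffixes after the fixpack are ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.([A-Za-z])C(\d+)")


def before_12_10_xc13(version):
    """True/False for 12.10 versions; None outside the 12.10 line."""
    m = VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, fixpack = int(m.group(1)), int(m.group(2)), int(m.group(4))
    if (major, minor) != (12, 10):
        return None  # this APAR only lists a fix level for 12.10
    return fixpack < 13


if __name__ == "__main__":
    for v in ("12.10.FC12", "12.10.FC13", "11.70.FC9"):
        print(v, before_12_10_xc13(v))
```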
Problem solved according to the fixlist(s) of the following version(s)
