Informix - Problem description
Problem IT04342 | Status: Closed |
THE EVIDENCE.SH CAN CAUSE LIMITED CONNECTIVITY OR A COMPLETE INSTANCE BLOCK WHEN AUDITING IS TURNED ON | |
product: | |
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10 | |
Problem description: | |
If auditing is turned on (at any level) and the instance hits an assertion failure that triggers the SYSALARMPROGRAM ($INFORMIXDIR/etc/evidence.sh by default), the instance may become unresponsive to new connection requests, or even hang completely, for 6 or more minutes.
When auditing is turned on, the onstat command sends its command-line arguments to the onmode_mon thread in the server so that they can be written to the audit trail. If the assertion failure occurs in a thread running on cpuvp 1, that cpuvp is blocked while it waits for SYSALARMPROGRAM to finish and cannot serve the onmode_mon thread, which is bound to it. The onmode_mon thread therefore cannot accept the command-line arguments sent by the onstat commands called from SYSALARMPROGRAM. Each onstat waits for the onmode_mon thread to become available; if that does not happen within 5 seconds, the onstat gives up and goes on to print the requested output. Because the default SYSALARMPROGRAM calls onstat about 73 times, the script runs for at least 365 seconds (73 x 5).
During this time none of the threads bound to cpuvp 1 (onmode_mon, the listeners and others) can run. If only one cpuvp is configured, the whole instance is blocked, which can have adverse effects. For example, in a MACH11 cluster managed by a connection manager (CM), this may lead to a split-brain situation (two primaries in the cluster): the CM cannot reach the blocked old primary, initiates a failover, and promotes one of the secondaries to be the new primary without shutting down the old one. | |
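The timing in the description is a simple product, and a minimal sketch of it may help when estimating the impact of a customized alarm script. The function name min_block_seconds is hypothetical; the call counts (73 and 8) and the 5-second timeout are the figures quoted in this report, not values read from a running server.

```shell
#!/bin/sh
# Hypothetical helper: lower bound on how long SYSALARMPROGRAM keeps
# cpuvp 1 blocked, assuming every onstat call hits the full timeout.
min_block_seconds() {
    onstat_calls=$1      # number of onstat invocations in the alarm script
    timeout_seconds=$2   # seconds each onstat waits for onmode_mon
    echo $(( onstat_calls * timeout_seconds ))
}

min_block_seconds 73 5   # default evidence.sh: prints 365
min_block_seconds 8 5    # with DO_ONSTAT_A=on: prints 40
```

The result is only a lower bound, since the time each onstat needs to print its output is not included.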
Problem Summary: | |
USERS AFFECTED: All users
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Update to IDS-12.10.xC5 | |
Local Fix: | |
A partial workaround may be:
- Make sure you have at least 2 cpuvps configured.
- If you are using the default SYSALARMPROGRAM, find the "DO_ONSTAT_A=off" line in it and change it to "DO_ONSTAT_A=on". This reduces the number of onstat calls from 73 to 8, so the time needed to complete the script should drop from 365 to ~40 seconds. | |
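The second workaround step could be scripted roughly as follows. This is a sketch under assumptions: the helper name enable_onstat_a is hypothetical, and it assumes the setting appears in the script as a line beginning with DO_ONSTAT_A=off (optionally indented), which should be checked against the installed evidence.sh before use.

```shell
#!/bin/sh
# Hypothetical helper: switch DO_ONSTAT_A from off to on in an alarm script.
# Assumption: the setting is written as "DO_ONSTAT_A=off" at the start of
# a line, possibly preceded by whitespace.
enable_onstat_a() {
    script=$1
    [ -f "$script" ] || { echo "no such file: $script" >&2; return 1; }
    # Keep a copy of the original script next to it before editing.
    cp -p "$script" "$script.bak" || return 1
    # Rewrite the script from the backup with the setting flipped.
    sed 's/^\([[:space:]]*DO_ONSTAT_A=\)off/\1on/' "$script.bak" > "$script"
}

# Example call (not executed here):
#   enable_onstat_a "$INFORMIXDIR/etc/evidence.sh"
```

Working from a backup copy keeps the original script available if the change needs to be reverted, for example after updating to a fixed version.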
Solution | |
Problem Fixed In IDS-12.10.xC5 | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 11.09.2014 |
Date - problem closed : | 16.10.2015 |
Date - last modified : | 16.10.2015 |
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) | |
12.10.xC5 | |
12.10.xC5.W1 |