Informix - Problem description
Problem IT30876 | Status: Closed |
DEADLOCK BETWEEN RSS_SEND AND DR_PRSEND THREADS USING DRCB_NODE_COUNT_LOCK AND RELIABLECV_T CONDITION | |
product: | |
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10 | |
Problem description: | |
In an HA environment, an instance may freeze completely because of a deadlock between an RSS_send thread and a dr_prsend thread. The deadlock can be identified from the list of locked mutexes:

    Locked mutexes:
    mid      addr     name               holder   lkcnt  waiter   waittime
    21294    cc52ce68 drcb_lock          25677    0
    21295    cc52cf10 drcb_node_count_lo 25677    0      69992    24827
    21307    cc534080 SynchSWMR_t::0xcc5 69992    0

The owner of drcb_node_count_lock is thread 25677, which is dr_prsend. The waiter for this mutex is thread 69992, which is RSS_send. The output of onstat -g ath shows the following status for the two threads:

    25677 dr_prsend       1cpu 11/01 11:40:40 2.5455 46375 cond wait  ReliableCV
    69992 RSS_Send_ie1_ix 8cpu 11/01 11:40:40 0.0029 7     mutex wait drcb_node_

The owner of the mutex, thread 25677, is waiting for a condition that is tied to the mutex owned by the RSS_send thread, so neither thread can proceed. The stacks for the two threads are:

    Stack for thread: 69992 RSS_Send_ie1_ixdpp01a_qa
     base: 0x00000000d8126000
      len: 69632
       pc: 0x000000000143ead7
      tos: 0x00000000d8136a10
    state: mutex wait
       vp: 8

    0x000000000143ead7 (oninit) yield_processor_mvp
    0x000000000144afae (oninit) mt_lock_wait
    0x0000000001451072 (oninit) mt_lock_helper
    0x0000000001202138 (oninit) cloneAttachCB
    0x0000000001206bf2 (oninit) cloneSend_Int
    0x00000000011f0b82 (oninit) cloneStdSend
    0x0000000001419870 (oninit) th_init_initgls
    0x000000000145f2b7 (oninit) startup

    Stack for thread: 25677 dr_prsend
     base: 0x00000000dc972000
      len: 69632
       pc: 0x000000000143ead7
      tos: 0x00000000dc982c40
    state: cond wait
       vp: 1

    0x000000000143ead7 (oninit) yield_processor_mvp
    0x0000000001453441 (oninit) mt_wait
    0x000000000107db04 (oninit) reliablecv_wait
    0x000000000107ed7b (oninit) synchswmr_reader_enter
    0x000000000128e418 (oninit) SendGlobalVersionInfo
    0x00000000011d3822 (oninit) dr_state_change
    0x00000000011dbf46 (oninit) dr_session_thread
    0x000000000145f2b7 (oninit) startup

Additionally, in the customer environment where the problem was diagnosed, many threads were waiting on the ReliableCV condition because of reads on the tables syscluster and sysha_nodes. These threads are victims, not the root cause. | |
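The circular wait described above can be pictured as a small wait-for graph. The following Python sketch is only an illustration and not part of any Informix tooling: the thread IDs and names are transcribed from the output quoted in this record, while the edge from dr_prsend to RSS_send follows the description's reading that the ReliableCV condition is tied to the mutex held by thread 69992. The names `WAITS_FOR`, `THREAD_NAMES`, and `find_wait_cycle` are hypothetical.

```python
# Wait-for edges transcribed from the output quoted in this record.
# Assumption: the 25677 -> 69992 edge comes from the description's statement
# that the condition is tied to the mutex owned by thread 69992.
WAITS_FOR = {
    69992: (25677, "mutex drcb_node_count_lock"),
    25677: (69992, "ReliableCV condition tied to the SynchSWMR_t mutex"),
}

THREAD_NAMES = {25677: "dr_prsend", 69992: "RSS_Send_ie1_ix"}


def find_wait_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list of ids, or None."""
    path = []
    seen = {}
    tid = start
    while tid in waits_for:
        if tid in seen:
            # We came back to a thread already on the path: that suffix is the cycle.
            return path[seen[tid]:]
        seen[tid] = len(path)
        path.append(tid)
        tid = waits_for[tid][0]
    return None


if __name__ == "__main__":
    cycle = find_wait_cycle(WAITS_FOR, 69992)
    if cycle:
        for tid in cycle:
            blocker, reason = WAITS_FOR[tid]
            print(f"{tid} ({THREAD_NAMES[tid]}) waits for "
                  f"{blocker} ({THREAD_NAMES[blocker]}): {reason}")
```

A thread that merely waits on one of the two deadlocked threads, such as the sessions reading syscluster or sysha_nodes, would appear on the path leading into the cycle but not in the returned cycle itself, which matches the distinction between victims and root cause.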
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix Server prior to 12.10.xC14 and 14.10.xC4.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Update to Informix Server 12.10.xC14 or 14.10.xC4.           *
**************************************************************** | |
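The affected range stated in the summary can be checked mechanically against a server version string. The sketch below is a minimal illustration under stated assumptions: it assumes version strings of the form "12.10.FC13" or "14.10.xC4" (release family, a platform letter, "C", then the fix pack number) and ignores any trailing suffix. The names `FIRST_FIXED`, `VERSION_RE`, and `is_affected` are hypothetical. Release families not named in this record return None, because the record makes no explicit statement about them.

```python
import re

# First fixed fix pack per release family, as stated in this record.
FIRST_FIXED = {"12.10": 14, "14.10": 4}

# Assumed shape: "<major>.<minor>.<platform letter>C<fix pack>", e.g. "12.10.FC13".
VERSION_RE = re.compile(r"(\d+\.\d+)\.[A-Za-z]C(\d+)")


def is_affected(version):
    """Return True or False for 12.10 and 14.10 versions, None if undetermined."""
    match = VERSION_RE.search(version)
    if match is None:
        return None  # string does not look like the assumed format
    family, fix_pack = match.group(1), int(match.group(2))
    if family not in FIRST_FIXED:
        return None  # this record only names 12.10 and 14.10
    return fix_pack < FIRST_FIXED[family]


if __name__ == "__main__":
    for v in ("12.10.FC13", "12.10.xC14", "14.10.FC3", "14.10.xC4"):
        print(v, is_affected(v))
```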
Local Fix: | |
Solution | |
Workaround | |
Comment | |
Fixed in Informix Server 12.10.xC14 and 14.10.xC4. | |
Timestamps | |
Date - problem reported : | 07.11.2019 |
Date - problem closed : | 27.02.2020 |
Date - last modified : | 27.02.2020 |
Problem solved at the following versions (IBM BugInfos) | |
12.10.xC14, 14.10.xC4 | |
Problem solved according to the fixlist(s) of the following version(s) |