Informix - Problem description
Problem IT31694 | Status: Closed |
ON WINDOWS, LOSING TRACK OF A CPU VP'S NUM_READY_THREADS CAN BURN 100% CPU CYCLES ON OTHERWISE IDLE SYSTEM | |
product: | |
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10 | |
Problem description: | |
On a seemingly idle Windows IDS server, it's possible to have a cpu vp using 100% cpu. For instance, on a 12 cpu Windows IDS 12.10.TC11 server, we were able to get stacks for the cpu vps from a memory dump.

The stacks for 1cpu, 8cpu, 9cpu, 10cpu, 11cpu, 12cpu, 13cpu, 14cpu, 16cpu, 17cpu, 18cpu:

oninit.exe!net_aio_poll(void *hPort, int timeout) Line 173
oninit.exe!NT_P(_VP *v) Line 1640
oninit.exe!NT_idle_loop(_VP*i_vp, unsigned int bz, int wakeup) Line 5210
oninit.exe!NT_idle_processor() Line 5124
oninit.exe!startup() Line 177

The stack for 15cpu is slightly different:

oninit.exe!net_aio_poll(void *hPort, int timeout) Line 173
oninit.exe!NT_yield_processor_mvp() Line 18070
oninit.exe!NT_idle_processor() Line 5107
oninit.exe!startup() Line 177

Looking at Process Explorer, we could see that the oninit.exe thread for 15cpu was running at 100%. The underlying issue is that the vp struct associated with 15cpu has a positive num_ready_threads value but there are no threads in its ready queue(s). This keeps the idle vp from ever sleeping, because it constantly thinks there is a thread ready to run when there isn't.

To identify this on an idle system, you can first observe the 100% cpu usage, but you can also look at "onstat -g sch" output. The cpu vp that is using up the cpu cycles will have a positive number in the Q-ln column while "onstat -g rea" shows nothing in the ready queue.
For instance, from "onstat -g sch" you can see the value 1 in the Q-ln column for 15cpu below:

Thread Migration Statistics:
 vp    pid      class   steal-at  steal-sc  idlvp-at  idlvp-sc  inl-polls  Q-ln
 1     9568     cpu     0         0         0         0         0          0
 2     8184     adm     0         0         0         0         0          0
 3     8156     lio     0         0         0         0         0          0
 4     7212     pio     0         0         0         0         0          0
 5     7156     aio     0         0         0         0         0          0
 6     11088    msc     0         0         0         0         0          0
 7     816      fifo    0         0         0         0         0          0
 8     7476     cpu     0         0         0         0         0          0
 9     10904    cpu     0         0         0         0         0          0
 10    10940    cpu     0         0         0         0         0          0
 11    10936    cpu     0         0         0         0         0          0
 12    11096    cpu     0         0         0         0         0          0
 13    8064     cpu     0         0         0         0         0          0
 14    6256     cpu     0         0         0         0         0          0
 15    5996     cpu     0         0         0         0         0          1
 16    5984     cpu     0         0         0         0         0          0
 17    6928     cpu     0         0         0         0         0          0
 18    8056     cpu     0         0         0         0         0          0
 19    924      soc     0         0         0         0         0          0
 20    920      soc     0         0         0         0         0          0
 21    10960    soc     0         0         0         0         0          0
 22    10932    soc     0         0         0         0         0          0

This defect is being entered for defensive purposes. We should be able to identify this case and address it, returning the idle cpu vp to normal behavior. | |
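The Q-ln check described above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: it assumes the "Thread Migration Statistics" section of "onstat -g sch" has exactly the nine columns shown in this report, and the function name find_suspect_cpu_vps is hypothetical. It only flags candidates. A positive Q-ln is normal on a busy server, so a hit still has to be confirmed on an idle system against "onstat -g rea".

```python
# Column layout assumed from the sample output in this report.
HEADER = ["vp", "pid", "class", "steal-at", "steal-sc",
          "idlvp-at", "idlvp-sc", "inl-polls", "Q-ln"]


def find_suspect_cpu_vps(sch_output):
    """Return the vp numbers of cpu-class VPs whose Q-ln value is positive."""
    suspects = []
    in_table = False
    for line in sch_output.splitlines():
        fields = line.split()
        if fields == HEADER:
            in_table = True          # data rows start after the header row
            continue
        if not in_table:
            continue
        if len(fields) != len(HEADER) or not fields[0].isdigit():
            in_table = False         # blank or foreign line ends the table
            continue
        vp, vp_class, q_len = int(fields[0]), fields[2], int(fields[-1])
        if vp_class == "cpu" and q_len > 0:
            suspects.append(vp)
    return suspects


# Three rows taken from the output shown in this report.
SAMPLE = """Thread Migration Statistics:
 vp pid class steal-at steal-sc idlvp-at idlvp-sc inl-polls Q-ln
 14 6256 cpu 0 0 0 0 0 0
 15 5996 cpu 0 0 0 0 0 1
 19 924 soc 0 0 0 0 0 0
"""

print(find_suspect_cpu_vps(SAMPLE))  # prints [15]
```

In practice the text passed in would be captured "onstat -g sch" output, sampled more than once while the system is idle so that a transient queue entry is not mistaken for the stuck counter.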
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix Server prior to 12.10.xC14 and 14.10.xC4.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
**************************************************************** | |
Local Fix: | |
Solution | |
Workaround | |
Comment | |
Problem fixed in Informix Server versions 12.10.xC14 and 14.10.xC4. | |
Timestamps | |
Date - problem reported : | 29.01.2020 |
Date - problem closed : | 24.02.2020 |
Date - last modified : | 24.02.2020 |
Problem solved at the following versions (IBM BugInfos) | |
12.10.xC14, 14.10.xC4 | |
Problem solved according to the fixlist(s) of the following version(s) |