Informix - Problem description

Problem IT37230 Status: Closed

RSS NODE BLOCKED:LOG_DROP POSSIBLE WHEN DROPPING LOGICAL LOGS ON PRIMARY
WITH CONCURRENT INDEX BUILD ON RSS NODE

product:
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10
Problem description:
A timing issue can leave the RSS node in the Blocked:LOG_DROP
state.  While blocked, the node cannot advance its log position,
so it starts lagging behind the primary.  The situation is a
deadlock: the recovery thread needs the index build to proceed,
while some of the parallel index build threads need the recovery
thread to proceed before they can unblock and continue.

The index build session needs to be interrupted/killed with
onmode -z to get the server out of this condition.

To identify the issue, first check the main onstat banner, which
would show the following LOG_DROP blocked condition:

IBM Informix Dynamic Server Version 12.10.FC9W1X4 -- Read-Only (RSS) -- Up 18 days 05:10:21 -- 223444992 Kbytes
Blocked:LOG_DROP
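
The banner check above can be scripted. The following is a minimal
sketch, not an official tool: it assumes the onstat output has
already been captured as text, and the function names and the
handling of several blocked reasons on one line (split on
whitespace or commas) are assumptions made for illustration.

```python
import re

def blocked_reasons(onstat_text):
    """Return the reasons found on any 'Blocked:' line of the text."""
    reasons = []
    for line in onstat_text.splitlines():
        match = re.match(r"\s*Blocked:\s*(\S.*)$", line)
        if match:
            # Assumption: multiple reasons are separated by spaces or commas.
            reasons.extend(re.split(r"[\s,]+", match.group(1).strip()))
    return reasons

def is_rss_log_drop_blocked(onstat_text):
    """True if the banner is from an RSS node that reports Blocked:LOG_DROP."""
    return "(RSS)" in onstat_text and "LOG_DROP" in blocked_reasons(onstat_text)

# Banner text taken from the problem description.
SAMPLE_BANNER = (
    "IBM Informix Dynamic Server Version 12.10.FC9W1X4 -- Read-Only (RSS) "
    "-- Up 18 days 05:10:21 -- 223444992 Kbytes\n"
    "Blocked:LOG_DROP\n"
)

if __name__ == "__main__":
    print(is_rss_log_drop_blocked(SAMPLE_BANNER))
```

A blocked banner alone does not prove the deadlock described here;
the thread stacks and user flags still need to be checked.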

Next, the stack of the xchg_2.0 recovery thread (note that some
of the parallel index build threads can have the same name) would
look like the following:

yield_processor_mvp
wait4critex
isenter_critblock
rslogdrop
plogredo
rlogm_redo
scan_logredo
next_lscan
prod_loop1
producer_thread
startup

The recovery thread is waiting for all other threads to leave
their critical sections (users with the X flag in onstat -u), so
at least one other thread must be in a critical section.
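
To compare a captured stack against this signature, a small helper
can look for the wait4critex, isenter_critblock, rslogdrop frames
in sequence. This is a sketch under the assumption that the stack
has been saved as one function name per line (for example from
onstat -g stk output with addresses stripped); the helper name and
the choice of these three frames as the signature are illustrative.

```python
# Frames, top of stack first, that show the recovery thread waiting on
# critical sections while replaying a log drop (taken from the stack above).
LOG_DROP_WAIT_FRAMES = ("wait4critex", "isenter_critblock", "rslogdrop")

def has_frames(stack, frames):
    """True if `frames` occur consecutively, in order, within `stack`."""
    names = [name.strip() for name in stack if name.strip()]
    width = len(frames)
    return any(
        tuple(names[i:i + width]) == tuple(frames)
        for i in range(len(names) - width + 1)
    )

# Recovery thread stack from the problem description.
RECOVERY_STACK = """
yield_processor_mvp
wait4critex
isenter_critblock
rslogdrop
plogredo
rlogm_redo
scan_logredo
next_lscan
prod_loop1
producer_thread
startup
""".splitlines()

if __name__ == "__main__":
    print(has_frames(RECOVERY_STACK, LOG_DROP_WAIT_FRAMES))
```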

In the onstat -u output below, the first user is the xchg_2.0
recovery thread.  The second user, in this case from session
2327271, is the thread the server is deadlocked with.

173a98ce8        --B-XRD 178      informix -        0    0     16178    3106443
19c4f5c98        ----XR- 2327271  user1   -        0    0     0        0
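
The same filtering can be done on captured onstat -u text. The
sketch below assumes that the critical section marker is an X in
the fifth flag position and that a D in the seventh position marks
a daemon thread, which matches the two sample rows; the names
UserThread, parse_userthreads and kill_candidates are hypothetical.

```python
from typing import NamedTuple

class UserThread(NamedTuple):
    address: str
    flags: str
    sessid: int
    user: str

def parse_userthreads(text):
    """Parse onstat -u data rows; header and wrapped lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has an address, a 7-character flag string,
        # a numeric session id and a user name.
        if len(fields) < 4 or len(fields[1]) != 7 or not fields[2].isdigit():
            continue
        rows.append(UserThread(fields[0], fields[1], int(fields[2]), fields[3]))
    return rows

def in_critical_section(thread):
    # Assumption: position 5 of the flags holds X for a critical section.
    return thread.flags[4] == "X"

def kill_candidates(threads):
    """Sessions in a critical section that are not daemon (D) threads."""
    return [t.sessid for t in threads
            if in_critical_section(t) and t.flags[6] != "D"]

# Rows taken from the problem description.
SAMPLE_USERTHREADS = """\
173a98ce8        --B-XRD 178      informix -        0    0     16178    3106443
19c4f5c98        ----XR- 2327271  user1   -        0    0     0        0
"""

if __name__ == "__main__":
    print(kill_candidates(parse_userthreads(SAMPLE_USERTHREADS)))
```

A session id found this way is what would be passed to onmode -z,
after confirming that the session really is running an index build.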

From onstat -g ses for that session you should see that the
sqlexec thread has spawned multiple other threads (used to
perform the fast/parallel index build):


session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
2327271  user1   -        14376    host1 10       2306048    2198656    off

tid      name     rstcb            flags    curstk   status
3708024  sqlexec  1a95f1358        Y--P-R-  21455    cond wait  opened_up -
3708268  xchg_1.0 19c4f5c98        ----XR-  5903     running-
3708269  xchg_1.1 184e7c5d8        Y----R-  4255     cond wait  sortproc:0-
3708270  xchg_1.2 1b34db5f8        Y----R-  6639     cond wait  block     -
3708273  psortpro 184e724f8        Y----R-  3663     cond wait  backend:0 -
3708274  psortpro 19c4d7108        Y----R-  3663     cond wait  backend:0 -
3708277  psortpro 184e55438        Y----R-  5071     cond wait  block     -
3708278  psortpro 184e5b688        Y----R-  3663     cond wait  backend:1 -
3708279  psortpro 19c4dd358        Y----R-  3663     cond wait  backend:1 -
3708280  psortpro 184e4d718        Y----R-  3663     cond wait  backend:1 -
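
The thread holding the critical section can be picked out of this
listing by its flags, and its rstcb value (19c4f5c98) matches the
address of the second user in the onstat -u output. The following
sketch parses the thread rows from captured text; the function
names are hypothetical and the X-in-fifth-position rule is an
assumption based on the sample rows.

```python
import re

def parse_session_threads(text):
    """Parse the tid/name/rstcb/flags rows of onstat -g ses output."""
    threads = []
    for line in text.splitlines():
        f = line.split()
        # Thread rows: numeric tid, name, hex rstcb, 7-char flags, numeric curstk.
        if (len(f) >= 5 and f[0].isdigit()
                and re.fullmatch(r"[0-9a-f]+", f[2])
                and len(f[3]) == 7 and f[4].isdigit()):
            threads.append(
                {"tid": int(f[0]), "name": f[1], "rstcb": f[2], "flags": f[3]}
            )
    return threads

def critical_section_threads(text):
    # Assumption: position 5 of the flags holds X for a critical section.
    return [t for t in parse_session_threads(text) if t["flags"][4] == "X"]

# A subset of the rows from the problem description.
SAMPLE_SESSION = """\
id       user     tty      pid      hostname threads  memory     memory     explain
2327271  user1   -        14376    host1 10       2306048    2198656    off

tid      name     rstcb            flags    curstk   status
3708024  sqlexec  1a95f1358        Y--P-R-  21455    cond wait  opened_up -
3708268  xchg_1.0 19c4f5c98        ----XR-  5903     running-
3708269  xchg_1.1 184e7c5d8        Y----R-  4255     cond wait  sortproc:0-
3708273  psortpro 184e724f8        Y----R-  3663     cond wait  backend:0 -
"""

if __name__ == "__main__":
    for thread in critical_section_threads(SAMPLE_SESSION):
        print(thread["tid"], thread["name"], thread["rstcb"])
```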

The SQL the session is running could be a query for which the
optimizer plan caused the server to create a temp table and then
build an index on that temp table, or it could simply be a manual
index build on a temp table on the RSS node.

In this particular case, the SQL was a query for which the
optimizer required building a temp table and then an index on
that temp table, as can be seen in the stack of the above sqlexec
thread.  The main point is that the sqlexec thread has entered
the index build code, for whichever reason:

yield_processor_mvp
mt_wait
mt_wait_sem
open_xchg
open_btmrg
execute
fastidxbld
rsaddindex
fmamaddindex
fmaddindex
tmptab_create_index
filltemp
scan_open
join_open
join_open
join_open
join_open
merge_open
merge_open
sort_open
filltemp
scan_open
materialize_viewtmp
prepselect
open_cursor
sql_open
sq_open
sqmain

The xchg_1.0 thread (or one of the other parallel index build
xchg threads) will be running or constantly trying to run, while
one or more of the others will be in the block state.  If you can
get a good stack of the running thread, it should show it in
next_sort.
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix Server prior to 12.10.xC15 and 14.10.xC7.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Informix Server 12.10.xC15 or 14.10.xC7 (when     *
* available).                                                  *
****************************************************************
Local Fix:
Solution
Workaround
Comment
Fixed in Informix Server 12.10.xC15 and 14.10.xC7.
Timestamps
Date  - problem reported    : 11.06.2021
Date  - problem closed      : 26.08.2021
Date  - last modified       : 26.08.2021