
DB2 - Problem description

Problem IC81467 Status: Closed

WITH FILE SYSTEM CACHING ENABLED, SYSTEM OUTAGE DURING LOAD PROCESSING
MIGHT RESULT IN CORRUPTION

product:
DB2 FOR LUW / DB2FORLUW / 980 - DB2
Problem description:
(1) With file system caching enabled, IBM DB2 for Linux, UNIX,
and Windows uses buffered disk writes for index rebuilds during
LOAD operations. Buffered disk writes go to the file system
cache first. When the buffered data must be physically written
to disk, which is typically at commit time, a sync operation
must be called.
 
As a result of an issue in tracking which files need to be
synchronized, DB2 mistakenly skips synchronizing some or all of
the required files. If a machine or file system outage occurs,
any writes that still reside in the file system cache and have
not yet been written to disk are lost. How long these writes
remain vulnerable depends on how aggressively the operating
system and hardware flush the file system cache. Under normal
conditions, all writes are eventually sent to disk, and an
outage that happens after the writes have been flushed from the
file system cache to disk causes no problems. For LOAD
operations where the index creation phase is done in 'REBUILD'
mode, an outage that happens after commit time (which marks the
LOAD as successful) but before the writes are physically written
to disk might lead to index corruption.
 
This risk of index corruption only applies to DB2 running on AIX 
platforms. 
 
Note: In the above description, 'synchronizing' means calling 
the operating system function sync(). 
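
The tracking problem described in (1) can be illustrated with a
minimal Python sketch. It is not DB2's implementation: the class
name SyncTracker and its methods are hypothetical, and it uses a
per-file os.fsync() call as a stand-in for the sync operation
mentioned above. The point it shows is that every file written
through the cache must be remembered and synchronized at commit
time; a file missing from the tracked set stays vulnerable to an
outage.

```python
import os


class SyncTracker:
    """Hypothetical helper: remembers which files received buffered
    writes so that all of them can be flushed to disk at commit time."""

    def __init__(self):
        self._dirty = set()  # paths written since the last commit

    def write(self, path, data):
        # Buffered write: the data may sit in the file system cache
        # after this call returns.
        with open(path, "ab") as f:
            f.write(data)
        self._dirty.add(path)

    def commit(self):
        # Flush every tracked file to disk. Skipping a path here is
        # the kind of tracking error the problem description refers to.
        synced = []
        for path in sorted(self._dirty):
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced.append(path)
        self._dirty.clear()
        return synced
```

Only after commit() returns can the operation safely be marked as
successful in this sketch.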
 
(2) LOAD operations write important information to binary load
control files while they are running. If a LOAD operation is
interrupted or fails for any reason, the load terminate
operation relies on information stored in the load control files
to restore the load target table to its previous state. The load
restart operation also relies on the information in the load
control files to restart the load operation from the last
consistency point.
 
With file system caching enabled, the LOAD command also uses
buffered disk writes for the load control files. If a machine
or file system outage occurs during LOAD processing, that is,
before the load operation completes successfully, any writes to
the load control files that still reside in the disk buffer and
have not yet been written to disk are lost. After the system
comes back up, the load target table is in load pending state.
Running a LOAD TERMINATE or LOAD RESTART command on the table
might result in one of two erroneous behaviors:
 
(a) LOAD TERMINATE or LOAD RESTART fails because it detects 
missing information in the load control files. 
 
(b) LOAD TERMINATE or LOAD RESTART is successful, but it fails
to detect the problem with the load control files and restores
incorrect information to the table. In this case, there is data
corruption in the table.
 
The risk of running into these two erroneous behaviors applies 
to DB2 running on all Linux, UNIX and Windows platforms.
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* All LOAD users running on a system with file system caching  *
* enabled                                                      * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to IBM DB2 for Linux, UNIX and Windows version 9.8   *
* Fix Pack 5.                                                  * 
****************************************************************
Local Fix:
Disable file system caching to prevent both issues from
occurring. If file system caching is already enabled and a
system outage has occurred during load processing, perform the
following steps:
 
For the first issue (1), mark the invalid indexes as bad using 
the db2dart command, and rebuild them. 
 
For the second issue (2), if the LOAD TERMINATE or LOAD RESTART
command fails as described in (a), or if you have not yet issued
a LOAD TERMINATE or LOAD RESTART command, the table can be
restored to its previous state (that is, its state before the
start of the LOAD operation that failed due to the system
outage) by deleting the corrupted load control files and then
issuing a LOAD TERMINATE command.
 
Notes for the second issue (2): 
 
(i) Some disk space that was used to store LOB or LF table
objects might become orphaned, which means that the space no
longer stores any data and cannot be reused.
 
(ii) Load control files reside in the
[db_dir]/load/DB2xxxxx.PID/DB2yyyyy.OID directory. Here,
[db_dir] is the database path, which typically ends in
.../NODEmmmm/SQLnnnnn; xxxxx is the pool ID (table space ID) of
the load target table, in hexadecimal; and yyyyy is the object
ID of the load target table, in hexadecimal. The load control
file is named loadmmmm.CT1, where mmmm is the partition number
in a partitioned database environment.
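
The naming scheme in note (ii) can be sketched as a small Python
helper. The function names are hypothetical, and two details are
assumptions that the text does not state: the hexadecimal IDs are
rendered here as five zero-padded uppercase digits, and the
partition number as four zero-padded decimal digits. Verify the
result against the actual directory contents before relying on
it.

```python
import os


def load_control_dir(db_dir, pool_id, object_id):
    """Build [db_dir]/load/DB2xxxxx.PID/DB2yyyyy.OID.

    Assumption: IDs are 5-digit, zero-padded, uppercase hexadecimal.
    """
    return os.path.join(
        db_dir,
        "load",
        f"DB2{pool_id:05X}.PID",
        f"DB2{object_id:05X}.OID",
    )


def load_control_file(db_dir, pool_id, object_id, partition=0):
    """Build the path of loadmmmm.CT1 inside the load control directory.

    Assumption: the partition number is 4-digit, zero-padded decimal.
    """
    return os.path.join(
        load_control_dir(db_dir, pool_id, object_id),
        f"load{partition:04d}.CT1",
    )
```

For example, load_control_file(db_dir, 2, 260) yields a path that
ends in DB200002.PID, DB200104.OID, and load0000.CT1 under these
assumptions.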
 
Before deleting the corrupted load control files, copy or move 
all files in the [db_dir]/load/DB2xxxxx.PID/DB2yyyyy.OID 
directory to a backup location, then delete all files in the 
[db_dir]/load/DB2xxxxx.PID/DB2yyyyy.OID directory, and then 
issue a LOAD TERMINATE command.  For partitioned database 
environments, you must do this for all the database partitions, 
before issuing the LOAD TERMINATE command. Note that you only 
need to issue the LOAD TERMINATE command once in a partitioned 
database environment.
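
The backup-then-delete step above can be sketched in Python as
follows. This is an illustration under stated assumptions, not an
IBM-provided tool: the function name backup_and_clear and both of
its parameters are hypothetical, and the caller is expected to
supply the load control directory for each database partition.
The sketch deletes nothing until every file has been copied, and
it does not issue the LOAD TERMINATE command, which must still be
run afterward.

```python
import os
import shutil


def backup_and_clear(control_dir, backup_dir):
    """Copy every file in control_dir to backup_dir, then delete the
    originals. Returns the names of the files that were handled."""
    os.makedirs(backup_dir, exist_ok=True)
    names = sorted(
        name for name in os.listdir(control_dir)
        if os.path.isfile(os.path.join(control_dir, name))
    )

    # Step 1: back up all files before touching the originals.
    for name in names:
        shutil.copy2(os.path.join(control_dir, name),
                     os.path.join(backup_dir, name))

    # Step 2: confirm that each backup copy exists.
    for name in names:
        if not os.path.isfile(os.path.join(backup_dir, name)):
            raise RuntimeError(f"backup missing for {name}")

    # Step 3: delete the originals only after the backup is complete.
    for name in names:
        os.remove(os.path.join(control_dir, name))

    return names
```

In a partitioned database environment, the function would be
called once per database partition, and LOAD TERMINATE issued
once after all partitions have been processed.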
Solution
Problem first fixed in IBM DB2 for Linux, UNIX and Windows
version 9.8 Fix Pack 5.
Workaround
See Local Fix
Timestamps
Date  - problem reported    : 15.02.2012
Date  - problem closed      : 13.06.2012
Date  - last modified       : 13.06.2012
Problem solved at the following versions (IBM BugInfos)
9.8.,
9.8.FP5
Problem solved according to the fixlist(s) of the following version(s)
9.8.0.5 FixList