DB2 - Problem description
Problem IT05733 | Status: Closed |
SUB-OPTIMAL QUERY PERFORMANCE WHEN DISTRIBUTION STATS COLLECTED ON STRING COLUMN WITH A COMMON PREFIX LARGER THAN 32 BYTES | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
The RUNSTATS utility can collect two types of distribution statistics: quantile statistics and frequent value statistics. Quantile statistics provide information about how data values are distributed in relation to other values. Frequent value statistics provide information about a column and the data value with the highest number of duplicates, the value with the second highest number of duplicates, etc. When you collect distribution statistics on columns containing string data with a common prefix of more than 32 bytes, the statistics can be inaccurate as a result of design limitations in the RUNSTATS utility. If you compile a query with equality or range predicates on this column, this inaccuracy can result in a sub-optimal query execution plan. Example: In the following example, table T1 contains a column of type CHAR(40) with the following data: C1 ---------------------------------------- test.test.test.test.test.test.value1 test.test.test.test.test.test.value2 test.test.test.test.test.test.value3 test.test.test.test.test.test.value3 test.test.test.test.test.test.value3 where the first 35 bytes are the same in each value. If you collect distribution statistics on this column, you will see the following data in the SYSSTAT.COLDIST catalog view COLVALUE VALCOUNT TYPE SEQNO ---------------------------------------- -------------------- ---- ------ 'test.test.test.test.test.test.val' 5 F 1 'test.test.test.test.test.test.val' 0 Q 1 'test.test.test.test.test.test.val' 5 Q 2 and for a query referencing the column in a predicate, such as SELECT C1 FROM T1 WHERE C1 = 'test.test.test.test.test.test.value3' the optimizer will under estimate the number of rows qualifying the predicate, which can be confirmed by collecting an EXPLAIN of the query. | |
Problem Summary: | |
**************************************************************** * USERS AFFECTED: * * Users with tables containing columns containing * * string data with a common prefix of more than 32 bytes. * **************************************************************** * PROBLEM DESCRIPTION: * * See Error Description * **************************************************************** * RECOMMENDATION: * * Upgrade to DB2 version 9.7.0.11 * **************************************************************** | |
Local Fix: | |
You can solve this issue by disabling the collection of distribution statistics on the columns affected, while enabling the collection of distribution statistics on all other columns. The following is one example of a RUNSTATS command that disables collection of distribution statistics on one column, C1, while enabling it for all other columns in the table: RUNSTATS ON TABLE SCHEMA.T1 WITH DISTRIBUTION ON ALL COLUMNS AND COLUMNS(C1 NUM_FREQVALUES 0 NUM_QUANTILES 0) AND DETAILED INDEXES ALL If AUTO_RUNSTATS is enabled, you will need to apply a statistics profile to avoid AUTO_RUNSTATS overriding the modified RUNSTATS command. | |
Solution | |
Problem first fixed in DB2 version 9.7.0.11 | |
Workaround | |
See Local Fix | |
Timestamps | |
Date - problem reported : Date - problem closed : Date - last modified : | 21.11.2014 22.10.2015 22.10.2015 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.0.11 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.11 |