DB2 - Problem description
Problem IC63387 | Status: Closed |
OPTIMIZER MAY OVERESTIMATE JOIN CARDINALITY UNDER CERTAIN CONDITIONS WHEN STATISTICS ARE NOT COLLECTED ON ONE OF THE TABLE | |
product: | |
DB2 FOR LUW / DB2FORLUW / 950 - DB2 | |
Problem description: | |
The DB2 Query Optimizer may choose a non-optimal query execution plan for an SQL statement that contains a join between two or more tables, when the following conditions are satisfied for at least one the joins: 1. the join consists of two or more equality predicates 2. statistics are collected on one of the tables and not the other 3. a unique index or primary key exists on the table that statistics are not collected on, with the join predicates applied to all the key columns 4. a column group statistic or index on the join predicate columns does not exist on the table that statistics are collected on For example, consider the following SQL statement SELECT * FROM T1, T2 WHERE T1.A=T2.A and T1.B=T2.B If an index on T1(A,B) does not exist and statistics are collected on T1 as follows: RUNSTATS ON TABLE <schema>.T1 WITH DISTRIBUTION AND DETAILED INDEXES ALL; and statistics are not collected on T2 but (A,B) is defined as the primary key of the table then, under these conditions, the optimizer may overestimate the cardinality of the join which may result in a non-optimal query execution plan. | |
Problem Summary: | |
ERROR DESCRIPTION: The DB2 Query Optimizer may choose a non-optimal query execution plan for an SQL statement that contains a join between two or more tables, when the following conditions are satisfied for at least one the joins: 1. the join consists of two or more equality predicates 2. statistics are collected on one of the tables and not the other 3. a unique index or primary key exists on the table that statistics are not collected on, with the join predicates applied to all the key columns 4. a column group statistic or index on the join predicate columns does not exist on the table that statistics are collected on For example, consider the following SQL statement SELECT * FROM T1, T2 WHERE T1.A=T2.A and T1.B=T2.B If an index on T1(A,B) does not exist and statistics are collected on T1 as follows: RUNSTATS ON TABLE <schema>.T1 WITH DISTRIBUTION AND DETAILED INDEXES ALL; and statistics are not collected on T2 but (A,B) is defined as the primary key of the table then, under these conditions, the optimizer may overestimate the cardinality of the join which may result in a non-optimal query execution plan. | |
Local Fix: | |
Any of the following actions will resolve this issue: 1. Collect statistics on all tables, or enable automatic statistics collection. Collecting statistics is one of the best practices to minimize the impact of SQL statements on performance, as described in the best practice paper on "Writing and Tuning Queries for Optimal Performance". This paper is available at: http://www.ibm.com/developerworks/data/bestpractices/ 2. Collect column group statistics on the table that statistics are collected on. In the example, column group statistics can be collected on T1(A,B) as follows: RUNSTATS ON TABLE <schema>.T1 ON ALL COLUMNS AND COLUMNS ((A,B)) WITH DISTRIBUTION AND DETAILED INDEXES ALL; If automatic runstats is enabled, a statistics profile will be required. The following article provides more information on how the DB2 Query Optimizer makes use of column group statistics and how to collect them: http://www.ibm.com/developerworks/data/library/techarticle/dm-06 12kapoor/index.html | |
available fix packs: | |
DB2 Version 9.5 Fix Pack 6a for Linux, UNIX, and Windows | |
Solution | |
The APAR is first fixed in the DB2 v9.5 fixpak6 | |
Workaround | |
Any of the following actions will resolve this issue: 1. Collect statistics on all tables, or enable automatic statistics collection. Collecting statistics is one of the best practices to minimize the impact of SQL statements on performance, as described in the best practice paper on "Writing and Tuning Queries for Optimal Performance". This paper is available at: http://www.ibm.com/developerworks/data/bestpractices/ 2. Collect column group statistics on the table that statistics are collected on. In the example, column group statistics can be collected on T1(A,B) as follows: RUNSTATS ON TABLE <schema>.T1 ON ALL COLUMNS AND COLUMNS ((A,B)) WITH DISTRIBUTION AND DETAILED INDEXES ALL; If automatic runstats is enabled, a statistics profile will be required. The following article provides more information on how the DB2 Query Optimizer makes use of column group statistics and how to collect them: http://www.ibm.com/developerworks/data/library/techarticle/dm-06 12kapoor/index.html | |
BUG-Tracking | |
forerunner : APAR is sysrouted TO one or more of the following: IC63389 follow-up : | |
Timestamps | |
Date - problem reported : Date - problem closed : Date - last modified : | 24.09.2009 13.05.2010 13.05.2010 |
Problem solved at the following versions (IBM BugInfos) | |
9.5. | |
Problem solved according to the fixlist(s) of the following version(s) |