DB2 - Problem description
Problem IC61928 | Status: Closed |
IMPROVE PERFORMANCE OF A QUERY CONTAINING A SELECT DISTINCT IN A COMMON SUBQUERY EXPRESSION (CSE). | |
product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
When a query has a common subquery expression(CSE) that has a SELECT DISTINCT, DB2 may merge the CSE with another subquery or the main query. In this case, DB2 may choose to remove duplicates later rather than when the CSE is built in order to avoid an extra SORT just to remove duplicates. Duplicates may be removed later using a SORT required for some other purpose like an ORDER BY. . Normally this is not an issue as often predicates are applied below the CSE subquery and there may not be many duplicate values. However when the number of duplicate values in the CSE subquery is high, the benefit of avoiding the SORT and removing duplicates later is diminished. Reducing the size of the data early benefits subsequent join or other operations . If a CSE distinct is postponed, in the db2exfmt output, the optimized statement should have a bigger query that has the distinct and covers all the subquery joins in the CSE. Similarly, in the access plan, there is not such a join operator that covers exactly all the CSE subquery joins and has a SORT operator above it with the distinct columns as sort key. | |
Problem Summary: | |
When a query has a common subquery expression(CSE) that has a SELECT DISTINCT, DB2 may merge the CSE with another subquery or the main query. In this case, DB2 may choose to remove duplicates later rather than when the CSE is built in order to avoid an extra SORT just to remove duplicates. Duplicates may be removed later using a SORT required for some other purpose like an ORDER BY. . Normally this is not an issue as often predicates are applied below the CSE subquery and there may not be many duplicate values. However when the number of duplicate values in the CSE subquery is high, the benefit of avoiding the SORT and removing duplicates later is diminished. Reducing the size of the data early benefits subsequent join or other operations . If a CSE distinct is postponed, in the db2exfmt output, the optimized statement should have a bigger query that has the distinct and covers all the subquery joins in the CSE. Similarly, in the access plan, there is not such a join operator that covers exactly all the CSE subquery joins and has a SORT operator above it with the distinct columns as sort key. | |
Local Fix: | |
Manually rewrite the CSE using group by instead of distinct. This will force DB2 to remove the duplicates first. | |
available fix packs: | |
DB2 Version 9.7 Fix Pack 1 for Linux, UNIX, and Windows | |
Solution | |
module engn_sqnr fixed >= v97 fpk1 | |
Workaround | |
Manually rewrite the CSE using group by instead of distinct. This will force DB2 to remove the duplicates first. | |
Timestamps | |
Date - problem reported : Date - problem closed : Date - last modified : | 08.07.2009 11.02.2010 11.02.2010 |
Problem solved at the following versions (IBM BugInfos) | |
9.7.FPk1 | |
Problem solved according to the fixlist(s) of the following version(s) | |
9.7.0.1 |