Predicate Pushdown and Columnstore Indexes

Andrew Smith Posted on

Predicate pushdown in SQL Server is a query plan optimisation that pushes predicates down the query tree, so that filtering occurs earlier within query execution than implied by the logical query execution ordering.

So with an inner join query for example, a predicate that appears in the WHERE clause and that logically takes place after the join matching, may be pushed down into the seek or scan that’s retrieving rows from a table for input into the join.

In this post I’m looking at when predicate pushdown occurs and when it doesn’t, for queries executing over columnstore indexes.

I’m using SQL Server 2014 and Microsoft’s ContosoRetailDW database, and considering the difference between the execution of these 2 queries:

CSPPQueries

They’re running over a star schema where the fact table referenced here is built as a clustered columnstore index. The fact table has 12,627,608 rows and the WHERE clause for each query above filters the fact table to exactly the same 8,627,583 rows. Thus both queries return the same aggregate values. When we execute them in the order above with SET STATISTICS TIME ON, we see:

SQL Server Execution Times:
   CPU time = 593 ms, elapsed time = 185 ms.

SQL Server Execution Times:
   CPU time = 1532 ms, elapsed time = 463 ms.

So the second query takes almost 3 times the CPU time of the first. Why is this? Well, if we compare the 2 query plans we see:

CSPPQueryPlans

The one structural difference here is that the second plan includes a filter operator after the columnstore index scan operator. And if we look at the stats for the columnstore index scan operator for each of the 2 queries we see:

CSPPQueryStats

So what’s happened is that in the first query, the WHERE clause predicate for DateKey has been pushed down into the index scan, so that the DateKey filtering is done as part of the data retrieval for input into the join operator. Because of this no more than the 8,627,583 rows required for the query aggregations are output from the index scan.

Whereas in the second query, the WHERE clause predicate is not pushed down, resulting in all rows from the fact table being output from the index scan such that they then need to be filtered by the additional filter operator.

The performance gain in the first query will be particularly enhanced if the predicate pushdown results in segment elimination, which is a columnstore optimisation that has the possibility of excluding whole columnstore segments from query processing as the result of (very fast) metadata checking.

So why is predicate pushdown occurring with the first query but not the second? It’s because in SQL Server 2014, string (char, varchar, nchar and nvarchar) based predicates on columnstore columns are not pushed down, whereas date-based (except “datetimeoffset”) and number-based (except “numeric”) predicates are.

This restriction has been removed for SQL Server 2016 though – see the “String Predicate Pushdown” section of this MSDN article for details.

Note: the example used in this post was taken from Niko Neugebauer’s very enlightening SQLBits session on Understanding & Solving Columnstore Problems.