Appearance
Storage Engine Filter Expressions
Storage Engine (SE) filter expressions are an advanced mechanism to reduce the amount of data read from disk when executing queries. These textual expressions are automatically generated upon query submission and are consumed by the SE during query execution.
The deeper a filter can be pushed into the query execution tree, the better. This minimizes the data read from disk, reducing the work and resources required to execute the query.
Consider the following query:
sql
SELECT x FROM result_table WHERE x = 3;
The corresponding simplified query tree:
SELECT x
|
FILTER (x = 3)
|
SCAN result_table
Without scan-level filtering in the storage engine, every row in result_table
would be scanned and loaded into memory. For example, if result_table
contains 100GB of data but only a few rows satisfy x = 3
, most of the scanning and memory usage would be wasted, leading to unnecessarily long execution times.
Rows are written to disk in "groups" called shards, which contain metadata such as the minimum and maximum values for each column. This metadata enables skipping entire shards that cannot satisfy the query conditions. For example, if all but one shard has a minimum value of 10 for column x
, those shards can be safely skipped when querying for x = 3
.
SE filter expressions provide more advanced scan-level filtering than previous mechanisms like bloom filters and scan restrictions. By reducing the data read from disk, these filters improve query execution time and reduce resource consumption.
Note: This document focuses on SE filter expressions and should not be confused with bloom filters or scan restrictions.
Basic Usage
The generation of SE filter expressions is controlled by the enable_se_filter_expression_generation
configuration parameter, which accepts a boolean value. To enable filter generation, set this parameter to true.
sql
SET enable_se_filter_expression_generation TO true;
CREATE TABLE result_table (x INT, y INT);
EXPLAIN (se_filters) SELECT x FROM result_table WHERE x = 3;
QUERY PLAN
---------------------------------------------------------------------------------
id rows_planned workers node
5 1 all SELECT
2 1 all SCAN result_table
(result_table.x = $0) AND scan_constraints: (result_table.x = $0)
SE Filter: (c1 = i3);
In the output above, the SE Filter line under the SCAN node shows the generated SE filter expression. The expression filters on column x
(the first column of the table, indicated by c1
) for rows where x
equals the integer value 3.
SE filter expressions are derived from a query’s JOIN and WHERE clauses. The following section describes the kinds of clauses that the planner will attempt to turn into an SE filter expressions.
Supported WHERE Clauses
The following types of WHERE clauses can generate SE filter expressions:
Simple Comparisons
<column> <op> <constant>
Supported operators: >, >=, <, <=, =, !=
Columns and constants cannot be NUMERIC/DECIMAL
NULL Tests
sql
SELECT * FROM result_table WHERE x IS NULL;
- Constant IN Lists
sql
SELECT * FROM result_table WHERE x IN (1, 2, 3);
- Subquery IN Lists
sql
SELECT * FROM result_table WHERE x IN (SELECT y FROM bar);
The table in the subquery must be replicated.
- AND/OR Combinations
sql
SELECT * FROM result_table WHERE x = 3 OR y = 5;
SELECT * FROM result_table WHERE x = 3 OR y in (1, 2, 3);`
If a clause doesn't meet these conditions, an UNK
term will be generated for the predicate column, which is ignored during filtering.
sql
EXPLAIN (se_filters) SELECT * FROM result_table WHERE x = 3 OR (y + 1 > 5);
QUERY PLAN
-----------------------------------------------------------------------
id rows_planned workers node
5 1 all SELECT
2 1 all SCAN result_table
((result_table.x = $0) OR ((result_table.y + $1) > $2))
SE Filter: ((c1 = i3) OR UNK);
Inlist Queries
An inlist query filters a column based on a set of values. The following examples demonstrate two equivalent queries:
sql
SELECT * FROM result_table WHERE x = 1 OR x = 2 OR x = 3;
SELECT * FROM result_table WHERE x IN (1, 2, 3);
We refer to such queries as inlist queries, regardless of whether the IN operator is explicitly used, as both check if x
belongs to a set of values.
When executing inlist queries, the generated SE filter expression depends on:
The number of values in the set.
The value of the
inlist_threshold
configuration parameter.
Let us analyze the EXPLAIN output for a query with different inlist_threshold
settings:
inlist_threshold Equal to the Set Size
sql
SET inlist_threshold TO 3;
EXPLAIN (se_filters) SELECT * FROM result_table WHERE x IN (1, 2, 3);
QUERY PLAN
-----------------------------------------------------------------------------------
id rows_planned workers node
5 1 all SELECT
2 1 all SCAN result_table
((result_table.x = $2) OR ((result_table.x = $0) OR (result_table.x = $1)))
SE Filter: ((c1 = i1) OR (c1 = i2) OR (c1 = i3));
inlist_threshold Less Than the Set Size
sql
SET inlist_threshold TO 2;
EXPLAIN (se_filters) SELECT * FROM result_table WHERE x IN (1, 2, 3);
QUERY PLAN
------------------------------------------------------------------------------------------
id rows_planned workers node
10 1 all SELECT
1 1 all SEMI LEFT INNER HASH JOIN ON (#inlist_100#.val = result_table.x)
4 1 all |-SCAN result_table
| result_table.x = bloom(7) AND scan_constraints: min_max(result_table.x)
| SE Filter: (c1 in (build7));
7 3 all |-BUILD
9 3 all SCAN TEMP 3
When the number of values in the set exceeds the inlist_threshold
, the planner will attempt to rewrite the query to use a SEMI JOIN, which improves performance for larger sets. For example, the SE filter expression c1 in (build7)
says that the first column in result_table
, in this case x
, must match the values in the build node with ID 7, which references a temporary table containing the set values.
Using Subqueries in Inlist Queries
SE filter expressions can also be generated when a subquery provides the values to filter on. However, the following conditions must be met:
The table used in the subquery must be replicated. For details on how to create a replicated table click here.
The subquery must return only one column. Multi-column inlists (example shown below) will not generate SE filter expressions:
sql
SELECT * FROM result_table WHERE (x, y) IN (SELECT x, y FROM bar);
The following is an example where an SE filter expression is generated when filtering x
on values in another table.:
sql
CREATE TABLE bar(y int) DISTRIBUTE REPLICATE;
EXPLAIN (se_filters) SELECT * FROM result_table WHERE x IN (SELECT y FROM bar WHERE y > 2);
QUERY PLAN
-----------------------------------------------------------------------------------------
id rows_planned workers node
13 1 all SELECT
1 1 all SEMI LEFT INNER HASH JOIN ON (bar.y = result_table.x)
4 1 all |-SCAN result_table
| result_table.x = bloom(7) AND scan_constraints: min_max(result_table.x)
| SE Filter: (c1 in (build7));
7 1 all |-BUILD
10 1 all SCAN bar
(bar.y > $0) AND scan_constraints: (bar.y > $0)
SE Filter: (c1 > i2);
Here, the SE filter expression for result_table
references a build node derived from a scan on bar
. Additionally, bar
has its own SE filter expression, specifying that column 1 (c1
) of bar
must be greater than 2.
Data Ordering
The order in which data is written to disk plays a significant role in the observed performance of SE filter expressions. Shards, the "groups" in which rows are written to disk, are the key unit to consider here. As mentioned earlier, the metadata on shards tracks information such as the minimum and maximum values for a column across the rows in that shard. For SE filter expressions to produce large speedups (by skipping as much data as possible), the range of values for a predicate column in a shard should be as narrow as possible.
Shards are immutable and do not change once written to disk - even when UPDATE
or DELETE
commands are issued. In both cases, the records affected by the query are marked as deleted. In the case of an UPDATE
, the modified record is appended as a new row in another shard. Over time, these operations can cause the data of interest to be spread across more shards, reducing the performance benefits SE filter expressions can provide.
When a workload is well understood, it may be advisable to periodically resort the data based on the columns of interest.
Top-Level OR Conditions
SE filter expressions apply only to the first top-level OR condition in a query. This restriction ensures the filter application remains computationally efficient.
Examples:
Single Top-Level OR Condition
sql
WHERE (col1 = const OR col2 = const OR col3 = const OR col1 = otherConst);
There is only one top-level OR here. All terms in the OR condition are included in the SE filter expression.
Nested OR and AND Conditions
sql
WHERE (col1 = const AND col2 = const AND (col3 = const1 OR col3 = const2 OR col4 = const));
Still only one top-level OR is present and all terms in the resulting SE filter are applied.
Multiple Top-Level OR Conditions
sql
WHERE (col1 = const) AND (col2 = const OR col3 = const) AND (col4 = const OR col5 = const);
SE filter expressions apply to columns col1
, col2
, and col3
. Filters on col4
and col5
are processed later in the execution engine.