Appearance
Plan Hinting: LEADING Hint
Overview
The LEADING hint allows you to specify the order in which tables are joined, and you can also control which tables are on the probe and build sides of the join. You can enforce both the order and the handedness of the join.
There are two forms of the Join Order hint:
- Force join order only:sql
LEADING ( <table_name>, <table_name> [, <table_name>...] )
- Force both order and handedness:sql
LEADING ( (<table_name>, <table_name>) [, ...] )
Each set of parenthesis creates a pair representing a join which handedness is set, in the form of (probe, build)
.
Examples:
Force Join Order Only
In this example, the planner is instructed to first join orders
with customers
, and then join the result with products
.
sql
yellowbrick=# EXPLAIN SETTING (LEADING (orders, customers, products))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
id rows_planned workers node
21 1 all SELECT
1 1 all INNER HASH JOIN ON (orders.product_id = products.id)
4 10 all |-SCAN products
| products.id = bloom(7) AND scan_constraints: min_max(products.id)
7 1 all |-BUILD
8 1 all DISTRIBUTE ON HASH(orders.product_id)
9 1 all INNER HASH JOIN ON (customers.id = orders.customer_id)
12 5 all |-SCAN orders
| orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
15 1 all |-BUILD
22 1 all DISTRIBUTE REPLICATE
18 1 all SCAN customers
But we can also instruct the planner to first join products
with orders
, and then join the result with customers
.
sql
yellowbrick=# EXPLAIN SETTING (LEADING (products, orders, customers))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
id rows_planned workers node
20 1 all SELECT
1 1 all INNER HASH JOIN ON (customers.id = orders.customer_id)
2 5 all |-INNER HASH JOIN ON (orders.product_id = products.id)
5 10 all | |-SCAN products
| | products.id = bloom(8) AND scan_constraints: min_max(products.id)
8 5 all | |-BUILD
9 5 all | DISTRIBUTE ON HASH(orders.product_id)
11 5 all | SCAN orders
| orders.customer_id = bloom(14) AND scan_constraints: min_max(orders.customer_id)
14 1 all |-BUILD
21 1 all DISTRIBUTE REPLICATE
17 1 all SCAN customers
In both of these, the planner is free to set any table it wants as the probe/build of each join, based on their estimated rowcounts.
Force Join Order and Handedness
Let's now force both order and handedness.
sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
id rows_planned workers node
20 1 all SELECT
1 1 all INNER HASH JOIN ON (products.id = orders.product_id)
2 1 all |-INNER HASH JOIN ON (orders.customer_id = customers.id)
5 1 all | |-SCAN customers
| | customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
8 5 all | |-BUILD
9 5 all | DISTRIBUTE ON HASH(orders.customer_id)
11 5 all | SCAN orders
| orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
14 10 all |-BUILD
21 10 all DISTRIBUTE REPLICATE
17 10 all SCAN products
In the above plan, the planner had no choice and joined customers
with orders
first with orders
as the build. Then, it joined the result with products
, using product as build.
Limitations
The LEADING hint needs to specify the join ordering of the whole plan it is working on. Only specifying two tables out of three will result in the hint being ignored. This results in a few limitations.
join_collapse_limit
and from_collapse_limit
The join_collapse_limit
and from_collapse_limit
parameters control how the planner treats the join order and subqueries when building query execution plans. In particular:
from_collapse_limit
affects whether the planner flattens subqueries in the FROM clause or treats them as separate subqueries.join_collapse_limit
affects the join order optimization, controlling how many join combinations the planner considers when generating a query plan.
When there are more joins in the query than either of these values, one needs one LEADING hint per separated subquery the planner is working with.
If join_collapse_limit
is set to 8 (its default value)
sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
id rows_planned workers node
20 1 all SELECT
1 1 all INNER HASH JOIN ON (products.id = orders.product_id)
2 1 all |-INNER HASH JOIN ON (orders.customer_id = customers.id)
5 1 all | |-SCAN customers
| | customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
8 5 all | |-BUILD
9 5 all | DISTRIBUTE ON HASH(orders.customer_id)
11 5 all | SCAN orders
| orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
14 10 all |-BUILD
21 10 all DISTRIBUTE REPLICATE
17 10 all SCAN products
The three tables are in the same planning group and the hint is correctly applied. Now, let's set join_collapse_limit
to 2. There are now two planning groups: customer with orders and then the result with products [[customers, orders], products]. If we leave the hint as is, the hint is not used and the planner works as usual:
sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
id rows_planned mem_planned mem_fixed mem_actual workers node
21 1 8.00Mi N 8.00Mi 0.00 all SELECT
1 1 0.00 N 0.00 0.00 all INNER HASH JOIN ON (orders.product_id = products.id)
4 10 99.00Mi H 99.00Mi 0.00 all |-SCAN products
| products.id = bloom(7) AND scan_constraints: min_max(products.id)
7 1 45.13Mi N 44.13Mi 0.00 all |-BUILD
8 1 18.00Mi H 18.00Mi 0.00 all DISTRIBUTE ON HASH(orders.product_id)
9 1 0.00 H 0.00 0.00 all INNER HASH JOIN ON (customers.id = orders.customer_id)
12 5 99.00Mi H 99.00Mi 0.00 all |-SCAN orders
| orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
15 1 45.13Mi H 44.13Mi 0.00 all |-BUILD
22 1 16.00Mi H 16.00Mi 0.00 all DISTRIBUTE REPLICATE
18 1 99.00Mi H 99.00Mi 0.00 all SCAN customers
Planner configuration parameters
join_collapse_limit 2
-------------------------------------- Query Hints --------------------------------------
Not used hints:
LEADING( ((customers, orders), products))
To force the join in between customer and orders again, you need to make it its own hint:
sql
yellowbrick=# EXPLAIN(HINT) SETTING (LEADING ((customers, orders)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
id rows_planned workers node
21 1 all SELECT
1 1 all INNER HASH JOIN ON (orders.product_id = products.id)
4 10 all |-SCAN products
| products.id = bloom(7) AND scan_constraints: min_max(products.id)
7 1 all |-BUILD
8 1 all DISTRIBUTE ON HASH(orders.product_id)
9 1 all INNER HASH JOIN ON (orders.customer_id = customers.id)
12 1 all |-SCAN customers
| customers.id = bloom(15) AND scan_constraints: min_max(customers.id)
15 5 all |-BUILD
16 5 all DISTRIBUTE ON HASH(orders.customer_id)
18 5 all SCAN orders
Planner configuration parameters
join_collapse_limit 2
-------------------------------------- Query Hints --------------------------------------
Used hints:
LEADING( (customers, orders))
Genetic Query Optimization
The Genetic Query Optimization (GEQO) prevents use of the LEADING hint as it will not explore all join combinations.