Skip to content

Plan Hinting: LEADING Hint

Overview

The LEADING hint allows you to specify the order in which tables are joined, and you can also control which tables are on the probe and build sides of the join. You can enforce both the order and the handedness of the join.

There are two forms of the Join Order hint:

  1. Force join order only:
    sql
    LEADING ( <table_name>, <table_name> [, <table_name>...] )
  2. Force both order and handedness:
    sql
    LEADING ( (<table_name>, <table_name>) [, ...] )

Each set of parenthesis creates a pair representing a join which handedness is set, in the form of (probe, build).

Examples:

Force Join Order Only

In this example, the planner is instructed to first join orders with customers, and then join the result with products.

sql
yellowbrick=# EXPLAIN SETTING (LEADING (orders, customers, products))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 21              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10       all   |-SCAN products
                               |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1       all   |-BUILD
  8              1       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1       all     INNER HASH JOIN ON (customers.id = orders.customer_id)
 12              5       all     |-SCAN orders
                                 |   orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
 15              1       all     |-BUILD
 22              1       all       DISTRIBUTE REPLICATE
 18              1       all       SCAN customers

But we can also instruct the planner to first join products with orders, and then join the result with customers.

sql
yellowbrick=# EXPLAIN SETTING (LEADING (products, orders, customers))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (customers.id = orders.customer_id)
  2              5       all   |-INNER HASH JOIN ON (orders.product_id = products.id)
  5             10       all   | |-SCAN products
                               | |   products.id = bloom(8) AND scan_constraints: min_max(products.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.product_id)
 11              5       all   |   SCAN orders
                               |     orders.customer_id = bloom(14) AND scan_constraints: min_max(orders.customer_id)
 14              1       all   |-BUILD
 21              1       all     DISTRIBUTE REPLICATE
 17              1       all     SCAN customers

In both of these, the planner is free to set any table it wants as the probe/build of each join, based on their estimated rowcounts.

Force Join Order and Handedness

Let's now force both order and handedness.

sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (products.id = orders.product_id)
  2              1       all   |-INNER HASH JOIN ON (orders.customer_id = customers.id)
  5              1       all   | |-SCAN customers
                               | |   customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.customer_id)
 11              5       all   |   SCAN orders
                               |     orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
 14             10       all   |-BUILD
 21             10       all     DISTRIBUTE REPLICATE
 17             10       all     SCAN products

In the above plan, the planner had no choice and joined customers with orders first with orders as the build. Then, it joined the result with products, using product as build.

Limitations

The LEADING hint needs to specify the join ordering of the whole plan it is working on. Only specifying two tables out of three will result in the hint being ignored. This results in a few limitations.

join_collapse_limit and from_collapse_limit

The join_collapse_limit and from_collapse_limit parameters control how the planner treats the join order and subqueries when building query execution plans. In particular:

  • from_collapse_limit affects whether the planner flattens subqueries in the FROM clause or treats them as separate subqueries.
  • join_collapse_limit affects the join order optimization, controlling how many join combinations the planner considers when generating a query plan.

When there are more joins in the query than either of these values, one needs one LEADING hint per separated subquery the planner is working with.

If join_collapse_limit is set to 8 (its default value)

sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (products.id = orders.product_id)
  2              1       all   |-INNER HASH JOIN ON (orders.customer_id = customers.id)
  5              1       all   | |-SCAN customers
                               | |   customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.customer_id)
 11              5       all   |   SCAN orders
                               |     orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
 14             10       all   |-BUILD
 21             10       all     DISTRIBUTE REPLICATE
 17             10       all     SCAN products

The three tables are in the same planning group and the hint is correctly applied. Now, let's set join_collapse_limit to 2. There are now two planning groups: customer with orders and then the result with products [[customers, orders], products]. If we leave the hint as is, the hint is not used and the planner works as usual:

sql
yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 id   rows_planned   mem_planned   mem_fixed   mem_actual   workers   node
 21              1      8.00Mi N      8.00Mi         0.00       all   SELECT
  1              1        0.00 N        0.00         0.00       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10     99.00Mi H     99.00Mi         0.00       all   |-SCAN products
                                                                      |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1     45.13Mi N     44.13Mi         0.00       all   |-BUILD
  8              1     18.00Mi H     18.00Mi         0.00       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1        0.00 H        0.00         0.00       all     INNER HASH JOIN ON (customers.id = orders.customer_id)
 12              5     99.00Mi H     99.00Mi         0.00       all     |-SCAN orders
                                                                        |   orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
 15              1     45.13Mi H     44.13Mi         0.00       all     |-BUILD
 22              1     16.00Mi H     16.00Mi         0.00       all       DISTRIBUTE REPLICATE
 18              1     99.00Mi H     99.00Mi         0.00       all       SCAN customers
 Planner configuration parameters
   join_collapse_limit                     2

 -------------------------------------- Query Hints --------------------------------------

 Not used hints:
         LEADING( ((customers, orders), products))

To force the join in between customer and orders again, you need to make it its own hint:

sql
yellowbrick=# EXPLAIN(HINT)  SETTING (LEADING ((customers, orders)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 21              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10       all   |-SCAN products
                               |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1       all   |-BUILD
  8              1       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1       all     INNER HASH JOIN ON (orders.customer_id = customers.id)
 12              1       all     |-SCAN customers
                                 |   customers.id = bloom(15) AND scan_constraints: min_max(customers.id)
 15              5       all     |-BUILD
 16              5       all       DISTRIBUTE ON HASH(orders.customer_id)
 18              5       all       SCAN orders
 Planner configuration parameters
   join_collapse_limit                     2

 -------------------------------------- Query Hints --------------------------------------

 Used hints:
         LEADING( (customers, orders))

Genetic Query Optimization

The Genetic Query Optimization (GEQO) prevents use of the LEADING hint as it will not explore all join combinations.