Plan Hinting: LEADING Hint

Overview

The LEADING hint allows you to specify the order in which tables are joined, and you can also control which tables are on the probe and build sides of the join. You can enforce both the order and the handedness of the join.

There are two forms of the Join Order hint:

Force join order only:

sql

LEADING ( <table_name>, <table_name> [, <table_name>...] )

Force both order and handedness:

sql

LEADING ( (<table_name>, <table_name>) [, ...] )

Each set of parenthesis creates a pair representing a join which handedness is set, in the form of (probe, build).

Examples:

Force Join Order Only

In this example, the planner is instructed to first join orders with customers, and then join the result with products.

sql

yellowbrick=# EXPLAIN SETTING (LEADING (orders, customers, products))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 21              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10       all   |-SCAN products
                               |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1       all   |-BUILD
  8              1       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1       all     INNER HASH JOIN ON (customers.id = orders.customer_id)
 12              5       all     |-SCAN orders
                                 |   orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
 15              1       all     |-BUILD
 22              1       all       DISTRIBUTE REPLICATE
 18              1       all       SCAN customers

But we can also instruct the planner to first join products with orders, and then join the result with customers.

sql

yellowbrick=# EXPLAIN SETTING (LEADING (products, orders, customers))
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (customers.id = orders.customer_id)
  2              5       all   |-INNER HASH JOIN ON (orders.product_id = products.id)
  5             10       all   | |-SCAN products
                               | |   products.id = bloom(8) AND scan_constraints: min_max(products.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.product_id)
 11              5       all   |   SCAN orders
                               |     orders.customer_id = bloom(14) AND scan_constraints: min_max(orders.customer_id)
 14              1       all   |-BUILD
 21              1       all     DISTRIBUTE REPLICATE
 17              1       all     SCAN customers

In both of these, the planner is free to set any table it wants as the probe/build of each join, based on their estimated rowcounts.

Force Join Order and Handedness

Let's now force both order and handedness.

sql

yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (products.id = orders.product_id)
  2              1       all   |-INNER HASH JOIN ON (orders.customer_id = customers.id)
  5              1       all   | |-SCAN customers
                               | |   customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.customer_id)
 11              5       all   |   SCAN orders
                               |     orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
 14             10       all   |-BUILD
 21             10       all     DISTRIBUTE REPLICATE
 17             10       all     SCAN products

In the above plan, the planner had no choice and joined customers with orders first with orders as the build. Then, it joined the result with products, using product as build.

Limitations

The LEADING hint needs to specify the join ordering of the whole plan it is working on. Only specifying two tables out of three will result in the hint being ignored. This results in a few limitations.

`join_collapse_limit` and `from_collapse_limit`

The join_collapse_limit and from_collapse_limit parameters control how the planner treats the join order and subqueries when building query execution plans. In particular:

from_collapse_limit affects whether the planner flattens subqueries in the FROM clause or treats them as separate subqueries.
join_collapse_limit affects the join order optimization, controlling how many join combinations the planner considers when generating a query plan.

When there are more joins in the query than either of these values, one needs one LEADING hint per separated subquery the planner is working with.

If join_collapse_limit is set to 8 (its default value)

sql

yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 20              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (products.id = orders.product_id)
  2              1       all   |-INNER HASH JOIN ON (orders.customer_id = customers.id)
  5              1       all   | |-SCAN customers
                               | |   customers.id = bloom(8) AND scan_constraints: min_max(customers.id)
  8              5       all   | |-BUILD
  9              5       all   |   DISTRIBUTE ON HASH(orders.customer_id)
 11              5       all   |   SCAN orders
                               |     orders.product_id = bloom(14) AND scan_constraints: min_max(orders.product_id)
 14             10       all   |-BUILD
 21             10       all     DISTRIBUTE REPLICATE
 17             10       all     SCAN products

The three tables are in the same planning group and the hint is correctly applied. Now, let's set join_collapse_limit to 2. There are now two planning groups: customer with orders and then the result with products [[customers, orders], products]. If we leave the hint as is, the hint is not used and the planner works as usual:

sql

yellowbrick=# EXPLAIN SETTING (LEADING (((customers, orders), products)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 id   rows_planned   mem_planned   mem_fixed   mem_actual   workers   node
 21              1      8.00Mi N      8.00Mi         0.00       all   SELECT
  1              1        0.00 N        0.00         0.00       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10     99.00Mi H     99.00Mi         0.00       all   |-SCAN products
                                                                      |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1     45.13Mi N     44.13Mi         0.00       all   |-BUILD
  8              1     18.00Mi H     18.00Mi         0.00       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1        0.00 H        0.00         0.00       all     INNER HASH JOIN ON (customers.id = orders.customer_id)
 12              5     99.00Mi H     99.00Mi         0.00       all     |-SCAN orders
                                                                        |   orders.customer_id = bloom(15) AND scan_constraints: min_max(orders.customer_id)
 15              1     45.13Mi H     44.13Mi         0.00       all     |-BUILD
 22              1     16.00Mi H     16.00Mi         0.00       all       DISTRIBUTE REPLICATE
 18              1     99.00Mi H     99.00Mi         0.00       all       SCAN customers
 Planner configuration parameters
   join_collapse_limit                     2

 -------------------------------------- Query Hints --------------------------------------

 Not used hints:
         LEADING( ((customers, orders), products))

To force the join in between customer and orders again, you need to make it its own hint:

sql

yellowbrick=# EXPLAIN(HINT)  SETTING (LEADING ((customers, orders)) )
yellowbrick-# SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id JOIN products ON products.id = orders.product_id;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
 21              1       all   SELECT
  1              1       all   INNER HASH JOIN ON (orders.product_id = products.id)
  4             10       all   |-SCAN products
                               |   products.id = bloom(7) AND scan_constraints: min_max(products.id)
  7              1       all   |-BUILD
  8              1       all     DISTRIBUTE ON HASH(orders.product_id)
  9              1       all     INNER HASH JOIN ON (orders.customer_id = customers.id)
 12              1       all     |-SCAN customers
                                 |   customers.id = bloom(15) AND scan_constraints: min_max(customers.id)
 15              5       all     |-BUILD
 16              5       all       DISTRIBUTE ON HASH(orders.customer_id)
 18              5       all       SCAN orders
 Planner configuration parameters
   join_collapse_limit                     2

 -------------------------------------- Query Hints --------------------------------------

 Used hints:
         LEADING( (customers, orders))

Genetic Query Optimization

The Genetic Query Optimization (GEQO) prevents use of the LEADING hint as it will not explore all join combinations.

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

sys.lock

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Loading from Amazon S3

Loading from Azure Blob Storage

Setting up and Running a Spark Job

Setting Up the ybrelay Service

LDAP Authentication

Synchronizing Users and Groups

Appliance: Disk Encryption

Setting Up Encrypted Drives

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

AWS Marketplace

Create Stack

Docker

Cloud: Configuration

Vanity DNS

Yellowbrick Manager

Cloud: Enterprise Edition Getting Started

SQL-Based Loads from External Storage

Cloud: Installation

CLI Install Instructions

Permissions

Private Install Instructions

Public Install Instructions

Cloud: Kubernetes Guides

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

SELECT

GROUP BY Clause

Subqueries

Data Type Casting

DECIMAL

JSON

JSONB

SQL String Constants

Aggregate Functions

Conditional Expressions

Datetime Functions

Formatting Functions

Geospatial functions

Mathematical Functions

Network Address Functions

Pattern Matching

Regular Expression Details

SQL Operators and Pattern Matching Functions

SQL Conditions

SQL User Defined Function (UDF)

SQL UDF Create Function

String Functions

ENCRYPT_KS

System Functions

Type-Safe Casting Functions

Window Functions

Creating WLM Resource Pools

Creating WLM Rules

Rule Examples

Plan Hinting: LEADING Hint ​

Overview ​

Examples: ​

Force Join Order Only ​

Force Join Order and Handedness ​

Limitations ​

join_collapse_limit and from_collapse_limit ​

Genetic Query Optimization ​