Skip to content

Plan Hinting: SET Hint

Overview

SET hints allow users to set planner related configuration parameters to influence the query planner's decision-making process. The configuration parameter is applied for the duration of the query only.

sql
<parameter_name> = <parameter_value>

Examples

Setting a configuration parameter for a SELECT statement

The following example sets a parameters in the SETTING clause.

sql
premdb=# SETTING (ybd_query_tags='tagged_query')
SELECT * FROM foo ORDER BY 1;

Note that the tags that were set in this example are logged in sys.query and sys.log_query:

sql
premdb=# SELECT * FROM sys.log_query WHERE tags='tagged_query';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------
query_id                     | 327395
session_id                   | 24780
transaction_id               | 0
plan_id                      | 8EjFRXhVcfqyE8H1P+YBUgdcO-XUleSHmHFGw4lrkI4=
state                        | done
username                     | yellowbrick
application_name             | ybsql
database_name                | premdbtype                         | select
tags                         | tagged_query
error_code                   | 00000
error_message                | [NULL]
query_text                   | SETTING (ybd_query_tags='tagged_query') SELECT * FROM foo ORDER BY 1;
...

Setting a configuration parameter for an INSERT statement

The following example sets the same parameter for an INSERT statement:

sql
premdb=# SETTING (ybd_query_tags='tagged_query')
INSERT INTO foo SELECT x FROM bar;

Setting a configuration parameter for an EXPLAIN statement

The following example sets ybd_query_tags for an EXPLAIN statement:

sql
premdb=# EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY')
SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
  1              1    single   SELECT
  3              1    single   AGGREGATE SIMPLE
  4             10       all   DISTRIBUTE SINGLE
  5             10       all   AGGREGATE SIMPLE
  6         342000       all   EXPRESSION
                                calculate: (NULL cast)
  9         342000       all   SCAN newmatchstats
                                (newmatchstats.seasonid::INT4 > $1) AND scan_constraints: (newmatchstats.seasonid > $1)
 Database: premdb
 Version: 5.3.0-20210825161549
 Hostname: yb007-mgr1.sjc.yellowbrick.io

(13 rows)

In turn, you can see the value that was set for ybd_query_tags in the tags column of sys.log_query:

sql
premdb=# SELECT * FROM sys.log_query WHERE tags='COUNT_STAR_QUERY';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------------
query_id                     | 337496
session_id                   | 24780
transaction_id               | 0
plan_id                      | 6lgLXILZfBFbqhwrnxWnhLMRxyx-bQfznLd9BPx8wGY=
state                        | done
username                     | yellowbrick
application_name             | ybsql
database_name                | premdb
type                         | EXPLAIN
tags                         | COUNT_STAR_QUERY
error_code                   | 00000
error_message                | [NULL]
query_text                   | EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY') SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
...

Improving performance by preventing join flipping

Let's now assume we have two tables :

  • orders: Contains 1 million rows, mostly with customer_id = 1 and order_date = '1997-03-21'
  • customers: Contains 1000 rows, with distinct customer_id values.

If you want to retrieve a list of all customers along with their recent orders (if any), you might write a query like this:

sql
SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

Usually, the planner should start by scanning customers as it is the smaller table and then join it with order and will flip the LEFT into a RIGHT join internally:

sql
yellowbrick=# EXPLAIN(ANALYZE)
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 id   rows_planned   rows_actual   workers   node
 13          34133          1000       all   SELECT
  1          34133          1000       all   RIGHT OUTER HASH JOIN ON (c.customer_id = o.customer_id)
  2          33333            10       all   |-DISTRIBUTE ON HASH(o.customer_id)
  4          33333            10       all   | SCAN orders AS o
                                             |   (o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)
  7           1000          1000       all   |-BUILD
 14           1000          1000       all     EXPRESSION
                                                 calculate: (NULL cast)
 10           1000          1000       all     SCAN customers AS c

But here, the data is highly skewed and the filter WHERE o.order_date >= '2024-01-01' will actually remove almost all rows from the orders table. It is therefore better to force the planner to use a LEFT join by forbidding RIGHT joins. Let's set two parameters : forbidding right joins to improve performances and a tag, so we can easily identify the new plan in sys.log_query. Note that they must be separated by a semicolon:

sql
yellowbrick=# EXPLAIN(ANALYZE) SETTING(enable_right_joins=false, ybd_query_tags='force_left_join')
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 id   rows_planned   rows_actual   workers   node
 13          33333          1000       all   SELECT
  1          33333          1000       all   LEFT OUTER HASH JOIN ON (o.customer_id = c.customer_id)
  4           1000          1000       all   |-SCAN customers AS c
  7             10            10       all   |-BUILD
 14             10            10       all     EXPRESSION
                                                 calculate: (NULL cast)
  8          33333            10        all    DISTRIBUTE ON HASH(o.customer_id)
 10          33333            10        all    SCAN orders AS o
                                                 (o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)

Planner configuration parameters
   enable_right_joins                      off