Appearance
Plan Hinting: SET Hint
Overview
SET hints allow users to set planner related configuration parameters to influence the query planner's decision-making process. The configuration parameter is applied for the duration of the query only.
sql
<parameter_name> = <parameter_value>
Examples
Setting a configuration parameter for a SELECT statement
The following example sets a parameters in the SETTING
clause.
sql
premdb=# SETTING (ybd_query_tags='tagged_query')
SELECT * FROM foo ORDER BY 1;
Note that the tags that were set in this example are logged in sys.query
and sys.log_query
:
sql
premdb=# SELECT * FROM sys.log_query WHERE tags='tagged_query';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------
query_id | 327395
session_id | 24780
transaction_id | 0
plan_id | 8EjFRXhVcfqyE8H1P+YBUgdcO-XUleSHmHFGw4lrkI4=
state | done
username | yellowbrick
application_name | ybsql
database_name | premdbtype | select
tags | tagged_query
error_code | 00000
error_message | [NULL]
query_text | SETTING (ybd_query_tags='tagged_query') SELECT * FROM foo ORDER BY 1;
...
Setting a configuration parameter for an INSERT statement
The following example sets the same parameter for an INSERT
statement:
sql
premdb=# SETTING (ybd_query_tags='tagged_query')
INSERT INTO foo SELECT x FROM bar;
Setting a configuration parameter for an EXPLAIN statement
The following example sets ybd_query_tags
for an EXPLAIN
statement:
sql
premdb=# EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY')
SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
id rows_planned workers node
1 1 single SELECT
3 1 single AGGREGATE SIMPLE
4 10 all DISTRIBUTE SINGLE
5 10 all AGGREGATE SIMPLE
6 342000 all EXPRESSION
calculate: (NULL cast)
9 342000 all SCAN newmatchstats
(newmatchstats.seasonid::INT4 > $1) AND scan_constraints: (newmatchstats.seasonid > $1)
Database: premdb
Version: 5.3.0-20210825161549
Hostname: yb007-mgr1.sjc.yellowbrick.io
(13 rows)
In turn, you can see the value that was set for ybd_query_tags
in the tags
column of sys.log_query
:
sql
premdb=# SELECT * FROM sys.log_query WHERE tags='COUNT_STAR_QUERY';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------------
query_id | 337496
session_id | 24780
transaction_id | 0
plan_id | 6lgLXILZfBFbqhwrnxWnhLMRxyx-bQfznLd9BPx8wGY=
state | done
username | yellowbrick
application_name | ybsql
database_name | premdb
type | EXPLAIN
tags | COUNT_STAR_QUERY
error_code | 00000
error_message | [NULL]
query_text | EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY') SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
...
Improving performance by preventing join flipping
Let's now assume we have two tables :
orders
: Contains 1 million rows, mostly withcustomer_id = 1
andorder_date = '1997-03-21'
customers
: Contains 1000 rows, with distinctcustomer_id
values.
If you want to retrieve a list of all customers along with their recent orders (if any), you might write a query like this:
sql
SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';
Usually, the planner should start by scanning customers as it is the smaller table and then join it with order and will flip the LEFT into a RIGHT join internally:
sql
yellowbrick=# EXPLAIN(ANALYZE)
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';
QUERY PLAN
--------------------------------------------------------------------------------------------------------
id rows_planned rows_actual workers node
13 34133 1000 all SELECT
1 34133 1000 all RIGHT OUTER HASH JOIN ON (c.customer_id = o.customer_id)
2 33333 10 all |-DISTRIBUTE ON HASH(o.customer_id)
4 33333 10 all | SCAN orders AS o
| (o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)
7 1000 1000 all |-BUILD
14 1000 1000 all EXPRESSION
calculate: (NULL cast)
10 1000 1000 all SCAN customers AS c
But here, the data is highly skewed and the filter WHERE o.order_date >= '2024-01-01'
will actually remove almost all rows from the orders table. It is therefore better to force the planner to use a LEFT join by forbidding RIGHT joins. Let's set two parameters : forbidding right joins to improve performances and a tag, so we can easily identify the new plan in sys.log_query
. Note that they must be separated by a semicolon:
sql
yellowbrick=# EXPLAIN(ANALYZE) SETTING(enable_right_joins=false, ybd_query_tags='force_left_join')
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';
QUERY PLAN
--------------------------------------------------------------------------------------------------------
id rows_planned rows_actual workers node
13 33333 1000 all SELECT
1 33333 1000 all LEFT OUTER HASH JOIN ON (o.customer_id = c.customer_id)
4 1000 1000 all |-SCAN customers AS c
7 10 10 all |-BUILD
14 10 10 all EXPRESSION
calculate: (NULL cast)
8 33333 10 all DISTRIBUTE ON HASH(o.customer_id)
10 33333 10 all SCAN orders AS o
(o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)
Planner configuration parameters
enable_right_joins off