Plan Hinting: SET Hint

Overview

SET hints allow users to set planner related configuration parameters to influence the query planner's decision-making process. The configuration parameter is applied for the duration of the query only.

sql

<parameter_name> = <parameter_value>

Examples

Setting a configuration parameter for a SELECT statement

The following example sets a parameters in the SETTING clause.

sql

premdb=# SETTING (ybd_query_tags='tagged_query')
SELECT * FROM foo ORDER BY 1;

Note that the tags that were set in this example are logged in sys.query and sys.log_query:

sql

premdb=# SELECT * FROM sys.log_query WHERE tags='tagged_query';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------
query_id                     | 327395
session_id                   | 24780
transaction_id               | 0
plan_id                      | 8EjFRXhVcfqyE8H1P+YBUgdcO-XUleSHmHFGw4lrkI4=
state                        | done
username                     | yellowbrick
application_name             | ybsql
database_name                | premdbtype                         | select
tags                         | tagged_query
error_code                   | 00000
error_message                | [NULL]
query_text                   | SETTING (ybd_query_tags='tagged_query') SELECT * FROM foo ORDER BY 1;
...

Setting a configuration parameter for an INSERT statement

The following example sets the same parameter for an INSERT statement:

sql

premdb=# SETTING (ybd_query_tags='tagged_query')
INSERT INTO foo SELECT x FROM bar;

Setting a configuration parameter for an EXPLAIN statement

The following example sets ybd_query_tags for an EXPLAIN statement:

sql

premdb=# EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY')
SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node
  1              1    single   SELECT
  3              1    single   AGGREGATE SIMPLE
  4             10       all   DISTRIBUTE SINGLE
  5             10       all   AGGREGATE SIMPLE
  6         342000       all   EXPRESSION
                                calculate: (NULL cast)
  9         342000       all   SCAN newmatchstats
                                (newmatchstats.seasonid::INT4 > $1) AND scan_constraints: (newmatchstats.seasonid > $1)
 Database: premdb
 Version: 5.3.0-20210825161549
 Hostname: yb007-mgr1.sjc.yellowbrick.io

(13 rows)

In turn, you can see the value that was set for ybd_query_tags in the tags column of sys.log_query:

sql

premdb=# SELECT * FROM sys.log_query WHERE tags='COUNT_STAR_QUERY';
-[ RECORD 1 ]----------------+----------------------------------------------------------------------------------------------------------
query_id                     | 337496
session_id                   | 24780
transaction_id               | 0
plan_id                      | 6lgLXILZfBFbqhwrnxWnhLMRxyx-bQfznLd9BPx8wGY=
state                        | done
username                     | yellowbrick
application_name             | ybsql
database_name                | premdb
type                         | EXPLAIN
tags                         | COUNT_STAR_QUERY
error_code                   | 00000
error_message                | [NULL]
query_text                   | EXPLAIN SETTING (ybd_query_tags='COUNT_STAR_QUERY') SELECT COUNT(*) FROM newmatchstats WHERE seasonid>12;
...

Improving performance by preventing join flipping

Let's now assume we have two tables :

orders: Contains 1 million rows, mostly with customer_id = 1 and order_date = '1997-03-21'
customers: Contains 1000 rows, with distinct customer_id values.

If you want to retrieve a list of all customers along with their recent orders (if any), you might write a query like this:

sql

SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

Usually, the planner should start by scanning customers as it is the smaller table and then join it with order and will flip the LEFT into a RIGHT join internally:

sql

yellowbrick=# EXPLAIN(ANALYZE)
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 id   rows_planned   rows_actual   workers   node
 13          34133          1000       all   SELECT
  1          34133          1000       all   RIGHT OUTER HASH JOIN ON (c.customer_id = o.customer_id)
  2          33333            10       all   |-DISTRIBUTE ON HASH(o.customer_id)
  4          33333            10       all   | SCAN orders AS o
                                             |   (o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)
  7           1000          1000       all   |-BUILD
 14           1000          1000       all     EXPRESSION
                                                 calculate: (NULL cast)
 10           1000          1000       all     SCAN customers AS c

But here, the data is highly skewed and the filter WHERE o.order_date >= '2024-01-01' will actually remove almost all rows from the orders table. It is therefore better to force the planner to use a LEFT join by forbidding RIGHT joins. Let's set two parameters : forbidding right joins to improve performances and a tag, so we can easily identify the new plan in sys.log_query. Note that they must be separated by a semicolon:

sql

yellowbrick=# EXPLAIN(ANALYZE) SETTING(enable_right_joins=false, ybd_query_tags='force_left_join')
yellowbrick-# SELECT c.customer_id, c.name, o.order_id, o.order_date
yellowbrick-# FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id AND o.order_date >= '2024-01-01';

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 id   rows_planned   rows_actual   workers   node
 13          33333          1000       all   SELECT
  1          33333          1000       all   LEFT OUTER HASH JOIN ON (o.customer_id = c.customer_id)
  4           1000          1000       all   |-SCAN customers AS c
  7             10            10       all   |-BUILD
 14             10            10       all     EXPRESSION
                                                 calculate: (NULL cast)
  8          33333            10        all    DISTRIBUTE ON HASH(o.customer_id)
 10          33333            10        all    SCAN orders AS o
                                                 (o.order_date >= $0) AND scan_constraints: (orders.order_date >= $0)

Planner configuration parameters
   enable_right_joins                      off

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

sys.lock

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Loading from Amazon S3

Loading from Azure Blob Storage

Setting up and Running a Spark Job

Setting Up the ybrelay Service

LDAP Authentication

Synchronizing Users and Groups

Appliance: Disk Encryption

Setting Up Encrypted Drives

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

AWS Marketplace

Create Stack

Docker

Cloud: Configuration

Vanity DNS

Yellowbrick Manager

Cloud: Enterprise Edition Getting Started

SQL-Based Loads from External Storage

Cloud: Installation

CLI Install Instructions

Permissions

Private Install Instructions

Public Install Instructions

Cloud: Kubernetes Guides

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

SELECT

GROUP BY Clause

Subqueries

Data Type Casting

DECIMAL

JSON

JSONB

SQL String Constants

Aggregate Functions

Conditional Expressions

Datetime Functions

Formatting Functions

Geospatial functions

Mathematical Functions

Network Address Functions

Pattern Matching

Regular Expression Details

SQL Operators and Pattern Matching Functions

SQL Conditions

SQL User Defined Function (UDF)

SQL UDF Create Function

String Functions

ENCRYPT_KS

System Functions

Type-Safe Casting Functions

Window Functions

Creating WLM Resource Pools

Creating WLM Rules

Rule Examples

Plan Hinting: SET Hint ​

Overview ​

Examples ​

Setting a configuration parameter for a SELECT statement ​

Setting a configuration parameter for an INSERT statement ​

Setting a configuration parameter for an EXPLAIN statement ​

Improving performance by preventing join flipping ​