Join Elimination Optimization

Overview

Join elimination is a query optimization in which the planner removes joins that can be proven unnecessary. Since joins are among the most expensive SQL operations, eliminating them can significantly improve query performance.

Join elimination applies to both INNER and OUTER joins in Yellowbrick. It relies on declared PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints to prove that removing a join does not change the query’s results.

Example Schema

The following schema will be used in the examples below:

sql

CREATE TABLE customer (
  customer_id INT PRIMARY KEY,
  name        VARCHAR(20) NOT NULL
);

CREATE TABLE address (
  address_id  INT PRIMARY KEY,
  customer_id INT UNIQUE,
  street      VARCHAR(50),
  CONSTRAINT fk_address_customer FOREIGN KEY (customer_id)
    REFERENCES customer(customer_id)
);

CREATE TABLE payment (
  payment_id  INT PRIMARY KEY,
  customer_id INT,
  amount      INT NOT NULL,
  CONSTRAINT fk_payment_customer FOREIGN KEY (customer_id)
    REFERENCES customer(customer_id)
);

customer → root entity
address → one-to-one relationship (unique per customer)
payment → one-to-many relationship (a customer may have multiple payments)

When Join Elimination Applies

INNER JOIN elimination

If a child table joins to its parent on a declared foreign key, and no columns from the parent are projected or filtered, the join can be removed.

Example:

sql

SELECT c.name
FROM customer c
JOIN address a ON c.customer_id = a.customer_id;

The join can be eliminated since the address table does not contribute to the result.

LEFT OUTER JOIN elimination

One-to-one case: If the inner table has a UNIQUE or PRIMARY KEY on the join column, the join cannot duplicate rows from the outer table.
With DISTINCT: If the outer relation’s output is made distinct (via DISTINCT or grouping), duplicates from one-to-many relationships can be eliminated.

Examples:

sql

-- One-to-one join
SELECT c.name
FROM customer c
LEFT JOIN address a ON c.customer_id = a.customer_id;

-- DISTINCT removes duplicates
SELECT DISTINCT c.name
FROM customer c
LEFT JOIN payment p ON c.customer_id = p.customer_id;

Insert valid data:

sql

INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO address  VALUES (101, 1, 'Main St'), (102, 2, '2nd Ave');
INSERT INTO payment  VALUES (201, 1, 100), (202, 2, 200), (203, 2, 300);

Results:

INNER JOIN (customer ↔ address): returns all customers.
LEFT JOIN (customer ↔ address): returns all customers with their addresses.
LEFT JOIN + DISTINCT (customer ↔ payment): returns Alice and Bob once each.

In this case, join elimination is correct and improves performance.

Risk of Wrong Results

Because Yellowbrick does not enforce PRIMARY KEY, FOREIGN KEY, or UNIQUE constraints, the planner assumes declared keys are valid. If constraints are not valid, join elimination may produce wrong results:

Non-matching rows may appear.
Rows that should be preserved with NULL values may disappear.
Duplicates may be introduced or lost.

This is a semantic error, not just a performance issue.

Examples:

Now insert invalid data:

sql

-- Orphaned address (customer_id=99 does not exist)
INSERT INTO address VALUES (103, 99, 'Ghost Lane');

-- Payment with invalid customer
INSERT INTO payment VALUES (204, 77, 500);

Effects:

INNER JOIN (customer ↔ address):
- Expected: only valid customer-address pairs.
- With elimination: orphaned address rows may appear as if valid.
LEFT JOIN (customer ↔ address):
- Expected: Bob appears with NULL if no address exists.
- With elimination: Bob’s row may disappear entirely.
LEFT JOIN + DISTINCT (customer ↔ payment):
- Expected: one row per valid customer.
- With elimination: phantom customers may appear, or duplicates may be miscounted.

Disabling Join Elimination

If constraint validity cannot be guaranteed, you can disable join elimination with the configuration parameter enable_join_elimination.

sql

SET enable_join_elimination = off;

Workload Management

Distributing Data

Bulk Loading Tables

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Load Data with SQL

Loading Data from Object Storage

Loading from Amazon S3

Loading from Azure Blob Storage

Loading Tables with Spark

Setting up and Running a Spark Job

Setting Up the ybrelay Service

Trickle Loading Data via JDBC

Unloading Data to Object Storage

Unloading Data to Parquet Files

ybunload Command

Installing ybtools

Setting Up a Database Connection

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

Appliance

Appliance: Disk Encryption

Setting Up Encrypted Drives

Observability

Observability Metrics

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

Cloud

Configuring

Vanity DNS

Yellowbrick Manager

Installing

CLI Install Instructions

Public Install Instructions

Private Install Instructions

Self-Managed Install Instructions

Permissions

Kubernetes Guides

Observability

Observability Alerts

Observability Metrics

Databases

Backup & Restore

Overview

ybbackup Commands

ybbackupctl Commands

ybrestore Commands

Database Replication

Managing Replication

Setting Up Replication

Encrypting Sensitive Data

LDAP Integration

LDAP Authentication

Synchronizing Users and Groups

Metering

System Views

sys.lock

Workload Management

Creating WLM Resource Pools

Creating WLM Rules

Compatibility Parameters

Data Processing and Formatting

Feature Enablement

General

Tuning

Yellowbrick Row Store (YRS) Alerting Parameters

ybsql \copy Command

ybsql Properties and Variables

SQL Commands

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

Join Elimination Optimization

Overview

Example Schema

When Join Elimination Applies

INNER JOIN elimination

LEFT OUTER JOIN elimination

Examples:

Risk of Wrong Results

Examples:

Disabling Join Elimination

See Also