Skip to content

Accelerated GeoJoins

Yellowbrick supports accelerated geospatial joins using the H3 library to perform fast, scalable spatial filtering.
This optimization rewrites eligible inner joins on geography predicates to insert a filtering phase based on H3 cell identifiers, enabling high-performance execution.

This feature is available starting in version 7.3.1 and can be enabled via configuration parameters.


Overview

Joins between large geospatial datasets are expensive due to:

  • Quadratic pairwise comparisons
  • Complex geometry calculations
  • Lack of partitioning and hashing support for GEOGRAPHY types

To address these challenges, Yellowbrick integrates H3, a hierarchical hexagonal indexing system.
Instead of directly comparing geometries, the system breaks shapes into H3 cells, and joins are first performed on these cell identifiers. Candidate matches are then validated using precise geometry functions.

This approach significantly improves the performance and scalability of spatial joins.


How It Works

When a qualifying join is detected, Yellowbrick rewrites the query to:

  1. Generate H3 cell coverage of the input geometries at a specified resolution.
  2. Join inputs on shared H3 cell identifiers (BIGINT values).
  3. Use the original geography predicate as a final filtering step to confirm true spatial relationships.

H3 indexing is dynamically built during query execution. It works with:

  • Base tables
  • Views
  • Subqueries

No precomputed indexes or schema changes are required.


Supported Queries

Accelerated GeoJoins currently apply only to inner joins that use the ST_Intersects predicate.


Configuration Parameters

Use the following configuration parameters to enable or tune this feature:

ParameterDescriptionDefault
enable_geospatial_join_rewriteEnables automatic rewriting of spatial joins using H3-based acceleration.False
geospatial_join_rewrite_resolutionSets the H3 resolution used for cell coverage. Must be an integer between 0 and 15.3

Resolution Guidelines

Choosing the right resolution is critical for performance:

  • The default resolution is 3.
  • Best performance is typically achieved when each polygon is covered by a few hundred H3 cells.
  • You may need to experiment with different resolutions depending on the size, complexity, and distribution of your geospatial data.

Example

When enable_geospatial_join_rewrite is set to true, a standard query such as:

sql
SELECT *
FROM a
JOIN b ON ST_Intersects(a.geo, b.geo)

Is internally rewritten to a join that uses H3 cells for pre-filtering before applying the ST_Intersects predicate.


Advanced Usage: Manual Indexing

In advanced scenarios, users may choose to materialize H3 coverage and manage filtering explicitly. This is useful when performing repeated queries against a known set of polygons

Example

sql
CREATE TABLE idx1 AS
SELECT A.rowunique AS id, A.*, cell
FROM A
JOIN h3_coverage(A.geo, <resolution>, true) USING (cell);

CREATE TABLE idx2 AS
SELECT B.rowunique AS id, B.*, cell
FROM B
JOIN h3_coverage(B.geo, <resolution>, true) USING (cell);

WITH
    common_cells AS (
        SELECT idx1.id AS id1, idx2.id AS id2
        FROM idx1
        JOIN idx2 USING (cell)
    ),  -- Returns all common cells between geographies in A and B
    unique_candidates AS (
        SELECT MAX(idx1.geo) AS geo1, MAX(idx2.geo) AS geo2
        FROM common_cells
        GROUP BY id1, id2
    )  -- Removes duplicates when polygons share multiple cells
SELECT *
FROM unique_candidates
WHERE ST_Intersects(geo1, geo2);

This approach provides full control over indexing, filtering, and performance characteristics.


Notes

  • Additional predicates (e.g., ST_Within, ST_Equals) are planned for future releases.
  • Functions such as H3_COVERAGE, H3_POINT_TO_CELL, and H3_CELL_TO_HEXSTRING are available for direct use in queries.

See Also