Appearance
Accelerated GeoJoins
Yellowbrick supports accelerated geospatial joins using the H3 library to perform fast, scalable spatial filtering.
This optimization rewrites eligible inner joins on geography predicates to insert a filtering phase based on H3 cell identifiers, enabling high-performance execution.
This feature is available starting in version 7.3.1 and can be enabled via configuration parameters.
Overview
Joins between large geospatial datasets are expensive due to:
- Quadratic pairwise comparisons
- Complex geometry calculations
- Lack of partitioning and hashing support for
GEOGRAPHYtypes
To address these challenges, Yellowbrick integrates H3, a hierarchical hexagonal indexing system.
Instead of directly comparing geometries, the system breaks shapes into H3 cells, and joins are first performed on these cell identifiers. Candidate matches are then validated using precise geometry functions.
This approach significantly improves the performance and scalability of spatial joins.
How It Works
When a qualifying join is detected, Yellowbrick rewrites the query to:
- Generate H3 cell coverage of the input geometries at a specified resolution.
- Join inputs on shared H3 cell identifiers (
BIGINTvalues). - Use the original geography predicate as a final filtering step to confirm true spatial relationships.
H3 indexing is dynamically built during query execution. It works with:
- Base tables
- Views
- Subqueries
No precomputed indexes or schema changes are required.
Supported Queries
Accelerated GeoJoins currently apply only to inner joins that use the ST_Intersects predicate.
Configuration Parameters
Use the following configuration parameters to enable or tune this feature:
| Parameter | Description | Default |
|---|---|---|
enable_geospatial_join_rewrite | Enables automatic rewriting of spatial joins using H3-based acceleration. | False |
geospatial_join_rewrite_resolution | Sets the H3 resolution used for cell coverage. Must be an integer between 0 and 15. | 3 |
Resolution Guidelines
Choosing the right resolution is critical for performance:
- The default resolution is
3. - Best performance is typically achieved when each polygon is covered by a few hundred H3 cells.
- You may need to experiment with different resolutions depending on the size, complexity, and distribution of your geospatial data.
Example
When enable_geospatial_join_rewrite is set to true, a standard query such as:
sql
SELECT *
FROM a
JOIN b ON ST_Intersects(a.geo, b.geo)Is internally rewritten to a join that uses H3 cells for pre-filtering before applying the ST_Intersects predicate.
Advanced Usage: Manual Indexing
In advanced scenarios, users may choose to materialize H3 coverage and manage filtering explicitly. This is useful when performing repeated queries against a known set of polygons
Example
sql
CREATE TABLE idx1 AS
SELECT A.rowunique AS id, A.*, cell
FROM A
JOIN h3_coverage(A.geo, <resolution>, true) USING (cell);
CREATE TABLE idx2 AS
SELECT B.rowunique AS id, B.*, cell
FROM B
JOIN h3_coverage(B.geo, <resolution>, true) USING (cell);
WITH
common_cells AS (
SELECT idx1.id AS id1, idx2.id AS id2
FROM idx1
JOIN idx2 USING (cell)
), -- Returns all common cells between geographies in A and B
unique_candidates AS (
SELECT MAX(idx1.geo) AS geo1, MAX(idx2.geo) AS geo2
FROM common_cells
GROUP BY id1, id2
) -- Removes duplicates when polygons share multiple cells
SELECT *
FROM unique_candidates
WHERE ST_Intersects(geo1, geo2);This approach provides full control over indexing, filtering, and performance characteristics.
Notes
- Additional predicates (e.g.,
ST_Within,ST_Equals) are planned for future releases. - Functions such as
H3_COVERAGE,H3_POINT_TO_CELL, andH3_CELL_TO_HEXSTRINGare available for direct use in queries.
See Also