Skip to content

Using the Geospatial Functions (Pre-release)

Yellowbrick supports PostgreSQL-standard geospatial functions that enable spatial queries. This tutorial will guide you through, using some of these functions with NOAA's GHCN (Global Historical Climatology Network) weather data set. This tutorial will also specifically analyze data from observation stations within a 20 km radius of Raleigh, North Carolina, USA.

INFO

The geospatial functions are in public preview now, and so are disabled by default. The functions can be enabled with the configuration variable enable_geospatial, which can be applied at the system-level, session-level, user-level or query-level.

In this tutorial, you will incrementally build queries by using basic to advanced geospatial functions.

Prerequisites

Section 1: Understanding Spatial Data in Yellowbrick

The stations table stores the locations of weather stations. These locations are stored in the form of a latitude and longitude field for each station. To convert the station locations into spatial data types, run the following SQL statement:

Query: Converting Station Locations to Geospatial Points

sql
SELECT 
    id, 
    name, 
    ST_AsText(ST_Point(longitude, latitude)) AS geom
FROM noaa_ghcn_pds.stations
WHERE state = 'NC';

Explanation

  • ST_Point(longitude, latitude): Converts longitude and latitude into a geometry point.
  • Filters stations only in North Carolina (NC).

Section 2: Finding Stations Within a 20 km Radius of Raleigh, NC

Raleigh, NC has an approximate latitude and longitude of (35.7796, -78.6382).

Query: Find Nearby Weather Stations

sql
SELECT 
    st.id, 
    st.name, 
    st.latitude, 
    st.longitude
FROM noaa_ghcn_pds.stations st
WHERE ST_DWithin(
    ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'),  -- Raleigh, NC
    ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')::BOOLEAN,
    20000  -- 20 km radius
);

Explanation

  • ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'): Defines Raleigh, NC as a point in SRID 4326 (WGS 84).
  • ST_DWithin(A, B, 20000): Returns true if station B is within 20,000 meters (20 km) of A.

Section 3: Checking If Raleigh Covers a Weather Station

Instead of checking if a station is within a radius, you will check if Raleigh's geographical boundary covers a station.

Query: Checking If Raleigh Covers a Station

sql
SELECT 
    st.id, 
    st.name, 
    ST_Covers(
        ST_Buffer(ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'), 20000), 
        ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')
    )::BOOLEAN AS is_covered
FROM noaa_ghcn_pds.stations st;

Explanation

  • ST_Buffer(Raleigh, 20000): Creates a 20 km radius boundary around Raleigh.
  • ST_Covers(A, B): Checks if Raleigh's buffered area accommodates the station's point.

Section 4: Finding Precipitation Observations in the Area

Now, let's find the PRCP (precipitation) observations from the stations within 20 km of Raleigh.

Query: Get PRCP Observations in the Area

sql
SELECT 
    obs.id, 
    st.name, 
    obs.element, 
    obs.data_value / 254.0 as rainfall_inches
FROM noaa_ghcn_pds.observations obs
JOIN noaa_ghcn_pds.stations st 
    ON obs.id = st.id
WHERE obs.element = 'PRCP'
AND ST_DWithin(
    ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'), 
    ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')'), 
    20000
)
AND obs.year_date BETWEEN 20240101 AND 20241231;

Explanation

  • Finds PRCP (precipitation) observations for the year 2024 (BETWEEN 20240101 AND 20241231).
  • Uses ST_DWithin(A, B, 20000) to filter only stations within 20 km of Raleigh.

Section 5: Intersecting a Custom Polygon Boundary (Advanced)

Let's define a polygonal area around Raleigh and find stations inside it.

Query: Checking Intersections with a Polygon

sql
SELECT 
    st.id, 
    st.name, 
    ST_Intersects(
        ST_GeogFromText('SRID=4326;POLYGON((-78.7 35.75, -78.6 35.75, -78.6 35.80, -78.7 35.80, -78.7 35.75))'), 
        ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')
    ) AS intersects_polygon
FROM noaa_ghcn_pds.stations st
WHERE intersects_polygon;

Explanation

  • Defines a polygon boundary around Raleigh.
  • Uses ST_Intersects(A, B): Returns true if a station's point B intersects the polygon A.

Section 6: Visualizing the Spatial Data

The prior result set can be rendered in a tool such as DBeaver Enterprise Edition or JetBrains DataGrip. However, this result set requires some modifications to the SQL statements. It is to account for the limitation that these tools only support the GEOMETRY spatial data type, and Yellowbrick only supports the GEOGRAPHY data type. In DataGrip, connect to Yellowbrick by using the PostgreSQL driver and database containing the NOAA data, and run the following SQL statement:

sql
SELECT
    st.id,
    st.name,
    ST_AsText(ST_GeogFromText('SRID=4326;POLYGON((-78.7 35.75, -78.6 35.75, -78.6 35.80, -78.7 35.80, -78.7 35.75))')) boundary,
    ST_AsText(ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')) stations
FROM noaa_ghcn_pds.stations st
where ST_Contains(boundary::geography, stations::geography)

Switch to DataGrip's Geo Viewer to display the bounding polygon around Raleigh and the stations contained within.

Note: Convert the boundary and stations GEOGRAPHY values into their OGC Well-Known Text (WKT) values. It allows rendering within Geo Viewer and filter on those stations contained within the boundary.

DataGrip Geo View

Function Description Table

Query GoalFunction UsedDescription
Convert coordinates to geospatial pointsST_Point()Prepares data for geospatial queries
Find stations within 20 kmST_DWithin()Radius-based filtering
Check if Raleigh's area covers a stationST_Covers()Works with buffered regions
Get PRCP observations near RaleighST_DWithin()Joins weather data with station locations
Find stations inside a polygonST_Intersects()Works with custom boundaries
Render spatial dataST_Contains()Displays stations within a polygon