Appearance
Using the Geospatial Functions (Pre-release)
Yellowbrick supports PostgreSQL-standard geospatial functions that enable spatial queries. This tutorial will guide you through, using some of these functions with NOAA's GHCN (Global Historical Climatology Network) weather data set. This tutorial will also specifically analyze data from observation stations within a 20 km radius of Raleigh, North Carolina, USA.
INFO
The geospatial functions are in public preview now, and so are disabled by default. The functions can be enabled with the configuration variable enable_geospatial
, which can be applied at the system-level, session-level, user-level or query-level.
In this tutorial, you will incrementally build queries by using basic to advanced geospatial functions.
Prerequisites
- Pre-loaded NOAA data
Note: To load the NOAA data, go to page Using Ask TK to Generate SQL, Part 2: Loading the Sample Data and follow the steps to load the data using Yellowbrick Manager.
- The configuration variable
enable_geospatial
must be set to ON.
Section 1: Understanding Spatial Data in Yellowbrick
The stations
table stores the locations of weather stations. These locations are stored in the form of a latitude and longitude field for each station. To convert the station locations into spatial data types, run the following SQL statement:
Query: Converting Station Locations to Geospatial Points
sql
SELECT
id,
name,
ST_AsText(ST_Point(longitude, latitude)) AS geom
FROM noaa_ghcn_pds.stations
WHERE state = 'NC';
Explanation
ST_Point(longitude, latitude)
: Converts longitude and latitude into a geometry point.- Filters stations only in North Carolina (NC).
Section 2: Finding Stations Within a 20 km Radius of Raleigh, NC
Raleigh, NC has an approximate latitude and longitude of (35.7796, -78.6382).
Query: Find Nearby Weather Stations
sql
SELECT
st.id,
st.name,
st.latitude,
st.longitude
FROM noaa_ghcn_pds.stations st
WHERE ST_DWithin(
ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'), -- Raleigh, NC
ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')::BOOLEAN,
20000 -- 20 km radius
);
Explanation
ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)')
: DefinesRaleigh, NC
as a point in SRID 4326 (WGS 84).ST_DWithin(A, B, 20000)
: Returnstrue
if station B is within 20,000 meters (20 km) of A.
Section 3: Checking If Raleigh Covers a Weather Station
Instead of checking if a station is within a radius, you will check if Raleigh's geographical boundary covers a station.
Query: Checking If Raleigh Covers a Station
sql
SELECT
st.id,
st.name,
ST_Covers(
ST_Buffer(ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'), 20000),
ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')
)::BOOLEAN AS is_covered
FROM noaa_ghcn_pds.stations st;
Explanation
ST_Buffer(Raleigh, 20000)
: Creates a 20 km radius boundary around Raleigh.ST_Covers(A, B)
: Checks if Raleigh's buffered area accommodates the station's point.
Section 4: Finding Precipitation Observations in the Area
Now, let's find the PRCP (precipitation) observations from the stations within 20 km of Raleigh.
Query: Get PRCP Observations in the Area
sql
SELECT
obs.id,
st.name,
obs.element,
obs.data_value / 254.0 as rainfall_inches
FROM noaa_ghcn_pds.observations obs
JOIN noaa_ghcn_pds.stations st
ON obs.id = st.id
WHERE obs.element = 'PRCP'
AND ST_DWithin(
ST_GeogFromText('SRID=4326;POINT(-78.6382 35.7796)'),
ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')'),
20000
)
AND obs.year_date BETWEEN 20240101 AND 20241231;
Explanation
- Finds PRCP (precipitation) observations for the year 2024 (BETWEEN 20240101 AND 20241231).
- Uses
ST_DWithin(A, B, 20000)
to filter only stations within 20 km of Raleigh.
Section 5: Intersecting a Custom Polygon Boundary (Advanced)
Let's define a polygonal area around Raleigh and find stations inside it.
Query: Checking Intersections with a Polygon
sql
SELECT
st.id,
st.name,
ST_Intersects(
ST_GeogFromText('SRID=4326;POLYGON((-78.7 35.75, -78.6 35.75, -78.6 35.80, -78.7 35.80, -78.7 35.75))'),
ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')
) AS intersects_polygon
FROM noaa_ghcn_pds.stations st
WHERE intersects_polygon;
Explanation
- Defines a polygon boundary around Raleigh.
- Uses
ST_Intersects(A, B)
: Returnstrue
if a station's point B intersects the polygon A.
Section 6: Visualizing the Spatial Data
The prior result set can be rendered in a tool such as DBeaver Enterprise Edition or JetBrains DataGrip. However, this result set requires some modifications to the SQL statements. It is to account for the limitation that these tools only support the GEOMETRY
spatial data type, and Yellowbrick only supports the GEOGRAPHY
data type. In DataGrip, connect to Yellowbrick by using the PostgreSQL driver and database containing the NOAA data, and run the following SQL statement:
sql
SELECT
st.id,
st.name,
ST_AsText(ST_GeogFromText('SRID=4326;POLYGON((-78.7 35.75, -78.6 35.75, -78.6 35.80, -78.7 35.80, -78.7 35.75))')) boundary,
ST_AsText(ST_GeogFromText('SRID=4326;POINT(' || st.longitude || ' ' || st.latitude || ')')) stations
FROM noaa_ghcn_pds.stations st
where ST_Contains(boundary::geography, stations::geography)
Switch to DataGrip's Geo Viewer to display the bounding polygon around Raleigh and the stations contained within.
Note: Convert the boundary and stations
GEOGRAPHY
values into their OGC Well-Known Text (WKT) values. It allows rendering within Geo Viewer and filter on those stations contained within the boundary.
Function Description Table
Query Goal | Function Used | Description |
---|---|---|
Convert coordinates to geospatial points | ST_Point() | Prepares data for geospatial queries |
Find stations within 20 km | ST_DWithin() | Radius-based filtering |
Check if Raleigh's area covers a station | ST_Covers() | Works with buffered regions |
Get PRCP observations near Raleigh | ST_DWithin() | Joins weather data with station locations |
Find stations inside a polygon | ST_Intersects() | Works with custom boundaries |
Render spatial data | ST_Contains() | Displays stations within a polygon |