Appearance
Querying JSON Data
Yellowbrick supports SQL/JSON path language to efficiently query JSON/JSONB data via the :
operator.
This feature enables the use of JSON path expressions to access and extract JSON values. It can be applied to any expression of type JSONB.
It can also be combined with Flatten to access JSON documents with a flexible structure, offering powerful capabilities for handling hierarchical data.
JSON Path Expressions
To extract values, place the :
operator between the sql expression (For example, column name) and the JSON path expression.
If the
:
operator is applied to aVARCHAR
/TEXT
/JSON
, the left side is implicitly cast toJSONB
.
For the following examples, consider a table named jsontable
that contains a single JSONB column named data
:
sql
create table jsontable (data jsonb);
insert into jsontable values (
'{
"tickers": [
{
"INTC": {
"name": "Intel Corporation",
"open": 49.20,
"high": 49.38,
"low": 47.45,
"recent_closing_prices": [50.25,50.39,50.76,50.50,48.00]
}
},
{
"BRK": {
"BRK.A": {
"name": "Berkshire Hathaway Inc.",
"open": 539496,
"high": 550500,
"low": 539496
},
"BRK.B": {
"name": "Berkshire Hathaway Inc.",
"open": 356.32,
"high": 362.57,
"low": 355.94
}
}
}
]
}'
);
Extract a scalar value
sql
select data:$.tickers[0].INTC.name from jsontable;
-- "Intel Corporation"
Note: The query output is enclosed in double quotes because the query output is JSONB (valid JSON), not VARCHAR. See Return Type for more details.
Extract an array. Operations that return multiple results, such as Array Slicing, will return a JSON array
sql
select data:$.tickers[0].INTC.recent_closing_prices[2:4] from jsontable;
-- [50.76,50.50,48.00]
Following sections explain the JSON Path syntax supported in Yellowbrick:
Root Variable
The root element in a JSON document can be accessed using the special variable $
. Since all access operations commence from the root, this token can be omitted and is assumed to be present by default.
The $
token always returns the root element, whether it is an object, an array, or a scalar value.
Returning the Root Element
sql
select data:$ from jsontable;
sql
data
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"tickers":[{"INTC":{"high":49.38,"low":47.45,"name":"Intel Corporation","open":49.20,"recent_closing_prices":[50.25,50.39,50.76,50.50,48.00]}},{"BRK":{"BRK.A":{"high":550500,"low":539496,"name":"Berkshire Hathaway Inc.","open":539496},"BRK.B":{"high":362.57,"low":355.94,"name":"Berkshire Hathaway Inc.","open":356.32}}}]}
(1 row)
Member Accessor
A member accessor allows you to access JSON object members by specifying their key name. You can concatenate multiple names with .
to access nested objects.
While column names are case-insensitive, JSON Path key names are case-sensitive. See Case Sensitivity and Special Characters Handling for more details.
You can access member elements in a JSON object using two methods:
Dot notation
Syntax
sql
json_expr:<jsonpath>
Where <jsonpath>
is a valid JSON path and the key names can be chained:
sql
json_expr:$.a.b.c = json_expr:$.a:$.b:$.c
Returning a member element
sql
select data:$.tickers[0].INTC.high from jsontable;
sql
high
-------
49.38
(1 row)
Bracket Notation
The alternative syntax for accessing a specific key or set of keys is to use bracketed access.
Syntax
json_expr:$.<keyname>
is the same as:
json_expr:$['<keyname>']
Both queries below return the same result
sql
select data:$.tickers[0].INTC.high from jsontable;
sql
high
-------
49.38
(1 row)
sql
select data:$['tickers'][0]['INTC']['high'] from jsontable;
sql
high
-------
49.38
(1 row)
You can also access multiple member elements with:
Syntax
json_expr:$['<key 1>', '<key 2>', ... '<key N>']
Returning multiple member elements
sql
select data:$['tickers'][0]['INTC']['high', 'low'] from jsontable;
sql
data
---------------
[49.38,47.45]
(1 row)
Case Sensitivity and Special Characters Handling
The JSON path access, like JSON
, is case-sensitive. Special characters need to be escaped, see below examples:
Dot Notation | Bracket Notation | |
---|---|---|
{"abc":100} | json_expr:$.abc json_expr: $."abc" | json_expr:$['abc'] |
{"ABC":100} | json_expr:$.ABC json_expr : $."ABC" | json_expr:$['ABC'] |
{"'abc":100} | json_expr:$."'abc" | json_expr:$['''abc'] |
{"\"abc":100} | json_expr:$."""abc" | json_expr:$['"abc'] |
Element Accessor
With an element accessor, you can select one or more indexes from a JSON array using zero-based indexing.
Syntax
sql
json_expr:$[<index>]
Return a single element from an array
sql
select data:$.tickers[0].INTC.recent_closing_prices[0] from jsontable;
sql
recent_closing_prices
-----------------------
50.25
(1 row)
Array Intervals
The element accessor allows you to access a range of elements within a JSON array.
Syntax:
json_expr:$[<index start>:<index end>]
Return a range of elements from start to end index (inclusive)
sql
select data:$.tickers[0].INTC.recent_closing_prices[0:1] from jsontable;
sql
recent_closing_prices
-----------------------
[50.25,50.39]
(1 row)
Array Slicing
The element accessor supports accessing any number of elements from a JSON array.
Syntax
json_expr:$[<index #1>, <index #2>, ..., <index #N>]
Return multiple slices of elements
sql
select data:$.tickers[0].INTC.recent_closing_prices[0,3] from jsontable;
sql
recent_closing_prices
-----------------------
[50.25,50.50]
(1 row)
Combining Slicing and Intervals
Member accessors also support combining array intervals and slicing syntaxes. Indexes can overlap and do not need to be in ascending order.
sql
select data:$.tickers[0].INTC.recent_closing_prices[4, 0:1, 2] from jsontable;
sql
recent_closing_prices
---------------------------
[48.00,50.25,50.39,50.76]
(1 row)
Return Type
Values extracted using the :
operator always have the return type JSONB, but you can explicitly cast these values to any desired data type.
JSON strings retrieved from a JSON document are enclosed in double quotes (as per valid JSON format). When casting to VARCHAR
, the double quotes are removed. This behavior applies to other data types as well; the double quotes are effectively stripped during the casting process. However, the extracted value must still conform to the format of the target data type. For instance, when casting to INT
or FLOAT
, the extracted value must be a leaf element and must adhere to valid numeric syntax.
Semantics of NULL
Values
In JSON, values can be specified as null
(without quotes) to indicate that the value is unknown. It directly maps to SQL NULL when casting.
sql
\pset null '<null>';
select '{"answer":42,"question":null}':question::varchar;
sql
question
----------
<null>
(1 row)
Also note that the :
operator is null strict, which means that the following expression is not an error:
sql
select (NULL::JSONB):$.SomeAttribute;
sql
SomeAttribute
---------------
<null>
(1 row)
Missing Elements Handling
When a JSON path query encounters missing elements, there are two handling modes available:
ERROR ON ERROR
(default): throw a sql exceptionNULL ON ERROR
: returns sql null
Throwing Errors on Missing Elements
Syntax
<json_expr>:<jsonpath> ERROR ON ERROR
Example
sql
select '{"answer":true}':question error on error;
sql
ERROR: No key found at path: 'strict $."question"' for context '{"answer":true}'
Returning NULL on missing elements
Syntax
<json_expr>:<jsonpath> NULL ON ERROR
Example
sql
select '{"answer":true}':question null on error;
sql
question
----------
<null>
(1 row)
Note: since the :
operator is null-strict, the following expression also returns NULL
:
sql
select '{"answer":42}':question null on error:$.SomeAttribute;
sql
SomeAttribute
---------------
<null>
(1 row)
Configuration Parameter
By default, ERROR ON ERROR
is used when accessing missing elements. You can change this behavior by setting the json_missing_element_default
configuration parameter.
Additional Notes
Function Alias
The :
operator can also be expressed as a function call to JSON_EXTRACT.
sql
<json_expr>:<jsonpath> = JSON_EXTRACT(<json_expr>, '<jsonpath>')
Note that the function requires a compile-time constant as its second argument; it cannot be another column reference.
Operator Precedence
Casting has higher precedence than the :
operator. This means that the following sql query:
sql
select jsoncolumn::JSONB:<jsonpath>;
is equivalent to:
sql
select (jsoncolumn::JSONB):<jsonpath>;
Optimized JSONB Access
The :
operator supports chaining, but two equivalent expressions can have different performance characteristics due to intermediate materialization:
sql
SELECT <json_expr>:<key #1>:<key #1>:<key #1>
SELECT <json_expr>:<key #1>.<key #1>.<key #1>
The latter expression is generally faster because it avoids materializing intermediate results. Therefore, the planner automatically optimizes and combines such access patterns for improved performance.
Optimized JSON path access
sql
explain select data:$.tickers:$[0]:$.INTC.high from jsontable;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
id rows_planned workers node
5 1 all SELECT
2 1 all SCAN jsontable
calculate: jsontable.data:strict $.\"tickers\"[0].\"INTC\".\"high\"
explain select data:$.tickers[0].INTC.high from jsontable;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
id rows_planned workers node
5 1 all SELECT
2 1 all SCAN jsontable
calculate: jsontable.data:strict $.\"tickers\"[0].\"INTC\".\"high\"
Notice that both queries above utilize the same optimized execution plan.