Skip to content

Querying JSON Data

Yellowbrick supports SQL/JSON path language to efficiently query JSON/JSONB data via the : operator.

This feature enables the use of JSON path expressions to access and extract JSON values. It can be applied to any expression of type JSONB.

It can also be combined with Flatten to access JSON documents with a flexible structure, offering powerful capabilities for handling hierarchical data.

JSON Path Expressions

To extract values, place the : operator between the sql expression (For example, column name) and the JSON path expression.

If the : operator is applied to a VARCHAR/TEXT/JSON, the left side is implicitly cast to JSONB.

For the following examples, consider a table named jsontable that contains a single JSONB column named data:

sql
create table jsontable (data jsonb);

insert into jsontable values (
'{
  "tickers": [
    {
      "INTC": {
        "name": "Intel Corporation",
        "open": 49.20,
        "high": 49.38,
        "low": 47.45,
        "recent_closing_prices": [50.25,50.39,50.76,50.50,48.00]
      }
    },
    {
      "BRK": {
        "BRK.A": {
          "name": "Berkshire Hathaway Inc.",
          "open": 539496,
          "high": 550500,
          "low": 539496
        },
        "BRK.B": {
          "name": "Berkshire Hathaway Inc.",
          "open": 356.32,
          "high": 362.57,
          "low": 355.94
        }
      }
    }
  ]
}'
);

Extract a scalar value

sql
select data:$.tickers[0].INTC.name from jsontable;
-- "Intel Corporation"

Note: The query output is enclosed in double quotes because the query output is JSONB (valid JSON), not VARCHAR. See Return Type for more details.

Extract an array. Operations that return multiple results, such as Array Slicing, will return a JSON array

sql
select data:$.tickers[0].INTC.recent_closing_prices[2:4] from jsontable;
-- [50.76,50.50,48.00]

Following sections explain the JSON Path syntax supported in Yellowbrick:

Root Variable

The root element in a JSON document can be accessed using the special variable $. Since all access operations commence from the root, this token can be omitted and is assumed to be present by default.

The $ token always returns the root element, whether it is an object, an array, or a scalar value.

Returning the Root Element

sql
select data:$ from jsontable;
sql
data
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"tickers":[{"INTC":{"high":49.38,"low":47.45,"name":"Intel Corporation","open":49.20,"recent_closing_prices":[50.25,50.39,50.76,50.50,48.00]}},{"BRK":{"BRK.A":{"high":550500,"low":539496,"name":"Berkshire Hathaway Inc.","open":539496},"BRK.B":{"high":362.57,"low":355.94,"name":"Berkshire Hathaway Inc.","open":356.32}}}]}
(1 row)

Member Accessor

A member accessor allows you to access JSON object members by specifying their key name. You can concatenate multiple names with . to access nested objects.

While column names are case-insensitive, JSON Path key names are case-sensitive. See Case Sensitivity and Special Characters Handling for more details.

You can access member elements in a JSON object using two methods:

Dot notation

Syntax

sql
json_expr:<jsonpath>

Where <jsonpath> is a valid JSON path and the key names can be chained:

sql
json_expr:$.a.b.c = json_expr:$.a:$.b:$.c

Returning a member element

sql
select data:$.tickers[0].INTC.high from jsontable;
sql
high
-------
49.38
(1 row)

Bracket Notation

The alternative syntax for accessing a specific key or set of keys is to use bracketed access.

Syntax

json_expr:$.<keyname>

is the same as:

json_expr:$['<keyname>']

Both queries below return the same result

sql
select data:$.tickers[0].INTC.high from jsontable;
sql
high
-------
49.38
(1 row)
sql
select data:$['tickers'][0]['INTC']['high'] from jsontable;
sql
high
-------
49.38
(1 row)

You can also access multiple member elements with:

Syntax

json_expr:$['<key 1>', '<key 2>', ... '<key N>']

Returning multiple member elements

sql
select data:$['tickers'][0]['INTC']['high', 'low'] from jsontable;
sql
data      
---------------
 [49.38,47.45]
(1 row)

Case Sensitivity and Special Characters Handling

The JSON path access, like JSON, is case-sensitive. Special characters need to be escaped, see below examples:

Dot NotationBracket Notation
{"abc":100}json_expr:$.abc
json_expr:$."abc"
json_expr:$['abc']
{"ABC":100}json_expr:$.ABC
json_expr :$."ABC"
json_expr:$['ABC']
{"'abc":100}json_expr:$."'abc"json_expr:$['''abc']
{"\"abc":100}json_expr:$."""abc"json_expr:$['"abc']

Element Accessor

With an element accessor, you can select one or more indexes from a JSON array using zero-based indexing.

Syntax

sql
json_expr:$[<index>]

Return a single element from an array

sql
select data:$.tickers[0].INTC.recent_closing_prices[0] from jsontable;
sql
recent_closing_prices 
-----------------------
 50.25
(1 row)

Array Intervals

The element accessor allows you to access a range of elements within a JSON array.

Syntax:

json_expr:$[<index start>:<index end>]

Return a range of elements from start to end index (inclusive)

sql
select data:$.tickers[0].INTC.recent_closing_prices[0:1] from jsontable;
sql
recent_closing_prices 
-----------------------
 [50.25,50.39]
(1 row)

Array Slicing

The element accessor supports accessing any number of elements from a JSON array.

Syntax

json_expr:$[<index #1>, <index #2>, ..., <index #N>]

Return multiple slices of elements

sql
select data:$.tickers[0].INTC.recent_closing_prices[0,3] from jsontable;
sql
recent_closing_prices 
-----------------------
 [50.25,50.50]
(1 row)

Combining Slicing and Intervals

Member accessors also support combining array intervals and slicing syntaxes. Indexes can overlap and do not need to be in ascending order.

sql
select data:$.tickers[0].INTC.recent_closing_prices[4, 0:1, 2] from jsontable;
sql
recent_closing_prices   
---------------------------
 [48.00,50.25,50.39,50.76]
(1 row)

Return Type

Values extracted using the : operator always have the return type JSONB, but you can explicitly cast these values to any desired data type.

JSON strings retrieved from a JSON document are enclosed in double quotes (as per valid JSON format). When casting to VARCHAR, the double quotes are removed. This behavior applies to other data types as well; the double quotes are effectively stripped during the casting process. However, the extracted value must still conform to the format of the target data type. For instance, when casting to INT or FLOAT, the extracted value must be a leaf element and must adhere to valid numeric syntax.

Semantics of NULL Values

In JSON, values can be specified as null (without quotes) to indicate that the value is unknown. It directly maps to SQL NULL when casting.

sql
\pset null '<null>';
select '{"answer":42,"question":null}':question::varchar;
sql
question
----------
<null>
(1 row)

Also note that the : operator is null strict, which means that the following expression is not an error:

sql
select (NULL::JSONB):$.SomeAttribute;
sql
SomeAttribute
---------------
<null>
(1 row)

Missing Elements Handling

When a JSON path query encounters missing elements, there are two handling modes available:

  • ERROR ON ERROR (default): throw a sql exception
  • NULL ON ERROR: returns sql null

Throwing Errors on Missing Elements

Syntax

<json_expr>:<jsonpath> ERROR ON ERROR

Example

sql
select '{"answer":true}':question error on error;
sql
ERROR:  No key found at path: 'strict $."question"' for context '{"answer":true}'

Returning NULL on missing elements

Syntax

<json_expr>:<jsonpath> NULL ON ERROR

Example

sql
select '{"answer":true}':question null on error;
sql
question 
----------
 <null>
(1 row)

Note: since the : operator is null-strict, the following expression also returns NULL:

sql
select '{"answer":42}':question null on error:$.SomeAttribute;
sql
SomeAttribute 
---------------
<null>
(1 row)

Configuration Parameter

By default, ERROR ON ERROR is used when accessing missing elements. You can change this behavior by setting the json_missing_element_default configuration parameter.

Additional Notes

Function Alias

The : operator can also be expressed as a function call to JSON_EXTRACT.

sql
<json_expr>:<jsonpath> = JSON_EXTRACT(<json_expr>, '<jsonpath>')

Note that the function requires a compile-time constant as its second argument; it cannot be another column reference.

Operator Precedence

Casting has higher precedence than the : operator. This means that the following sql query:

sql
select jsoncolumn::JSONB:<jsonpath>;

is equivalent to:

sql
select (jsoncolumn::JSONB):<jsonpath>;

Optimized JSONB Access

The : operator supports chaining, but two equivalent expressions can have different performance characteristics due to intermediate materialization:

sql
SELECT <json_expr>:<key #1>:<key #1>:<key #1>

SELECT <json_expr>:<key #1>.<key #1>.<key #1>

The latter expression is generally faster because it avoids materializing intermediate results. Therefore, the planner automatically optimizes and combines such access patterns for improved performance.

Optimized JSON path access

sql
explain select data:$.tickers:$[0]:$.INTC.high from jsontable;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node   
  5              1       all   SELECT   
  2              1       all   SCAN jsontable   
                                 calculate: jsontable.data:strict $.\"tickers\"[0].\"INTC\".\"high\"
                                 
explain select data:$.tickers[0].INTC.high from jsontable;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 id   rows_planned   workers   node   
  5              1       all   SELECT   
  2              1       all   SCAN jsontable   
                                 calculate: jsontable.data:strict $.\"tickers\"[0].\"INTC\".\"high\"

Notice that both queries above utilize the same optimized execution plan.