Correlated Subqueries
A correlated subquery contains one or more correlation references between its own columns and the columns that the outer query produces. Yellowbrick processes correlated subqueries by rewriting them and "decorrelating" these references.
Correlated subqueries take on several different patterns. In a Yellowbrick database, the general form of a supported correlated subquery is as follows:
select * from table1
where table1.x =
(select sum(table2.a) from table2 where table2.b = table1.x);
In this pattern:
table1
is a table in the outer query (the main query), andtable2
is a table in the inner query (the subquery). Each row that the outer query produces is qualified (or disqualified) based on the result of the subquery. (Additional tables may appear in the outer and inner queries.)- The equal to (
=
) comparison operator is the qualifier in this example. Other operators may be used here: <, >, >=, <= - The
SUM
function is a non-windowed, non-distinct aggregate function that produces a single value that can be compared to each row in the outer query.AVG
,MIN
, andMAX
aggregate functions may also be used here, butCOUNT
is not supported. - The correlated subquery appears in the
WHERE
clause of the outer query, and its correlation reference is in theWHERE
clause of the subquery:
where table2.b = table1.x
- The subquery may have other predicates or join conditions that only apply to the tables in the
FROM
clause of the subquery.
Select-List Subqueries
Correlated subqueries are supported in the select list only if the value being returned by the subquery comes directly from that subquery. For example, team.teamid
comes directly from team
in the subquery:
premdb=# select (select teamid from team where team.atid=awayteam.atid) from awayteam;
In this case, the returned value (awayteam.atid
) is not from the subquery (which is using team
):
premdb=# select (select awayteam.atid from team where team.name=awayteam.name) from awayteam;
ERROR: Correlated references in SELECT lists are not supported
Note that correlated subqueries in the select list are allowed to contain COUNT()
and COUNT(DISTINCT)
aggregates, unlike those in the WHERE
clause.
Unsupported Patterns
Given this general supported pattern, note the following restrictions. The following correlated subqueries return an error message:
- Correlated subqueries connected with an OR condition:
You cannot connect two correlated subqueries with an
OR
condition. For example, the following query pattern returns an error:select * from table1 where table1.x > (select max(table2.a) from table2 where table2.b = table1.x) OR table.y < (select min(table2.a) from table2 where table2.b = table1.x) ;
- Correlated expressions on the right side of ALL, IN, NOT IN, EXISTS, and NOT EXISTS conditions:
The following query pattern returns an error:
select * from table1 where EXISTS (select sum(table2.a) from table2 where table2.b = table1.x);
Correlated subqueries with
EXISTS
andNOT EXISTS
conditions are not supported in general but do work in some cases.- COUNT aggregation in a correlated subquery:
The following query pattern returns an error:
select * from table1 where table1.x = (select COUNT(table2.a) from table2 where table2.b = table1.x);
- Correlated subqueries with GROUP BY or set operations (UNION, INTERSECT, EXCEPT) or correlations in the SELECT list, HAVING clause, ORDER BY clause, or WINDOW clause:
For example, the following query pattern returns an error:
select * from table1 where table1.x = (select sum(table2.a) from table2 where table2.b = table1.x GROUP BY table2.c);
- Skip-level correlation references
Correlation references that skip a query block are not supported. For example, in the following query pattern, the inner block that contains the correlation reference is separated from the main query block by another subquery. The correlation reference tries to correlate
table1
andtable3
.select * from table1 where table1.x = (select sum(table2.a) from table2 where table2.b = (select sum(table3.a from table3 where table1.x=table3.b));
Examples
The following example is a simple case of a correlated subquery that returns the output of a SUM
function and compares it with a column from a table in the outer query. The correlation reference is ht.htid=at.atid
.
premdb=# select * from awayteam at
where at.atid=
(select sum(ht.htid) from hometeam ht
where ht.htid=at.atid);
atid | name
------+---------
51 | Arsenal
(1 row)
The following similar query is not supported because it replaces the SUM
function with COUNT
:
premdb=# select * from awayteam at
where at.atid=(select count(ht.htid) from hometeam ht
where ht.htid=at.atid);
ERROR: COUNT aggregate functions in correlated subqueries are not supported
Parent topic:Subqueries