GROUP BY GROUPING SETS

Name zero or more GROUP BY expressions as grouping sets. Aggregates are computed for each discrete group.

GROUP BY GROUPING SETS (expression [, ...])

Within the outer pair of parentheses, expressions may be either named in a simple comma-separated list or isolated within their own inner pairs of parentheses. For example, grouping sets((city, state)) and grouping sets((city), (state)) are both valid; however, these constructs return different results:

grouping sets((city, state)) means group by city and state in one set of rows.
grouping sets((city), (state)), or grouping sets(city, state), means group by city in a set of rows, then group by state in a separate set of rows.

You may want to use inner sets of parentheses consistently for clarity.

Usage Notes

Duplicate grouping sets are not allowed.
Queries are limited to a maximum of 128 grouping sets.
The maximum length of each grouping set is 200 elements.
GROUP BY CUBE queries are limited to a maximum of 7 columns.
If the target table is empty, a grouping set query returns 0 rows.
Aliases defined in the SELECT list may be referenced in the GROUPING SETS clause. For example:

select c1 as a, c2 as b, ...
from ...
group by grouping sets((a), (b)) 
...

Ordinal numbers that reference GROUP BY expressions are also valid. For example:

select city, state, ...
from ...
group by grouping sets((1), (2))
...

Examples

Assume that you have a citypop table that stores population numbers by city and state, and that the table contains five rows:

testdb=# select * from citypop order by 2,3;
   city     | state |   pop    
-------------+-------+----------
 San Jose    | CA    |  1799000
 Los Angeles | CA    | 12459000
 Naples      | FL    |   398000
 Miami       | FL    |  6167000
 Boise       | ID    |   455000
(5 rows)

Note the results of the following two queries. The first one defines (city, state) as a single unit of aggregation:

testdb=# select city, state, sum(pop) from citypop 
group by grouping sets((city, state)) order by 1;
   city     | state |   sum    
-------------+-------+----------
 Boise       | ID    |   455000
 Los Angeles | CA    | 12459000
 Miami       | FL    |  6167000
 Naples      | FL    |   398000
 San Jose    | CA    |  1799000
(5 rows)

The query returns only five rows, with one sum for each different city/state group. This is the exact same result that you would get with a simple group by city, state clause. The use of GROUPING SETS in this particular case does not add any value but is a valid alternative to the basic GROUP BY syntax.

The second query defines city and state as distinct groups for aggregation within the same result set:

testdb=# select city, state, sum(pop) from citypop group by grouping sets((city), (state)) order by 1,2,3;
   city     | state |   sum    
-------------+-------+----------
 Boise       | NULL  |   455000
 Los Angeles | NULL  | 12459000
 Miami       | NULL  |  6167000
 Naples      | NULL  |   398000
 San Jose    | NULL  |  1799000
 NULL        | FL    |  6565000
 NULL        | CA    | 14258000
 NULL        | ID    |   455000
(8 rows)

This query returns eight rows: five rows for each different city, plus three rows for each different state. In this case, the use of GROUPING SETS provides a simple way to return results that have multiple levels of aggregation within a single query. NULL values are used to fill in the empty columns.

To take this example one step further, you can also produce a total row for the query by specifying () as a third, but empty grouping set:

testdb=# select city, state, sum(pop) from citypop group by grouping sets((city), (state), ()) order by 1,3;
   city     | state |   sum    
-------------+-------+----------
 Boise       | NULL  |   455000
 Los Angeles | NULL  | 12459000
 Miami       | NULL  |  6167000
 Naples      | NULL  |   398000
 San Jose    | NULL  |  1799000
 NULL        | ID    |   455000
 NULL        | FL    |  6565000
 NULL        | CA    | 14258000
 NULL        | NULL  | 21278000
(9 rows)

The ninth row is the grand total for the preceding sets of rows. An empty grouping set means that all rows are aggregated in a single group. This total row is produced even when no input rows exist.

Throughout these examples, note that NULL values occupy the empty columns when there is no match for a particular grouping column. Total rows contain NULL values for all grouping columns.

NULL Handling with Grouping Sets

When you are running queries that group by grouping sets, you have two types of NULL values to think about: those that exist in the table data itself, and those produced in aggregated rows.

When NULL values exist in the data for grouping set columns and expressions, aggregated rows return NULL values in the result. For example, the 14th row in the following query result represents the group where both position and cob are NULL. There are two such rows in the player table. In both of those rows, weekly_wages is also NULL. Therefore the 14th row is a series of three NULL values.

yellowbrick=# select position, cob, sum(weekly_wages) 
from player 
group by grouping sets((position, cob)) order by 1,2,3;
 position |     cob     |  sum   
----------+-------------+--------
 D        | Netherlands | 180000
 F        | Argentina   | 572000
 F        | Chile       | 410000
 F        | Egypt       | 200000
 F        | England     | 580000
 G        | Spain       | 200000
 M        | Algeria     | 200000
 M        | Belgium     | 250000
 M        | England     | 140000
 M        | France      | 378000
 M        | Germany     | 350000
 M        | Netherlands | 618000
 M        | Spain       | 250000
 [NULL]   | [NULL]      | [NULL]
(14 rows)

If you do not want this kind of row to be evaluated or displayed, filter out NULL values in the WHERE clause.

For grouping sets queries, you will see NULL values used in two other ways:

To fill in the blanks for subtotal and grand total rows. Total rows contain NULL values for all grouping columns.
To fill in the blanks when grouped aggregation occurs at different levels. NULL values occupy the empty columns when there is no match for a particular grouping column.

If you did not intend to return these aggregated rows, you may need to rewrite the query. If you want to return these rows only under certain conditions, you can apply a filter in the HAVING clause.

Configuring SSL/TLS for Tools and Drivers

Secure Connections for ODBC/JDBC Clients and ybsql

sys.lock

Bulk Load Examples

Running a Bulk Load

Loading Tables from Parquet Files

ybload Command

Loading from Amazon S3

Loading from Azure Blob Storage

Setting up and Running a Spark Job

Setting Up the ybrelay Service

LDAP Authentication

Synchronizing Users and Groups

Appliance: Disk Encryption

Setting Up Encrypted Drives

Remote Diagnostics

System Alerts

Creating an Alert Endpoint

Using the System Management Console

ybcli Reference

ybcli: config

AWS Marketplace

Create Stack

Docker

Cloud: Configuration

Vanity DNS

Yellowbrick Manager

Cloud: Enterprise Edition Getting Started

SQL-Based Loads from External Storage

Cloud: Installation

CLI Install Instructions

Permissions

Private Install Instructions

Public Install Instructions

Cloud: Kubernetes Guides

CREATE EXTERNAL FORMAT

CREATE EXTERNAL TABLE

CREATE TABLE

GRANT

Plan Hinting

SELECT

GROUP BY Clause

Subqueries

Data Type Casting

DECIMAL

JSON

JSONB

SQL String Constants

Aggregate Functions

Conditional Expressions

Datetime Functions

Formatting Functions

Geospatial functions

Mathematical Functions

Network Address Functions

Pattern Matching

Regular Expression Details

SQL Operators and Pattern Matching Functions

SQL Conditions

SQL User Defined Function (UDF)

SQL UDF Create Function

String Functions

ENCRYPT_KS

System Functions

Type-Safe Casting Functions

Window Functions

Creating WLM Resource Pools

Creating WLM Rules

Rule Examples

GROUP BY GROUPING SETS ​

Usage Notes ​

Examples ​

NULL Handling with Grouping Sets ​

GROUP BY GROUPING SETS

Usage Notes

Examples

NULL Handling with Grouping Sets