Google BigQuery Grouping Sets Alternative and Example

  • Post author:
  • Post last modified:April 18, 2023
  • Post category:GCP BigQuery
  • Reading time:5 mins read

In my other BigQuery related articles, we have seen BigQuery control flow statements, NVL and NVL2 alternatives in BigQuery, cursors in BigQuery, etc. In this article, we will check one of the important GROUP BY extensions GROUPING SETS alternative in Google BigQuery.

Google BigQuery Grouping Sets
Google BigQuery Grouping Sets

SQL GROUPING SETS in Google BigQuery

Many modern day analytics databases support GROUPING SETS. The GROUPING SETS option in SQL gives you an ability to combine multiple GROUP BY clauses into one GROUP BY clause. By definition, a grouping set is a group of columns by which you group.

The results GROUPING SETS are the equivalent of UNION ALL of the specified groups. so you can rewrite your original grouping sets query using the UNION ALL set operator to return the same result set.

For example, the following query defines the grouping sets that include country and city. This query will execute without any issues on databases such as Teradata, Snowflake, Netezza, etc.

SELECT county, city, sum(margin) 
   FROM sale_view
   GROUP BY  GROUPING SETS ((county),(city))
   order by 1,2 ;

|county     |city      |margin|
|-----------|----------|------|
|           |Avalon    |14,400|
|           |Long Beach|24,300|
|           |San Diego |19,500|
|Los Angeles|          |38,700|
|San Diego  |          |19,500|

However, same query will not work on Google GCP BigQuery. You will end getting error something like “Syntax error: Expected “(” but got identifier “SETS”“. It is because GCP BigQuery does not support grouping sets.

GROUPING SETS Alternative in BigQuery using UNION ALL

As mentioned earlier, grouping sets is a combination of multiple group by to get a same result set with the aggregated data for all grouping sets, you can use the UNION ALL operator.

Since UNION ALL operator requires all result set to have the same number of columns, you need to add NULL to the select list to the queries similar to the one in the following example.

SELECT county,NULL, sum(margin) 
   FROM sale_view
   GROUP BY county
union all 
SELECT NULL, city, sum(margin) 
   FROM sale_view
   GROUP BY city
ORDER BY  1, 2;
|county     |f0_       |f1_   |
|-----------|----------|------|
|           |Avalon    |14,400|
|           |Long Beach|24,300|
|           |San Diego |19,500|
|Los Angeles|          |38,700|
|San Diego  |          |19,500|

The query will generate a single result with the aggregates for all grouping sets. However, following are some of the drawbacks of using the UNION ALL method to write an alternate query.

  • The query will be lengthy as we are breaking the query into multiple subqueries for each column in groping sets.
  • The query will be slow. The BigQuery needs to execute multiple subqueries and combine result sets.

Related Articles,

Hope this helps 🙂