In my other BigQuery related articles, we have seen BigQuery grouping sets alternative, BigQuery control flow statements, NVL and NVL2 alternatives in BigQuery, cursors in BigQuery, etc. In this article, we will check one of the important GROUP BY extensions GROUP BY CUBE alternative in Google BigQuery.
SQL GROUP BY CUBE in Google BigQuery
GROUP BY CUBE is an extension of the GROUP BY clause similar to GROUP BY ROLLUP and GROUPING SETS which is used to analyzes data by grouping it into multiple dimensions. In addition to producing all the rows of a GROUP BY ROLLUP, GROUP BY CUBE adds all the “cross-tabulations” rows. Sub-total rows are rows that further aggregate whose values are derived by computing the same aggregate functions that were used to produce the grouped rows
Many relational databases such as Oracle, Teradata, etc. support GROUP BY with CUBE function to group the result rows. However, GCP BigQuery does not support GROUP BY with CUBE function.
For example, the following query defines the group by with CUBE function that include county and city. This query will execute without any issues on databases such as Teradata.
SELECT county, city, SUM(margin) as total_sale
FROM sale_view
GROUP BY CUBE (county, city)
order by 1,2;
|county |city |total_sale|
|-----------|----------|----------|
| | |58,200 |
| |Avalon |14,400 |
| |Long Beach|24,300 |
| |San Diego |19,500 |
|Los Angeles| |38,700 |
|Los Angeles|Avalon |14,400 |
|Los Angeles|Long Beach|24,300 |
|San Diego | |19,500 |
|San Diego |San Diego |19,500 |
However, above query will not work on the GCP BogQuery. When you try to execute above query, you will end up getting “Syntax error: Unexpected keyword CUBE” error on Google BigQuery. It is because BigQuery does not support SQL GROUP BY with CUBE Function.
GROUP BY with CUBE BigQuery Alternative Using UNION ALL
It’s possible to create a GROUP BY WITH CUBE using the UNION ALL approach. The GROUP BY with CUBE function will combine multiple GROUP BY clauses. The results are the equivalent of UNION ALL of the specified groups. We will rewrite original query using the UNION ALL set operator to return the same result set.
For example, following query combines the multiple GROUP BY clause.
select county, city, SUM(margin) as total_sale
from sale_view
group by county, city
UNION ALL
select county, null as city, sum(margin) as total_sale
from sale_view
group by county
UNION ALL
select NULL as county ,city, sum(margin) as total_sale
from sale_view
group by city
UNION ALL
select NULL,NULL, sum(margin) as total_sale
from sale_view
order by 1,2;
|county |city |total_sale|
|-----------|----------|----------|
| | |58,200 |
| |Avalon |14,400 |
| |Long Beach|24,300 |
| |San Diego |19,500 |
|Los Angeles| |38,700 |
|Los Angeles|Avalon |14,400 |
|Los Angeles|Long Beach|24,300 |
|San Diego | |19,500 |
|San Diego |San Diego |19,500 |
The approach works well if you have predictable columns in the CUBE function.
However, with the above approach, the code can quickly become cumbersome and unwieldy. You can automate same using BigQuery control statements.
Related Articles,
- BigQuery Control Flow Statements – IF, LOOP, WHILE
- BigQuery Recursive Query Alternative – Example
- What is SQL Cursor Alternative in BigQuery?
- NVL and NVL2 Functions in BigQuery and Alternatives
- Google BigQuery Grouping Sets Alternative and Example
- Reuse Column Aliases in BigQuery – Lateral Column alias
Hope this helps 🙂