In my other articles, we have discussed GROUP BY with CUBE and GROUP BY ROLLUP alternatives. In this article, we will check this SQL GROUPING SETS alternative in Amazon Redshift with an example.
The GROUPING SETS option in SQL gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUPING SETS is one of the powerful GROUP BY extension. The group set is a set of dimension columns. Amazon Redshift does not support GROUPING SETS.
SQL GROUPING SETS Alternative in Redshift
GROUP BY GROUPING SETS is equivalent to the UNION
of two or more GROUP BY operations in the same result set. Therefore, you can rewrite the original query using the UNION ALL set operator to return the same result set.
For example, the following query defines the grouping sets that include country and city. This query will execute without any issues on databases such as Teradata or Oracle.
SELECT county, city, sum(margin) as total
FROM sale_view
GROUP BY GROUPING SETS ((county),(city))
order by 1,2
;
And following is the output on Teradata BTEQ CLI.
County City total
------------ ------- ---------
? Avalon 14,400
? Diego 19,500
? Long Beach 24,300
Los Angeles ? 38,700
San Diego San ? 19,500
However, above query will not execute on Redshift cluster. You will end getting error something like “Invalid operation: syntax error at or near “SETS”“. It is because Amazon Redshift does not support grouping sets.
GROUPING SETS Alternative using UNION ALL
As mentioned, to get a same result set with the aggregated data for all grouping sets, you can use the UNION ALL operator.
Since UNION ALL operator requires all result set to have the same number of columns, you need to add NULL to the select list to the queries similar to the one in following example. As a result, output may contain NULL values.
SELECT county,NULL as city, sum(margin) as total
FROM dbo.sale_view
GROUP BY county
union all
SELECT NULL as county, city, sum(margin) as total
FROM dbo.sale_view
GROUP BY city
ORDER BY 1, 2
;
And following is the output on PSQL CLI.
county | City | total
-------------+---------------+-------
Los Angeles | | 38700
San Diego | | 19500
| Avalon | 14400
| Long Beach | 24300
| San Diego | 19500
(5 rows)
Blank value represents NULL.
The query will generate a single result with the aggregates for all grouping sets. However, following are some of the drawbacks of using the UNION ALL method.
- As you will be breaking the query into multiple subqueries, it will be lengthy.
- The query will be slow. The Redshift database needs to execute multiple subqueries and combine result sets.
Related Articles,
- Redshift Comparison Operators – ALL, SOME, ANY Alternative
- QUALIFY Clause in Redshift – Alternative and Examples
- What are INSTR Alternative Functions in Redshift?
Hope this helps 🙂