SQL GROUPING SETS Alternative in Redshift

In my other articles, we have discussed GROUP BY with CUBE and GROUP BY ROLLUP alternatives. In this article, we will check this SQL GROUPING SETS alternative in Amazon Redshift with an example.

The GROUPING SETS option in SQL gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUPING SETS is one of the powerful GROUP BY extension. The group set is a set of dimension columns. Amazon Redshift does not support GROUPING SETS.

SQL GROUPING SETS Alternative in Redshift

GROUP BY GROUPING SETS is equivalent to the UNION of two or more GROUP BY operations in the same result set. Therefore, you can rewrite the original query using the UNION ALL set operator to return the same result set.

For example, the following query defines the grouping sets that include country and city. This query will execute without any issues on databases such as Teradata or Oracle.

SELECT county, city, sum(margin) as total
FROM sale_view
GROUP BY  GROUPING SETS ((county),(city))
order by 1,2
;

And following is the output on Teradata BTEQ CLI.

County		City		total
------------	-------		---------
?		Avalon		14,400
?		Diego		19,500
?		Long Beach	24,300
Los Angeles 	?		38,700
San Diego San 	?		19,500

However, above query will not execute on Redshift cluster. You will end getting error something like “Invalid operation: syntax error at or near “SETS”“. It is because Amazon Redshift does not support grouping sets.

GROUPING SETS Alternative using UNION ALL

As mentioned, to get a same result set with the aggregated data for all grouping sets, you can use the UNION ALL operator.

Since UNION ALL operator requires all result set to have the same number of columns, you need to add NULL to the select list to the queries similar to the one in following example. As a result, output may contain NULL values.

SELECT county,NULL as city, sum(margin) as total
   FROM dbo.sale_view
   GROUP BY county
union all 
SELECT NULL as county, city, sum(margin) as total
   FROM dbo.sale_view
   GROUP BY city
ORDER BY  1, 2  
   ;

And following is the output on PSQL CLI.

   county    |    City	     |  total
-------------+---------------+-------
 Los Angeles |               | 38700
 San Diego   |               | 19500
             | Avalon        | 14400
             | Long Beach    | 24300
             | San     Diego | 19500
(5 rows)

Blank value represents NULL.

The query will generate a single result with the aggregates for all grouping sets. However, following are some of the drawbacks of using the UNION ALL method.

As you will be breaking the query into multiple subqueries, it will be lengthy.
The query will be slow. The Redshift database needs to execute multiple subqueries and combine result sets.