SQL GROUP BY ROLLUP Function Alternative in Redshift

Post author:Vithal S
Post last modified:April 6, 2021
Post category:Redshift
Reading time:5 mins read

In my other articles, we have discussed GROUP BY with GROUPING SETS and GROUP BY with CUBE alternatives. In this article, we will check the SQL GROUP BY ROLLUP function alternative in Amazon Redshift.

The relational databases such as Oracle, Teradata, etc. support GROUP BY ROLLUP function to group the result rows. However, Amazon Redshift does not support GROUP BY ROLLUP function.

SQL GROUP BY ROLLUP Alternative in Redshift

Similar to GROUP BY with CUBE, the GROUP BY ROLLUP is an extension of the GROUP BY clause that produces sub-total rows. Sub-total rows are rows that further aggregate whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.

For example, the following query defines the group by clause with ROLLUP function that include pid and county. This is a Teradata query and will execute without any issues.

SELECT pid, county, SUM(sale) as total_sale
FROM sale_view
GROUP BY ROLLUP (pid,county)
order by 1,2;

Following is the output on Teradata server.

PID County        total_sale
--- ------------- -----------
  ? ?                  247350
  1 ?                  194225
  1 Los Angeles        111350
  1 San Diego San       82875
  2 ?                   53125
  2 Los Angeles         53125

But, the same query with ROLLUP function will not work on the Amazon Redshift server. When you try to execute above query, you will end up getting “Invalid operation: function rollup(integer, character varying) does not exist“ error on Redshift. It is because Redshift does not support SQL GROUP BY ROLLUP Function.

GROUP BY ROLLUP Function Alternative Using UNION ALL

Basically, the GROUP BY ROLLUP function is a combination of multiple GROUP BY clauses. The results are the equivalent of UNION ALL of the specified groups. You can rewrite original query using the UNION ALL Redshift set operator to return the same result set.

For example, following query combines the multiple GROUP BY clause to return sub-total rows.

select pid, county, SUM(sale) as total_sale
from sale_view 
group by pid, county
UNION ALL
select pid, null as county, sum(sale) as total_sale
from sale_view  
group by pid
UNION ALL
select NULL,NULL, sum(sale) as total_sale
from sale_view 
order by 1,2;

Following is the output.

 pid |   county    | total_sale
-----+-------------+------------
   1 | Los Angeles |     111350
   1 | San Diego   |      82875
   1 |             |     194225
   2 | Los Angeles |      53125
   2 |             |      53125
     |             |     247350
(6 rows)

Note that, blank value represent NULL.

The discussed approach works well if you have predictable columns in the ROLLUP function.