SQL GROUP BY with CUBE Function Alternative in Redshift

In my other articles, we have discussed GROUP BY with GROUPING SETS and GROUP BY ROLLUP alternatives. In this article, we will check the SQL GROUP BY CUBE function alternative in Amazon Redshift.

The relational databases such as Oracle, Teradata, etc. support GROUP BY with CUBE function to group the result rows. However, Amazon Redshift does not support GROUP BY with CUBE function.

SQL GROUP BY with CUBE Function Alternative in Redshift

The GROUP functions such as ROLLUP, CUBE, GROUPING SETS are an extension of the GROUP BY clause. The CUBE allows you to generate subtotals like the ROLLUP extension. In addition, the CUBE extension will generate subtotals for all combinations of grouping columns specified in the GROUP BY clause.

For example, the following query defines the group by clause with CUBE function that include pid and county. This is a Teradata query and will execute without any issues.

SELECT pid, county, SUM(sale) as total_sale
FROM sale_view
GROUP BY CUBE (pid,county)
order by 1,2;

Following is the output.

PID County         total_sale
--- ------------- -----------
  ? ?                  247350
  ? Los Angeles        164475
  ? San Diego San       82875
  1 ?                  194225
  1 Los Angeles        111350
  1 San Diego San       82875
  2 ?                   53125
  2 Los Angeles         53125

But, the same query will not work on the Amazon Redshift server. When you try to execute above query, you will end up getting “Invalid operation: function cube(integer, character varying) does not exist“ error on Redshift. It is because Redshift does not support SQL GROUP BY with CUBE Function.

GROUP BY with CUBE Function Alternative Using UNION ALL

Basically, the GROUP BY with CUBE function will combine multiple GROUP BY clauses. The results are the equivalent of UNION ALL of the specified groups. We will rewrite original query using the UNION ALL Redshift set operator to return the same result set.

For example, following query combines the multiple GROUP BY clause.

select pid, county, SUM(sale) as total_sale
from sale_view 
group by pid, county
UNION ALL
select pid, null as county, sum(sale) as total_sale
from sale_view  
group by pid
UNION ALL
select NULL as pid ,county, sum(sale) as total_sale
from sale_view 
group by county
UNION ALL
select NULL,NULL, sum(sale) as total_sale
from sale_view 
order by 1,2;

Here is the output.

 pid |   county    | total_sale
-----+-------------+------------
   1 | Los Angeles |     111350
   1 | San Diego   |      82875
   1 |             |     194225
   2 | Los Angeles |      53125
   2 |             |      53125
     | Los Angeles |     164475
     | San Diego   |      82875
     |             |     247350
(8 rows)

Note that, blank value represent NULL.

The discussed approach works well if you have predictable columns in the CUBE function.