The cumulative sum or running total is one of the interesting problems where you have to calculate the sum or average using current result and previous row value. Most of the modern analytical database line Netezza, Teradata, Oracle, Vertica provides supports to analytical functions. You can make use of those analytical functions along with window specification to calculate the cumulative sum and average. In this article, we will check how to calculate Redshift Cumulative Sum (running total) or cumulative average with some examples.
Redshift Cumulative Sum
As explained earlier, cumulative sum or a running total is the summation of a sequence of numbers which is updated each time a new number is added to the sequence, by adding the value of the new number to the previous running total.
Cumulative sum or running total is commonly used every day, for example, in the transaction system, the machine may keep a running total of all transactions made .
Redshift allows you to calculate running total using windows functions with specifications.
Redshift Cumulative Sum Syntax
Following is the syntax of a Redshift sum analytical function and its window specifications that you can use to compute the cumulative sum or running total of the column value based on your requirements:
SUM(column | expression)
OVER( PARTITION BY columns ORDER BY sort columns rows unbounded preceding |
rows between <N> preceding and current row ); -- Check this
Redshift Cumulative Sum Example
Below is an examples on calculating Redshift Cumulative Sum. For cumulative SUM, you will either have to use “unbounded preceding” or “unbounded following” in your windows specification.
For example, consider following example that demonstrates usage of “unbounded preceding” to consider preceding result for cumulative calculation.
SELECT s_qty,
Sum(s_price)
OVER(
partition BY NULL
ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM sales_fact;
Output:
S_QTY | Cumulative Sum
----------+----------------
1 | 1000
100 | 11000
150 | 26000
200 | 28000
250 | 53000
300 | 83000
2000 | 103000
(7 rows)
Redshift Cumulative Average
Similar to cumulative sum, cumulative average or a running average is the average of a sequence of number or column values.
Cumulative average or running average is commonly used every day, for example, calculate average grade of all the students in class.
Redshift Cumulative Average Syntax
Following is the syntax of a Redshift avg analytical function and its window specifications that you can use to compute the cumulative average or running average of the column value based on your requirements:
AVG(column | expression)
OVER( PARTITION BY columns ORDER BY sort columns rows unbounded preceding |
rows between <N> preceding and current row );
Redshift Cumulative Average Example
Following example demonstrates on calculating Redshift cumulative average. For cumulative AVG, you will either have to use “unbounded preceding” or “unbounded following”.
Below example demonstrates usage of “unbounded preceding” to consider preceding result for cumulative calculation.
SELECT s_qty,
Avg(s_price)
OVER(
partition BY NULL
ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Average"
FROM sales_fact;
Output:
S_QTY | Cumulative Average
----------+----------------
1 | 1,000
100 | 5,500
150 | 8,666.66667
200 | 7,000
250 | 10,600
300 | 13,833.33333
2000 | 14,714.28571
(7 rows)
Teradata CSUM Alternative in Amazon Redshift
Teradata supports special function, CSUM, to calculate the cumulative sum. Many relational databases do not support these kind of functions.
Following is the example of CSUM function.
SELECT Item, SalesDate, CSUM(Revenue,Item,SalesDate) AS CumulativeSales
FROM
(SELECT Item, SalesDate, SUM(Sales) AS Revenue
FROM DailySales
WHERE StoreId=5 AND SalesDate BETWEEN
'1990-01-01' AND '1990-01-31'
GROUP BY Item, SalesDate) AS ItemSales
ORDER BY SalesDate;
You can rewrite above query by using standard window analytics SUM function.
For example,
SELECT Item, SalesDate, SUM(Revenue) OVER (order by Item,SalesDate
ROWS UNBOUNDED PRECEDING )AS CumulativeSales
FROM
(SELECT Item, SalesDate, SUM(Sales) AS Revenue
FROM DailySales
WHERE StoreId=5 AND SalesDate BETWEEN
'1990-01-01' AND '1990-01-31'
GROUP BY Item, SalesDate) AS ItemSales
ORDER BY SalesDate;
Related Articles
Hope this helps 🙂