You can make use of the Cloudera impala Analytic functions to calculate the cumulative sum or running sum. Sum and Average analytical functions are used along with window options to calculate the Cloudera Impala Cumulative Sum or running sum.
Cloudera Impala Cumulative Sum, Average Syntax:
Below are the Syntax for Cloudera Impala Cumulative SUM, AVG analytic functions. You can defined ORDER BY clause with column inside OVER clause.
SUM([DISTINCT| ALL] expression)[OVER (analytic_clause)] AVG([DISTINCT| ALL] expression)[OVER (analytic_clause)]
Cloudera Impala Cumulative Sum, Average Examples
Impala Cumulative Sum and Average.
Query:
select name, amount, SUM(amount) over ( --partition by name order by name rows between unbounded preceding and current row ) as 'cumulative Sum' from test;
Output:
name | amount | cumulative sum |
abc | 100 | 100 |
abc | 200 | 300 |
bcd | 100 | 400 |
bcd | 100 | 500 |
bcd | 100 | 600 |
cde | 400 | 1000 |
cde | 400 | 1400 |
efg | 600 | 2000 |
efg | 600 | 2600 |
Query:
select name, amount, AVG(amount) over ( --partition by name order by name rows between unbounded preceding and current row ) as 'cumulative avg' from test;
Output:
name | amount | cumulative avg |
abc | 100 | 100 |
abc | 200 | 150 |
bcd | 100 | 133.333333333 |
bcd | 100 | 125 |
bcd | 100 | 120 |
cde | 400 | 166.666666667 |
cde | 400 | 200 |
efg | 600 | 250 |
efg | 600 | 288.888888889 |
Read:
- Commonly used Cloudera Impala Date Functions and Examples
- Cloudera Impala Generate Sequence Numbers without UDF
- Netezza ROWNUM Pseudo Column Alternative
- Run Impala SQL Script File Passing argument and Working Example
- An Introduction to Hadoop Cloudera Impala Architecture
- Commonly used Hadoop Hive Commands