Latest version of Hive HQL supports the window analytics functions. You can make use of the Hadoop Hive Analytic functions to calculate the cumulative sum or running sum and cumulative average. Sum and Average analytical functions are used along with window options to calculate the Hadoop Hive Cumulative Sum or running sum.
Hadoop Hive Cumulative Sum, Average Syntax:
Below are the Syntax for Apache Hive Cumulative SUM, AVG analytic functions. You can use these function within query you have requirement to calculate cumulative SUM or AVG.
SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]; AVG([DISTINCT | ALL] expression) [OVER (analytic_clause)];
Hadoop Hive Cumulative Sum, Average Examples
Hive Cumulative Sum and Average.
Query:
select name, amount, SUM(amount) over ( order by name rows between unbounded preceding and current row ) cumulative_Sum from test;
Output:
name | amount | cumulative sum |
abc | 100 | 100 |
abc | 200 | 300 |
bcd | 100 | 400 |
bcd | 100 | 500 |
bcd | 100 | 600 |
cde | 400 | 1000 |
cde | 400 | 1400 |
efg | 600 | 2000 |
efg | 600 | 2600 |
Query:
select name, amount, AVG(amount) over ( order by name rows between unbounded preceding and current row ) cumulative_avg from test;
Output:
name | amount | cumulative avg |
abc | 100 | 100 |
abc | 200 | 150 |
bcd | 100 | 133.333333333 |
bcd | 100 | 125 |
bcd | 100 | 120 |
cde | 400 | 166.666666667 |
cde | 400 | 200 |
efg | 600 | 250 |
efg | 600 | 288.888888889 |
Read:
- Cloudera Impala Generate Sequence Numbers without UDF
- Netezza ROWNUM Pseudo Column Alternative
- Run Impala SQL Script File Passing argument and Working Example
- An Introduction to Hadoop Cloudera Impala Architecture
- Commonly used Hadoop Hive Commands
- Hadoop Hive Dynamic Partition and Examples
- Hive String Functions and Examples
- Hive Join Types and Examples
Is there any way i don’t want the sum but want to append all the preceding rows for same name. for example like below
name date amount
abc mon 100
abc mon 200
abc tue 100
abc tue 200
abc tue 300 so here 300 is the new value and 100,200 from mon will also show in tue. Do we have any way to do this in hive
Hi,
Did you check LEAD and LAG functions in Hive?
For your problem, you can use LAG function to get previous value.
You can read: Hadoop Hive Analytic Functions
Let me know how it goes.
Thanks