Hadoop Hive Cumulative Sum, Average and Example

Post author:Vithal S
Post last modified:March 16, 2019
Post category:BigData
Reading time:3 mins read

Latest version of Hive HQL supports the window analytics functions. You can make use of the Hadoop Hive Analytic functions to calculate the cumulative sum or running sum and cumulative average. Sum and Average analytical functions are used along with window options to calculate the Hadoop Hive Cumulative Sum or running sum.

Hadoop Hive Cumulative Sum, Average Syntax:

Below are the Syntax for Apache Hive Cumulative SUM, AVG analytic functions. You can use these function within query you have requirement to calculate cumulative SUM or AVG.

SUM([DISTINCT
| ALL] expression)
[OVER (analytic_clause)];

AVG([DISTINCT
| ALL] expression)
[OVER (analytic_clause)];

Hadoop Hive Cumulative Sum, Average Examples

Hive Cumulative Sum and Average.

Query:

select
name,
amount,
 SUM(amount)
over
 (
 order by name
 rows between unbounded preceding and current row
 ) cumulative_Sum
from test;

Output:

name	amount	cumulative sum
abc	100	100
abc	200	300
bcd	100	400
bcd	100	500
bcd	100	600
cde	400	1000
cde	400	1400
efg	600	2000
efg	600	2600

Query:

select
name,
amount,
 AVG(amount)
over
 (
 order by name
 rows between unbounded preceding and current row
 ) cumulative_avg
from test;

Output:

name	amount	cumulative avg
abc	100	100
abc	200	150
bcd	100	133.333333333
bcd	100	125
bcd	100	120
cde	400	166.666666667
cde	400	200
efg	600	250
efg	600	288.888888889

Read:

Tags: Hadoop, Hive

This Post Has 2 Comments

Johny April 11, 2018

Is there any way i don’t want the sum but want to append all the preceding rows for same name. for example like below

name date amount
abc mon 100
abc mon 200
abc tue 100
abc tue 200
abc tue 300 so here 300 is the new value and 100,200 from mon will also show in tue. Do we have any way to do this in hive
1. Vithal S April 11, 2018
  
  Hi,
  
  Did you check LEAD and LAG functions in Hive?
  
  For your problem, you can use LAG function to get previous value.
  
  You can read: Hadoop Hive Analytic Functions
  
  Let me know how it goes.
  
  Thanks

Comments are closed.