Hadoop Hive Cumulative Sum, Average and Example

  • Post author:
  • Post last modified:March 16, 2019
  • Post category:BigData
  • Reading time:3 mins read

Latest version of Hive HQL supports the window analytics functions. You can make use of the Hadoop Hive Analytic functions to calculate the cumulative sum or running sum and cumulative average. Sum and Average analytical functions are used along with window options to calculate the Hadoop Hive Cumulative Sum or running sum.

Hadoop Hive Cumulative Sum, Average Syntax:

Below are the Syntax for Apache Hive Cumulative SUM, AVG analytic functions. You can use these function within query you have requirement to calculate cumulative SUM or AVG.

SUM([DISTINCT
| ALL] expression)
[OVER (analytic_clause)];

AVG([DISTINCT
| ALL] expression)
[OVER (analytic_clause)];

Hadoop Hive Cumulative Sum, Average Examples

Hive Cumulative Sum and Average.

Query:

select
name,
amount,
 SUM(amount)
over
 (
 order by name
 rows between unbounded preceding and current row
 ) cumulative_Sum
from test;

Output:

name amount cumulative sum
abc 100 100
abc 200 300
bcd 100 400
bcd 100 500
bcd 100 600
cde 400 1000
cde 400 1400
efg 600 2000
efg 600 2600

Query:

select
name,
amount,
 AVG(amount)
over
 (
 order by name
 rows between unbounded preceding and current row
 ) cumulative_avg
from test;

Output:

name amount cumulative avg
abc 100 100
abc 200 150
bcd 100 133.333333333
bcd 100 125
bcd 100 120
cde 400 166.666666667
cde 400 200
efg 600 250
efg 600 288.888888889

Read:

This Post Has 2 Comments

  1. Johny

    Is there any way i don’t want the sum but want to append all the preceding rows for same name. for example like below

    name date amount
    abc mon 100
    abc mon 200
    abc tue 100
    abc tue 200
    abc tue 300 so here 300 is the new value and 100,200 from mon will also show in tue. Do we have any way to do this in hive

    1. Vithal S

      Hi,

      Did you check LEAD and LAG functions in Hive?

      For your problem, you can use LAG function to get previous value.

      You can read: Hadoop Hive Analytic Functions

      Let me know how it goes.

      Thanks

Comments are closed.