Built-in Greenplum Analytics Functions and Examples

  • Post author:
  • Post last modified:March 22, 2018
  • Post category:Greenplum
  • Reading time:6 mins read

Window functions or Greenplum analytics functions compute an aggregated value that is based on a group of rows. These functions allow the application developers to more easily write complex online analytical processing (OLAP) queries using standard SQL commands.

For example, with Greenplum analytics functions or windows expressions, users can calculate moving averages or sums over various intervals, ranks as selected column values etc.

Greenplum Analytics Functions

Read:

Greeplum Analytic Functions Examples

Here are the examples of some commonly used Greenplum analytics functions:

COUNT Analytics functions

This functions returns the number of rows in query or group of rows

Example;

tutorial=> SELECT pat_id,dept_id, count(*) OVER(PARTITION BY dept_id ORDER BY dept_id) cnt FROM patient; 
pat_id | dept_id | cnt 
--------+---------+----- 
 5 | 101 | 2 
 3 | 101 | 2 
 8 | 103 | 2 
 9 | 103 | 2 
 1 | 100 | 3 
 4 | 100 | 3 
 2 | 100 | 3 
 7 | 102 | 2 
 6 | 102 | 2 
 10 | 104 | 1 
(10 rows)

SUM Analytics Function

Just like normal SUM function, SUM analytics function is used to compute the sum of all rows in table or rows within group

Example;

tutorial=> SELECT pat_id,dept_id,ins_amt, SUM(ins_amt) OVER(PARTITION BY dept_id ORDER BY dept_id) dept_amt FROM Patient; 
 pat_id | dept_id | ins_amt | dept_amt 
--------+---------+---------+---------- 
 5 | 101 | 100010 | 203010 
 3 | 101 | 103000 | 203010 
 8 | 103 | 100600 | 270600 
 9 | 103 | 170000 | 270600 
 1 | 100 | 100000 | 370000 
 4 | 100 | 120000 | 370000 
 2 | 100 | 150000 | 370000 
 7 | 102 | 100000 | 210000 
 6 | 102 | 110000 | 210000 
 10 | 104 | 109000 | 109000 
(10 rows)

MIN and MAX Analytics Functions

MIN and MAX analytics functions are used to compute min and max of rows or group of rows within group.

Syntax:

MIN(colum | expr) OVER(PARTITION BY expr |col ORDER BY expr | col);
MAX(colum | expr) OVER(PARTITION BY expr |col ORDER BY expr | col);

Example:

tutorial=> SELECT pat_id, dept_id, ins_amt, MIN(ins_amt) OVER(PARTITION BY dept_id ORDER BY dept_id),MAX(ins_amt) OVER(PARTITION BY dept_id ORDER BY dept_id) from patient; 
 pat_id | dept_id | ins_amt | min | max 
--------+---------+---------+--------+-------- 
 5 | 101 | 100010 | 100010 | 103000 
 3 | 101 | 103000 | 100010 | 103000 
 9 | 103 | 170000 | 100600 | 170000 
 8 | 103 | 100600 | 100600 | 170000 
 1 | 100 | 100000 | 100000 | 150000 
 2 | 100 | 150000 | 100000 | 150000 
 4 | 100 | 120000 | 100000 | 150000 
 6 | 102 | 110000 | 100000 | 110000 
 7 | 102 | 100000 | 100000 | 110000 
 10 | 104 | 109000 | 109000 | 109000 
(10 rows)

LEAD and LAG Analytics Functions

LEAD and LAG analytics functions provides access to more than one row of the same table without doing a self join. You can use these functions to analyze various change and variations in the data.

Syntax:

LEAD(expr [,offset] [,exprdefault]) OVER ( [PARTITION BY expr] ORDER BY expr );
LAG( expr [, offset ] [, default]) OVER ( [PARTITION BYexpr ] ORDER BY expr );

LEAD and LAG provides access to a row at a given physical offset after that position. If offset is not specified, the default offset is 1. default sets the value that is returned if the offset goes beyond the scope of the window. If default is not specified, the default value is null.

Example:

tutorial=> SELECT PAT_ID, DEPT_ID, INS_AMT, LEAD(INS_AMT,1,0) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC ) AS LEAD_INS_AMT, LAG(INS_AMT,1,0) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC ) AS LAG_INS_AMT FROM PATIENT; 
 pat_id | dept_id | ins_amt | lead_ins_amt | lag_ins_amt 
--------+---------+---------+--------------+------------- 
 5 | 101 | 100010 | 103000 | 0 
 3 | 101 | 103000 | 0 | 100010 
 8 | 103 | 100600 | 170000 | 0 
 9 | 103 | 170000 | 0 | 100600 
 1 | 100 | 100000 | 120000 | 0 
 4 | 100 | 120000 | 150000 | 100000 
 2 | 100 | 150000 | 0 | 120000 
 7 | 102 | 100000 | 110000 | 0 
 6 | 102 | 110000 | 0 | 100000 
 10 | 104 | 109000 | 0 | 0 
(10 rows)

FIRST_VALUE and LAST_VALUE Analytics Functions

These analytics functions return the first value and last values in an ordered set of values.

Syntax:

FIRST_VALUE( expr ) OVER ( [PARTITION BY expr ] ORDER BY expr [ROWS|RANGEframe_expr ] );
LAST_VALUE(expr) OVER ( [PARTITION BY expr] ORDER BY expr[ROWS|RANGE frame_expr] );

Example:

tutorial=> SELECT PAT_ID,DEPT_ID, INS_AMT, FIRST_VALUE(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ) AS LOW_INS_AMT, LAST_VALUE(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ) AS HIGH_INS_AMT FROM PATIENT; 
 pat_id | dept_id | ins_amt | low_ins_amt | high_ins_amt 
--------+---------+---------+-------------+-------------- 
 5 | 101 | 100010 | 100010 | 103000 
 3 | 101 | 103000 | 100010 | 103000 
 8 | 103 | 100600 | 100600 | 170000 
 9 | 103 | 170000 | 100600 | 170000 
 1 | 100 | 100000 | 100000 | 150000 
 4 | 100 | 120000 | 100000 | 150000 
 2 | 100 | 150000 | 100000 | 150000 
 7 | 102 | 100000 | 100000 | 110000 
 6 | 102 | 110000 | 100000 | 110000 
 10 | 104 | 109000 | 109000 | 109000 
(10 rows)

percent_rank Analytics Function

Calculates the rank of row R minus 1, divided by 1 less than the number of rows that are being evaluated within a window partition.

Syntax:

PERCENT_RANK () OVER ( [PARTITION BY expr] ORDER BY expr );

Example:

tutorial=> SELECT PAT_ID, DEPT_ID, INS_AMT, PERCENT_RANK () OVER (PARTITION BY DEPT_ID ORDER BY INS_AMT ) AS PERCENT_RANK FROM PATIENT; 
 pat_id | dept_id | ins_amt | percent_rank 
--------+---------+---------+-------------- 
 5 | 101 | 100010 | 0 
 3 | 101 | 103000 | 1 
 8 | 103 | 100600 | 0 
 9 | 103 | 170000 | 1 
 1 | 100 | 100000 | 0 
 4 | 100 | 120000 | 0.5 
 2 | 100 | 150000 | 1 
 7 | 102 | 100000 | 0 
 6 | 102 | 110000 | 1 
 10 | 104 | 109000 | 0 
(10 rows)

ntile Analytics Function

Divides an ordered data set into a number of buckets and assigns a bucket number to each row.

Syntax:

NTILE(expr) OVER ( [PARTITION BY expr] ORDER BY expr );

Example:

tutorial=> SELECT PAT_ID, DEPT_ID, INS_AMT, NTILE (4) OVER ( ORDER BY INS_AMT ) AS NTILE FROM PATIENT; 
 pat_id | dept_id | ins_amt | ntile 
--------+---------+---------+------- 
 7 | 102 | 100000 | 1 
 1 | 100 | 100000 | 1 
 5 | 101 | 100010 | 1 
 8 | 103 | 100600 | 2 
 3 | 101 | 103000 | 2 
 10 | 104 | 109000 | 2 
 6 | 102 | 110000 | 3 
 4 | 100 | 120000 | 3 
 2 | 100 | 150000 | 4 
 9 | 103 | 170000 | 4 
(10 rows)

cume_dist Analytics Function

Calculates the cumulative distribution of a value in a group of values. Rows with equal values always evaluate to the same cumulative distribution value.

Syntax:

CUME_DIST() OVER ( [PARTITION BY expr ] ORDER BY expr );

Example:

tutorial=> SELECT PAT_ID, DEPT_ID, INS_AMT, cume_dist () OVER ( PARTITION BY dept_id ORDER BY ins_amt ) AS cume_dist FROM PATIENT; 
 pat_id | dept_id | ins_amt | cume_dist 
--------+---------+---------+------------------- 
 5 | 101 | 100010 | 0.5 
 3 | 101 | 103000 | 1 
 8 | 103 | 100600 | 0.5 
 9 | 103 | 170000 | 1 
 1 | 100 | 100000 | 0.333333333333333 
 4 | 100 | 120000 | 0.666666666666667 
 2 | 100 | 150000 | 1 
 7 | 102 | 100000 | 0.5 
 6 | 102 | 110000 | 1 
 10 | 104 | 109000 | 1 
(10 rows)

Row_number, Rank and Dense Rank Greenplum analytics Functions

The row_number analytic function is used to assign unique values to each row or rows within group.

The Rank analytic function is used to get rank of the rows in column or within group. Rows with equal values receive the same rank with next rank value skipped. The rank analytic function is used in top n analysis.

The Dense rank function returns the rank of a value in a group. Rows with equal values for ranking criteria receive the same rank and assign rank in sequential order i.e. no rank values are skipped. The rank analytic function is used in top n analysis

Syntax:

ROW_NUMBER() OVER( window_spec); RANK() OVER( window_spec); DENSE_RANK() OVER( window_spec);

Examples;

tutorial=> SELECT PAT_ID, DEPT_ID, INS_AMT, ROW_NUMBER() OVER (ORDER BY INS_AMT) AS RN, RANK() OVER (ORDER BY INS_AMT ) AS RK, DENSE_RANK() OVER (ORDER BY INS_AMT ) AS DENSE_RK FROM PATIENT; 
 pat_id | dept_id | ins_amt | rn | rk | dense_rk 
--------+---------+---------+----+----+---------- 
 7 | 102 | 100000 | 1 | 1 | 1 
 1 | 100 | 100000 | 2 | 1 | 1 
 5 | 101 | 100010 | 3 | 3 | 2 
 8 | 103 | 100600 | 4 | 4 | 3 
 3 | 101 | 103000 | 5 | 5 | 4 
 10 | 104 | 109000 | 6 | 6 | 5 
 6 | 102 | 110000 | 7 | 7 | 6 
 4 | 100 | 120000 | 8 | 8 | 7 
 2 | 100 | 150000 | 9 | 9 | 8 
 9 | 103 | 170000 | 10 | 10 | 9 
(10 rows)