Redshift analytic functions compute an aggregate value that is based on a group of rows in the tables. A Redshift PostgreSQL analytic function works on the group of rows and ignores the NULL in the data if you specify explicitly in the function.
Redshift is basically a data warehouse analytics system and provides many useful functions that can perform day to day aggregations that save lot of times during the development. Usage of Redshift analytic function improves the performance of the query.
Frequently used Redshift analytical functions are as follows:
COUNT Analytic Function
Returns number of rows in query or group of rows in table.
Syntax:
COUNT(column reference | value expression | *) over(window_spec);
For Example;
select product_id, Prod_cat , count(*) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_cnt from sample_sale order by sal_amt ;
SUM Analytic Function
Just like sum function, Redshift Sum analytic function is used to compute the sum of columns or expression. Sum analytic function is used to compute the sum of all rows of table or rows within the groups.
Syntax:
SUM(column | expression) OVER( window_spec);
For example:
Calculate sum insured amount of all patients within each department. Query and output as follows:
select product_id, Prod_cat , sum(sal_amt) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_sum from sample_sale order by sal_amt ;
MIN and MAX Analytic Function
Like the SQL MIN and MAX functions, Redshift analytic MIN and MAX functions are used to compute the MIN and MAX of the rows in the column or expression and on rows within group.
Syntax:
MIN(column | expression) OVER( window_spec); MAX(column | expression) OVER( window_spec);
For example;
Calculate Min and Max of insured amount of all patients within each department. Query and output as follows:
select product_id, Prod_cat , min(sal_amt) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_min, max(sal_amt) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_max from sample_sale order by sal_amt ;
LEAD and LAG Analytic Function
Lead and Lag Redshift analytic functions used to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation.
Syntax:
LEAD(column, offset, default) OVER( window_spec); LAG(column, offset, default) OVER( window_spec);
The default value of offset is 1. Offset is the relative position of the row to be accessed. If there is no row next/prior to access the LEAD/LAG function returns NULL, You can change this NULL value by specifying the “default” values.
For example;
Get the insured amount of the patient later and prior than the current rows in each department. Query and output as follows:
select product_id, Prod_cat , lead(product_id) over ( order by sal_amt ) as product_id_lead, lag(product_id) over ( order by sal_amt ) as product_id_lag from sample_sale order by sal_amt;
FIRST_VALUE and LAST_VALUE Analytic Function
You can use the Redshift first_value and last_value analytic functions to find the first value and last value in a column or expression or within group of rows. You must specify the sort criteria to determine the first and last values.
Syntax:
FIRST_VALUE(column | expression) OVER( window_spec); LAST_VALUE(column | expression) OVER( window_spec);
For example;
Compute the lowest and highest insured patients in each department. Query and output as follows:
select product_id, Prod_cat , first_value(sal_amt) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_first_val, last_value(sal_amt) over ( partition by prod_cat order by sal_amt rows unbounded preceding) as sale_first_val from sample_sale order by sal_amt ;
ROW_NUMBER, RANK and DENSE_RANK Analytical Functions
The row_number Redshift analytic function is used to assign unique values to each row or rows within group.
The Rank Redshift analytic function is used to get rank of the rows in column or within group. Rows with equal values receive the same rank with next rank value skipped. The rank analytic function is used in top n analysis.
The Dense rank Redshift function returns the rank of a value in a group. Rows with equal values for ranking criteria receive the same rank and assign rank in sequential order i.e. no rank values are skipped. The rank analytic function is used in top n analysis
Syntax:
ROW_NUMBER() OVER( window_spec); RANK() OVER( window_spec); DENSE_RANK() OVER( window_spec);
For example;
Assign row number, rank on insured amount using Redshift analytic functions. Query and output as follows:
select product_id, Prod_cat , row_number() over ( order by product_id ) as row_numb, rank() over ( order by product_id ) as row_rank, dense_rank() over ( order by product_id ) as row_dense_rank from sample_sale order by product_id ;
Related Articles
- What are Redshift Sequence Alternative? – Example
- Redshift Cumulative SUM, AVERAGE and Examples
- Redshift ROWNUM Pseudo Column Alternative
- Amazon Redshift WITH Clause Syntax, Usage and Examples
- Commonly used Redshift Date Functions and Examples
- Redshift Reuse Computed column – Derived Column
- Redshift Nested Window Functions and Examples
Hope this helps 🙂