Netezza Analytic Functions

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:Netezza
  • Reading time:6 mins read

Netezza analytic functions compute an aggregate value that is based on a group of rows. A Netezza SQL analytic function works on the group of rows and ignores the NULL in the dat

netezza analytic functions

Netezza server is basically an analytics system and provides many useful functions that can perform day to day aggregations. As these functions are native to Netezza server, hence use of these Netezza analytic functions improves performance of SQL queries as well as server performance.

Frequently used Netezza analytical functions are as follows:

COUNT Analytic Function

Returns the number of rows in query or group of rows.

Syntax:

COUNT(column reference | value expression | *) over(window_spec)

For Example;

Consider the below sample patient details. We will also be using this patient table to demonstrate subsequent Netezza analytic functions.

TRAINING.ADMIN(ADMIN)=> select * from PATIENT;

PAT_ID | INS_AMT | DEPT_ID 
--------+---------+--------- 
 1 | 100000 | 111 
 3 | 150000 | 222 
 5 | 50000 | 111 
 5 | 890000 | 222 
 7 | 110000 | 333 
 2 | 150000 | 111 
 4 | 250000 | 222 
 6 | 90000 | 111 
 8 | 10000 | 444

and lets say, you want to write query to count number of patients in each department. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
COUNT(*) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC) AS PAT_CNT 
FROM PATIENT;

PAT_ID | DEPT_ID | PAT_CNT 
--------+---------+--------- 
 2 | 111 | 4 
 1 | 111 | 4 
 6 | 111 | 4 
 5 | 111 | 4 
 4 | 222 | 3 
 3 | 222 | 3 
 5 | 222 | 3 
 7 | 333 | 1 
 8 | 444 | 1

SUM Analytic Function

Just like sum function, sum Netezza analytic function is used to compute the sum of columns or expression. Sum analytic function is used to compute the sum of all rows of table or rows within the groups.

Syntax:

SUM(column | expression) OVER( window_spec)

For example:

Calculate sum insured amount of all patients within each department. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
INS_AMT, 
SUM(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC) AS TOTAL_INS_AMT 
FROM PATIENT ;

PAT_ID | DEPT_ID | INS_AMT | TOTAL_INS_AMT 
--------+---------+---------+--------------- 
 2 | 111 | 150000 | 390000 
 1 | 111 | 100000 | 390000 
 6 | 111 | 90000 | 390000 
 5 | 111 | 50000 | 390000 
 4 | 222 | 250000 | 1290000 
 3 | 222 | 150000 | 1290000 
 5 | 222 | 890000 | 1290000 
 7 | 333 | 110000 | 110000 
 8 | 444 | 10000 | 10000

MIN and MAX Analytic Function

Like the SQL MIN and MAX functions, Netezza analytic MIN and MAX functions are used to compute the MIN and MAX of the rows in the column or expression and on rows within group.

Syntax:

MIN(column | expression) OVER( window_spec)
MAX(column | expression) OVER( window_spec)

For example;

Calculate Min and Max of insured amount of all patients within each department. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
INS_AMT, 
MIN(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC) AS MIN_INS_AMT, 
MAX(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC) AS MAX_INS_AMT 
FROM PATIENT ;

PAT_ID | DEPT_ID | INS_AMT | MIN_INS_AMT | MAX_INS_AMT 
--------+---------+---------+-------------+------------- 
 2 | 111 | 150000 | 50000 | 150000 
 1 | 111 | 100000 | 50000 | 150000 
 6 | 111 | 90000 | 50000 | 150000 
 5 | 111 | 50000 | 50000 | 150000 
 4 | 222 | 250000 | 150000 | 890000 
 3 | 222 | 150000 | 150000 | 890000 
 5 | 222 | 890000 | 150000 | 890000 
 7 | 333 | 110000 | 110000 | 110000 
 8 | 444 | 10000 | 10000 | 10000

LEAD and LAG Analytic Function

Lead and Lag Netezza analytic functions used to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation.

Syntax:

LEAD(column, offset, default) OVER( window_spec)
LAG(column, offset, default) OVER( window_spec)

The default value of offset is 1. Offset is the relative position of the row to be accessed. If there is no row next/prior to access the LEAD/LAG function returns NULL, You can change this NULL value by specifying the “default” values.

For example;

Get the insured amount of the patient later and prior than the current rows in each department. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
INS_AMT, 
LEAD(INS_AMT,1,0) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC ) AS LEAD_INS_AMT, 
LAG(INS_AMT,1,0) OVER (PARTITION BY DEPT_ID ORDER BY DEPT_ID ASC ) AS LAG_INS_AMT 
FROM PATIENT;

PAT_ID | DEPT_ID | INS_AMT | LEAD_INS_AMT | LAG_INS_AMT 
--------+---------+---------+--------------+------------- 
 1 | 111 | 100000 | 150000 | 0 
 2 | 111 | 150000 | 90000 | 100000 
 6 | 111 | 90000 | 50000 | 150000 
 5 | 111 | 50000 | 0 | 90000 
 7 | 333 | 110000 | 0 | 0 
 3 | 222 | 150000 | 890000 | 0 
 5 | 222 | 890000 | 250000 | 150000 
 4 | 222 | 250000 | 0 | 890000 
 8 | 444 | 10000 | 0 | 0

FIRST_VALUE and LAST_VALUE Analytic Function

You can use the Netezza first_value and last_value Netezza analytic functions to find the first value and last value in a column or expression or within group of rows. You must specify the sort criteria to determine the first and last values.

Syntax:

FIRST_VALUE(column | expression) OVER( window_spec)
LAST_VALUE(column | expression) OVER( window_spec)

For example;

Compute the lowest and highest insured patients in each department. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
INS_AMT, 
FIRST_VALUE(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY INS_AMT ) AS LOW_INS_AMT, 
LAST_VALUE(INS_AMT) OVER (PARTITION BY DEPT_ID ORDER BY INS_AMT ) AS HIGH_INS_AMT 
FROM PATIENT;

PAT_ID | DEPT_ID | INS_AMT | LOW_INS_AMT | HIGH_INS_AMT 
--------+---------+---------+-------------+-------------- 
 2 | 111 | 150000 | 50000 | 150000 
 1 | 111 | 100000 | 50000 | 150000 
 6 | 111 | 90000 | 50000 | 150000 
 5 | 111 | 50000 | 50000 | 150000 
 7 | 333 | 110000 | 110000 | 110000 
 5 | 222 | 890000 | 150000 | 890000 
 4 | 222 | 250000 | 150000 | 890000 
 3 | 222 | 150000 | 150000 | 890000 
 8 | 444 | 10000 | 10000 | 10000

ROW_NUMBER, RANK and DENSE_RANK Analytical Functions

The row_number Netezza analytic function is used to assign unique values to each row or rows within group.

The Rank Netezza analytic function is used to get rank of the rows in column or within group. Rows with equal values receive the same rank with next rank value skipped. The rank analytic function is used in top n analysis.

The Dense rank Netezza function returns the rank of a value in a group. Rows with equal values for ranking criteria receive the same rank and assign rank in sequential order i.e. no rank values are skipped. The rank analytic function is used in top n analysis

Syntax:

ROW_NUMBER() OVER( window_spec)
RANK() OVER( window_spec)
DENSE_RANK() OVER( window_spec)

For example;

Assign row number, rank on insured amount using netezza analytic functions. Query and output as follows:

SELECT PAT_ID, 
DEPT_ID, 
INS_AMT, 
ROW_NUMBER() OVER (ORDER BY INS_AMT) AS RN, 
RANK() OVER (ORDER BY INS_AMT ) AS RK, 
DENSE_RANK() OVER (ORDER BY INS_AMT ) AS DENSE_RK 
FROM PATIENT;

PAT_ID | DEPT_ID | INS_AMT | RN | RK | DENSE_RK 
--------+---------+---------+----+----+---------- 
 8 | 444 | 10000 | 1 | 1 | 1 
 5 | 111 | 50000 | 2 | 2 | 2 
 6 | 111 | 90000 | 3 | 3 | 3 
 1 | 111 | 100000 | 4 | 4 | 4 
 7 | 333 | 110000 | 5 | 5 | 5 
 2 | 111 | 150000 | 6 | 6 | 6 
 3 | 222 | 150000 | 7 | 6 | 6 
 4 | 222 | 250000 | 8 | 8 | 7 
 5 | 222 | 890000 | 9 | 9 | 8

Read: