Teradata Analytics Functions and Examples

Teradata analytic functions compute an aggregate value that is based on a group of rows optionally partitioning among rows based on given partition column. Just like other analytics systems, Teradata analytics functions works on the group of rows and optionally ignores the NULL in the data.

Teradata Analytics Functions

Teradata also released the analytics system which provides more useful methods. The regular Teradata also provides some useful analytics function that you can use to perform day to day aggregation for reporting.

Read:

Teradata Analytics Functions

Frequently used Teradata analytics functions as follows:

Teradata COUNT Analytic Function

The count analytics function returns the number of rows in column or group of rows if you specify partition column.

Syntax:

COUNT(column reference | value expression | *) over(window_spec)

For Example;

Consider the below sample patient details data. We will also be using this table to demonstrate subsequent Teradata analytic functions.

ID	Insurance_amt	DEP_ID
7	110000	333
5	50000	111
5	890000	222
3	150000	222
1	100000	111
8	10000	444
6	90000	111
4	250000	222
2	150000	111

Let us say you want to count number of patients in each department. You can use the COUNT analytics function.

SELECT ID, 
DEP_ID, 
COUNT(*) OVER (PARTITION BY DEP_ID ORDER BY DEP_ID ASC) AS PAT_CNT 
FROM patient_dtl
ORDER BY ID ;

ID DEP_ID PAT_CNT
1 111 4
2 111 4
3 222 3
4 222 3
5 222 3
5 111 4
6 111 4
7 333 1
8 444 1

Teradata SUM Analytic Function

Teradata SUM analytic function is used to compute the sum of values in columns or given expression. Sum analytic function is used to compute the sum of all rows of table column or rows within the group of rows if you specify the partition column.

Syntax:

SUM(column | expression) OVER( window_spec)

For example:

Calculate sum insured amount of all patients within each department. Query and output as follows:

SELECT ID, 
DEP_ID, 
Insurance_amt, 
SUM(Insurance_amt) OVER (PARTITION BY DEP_ID ORDER BY DEP_ID ASC) AS TOTAL_INS_AMT 
FROM patient_dtl
ORDER by ID;

ID DEP_ID Insurance_amt TOTAL_INS_AMT
1 111 100000 390000
2 111 150000 390000
3 222 150000 1290000
4 222 250000 1290000
5 222 890000 1290000
5 111 50000 390000
6 111 90000 390000
7 333 110000 110000
8 444 10000 10000

Teradata MIN and MAX Analytic Function

Teradata analytics MIN and MAX functions are used to compute the MIN and MAX of the rows in the column or expression and on rows within group if partition column is specified.

Syntax:

MIN(column | expression) OVER( window_spec);
MAX(column | expression) OVER( window_spec);

For example;

Calculate Min and Max of insured amount of all patients within each department. Query and output as follows:

SELECT ID, 
DEP_ID, 
Insurance_amt, 
MIN(Insurance_amt) OVER (PARTITION BY DEP_ID ORDER BY DEP_ID ASC) AS MIN_INS_AMT, 
MAX(Insurance_amt) OVER (PARTITION BY DEP_ID ORDER BY DEP_ID ASC) AS MAX_INS_AMT 
FROM patient_dtl 
ORDER BY ID; 

ID DEP_ID Insurance_amt MIN_INS_AMT MAX_INS_AMT
1 111 100000 50000 150000
2 111 150000 50000 150000
3 222 150000 150000 890000
4 222 250000 150000 890000
5 222 890000 150000 890000
5 111 50000 50000 150000
6 111 90000 50000 150000
7 333 110000 110000 110000
8 444 10000 10000 10000

Teradata FIRST_VALUE and LAST_VALUE Analytic Function

You can use Teradata first_value and last_value analytic functions to find the first value and last value in a column or expression or within group of rows. These functions are very handy if you are looking for first or last values in column or within group of rows for comparison purpose. It is mandatory to specify sort criteria to determine the first and last values.

Syntax:

FIRST_VALUE(column | expression) OVER( window_spec);
LAST_VALUE(column | expression) OVER( window_spec);

For example;

Compute the lowest and highest insured patients in each department. Query and output as follows:

SELECT ID, 
DEP_ID, 
Insurance_amt, 
FIRST_VALUE(Insurance_amt) OVER (PARTITION BY DEP_ID ORDER BY Insurance_amt ) AS LOW_INS_AMT, 
LAST_VALUE(Insurance_amt) OVER (PARTITION BY DEP_ID ORDER BY Insurance_amt ) AS HIGH_INS_AMT 
FROM patient_dtl
ORDER BY ID;

ID DEP_ID Insurance_amt LOW_INS_AMT HIGH_INS_AMT
1 111 100000 50000 100000
2 111 150000 50000 150000
3 222 150000 150000 150000
4 222 250000 150000 250000
5 222 890000 150000 890000
5 111 50000 50000 50000
6 111 90000 50000 90000
7 333 110000 110000 110000
8 444 10000 10000 10000

Teradata ROW_NUMBER, RANK and DENSE_RANK Analytical Functions

ROW_NUMBER : The row_number Teradata analytic function is used to assign unique values to each row of the column or rows within group.

RANK: The Rank Teradata analytic function is used to get rank of the rows in column or within group. Rows with equal values receive the same rank with subsequent rank value skipped. The rank analytic function is usually used in top n analysis.

DENSE_RANK : The Dense rank Teradata function returns the rank of a value in a group. Rows with equal values for ranking criteria receive the same rank and assign rank in sequential order i.e. no rank values are skipped. The rank analytic function is also used in top n analysis

Syntax:

ROW_NUMBER() OVER( window_spec);
RANK() OVER( window_spec);
DENSE_RANK() OVER( window_spec);

For example;

Assign row number, rank on insured amount using Teradata analytic functions. Query and output as follows:

SELECT ID, 
DEP_ID, 
Insurance_amt, 
ROW_NUMBER() OVER (ORDER BY Insurance_amt) AS RN, 
RANK() OVER (ORDER BY Insurance_amt ) AS RK, 
DENSE_RANK() OVER (ORDER BY Insurance_amt ) AS DENSE_RK 
FROM patient_dtl
ORDER BY RN;

ID DEP_ID Insurance_amt RN RK DENSE_RK
8 444 10000 1 1 1
5 111 50000 2 2 2
6 111 90000 3 3 3
1 111 100000 4 4 4
7 333 110000 5 5 5
2 111 150000 6 6 6
3 222 150000 7 6 6
4 222 250000 8 8 7
5 222 890000 9 9 8

Also read my other posts on Apache Hive