Redshift Reuse Computed column – Derived Column

  • Post author:
  • Post last modified:July 8, 2021
  • Post category:Redshift
  • Reading time:6 mins read

Redshift Derived columns are columns that are derived from the previously computed columns in same SELECT statement. Derived columns or computed columns are virtual columns that are not physically stored in the table. Their values are re-calculated every time they are referenced in a query. Many PostgreSQL relational databases such as Netezza supports reuse of the calculated column within the same SELECT statement but Amazon Redshift does not support reuse of calculated derived columns. In this article, we will identify the alternate methods that you can use.

What are derived columns?

Before jumping into alternate methods to reuse of the computed column in Redshift, let us first understand what is derived column in SQL queries.

To understand derived columns, let us consider an example.

For example, consider below SQL query:

select col1,
col2*10 as col2_1,
Col2_1*100 as col3_1
from your_table;

As you can see in above SQL query example, col3_1 uses previously derived or computed column col2_1 for its calculations. Col3_1 is called derived columns and as of now, this type of reference to previously computed column is now supported in Amazon Redshift. They are called lateral column alias.

Derived columns are columns that you derive from other previously computed column in same query context.

Redshift Reuse Computed column

The derived columns are very common in relational databases. You will find them extensively used in complex queries. You may face challenges when you try to port queries which has derived columns in it.

There are two methods that you can use to reuse computed columns in Redshift:

Redshift lateral column alias

Amazon Redshift now enables you to write queries that refer to a column alias within the same query immediately after it is declared, improving the readability of complex SQL queries.

For example,

SELECT col1,
       col2 * 10    AS col2_1,
       col2_1 * 100 AS col3_1
FROM   sample_table1 ; 
 col1 |   col2_1 |   col3_1
------+----------+----------
    1 |       10 |     1000
   10 |      100 |    10000
   20 |      200 |    20000
(3 rows)
Compute Column Using Redshift WITH clause

This is one of the easiest method that you can use to derive columns.

For example, consider below query that compute derive column.

WITH cte1 
     AS (SELECT col1, 
                col1 * 10 AS der_col2 
         FROM   sample_table1) 
SELECT a.col1, 
       b.der_col2, 
       b.der_col2 * 100 AS der_col3
FROM   sample_table1 a 
       JOIN cte1 b 
         ON( a.col1 = b.col1 );

 col1 | der_col2 | der_col3
------+----------+----------
    1 |       10 |     1000
   10 |      100 |    10000
   20 |      200 |    20000
(3 rows)

As you can see, der_col2 is derived from the col1 and der_col3 is derived by reusing der_col2 .

You cam read more about Redshift WITH clause in my other post:

Subquery to Derive Column

Another simple method is to use subquery to derive column and reuse that column in upper SELECT statement.

For example, consider below query with subquery.

SELECT sub.col1, 
       sub.der_col2, 
       sub.der_col2 * 100 AS der_col3
FROM
(
SELECT col1, 
                col1 * 10 AS der_col2 
         FROM   sample_table1
) as sub;

 col1 | der_col2 | der_col3
------+----------+----------
    1 |       10 |     1000
   10 |      100 |    10000
   20 |      200 |    20000
(3 rows)

As you can see, der_col2 is derived from the col1 in subquery and der_col3 is derived by reusing der_col2 in upper SELECT clause.

You read more about subqueries in my other post:

Hope this helps 🙂