Apache Hive Derived Column Support and Alternative

  • Post author:
  • Post last modified:March 22, 2018
  • Post category:BigData
  • Reading time:3 mins read

Derived columns are columns that are derived from the previously derived or computed columns in same table. Derived columns or computed columns are virtual columns that are not physically stored in the table. Their values are re-calculated every time they are referenced in a query. Many relational databases such as Netezza supports derived columns but Apache Hive does not support derived columns. In this article, we will check Apache Hive Derived Column Support and Alternative method that you can use to derive columns.

Apache Hive Derived Column Support and Alternative

What are derived columns?

Before going in details about methods to derive column, first, understand what derived column is. To understand derived columns, let us consider an example.

Consider below SQL query:

select col1,
col2*10 as col2_1,
Col2_1*100 as col3_1
from some_table;

As you can see in above SQL query example, col3_1 uses previously derived or computed column col2_1 for its calculations. Col3_1 is called derived columns and this type of reference to previously computed column is not supported in Apache Hive.

Hive derived columns are columns that you derive from other previously computed column in same query context.

Apache Hive Derived Column Alternative

In most of the cases when you are porting SQL queries from existing relational databases to Apache Hive, you may come up across the derived columns. You will have to find alternate method to derive those columns.

As an alternative to relational database derived columns, you can use Apache Hive WITH clause or Common Table Expressions (CTE).

You can read more about Apache Hive WITH clause or common table expression (CTE) in my other post:

Apache Hive Derived Column Alternative Examples

Below is an example of the derived column alternative in Hive:

with CTE1 as (
select col1, col1*10 as col2_1
from table1)
select a.col1,
b.col2_1,
b.col2_1*100 as col3_1
from table1 a
join CTE1 b
on(a.col1 = b.col1);

OK

1       10      1000

2       20      2000

3       30      3000

4       40      4000

Time taken: 8.544 seconds, Fetched: 4 row(s)

Use Hive WITH clause to create CTE which has derived column. You can simply join that common table expression or CTE with base table to derive required columns.

Related readings: