In some application, you may have to derive column values from base tables. For example, you may have to find out maximum value of aggregated column data. In this scenario, you will have to create aggregated data first and then apply MAX function on that column. You can achieve this by using Apache Hive derived tables. We will check type of derived tables supported in Hive with some examples.
Apache Hive Derived Tables
Apache Hive derived tables is a subquery which will be there in FROM clause of the HiveQL statements or is a common table expression (CTE) which is nothing but Hive WITH clause.
The main reason it is called a derived table is because it essentially functions as a table as far as the entire HiveQL query is concerned. Apache Hive derived tables are built dynamically using additional SELECT within the query. Hive derived tables makes use of temporary space and discarded as soon as the query execution is completed.
As mentioned earlier, command table expression or WITH clause is also a derived table. Derived tables created using WITH clause can be used in many locations within query statement. WITH clause is always at the top of query statement followed by SELECT statement of main query.
Apache Hive Derived Tables Example
There are two types of derived tables: derived table in FROM clause and common table expression (CTE) – WITH clause
Derived Table in FROM Clause
Below is the example of creating derived table within FROM clause of HiveQL query:
https://gist.github.com/6475755f4ae893be7b0383cec7315e6d
Common Table Expressions (CTE) – WITH Clause
With the Help of Hive Common Table Expression (CTE) or WITH clause you can reuse piece of query result in same query construct. You can also improve the Hadoop Hive query using WITH clause by using piece of derived table script in with clause and use that CTE in your main SELECT statement. You can use Hive WITH clause to remove complexity.
Read my other post on Hive WITH Clause: