A subquery in Hive is a select expression that is enclosed in parentheses as a nested query block in a HiveQL query statement. The subquery in Hive is like other relational database subquery that may return zero to one or more values to its upper select statements. In this article, we will check Apache Hive supported subqueries and some examples.
Apache Hive Supported Subqueries
As mentioned above, Hive subquery is a select expression enclosed in parenthesis as a nested query block.
You can use these nested query blocks in any of the following HiveQL statements:
- SELECT
- CREATE TABLE AS
- INSERT INTO
The upper query or parent query that contains the subquery is called a super query or outer query. Subqueries are used to calculate or derive values that will be used in outer SELECT statements.
Related reading:
Types of Hive Subqueries
Not all types of subqueries that are supported in relational databases are supported in Hive.
Below are type of subqueries that are supported:
Table Subquery
You can write the subquery in place of table name. Below is the example of using subqueries in table:
You should provide the alias name to the table subquery, otherwise you may end up with error.
Subquery in WHERE clause
Apache Hive supports writing subquery in WHERE clause. These types of subqueries are widely used in HiveQL queries and statements. You can either return the single value or multiple values from the query from WHERE clause. If you are returning single values, use equality operator otherwise IN operator.
Below is the example demonstrating subqueries in Hive WHERE clause:
Correlated Subquery
Apache Hive provides limited support to correlated sub queries. Correlated subqueries are queries in which subquery refers to the column from parent table clause.
You can read more about correlated subquery on other post :
- Apache Hive Correlated Subquery and it’s Restrictions
- Apache Spark SQL Supported Subqueries and Examples