Apache Hive Supported Subqueries and Examples

  • Post author:
  • Post last modified:June 2, 2021
  • Post category:BigData
  • Reading time:4 mins read

A subquery in Hive is a select expression that is enclosed in parentheses as a nested query block in a HiveQL query statement. The subquery in Hive is like other relational database subquery that may return zero to one or more values to its upper select statements. In this article, we will check Apache Hive supported subqueries and some examples.

Apache Hive Supported Subqueries

Apache Hive Supported Subqueries

As mentioned above, Hive subquery is a select expression enclosed in parenthesis as a nested query block.

You can use these nested query blocks in any of the following HiveQL statements:

  • SELECT
  • CREATE TABLE AS
  • INSERT INTO

The upper query or parent query that contains the subquery is called a super query or outer query. Subqueries are used to calculate or derive values that will be used in outer SELECT statements.

Related reading:

Types of Hive Subqueries

Not all types of subqueries that are supported in relational databases are supported in Hive.

Below are type of subqueries that are supported:

Table Subquery

You can write the subquery in place of table name. Below is the example of using subqueries in table:

https://gist.github.com/97c7e78334ef64319e0a45a4357d892e

You should provide the alias name to the table subquery, otherwise you may end up with error.

Subquery in WHERE clause

Apache Hive supports writing subquery in WHERE clause. These types of subqueries are widely used in HiveQL queries and statements. You can either return the single value or multiple values from the query from WHERE clause. If you are returning single values, use equality operator otherwise IN operator.

Below is the example demonstrating subqueries in Hive WHERE clause:

https://gist.github.com/277e71ec65ac712989c5eceb8880ddcf

Correlated Subquery

Apache Hive provides limited support to correlated sub queries. Correlated subqueries are queries in which subquery refers to the column from parent table clause.

You can read more about correlated subquery on other post :