How to Write Hive Conditional Join and Example

  • Post author:
  • Post last modified:January 6, 2020
  • Post category:BigData
  • Reading time:5 mins read

The join in Hive is used to combine the records from multiple tables based on the joining condition. The joining condition can be on the common columns between participating tables. Hadoop Hive supports various join types. You may get the requirement to choose the joining values based on certain conditions. For example, you can have OR condition in the joining condition. In this article, we will check how to write Conditional Join in Hive with some examples.

How to Write Hive Conditional Join and Example

Hive Conditional Join

You never know what type of requirement that you are going to get when you work on different data source. One of such a requirement is to write conditional join. For example, use conditional functions in the joining condition.

Related Articles,

Another requirement could be to use OR condition to define different joining condition to return records based on the result of the condition.

Hive Join OR Condition Alternative

As mentioned earlier, this is one of the common requirements to use OR operator in the join condition.

For example, following query will work without any issue with relational databases.

select * from test1 as t1
join test2 as t2
on (t1.col1 = t2.col1 and t1.col1 = 1)
or (t1.col1 = t2.col2 and t1.col1 = 2);

+-----------+-----------+------------+------------+--+
| col1      | col2      | col2       | col3       |
+-----------+-----------+------------+------------+--+
| 1         | 1         | 1          | 2          |
| 2         | 2         | 1          | 2          |
+-----------+-----------+------------+------------+--+

But, the same example will not work on Apache Hive. You will end up getting error something like, “OR not supported in JOIN currently”.

Following example is the Hive alternative queries to deal with OR operators in Hive joining condition.

select * from test1 as t1
join test2 as t2
on (t1.col1 = t2.col1 and t1.col1 = 1)
union all
select * from test1 as t1
join test2 as t2
on (t1.col1 = t2.col2 and t1.col1 = 2);

+-----------+-----------+------------+------------+--+
| _u1.col1  | _u1.col2  | _u1._col2  | _u1._col3  |
+-----------+-----------+------------+------------+--+
| 1         | 1         | 1          | 2          |
| 2         | 2         | 1          | 2          |
+-----------+-----------+------------+------------+--+

Use CASE Statement in Hive Join Condition

The Hive set operators such as UNION ALL will be heavy on the system if the participating tables are huge.

You can write the same query using conditional CASE statement.

For example, consider below query that demonstrates usage of CASE statement in the join condition.

select * from test1 as t1, test2 as t2
where  t1.col1 = case when t1.col1 = 1 then t2.col1 
when t1.col1 = 2 then t2.col2 end; 

+----------+----------+----------+----------+--+
| t1.col1  | t1.col2  | t2.col1  | t2.col2  |
+----------+----------+----------+----------+--+
| 1        | 1        | 1        | 2        |
| 2        | 2        | 1        | 2        |
+----------+----------+----------+----------+--+

As you can see, we have used the CASE statement to choose the column for joining condition.

Note that, we have used the WHERE clause instead of inner join to avoid Both left and right aliases encountered in JOIN ‘col2’error.

Related Articles,

Hope this helps 🙂