Hadoop Hive Transactional Table Update join and Example

  • Post author:
  • Post last modified:November 2, 2019
  • Post category:BigData
  • Reading time:5 mins read

As you know Apache Hive is a data warehouse framework on top of Hadoop HDFS. Since it contains tables, you may want to update records of that table based on how your data changes. Until recently Apache Hive was not supporting transactions. Starting Hive 0.14 and above supports transactional table. You need to enable ACID properties in order to use update, delete, merge in your Hive queries. In this article, we will address How to use update join on your Hive transactional table.

Hadoop Hive Transactional Table Update join and Example

You can also update Hive table without setting ACID properties. But in this article we will stick to Hive transactional tables. Update Joins are usually used in incremental load such as implementing slowly changing dimension tables.

Hive Transactional Table Update join

Apache Hive does support simple update statements that involve only one table that you are updating. You can use the Hive update statement with only static values in your SET clause.

For example, consider below simple update statement with static value.

UPDATE sales_by_month 
SET    total_revenue = 14.60 
WHERE  store_id = 3;

In reality, update statements are much more complex that involves two or more tables.

For example, consider below update statement which uses two tables. Employee table will be updated with value which are present in empl table. These types of statements are called update join conditions.

UPDATE employee A
SET E_EmpNo = B.E_EmpNo
FROM empl B
WHERE
A.E_EmpNo = B.E_EmpNo;

Hive MERGE as an Update join Alternative

Luckily, you have Hive Merge statement. MERGE statement is supported starting from Hive 2.2 and above. You can use merge statement as an alternative to Hive update join.

For example, consider below example to use MERGE statement as an alternative query to update tables using join conditions.

merge into employee1 
using empl z 
on employee1.E_EmpNo=z.E_EmpNo 
when matched then 
update set id=z.E_EmpNo;

Related Articles

Hope this helps 🙂