Hive Merge Tables Statement – Alternative and Example

  • Post author:
  • Post last modified:November 2, 2019
  • Post category:BigData
  • Reading time:6 mins read

The MERGE query or statement in SQL is used to perform incremental load. With the help of SQL MERGE statement, you can perform UPDATE and INSERT simultaneously based on the condition. i.e. you can update old values and insert new records. The MERGE statement in SQL are mainly used to implement slowly changing dimensions. As of now, Hive does not support MERGE statement. In this article, we will check what is Hive Merge tables alternative with an example.

Sometimes, update insert is also called UPSERT.

Related Article,

SQL Merge Statement

Use the SQL MERGE command to insert, update, or delete rows in a target table using data from a source such as a table, view, or sub-query and based on rules specified in a matching condition in the merge statements.

For examples, let us check sample SQL merge statement that we will be converting to Hive equivalent later.

Note that, starting from Hive 2.2, merge statement is supported in Hive if you create transaction table.

MERGE INTO merge_demo1 A 
using merge_demo2 B 
ON ( A.id = b.id ) 
WHEN matched THEN 
  UPDATE SET A.lastname = B.lastname 
WHEN NOT matched THEN 
  INSERT (id, 
          firstname, 
          lastname) 
  VALUES (B.id, 
          B.firstname, 
          B.lastname);

As you can see in the above example, Merge statement performs an update and insert operations on a target table based on the results of a join with another data set, such as a table or view. The join can match a source row with only one target row.

Hive Merge Tables Statement Alternative

As mentioned earlier, hive does not support MERGE command. We have to identify the alternative method to handle both MATCHED and NOT MATCHED conditions.

Let us rewrite SQL MERGE statement so that we can implement same on Hive.

Test Data

Create sample tables on Hive.

--This table holds the records
CREATE TABLE merge_demo1
(
ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100)
);

-- Reference table
CREATE TABLE merge_demo2
(
ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100)
);

Insert sample records to tables.

--Sample records inserted to table
INSERT INTO merge_demo1 VALUES (1, 'aaaa', 'bbbb');
INSERT INTO merge_demo1 VALUES (2, 'cccc', 'dddd');
INSERT INTO merge_demo1 VALUES (3, 'eeee', 'ffff');
INSERT INTO merge_demo1 VALUES (4, 'gggg', 'hhhh');
INSERT INTO merge_demo1 VALUES (5, 'iiii', 'jjjj');

--Let us say there is last name change for 'cccc' and 'eeee'
INSERT INTO merge_demo2 VALUES (2, 'cccc', 'kkkk');
INSERT INTO merge_demo2 VALUES (3, 'eeee', 'llll');

Hive Merge Tables Statement Alternative Examples

Merge statement is rewritten into multiple steps to handle both MATCHED and NOT MATCHED conditions:

For example,

-- Drop temp table if exists
DROP TABLE IF EXISTS merge_demo1wmmergeupdate; 

-- Create temporary tables to hold merge records
CREATE TABLE merge_demo1wmmergeupdate LIKE merge_demo1; 

-- Insert records when condition is MATCHED
INSERT INTO table merge_demo1WMMergeUpdate 
SELECT A.id        AS ID, 
       A.firstname AS FirstName, 
       CASE 
         WHEN B.id IS NOT NULL THEN B.lastname 
         ELSE A.lastname 
       end         AS LastName 
FROM   merge_demo1 AS A 
       LEFT OUTER JOIN merge_demo2 AS B 
                    ON A.id = B.id; 

-- Insert records when condition is NOT MATCHED
INSERT INTO merge_demo1wmmergeupdate 
SELECT B.id        AS ID, 
       B.firstname AS FirstName, 
       B.lastname  AS LastName 
FROM   merge_demo2 AS B 
       LEFT OUTER JOIN merge_demo1wmmergeupdate AS A 
                    ON A.id = B.id 
WHERE  A.id IS NULL; 

-- Drop origianal table
DROP TABLE IF EXISTS merge_demo1; 

-- Rename temp table to origianal table
ALTER TABLE merge_demo1wmmergeupdate 
  RENAME TO merge_demo1;
Results

Original Table:

+-----------------+------------------------+-----------------------+--+
| id  | firstname  | lastname  |
+-----------------+------------------------+-----------------------+--+
| 1               | aaaa                   | bbbb                  |
| 2               | cccc                   | dddd                  |
| 3               | eeee                   | ffff                  |
| 4               | gggg                   | hhhh                  |
| 5               | iiii                   | jjjj                  |
+-----------------+------------------------+-----------------------+--+

Merged Tables:

+-----------------+------------------------+-----------------------+--+
| id  | firstname  | lastname  |
+-----------------+------------------------+-----------------------+--+
| 1               | aaaa                   | bbbb                  |
| 2               | cccc                   | kkkk                  |
| 3               | eeee                   | llll                  |
| 4               | gggg                   | hhhh                  |
| 5               | iiii                   | jjjj                  |
+-----------------+------------------------+-----------------------+--+

Related Article

Please let me know if you are using different method.

Hope this helps 🙂