The MERGE query or statement in SQL is used to perform incremental load. With the help of SQL MERGE statement, you can perform UPDATE and INSERT simultaneously based on the condition. i.e. you can update old values and insert new records. The merge command is widely used in incremental load where you have to update old records and insert new records if any. In this article, we will check Cloudera Impala Merge Statement along with some Impala native UPSERT command.
In SQL world, the merge statement is also referred to as UPSERT command.
Related Article,
SQL Merge Statement
You can use the SQL MERGE command or statement to insert, update, or delete rows in a target table using data from a source such as a table, view, or sub-query and based on rules specified in a matching condition in the merge statements.
You have to specify the rules what merge statement should do when condition is MATCHED or NOT MATCHED.
For examples, let us check sample SQL merge statement that we will be converting to Impala equivalent set of queries later.
MERGE INTO merge_table1 A
using merge_table2 B
ON ( A.id = b.id )
WHEN matched THEN
UPDATE SET A.lastname = B.lastname
WHEN NOT matched THEN
INSERT (id,
firstname,
lastname)
VALUES (B.id,
B.firstname,
B.lastname);
As you can see in the above example, Merge statement performs an update and insert operations on a target table based on the joining condition and matching rules. The join can match a source row with only one target row.
SQL MERGE command is not available in Cloudera impala. We have to identify the alternative method to handle both MATCHED and NOT MATCHED conditions.
Related Article
Test data
Create sample tables on Impala.
--This table holds the records
CREATE TABLE merge_table1
(
ID INT,
FirstName string,
LastName string
);
-- Reference table
CREATE TABLE merge_table2
(
ID INT,
FirstName string,
LastName string
);
Insert sample records to tables.
--Sample records inserted to table
INSERT INTO merge_table11 VALUES (1, 'aaaa', 'bbbb');
INSERT INTO merge_table11 VALUES (2, 'cccc', 'dddd');
INSERT INTO merge_table11 VALUES (3, 'eeee', 'ffff');
INSERT INTO merge_table11 VALUES (4, 'gggg', 'hhhh');
INSERT INTO merge_table11 VALUES (5, 'iiii', 'jjjj');
--Let us say there is last name change for 'cccc' and 'eeee'
INSERT INTO merge_table2 VALUES (2, 'cccc', 'kkkk');
INSERT INTO merge_table2 VALUES (3, 'eeee', 'llll');
Merge Statement in Impala
There are couple of way you can re-implement the SQL merge statement in Cloudera Impala.
- Merge Statement Alternative Approach
- Impala UPSERT Statement
Let us check these two options in details.
Impala Merge Statement Alternative Examples
In this section, we will check SQL merge statement alternative in Impala. Merge statement in Impala is rewritten into multiple steps to handle both MATCHED and NOT MATCHED conditions:
For example,
-- Drop temp table if exists
DROP TABLE IF EXISTS merge_table1wmmergeupdate;
-- Create temporary tables to hold merge records
CREATE TABLE merge_table1wmmergeupdate LIKE merge_table1;
-- Insert records when condition is MATCHED
INSERT INTO table merge_table1WMMergeUpdate
SELECT A.id AS ID,
A.firstname AS FirstName,
CASE
WHEN B.id IS NOT NULL THEN B.lastname
ELSE A.lastname
end AS LastName
FROM merge_table1 AS A
LEFT OUTER JOIN merge_table2 AS B
ON A.id = B.id;
-- Insert records when condition is NOT MATCHED
INSERT INTO merge_table1wmmergeupdate
SELECT B.id AS ID,
B.firstname AS FirstName,
B.lastname AS LastName
FROM merge_table2 AS B
LEFT OUTER JOIN merge_table1wmmergeupdate AS A
ON A.id = B.id
WHERE A.id IS NULL;
-- Drop origianal table
DROP TABLE IF EXISTS merge_table1;
-- Rename temp table to origianal table
ALTER TABLE merge_table1wmmergeupdate
RENAME TO merge_table1;
Results
Original Table:
+----+-----------+----------+
| id | firstname | lastname |
+----+-----------+----------+
| 1 | aaaa | bbbb |
| 2 | cccc | dddd |
| 3 | eeee | ffff |
| 4 | gggg | hhhh |
| 5 | iiii | jjjj |
+----+-----------+----------+
Merged Tables:
+----+-----------+----------+
| id | firstname | lastname |
+----+-----------+----------+
| 1 | aaaa | bbbb |
| 2 | cccc | kkkk |
| 3 | eeee | llll |
| 4 | gggg | hhhh |
| 5 | iiii | jjjj |
+----+-----------+----------+
Impala UPSERT Statement
The CDH 5.10 and above supports UPSERT statement. UPSERT statement is Cloudera implementation of SQL basic merge statement. You can insert and update records using UPSERT but delete is not yet supported. UPSERT statement will work only on the kudu tables. Because Kudu tables can efficiently handle small incremental changes, the VALUES clause is more practical to use with Kudu tables than with HDFS-based tables. You can’t use it in normal Impala or Hive tables.
Impala UPSERT Statement Syntax
Below is the CDH UPSERT statement:
UPSERT INTO [TABLE] [db_name.]table_name
[(column_list)]
{
select_statement
| VALUES (value [, value ...]) [, (value [, value ...]) ...]
}
Impala UPSERT Statement Example
Below is the example of Impala UPSER statement:
UPSERT INTO merge_table1 values (1, 'aaaa', 'bbbb'),(1, 'aaab', 'bbba');
You can use any of the above methods based on your requirements.
Hope this helps ?