Amazon Redshift Merge Statement - Example

The MERGE command or statement in standard SQL is used to perform incremental load. i.e. load only new set of records into target table. With the help of SQL MERGE statement, you can perform UPDATE and INSERT simultaneously based on the merge condition. As of now, Amazon Redshift doesn’t support a single merge statement. In this article, we will check what is Redshift merge statement with an example. Merge table or Update insert is also called UPSERT.

Page Contents

Introduction to Amazon Redshift.

Understanding the Merge Statement

Merge Statement in Amazon Redshift

When to use Merge Statement in Amazon Redshift

Benefits of using Merge Statement

Limitations of Merge Statement

Conclusion

Introduction to Amazon Redshift

Amazon Redshift is a powerful data warehousing solution provided by Amazon Web Services (AWS) that allows you to store, manage and analyze large amounts of data using highly scalable and cost-effective Redshift cluster. AWS Redshift is scalable, flexible, and cost-effective, making it a popular choice for large scale data warehouse solutions. With Amazon Redshift, users can easily create and manage data warehouses without having to worry about the underlying infrastructure, such as storage, compute, and networking resources.

It Redshift also offers a range of features and capabilities, including fast query performance, automatic snapshot and data recovery, easy data loading and unloading using COPY command, and seamless integration with other AWS services such as S3, EC2, Glue, etc.

Understanding the Merge Statement

Let us understand the SQL Merge statement with an example

SQL Merge Statement Example

Use the SQL MERGE command to insert, update rows in a target table using data from a source such as a table, view, or sub-query and based on rules specified in a matching condition in the merge statements.

For examples, let us check sample SQL merge statement.

MERGE INTO merge_demo1 A 
using merge_demo2 B 
ON ( A.id = b.id ) 
WHEN matched THEN 
  UPDATE SET A.lastname = B.lastname 
WHEN NOT matched THEN 
  INSERT (id, 
          firstname, 
          lastname) 
  VALUES (B.id, 
          B.firstname, 
          B.lastname);

The above SQL code is an example of a Merge Statement that is merging data from two tables, merge_demo1 and merge_demo2, based on a matching ID column.

In the MERGE INTO statement you specify the target table (merge_demo1) that the data will be merged into, while the using statement (subquery or table) is used to specify the source table (merge_demo2) that the data will be merged from. The ON clause specifies the join condition between the two tables.

The WHEN matched THEN UPDATE statement is used to update the lastname column in the merge_demo1 table with the corresponding lastname value from the merge_demo2 table, for records that have a matching ID value.

The WHEN NOT matched THEN INSERT statement is used to insert new records into the merge_demo1 table for records in merge_demo2 that do not have a matching ID value. It specifies the columns to be inserted, along with the values to be inserted for each column.

Merge Statement in Amazon Redshift

The Merge statement in SQL is a powerful data manipulation command that combines the functionality of INSERT, UPDATE, and DELETE statements into a single statement. The merge statement is generally used in CDC (Change Data Capture), usually, used to implement SCD (Slowly changing dimension).

In the Amazon Redshift MERGE statement conditionally merges rows from a source table into a target table. Earlier, this can only be achieved by using multiple insert, update or delete statements separately.

Note that, Amazon Redshift merge statement support is in preview as of now. It is just a matter of time and will be made available to all users.

Following is the example of Redshift Merger statement.

CREATE TABLE target (id INT, name CHAR(10), PRIMARY KEY(id));
CREATE TABLE source (id INT, name CHAR(10));

INSERT INTO target VALUES (101, 'Bob'), (101, 'John'), (102, 'Susan');
INSERT INTO source VALUES (101, 'Tony'), (103, 'Alice');

merge into target
		using source on
	target.id = source.id
	when matched then
update set
	id = source.id,
	name = source.name
	when not matched then
insert values (source.id, source.name);

SELECT * FROM target;
 id  |    name
-----+------------
 101 | Tony
 102 | Susan
 103 | Alice
(3 rows)

Redshift Merge Command Alternative

As mentioned earlier, Redshift does not support MERGE command as of now. There are two main conditions in merge statements: MATCHED and NOT MATCHED. We have to identify the alternative method to handle both conditions.

Let us rewrite SQL MERGE statement so that we can implement same on Redshift.

Test Data

Create sample tables on Redshift database.

--Target Table
CREATE TABLE merge_demo1
(
ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100)
);

-- Reference table
CREATE TABLE merge_demo2
(
ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100)
);

Insert sample records to created tables.

--Sample records inserted to table
INSERT INTO merge_demo1 VALUES (1, 'aaaa', 'bbbb');
INSERT INTO merge_demo1 VALUES (2, 'cccc', 'dddd');
INSERT INTO merge_demo1 VALUES (3, 'eeee', 'ffff');
INSERT INTO merge_demo1 VALUES (4, 'gggg', 'hhhh');
INSERT INTO merge_demo1 VALUES (5, 'iiii', 'jjjj');

--Let us say there is last name change for 'cccc' and 'eeee'
INSERT INTO merge_demo2 VALUES (2, 'cccc', 'kkkk');
INSERT INTO merge_demo2 VALUES (3, 'eeee', 'llll');
INSERT INTO merge_demo2 VALUES (6, 'zzzz', 'yyyy');

Redshift Merge Command Alternative Examples

Merge statement is rewritten into multiple steps to handle both MATCHED and NOT MATCHED conditions.

Below are the steps that we can follow to implement merge command or statement in Redshift database.

Create temp table similar to target table
Insert records when the condition is MATCHED
Insert records when the condition is NOT MATCHED
Drop target table
Rename temp table as target table

For example, following example demonstrates above steps

-- Drop temp table if exists
DROP TABLE IF EXISTS merge_demo1temp; 

-- Create temporary tables to hold merge records
CREATE TABLE merge_demo1temp (like merge_demo1);

-- Insert records when condition is MATCHED
INSERT INTO merge_demo1temp
SELECT A.id        AS ID, 
       A.firstname AS FirstName, 
       CASE 
         WHEN B.id IS NOT NULL THEN B.lastname 
         ELSE A.lastname 
       end         AS LastName 
FROM   merge_demo1 AS A 
       LEFT OUTER JOIN merge_demo2 AS B 
                    ON A.id = B.id; 

-- Insert records when condition is NOT MATCHED
INSERT INTO merge_demo1temp
SELECT B.id        AS ID, 
       B.firstname AS FirstName, 
       B.lastname  AS LastName 
FROM   merge_demo2 AS B 
       LEFT OUTER JOIN merge_demo1temp AS A 
                    ON A.id = B.id 
WHERE  A.id IS NULL; 

-- Drop origianal table
DROP TABLE IF EXISTS merge_demo1; 

-- Rename temp table to origianal table
ALTER TABLE merge_demo1temp
  RENAME TO merge_demo1;

Results

Original Table:

 select * from merge_demo1;
 id | firstname | lastname
----+-----------+----------
  1 | aaaa      | bbbb
  2 | cccc      | dddd
  3 | eeee      | ffff
  4 | gggg      | hhhh
  5 | iiii      | jjjj
(5 rows)

Merged Table:

 select * from merge_demo1;
 id | firstname | lastname
----+-----------+----------
  1 | aaaa      | bbbb
  2 | cccc      | kkkk
  3 | eeee      | llll
  4 | gggg      | hhhh
  5 | iiii      | jjjj
  6 | zzzz      | yyyy
(6 rows)

This is a simple method to use and understand. Please let me know if you are using different method.

When to use Merge Statement in Amazon Redshift

The Merge statement in Redshift is typically used when you need to synchronize data between two tables. Here are some common scenarios where you might use the Merge statement:

Change Data Capture (CDC): Change Data Capture (CDC) is used to capture the changes made to a table’s data. In the CDC technique, you will update existing data and insert new records if any.
Updating a target table based on a source table: If you have a source table containing new or updated data that needs to be merged into a target table, you can use the Merge statement to update the target table with the new data.
Updating and inserting records in a single statement: The Merge statement allows you to update and insert records in a single statement, making it a more efficient way to synchronize data between tables.
Updating and deleting records in a single statement: The Merge statement allows you to update and delete records in a single statement, making it a more efficient way to synchronize data between tables.

Benefits of using Merge Statement in Redshift

There are several benefits to using the Merge statement in Amazon Redshift. Here are few benefits:

Incremental updates: The Merge statement can be used to perform incremental updates, which means that only the changes made to the data are updated.
Reduces code complexity: The Merge statement simplifies the code required to update, insert, and delete data in a table. This reduces code complexity and makes the code easier to understand.
Improves performance: By combining multiple operations such as update, insert and delete into a single statement, the Merge statement can improve performance when dealing with large datasets.
Easy to sync two tables: The merge statement allows you to sync two tables easily.

Limitations of Amazon Redshift Merge Statement

Following are some of the limitations of using merge statement in Amazon Redshift:

Avoid using Merge statement on System tables: Amazon Redshift does not support system tables in your merge statements.
Avoid using MERGE statement with external table: Redshift does not support the merge statement with external tables.
source_table and target_table can’t be the same table
You can’t use the WITH clause in a MERGE statement
Rows in source_table can’t match multiple rows in target_table.
Redshift Merge command with duplicate records: MERGE can remove duplicate values if you have duplicate values in a primary key column. This is possible because Amazon Redshift doesn’t enforce primary key constraints.

Conclusion

Merge statement in Amazon Redshift data warehouse is a powerful SQL statement that simplifies the process of updating and synchronizing data between tables, improves performance, reduces code complexity, and provides transactional consistency.

Hope this helps 🙂