How to Handle Duplicate Records in Snowflake Insert?

Post author:Vithal S
Post last modified:July 16, 2021
Post category:Snowflake
Reading time:4 mins read

Snowflake does not enforce constraints on tables. Handling duplicate records in Snowflake insert is one of the common requirements while performing a data load. Snowflake supports many methods to identify and remove duplicate records from the table. In this article, we will check how to handle duplicate records in the Snowflake insert statement. It is basically one of the alternative methods to enforce the primary key constraints on Snowflake table.

Handle Duplicate Records in Snowflake Insert

Snowflake allows you to identify a column as a primary key, but it doesn’t enforce uniqueness on them. The constraints such as primary, foreign and unique key constraints on Snowflake tables are informative purpose only. Snowflake enforces only NOT NULL constraint. The Snowflake cloud database allows you to insert duplicate records, even if you have primary key constraints defined on the table.

MERGE Statement to Restrict Duplicate Insert in Snowflake Tables

The merge command in SQL is a command that allows you to update, delete, or insert into a source table using target table. Based on the matching condition rows from the tables are updated, deleted, or new records are inserted.

You can read more about Snowflake merge statement in my other articles;

Snowflake Merge Statement Syntax, Usage and Examples

We can use Snowflake MERGE statement to restrict the duplicate insert in Snowflake table. The merge statement will insert records only if it’s not present in the source table. Otherwise, the record will be ignored. Alternatively, you can update if you don’t want to ignore already existing records.

Following is the simple example of Snowflake merge statement.

MERGE INTO merge_test
using merge_test2
ON merge_test.a = merge_test2.a
WHEN matched THEN
  UPDATE SET merge_test.b = merge_test2.b
WHEN NOT matched THEN
  INSERT (a,
          b)
  VALUES (merge_test2.a,
          merge_test2.b);

Restrict Duplicate Insert in Snowflake Tables with MERGE Statement Example

Now that you know how merge statement works. You can use WHEN NOT MATCHED condition to insert unique records into Snowflake table, hence, simulating primary key constraint.

For example,

-- Source Table
select * from MERGE_TEST;
|A  |B  |
|---|---|
|1  |11 |
|4  |14 |

-- Insert unique records
MERGE INTO merge_test
using (SELECT *
       FROM   (VALUES (1, 11),
                      (2, 12),
                      (3, 13),
                      (4, 14) ) tab(a, b)) tab1
ON ( merge_test.a = tab1.a )
WHEN NOT matched THEN
  INSERT (a,
          b)
  VALUES (tab1.a,
          tab1.b); 
-- Check the source table
select * from MERGE_TEST;
|A  |B  |
|---|---|
|1  |11 |
|2  |12 |
|3  |13 |
|4  |14 |