When you load data into Snowflake, Snowflake reorganizes that data into micro partition and stores into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage. Snowflake uses S3, Blob storage or GCP cloud storage. However, all these storages are immutable. Obviously the question would be how Snowflake internally performs or handles updates when you execute the update command?
How Snowflake Internally Handles Updates?
Many people would have thought of this question when they were going through Snowflake architecture. It is a complex question. Snowflake uses cloud storage to organize and store data. The cloud storage such as S3, Blob storage, GCP storage are immutable. But, the basic idea is as follows.
What happens when you load data into Snowflake table?
When you load data, records or rows are stored in immutable micros partitions on S3 or any other cloud storage. Basically, a table is a collection of many micro partitions depends on your data size.
For example,
You can see two micros partitions, P1 and P2 are created when you load records into the table
What happens when you add new records to Snowflake table using an INSERT statement?
When you add data using an INSERT statement, the new micro-partition will be created and a reference to this micro partition is stored in a catalog database.
For example, when you execute following insert statement,
INSERT INTO test_table values (1, ‘Judy’),
Snowflake creates a new micro-partition P3 is created for new records that you insert.
What happens when you execute an UPDATE statement on the Snowflake table to modify the records?
When you execute an UPDATE command to modify records from Snowflake table;
- its old micro-partition is marked as inactive,
- a new micro-partition is created, containing the modified record, and other records from that micro-partition.
- the new micro-partition is added to the table’s list and reference is updated to new partition marking it as active
- inactive micro-partitions are not deleted for some time, allowing time-travel.
For example, when you issue following update statement,
UPDATE test_table SET Name = ‘Jenny’ WHERE ID = 4;
The Snowflake will create a new micro-partition P4 containing the modified record, and other records from that micro-partition. Old micro-partition is marked as inactive.
What happens when you execute DELETE statement on the Snowflake table to remove the records?
When you execute DELETE command to remove records from Snowflake table;
- its old micro-partition is marked as inactive,
- a new micro-partition is created, containing the all records from the old partition except the record that you want to delete
- the new micro-partition is added to the table’s list and reference is updated to new partition marking it as active
- inactive micro-partitions are not deleted for some time, allowing time-travel.
For example, when you issue following delete statement,
DELETE FROM test_table WHERE ID = 4;
The Snowflake will create a new micro-partition P5 containing the all records from the old partition except the record that you want to delete. Old micro-partition is marked as inactive.
Related Articles,
- Snowflake Update Join Syntax – Update using other Table
- Snowflake Merge Statement Syntax, Usage and Examples
- How to Recover Deleted Rows in Snowflake – Time Travel Examples
Hope this helps 🙂