Snowflake Cloud Data Warehouse Best Practices

The Snowflake cloud data warehouse is optimized to work with cloud environments such as AWS, GCP, and Azure. It is fast, reliable, and designed for the cloud. In this article, we will check some Snowflake cloud data warehouse best practices.

Snowflake Cloud Data Warehouse Best Practices

If you follow the official Snowflake documentation, you will find many optimization methods. We have also discussed how to optimize the table structure in other articles.

Following are some of the best practices that you can use when working with the Snowflake cloud data warehouse. Let us check these best practices in brief.

Decide Warehouse Size based on Environment

Create separate warehouses for different environments such as development, testing, and production (see the example after the list below).

  • For queries in development or testing environments, smaller warehouse sizes (such as X-Small, Small, or Medium) may be sufficient.
  • For queries in production environments, larger warehouse sizes (such as Large, X-Large, 2X-Large, etc.) may be more appropriate and cost-effective.
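
A minimal sketch of per-environment warehouses might look like the following. The warehouse names are hypothetical and the sizes are only starting points; adjust them to your workload.

    -- Hypothetical warehouse names; one per environment.
    CREATE WAREHOUSE IF NOT EXISTS dev_wh
      WITH WAREHOUSE_SIZE = 'XSMALL'
           AUTO_SUSPEND   = 60
           AUTO_RESUME    = TRUE;

    CREATE WAREHOUSE IF NOT EXISTS prod_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
           AUTO_SUSPEND   = 300
           AUTO_RESUME    = TRUE;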

Separate Warehouse for Data Loading and Query Execution

From a performance point of view, it is always better to create separate warehouses for data loading and query execution. Start with a smaller size and, based on performance, manually resize the warehouse.
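
For example, a minimal sketch with hypothetical warehouse names load_wh and query_wh, followed by a manual resize:

    -- Hypothetical names: one warehouse for loading, one for queries.
    CREATE WAREHOUSE IF NOT EXISTS load_wh  WITH WAREHOUSE_SIZE = 'SMALL';
    CREATE WAREHOUSE IF NOT EXISTS query_wh WITH WAREHOUSE_SIZE = 'MEDIUM';

    -- Manually resize later if the workload needs it.
    ALTER WAREHOUSE query_wh SET WAREHOUSE_SIZE = 'LARGE';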

Warehouse for Data Loading

Increasing the size of a warehouse does not always improve data loading performance. Data loading performance is influenced more by the number of files being loaded (and the size of each file) than by the size of the warehouse.

A smaller warehouse (Small, Medium, Large) is generally sufficient for all your data loading needs.
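
For instance, splitting the source data into multiple smaller files lets Snowflake load them in parallel even on a small warehouse. A sketch with a hypothetical table and stage:

    -- Hypothetical table and stage names; many smaller files load in parallel.
    COPY INTO sales
      FROM @sales_stage
      PATTERN = '.*sales_part_.*[.]csv[.]gz'
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);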

Enable Auto-scaling

If you are using the Enterprise Edition of Snowflake, multi-cluster warehouses should be configured to run in Auto-scale mode, which enables Snowflake to automatically start and stop clusters as needed.
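
A minimal sketch of a multi-cluster warehouse in Auto-scale mode (Enterprise Edition or higher); the warehouse name is hypothetical:

    -- Auto-scale mode: MIN_CLUSTER_COUNT < MAX_CLUSTER_COUNT lets Snowflake
    -- start and stop clusters between the two bounds as concurrency changes.
    ALTER WAREHOUSE query_wh SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
      SCALING_POLICY    = 'STANDARD';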

Optimize Insert Statements

Instead of running individual INSERT statements, write the rows to a file and use the bulk-loading COPY command for better performance, as sketched below.
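
A minimal sketch, assuming the rows have already been written to a local file and using a hypothetical internal stage and table:

    -- Upload the local file to a hypothetical internal stage (run from SnowSQL).
    PUT file:///tmp/events.csv @events_stage;

    -- Bulk load everything in the stage instead of inserting row by row.
    COPY INTO events
      FROM @events_stage
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);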

Refrain from Executing Simple Queries

Do not execute small queries such as SELECT 1, SELECT CURRENT_DATE, SELECT COUNT(1) FROM TABLE, etc. Snowflake uses per-second billing, so many small queries accumulate into higher credit usage. Execute simple queries only when necessary.

Enable Warehouse Auto-suspension

You can suspend the warehouse when you are done with a task; instead of dropping it, simply suspend it to save credits. By default, auto-suspend is enabled, and Snowflake automatically suspends the warehouse if it is inactive for the specified period of time.
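
A minimal sketch, using a hypothetical warehouse name:

    -- Suspend the warehouse explicitly once the task is finished ...
    ALTER WAREHOUSE query_wh SUSPEND;

    -- ... or let it suspend automatically after 5 minutes of inactivity.
    ALTER WAREHOUSE query_wh SET AUTO_SUSPEND = 300;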

Enable Warehouse Auto-resumption

You can set a property to resume the warehouse as soon as a query is submitted. By default, auto-resume is enabled: Snowflake automatically resumes the warehouse when any statement that requires a warehouse is submitted and the warehouse is the current warehouse for the session.
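
A minimal sketch, again with a hypothetical warehouse name:

    -- Resume automatically when a statement that needs the warehouse is submitted.
    ALTER WAREHOUSE query_wh SET AUTO_RESUME = TRUE;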

Disable Warehouse Auto-suspension for Frequent Queries

You should disable auto-suspension when you have a heavy, steady workload for the warehouse.

For example, if your job executes queries every 5 minutes, then it does not make sense to enable auto-suspend.

Each warehouse, when running, maintains a cache of the table data accessed as queries are processed. This improves performance if you reuse the same tables in your queries. The cache is dropped when the warehouse is suspended, which may result in slower initial performance for some queries after the warehouse is resumed. And each time the warehouse resumes, you are billed for the minimum credit usage (i.e. 60 seconds).
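
A minimal sketch, assuming a hypothetical warehouse name; setting AUTO_SUSPEND to 0 disables auto-suspension:

    -- Keep the warehouse (and its data cache) running for a steady workload.
    ALTER WAREHOUSE reporting_wh SET AUTO_SUSPEND = 0;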

Hope this helps 🙂