Optimize Redshift Table Design to Improve Performance

The performance of the Redshift database is directly proportional to the optimal table design in your database. Optimizing Amazon Redshift table structure is very important aspect to speed up your data loading and unloading process. In this article, we will check out some tricks to optimize Redshift table design to improve performance.

Optimize Redshift Table Design

There is no specific set of rules to optimize Redshift table structure. Creating optimal table design is based on the type of data that you are about to load.

However, here are some of the methods that Amazon Redshift recommend while designing tables for better performance.

Choose Best Distribution Style
Choose Best Sort Keys
Use CTAS to Re-Create Table
Define Constraints – Primary and Foreign Keys
Use Small data types
Choose compression based on your data
Use Data/Time data type whenever possible

Choose Best Distribution Style

Choosing right distribution style is very important aspect of the Redshift performance improvement. You should choose optimal distribute style so that there should not be any data skews. If your table is taking long time to load then you may want to revisit the table structure for distribution style.

For more details read: How Redshift Distributes Table Data? Importance of right Distribution Key

Choose Best Sort Keys

When you create a table, you can specify one or more columns as the sort key. Amazon Redshift stores your data on disk in sorted order according to the sort key. Timestamp Columns, Range Columns, Join Columns are best suited for sort key clause.

For more details Read: How to Select Redshift Sort Key- Choose Best Sort Key

Use CTAS to Re-Create Table

Use “Create Table AS” method whenever you are re-creating tables. With CTAS option, data distributes on the data slices without involving leader node, hence faster and it is the easier way.

Define Constraints – Primary and Foreign Keys

You should define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans. Note that, primary and foreign key constraints are not enforced.

Use Small Data Types

Don’t make it a practice to use the maximum column size for convenience. Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly.

Consider CHAR if you are storing single character values such as True or False.

Choose compression based on your data

You can specify compression encodings when you create a table in Redshift, but, automatic compression is recommended and produces the best results. Let COPY command analyze your data and apply best compression technique to your underlying empty table as a part of data loading.

Redshift Columnar Compression Types – Compression Encoding

Use Data/Time data type whenever possible

Amazon Redshift stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Use the DATE or TIMESTAMP data type whenever possible.

Hope this helps 🙂