The performance of the Redshift database is directly proportional to the optimal table design in your database. Optimizing Amazon Redshift table structure is very important aspect to speed up your data loading and unloading process. In this article, we will check out some tricks to optimize Redshift table design to improve performance.
Optimize Redshift Table Design
There is no specific set of rules to optimize Redshift table structure. Creating optimal table design is based on the type of data that you are about to load.
However, here are some of the methods that Amazon Redshift recommend while designing tables for better performance.
- Choose Best Distribution Style
- Choose Best Sort Keys
- Use CTAS to Re-Create Table
- Define Constraints – Primary and Foreign Keys
- Use Small data types
- Choose compression based on your data
- Use Data/Time data type whenever possible
Choose Best Distribution Style
Choosing right distribution style is very important aspect of the Redshift performance improvement. You should choose optimal distribute style so that there should not be any data skews. If your table is taking long time to load then you may want to revisit the table structure for distribution style.
For more details read: How Redshift Distributes Table Data? Importance of right Distribution Key
Choose Best Sort Keys
When you create a table, you can specify one or more columns as the sort key. Amazon Redshift stores your data on disk in sorted order according to the sort key. Timestamp Columns, Range Columns, Join Columns are best suited for sort key clause.
For more details Read: How to Select Redshift Sort Key- Choose Best Sort Key
Use CTAS to Re-Create Table
Use “Create Table AS” method whenever you are re-creating tables. With CTAS option, data distributes on the data slices without involving leader node, hence faster and it is the easier way.
Define Constraints – Primary and Foreign Keys
You should define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans. Note that, primary and foreign key constraints are not enforced.
Related Articles
Use Small Data Types
Don’t make it a practice to use the maximum column size for convenience. Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly.
Consider CHAR if you are storing single character values such as True or False.
Choose compression based on your data
You can specify compression encodings when you create a table in Redshift, but, automatic compression is recommended and produces the best results. Let COPY command analyze your data and apply best compression technique to your underlying empty table as a part of data loading.
Related Articles
Use Data/Time data type whenever possible
Amazon Redshift stores DATE and TIMESTAMP data more efficiently than CHAR or VARCHAR, which results in better query performance. Use the DATE or TIMESTAMP data type whenever possible.
Related Articles
- Optimize Query Performance with Redshift Collocated Tables
- How to Choose Correct Compression Encode in Redshift?
Hope this helps 🙂