Just like choosing the right distribution style, selecting the right sort key is important for Redshift database performance. How data is laid out on disk blocks largely determines how fast the queries against those tables run. Changing the sort key after a table has been created takes some care. In this article, we will look at how to change a Redshift table's sort key, with an example.
Introduction
Redshift is a fast, fully-managed, petabyte-scale cloud data warehouse solution provided by Amazon Web Services (AWS). One of the key features of Redshift is the ability to optimize query performance through the use of a sort key. The sort key determines the order in which data is stored on disk, making it easier and faster to retrieve data for queries that specify a filter on the sort key column. Choosing the right sort key for your Redshift tables can significantly improve query performance, but what if you need to change it? In this post, we will discuss how to change the sort key for your Redshift tables.
Understanding the Redshift Table Sort Key
Before we dive into changing the sort key for a Redshift table, it’s important to understand what the sort key is and how it works. In Redshift, the sort key is a column or set of columns that determines the order in which data is physically stored on disk. Without a sort key, rows are simply stored in the order in which they were loaded, which is rarely optimal for query performance. Choosing the right sort key can significantly improve query performance, especially for large tables that are usually filtered or joined on the same few columns.
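As a quick illustration, the sort key columns of an existing table can be inspected through the pg_table_def catalog view. This is a minimal sketch that assumes the event table used later in this article lives in a schema on your search_path (pg_table_def only lists tables in schemas on the search path):

```sql
-- List each column of the event table together with its sort key position.
-- sortkey = 0 means the column is not part of the sort key;
-- a non-zero value gives the column's position in the sort key.
select "column", type, distkey, sortkey
from pg_table_def
where tablename = 'event'
order by sortkey;
```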
For more information on Redshift sort key, read: How to Select Redshift Sort Key- Choose Best Sort Key
Why Change the Redshift Table Sort Key?
A sort key is a column (or set of columns) in your Redshift table that determines the order in which data is physically stored on disk. Redshift stores the rows on each slice in sort key order, using the SORTKEY that you specify when creating the table.
For example, consider a store_sale table where you are most likely to retrieve data by purchase date. If you choose the purchase_date column as the sort key, data will be ordered from the oldest to the newest purchases.
You may need to change the sort key if your queries start filtering on a different column. One way to do that is to re-create the table with a different sort key.
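The store_sale case could look roughly like the sketch below. Only the table name and purchase_date come from the text above; the other columns are hypothetical and included just to make the example complete.

```sql
-- Hypothetical store_sale table; sale_id, customer_id and amount are assumed columns.
create table store_sale (
    sale_id       bigint,
    customer_id   int,
    purchase_date date,
    amount        decimal(12,2)
)
sortkey (purchase_date);

-- A range filter on the sort key lets Redshift skip disk blocks whose
-- min/max purchase_date falls outside the requested window.
select sum(amount)
from store_sale
where purchase_date between '2023-01-01' and '2023-01-31';
```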
How to Change Redshift Table Sort Key?
Older Redshift releases did not let you change the sort key with the ALTER TABLE command, so the table had to be recreated. Current releases document an ALTER TABLE ... ALTER SORTKEY clause, but recreating the table with a different sort key works on any release and is the approach shown here.
The CTAS statement below recreates the table with a different sort key.
Note that you should keep the distribution style as it is. We are not changing the distribution style; we are only changing the sort key of the table.
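On clusters recent enough to support it, Redshift's ALTER TABLE ... ALTER SORTKEY clause changes the sort key in place. A minimal sketch, assuming the event table from the CTAS example in this section; check the ALTER TABLE documentation for your release, since restrictions apply:

```sql
-- Replace the current sort key of event with a compound key on (eventid, dateid).
alter table event alter compound sortkey (eventid, dateid);

-- Other documented forms: let Redshift manage the sort key, or remove it.
-- alter table event alter sortkey auto;
-- alter table event alter sortkey none;
```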
create table event_new_dist
distkey (eventid)
sortkey (eventid, dateid)
as
select eventid, venueid, dateid, eventname
from event;
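CTAS only creates the copy; queries keep reading the original table until the two are swapped. One common way to finish the job, sketched here with a hypothetical event_old name for the retired table, is to compare row counts, rename both tables, and then drop the old one:

```sql
-- Sanity check: both counts should match before swapping.
select (select count(*) from event)          as old_rows,
       (select count(*) from event_new_dist) as new_rows;

-- Swap names so that existing queries pick up the re-sorted table.
alter table event rename to event_old;        -- event_old is a hypothetical name
alter table event_new_dist rename to event;

-- Drop the original once you are satisfied with the new table.
drop table event_old;
```

Keep in mind that a table created with CTAS does not inherit constraints, identity columns, default column values, or the primary key from the source table, so re-create those if the original table relied on them.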
Note that this works well for a table with a relatively small amount of data. If your table is very large, follow my other post, Change Redshift Table Distribution Style and Example, and use the same method that it describes for changing the distribution key of a large table.
Add Sort Key to Existing Table in Redshift
In Redshift, you cannot use ALTER TABLE ... ADD COLUMN to add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.
One option that works on any release is to create a new table with the sort key and insert the data from the old table into the new table.
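create table new_events(
eventid int,
venueid int,
dateid int, -- added: needed by the sort key and the insert; int is an assumed type
eventname varchar(300))
distkey (eventid)
sortkey (eventid, dateid)
;
Insert data into the new table.
insert into new_events (eventid, venueid, dateid, eventname)
select eventid, venueid, dateid, eventname from event;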
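After loading, it is worth confirming that the new table has the sort key you expect. A minimal sketch using the svv_table_info system view, assuming the new_events table created above:

```sql
-- Refresh planner statistics for the newly loaded table.
analyze new_events;

-- sortkey1 is the first sort key column, sortkey_num the number of sort key
-- columns, and unsorted the percentage of rows that are not in sort order.
select "table", diststyle, sortkey1, sortkey_num, unsorted, tbl_rows
from svv_table_info
where "table" = 'new_events';
```

If the unsorted percentage turns out to be high, running vacuum sort only new_events; re-sorts the rows.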
Conclusion
In conclusion, changing the sort key for a Redshift table can significantly improve query performance, but it requires careful planning and execution. Understanding the sort key, choosing the right one for your table, and following best practices such as backing up your data, testing the impact of the change, and monitoring performance can help ensure a smooth transition. Redshift is a powerful data warehouse solution that provides a wide range of optimization features, and choosing the optimal sort key is just one of the ways to unlock its full potential. With the right sort key and best practices in place, you can achieve faster query performance and gain insights from your data more efficiently.
Related Articles
- How to Select Redshift Sort Key- Choose Best Sort Key
- How to Choose Correct Compression Encode in Redshift?
Hope this helps 🙂