An Amazon Redshift cluster is made up of nodes, each with its own disk, memory and CPU. Redshift spreads the rows of a table or materialized view across the nodes, and across the slices within each node, according to the distribution style you choose. In this article, we will learn about the Amazon Redshift distribution styles and look at some examples.
Amazon Redshift Distribution Types
There are four distribution styles available in Amazon Redshift: EVEN, KEY, ALL and AUTO. You can choose a style based on your requirements and the type of joins you are going to perform on the table.
Redshift EVEN distribution
With EVEN distribution, the leader node distributes rows across all slices in a round-robin fashion, regardless of the values in any column. EVEN distribution is appropriate when the table does not participate in joins. You can also choose EVEN distribution when there is no clear choice between KEY and ALL distribution.
Redshift EVEN distribution Example
Below is an example of creating a table with EVEN distribution:
create table sample ( id int, name varchar(100), age int ) DISTSTYLE EVEN;
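The round-robin placement described above can be modeled in a few lines of Python. This is a simplified sketch, assuming a hypothetical cluster with 4 slices numbered from 0; Redshift's actual slice numbering and starting slice may differ. The point it illustrates is that placement depends only on the order of the rows, never on their column values, so the slice row counts differ by at most one.

```python
def distribute_even(rows, num_slices):
    """Assign each row to a slice in round-robin order (simplified EVEN model)."""
    placement = {s: [] for s in range(num_slices)}
    for i, row in enumerate(rows):
        # Row 0 goes to slice 0, row 1 to slice 1, ... wrapping around.
        placement[i % num_slices].append(row)
    return placement


# Hypothetical 4-slice cluster and 10 rows shaped like the sample table (id, name, age).
rows = [(i, f"name{i}", 20 + i) for i in range(10)]
placement = distribute_even(rows, num_slices=4)

counts = {s: len(r) for s, r in placement.items()}
print(counts)  # {0: 3, 1: 3, 2: 2, 3: 2}
```

Because values are ignored, rows that would be joined together usually end up on different slices, which is why EVEN suits tables that are not used in joins.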
Redshift KEY distribution
With KEY distribution, rows are distributed according to the values in one column, the distribution key. The leader node places rows with the same key value on the same slice. If two tables are distributed on the same column and you join them on that column, the matching rows from both tables are already stored on the same slice, which makes the tables collocated. Collocated joins perform better because Redshift does not have to move data between nodes to complete the join.
Redshift KEY distribution Example
Below is an example of creating a table with KEY distribution:
create table sample ( id int, name varchar(100), age int ) DISTSTYLE KEY DISTKEY(ID) ;
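To make the collocation idea concrete, here is a minimal Python model of KEY distribution. It assumes a hypothetical 4-slice cluster and hypothetical `customers` and `orders` tables, and it uses `zlib.crc32` purely as a stand-in for Redshift's internal hash function. Because both tables hash the same `customer_id` values the same way, every order lands on the same slice as its customer.

```python
import zlib


def slice_for_key(key, num_slices):
    """Map a distribution-key value to a slice.

    crc32 is only a stand-in: Redshift's real hash function is internal.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute_key(rows, key_index, num_slices):
    """Place each row on the slice chosen by hashing its distribution-key column."""
    placement = {s: [] for s in range(num_slices)}
    for row in rows:
        placement[slice_for_key(row[key_index], num_slices)].append(row)
    return placement


def is_collocated_join(left, right, left_key, right_key):
    """Return True if every pair of matching rows sits on the same slice."""
    # Record which slices hold left-side rows for each join-key value.
    left_slices = {}
    for s, rows in left.items():
        for row in rows:
            left_slices.setdefault(row[left_key], set()).add(s)
    # Each right-side row must find all of its matches on its own slice.
    for s, rows in right.items():
        for row in rows:
            if left_slices.get(row[right_key], {s}) - {s}:
                return False
    return True


NUM_SLICES = 4  # hypothetical cluster size
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")]  # (customer_id, name)
orders = [(100, 1), (101, 2), (102, 2), (103, 4), (104, 3)]  # (order_id, customer_id)

# Both tables use customer_id as the distribution key.
customers_by_id = distribute_key(customers, key_index=0, num_slices=NUM_SLICES)
orders_by_customer = distribute_key(orders, key_index=1, num_slices=NUM_SLICES)

print(is_collocated_join(customers_by_id, orders_by_customer, left_key=0, right_key=1))  # True
```

If `orders` were instead distributed on `order_id`, matching rows could land on different slices, and Redshift would have to redistribute or broadcast data at query time to perform the join.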
Redshift ALL distribution
If you specify the ALL distribution style when creating a table, the leader node places a full copy of the table on every node in the cluster. If the table is small and you want it to be collocated with the tables it joins, this distribution style is a good fit.
Because ALL distribution multiplies the storage a table needs by the number of nodes, loading, inserting and updating data in the table takes longer.
Redshift ALL distribution Example
Below is an example of creating a table with ALL distribution:
create table sample ( id int, name varchar(100), age int ) DISTSTYLE ALL ;
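The trade-off of ALL distribution can be sketched with a simplified Python model: every node holds a full copy, so any node can join against the table locally, but storage and write work grow with the number of nodes. The 3-node cluster, the `dim_region` table and the helper functions are hypothetical.

```python
def distribute_all(rows, num_nodes):
    """Give every node its own full copy of the table (simplified ALL model)."""
    return {node: list(rows) for node in range(num_nodes)}


def stored_rows(placement):
    """Count rows stored across the whole cluster, including every copy."""
    return sum(len(rows) for rows in placement.values())


def insert_row(placement, row):
    """Append a row to every node's copy and return how many copies were written."""
    for rows in placement.values():
        rows.append(row)
    return len(placement)


# Hypothetical 3-node cluster and a small 4-row dimension table.
dim_region = [(1, "north"), (2, "south"), (3, "east"), (4, "west")]
placement = distribute_all(dim_region, num_nodes=3)

print(stored_rows(placement))  # 12: three full copies of 4 rows
copies_written = insert_row(placement, (5, "central"))
print(copies_written)  # 3: one write per node for a single new row
print(stored_rows(placement))  # 15
```

The same arithmetic explains why ALL is usually reserved for small tables that change rarely: a large or frequently updated table pays the per-node cost on every load.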
Automatically Pick the Best Distribution Style
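With AUTO distribution, Amazon Redshift assigns a distribution style based on the size of the table data and can change it as the table grows. For example, it initially assigns ALL distribution to a small table, may switch to KEY distribution on a primary key column as the table grows, and uses EVEN distribution when no column is suitable as a distribution key. AUTO is also the default when you do not specify DISTSTYLE in CREATE TABLE. With automatic selection of the distribution style, you get better query performance and storage space utilization across nodes.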
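The size-based choice that AUTO makes can be summarized as a small decision function. This is only a rough model of the behavior described in the AWS documentation (ALL for a small table, then KEY on a suitable primary key column, or EVEN, as the table grows). The `small_table_rows` threshold and the `key_candidate` parameter are hypothetical; Redshift's real decision criteria are internal.

```python
def auto_distribution_style(row_count, small_table_rows, key_candidate=None):
    """Rough model of AUTO's progression: ALL, then KEY or EVEN as the table grows.

    small_table_rows is a hypothetical cut-off; Redshift's real criteria are internal.
    key_candidate is a hypothetical primary-key column judged suitable as a DISTKEY.
    """
    if row_count <= small_table_rows:
        return "ALL"  # small table: a copy on every node is cheap
    if key_candidate is not None:
        return f"KEY({key_candidate})"  # larger table with a suitable key column
    return "EVEN"  # larger table with no suitable key column


for rows in (1_000, 5_000_000):
    print(rows, auto_distribution_style(rows, small_table_rows=10_000, key_candidate="id"))
```

On a real cluster, the style Redshift is currently using for an AUTO table is reported in the `diststyle` column of the `SVV_TABLE_INFO` system view, for example as `AUTO(ALL)` or `AUTO(EVEN)`.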
Hi Vithal,
Your comment on small tables for the All distribution should be removed as it is misleading.
Refer to: http://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html “Small dimension tables do not benefit significantly from ALL distribution, because the cost of redistribution is low.”
Hi,
Thank you for the information.
In an MPP environment, collocating tables improves join performance because each node can work independently of the others. If your table is very small and you join it in your queries, ALL distribution is optimal.
Thank you