Amazon Redshift, is a fast, fully managed, petabyte-scale data warehousing service. Redshift supports various columnar compression encoding techniques. In this article we will check Redshift column compression types – Compression Encoding techniques for Redshift performance optimization.
Amazon Redshift supports data compression for reducing the amount of disk storage used by a table. Amazon Redshift compression encoding can significantly reduce the storage required for a table and improve query performance by reducing I/O operations. The column compression in Redshift also helps storage optimization.
Redshift Column Compression Types – Compression Encoding
Columnar compression is an important technology in Amazon Redshift. It helps to reduce customer costs by increasing the effective storage capacity of nodes and improves performance by reducing I/O needed to process various SQL requests. based on the source data and type of the columns in the redshift table, you can decide the compression. You should have clear knowledge on the source data.
The following table lists the available Amazon Redshift compression types:
Encoding Types | Keyword to be used | Supporting Data Type |
Raw (no compression) | RAW | All |
Byte dictionary | BYTEDICT | All except BOOLEAN |
Delta | DELTA
DELTA32K | SMALLINT, INT, BIGINT, DATE, TIMESTAMP, DECIMAL
INT, BIGINT, DATE, TIMESTAMP, DECIMAL |
LZO | LZO | All except BOOLEAN, REAL, and DOUBLE PRECISION |
Mostlyn | MOSTLY8
MOSTLY16 MOSTLY32 | SMALLINT, INT, BIGINT, DECIMAL
INT, BIGINT, DECIMAL BIGINT, DECIMAL |
Run-length | RUNLENGTH | All |
Text | TEXT255
TEXT32K | VARCHAR only
VARCHAR only |
Zstandard | ZSTD | All |
Data compression in Amazon Redshift supports following commonly used column compression types:
- Run Length Encoding (RLE) – The Redshift RLE compression compresses repeating values into a single value and a count of the number of repetitions.
- Byte dictionary encoding in Redshift – It replaces repeating values with a unique identifier. This is most effective when a column has a limited number of unique values.
- Redshift DEFLATE compression/Zlib compression – It compresses data using the DEFLATE compression algorithm. This is a general-purpose compression algorithm that provides good compression ratios for most data types.
Redshift Compression Examples
Following create table is an example of usage of Redshift column compression types:
CREATE TABLE Test_Compression_Types
(
key INTEGER NOT NULL encode raw,
employee_num CHAR(11) NOT NULL encode lzo,
employee_type CHAR(1) NOT NULL encode runlength,
employee_status CHAR(1) NOT NULL encode lzo,
employee_tier CHAR(1) NOT NULL encode lzo,
service_flag char(1) NOT NULL encode runlength,
employee_flag char(1) NOT NULL encode ZSTD,
account_update_date DATE NOT NULL encode delta32k,
period_months INTEGER NOT NULL encode bytedict,
creation_date DATE NOT NULL encode delta32k,
category char(5) NOT NULL encode bytedict,
extended_flag CHAR(1) NOT NULL encode runlength,
business_date DATE NOT NULL encode delta32k,
address1 varchar(100) NOT NULL encode TEXT32K,
address2 varchar(100) NOT NULL encode TEXT255,
mobile_number INTEGER NOT NULL encode MOSTLY16,
pin SMALLINT NOT NULL encode MOSTLY8
)
distkey (key);
Analyze Redshift Table Automatic Compression
As a best practice for Redshift compression, you can run ANALYZE COMPRESSION
to get recommendations for each column encoding scheme, based on a sample data stored in redshift table. ANALYZE COMPRESSION is an advisory tool and doesn’t modify the column encodings of the table.
Based on the output of analyze compression command, you can recreate the table with recommendation compression type. You can either use the recommendation or can have your standard compression encodings techniques on the Redshift tables. You can check the table columnar compression by using the ANALYZE command. As a part of the Redshift performance optimization, you should always choose a right compression encoding technique.
training=# Analyze compression Test_Compression_Types;
Amazon Redshift Compression Benefits
Amazon Redshift compression provides many benefits that can improve the performance and efficiency of your storage and data warehouse.
Following are few of the key benefits of Redshift compression:
- Reduced table storage : By compressing your data in Redshift table, you can significantly reduce the amount of storage space required to store your data. This can help you save on storage costs and make it easier to manage your data warehouse.
- Better concurrency: Column compression reduces the amount of data that needs to be retrieved from disk and can help improve concurrency.
- Improved query performance: Compressed data requires less disk I/O and network bandwidth to retrieve, thus faster query performance.
- Better data compression ratios: With the help of advance compression algorithms, Redshift can achieve high compression ratios at the same time maintaining query performance.
Read:
- Redshift User Defined Functions Examples
- Redshift ANALYZE command and Examples
- Optimize Redshift Table Design to Improve Performance
Hope this helps 🙂