Redshift Column Compression Types – Compression Encoding

Amazon Redshift, is a fast, fully managed, petabyte-scale data warehousing service. Redshift supports various columnar compression encoding techniques. In this article we will check Redshift column compression types – Compression Encoding techniques for Redshift performance optimization.

Amazon Redshift supports data compression for reducing the amount of disk storage used by a table. Amazon Redshift compression encoding can significantly reduce the storage required for a table and improve query performance by reducing I/O operations. The column compression in Redshift also helps storage optimization.

Redshift Column Compression Types – Compression Encoding

Columnar compression is an important technology in Amazon Redshift. It helps to reduce customer costs by increasing the effective storage capacity of nodes and improves performance by reducing I/O needed to process various SQL requests. based on the source data and type of the columns in the redshift table, you can decide the compression. You should have clear knowledge on the source data.

The following table lists the available Amazon Redshift compression types:

Encoding Types	Keyword to be used	Supporting Data Type
Raw (no compression)	RAW	All
Byte dictionary	BYTEDICT	All except BOOLEAN
Delta	DELTA DELTA32K	SMALLINT, INT, BIGINT, DATE, TIMESTAMP, DECIMAL INT, BIGINT, DATE, TIMESTAMP, DECIMAL
LZO	LZO	All except BOOLEAN, REAL, and DOUBLE PRECISION
Mostlyn	MOSTLY8 MOSTLY16 MOSTLY32	SMALLINT, INT, BIGINT, DECIMAL INT, BIGINT, DECIMAL BIGINT, DECIMAL
Run-length	RUNLENGTH	All
Text	TEXT255 TEXT32K	VARCHAR only VARCHAR only
Zstandard	ZSTD	All

Data compression in Amazon Redshift supports following commonly used column compression types:

Run Length Encoding (RLE) – The Redshift RLE compression compresses repeating values into a single value and a count of the number of repetitions.
Byte dictionary encoding in Redshift – It replaces repeating values with a unique identifier. This is most effective when a column has a limited number of unique values.
Redshift DEFLATE compression/Zlib compression – It compresses data using the DEFLATE compression algorithm. This is a general-purpose compression algorithm that provides good compression ratios for most data types.

Redshift Compression Examples

Following create table is an example of usage of Redshift column compression types:

CREATE TABLE Test_Compression_Types
(
  key INTEGER NOT NULL encode raw,
  employee_num CHAR(11) NOT NULL encode lzo,
  employee_type CHAR(1) NOT NULL encode runlength,
  employee_status CHAR(1) NOT NULL encode lzo,
  employee_tier CHAR(1) NOT NULL encode lzo,
  service_flag char(1) NOT NULL encode runlength,
  employee_flag char(1) NOT NULL encode ZSTD,
  account_update_date DATE NOT NULL encode delta32k,
  period_months INTEGER NOT NULL encode bytedict,
  creation_date DATE NOT NULL encode delta32k,
  category char(5) NOT NULL encode bytedict,
  extended_flag CHAR(1) NOT NULL encode runlength,
  business_date DATE NOT NULL encode delta32k,
 address1 varchar(100) NOT NULL encode TEXT32K,
 address2 varchar(100) NOT NULL encode TEXT255,
 mobile_number INTEGER NOT NULL encode MOSTLY16,
 pin SMALLINT NOT NULL encode MOSTLY8
 )
distkey (key);

Analyze Redshift Table Automatic Compression

As a best practice for Redshift compression, you can run ANALYZE COMPRESSION to get recommendations for each column encoding scheme, based on a sample data stored in redshift table. ANALYZE COMPRESSION is an advisory tool and doesn’t modify the column encodings of the table.

Based on the output of analyze compression command, you can recreate the table with recommendation compression type. You can either use the recommendation or can have your standard compression encodings techniques on the Redshift tables. You can check the table columnar compression by using the ANALYZE command. As a part of the Redshift performance optimization, you should always choose a right compression encoding technique.

training=# Analyze compression Test_Compression_Types;

Amazon Redshift Compression Benefits

Amazon Redshift compression provides many benefits that can improve the performance and efficiency of your storage and data warehouse.

Following are few of the key benefits of Redshift compression:

Reduced table storage : By compressing your data in Redshift table, you can significantly reduce the amount of storage space required to store your data. This can help you save on storage costs and make it easier to manage your data warehouse.
Better concurrency: Column compression reduces the amount of data that needs to be retrieved from disk and can help improve concurrency.
Improved query performance: Compressed data requires less disk I/O and network bandwidth to retrieve, thus faster query performance.
Better data compression ratios: With the help of advance compression algorithms, Redshift can achieve high compression ratios at the same time maintaining query performance.

Read:

Hope this helps 🙂