Amazon Redshift Data Types and Best Practices

  • Post author:
  • Post last modified:February 8, 2023
  • Post category:Redshift
  • Reading time:7 mins read

Data type is an attribute that specifies type of data of any object that is going to be stored in that particular column. Each column, variable and expression has related data type in SQL. However, different database offers the different data types for columns. Redshift data types are almost similar to what the traditional RDBMS supports. Amazon Redshift data types are similar to IBM Netezza data types.

When you issue Redshift create table command each column in a database tables must have name and a data type associated with it. The data type is based on the types of data which are stored inside the each column of the table

Read:

Amazon Redshift Data Types

Following is the list of an example of the data types available in Redshift at this time.

Here is the Redshift CREATE TABLE example having all the supported Redshift data types at this time:

CREATE TABLE REDSHIFT_TABLE_NAME ( 
SMALLINT_COLUMN SMALLINT|INT2,
INTEGER_COLUMN INTEGER, 
BIGINT_COLUMN BIGINT, 
BOOLEAN_COLUMN BOOL, 
CHAR_COLUMN CHARACTER(2), 
NCHAR_COLUMN NCHAR(2),
NVARCHAR_COLUMN NVARCHAR(2),
TEXT_COLUMN TEXT,
DATE_COLUMN DATE,
TIMESTAMP_COLUMN TIMESTAMP, 
TIMESTAMPTZ_COLUMN TIMESTAMP WITH TIME ZONE,
DOUBLE_PRECISION_COLUMN DOUBLE PRECISION, 
FLOAT_COLUMN FLOAT4, 
REAL_COLUMN REAL,
NUMERIC_COLUMN NUMERIC(18,0), 
DECIMAL_COLUMN DECIMAL(18,0), 
) distkey(BIGINT_COLUMN)
compound sortkey(BIGINT_COLUMN, INTEGER_COLUMN);

Above data types are categorized into following different data type groups.

  • Redshift Numeric Data Types: Numeric data types include integers, decimals, and floating-point numbers.
  • Redshift Character Data Types: Character data types include CHAR (character) , VARCHAR (character varying), NCHAR, NVARCHAR, TEXT and BPCHAR. All these character data types are internally resolve into CHAR and VARCHAR.
  • Redshift Date and Time Data Types: Datetime data types include DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ.
  • Redshift Binary Data Types: Binary data types includes VARBYTE, VARBINARY, or BINARY VARYING column to store variable-length binary value with a fixed limit
  • Redshift Geometric Data Types: Redshift Geometric data types includes spatial data with the GEOMETRY and GEOGRAPHY data types
  • Redshift Boolean Data Types: Boolean column stores and outputs t for true and f for false.
  • Redshift SUPER Data Type: Use the SUPER data type to store semistructured data or documents as values.
  • Redshift HLLSKETCH data type: Use the HLLSKETCH data type for HyperLogLog sketches

Redshift Data Types Best Practices

Below are some of the Redshift data type’s usage best practices. These practices holds good for all other MPP data bases.

  • INTEGER types provide better performance so convert NUMERIC types with scale 0 to INTEGER types.
  • Use INT instead of BIGINT: Redshift uses 8 bytes to store BIGINT values while INT uses 4 bytes. If you are sure that the values will fit in INT, use it instead of BIGINT to save storage space.
  • Floating point data types (REAL/DOUBLE PRECISION) are, by definition, lossy in nature and affect the overall Redshift performance. Use them only if you working with very large numbers or very small fractions
  • Avoid using FLOAT: FLOAT requires 8 bytes of storage, which is the same as DOUBLE PRECISION. However, DOUBLE PRECISION provides better precision than FLOAT. Use DOUBLE PRECISION instead of FLOAT.
  • Inconsistent data types for the same column on different tables affects performance so always use the same data types for same columns on different tables.
  • Use Char(x) instead of Varchar(x) when you expect the data to be a fixed length as this not only helps to save disk space but also helps performance due to reduced I/O. However, Redshift stores CHAR values with padding, meaning that if the stored value is shorter than the column width, Redshift pads the value with spaces. Use VARCHAR instead of CHAR to avoid wasting storage space.

Restrictions: Redshift Database Maximum Lengths

Below are some of LIMITS on columns, names, sortkey, and rowsize in Redshift databases

  • Columns: Maximum columns per Redshift table or view – 1600
  • Names: maximum length of database and column names in Redshift – 127 Characters
  • SortKey: Maximum numbers of sortkey you can define on Redshift table : 400 columns
  • Row size: Maximum row size : 65,535 bytes.

Related Articles

Hope this helps 🙂