Redshift ANALYZE Command to Collect Statistics and Best Practices

Redshift Analyze command is used to collect the statistics on the tables that query planner uses to create optimal query execution plan using Redshift Explain command.

Analyze command obtain sample records from the tables, calculate and store the statistics in STL_ANALYZE table. You can generate statistics on entire tables or on subset of columns. You can specify comma-separated column list for analyze command.

How Statistics are collected?

Redshift collects statistics in various ways.

Statistics are automatically collected for certain database operations.
Collect statistics for entire table or subset of columns using Redshift ANALYZE commands.
You can generate statistics on entire database or single table.

Automatic Redshift Statistic Collection

Amazon Redshift runs the ANALYZE command to collect statistics for following commands:

CREATE TABLE AS
CREATE TEMP TABLE AS
SELECT INTO

When you need to Run Redshift ANALYZE Command?

You don’t need to collect statistics on all columns or on external tables. If the data in the Redshift tables changes substantially, analyze the columns that are frequently used in following commands:

Sorting and grouping operations
Joins
Query predicates – columns used in FILTER, GROUP BY, SORTKEY, DISTKEY

Redshift ANALYZE Command Syntax

Below is the ANALYZE command syntax:

ANALYZE [ VERBOSE ]
[ [ table_name [ ( column_name [, ...] ) ] ]
[ PREDICATE COLUMNS | ALL COLUMNS ];

Where;

VERBOSE – Display the ANALYZE command progress information.
Table_name – Name of the table to be analyzed.
Column_name – Name of the tables in the column to be analyzed.
PREDICATE COLUMNS | ALL COLUMNS – Specify whether to analyze predicate columns or all column. By default it is ALL COLUMNS.

Redshift ANALYZE Command Threshold

To improve Redshift system performance and reduce processing time, Redshift skips ANALYZE for a table if the percentage of table rows that have changed since the last ANALYZE command run is lower than the threshold specified by the analyze_threshold_percent parameter.

You can set the variable before collecting statistics using analyze command.

set analyze_threshold_percent to 30;

Redshift Analyze Best Practices

Below are some of best practices to run ANALYZE command:

To improve the query performance, run ANALYZE command before running complex queries.
Try to run ANALYZE command with PREDICATE COLUMNS clause. This will save your time and cluster resources.
Run ANALYZE on the table that undergo significant changes i.e. on the table you perform DELETE and UPDATE regularly.
Schedule the ANALYZE command at regular interval to keep statistics up-to-date

Redshift ANALYZE Command Examples

Analyze all tables in given Redshift database:

analyze verbose;

training=# analyze verbose; 
...
INFO: analyzing "public.tr_test" 
ANALYZE 
training=#

Analyze single table in Redshift Database:

analyze tr_test;

training=# analyze tr_test; 
ANALYZE SKIP

Redshift Analyze specific columns from redshift table:

analyze tr_test(id, name);

training=# analyze tr_test11(id, name); 
ANALYZE

Redshift Analyze only predicate columns:

analyze tr_test predicate columns;

training=# analyze tr_test1 predicate columns; 
ANALYZE

Redshift Analyze ALL columns from Redshift table:

analyze tr_test ALL COLUMNS;

training=# analyze tr_test2 ALL COLUMNS; 
ANALYZE

Read: