Redshift Analyze command is used to collect the statistics on the tables that query planner uses to create optimal query execution plan using Redshift Explain command.
Analyze command obtain sample records from the tables, calculate and store the statistics in STL_ANALYZE table. You can generate statistics on entire tables or on subset of columns. You can specify comma-separated column list for analyze command.
How Statistics are collected?
Redshift collects statistics in various ways.
- Statistics are automatically collected for certain database operations.
- Collect statistics for entire table or subset of columns using Redshift ANALYZE commands.
- You can generate statistics on entire database or single table.
Automatic Redshift Statistic Collection
Amazon Redshift runs the ANALYZE command to collect statistics for following commands:
- CREATE TABLE AS
- CREATE TEMP TABLE AS
- SELECT INTO
When you need to Run Redshift ANALYZE Command?
You don’t need to collect statistics on all columns or on external tables. If the data in the Redshift tables changes substantially, analyze the columns that are frequently used in following commands:
- Sorting and grouping operations
- Joins
- Query predicates – columns used in FILTER, GROUP BY, SORTKEY, DISTKEY
Redshift ANALYZE Command Syntax
Below is the ANALYZE command syntax:
ANALYZE [ VERBOSE ] [ [ table_name [ ( column_name [, ...] ) ] ] [ PREDICATE COLUMNS | ALL COLUMNS ];
Where;
- VERBOSE – Display the ANALYZE command progress information.
- Table_name – Name of the table to be analyzed.
- Column_name – Name of the tables in the column to be analyzed.
- PREDICATE COLUMNS | ALL COLUMNS – Specify whether to analyze predicate columns or all column. By default it is ALL COLUMNS.
Redshift ANALYZE Command Threshold
To improve Redshift system performance and reduce processing time, Redshift skips ANALYZE for a table if the percentage of table rows that have changed since the last ANALYZE command run is lower than the threshold specified by the analyze_threshold_percent parameter.
You can set the variable before collecting statistics using analyze command.
set analyze_threshold_percent to 30;
Redshift Analyze Best Practices
Below are some of best practices to run ANALYZE command:
- To improve the query performance, run ANALYZE command before running complex queries.
- Try to run ANALYZE command with PREDICATE COLUMNS clause. This will save your time and cluster resources.
- Run ANALYZE on the table that undergo significant changes i.e. on the table you perform DELETE and UPDATE regularly.
- Schedule the ANALYZE command at regular interval to keep statistics up-to-date
Redshift ANALYZE Command Examples
Analyze all tables in given Redshift database:
analyze verbose; training=# analyze verbose; ... INFO: analyzing "public.tr_test" ANALYZE training=#
Analyze single table in Redshift Database:
analyze tr_test; training=# analyze tr_test; ANALYZE SKIP
Redshift Analyze specific columns from redshift table:
analyze tr_test(id, name); training=# analyze tr_test11(id, name); ANALYZE
Redshift Analyze only predicate columns:
analyze tr_test predicate columns; training=# analyze tr_test1 predicate columns; ANALYZE
Redshift Analyze ALL columns from Redshift table:
analyze tr_test ALL COLUMNS; training=# analyze tr_test2 ALL COLUMNS; ANALYZE
Read:
- Redshift VACUUM Command and Examples
- Redshift Explain Command and Examples
- How to Optimize Query Performance on Redshift?