Greenplum Analyze and Examples

  • Post author:
  • Post last modified:May 6, 2019
  • Post category:Greenplum
  • Reading time:3 mins read

The most important prerequisites for good query performance is to collect the table statistics time to time using Greenplum analyze command.

Greenplum analyze

Greenplum analyze collects statistics about the contents of tables in the database, and stores the results in the system catalog table pg_statistic. Greenplum database uses these statistics to determine the best execution plan for the queries.

Syntax:

ANALYZE [VERBOSE] [ROOTPARTITION [ALL] ] [table [ (column [, ...] ) ]]

Where:

ROOTPARTITION [ALL]: Collect statistics only on the root partition of partitioned tables.

VERBOSE: Enables display of progress messages.

Table: Name of the table to analyze. By default, it’ll collect the stats for all tables in the database.

Column: Name of the particular column to analyze. By default, all columns.

How and When to run Greenplum Analyze?

It is a good idea to run analyze periodically, or just after making major changes in the contents of a table. You should follow the common stratergy to run the vacuum and anayze once in a day when the server workload is low. Run Analyze in below scenarios

  • You should run analyze when data loading is completed
  • Collect statistics afeter CREATE INDEX operations
  • Run Greenplum analyze when you INSERT, UPDATE or DELETE data
  • Run analyze when you perform major changes to the table conent

There are three ways to initiate an analyze operation:

  • You can run the ANALYZE command directly.
  • You can run the analyzedb management utility outside of the database, at the command line.
  • An automatic analyze (set gp_autostats_mode parameter) operation can be triggered when DML operations are performed on tables that have no statistics or when a DML operation modifies a number of rows greater than a specified threshold (set gp_autostats_on_change_threshold value).

Related Articles:

Greenplum Analyze Examples

Collect statistics about table patient:

tutorial=> analyze patient; 
ANALYZE

Collect statistics about single column from patient table:

tutorial=> analyze patient(pat_id); 
ANALYZE

Collect statistics for all table in database:

tutorial=> analyze; 
ANALYZE