Netezza Generate Statistics: A Guide and Best Practices

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:Netezza
  • Reading time:4 mins read

Netezza uses the cost based optimizer to determine best methods determine redistribution, scan, join, join orders. The optimizer uses statistics information to determine the most efficient way to run a query. System uses the Netezza GENERATE STATISTICS command to generate information about each column proportion of the duplicate values,unique values, NULL values and the maximum and minimum values.

Netezza GENERATE STATISTICS

How Statistics are collected?

Netezza system collects statistics in various ways:

  • Statistics are automatically generated for certain database operations
  • Collect full statistics by running Netezza GENERATE STATISTICS command. This operation is required when table changes significantly, say at least 10%
  • Just In Time Statistics (JIT): Runs at every execution time using sampling and zone maps
  • You can generate statistics on database or on individual tables. You should maintain the table by running Netezza GENERATE STATISTICS SQL command on regular bases. It usually collects all level of statistics, including dispersion values.

Automatic Netezza Database Statistics

The Netezza system automatically generates two ‘Base Statistics’ (table row count and min/max values for non-character columns) when performing certain database operations:

Netezza GENERATE STATISTICS Syntax

Syntax of the command:

 GENERATE STATISTICS ON <tablename> [ (<col>[,<col>… ]) ];

Usage:

The following provides sample usage.

Generate statistics for all column of the table TEST:

 TRAINING.ADMIN(ADMIN)=>GENERATE STATISTICS ON TEST;

Generate statistics for all the tables in database TRAINING:

 TRAINING.ADMIN(ADMIN)=>GENERATE STATISTICS;

Netezza GENERATE STATISTICS Best Practices

Following are some of best practices when you are working on generate statistics:

  • It is very important to collect full statistics by running Netezza GENERATE STATISTICS command when table changes significantly, say at least 10%
  • You should always collect statistics when perform nzload, insert, update, delete or truncate operations
  • Generate statistics for columns which are often used. For examples, columns used in JOIN conditions, WHERE clause, GROUP BY clause and ORDER BY clause
  • You should GENERATE STATISTICS on all TEMP tables those are explicitly used in JOIN
  • You should not abort the GENERATE STATISTICS command as this will disable the zone maps and rebuilt it. Any interruption leaves the zone maps disabled and this of course impact the performance.

Read:

JUST IN TIME (JIT) Statistics and Usage

Netezza automatically gathers statistics on the table with actual restrictions using sampler scan functionality, Just-In-Time (JIT) for planning.

Read: Netezza Just in time Statistics

JIT Stats is automatically run on all qualified tables, such as:

  • The table contains more than 5 Million rows
  • Query contains at least one column restriction
  • Restrictions do not contain sub-query or sub-plan expressions
  • Must participate in a JOIN or must have a MATERIALIZED VIEW associated with them
  • Must be a database table on disk No system tables, EXTERNAL TABLES or virtual tables
  • JIT Dispersion values calculated on queried columns of tables > 500 Million rows
  • JIT stats does not eliminate the need to run GENERATE STATISTICS

Related reading: