Analyze and Optimize Netezza Query Performance

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:Netezza
  • Reading time:6 mins read

Analyze and Optimize Netezza Query Performance is very important part if you are performing the reporting on top of Netezza or processing large amount of data. Netezza query tuning is very important part to improve performance.

Analyze and Optimize Netezza query performance

Netezza uses cost based optimizer to determine the best methods for scan and join operations, orders and redistribute or broadcast operations. Netezza system may redistribute the data for some situations like joins, grouping aggregates, create tables and when loading data into Netezza tables. System will decide the redistribution based on the cost like table size.

Read:

Netezza Query Performance Issues

Identify the issue with your query;

  • Determine what is running on the system (sessions and queries)
  • Look for long-running queries
  • For long-running queries analyze the Plan file

Query performance may be impacted by:

  • Processing skew
  • Large table broadcast
  • Large table redistribute
  • Zone maps not invoked
  • Poorly written SQL queries
  • Check for Processing Skew

Here are several ways to determine the skew in Netezza. You can either use the nzadmin tool to determine the skew in the table or use nzsql query to determine same.

Table Skew

Verify the table distribution and any skew using nzadmin tool.

You can also use the nzstats or nzsql to verify the distributiona and data skew

$ nzstats -type table
$ nzsql –d retail 
$retail(admin)=>SELECT datasliceid, COUNT(datasliceid) AS "Rows" 
FROM Table_name GROUP BY datasliceid 
ORDER BY "Rows";

Output:

DATASLICEID Rows
19 2997656
35 2998000
45 3001234
3 3005260
2 3009093
1 2990987

You can also check the processing skew by running below query;

$nzsqa responders –sys

Avoid Large Table Broadcast

Imagine a copy of a table with billions of records in each SPU of Netezza box, SPU will became full or system may perform poorly. Lets take example, you have the table A with billions of records in it, and other large table B. Both tables are distributed on different columns and now when you join these two table, there is no chance that these two table will become co-located, hence table A will be broadcasted to all SPU in the Netezza box. This process will cause some system to fill up the partitions or system may crash.

Note that an important db parameter “factrel_size_threshold” holds that triggering number; any table beyond these many number of rows is considered as a fact table. That is, if this parameter is set to 1 million, any table holding more than 1 million rows is considered as a fact table and will not result in this broadcast.

$ nzsql –d retail 
$retail(admin)=>Show factrel_size_threshold;
$retail(admin)=>Set factrel_size_threshold = 10000000;

Choose Better Distributions Key to Optimize Netezza Query Performance

The distribution of the data across the various disks is the single most important factor that can impact performance.

Follow below guidelines:

  • Choose a column that evenly distributes data across all SPUs
  • Choose columns that are frequently used in large table joins to get collocated joins
  • Use the same data type for the join columns
  • Choose a single column – Avoid multi-column distributions if possible
  • Choose RANDOM distribution for small dimension and lookup tables only

Read:

Importance of Netezza Zone Maps

Zone maps are internal mapping structure to the extents, that take advantage if internal ordering of data to eliminate the extents that need not be scanned. Netezza create the zone maps for column with data types such as integer, data, time, flags etc.

Netezza will also create the zone maps for the columns which are used in organize on clause by Clustered base tables.

Read:

Analyze Poorly written SQL queries to Optimize Netezza Query Performance

Poorly written SQL is often times the primary offender affecting query performance. You should understand the Netezza SQL grammar and write the SQL queries as per standards.

Optimize Netezza Query Performance: INSERT/UPDATE/DELETE

Follow the below guidelines to improve the performance of the SQL queries;

  • Use TRUNCATE TABLE to delete all rows from a table instead of delete
  • Do not DROP the table and (re)create the table
  • For singleton INSERT, UPDATE, DELETE operations always wrap inside a transaction (BEGIN/COMMIT)
  • To perform a bulk UPDATE or DELETE to many rows, insert into a TEMP table then perform an UPDATE JOIN or DELETE JOIN
  • When unloading large data sets use EXTERNAL TABLES instead of nzsql

Read: