Today there is a increased demand in advanced analytics on big data. Netezza is designed with built in functionalities to perform advanced analytics on really big data sets. To improve the performance, you should follow some Netezza best practices.
Best practice should not mean hundreds of rules and regulations to follow. Recommended that basic principles are followed on following features of Netezza:
Netezza Best Practices on Distributions
In a Netezza data warehouse appliance good distribution is fundamental element of good performance. You must choose the right distribution key to distribute the data evenly across all the data slices. The best distribution method is to make the tables “collocated”.
Bad distribution cause the data skew. That means the machine fills up much quicker. That affects the query in hand and others as the data slice has more work to do. Consider distributing on random if you don’t know the right distribution key.
Netezza Best Practices on choosing Data Types
Data warehouses contain large volumes of data . Tables may contain billions of numbers of rows. It is important to select a data type that will minimize disk storage requirements and minimize scan time. The right choosing right data type even reduce the disk I/O.
Netezza Best Practices on Statistics
The optimizer relies on GENERATE STATISTICS to gather statistics about the tables in the database to determine the most efficient way to execute a query. It is important that the optimizer always has high quality statistics to choose the best execution plan from all possible plans
The nzsql GENERATE STATISTICS command generates statistics about each table column’s proportion of stats, including duplicate values, maximum value, minimum value, null values, dispersion values and updates the system catalog table.
Netezza Best Practices on Zone Maps
A zone map is an internal mapping structure to show the range (min and max) of values within each page.
During scans, zone maps reduces I/O by skipping pages that did not qualify the query parameters . Zone maps are internal to the system thus no administration involved.
Netezza Best Practices on Clustered Base Tables
A Netezza clustered base tables (CBT) are user table that has data which is organized using one to four organizing keys columns. You can specify max four columns in organize on clause and those columns should not be a part of distribute on clause.
Netezza Best Practices on Groom Table Command
Groom tables that receive frequent updates or deletes or if a load or insert is aborted as this will result in deleted rows. You can groom at a record level to remove all deleted records regardless of their location. This will give you the best space gains but take longer.
For the best performance with Netezza data warehouse appliance use star schema as much as possible. That is, fact and dimensions should use same key.