Working with Netezza Clustered Base Tables (CBT)

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:Netezza
  • Reading time:4 mins read

A Netezza clustered base tables (CBT) are user table that has data which is organized using one to four organizing keys columns. You can specify max four columns in organize on clause and those columns should not be a part of distribute on clause.

An organizing key is a column of the table that you specify for clustering the table records; organizing table helps Netezza to save records in same or nearby extents. You can organize the records using “ORGANIZE ON” clause. Netezza does create zone maps on organizing columns, which will accelerate the performance of queries on that particular table.

Read:

When to use Netezza Clustered Base Tables (CBT)?

CBT’s are most commonly used for fact table which has billions of records. If such a large table doesn’t have organizing column and that column is used to restrict, scanning such a huge table requires lot of processing time as full table could be needed to get relevant records. By organizing records on restrictive column allows Netezza to group records in same or nearby extents, this process will also create zone maps that improves the performance.

How Netezza Clustered Base Tables (CBT) Works?

Below simple example is a model of sales table. Each colour indicates unique transaction. In its unorganized form, data is organized by date/time of transaction. If you query sales table by restricting results on date/time, those query will perform well because organization matches the query restriction.

If you use the transaction type in restricting query, then you can improve the performance of query by organizing data on transaction type column. Netezza groups those transactions types in same or nearby extents.

Netezza Clustered base tables

Cluster base table benefits

  • Multi-dimensional lookups: CBT supports multi-dimensional lookups by organizing data on one-to-four columns
  • Zone maps: Netezza creates the zone maps on organizing columns hence the performance would be better
  • Save disk space:CBT do not replicate the base table data
  • Cluster base tables increase the supported data types for zone-mapped columns
  • CBT allows incremental organization of data within user tables. CBTs can help you to eliminate or reduce pre-sorting of new table records prior to a load/insert operation

Selecting organizing key for Netezza Clustered Base Tables (CBT)

Netezza allows you to select one-to-four columns as organizing key. We will not use all four columns as organizing key columns, at most three key columns are used.

  • Review the column and data type of the very large table. You should also verify the type of queries that run against those tables.
  • If you typically run queries on one dimensional i.e. if you use one column, say, date to restrict data then you can load data on date column
  • If you run queries on two dimension i.e. use two columns, say ID and pin/zip to restrict data then CBT can do trick for you to increase performance
  • The organizing key must be columns that can be referenced in zone maps

Reorganizing the Table data and Copy Netezza Clustered Base Tables (CBT)

Netezza reorganize the data on specified columns when you invoke GROOM TABLE command. Groom command reorganizes data in each slice. User can perform SELECT, UPDATE, DELETE and INSERT during grooming operation.

Read:

If you copy the CBT’s by using command like CREATE TABLE AS, new table will not get organizing key columns. Instead, you should use ALTER TABLE ORGANIZE ON command to add organizing keys.