Netezza Implicit Skew with an Example

  • Post author:
  • Post last modified:February 27, 2018
  • Post category:Netezza
  • Reading time:3 mins read

Netezza Implicit Skew is the Netezza skew that occurs within the database when processing large data sets. The implicit Netezza skew are very difficult to identify. You can read more on Netezza Skew and How to avoid it.

Netezza Implicit Skew

Netezza Implicit Skew

Netezza implicit skew is occurs when data get redistributed or broadcasted on some other column to perform join operations. Data will be redistributed or broadcasted to perform co-located joins. The column on which data get redistributed or joined could be skewed that is, most of the redistributed data get inserted to in few or single disk (SPU) on the system. These types of skews can only occur in run-time when we join the two large tables using SQL.

Read:

Netezza Implicit Skew Example

Let us understand the actual Implicit Netezza Skew situation with the help of an example.

In the following tables PAT ID refers to patient ID and DIVN refers to hospital division. In the below example DIVN number is same for all the patients indicating they belong to same DIVN in hospital (e.g. cardiac, urology etc)

PAT_ID DIVN_ID
100 2001
101 2001
102 2001
103 2001
104 2001

Below is the DIVN table:

DIVN_ID DIVN_NAME
2001 Cardiac
2002 Urology

If you write query that that take Patients belongs to particular division just like below

SELECT  PAT_IDFROMPATIENT PATJOINDIVISION DIVON (PAT.DIVN_ID = DIV.DIVN_ID);

In order to join above two tables, data should have to be co-located. Data in one of the table will have to be redistributed or broadcasted to make co-located tables. Say DIVN table get re-distributed on DIVN_ID. This makes the Netezza to place all data belong to the DIVN_ID 2001 to single disk causing skew.  These types of skew are called implicit or intermediate skew. You cannot easily identify the implicit Netezza Skew without proper knowledge of data and table structures.

If there are any implicit data skew system may stop working or you may end up getting Netezza SPU partition errors.