Netezza Implicit Skew is the Netezza skew that occurs within the database when processing large data sets. The implicit Netezza skew are very difficult to identify. You can read more on Netezza Skew and How to avoid it.
Netezza Implicit Skew
Netezza implicit skew is occurs when data get redistributed or broadcasted on some other column to perform join operations. Data will be redistributed or broadcasted to perform co-located joins. The column on which data get redistributed or joined could be skewed that is, most of the redistributed data get inserted to in few or single disk (SPU) on the system. These types of skews can only occur in run-time when we join the two large tables using SQL.
Read:
- Netezza Skew and How to avoid it.
- Netezza SPU Partition error- Resolved
- Importance of right Netezza Distribution key
- Netezza FPGA – FAST Framework in Netezza
- IBM Netezza Sequence and how to Create/Use it
Netezza Implicit Skew Example
Let us understand the actual Implicit Netezza Skew situation with the help of an example.
In the following tables PAT ID refers to patient ID and DIVN refers to hospital division. In the below example DIVN number is same for all the patients indicating they belong to same DIVN in hospital (e.g. cardiac, urology etc)
PAT_ID | DIVN_ID |
100 | 2001 |
101 | 2001 |
102 | 2001 |
103 | 2001 |
104 | 2001 |
Below is the DIVN table:
DIVN_ID | DIVN_NAME |
2001 | Cardiac |
2002 | Urology |
If you write query that that take Patients belongs to particular division just like below
SELECT PAT_IDFROMPATIENT PATJOINDIVISION DIVON (PAT.DIVN_ID = DIV.DIVN_ID);
In order to join above two tables, data should have to be co-located. Data in one of the table will have to be redistributed or broadcasted to make co-located tables. Say DIVN table get re-distributed on DIVN_ID. This makes the Netezza to place all data belong to the DIVN_ID 2001 to single disk causing skew. These types of skew are called implicit or intermediate skew. You cannot easily identify the implicit Netezza Skew without proper knowledge of data and table structures.
If there are any implicit data skew system may stop working or you may end up getting Netezza SPU partition errors.