Collocated Joins in Netezza: Optimizing Query Performance

Post author:Vithal S
Post last modified:April 24, 2019
Post category:Netezza
Reading time:4 mins read

In Netezza, if two tables are distributed on same column then such tables are called collocated tables. If you join two collocated tables than each SPU in system works 100% independent to each other as the relevant rows required for joining these tables exist in the same SPU. These types of joins are called collocated joins.

Collocated Joins

Collocated Joins Example

Let’s take an examples, assume that we have two tables a patient and hospital, both are distributed on patient id. Netezza distributes the rows with same distribution key (column) to same data slice.

Read:

Let’s say, we have inserted sample 5 records to each table. If you verify the distribution, Netezza should distribute rows with same Patient id to same data slice.

Patient table:

SELECT DATASLICEID,PAT_ID,HOSP_ID,PAT_NAME
FROM PAT_DTL
GROUP BY DATASLICEID,PAT_ID,HOSP_ID,PAT_NAME ;

DATASLICEID	PAT_ID	HOSP_ID	PAT_NAME
2	1	111	abc
3	2	111	bcd
4	3	222	cde
5	4	111	efg
6	5	222	fgh

Hospital table:

SELECT DATASLICEID,PAT_ID,HOSP_ID,HOSP_NAME
FROM HOSP_DTL
GROUP BY DATASLICEID,PAT_ID,HOSP_ID,HOSP_NAME;

DATASLICEID	PAT_ID	HOSP_ID	HOSP_NAME
2	1	111	AAA
3	2	111	AAA
4	3	222	BBB
5	4	111	AAA
6	5	222	BBB

If you join these two tables, each SPU in the Netezza server works 100% independent to other SPU as required data is available in itself as shown in above example.

In Collocated joins, each SPU operates independently without any network traffic or communication between SPU’s hence improves the performance of the query.

Collocated Joins, Redistribute and Broadcast

If you join two tables on columns which are not distribution key, then system has to redistribute either of the table on the join column to make collocated joins. This process of redistribution will impact the performance.

The decision about the re-distribution or broadcast is made by the optimizer and is based on the expected table size. If the table is large, Netezza will re-distribute otherwise broadcast. In the broadcast, Netezza host will send the copy of small table to each SPU thus making collocated joins

Tags: Collocated Joins, Netezza, Netezza Joins, Optimize Netezza Query Performance

This Post Has 2 Comments

Tom Harrocks June 5, 2018

When we talk about the “Same” column does this mean a column named the same or does this mean that each table is distributed on a column within each table and then both those columns are used in the join.
For example.. assuming T1 and T2 are both distributed respectively on C1 and C2. Further, we assume that C1 and C2 are the same data type…. The would a join of the form
SELECT T1.C1,T2.C2
FROM T1
INNER JOIN T2 ON T1.C1 – T2.C2

Would this yield a co-located join?
1. Vithal S June 6, 2018
  
  Hi Tom,
  
  If two tables are distributed on same column which has same data and if you join two tables on that column then tables will be co-located. For examples, consider T1 is distributed on C1 (say date_SK) and T2 is distributed on C2 (say same date_sk), both tables will be co-located if you join them on C1 and C2.
  
  Thanks,

Comments are closed.