Identify and Remove Netezza Duplicate Records in Table

  • Post author:
  • Post last modified:September 11, 2019
  • Post category:General
  • Reading time:3 mins read

Netezza do not have primary or unique key. You can insert the duplicate records in the table. There are no constraints to ensure uniqueness or primary key, but if you have a table and have loaded data twice, then you can de-duplicate in several ways. Below methods explain you how to identify and Remove Netezza Duplicate Records

Read:

1. Use Intermediate and DISTINCT Keyword

You can remove the Netezza duplicate records by creating another table using DISTINCT keyword while selecting from original table.

For example:

create table int_table as select distinct * from base_table;

2. Use Netezza ROWID to Identify Duplicate Records

You can identify the duplicate records in Netezza table by using rowid column. Below is the example that identifies the duplicate records in Netezza table.

select * from test t1
where rowid = (select max(rowid) from test t2 where t1.id = t2.id );

This method is supported almost all the databases like Oracle, mysql etc.

3. Use Netezza ROWID to Delete Duplicate Records:

You can delete the duplicate records in Netezza table by using rowid column. Below is the example that delete the duplicate records in Netezza table.

delete from test t1
where rowid <> (select max(rowid) from test t2 where t1.id = t2.id );

This method is also supported almost all the databases like Oracle, mysql etc.

4. Use Intermediate table with row_number() analytics functions:

The row_number analytic function is used to rank or number the rows. Here we use the row_number function to rank the rows for each group of record and then select only record from that group:

For example;

Create table int_table As 
Select col1, col2 
From 
( 
 Select col1, col2 
 row_number() over (partition by col1 order by col2) rno 
 From table1 
) 
Where rno = 1;

You can use any of the above method to identify and Remove Netezza Duplicate Records