Migrating Netezza Data to Hadoop Ecosystem and Sample Approach

In my other post ‘Migrating Netezza to Impala SQL Best Practices’, we have discussed various best practices to migrate the Netezza SQL scripts to Impala SQL. In this article, we will discuss steps on Migrating Netezza Data to Hadoop Ecosystem.

Migrating Netezza Data to Hadoop HDFS

Figure: Offload Netezza data to Hadoop HDFS

Nowadays the Hadoop ecosystem is gaining popularity, and organizations with huge volumes of data want to migrate to it for faster analytics, including real-time or near real-time processing.

Steps to Migrate Netezza Data to Hadoop Ecosystem

There are two steps to migrate Netezza data to Hadoop HDFS.

Migrate DDLs and DMLs to Hadoop Ecosystem

Migrating DDL and DML scripts is a tricky part of the whole migration process. You have to convert the DDLs if you want to use Hive or Impala on top of Hadoop HDFS.

While migrating from Netezza to Hadoop, expect some differences in syntax and data types. You have to account for all the syntax changes to ensure compatibility.
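
For example, here is a rough sketch of converting a simple table definition; the table, column names, and the Impala host are made up purely for illustration:

# Netezza DDL (illustrative only):
#   CREATE TABLE customers (customer_id INTEGER, customer_name VARCHAR(100), signup_date TIMESTAMP);
#
# A possible Impala equivalent, run through impala-shell; note VARCHAR(100) mapped to STRING:
impala-shell -i impala-host.example.com -q "
CREATE TABLE customers (
  customer_id   INT,
  customer_name STRING,
  signup_date   TIMESTAMP
)
STORED AS PARQUET;"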

Read my other post ‘Migrating Netezza to Impala SQL Best Practices’ for more information on migrating Netezza DDLs and DMLs to the Hadoop ecosystem.

Migrate Netezza Data to Hadoop Ecosystem

Another important step in this migration is to offload Netezza data to Hadoop HDFS. There are a couple of options available:

Import data using Apache Sqoop

You can use Sqoop to migrate data from Netezza to Hadoop HDFS. Sqoop allows easy import of data from structured data stores such as Netezza databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision data from external systems onto Hadoop HDFS and populate tables in Hive and HBase. Sqoop also integrates with Oozie, allowing you to schedule and automate import tasks.

Read about how to use Apache Sqoop in my other post: Import data using Apache Sqoop

Sqoop works best if both the Netezza and Hadoop systems are on the same network, i.e. connected to the same private LAN.
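
As an illustration, a basic Sqoop import of a single Netezza table into HDFS might look like the sketch below; the host, port, database, credentials, table, and target directory are all assumptions:

# Import one table from Netezza into HDFS over JDBC; -P prompts for the password.
sqoop import \
  --connect jdbc:netezza://nz-host.example.com:5480/SALESDB \
  --username nz_user \
  -P \
  --table CUSTOMERS \
  --target-dir /data/netezza/customers \
  --num-mappers 4
# Optionally add --direct to use the Netezza bulk connector, or --hive-import to load straight into a Hive table.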

Offload Netezza Data to External Drive

If you have bandwidth limitations, the suggestion would be to mount storage on the Netezza host or an edge node and offload the required tables in the form of flat files (preferably text files).

You can use Netezza external tables to offload the large tables; see my other post on how to create and use Netezza external tables.
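
As a rough sketch, unloading a table to a delimited flat file with a transient external table could look like this; the database, table, and file path are assumptions:

# Unload the CUSTOMERS table to a pipe-delimited flat file via a transient external table.
# Run this on the Netezza host (or edge node) where /mnt/offload is mounted.
nzsql -d SALESDB -c "CREATE EXTERNAL TABLE '/mnt/offload/customers.txt'
USING (DELIMITER '|')
AS SELECT * FROM CUSTOMERS;"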

You can then mount that storage on a Hadoop edge node and move the flat files into Hadoop HDFS. This is one of the fastest ways to get data into the Hadoop ecosystem.
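
Once the storage is visible on the edge node, copying the files into HDFS takes only a couple of commands; the paths below are assumptions:

# Create a target directory in HDFS and copy the offloaded flat file into it.
hdfs dfs -mkdir -p /data/netezza/customers
hdfs dfs -put /mnt/offload/customers.txt /data/netezza/customers/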

Follow this post to learn how to mount an external drive on a Linux system.
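
For reference, mounting an external drive on a Linux edge node typically looks like the following; the device name and mount point are assumptions, so check your device with lsblk first:

# Identify the device, create a mount point, and mount it.
lsblk
sudo mkdir -p /mnt/offload
sudo mount /dev/sdb1 /mnt/offload
df -h /mnt/offload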

This Post Has 2 Comments

  1. Mani

    Hi,

    Could you please provide steps on how to migrate data from Netezza to Hadoop using PuTTY?

    Thanks,
    Mani

    1. Vithal S

      Hi Mani,

      The simplest and best approach is to log in to your Hadoop cluster using PuTTY and run a Sqoop command to migrate the required tables to the Hadoop ecosystem.

      You can follow the post: Import data using Apache Sqoop

      Please let me know how it goes 🙂

      Thanks
