With growing data volumes, organizations are moving more of their computation to the Hadoop ecosystem. Apache Sqoop is an open source tool for importing data from relational databases into Hadoop and vice versa. It is one of the easiest tools for bringing a relational database such as Netezza into the Hadoop ecosystem. The Sqoop command allows you to import all tables or a single table, or to execute a query and store the result in Hadoop HDFS. In this article, we will check how to import Netezza tables using Sqoop with some practical examples.
Sqoop uses a connector-based architecture that supports plugins providing connectivity to new external systems.
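For Netezza, the connector is picked up automatically once the Netezza JDBC driver jar is on Sqoop's classpath. Below is a minimal setup sketch, assuming an HDP-style install; the driver file name and Sqoop lib path vary by distribution:
# Copy the Netezza JDBC driver into Sqoop's lib directory (paths are assumptions)
cp nzjdbc.jar /usr/hdp/current/sqoop-client/lib/
# Quick connectivity check using the standard Sqoop list-tables tool
sqoop list-tables --connect jdbc:netezza://host_ip/database --username userName --password password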
Import Netezza Tables using Sqoop
Netezza is a relational database system with limited storage, so organizations usually migrate the required workloads to Hadoop.
Related Articles:
- Import data using Apache Sqoop
- Sqoop import Relational Database Table into HBase Table
- Sqoop Command with Secure Password
- Steps to Import Oracle Tables using Sqoop
Apache Sqoop supports Netezza connectors. Using Sqoop, you can import data from an external system such as Netezza onto HDFS, and populate tables in Hive and HBase.
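If you just want the data as files on HDFS, without creating a Hive table, drop the Hive options and supply a target directory instead. A minimal sketch, reusing the hypothetical host, table, and user names from this article:
sqoop import --connect jdbc:netezza://host_ip/database --username userName --password password --table tableName --target-dir /user/userName/tableName -m 1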
Import Netezza Tables into Hive using Sqoop
As mentioned earlier, you can use Sqoop to import Netezza tables directly into Hive.
Apache Sqoop does the following when you invoke an import:
- It populates the Hive metastore with the appropriate metadata (table definition) for the table.
- It invokes the necessary commands to load the table or partition, as sketched below.
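In Hive terms, the two steps correspond roughly to the following. This is an illustrative sketch, not the exact statements Sqoop generates; the column list is made up and the staging path under /user/<user>/ is an assumption:
-- Step 1: register the table definition in the Hive metastore
CREATE TABLE IF NOT EXISTS vithal_test.customer (c_customer_sk INT, c_first_name STRING);
-- Step 2: move the files Sqoop staged on HDFS into the table's directory
LOAD DATA INPATH '/user/impadmin/CUSTOMER' INTO TABLE vithal_test.customer;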
Sqoop Command to Import Netezza Table
Below is the Sqoop import command that you can use to import Netezza tables into Hadoop Hive or Hadoop HDFS:
sqoop import --connect jdbc:netezza://host_ip/database --table tableName --username userName --password password --hive-import --hive-database DatabaseName --create-hive-table --hive-table tableName -m 1
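Note that passing --password on the command line exposes the password in your shell history and process list. Sqoop can instead prompt for it with -P, or read it from a file with --password-file (see the Sqoop Command with Secure Password article linked above). A sketch, assuming a hypothetical password file stored on HDFS:
sqoop import --connect jdbc:netezza://host_ip/database --table tableName --username userName --password-file /user/userName/.netezza.password --hive-import --hive-database DatabaseName --create-hive-table --hive-table tableName -m 1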
Import Netezza Tables into Hive using Sqoop Example
Below is an example of importing the CUSTOMER table into a Hive table:
$ sqoop import --connect jdbc:netezza://192.168.100.123/RETAIL --table CUSTOMER --username admin --password password --hive-import --hive-database vithal_test --create-hive-table --hive-table customer -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
...
...
Time taken: 0.843 seconds
Note: /tmp/sqoop-impadmin/compile/363c30ac0559f7783dd473977a3d71ca/CUSTOMER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 0.423 seconds
Loading data to table vithal_test.customer
Table vithal_test.customer stats: [numFiles=1, numRows=0, totalSize=402, rawDataSize=0]
OK
Time taken: 0.912 seconds
You can verify that the table was created in the Hive database:
0: jdbc:hive2://192.168.100.203:10000/default> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| customer |
| test1 |
+-----------+--+
2 rows selected (0.038 seconds)
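Beyond show tables, a couple of quick sanity checks from the same Beeline session can confirm the schema and row count:
describe customer;
select count(*) from customer;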
Import all Tables into Apache Hive using Sqoop
You can use Sqoop to import all tables from a Netezza database into Hive.
Sqoop command to Import all Tables
Below is the Sqoop command that you can use to import all tables from a relational database:
sqoop import-all-tables --connect jdbc:netezza://host_ip/database --username admin --password password --hive-import --hive-database vithal_test --create-hive-table -m 1
Since Sqoop generates a jar and runs a separate import job for each table, it may take some time to migrate all your tables.
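If you want to skip certain tables, for example very large ones you plan to import separately, import-all-tables accepts an --exclude-tables option with a comma-separated list. A sketch with hypothetical table names:
sqoop import-all-tables --connect jdbc:netezza://host_ip/database --username admin --password password --hive-import --hive-database vithal_test --create-hive-table --exclude-tables web_returns,inventory -m 1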
Import all Netezza Tables into Hive using Sqoop Example
Below is an example of importing all tables from a Netezza database into Hive:
$ sqoop import-all-tables --connect jdbc:netezza://192.168.100.123/RETAIL --username admin --password password --hive-import --hive-database vithal_test --create-hive-table -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
...
...
Time taken: 0.843 seconds
Note: /tmp/sqoop-impadmin/compile/363c30ac0559f7783dd473977a3d71ca/WEB_SALES.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 0.423 seconds
Loading data to table vithal_test.web_sales
Table vithal_test.web_sales stats: [numFiles=1, numRows=0, totalSize=402, rawDataSize=0]
OK
Time taken: 0.912 seconds
You can verify the Hive database for the imported tables:
0: jdbc:hive2://192.168.100.203:10000/default> show tables;
+-------------------------+--+
| tab_name |
+-------------------------+--+
| catalog_sales |
| customer |
| customer_address |
| customer_demographics |
| date_dim |
| household_demographics |
| income_band |
| item |
| store |
| store_returns |
| store_sales |
| test1 |
| warehouse |
| web_sales |
+-------------------------+--+
14 rows selected (0.038 seconds)
Hope this helps 🙂