Hadoop systems are mostly best suited for batch processing. Reporting is not recommended on Hadoop Hive or Impala. Sometimes to enable faster reporting, organizations transfer the processed data from Hadoop ecosystem to high performance relational databases such as Netezza. In this article, we will check Sqoop export Hive tables into Netezza with working examples.
Sqoop Export Hive Tables into Netezza
In some cases, data processed by Hadoop ecosystem may be needed in production systems hosted on relational databases to help run additional critical business functions and generate reports. The Sqoop can exports a set of files from HDFS to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specific delimiters. You can either export HDFS directory or Hive tables based on your requirements.
Related articles:
Because the target system is RDBMS, the default operation is to transform these into a set of INSERT statements that inject the records into the database. There is also an update record option available with Sqoop export command.
Sqoop Export Hive Table to Netezza Table Command
Below is the Sqoop export command or connection string that you can use to export Hive table into Netezza.
sqoop export --connect jdbc:netezza://host_ip/DatabaseName --table CUSTOMER_1 --username admin --password password --hcatalog-table CUSTOMER -m 1
Sqoop Export Hive Tables into Netezza Example
Below is the example to export Hive table into Netezza database:
$ sqoop export --connect jdbc:netezza://192.168.100.200/NEWTPCDS --table CUSTOMER_1 --username admin --password password --hcatalog-table CUSTOMER -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
...
...
19/03/08 11:55:56 INFO mapreduce.ExportJobBase: Exported 90 records.
19/03/08 11:55:56 INFO mapreduce.ExportJobBase: Publishing HCatalog export job data to Listeners
You can verify the Netezza database for the exported data.
=> SELECT C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_CDEMO_SK FROM CUSTOMER_1 LIMIT 5;
C_CUSTOMER_SK | C_CUSTOMER_ID | C_CURRENT_CDEMO_SK
---------------+------------------+--------------------
6013 | AAAAAAAANHHBAAAA | 1376433
18529 | AAAAAAAABGIEAAAA | 1269298
43621 | AAAAAAAAFGKKAAAA | 962929
96330 | AAAAAAAAKEIHBAAA | 1882761
6013 | AAAAAAAANHHBAAAA | 1376433
(5 rows)
Hope this helps?