Apache Sqoop can be used to transform relational database table into HBase tables. You have to follow some process to import relational database or data warehouse tables into HBase schema. In this article, we will check on Sqoop import relational database table into HBase table and some working examples for the same.
Sqoop import Relational Database Table into HBase Table
You cannot directly import entire data warehouse or relational database tables into HBase. HBase is column oriented and the schema design is way different for HBase tables compared to Hive or Cloudera Impala.
Read my other post on Sqoop Export HBase table
You have to write separate sqoop command to import data for each column in table.
Sqoop import Relational Database Table into HBase Table Examples
We will demonstrate the importing Netezza relational data warehouse table into HBase. The process is same for other relational databases such as Mysql, Oracle, Teradata, SQL Server, Sybase etc. Please note that, we have used simple table for demonstration, we have not tested the process with Sqoop import all tables to HBase.
You have to create the HBase table first before attempting to use Sqoop import with HBase related options.
Check HBase Created Tables
Below is the table description that we have created in HBase:
hbase(main):002:0> describe 'sample_test' Table sample_test is ENABLED sample_test COLUMN FAMILIES DESCRIPTION {NAME => 'location', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRE SSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} {NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSIO N => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 2 row(s) in 0.4640 seconds
HBase Sqoop Import
Below are the Sqoop commands that we have used for this import:
Please not that, we will use ROW_KEY from Netezza table as default row_key in HBase table.
HBase Import First column
$ sqoop import \ > --connect jdbc:netezza:// hostname /mydb \ > --username vithal -P \ > --table SAMPLE_TEST \ > --columns "ROW_KEY,NAME" \ > --hbase-table sample_test \ > --column-family name \ > --hbase-row-key ROW_KEY -m 1 Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 17/09/22 15:12:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245 Enter password: … 17/09/22 15:13:04 INFO mapreduce.ImportJobBase: Retrieved 4 records.
HBase Import Other Column
$ sqoop import \ > --connect jdbc:netezza://hostname/mydb \ > --username vithal -P \ > --table SAMPLE_TEST \ > --columns "ROW_KEY,LOCATION" \ > --hbase-table sample_test \ > --column-family name \ > --hbase-row-key ROW_KEY -m 1 Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 17/09/22 15:14:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245 Enter password: … 17/09/22 15:15:06 INFO mapreduce.ImportJobBase: Retrieved 4 records.
Verify the HBase Table
Now verify the HBase table for the imported data:
hbase(main):003:0> scan 'sample_test' ROW COLUMN+CELL 100 column=name:LOCATION, timestamp=1506073444702, value=Bengaluru 100 column=name:NAME, timestamp=1506073321891, value=ABC 100 column=name:name, timestamp=1506073182109, value=abc 101 column=name:LOCATION, timestamp=1506073444702, value=Bengaluru 101 column=name:NAME, timestamp=1506073321891, value=BCD 102 column=name:LOCATION, timestamp=1506073444702, value=Mumbai 102 column=name:NAME, timestamp=1506073321891, value=CDE 103 column=name:LOCATION, timestamp=1506073444702, value=New Delhi 103 column=name:NAME, timestamp=1506073321891, value=DEF 4 row(s) in 0.1450 seconds
Importing relational database table in HBase, Hive or Impala allows you start using Hadoop ecosystem straight away without worrying about anything.
You can also check out Apache Sqoop import commands and various examples in below articles:
- Apache Sqoop Architecture
- Sqoop Export HBase Table into Relational Database
- Sqoop Official Documentation
- Sqoop Command with Secure Password
- Import data using Apache Sqoop
- Export data using Apache Sqoop