Sqoop import Relational Database Table into HBase Table


Apache Sqoop can be used to import relational database tables into HBase tables. You have to follow a specific process to import relational database or data warehouse tables into an HBase schema. In this article, we will look at Sqoop import of a relational database table into an HBase table, along with some working examples.


You cannot directly import entire data warehouse or relational database tables into HBase as-is. HBase is column oriented, and its schema design differs significantly from that of Hive or Cloudera Impala tables.

Read my other post on Sqoop Export HBase table 

You have to write a separate Sqoop import command for each column family in the HBase table, as the examples below show.

Sqoop import Relational Database Table into HBase Table Examples

We will demonstrate importing a Netezza data warehouse table into HBase. The process is the same for other relational databases such as MySQL, Oracle, Teradata, SQL Server, Sybase, etc. Please note that we have used a simple table for the demonstration; we have not tested the process with Sqoop import-all-tables into HBase.
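
For example, to run the same import against MySQL instead of Netezza, only the JDBC connection string changes. A minimal sketch, assuming a MySQL database mydb on host hostname containing the same SAMPLE_TEST table (all placeholder values):

$ sqoop import \
> --connect jdbc:mysql://hostname/mydb \
> --username vithal -P \
> --table SAMPLE_TEST \
> --columns "ROW_KEY,NAME" \
> --hbase-table sample_test \
> --column-family name \
> --hbase-row-key ROW_KEY -m 1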

You have to create the HBase table first, before attempting to use Sqoop import with the HBase-related options.
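
For reference, here is how the table used in this demonstration can be created from the HBase shell, with its two column families name and location:

hbase(main):001:0> create 'sample_test', 'name', 'location'

Alternatively, Sqoop's --hbase-create-table option creates the target table and column family for you if they do not already exist.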

Check the Created HBase Table

Below is the description of the table that we created in HBase:

hbase(main):002:0> describe 'sample_test'
Table sample_test is ENABLED
sample_test
COLUMN FAMILIES DESCRIPTION
{NAME => 'location', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

2 row(s) in 0.4640 seconds

HBase Sqoop Import

Below are the Sqoop commands that we have used for this import:

Please note that we will use the ROW_KEY column from the Netezza table as the row key in the HBase table.

HBase Import: First Column
$ sqoop import \
> --connect jdbc:netezza://hostname/mydb \
> --username vithal -P \
> --table SAMPLE_TEST \
> --columns "ROW_KEY,NAME" \
> --hbase-table sample_test \
> --column-family name \
> --hbase-row-key ROW_KEY -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/09/22 15:12:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
Enter password:
…
17/09/22 15:13:04 INFO mapreduce.ImportJobBase: Retrieved 4 records.
HBase Import: Second Column
$ sqoop import \
> --connect jdbc:netezza://hostname/mydb \
> --username vithal -P \
> --table SAMPLE_TEST \
> --columns "ROW_KEY,LOCATION" \
> --hbase-table sample_test \
> --column-family location \
> --hbase-row-key ROW_KEY -m 1
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/09/22 15:14:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.0.0-1245
Enter password:
…
17/09/22 15:15:06 INFO mapreduce.ImportJobBase: Retrieved 4 records.
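
If keeping all columns in a single column family is acceptable, the two imports above can be combined into one command. A minimal sketch using the same placeholders; Sqoop writes every listed column except the row key column as a cell in the given family:

$ sqoop import \
> --connect jdbc:netezza://hostname/mydb \
> --username vithal -P \
> --table SAMPLE_TEST \
> --columns "ROW_KEY,NAME,LOCATION" \
> --hbase-table sample_test \
> --column-family name \
> --hbase-row-key ROW_KEY -m 1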

Verify the HBase Table

Now verify the HBase table for the imported data:

hbase(main):003:0> scan 'sample_test'
ROW COLUMN+CELL
 100 column=location:LOCATION, timestamp=1506073444702, value=Bengaluru
 100 column=name:NAME, timestamp=1506073321891, value=ABC
 101 column=location:LOCATION, timestamp=1506073444702, value=Bengaluru
 101 column=name:NAME, timestamp=1506073321891, value=BCD
 102 column=location:LOCATION, timestamp=1506073444702, value=Mumbai
 102 column=name:NAME, timestamp=1506073321891, value=CDE
 103 column=location:LOCATION, timestamp=1506073444702, value=New Delhi
 103 column=name:NAME, timestamp=1506073321891, value=DEF

4 row(s) in 0.1450 seconds
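
To look up a single row instead of scanning the entire table, you can use the HBase shell get command (row key '100' taken from the data above):

hbase(main):004:0> get 'sample_test', '100'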

Importing relational database tables into HBase, Hive, or Impala lets you start using the Hadoop ecosystem right away, without having to build a custom data-ingestion pipeline.

You can also check out Apache Sqoop import commands and various examples in the articles below: