Load HBase Table from Apache Hive – Examples

  • Post last modified: March 21, 2018
  • Post category: BigData

In my other post “Sqoop import Relational Database Table into HBase Table”, you learned how to import data from relational database systems such as Netezza into HBase. We have also seen how to export HBase table data to a relational database using the Hive framework. In this article, we will check how to load an HBase table from Apache Hive with an example.


Why Load an HBase Table from Apache Hive?

The obvious question is: why would you want to load an HBase table from Apache Hive?

You may offload part of your data warehouse platform to the Hadoop ecosystem. The offload process includes extracting data from a source system and then incrementally loading it into HBase and Hive.


Apache Hive and HDFS are generally write-once, read-many systems. Data is inserted or appended to a file, and a table is defined on top of that file. Generally, you cannot update or overwrite a Hive table without deleting the whole file and rewriting it with the updated data set. To work around this limitation, we store the Hive table data that requires updates in HBase, which supports row-level updates.

Load HBase Table from Apache Hive

HBase table schemas and Hive schemas are very different, so you cannot directly map columns between Hive and HBase. HBase stores data as key/value pairs, and column families and column qualifiers are HBase concepts with no direct Hive equivalent.

You can follow the steps below to load an HBase table from Apache Hive:

Create Hive External Table on Top of HBase Table

The first step is to create a Hive external table on top of the HBase table that you want to populate. Use the script below to create the external table:

CREATE EXTERNAL TABLE ext_hbase_personal (id INT, name STRING, city STRING, age INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:name,personal_data:city,personal_data:age")
TBLPROPERTIES ("hbase.table.name" = "personal");
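Note that the external table maps onto an existing HBase table. If the target HBase table does not exist yet, you can create it first from the HBase shell; the table name and column family below match the mapping used above:

```
hbase(main):001:0> create 'personal', 'personal_data'
```

The column qualifiers (name, city, age) do not need to be declared up front; HBase creates them implicitly when cells are written.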


Insert Data into the External Table from a Hive Table

The second step is to insert data into the Hive external table by selecting from the Hive table that you want to offload.

hive> insert into ext_hbase_personal select * from personal;
Query ID = user1_20171009151453_1c037adb-612b-422f-9881-882286a0a21d
Total jobs = 1
Launching Job 1 out of 1
…

OK
Time taken: 13.45 seconds
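Because the external table is just a Hive view over the underlying HBase table, you can also read the offloaded data back through Hive at any time. A quick sanity check, assuming the external table created in the earlier step:

```
hive> select * from ext_hbase_personal;
```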

Verify HBase Table

Now verify that the HBase table contains the newly inserted records:

hbase(main):001:0> scan 'personal'
ROW COLUMN+CELL
 2 column=personal_data:age, timestamp=1505971701012, value=24
 2 column=personal_data:city, timestamp=1505971701012, value=Bengaluru
 2 column=personal_data:name, timestamp=1505971701012, value=sham
 3 column=personal_data:age, timestamp=1505971701012, value=27
 3 column=personal_data:city, timestamp=1505971701012, value=New Delhi
 3 column=personal_data:name, timestamp=1505971701012, value=Guru
 4 column=personal_data:age, timestamp=1505971701012, value=26
 4 column=personal_data:city, timestamp=1505971701012, value=NY
 4 column=personal_data:name, timestamp=1505971701012, value=John
 5 column=personal_data:age, timestamp=1505971701012, value=30
 5 column=personal_data:city, timestamp=1505971701012, value=DC
 5 column=personal_data:name, timestamp=1505971701012, value=Rock
 6 column=personal_data:age, timestamp=1507542290695, value=35
 6 column=personal_data:city, timestamp=1507542290695, value=NYC
 6 column=personal_data:name, timestamp=1507542290695, value=Mike
 7 column=personal_data:age, timestamp=1507542290695, value=32
 7 column=personal_data:city, timestamp=1507542290695, value=DC
 7 column=personal_data:name, timestamp=1507542290695, value=Mikey

6 row(s) in 0.1790 seconds

Row keys 6 and 7 are the newly added records.
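A practical consequence of HBase's key/value model: writing a row whose key already exists overwrites the existing cells, which is exactly how this pattern supports updates that plain Hive tables cannot. As a hedged sketch, re-inserting row key 7 with a new city would update that row in place (the `INSERT INTO ... VALUES` syntax assumes Hive 0.14 or later; the new city value is illustrative):

```
-- Re-insert an existing row key; HBase treats it as an update (upsert)
hive> insert into ext_hbase_personal values (7, 'Mikey', 'Boston', 32);
```

A subsequent `scan 'personal'` in the HBase shell would then show the updated city for row 7 with a newer timestamp.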

To cross-verify, check the Hive source table as well:

hive> select * from personal;
OK
6 Mike NYC 35
7 Mikey DC 32

Time taken: 0.229 seconds, Fetched: 2 row(s)

Please let me know if you find any better method.
