In my other post “Sqoop import Relational Database Table into HBase Table” you learned how to import data from relational database systems such as Netezza. We have also seen how to export HBase table data to a relational database using the Hive framework. In this article, we will check how to load an HBase table from Apache Hive with an example.
Why Load HBase Table from Apache Hive?
The obvious question is: why would you want to load an HBase table from Apache Hive?
You may offload part of your data warehouse platform to the Hadoop ecosystem. The offload process involves extracting data from a source system and then incrementally loading it into HBase and Hive.
You may also want to Read:
- Access HBase Tables from Impala working Examples
- Sqoop Export HBase Table into Relational Database
- Hadoop HDFS Data Warehouse Design considerations
Apache Hive and HDFS are generally write-once, read-many systems. Data is inserted or appended to a file, and a table is defined on top of that file. Generally, you cannot update or overwrite a Hive table without deleting the whole file and writing it again with the updated data set. To overcome this limitation, we move the Hive table data that requires updates into HBase.
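The difference can be sketched conceptually in plain Python (this is an illustration, not Hive or HBase code): an HBase `put` on an existing row key supersedes the previous cell value, whereas a table over immutable files can only append.

```python
# Conceptual model only: a dict keyed by row key stands in for an HBase table.
store = {}  # row_key -> {column: value}

def put(row_key, column, value):
    """Upsert a cell in place, the way an HBase put supersedes the old value."""
    store.setdefault(row_key, {})[column] = value

put("2", "personal_data:city", "Bangalore")
put("2", "personal_data:city", "Bengaluru")  # an update, not a second record

print(store)
# {'2': {'personal_data:city': 'Bengaluru'}}
```

An append-only file, by contrast, would now hold both values, and the whole file would need to be rewritten to drop the stale one.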
Load HBase Table from Apache Hive
The HBase table schema and the Hive schema are very different; you cannot directly map columns between Hive and HBase. HBase stores data as key/value pairs, and column families and column qualifiers are concepts that have no direct equivalent in Hive.
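To make the mapping concrete, here is an illustrative Python sketch (not Hive/HBase code) of how a single Hive row turns into a row key plus a set of `family:qualifier` cells. The column names mirror the example table used below; the function is hypothetical.

```python
def hive_row_to_hbase_cells(row, mapping):
    """Map a Hive row (tuple) to an HBase row key plus {family:qualifier: value} cells."""
    row_key = None
    cells = {}
    for value, target in zip(row, mapping):
        if target == ":key":
            row_key = str(value)        # this Hive column becomes the HBase row key
        else:
            cells[target] = str(value)  # family:qualifier -> cell value
    return row_key, cells

mapping = [":key", "personal_data:name", "personal_data:city", "personal_data:age"]
key, cells = hive_row_to_hbase_cells((6, "Mike", "NYC", 35), mapping)
print(key, cells)
# 6 {'personal_data:name': 'Mike', 'personal_data:city': 'NYC', 'personal_data:age': '35'}
```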
You can follow the steps below to load an HBase table from Apache Hive:
Create Hive External Table on Top of HBase Table
The first step is to create a Hive external table on top of the HBase table that you want to populate. Use the script below to create the external table:
CREATE EXTERNAL TABLE ext_hbase_personal (id INT, name STRING, city STRING, age INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:name,personal_data:city,personal_data:age")
TBLPROPERTIES ("hbase.table.name" = "personal");
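Note that the entries in `hbase.columns.mapping` pair up positionally with the Hive columns in the DDL, so both lists must have the same length and order. A quick illustrative check in plain Python (this is not executed by Hive):

```python
# The Hive columns from the DDL above, and the mapping string, paired by position.
hive_columns = ["id", "name", "city", "age"]
mapping = ":key,personal_data:name,personal_data:city,personal_data:age"

entries = [e.strip() for e in mapping.split(",")]
assert len(entries) == len(hive_columns)  # one mapping entry per Hive column

for col, target in zip(hive_columns, entries):
    print(f"{col:5s} -> {target}")
# id    -> :key
# name  -> personal_data:name
# city  -> personal_data:city
# age   -> personal_data:age
```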
Insert Data into External table from Hive table
The second step is to insert data into the Hive external table by selecting from the Hive table that you want to offload.
hive> insert into ext_hbase_personal select * from personal;
Query ID = user1_20171009151453_1c037adb-612b-422f-9881-882286a0a21d
Total jobs = 1
Launching Job 1 out of 1
…
OK
Time taken: 13.45 seconds
Verify HBase Table
Now verify that the HBase table contains the updated records:
hbase(main):001:0> scan 'personal'
ROW  COLUMN+CELL
 2   column=personal_data:age, timestamp=1505971701012, value=24
 2   column=personal_data:city, timestamp=1505971701012, value=Bengaluru
 2   column=personal_data:name, timestamp=1505971701012, value=sham
 3   column=personal_data:age, timestamp=1505971701012, value=27
 3   column=personal_data:city, timestamp=1505971701012, value=New Delhi
 3   column=personal_data:name, timestamp=1505971701012, value=Guru
 4   column=personal_data:age, timestamp=1505971701012, value=26
 4   column=personal_data:city, timestamp=1505971701012, value=NY
 4   column=personal_data:name, timestamp=1505971701012, value=John
 5   column=personal_data:age, timestamp=1505971701012, value=30
 5   column=personal_data:city, timestamp=1505971701012, value=DC
 5   column=personal_data:name, timestamp=1505971701012, value=Rock
 6   column=personal_data:age, timestamp=1507542290695, value=35
 6   column=personal_data:city, timestamp=1507542290695, value=NYC
 6   column=personal_data:name, timestamp=1507542290695, value=Mike
 7   column=personal_data:age, timestamp=1507542290695, value=32
 7   column=personal_data:city, timestamp=1507542290695, value=DC
 7   column=personal_data:name, timestamp=1507542290695, value=Mikey
6 row(s) in 0.1790 seconds
Row keys 6 and 7 are the newly added records.
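Reading a scan can be unintuitive at first: HBase prints one line per cell, not per row, so the six lines for keys 6 and 7 above are really two rows. A small illustrative Python sketch (the cell data mirrors the two new rows from the scan output) shows the grouping:

```python
from collections import defaultdict

# (row_key, column, value) triples, as in the scan output above (timestamps omitted).
scan_cells = [
    ("6", "personal_data:age", "35"),
    ("6", "personal_data:city", "NYC"),
    ("6", "personal_data:name", "Mike"),
    ("7", "personal_data:age", "32"),
    ("7", "personal_data:city", "DC"),
    ("7", "personal_data:name", "Mikey"),
]

# Group cells by row key to reconstruct logical rows.
rows = defaultdict(dict)
for row_key, column, value in scan_cells:
    rows[row_key][column] = value

print(len(rows))  # 2 rows, matching the "row(s)" count HBase reports
```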
To cross-verify, check the Hive table as well:
hive> select * from personal;
OK
6   Mike    NYC   35
7   Mikey   DC    32
Time taken: 0.229 seconds, Fetched: 2 row(s)
Please let me know if you find any better method.
You can also check out Apache Sqoop import commands and various examples in the articles below:
- Sqoop import Relational Database Table into HBase Table
- Apache Sqoop Architecture
- Sqoop Official Documentation
- Sqoop Command with Secure Password
- Import data using Apache Sqoop
- Export data using Apache Sqoop