Apache HBase is a scalable, column-oriented database built on top of Hadoop HDFS. HBase is an open-source implementation of Google’s Bigtable. In this article, we will look at the Apache HBase data model and explain each of its components.
Apache HBase Data Model
The Apache HBase Data Model is designed to accommodate structured or semi-structured data that can vary in field size, data type, and number of columns.
HBase stores data in tables, which have rows and columns. The table schema is very different from that of traditional relational database tables. You can think of an HBase table as a multi-dimensional sorted map.
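To make the "multi-dimensional map" idea concrete, here is a minimal Python sketch that models a table as nested dictionaries: row key, then column family, then column qualifier, then timestamp, mapping to a value. It is a toy illustration under that assumption, not the HBase client API, and the helpers put, get_latest and scan are hypothetical. HBase itself stores all of these as byte arrays; strings are used here for readability.

```python
from typing import Optional

# Toy model of one HBase table as nested maps (not the HBase client API):
# row key -> column family -> column qualifier -> timestamp -> value
Table = dict[str, dict[str, dict[str, dict[int, str]]]]


def put(table: Table, row: str, family: str, qualifier: str, ts: int, value: str) -> None:
    """Store one value at (row, family, qualifier, timestamp)."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})[ts] = value


def get_latest(table: Table, row: str, family: str, qualifier: str) -> Optional[str]:
    """Return the value with the highest timestamp, or None if the cell is empty."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]


def scan(table: Table):
    """Yield (row key, families) pairs in row-key order, the way a scan returns rows."""
    for row in sorted(table):
        yield row, table[row]


t: Table = {}
put(t, "row1", "Students", "Name", 1, "Alice")
put(t, "row1", "Students", "Name", 2, "Alicia")
print(get_latest(t, "row1", "Students", "Name"))  # Alicia
```

The outermost level is sorted by row key when scanned, which mirrors how HBase keeps rows in key order.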
Apache HBase Data Model Terminologies
The Apache HBase Data Model is made up of different logical components such as Tables, Rows, Column Families, Columns, Cells and Versions.
Below is an explanation of each logical component of the HBase data model:
Tables
An HBase table is a logical collection of rows. The table is split into regions, each holding a contiguous range of rows, and the rows are stored sorted by row key.
Read my other post on creating HBase tables.
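Because regions hold contiguous, sorted ranges of row keys, finding the region for a given key amounts to a binary search over the regions' start keys. The Python sketch below assumes a hypothetical list of region start keys, with the first region starting at the empty key. In a real cluster the client locates regions through the hbase:meta table, so find_region is only an illustration of the partitioning idea.

```python
import bisect


def find_region(start_keys: list[bytes], row_key: bytes) -> int:
    """Return the index i of the region whose range [start_keys[i], start_keys[i + 1]) holds row_key.

    start_keys must be sorted, and the first region must start at the empty key b"".
    """
    if not start_keys or start_keys[0] != b"":
        raise ValueError("the first region must start at the empty key")
    # bisect_right makes each region's start key inclusive
    return bisect.bisect_right(start_keys, row_key) - 1


regions = [b"", b"g", b"p"]  # hypothetical split points
print(find_region(regions, b"hello"))  # 1
```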
Row
A row consists of a row key and one or more columns with values associated with them. Rows are sorted and stored by row key, in lexicographic (byte) order.
For more information on designing row keys, refer to my other post.
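Row keys are compared as raw bytes, so numeric IDs stored as plain strings do not sort numerically. A small Python sketch, assuming UTF-8 encoded string keys and a hypothetical fixed-width zero-padding scheme, shows the effect and one common remedy:

```python
def sort_row_keys(keys: list[str]) -> list[str]:
    """Sort keys the way HBase orders rows: lexicographically by their bytes."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def padded_key(n: int, width: int = 5) -> str:
    """Left-pad a non-negative numeric ID so byte order matches numeric order."""
    return str(n).zfill(width)


print(sort_row_keys(["row10", "row2", "row1"]))               # ['row1', 'row10', 'row2']
print(sort_row_keys([padded_key(n) for n in (10, 2, 1)]))     # ['00001', '00002', '00010']
```

Without padding, "row10" lands between "row1" and "row2"; with a fixed width, byte order and numeric order agree.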
Column
Columns in an HBase table are grouped into column families. A column is identified by the column family name and the column qualifier, joined by a colon: for example, ColumnFamily:ColumnName. A row in an HBase table can have multiple columns.
Students:Name and Branch:Bname are columns in the reference diagram.
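A column name such as Students:Name can be split back into its family and qualifier. Family names cannot contain a colon, so this minimal Python sketch splits at the first colon and treats the rest as the qualifier. split_column is a hypothetical helper, not part of HBase.

```python
def split_column(column: str) -> tuple[str, str]:
    """Split 'family:qualifier' into (family, qualifier) at the first colon."""
    family, sep, qualifier = column.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column!r}")
    # Everything after the first colon belongs to the qualifier
    return family, qualifier


print(split_column("Students:Name"))  # ('Students', 'Name')
print(split_column("Branch:Bname"))   # ('Branch', 'Bname')
```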
Column Families
Columns in HBase tables are grouped into column families. Each row can have one or more column families, each with multiple columns. The columns of a column family are stored together in low-level storage files called HFiles.
Students and Branch are the column families in the reference diagram.
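Because each column family is stored separately, you can picture writes being routed to one store per family. In this Python sketch the families are fixed by a hypothetical table schema (in HBase, column families are declared when the table is created or altered), while qualifiers can be anything the writer chooses. group_by_family and SCHEMA_FAMILIES are illustrative names only.

```python
from collections import defaultdict

# Hypothetical schema: families are fixed up front, qualifiers are not
SCHEMA_FAMILIES = {"Students", "Branch"}


def group_by_family(cells):
    """Group (row, 'family:qualifier', value) triples into one sorted list per family.

    Each list stands in for the separate store (and HFiles) a column family gets.
    """
    stores = defaultdict(list)
    for row, column, value in cells:
        family, _, qualifier = column.partition(":")
        if family not in SCHEMA_FAMILIES:
            # HBase rejects writes to a family that is not in the table schema
            raise KeyError(f"unknown column family {family!r}")
        stores[family].append((row, qualifier, value))
    # Keep each store sorted by row, then qualifier
    return {family: sorted(entries) for family, entries in stores.items()}
```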
Cell
A cell stores the data. It is identified by a unique combination of row key, column family, and column qualifier, and it contains a value and a timestamp.
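The coordinates that identify a cell can be modelled as a small record. This Python sketch also sorts cells the way HBase orders them in storage, assuming the usual ordering: row, family and qualifier ascending, and timestamp descending so the newest version comes first. Cell and sort_key are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One cell version: its coordinates plus the stored value."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(cell: Cell):
    # Row, family and qualifier ascending; timestamp descending (newest version first)
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


cells = [
    Cell(b"r1", b"Students", b"Name", 1, b"Al"),
    Cell(b"r1", b"Students", b"Name", 5, b"Alice"),
    Cell(b"r0", b"Branch", b"Bname", 3, b"CS"),
]
print([c.value for c in sorted(cells, key=sort_key)])  # [b'CS', b'Alice', b'Al']
```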
Version
The data stored in a cell is versioned and versions of data are identified by the timestamp associated with them.
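The maximum number of versions kept is configurable per column family through the VERSIONS attribute. In HBase 0.96 and later the default is 1; earlier releases kept 3 versions by default.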
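A minimal Python sketch of version retention, assuming a per-cell limit like the column family's VERSIONS setting: each put adds a timestamped value and only the newest max_versions values are kept. VersionedCell is a hypothetical class. Real HBase hides excess versions when reading and physically removes them during major compaction, whereas this toy drops them immediately.

```python
class VersionedCell:
    """Toy model of one cell that keeps at most max_versions values."""

    def __init__(self, max_versions: int = 1):
        if max_versions < 1:
            raise ValueError("max_versions must be >= 1")
        self.max_versions = max_versions
        self._versions: dict[int, str] = {}

    def put(self, timestamp: int, value: str) -> None:
        # Writing the same timestamp again overwrites that version
        self._versions[timestamp] = value
        # Drop the oldest versions beyond the limit
        for ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[ts]

    def get(self, versions: int = 1) -> list[tuple[int, str]]:
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:versions]


cell = VersionedCell(max_versions=3)
for ts, value in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    cell.put(ts, value)
print(cell.get(versions=3))  # [(4, 'd'), (3, 'c'), (2, 'b')]
```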
You can refer to these other posts for more information on versions:
- Create HBase Table using shell commands and Examples
- Alter HBase Tables using Shell Commands and Examples
Hello, my name is Ozie. May I ask you something? I have a problem importing data into HBase with importtsv. My dataset has a very large number of columns (1000). Do I have to list all the columns one by one, or is there a way to map the columns automatically from the file? Thank you, and I’m sorry to take up your time.
Hi Ozie,
I’m afraid importtsv has no automated way to detect the columns. You have to explicitly map each dataset column to an HBase column (family:qualifier) in the importtsv.columns option.
Thanks,
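For wide files, one workaround is to generate the mapping string instead of typing it. The Python sketch below assumes the TSV file has a header line naming its columns, that all columns go into a single column family, and that one named column is the row key. importtsv_columns is a hypothetical helper that builds the comma-separated value passed to -Dimporttsv.columns, with HBASE_ROW_KEY marking the row key column.

```python
def importtsv_columns(header_line: str, family: str, row_key_column: str, sep: str = "\t") -> str:
    """Build a value for -Dimporttsv.columns from a TSV header line.

    The column named row_key_column maps to HBASE_ROW_KEY; every other
    column maps to family:<column name>, in the same order as the header.
    """
    names = [name.strip() for name in header_line.rstrip("\r\n").split(sep)]
    if row_key_column not in names:
        raise ValueError(f"row key column {row_key_column!r} not found in header")
    mapped = ["HBASE_ROW_KEY" if name == row_key_column else f"{family}:{name}" for name in names]
    return ",".join(mapped)


print(importtsv_columns("id\tname\tbranch\n", "cf", "id"))  # HBASE_ROW_KEY,cf:name,cf:branch
```

Pass the returned string as the value of -Dimporttsv.columns when running ImportTsv. The header line would otherwise be loaded as ordinary data, so strip it from the input file before importing.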