HBase Table Schema Design and Concept

  • Post last modified:February 27, 2018
  • Post category:BigData

An HBase table can scale to billions of rows and a large number of columns, depending on your requirements, and can store terabytes of data. HBase tables support high read and write throughput at low latency. A single value in each row is indexed; this value is known as the row key. In this article, we will look at HBase table schema design and the concepts behind it.

HBase Table Schema Design General Concepts

HBase schema design is very different from relational database schema design. Below are some general concepts to follow when designing a schema in HBase:

  • Row key: Each HBase table is indexed on its row key. Data is sorted lexicographically by this row key. HBase tables have no built-in secondary indexes.
  • Atomicity: Avoid designing tables that require atomicity across rows. Operations in HBase are atomic only at the row level.
  • Even distribution: Reads and writes should be distributed uniformly across all nodes in the cluster. At the same time, design the row key so that related entities are stored in adjacent rows, which makes reads more efficient.
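Lexicographic ordering has a practical consequence that is easy to miss: row keys are compared byte by byte, so numeric suffixes do not sort numerically unless they are zero-padded. The following is a minimal sketch of that behaviour in plain Python, without an HBase connection; the helper names `sort_row_keys` and `padded_key` are hypothetical and only illustrate the ordering.

```python
def sort_row_keys(keys):
    """Sort keys by their UTF-8 bytes, mimicking byte-wise row key ordering."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def padded_key(prefix, number, width=5):
    """Zero-pad the numeric part so that byte order matches numeric order."""
    return f"{prefix}{number:0{width}d}"


# Unpadded numbers: "user10" sorts before "user2" because '1' < '2'.
unpadded = sort_row_keys(["user2", "user10", "user1"])
print(unpadded)  # ['user1', 'user10', 'user2']

# Padded numbers: byte order and numeric order agree.
padded = sort_row_keys([padded_key("user", n) for n in (2, 10, 1)])
print(padded)  # ['user00001', 'user00002', 'user00010']
```

If scans are expected to return rows in numeric order, fixed-width keys like the padded ones above are one way to get that.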

HBase Schema Size Limits for Row Key, Column Family, Column Qualifier, Individual Value and Row

Keep the following recommended size limits in mind when designing a schema in HBase:

  • Row keys: 4 KB per key
  • Column families: not more than 10 column families per table
  • Column qualifiers: 16 KB per qualifier
  • Individual values: less than 10 MB per cell
  • All values in a single row: max 10 MB
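One way to apply these limits is to check a row on the client side before writing it. The sketch below is only an illustration under the numbers listed above; the function `check_row` and its input shape (a dict keyed by family and qualifier) are assumptions made for this example and are not part of any HBase API.

```python
KB = 1024
MB = 1024 * KB

# Limits taken from the list above.
ROW_KEY_MAX = 4 * KB
QUALIFIER_MAX = 16 * KB
CELL_VALUE_MAX = 10 * MB
ROW_VALUES_MAX = 10 * MB


def check_row(row_key, cells):
    """Return a list of problems for one row.

    row_key: bytes
    cells:   dict mapping (family: str, qualifier: bytes) -> value: bytes
    """
    problems = []
    if len(row_key) > ROW_KEY_MAX:
        problems.append("row key exceeds 4 KB")
    total = 0
    for (family, qualifier), value in cells.items():
        if len(qualifier) > QUALIFIER_MAX:
            problems.append(f"qualifier in family {family} exceeds 16 KB")
        if len(value) >= CELL_VALUE_MAX:
            problems.append(f"value in family {family} is not under 10 MB")
        total += len(value)
    if total > ROW_VALUES_MAX:
        problems.append("values in row exceed 10 MB in total")
    return problems


ok = check_row(b"machine001", {("cf", b"status"): b"up"})
bad = check_row(b"k" * (4 * KB + 1), {("cf", b"status"): b"up"})
print(ok)   # []
print(bad)  # ['row key exceeds 4 KB']
```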

HBase Row Key Design

When choosing a row key for HBase tables, design it so that it does not cause hotspotting. To get the best performance out of an HBase cluster, the row key should allow the system to write evenly across all nodes.

A poorly designed row key can force a full table scan when you request data from the table.

Types of HBase Row Keys

Below are some commonly used HBase row key designs:

Reverse Domain Names

If you are storing data that is identified by domain names, consider using the reversed domain name as the row key for your HBase tables. For example, com.company.name.

This technique works well when your data is spread across many reverse domains. If you have very few reverse domains, you may end up storing most of the data on a single node, which causes hotspotting.
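As a rough illustration, the reversal can be done with a few lines of Python. The function name `reverse_domain_key` is hypothetical, and the lower-casing and trailing-dot handling are assumptions made for this sketch rather than anything HBase requires.

```python
def reverse_domain_key(domain):
    """Turn 'name.company.com' into 'com.company.name'."""
    labels = domain.strip().lower().rstrip(".").split(".")
    return ".".join(reversed(labels))


print(reverse_domain_key("name.company.com"))  # com.company.name

# After reversal, hosts of the same domain sort next to each other.
hosts = ["mail.example.com", "example.org", "www.example.com"]
keys = sorted(reverse_domain_key(h) for h in hosts)
print(keys)  # ['com.example.mail', 'com.example.www', 'org.example']
```

Because related hosts end up adjacent, a scan over one domain touches a contiguous range of rows.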

Hashing

When your data is identified by a string identifier, that identifier is a good basis for your HBase table row key. Use a hash of the string identifier as the row key instead of the raw string. For example, if you are storing user data that is identified by user IDs, then a hash of the user ID is a better choice for your row key.
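A minimal sketch of a hashed row key is shown below. The choice of MD5 and the helper name `hashed_row_key` are assumptions for illustration; any stable hash function would serve the same purpose.

```python
import hashlib


def hashed_row_key(user_id):
    """Return a fixed-length hex digest of the user ID to use as the row key."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()


key_a = hashed_row_key("user1001")
key_b = hashed_row_key("user1002")

# The same ID always maps to the same key, so point lookups still work.
print(key_a == hashed_row_key("user1001"))  # True
# Neighbouring IDs map to different keys of the same fixed length.
print(key_a != key_b, len(key_a), len(key_b))  # True 32 32
```

The trade-off is that hashed keys no longer sort in the order of the original IDs, so range scans over consecutive user IDs are not possible with this design.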

Timestamps

When you retrieve data based on the time it was stored, it is best to include the timestamp in your row key. For example, if you are storing machine logs identified by machine number, append the timestamp to the machine number when designing the row key: machine001#1435310751234.
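The machine log key from the example can be built as follows. This is a sketch only: the helper names are hypothetical, and the fixed 13-digit width for millisecond timestamps is an assumption that keeps keys for one machine in time order.

```python
def machine_log_key(machine_id, timestamp_ms):
    """Build '<machine>#<timestamp>' with a fixed-width timestamp."""
    return f"{machine_id}#{timestamp_ms:013d}"


def machine_scan_range(machine_id):
    """Start (inclusive) and stop (exclusive) keys covering one machine.

    '$' is the character immediately after '#' in ASCII, so every key that
    starts with '<machine>#' falls inside this range.
    """
    return f"{machine_id}#", f"{machine_id}$"


print(machine_log_key("machine001", 1435310751234))  # machine001#1435310751234

keys = sorted([
    machine_log_key("machine002", 1435310751000),
    machine_log_key("machine001", 1435310759999),
    machine_log_key("machine001", 1435310751234),
])
start, stop = machine_scan_range("machine001")
print([k for k in keys if start <= k < stop])
# ['machine001#1435310751234', 'machine001#1435310759999']
```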

Combined Row Key

You can combine multiple keys to design the row key for your HBase table, based on your requirements.
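As one possible combination, the sketch below joins a short hash prefix, the machine ID and a timestamp. The layout, the `#` delimiter, the 4-character prefix and the name `combined_row_key` are all assumptions chosen for illustration; the right combination depends on how the table is read.

```python
import hashlib


def combined_row_key(machine_id, timestamp_ms, prefix_len=4):
    """Build '<hash prefix>#<machine>#<timestamp>'.

    The hash prefix spreads different machines across the key space, while
    rows of the same machine stay adjacent and ordered by time.
    """
    digest = hashlib.md5(machine_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}#{machine_id}#{timestamp_ms:013d}"


first = combined_row_key("machine001", 1435310751234)
later = combined_row_key("machine001", 1435310759999)

# Same machine: same prefix, and the earlier event sorts first.
print(first[:4] == later[:4], first < later)  # True True
print(first.endswith("#machine001#1435310751234"))  # True
```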

HBase Column Families and Column Qualifiers

Below is some guidance on column families and column qualifiers:

Column Families

In HBase, use no more than 10 column families per table to get the best performance out of the cluster. If your row contains multiple values that are related to each other, place them in the same column family. Also, keep the names of your column families short, since they are included in the data that is transferred for each request.

Column Qualifiers

You can create as many column qualifiers as you need in each row. Empty cells in a row do not consume any space. Keep the names of your column qualifiers short, since they are included in the data that is transferred for each request.
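To get a feel for why short names matter, the rough sketch below counts only the bytes contributed by the family and qualifier names, which are repeated for every cell. It deliberately ignores the row key, the value, fixed per-cell overhead, and any compression or encoding, so the numbers are an illustration rather than a measurement. The names used here are made up for the example.

```python
def name_bytes_per_cell(family, qualifier):
    """Bytes taken by the family and qualifier names in a single cell."""
    return len(family.encode("utf-8")) + len(qualifier.encode("utf-8"))


def name_bytes_total(family, qualifier, cell_count):
    """Name bytes repeated across cell_count cells of the same column."""
    return name_bytes_per_cell(family, qualifier) * cell_count


long_names = name_bytes_total("customer_information", "email_address", 1_000_000)
short_names = name_bytes_total("c", "em", 1_000_000)

print(name_bytes_per_cell("cf", "q"))  # 3
print(long_names > short_names)        # True
print(long_names - short_names)        # extra bytes spent on names alone
```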

Creating HBase Schema Design

You can create the schema using the Apache HBase shell or the Java API.

Below is an example that creates a table with a single column family from the HBase shell:

hbase(main):001:0> create 'test_table_schema', 'cf'
0 row(s) in 2.7740 seconds

=> Hbase::Table - test_table_schema
