Splitting HBase Tables, Examples and Best Practices

Post author:Vithal S
Post last modified:March 12, 2018

Apache HBase distributes its load through region splitting. HBase stores rows in tables, and each table is split into ‘regions’. Those regions are distributed across the cluster, and each one is hosted and made available to client processes by a RegionServer process. Rows within a table are kept in sorted order, and each region holds the rows that fall between its start key and its end key. Every row belongs to exactly one region, and a region is served by a single region server at any given point in time. This article covers how HBase tables are split, with examples and best practices.
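To make the "one row, one region" rule concrete, here is a minimal Python sketch of how a row key could be mapped to a region by its start key. It is an illustration of the idea only, not HBase's actual lookup code; the start keys and the `find_region` helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical start keys of four regions, in sorted order.
# The first region of a table has an empty start key.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose [start key, end key) range holds row_key."""
    # bisect_right counts the start keys that are <= row_key;
    # the last of those start keys belongs to the owning region.
    return bisect_right(start_keys, row_key) - 1


print(find_region(b"apple"))  # falls in the first region
print(find_region(b"n"))      # a start key is inclusive, so this is the third region
print(find_region(b"zebra"))  # falls in the last region
```

Because the ranges are half-open and cover the whole key space, every row key resolves to exactly one region index.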

HBase Table Regions

Regions are the physical mechanism used to distribute the write and query load across region servers in HBase. A table in HBase consists of many regions, each assigned to a region server. By default, HBase allocates a single region to a newly created table. As a result, the initial load of an HBase table goes to one region and does not use the full capacity of the cluster.

Pre-splitting HBase Tables

As mentioned in the previous section, HBase allocates only one region to a new table, because it does not know where to split the table into multiple regions. With pre-splitting, you can create an HBase table with many regions by supplying the split points at table creation time.

Calculating Split Points for Tables

You can use the RegionSplitter utility to calculate split points for a table. RegionSplitter generates the split points using either the HexStringSplit or the UniformSplit algorithm.

For example, create table ‘table1’ with 5 regions:

https://gist.github.com/d88578b5c9145efac4b339cf32fa2c61.js
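As a rough illustration of what a hex-based split algorithm does, the Python sketch below divides an 8-character hexadecimal key space into equal slices and returns the boundaries. It approximates the idea behind HexStringSplit under the assumption that row keys are evenly distributed hex strings between 00000000 and FFFFFFFF; the `hex_split_points` function is hypothetical and is not part of HBase.

```python
from typing import List


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced split points over a hex key space."""
    if num_regions < 2:
        # A single region needs no split points.
        return []
    key_space = 16 ** width          # number of distinct keys of this width
    step = key_space // num_regions  # size of each region's slice
    # Each split point is the first key of regions 2..num_regions,
    # zero-padded to the key width.
    return [format(step * i, "0{}x".format(width)) for i in range(1, num_regions)]


print(hex_split_points(5))
# ['33333333', '66666666', '99999999', 'cccccccc']
```

Five regions need four split points, since the first region starts at the beginning of the key space and the last one runs to its end.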

Pre-splitting HBase Tables Examples

If you already know the split points, you can pass them to the HBase shell command that creates the table. Below is an example of pre-splitting an HBase table:

https://gist.github.com/cefcf8a2f643ad9737c921d3a5f4c088.js
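When the split points come from a script rather than being typed by hand, it can help to generate the shell statement programmatically. The Python sketch below builds an HBase shell `create` statement with a `SPLITS` list. The table name, the column family `cf1`, the split points, and the `create_statement` helper are all assumptions made for illustration, and the helper does not escape quotes inside keys.

```python
def create_statement(table, family, split_points):
    """Build an HBase shell 'create' statement that pre-splits the table."""
    # Quote each split point so the shell treats it as a string literal.
    splits = ", ".join("'{}'".format(point) for point in split_points)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, splits)


# Hypothetical table, column family, and split points.
statement = create_statement(
    "table1", "cf1", ["33333333", "66666666", "99999999", "cccccccc"]
)
print(statement)
# create 'table1', 'cf1', SPLITS => ['33333333', '66666666', '99999999', 'cccccccc']
```

The printed line can be pasted into the HBase shell; four split points produce a table with five regions.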
