HBase Architecture and its Components

Post author: Vithal S
Post last modified: March 12, 2018
Post category: BigData

HBase is an open-source, distributed, column-oriented key-value data store that runs on top of HDFS. Its architecture provides high write throughput and low-latency random reads.

Facebook uses HBase: Facebook uses HBase for its Messenger service. It customised HBase as HydraBase to meet its requirement of integrating SMS, chat, email and Facebook Messages into one inbox. Apart from Messenger, HBase is used in production by other Facebook services, including the internal monitoring system, the Nearby Friends feature, search indexing, streaming data analysis, and data scraping for internal data warehouses.

HBase Architecture

In HBase, data is physically sharded into what are known as regions. A single region server hosts each region, and each region server is responsible for one or more regions.
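The region-to-server mapping described above can be sketched roughly as follows. This is a minimal illustration, not HBase's actual lookup code: it assumes each region covers a contiguous, sorted range of row keys, and the start keys, server names, and the `locate_region` helper are all hypothetical.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical region table (assumption): each region covers row keys from
# its start key (inclusive) up to the next region's start key (exclusive).
REGION_START_KEYS = ["", "g", "n", "t"]        # sorted; "" marks the first region
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]  # server hosting each region


def locate_region(row_key: str) -> Tuple[int, str]:
    """Return (region index, hosting region server) for a row key."""
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]
```

Note that `rs1` hosts two regions here while every region has exactly one host, matching the one-server-per-region, many-regions-per-server relationship.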

The HBase architecture consists of servers in a master-slave relationship. An HBase cluster has one master node, called HMaster, and multiple Region Servers, called HRegionServer. Each Region Server contains multiple regions (HRegions).

The diagram below shows the HBase architecture:

[Figure: HBase Architecture (image credit: MapR)]

Components of HBase Architecture

The HBase architecture has two major components: HMaster and Region Server. HBase stores data in regions.

HBase HMaster

HMaster is the master node. It is a lightweight process that assigns regions to Region Servers.

The main responsibilities of HMaster are:

Monitors and manages the HBase cluster.
Performs administrative tasks such as load balancing and creating, updating, and deleting tables.
Changes the schema at the direction of client applications.
Handles most DDL operations on HBase tables.
Provides high availability by controlling failovers.
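To make the assignment and failover responsibilities above more concrete, here is a simplified sketch. It is not HMaster's real balancer: the round-robin policy, the least-loaded reassignment rule, and the function names `assign_regions` and `handle_failover` are assumptions chosen for illustration.

```python
from typing import Dict, List


def assign_regions(regions: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Spread regions across region servers round-robin (simplified balancing)."""
    assignment: Dict[str, List[str]] = {server: [] for server in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment


def handle_failover(assignment: Dict[str, List[str]], failed_server: str) -> Dict[str, List[str]]:
    """Move a failed server's regions onto the least-loaded surviving servers."""
    orphaned = assignment.pop(failed_server)
    if not assignment:
        raise RuntimeError("no surviving region servers to take over regions")
    for region in orphaned:
        # Pick the survivor currently hosting the fewest regions.
        target = min(assignment, key=lambda server: len(assignment[server]))
        assignment[target].append(region)
    return assignment
```

For example, assigning six regions to three servers gives each server two regions; if one server then fails, its two regions are handed to the two survivors so that each ends up with three.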

HBase Region Server

Region Servers are worker nodes that handle read, write, update, and delete requests from clients. A Region Server is a lightweight process that runs on every node in the Hadoop cluster.

The main job of a Region Server is to store data in regions and serve the requests it receives from client applications. Another important job is load balancing through auto-sharding: when an HBase table becomes too large after data is inserted, its data is dynamically redistributed.
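The auto-sharding idea can be sketched as a region that splits in two once it grows past a limit. This is a toy model under stated assumptions: the row-count threshold `MAX_ROWS_PER_REGION` and the `insert_row` helper are hypothetical, and real HBase decides splits based on stored data size rather than a row count.

```python
from typing import List

# Hypothetical threshold (assumption); real splits are driven by data size.
MAX_ROWS_PER_REGION = 4


def insert_row(regions: List[List[str]], row_key: str) -> None:
    """Insert a row key into the region covering it, splitting the region if it grows too large."""
    # Choose the last region whose first key is <= row_key; default to the first region.
    idx = 0
    for i, region in enumerate(regions):
        if region and region[0] <= row_key:
            idx = i
    region = regions[idx]
    region.append(row_key)
    region.sort()
    if len(region) > MAX_ROWS_PER_REGION:
        # Split at the midpoint into two daughter regions that replace the parent.
        mid = len(region) // 2
        regions[idx:idx + 1] = [region[:mid], region[mid:]]
```

Starting from a single empty region, inserting five row keys triggers one split, after which later inserts are routed to whichever daughter region covers the key.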

HBase requires the ZooKeeper framework, as it relies on ZooKeeper for cluster coordination.


Tags: Apache Hadoop, HBase