Apache HBase Column Versions and Explanations

  • Post last modified: March 12, 2018
  • Post category: BigData

A cell in HBase is identified by the combination of row, column (column family plus qualifier), and version. Each cell holds a value, and its version is a timestamp. In this article, we will look at Apache HBase column versions and explain them with some examples.

Apache HBase Column Versions

As mentioned at the beginning of this post, a {row, column, version} tuple exactly specifies a cell in HBase. You can have many cells in which the row and column are the same and only the version differs. A version is a timestamp that is written alongside each value. By default, the timestamp is the time on the RegionServer when the data was written, but you can override this default and specify a different timestamp when you put data into the cell.
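As a minimal sketch of the two ways a version can be assigned, the HBase shell commands below write one cell with the default timestamp and one with an explicit timestamp. The table name 'demo', the column 'cf:val', and the timestamp value are hypothetical and chosen only for illustration; the table is assumed to exist already.

```
# Default: the RegionServer assigns the current time as the version
put 'demo', 'row1', 'cf:val', '123'

# Explicit version: pass a timestamp (a long integer) as the last argument
put 'demo', 'row1', 'cf:val', '123', 1520812800000

# Read back the cell stored at exactly that timestamp
get 'demo', 'row1', {COLUMN => 'cf:val', TIMESTAMP => 1520812800000}
```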

In HBase, row and column keys are expressed as bytes, while the version is a long integer, typically a timestamp in milliseconds. Versions are stored in decreasing order, so that when reading from a store file, the most recent values are found first.

Why Does HBase Maintain Versions?

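HBase stores its data in HDFS, where files cannot be modified in place, so updating existing data is not straightforward. To support updates, HBase writes a new version of the cell each time it is updated. The number of versions retained is configured per column family: releases before HBase 0.96 kept 3 versions by default, while HBase 0.96 and later keep 1.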

For example, assume a cell holds the value 123 and you update it to 456. HBase does not overwrite 123 with 456; instead, it adds a new version of the same cell (not a new row) that holds the updated value and uses the latest timestamp as its version number.
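The update behavior described here can be sketched in the HBase shell as follows. The table 'demo_versions' and column 'cf:val' are hypothetical names; the table is created with room for 3 versions so that the older value is retained.

```
# Hypothetical table whose column family 'cf' keeps up to 3 versions
create 'demo_versions', {NAME => 'cf', VERSIONS => 3}

# Write the original value, then "update" it by writing the same cell again
put 'demo_versions', 'row1', 'cf:val', '123'
put 'demo_versions', 'row1', 'cf:val', '456'

# A plain get returns only the newest version (456)
get 'demo_versions', 'row1', 'cf:val'

# Requesting more versions returns both values, newest first
get 'demo_versions', 'row1', {COLUMN => 'cf:val', VERSIONS => 2}
```

Both values come back only because the two puts receive different timestamps, which is effectively always the case when the commands are typed interactively.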

Specify Number of Versions to Keep

You can specify the number of versions to keep for a column family either when you create the HBase table or later with the HBase alter command. The setting applies to every column in that family.

Create HBase Table with Number of Versions to Keep

Below is the HBase create table example that shows how to specify the number of versions to keep:

https://gist.github.com/b95f3983029f1606104ed126b0da08fe.js
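A minimal sketch of such a create statement in the HBase shell is shown below. The table name 'employee' and the column family 'personal' are hypothetical and are not taken from the original example.

```
# Create a table whose column family 'personal' keeps up to 5 versions per cell
create 'employee', {NAME => 'personal', VERSIONS => 5}

# Inspect the table; VERSIONS is listed among the column family attributes
describe 'employee'
```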

Alter HBase Table to Include Number of Versions to Keep

Below is the HBase alter table example that shows how to specify the number of versions to keep:

https://gist.github.com/4caf7a95a2d8343ea7c4779deb6f6f46.js
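A minimal sketch of the alter statement follows, assuming a table named 'employee' with a column family 'personal' already exists (both names are hypothetical).

```
# Change the number of versions kept for an existing column family
alter 'employee', NAME => 'personal', VERSIONS => 5

# Verify that the new VERSIONS value is in effect
describe 'employee'
```

Depending on your HBase release and configuration, you may need to run disable 'employee' before the alter and enable 'employee' afterwards. If you lower the value, excess versions are typically removed at the next major compaction rather than immediately.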

Display Multiple Versions of a Column Using the get Command

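By default, the HBase get command returns only the most recent version of each column. To display older versions that are stored for a row, pass the VERSIONS option with the number of versions you want. The example below displays multiple versions of a column: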

https://gist.github.com/15f6a8b9c9ec15185e3b17878a7250c9.js
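As a minimal sketch, the commands below read several versions of a single column. The table 'employee', row key 'row1', and column 'personal:name' are hypothetical, and the column family is assumed to be configured to keep at least 3 versions.

```
# Return up to 3 versions of one column, newest first
get 'employee', 'row1', {COLUMN => 'personal:name', VERSIONS => 3}

# Restrict the result to versions within a timestamp range (start inclusive, end exclusive)
get 'employee', 'row1', {COLUMN => 'personal:name', TIMERANGE => [1520812800000, 1520899200000], VERSIONS => 3}
```

HBase returns at most the number of versions the column family retains, even if VERSIONS asks for more.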