Apache Hive Table Update using ACID Transactions and Examples

  • Post author:
  • Post last modified:November 2, 2019
  • Post category:BigData
  • Reading time:4 mins read

Apache Hive and Cloudera Impala supports SQL on Hadoop and provides better way to manage data on Hadoop ecosystem. There are many frameworks to support SQL on Hadoop are available, but Hive and Cloudera are widely used and popular frameworks. Until recently, Apache Hive did not support Update tables. Version 0.14 onwards, Hive supports ACID transactions. You must define the table as transaction to use ACID transactions such as UPDATE and DELETE. In this article, we will check Apache Hive table update using ACID Transactions and Examples.

Apache Hive Table Update using ACID Transactions

Apache Hive Table Update using ACID Transactions Support

You should not think Apache Hive as a regular relational database management system. Apache Hive is in fact better suited for batch processing over very larger dataset. Hive is not for online transaction processing (OLTP).

If you have requirement to update Hive table records, then Hive provides ACID transactions. This feature is available in Hive 0.14 and above. You must create table by setting up TBLPROPERTIES to use transactions on the tables.

Create Hive Table with TABLEPROPERTIES

As mentioned in previous section, TBLPROPERTIES enables, ACID transactions on Hive tables. Let’s start by creating a transactional table. Only transactional tables can support updates and deletes.

Below is the syntax that you can use to enable transaction:

DROP TABLE IF EXISTS hive_acid_demo;

CREATE TABLE hive_acid_demo (key int, value int)
CLUSTERED BY(key) INTO 3 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

Note that, you must bucket the table to use ACID transactions on the tables. You can’t set TBLPROPERTIES in CREATE TABLE syntax without bucketing it.

Related reading:

Enable ACID on Hive

You must set below properties at Hive level to enable ACID transaction on Hive:

SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nostrict;

Apache Hive Table Update using ACID Transactions

We have created table, now let us INSERT some records to the tables and check how update works in Hive with transaction tables.

Related reading:

How to update Hive Table without Setting Table Properties?

Here is an example that inserts some records, deletes one record and updates one record.

INSERT INTO hive_acid_demo  VALUES (1, 1);
INSERT INTO hive_acid_demo  VALUES (2, 2);
INSERT INTO hive_acid_demo  VALUES (3, 3);
INSERT INTO hive_acid_demo  VALUES (4, 4);

Update Hive Table

We have data in the table, let us run the update statement and verify the table again.

UPDATE hive_acid_demo
set value = 5
where key = 4;

Verify the updated table;

hive> select * from hive_acid_demo where key = 4;
OK
4       5
Time taken: 0.287 seconds, Fetched: 1 row(s)

Related Articles