How to update Hive Table without Setting Table Properties?

  • Post author:
  • Post last modified:December 25, 2019
  • Post category:BigData
  • Reading time:4 mins read

Apache Hive and Cloudera Impala provides better way to manage data on Hadoop ecosystem. There are many frameworks to support SQL on Hadoop are available, but Hive and Cloudera are widely used and popular frameworks. Until recently, Apache Hive did not support Update tables. You must set up TBLPROPERTIES to use transaction on the Hive table. These are relatively new features and should be used with caution. In this article, we will discuss how to update Hive table without setting table properties.

How to update Hive Table without Setting Table Properties

You should not think Apache Hive as a regular relational database, Apache Hive is better suited for batch processing over very large sets of immutable data.

Apache Hive is not designed for online transaction processing and does not offer real-time queries and row level updates.

However, below is the one approach that you can follow if you have requirement to update records in the Hive or Impala tables.

How to update Hive Table without Setting Table Properties?

There are many approaches that you can follow to update Hive tables, such as:

You can read:

In this article, we will check first approach i.e. How to update Hive Tables using temporary table. This is one of easy and fastest way to update Hive tables.

Update Hive Table without Setting Table Properties Example

Below example explain steps to update Hive tables using temporary tables:

Let us consider you want to update col2 of table1 by taking data from staging table2. Below is the sample Teradata SQL update statement:

update a from tabl1 a, table2 b
set col2 = b.col2
where a.col1=b.col1;
Step1: Drop temporary table if it is already exists

Use DROP IF EXISTS command to drop temporary table if it is already exists in the Hive database:

DROP TABLE IF EXISTS table1Temp;
Step2: Create temporary table structure like original table (i.e. table1)

Use Hive LIKE statement along with CREATE TABLE:

CREATE TABLE table1Temp LIKE table1;

Step3: Insert data into temporary table with updated records

Join table2 along with table1 to get updated records and insert data into temporary table that you create in step2:

INSERT INTO TABLE table1Temp SELECT a.col1, COALESCE( b.col2 , a.col2) AS col2
FROM table1 a
LEFT OUTER JOIN table2 b
ON ( a.col1 = b.col1);
Step4: DROP original managed table

In this step, drop original managed table that holds old data.

DROP TABLE table1;
Step5: Rename temporary table to original table

Now the final step is to rename temporary table to original table

ALTER TABLE table1Temp RENAME TO tabl1e;

Instead of dropping original table, you can use INSERT OVERWRITE to INSERT data into original table.

Related Articles