Hadoop HDFS is designed in such a way that the number of HDFS files directly affects memory consumption in the NameNode, because the NameNode must keep track of every file in the HDFS environment. This is not a concern on a small cluster, but memory usage can become a problem once the file count crosses roughly 50 to 100 million files. The Hadoop ecosystem performs best with a smaller number of files. Now, let us check how to improve Hive memory usage using Hadoop Archive.
Improve Hive Memory Usage using Hadoop Archive
You can use Hadoop archiving to reduce the number of HDFS files in a Hive table partition. Hive has built-in functionality to convert a Hive table partition into a Hadoop Archive (HAR). HAR does not compress the files; it is analogous to the Linux tar command.
Note that if a Hive table partition is archived, Hive SQL queries may run slower because of the additional overhead of reading HAR files.
Hive Hadoop Archive Settings
There are three settings that you can use on the Hive console to set up Hadoop archiving:
https://gist.github.com/2b4d7a79e6b382657fcf9bb2e454d779
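A minimal sketch of these settings, based on the archiving configuration described in the Hive documentation (the part file size shown here is only an illustrative value):

-- Enable Hive's archiving operations (ALTER TABLE ... ARCHIVE PARTITION)
SET hive.archive.enabled=true;
-- Allow Hive to set the parent directory when creating the HAR
SET hive.archive.har.parentdir.settable=true;
-- Target size of each HAR part file (illustrative value: 1 TB)
SET har.partfile.size=1099511627776;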
Improve Hive Memory Usage using Hadoop Archive Examples
You can use the Hive ALTER TABLE command with the ARCHIVE PARTITION option to perform Hadoop archiving. Below are examples of Hadoop archiving usage.
Archive using Hive ALTER TABLE
As mentioned earlier, a Hadoop archive helps you reduce the number of HDFS files in a table partition. You can use the Hive ALTER TABLE command to archive Hive table partitions. Below is a demonstration of the Hadoop archive functionality:
https://gist.github.com/9a16578510b43c33a8ad381ab2a2b557
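A minimal sketch of the archive command, assuming a hypothetical table sales_data partitioned by the column ds:

-- Pack the files of one partition of the (hypothetical) sales_data table into a HAR
ALTER TABLE sales_data ARCHIVE PARTITION (ds='2021-01-01');

After the partition is archived, it remains queryable through normal Hive SQL, only with the additional HAR read overhead noted above.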
Unarchive using Hive ALTER TABLE
Hive also supports the unarchive functionality. You can use the Hive ALTER TABLE command with the UNARCHIVE option to unarchive Hive table partitions whenever required. Below is the command to unarchive partitions in a table:
https://gist.github.com/f998e3ac6b1cf0b6b5d8d5c72b711a21
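A minimal sketch of the unarchive command, using the same hypothetical table and partition as above:

-- Restore the partition's original files from the HAR
ALTER TABLE sales_data UNARCHIVE PARTITION (ds='2021-01-01');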