How to List Hive High Volume Tables?

Unlike other relational databases, Apache Hive does not have a system table that keeps track of the size of growing tables, and it is difficult to find a table's size using a query. As part of maintenance, you should identify the size of growing tables periodically, because big tables can cause performance issues in Hive. Below are some of the methods that you can use to list Hive high volume tables.

Use hdfs dfs -du Command

Hadoop supports many useful commands that you can use in day to day activities, such as finding the size of an HDFS folder. Hive stores table data as HDFS files, so you can simply use the hdfs dfs -du command on the table's folder to identify its size, and that would be your table size.

Below are some examples of using the hdfs dfs -du command.
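
For example, Hive typically keeps a managed table's data under the warehouse directory, which defaults to /user/hive/warehouse/<database>.db/<table>. Assuming that default layout and placeholder database and table names, a minimal check of a single table's size looks like this:

hdfs dfs -du -s -h /user/hive/warehouse/test_db.db/my_table

The -s option prints one aggregate total for the folder and -h prints it in a human-readable unit such as MB or GB.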

Identify Top Hive Tables with High Volume Data in a Given Database Using the hdfs dfs -du Command

The below command allows you to identify the top Hive high volume tables in a given database:

https://gist.github.com/4767878ad966465c7d85ab5de8004859
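
In case the gist is not visible, the idea can be sketched as follows, again assuming the default warehouse location and a placeholder database name: list the size of every table folder in the database, sort numerically in descending order, and keep the top entries.

hdfs dfs -du /user/hive/warehouse/test_db.db/ | sort -rn -k1 | head -10

Each line of the output starts with the size in bytes followed by the table folder, so the first few lines are your highest volume tables.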

Use DESCRIBE EXTENDED Output

Another method to identify table size is to use the DESCRIBE EXTENDED <table_name> output. This method is useful when you want to identify the size of a known table. Note that the table size is reported in bytes.

Below is a sample DESCRIBE EXTENDED table output. The highlighted part in the example below is your area of interest:

https://gist.github.com/c383bdfbb6d15ac70b7f3db994d4a854
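
As a rough command-line sketch with placeholder database and table names, you can run DESCRIBE EXTENDED through the Hive CLI and filter for the totalSize entry that appears among the table parameters in the Detailed Table Information:

hive -e "DESCRIBE EXTENDED test_db.my_table;" | grep -o 'totalSize=[0-9]*'

The value printed is the table size in bytes; this assumes Hive has recorded the totalSize parameter for the table, which it normally does for managed tables.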

Use ANALYZE TABLE Command

Another possible method to identify table size is to use the ANALYZE TABLE command. Just like DESCRIBE EXTENDED, this method is useful when you want to identify the size of a known table. Below is the sample output of the ANALYZE TABLE command. The highlighted part in the example below is your area of interest:

https://gist.github.com/94fa3c8c46a20c54539b3c0ce4d77498
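
If the gist is not visible, a minimal sketch with placeholder names looks like this: gather statistics for the table and then read them back, since ANALYZE TABLE records values such as numRows and totalSize in the table parameters.

hive -e "ANALYZE TABLE test_db.my_table COMPUTE STATISTICS;"
hive -e "DESCRIBE FORMATTED test_db.my_table;" | grep -E 'numRows|totalSize'

ANALYZE TABLE also prints the collected statistics when it completes; note that partitioned tables need a PARTITION clause in older Hive releases.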