Hadoop Hive Table Dynamic Partition and Examples

Partition in Hive is used for the better performance. Hive supports the single or multi column partition. You can manually add the partition to the Hive tables or Hive can dynamically partition. You can choose either methods based on your needs. In this article, we will discuss about the Hadoop Hive table dynamic partition and demonstrate using examples.

Hadoop Hive Table Dynamic Partition

In Hadoop Hive, data is stored as files on HDFS, whenever you partition the table in Hive, it creates sub directories within main directory using the partition key. For example, if you have table names students and you partition table on dob, Hadoop Hive will creates the subdirectory with dob within student directory.

In dynamic partitioning of hive table, the data is inserted into the respective partition dynamically without you having explicitly create the partitions on that table. When specifying the dynamic partition, keep in mind that you should not use high cardinality column as that will create lot of sub-directories.

Hadoop Hive Table Dynamic Partition Examples

Below are the tables that we will use in the demo examples:

--This is the final demo table. Note that, it is partitioned on the dob column

create table demo (ID int) 
partitioned by (dob date) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS textfile;

--This is intermediate table. Note that, table is not partitioned.

create table INT_demo (ID int,dob date) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
STORED AS textfile;
--Load data to intermediate Hive table

hive> load data local inpath '/home/data/query_result.csv' overwrite into table INT_demo; 
Loading data to table default.int_demo 
Table default.int_demo stats: [numFiles=1, numRows=0, totalSize=1392, rawDataSize=0] 
OK 
Time taken: 7.04 seconds 
hive>

Hive Properties to allows all partitions to be dynamic

Use below set statement that allows all the partitions to be dynamic in Hive

set hive.exec.dynamic.partition.mode=nonstrict;

The final step is to load the demo table by using dynamic partition

hive> insert overwrite table demo partition(dob) select id,dob from INT_demo;

Now verify the HDFS directory structure for the partitioned table:

You can see the sub-directories as per the partition column.

[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/demo 
Found 5 items 
drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-01 
drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:31 /user/hive/warehouse/demo/dob=2015-10-02 
drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-03 
drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-04 
drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-05 
[cloudera@quickstart ~]$

Changing Partition

Once you create partition, you can update or delete the partition for a table. Follow my other post for more information – How to Update or Drop Hive Partition? Steps and Examples