Partition in Hive is used for the better performance. Hive supports the single or multi column partition. You can manually add the partition to the Hive tables or Hive can dynamically partition. You can choose either methods based on your needs. In this article, we will discuss about the Hadoop Hive table dynamic partition and demonstrate using examples.
Hadoop Hive Table Dynamic Partition
In Hadoop Hive, data is stored as files on HDFS, whenever you partition the table in Hive, it creates sub directories within main directory using the partition key. For example, if you have table names students and you partition table on dob, Hadoop Hive will creates the subdirectory with dob within student directory.
In dynamic partitioning of hive table, the data is inserted into the respective partition dynamically without you having explicitly create the partitions on that table. When specifying the dynamic partition, keep in mind that you should not use high cardinality column as that will create lot of sub-directories.
Hadoop Hive Table Dynamic Partition Examples
Below are the tables that we will use in the demo examples:
--This is the final demo table. Note that, it is partitioned on the dob column create table demo (ID int) partitioned by (dob date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile; --This is intermediate table. Note that, table is not partitioned. create table INT_demo (ID int,dob date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile; --Load data to intermediate Hive table hive> load data local inpath '/home/data/query_result.csv' overwrite into table INT_demo; Loading data to table default.int_demo Table default.int_demo stats: [numFiles=1, numRows=0, totalSize=1392, rawDataSize=0] OK Time taken: 7.04 seconds hive>
Hive Properties to allows all partitions to be dynamic
Use below set statement that allows all the partitions to be dynamic in Hive
set hive.exec.dynamic.partition.mode=nonstrict;
The final step is to load the demo table by using dynamic partition
hive> insert overwrite table demo partition(dob) select id,dob from INT_demo;
Now verify the HDFS directory structure for the partitioned table:
You can see the sub-directories as per the partition column.
[cloudera@quickstart ~]$ hdfs dfs -ls /user/hive/warehouse/demo Found 5 items drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-01 drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:31 /user/hive/warehouse/demo/dob=2015-10-02 drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-03 drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-04 drwxrwxrwx - cloudera supergroup 0 2017-05-20 22:32 /user/hive/warehouse/demo/dob=2015-10-05 [cloudera@quickstart ~]$
Changing Partition
Once you create partition, you can update or delete the partition for a table. Follow my other post for more information – How to Update or Drop Hive Partition? Steps and Examples
Also Read:
- Exclude Hive Partition Column From SELECT Query
- Cloudera Impala Generate Sequence Numbers without UDF
- Netezza ROWNUM Pseudo Column Alternative
- Run Impala SQL Script File Passing argument and Working Example
- An Introduction to Hadoop Cloudera Impala Architecture
- Commonly used Hadoop Hive Commands