In general, partitions in relational databases are used to increase the performance of the SQL queries. The partition is the concept of storing relevant data in the same place. For example, let us say you want to query the data monthly bases, then you can partition your data on month. In this article, we will check how to update or drop the Hive partition that you have already created.
What are Partitions in Hive?
Just like relational databases, Apache Hive partitions are used to improve the performance of the HiveQL queries. In Hive, partitions are dividing and storing relevant data into HDFS sub directory.
For example, consider below data which has employee data.
/path/employee/emp1
id, name, dept, yoj
1, Ram, IT, 2009
2, Sham, Product, 2009
3, Bhim, HR, 2010
Now, if you partition the above employee data on year of join (yoj), it will divide employee data into multiple sub directories.
For example,
/path/employee/emp1/2009
1, Ram, IT, 2009
2, Sham, Product, 2009
/path/employee/emp1/2009
3, Bhim, HR, 2010
Apache Hive supports partition on single or multiple columns. Partition columns are extra column visible in your Hive table. You can also exclude those partition columns if you don’t want to show them on your reports.
In the subsequent sections, we will check how to update or drop partition that are already present in Hive tables.
Update Hive Partition
You can use Hive ALTER TABLE command to change the HDFS directory location or add new directory. Alter command will change the partition directory.
For example, below command will
ALTER TABLE some_table PARTITION(year = 2012)
SET LOCATION 'hdfs://user/user1/some_table/2012';
This command does not move the old data, nor does it delete the old data. It simply sets the Hive table partition to the new location.
Drop or Delete Hive Partition
You can use ALTER TABLE with DROP PARTITION option to drop a partition for a table.
ALTER TABLE some_table DROP IF EXISTS PARTITION(year = 2012);
This command will remove the data and metadata for this partition. The drop partition will actually move data to the .Trash/Current directory if Trash is configured, unless PURGE is specified, but the metadata is completely lost.
Related Articles
- Hadoop Hive Table Dynamic Partition and Examples
- Hive Insert into Partition Table and Examples
- How to Exclude Hive Partition Column From SELECT Query
Hope this helps 🙂