Hive Drop Column Alternative and Examples

  • Post author:
  • Post last modified:May 9, 2019
  • Post category:BigData
  • Reading time:6 mins read

Apache Hive is a data warehouse framework on top of Hadoop ecosystem. Hive works well for all your batch processing. It is not true data warehouse platform as it does not provide support for real-time analytics. There are many features missing in Hive that are available in traditional relational databases. One of such features is DROP COLUMNS using ALTER TABLE statements. In this article, we will check Hive drop column alternative with some examples.

Hive Drop Column Alternative

There are two approaches that you can follow if you want to drop column from Hive table:

  • Use ALTER TABLE with REPLACE option
  • Create intermediate or temporary table

Now let us try to get more details and limitations about above mentioned approaches;

Use ALTER TABLE with REPLACE Option to Drop Column in Hive

The REPLACE COLUMNS removes all existing columns and adds the new set of columns. REPLACE COLUMNS can also be used to drop columns.

Using ALTER TABLE command with REPLACE options will alter only table schema to remove the column. It will not change the data.

This option is recommended when you want to remove columns from an empty table.

Related Reading:

Syntax

Below is the Hive REPLACE column syntax:

ALTER TABLE table_name REPLACE COLUMNS (col_name data_type);

For example, if you want to remove customer_id column from the empty table, you can use ALTER TABLE with REPLACE column option.

0: jdbc:hive2://192.168.200.100:10000/default> desc table2;
+--------------+------------+----------+--+
|   col_name   | data_type  | comment  |
+--------------+------------+----------+--+
| customer_id  | int        |          |
| first_name   | string     |          |
| last_name    | string     |          |
+--------------+------------+----------+--+
3 rows selected (0.097 seconds)


0: jdbc:hive2://192.168.200.100:10000/default> alter table table2 replace columns(first_name string, last_name string);
No rows affected (0.116 seconds)

0: jdbc:hive2://192.168.200.100:10000/default> desc table2;
+-------------+------------+----------+--+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+--+
| first_name  | string     |          |
| last_name   | string     |          |
+-------------+------------+----------+--+
2 rows selected (0.068 seconds)

Create Intermediate or Temporary Table Excluding Columns

This is one of the options that you can use to remove unwanted columns from Hive table. This option is feasible when you have data in the table. Follow below steps to remove columns from Hive table using intermediate or temporary tables.

Related Readings:

Here is the original Table form which we are going to drop first column.

0: jdbc:hive2://192.168.200.100:10000/default> create table table3(id int, f_name string, addr string);
No rows affected (0.118 seconds)
Step 1: Create intermediate or temporary table

The intermediate table would be the replica of the original table excluding, column that you want to drop.

0: jdbc:hive2://192.168.200.100:10000/default> create table table3_tmp as select f_name, addr from table3;
Step 2: Drop original Table

We have already created intermediate by excluding column and with data, now you can drop the original table as we are going to rename an intermediate table in the next step.

0: jdbc:hive2://192.168.200.100:10000/default> drop table table3;
No rows affected (0.173 seconds)
Step 3: Rename intermediate table

Now, use the ALTER TABLE command to rename an intermediate table.

0: jdbc:hive2://192.168.200.100:10000/default> alter table table3_tmp rename to table3;
No rows affected (0.107 seconds)

Finally, verify the table structure.

0: jdbc:hive2://192.168.200.100:10000/default> desc table3;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| f_name    | string     |          |
| addr      | string     |          |
+-----------+------------+----------+--+
2 rows selected (0.082 seconds)

Related Readings:

Hope this helps ?