Apache Hive is a data warehouse framework on top of Hadoop HDFS. Hive is a high level language to store and analyse large volumes of data. Apache Hive support most of the relational database features such as partitioning large tables and store values according to partition column. But, Hive stores partition column as a virtual column and is visible when you perform ‘select * from table’. In this article, we will check method to exclude Hive partition column from a SELECT query.
Hive Table Partition
Partition in Hive table is used for the best performance. Hive supports the single or multi column partition. You can manually add the partition to the Hive tables or Hive can dynamically partition.
For example, consider below create table example with partition clause on date_col column.
CREATE TABLE test_table
(
col1 INT,
col2 STRING
)
PARTITIONED BY (date_col date)
stored as textfile;
When you use select * from test_table, you will notice, you will get a partition column also in your result.
For examples, consider below select clause.
select * from test_table;
+------------------+------------------+----------------------+--+
| test_table.col1 | test_table.col2 | test_table.date_col |
+------------------+------------------+----------------------+--+
+------------------+------------------+----------------------+--+
No rows selected (0.106 seconds)
Note that, the last column is actually a partition column and it is not required in your SELECT clause.
Exclude Hive Partition Column From SELECT Query
Now, the question is how to exclude the Hive partition column from the SELECT query result?
There is one method that you can use to exclude partition column.
Add below properties to your ‘hive-site.xml’ file or execute it on the Hive interactive shell.
hive.support.quoted.identifiers=none
Now, execute Hive query with a partition column that you want to exclude. For example, let us say you want to exclude date_col column from a query, execute something like below.
SELECT `(date_col)?+.+` FROM test_table;
Output:
+------------------+------------------+--+
| test_table.col1 | test_table.col2 |
+------------------+------------------+--+
+------------------+------------------+--+
No rows selected (0.121 seconds)
Exclude Particular Column From SELECT Query in Hive
Similar to excluding partition field, you can exclude a partition column from the Hive SELECT output.
For example, if you want to exclude col1 from the result, use below query.
SELECT `(col1)?+.+` FROM test_table;
Output:
+------------------+----------------------+--+
| test_table.col2 | test_table.date_col |
+------------------+----------------------+--+
+------------------+----------------------+--+
No rows selected (0.1 seconds)
Related Articles
- Hadoop Hive Table Dynamic Partition and Examples
- Hive Create Table Command, Syntax and Examples
- How to Update or Drop Hive Partition? Steps and Examples
Hope this helps 🙂