Apache Hive is the data warehouse framework on top of the Hadoop distributed file system (HDFS). It provides a query language called Hive Query Language, HiveQL or HQL. HiveQL syntax is similar to SQL syntax with minor changes. Similar to SQL insert statements, HQL also supports inserting data into tables using various methods. In this article, we will check one of the data insert methods into Hive table using a Select statement or clause.
Hive Insert Data into Table Methods
Below are the some of commonly used methods to insert data into tables.
- INSERT INTO table using VALUES clause
- The Insert data into table using LOAD command
- INSERT INTO table using SELECT clause
Now let us check these methods with some simple examples. Though, we will check third method in details, but let us check other methods too.
INSERT INTO table using VALUES clause
This method is easiest and mostly widely used when you have a very limited set of records to be inserted into the Hive table.
Below is the syntax of the INSERT into Hive table using VALUES clause.
INSERT INTO table [(col1, col2,...)] VALUES (value1, value2, ...);
For example, consider below example to insert recrods into ‘insert_test’ table using VALUES clause.
INSERT INTO insert_test (col1, col2) VALUES (1, 'ABC');
Insert data into table using LOAD command
The second method, inserting data into Hive table using LOAD command works well when you have a data stored in flat files.
I have written another post on a similar subject. You can read more about loading data from CSV file in my other post Apache Hive Load Quoted Values CSV File and Examples.
INSERT INTO table using SELECT clause
This is one of the widely used methods to insert data into Hive table. We will use the SELECT clause along with INSERT INTO command to insert data into a Hive table by selecting data from another table.
Below is the syntax of using SELECT statement with INSERT command.
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
You should use PARTITION clause only if your Hive table is partitioned.
For example, consider simple example of inserting data into Hive table using SELECT clause.
INSERT INTO insert_test SELECT * FROM insert_test2;
failed rule ‘regularbody’ in statement – Error while Inserting Data
This is interesting error you may get when you try to insert data into Hive table using SELECT statement without FROM clause.
This is a known issue, and it is fixed in Hive version 1.3.1/2.0.
You can’t use the insert as select without from a clause in earlier versions. Rewrite your query using insert into… values… or upgrade to latest Hive version.
Below is the workaround to insert data into Hive table using select clause without FROM if you can’t upgrade to the latest version.
INSERT INTO insert_test SELECT * (SELECT Col1, Col2 FROM insert_test2) as tmp;
The above query will work on all versions of Apache Hive. You can use it to provide back-word compatibility.
Related Articles
- Hadoop Hive Date Functions and Examples
- Commonly used Hadoop Hive Commands
- Hadoop Hive Dynamic Partition and Examples
- Hive String Functions and Examples
- Hive Insert into Partition Table and Examples
- Oracle INSERT ALL Alternative in Hive/Spark SQL
Hope this helps 🙂