Data warehouse stores the information in the form of tables. You may have to delete out-dated data and update the table’s values in order to keep data up-to-date. These performance critical operations are critical to keep the data warehouse on bigdata also when you migrate data from relational database systems. In this article, we will check Impala delete from tables and alternative examples.
Impala Delete from Table Command
Cloudera Impala version 5.10 and above supports DELETE FROM table command on kudu storage. This command deletes an arbitrary number of rows from a Kudu table. This statement only works for Impala tables that use the Kudu storage engine.
Syntax
Below is the delete from table syntax:
DELETE [FROM] [database_name.]table_name [ WHERE where_conditions ];
As mentioned, Impala Delete command will works only on the Kudu storage types. In a bigdata warehouse it is unlikely to use kudu as storage format for all tables. You may choose different tables storage formats based on your requirements and data.
Related Articles
Impala Delete from Tables Alternative
Any storage format other then Kudu are not designed for online transaction processing and does not offer real-time queries and row level updates and deletes. However, if you access Hive table from Impala, the latest version of Apache Hive supports ACID transaction, but using ACID transaction on table with huge amount of data may kill the performance of Hive server. To use ACID transaction, one must create a table with ACID transaction properties set. Delete can be performed on the table that supports ACID. Instead, you can follow other easy steps such as create Impala temporary table and select records from the original table by excluding data that you want to delete from table. Sounds easy!
Related articles:
- Apache Hive Table Update using ACID Transactions and Examples
- Database ACID Properties and Explanation
Below are some of the DELETE FROM table Equivalents:
- Impala NOT IN to exclude records to be deleted
- Impala NOT EXISTS to exclude records to be deleted
Now let us verify above methods using some examples.
SQL Delete Query:
DELETE FROM delete_test_demo
WHERE ID = 1;
We will write Impala equivalent statement for above SQL delete query.
Impala NOT IN as Workaround to Delete Records from Impala Table
You can use NOT IN condition to exclude the records that you want to delete from table.
Below are the steps to delete records from Impala table using not in condition:
- Create temp table same as target table:
Create table delete_test_demo_temp like delete_test_demo;
- Load data into temp table. This table holds all records except records to be deleted:
INSERT INTO delete_test_demo_temp
select * FROM delete_test_demo
WHERE id NOT IN (SELECT id
FROM delete_test_demo
WHERE ID = 1 ) ;
- Insert records from temp table into target table
INSERT OVERWRITE TABLE delete_test_demo select * from delete_test_demo_temp;
- Drop temp table
Drop table delete_test_demo_temp;
Impala NOT EXISTS as Workaround to Delete Records from Impala Table
Cloudera Impala supports EXISTS and NOT EXISTS clauses. You can make use of these keywords as a workaround to delete records from impala tables.
Let us re-write previous SQL delete statement using NOT EXISTS.
- Create temp table same as target table
Create table delete_test_demo_temp like delete_test_demo;
- Load data into temp table. This table holds all records except records to be deleted
INSERT INTO delete_test_demo_temp
SELECT *
FROM delete_test_demo aa
WHERE NOT EXISTS (SELECT 1
FROM delete_test_demo bb
WHERE ( aa.id = bb.id)
AND bb.id = 2);
- Insert records from temp table into target table
INSERT OVERWRITE TABLE delete_test_demo select * from delete_test_demo_temp;
- Drop temp table
Drop table delete_test_demo_temp;
End Note
There are many methods that you can follow. Above mentioned methods are simple to understand and use. Try to use NOT EXISTS clause as it is faster and safer to use as it uses correlated sub-query.
Hope this helps :-). Let me know if you know different method .