The set operators in SQL are used to combine similar data set of two or more SELECT statements. Here similar data set literally means the number of columns and its data type should match, otherwise you must explicitly type cast the data types of the values in SELECT statements. Hive does support UNION and UNION ALL set operator, INTERSECT and MINUS are not supported as of now. In this article, we will check SQL set operator MINUS alternative in Hive with an example.
SQL SET Operator MINUS Alternative in Hive
In SQL, MINUS is also called EXCEPT. The MINUS operator finds the difference between two tables or sub-queries and return results from only first SELECT statement.
Apache Hive does not support MINUS set operator. If you have any requirement to perform MINUS, then you have to rewrite your queries using an alternate method.
There are two methods that you can use:
- Use LEFT OUTER JOIN
- Use NOT EXISTS clause
LEFT JOIN as a MINUS Alternative in Hive
We have used below mentioned two tables to test Hive SET operator MINUS alternative:
pat_dtls_load:
SK | PHONE_NO | NAME |
111 | 7895434 | AAAA |
113 | 7877434 | ACCA |
114 | 7874434 | ADDA |
123 | 987654321 | Jhoney |
234 | 876543210 | Stuart |
456 | 765432101 | Jeff |
567 | 345678901 | Max Stuart |
new_pat_dtls_load:
SK | PHONE_NO | NAME |
111 | 7895434 | AAAA |
114 | 7874434 | ADDA |
123 | 987654321 | Jhoney |
456 | 765432101 | Jeff |
567 | 345678901 | Max Stuart |
Left outer join returns complete set of records from the left table along with any matched records from right table. Unmatched records from right tables will be NULL by default. You have select only non-null values from table two to make it work as minus operator.
Related article:
SQL Minus Operator Example:
SELECT *
FROM pat_dtls_load
minus
SELECT *
FROM new_pat_dtls_load;
Output:
SK | PHONE_NO | NAME
-----+-----------+--------
234 | 876543210 | Stuart
113 | 7877434 | ACCA
(2 rows)
Hive Alternative
You can write hive equivalent using left join:
SELECT a.*
FROM pat_dtls_load a
LEFT JOIN new_pat_dtls_load b
ON ( a.sk = b.sk )
WHERE b.sk IS NULL;
Output:
+-------+-------------+---------+--+
| a.sk | a.phone_no | a.name |
+-------+-------------+---------+--+
| 113 | 7877434 | ACCA |
| 234 | 876543210 | Stuart |
+-------+-------------+---------+--+
2 rows selected (4.294 seconds)
NOT EXISTS as a MINUS Alternative in Hive
Latest version of Hive supports EXISTS and NOT EXISTS clauses. You can use this clause as a MINUS alternative in Hive. Below is the example:
SELECT *
FROM pat_dtls_load aa
WHERE NOT EXISTS (SELECT 1
FROM new_pat_dtls_load bb
WHERE ( aa.sk = bb.sk ));
Output:
+--------+--------------+----------+--+
| aa.sk | aa.phone_no | aa.name |
+--------+--------------+----------+--+
| 113 | 7877434 | ACCA |
| 234 | 876543210 | Stuart |
+--------+--------------+----------+--+
Related Articles,
Hope this helps 🙂