SQL SET Operator MINUS Alternative in Hive and Examples

  • Post author:
  • Post last modified:January 7, 2020
  • Post category:BigData
  • Reading time:6 mins read

The set operators in SQL are used to combine similar data set of two or more SELECT statements. Here similar data set literally means the number of columns and its data type should match, otherwise you must explicitly type cast the data types of the values in SELECT statements. Hive does support UNION and UNION ALL set operator, INTERSECT and MINUS are not supported as of now. In this article, we will check SQL set operator MINUS alternative in Hive with an example.

SQL SET Operator MINUS Alternative in Hive

In SQL, MINUS is also called EXCEPT. The MINUS operator finds the difference between two tables or sub-queries and return results from only first SELECT statement.

Apache Hive does not support MINUS set operator. If you have any requirement to perform MINUS, then you have to rewrite your queries using an alternate method.

There are two methods that you can use:

  • Use LEFT OUTER JOIN
  • Use NOT EXISTS clause

LEFT JOIN as a MINUS Alternative in Hive

We have used below mentioned two tables to test Hive SET operator MINUS alternative:

pat_dtls_load:

SK PHONE_NO NAME
111 7895434 AAAA
113 7877434 ACCA
114 7874434 ADDA
123 987654321 Jhoney
234 876543210 Stuart
456 765432101 Jeff
567 345678901 Max Stuart

new_pat_dtls_load:

SK PHONE_NO NAME
111 7895434 AAAA
114 7874434 ADDA
123 987654321 Jhoney
456 765432101 Jeff
567 345678901 Max Stuart

Left outer join returns complete set of records from the left table along with any matched records from right table. Unmatched records from right tables will be NULL by default. You have select only non-null values from table two to make it work as minus operator.

Related article:

SQL Minus Operator Example:

SELECT * 
 FROM   pat_dtls_load 
 minus 
 SELECT * 
 FROM   new_pat_dtls_load; 

Output:

 SK  | PHONE_NO  |  NAME
-----+-----------+--------
 234 | 876543210 | Stuart
 113 |   7877434 | ACCA
(2 rows)

Hive Alternative

You can write hive equivalent using left join:

SELECT a.* 
 FROM   pat_dtls_load a 
        LEFT JOIN new_pat_dtls_load b 
               ON ( a.sk = b.sk ) 
 WHERE  b.sk IS NULL; 

Output:

+-------+-------------+---------+--+
| a.sk  | a.phone_no  | a.name  |
+-------+-------------+---------+--+
| 113   | 7877434     | ACCA    |
| 234   | 876543210   | Stuart  |
+-------+-------------+---------+--+
2 rows selected (4.294 seconds)

NOT EXISTS as a MINUS Alternative in Hive

Latest version of Hive supports EXISTS and NOT EXISTS clauses. You can use this clause as a MINUS alternative in Hive. Below is the example:

SELECT * 
 FROM   pat_dtls_load aa 
 WHERE  NOT EXISTS (SELECT 1 
                    FROM   new_pat_dtls_load bb 
                    WHERE  ( aa.sk = bb.sk )); 

Output:

+--------+--------------+----------+--+
| aa.sk  | aa.phone_no  | aa.name  |
+--------+--------------+----------+--+
| 113    | 7877434      | ACCA     |
| 234    | 876543210    | Stuart   |
+--------+--------------+----------+--+

Related Articles,

Hope this helps 🙂