Many relational databases, such as Teradata and Oracle, support multiple columns in a WHERE clause predicate, so you can compare two or more columns at once. When you migrate to a cloud database such as Amazon Redshift, you look for compatible SQL patterns on the target side. In this article, we will check one such pattern: the Amazon Redshift WHERE clause predicate with multiple columns.
What is a WHERE Clause?
Before going into details, let us first check what a WHERE clause is in general.
The WHERE clause contains one or more conditions that either apply predicates to columns in tables or join multiple tables. Almost all modern databases and big data frameworks, such as Hive and Spark SQL, allow you to join tables in the WHERE clause and filter out records using predicates.
The WHERE clause specifies a condition that matches a subset of rows from a table or expression. You can use the WHERE clause to filter the result of the FROM clause in a SELECT statement. You can also use the WHERE clause in UPDATE, MERGE, or DELETE statements, where it restricts the operation to the matching records.
Following is the WHERE clause syntax:
[ WHERE condition ]
In the above syntax, the search condition evaluates to a Boolean result, such as a join condition or a predicate on a table column.
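The two roles of a WHERE condition described above (join condition and column predicate), and its use in a DELETE statement, can be sketched as follows. This is a minimal illustration that runs against an in-memory SQLite database through Python's standard sqlite3 module, not against Redshift; the item table, the inv_qty column, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; this is not Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_sk INTEGER, item_name TEXT);          -- hypothetical table
    CREATE TABLE inventory (inv_item_sk INTEGER, inv_qty INTEGER); -- hypothetical columns
    INSERT INTO item VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
    INSERT INTO inventory VALUES (1, 40), (2, 5), (3, 75);
""")

# One WHERE clause carrying both a join condition and a column predicate.
rows = conn.execute("""
    SELECT i.item_name, inv.inv_qty
    FROM item i, inventory inv
    WHERE i.item_sk = inv.inv_item_sk   -- join condition
      AND inv.inv_qty > 10              -- predicate on a table column
    ORDER BY i.item_name
""").fetchall()
print(rows)

# In a DELETE statement, the same kind of condition limits the rows affected.
deleted = conn.execute("DELETE FROM inventory WHERE inv_qty > 10").rowcount
remaining = conn.execute("SELECT inv_item_sk, inv_qty FROM inventory").fetchall()
print(deleted, remaining)
```

Only the rows for which the whole condition is true are returned by the SELECT or removed by the DELETE.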
WHERE Clause with Multiple Columns in Redshift
When Redshift was launched, this feature was not supported; support was added later without a prominent announcement. You can now list multiple columns in the WHERE clause and compare them together, as a group, against a set of values.
Following is the syntax:
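[ WHERE (col1, col2) IN ( SELECT value1, value2 FROM ... ) ]
In the above syntax, col1 and col2 are table columns, and value1 and value2 are the columns or expressions returned by the subquery. The subquery must return the same number of columns as the list on the left, and the values are compared position by position.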
Amazon Redshift WHERE Clause with Multiple Columns Example
The following simple Redshift SQL example shows multiple columns in a WHERE clause.
select *
from inventory
where (inv_date_sk, inv_item_sk) in (
    select inv_date_sk, inv_item_sk
    from inventory_stg
);
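To see what the multi-column predicate does to the result, here is a small sketch that runs the same query shape against an in-memory SQLite database (Python's standard sqlite3 module) rather than Redshift. It assumes the bundled SQLite is version 3.15 or later, which is when row-value comparisons were introduced; the inv_qty column and the sample rows are hypothetical. For contrast, it also runs two independent single-column IN predicates, which is not an equivalent rewrite.

```python
import sqlite3

# In-memory SQLite stand-in (assumes SQLite >= 3.15 for row values); not Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (inv_date_sk INTEGER, inv_item_sk INTEGER, inv_qty INTEGER);
    CREATE TABLE inventory_stg (inv_date_sk INTEGER, inv_item_sk INTEGER);
    -- Hypothetical sample data.
    INSERT INTO inventory VALUES (100, 1, 10), (100, 2, 20), (101, 1, 30), (101, 2, 40);
    INSERT INTO inventory_stg VALUES (100, 2), (101, 1), (102, 9);
""")

# Multi-column predicate: each (date, item) pair must appear together in staging.
pair_rows = conn.execute("""
    SELECT inv_date_sk, inv_item_sk, inv_qty
    FROM inventory
    WHERE (inv_date_sk, inv_item_sk) IN (
        SELECT inv_date_sk, inv_item_sk FROM inventory_stg
    )
    ORDER BY inv_date_sk, inv_item_sk
""").fetchall()

# Two separate single-column predicates: each column is checked on its own,
# so pairs that never occur together in staging also pass.
loose_rows = conn.execute("""
    SELECT inv_date_sk, inv_item_sk, inv_qty
    FROM inventory
    WHERE inv_date_sk IN (SELECT inv_date_sk FROM inventory_stg)
      AND inv_item_sk IN (SELECT inv_item_sk FROM inventory_stg)
    ORDER BY inv_date_sk, inv_item_sk
""").fetchall()

print(pair_rows)
print(loose_rows)
```

With this sample data the pair-wise predicate keeps two rows, while the two independent predicates keep all four, which is why the multi-column form matters when the columns form a composite key.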
Are there any alternative methods for a WHERE Clause with Multiple Columns?
Yes. You can use a correlated subquery to handle multiple columns in the WHERE clause: correlate the outer table with the table inside the subquery on the WHERE clause columns, and use the EXISTS condition to test for matches. This method applies when the values you compare against come from a subquery.
The EXISTS condition tests for the existence of rows in a subquery and returns true if the subquery returns at least one row. If NOT is specified, the condition returns true if the subquery returns no rows.
For example,
select *
from inventory inv
where exists (
    select 1
    from inventory_stg inv_stg
    where inv.inv_date_sk = inv_stg.inv_date_sk
      and inv.inv_item_sk = inv_stg.inv_item_sk
);
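As a rough check of how EXISTS and NOT EXISTS behave with a correlated subquery on two columns, the sketch below runs both against an in-memory SQLite database through Python's standard sqlite3 module. It is a stand-in for illustration, not Redshift; the inv_qty column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in; not Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (inv_date_sk INTEGER, inv_item_sk INTEGER, inv_qty INTEGER);
    CREATE TABLE inventory_stg (inv_date_sk INTEGER, inv_item_sk INTEGER);
    -- Hypothetical sample data.
    INSERT INTO inventory VALUES (100, 1, 10), (100, 2, 20), (101, 1, 30), (101, 2, 40);
    INSERT INTO inventory_stg VALUES (100, 2), (101, 1), (102, 9);
""")

# Template for the correlated subquery; {neg} is either empty or "NOT".
QUERY = """
    SELECT inv.inv_date_sk, inv.inv_item_sk, inv.inv_qty
    FROM inventory inv
    WHERE {neg} EXISTS (
        SELECT 1
        FROM inventory_stg inv_stg
        WHERE inv.inv_date_sk = inv_stg.inv_date_sk
          AND inv.inv_item_sk = inv_stg.inv_item_sk
    )
    ORDER BY inv.inv_date_sk, inv.inv_item_sk
"""

# EXISTS keeps rows whose (date, item) pair has a match in staging.
exists_rows = conn.execute(QUERY.format(neg="")).fetchall()

# NOT EXISTS keeps the rows with no matching pair in staging.
not_exists_rows = conn.execute(QUERY.format(neg="NOT")).fetchall()

print(exists_rows)
print(not_exists_rows)
```

On this sample data, EXISTS selects the rows whose column pair appears in the staging table, and NOT EXISTS selects the remaining rows.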
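The query that uses the EXISTS condition is often considered the better-optimized form, because the subquery only needs to find one matching row for each outer row. Actual performance depends on the query plan, so compare both versions with EXPLAIN on your own data.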
Hope this helps 🙂