Spark SQL CASE WHEN on DataFrame – Examples

  • Post last modified: February 4, 2020
  • Post category: Apache Spark
  • Reading time: 5 mins read

In general, the CASE expression is a conditional expression, similar to the if-then-else statements found in other languages. Spark SQL supports almost all features that are available in Apache Hive, and one such feature is the CASE statement. In this article, we will show how to use the CASE WHEN and OTHERWISE statement on a Spark SQL DataFrame.


Spark SQL CASE WHEN on DataFrame

The CASE WHEN ... OTHERWISE expression tests a sequence of conditions and returns the result associated with the first condition that evaluates to true.

Spark SQL DataFrame CASE Statement Examples

You can write the CASE statement on DataFrame column values, or you can write your own expressions to test conditions.

Following are different kinds of examples of the CASE WHEN and OTHERWISE statement. We use PySpark to demonstrate the Spark CASE statement.

Test Data

We will be using the following DataFrame to test the Spark SQL CASE statement.

testDF = sqlContext.createDataFrame([(1,"One"), (2,"Two"), (3,"Three"), (4,"Four")], ["id", "number"])
testDF.show()
+---+------+
| id|number|
+---+------+
|  1|   One|
|  2|   Two|
|  3| Three|
|  4|  Four|
+---+------+
Import Required Pyspark Functions

As a first step, you need to import the required functions such as col and when.

For example, execute the following command in the pyspark shell or add it to your Python script.

from pyspark.sql.functions import col, when
Spark DataFrame CASE with multiple WHEN Conditions

In this example, we will check multiple WHEN conditions without an else (OTHERWISE) part.

For example,

>>> testDF.select(col("*"), when(col("id") == "1", col("number")) \
.when(col("id") == "2", col("number")) \
.alias("number")).show()

+---+------+------+
| id|number|number|
+---+------+------+
|  1|   One|   One|
|  2|   Two|   Two|
|  3| Three|  null|
|  4|  Four|  null|
+---+------+------+

Note that the null values in the result are due to unmatched conditions.

Spark DataFrame CASE WHEN Condition with else part (OTHERWISE)

You can also specify the OTHERWISE part which will execute if none of the conditions are met. You can consider this as an else part.

Following example demonstrates the Spark SQL CASE WHEN with a default OTHERWISE condition.

>>> testDF.select(col("*"), when(col("id") == "1", col("number")) \
.when(col("id") == "2", col("number")) \
.otherwise("No Condition") \
.alias("number")).show()

+---+------+------------+
| id|number|      number|
+---+------+------------+
|  1|   One|         One|
|  2|   Two|         Two|
|  3| Three|No Condition|
|  4|  Four|No Condition|
+---+------+------------+
CASE Statement using Spark SQL Query

You can write a CASE statement using SQL query and execute it using Spark SQL.

Following example executes the CASE statement. It uses Spark SQL context to execute queries.

>>> sqlContext.sql(" SELECT CASE 1 WHEN 1 THEN 'ONE' WHEN 2 THEN 'TWO' ELSE 'NO MATCH FOUND' END").show()

+---+
|_c0|
+---+
|ONE|
+---+


Hope this helps 🙂