Spark SQL to_date() Function – Pyspark and Scala

  • Post author:
  • Post last modified:June 16, 2022
  • Post category:Apache Spark
  • Reading time:6 mins read

Spark SQL supports many date and time conversion functions. One of such a function is to_date() function. Spark SQL to_date() function is used to convert string containing date to a date format. The function is useful when you are trying to transform captured string data into particular data type such as date type.

In this article, we will check how to use the Spark to_date function on DataFrame as well as in plain SQL queries.

Spark SQL to_date() Function

You can use Spark to_date() function to convert and format string containing the date (StringType) to a proper date (DateType) format. In other words, you can use the Spark to_date function to convert string format to date format.

Spark to_date() Function Syntax

Following is the Spark to_date() function syntax.

to_date(date_str[, fmt]) 

Parses the date_str expression with the fmt expression to a date. The function returns null with invalid input. If you omit the fmt, to_date will follow the CAST function rule.

Spark to_date() Function In Pyspark – Example

Following example demonstrates the usage of to_date function on Pyspark DataFrames. We will check to_date on Spark SQL queries at the end of the article.

schema = 'id int, dob string'
sampleDF = spark.createDataFrame(
[[1,'2021-01-01'], 
[2,'2021-01-02']], schema=schema)

Column dob is defined as a string. You can use the to_date function to convert string format to date.

For example,

>>> from pyspark.sql.functions import *
>>> sampleDF.select('id','dob' ,to_date('dob', 'dd-mm-yyyy').alias('dob_date')).show()
+---+----------+----------+
| id|       dob|  dob_date|
+---+----------+----------+
|  1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-01-01|
+---+----------+----------+

Here is the complete Pyspark example to use the to_date function.

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
 
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL to_date example") \
    .getOrCreate()

schema = 'id int, dob string'
sampleDF = spark.createDataFrame(
[[1,'01-01-2021'], 
[2,'01-02-2021']], schema=schema)

sampleDF.select('id','dob' ,to_date('dob', 'dd-mm-yyyy').alias('dob_date')).show()

Spark to_date() Function In Scala – Example

Following example demonstrates the usage of to_date function on Scala DataFrames.

val sampleDF = spark.createDataFrame(Seq(
(1,"01-01-2021"), 
(2,"01-02-2021")
)).toDF("id", "dob")

Column dob is defined as a string. You can use the to_date function to convert string format to date.

For example,

scala> sampleDF.select($"id",$"dob" ,to_date($"dob", "dd-mm-yyyy").alias("dob_date")).show()
+---+----------+----------+
| id|       dob|  dob_date|
+---+----------+----------+
|  1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-01-01|
+---+----------+----------+

Here is the complete Scala example to use the to_date function.

package com.dwgeek.com.scala.functions.dateime

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

val sampleDF = spark.createDataFrame(Seq(
(1,"01-01-2021"), 
(2,"01-02-2021")
)).toDF("id", "dob")

sampleDF.select($"id",$"dob" ,to_date($"dob", "dd-mm-yyyy").alias("dob_date")).show()

Spark to_date() Function is Spark SQL Queries.

Following is the example Spark SQL queries to use the to_date.

scala> spark.sql("select id, dob, to_date(dob, 'dd-mm-yyyy') as dob_date from sample_table").show()
+---+----------+----------+
| id|       dob|  dob_date|
+---+----------+----------+
|  1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-01-01|
+---+----------+----------+

Spark Timestamp to Date Format Conversion

You can use the to_date function to convert timestamp to date format.

For examples,

>>> sampleDF.select('id','dob' ,to_date('dob', 'dd-mm-yyyy').alias('dob_date')).show()
+---+-------------------+----------+
| id|                dob|  dob_date|
+---+-------------------+----------+
|  1|01-01-2021 10:10:10|2021-01-01|
|  2|01-02-2021 20:20:20|2021-01-01|
+---+-------------------+----------+

Related Articles,

Hope this helps 🙂