Spark SQL supports many date and time conversion functions. One of them is the to_date() function. The Spark SQL to_date() function converts a string containing a date into a proper date value. It is useful when you are transforming captured string data into a particular data type, such as a date type.
In this article, we will check how to use the Spark to_date() function on DataFrames as well as in plain SQL queries.
Spark SQL to_date() Function
You can use the Spark to_date() function to convert a string containing a date (StringType) into a proper date (DateType). In other words, the Spark to_date() function parses a string in a given format and returns a date column.
Spark to_date() Function Syntax
Following is the Spark to_date() function syntax.
to_date(date_str[, fmt])
The function parses the date_str expression using the fmt pattern and returns a date. It returns null for invalid input. If you omit fmt, to_date() follows the CAST function rules, so an ISO-style string such as '2021-01-01' is accepted.
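As a quick, minimal PySpark sketch of both forms (the literal values are made up for this illustration, and the null result for invalid input assumes Spark's default, non-ANSI settings):
from pyspark.sql.functions import to_date, lit

df = spark.range(1)
df.select(
    to_date(lit('01-01-2021'), 'dd-MM-yyyy').alias('with_fmt'),       # fmt supplied
    to_date(lit('2021-01-01')).alias('without_fmt'),                  # fmt omitted, CAST rule applies
    to_date(lit('not-a-date'), 'dd-MM-yyyy').alias('invalid_input')   # expected to be null
).show()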
Spark to_date() Function In Pyspark – Example
The following example demonstrates the usage of the to_date() function on PySpark DataFrames. We will check to_date() in Spark SQL queries at the end of the article.
schema = 'id int, dob string'
sampleDF = spark.createDataFrame(
[[1,'01-01-2021'],
[2,'01-02-2021']], schema=schema)
Column dob is defined as a string. You can use the to_date() function to convert it to a date. Note that the month in the format pattern is uppercase MM; lowercase mm denotes minutes in Spark datetime patterns.
For example,
>>> from pyspark.sql.functions import *
>>> sampleDF.select('id', 'dob', to_date('dob', 'dd-MM-yyyy').alias('dob_date')).show()
+---+----------+----------+
| id| dob| dob_date|
+---+----------+----------+
| 1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-02-01|
+---+----------+----------+
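You can confirm that the new column is an actual date type, not just a reformatted string, by checking the schema; the output below assumes the sample data created above:
sampleDF.select('id', 'dob', to_date('dob', 'dd-MM-yyyy').alias('dob_date')).printSchema()
# root
#  |-- id: integer (nullable = true)
#  |-- dob: string (nullable = true)
#  |-- dob_date: date (nullable = true)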
Here is the complete PySpark example using the to_date() function.
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession \
.builder \
.appName("Python Spark SQL to_date example") \
.getOrCreate()
schema = 'id int, dob string'
sampleDF = spark.createDataFrame(
[[1,'01-01-2021'],
[2,'01-02-2021']], schema=schema)
sampleDF.select('id', 'dob', to_date('dob', 'dd-MM-yyyy').alias('dob_date')).show()
Spark to_date() Function In Scala – Example
The following example demonstrates the usage of the to_date() function on Scala DataFrames.
val sampleDF = spark.createDataFrame(Seq(
(1,"01-01-2021"),
(2,"01-02-2021")
)).toDF("id", "dob")
Column dob is defined as a string. As in the PySpark example, you can use the to_date() function to convert it to a date.
For example,
scala> sampleDF.select($"id", $"dob", to_date($"dob", "dd-MM-yyyy").alias("dob_date")).show()
+---+----------+----------+
| id| dob| dob_date|
+---+----------+----------+
| 1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-02-01|
+---+----------+----------+
Here is the complete Scala example using the to_date() function.
package com.dwgeek.com.scala.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

// Wrapped in an object so the example compiles as a standalone application
object ToDateExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    import spark.implicits._

    val sampleDF = spark.createDataFrame(Seq(
      (1, "01-01-2021"),
      (2, "01-02-2021")
    )).toDF("id", "dob")

    sampleDF.select($"id", $"dob", to_date($"dob", "dd-MM-yyyy").alias("dob_date")).show()
  }
}
Spark to_date() Function in Spark SQL Queries
Following is an example Spark SQL query that uses to_date(). Register the DataFrame as a temporary view first so it can be referenced from SQL.
scala> sampleDF.createOrReplaceTempView("sample_table")
scala> spark.sql("select id, dob, to_date(dob, 'dd-MM-yyyy') as dob_date from sample_table").show()
+---+----------+----------+
| id| dob| dob_date|
+---+----------+----------+
| 1|01-01-2021|2021-01-01|
|  2|01-02-2021|2021-02-01|
+---+----------+----------+
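The same approach works from PySpark; here is a minimal sketch, assuming the sampleDF created in the PySpark example above:
# Register the DataFrame as a temporary view so it is visible to SQL
sampleDF.createOrReplaceTempView("sample_table")
spark.sql("select id, dob, to_date(dob, 'dd-MM-yyyy') as dob_date from sample_table").show()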
Spark Timestamp to Date Format Conversion
You can also use the to_date() function to convert a timestamp, or a string containing a timestamp, to a date. For example, assuming the dob column now holds timestamp strings such as '01-01-2021 10:10:10':
>>> sampleDF.select('id', 'dob', to_date('dob', 'dd-MM-yyyy HH:mm:ss').alias('dob_date')).show()
+---+-------------------+----------+
| id| dob| dob_date|
+---+-------------------+----------+
| 1|01-01-2021 10:10:10|2021-01-01|
|  2|01-02-2021 20:20:20|2021-02-01|
+---+-------------------+----------+
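For completeness, here is a minimal self-contained PySpark sketch of the timestamp-string case; the DataFrame name tsDF and the sample values are just illustrative:
from pyspark.sql.functions import to_date

# Illustrative data: dob holds timestamp strings
tsDF = spark.createDataFrame(
    [[1, '01-01-2021 10:10:10'],
     [2, '01-02-2021 20:20:20']], schema='id int, dob string')

# Parse with the full timestamp pattern; to_date keeps only the date portion
tsDF.select('id', 'dob', to_date('dob', 'dd-MM-yyyy HH:mm:ss').alias('dob_date')).show()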
Related Articles,
- How to Remove Duplicate Records from Spark DataFrame – Pyspark and Scala
- Spark SQL Recursive DataFrame – Pyspark and Scala
- Spark SQL Date and Timestamp Functions and Examples
- How to Use Spark SQL REPLACE on DataFrame?
Hope this helps 🙂