Spark SQL provides many built-in functions. Date and time functions are useful when you are working with DataFrames that store date and time values. The built-in functions also include type conversion functions that you can use to format date and time types. In this article, we will look at the Spark SQL date and timestamp functions with some examples.
Spark SQL Date and Timestamp Functions
Spark SQL supports almost all of the date and time functions supported in Apache Hive. You can use these Spark DataFrame date functions to manipulate DataFrame columns that contain date values.
The Spark SQL built-in date functions are user and performance friendly. Use them instead of Spark SQL user-defined functions (UDFs) whenever possible.
The subsequent sections cover the date and time functions that Spark supports.
Spark Date Functions
Following are the Spark SQL date functions. The list covers most of the date functions supported in Apache Spark.
| Spark Date Function | Description |
|---|---|
| date_format(date, format) | Converts a date/timestamp/string to a string in the format specified by the second argument. |
| current_date() | Returns the current date as a date column. |
| date_add(start, days) | Adds the given number of days to the start date. |
| add_months(start, months) | Adds the given number of months to the start date. |
| datediff(end, start) | Returns the difference between two dates in days. |
| year(col) | Extracts the year of a given date as an integer. |
| quarter(col) | Extracts the quarter of a given date as an integer. |
| month(col) | Extracts the month of a given date as an integer. |
| hour(col) | Extracts the hours of a given timestamp as an integer. |
| minute(col) | Extracts the minutes of a given timestamp as an integer. |
| second(col) | Extracts the seconds of a given timestamp as an integer. |
| dayofmonth(col) | Extracts the day of the month of a given date as an integer. |
| date_sub(start, days) | Subtracts the given number of days from the start date. |
| dayofyear(col) | Extracts the day of the year of a given date as an integer. |
| last_day(date) | Returns the last day of the month to which the given date belongs. |
| months_between(date1, date2) | Returns the number of months between two dates. |
| next_day(date, dayOfWeek) | Returns the first date later than the given date that falls on the specified day of the week. |
| trunc(date, format) | Returns the date truncated to the unit specified by the format. |
| weekofyear(col) | Extracts the week number of a given date as an integer. |
| to_date(col) | Converts a string column containing a date value to date type. |
Spark Timestamp Functions
Following are the timestamp functions supported in Apache Spark.
| Spark Timestamp Function | Description |
|---|---|
| current_timestamp() | Returns the current timestamp as a timestamp column. |
| from_unixtime(timestamp, format="yyyy-MM-dd HH:mm:ss") | Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing that moment in the current system time zone, in the given format. |
| unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss') | Converts a time string with the given pattern ('yyyy-MM-dd HH:mm:ss' by default) to a Unix timestamp in seconds, using the default time zone and the default locale; returns null on failure. |
| from_utc_timestamp(timestamp, tz) | Given a timestamp that corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given time zone. |
| to_utc_timestamp(timestamp, tz) | Given a timestamp that corresponds to a certain time of day in the given time zone, returns another timestamp that corresponds to the same time of day in UTC. |
Test Data
We will use the following sample DataFrame in the date and timestamp function examples.
# "spark" is the SparkSession predefined in the pyspark shell
testDF = spark.createDataFrame([("2020-01-01", "2020-01-31")], ["start_date", "end_date"])
Import Functions in PySpark Shell
Before using the Spark date functions, you need to import them in the pyspark shell. Otherwise, you will get a "NameError: name 'current_date' is not defined" error.
from pyspark.sql.functions import *
Spark Date Functions Examples
Following are examples of the Spark SQL date functions.
Get current date in PySpark SQL
testDF.select(current_date().alias("current_dt")).show()
+----------+
|current_dt|
+----------+
|2020-01-31|
+----------+
Change the date format using the date_format function in Spark SQL
testDF.select("start_date", date_format("start_date", 'dd/MM/yyyy').alias("dt_format")).show()
+----------+----------+
|start_date| dt_format|
+----------+----------+
|2020-01-01|01/01/2020|
+----------+----------+
Add days to date using date_add function in Spark SQL
testDF.select("start_date", date_add("start_date", 2).alias("date_days")).show()
+----------+----------+
|start_date| date_days|
+----------+----------+
|2020-01-01|2020-01-03|
+----------+----------+
Add months to the date using add_months function in Spark SQL
testDF.select("start_date", add_months("start_date", 2).alias("add_months")).show()
+----------+----------+
|start_date|add_months|
+----------+----------+
|2020-01-01|2020-03-01|
+----------+----------+
Get difference between two dates using datediff function in Spark SQL
testDF.select("start_date", "end_date", datediff("end_date", "start_date").alias("date_diff")).show()
+----------+----------+---------+
|start_date| end_date|date_diff|
+----------+----------+---------+
|2020-01-01|2020-01-31| 30|
+----------+----------+---------+
Extract year from date using Spark SQL Function
testDF.select("start_date", year("start_date").alias("year")).show()
+----------+----+
|start_date|year|
+----------+----+
|2020-01-01|2020|
+----------+----+
Subtract days from date using date_sub function in Spark SQL
testDF.select("start_date", date_sub("start_date", 3).alias("date_sub")).show()
+----------+----------+
|start_date| date_sub|
+----------+----------+
|2020-01-01|2019-12-29|
+----------+----------+
Get last day of the month using last_day function in Spark SQL
testDF.select("start_date", last_day("start_date").alias("last_day")).show()
+----------+----------+
|start_date| last_day|
+----------+----------+
|2020-01-01|2020-01-31|
+----------+----------+
Identify months between two dates using months_between function in Spark SQL
testDF.select("start_date", "end_date", months_between("end_date", "start_date").alias("months")).show()
+----------+----------+----------+
|start_date| end_date| months|
+----------+----------+----------+
|2020-01-01|2020-01-31|0.96774194|
+----------+----------+----------+
Get next Monday using next_day function in Spark SQL
testDF.select("start_date", next_day("start_date", "mon").alias("next_monday")).show()
+----------+-----------+
|start_date|next_monday|
+----------+-----------+
|2020-01-01| 2020-01-06|
+----------+-----------+
Convert String to Date using to_date function in Spark SQL
testDF.select("start_date", to_date("start_date").alias("to_date")).show()
+----------+----------+
|start_date| to_date|
+----------+----------+
|2020-01-01|2020-01-01|
+----------+----------+
Spark SQL Timestamp Functions Examples
Following are some Spark SQL timestamp function examples.
Get Current Timestamp using current_timestamp() function
testDF.select(current_timestamp().alias("current_timestamp")).show()
+--------------------+
| current_timestamp|
+--------------------+
|2020-01-31 23:12:...|
+--------------------+
Hope this helps 🙂