Temporary tables are tables that are available only within the current session; they are automatically dropped when the session ends. In this article, we will check how to create Spark SQL temporary tables, their syntax, and some examples.
Spark SQL Create Temporary Tables
Temporary tables, or temp tables, in Spark are available within the current Spark session. Spark temp tables are useful, for example, when you want to join a DataFrame column with other tables.
Spark DataFrame Methods to Create Temp Tables
Depending on your Spark version, there are several methods that you can use to create temporary tables in Spark.
For example:
- registerTempTable (Spark <= 1.6)
- createOrReplaceTempView (Spark >= 2.0)
- createTempView (Spark >= 2.0)
In this article, we use Spark version 1.6, so we will be using the registerTempTable DataFrame method to create temp tables.
Before jumping into an example, let us check what the registerTempTable method does when you use it.
registerTempTable
The registerTempTable() method registers the DataFrame as a table in the catalog of the session (the SQLContext in Spark 1.6) in which it was created. The table is lazily evaluated; if you additionally cache it with sqlContext.cacheTable, the data is stored in Spark's in-memory columnar format.
When you re-register a temporary table with the same name, Spark replaces the previous definition, and the new data is immediately available to queries.
Spark SQL Create Temporary Tables Example
Now, let us create a sample temporary table in PySpark and query it using Spark SQL.
Create Sample DataFrame
For the demonstration, we will be using the following DataFrame.
>>> testDF = sqlContext.createDataFrame([(1,"111"), (2,"111"), (3,"222"), (4,"222"), (5,"222"), (6,"111"), (7,"333"), (8,"444")], ["id", "d_id"])
>>> testDF.show()
+---+----+
| id|d_id|
+---+----+
| 1| 111|
| 2| 111|
| 3| 222|
| 4| 222|
| 5| 222|
| 6| 111|
| 7| 333|
| 8| 444|
+---+----+
Now, register the temporary table using the registerTempTable method. Note that if you are using Spark version >= 2.0, you should use either of the other two methods mentioned above.
Register DataFrame as a Temp Table
The following command registers testDF as a temp table named dataset.
>>> testDF.registerTempTable('dataset')
Query Spark Temporary Table
Now, you can query the Spark temp table as if it were a Hive table. You can also use it in joins with other tables.
>>> sqlContext.sql("select * from dataset").show()
+---+----+
| id|d_id|
+---+----+
| 1| 111|
| 2| 111|
| 3| 222|
| 4| 222|
| 5| 222|
| 6| 111|
| 7| 333|
| 8| 444|
+---+----+
Related Articles
- Rename PySpark DataFrame Column – Methods and Examples
- How to Update Spark DataFrame Column Values using Pyspark?
Hope this helps 🙂