Spark SQL Create Temporary Tables, Syntax and Examples

  • Post last modified: March 7, 2020
  • Post category: Apache Spark

Temporary tables are tables that are available only within the current session; they are automatically dropped at the end of the session. In this article, we will check how to create Spark SQL temporary tables, with syntax and examples.

Spark SQL Create Temporary Tables

Temporary tables, or temp tables, in Spark are available within the current Spark session. Spark temp tables are useful, for example, when you want to join a DataFrame with other tables.

Spark DataFrame Methods to Create Temp Tables

Depending on the version of Spark, there are several methods that you can use to create temporary tables.

For example:

  • registerTempTable (Spark <= 1.6)
  • createOrReplaceTempView (Spark >= 2.0)
  • createTempView (Spark >= 2.0)

In this article, we use Spark version 1.6, so we will be using the registerTempTable DataFrame method to create a temp table.

Before jumping into an example, let us check what the registerTempTable method does when you use it.

registerTempTable

The registerTempTable() method registers the DataFrame as a temporary table that is scoped to the session (SQLContext) in which it was created. Registering does not itself materialize the data; if you cache the table, Spark stores it in its highly-optimized, in-memory columnar format.

When you re-register a temporary table with the same name, Spark replaces the old definition, and the new data is immediately available for queries.

Spark SQL Create Temporary Tables Example

Now, let us create a sample temporary table in PySpark and query it using Spark SQL.

Create a Sample DataFrame

For the demonstration, we will be using the following DataFrame.

>>> testDF = sqlContext.createDataFrame([(1,"111"), (2,"111"), (3,"222"), (4,"222"), (5,"222"), (6,"111"), (7,"333"), (8,"444")], ["id", "d_id"])

>>> testDF.show()
+---+----+
| id|d_id|
+---+----+
|  1| 111|
|  2| 111|
|  3| 222|
|  4| 222|
|  5| 222|
|  6| 111|
|  7| 333|
|  8| 444|
+---+----+

Now, register the temporary table using the registerTempTable method. Note that if you are using Spark 2.0 or later, use either of the other two methods mentioned above.

Register DataFrame as a temp table.

The following command registers testDF as a temp table.

>>> testDF.registerTempTable('dataset')

Query Spark Temporary Table

Now, you can query the Spark temp table as if it were a Hive table. You can also use it in joins with other tables.

>>> sqlContext.sql("select * from dataset").show()
+---+----+
| id|d_id|
+---+----+
|  1| 111|
|  2| 111|
|  3| 222|
|  4| 222|
|  5| 222|
|  6| 111|
|  7| 333|
|  8| 444|
+---+----+


Hope this helps 🙂