How to Load Spark DataFrame to Oracle Table – Example

  • Post last modified: May 19, 2022

In my previous post, Steps to Connect Oracle Database from Spark, I explained how to connect to Oracle and query tables in the database. But in some cases, you may be required to load a Spark DataFrame into an Oracle table.


We can also use JDBC to write data from a Spark DataFrame to database tables. In the subsequent sections, we will explore the method to write a Spark DataFrame to an Oracle table.

Load Spark DataFrame to Oracle Table

As mentioned in the previous section, we can use the JDBC driver to write a DataFrame to Oracle tables.

Download Oracle ojdbc6.jar JDBC Driver

You need an Oracle JDBC driver to connect to the Oracle server. A commonly used version of the Oracle JDBC driver is the ojdbc6.jar file; download the driver version that matches your JDK version.

You can download this driver from the official Oracle website. You will need to create an Oracle account to download it if you do not already have one.

Add JDBC Driver to CLASSPATH

There are two methods that you can follow to add an Oracle JDBC driver to the CLASSPATH.

Export the CLASSPATH Variable

The first method is to export the jar in the CLASSPATH environment variable:

export CLASSPATH=$PWD/ojdbc6.jar

Note that Spark will sometimes not recognize the driver class when it is only exported in the CLASSPATH.

Start Pyspark by providing jar files

This is another method: provide the jar files when you start the PySpark shell.

 pyspark --conf spark.executor.extraClassPath=/path/ojdbc6.jar --driver-class-path /path/ojdbc6.jar --jars /path/ojdbc6.jar
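The jar must be registered in three places: on the executors, on the driver classpath, and in the --jars list. One way to keep those three flags consistent is to assemble the launch line programmatically; below is a small, hypothetical helper (the function name and jar path are illustrative, not part of PySpark):

```python
def pyspark_launch_cmd(jar_path):
    """Build a pyspark launch command that registers an extra JDBC jar
    on the executors, the driver classpath, and the --jars list."""
    return (
        "pyspark"
        f" --conf spark.executor.extraClassPath={jar_path}"
        f" --driver-class-path {jar_path}"
        f" --jars {jar_path}"
    )

print(pyspark_launch_cmd("/path/ojdbc6.jar"))
```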

Enable Hive Support

This is an optional step. We enable Hive support to read data from a Hive table when creating the test DataFrame.

>>> from pyspark.sql import SparkSession
>>> spark=SparkSession.builder.appName("dftoOracle").enableHiveSupport().getOrCreate()

Create Test DataFrame

Use Spark SQL to generate the test DataFrame that we are going to load into the Oracle table.

>>> hiveDf=spark.sql("select * from default.Test")
>>> hiveDf.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   b|   v|
+----+----+----+

Load Spark DataFrame to Oracle Table Example

Now the environment is set and the test DataFrame is created. We can use the dataframe.write method to load the DataFrame into Oracle tables.

For example, the following piece of code will establish a JDBC connection with the Oracle database and copy the DataFrame content into the specified table.

hiveDf.write.format('jdbc').options(
      url='jdbc:oracle:thin:@192.168.11.100:1521:ORCL',
      driver='oracle.jdbc.driver.OracleDriver',
      dbtable='testschema.test',
      user='testschema',
      password='password').mode('append').save()
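The connection options can also be collected in one place and reused for every write. Here is a minimal sketch of assembling the thin-driver URL and option map (the helper name is my own; the host, port, SID, and credentials are the placeholders from the example above):

```python
def oracle_jdbc_options(host, port, sid, table, user, password):
    """Assemble the option map passed to DataFrame.write.format('jdbc').
    The Oracle thin-driver URL has the shape jdbc:oracle:thin:@host:port:SID."""
    return {
        "url": f"jdbc:oracle:thin:@{host}:{port}:{sid}",
        "driver": "oracle.jdbc.driver.OracleDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }

opts = oracle_jdbc_options("192.168.11.100", 1521, "ORCL",
                           "testschema.test", "testschema", "password")
print(opts["url"])
```

With such a helper, the write above becomes hiveDf.write.format('jdbc').options(**opts).mode('append').save().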

Note that the mode is ‘append‘. It will insert records into the existing table.

Check Oracle Table

If the previous command executed successfully, you can check the Oracle table to see whether the data was loaded.

SQL> select * from testschema.test;
COL1       COL2       COL3
---------- ---------- ----------
a          b          v

Create Oracle Table from Spark DataFrame Example

You can also create the table from the Spark DataFrame if it is not already present.

For example, consider the following code.

hiveDf.write.format('jdbc').options(
      url='jdbc:oracle:thin:@192.168.11.100:1521:ORCL',
      driver='oracle.jdbc.driver.OracleDriver',
      dbtable='testschema.test',
      user='testschema',
      password='password').mode('overwrite').save()

Note that the mode is ‘overwrite‘. If the table is present, Spark drops it and recreates it before inserting the records (it truncates the existing table instead only when the truncate option is set to true). Otherwise, it will create a new table.
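For quick reference, Spark's standard save modes behave as follows when the target table already exists. This is just a summary lookup for the reader, not Spark API code:

```python
# Effect of each Spark save mode when the target Oracle table already exists
SAVE_MODES = {
    "append": "insert the DataFrame rows into the existing table",
    "overwrite": "drop and recreate the table (or truncate when truncate=true), then insert",
    "ignore": "skip the write and leave the existing table untouched",
    "errorifexists": "fail the write with an error (this is the default mode)",
}

for mode, effect in SAVE_MODES.items():
    print(f"{mode}: {effect}")
```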

Do let me know in the comment section if you are using a different method.


Hope this helps 🙂