In my previous post, Steps to Connect Oracle Database from Spark, I explained how to connect to Oracle and query tables from the database. But in some cases, you may have a requirement to load a Spark dataframe into an Oracle table. We can also use JDBC to write data from a Spark dataframe to database tables. In the following sections, we will explore how to write a Spark dataframe to an Oracle table.
Load Spark DataFrame to Oracle Table
As mentioned in the previous section, we can use a JDBC driver to write a dataframe to Oracle tables.
Download Oracle ojdbc6.jar JDBC Driver
You need an Oracle JDBC driver to connect to the Oracle server. At the time of writing, the latest Oracle JDBC driver is the ojdbc6.jar file. Choose the driver version that matches your JDK version.
You can download the driver from the official Oracle website. You will need to create an Oracle account first if you do not already have one.
Add JDBC Driver to CLASSPATH
There are two methods you can follow to add the Oracle JDBC driver to the CLASSPATH.
Export the Driver to CLASSPATH
export CLASSPATH=$PWD/ojdbc6.jar
Note that Spark sometimes does not recognize the driver class when it is only exported in the CLASSPATH.
Start Pyspark by providing jar files
This is another method: add the jar files when you start the pyspark shell.
pyspark --conf spark.executor.extraClassPath=/path/ojdbc6.jar --driver-class-path /path/ojdbc6.jar --jars /path/ojdbc6.jar
Enable Hive Support
This is an optional step. We enable Hive support in order to read data from a Hive table and create the test dataframe.
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("dftoOracle").enableHiveSupport().getOrCreate()
Create Test DataFrame
Use Spark SQL to generate the test dataframe that we are going to load into the Oracle table.
>>> hiveDf=spark.sql("select * from default.Test")
>>> hiveDf.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| a| b| v|
+----+----+----+
Load Spark DataFrame to Oracle Table Example
Now that the environment is set up and the test dataframe is created, we can use the dataframe.write method to load the dataframe into an Oracle table.
For example, the following piece of code establishes a JDBC connection with the Oracle database and copies the dataframe content into the mentioned table.
hiveDf.write.format('jdbc').options(
    url='jdbc:oracle:thin:@192.168.11.100:1521:ORCL',
    driver='oracle.jdbc.driver.OracleDriver',
    dbtable='testschema.test',
    user='testschema',
    password='password').mode('append').save()
Note that the mode is 'append'. It inserts the records into the existing table.
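For larger dataframes, the JDBC writer also accepts tuning options such as batchsize (rows per JDBC batch insert) and numPartitions (maximum number of parallel connections). The sketch below only assembles the option set; the connection details are the same placeholder values used above, so the actual write call is left commented out because it needs a live Oracle server.

```python
# Placeholder connection details (same as the example above). batchsize and
# numPartitions are standard Spark JDBC writer options for tuning the load.
jdbc_options = {
    "url": "jdbc:oracle:thin:@192.168.11.100:1521:ORCL",
    "driver": "oracle.jdbc.driver.OracleDriver",
    "dbtable": "testschema.test",
    "user": "testschema",
    "password": "password",
    "batchsize": "10000",    # rows sent per JDBC batch insert
    "numPartitions": "4",    # maximum parallel JDBC connections
}

# Requires a reachable Oracle server, so it is commented out here:
# hiveDf.write.format("jdbc").options(**jdbc_options).mode("append").save()
```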
Check Oracle Table
If the previous command executed successfully, you can query the Oracle table to verify that the data is loaded.
SQL> select * from testschema.test;
COL1 COL2 COL3
---------- ---------- ----------
a b v
Create Oracle Table from Spark DataFrame Example
You can also create the table from a Spark dataframe if it does not already exist. For example:
hiveDf.write.format('jdbc').options(
    url='jdbc:oracle:thin:@192.168.11.100:1521:ORCL',
    driver='oracle.jdbc.driver.OracleDriver',
    dbtable='testschema.test',
    user='testschema',
    password='password').mode('overwrite').save()
Note that the mode is 'overwrite'. If the table already exists, Spark drops and recreates it, then inserts the records; otherwise, it creates a new table.
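If you want overwrite to keep the existing table definition (grants, indexes, and so on) instead of dropping and recreating it, Spark's JDBC writer supports a truncate option. Here is a sketch with the same placeholder connection details as above; the write itself is commented out because it needs a live Oracle server.

```python
# "truncate": "true" makes overwrite issue TRUNCATE TABLE instead of
# DROP/CREATE, preserving the existing table definition.
jdbc_options = {
    "url": "jdbc:oracle:thin:@192.168.11.100:1521:ORCL",
    "driver": "oracle.jdbc.driver.OracleDriver",
    "dbtable": "testschema.test",
    "user": "testschema",
    "password": "password",
    "truncate": "true",
}

# hiveDf.write.format("jdbc").options(**jdbc_options).mode("overwrite").save()
```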
Do let me know in the comment section if you are using a different method.
Related articles,
- QUALIFY Clause in Oracle – Alternative
- Oracle INSERT ALL Alternative in Hive/Spark SQL.
- Steps to Import Oracle Tables using Sqoop
- Steps to Connect Oracle Database from Spark – Examples
Hope this helps 🙂