In my other article, Steps to Connect Teradata Database from Spark, we have seen how to connect to a Teradata database from Spark using the JDBC driver. In this article, we will check how to export a Spark dataframe to a Teradata table using the same JDBC driver. We will also check how to create a table out of a Spark dataframe if it is not already present in the target Teradata database.
Export Spark DataFrame to Teradata Table
Apache Spark is fast because of its in-memory computation. It is common practice to use Spark as an execution engine to process huge amounts of data and copy the processed data back into relational databases such as Teradata.
We are going to use the JDBC driver to write data from a Spark dataframe to database tables. In the subsequent sections, we will explore how to write a Spark dataframe to a Teradata table.
Download Teradata JDBC Driver
You need two jar files to use the Teradata JDBC connection string: terajdbc4.jar and tdgssconfig.jar.
You can download the latest version of the JDBC drivers from the official Teradata website.
Add Jar to Spark Context or Spark Session
You can either add the jar files to the Spark context, or provide them when you launch the pyspark session. Note that SparkContext.addJar is available in the Scala shell, not in PySpark.
For example, from spark-shell:
sc.addJar("/path/terajdbc4.jar")
sc.addJar("/path/tdgssconfig.jar")
or, when launching pyspark:
pyspark --jars /path/terajdbc4.jar,/path/tdgssconfig.jar
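Alternatively, if you build the session programmatically, you can point the spark.jars configuration at the driver jars. This is a minimal sketch, assuming the jars sit under /path:
from pyspark.sql import SparkSession

# Register the Teradata driver jars through the spark.jars configuration
spark = SparkSession.builder \
    .appName("SparktoTeradata") \
    .config("spark.jars", "/path/terajdbc4.jar,/path/tdgssconfig.jar") \
    .getOrCreate()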
Enable Hive Support
This is an optional step. We enable Hive support to read data from a Hive table and create the test dataframe. We will then move the same test dataframe to a Teradata table.
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("SparktoTeradata").enableHiveSupport().getOrCreate()
Create Test DataFrame
You can use Spark SQL to read a Hive table and create the test dataframe that we are going to load into the Teradata table.
>>> testDf = spark.sql("select * from default.Test")
>>> testDf.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| a| b| v|
+----+----+----+
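If you do not have a Hive table handy, you can build an equivalent test dataframe in memory instead. A quick sketch matching the columns above:
# Build the same three-column test dataframe without Hive
testDf = spark.createDataFrame([("a", "b", "v")], ["col1", "col2", "col3"])
testDf.show()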
Example to Export Spark DataFrame to Teradata Table
Now that the environment is set and the test dataframe is created, we can use the dataframe.write method to copy the dataframe into a Teradata table.
For example, the following piece of code will establish a JDBC connection with the Teradata database and move the dataframe content into the table.
testDf.write.format('jdbc').options(
    url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
    driver='com.teradata.jdbc.TeraDriver',
    dbtable='test_db.vartest').mode('append').save()
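If you prefer to keep the credentials out of the connection URL, the same write can be expressed with individual option() calls; user and password are standard Spark JDBC options:
# Equivalent write with credentials passed as separate JDBC options
testDf.write \
    .format('jdbc') \
    .option('url', 'jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA') \
    .option('driver', 'com.teradata.jdbc.TeraDriver') \
    .option('dbtable', 'test_db.vartest') \
    .option('user', 'dbc') \
    .option('password', 'dbc') \
    .mode('append') \
    .save()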
Check Teradata Table
If the previous command is executed successfully, you can check the Teradata table for the data.
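You can also verify from Spark itself by reading the table back over the same JDBC connection. A short sketch, assuming the same host and credentials as above:
# Read the table back over JDBC to confirm the rows landed
verifyDf = spark.read.format('jdbc').options(
    url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
    driver='com.teradata.jdbc.TeraDriver',
    dbtable='test_db.vartest').load()
verifyDf.show()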
Create Teradata Table from Spark DataFrame Example
You can also create tables out of a Spark dataframe if they are not already present in the target database.
The append mode will also create a table if it is not already present in the target database.
Consider the following example.
testDf.write.format('jdbc').options(
    url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
    driver='com.teradata.jdbc.TeraDriver',
    dbtable='test_db.vartest').mode('append').save()
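A note on save modes: 'append' adds rows to an existing table (and creates it if missing, as shown above), 'overwrite' replaces the table contents, 'ignore' skips the write if the table exists, and 'error' (the default) fails. If you want to replace the data while keeping the table definition, Spark's JDBC truncate option can be combined with the overwrite mode; a hedged sketch, assuming the same connection details:
# Replace existing rows; truncate=true keeps the Teradata table definition
testDf.write.format('jdbc').options(
    url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
    driver='com.teradata.jdbc.TeraDriver',
    dbtable='test_db.vartest').option('truncate', 'true').mode('overwrite').save()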
Hope this helps 🙂