How to Export Spark DataFrame to Teradata Table


In my other article, Steps to Connect Teradata Database from Spark, we saw how to connect to a Teradata database from Spark using the JDBC driver. In this article, we will check how to export a Spark DataFrame to a Teradata table using the same JDBC driver. We will also check how to create a table from a Spark DataFrame if it is not already present in the target Teradata database.

Export Spark DataFrame to Teradata Table

Apache Spark is fast because of its in-memory computation. It is common practice to use Spark as an execution engine to process huge amounts of data and then copy the processed data back into a relational database such as Teradata.

We are going to use a JDBC driver to write data from a Spark DataFrame to database tables. In the subsequent sections, we will walk through the steps to write a Spark DataFrame to a Teradata table.

Download Teradata JDBC Driver

You need two jar files for the Teradata JDBC connection: terajdbc4.jar and tdgssconfig.jar. Note that recent driver releases merge tdgssconfig.jar into terajdbc4.jar, so you may only need the single jar.

You can download the latest version of the JDBC driver from the official Teradata website.

Add Jars to Spark Context or Spark Session

You can either add the jars to a running Spark context or start the pyspark session with the jars provided on the command line.

For example,

sc.addJar("/path/terajdbc4.jar")
sc.addJar("/path/tdgssconfig.jar")

or

pyspark --jars /path/terajdbc4.jar,/path/tdgssconfig.jar
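
Note that --jars takes a comma-separated list with no spaces. Alternatively, you can put the jars on the classpath when you build the Spark session yourself. A minimal sketch, assuming the jars live under /path; spark.jars also takes a comma-separated list of local jar paths:

from pyspark.sql import SparkSession

# Build a session with the Teradata driver jars on the classpath
spark = SparkSession.builder \
    .appName("SparktoTeradata") \
    .config("spark.jars", "/path/terajdbc4.jar,/path/tdgssconfig.jar") \
    .getOrCreate()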

Enable Hive Support

This is an optional step. We enable Hive support to read data from a Hive table and create a test DataFrame. We will then move the same test DataFrame to a Teradata table.

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("SparktoTeradata").enableHiveSupport().getOrCreate()

Create Test DataFrame

You can use Spark SQL to read the Hive table and create the test DataFrame that we are going to load into the Teradata table.

>>> testDf = spark.sql("select * from default.Test")
>>> testDf.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   b|   v|
+----+----+----+
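
If you do not have a Hive table handy, you can build an equivalent test DataFrame directly. A minimal sketch that reproduces the output above; the column names and values are taken from the example:

>>> testDf = spark.createDataFrame([("a", "b", "v")], ["col1", "col2", "col3"])
>>> testDf.show()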

Example to Export Spark DataFrame to Teradata Table

Now the environment is set up and the test DataFrame is created. We can use the DataFrame.write method to copy the DataFrame into a Teradata table.

For example, the following piece of code will establish a JDBC connection with the Teradata database and append the DataFrame content to the table.

testDf.write.format('jdbc').options(
      url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
      driver='com.teradata.jdbc.TeraDriver',
      dbtable='test_db.vartest').mode('append').save()
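
Equivalently, you can use the DataFrameWriter.jdbc shortcut and pass the credentials as connection properties instead of embedding them in the URL. A minimal sketch, assuming the same host, database, and table as above; td_url is just a local variable name:

td_url = 'jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA'

# mode='append' inserts the rows; Spark creates the table first if it does not exist
testDf.write.jdbc(url=td_url,
                  table='test_db.vartest',
                  mode='append',
                  properties={'driver': 'com.teradata.jdbc.TeraDriver',
                              'user': 'dbc',
                              'password': 'dbc'})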

Check Teradata Table

If the previous command runs successfully, you can query the Teradata table to verify the data.
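
You can also verify from Spark itself by reading the table back over the same JDBC connection. A minimal sketch using the connection details from the example above; checkDf is an illustrative name:

checkDf = spark.read.format('jdbc').options(
      url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
      driver='com.teradata.jdbc.TeraDriver',
      dbtable='test_db.vartest').load()
checkDf.show()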

Create Teradata Table from Spark DataFrame Example

You can also create a table from a Spark DataFrame if it is not already present in the target database.

The append mode will also create a table if it is not already present in the target database.

For example, running the same snippet against a database where test_db.vartest does not yet exist will create the table before inserting the rows.

testDf.write.format('jdbc').options(
      url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
      driver='com.teradata.jdbc.TeraDriver',
      dbtable='test_db.vartest').mode('append').save()
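
When Spark creates the table, it maps DataFrame column types to default database types. If you need control over the generated column definitions, Spark's JDBC writer accepts a createTableColumnTypes option. A minimal sketch; the VARCHAR sizes here are illustrative assumptions, not values from the original example:

testDf.write.format('jdbc').options(
      url='jdbc:teradata://192.168.200.100/database=test_db,TMODE=TERA,user=dbc,password=dbc',
      driver='com.teradata.jdbc.TeraDriver',
      dbtable='test_db.vartest',
      # applies only when Spark creates the table; sizes are assumptions
      createTableColumnTypes='col1 VARCHAR(10), col2 VARCHAR(10), col3 VARCHAR(10)'
      ).mode('append').save()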

Hope this helps 🙂