How to Connect to Snowflake from Databricks?

  • Post last modified: January 27, 2023
  • Post category: Snowflake

Many organizations use a hybrid model to process their data: they run Machine Learning and other compute-heavy workloads on Databricks, then copy the end results to Snowflake for reporting or further analysis. In this article, we will check how to connect to Snowflake from Databricks to build such a hybrid architecture.

Connect to Snowflake from Databricks

Snowflake is one of the relational databases that provides a connector for Spark. You can use the Snowflake Spark connector to connect to the Snowflake server and copy data from Databricks to Snowflake.

Test Data

We will create a Spark DataFrame from an existing Databricks table and save that DataFrame as a Snowflake table.

For example,

df = spark.sql("SELECT source_id, source_name, target_id from rec_test")
df.show()
+---------+---------------+---------+
|source_id|    source_name|target_id|
+---------+---------------+---------+
|        1|Robert C. Allen|     1416|
|        1|Robert C. Allen|      353|
|       10|    Chris Speed|      353|
|     1416|      SomeName2|     1417|
|      353|       SomeName|      356|
+---------+---------------+---------+
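If you do not have the rec_test table handy, the same sample DataFrame can be built directly from the rows shown above; a minimal sketch, where spark is the notebook's SparkSession:

```python
# Sketch: build the sample DataFrame shown above without the rec_test table.
def sample_df(spark):
    """Return the article's sample rows as a Spark DataFrame."""
    rows = [(1, "Robert C. Allen", 1416),
            (1, "Robert C. Allen", 353),
            (10, "Chris Speed", 353),
            (1416, "SomeName2", 1417),
            (353, "SomeName", 356)]
    return spark.createDataFrame(rows, ["source_id", "source_name", "target_id"])
```

In a notebook, `df = sample_df(spark)` gives you the same DataFrame as the `spark.sql` query above.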

You can use the following steps to establish the connection between Databricks and Snowflake.

Install Snowflake Spark Connector on Databricks Cluster

Download the latest version of the Spark connector from the Maven repository.

Once downloaded, upload the jar to a Databricks library folder. You can either create a new library folder or upload the jar to the default folder.

Right-click on your user, click Create, and select Library to upload the jar file.

Once the jar is uploaded, install it on your cluster.
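Alternatively, Databricks can install the connector straight from Maven coordinates (Cluster > Libraries > Install new > Maven), which avoids downloading the jar by hand. The exact coordinate depends on your Scala and Spark versions; the version below is only an example, so check the Maven repository for the latest release:

```
net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4
```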


Create Snowflake Options Dictionary

The following piece of code creates an options dictionary that will be used to connect to Snowflake.

options = dict(sfUrl="https://Snowflake_account.us-east-2.aws.snowflakecomputing.com",
               sfUser="Snowuser",
               sfPassword="Password1**",
               sfDatabase="TEST_DB",
               sfSchema="PUBLIC",
               sfWarehouse="COMPUTE_WH")

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
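Since the sfUrl is easy to get wrong, it can help to assemble the options from their parts instead of pasting one long string. A small sketch; the account identifier and credentials below are placeholders, not real values:

```python
# Sketch: assemble the connector options from their parts.
# All values passed in below are placeholders, not real credentials.
def snowflake_options(account, user, password, database, schema, warehouse):
    """Build the option map expected by the Snowflake Spark connector."""
    return {
        "sfUrl": f"https://{account}.snowflakecomputing.com",
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

options = snowflake_options("Snowflake_account.us-east-2.aws",
                            "Snowuser", "Password1**",
                            "TEST_DB", "PUBLIC", "COMPUTE_WH")
```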

You can use a secret manager to get the login name and password for the Snowflake user instead of hard-coding them.

The following is a method you can use to retrieve the stored user credentials.

user = dbutils.secrets.get(scope="jdbc", key="snowflake_username")
password = dbutils.secrets.get(scope="jdbc", key="snowflake_password")

Follow the Databricks official documentation on secret management.
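Note that dbutils is only defined inside Databricks notebooks; if the same code also runs elsewhere (local tests, CI), a small fallback keeps it usable. A hedged sketch, reusing the jdbc scope and key names from above:

```python
import os

# Sketch: read a credential from a Databricks secret scope, falling back to
# an environment variable outside Databricks (scope/key names as above).
def get_credential(key, scope="jdbc"):
    try:
        return dbutils.secrets.get(scope=scope, key=key)  # Databricks notebooks only
    except NameError:  # dbutils is undefined outside a notebook
        return os.environ.get(key, "")

user = get_credential("snowflake_username")
password = get_credential("snowflake_password")
```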

Write Databricks Spark DataFrame as a Snowflake Table

The following piece of code saves the Databricks Spark DataFrame as a Snowflake table.

df.write.format(SNOWFLAKE_SOURCE_NAME) \
        .options(**options) \
        .option("dbtable", "databricksTable") \
        .save()
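Reading the data back from Snowflake is the mirror image of the write. A sketch, wrapped as a function so the session, options, and table name stay explicit:

```python
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Sketch: load a Snowflake table back into a Spark DataFrame.
def read_snowflake_table(spark, options, table):
    """Mirror of the write path: read `table` via the Snowflake connector."""
    return (spark.read
            .format(SNOWFLAKE_SOURCE_NAME)
            .options(**options)
            .option("dbtable", table)
            .load())

# In a notebook: df2 = read_snowflake_table(spark, options, "databricksTable")
```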

Check Snowflake Table

The following is the content of the Snowflake table.

>select * from "TEST_DB"."PUBLIC"."DATABRICKSTABLE";
+-----------+-----------------+-----------+
| SOURCE_ID | SOURCE_NAME     | TARGET_ID |
|-----------+-----------------+-----------|
|         1 | Robert C. Allen |       353 |
|        10 | Chris Speed     |       353 |
|         1 | Robert C. Allen |      1416 |
|      1416 | SomeName2       |      1417 |
|       353 | SomeName        |       356 |
+-----------+-----------------+-----------+

Connection Troubleshooting

Error: net.snowflake.client.jdbc.SnowflakeSQLException: Incorrect username or password was specified

Solution: Check your username and password.

Error: java.lang.NullPointerException

Solution: The issue may be caused by a wrong Snowflake URL. Check your Snowflake URL and try again.

Error: net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session.

Solution: The warehouse may be inactive, or you are referring to a non-existent warehouse in your connection settings.

Error: net.snowflake.client.jdbc.SnowflakeSQLException: Cannot perform CREATE TEMPSTAGE. This session does not have a current database

Solution: You may be referring to a non-existent database. Check the database on Snowflake and provide the correct database name.

Error: java.lang.ClassNotFoundException: Failed to find data source:

Solution: The data source name is wrong. Make sure you have used “net.snowflake.spark.snowflake” as the data source name.
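Several of the errors above (notably the NullPointerException) come down to a malformed sfUrl. A quick sanity check before connecting can catch that early; a sketch, assuming the usual <account>.<region>.<cloud>.snowflakecomputing.com shape:

```python
import re

# Sketch: validate the sfUrl shape before handing it to the connector.
def looks_like_snowflake_url(url):
    """True if url looks like https://<account>[.<region>[.<cloud>]].snowflakecomputing.com"""
    pattern = r"https://[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*\.snowflakecomputing\.com"
    return re.fullmatch(pattern, url) is not None
```

For example, a stray space in the host name (easy to pick up when copy-pasting the account URL) makes the check fail.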


Hope this helps 🙂