Many organizations use a hybrid model to process their data: they run heavy operations such as machine learning tasks on Databricks and copy the end results to Snowflake for reporting or further analysis. In this article, we will check how to connect to Snowflake from Databricks to build such a hybrid architecture.
Connect to Snowflake from Databricks
Snowflake is one of the relational databases that provides a connector for Spark. You can use the Snowflake Spark connector to connect to the Snowflake server and copy data from Databricks to Snowflake.
Test Data
We will create a Spark DataFrame from an existing Databricks table and save that DataFrame as a Snowflake table.
For example,
df = spark.sql("SELECT source_id, source_name, target_id from rec_test")
df.show()
+---------+---------------+---------+
|source_id| source_name|target_id|
+---------+---------------+---------+
| 1|Robert C. Allen| 1416|
| 1|Robert C. Allen| 353|
| 10| Chris Speed| 353|
| 1416| SomeName2| 1417|
| 353| SomeName| 356|
+---------+---------------+---------+
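If you do not have a similar table handy, here is a minimal sketch that builds the same test data directly; the values and column names are simply the ones from the example above.
# Hypothetical stand-in for the rec_test table used above.
data = [
    (1, "Robert C. Allen", 1416),
    (1, "Robert C. Allen", 353),
    (10, "Chris Speed", 353),
    (1416, "SomeName2", 1417),
    (353, "SomeName", 356),
]
df = spark.createDataFrame(data, ["source_id", "source_name", "target_id"])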
You can use the following steps to establish the connection between Databricks and Snowflake.
Install Snowflake Spark Connector on Databricks Cluster
Download the latest version of the Spark connector from the Maven repository.
Once downloaded, upload the jar to a Databricks library folder. You can either create a new library folder or upload the jar to the default folder.
Right-click on your user folder, click Create, and select Library to upload the jar file.
Once the jar is uploaded, install it on your cluster.
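To confirm the connector is actually available on the cluster, one quick sanity check is to look up its data source class on the JVM classpath. This is just a sketch; the class name below assumes the connector's conventional DefaultSource entry point.
# Raises an exception if the Snowflake Spark connector jar
# is not attached to the cluster.
spark._jvm.java.lang.Class.forName("net.snowflake.spark.snowflake.DefaultSource")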
Create Snowflake Options Dictionary
The following piece of code creates an options dictionary that will be used to connect to Snowflake.
options = dict(sfUrl="https://Snowflake_account.us-east-2.aws.snowflakecomputing.com",
               sfUser="Snowuser",
               sfPassword="Password1**",
               sfDatabase="TEST_DB",
               sfSchema="PUBLIC",
               sfWarehouse="COMPUTE_WH")
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
You can use the Databricks secret manager to get the login name and password for the Snowflake user instead of hard-coding them.
Following is a method that you can use to retrieve the stored user credentials.
user = dbutils.secrets.get(scope="jdbc", key="snowflake_username")
password = dbutils.secrets.get(scope="jdbc", key="snowflake_password")
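You can then plug the retrieved values into the options dictionary in place of the hard-coded credentials, for example:
# Replace the hard-coded credentials with the values fetched from secrets.
options["sfUser"] = user
options["sfPassword"] = password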
Follow the official Databricks documentation on secret management.
Write Databricks Spark DataFrame as a Snowflake Table
The following piece of code saves the Databricks Spark DataFrame as a Snowflake table.
df.write.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**options) \
    .option("dbtable", "databricksTable") \
    .save()
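By default, the save fails if the target table already exists. You can pass an explicit save mode to overwrite or append instead, as in this sketch:
# Overwrite the target table if it already exists; use "append" to add rows.
df.write.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**options) \
    .option("dbtable", "databricksTable") \
    .mode("overwrite") \
    .save()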
Check Snowflake Table
The following is the Snowflake table content.
>select * from "TEST_DB"."PUBLIC"."DATABRICKSTABLE";
+-----------+-----------------+-----------+
| SOURCE_ID | SOURCE_NAME | TARGET_ID |
|-----------+-----------------+-----------|
| 1 | Robert C. Allen | 353 |
| 10 | Chris Speed | 353 |
| 1 | Robert C. Allen | 1416 |
| 1416 | SomeName2 | 1417 |
| 353 | SomeName | 356 |
+-----------+-----------------+-----------+
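You can also verify the write from within Databricks itself by reading the table back through the same connector, reusing the options dictionary:
# Read the newly created Snowflake table back into a Spark DataFrame.
check_df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**options) \
    .option("dbtable", "databricksTable") \
    .load()
check_df.show()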
Connection Troubleshooting
Error: net.snowflake.client.jdbc.SnowflakeSQLException: Incorrect username or password was specified
Solution: Check your username and password.
Error: java.lang.NullPointerException
Solution: The issue may be that the Snowflake URL is wrong. Check your Snowflake URL and try again.
Error: net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session.
Solution: The warehouse may be inactive, or you are referring to a non-existent warehouse in your connection settings.
Error: net.snowflake.client.jdbc.SnowflakeSQLException: Cannot perform CREATE TEMPSTAGE. This session does not have a current database
Solution: You may be referring to a non-existent database. Check the database on Snowflake and provide a correct database name.
Error: java.lang.ClassNotFoundException: Failed to find data source:
Solution: The data source name is wrong. Make sure you have used "net.snowflake.spark.snowflake" as the data source name.
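Most of the errors above can be caught early with a small connectivity smoke test that runs a trivial query through the connector. A sketch, assuming the options dictionary from earlier:
# Smoke test: a trivial query that fails fast if the URL, credentials,
# warehouse, or database in the options dictionary are wrong.
test_df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**options) \
    .option("query", "select current_warehouse(), current_database()") \
    .load()
test_df.show()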
Related Articles,
- How to use Snowflake Python Connector and Example
- How to Access Azure Blob Storage Files from Databricks?
- Connect to SQL Server From Spark – PySpark
- How to Connect to Databricks SQL Endpoint from Azure Data Factory?
Hope this helps 🙂