How to Access Azure Blob Storage Files from Databricks?


Azure Blob storage is a Microsoft Azure cloud service for storing large amounts of structured and unstructured data such as text files, database export files, JSON files, etc. Azure Blob storage lets you store data publicly, or you can keep application data private. You can access public Azure blob data without any additional credentials, but to access private data you need to generate an access key. In this article, we will check how to access Azure Blob storage files from Databricks.

Access Azure Blob Storage Files from Databricks

Similar to the Snowflake cloud data warehouse, Databricks supports cloud platforms such as Microsoft Azure, Amazon AWS, and Google GCP. You can create a Databricks cluster on any of these cloud vendors. In this article, we will explore Azure Databricks to access files stored in an Azure blob container.

Azure Databricks is a fully managed, Platform-as-a-Service (PaaS) offering on the Azure cloud. It leverages the Microsoft cloud to scale rapidly and host massive amounts of data effortlessly.

Following is the step-by-step guide to access data files stored in an Azure blob storage.

  • Create an Azure blob container and upload data files to it.
  • Mount the Azure blob storage container in Azure Databricks.
  • Access the data files using the mount location.

Now, let us check these steps in detail.

Create an Azure Blob Container and Upload Files

Similar to a directory in a file system, a container organizes a set of blobs. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.

(Screenshot: Storage containers)
  • Go to the storage account and click on "Containers" to create a new container.
(Screenshot: Create Container)
  • To upload data files to the blob container, click on "Upload". A programmatic alternative is sketched after these steps.
(Screenshot: Upload files to Container)
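
Besides the Azure portal, you can also create the container and upload files programmatically. Below is a minimal sketch using the azure-storage-blob Python package; the connection string and the local file name are placeholders.

from azure.storage.blob import BlobServiceClient

# Connect to the storage account using its connection string (placeholder value).
service = BlobServiceClient.from_connection_string("<connection-string>")

# Create the "category" container and upload a local data file into it.
container = service.create_container("category")
with open("dim_category.txt", "rb") as data:
    container.upload_blob(name="dim_category.txt", data=data)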

Now, your data files are available in the Azure blob container. The next step is to mount the container you created in Azure Databricks so that you can access its data files as if they were local files.

Mount Azure Blob Storage

You need the storage access key to mount private blob containers. Go to "Access keys" within the storage account and click on "Show keys" to copy the access key. Refer to the following image.

(Screenshot: Storage Account Access Key)

You need this access key to mount the storage container.
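
Rather than pasting the key into a notebook, you can read it from a Databricks secret scope if one is configured in your workspace. Below is a minimal sketch; the scope name blob-secrets and the key name storage-access-key are hypothetical placeholders.

# Read the storage access key from a Databricks secret scope instead of
# hard-coding it in the notebook. Scope and key names here are hypothetical.
storage_account_key = dbutils.secrets.get(scope="blob-secrets", key="storage-access-key")

You can then pass storage_account_key as the value in the extra_configs dictionary of the mount command below.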

You can use the following Python code to mount a storage container in Databricks.

# Mount the "category" container into DBFS at /mnt/category.
# Replace <storage-account-name> and <access-key> with your own values.
dbutils.fs.mount(
  source = "wasbs://category@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/category",
  extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<access-key>"})
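
Note that mounting a path that is already mounted typically fails, so you may want to unmount it first. A small sketch, reusing the /mnt/category mount point from above:

# Unmount /mnt/category first if it is already mounted, then run the mount again.
if any(m.mountPoint == "/mnt/category" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/category")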

Access Data Files Using Mount Location

Finally, you can access the data files using the mount location that you created in the previous step. Use the following command to check whether the mount location is available in the Azure Databricks mounts.

# Check the mount locations
dbutils.fs.mounts()

Use the following command to list the files in the mount location.

# List the files in the mount location
display(dbutils.fs.ls("/mnt/category"))

And finally, create a Spark DataFrame from the data file available in the mount location.

For example,

# Read the text file from the mount location into a Spark DataFrame.
df = spark.read.text("/mnt/category/dim_category.txt")
display(df)
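
If the data file were a delimited file instead of plain text, you could use the CSV reader. A minimal sketch, assuming a hypothetical comma-delimited file dim_category.csv with a header row in the same mount location:

# Read a hypothetical delimited file from the mount location,
# letting Spark parse the header row and infer column types.
csv_df = spark.read.csv("/mnt/category/dim_category.csv", header=True, inferSchema=True)
display(csv_df)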


Hope this helps 🙂