Create External Tables in Azure Synapse Analytics

  • Post author:
  • Post last modified:February 25, 2021
  • Post category:Azure Synapse
  • Reading time:8 mins read

There are many situations in which you need to access the data without loading it to Azure Synapse analytics. The data may be stored in an external data source such as flat files. Azure Synapse Analytics dedicated SQL pool allows you to query external data available on Azure the data lake or Azure Blob storage. In this article, we will check how to create external tables in Azure Synapse Analytics.

Create External Tables in Azure Synapse Analytics

As a prerequisite, you need to have Azure Synapse Analytics dedicated SQL pool and storage account created.

Following are the steps that you need to follow to create external tables in Azure Synapse analytics data warehouse.

For a simplicity, use tool such as SQL Server Management Studio to connect to Azure Synapse Analytics dedicated SQL pool server.

Now, let us check the steps in brief.

Create MASTER KEY in Azure Synapse Analytics

Firstly, create a master key in the database that would be used for encryption.

You can create master key using the CREATE MASTER KEY command in Azure Synapse analytics. Ensure to specify a complex password that would satisfy the password policy.

Following is the create master key command example in Azure Synapse analytics dedicated SQL pool.

CREATE MASTER KEY ENCRYPTION BY PASSWORD='StoragePassword@123';

Create a Database Scoped Credential in Azure Synapse Analytics

Secondly, create a database scoped credential that would be used by the Synapse dedicated SQL pool to connect to the Azure Storage Gen2 account.

You need to have SAS token to crate database scoped credentials. Generate SAS token on the container if you don’t have the same with you.

Following is the example to create database scoped credentials.

CREATE DATABASE SCOPED CREDENTIAL storageCred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<your SAS Token Key here>';

We are using the Shared Access Signature Key, so we would be specifying the keyword “SHARED ACCESS SIGNATURE” as the identity and the value of the secret as the SAS key of the storage account.

Create an External Data Source in Azure Synapse Analytics

Thirdly, we will create an external data source using EXTERNAL DATA SOURCE command.

Following is the command to create external data sources in Azure Synapse dedicated SQL pool.

CREATE EXTERNAL DATA SOURCE cp_ds
WITH
( LOCATION = 'wasbs://<container>@<storageName>.blob.core.windows.net',
CREDENTIAL = storageCred, 
TYPE = HADOOP
);

Note that the type should be “HADOOP”, the credential should be the database scoped credentials that we created in the earlier step and the value of the location would be the storage container path in the mentioned format.

There are many other options that you can use. Find more about options in official documentation.

Create an External File Format in Azure Synapse Analytics

After creating the data source, the next step is to register a file format that specifies the details of the delimited file that we are trying to access.

You can create am external file format using CREATE EXTERNAL FILE FORMAT command. The type of format would be delimited text as CSV, TXT, TSV, etc. are delimited file formats.

Following command create external file format.

CREATE EXTERNAL FILE FORMAT CSV_FORMAT  
WITH (  
FORMAT_TYPE = DELIMITEDTEXT
,
FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

Note that format type is DELIMITEDTEXT for CSV and field terminator is ‘|’ (pipe). There are many other options that you can use. Find more about options in official documentation.

Create an External Table in Azure Synapse

We have created the external data source and file format. Now, the last step is to create an external table in Azure synapse dedicated SQL pool server.

You can create external tables using CREATE EXTERNAL TABLE command in Azure Synapse anlaytics.

For example,

CREATE EXTERNAL TABLE CP_external_table
(
cp_catalog_page_sk        integer ,
cp_catalog_page_id        char(16) ,
cp_start_date_sk          integer ,
cp_end_date_sk            integer ,
cp_department             varchar(50) ,
cp_catalog_number         integer ,
cp_catalog_page_number    integer ,
cp_description            varchar(100) ,
cp_type                   varchar(100) 
)  
WITH (
LOCATION='/catalog_page.csv',
DATA_SOURCE = cp_ds,  
FILE_FORMAT = CSV_FORMAT
);

Note that, LOCATION is the location of the file that you are trying to access. You should provide location excluding the path that you provided in the data source.

Finally, test your external table.

Create External Tables in Azure Synapse Analytics

Related Articles,

Hope this helps 🙂