You can easily integrate cloud storage services such as AWS S3, Azure Blob Storage, and Google Cloud Storage with the Snowflake cloud data warehouse. Snowflake also supports external tables, so you can create external tables on top of the files stored in external storage such as S3, Azure Blob, or GCP storage. In this article, we will check how to query S3 external files directly in Snowflake.
How to Query S3 External Files in Snowflake?
The Snowflake cloud data warehouse supports using standard SQL to query data files located in an internal stage or in a named external stage such as Amazon S3, Google Cloud Storage, or Microsoft Azure. You can use these simple queries to inspect or view the contents of the staged files, particularly before loading or after unloading data.
The following methods allow you to query external files from AWS S3.
- Querying External Files in S3 Staged Files
- Querying External Files in S3 Staged Files using External Tables
Now, let us check these two methods briefly.
Querying External Files in S3 Staged Files
Snowflake allows you to use standard SQL directly on files stored in an internal or external stage. The only requirement is to create the internal or named external stage and copy the file that you want to query into it.
A stage in Snowflake is an intermediate space where you upload files so that you can use the COPY command to load or unload tables. You can also query staged external files directly.
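For instance, here is a minimal sketch of the load path a stage enables; the internal stage my_int_stage, the local file path, and the target table emp are hypothetical names used only for illustration, and PUT runs from a client such as SnowSQL.
-- Hypothetical example: upload a local file to an internal stage, then load it into a table
PUT file:///tmp/emp.csv @my_int_stage;
-- PUT gzip-compresses the file by default; COPY INTO reads it from the stage
COPY INTO emp
FROM @my_int_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);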
Read more about internal and external stages in my other articles – Type of Snowflake Stages – How to Create and Use them?
The following example creates a named external S3 stage.
CREATE OR REPLACE STAGE mys3stage
url='s3://snow-testing/ExternalFiles/'
CREDENTIALS=(aws_key_id='ABCDRHDKSNF' aws_secret_key='***********');
List S3 Staged Files
If the above CREATE STAGE command is successful, you can list the files present in the external stage.
For example,
LIST @mys3stage/;
Create Snowflake File Format
You need to create a file format to access the S3 files.
For example,
create or replace file format mys3csv
type = 'CSV'
field_delimiter = ','
skip_header = 1;
Query the External Files Stored in S3
Now, you are all set to query the S3 files directly using Snowflake SQL.
For example,
select t.$1 as ID, t.$2 as NAME, t.$3 as DEPT
from @MYS3STAGE (file_format => 'mys3csv') t;
+------------+------+-------+
| ID | NAME | DEPT |
|------------+------+-------|
| 1234567890 | aaa | 32114 |
| 0987654123 | bbb | 45789 |
| 1122445521 | ccc | 75896 |
| 7845951656 | ddd | 45795 |
| 5645798965 | xyz | 45564 |
| 4547858551 | vvd | 84654 |
+------------+------+-------+
-- Directly refer to the file name
select t.$1 as ID, t.$2 as NAME, t.$3 as DEPT
from @MYS3STAGE/emp.csv (file_format => 'mys3csv') t;
+------------+------+-------+
| ID | NAME | DEPT |
|------------+------+-------|
| 1234567890 | aaa | 32114 |
| 0987654123 | bbb | 45789 |
| 1122445521 | ccc | 75896 |
| 7845951656 | ddd | 45795 |
| 5645798965 | xyz | 45564 |
| 4547858551 | vvd | 84654 |
+------------+------+-------+
Here, $1, $2, etc. refer to the positions of the columns in the CSV file.
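The positional references are returned as strings, so you can cast them where you need typed values. The sketch below casts the third column to a number against the same stage and file format; the target type is an illustrative assumption.
-- Cast a positional column to the desired type
select t.$1 as ID, t.$2 as NAME, t.$3::number as DEPT
from @MYS3STAGE/emp.csv (file_format => 'mys3csv') t;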
How to Handle Spaces in S3 File Path in Snowflake?
To use special characters, including spaces, you need to enclose the stage path, including the file name, in single quotes.
For example,
-- File name with Spaces
select t.$1 as ID, t.$2 as NAME, t.$3 as DEPT
from '@MYS3STAGE/path/ emp.csv' (file_format => 'mys3csv') t;
How to Get the Header of a Staged CSV File in Snowflake?
You can query the following metadata columns of your staged files (see the example below the list).
- METADATA$FILENAME: Name of the staged data file the current row belongs to. Includes the path to the data file in the stage.
- METADATA$FILE_ROW_NUMBER: The row number for each record in the staged data file.
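For example, the following sketch returns the metadata columns alongside the data. It assumes a hypothetical file format, mys3csv_noskip, that does not skip the first row, so the header of the CSV file appears as the row where METADATA$FILE_ROW_NUMBER is 1.
-- Hypothetical file format that keeps the header row
create or replace file format mys3csv_noskip
type = 'CSV'
field_delimiter = ',';
-- The header row is the one with metadata$file_row_number = 1
select metadata$filename, metadata$file_row_number, t.$1, t.$2, t.$3
from @MYS3STAGE/emp.csv (file_format => 'mys3csv_noskip') t;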
Querying External Files in S3 Staged Files using External Tables
Snowflake external tables allow you to access files stored in an external stage as if they were a regular table.
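As a brief sketch, an external table could be defined on top of the same stage and file format; the table name ext_emp and the column expressions are illustrative assumptions.
-- Define an external table over the staged CSV files
create or replace external table ext_emp (
  id   varchar as (value:c1::varchar),
  name varchar as (value:c2::varchar),
  dept varchar as (value:c3::varchar)
)
with location = @mys3stage
file_format = (format_name = 'mys3csv');
-- Query the external table like a regular table
select id, name, dept from ext_emp;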
You can read more about Snowflake external tables in my other article – Working with Snowflake External Tables and S3 Examples.
Hope this helps 🙂