Amazon Redshift Load CSV File using COPY and Example

Many organizations use flat files such as CSV or TSV files to offload tables; flat files are easy to manage and can be transported by any electronic medium. In this article, we will check how to load or import a CSV file into Amazon Redshift using the COPY command, with some examples.

Introduction

Amazon Redshift is a popular cloud-based data warehouse that enables businesses to store and analyze large volumes of data quickly and cost-effectively. One of the ways in which Redshift can ingest data is through CSV (Comma-Separated Values) files, a widely used format for data exchange. Loading CSV files into Redshift is a straightforward process using the COPY command. In this post, we will provide a comprehensive guide on how to load CSV files into Redshift, covering everything from formatting and configuring the files to best practices for optimization and troubleshooting.

Preparing for Redshift Load CSV

In this section of the post, we will cover the steps required to prepare for loading CSV files into Redshift. Importing a CSV or TSV file requires you to first create a table; you can use a predefined DDL or duplicate an existing table structure based on your requirements. The steps below assume that you already have an Amazon Redshift cluster up and running.

Here are the prerequisite steps, each sketched after the list:

  • Create Table Structure on Amazon Redshift
  • Upload CSV file to S3 bucket using AWS console or AWS S3 CLI
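
For the first step, here is a minimal DDL sketch, assuming a hypothetical table named table_name with columns col1 through col4 (the same placeholder names used in the COPY examples below); adjust the column names and data types to match your file.

CREATE TABLE table_name (
    col1 INTEGER,
    col2 VARCHAR(100),
    col3 VARCHAR(100),
    col4 DATE
);

For the second step, you can upload the file from your local machine using the AWS S3 CLI, for example:

aws s3 cp file_name.csv s3://<your-bucket-name>/load/file_name.csv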

Load CSV File using Redshift COPY Command

The easiest way to load a CSV into Redshift is to first upload the file to an Amazon S3 bucket. You can follow the Redshift documentation for how to do this. After that, you can use the COPY command to load the file from S3 into your Redshift table.

For example, consider the below COPY command that loads data into a Redshift table.

COPY table_name
FROM 's3://<bucket-name>/load/fileName.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV;
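
Note that instead of embedding access keys, the COPY command also accepts the IAM_ROLE parameter to authorize access to S3. A sketch, assuming a role with S3 read permissions is attached to your cluster:

COPY table_name
FROM 's3://<bucket-name>/load/fileName.csv'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<redshift-role-name>'
CSV;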

Note that you should provide the 'CSV' keyword at the end of your COPY command to tell the COPY command that the file is a comma-separated file. By default, the COPY command considers the file to be pipe (|) delimited.

Redshift COPY command with Column Names

If your CSV file has a different column order or does not contain all columns, you need to specify a column list in your COPY command.

For example, the below example shows how to specify the column names in the COPY command.

COPY table_name (col1, col2, col3, col4)
FROM 's3://<your-bucket-name>/load/file_name.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV;
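
Columns that are omitted from the column list are loaded with their DEFAULT expression, or with NULL if the column is nullable and has no default. For instance, a sketch assuming file_name.csv contains only the col2 and col4 values, in that order:

COPY table_name (col2, col4)
FROM 's3://<your-bucket-name>/load/file_name.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV;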

Redshift COPY command to ignore First Line from CSV

If your CSV file contains a header row that should be ignored, you can specify the number of lines to be skipped from the CSV file.

For example, the below COPY command skips the header, or first row, of the CSV file.

COPY table_name (col1, col2, col3, col4)
FROM 's3://<your-bucket-name>/load/file_name.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV
IGNOREHEADER 1;

Redshift COPY Command with Custom Delimiter

Finally, if your CSV file uses a custom delimiter, you can use the DELIMITER option to specify it.

For example, the below COPY command loads a file which has '~' as the field delimiter.

COPY table_name (col1, col2, col3, col4)
FROM 's3://<your-bucket-name>/load/file_name.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
DELIMITER '~'
IGNOREHEADER 1;
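
Note that the above example uses Redshift's default text format with a custom delimiter. If your file also uses CSV-style quoting, you can keep the CSV keyword and override its default comma with the DELIMITER option, for example:

COPY table_name (col1, col2, col3, col4)
FROM 's3://<your-bucket-name>/load/file_name.csv'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV
DELIMITER '~'
IGNOREHEADER 1;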

Best Practices for Loading CSV Files into Redshift

Here are some of the best practices for loading CSV files into an Amazon Redshift table:

  • Optimize CSV file size: The first step is to optimize CSV file size by compressing files, splitting large files, and removing unnecessary columns. This will help reduce data transfer time and storage costs.
  • Load files in parallel: The next step is to optimize data ingestion speed by splitting the input into multiple files so that a single COPY command loads them in parallel across the cluster's slices, specifying sort keys and distribution keys on the target table, and using multi-row inserts with the INSERT command when COPY is not an option. These techniques will help maximize Redshift's data loading capacity and improve query performance. A sketch follows this list.
  • Validate data before loading: Finally, reduce the risk of data errors by validating data before loading (the COPY NOLOAD option checks files without loading them) and by checking the STL_LOAD_ERRORS system table for any load errors.
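
For instance, here is a sketch that loads every file sharing a key prefix in parallel, assuming the input was split and gzip-compressed into hypothetical parts named part_00.csv.gz, part_01.csv.gz, and so on:

COPY table_name (col1, col2, col3, col4)
FROM 's3://<your-bucket-name>/load/part_'
credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
CSV
GZIP;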

Troubleshooting Redshift Load CSV Issues

In this section of the post, we will cover some common issues that may arise when loading CSV files into Redshift, and provide some strategies for resolving them.

  • Understand error messages: The first step is to understand Redshift error messages, including common error codes and descriptions, which can help pinpoint the source of the issue.
  • Check CSV formatting issues: You should identify and resolve data formatting issues, such as mismatched delimiters and encoding, inconsistent line breaks and null values, and incorrect data types and formatting.
  • Permission issues: Check for access denied and authentication failures, IAM role and permission issues, and network and firewall restrictions.
  • Verify STL_LOAD_ERRORS: Check the STL_LOAD_ERRORS system table for any loading-related issues; a sample query follows this list.
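
For example, a query along these lines surfaces the most recent load errors, including the offending file, line number, column, and reason:

SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;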

Conclusion

In conclusion, you can use the Redshift COPY command to load CSV files into Amazon Redshift. By following best practices for optimizing performance and reducing the risk of errors, and by understanding common issues and how to troubleshoot them, you can ensure that your CSV files are loaded into Redshift quickly, efficiently, and reliably.

Hope this helps 🙂