Fixed width text files are special cases of text files where the format is specified by column widths, pad character and left or right alignment. In this format column width are in terms of units of characters. In this article, we will learn about Redshift Fixed-Width file loading options and examples.
Fixed-Width File Overview
In general, fixed-length format files use ordinal positions, which are offsets to identify where fields are within the record of the file.
Fixed-width data files have uniform lengths for each column of data. Each field in a fixed-width data file has exactly the same length and position.
In fixed width files, there are no field delimiters, and there may or may not be end-of-record delimiter associated with file. Usually, data in fixed width format do not use any date delimiter as the file is fixed-length and does not require delimiter. If you want proper date format then you have to explicitly type convert the data and store in database table.
Below is sample fixed-width file:
Redshift Fixed-Width File Loading Options
There are couple of options available to load fixed width file into Redshift database tables.
- Loading fied-width data using Redshift COPY command
- Load fixed-width file in single column table and use substring to extract require fields
Loading fixed-width data using Redshift COPY command
To load fixed-width file into existing Redshift table, USE the FIXEDWIDTH parameter in the COPY command.
Note that, to use COPY command you must first place file to S3 location.
Redshift COPY command Syntax to Load Fixed-width File
Below is the COPY command syntax to load fixed-width file:
COPY table_name FROM 's3://mybucket/prefix' CREDENTIALS 'aws_access_key_id=<YOUR-ACCESS-KEY-ID>;aws_secret_access_key=<YOUR-SECRET-ACCESS-KEY>' fixedwidth 'fixedwidth_spec';
Redshift COPY command Example to Load Fixed-width File
Below is the example of loading fixed-width file using COPY command:
Create stage table:
create table sample_test_stage ( col1 varchar(6), col2 varchar(4), col3 varchar(11), col4 varchar(12), col5 varchar(10), col6 varchar(8));
Execute Redshift COPY command:
Execute copy command soon after table is available in Redshift database or schema. You should have data file available in amazon S3 server and point COPY Command to that location. Below is the sample COPY command code.
training=# copy sample_test_stage from 's3://testbucket/Sample_fixed_width.txt' CREDENTIALS
training-# 'aws_access_key_id=xxxxxxxx; aws_secret_access_key=yyyyyyyyy' fixedwidth 'col1:6,col2:4, col3:11,col4:12,col5:10,col6:8';
INFO: Load into table 'sample_test_stage' completed, 17 record(s) loaded successfully.
COPY
training=# select * from sample_test_stage limit 2;
col1 | col2 | col3 | col4 | col5 | col6
--------+------+-------------+--------------+------------+---------
191675 | 01 | 01142013 | 2368183100 | OUNHQEX | XUFQONY
191673 | 01 | 01142013 | 2632703881 | PAHFSAP | LUVIKXZ
(2 rows)
training=#
Load fixed-width file in single column table and use substring to extract require fields
The other option is to load fixed-with file into single column table and then use Redshift substring string functions to extract required field and populate target table.
Read:
Create single column table:
Firstly, let us create table with single column with max length to store fixed-width data.
create table sample_test_single_col (single_col varchar(2000));
Load single column table using COPY command:
Next, execute copy command soon after single column table is available in Redshift database or schema. You should have data file available in amazon S3 server and point COPY Command to that location. Below is the sample COPY command code to load data to single column table in Redshift:
training=# copy sample_test_single_col from 's3://testbucket/Sample_fixed_width.txt' CREDENTIALS
training-# 'aws_access_key_id=xxxxxxx; aws_secret_access_key=yyyyyyyy' ;
INFO: Load into table 'sample_test_single_col' completed, 17 record(s) loaded successfully.
COPY
training=# select * from sample_test_single_col limit 2;
single_col
----------------------------------------------------
16524 01 10172012 3930621977 TXNPUES
191667 01 01142013 3714468136 GHAKASC QHJXDFM
(2 rows)
training=#
Use substring to extract fields and load target table:
You can use the redshift substring function to get required field from single column table.
select substring(single_col,1,6) as col1, substring(single_col,7,4) as col2, substring(single_col,11,10) as col3, substring(single_col,22,12) as col4, substring(single_col,33,10) as col5, substring(single_col,44,8) as col6 from sample_test_single_col;
Hope this helps 🙂