Redshift Split Delimited Fields into Table Records and Examples

Amazon Redshift is relatively new to relational databases. It is based on PostgreSQL, but being a columnar distributed database, it does not support all functions that are available in PostgreSQl. One of such requirement is split a delimited string into rows. In this article, we will check how to split delimited fields into table records or rows using Redshift built in function.

Not that, the method described in this article usually works when you have a fixed number of delimited fields in your input string. Say, you have 5 delimited values in your input field.

Redshift Split Delimited Fields into Records

Amazon Redshift provides basic regular expression functions support. It does not provide functions such as regexp_split_to_table or array functions.

Most importantly, Redshift does support SPLIT_PART function. Meanwhile, you can use this function to as an alternative to split your delimited input string into rows.

Redshift SPLIT_PART Function

Amazon Redshift split_part function is a string function which is used to split a string on the specified delimiter and returns the part at the specified position.

Redshift SPLIT_PART Function Syntax

Below is the syntax of redshift split_part function.

SPLIT_PART(string, delimiter, part)

where,

string is a string to be split. The string can be CHAR or VARCHAR.
delimiter is the delimiter string.
part is the position of the portion to return (counting from 1).

Redshift SPLIT_PART Function Example

For example, split sample input string on comma and return first value.

select split_part('aaa,bbb,ccc,ddd,eee', ',', 1);
 split_part
------------
 aaa
(1 row)

Commonly used Redshift String Functions and Example

Redshift Split Delimited Fields Example

We will use split_part string function to split delimited fields into rows or records. We will use Redshift WITH clause to generate the position number to extract the required values.

For example, split input string which contains five delimited records into table row.

with sample AS (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5
)
select
  TRIM(SPLIT_PART('aaa,bbb,ccc,ddd,eee', ',', sample.n)) AS rows
from sample;

 rows
------
 aaa
 bbb
 ccc
 ddd
 eee
(5 rows)

Redshift strtok_split_to_table Alternative

The STRTOK_SPLIT_TO_TABLE function is available in Teradata. It is used to split a string into multiple rows based on a specified delimiter and return the output in a table format. This function can be useful in situations where you need to parse a string and extract specific information.

Following is the example of Teradata STRTOK_SPLIT_TO_TABLE function.

CREATE TABLE t (id integer,
str varchar(256)character set
unicode);

insert
	into
	t
values (1,
'Teradata,Oracle, Netezza, SQL Server, Hadoop');

SELECT
	d.*
FROM
	TABLE ( strtok_split_to_table(1,
	t.str,
	',') 
RETURNS ( k integer,
	tokennum integer ,
	data varchar(20) character set 	unicode)) AS d;

|k  |tokennum|data       |
|---|--------|-----------|
|1  |1       |Teradata   |
|1  |2       |Oracle     |
|1  |3       | Netezza   |
|1  |4       | SQL Server|
|1  |5       | Hadoop    |

You can simply use the CTE to create a Teradata STRTOK_SPLIT_TO_TABLE function alternative in Amazon Redshift.

Following is the example and solution is borrowed from source.

with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
  , generted_numbers AS
(
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
    FROM ten_numbers AS t1
      JOIN ten_numbers AS t2 ON 1 = 1
      JOIN ten_numbers AS t3 ON 1 = 1
      JOIN ten_numbers AS t4 ON 1 = 1
)
  , splitter AS
(
    SELECT *
    FROM generted_numbers
    WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(str, '\\,') + 1)
                                 FROM t)
)
  , expanded_input AS
(
    SELECT
      1 as k,
      row_number () over (order by null) as tokennum,
      split_part(str, ',', s.gen_num) AS data
    FROM t AS td
      JOIN splitter AS s ON 1 = 1
    WHERE split_part(str, ',', s.gen_num) <> ''
)
SELECT * FROM expanded_input
order by tokennum;

|k  |tokennum|data       |
|---|--------|-----------|
|1  |1       |Teradata   |
|1  |2       | Netezza   |
|1  |3       | SQL Server|
|1  |4       |Oracle     |
|1  |5       | Hadoop    |

Hope this helps 🙂