Amazon Redshift is relatively new to relational databases. It is based on PostgreSQL, but being a columnar distributed database, it does not support all functions that are available in PostgreSQl. One of such requirement is split a delimited string into rows. In this article, we will check how to split delimited fields into table records or rows using Redshift built in function.
Not that, the method described in this article usually works when you have a fixed number of delimited fields in your input string. Say, you have 5 delimited values in your input field.
Redshift Split Delimited Fields into Records
Amazon Redshift provides basic regular expression functions support. It does not provide functions such as regexp_split_to_table or array functions.
Most importantly, Redshift does support SPLIT_PART function. Meanwhile, you can use this function to as an alternative to split your delimited input string into rows.
Redshift SPLIT_PART Function
Amazon Redshift split_part function is a string function which is used to split a string on the specified delimiter and returns the part at the specified position.
Redshift SPLIT_PART Function Syntax
Below is the syntax of redshift split_part function.
SPLIT_PART(string, delimiter, part)
where,
- string is a string to be split. The string can be CHAR or VARCHAR.
- delimiter is the delimiter string.
- part is the position of the portion to return (counting from 1).
Redshift SPLIT_PART Function Example
For example, split sample input string on comma and return first value.
select split_part('aaa,bbb,ccc,ddd,eee', ',', 1);
split_part
------------
aaa
(1 row)
Related Articles
Redshift Split Delimited Fields Example
We will use split_part string function to split delimited fields into rows or records. We will use Redshift WITH clause to generate the position number to extract the required values.
For example, split input string which contains five delimited records into table row.
with sample AS (
select 1 as n union all
select 2 union all
select 3 union all
select 4 union all
select 5
)
select
TRIM(SPLIT_PART('aaa,bbb,ccc,ddd,eee', ',', sample.n)) AS rows
from sample;
rows
------
aaa
bbb
ccc
ddd
eee
(5 rows)
Redshift strtok_split_to_table Alternative
The STRTOK_SPLIT_TO_TABLE function is available in Teradata. It is used to split a string into multiple rows based on a specified delimiter and return the output in a table format. This function can be useful in situations where you need to parse a string and extract specific information.
Following is the example of Teradata STRTOK_SPLIT_TO_TABLE function.
CREATE TABLE t (id integer,
str varchar(256)character set
unicode);
insert
into
t
values (1,
'Teradata,Oracle, Netezza, SQL Server, Hadoop');
SELECT
d.*
FROM
TABLE ( strtok_split_to_table(1,
t.str,
',')
RETURNS ( k integer,
tokennum integer ,
data varchar(20) character set unicode)) AS d;
|k |tokennum|data |
|---|--------|-----------|
|1 |1 |Teradata |
|1 |2 |Oracle |
|1 |3 | Netezza |
|1 |4 | SQL Server|
|1 |5 | Hadoop |
You can simply use the CTE to create a Teradata STRTOK_SPLIT_TO_TABLE function alternative in Amazon Redshift.
Following is the example and solution is borrowed from source.
with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
, generted_numbers AS
(
SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
)
, splitter AS
(
SELECT *
FROM generted_numbers
WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(str, '\\,') + 1)
FROM t)
)
, expanded_input AS
(
SELECT
1 as k,
row_number () over (order by null) as tokennum,
split_part(str, ',', s.gen_num) AS data
FROM t AS td
JOIN splitter AS s ON 1 = 1
WHERE split_part(str, ',', s.gen_num) <> ''
)
SELECT * FROM expanded_input
order by tokennum;
|k |tokennum|data |
|---|--------|-----------|
|1 |1 |Teradata |
|1 |2 | Netezza |
|1 |3 | SQL Server|
|1 |4 |Oracle |
|1 |5 | Hadoop |
Related Article
- Redshift WITH clause Syntax, usage and Examples
- Amazon Redshift Regular Expression Functions and Examples
Hope this helps 🙂