Just like many other SQL procedural languages, the latest version of Redshift plpgsql supports cursor type variables. You can use a cursor to retrieve a few rows at a time from the result set of a larger query. You can use the cursor type inside Redshift stored procedures (SP). They are used in selections and FOR loops to hold one row from a SELECT operation. In this article, we will check how to create and use Redshift cursor variable inside your stored procedure.
Redshift Cursor Variable
As mentioned earlier, Redshift cursor variables are used to retrieve a few rows from the result set of larger query. When you want to loop through the rows of particular SELECT statement result, you can create a cursor and use that in FOR loop.
When the first row of a cursor is fetched using FETCH statement, the entire result set is materialized on the leader node, in memory or on disk, if needed.
Because of the potential negative performance impact of using cursors with large result sets, Redshift recommend using alternative approaches such as UNLOAD to export tables whenever possible.
Create Redshift Cursor
You can use DECLARE command to create cursor. DECLARE statement defines new cursor inside your query block.
Redshift Cursor Syntax
You can use below syntax to create cursor:
DECLARE cursor_name CURSOR FOR query
Where cursor_name is Name of the new cursor and query is a a SELECT statement that populates the cursor.
Cursor FETCH statement
The FETCH
statement is used to retrieve the rows using cursor. FETCH statement retrieves rows based on the current position within the cursor. When a cursor is created, it is positioned before the first row. After a FETCH, the cursor is positioned on the last row retrieved.
Cursor FETCH Syntax
Below is the fetch statement syntax.
FETCH [ NEXT | ALL | {FORWARD [ count | ALL ] } ] FROM cursor;
Where NEXT fetch next row from cursor. This option is by default. ALL option with FETCH returns all remaining rows. FORWARD return next count of rows or ALL. FORWARD 0 returns current row without increment counter.
Redshift Cursor Example
Below is the sample example that demonstrates the usage of cursor:
CREATE OR REPLACE PROCEDURE sample_cursor_test (rs_out INOUT refcursor)
AS $$
BEGIN
OPEN rs_out FOR SELECT * FROM test_table1;
END;
$$ LANGUAGE plpgsql;
Here is the output:
BEGIN;
CALL sample_cursor_test ('newcursor');
FETCH NEXT FROM newcursor;
dev=# BEGIN;
BEGIN
dev=# CALL sample_cursor_test ('newcursor');
rs_out
-----------
newcursor
(1 row)
dev=# FETCH NEXT FROM newcursor;
col1 | col2
------+------
1 | AA
(1 row)
dev=# CLOSE newcursor;
CLOSE CURSOR
Create Cursor outside Redshift Stored Procedure
You can create cursor outside stored procedure as well. Below is the example.
BEGIN;
DECLARE newCursor CURSOR FOR SELECT * FROM test_table1;
FETCH NEXT FROM newCursor;
CLOSE newCursor;
Here is the output:
dev=# BEGIN;
BEGIN
dev=# DECLARE newCursor CURSOR FOR SELECT * FROM test_table1;
DECLARE CURSOR
dev=# FETCH NEXT FROM newCursor;
col1 | col2
------+------
1 | AA
(1 row)
dev=# CLOSE newCursor;
CLOSE CURSOR
Cursor CLOSE statement
This is an optional statement you can use to close already opened cursor.
CLOSE cursor_name;
For examples,
dev=# CLOSE newcursor;
CLOSE CURSOR
Redshift Cursor FETCH into Variables
The FETCH
inside Redshift stored procedure retrieves the next row from the cursor into a target. This target can be a row variable, a record variable, or a comma-separated list of simple variables, just as with SELECT INTO.
The following example demonstrates the use of FETCH into a variables.
CREATE OR REPLACE PROCEDURE redshift_cursor_fetch_demo()
AS $$
DECLARE
var1 VARCHAR(6);
var2 VARCHAR(6);
newcount int default 0;
test_cursor cursor for
SELECT
id,
name
FROM
rs_cursor_test
where id = 5;
begin
OPEN test_cursor;
var1 := NULL;
var2 := NULL;
FETCH test_cursor INTO var1, var2;
IF var1 = 5 THEN
delete from rs_cursor_test where id = var1 ;
commit;
end if;
END;
$$
LANGUAGE plpgsql;
Redshift Cursor Performance
There are few performance considerations when using Amazon Redshift cursors.
- Using Redshift cursors on large data set: cursors materialize the entire result set on the leader node before beginning to return results to the client. So using the cursors with very large result sets can have a negative impact on performance.
- Uses more resources because each time you fetch a row from the cursor, it results in a network roundtrip.
Redshift Cursor Limitations
Below are some of the limitations that you should keep in mind when creating and using cursors:
- Only one cursor at a time can be open per session. You cannot open multiple cursors per session.
- Cursors must be used within a transaction blocks. i.e. it should be within BEGIN … END block. Cursors are usually created within stored procedures (SP). You can also create transaction block and work with cursors
- The maximum cumulative result set size for all cursors is constrained based on the cluster node type. Based on your cluster type values differs. You can verify the size on the official website.
Redshift Cursor vs Temp table
You can use Redshift cursors and temporary tables to manage data in Amazon Redshift.
Following are some of the differences between Amazon Redshift cluster and temporary tables:
- Performance: Cursors can be significantly slower than temporary tables when working with large amounts of data sets. Temporary tables, however, can be optimized for faster query performance.
- Small Datasets: Cursor can perform better on small data sets. However, you can use a temp table if your requirement is batch processing.
- Loop through Dataset: It is easy to loop through data set when you use cursors. When you use temporary tables, it can be difficult to loop thought records.
- Flexibility: Cursors offer more flexibility in terms of data manipulation as they allow for more granular control over the data being fetched. Temporary tables, on the other hand, provide more flexibility in terms of data analysis and aggregation.
- Return result set: It is always better to use temporary tables to return a result set.
- Set based approach: Cursors can be inefficient, especially with large data sets. It is better to use set-based processing with SQL statements.
Also Read
- Redshift Dynamic SQL Queries and Examples
- AWS Redshift Stored Procedure Return Result Set – Working Example
- Redshift RECORD Type, Usage and Examples
- Amazon Redshift SELECT INTO Variable – Example
Hope this helps 🙂