Greenplum Unloading Data Examples

  • Post author:
  • Post last modified:May 7, 2019
  • Post category:Greenplum
  • Reading time:3 mins read

Greenplum can supports unloading large amounts of data. It also supports single file, non-parallel export for small amounts of data. Greenplum unloading data is supported by various methods such as COPY TO command and create writable external tables.

Greenplum Unloading Data

In this post you will learn how to unload data from Greenplum Database using parallel unload (writable external tables) and non-parallel unload (COPY)

Read:

Create Writable External Tables

Greenplum writable external table uses the Greenplum distributed file server, gpfdist to create file from database table. It can also make use of Greenplum Hadoop Distributed File System, gphdfs.

You can use the CREATE WRITABLE EXTERNAL TABLE command to define the external table and specify the location and format of the output files. You can unload the data in the form of delimited and fixed width files.

Syntax:

CREATE WRITABLE EXTERNAL TABLE unload_table 
 (LIKE table ) 
 LOCATION ('gpfdist://host-1:8081/unload_table1.out', 
 gpfdist://host-2:8081/unload_table2.out')
FORMAT 'TEXT' (DELIMITER ',')
DISTRIBUTED BY (dist_key);

Start gpfdist utility. Use the –d switch to set the “home” directory used to search for files in the particular directory. Use the –p to set the port and background the process.

For Example,

gpfdist -d ~/data/source_files -p 8081 > /tmp/gpfdist.log 2>&1 &

Then you can unload the table once all the above steps are completed. You can use below command to unload or create file.

INSERT INTO unload_table SELECT * FROM table;

Create Writable External Tables Example

Consider the example of unloading the d_airlines table. Below are the steps to unload table.

CREATE WRITABLE EXTERNAL TABLE d_airlines_unload
(LIKE d_airlines)
LOCATION ('gpfdist://localhost:8081/d_airlines_sample.csv')
FORMAT 'TEXT' (DELIMITER ',')
DISTRIBUTED BY (airlineid);

Start gpfdist greenplum file server

gpfdist -d ~/data/source_files -p 8081 > /tmp/gpfdist.log 2>&1 &

Finally, create flat file using below command

INSERT INTO d_airlines_unload SELECT * FROM d_airlines;

Greenplum Unloading Data Using COPY

COPY TO command copies data from a table to a flat file or standard input on the Greenplum master host using a single process on the Greenplum master instance. COPY TO command doesn’t make use of parallel process to unload table. Use COPY to output a table’s entire contents, or filter the output using a SELECT statement.

Syntax:

COPY (table) TO '/data/out_files/a_table.out';

Unloading Data Using COPY Example

Consider the example of unloading the d_airlines table using COPY TO command. Use below command.

COPY d_airlines TO '/data/out_files//d_airlines.out';

Other way is

COPY (select * from d_airlines) TO '/data/out_files/d_airlines.out';