Greenplum Table Distribution and Best Practices

Greenplum is a massive parallel processing data store, and data is distributed across segments as per the definition of the distribution strategy. Greenplum Table Distribution uses the two types of distribution, Hash and Random. When you create or alter tables you will have to tell the system which distribution it should use. By default, Greenplum database data distribution uses the hash algorithm. Types of Greenplum Data Distribution Greenplum database distributes data using two methods Column Oriented/Hash Distribution: Distributes data evenly across all segment using the column specified in DISTRIBUTED BY…

Continue ReadingGreenplum Table Distribution and Best Practices
Comments Off on Greenplum Table Distribution and Best Practices

Greenplum Data Loading Options

Being a MPP server, Greenplum supports parallel data loading for large amounts of data. It also supports single file, non-parallel import for small amounts of data. Greenplum data loading is supported by various methods as follows. Read: Greenplum Architecture Data Loading Options Greenplum supports following tools for loading 1. Greenplum data Loading with gpload Command The gpload Greenplum data loading utility is an interface to external table parallel loading feature. gpload uses a load specification or layout defined in a YAML formatted control file to load data into the target table…

Continue ReadingGreenplum Data Loading Options
Comments Off on Greenplum Data Loading Options