DWgeek.com

How Greenplum Hash Distribution works?

When you have a Distribution Key by Hash and the values in that column are unique, the data will spread evenly evenly across all segments in Greenplum system. The Greenplum system distributes the rows with same distribution value to the same segment. This is because the data values in the hash key use a hashing algorithm. How Hash Algorithm Works in Distributed systems? Data is stored based on selected field (s) which are used for distribution. When you have a Distribution Key by Hash the values of the Distribution Key…

Comments Off

August 22, 2016

Greenplum

Greenplum Table Distribution and Best Practices

Greenplum is a massive parallel processing data store, and data is distributed across segments as per the definition of the distribution strategy. Greenplum Table Distribution uses the two types of distribution, Hash and Random. When you create or alter tables you will have to tell the system which distribution it should use. By default, Greenplum database data distribution uses the hash algorithm. Types of Greenplum Data Distribution Greenplum database distributes data using two methods Column Oriented/Hash Distribution: Distributes data evenly across all segment using the column specified in DISTRIBUTED BY…

Comments Off

August 22, 2016

Greenplum

Greenplum Constraints:Table and Column Constraints

Greenplum Constraints are used to apply business rules for the database tables. You can define constraints on columns and tables to restrict the data in your tables. Greenplum Database support for constraints is the same as PostgreSQL with some limitations. Read: Greenplum Sequence and its Usage Greenplum Data Loading Options Greenplum constraints includes: CHECK NOT NULL UNIQUE PRIMARY KEY FOREIGN KEY CHECK Greenplum Constraints and Example CHECK Greenplum Constraints allows you to specify that the value in a certain column must satisfy a Boolean expression. The boolean condition will evaluate to…

Comments Off

August 21, 2016

Greenplum

Access Greenplum Database with No Password Prompt

Users can access Greenplum database using a PostgreSQL-compatible psql client. Users can always connect to the Greenplum database via masters; the segments cannot accept any client connection. Segments can only store user data and process the query distributed by the masters. Couple of options available to set up connection with no password prompt. Read: Greenplum Architecture Greenplum Data Loading Options Option 1. Export Greenplum Database Environmental Variables In order to access Greenplum database with no password prompt, you need to set up some environmental variables. Environmental Variable Description PGHOST The…

Comments Off

August 20, 2016

Greenplum

Greenplum Sequence and its Usage

Like any other data warehouse appliances, Greenplum has sequences. Greenplum sequence is an auto number generator. These sequence then can be used in any SQL statements. Greenplum Sequence Overview CREATE SEQUENCE creates a new sequence number generator. This command willalso creates the special single-row table and initialize it. Sequence will be owned by the user creating it. Read: Greenplum Data Loading Sequence Also check: Greenplum Unloading Data Syntax: CREATE SEQUENCE name [Options] Following are the options associated with Greenplum sequence. [INCREMENT [BY] value] [MINVALUE minvalue | NO MINVALUE] [MAXVALUE maxvalue…

Comments Off

August 20, 2016

Greenplum

Greenplum Unloading Data Examples

Greenplum can supports unloading large amounts of data. It also supports single file, non-parallel export for small amounts of data. Greenplum unloading data is supported by various methods such as COPY TO command and create writable external tables. In this post you will learn how to unload data from Greenplum Database using parallel unload (writable external tables) and non-parallel unload (COPY) Read: Greenplum Data Loading Options How to Export Vertica Query Results into CSV? Create Writable External Tables Greenplum writable external table uses the Greenplum distributed file server, gpfdist to create file…

Comments Off

August 20, 2016

Greenplum

Greenplum Data Loading Options

Being a MPP server, Greenplum supports parallel data loading for large amounts of data. It also supports single file, non-parallel import for small amounts of data. Greenplum data loading is supported by various methods as follows. Read: Greenplum Architecture Data Loading Options Greenplum supports following tools for loading 1. Greenplum data Loading with gpload Command The gpload Greenplum data loading utility is an interface to external table parallel loading feature. gpload uses a load specification or layout defined in a YAML formatted control file to load data into the target table…

Comments Off

August 19, 2016

Greenplum

Greenplum Architecture

Like IBM Netezza and Amazon Redshift, Greenplum database is a massively parallel processing (MPP) database server. Greenplum architecture is designed to manage large scale data warehouse for analytics and business intelligence needs. Like any other large scale data warehouse appliances, Greenplum works well with Dimensional modeling. Read: Star Schema Model in Data Warehouse Step By Step Guide to Dimensional Modeling Greenplum Architecture Overview The MPP environment shared nothing architecture is made up of two or more processor that work together to perform tasks. Each processor has its own memory, operation…

Comments Off

August 19, 2016

Netezza

Netezza Analytic Functions

Netezza analytic functions compute an aggregate value that is based on a group of rows. A Netezza SQL analytic function works on the group of rows and ignores the NULL in the dat. Netezza server is basically an analytics system and provides many useful functions that can perform day to day aggregations. As these functions are native to Netezza server, hence use of these Netezza analytic functions improves performance of SQL queries as well as server performance. Frequently used Netezza analytical functions are as follows: COUNT Analytic Function Returns the number of…

Comments Off

August 17, 2016

Netezza

Collocated Joins in Netezza: Optimizing Query Performance

In Netezza, if two tables are distributed on same column then such tables are called collocated tables. If you join two collocated tables than each SPU in system works 100% independent to each other as the relevant rows required for joining these tables exist in the same SPU. These types of joins are called collocated joins. Collocated Joins Example Let’s take an examples, assume that we have two tables a patient and hospital, both are distributed on patient id. Netezza distributes the rows with same distribution key (column) to same…

2 Comments

August 16, 2016