Greenplum Architecture Archives

Greenplum Analyze and Examples

One of the most important prerequisites for good query performance is collecting table statistics from time to time with the Greenplum ANALYZE command. ANALYZE collects statistics about the contents of tables in the database and stores the results in the system catalog table pg_statistic. Greenplum Database uses these statistics to determine the best execution plan for queries. Syntax: ANALYZE [VERBOSE] [ROOTPARTITION [ALL] ] [table [ (column [, ...] ) ]] Where: ROOTPARTITION [ALL]: Collect statistics only on the root partition of partitioned tables. VERBOSE: Enables display of progress messages. Table:…

September 5, 2016

Greenplum

Greenplum Encryption Options and Best Practices

To minimize data breaches, companies are increasingly adding security and cryptographic functions to their data at rest. This applies to most big data appliances, such as Greenplum, Netezza, and Redshift. In this post we will see how Greenplum encryption works. Greenplum supports data encryption at various levels: encrypting connections to the database, encryption of data in transit, and encryption of data at rest. Database Connections Encryption: In Greenplum systems, connections between clients and the master database can be encrypted with SSL. This…

September 1, 2016

Greenplum

Built-in Greenplum Analytics Functions and Examples

Window functions, or Greenplum analytics functions, compute an aggregated value based on a group of rows. These functions let application developers more easily write complex online analytical processing (OLAP) queries using standard SQL commands. For example, with Greenplum analytics functions or window expressions, users can calculate moving averages or sums over various intervals, rank rows by selected column values, etc. Read: Greenplum Computed Column Support and Alternative; Greenplum Architecture. Greenplum Analytic Functions Examples: Here are examples of some commonly used Greenplum analytics functions: COUNT Analytics functions…
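
As a language-neutral illustration of what a window expression computes, the sketch below reproduces a moving average and a rank in plain Python over a small, made-up sales list. The data and the helper names `moving_average` and `rank` are hypothetical; in Greenplum the same results would come from SQL window functions rather than application code.

```python
def moving_average(values, window=3):
    """Average of each value and up to window-1 preceding values (one result per row)."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result


def rank(values):
    """Rank with gaps: equal values share a rank, and the highest value is rank 1."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


daily_sales = [10, 20, 30, 40, 20]  # made-up sample data
averages = moving_average(daily_sales)
ranks = rank(daily_sales)
print(averages)
print(ranks)
```

Unlike a plain grouped aggregate, both helpers return one value per input row, which is the defining property of a window calculation.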

August 27, 2016

Greenplum

Greenplum Skew and How to Avoid it

Greenplum is an MPP shared-nothing environment. Data is spread across many segments located on multiple segment hosts. If the data is distributed properly, no two segments in the system hold the same data. How evenly the data is distributed is determined by the column(s) provided in the DISTRIBUTED BY clause. Greenplum skew is a condition in which a table's data is spread unevenly across segments, which degrades performance. The system distributes rows with the same distribution value to the same segment. Hence, the more unique values in the distribution column, the better. If the data…
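
To make the effect of a low-cardinality distribution column concrete, here is a small Python sketch that compares a key with many unique values against a key with only three distinct values. The SHA-256 placement, the segment count, and the helpers `rows_per_segment` and `skew_ratio` are illustrative assumptions, not Greenplum internals.

```python
import hashlib

NUM_SEGMENTS = 8  # hypothetical number of segments


def rows_per_segment(keys, num_segments=NUM_SEGMENTS):
    """Count how many rows each segment receives for the given key values."""
    counts = [0] * num_segments
    for key in keys:
        # SHA-256 stands in for the database's hash function (an assumption).
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % num_segments] += 1
    return counts


def skew_ratio(counts):
    """Largest segment's row count divided by the average; 1.0 is perfectly even."""
    average = sum(counts) / len(counts)
    return max(counts) / average


# Many unique values (like an order id) vs. three distinct values (like a status flag).
even = rows_per_segment(range(80_000))
skewed = rows_per_segment(("NEW", "PAID", "SHIPPED")[i % 3] for i in range(80_000))

print("unique key :", even, round(skew_ratio(even), 2))
print("status flag:", skewed, round(skew_ratio(skewed), 2))
```

With only three distinct key values, at most three of the eight segments can receive rows, so the remaining segments sit idle while the loaded ones do all the work.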

August 25, 2016

Greenplum

How Greenplum Hash Distribution Works

When you have a Distribution Key by Hash and the values in that column are unique, the data will spread evenly across all segments in the Greenplum system. The Greenplum system distributes rows with the same distribution value to the same segment, because the segment is chosen by applying a hashing algorithm to the distribution key value. How Hash Algorithm Works in Distributed Systems? Data is stored based on the selected field(s) used for distribution. When you have a Distribution Key by Hash the values of the Distribution Key…
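
As a rough illustration of the idea above, the following Python sketch assigns rows to segments by hashing the distribution key and taking the result modulo the number of segments. The hash function (MD5 here), the segment count, and the helper name `segment_for` are assumptions chosen for the example; Greenplum's internal hash function is different.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key value to a segment index.

    MD5 stands in for the database's internal hash function (an assumption).
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


# Unique key values: rows spread across all segments.
unique_counts = Counter(segment_for(k) for k in range(10_000))

# The same key value always lands on the same segment.
same_value_segments = {segment_for("customer_42") for _ in range(100)}

print(sorted(unique_counts.items()))
print(same_value_segments)
```

Because the mapping is deterministic, rows that share a key value are always co-located on one segment, while distinct values scatter roughly uniformly.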

August 22, 2016

Greenplum

Greenplum Table Distribution and Best Practices

Greenplum is a massively parallel processing data store, and data is distributed across segments according to each table's distribution strategy. Greenplum table distribution uses two types of distribution: hash and random. When you create or alter a table, you specify which distribution the system should use. By default, Greenplum Database uses hash distribution. Types of Greenplum Data Distribution: Greenplum Database distributes data using two methods. Column Oriented/Hash Distribution: Distributes data evenly across all segments using the column specified in DISTRIBUTED BY…

August 22, 2016

Greenplum

Greenplum Constraints: Table and Column Constraints

Greenplum constraints are used to apply business rules to database tables. You can define constraints on columns and tables to restrict the data in your tables. Greenplum Database support for constraints is the same as PostgreSQL, with some limitations. Read: Greenplum Sequence and its Usage; Greenplum Data Loading Options. Greenplum constraints include: CHECK, NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY. CHECK Greenplum Constraints and Example: CHECK constraints allow you to specify that the value in a certain column must satisfy a Boolean expression. The Boolean condition will evaluate to…

August 21, 2016

Greenplum

Access Greenplum Database with No Password Prompt

Users can access Greenplum Database using a PostgreSQL-compatible psql client. Users always connect to Greenplum Database through the master; the segments do not accept client connections. Segments only store user data and process the queries distributed by the master. A couple of options are available to set up a connection with no password prompt. Read: Greenplum Architecture; Greenplum Data Loading Options. Option 1. Export Greenplum Database Environment Variables: In order to access Greenplum Database with no password prompt, you need to set up some environment variables. Environment Variable Description PGHOST The…

August 20, 2016

Greenplum

Greenplum Sequence and its Usage

Like other data warehouse appliances, Greenplum has sequences. A Greenplum sequence is an automatic number generator. Sequences can then be used in SQL statements. Greenplum Sequence Overview: CREATE SEQUENCE creates a new sequence number generator. This command will also create a special single-row table and initialize it. The sequence will be owned by the user who creates it. Read: Greenplum Data Loading Sequence Also check: Greenplum Unloading Data Syntax: CREATE SEQUENCE name [Options] Following are the options associated with Greenplum sequence. [INCREMENT [BY] value] [MINVALUE minvalue | NO MINVALUE] [MAXVALUE maxvalue…
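
The role of the INCREMENT, MINVALUE and MAXVALUE options can be sketched with a toy generator in Python. This is only a conceptual model of an ascending sequence: the `Sequence` class, its defaults, and the error raised at the maximum are assumptions made for illustration, not Greenplum's implementation.

```python
class Sequence:
    """Toy model of an ascending sequence number generator (illustrative only)."""

    def __init__(self, increment=1, minvalue=1, maxvalue=2**63 - 1):
        self.increment = increment
        self.maxvalue = maxvalue
        self._next = minvalue  # assumed: the first value handed out is minvalue

    def nextval(self):
        """Return the next number, or fail once maxvalue has been passed."""
        if self._next > self.maxvalue:
            raise ValueError("sequence reached its maximum value")
        value = self._next
        self._next += self.increment
        return value


# Hypothetical sequence: start at 100, step by 5, stop after 110.
order_ids = Sequence(increment=5, minvalue=100, maxvalue=110)
first_three = [order_ids.nextval() for _ in range(3)]
print(first_three)
```

Each call hands out a new number and never repeats an earlier one, which is what makes a sequence suitable for generating surrogate keys.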

August 20, 2016

Greenplum

Greenplum Architecture

Like IBM Netezza and Amazon Redshift, Greenplum Database is a massively parallel processing (MPP) database server. The Greenplum architecture is designed to manage large-scale data warehouses for analytics and business intelligence needs. Like other large-scale data warehouse appliances, Greenplum works well with dimensional modeling. Read: Star Schema Model in Data Warehouse; Step By Step Guide to Dimensional Modeling. Greenplum Architecture Overview: The MPP shared-nothing architecture is made up of two or more processors that work together to perform tasks. Each processor has its own memory, operation…

August 19, 2016