Greenplum Computed Column Support and Alternative

Derived or computed columns are columns that are derived from previously computed columns in the same table. These computed columns are virtual columns that are not physically stored in the table; their values are recalculated every time they are referenced in a query. Many relational databases, such as Netezza, support derived or computed columns, but Greenplum does not support them as of now. Development work is under way to include this feature in an upcoming PostgreSQL release. In this article, we will check Greenplum Computed Column Support and alternative…


Greenplum Analyze and Examples

The most important prerequisite for good query performance is to collect table statistics from time to time using the Greenplum ANALYZE command. ANALYZE collects statistics about the contents of tables in the database and stores the results in the system catalog table pg_statistic. Greenplum Database uses these statistics to determine the best execution plan for queries. Syntax: ANALYZE [VERBOSE] [ROOTPARTITION [ALL] ] [table [ (column [, ...] ) ]] Where: ROOTPARTITION [ALL]: Collects statistics only on the root partition of partitioned tables. VERBOSE: Enables display of progress messages. Table:…


Greenplum Encryption Options and Best Practices

To minimize data breaches, companies nowadays are increasingly adding security and cryptographic functions to protect their data at rest. This applies to most big data appliances, such as Greenplum, Netezza, Redshift, etc. In this post, we will see how Greenplum encryption works. Greenplum supports data encryption at various levels: encrypting the connections to the database, encryption of data in transit, and encryption of data at rest. Database Connections Encryption: In Greenplum systems, connections between clients and the master database can be encrypted with SSL. This…


Built-in Greenplum Analytics Functions and Examples

Window functions, or Greenplum analytics functions, compute an aggregated value that is based on a group of rows. These functions allow application developers to more easily write complex online analytical processing (OLAP) queries using standard SQL commands. For example, with Greenplum analytics functions or window expressions, users can calculate moving averages or sums over various intervals, ranks of selected column values, etc. Read: Greenplum Computed Column Support and Alternative; Greenplum Architecture. Greenplum Analytic Functions Examples: Here are examples of some commonly used Greenplum analytics functions: COUNT Analytics functions…
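To make the idea of a moving average and a rank over a group of rows concrete, here is a minimal Python sketch of what such window computations return. It is only an illustration of the arithmetic, not Greenplum code; the function names `moving_average` and `rank_desc` are made up for this example.

```python
def moving_average(values, window):
    """Average of the current value and up to (window - 1) preceding values."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result


def rank_desc(values):
    """Rank with gaps, highest value first: 1 + number of strictly larger values."""
    return [1 + sum(1 for other in values if other > v) for v in values]


if __name__ == "__main__":
    sales = [10, 20, 30, 40]
    print(moving_average(sales, 2))
    print(rank_desc([50, 30, 50, 10]))
```

Unlike a plain aggregate, each input row keeps its own output row: the result lists have the same length as the input.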


Greenplum Skew and How to Avoid it

Greenplum is an MPP shared-nothing environment. Data is spread across many segments located on multiple segment hosts. If the data is distributed properly, no two segments in the system hold the same data. How evenly the data is distributed is determined by the column(s) provided in the DISTRIBUTED BY clause. Greenplum skew is the situation in which a table's rows are unevenly spread across segments, which degrades performance. The system distributes rows with the same distribution value to the same segment. Hence, the more unique values in the distribution column, the better. In case the data…
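The effect of distribution-column cardinality on skew can be sketched with a small Python simulation. This is a toy model under stated assumptions: MD5 modulo the segment count stands in for the real hash (it is not Greenplum's algorithm), and `segment_of` and `skew_ratio` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def segment_of(value, num_segments):
    """Toy stand-in for the distribution hash (assumption: MD5 modulo segments)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def skew_ratio(values, num_segments):
    """Rows on the fullest segment divided by the average; 1.0 means perfectly even."""
    counts = Counter(segment_of(v, num_segments) for v in values)
    average = len(values) / num_segments
    return max(counts.values()) / average


if __name__ == "__main__":
    unique_ids = list(range(8000))   # high-cardinality distribution column
    gender = ["M", "F"] * 4000       # only two distinct values
    print("unique ids:", round(skew_ratio(unique_ids, 8), 2))
    print("two values:", round(skew_ratio(gender, 8), 2))
```

With only two distinct values, at most two of the eight segments receive any rows, so the fullest segment holds at least four times the average load.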


Greenplum Interview Questions and Answers – Part1

Explain Greenplum Architecture. Read Post: Greenplum Architecture. How is data distributed using a hash algorithm? Read Post: How Greenplum Hash Distribution Works. What are the different ways to get data into a Greenplum data warehouse? COPY FROM, gpload, INSERT statement, CREATE EXTERNAL TABLE. Explain how data is stored in Greenplum. Data is stored based on the selected field(s) used for distribution. When you have a Distribution Key by Hash, the values of the Distribution Key are run through a hash formula. Then, a map is used to distribute the row to the…


How Greenplum Hash Distribution works?

When you have a Distribution Key by Hash and the values in that column are unique, the data will spread evenly across all segments in the Greenplum system. The Greenplum system distributes rows with the same distribution value to the same segment. This is because the values of the hash key are run through a hashing algorithm. How Does the Hash Algorithm Work in Distributed Systems? Data is stored based on the selected field(s) used for distribution. When you have a Distribution Key by Hash the values of the Distribution Key…
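The two properties described here (equal key values always land on the same segment, and unique keys spread roughly evenly) can be sketched in a few lines of Python. The hash used below, MD5 modulo the number of segments, is an assumption chosen for illustration and is not Greenplum's actual hash function; `segment_for` and `rows_per_segment` are hypothetical names.

```python
import hashlib
from collections import Counter


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index (illustrative hash only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def rows_per_segment(keys, num_segments):
    """Count how many rows each segment would receive for the given key values."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(s, 0) for s in range(num_segments)]


if __name__ == "__main__":
    # The same key value is always routed to the same segment.
    print(segment_for(42, 4) == segment_for(42, 4))
    # Unique keys spread close to evenly across the four segments.
    print(rows_per_segment(range(10000), 4))
```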


Greenplum Table Distribution and Best Practices

Greenplum is a massively parallel processing data store, and data is distributed across segments according to the chosen distribution strategy. Greenplum Table Distribution uses two types of distribution, Hash and Random. When you create or alter tables, you have to tell the system which distribution it should use. By default, Greenplum database data distribution uses the hash algorithm. Types of Greenplum Data Distribution: Greenplum database distributes data using two methods. Column Oriented/Hash Distribution: Distributes data evenly across all segments using the column specified in DISTRIBUTED BY…


Greenplum Constraints: Table and Column Constraints

Greenplum constraints are used to apply business rules to database tables. You can define constraints on columns and tables to restrict the data in your tables. Greenplum Database support for constraints is the same as PostgreSQL's, with some limitations. Read: Greenplum Sequence and its Usage; Greenplum Data Loading Options. Greenplum constraints include: CHECK, NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY. CHECK Greenplum Constraints and Example: A CHECK constraint allows you to specify that the value in a certain column must satisfy a Boolean expression. The Boolean condition will evaluate to…


Access Greenplum Database with No Password Prompt

Users can access a Greenplum database using a PostgreSQL-compatible psql client. Users always connect to the Greenplum database through the master; the segments cannot accept client connections. Segments only store user data and process the queries distributed by the master. A couple of options are available to set up a connection with no password prompt. Read: Greenplum Architecture; Greenplum Data Loading Options. Option 1. Export Greenplum Database Environment Variables: In order to access a Greenplum database with no password prompt, you need to set up some environment variables. Environment Variable / Description: PGHOST The…
