Greenplum Skew and How to Avoid it

Greenplum is a MPP shared nothing environment. Data is spread across the many segments located on the multiple segment hosts. If the data is distributed properly, no two segments in the system have same data. The even distribution of the data is determined by the column(s) provided in the DISTRIBUTED BY clause. Greenplum skew is the table situation that degrade the performance. System distributes the rows with same distribution values to same segment. Hence, the more the unique value in the distribution column, the better. In case if the data…

Continue ReadingGreenplum Skew and How to Avoid it
Comments Off on Greenplum Skew and How to Avoid it

How Greenplum Hash Distribution works?

When you have a Distribution Key by Hash and the values in that column are unique, the data will spread evenly evenly across all segments in Greenplum system. The Greenplum system distributes the rows with same distribution value to the same segment. This is because the data values in the hash key use a hashing algorithm. How Hash Algorithm Works in Distributed systems? Data is stored based on selected field (s) which are used for distribution. When you have a Distribution Key by Hash the values of the Distribution Key…

Continue ReadingHow Greenplum Hash Distribution works?
Comments Off on How Greenplum Hash Distribution works?