Greenplum Architecture

  • Post last modified: February 28, 2018
  • Post category: Greenplum

Like IBM Netezza and Amazon Redshift, Greenplum Database is a massively parallel processing (MPP) database server. Its architecture is designed to manage large-scale data warehouses for analytics and business intelligence workloads. Like other large-scale data warehouse systems, Greenplum works well with dimensional modeling.

Greenplum Architecture Overview

The MPP shared-nothing architecture is made up of two or more processors that work together to perform tasks. Each processor has its own memory, operating system, and disks. These systems are interconnected through a Gigabit Ethernet switch. Greenplum uses this high-performance architecture to distribute the workload of multi-terabyte data warehouses and to process queries in parallel across all of the system's resources.

Greenplum Architecture Diagram

The Greenplum architecture consists of the master host, the segment hosts, and a Gigabit Ethernet switch. The master coordinates its work with the other database instances in the system, the segments, which store and process the data. External tools such as query workbenches and ETL tools connect to the master host through ODBC or JDBC connections.

Greenplum Master

The Greenplum master is the entry point to the database system. It accepts client connections and SQL queries over JDBC or ODBC and distributes work to the segment instances. A standby master host provides high availability.

A set of system tables containing metadata about the Greenplum Database system resides on the master. The master holds only metadata; user data resides on the segments.

The master authenticates client connections, parses and processes incoming SQL commands, distributes the workload among the segments, coordinates the results returned by each segment, and presents the final results to the client program.
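
The dispatch-and-combine role described above can be illustrated with a small simulation. This is only a sketch, not Greenplum code: the `SEGMENTS` data, `segment_partial_sum`, and `master_run_query` are hypothetical names, and threads stand in for separate segment hosts. It mimics a query that sums an amount per region, where each segment sees only its own rows and the master merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "segments": each holds only its own slice of the rows.
SEGMENTS = [
    [("east", 120), ("west", 80)],
    [("east", 30), ("north", 50)],
    [("west", 20), ("north", 10)],
]


def segment_partial_sum(rows):
    """Work done on one segment: aggregate only the locally stored rows."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def master_run_query(segments):
    """Master role: dispatch the work, then merge the partial results."""
    # Each segment computes its partial aggregate concurrently.
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_partial_sum, segments))

    # The master combines the partial aggregates into the final answer.
    final = {}
    for partial in partials:
        for region, amount in partial.items():
            final[region] = final.get(region, 0) + amount
    return final


result = master_run_query(SEGMENTS)
print(result)
```

Note that the master in this sketch never touches the raw rows; it only sees per-segment totals, which mirrors the split between metadata and coordination on the master and user data on the segments.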

Greenplum Segments

Greenplum Database segment instances are independent PostgreSQL databases; each stores a portion of the data and performs the majority of the query processing in the system.
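
One way to picture how each segment ends up with a portion of the data is hash distribution on a key column. The sketch below assumes a hash-based placement policy and a hypothetical four-segment system; `segment_for` and `distribute` are illustrative names, and CRC32 is only a stand-in for whatever hash function the database actually applies.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical segment count, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index.

    CRC32 is a deterministic stand-in; it is not Greenplum's hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_index=0, num_segments=NUM_SEGMENTS):
    """Group rows by the segment that would store them."""
    placement = defaultdict(list)
    for row in rows:
        placement[segment_for(row[key_index], num_segments)].append(row)
    return placement


# Twenty hypothetical order rows keyed by customer id.
rows = [(customer_id, f"order-{customer_id}") for customer_id in range(1, 21)]
placement = distribute(rows)
for seg in sorted(placement):
    print(f"segment {seg}: {len(placement[seg])} rows")
```

Because the mapping is deterministic, rows with the same key value always land on the same segment, and a key with many distinct values tends to spread rows fairly evenly across segments.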

Segments run on servers called segment hosts. A segment host typically runs two to eight Greenplum segments, depending on its CPU cores, RAM, storage, network interfaces, and workload. When a user issues a query through the master, processes are created in each segment database to handle the work of that query; they perform the requested task and return the result to the master.

Greenplum Interconnect

In the Greenplum architecture, the interconnect is the networking layer. It covers both the inter-process communication between segments and the network infrastructure that this communication relies on. The interconnect uses a standard Gigabit Ethernet switch.

By default, the interconnect uses the User Datagram Protocol (UDP) to send messages over the network.
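
To make the UDP point concrete, the following sketch sends a single datagram between two sockets on the loopback interface using Python's standard `socket` module. It shows only generic UDP messaging, not the interconnect's actual wire protocol; the payload `tuple-batch-1` and the buffer size are made-up values.

```python
import socket

# Receiver: bind to an ephemeral port on loopback (port 0 lets the OS pick one).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # avoid blocking forever if the datagram is lost
address = receiver.getsockname()

# Sender: no connection setup is needed, the datagram is simply addressed and sent.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"tuple-batch-1", address)  # hypothetical payload

# Receive one datagram along with the address it came from.
payload, source = receiver.recvfrom(4096)
print(payload, "from", source)

sender.close()
receiver.close()
```

UDP on its own is connectionless and does not guarantee delivery or ordering, so any protocol built on it has to detect and handle lost or reordered messages itself.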