Amazon Redshift Architecture and its Components

  • Post last modified:February 6, 2023
  • Post category:Redshift

Amazon Redshift uses a shared-nothing, massively parallel processing (MPP) architecture, similar to other MPP data warehouse systems such as Netezza, Microsoft PDW, and Greenplum. It integrates the database, processing (CPU), and storage in a single system. The diagram below depicts the Amazon Redshift architecture:

Amazon Redshift Architecture

SQL Client applications

You can use many SQL client applications to connect to an Amazon Redshift database and query the data stored in it:

  1. SQL Workbench/J: a free, open-source SQL client application for Windows, macOS and Linux.
  2. pgAdmin and psql: open-source clients for Windows, macOS and Linux, designed for use with PostgreSQL databases. Amazon Redshift is based on PostgreSQL 8.x, so you can use psql to connect.
  3. Amazon QuickSight: a cloud-based data visualization tool provided by AWS that can connect to Redshift and other data sources.
  4. Other SQL clients: any application that supports JDBC or ODBC connections.
  5. ETL/ELT tools: Amazon Redshift supports ETL/ELT tools such as Informatica, Talend, dbt, etc.
  6. Reporting tools: Amazon Redshift supports leading reporting and data visualization tools such as Tableau, QlikView, etc.

Amazon Redshift works with a variety of tools, including ETL/ELT tools and business intelligence tools, to load and analyze data. Because Amazon Redshift is based on industry-standard PostgreSQL, most SQL client applications work with it.


JDBC/ODBC Connections

Amazon Redshift communicates with client applications by using industry-standard PostgreSQL JDBC and ODBC drivers.

To connect to Amazon Redshift using JDBC or ODBC, you need to:

  1. Download and install a JDBC/ODBC driver.
  2. Obtain the endpoint and port number from your Redshift cluster.
  3. Configure the SQL client or programming environment to use the JDBC/ODBC driver and connect to Redshift using the endpoint and port number.
  4. Provide your Redshift cluster credentials, such as the database name, user name and password, to establish a connection.

With a JDBC/ODBC connection, you can query data stored in Redshift, run SQL commands, and perform other database operations using the SQL client or programming environment of your choice.
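As a minimal sketch of steps 2 and 3, the snippet below assembles a JDBC URL from an endpoint, port and database name. The `RedshiftTarget` class, the endpoint host name and the `dev` database are hypothetical placeholders, not values from a real cluster; the `jdbc:redshift://endpoint:port/database` form and the default port 5439 are assumed here, so check them against your driver version and cluster settings.

```python
from dataclasses import dataclass

# Assumed default port for Redshift; your cluster may use a different one.
DEFAULT_REDSHIFT_PORT = 5439


@dataclass(frozen=True)
class RedshiftTarget:
    """Hypothetical helper that holds the values needed to reach a cluster."""

    endpoint: str  # cluster endpoint host name, copied from the console
    database: str  # database name inside the cluster
    port: int = DEFAULT_REDSHIFT_PORT

    def jdbc_url(self) -> str:
        """Build a URL in the jdbc:redshift://endpoint:port/database form."""
        return f"jdbc:redshift://{self.endpoint}:{self.port}/{self.database}"


# Placeholder endpoint and database name, for illustration only.
target = RedshiftTarget(
    endpoint="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
)

# The user name and password are supplied to the SQL client separately.
print(target.jdbc_url())
```

You would paste the printed URL into the connection profile of your SQL client and enter the user name and password there.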

Cluster System

The core component of the Amazon Redshift architecture is the cluster: a collection of one or more compute nodes that work together to store and manage data. Each node in a Redshift cluster has its own CPU, memory, and disk storage. The nodes are connected by high-speed networks to provide fast and efficient data retrieval.

A Redshift cluster is responsible for storing and managing data, running SQL queries, and performing other data operations. You can scale a cluster up or down as needed, which allows you to adjust the performance and capacity of your data warehouse to meet your compute needs.

Client applications communicate only with the leader node.

Leader Node

In the Amazon Redshift architecture, the leader node is a special node that manages all communication with client applications and all communication with the compute nodes.

The leader node is responsible for coordinating the execution of SQL queries on behalf of the client application. When a client application submits a query to Amazon Redshift, the leader node parses it and generates an optimal query plan. The leader node then distributes the query plan to the compute nodes in the cluster, which execute the query and return their results to the leader node. Finally, the leader node combines the results from all the compute nodes and returns the final result to the client application.

The leader node distributes SQL statements to the compute nodes only when a query references tables that are stored on the compute nodes. All other queries run on the leader node alone.
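The distribute-and-combine flow described above can be illustrated with a toy model. This is only a sketch of the general scatter-gather pattern, not Redshift's actual implementation: the `NODE_SLICES` data, the function names and the use of a thread pool to stand in for compute nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data: each inner list is the slice of a sales table held by one
# "compute node". Values are (region, amount) pairs invented for the example.
NODE_SLICES = [
    [("east", 100), ("west", 50)],
    [("east", 25), ("north", 10)],
    [("west", 75)],
]


def compute_node_partial_sum(rows):
    """Runs on one compute node: aggregate only the locally stored rows."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def leader_node_query(node_slices):
    """Leader: send the same step to every node, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(compute_node_partial_sum, node_slices))

    final = {}
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] = final.get(region, 0) + subtotal
    return final


result = leader_node_query(NODE_SLICES)
print(sorted(result.items()))  # [('east', 125), ('north', 10), ('west', 125)]
```

Each node works only on its own rows, and the leader never scans the table itself; it only merges the small partial results.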


Compute Node

In Amazon Redshift, a compute node is a node in the cluster that stores and processes data. Each compute node in a Redshift cluster has its own CPU, memory, and disk storage.

Each compute node in Redshift cluster stores a portion of the data, allowing Redshift to scale horizontally and process large amounts of data in parallel. The data is stored in columnar format, which allows Redshift to retrieve only the data that is needed for a specific query, reducing the amount of I/O and processing required.
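To see why a columnar layout reduces I/O, the sketch below stores the same three rows in a row-oriented and a column-oriented layout and counts how many values each layout touches to total one column. The table, column names and counting functions are hypothetical, and real columnar storage adds compression and block-level metadata that this example leaves out.

```python
# The same three rows, invented for the example.
ROWS = [
    {"order_id": 1, "region": "east", "amount": 100},
    {"order_id": 2, "region": "west", "amount": 50},
    {"order_id": 3, "region": "east", "amount": 25},
]

# Column-oriented layout: one list of values per column.
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def total_amount_row_store(rows):
    """Row layout: every field of every row is read to total one column."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk together
        total += row["amount"]
    return total, values_read


def total_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(ROWS))        # (175, 9)
print(total_amount_column_store(COLUMNS))  # (175, 3)
```

Both layouts return the same total, but the columnar one reads a third of the values because the query needs only one of the three columns.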

Compute nodes in the Redshift cluster work together with the leader node to store and process data. When a client application sends a query to Redshift, the leader node parses it, generates an optimal query execution plan and distributes it to the compute nodes. Each compute node then executes its part of the query independently of the other compute nodes and returns its results to the leader node. The leader node combines the results from all the compute nodes and returns them to the client application.

Because every compute node brings its own dedicated memory, CPU, and attached disk storage, you can increase both the compute capacity and the storage capacity of a cluster by adding nodes as and when required.
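The effect of adding nodes can be sketched with a simple hash-based placement of rows. This is an illustration only: the `node_for_key` function, the CRC32 hash and the `customer-N` keys are assumptions chosen for the example, and Redshift's own distribution logic is internal and may behave differently.

```python
import zlib
from collections import Counter


def node_for_key(key: str, node_count: int) -> int:
    """Pick a node for a row by hashing its key (hypothetical scheme)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def rows_per_node(keys, node_count):
    """Count how many rows land on each node for a given cluster size."""
    counts = Counter(node_for_key(key, node_count) for key in keys)
    return [counts.get(node, 0) for node in range(node_count)]


# 10,000 made-up customer keys standing in for table rows.
keys = [f"customer-{i}" for i in range(10_000)]

for node_count in (2, 4, 8):
    print(node_count, rows_per_node(keys, node_count))
```

With more nodes, each node holds a smaller share of the rows, so each one has less data to store and scan for a given query.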

Cluster Internal Network

Amazon Redshift uses high-bandwidth network connections and custom communication protocols to provide private, very high-speed network communication between the leader node and the compute nodes. This network carries data transfers between the leader node and the compute nodes, as well as inter-node transfers for query processing and other cluster operations. The compute nodes in the cluster run on a separate, isolated network that client applications never access directly.

The internal network in the cluster is designed to be fast and reliable, with low latency and high bandwidth, to ensure fast and efficient data transfers between the nodes. This allows Redshift to process large amounts of data in parallel, improving query performance and delivering fast results to clients.

Database

Amazon Redshift is a relational database management system (RDBMS), so it is compatible with other RDBMS applications. A cluster contains one or more databases. User data is stored on the compute nodes only. Your SQL client communicates with the leader node, which in turn coordinates query execution with the compute nodes.

Hope this helps 🙂