Snowflake Architecture – Cloud Data Warehouse

  • Post last modified:December 14, 2019
  • Post category:Snowflake

Snowflake is an analytic data warehouse delivered in the cloud as Software-as-a-Service (SaaS). Snowflake describes it as faster and easier to use than traditional data warehouse offerings. The Snowflake database supports ANSI SQL with added functionality. In this article, we will look at the Snowflake architecture and how it differs from other relational databases.

Snowflake Architecture

Snowflake runs on public cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. It uses virtual compute instances for its compute needs and a storage service for persistent storage of data.

According to the official documentation, Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Unlike traditional single-cluster shared-disk or shared-nothing systems (such as Netezza), Snowflake has a multi-cluster, shared data architecture that is dynamic and highly scalable. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters, where each node in the cluster stores and processes a portion of the entire data set locally.
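The hybrid idea can be illustrated with a small toy model. This is only a sketch: the class names `SharedStorage` and `ComputeCluster` and the cluster names are invented for illustration, and the "nodes" are simulated sequentially rather than run in parallel. It shows two independent clusters reading the same central data while each one splits the work across its own nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Central repository: every cluster reads the same persisted rows."""
    rows: list = field(default_factory=list)


@dataclass
class ComputeCluster:
    """Hypothetical MPP cluster: each node works on its own slice of the data."""
    name: str
    node_count: int

    def parallel_sum(self, storage: SharedStorage) -> int:
        # Split the shared rows round-robin across nodes (the shared-nothing part).
        slices = [storage.rows[i::self.node_count] for i in range(self.node_count)]
        # Each node computes a partial result; the partials are then combined.
        partials = [sum(node_rows) for node_rows in slices]
        return sum(partials)


storage = SharedStorage(rows=list(range(1, 101)))
etl = ComputeCluster("etl", node_count=4)
reporting = ComputeCluster("reporting", node_count=2)

# Both clusters see the same data, although they divide the work differently.
print(etl.parallel_sum(storage), reporting.parallel_sum(storage))
```

Both clusters return the same total because they read one shared copy of the data; only the way the work is divided differs.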

Figure: Snowflake Architecture - Cloud Data Warehouse (image source: Snowflake official website)

Snowflake Architecture Components

The Snowflake data warehouse consists of three main layers:

  • Database Storage Layer
  • Query Processing Engine
  • Cloud Services

Now let us look at these three layers in brief.

Database Storage Layer

Similar to Amazon Redshift, Snowflake stores data in columnar form, and its storage is append-only. When you load data, Snowflake reorganizes it into its internal optimized, compressed format.

Snowflake handles the data organization, file size, structure, compression, metadata, and statistics.
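To make the idea of columnar organization plus metadata and statistics more concrete, here is a minimal sketch. It is not Snowflake's actual file format; the function names and the sample rows are assumptions made for illustration. It pivots rows into columns and keeps a minimum and maximum per column, which is the kind of statistic that lets an engine skip data that cannot match a filter.

```python
def to_columnar(rows):
    """Pivot row dictionaries into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def column_stats(columns):
    """Record (min, max) per column, the kind of metadata a planner can use."""
    return {name: (min(values), max(values)) for name, values in columns.items()}


def may_contain(stats, column, value):
    """Return False when the statistics prove the value cannot be in this chunk."""
    low, high = stats[column]
    return low <= value <= high


# Illustrative sample data; every row is assumed to have the same keys.
rows = [
    {"order_id": 1, "amount": 20},
    {"order_id": 2, "amount": 35},
    {"order_id": 3, "amount": 15},
]
columns = to_columnar(rows)
stats = column_stats(columns)
```

With these statistics, a filter such as `amount = 50` can be answered without reading the column at all, because 50 lies outside the recorded range.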

Snowflake places great importance on data security. It encrypts data as soon as you load it.

Query Processing Engine

Snowflake processes queries using virtual warehouses. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses.
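As a rough illustration, the sketch below builds the SQL that would create two separate virtual warehouses, one for loading and one for reporting. The warehouse names `LOAD_WH` and `REPORT_WH`, the sizes, and the helper function are assumptions chosen for the example; the code only assembles statement text and does not connect to Snowflake.

```python
def create_warehouse_sql(name: str, size: str = "XSMALL", auto_suspend_seconds: int = 60) -> str:
    """Build a CREATE WAREHOUSE statement for one independent compute cluster."""
    # Plain string formatting is fine for fixed, trusted names like these;
    # do not build SQL this way from untrusted input.
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"AUTO_RESUME = TRUE"
    )


# Hypothetical workloads, each given its own warehouse so they do not compete.
statements = [
    create_warehouse_sql("LOAD_WH", size="MEDIUM"),
    create_warehouse_sql("REPORT_WH"),
]
for statement in statements:
    print(statement)
```

Because each warehouse is its own cluster, a heavy load job on one should not slow down reports running on the other.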

Cloud Services

Snowflake uses a collection of cloud services to coordinate the activities that take place when you access the cloud data warehouse.

Cloud services include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control

Is Snowflake based on PostgreSQL?

No. Snowflake’s data warehouse is not built on an existing database or on Hadoop. It is built from scratch, with a unique architecture designed for the cloud. Snowflake SQL still feels familiar because it supports ANSI SQL with added functionality and features.

Snowflake Key Features

The following are the key features of Snowflake.

  • Support for standard and extended SQL (SQL:1999 and parts of SQL:2003)
  • Data replication
  • High availability of data
  • Supports ODBC and JDBC
  • Provides connectors and drivers for major languages and platforms, such as Python, Go, and Spark
  • Provides a wide range of tools (including SnowSQL) to access data
  • Data import and export support to migrate your existing data warehouse applications
  • Uses micro-partitions to securely and efficiently store customer data
  • Data protection
  • Multi-cluster and shared data
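Since Python support is on the list, here is a minimal sketch of querying Snowflake from Python. It assumes the `snowflake-connector-python` package is installed and that credentials are supplied through environment variables; the variable names `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, and `SNOWFLAKE_PASSWORD` and both helper functions are hypothetical choices for this example.

```python
import os


def connection_settings(env=os.environ):
    """Collect connection settings from environment variables (names are hypothetical)."""
    return {
        "account": env.get("SNOWFLAKE_ACCOUNT", ""),
        "user": env.get("SNOWFLAKE_USER", ""),
        "password": env.get("SNOWFLAKE_PASSWORD", ""),
    }


def current_version(settings):
    """Open a session, run one query, and return the server version string."""
    # Imported lazily so this module loads even where the connector is not installed.
    import snowflake.connector

    conn = snowflake.connector.connect(**settings)
    try:
        cur = conn.cursor()
        cur.execute("SELECT CURRENT_VERSION()")
        return cur.fetchone()[0]
    finally:
        # Always release the session, even if the query fails.
        conn.close()
```

Calling `current_version(connection_settings())` would need a real account, so treat it as a starting point rather than a tested recipe.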

Hope this helps 🙂