Greenplum Encryption Options and Best Practices

  • Post last modified: February 28, 2018
  • Post category: Greenplum

To reduce the risk of data breaches, companies are increasingly adding security and cryptographic functions to protect their data at rest. This applies to most big data appliances, such as Greenplum, Netezza, and Redshift. In this post we will see how Greenplum encryption works.


Greenplum supports data encryption at several levels:

  • Encryption of connections to the database
  • Encryption of data in transit
  • Encryption of data at rest

Database Connections Encryption

In Greenplum systems, connections between clients and the master database can be encrypted with SSL. This is enabled by setting the ssl server configuration parameter to on and editing the pg_hba.conf file so that the relevant client entries require SSL.

In addition, network communications between hosts in the Greenplum Database cluster can be encrypted using IPsec, which establishes an authenticated, encrypted VPN between every pair of hosts in the cluster.
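As a rough illustration of the settings described above, the following Python sketch builds the configuration lines involved. The `ssl = on` parameter, the `hostssl` record type in pg_hba.conf, and the libpq `sslmode=require` option are standard; the helper names, host name, CIDR range, and authentication method are assumptions made for this example only.

```python
def ssl_server_setting() -> str:
    """Return the postgresql.conf line that turns on SSL on the master."""
    return "ssl = on"


def hostssl_rule(database: str, user: str, cidr: str, method: str = "md5") -> str:
    """Build a pg_hba.conf record that matches only SSL-encrypted TCP connections.

    All argument values are placeholders chosen for illustration.
    """
    # Field order in pg_hba.conf: type, database, user, address, auth method.
    return f"hostssl  {database}  {user}  {cidr}  {method}"


def client_dsn(host: str, dbname: str, user: str, port: int = 5432) -> str:
    """Build a libpq-style connection string that refuses unencrypted sessions."""
    # sslmode=require makes the client fail instead of falling back to plaintext.
    return f"host={host} port={port} dbname={dbname} user={user} sslmode=require"


if __name__ == "__main__":
    print(ssl_server_setting())
    print(hostssl_rule("all", "all", "10.0.0.0/8"))
    print(client_dsn("mdw.example.com", "sales", "etl_user"))
```

After changing either file, the server configuration has to be reloaded or restarted for the change to take effect; the exact procedure depends on your Greenplum version.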

Encryption of Data in Transit

Greenplum supports SSL encryption of data in transit between the gpfdist server and the Greenplum Database segment hosts.

The gpfdists protocol is a secure version of gpfdist. It enables encrypted communication and secure identification of both the file server and the Greenplum Database, which protects against attacks such as eavesdropping and man-in-the-middle attacks. When setting up SSL, do not protect the private key with a passphrase: the server does not prompt for a passphrase for the private key, and loading data fails with an error if one is required.

There are two ways to use the gpfdists protocol:

  • Run gpfdist with the --ssl option and then use the gpfdists protocol in the LOCATION clause of a CREATE EXTERNAL TABLE statement.
  • Use a YAML control file with the SSL option set to true and then run gpload. Running gpload starts the gpfdist server with the --ssl option and then uses the gpfdists protocol. You must also provide the location of the certificates in the YAML file.
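The two approaches above can be sketched in Python as string builders. This is only an illustration under stated assumptions: the paths, host name, port, table definition, and helper names are hypothetical, and the YAML fragment shows only the SSL-related keys of a gpload SOURCE section, not a complete control file.

```python
import shlex


def gpfdist_command(data_dir: str, port: int, cert_dir: str) -> str:
    """Build the shell command that starts gpfdist with SSL enabled."""
    # --ssl takes the directory that holds the certificate and key files.
    args = ["gpfdist", "-d", data_dir, "-p", str(port), "--ssl", cert_dir]
    return " ".join(shlex.quote(a) for a in args)


def external_table_ddl(table: str, columns: str, host: str, port: int, filename: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement that reads over gpfdists."""
    # Note the scheme: gpfdists:// (with the trailing "s"), not gpfdist://.
    location = f"gpfdists://{host}:{port}/{filename}"
    return (
        f"CREATE EXTERNAL TABLE {table} ({columns})\n"
        f"LOCATION ('{location}')\n"
        "FORMAT 'TEXT' (DELIMITER '|');"
    )


def gpload_source_fragment(input_file: str, cert_dir: str) -> str:
    """Return the SSL-related part of a gpload control file (fragment only)."""
    return (
        "GPLOAD:\n"
        "   INPUT:\n"
        "    - SOURCE:\n"
        "         FILE:\n"
        f"           - {input_file}\n"
        "         SSL: true\n"
        f"         CERTIFICATES_PATH: {cert_dir}\n"
    )


if __name__ == "__main__":
    # Option 1: start gpfdist yourself, then point an external table at it.
    print(gpfdist_command("/var/load", 8081, "/home/gpadmin/certs"))
    print(external_table_ddl("ext_sales", "id int, amount numeric",
                             "etl1.example.com", 8081, "sales.txt"))
    # Option 2: let gpload start gpfdist with SSL from the control file.
    print(gpload_source_fragment("/var/load/sales.txt", "/home/gpadmin/certs"))
```

In both cases the certificate directory must contain a private key that is not passphrase-protected, as noted earlier.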

Encryption of Data at Rest

The encryption and decryption functions available in the pgcrypto package protect data at rest in the database. Encryption at the column level allows the database administrator to protect sensitive information, such as passwords, Social Security numbers, or credit card numbers. This adds an extra layer of security, because the encrypted data cannot be read by a user who does not have the encryption key.

The pgcrypto package is not installed by default with Greenplum Database; however, you can download the package and install it across your entire cluster.

pgcrypto provides built-in functions ranging from basic to advanced. MD5, SHA1, SHA224/256/384/512, AES, and many other algorithms are supported.
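To make column-level encryption more concrete, here is a minimal Python sketch that builds SQL statements around the pgcrypto functions `pgp_sym_encrypt`, `pgp_sym_decrypt`, and `digest`. The table and column names are hypothetical, the `%s` placeholders assume a driver that uses that parameter style (psycopg2 does), and the encrypted column is assumed to be of type bytea.

```python
def encrypt_insert_sql(table: str, column: str) -> str:
    """INSERT that encrypts the value with a symmetric (passphrase-based) key.

    Parameters bound at execution time: (plaintext, passphrase).
    """
    # Identifiers are interpolated directly, so they must come from trusted
    # code, never from user input. Values go through driver placeholders.
    return f"INSERT INTO {table} ({column}) VALUES (pgp_sym_encrypt(%s, %s));"


def decrypt_select_sql(table: str, column: str) -> str:
    """SELECT that decrypts the bytea column. Parameter: (passphrase,)."""
    return f"SELECT pgp_sym_decrypt({column}, %s) AS {column} FROM {table};"


def sha256_hex_sql(table: str, column: str) -> str:
    """SELECT a one-way SHA-256 hash of a column, hex-encoded for display."""
    # digest() returns bytea; encode(..., 'hex') turns it into readable text.
    return f"SELECT encode(digest({column}, 'sha256'), 'hex') FROM {table};"


if __name__ == "__main__":
    print(encrypt_insert_sql("customers", "ssn_enc"))
    print(decrypt_select_sql("customers", "ssn_enc"))
    print(sha256_hex_sql("customers", "email"))
```

Hashing with `digest` is one-way and suits values that only need to be compared, while `pgp_sym_encrypt` suits values that must be read back later with the key.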

Greenplum Encryption Best Practices

  • Encryption and decryption have a performance cost; encrypt data only when required
  • Always evaluate the performance impact before implementing any Greenplum encryption solution in a production system
  • Use SSL encryption for client connections to Greenplum Database whenever users connect over an insecure link
  • Prefer a symmetric scheme, which performs better than an asymmetric one
  • Use pgcrypto functions to encrypt data on disk
  • Use the gpfdists protocol whenever you perform ETL operations
