Connect Redshift using Python and Jdbc Driver- Example

Redshift is one of the widely used MPP data warehouse appliance on the AWS cloud. AWS Redshift is popular because of its flexible features and benefits. Redshift supports both JDBC and OBDC drivers. You can use those drivers from any programming language to connect to Redshift. In this article, we will check how to connect Redshift using Python and Jdbc driver with a working example.

Page Content

Introduction

Setup Redshift JDBC Driver Environment

Install Jaydebeapi Python Module

Set CLASSPATH to Driver Location

Redshift JDBC Connection String

Connect Redshift using JDBC Driver from Python

Conclusion

Introduction

Amazon Redshift is a powerful cloud data warehousing solution that enables users to store, manage, and analyze large amounts of data. Python is a popular programming language that is widely used for data analysis and management. By connecting Redshift to Python using JDBC driver and the jaydbapi library, users can leverage the strengths of both tools to perform sophisticated data analysis, visualization, and reporting.

Setup Redshift JDBC Driver Environment

Amazon Redshift offers drivers for tools that are compatible with either the JDBC 4.2 API, JDBC 4.1 API, or JDBC 4.0 API. You can use a JDBC connection to connect to your Amazon Redshift cluster from many third-party SQL client tools or from your favorite programming language that supports JDBC connection.

Download latest version of Redshift JDBC connector from official website (S3 location).

You can use any python module that supports JDBC connection. We will be using the Jaydebeapi module for simplicity. Jaydebeapi is one of the easiest module that I have been using.

Install Jaydebeapi Python Module

The JayDeBeApi module allows you to connect from Python code to various databases using Java JDBC drivers and a connection string. It provides a Python DB-API v2.0 to that database.

You can install it using pip:

# Python
pip install Jaydebeapi

Set CLASSPATH to Driver Location

If there are dependent jar files for any JDBC jar file, you have to provide path of those jar files in CLASSPATH shell or environment variable.

The Redshift provided JDBC driver does not have any dependent jars, you can directly refer this driver in your jaydebeapi module.

Alternatively, if you are on Linux operating system, you can export jar location to CLASSPATH shell variable and run your python program without needing to set jar the location in your module.

You can read on how to set CLASSPATH variable in my other post – Set and Use Environment Variable inside Python Script.

Redshift JDBC Connection String

Connection string differs from database to database. The Redshift jdbc connection string is slightly different from databases like Netezza, Vertica, etc.

Below is the JDBC connection string that you can use to connect Redshift database:

jdbc:redshift://host:port/database

For example;

jdbc:redshift://examplecluster.abc123xyz789.us-east-2.redshift.amazonaws.com:5439/dev

How to Connect Redshift using JDBC Driver from Python?

Now we have Redshift jdbc driver downloaded and a classpath variable or jar location is set. In this section, we will check how to connect Redshift using JDBC driver from Python program.

Note that, Redshift jdbc driver class name is “com.amazon.redshift.jdbc.Driver“

Here is the code that can help you:

import jaydebeapi as jay
import os

# Redshift Server Details
dsn_database = "dev"
dsn_hostname = "example-cluster.quertyuiop123456.us-east-2.redshift.amazonaws.com"
dsn_port = "5439"
dsn_uid = "Redshiftuser"
dsn_pwd = "Password@123"

# Redshift JDBC class name
jdbc_driver_name = "com.amazon.redshift.jdbc.Driver"

# Redshift JDBC driver path
jdbc_driver_loc = os.path.join(r'D:\Redshift\RedshiftJDBC42-no-awssdk-1.2.32.1056.jar')

# Sample SQL Query
sql_str = "select now()"

# JDBC connection string
connection_string='jdbc:redshift://'+ dsn_hostname+':'+dsn_port+'/'+dsn_database

url = '{0}:user={1}; password={2}'.format(connection_string, dsn_uid, dsn_pwd)
print("Connection String: " + connection_string)

# Establish JDBC connection
conn = jay.connect(jdbc_driver_name, connection_string, {'user': dsn_uid, 'password': dsn_pwd},
jars=jdbc_driver_loc)

curs = conn.cursor()
curs.execute(sql_str)
result = curs.fetchall()

print(result[0])

Here is the sample output:

Connection String: jdbc:redshift://example-cluster.quertyuiop123456.us-east-2.redshift.amazonaws.com:5439/dev
('2019-08-21 23:45:31.415730',)

Conclusion

In conclusion, connecting to Amazon Redshift using Python and JDBC driver is a efficient way to query the Redshift data warehouse. By following the steps outlined in this post, you can easily establish a connection, execute SQL queries on AWS Redshift. The combination of Python and JDBC driver provides a flexible and customizable solution for data analysis and management, allowing you to optimize your workflow and streamline your data pipeline. We hope this post has provided you with a solid foundation for getting started with Redshift and Python, and encourages you to explore further and discover the full potential of these powerful tools.

Hope this helps 🙂

This Post Has 2 Comments

John Doe September 11, 2019

NO JayDeBeApi3 found whenever i do a pip3 install JayDeBeApi3

Collecting JayDeBeApi3
ERROR: Could not find a version that satisfies the requirement JayDeBeApi3 (from versions: none)
ERROR: No matching distribution found for JayDeBeApi3
1. Vithal S September 12, 2019
  
  Hi,
  
  Thanks for stopping by.
  
  Looks like they removed Python3 version and maintaining single copy.
  
  pip install Jaydebeapi
  
  Should work.
  
  Thanks,

Comments are closed.