Steps to Connect HiveServer2 from Python using Hive JDBC Drivers

  • Post author:
  • Post last modified:October 24, 2018
  • Post category:BigData
  • Reading time:3 mins read

HiveServer2 has a JDBC driver and It supports both embedded and remote access to HiveServer2. Usually, remote HiveServer2 is recommended for production environment as it does not require direct metastore or HDFS access to be given to Hive users. In this article, we will check steps to Connect HiveServer2 from Python using Hive JDBC Drivers.

Steps to Connect HiveServer2 from Python using Hive JDBC Drivers

Hive JDBC driver is one of the widely used method to connect to HiveServer2. You can use the Hive JDBC with Python Jaydebeapi open source module.

Note that, this post concentrates only on connecting to HiveServer2 which has KERBEROS authentication enabled.

There are various other ways to connect to HiveServer2 from Python.

For instance,

Install Jaydebeapi

The JayDeBeApi module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 to that database.

You can install it using pip:

pip install Jaydebeapi
Set CLASSPATH to Driver Location

Hive JDBC driver is dependent on many other jars. You can download them or simply set Hadoop-client and Hive-client path to CLASSPATH shell environmental variable.

export CLASSPATH=$CLASSPATH:$(hadoop classpath):/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hive-client/lib/*:/usr/hdp/current/hadoop-client/client/*

Execute above command from your Linux edge node. If you are trying to execute form windows then you might want to set user specific environmental variables.

Note that, jaydebeapi module will not work if you don’t have all required jar files.

You can read on how to set CLASSPATH variable in my another post Set and Use Environment Variable inside Python Script

Create Kerberos Ticket

Before connecting to Hive server, you must create Kerberos ticket. You can use knit command along with keytab file to create ticket. Your local Hadoop admins can help you on this.

Connect HiveServer 2 using JDBC Driver

Now you are all set to connect to Hivesever2. Verify if ticket is already created using klist command.

Below is the code that you can use to connect HiveServer2 from Python using Hive JDBC Drivers:

import jaydebeapi

database='testtdb'
driver='org.apache.hive.jdbc.HiveDriver'
server='192.168.200.100'
principal='hive/example.domain.com@DOMAIN.COM.'
port=10000

# JDBC connection string
url=("jdbc:hive2://" + server + ":" + str(port)
+ "/"+ database +";principal=" + principal + ";")

#Connect to HiveServer2 
conn=jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver", url)
cursor = conn.cursor()

# Execute SQL query
sql="select * from item limit 10"
cursor.execute(sql)
results = cursor.fetchall()
print results

Below is the output:

$ python hive_jdbc_conn.py
18/09/21 12:41:25 INFO jdbc.Utils: Supplied authorities: 192.168.200.100:10000
18/09/21 12:41:25 INFO jdbc.Utils: Resolved authority: 192.168.200.100:10000
18/09/21 12:41:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[(1537513919L,)]

Hope this helps. Let me know if you find any better method 🙂