HiveServer2 has a JDBC driver and It supports both embedded and remote access to HiveServer2. Usually, remote HiveServer2 is recommended for production environment as it does not require direct metastore or HDFS access to be given to Hive users. In this article, we will check steps to Connect HiveServer2 from Python using Hive JDBC Drivers.
Steps to Connect HiveServer2 from Python using Hive JDBC Drivers
Hive JDBC driver is one of the widely used method to connect to HiveServer2. You can use the Hive JDBC with Python Jaydebeapi open source module.
Note that, this post concentrates only on connecting to HiveServer2 which has KERBEROS authentication enabled.
There are various other ways to connect to HiveServer2 from Python.
For instance,
- You can execute Hive Beeline JDBC string command from Python.
- You can connect to HiveServer2 using Python Pyhive package.
Install Jaydebeapi
The JayDeBeApi module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 to that database.
You can install it using pip:
pip install Jaydebeapi
Set CLASSPATH to Driver Location
Hive JDBC driver is dependent on many other jars. You can download them or simply set Hadoop-client and Hive-client path to CLASSPATH shell environmental variable.
export CLASSPATH=$CLASSPATH:$(hadoop classpath):/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hive-client/lib/*:/usr/hdp/current/hadoop-client/client/*
Execute above command from your Linux edge node. If you are trying to execute form windows then you might want to set user specific environmental variables.
Note that, jaydebeapi module will not work if you don’t have all required jar files.
You can read on how to set CLASSPATH variable in my another post Set and Use Environment Variable inside Python Script
Create Kerberos Ticket
Before connecting to Hive server, you must create Kerberos ticket. You can use knit command along with keytab file to create ticket. Your local Hadoop admins can help you on this.
Connect HiveServer 2 using JDBC Driver
Now you are all set to connect to Hivesever2. Verify if ticket is already created using klist command.
Below is the code that you can use to connect HiveServer2 from Python using Hive JDBC Drivers:
import jaydebeapi database='testtdb' driver='org.apache.hive.jdbc.HiveDriver' server='192.168.200.100' principal='hive/example.domain.com@DOMAIN.COM.' port=10000 # JDBC connection string url=("jdbc:hive2://" + server + ":" + str(port) + "/"+ database +";principal=" + principal + ";") #Connect to HiveServer2 conn=jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver", url) cursor = conn.cursor() # Execute SQL query sql="select * from item limit 10" cursor.execute(sql) results = cursor.fetchall() print results
Below is the output:
$ python hive_jdbc_conn.py 18/09/21 12:41:25 INFO jdbc.Utils: Supplied authorities: 192.168.200.100:10000 18/09/21 12:41:25 INFO jdbc.Utils: Resolved authority: 192.168.200.100:10000 18/09/21 12:41:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable [(1537513919L,)]
Hope this helps. Let me know if you find any better method 🙂