Methods to Access Hive Tables from Python

  • Post last modified:November 16, 2018
  • Post category:BigData

Apache Hive is a database framework on top of the Hadoop Distributed File System (HDFS) for querying structured and semi-structured data. Just as in a regular RDBMS, you access HDFS files in the form of tables. You can create tables, views, and so on in Apache Hive. You can analyze structured data using HiveQL, a language similar to Structured Query Language (SQL). In this article, we will look at different methods to access Hive tables from a Python program. The methods discussed here will help you connect to Hive tables and get the data required for your analysis.

Methods to Access Hive Tables from Python

The following methods are commonly used to connect to Hive from a Python program:

  • Execute Beeline command from Python.
  • Connect to Hive using PyHive.
  • Connect to remote HiveServer2 using the Hive JDBC driver.

Now, let us look at these methods in detail.

Execute Beeline command from Python

Beeline is the latest command-line interface for connecting to Hive. You can use Beeline to connect to either an embedded (local) Hive or a remote Hive. The Beeline command also works well with Kerberos-authenticated Hive clusters.
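As a rough illustration of this approach, the sketch below calls the beeline executable through Python's subprocess module and parses its CSV output. It assumes that beeline is on the PATH and that the cluster accepts the given JDBC URL; the function names and the example URL are hypothetical and chosen only for this example.

```python
import csv
import io
import subprocess


def build_beeline_command(jdbc_url, query, username=None):
    """Build the argument list for one non-interactive Beeline query."""
    cmd = ["beeline", "-u", jdbc_url]
    if username:
        cmd += ["-n", username]
    # csv2 output is easy to parse; silent mode suppresses log chatter.
    cmd += ["--silent=true", "--outputformat=csv2", "-e", query]
    return cmd


def parse_csv_output(text):
    """Turn Beeline's CSV output into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def run_beeline_query(jdbc_url, query, username=None):
    """Run the query with Beeline; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_beeline_command(jdbc_url, query, username),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_csv_output(result.stdout)


# Example call (hypothetical host and database):
# rows = run_beeline_query("jdbc:hive2://localhost:10000/default", "SHOW TABLES")
```

With the default settings, the first row returned is the header line that Beeline prints before the data.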

Beeline command-line options, and the steps for executing Beeline commands from a Python program, are covered in separate posts.

Connect to Hive using PyHive

There are many Python packages available for connecting to a remote Hive. PyHive is one of the easiest to use, and it is well maintained and supported. PyHive was created mainly to connect to a remote HiveServer2.
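A minimal sketch of a PyHive query might look like the following. It assumes the third-party pyhive package (with its Thrift and SASL dependencies) is installed and that HiveServer2 listens on the default port 10000 without Kerberos; the helper names, host, and table are hypothetical.

```python
def rows_to_dicts(columns, rows):
    """Pair each row's values with the column names."""
    return [dict(zip(columns, row)) for row in rows]


def fetch_rows(host, query, port=10000, username=None, database="default"):
    """Run a query on a remote HiveServer2 and return rows as dictionaries."""
    # Imported here so the helper above works even without PyHive installed.
    from pyhive import hive

    conn = hive.Connection(
        host=host, port=port, username=username, database=database
    )
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        # cursor.description follows DB-API 2.0: the first item is the name.
        columns = [col[0] for col in cursor.description]
        return rows_to_dicts(columns, cursor.fetchall())
    finally:
        conn.close()


# Example call (hypothetical host and table):
# rows = fetch_rows("hive.example.com", "SELECT * FROM sales LIMIT 10")
```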

The steps for connecting to a remote HiveServer2 with PyHive are covered in a separate post.

Connect to Remote HiveServer2 using Hive JDBC Driver

HiveServer2 has a JDBC driver that supports both embedded and remote access to HiveServer2. Use the Python JayDeBeApi package to connect to a remote HiveServer2 from a Python program.

Note that there are two versions of JayDeBeApi available: JayDeBeApi for Python 2 and JayDeBeApi3 for Python 3.
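The JDBC route can be sketched as follows, assuming the third-party jaydebeapi package and a Java runtime are installed and that a Hive JDBC driver jar is available locally. The jar path, host, credentials, and function names are placeholders for illustration, not values from this post.

```python
HIVE_DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver"


def build_hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL such as jdbc:hive2://host:10000/default."""
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)


def fetch_all(host, query, user, password, driver_jar):
    """Run a query through the Hive JDBC driver and return all rows."""
    # Imported here so the URL helper works without JayDeBeApi installed.
    import jaydebeapi

    conn = jaydebeapi.connect(
        HIVE_DRIVER_CLASS,
        build_hive_jdbc_url(host),
        [user, password],
        driver_jar,  # path to the Hive JDBC jar (placeholder)
    )
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()


# Example call (all values hypothetical):
# rows = fetch_all("hive.example.com", "SHOW DATABASES",
#                  "hiveuser", "secret", "/path/to/hive-jdbc-standalone.jar")
```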

The steps for using the Hive JDBC driver with a Python program are covered in a separate post.

Bonus:

There is another Python package that you can use to connect to a remote HiveServer2.

Connect to HiveServer2 using Pyhs2 package

pyHS2 is a Python client driver for connecting to HiveServer2.

Please note that the Pyhs2 package is no longer maintained. The last version was released in 2014.

For more details, see the official documentation: Pyhs2

Hope this helps. Let me know if you are using any other method. 🙂