Step by Step Guide Connecting HiveServer2 using Python Pyhive

  • Post last modified:September 21, 2018

Data plays an important role in every decision-making process. You may have to connect to various remote servers to get the data your application requires. This article explains, step by step, how to connect to Hive running on a remote host (HiveServer2) using Pyhive, a commonly used Python package.


Many other Python packages can connect to a remote Hive, but Pyhive is one of the easiest to use and is well maintained and supported.

There is also an option to connect through Hive Beeline without any packages such as Pyhive, Pyhs2, or impyla. Read more in Execute Hive Beeline JDBC String Command from Python.
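As a rough illustration of the Beeline route, the sketch below builds a Beeline command line and runs it with subprocess. The helper names build_beeline_command and run_beeline are hypothetical, the connection values are the sample ones used later in this article, and the sketch assumes the beeline binary is on the PATH of the machine running the script.

```python
import subprocess

def build_beeline_command(host, port, database, user, password, query):
    """Return the argv list for a one-off Beeline query (nothing is executed here)."""
    jdbc_url = f"jdbc:hive2://{host}:{port}/{database}"
    return [
        "beeline",
        "-u", jdbc_url,          # JDBC connection string
        "-n", user,              # user name
        "-p", password,          # password (visible in the process list)
        "--outputformat=csv2",   # plain CSV output, easier to parse
        "-e", query,             # query to execute
    ]

def run_beeline(command):
    """Run the command and return its stdout; needs beeline on the PATH."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout

command = build_beeline_command(
    "192.168.0.38", 10000, "test_db", "admin", "password", "select 1"
)
print(command)
```

Passing the arguments as a list avoids shell quoting problems in the query string. Call run_beeline(command) only on a host where Beeline is installed.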

You can also use the Hive JDBC drivers to connect to HiveServer2 from Python using Jaydebeapi.

Note that all steps and code in this article were tested on Ubuntu 14.04.

What is Pyhive?

Before going into the details of how to access HiveServer2 using the Pyhive package, let us understand what Pyhive is.

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. You can use this package to perform basic Hive operations such as reading data from tables and executing Hive queries.
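The SQLAlchemy side can be sketched as follows. This assumes SQLAlchemy is installed alongside Pyhive and that the hive:// URL scheme registered by Pyhive is used. The helper names hive_sqlalchemy_url and make_engine are hypothetical, and passing auth through connect_args is an assumption about how the dialect forwards options to the Hive connection.

```python
from urllib.parse import quote_plus

def hive_sqlalchemy_url(host, port, database, user, password=None):
    """Build a hive:// URL, escaping characters that are special in URLs."""
    credentials = quote_plus(user)
    if password is not None:
        credentials += ":" + quote_plus(password)
    return f"hive://{credentials}@{host}:{port}/{database}"

def make_engine(url, auth=None):
    """Create a SQLAlchemy engine; needs sqlalchemy and pyhive installed."""
    # Imported lazily so the URL helper above works without SQLAlchemy.
    from sqlalchemy import create_engine
    # Assumption: extra connect_args are forwarded to the Hive connection.
    connect_args = {"auth": auth} if auth else {}
    return create_engine(url, connect_args=connect_args)

print(hive_sqlalchemy_url("192.168.0.38", 10000, "test_db", "admin"))
```

The engine returned by make_engine can then be used like any other SQLAlchemy engine, for example with pandas.read_sql.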

See the official Pyhive page for more information.

Step by Step Guide to Install Pyhive

Follow the steps below to install Pyhive on an Ubuntu machine:

Step 1: Install Dependent Modules

Before attempting to install Pyhive, you must install the packages on which it depends.

Pyhive requires the following packages:

Install gcc

sudo apt-get install gcc

Install Thrift

pip install thrift

Install SASL

pip install sasl

If you get an error while installing sasl, follow the Pyhive sasl error section below to install the missing dependencies.

Install thrift sasl

pip install thrift_sasl

Installing Pyhive

Once all the above packages are installed successfully, you can install Pyhive using pip:

pip install pyhive
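To confirm that the installs succeeded before writing any connection code, a quick check like the one below can help. It is a minimal sketch: the helper name missing_modules is hypothetical, and the list assumes the import names match the pip package names used above (thrift, sasl, thrift_sasl, pyhive).

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ["thrift", "sasl", "thrift_sasl", "pyhive"]

def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If sasl shows up as missing, the section below describes the most likely cause.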

Pyhive sasl error – fatal error: sasl/sasl.h

If you are configuring this for the first time, you are likely to get a sasl.h error when installing the sasl module. Install the libsasl2-dev package to get rid of the error.

sudo apt-get install libsasl2-dev

Step 2: Connecting HiveServer2 using Python Pyhive

Now you are all set to connect to HiveServer2 using the Pyhive module. Below is sample code that you can use:

from pyhive import hive

# Connection settings for the remote HiveServer2 instance
host_name = "192.168.0.38"
port = 10000
user = "admin"
password = "password"
database = "test_db"

def hiveconnection(host_name, port, user, password, database):
    # The auth value must match the authentication mode configured on HiveServer2
    conn = hive.Connection(host=host_name, port=port, username=user, password=password,
                           database=database, auth='CUSTOM')
    try:
        cur = conn.cursor()
        cur.execute('select item_sk, reason_sk, account_credit from returns limit 5')
        result = cur.fetchall()
    finally:
        # Close the connection even if the query fails
        conn.close()

    return result

# Call the function above and print the fetched rows
output = hiveconnection(host_name, port, user, password, database)
print(output)
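The function above returns plain tuples, so the caller has to remember the column order. A small helper can pair each value with its column name. This is a sketch that relies only on the standard DB-API cursor attributes description and fetchall(). The name rows_to_dicts is hypothetical, and the strip_table_prefix option assumes that some Hive setups report column names qualified with the table name (for example returns.item_sk).

```python
def rows_to_dicts(cursor, strip_table_prefix=True):
    """Fetch all remaining rows from a DB-API cursor as a list of dicts."""
    # The first field of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    if strip_table_prefix:
        # Assumption: names may arrive as "table.column"; keep the column part.
        columns = [name.rsplit(".", 1)[-1] for name in columns]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Inside hiveconnection, calling rows_to_dicts(cur) right after cur.execute(...) would replace the cur.fetchall() call.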

Step 3: Execute Python Script to get result from remote Hive server

Save the code above as hivePython.py and run it. If everything works, you will see a result like the one below.

$ python hivePython.py
[(1119, 27, '827.06'), (5, None, '50.12'), (907, 5, '231.12'), (21, None, '134.18'), (579, 12, '5.58')]
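In the sample output the account_credit values come back as strings such as '827.06'. If you need to do arithmetic on them, one option is to convert them to Decimal after fetching. The helper name credit_as_decimal is hypothetical, and the sketch assumes the value sits at index 2 of each row, as in the query above.

```python
from decimal import Decimal

def credit_as_decimal(rows, index=2):
    """Return copies of the rows with the value at `index` converted to Decimal."""
    converted = []
    for row in rows:
        values = list(row)
        # Leave NULLs (None) untouched.
        if values[index] is not None:
            values[index] = Decimal(values[index])
        converted.append(tuple(values))
    return converted

sample = [(1119, 27, '827.06'), (5, None, '50.12')]
rows = credit_as_decimal(sample)
total = sum(row[2] for row in rows)
print(total)
```

Decimal keeps the exact value of the string, which avoids the rounding surprises that float conversion can introduce for monetary amounts.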
