Hive Dynamic SQL Support and Alternative

  • Post author:
  • Post last modified:May 29, 2019
  • Post category:BigData
  • Reading time:5 mins read

Dynamic SQL queries are created on the fly and executed. Dynamic SQL lets SQL statements be defined and execute at run time, i.e. you can build SQL queries based on the user input and execute them to provide required output. For examples, pass a session specific value to the HQL queries dynamically during runtime. In this article, we will check how to build Apache Hive Dynamic SQL queries and how to execute them.

Hive Dynamic SQL Support and Alternative

Hive Dynamic SQL Support

Apache Hive version 1.x and Cloudera impala does not support dynamic SQL, you have to identify an alternate approach to build SQL queries dynamically. However, Hive version 2.x supports HPL/SQL which has EXECUTE IMMEDIATE to execute dynamic SQL.

As most organizations have not migrated to Hive version 2.x, in this article, we will check available alternative to build and execute dynamic queries on Hive version 1.x.

Most of relational databases like Netezza, Oracle, SQL Server, etc. supports stored procedure that allows you to build and execute dynamic SQL queries at run-time.

Hive Dynamic SQL Alternative

As mentioned earlier, hive 1.x does not support dynamic queries. Below are the couple of options that you can use to build and execute dynamic queries at run-time.

  • Dynamic SQL using Shell Script
  • Beeline variable substitution

Note that, these methods are tested on kerberos hadoop cluster using beeline. Same methods will holds good for non-remote Hive servers also.

Dynamic SQL using Shell Script

This is one of the easiest method that you can use to build and execute generated dynamic queries using beeline command line options.

The main idea behind this approach is to create a shell script that generates SQL queries based on the user input.  

Below is the sample shell script that generates SQL dynamically and execute it:

#!/bin/sh

#######################################
#
# Usage: hive-dynamic-SQL.sh id
#
######################################
id=$1
echo $id

# Build SQL Query
sql_str="select id, name from sample_tab1 where id = $id"
echo $sql_str

# Execute dynamic queries using hive beeline

beeline -u jdbc:hive2://192.168.200.100:10000/testtdb; principal=hive/system1.company.com@COMPANY.COM; auth=Kerberos; -n admin1 -e "$sql_str"

Now execute the shell script by passing ID as a parameter.

For example,

[admin1@syste-n500 vithal]$ sh hive-dynamic-sql.sh 1
1
select id, name from sample_tab1 where id = 1
Connecting to jdbc:hive2://192.168.200.100:10000/testtdb; principal=hive/system1.company.com@COMPANY.COM; auth=Kerberos;
Connected to: Apache Hive (version 1.2.1000.2.5.0.0-1245)
Driver: Hive JDBC (version 1.2.1000.2.5.0.0-1245)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-----+-------+--+
| id  | name  |
+-----+-------+--+
| 1   | AAA   |
+-----+-------+--+
1 row selected (0.249 seconds)
Beeline version 1.2.1000.2.5.0.0-1245 by Apache Hive
Closing: 0: jdbc:hive2://192.168.200.100:10000/testtdb; principal=hive/system1.company.com@COMPANY.COM; auth=Kerberos;
[admin1@syste-n500 vithal]$

Beeline Variable Substitution

Variable substitution is nothing but passing values to a variable referred in HQL queries. I have already discussed variable substitution in my other post, Run HiveQL Script File Passing Parameter using Beeline CLI and Examples.

Hope this helps 🙂