What are SQL Features Missing in Hive?

Apache Hive syntax looks similar to SQL-92 standards but does not fully compatible to SQL-92. Storage and querying underlying table’s closes resembles traditional databases available in industry. HiveQL provides some of the extensions that are not present in traditional databases. There are some features gap between traditional SQL and Apache Hive. In this article, we will check some basic and import SQL features missing in Hive.

SQL features Missing in Hive

Below are some of important yet basic SQL features missing in Hive:

Online Transaction Processing (OLTP)
Correlated Sub-queries
Materialized Views
Truncate Table
Indexes
GROUP_CONCAT
Primary Key and Foreign Key
GROUP_ID() and GROUPING()
TO_CHAR
CAST Function
Transactions
MERGE Statement

Now let us check above queries in details and alternative approach to implement in Hive;

Online Transaction Processing (OLTP)

Apache Hive was designed for OLAP applications. It will work on huge amount of data. Hive usually uses the Hadoop HDFS to store process and store data.

Hive is not designed for OLTP. It does support records level update but not as good as traditional databases. Alternatively, Apache HBase is used to process records required lot of update and delete.

Correlated sub-queries

Correlated sub-queries are queries within query that refers columns from outer or parent query. These types of sub-queries are fast and make you overall SQL query less complex.

Hive support correlated sub-queries but it is not as matured as it would be in relational databases. If you want to use these types of queries in Hive, you have to make little bit design changes such as convert queries to use LEFT outer join or use EXISTS/NOT EXISTS clause in your queries.

Apache Hive Correlated Subquery and it’s Restrictions

Materialized Views

In a relational database term, materialized view is a database object that contains results of the query. Materialized view can refer table, view, or another materialized views.

Materialized views are usually used to improve the performance of the SQL query. Unfortunately, Apache Hive does not support materialized views.

You can read my other article on Netezza Materialized Views.

Truncate Table

There is no truncate table in Hive. This feature may be provided in future but as of now you have to try workaround to delete tables in Hive.

Possible alternative to TRUNCATE table would be to use INSERT OVERWRITE TABLE option. This option will delete all underlying hdfs files associated with managed tables and insert new records.

Indexes

Apache Hive INDEX is not fully compatible with SQL but provides basic supports for Indexes. But be informed that Index on hive table is not recommended. The create index will help if you are migrating your existing data warehouse to Hive and you have transformed the query with index as it is.

Hive CREATE INDEX to Optimize and Improve Query Performance

GROUP_CONCAT Support in Hive

Group_concat is a single string representing the argument value concatenated together for each row of the result set. The resulting string is a comma separated values. Many relational databases supports group-concat functionality as a standard built in function. Unfortunately, Hive does not have group_concat function.

Apache Hive group_concat Alternative and Example

Primary Key and Foreign Key

Hive is mainly used for OLAP; it does not provide way to declare primary and foreign keys. Hive create table command does not have primary key and foreign key syntax.

Hive create Table Command and Examples

GROUP_ID() and GROUPING() Support in Hive

Enhanced grouping options such as GROUP_ID and GROUPPING functions are not supported in Hive as of now. There are multiple JIRA tickets to provide support for these functions. Future version of Hive may provide supports to these functions.

TO_CHAR Support in Hive

Conversion of data to char type is not supported in Hive. Many organizations write custom user-defined Functions for these type of functions. You can create UDF in language of your choice. Apache Hive provides UDF written in Java, Python or Scala.

Relate articles:

Apache Hive User Defined Functions and Examples

CAST Function Support in Hive

CAST function is used to convert values from one data type of other. Hive provides CAST function support but is not matured as it would be in SQL.

For examples, you can convert string data type to integer without any issue, but when you try to convert string to date data type, you will end up with NULL value. CAST is present but not compatible with SQL.

Transactions

Latest version of Hive does support transactions such INSERT, UPDATE and DELETE. Transactions support is not quite mature as it would be in traditional databases.

Transactions are slow in Hive as it has to recreate entire HDFS block that contained particular data that you are updating, deleting.

Merge Statemenet

Hive does not support MERGE statement.

Hive Merge Tables Statement – Alternative and Example

Hope this helps 🙂