Apache Hive – Extract Value from JSON using Hive – Example

  • Post author:
  • Post last modified:July 8, 2021
  • Post category:BigData
  • Reading time:4 mins read

A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which is a standard data interchange format. Json files are mainly used to transfer data in web applications. Many web applications use the json files for data transfer between application and servers. In this article, we will check how to extract or get value from json file using Hive queries.

Extract Value from JSON using Hive

Apache Hive provides limited support to JSON files. You can store json data into Hive tables with string as a data type. There are 3rd party SerDe jars that you can use to create Hive table to import JSON files.

Hive get_json_object Function

As mentioned earlier, Apache Hive provides limited support to json files. However, you can use get_json_object function to get a value from simple json files.

Hive get_json_object Syntax

Following is the syntax of get_json_object function available in Hive.

get_json_object(jsonString, '$.key')

Where, jsonString is a valid json string. $.key is a key of a value that you are trying to extract.

For example, consider below simple example to extract name from json string using get_json_object function.

Note, you can either use Hive cli or beeline cli to connect to Hive.

 select get_json_object(jvalue, '$.name') 
from  (select '{"name":"someName"}' as jvalue) as q;
OK
+-----------+--+
|    _c0    |
+-----------+--+
| someName  |
+-----------+--+
1 row selected (2.469 seconds)

Extract Value from Nested JSON String

In the previous example, we have seen how to extract value from simple json string. The get_json_object function provides the option to extract fairly complex json string such as nested json values.

For example, consider below example to extract ‘pin’ value from nested or embedded json object.

select get_json_object(jvalue, '$.name.pin') 
from  (select '{"name":{"pin":"123456"}}' as jvalue) as q;
OK
+---------+--+
|   _c0   |
+---------+--+
| 123456  |
+---------+--+
1 row selected (0.22 seconds)

Extract Value from Nested JSON with list of Values

It is very common to include a list of values in json field. You can get_json_object function options to get particular values from a list of values stored in nested json format.

For example, consider below example to extract second value from list of nested json string.

 select get_json_object(jvalue, '$.name.pin\[1]') 
from  (select '{"name":{"pin":["123456","654321"]}}' as jvalue) as q;
OK
+---------+--+
|   _c0   |
+---------+--+
| 654321  |
+---------+--+
1 row selected (0.087 seconds)

Note that, get_json_object will return NULL if there is no proper matching condition.

Hope this helps 🙂