Hadoop Hive isnumeric Alternative and Examples

  • Post author:
  • Post last modified:November 17, 2019
  • Post category:BigData
  • Reading time:7 mins read

In a data warehouse environment, you will be working with heterogeneous data set. You may have to filter out unwanted data before loading it to the actual data into Hive table. For example, you many have a field1 of type string contains alphanumeric values. In many scenarios, you may get requirement to filter out non-numeric values. In this article, we will check Hadoop Hive isnumeric Alternative with some examples.

Hadoop Hive isnumeric Function

Many relational databases provide an extended SQL functions to help the data warehouse developers. Databases such as SQL server supports isnumeric built-in functions. As of now, Apache Hive does not support isnumeric function. But, luckily there are many other options those you can use as an alternative to isnumeric function.

As mentioned, there are many methods that you can use as an isnumeric function alternative.

For example,

  • RLIKE with Regular Expression
  • CAST Function
  • Create isnumeric user defined Function
  • Create Hive Macro

Now let use verify the above methods with an example.

Hive RLIKE Statement

Hive RLIKE is used to match the pattern using regular expression. You can use this Hive pattern matching function in the WHERE clause to get only numeric values from the string.

For example, consider below example to check string isnumeric.

select col1 rlike '^(([0-9]*)|(([0-9]*)\.([0-9]*)))$'  
from (select '0.11214' as col1) as q;
OK
+-------+--+
|  _c0  |
+-------+--+
| true  |
+-------+--+
1 row selected (0.222 seconds)

Use RLIKE in a WHERE condition to filter out non-numeric values.

For example,

select *  
from (select '0.11214' as col1 union all select '1.255' as col1 union all select 'aaaa' as col1) as q 
where col1 rlike '^(([0-9]*)|(([0-9]*)\.([0-9]*)))$';

OK
+----------+--+
|  q.col1  |
+----------+--+
| 0.11214  |
| 1.255    |
+----------+--+
2 rows selected (75.014 seconds)

Related Article

Hive CAST Function

The Hive CAST function converts the value of an expression or column to any other type. The result of the function will be NULL in case if the function cannot convert to a particular data type.

You can use this function in the WHERE clause with NULL statements to filter out non-numeric values. Cast given value to double, non-null value will be the numeric value.

For example, consider below example to filter out non-numeric values using CAST function.

select cast(col1 as double) 
from (select '0.11214' as col1 union all select '1.255' as col1 union all select 'aaaa' as col1) as q;

OK
+----------+--+
|   _c0    |
+----------+--+
| 0.11214  |
| 1.255    |
| NULL     |
+----------+--+
3 rows selected (37.952 seconds)

Use CAST in the WHERE clause to get only numeric values.

For example,

select * 
from (select '0.11214' as col1 union all select '1.255' as col1 union all select 'aaaa' as col1) as q 
where cast(col1 as double) is not null;

OK
+----------+--+
|  q.col1  |
+----------+--+
| 0.11214  |
| 1.255    |
+----------+--+
2 rows selected (42.734 seconds)

Related Article

Alternatively, you can create Hive user-defined function using either of the above method to implement isnumeric function in Hive.

Apache Hive is numeric User Defined Function

You can create user defined function to check if string is numeric.

Below is the sample python script to identify the numeric values.

import sys
import re
for line in sys.stdin:
        line = line.strip('\n\r')
        num_val = unicode(line, 'utf-8')
        if num_val.isnumeric() == True:
                print True
        elif re.match("^\d+?\.\d+?$", num_val) is not None:
                print True
        else:
                print False

Now add this script to Hive. For example,

0: jdbc:hive2://> add FILE /home/cloudera/isnumeric.py

Use Hive TRANSFORM function to execute isnumeric script.

select Transform (col1)
 using  'python  /home/cloudera/isnumeric.py' AS (col_1)
 from (select '0.11214' as col1 union all select '1.255' as col1 union all select 'aaaa' as col1) as q
;

OK
+--------+--+
| col_1  |
+--------+--+
| True   |
| True   |
| False  |
+--------+--+
3 rows selected (38.973 seconds)

Create Hive Macro to check if string is numeric

You can also create macro as an alternative to isnumeric function. You can read more about macro is my other post – Working with Hive Macros, Syntax and Examples

Related Articles

Hope this helps 🙂