Hive Extract Numbers using Regular Expression Functions

  • Post author:
  • Post last modified:December 25, 2019
  • Post category:BigData
  • Reading time:5 mins read

In my other article, we have seen how to extract date values from a string using Hive regular expressions. The regular expression function is sometime called as regex. The other common uses of regular expression is to extract the numeric values. For example, extract area code or phone numbers from the string data. In this article, we will check how to extract numbers using regular expression functions in Apache Hive.

Extract Numbers using Hive Regular Expression Functions

When you work on different data sources, you may get requirement to extract numeric values such as phone numbers, or area code from the given string type column.

Following regular expressions allows you to get required numeric values.

Now, let us use these functions to extract numeric or numbers from the string type.

Extract Numbers using Hive REGEXP_REPLACE

The Hive REGEXP_REPLACE function is one of the easiest functions get required values. The idea here is to replace all the alphabetical characters except numbers or numeric values.

For example, consider below Hive example to replace all characters except date value.

SELECT TRIM(REGEXP_REPLACE(string, '[^0-9]', ' ')) AS Numeric_value
FROM (SELECT ' Area code for employee ID 112244 is 12345.' AS string) a;

+------------------+
|  numeric_value   |
+------------------+
| 112244    12345  |
+------------------+

If you are familiar with the regular expression, then you can use the different expression based on your requirement.

For example, consider below Hive example to extract numbers from the string using different expression in a function.

SELECT  TRIM(REGEXP_REPLACE(string, '[a-z/-/A-z/./#/*]', '')) AS Numeric_value
FROM (SELECT ' Area code for employee ID 112244 is 12345.' AS string) a;

+----------------+
| numeric_value  |
+----------------+
| 112244  12345  |
+----------------+

Extract Numbers using Hive REGEXP_EXTRACT

The Hive REGEXP_EXTRACT function is another function to extract the required values. The idea here is to identify the number fields and extract only that part.

Hive Extract 3 digit’s numbers from string value examples

The common example is to extract certain digits from the string. For instance, get only 10 digit phone number from the string.

For example, consider following example to get only 3 digit numbers from string using Hive regular expressions.

SELECT REGEXP_EXTRACT(string, '[0-9]{3}',0) AS Numeric_value
FROM (SELECT 'Area code 123 is different for employee ID 112244.' AS string) a;

+----------------+
| numeric_value  |
+----------------+
| 123            |
+----------------+

Hive Extract 6 digit’s numbers from string value examples

Another example is to extract 6 digits from the string using Hive regular expressions.

For example, the regular expression in following example extract only 6 digits from string.

SELECT REGEXP_EXTRACT(string, '[0-9]{6}',0) AS Numeric_value
FROM (SELECT 'Area code 123 is different for employee ID 112244.' AS string) a;

+----------------+
| numeric_value  |
+----------------+
| 112244         |
+----------------+

Hope this helps 🙂