Cloudera Impala Extract Numbers using Regular Expressions

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:BigData
  • Reading time:3 mins read

Impala supports various built in functions those you can use to extract the numbers from strings values. If you are working on various data sets then there is a possibility that you may get data which is corrupted or merged with other fields. You can separate that data using various string functions available in Impala. In this article, we will discuss about Impala extract numbers using regular expressions and examples.

You can read about regular expression in my other post:

Impala Extract Numbers using Regular Expressions

Impala supports Regular expression functions. Below are the some of the examples for Impala extract number from string values. Here are some of the common uses of Impala regular expressions with some examples;

Impala Extract 5 digit’s numbers from string value examples

Here is the example to extract the 5 digit’s number from string using Impala regexp_extract regular expressions:

Query: select regexp_extract('abc jjs Updates U 12323 I389362','[0-9][0-9][0-9][0-9][0-9]',0) 
+-----------------------------------------------------------------------------------+ 
| regexp_extract('abc jjs updates u 12323 i389362', '[0-9][0-9][0-9][0-9][0-9]', 0) | 
+-----------------------------------------------------------------------------------+ 
| 12323 | 
+-----------------------------------------------------------------------------------+ 
Fetched 1 row(s) in 0.12s

Impala Extract Numbers from the string examples

Here is the example to extract the number from the string values using Impala regexp_replace regular expressions:

Query: select regexp_replace(string1, '[^[:digit:]]', ' ') from (select ' My zip is 12345 and id is 389362. Send details to my house # 8/22' as string1) a
+--------------------------------------------------------------------+
| regexp_replace(string1, '[^[:digit:]]', ' ') |
+--------------------------------------------------------------------+
| 12345 389362 8 22 |
+--------------------------------------------------------------------+
Fetched 1 row(s) in 0.11s

Alternative method:

Query: select regexp_replace(string1, '[a-z/-/A-z/./#/*]', '') from (select ' My zip is 12345 and id is 389362. Send details to my house # 8/22' as string1) a
+--------------------------------------------------+
| regexp_replace(string1, '[a-z/-/a-z/./#/*]', '') |
+--------------------------------------------------+
| 12345 389362 822 |
+--------------------------------------------------+

Example to Extract Alphanumeric from the string values using Impala regular expressions

Here is the another simple example to extract 6 digit number that start with ‘I’ using impala regular expressions:

Query: select regexp_extract('abc jjs Updates U 123 I389362','I[0-9][0-9][0-9][0-9][0-9][0-9]',0) 
+---------------------------------------------------------------------------------------+ 
| regexp_extract('abc jjs updates u 123 i389362', 'i[0-9][0-9][0-9][0-9][0-9][0-9]', 0) | 
+---------------------------------------------------------------------------------------+ 
| I389362 | 
+---------------------------------------------------------------------------------------+ 
Fetched 1 row(s) in 0.12s