Cloudera Impala Regular Expression Functions and Examples

  • Post author:
  • Post last modified:July 31, 2019
  • Post category:BigData
  • Reading time:5 mins read

The Cloudera Impala regular expression functions identify precise patterns of characters in the given string and are useful for extracting string from the data and validation of the existing data, for example, validate date, range checks, checks for characters, and extract specific characters from the data.

Cloudera Impala Regular Expression Functions

In this article, we will be checking some commonly used Cloudera Impala regular expression functions with an examples.

Types of Cloudera Impala Regular Expression Functions

As of now, Cloudera Impala supports only three regular expression functions:

  • regexp_extract
  • regexp_like
  • regexp_replace

Impala regexp_extract Function

The Impala regexp_extract function returns the matching text item in the string or data.

Impala regexp_extract Function Syntax

The Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax for Impala regexp_replace function:

regexp_extract(string subject, string pattern, int index);

Impala regexp_extract Function Examples

Below examples show how you can extract parts of a string matching a pattern:

[quickstart.cloudera:21000] > select regexp_extract('foothebar', 'foo(.*?)(bar)', 2); 
Query: select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) 
+-------------------------------------------------+ 
| regexp_extract('foothebar', 'foo(.*?)(bar)', 2) | 
+-------------------------------------------------+ 
| bar | 
+-------------------------------------------------+ 
Fetched 1 row(s) in 0.31s 

[quickstart.cloudera:21000] > select regexp_extract('foothebar', 'foo(.*?)(bar)', 1); 
Query: select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) 
+-------------------------------------------------+ 
| regexp_extract('foothebar', 'foo(.*?)(bar)', 1) | 
+-------------------------------------------------+ 
| the | 
+-------------------------------------------------+ 
Fetched 1 row(s) in 0.12s

Impala regexp_like Function

Returns true or false to indicate whether the source string contains anywhere inside it the regular expression given by the pattern.

The regexp_like function returns true if there is at least one matching occurrence in the input string or data.

Impala regexp_like Function Syntax

Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax for Impala regexp_like function:

regexp_like(string source, string pattern[, string options]);

Impala regexp_like Function Examples

Below examples show how you can use the regexp_like to match string with pattern:

[quickstart.cloudera:21000] > select regexp_like('my user id is 091234 or 0239daz6','[0-9][^0-9]+[0-9]');
Query: select regexp_like('my user id is 091234 or 0239daz6','[0-9][^0-9]+[0-9]')
...
+----------------------------------------------------------------------+
| regexp_like('my user id is 091234 or 0239daz6', '[0-9][^0-9]+[0-9]') |
+----------------------------------------------------------------------+
| true                                                                 |
+----------------------------------------------------------------------+
Fetched 1 row(s) in 0.13s

Impala regexp_replace Function

Searches a string for a regular expression pattern and replaces every occurrence of the pattern with the specified replacement string.

Returns the initial argument with the regular expression pattern replaced by the final argument string.

Impala regexp_replace Function Syntax

Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax of Impala regexp_replace function:

regexp_replace(string initial, string pattern, string replacement);

Impala regexp_replace Function Examples

Below examples show how you can replace parts of a string matching a pattern with replacement text.

[quickstart.cloudera:21000] > select regexp_replace('111-222-333','[^[:digit:]]','');
Query: select regexp_replace('111-222-333','[^[:digit:]]','')
...
+---------------------------------------------------+
| regexp_replace('111-222-333', '[^[:digit:]]', '') |
+---------------------------------------------------+
| 111222333                                         |
+---------------------------------------------------+
Fetched 1 row(s) in 0.12s

Because the impala-shell interpreter uses the \ character for escaping, use \\ to represent the regular expression escape character in any regular expressions that you submit through impala-shell.

Read: