The Cloudera Impala regular expression functions identify precise patterns of characters in the given string and are useful for extracting string from the data and validation of the existing data, for example, validate date, range checks, checks for characters, and extract specific characters from the data.
In this article, we will be checking some commonly used Cloudera Impala regular expression functions with an examples.
Types of Cloudera Impala Regular Expression Functions
As of now, Cloudera Impala supports only three regular expression functions:
- regexp_extract
- regexp_like
- regexp_replace
Impala regexp_extract Function
The Impala regexp_extract function returns the matching text item in the string or data.
Impala regexp_extract Function Syntax
The Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax for Impala regexp_replace function:
regexp_extract(string subject, string pattern, int index);
Impala regexp_extract Function Examples
Below examples show how you can extract parts of a string matching a pattern:
[quickstart.cloudera:21000] > select regexp_extract('foothebar', 'foo(.*?)(bar)', 2); Query: select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) +-------------------------------------------------+ | regexp_extract('foothebar', 'foo(.*?)(bar)', 2) | +-------------------------------------------------+ | bar | +-------------------------------------------------+ Fetched 1 row(s) in 0.31s [quickstart.cloudera:21000] > select regexp_extract('foothebar', 'foo(.*?)(bar)', 1); Query: select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) +-------------------------------------------------+ | regexp_extract('foothebar', 'foo(.*?)(bar)', 1) | +-------------------------------------------------+ | the | +-------------------------------------------------+ Fetched 1 row(s) in 0.12s
Impala regexp_like Function
Returns true or false to indicate whether the source string contains anywhere inside it the regular expression given by the pattern.
The regexp_like function returns true if there is at least one matching occurrence in the input string or data.
Impala regexp_like Function Syntax
Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax for Impala regexp_like function:
regexp_like(string source, string pattern[, string options]);
Impala regexp_like Function Examples
Below examples show how you can use the regexp_like to match string with pattern:
[quickstart.cloudera:21000] > select regexp_like('my user id is 091234 or 0239daz6','[0-9][^0-9]+[0-9]'); Query: select regexp_like('my user id is 091234 or 0239daz6','[0-9][^0-9]+[0-9]') ... +----------------------------------------------------------------------+ | regexp_like('my user id is 091234 or 0239daz6', '[0-9][^0-9]+[0-9]') | +----------------------------------------------------------------------+ | true | +----------------------------------------------------------------------+ Fetched 1 row(s) in 0.13s
Impala regexp_replace Function
Searches a string for a regular expression pattern and replaces every occurrence of the pattern with the specified replacement string.
Returns the initial argument with the regular expression pattern replaced by the final argument string.
Impala regexp_replace Function Syntax
Impala regular expression syntax conforms to the POSIX Extended Regular Expression syntax. Below is the syntax of Impala regexp_replace function:
regexp_replace(string initial, string pattern, string replacement);
Impala regexp_replace Function Examples
Below examples show how you can replace parts of a string matching a pattern with replacement text.
[quickstart.cloudera:21000] > select regexp_replace('111-222-333','[^[:digit:]]',''); Query: select regexp_replace('111-222-333','[^[:digit:]]','') ... +---------------------------------------------------+ | regexp_replace('111-222-333', '[^[:digit:]]', '') | +---------------------------------------------------+ | 111222333 | +---------------------------------------------------+ Fetched 1 row(s) in 0.12s
Because the impala-shell interpreter uses the \ character for escaping, use \\ to represent the regular expression escape character in any regular expressions that you submit through impala-shell.
Read:
- Commonly used Cloudera Impala String Functions and Examples
- Commonly used Cloudera Impala Date Functions and Examples
- Cloudera Impala Generate Sequence Numbers without UDF
- Run Impala SQL Script File Passing argument and Working Example
- An Introduction to Hadoop Cloudera Impala Architecture
- Commonly used Hadoop Hive Commands