Vertica Extract Numbers using Regular Expressions

  • Post author:
  • Post last modified:April 15, 2019
  • Post category:Vertica
  • Reading time:5 mins read

In a data warehouse environment, being a heterogeneous source, you may get lot of different kinds of data in terms of pipeline, flat files extracted from these data sources. Received data may be corrupted during transfer, or may have unwanted characters. You can use Vertica built in functions and regular expression to clean such corrupted and unwanted data. HP Vertica Support many built in functions, you can use those built in functions to extract numbers, specific string, or alphanumeric values from the string. In this article, we will discuss one of such method, i.e. Vertica extract numbers using regular expressions.

Vertica Extract Numbers using Regular Expressions

As you will be working on various heterogeneous data sources, you may get requirement to extract part of string from received data that is available in Vertica database tables.

Related article:

Below are some common uses of Vertica regular expression for day to day activities:

Vertica Extract 6 digit’s numbers from string value examples

Here is an example to extract the 6 digit’s number from string using Vertica REGEXP_SUBSTR regular expressions. You can modify regular expression pattern to extract any number of digits based on your requirements.

SELECT Regexp_substr(string,
              '\d{6}+')
FROM   (SELECT
'here is my area code 12345 and id that is being used is 389362 . These details are collected from employee comments'
AS string) a;

 Regexp_substr
---------------
 389362
(1 row)

Vertica Extract Numbers from the string examples

Here is the example to extract the number from the string using Vertica regexp_replace regular expression.

SELECT Regexp_replace(string, '[^[:digit:]]', ' ')
FROM   (SELECT
'here is my area code 12345 and id that is being used is 389362 . These details are collected from employee comments'
AS string) a;

                               Regexp_replace
--------------------------------------------------
12345                              389362
(1 row)


--Another way to extract the numbers from the string using Vertica regexp_replace regular expressions

SELECT Regexp_replace(string, '[a-z/-/A-z/./#/*]', '')
FROM   (SELECT
'here is my area code 12345 and id that is being used is 389362 . These details are collected from employee comments'
AS string) a;

          Regexp_replace
----------------------------------
      12345        389362
(1 row)

As you can see in above examples, you can use various options available in regexp_replace regular expressions to extract number from given string values.

In first example above, regex is extracting only digits from the string and in second, regex is replacing all occurrence of alphabets, special characters with NONE (”). If you know the regular expression pattern, you can apply any regular expression function to get desired output.

Vertica Extract Alphanumeric from the string examples

You may have requirement to extract alphanumeric values that is matching particular patters. In this example we will check how to extract alphanumeric values from given string.

For example, extract alphanumeric that starts with ‘U’ and followed by 6 digits.

SELECT Regexp_substr('abc wqezaf Updates U 123 U389362', 
       'U[0-9][0-9][0-9][0-9][0-9][0-9]' 
       ); 
	   
 Regexp_substr
---------------
 U389362
(1 row)

Hope this helps 🙂