Hive Extract Date using Regular Expression Functions

  • Post author:
  • Post last modified:December 25, 2019
  • Post category:BigData
  • Reading time:3 mins read

In my other articles, we have seen how regular expression functions are used to check if a string is numeric. The other common requirements is to extract the date field from the string using Hive regular expression functions. For example, you may get requirement to get date value from the string field.

Apache Hive support many functionalities that you can use to extract date type. But, using a regular expression function to extract date from a string is the easiest method.

Extract Date using Hive Regular Expression Functions

There are some application such as medical transcription will generate the text. You may get requirement to extract the patient date of birth from the transcription. You can extract only date field from the text or string content using Hive supported regular expression function.

Extract Date using Hive REGEXP_REPLACE Function

The Hive REGEXP_REPLACE function is one of the easiest functions get required value. The idea here is to replace all the alphabetical characters except numbers and date delimiter. Note, if you have many numeric values, you may need to search for the text or string that contains both number and date delimiter.

For example, consider below Hive example to replace all characters except date value.

SELECT  TRIM(REGEXP_REPLACE(string, '[a-z/-/A-z/.]', '')) AS date_value
FROM (SELECT 'patient dob is 10-12-2019.' AS string) a;

+-------------+--+
| date_value  |
+-------------+--+
| 10-12-2019  |
+-------------+--+

Related Articles,

Hope this helps 🙂