Hive Extract Numbers using Regular Expression Functions

In another article, we saw how to extract date values from a string using Hive regular expression functions. A regular expression is often shortened to regex. Another common use of regular expressions is extracting numeric values, for example, an area code or a phone number from string data. In this article, we will check how to extract numbers using regular expression functions in Apache Hive.

Extract Numbers using Hive Regular Expression Functions

When you work with different data sources, you may get a requirement to extract…
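
Hive's `regexp_extract(subject, pattern, index)` returns the text matched by a capturing group, and `regexp_replace` can strip unwanted characters. To try such patterns quickly outside Hive, here is a minimal Python sketch of the same ideas using the standard `re` module. The helper names and sample strings are hypothetical, and Hive evaluates patterns with Java's regex engine, which agrees with Python's for simple patterns like these.

```python
import re


def extract_numbers(text):
    """Return every run of digits found in text, in order."""
    # \d+ matches one or more consecutive digits, like [0-9]+ in a Hive pattern.
    return re.findall(r"\d+", text)


def extract_area_code(phone):
    """Return the 3-digit area code from strings like '(415) 555-0100', else ''."""
    # Group 1 captures the digits inside the parentheses; returning '' when
    # nothing matches is a choice made for this sketch.
    match = re.search(r"\((\d{3})\)", phone)
    return match.group(1) if match else ""


def strip_non_digits(text):
    """Remove every non-digit character, similar in spirit to regexp_replace."""
    return re.sub(r"\D", "", text)


if __name__ == "__main__":
    print(extract_numbers("Order 123 shipped in 2 boxes"))  # ['123', '2']
    print(extract_area_code("(415) 555-0100"))              # 415
    print(strip_non_digits("(415) 555-0100"))               # 4155550100
```

When the same patterns are written inside a HiveQL string literal, the backslashes typically need to be escaped (for example `\\d`).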

Working with Hive Macros, Syntax and Examples

Many relational databases, such as Teradata, support macros. In those databases, macros are stored in the data dictionary, so users can share them and execute them as required. Hive macros are a bit different from macros in relational databases. In this article, we will check what macros are, their syntax, how to use them, and some macro examples.

What are Macros in Hive?

A macro in Hive is a named SQL expression that you define once and then call by name, like a function. Macros exist only for the duration of the current session. Macros are…
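
In Hive, a macro is defined with `CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression;` and removed with `DROP TEMPORARY MACRO [IF EXISTS] macro_name;`. The small Python sketch below only assembles those statements as text, which makes the shape of a macro definition easy to see. The helper names and the `simple_add` and `fixed_number` macros are hypothetical, and the generated statements still have to be run in a Hive or Beeline session.

```python
def create_macro(name, params, expression):
    """Build a Hive CREATE TEMPORARY MACRO statement.

    params is a list of (column_name, column_type) pairs; it may be empty.
    """
    args = ", ".join(f"{col} {col_type}" for col, col_type in params)
    return f"CREATE TEMPORARY MACRO {name}({args}) {expression};"


def drop_macro(name, if_exists=True):
    """Build the matching DROP TEMPORARY MACRO statement."""
    clause = "IF EXISTS " if if_exists else ""
    return f"DROP TEMPORARY MACRO {clause}{name};"


if __name__ == "__main__":
    print(create_macro("simple_add", [("x", "INT"), ("y", "INT")], "x + y"))
    # CREATE TEMPORARY MACRO simple_add(x INT, y INT) x + y;
    print(create_macro("fixed_number", [], "42"))
    # CREATE TEMPORARY MACRO fixed_number() 42;
    print(drop_macro("simple_add"))
    # DROP TEMPORARY MACRO IF EXISTS simple_add;
```

Because Hive macros are session-scoped, statements like these are usually placed at the top of each script or session that uses them.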

Hive on Error Stop Script Execution – Options

When you build a data warehouse on top of Hadoop HDFS using the Hive framework, you may have to execute individual HiveQL queries or a HiveQL script containing many statements. Both Hive and Beeline provide an option to execute a script file. In some scenarios, you may want to stop the script execution if any of the SQL statements fails. In this article, we will check how to stop script execution on error in Hive. We shall see both Hive and Beeline CLI options to exit script execution in case…
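
The behavior being asked for is "run the statements in order and stop at the first failure". The Python sketch below shows only that control flow: it splits a script on semicolons (a naive split that would break on semicolons inside string literals or comments) and passes each statement to an `execute` callable, stopping at the first exception. The `run_script` helper is hypothetical, and the standard-library `sqlite3` module is used purely as a local stand-in for a Hive connection; this is not how the Hive CLI or Beeline implement their options.

```python
import sqlite3


def run_script(script, execute):
    """Run ';'-separated statements in order and stop at the first failure.

    Returns (number_of_statements_that_succeeded, error_or_None).
    """
    # Naive split: does not handle ';' inside quoted strings or comments.
    statements = [s.strip() for s in script.split(";") if s.strip()]
    for done, statement in enumerate(statements):
        try:
            execute(statement)
        except Exception as exc:
            # Stop here instead of continuing with the remaining statements.
            return done, exc
    return len(statements), None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a Hive connection
    script = """
        CREATE TABLE t (id INTEGER);
        INSERT INTO t VALUES (1);
        INSERT INTO missing_table VALUES (2);
        INSERT INTO t VALUES (3);
    """
    ok, err = run_script(script, conn.execute)
    print(ok, type(err).__name__)  # 2 OperationalError
    print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
```

The fourth statement never runs, which is the outcome a stop-on-error option is meant to guarantee.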

Hive Incremental Load Options and Examples

Incremental load is very common in a data warehouse environment, and it is commonly used to implement slowly changing dimensions. When you migrate your data to Hadoop Hive, you usually need to keep the slowly changing tables in sync with the latest data. In this article, we will check Hadoop Hive incremental load options and some examples.

Hive Incremental Load Options

There are many methods you can use. Apache Hive has supported ACID transactions since Hive 0.14. Following are a couple of methods that you can use…
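
A common building block for incremental loads is a high-water mark: remember the largest modification timestamp loaded so far and, on the next run, pick up only the rows changed after it. The Python sketch below shows that filtering logic on in-memory rows. The `modified_at` column and the `extract_increment` helper are assumptions for illustration; in Hive, the same idea would be a `WHERE` filter on the source table followed by an insert into the target.

```python
from datetime import datetime


def extract_increment(source_rows, last_watermark):
    """Return the rows modified after last_watermark, plus the new watermark.

    Each row is a dict with at least a 'modified_at' datetime.
    """
    changed = [r for r in source_rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts
    # from the same point.
    new_watermark = max((r["modified_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "modified_at": datetime(2020, 1, 1)},
        {"id": 2, "modified_at": datetime(2020, 1, 5)},
        {"id": 3, "modified_at": datetime(2020, 1, 9)},
    ]
    rows, watermark = extract_increment(source, datetime(2020, 1, 3))
    print([r["id"] for r in rows], watermark)  # [2, 3] 2020-01-09 00:00:00
```

The strict `>` comparison skips rows that arrive later with a timestamp equal to the stored watermark; some pipelines use `>=` together with de-duplication on the key to avoid that gap.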

Export Hive Table DDL, Syntax and Shell Script Example

There are many situations where you need to export DDLs, for example, when you are migrating some of your Hive tables to an RDBMS for reporting. If you work as a Hadoop administrator, you should know how to export table DDL. In this article, we will check how to export Hive table DDL to a text file using a shell script and a Beeline connection string.

Export Hive Table DDL

As mentioned earlier, it is good to have a utility that allows you to generate DDL in Hive.…
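
Hive prints a table's DDL with `SHOW CREATE TABLE db.table;`, so an export script is essentially one such statement per table. The Python sketch below builds that script text from a list of table names. The `ddl_export_script` helper and the `sales` database with its tables are hypothetical; the list of tables could come from `SHOW TABLES IN sales;`, and the generated file could be passed to Beeline with `-f` and its output redirected to a text file.

```python
import re

# A plain Hive identifier: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ddl_export_script(database, tables):
    """Build one SHOW CREATE TABLE statement per table, one per line."""
    for name in [database, *tables]:
        # Reject anything that is not a plain identifier before pasting it into SQL.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return "".join(f"SHOW CREATE TABLE {database}.{t};\n" for t in tables)


if __name__ == "__main__":
    print(ddl_export_script("sales", ["orders", "customers"]), end="")
    # SHOW CREATE TABLE sales.orders;
    # SHOW CREATE TABLE sales.customers;
```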

Hive UDF using Python-Use Python Script into Hive-Example

Hadoop provides an API so that you can write user-defined functions, or UDFs, in your favorite programming language. In this article, we will check how to create a custom function for Hive using Python, that is, how to create a Hive UDF using Python.

What is Hive?

Hive is a data warehouse ecosystem built on top of Hadoop HDFS to perform batch and ad hoc query execution on large datasets. Apache Hive can handle petabytes of data. Hive is designed for OLAP. It is not suited for OLTP…
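
Hive commonly runs Python code through its streaming `TRANSFORM` clause: each input row is written to the script's standard input as a tab-separated line, and every tab-separated line the script prints becomes an output row. Below is a minimal sketch, assuming a hypothetical two-column input (`id`, `name`) and a script saved as `upper_name.py`. On the Hive side it would be used with something like `ADD FILE upper_name.py;` followed by `SELECT TRANSFORM(id, name) USING 'python upper_name.py' AS (id, name_upper) FROM some_table;`, where the table name and the `python` executable on the worker nodes are assumptions. Hive's default text format sends NULL as `\N`, which the sketch passes through unchanged.

```python
import sys

NULL = r"\N"  # Hive's default text representation of NULL in TRANSFORM


def transform_line(line):
    """Turn one 'id<TAB>name' input line into 'id<TAB>NAME'."""
    fields = line.rstrip("\n").split("\t")
    row_id, name = fields[0], fields[1]
    upper = name if name == NULL else name.upper()
    return f"{row_id}\t{upper}"


def main(stdin=None, stdout=None):
    """Stream rows from stdin to stdout, one output line per input line."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for line in stdin:
        stdout.write(transform_line(line) + "\n")
```

To use it as the TRANSFORM script, call `main()` at the bottom of the file, typically under an `if __name__ == "__main__":` guard, so that it processes standard input when Hive launches it.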

Register Hive UDF jar into pyspark – Steps and Examples

Apache Spark is one of the most widely used processing engines because of its fast, in-memory computation. Many organizations use both Hive and Spark: Hive as a data source and Spark as a processing engine. You can use your favorite programming language to interact with Hadoop, and you can write custom UDFs in Java, Python, or Scala. To use those UDFs, you have to register them in Hive so that you can call them like normal built-in functions. In this article, we will check a couple of methods…
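
In PySpark, a common route is to ship the jar with the session (for example through the `spark.jars` configuration) or to reference it directly in Spark SQL's `CREATE TEMPORARY FUNCTION name AS 'class' USING JAR 'path'` statement, and then call the function from SQL. The sketch below builds and submits that statement through any object with a `sql()` method, so the logic can be checked without a cluster. `register_hive_udf`, the class `com.example.hive.udf.Upper`, and the jar path are hypothetical, and the real session is assumed to be created with `enableHiveSupport()`.

```python
def create_function_sql(name, class_name, jar_path=None):
    """Build a Spark SQL statement that registers a Hive UDF class by name."""
    statement = f"CREATE TEMPORARY FUNCTION {name} AS '{class_name}'"
    if jar_path:
        # Optional: let Spark load the jar itself rather than relying on spark.jars.
        statement += f" USING JAR '{jar_path}'"
    return statement


def register_hive_udf(spark, name, class_name, jar_path=None):
    """Register the UDF on a SparkSession (or any object with a .sql method)."""
    spark.sql(create_function_sql(name, class_name, jar_path))
    return name


# Typical use on a real cluster (requires pyspark; names are hypothetical):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# register_hive_udf(spark, "my_upper", "com.example.hive.udf.Upper", "/path/to/udfs.jar")
# spark.sql("SELECT my_upper(name) FROM some_table").show()
```

A temporary function lives only as long as the Spark session, so the registration step is normally part of the job's startup code.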

Hive Dynamic SQL Support and Alternative

Dynamic SQL queries are created on the fly and then executed. Dynamic SQL lets SQL statements be defined and executed at run time, i.e. you can build SQL queries based on user input and execute them to produce the required output, for example, by passing a session-specific value to HQL queries at runtime. In this article, we will check how to build Apache Hive dynamic SQL queries and how to execute them.

Hive Dynamic SQL Support

Apache Hive version 1.x and Cloudera Impala do not support dynamic SQL, you…
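
One commonly used workaround is to build the variable part outside Hive and pass it in through Hive variable substitution: Beeline accepts `--hivevar name=value`, and the script refers to the value as `${hivevar:name}`, for example `SELECT * FROM sales WHERE sale_date = '${hivevar:run_date}';`. The Python sketch below only assembles such a Beeline command line. The JDBC URL, the `daily_report.hql` file, and the variable names are hypothetical, and because substitution is plain text replacement, the values should come from a trusted source.

```python
import shlex


def beeline_command(jdbc_url, script_path, variables):
    """Build a Beeline argument list that passes each variable with --hivevar.

    Inside the script, each value is referenced as ${hivevar:name}.
    """
    cmd = ["beeline", "-u", jdbc_url]
    for name, value in variables.items():
        cmd += ["--hivevar", f"{name}={value}"]
    cmd += ["-f", script_path]
    return cmd


if __name__ == "__main__":
    cmd = beeline_command(
        "jdbc:hive2://localhost:10000/default",
        "daily_report.hql",
        {"run_date": "2020-01-31", "region": "west"},
    )
    # Print a shell-safe version; subprocess.run(cmd, check=True) would run it
    # on a machine where beeline is installed.
    print(shlex.join(cmd))
```

Passing the argument list directly to `subprocess.run` avoids an extra round of shell quoting.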

Hive Merge Tables Statement – Alternative and Example

The MERGE statement in SQL is used to perform incremental loads. With a MERGE statement, you can perform UPDATE and INSERT in a single statement based on a condition, i.e. you can update existing rows and insert new records. This combined update and insert is sometimes called UPSERT. MERGE statements are mainly used to implement slowly changing dimensions. Hive versions before 2.2 do not support the MERGE statement, and later versions support it only on transactional (ACID) tables. In this article, we will check a Hive alternative to merging tables, with an example. Related Article, Slowly changing dimension…
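
A workaround often used in Hive is to rebuild the target table: full outer join the target with a staging table on the key, take the staging values wherever a match exists, and write the result back with `INSERT OVERWRITE`. The Python sketch below applies the same row-level rule to in-memory lists so the outcome is easy to check. The `upsert` helper and the `id` and `city` columns are hypothetical.

```python
def upsert(target, updates, key="id"):
    """Merge updates into target the way a MERGE/UPSERT would.

    Rows whose key already exists are replaced (UPDATE), new keys are added
    (INSERT), and target rows without a matching update are kept unchanged.
    If updates repeat a key, the last one wins. Returns a new list sorted by key.
    """
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row  # matched -> update, not matched -> insert
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    target = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]
    updates = [{"id": 2, "city": "Chicago"}, {"id": 3, "city": "Denver"}]
    print(upsert(target, updates))
    # [{'id': 1, 'city': 'Austin'}, {'id': 2, 'city': 'Chicago'}, {'id': 3, 'city': 'Denver'}]
```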

Hive Drop Column Alternative and Examples

Apache Hive is a data warehouse framework on top of the Hadoop ecosystem. Hive works well for batch processing, but it is not a true data warehouse platform, as it does not support real-time analytics. Many features available in traditional relational databases are missing in Hive. One such feature is dropping columns with an ALTER TABLE ... DROP COLUMN statement. In this article, we will check Hive drop column alternatives with some examples.

Hive Drop Column Alternative

There are two approaches that you can follow if you want to…
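
One workaround often used in Hive is `ALTER TABLE table_name REPLACE COLUMNS (...)`, listing only the columns to keep; another is to create a new table containing only the wanted columns (for example with CREATE TABLE AS SELECT) and swap it in. The Python sketch below only builds the `REPLACE COLUMNS` statement from a known schema; the `emp` table and its columns are hypothetical. Hive documents REPLACE COLUMNS as supported only for tables with native SerDes, and it changes metadata without rewriting data files, so for delimited text tables the remaining columns are still read by position. Dropping the last column is the safe case, while dropping a middle one can shift later columns onto the wrong data.

```python
def drop_columns_ddl(table, columns, to_drop):
    """Build an ALTER TABLE ... REPLACE COLUMNS statement that omits to_drop.

    columns is the current schema as an ordered list of (name, type) pairs.
    """
    drop = {c.lower() for c in to_drop}  # Hive column names are case-insensitive
    unknown = drop - {name.lower() for name, _ in columns}
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    kept = [f"{name} {col_type}" for name, col_type in columns if name.lower() not in drop]
    if not kept:
        raise ValueError("a table must keep at least one column")
    return f"ALTER TABLE {table} REPLACE COLUMNS ({', '.join(kept)});"


if __name__ == "__main__":
    schema = [("id", "INT"), ("name", "STRING"), ("ssn", "STRING")]
    print(drop_columns_ddl("emp", schema, ["ssn"]))
    # ALTER TABLE emp REPLACE COLUMNS (id INT, name STRING);
```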
