Apache Hive Set Operators: UNION and UNION ALL

You can use the Apache Hive set operators to combine similar data sets from two or more SELECT statements into a single result set . Here the similar data set literally mean, the data type of the result set should also match. Otherwise, you have to explicitly convert type using type conversion functions. Hive Set Operators Hadoop Hive supports following set operators. UNION [DISTINCT] UNION ALL Hive versions prior to 1.2.0 only support UNION ALL (bag union), in which duplicate rows are not eliminated. Starting version 1.2.0, Hive includes optional ALL keywords.…

Continue ReadingApache Hive Set Operators: UNION and UNION ALL
Comments Off on Apache Hive Set Operators: UNION and UNION ALL

How to Write Hive Conditional Join and Example

The join in Hive is used to combine the records from multiple tables based on the joining condition. The joining condition can be on the common columns between participating tables. Hadoop Hive supports various join types. You may get the requirement to choose the joining values based on certain conditions. For example, you can have OR condition in the joining condition. In this article, we will check how to write Conditional Join in Hive with some examples. Hive Conditional Join You never know what type of requirement that you are…

Continue ReadingHow to Write Hive Conditional Join and Example
Comments Off on How to Write Hive Conditional Join and Example

Hive Extract Numbers using Regular Expression Functions

In my other article, we have seen how to extract date values from a string using Hive regular expressions. The regular expression function is sometime called as regex. The other common uses of regular expression is to extract the numeric values. For example, extract area code or phone numbers from the string data. In this article, we will check how to extract numbers using regular expression functions in Apache Hive. Extract Numbers using Hive Regular Expression Functions When you work on different data sources, you may get requirement to extract…

Continue ReadingHive Extract Numbers using Regular Expression Functions
Comments Off on Hive Extract Numbers using Regular Expression Functions

Hive Extract Date using Regular Expression Functions

In my other articles, we have seen how regular expression functions are used to check if a string is numeric. The other common requirements is to extract the date field from the string using Hive regular expression functions. For example, you may get requirement to get date value from the string field. Apache Hive support many functionalities that you can use to extract date type. But, using a regular expression function to extract date from a string is the easiest method. Extract Date using Hive Regular Expression Functions There are…

Continue ReadingHive Extract Date using Regular Expression Functions
Comments Off on Hive Extract Date using Regular Expression Functions

Hive Table Sampling – Concept and Example

The Relational databases like SQL server supports writing queries on a relatively small number of rows from the very large table. In this article, we will check Hive table sampling concept, methods and some examples. Hive Table Sampling Concept The Hive TABLESAMPLE clause allows the users to write queries for samples of the data instead of the whole table. The sampling comes handy when you are working on the large tables and it takes time to return results. The TABLESAMPLE clause can be added to any table in the FROM…

Continue ReadingHive Table Sampling – Concept and Example
Comments Off on Hive Table Sampling – Concept and Example

Apache Hive Integer Value Check – Examples

In a data warehouse environment, there are many options that you can check for an integer value. Using this process, you can usually remove the unwanted records and save some I/O. For example, filter out non-numeric values when comparing it with integer types. In this article, we see different methods to check integer value in Hive. Apache Hive Integer Value Check Many relational databases provide an extended SQL function to help the data warehouse developers. The built-in functions such isnumeric is used to check given string value is a number…

Continue ReadingApache Hive Integer Value Check – Examples
Comments Off on Apache Hive Integer Value Check – Examples

Working with Hive Macros, Syntax and Examples

Many relational databases such as Teradata supports Macro functions. In RDBMS, Macros are stored in the data dictionary. Users can share macros and can execute based on the requirements. Hive Macros are a bit different compared to that of relational databases. In this article, we will check what are Macros, its syntax, how to use them and some macro examples. What are Macros in Hive? The macros in Hive are set of SQL statements which are stored and executed by calling macro function names. Macros exist for the duration of the current session. Macros are…

Continue ReadingWorking with Hive Macros, Syntax and Examples
Comments Off on Working with Hive Macros, Syntax and Examples

Hive Self Join Query, Performance and Optimization

By definition, self join is a join in which a table is joined itself. Self joins are usually used only when there is a parent child relationship in the given data. In this article, we will check how to write self join query in the Hive, its performance issues and how to optimize it. Hive Self Join Query As mentioned earlier, self join is used when there is parent-child relation between your data. For example, consider an employee table. an employee table contains details about the employees and an employee…

Continue ReadingHive Self Join Query, Performance and Optimization
Comments Off on Hive Self Join Query, Performance and Optimization

How to Update or Drop Hive Partition? Steps and Examples

In general, partitions in relational databases are used to increase the performance of the SQL queries. The partition is the concept of storing relevant data in the same place. For example, let us say you want to query the data monthly bases, then you can partition your data on month. In this article, we will check how to update or drop the Hive partition that you have already created. What are Partitions in Hive? Just like relational databases, Apache Hive partitions are used to improve the performance of the HiveQL…

Continue ReadingHow to Update or Drop Hive Partition? Steps and Examples
Comments Off on How to Update or Drop Hive Partition? Steps and Examples

Hadoop Hive isnumeric Alternative and Examples

In a data warehouse environment, you will be working with heterogeneous data set. You may have to filter out unwanted data before loading it to the actual data into Hive table. For example, you many have a field1 of type string contains alphanumeric values. In many scenarios, you may get requirement to filter out non-numeric values. In this article, we will check Hadoop Hive isnumeric Alternative with some examples. Hadoop Hive isnumeric Function Many relational databases provide an extended SQL functions to help the data warehouse developers. Databases such as…

Continue ReadingHadoop Hive isnumeric Alternative and Examples
Comments Off on Hadoop Hive isnumeric Alternative and Examples