Details about bigdata

Hive Pivot Table-Transpose Rows to Column and Example

Many relational databases such as Oracle, Snowflake support PIVOT function which you can use to convert row to column. But, Apache Hive does not support Pivot function yet. As an alternative method, you can use CASE and DECODE statements to convert table rows to column, or columns to rows as per your requirements. In this article, we will check different methods to transpose Hive table using a Pivot function alternative method with some examples. Hive Pivot Table Creating pivot table is a relatively common task in a data warehouse. You…

Continue ReadingHive Pivot Table-Transpose Rows to Column and Example
Comments Off on Hive Pivot Table-Transpose Rows to Column and Example

Oracle INSERT ALL Alternative in Hive/Spark SQL

Oracle database is one of widely used relational databases. It supports many syntax that are not available in any other transaction databases. One of such command is INSERT ALL. The INSERT ALL is used to insert computed records into multiple tables based on conditions. In this article, we will check what is Oracle INSERT ALL alternative in Hive and Spark SQL. Oracle INSERT ALL alternative in Hive/Spark SQL When migrating Oracle scripts to Apache Hive and Spark SQL, you will notice Hive and Spark SQL does not support many Oracle…

Continue ReadingOracle INSERT ALL Alternative in Hive/Spark SQL
Comments Off on Oracle INSERT ALL Alternative in Hive/Spark SQL

Apache Hive Set Operators: UNION and UNION ALL

You can use the Apache Hive set operators to combine similar data sets from two or more SELECT statements into a single result set . Here the similar data set literally mean, the data type of the result set should also match. Otherwise, you have to explicitly convert type using type conversion functions. Hive Set Operators Hadoop Hive supports following set operators. UNION [DISTINCT] UNION ALL Hive versions prior to 1.2.0 only support UNION ALL (bag union), in which duplicate rows are not eliminated. Starting version 1.2.0, Hive includes optional ALL keywords.…

Continue ReadingApache Hive Set Operators: UNION and UNION ALL
Comments Off on Apache Hive Set Operators: UNION and UNION ALL

How to Write Hive Conditional Join and Example

The join in Hive is used to combine the records from multiple tables based on the joining condition. The joining condition can be on the common columns between participating tables. Hadoop Hive supports various join types. You may get the requirement to choose the joining values based on certain conditions. For example, you can have OR condition in the joining condition. In this article, we will check how to write Conditional Join in Hive with some examples. Hive Conditional Join You never know what type of requirement that you are…

Continue ReadingHow to Write Hive Conditional Join and Example
Comments Off on How to Write Hive Conditional Join and Example

Hive Extract Numbers using Regular Expression Functions

In my other article, we have seen how to extract date values from a string using Hive regular expressions. The regular expression function is sometime called as regex. The other common uses of regular expression is to extract the numeric values. For example, extract area code or phone numbers from the string data. In this article, we will check how to extract numbers using regular expression functions in Apache Hive. Extract Numbers using Hive Regular Expression Functions When you work on different data sources, you may get requirement to extract…

Continue ReadingHive Extract Numbers using Regular Expression Functions
Comments Off on Hive Extract Numbers using Regular Expression Functions

Hive Extract Date using Regular Expression Functions

In my other articles, we have seen how regular expression functions are used to check if a string is numeric. The other common requirements is to extract the date field from the string using Hive regular expression functions. For example, you may get requirement to get date value from the string field. Apache Hive support many functionalities that you can use to extract date type. But, using a regular expression function to extract date from a string is the easiest method. Extract Date using Hive Regular Expression Functions There are…

Continue ReadingHive Extract Date using Regular Expression Functions
Comments Off on Hive Extract Date using Regular Expression Functions

Hive Table Sampling – Concept and Example

The Relational databases like SQL server supports writing queries on a relatively small number of rows from the very large table. In this article, we will check Hive table sampling concept, methods and some examples. Hive Table Sampling Concept The Hive TABLESAMPLE clause allows the users to write queries for samples of the data instead of the whole table. The sampling comes handy when you are working on the large tables and it takes time to return results. The TABLESAMPLE clause can be added to any table in the FROM…

Continue ReadingHive Table Sampling – Concept and Example
Comments Off on Hive Table Sampling – Concept and Example

Apache Hive Integer Value Check – Examples

In a data warehouse environment, there are many options that you can check for an integer value. Using this process, you can usually remove the unwanted records and save some I/O. For example, filter out non-numeric values when comparing it with integer types. In this article, we see different methods to check integer value in Hive. Apache Hive Integer Value Check Many relational databases provide an extended SQL function to help the data warehouse developers. The built-in functions such isnumeric is used to check given string value is a number…

Continue ReadingApache Hive Integer Value Check – Examples
Comments Off on Apache Hive Integer Value Check – Examples

Working with Hive Macros, Syntax and Examples

Many relational databases such as Teradata supports Macro functions. In RDBMS, Macros are stored in the data dictionary. Users can share macros and can execute based on the requirements. Hive Macros are a bit different compared to that of relational databases. In this article, we will check what are Macros, its syntax, how to use them and some macro examples. What are Macros in Hive? The macros in Hive are set of SQL statements which are stored and executed by calling macro function names. Macros exist for the duration of the current session. Macros are…

Continue ReadingWorking with Hive Macros, Syntax and Examples
Comments Off on Working with Hive Macros, Syntax and Examples

Hive Self Join Query, Performance and Optimization

By definition, self join is a join in which a table is joined itself. Self joins are usually used only when there is a parent child relationship in the given data. In this article, we will check how to write self join query in the Hive, its performance issues and how to optimize it. Hive Self Join Query As mentioned earlier, self join is used when there is parent-child relation between your data. For example, consider an employee table. an employee table contains details about the employees and an employee…

Continue ReadingHive Self Join Query, Performance and Optimization
Comments Off on Hive Self Join Query, Performance and Optimization