Spark SQL isnumeric Function Alternative and Example

Many organizations are moving their data warehouses to Hive and using Spark as the execution engine, which can improve query performance. In SQL, there are several ways to deal with non-numeric values; for example, you can create user-defined functions to filter out unwanted data. In this article, we will look at alternatives to the isnumeric function in Spark SQL, with examples. Spark SQL isnumeric Function Neither Spark SQL nor Apache Hive provides an isnumeric function. You have to write…
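One common way to stand in for a missing isnumeric function is a regular-expression check. The following is a minimal sketch in plain Python, not the article's own method: the pattern, the `is_numeric` helper, and the sample values are all hypothetical. A similar pattern could be used with a regex-match operator such as RLIKE in Spark SQL or Hive, although the exact regex dialect differs between engines.

```python
import re

# Hypothetical pattern: optional sign, ASCII digits, optional decimal part.
NUMERIC_PATTERN = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def is_numeric(value):
    """Return True when value is a string that looks like a plain number."""
    if value is None:
        return False
    # Surrounding whitespace is ignored; this is a design choice of the sketch.
    return NUMERIC_PATTERN.fullmatch(value.strip()) is not None


# Hypothetical sample column with mixed content.
rows = ["123", "-4.5", "12a", "", None, " 42 "]
numeric_rows = [r for r in rows if is_numeric(r)]
```

Whether values such as "1e5" or ".5" should count as numeric depends on your data; extend the pattern if they should.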


Spark SQL DataFrame Self Join and Example

You can use Spark Dataset join operators to join multiple DataFrames in Spark. Two or more DataFrames are joined to perform specific tasks, such as getting the data common to both DataFrames. In this article, we will check how to perform a Spark SQL DataFrame self join using PySpark. Spark SQL DataFrame Self Join using Pyspark Spark DataFrames support various join types, as mentioned in Spark Dataset join operators. A self join is a join in which a DataFrame is joined to itself. The self join is used to identify…


Hive Self Join Query, Performance and Optimization

By definition, a self join is a join in which a table is joined to itself. Self joins are usually used when the data contains a parent-child relationship. In this article, we will check how to write a self join query in Hive, its performance issues, and how to optimize it. Hive Self Join Query As mentioned earlier, a self join is used when there is a parent-child relationship in your data. For example, consider an employee table. An employee table contains details about the employees and an employee…
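The parent-child idea can be sketched with a small runnable example. This one uses Python's built-in sqlite3 module purely as a stand-in engine so it runs without a cluster; the `employee` table layout, the `manager_id` column, and the sample rows are assumptions made for illustration. The query itself is ordinary SQL, and a query of the same shape is what you would typically write in HiveQL.

```python
import sqlite3

# In-memory database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 1), (4, "Dee", 2)],
)

# Self join: the same table appears twice under two aliases,
# e for the employee row and m for that employee's manager row.
self_join = """
    SELECT e.name AS employee, m.name AS manager
    FROM employee e
    JOIN employee m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
"""
pairs = conn.execute(self_join).fetchall()
conn.close()
```

Because this is an inner join, the employee with no manager (a NULL `manager_id`) does not appear in the result; a left outer join would keep that row.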


How to Update or Drop Hive Partition? Steps and Examples

In general, partitions in relational databases are used to improve the performance of SQL queries. Partitioning stores related data in the same place. For example, if you want to query the data on a monthly basis, you can partition your data by month. In this article, we will check how to update or drop a Hive partition that you have already created. What are Partitions in Hive? Just like relational databases, Apache Hive partitions are used to improve the performance of the HiveQL…


Hive on Error Stop Script Execution – Options

When you build a data warehouse on top of Hadoop HDFS using the Hive framework, you may have to execute individual HiveQL or SQL queries, or a HiveQL script containing a batch of HiveQL statements. Both Hive and Beeline provide an option to execute a script file. There may be scenarios in which you want to stop the script execution if any of the SQL statements fails. In this article, we will check how to stop script execution on error in Hive. We shall see both Hive and Beeline CLI options to exit script execution in case…


Hadoop Hive isnumeric Alternative and Examples

In a data warehouse environment, you will be working with heterogeneous data sets. You may have to filter out unwanted data before loading the actual data into a Hive table. For example, you may have a field, field1, of type string that contains alphanumeric values. In many scenarios, you may be required to filter out non-numeric values. In this article, we will check Hadoop Hive isnumeric alternatives with some examples. Hadoop Hive isnumeric Function Many relational databases provide extended SQL functions to help data warehouse developers. Databases such as…


Hadoop Hive Transactional Table Update join and Example

As you know, Apache Hive is a data warehouse framework on top of Hadoop HDFS. Since it contains tables, you may want to update the records of a table as your data changes. Until recently, Apache Hive did not support transactions. Hive 0.14 and later support transactional tables. You need to enable ACID properties in order to use update, delete, and merge in your Hive queries. In this article, we will address how to use an update join on your Hive transactional table. You can also update Hive table without…


Apache Hive – Extract Value from JSON using Hive – Example

A JSON file stores simple data structures and objects in JavaScript Object Notation (JSON) format, a standard data interchange format. JSON files are mainly used to transfer data in web applications, typically between the application and its servers. In this article, we will check how to extract a value from a JSON file using Hive queries. Extract Value from JSON using Hive Apache Hive provides limited support for JSON files. You can store JSON data into Hive…
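To make "extracting a value from JSON" concrete, here is a minimal plain-Python sketch of following a dotted path such as `$.owner.name` into a JSON document. It is an illustration of the idea only, not Hive's implementation: the `get_json_value` helper, the path handling (object keys only, no array indexes), and the sample record are all hypothetical.

```python
import json


def get_json_value(json_text, path):
    """Follow a dotted path like '$.owner.name'; return None if it cannot be resolved."""
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        # Invalid or missing JSON yields None rather than an error.
        return None
    # Drop the leading '$' and split the rest into object keys.
    keys = [k for k in path.lstrip("$").split(".") if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical sample record.
record = '{"id": 7, "owner": {"name": "Asha", "city": "Pune"}}'
owner_name = get_json_value(record, "$.owner.name")
missing = get_json_value(record, "$.owner.phone")
```

Returning None for a missing key or malformed input keeps a bad record from aborting a whole batch, which is a design choice of this sketch.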


Hive Array Functions, Usage and Examples

It is very common to store values in the form of an array in databases. You can later use array functions to manipulate those array types. In this article, we will check how to work with Hive array functions. Hive Array Functions Below are some of the commonly used Hive array functions. Hive Array Function The first and most commonly used function is the array function. This function is used to create an array out of integer or string values. Following is the syntax of the array function.…


Apache Hive DUAL Table Support and Alternative

Apache Hive, like many other relational databases, does not support the DUAL table. You can simply use SELECT without a FROM clause to display the result of the function or expression that you are testing. However, this may cause a problem when you are migrating from Oracle to Hive, because you may find a lot of queries that use the DUAL table. In this article, we will check what the DUAL table alternative in Hive is and how to use it. What is DUAL table in Relational Databases? In relational databases, the DUAL is…
