How to Save Spark DataFrame as Hive Table – Example

Apache Spark is one of the most actively contributed open-source frameworks. Many e-commerce, data analytics, and travel companies use Spark to analyze huge amounts of data as quickly as possible. Because it computes in memory, Apache Spark can return results 10 to 100 times faster than Hive. In this article, we will check how to save a Spark DataFrame as a Hive table, with some examples. How to Save Spark DataFrame as Hive Table? Because of its in-memory computation, Spark is used to process complex computations. In case you have…
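
As a rough illustration of the idea, here is a minimal PySpark sketch. It assumes a Spark session built with Hive support. The helper name `save_as_hive_table`, the application name, and the table name `default.sample_table` are hypothetical and not taken from the article.

```python
def save_as_hive_table(df, table_name, mode="overwrite"):
    """Write a Spark DataFrame to a Hive table through the DataFrameWriter API."""
    # mode is passed straight to DataFrameWriter.mode, e.g. "overwrite" or "append".
    df.write.mode(mode).saveAsTable(table_name)
    return table_name


def example_usage():
    # Imported lazily so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("save-df-as-hive-table")  # hypothetical application name
        .enableHiveSupport()               # lets Spark talk to the Hive metastore
        .getOrCreate()
    )
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    save_as_hive_table(df, "default.sample_table")  # hypothetical table name
```

`example_usage` is not called here because it needs a working Spark and Hive installation.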

Hive Insert from Select Statement and Examples

Apache Hive is a data warehouse framework built on top of the Hadoop Distributed File System (HDFS). It provides a query language called Hive Query Language (HiveQL or HQL). HiveQL syntax is similar to SQL syntax, with minor changes. As with SQL insert statements, HQL supports several methods of inserting data into tables. In this article, we will check one of them: inserting data into a Hive table using a SELECT statement. Hive Insert Data into Table Methods Below are some of the commonly used methods to insert…
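
To make the insert-from-select pattern concrete, the sketch below builds the HiveQL statement as a Python string, which could then be pasted into beeline or passed to a SQL client. The helper name and the table and column names (`sales`, `sales_archive`, `id`, `amount`) are hypothetical.

```python
def insert_from_select(target, source, columns, where=None, overwrite=False):
    """Build a HiveQL INSERT ... SELECT statement as a string."""
    # INSERT OVERWRITE replaces existing data; INSERT INTO appends to it.
    verb = "INSERT OVERWRITE TABLE" if overwrite else "INSERT INTO TABLE"
    stmt = f"{verb} {target} SELECT {', '.join(columns)} FROM {source}"
    if where:
        stmt += f" WHERE {where}"
    return stmt + ";"


print(insert_from_select("sales_archive", "sales", ["id", "amount"],
                         where="amount > 100"))
```

The helper does no quoting or validation, so it is only suitable for trusted, hand-written identifiers.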

How to Exclude Hive Partition Column From SELECT Query

Apache Hive is a data warehouse framework on top of Hadoop HDFS. Hive provides a high-level query language to store and analyze large volumes of data. Apache Hive supports most relational database features, such as partitioning large tables and storing rows according to the partition column. However, Hive stores the partition column as a virtual column, and it is visible when you run 'select * from table'. In this article, we will check a method to exclude the Hive partition column from a SELECT query. Hive Table Partition Partition in Hive table is…
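
One way to do this, which may or may not be the method the full article uses, is Hive's regex column specification. The sketch below builds the two statements involved. The helper name and the table and column names (`sales`, `ds`, `hr`) are hypothetical.

```python
def select_without_columns(table, excluded):
    """Return HiveQL statements that select every column except `excluded`."""
    pattern = "|".join(excluded)
    return [
        # With this setting, back-quoted names in the SELECT list are read as regexes.
        "SET hive.support.quoted.identifiers=none;",
        # The pattern matches every column name that is not one of the excluded names.
        f"SELECT `({pattern})?+.+` FROM {table};",
    ]


for statement in select_without_columns("sales", ["ds", "hr"]):
    print(statement)
```

Both statements have to run in the same Hive session, because the `SET` only affects the session it runs in.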

Identify and Remove Duplicate Records from Hive Table

Apache Hive, being a batch processing engine, does not enforce primary, foreign, or unique key constraints, so you can insert duplicate records into a Hive table. Nothing ensures uniqueness, but if you have loaded the same data twice, you can de-duplicate the table in several ways. The methods below explain how to identify and remove duplicate records or rows from a Hive table. Remove Duplicate Records from Hive Table Apache Hive does not support many functions or internal columns that are supported…
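
As a hedged sketch of two common approaches (not necessarily the ones the full article lists), the helpers below build a query that finds duplicated key values and a statement that rewrites a table with distinct rows only. The helper names and the table and column names are hypothetical, and the overwrite form assumes a non-partitioned table.

```python
def find_duplicates(table, key_columns):
    """Build a query listing key values that occur more than once."""
    keys = ", ".join(key_columns)
    return (f"SELECT {keys}, COUNT(*) AS cnt FROM {table} "
            f"GROUP BY {keys} HAVING COUNT(*) > 1;")


def dedupe_overwrite(table):
    """Build a statement that rewrites the table keeping only distinct rows."""
    # Assumes a non-partitioned table; the table is read and then overwritten.
    return f"INSERT OVERWRITE TABLE {table} SELECT DISTINCT * FROM {table};"


print(find_duplicates("customers", ["id", "email"]))
print(dedupe_overwrite("customers"))
```

Running the first query before the second shows how many rows the overwrite is expected to remove.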

Export Hive Table DDL, Syntax and Shell Script Example

There are many situations where you are required to export DDLs. For example, you may be migrating some of your Hive tables to an RDBMS for reporting. If you work as a Hadoop administrator, you should know how to export table DDL. In this article, we will check how to export Hive table DDL to a text file using a shell script and a beeline connection string. Export Hive Table DDL As mentioned earlier, it is good to have a utility that allows you to generate DDL in Hive.…
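
The article uses a shell script. The sketch below shows the same idea in Python instead: it assembles a beeline command that runs `SHOW CREATE TABLE` and writes the output to a text file. The helper names, the JDBC URL, and the table and file names are hypothetical.

```python
import subprocess


def beeline_ddl_command(jdbc_url, table):
    """Build the beeline argument list that prints the DDL of one table."""
    return [
        "beeline", "-u", jdbc_url,
        # Suppress log noise and headers so the file holds only the DDL text.
        "--silent=true", "--showHeader=false", "--outputformat=tsv2",
        "-e", f"SHOW CREATE TABLE {table};",
    ]


def export_ddl(jdbc_url, table, out_path):
    """Run beeline and save the table DDL to out_path (needs beeline on PATH)."""
    result = subprocess.run(beeline_ddl_command(jdbc_url, table),
                            capture_output=True, text=True, check=True)
    with open(out_path, "w") as fh:
        fh.write(result.stdout)


# Hypothetical connection string and table name, shown for illustration only.
print(" ".join(beeline_ddl_command("jdbc:hive2://localhost:10000/default",
                                   "default.sample_table")))
```

`export_ddl` is defined but not called, since it needs a reachable HiveServer2 instance.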

Apache Hive Replace Function and Examples

Older Hive releases have no built-in replace function. A string replace function is often needed when you manipulate strings and have to replace a particular value, for example a junk value. In this article, we will check the alternatives to a Hive replace function that you can use whenever required. Hive Replace Function As mentioned earlier, these Hive versions do not provide a replace function. However, Hive does support regular expression functions and the translate function. You can use any…
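
The two alternatives named above correspond to Hive's built-in `regexp_replace` and `translate` functions. The sketch below builds example expressions for each. The helper names and the column and table names are hypothetical, and the arguments are assumed to contain no single quotes.

```python
def replace_with_regexp(column, pattern, replacement):
    """Build a regexp_replace call; `pattern` is treated by Hive as a regex."""
    return f"regexp_replace({column}, '{pattern}', '{replacement}')"


def replace_chars_with_translate(column, from_chars, to_chars):
    """Build a translate call, which maps characters one-to-one by position."""
    return f"translate({column}, '{from_chars}', '{to_chars}')"


# Strip a junk '#' character, then swap '-' for '/' character by character.
print(f"SELECT {replace_with_regexp('name', '#', '')} FROM customers;")
print(f"SELECT {replace_chars_with_translate('dt', '-', '/')} FROM customers;")
```

`regexp_replace` suits substring or pattern replacement, while `translate` only substitutes individual characters.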

Hive Interval Data Types and Conversion Examples

Hive supports interval types in the same way as other relational databases such as Netezza, Vertica, and Oracle. It accepts interval syntax with unit specifications, so you have to specify the unit along with the interval value. For example, INTERVAL '1' DAY denotes an interval of one day. In this article, we will check Hive interval data types and their conversion examples. Hive Interval Data Types Hive version 1.2 and above supports interval types. Intervals of time units, year-to-month intervals, and day-to-second intervals are available in Hive version 1.2 and…
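
To illustrate the literal syntax described above, this small sketch formats interval literals with an explicit unit and drops them into example queries. The helper name `interval_literal` is hypothetical, and the unit list is an assumption based on the interval kinds the excerpt mentions.

```python
# Units assumed from the interval kinds named in the text.
_UNITS = {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND",
          "YEAR TO MONTH", "DAY TO SECOND"}


def interval_literal(value, unit):
    """Format a Hive interval literal such as INTERVAL '1' DAY."""
    unit = unit.upper()
    if unit not in _UNITS:
        raise ValueError(f"unsupported interval unit: {unit}")
    return f"INTERVAL '{value}' {unit}"


# A single-unit interval added to a timestamp, and a year-to-month interval.
print(f"SELECT current_timestamp + {interval_literal(1, 'day')};")
print(f"SELECT {interval_literal('1-2', 'year to month')};")
```

The value is always quoted, and the unit follows it, as in the INTERVAL '1' DAY example above.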
