Details about bigdata

Apache Hive Type Conversion Functions and Examples

Apache Hive applies strict rules to the data types of the parameters you pass to its functions. Hive type conversion functions let you explicitly convert a value to the required type and format. For example, Hive does not implicitly convert DOUBLE to FLOAT or STRING to INT. In another post, we discussed Hive date functions and examples. In this article, we will look at Cloudera Hive type conversion functions with some examples. Related article: Commonly used Apache Hive Date Functions and Examples. Apache Hive Type Conversion Functions…


Hive Insert from Select Statement and Examples

Apache Hive is a data warehouse framework built on top of the Hadoop distributed file system (HDFS). It provides a query language called Hive Query Language, also known as HiveQL or HQL. HiveQL syntax is similar to SQL syntax, with minor changes. Like SQL insert statements, HQL supports inserting data into tables using various methods. In this article, we will look at one of those methods: inserting data into a Hive table using a SELECT statement or clause. Hive Insert Data into Table Methods Below are some of the commonly used methods to insert…


How to Exclude Hive Partition Column From SELECT Query

Apache Hive is a data warehouse framework built on top of Hadoop HDFS. It provides a high-level query language to store and analyse large volumes of data. Apache Hive supports most relational database features, such as partitioning large tables and storing rows according to the value of a partition column. However, Hive stores the partition column as a virtual column, and that column is visible when you run 'select * from table'. In this article, we will look at a method to exclude the Hive partition column from a SELECT query. Hive Table Partition Partition in Hive table is…


Identify and Remove Duplicate Records from Hive Table

Apache Hive, being a batch processing engine, does not support primary, foreign or unique key constraints, so nothing stops you from inserting duplicate records into a Hive table. If you have loaded the same data into a table twice, for example, you can de-duplicate it in several ways. The methods below explain how to identify and remove duplicate records or rows from a Hive table. Remove Duplicate Records from Hive Table Apache Hive does not provide support to many functions or internal columns that are supported…


How to use Impala Replace Function and Examples

The latest version of Cloudera Impala supports the replace function. A string replace function is essential when you manipulate strings and need to substitute a particular value, whether that is a junk value or any other value your requirements call for. In this article, we will look at the Impala replace function and alternative methods that you can use whenever required. Impala Replace Function As mentioned earlier, the latest version of Cloudera Impala does provide support for the replace function. The syntax and usage…


Export Hive Table DDL, Syntax and Shell Script Example

There are many situations in which you need to export DDLs. For example, you may be migrating some of your Hive tables to an RDBMS for reporting. If you work as a Hadoop administrator, you should know how to export table DDL. In this article, we will look at how to export Hive table DDL to a text file using a shell script and a beeline connection string. Export Hive Table DDL As mentioned earlier, it is good to have a utility that allows you to generate DDL in Hive.…


Impala Interval Data Type and Conversion Examples

The Cloudera Impala interval type is slightly different from the Apache Hive interval data types. The only difference is that Impala accepts the interval value as an integer, whereas in Hive it is a string. The interval type in Impala works the same way as in other relational databases such as Netezza, Vertica, Greenplum, Oracle, etc. In this article, we will look more closely at the Impala interval data type and how to convert it. Impala Interval Data Type Impala interval type syntax accepts unit specifications. The unit could be SECOND, HOUR, DAY, MONTH, YEAR. You have…


Apache Hive Replace Function and Examples

By default, Hive has no replace function. A string replace function is essential when you manipulate strings and need to substitute a particular value, such as a junk value. In this article, we will look at the alternatives to a Hive replace function that you can use whenever required. Hive Replace Function As mentioned earlier, Apache Hive does not provide support for a replace function. However, it does provide support for regular expression functions and the translate function. You can use any…


Hive Interval Data Types and Conversion Examples

Hive supports interval types in the same way as other relational databases such as Netezza, Vertica, Oracle, etc. It accepts interval syntax with unit specifications, so you have to specify the unit along with the interval value. For example, INTERVAL '1' DAY refers to an interval of one day. In this article, we will look at Hive interval data types and their conversion examples. Hive Interval Data Types Hive version 1.2 and above supports interval types. Intervals of time units, Year to month intervals and Day to second intervals are available in Hive version 1.2 and…


Hive UDF using Python-Use Python Script into Hive-Example

Hadoop provides an API so that you can write user-defined functions (UDFs) in any of your favorite programming languages. In this article, we will look at how to create a custom function for Hive using Python, in other words, how to create a Hive UDF using Python. What is Hive? Hive is a data warehouse ecosystem built on top of Hadoop HDFS to perform batch and ad-hoc query execution on large datasets. Apache Hive can handle petabytes of data. Hive is designed for OLAP. It is not suited for OLTP…
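
As a rough illustration of the idea, one common way to run Python from Hive is the TRANSFORM clause, which streams rows to a script on standard input as tab-separated text and reads rows back from standard output. The sketch below assumes that mechanism. The function name `transform_line`, the two-column layout, and the upper-casing logic are all hypothetical choices made for this example, not taken from the article.

```python
import sys


def transform_line(line):
    """Upper-case the second tab-separated field of one input row.

    Hypothetical logic for illustration: rows are assumed to look like
    "<id>\t<name>". Rows with a single field are passed through unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        fields[1] = fields[1].upper()
    return "\t".join(fields)


def main(stream=sys.stdin, out=sys.stdout):
    # Hive sends one row per line; emit one transformed row per line.
    for line in stream:
        out.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Assuming the script is saved under a hypothetical name such as upper_name.py, it could be registered in a Hive session with ADD FILE and called with a query of the form SELECT TRANSFORM(id, name) USING 'python upper_name.py' AS (id, name) FROM some_table, where the table and column names are placeholders.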
