Hive Archives - Page 4 of 8

Export Hive Query Output into Local Directory using INSERT OVERWRITE

INSERT OVERWRITE statements that write to the HDFS filesystem or to LOCAL directories are the best way to extract large amounts of data from a Hive table or query output. Hive can write to HDFS directories in parallel from within a map-reduce job. In this article, we will look at how to export Hive query output into a local directory using INSERT OVERWRITE, with some examples. Export Hive Query Output into Local Directory using INSERT OVERWRITE: Query results can be inserted into filesystem directories by using the Hive INSERT OVERWRITE statement. You can insert data into either HDFS or LOCAL…
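As a rough illustration of the statement described in this excerpt, the Python sketch below only composes the HiveQL text; it does not connect to Hive. The helper name `build_export_statement`, the table `sales`, and the path `/tmp/sales_export` are hypothetical and chosen for the example.

```python
def build_export_statement(select_sql: str, directory: str,
                           local: bool = True, delimiter: str = ",") -> str:
    """Compose a Hive INSERT OVERWRITE ... DIRECTORY statement as plain text.

    Hypothetical helper for illustration only: it performs no quoting or
    validation, so it should not be fed untrusted input.
    """
    # LOCAL writes to the local filesystem; without it Hive targets HDFS.
    target = "LOCAL DIRECTORY" if local else "DIRECTORY"
    return (
        f"INSERT OVERWRITE {target} '{directory}'\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"{select_sql.rstrip(';')};"
    )


if __name__ == "__main__":
    # Table name and output path are made up for this example.
    print(build_export_statement("SELECT * FROM sales", "/tmp/sales_export"))
```

Keep in mind that OVERWRITE replaces whatever already exists in the target directory, so it is safer to point the statement at a dedicated export directory.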

January 30, 2018

BigData

Apache Hive ALTER TABLE Command and Examples

You can use the Apache Hive ALTER TABLE command to change the structure of an existing table. You can add new columns or modify existing columns in Hive tables. Uses of Hive ALTER TABLE Command: Below are the most common uses of the ALTER TABLE command: You can rename tables and columns of existing Hive tables. You can add a new column to a table. Rename a Hive table column. Add or drop a table partition. Add the Hadoop archive option to a Hive table. Related reading: Apache Hive Data Types and Best Practices, Apache Hive CREATE TABLE…

January 25, 2018

BigData

Improve Hive Memory Usage using Hadoop Archive

Hadoop HDFS is designed in such a way that the number of HDFS files directly affects memory consumption in the NameNode, because it must keep track of all files in the HDFS environment. This has little effect on a small cluster, but memory usage may cause problems once the file count crosses 50 to 100 million files. The Hadoop ecosystem performs best with fewer files. Now, let us check how to improve Hive memory usage using Hadoop Archive. Related reading: Hadoop HDFS Architecture. Improve Hive Memory Usage using Hadoop Archive: You can…

January 24, 2018

BigData

Apache Hive Data Types and Best Practices

In general, a data type is an attribute that specifies the type of data that is going to be stored in a specific column. Each column, variable and expression has an associated data type in SQL and HiveQL. However, data type names are not consistent across all databases. Hive supports almost all data types that relational databases support. In this article, we will check Apache Hive data types and best practices. When you issue the Apache Hive CREATE TABLE command in the Hadoop environment, each column in a table structure…

January 24, 2018

BigData

Apache Hive Fixed-Width File Loading Options and Examples

In general, fixed-width text files are a special type of text file where the row format is specified by column widths, a pad character and either left or right alignment. In the fixed-width file format, column width is measured in characters. Fixed-width format files are usually generated by machines such as switches, SS7 etc. In this article, we will learn about Apache Hive fixed-width file loading options and some examples. Fixed-Width File Overview: In general, fixed-length format files use ordinal positions, which are offsets to identify where…
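To make the idea of column widths, offsets and pad characters concrete, here is a minimal Python sketch that slices one fixed-width record into fields. It shows the file format only, not Hive's own loading options. The layout (`id`, `name`, `city` and their widths) is a hypothetical example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (column name, start offset, width in characters).
LAYOUT: List[Tuple[str, int, int]] = [
    ("id", 0, 5),
    ("name", 5, 10),
    ("city", 15, 8),
]


def parse_fixed_width(line: str,
                      layout: List[Tuple[str, int, int]] = LAYOUT,
                      pad: str = " ") -> Dict[str, str]:
    """Slice one record by ordinal position and strip the pad character.

    Stripping both sides handles left-aligned and right-aligned fields.
    """
    return {
        name: line[start:start + width].strip(pad)
        for name, start, width in layout
    }


if __name__ == "__main__":
    record = "00042" + "Alice     " + "    Pune"
    print(parse_fixed_width(record))
```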

5 Comments

January 23, 2018

BigData

Apache Hive LIKE statement and Pattern Matching Example

Like various relational databases such as Netezza, Teradata, Oracle etc., Apache Hive supports pattern matching using LIKE, RLIKE or INSTR functions. You can search for a string by matching patterns. Note that the Hive LIKE statement is case-sensitive. The Apache Hive LIKE statement returns TRUE if the string matches the pattern that you are searching for. Hive NOT LIKE is the negation of LIKE. Related reading: Apache Hive Regular Expression Functions, Apache Hive String Functions and Examples. Hive LIKE Statement Patterns Matching: If the string does not contain any percentage sign or underscore, then pattern…
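The LIKE wildcard semantics mentioned above (percent sign for any run of characters, underscore for exactly one character, case-sensitive comparison) can be sketched in Python as follows. This emulates the matching rules only and is not Hive's implementation. The function names are hypothetical, and escape characters are deliberately not handled.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Assumption: no escape character support; '%' and '_' are always wildcards.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence, including empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the pattern, case-sensitively."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(sql_like("apache hive", "%hive"))
    print(sql_like("Hive", "hive"))
```

A pattern with no wildcard behaves like an equality test, which is why `sql_like("Hive", "hive")` is false.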

January 20, 2018

BigData

Apache Hive Derived Tables and Examples

In some applications, you may have to derive column values from base tables. For example, you may have to find the maximum value of an aggregated column. In this scenario, you will have to create the aggregated data first and then apply the MAX function on that column. You can achieve this by using Apache Hive derived tables. We will check the types of derived tables supported in Hive with some examples. Apache Hive Derived Tables: An Apache Hive derived table is a subquery that appears in the FROM clause of the HiveQL…
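The "aggregate first, then take the MAX" scenario described above can be sketched as follows. To keep the example runnable without a cluster, it uses Python's built-in sqlite3 as a stand-in for Hive; the query uses only a FROM-clause subquery with an alias, which Hive also expects, but Hive itself is not exercised here. The `sales` table and its rows are made up.

```python
import sqlite3

# SQLite stands in for Hive here; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 25), ("north", 5)],
)

# The inner query builds the aggregated data; the outer query applies MAX.
# The derived table carries an alias (t).
query = """
SELECT MAX(t.total_amount) AS max_total
FROM (
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
) t
"""

max_total = conn.execute(query).fetchone()[0]
print(max_total)
```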

January 20, 2018

BigData

Apache Hive Correlated Subquery and Its Restrictions

An Apache Hive correlated subquery is a query within a query that refers to columns from the outer query. Hive supports some subqueries, such as table subqueries and WHERE clause subqueries, as well as correlated subqueries. In most cases, Hive correlated subqueries are used to improve Hive query performance. The above diagram explains correlated subqueries in relational databases and in Apache Hive. Read: Apache Hive Supported Subqueries and Examples. Apache Hive Correlated Subquery Examples: For example, consider the query, “check if student id is already exists in the…
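A correlated subquery of the kind described here can be sketched with an EXISTS predicate whose inner query references a column of the outer query. The example runs on Python's built-in sqlite3 as a stand-in for Hive. The tables `students` and `enrollments`, their columns, and the data are hypothetical and are not taken from the original article.

```python
import sqlite3


def find_enrolled_students(conn: sqlite3.Connection):
    """Return students that have at least one matching enrollment row."""
    query = """
    SELECT s.id, s.name
    FROM students s
    WHERE EXISTS (
        SELECT 1
        FROM enrollments e
        WHERE e.student_id = s.id   -- reference to the outer query
    )
    ORDER BY s.id
    """
    return conn.execute(query).fetchall()


# Hypothetical tables and rows, with SQLite standing in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE enrollments (student_id INTEGER, course TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ben"), (3, "Ravi")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [(1, "math"), (3, "physics"), (3, "math")])

if __name__ == "__main__":
    print(find_enrolled_students(conn))
```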

2 Comments

January 18, 2018

BigData

Apache Hive Supported Subqueries and Examples

A subquery in Hive is a select expression that is enclosed in parentheses as a nested query block in a HiveQL query statement. Like subqueries in other relational databases, a Hive subquery may return zero, one or more values to its enclosing select statement. In this article, we will check Apache Hive supported subqueries and some examples. Apache Hive Supported Subqueries: As mentioned above, a Hive subquery is a select expression enclosed in parentheses as a nested query block. You can use these nested query blocks in any…

January 18, 2018

BigData

How to List Hive High Volume Tables?

Unlike other relational databases, Apache Hive does not have a system table that keeps track of the size of growing tables, so it is difficult to find a table's size in Hive using a query. As part of maintenance, you should periodically identify the size of growing tables. Big tables can cause performance issues in Hive. Below are some of the methods that you can use to list Hive high volume tables. Use hdfs dfs -du Command: Hadoop supports many useful commands that you can use in day to day activities such as…
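As a sketch of how the output of `hdfs dfs -du` might be ranked to find the largest tables, the Python snippet below parses captured command output and sorts it by size. It assumes raw byte counts (no `-h` flag), the size in the first column, the path in the last column, and no spaces in paths. The warehouse paths and sizes in the sample are made up.

```python
from typing import List, Tuple


def parse_du_output(text: str) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) pairs sorted from largest to smallest.

    Assumption: each line starts with a byte count and ends with the path;
    any columns in between (such as space consumed by replicas) are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines and anything unexpected
        rows.append((fields[-1], int(fields[0])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample of what the command output could look like.
    sample = (
        "1024  3072  /user/hive/warehouse/small_tbl\n"
        "52428800  157286400  /user/hive/warehouse/big_tbl\n"
    )
    for path, size in parse_du_output(sample):
        print(f"{size:>12}  {path}")
```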

January 17, 2018