Vithal S, Author at DWgeek.com

Sqoop Export HBase Table into Relational Database

You can use Apache Sqoop to export an HBase table into a relational (RDBMS) table, but not directly: Sqoop does not support direct export from HBase to relational databases, so you have to use a workaround. In this article, we will check out how to export an HBase table into a relational database with Sqoop, with steps and an example. Sqoop Export HBase Table into Relational Database: The HBase structure doesn't map very well to typical relational databases such as Netezza, Oracle, SQL Server, etc. In relational databases fixed schema for the tables…

Comments Off

October 6, 2017

BigData

Apache Hive Different File Formats: TextFile, SequenceFile, RCFile, AVRO, ORC, Parquet

Apache Hive supports several familiar file formats used in Apache Hadoop. Hive can load and query data files created by other Hadoop components such as Pig or MapReduce. In this article, we will check the different Apache Hive file formats, such as TextFile, SequenceFile, RCFile, AVRO, ORC and Parquet. Cloudera Impala also supports these file formats. Hive Different File Formats: Different file formats and compression codecs work better for different data sets in Apache Hive. Following are the different Apache Hive file formats: Text File, Sequence File, RC File…

Comments Off

October 3, 2017

BigData

Hadoop Hive WITH Clause Syntax and Examples

With the help of the Hive WITH clause, you can reuse a piece of a query result within the same query construct. You can also improve a Hadoop Hive query using the WITH clause: simplify the query by moving complex, repetitive code into the WITH clause and referring to the logical table it creates in your SELECT statements. Hadoop Hive WITH Clause: A Hive WITH clause can be added before a SELECT statement of your query to define aliases for complex expressions that are referenced multiple times within the body of the…
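To make the idea of a reusable logical table concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive purely so that it runs anywhere; the `sales` table, its columns, and the `region_totals` alias are hypothetical, and Hive's dialect may differ in details such as subquery support.

```python
import sqlite3

# SQLite is only a stand-in for Hive here; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 50), ("west", 70)],
)

# The aggregation is written once in the WITH clause and referenced twice below.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM region_totals
WHERE total > (SELECT AVG(total) FROM region_totals)
ORDER BY region
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('east', 150)]
```

Without the WITH clause, the `SUM ... GROUP BY` subquery would have to be repeated in both the outer SELECT and the average calculation.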

1 Comment

October 2, 2017

BigData

Hadoop Hive Conditional Functions: IF, CASE, COALESCE, NVL, DECODE

Hadoop Hive supports various conditional functions such as IF, CASE, COALESCE, NVL, DECODE, etc. You can use these functions for testing equality, applying comparison operators, and checking whether a value is null. The following diagram shows the various Hive conditional functions: Hive Conditional Functions. The table below describes the various Hive conditional functions. Conditional Function | Description: IF(boolean testCondition, T valueTrue, T valueFalseOrNull): This is one of the best Hive conditional functions and is similar to the IF statements in other programming languages. The IF conditional function tests an expression and returns a corresponding…
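The behaviour of three of these functions can be sketched in plain Python, with `None` playing the role of SQL NULL. These helpers (`hive_if`, `coalesce`, `nvl`) are illustrative stand-ins written for this example, not part of Hive or of any Hive client library.

```python
from typing import Any, Optional


def hive_if(test_condition: Optional[bool], value_true: Any, value_false_or_null: Any) -> Any:
    """Mimic IF(testCondition, valueTrue, valueFalseOrNull)."""
    # A NULL (None) condition is not true, so the second branch is returned.
    return value_true if test_condition else value_false_or_null


def coalesce(*values: Any) -> Any:
    """Mimic COALESCE: return the first non-NULL argument, or NULL if none exists."""
    for value in values:
        if value is not None:
            return value
    return None


def nvl(value: Any, default_value: Any) -> Any:
    """Mimic NVL: replace a NULL value with a default."""
    return default_value if value is None else value


print(hive_if(5 > 3, "yes", "no"))    # yes
print(coalesce(None, None, "first"))  # first
print(nvl(None, 0))                   # 0
```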

3 Comments

October 1, 2017

BigData

Hadoop Hive Date Functions and Examples

Many applications manipulate date and time values. The latest Hadoop Hive query language supports most relational database date functions. In this article, we will check commonly used Hadoop Hive date functions and some examples of their usage. Hadoop Hive Date Functions: Date types are highly formatted and very complicated. Each date value contains the century, year, month, day, hour, minute, and second. We shall see how to use the Hadoop Hive date functions with examples. You can use these functions as Hive date conversion functions…
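As a rough illustration of what such functions compute, the sketch below reimplements two of them in Python. The names `datediff` and `date_add` mirror Hive's built-in functions of the same name, but these are local Python approximations that assume `yyyy-MM-dd` strings; they are not calls into Hive.

```python
from datetime import date, timedelta


def datediff(end_date: str, start_date: str) -> int:
    """Number of days from start_date to end_date (both 'yyyy-MM-dd')."""
    return (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days


def date_add(start_date: str, days: int) -> str:
    """Add a number of days to a 'yyyy-MM-dd' date and return the same format."""
    return (date.fromisoformat(start_date) + timedelta(days=days)).isoformat()


print(datediff("2009-03-01", "2009-02-27"))  # 2
print(date_add("2008-12-31", 1))             # 2009-01-01
```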

3 Comments

October 1, 2017

BigData

Commonly used Cloudera Impala Date Functions and Examples

This article gives short descriptions and examples of the commonly used Cloudera Impala date functions that you can use to manipulate date columns in Impala SQL. In real-world scenarios, many applications manipulate date and time data types. Impala SQL supports most of the date and time functions that relational databases support. Date types are highly formatted and very complicated. Each date value contains the century, year, month, day, hour, minute, and second. We shall see how to use the Impala date functions with examples. Cloudera…

3 Comments

October 1, 2017

BigData

Impala or Hive Slowly Changing Dimension – SCD Type 2 Implementation

Slowly changing dimensions in a data warehouse, commonly known as SCDs, capture data that changes slowly but unpredictably rather than on a regular basis. Slowly changing dimension type 2 is the most popular method used in dimensional modelling to preserve historical data. Since Cloudera Impala and Hadoop Hive do not support update statements, you have to implement the update using intermediate tables. In this article, we will check the Cloudera Impala or Hive Slowly Changing Dimension (SCD) Type 2 implementation steps with an example. For demonstration purposes, let's take the example…
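The "rebuild instead of update" idea can be sketched in Python. This is not the article's Impala or Hive implementation; it only mirrors the logic under assumed column names (`id`, `city`, `start_date`, `end_date`, `is_current`), producing a new list in the way an intermediate table would be produced, without modifying the existing rows.

```python
from typing import Any, Dict, List


def _current_row(row_id: str, city: str, load_date: str) -> Dict[str, Any]:
    """Build an open-ended, current version of a dimension row."""
    return {"id": row_id, "city": city, "start_date": load_date,
            "end_date": None, "is_current": True}


def apply_scd2(dimension: List[Dict[str, Any]], incoming: Dict[str, str],
               load_date: str) -> List[Dict[str, Any]]:
    """Return a rebuilt dimension; the input list is never modified in place."""
    rebuilt: List[Dict[str, Any]] = []
    known_ids = set()
    for row in dimension:
        known_ids.add(row["id"])
        new_city = incoming.get(row["id"])
        if row["is_current"] and new_city is not None and new_city != row["city"]:
            # Close the old version, then open a new current version.
            rebuilt.append({**row, "end_date": load_date, "is_current": False})
            rebuilt.append(_current_row(row["id"], new_city, load_date))
        else:
            # Unchanged or already historical rows are copied as they are.
            rebuilt.append(dict(row))
    for row_id, city in incoming.items():
        if row_id not in known_ids:
            # Brand-new keys start their history on the load date.
            rebuilt.append(_current_row(row_id, city, load_date))
    return rebuilt


dimension = [{"id": "1", "city": "Pune", "start_date": "2017-01-01",
              "end_date": None, "is_current": True}]
result = apply_scd2(dimension, {"1": "Mumbai", "2": "Delhi"}, "2017-09-30")
for row in result:
    print(row)
```

After the call, `result` holds three rows: the closed "Pune" version, the new current "Mumbai" version, and a first version for the new key.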

2 Comments

September 30, 2017

BigData

Apache HBase Data Model Explanation

Apache HBase is a column-oriented, scalable database built on top of Hadoop HDFS. HBase is an open-source implementation of Google’s BigTable. In this article, we will check the Apache HBase data model and explain it. Apache HBase Data Model: The Apache HBase data model is designed to accommodate structured or semi-structured data that can vary in field size, data type and columns. HBase stores data in tables, which have rows and columns. The table schema is very different from traditional relational database tables. You can consider an HBase table as a multi-dimensional…
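HBase is commonly described as a multidimensional sorted map, and a nested dictionary gives a rough feel for that shape: row key, then column family, then column qualifier, then timestamp, then value. The sketch below is an in-memory toy under that assumption; `new_table`, `put` and `get_latest` are hypothetical helpers, not the HBase client API, and the toy ignores sorting and storage.

```python
from collections import defaultdict
from typing import Any, Optional


def new_table() -> Any:
    """row key -> column family -> qualifier -> {timestamp: value}"""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table: Any, row_key: str, family: str, qualifier: str,
        value: str, timestamp: int) -> None:
    """Store one versioned cell; rows need not share the same columns."""
    table[row_key][family][qualifier][timestamp] = value


def get_latest(table: Any, row_key: str, family: str, qualifier: str) -> Optional[str]:
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]


table = new_table()
put(table, "user1", "info", "city", "Pune", timestamp=1)
put(table, "user1", "info", "city", "Mumbai", timestamp=2)
put(table, "user2", "info", "email", "u2@example.com", timestamp=1)

print(get_latest(table, "user1", "info", "city"))  # Mumbai
print(get_latest(table, "user2", "info", "city"))  # None
```

Note that `user2` has no `city` column at all, which is the kind of per-row variation a fixed relational schema does not allow.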

2 Comments

September 28, 2017

BigData

HBase Table Schema Design and Concept

An HBase table can scale to billions of rows and a large number of columns based on your requirements, which allows you to store terabytes of data in it. An HBase table supports high read and write throughput at low latency. A single value in each row is indexed; this value is known as the row key. In this article, we will check HBase table schema design and concepts. HBase Table Schema Design General Concepts: HBase schema design is very different from relational database schema design. Below…

Comments Off

September 27, 2017

BigData

How to avoid HBase Hotspotting?

HBase hotspotting occurs when a large amount of traffic from various clients is directed to a single node or a very small number of nodes in the cluster. It is caused by bad row key design. In this article, we will see how to avoid HBase hotspotting, also called region server hotspotting. How Does HBase Hotspotting Occur? HBase hotspotting occurs because of a poorly designed row key. Because of a bad row key, HBase stores a large amount of data on a single node, and the entire traffic is redirected to this node when a client requests some data, leaving…
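One commonly cited way to improve a sequential row key is salting, where a short hash-derived prefix spreads neighbouring keys across several ranges. The sketch below shows the idea in Python; it is not necessarily the approach this article goes on to describe, and the `salted_row_key` helper, the bucket count, and the key layout are assumptions made for illustration.

```python
import hashlib


def salted_row_key(row_key: str, buckets: int = 4) -> str:
    """Prefix the key with a stable bucket number derived from its hash."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # A fixed-width prefix keeps all keys of one bucket in one contiguous range.
    return f"{bucket:02d}-{row_key}"


# Sequential keys (for example, timestamps) would otherwise sort next to each other.
keys = [f"20170926-{i:04d}" for i in range(100)]
salted = [salted_row_key(key) for key in keys]
prefixes = {key.split("-", 1)[0] for key in salted}

print(salted[0])
print(sorted(prefixes))
```

The trade-off is on the read side: because the same key always maps to the same bucket, single-row lookups still work, but a range scan has to be issued once per bucket.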

Comments Off

September 26, 2017