Details about bigdata

SQL SET Operator MINUS Alternative in Hive and Examples

The set operators in SQL are used to combine similar data sets from two or more SELECT statements. Here, "similar" literally means that the number of columns and their data types must match; otherwise, you must explicitly cast the data types of the values in the SELECT statements. Hive supports the UNION and UNION ALL set operators, but INTERSECT and MINUS are not supported as of now. In this article, we will check a SQL set operator MINUS alternative in Hive with an example.…
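The usual workaround the article points at is rewriting MINUS as a LEFT OUTER JOIN that keeps only the unmatched rows (an anti-join), which also runs on Hive versions without MINUS support. A minimal sketch, using sqlite3 as a stand-in engine with hypothetical tables `a` and `b`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE a (id INTEGER)")
cur.execute("CREATE TABLE b (id INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# MINUS rewritten as a LEFT OUTER JOIN anti-join:
# keep rows of a that found no matching row in b.
rows = cur.execute("""
    SELECT a.id
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE b.id IS NULL
""").fetchall()
print(rows)  # rows present in a but not in b -> [(1,)]
```

The same SELECT runs unchanged as HiveQL; NOT EXISTS is another common rewrite with the same result.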


What are SQL Features Missing in Hive?

Apache Hive's syntax looks similar to the SQL-92 standard, but Hive is not fully SQL-92 compliant. The way it stores and queries underlying tables closely resembles traditional databases available in industry, and HiveQL provides some extensions that are not present in traditional databases. Still, there is a feature gap between traditional SQL and Apache Hive. In this article, we will check some basic yet important SQL features missing in Hive: Online Transaction Processing (OLTP), correlated sub-queries, materialized views, TRUNCATE TABLE, indexes…


Hive DELETE FROM Table Alternative - Easy Steps

By definition, a data warehouse is a mechanism to store historical data in an easily accessible manner. Data may be updated to keep tables up to date, and this performance-critical operation still matters when you plan to migrate your data warehouse to the big data world. In this article, we will check one of the methods to remove outdated records from a Hive table, i.e. a Hive DELETE FROM table alternative. Apache Hive is not designed for online transaction processing and does not offer real-time queries and row-level…
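Since classic Hive lacks row-level DELETE, the common alternative is to rewrite the table keeping only the wanted rows (in HiveQL, `INSERT OVERWRITE TABLE sales SELECT * FROM sales WHERE year >= 2018;`). A minimal sketch of the same idea, using sqlite3 as a stand-in engine and hypothetical table/column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (id INTEGER, year INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 2017), (2, 2018), (3, 2019)])

# Hive-style "delete": rebuild the table with only the rows to keep,
# mimicking INSERT OVERWRITE ... SELECT with a filter.
cur.execute("CREATE TABLE sales_new AS SELECT * FROM sales WHERE year >= 2018")
cur.execute("DROP TABLE sales")
cur.execute("ALTER TABLE sales_new RENAME TO sales")

remaining = cur.execute("SELECT id, year FROM sales ORDER BY id").fetchall()
print(remaining)  # rows from 2017 are gone -> [(2, 2018), (3, 2019)]
```

On a partitioned Hive table, overwriting only the affected partitions makes this considerably cheaper than rewriting the whole table.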


Steps to Import Oracle Tables using Sqoop

Oracle Database is one of the most widely used databases in the world; many financial organizations use Oracle for their transaction processing. As mentioned in my other post on importing Netezza tables using Apache Sqoop, with growing data volumes organizations are moving their computation to the Hadoop ecosystem. In this post, we will check the steps to import Oracle tables using Sqoop commands. When organizations and people try to get data into the Hadoop ecosystem, they use various options such as creating flat files and…
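A typical `sqoop import` for an Oracle table takes a JDBC connect string plus credentials and a target HDFS directory. A hedged sketch that only builds the command line (host, port, SID, user, table and paths below are placeholders; on a real cluster you would pass the list to `subprocess.run`):

```python
# Build a typical `sqoop import` invocation for an Oracle table.
# All connection details below are hypothetical placeholders.
def oracle_import_cmd(host, port, sid, user, table, target_dir):
    jdbc_url = f"jdbc:oracle:thin:@{host}:{port}:{sid}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                      # prompt for password instead of embedding it
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",      # parallel mappers; large tables need a split column
    ]

cmd = oracle_import_cmd("dbhost", 1521, "ORCL", "scott", "EMP", "/user/hive/emp")
print(" ".join(cmd))
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems when it is eventually executed.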


Sqoop Export Hive Tables into Netezza

Hadoop systems are best suited for batch processing; reporting directly on Hadoop Hive or Impala is not recommended. Sometimes, to enable faster reporting, organizations transfer the processed data from the Hadoop ecosystem to high-performance relational databases such as Netezza. In this article, we will check Sqoop export of Hive tables into Netezza with working examples. In some cases, data processed by the Hadoop ecosystem may be needed in production systems hosted on relational databases to help run additional critical business functions and generate reports. Sqoop can export…


How to Import Netezza Tables using Sqoop?

With growing data, organizations are moving computation to the Hadoop ecosystem. Apache Sqoop is an open source tool to import data from relational databases into Hadoop and vice versa, and it is one of the easiest tools for importing from a relational database such as Netezza into the Hadoop ecosystem. The Sqoop command allows you to import all tables or a single table, or to execute a query and store its result in Hadoop HDFS. In this article, we will check how to import Netezza tables using Sqoop with some practical examples. Sqoop uses a connector-based architecture, which…


Apache Hive User-defined Functions

Apache Hive is a data warehouse framework on top of the Hadoop ecosystem. The Apache Hive architecture is different from that of the other Hadoop tools available. Being an open source project, Apache Hive has added a lot of functionality since its inception, but it still lacks some basic functionality that is available in traditional data warehouse systems such as Netezza, Teradata, Oracle, etc. In this post, we will check Apache Hive user-defined functions and how to use them to perform a specific task. When you start…
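Hive UDFs are normally written in Java, but Hive can also stream rows through an external script via the TRANSFORM clause, which lets you plug in custom logic without compiling anything. A minimal sketch of such a script (the file name, table and columns are hypothetical); Hive would invoke it with something like `ADD FILE upper_name.py; SELECT TRANSFORM (id, name) USING 'python upper_name.py' AS (id, name) FROM t;`:

```python
# Hive's TRANSFORM clause streams tab-separated rows to this script's
# stdin and reads transformed tab-separated rows back from stdout.
import sys

def transform(line):
    """Upper-case the second column of a tab-separated row."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 2:
        cols[1] = cols[1].upper()
    return "\t".join(cols)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

This streaming approach trades some performance for convenience compared with a native Java UDF, since every row crosses a process boundary.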


Best Practices to Optimize Hive Query Performance

As we have seen in my other post, Steps to Optimize SQL Query Performance, we can improve the performance of back-end SQL by making simple improvements while writing queries. The Apache Hive architecture behaves differently depending on your data and the type of HQL query you write. In this post, we will check best practices to optimize Hive query performance with some examples. In a data warehouse environment, we write a lot of queries and pay very little attention to optimization. Tuning Hive query performance is an important step and requires…


Hive ANALYZE TABLE Command – Table Statistics

Hive uses a cost-based optimizer. Statistics serve as the input to the cost functions of the Hive optimizer so that it can compare different plans and choose the best among them. Hive uses statistics such as the number of rows in a table or table partition to generate an optimal query plan. Beyond the optimizer, Hive uses these statistics in many other ways. In this post, we will check Apache Hive table statistics - the Hive ANALYZE TABLE command - with some examples. Uses of Hive Table or Partition Statistics: there are many ways…
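The statements behind this are of the form `ANALYZE TABLE t [PARTITION (...)] COMPUTE STATISTICS [FOR COLUMNS]`. A small, hypothetical helper that assembles them, useful when generating the command for many tables or partitions (table and partition names below are illustrative):

```python
# Build the HiveQL ANALYZE TABLE statement used to gather
# table, partition, or column statistics.
def analyze_stmt(table, partition=None, for_columns=False):
    stmt = f"ANALYZE TABLE {table}"
    if partition:
        spec = ", ".join(f"{k}='{v}'" for k, v in partition.items())
        stmt += f" PARTITION ({spec})"
    stmt += " COMPUTE STATISTICS"
    if for_columns:
        stmt += " FOR COLUMNS"   # column-level stats feed the optimizer's cost model
    return stmt

print(analyze_stmt("sales", {"year": "2019"}))
```

The generated strings would be submitted through any Hive client; DESCRIBE FORMATTED shows the stats once computed.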


Methods to Access Hive Tables from Python

Apache Hive is a database framework on top of the Hadoop Distributed File System (HDFS) for querying structured and semi-structured data. Just as in a regular RDBMS, you access HDFS files in the form of tables; you can create tables, views, etc. in Apache Hive. You can analyze structured data using the HiveQL language, which is similar to the Structured Query Language (SQL). In this article, we will check different methods to access Hive tables from a Python program. The methods we discuss here will help you connect to Hive tables and get…
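Python Hive clients such as PyHive expose the standard DB-API interface, so code written against a cursor works the same regardless of the backend. A hedged sketch: the helper below works on any DB-API connection; it is demonstrated with sqlite3 as a stand-in since a live HiveServer2 is not assumed here, and the PyHive connection shown in the docstring is an assumption about your environment:

```python
import sqlite3

def fetch_all(conn, sql, params=()):
    """Run a query on any DB-API connection and return all rows.

    With PyHive installed, the connection would instead be created as:
        from pyhive import hive
        conn = hive.Connection(host="hs2-host", port=10000)
    """
    cur = conn.cursor()
    cur.execute(sql, params)
    return cur.fetchall()

# Demonstrated with sqlite3, which implements the same DB-API interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
print(fetch_all(conn, "SELECT x FROM t ORDER BY x"))  # -> [(1,), (2,)]
```

Impyla and pyodbc follow the same cursor protocol, so swapping clients rarely requires changing the query code.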
