Details about bigdata

How to Update Impala Table? – Steps and Examples

Cloudera Impala and Apache Hive provide a better way to manage structured and semi-structured data in the Hadoop ecosystem. Both frameworks use HDFS as their storage layer. HDFS is not designed for updating files in place; it is designed for batch processing, i.e. processing huge amounts of data. However, most organizations maintain their data warehouses on traditional relational databases such as Netezza, Teradata, and Oracle. When they migrate their data warehouse to the Hadoop ecosystem, they might want to have a design similar to that…

How to Execute HBase Commands from Shell Script? – Examples

Shell scripting is one of the most widely used ways to automate day-to-day activities. Linux shells are usually interactive: they accept commands as input from users and execute them. However, typing every command at the terminal each time quickly becomes repetitive. Instead, you can bundle those commands in a shell script. In this article, we will check how to execute HBase commands from a shell script, with an example. Why Shell Script is Required? There are many reasons to use Linux shell scripting: It helps…

Methods to Access Impala Tables from Python

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in Hadoop. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface as Apache Hive. In this article, we will check different methods to access Impala tables from a Python program or script. The methods we are going to discuss here will help you to connect Impala…

Impala Delete from Tables and Alternative Steps

A data warehouse stores information in the form of tables. You may have to delete outdated data and update table values to keep the data up to date. These performance-critical operations remain essential for a data warehouse on big data, including after you migrate data from relational database systems. In this article, we will check Impala delete from tables and alternative examples. Impala Delete from Table Command Cloudera Impala in CDH 5.10 (Impala 2.8) and above supports the DELETE FROM table command on Kudu storage. This command deletes an arbitrary number of rows…

SQL SET Operator MINUS Alternative in Impala

The SQL set operators are used to combine data from two or more SELECT statements. The set operators can combine only similar data sets. Here, similar means that the number of columns and their data types must match; otherwise, you must explicitly cast the values or columns in the SELECT statements. Just like Apache Hive, Impala supports only the UNION and UNION ALL set operators; INTERSECT and MINUS are not supported as of now. In this article, we will check SQL set operator MINUS alternative…
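
The excerpt above stops before naming the alternative, so the following is only a minimal sketch of one common rewrite, not necessarily the article's method: a LEFT JOIN anti-join with DISTINCT. SQLite stands in for Impala so the example runs anywhere, and the table and column names (t1, t2, id, name) are hypothetical.

```python
import sqlite3

# SQLite is used as a stand-in engine; t1, t2, id and name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, name TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (3, 'c');
    INSERT INTO t2 VALUES (2, 'b');
""")

# MINUS returns the distinct rows of t1 that do not appear in t2.
# The anti-join keeps t1 rows with no match; DISTINCT removes duplicates.
minus_alternative = """
    SELECT DISTINCT a.id, a.name
    FROM t1 a
    LEFT JOIN t2 b
      ON a.id = b.id AND a.name = b.name
    WHERE b.id IS NULL
    ORDER BY a.id
"""
rows = conn.execute(minus_alternative).fetchall()
print(rows)  # [(1, 'a'), (3, 'c')]
```

One caveat with this rewrite: a join on equality never matches NULL values, whereas set operators treat NULLs as equal, so columns that can hold NULL need extra handling.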

Cloudera Impala Merge Statement – UPSERT Command

The MERGE statement in SQL is used to perform incremental loads. With a MERGE statement, you can perform UPDATE and INSERT in a single operation based on a condition, i.e. you can update old values and insert new records. The merge command is widely used in incremental loads where you have to update old records and insert new records, if any. In this article, we will check the Cloudera Impala merge statement along with the Impala native UPSERT command. In SQL world, the merge statement is also referred to…
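
To make the "update old values and insert new records" behaviour concrete, here is a minimal sketch that emulates it in two steps, an UPDATE followed by an INSERT of the unmatched rows. It runs on SQLite as a stand-in and does not use Impala syntax; the target and staging tables and their columns are hypothetical.

```python
import sqlite3

# SQLite stand-in; target, staging, id and val are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO target VALUES (1, 'old'), (2, 'keep');
    INSERT INTO staging VALUES (1, 'new'), (3, 'added');
""")

# Step 1: update rows whose key already exists in the target.
conn.execute("""
    UPDATE target
    SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM staging)
""")

# Step 2: insert staging rows whose key is missing from the target.
conn.execute("""
    INSERT INTO target (id, val)
    SELECT s.id, s.val
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)
""")

result = conn.execute("SELECT id, val FROM target ORDER BY id").fetchall()
print(result)  # [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Row 1 is updated, row 2 is left untouched, and row 3 is inserted, which is the outcome a single MERGE or UPSERT would produce.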

Impala Dynamic SQL Support and Alternative Approaches

Dynamic SQL lets SQL statements be created and executed at run time, i.e. you can build SQL queries based on user or application input and execute them to produce the required output. For example, you can pass a session-specific value to Impala queries dynamically at run time. In this article, we will check how to build Cloudera Impala dynamic SQL queries and how to execute them. Most relational databases, such as Netezza and Teradata, support stored procedures that allow you to build and execute dynamic queries. Impala Dynamic SQL…
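
As a rough illustration of building a query from input at run time, the sketch below assembles the SQL text on the client side and binds the values as parameters. It is an assumption-laden example rather than the article's approach: SQLite stands in for the query engine, and the sales table, its columns, the allow-list, and the build_query helper are all hypothetical.

```python
import sqlite3

# Hypothetical allow-list: column names cannot be bound as parameters,
# so they are checked explicitly before being placed in the SQL text.
ALLOWED_COLUMNS = {"region", "year"}

def build_query(filters):
    """Return (sql, params) for a dict of column -> value filters."""
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id FROM sales"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params

# SQLite stand-in with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER, region TEXT, year INTEGER);
    INSERT INTO sales VALUES (1, 'east', 2018), (2, 'west', 2018), (3, 'east', 2019);
""")

sql, params = build_query({"region": "east", "year": 2018})
rows = conn.execute(sql, params).fetchall()
print(sql)   # SELECT id FROM sales WHERE region = ? AND year = ? ORDER BY id
print(rows)  # [(1,)]
```

Binding values instead of concatenating them into the string keeps user input out of the SQL text itself.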

Hive Dynamic SQL Support and Alternative

Dynamic SQL queries are created on the fly and executed. Dynamic SQL lets SQL statements be defined and executed at run time, i.e. you can build SQL queries based on user input and execute them to produce the required output. For example, you can pass a session-specific value to HQL queries dynamically at run time. In this article, we will check how to build Apache Hive dynamic SQL queries and how to execute them. Hive Dynamic SQL Support Apache Hive version 1.x and Cloudera Impala do not support dynamic SQL, you…

Hive Merge Tables Statement – Alternative and Example

The MERGE statement in SQL is used to perform incremental loads. With a MERGE statement, you can perform UPDATE and INSERT in a single operation based on a condition, i.e. you can update old values and insert new records. The MERGE statement is mainly used to implement slowly changing dimensions. As of now, Hive does not support the MERGE statement. In this article, we will check the Hive merge tables alternative with an example. An update-or-insert operation is sometimes also called UPSERT. Related Article, Slowly changing dimension…

Hive Drop Column Alternative and Examples

Apache Hive is a data warehouse framework on top of the Hadoop ecosystem. Hive works well for batch processing. It is not a true data warehouse platform, as it does not support real-time analytics. Many features available in traditional relational databases are missing in Hive. One such feature is DROP COLUMNS using ALTER TABLE statements. In this article, we will check the Hive drop column alternative with some examples. Hive Drop Column Alternative There are two approaches that you can follow if you want to…
