Vithal S, Author at DWgeek.com

Cloudera Impala Generate Sequence Numbers without UDF

If you are migrating from a traditional database to Cloudera Impala, you may have noticed that there is no sequence number function. To generate sequence numbers in Cloudera Impala without a UDF, you can use the analytic functions that Impala provides. If you want to generate sequence numbers that automatically stay in sync with your table, you can do so with the ROW_NUMBER analytic function, which Impala supports. Related reading: Impala Conditional Functions; An Introduction to Cloudera Hadoop Impala Architecture; Commonly used Impala shell Command…

October 16, 2016

Netezza

Netezza ROWNUM Pseudo Column Alternative

If you come from an Oracle background, you will miss the ROWNUM pseudo column in Netezza. One possible solution is to use the ROW_NUMBER() analytic function as a Netezza ROWNUM equivalent. I think most distributed databases do not provide a ROWNUM column. There is a LIMIT clause to restrict the output, but it is very difficult to assign sequential numbers to the rows in Netezza tables. Even Netezza sequences do not produce…

October 16, 2016

Netezza

Changing Netezza Table Distribution key and Example

Choosing the right distribution key is one of the most important factors in improving the performance of a Netezza server. If you created a table with RANDOM distribution, or distributed it on a column with many duplicate values, you should change the distribution key promptly; otherwise performance will suffer. Changing a Netezza table's distribution key means redistributing the table using Netezza nzsql. You can achieve the redistribution in a couple of ways: Redistribute using CTAS; Creating new table and loading data at…

October 15, 2016

BigData

Run Impala SQL Script File Passing argument and Working Example

If you are porting Hive SQL scripts to Impala, you may need to pass variables to a SQL script as arguments. Running an Impala SQL script file with arguments can be a challenge: prior to impala-shell version 2.5, there was no option to pass values to a script as arguments. Read: Impala Dynamic SQL Support and Alternative Approaches; Run Hive Script File Passing Parameter and Working Example. CDH 5.7/Impala shell version 2.5 and higher can run an Impala SQL script file with arguments. You can make use of the --var=variable_name option…

October 14, 2016

BigData

Commonly used Impala shell Command Line Options

You can use the Impala shell interactive tool (impala-shell) to set up databases and tables, insert data, and issue queries. If you have worked with Netezza or Oracle, this tool is similar to nzsql and SQL*Plus. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can also write queries in a script file and execute them using the Impala shell command line options. Read: Impala Conditional Functions; An Introduction to Hadoop Impala Architecture. Impala shell Command Line Options. Command line Options / Description: -i…

October 14, 2016

BigData

Impala Conditional Functions: IF, CASE, COALESCE, DECODE, NVL, ZEROIFNULL

Cloudera Impala supports various conditional functions. You can use these functions to test equality, apply comparison operators, and check whether a value is null. The following are the Impala conditional functions. Impala IF Conditional Function: This is one of the most useful Impala conditional functions and is similar to the IF statement in programming languages. It tests an expression and returns a corresponding result depending on whether the result is true, false, or null. Read: An Introduction to Impala Architecture. Syntax: if(boolean condition, type ifTrue, type ifFalseOrNull). For example: select if(1=1,'TRUE','FALSE') as IF_TEST; Impala CASE…

October 13, 2016

BigData

An Introduction to Cloudera Hadoop Impala Architecture

The Cloudera Hadoop Impala architecture is very different from that of other database engines on HDFS, such as Hive. The Impala server is a distributed, massively parallel processing (MPP) database engine. Its architecture is similar to that of other distributed databases such as Netezza and Greenplum. Impala consists of different daemon processes that run on specific hosts within your CDH cluster. Read: Sqoop Architecture; Sqoop Import; Sqoop Export; Netezza and Hadoop Integration; Hadoop HDFS Architecture Introduction and Design. Cloudera Hadoop Impala Architecture Overview: Impala consists of three components: The Impala Daemon,…

October 13, 2016

Netezza

Netezza Table Locking and Concurrency

You cannot explicitly lock tables in Netezza. Netezza SQL, however, locks a table implicitly when a DDL operation runs on it. For example, a DROP TABLE command is blocked while DML commands are running on the table, and vice versa. Netezza uses serializable transaction isolation to lock the table and is ACID compliant. This ensures no dirty reads and no non-repeatable reads. Read: How Netezza Updates Records in Table?; Netezza Identify and Kill Table Locks; nzsql Command and its Usage; Netezza Best…

October 12, 2016

General

Identify and Remove Netezza Duplicate Records in Table

Netezza does not enforce primary key or unique constraints, so you can insert duplicate records into a table. No enforced constraint guarantees uniqueness, but if you have loaded the same data into a table twice, you can de-duplicate it in several ways. The methods below explain how to identify and remove Netezza duplicate records. Read: Netezza Pivot Rows to Column with Example; Netezza Primary Key Constraint and Syntax. 1. Use Intermediate and DISTINCT Keyword: You can remove Netezza duplicate records by creating another table using DISTINCT…

October 11, 2016

Netezza

How Netezza Update Records in Tables?

Updating records in Netezza is a costly operation. IBM Netezza does not update records in place; instead, it deletes the record and inserts the updated values. When you run an nzsql command to update a record, Netezza marks the record being updated as logically deleted by setting its deletexid field to the current transaction ID, but does not physically delete it. This ensures that the database system adheres to the ACID properties of RDBMS SQL standards. Each record in Netezza contains two slots, one for createxid and another for deletexid. Deletexid allows you…

October 5, 2016