Working with Netezza Clustered Base Tables (CBT)

A Netezza clustered base table (CBT) is a user table whose data is organized using one to four organizing key columns. You can specify a maximum of four columns in the ORGANIZE ON clause, and those columns should not be part of the DISTRIBUTE ON clause. An organizing key is a column of the table that you specify for clustering the table records; organizing a table helps Netezza store related records in the same or nearby extents. You organize the records using the "ORGANIZE ON" clause. Netezza creates zone maps on the organizing columns, which will accelerate the performance of queries on that…
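To illustrate why keeping related records in the same or nearby extents helps, here is a minimal Python sketch of the zone-map idea. It is a simplified, one-column model and not Netezza's actual implementation: the function names `build_zone_map` and `extents_to_scan`, the extent size of 10, and the sample data are all assumptions made for illustration.

```python
def build_zone_map(values, extent_size):
    """Split values into fixed-size extents and record (min, max) per extent."""
    extents = [values[i:i + extent_size] for i in range(0, len(values), extent_size)]
    return [(min(extent), max(extent)) for extent in extents]


def extents_to_scan(zone_map, lo, hi):
    """Count extents whose [min, max] range overlaps the predicate range [lo, hi]."""
    return sum(1 for zmin, zmax in zone_map if zmax >= lo and zmin <= hi)


# 100 key values arriving in a scattered (but deterministic) order.
scattered = [(i * 37) % 100 for i in range(100)]
# The same values stored in key order, roughly what organizing aims for.
clustered = sorted(scattered)

# Hypothetical query predicate: key BETWEEN 40 AND 49.
scattered_scans = extents_to_scan(build_zone_map(scattered, 10), 40, 49)
clustered_scans = extents_to_scan(build_zone_map(clustered, 10), 40, 49)
print("extents scanned (scattered):", scattered_scans)
print("extents scanned (clustered):", clustered_scans)
```

In this toy model the clustered layout lets the min/max ranges rule out every extent that cannot contain matching rows, while the scattered layout leaves wide, overlapping ranges that exclude far less.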

Hadoop Single Node Cluster Setup on Ubuntu

In this tutorial, I will explain how to set up a Hadoop single node cluster on Ubuntu 14.04. The single node cluster will sit on top of the Hadoop Distributed File System (HDFS). Hadoop is a Java framework for running applications on large clusters made up of commodity hardware. The Hadoop framework allows us to run MapReduce programs on data stored in the highly fault-tolerant Hadoop Distributed File System. The main…

7 Best Hadoop Books to Learn Bigdata Hadoop

The Hadoop ecosystem is vast, and it may take a long time to learn big data and start implementing applications, so people new to big data Hadoop technology must choose the right book to start with. Here are some of the best Hadoop books you may want to consider. Hadoop big data skills are in huge demand in domains like finance, insurance, banking, social networking, and many other platforms that deal with very large data sets. Hadoop experts are in great demand in industries that need to handle big, complicated data sets. A working knowledge of…

How to Learn Apache Hadoop

Most of you want to know what Apache Hadoop is, and how and where to start learning it. Here I’m going to share some of the steps I followed to learn Hadoop. Don’t worry! You don’t have to be a Java programmer to learn Hadoop. You should know a few basic Linux commands. You will learn all the remaining programming languages once you log in to the cluster :-) Let’s first understand what Hadoop is. Apache Hadoop is an open source framework to process very large data sets (big data). Hadoop allows the distributed storage and…

Groom in Netezza Tables and Databases with Aginity

Before getting into how to groom a Netezza table and database, let’s first understand what grooming is. Use the GROOM TABLE command to maintain user tables by reclaiming disk space for deleted or outdated rows. You can also use the GROOM TABLE command to reorganize tables by their organizing key columns. End users can execute DML statements such as SELECT, UPDATE, DELETE, and INSERT while data grooming is running. The SELECT operations run in parallel with the grooming operations, and any INSERT, UPDATE, and DELETE operations run serially between the…

Mining Frequent itemsets – Apriori Algorithm

Apriori is an algorithm for frequent itemset mining and association rule learning over transaction databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules, which highlight general trends in the database.
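The level-by-level extension described above can be sketched in Python. This is a minimal, unoptimized illustration rather than the code from the full post: the function name `apriori`, the use of an absolute support count for `min_support`, and the sample grocery transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical sample data for illustration only.
sample = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in sorted(apriori(sample, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what keeps the search manageable: a candidate is counted against the database only if all of its smaller subsets were already found frequent.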

Easy methods to Import data using Aginity

In this post, I am going to show you how to import data using Aginity Workbench. Data import in Aginity Workbench is a fairly easy way to get data into a Netezza system for those who are not familiar with nzload and external table options. Importing data is one of the important tasks in a Netezza data warehouse. Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. Import…

How to Install Aginity Workbench for Netezza

I have received multiple requests to create a tutorial on working with the Netezza SQL database development tool. In this post, I have created a guide to installing Netezza Aginity Workbench in a Windows environment. About Netezza Aginity: Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. This SQL database…

Netezza Sequence and how to Create/Use it

A Netezza sequence is a named object in an individual database that provides a unique value each time the next value is requested. You can use a sequence to generate unique numbers that can serve as surrogate key values for primary keys. A sequence value is an integer that you can use wherever you would use numeric values. Netezza supports user sequences for the four integer types: byteint, smallint, integer, and bigint. You can even create a sequence with an initial value, an increment, a minimum and a maximum value.…

Netezza Skew and How to avoid it

You will hear a lot about "Netezza Skew" if you are developing a data warehouse on Netezza, Redshift, Teradata, Hive or Impala. The performance of the system is directly linked to the uniform distribution of user data across all of the data slices in the system. When you create a table and then load data into the system, the rows of the table should be distributed uniformly among all the data slices. If some data slices have more rows of a table than others, this scenario is called skew.…
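As a rough illustration of how the choice of distribution key affects row counts per data slice, here is a small Python simulation. It is a sketch under stated assumptions: `assign_slice` uses CRC32 as a stand-in hash (Netezza's real hashing is different), four data slices are assumed, and the `skew_ratio` formula, (max - min) / average, is an illustrative metric rather than an official Netezza definition.

```python
import zlib
from collections import Counter


def assign_slice(key, n_slices):
    """Map a distribution key to a data slice using a stand-in hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_slices


def skew_report(keys, n_slices):
    """Count rows per slice and summarize how uneven the distribution is."""
    counts = Counter(assign_slice(k, n_slices) for k in keys)
    per_slice = [counts.get(s, 0) for s in range(n_slices)]
    avg = sum(per_slice) / n_slices
    spread = max(per_slice) - min(per_slice)
    return {
        "per_slice": per_slice,
        "min": min(per_slice),
        "max": max(per_slice),
        "avg": avg,
        # Illustrative metric: 0.0 means perfectly even, larger means more skew.
        "skew_ratio": spread / avg if avg else 0.0,
    }


# Hypothetical keys: a two-value column versus a unique ID column, 1000 rows each.
low_cardinality = skew_report(["M", "F"] * 500, 4)
high_cardinality = skew_report(range(1000), 4)
print("two-value key:", low_cardinality["per_slice"])
print("unique key:   ", high_cardinality["per_slice"])
```

A key with only two distinct values can reach at most two slices, so at least two of the four slices stay empty no matter how good the hash is; a high-cardinality key gives the hash enough distinct values to spread rows across every slice.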
