Hadoop Single Node Cluster Setup on Ubuntu

In this tutorial, I will explain how to set up a Hadoop single-node cluster on Ubuntu 14.04. The single-node cluster will sit on top of the Hadoop Distributed File System (HDFS). Hadoop is a Java framework for running applications on large clusters made up of commodity hardware. The Hadoop framework allows us to run MapReduce programs on data stored in the highly fault-tolerant Hadoop Distributed File System. Related reading: How to Learn Apache Hadoop; 7 Best Books to Learn Bigdata Hadoop. The main…

7 Best Hadoop Books to Learn Bigdata Hadoop

The Hadoop ecosystem is vast, and it can take a long time to learn big data and start implementing applications, so people new to big data and Hadoop must choose the right book to start with. Here are some of the best Hadoop books you may want to consider. Hadoop and big data skills are in huge demand in domains such as finance, insurance, banking, social networking, and many other platforms that deal with very large data sets. Hadoop experts are in great demand in industries that need to handle big, complicated data sets. A working knowledge of…

How to Learn Apache Hadoop

Most of you want to know what Apache Hadoop is, and how and where to start learning it. Here I’m going to share some of the steps I followed to learn Hadoop. Don’t worry! You don’t have to be a Java programmer to learn Hadoop. You should know a few basic Linux commands. You will learn the remaining programming languages once you log in to a cluster :-) Let’s first understand what Hadoop is. Apache Hadoop is an open source framework for processing very large data sets (big data). Hadoop allows the distributed storage and…

Groom in Netezza Tables and Databases with Aginity

Before getting into how to groom Netezza tables and databases, let’s first understand what grooming is. Use the GROOM TABLE command to maintain user tables by reclaiming disk space for deleted or outdated rows. You can also use the GROOM TABLE command to reorganize tables by their organizing key columns. End users can execute DML statements such as SELECT, UPDATE, DELETE, and INSERT while data grooming is running. The SELECT operations run in parallel with the grooming operations and any INSERT, UPDATE, and DELETE operations run serially between the…

Mining Frequent itemsets – Apriori Algorithm

The Apriori algorithm is an algorithm for frequent itemset mining and association rule learning over transaction databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules, which highlight general trends in the database. Read: Methods to Measure Data Dispersion; 9 Laws Everyone In The Data Mining Should Use; Various Data Mining Clustering Algorithms and Examples…
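
The level-wise idea described above (start from frequent single items, then grow itemsets while they stay frequent) can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation from the full post: the function name apriori, the min_support parameter (an absolute transaction count), and the sample baskets in the usage note are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of this
    sketch; many descriptions use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent
```

For example, calling apriori on five hypothetical baskets such as [["bread", "milk"], ["bread", "diapers", "beer"], ["milk", "diapers", "beer"], ["bread", "milk", "diapers"], ["bread", "milk", "beer"]] with min_support=3 keeps all four single items and the pair {bread, milk}, while pairs such as {diapers, beer} are dropped because they appear in only two baskets.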

Easy methods to Import data using Aginity

In this post, I am going to show you how to import data using Aginity Workbench. Data import in Aginity Workbench is a fairly easy way to get data into a Netezza system for those who are not familiar with nzload and external table options. Importing data is one of the important tasks in a Netezza data warehouse. Read: Installing Aginity workbench. Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. Import…

How to Install Aginity Workbench for Netezza

I have received multiple requests to create a tutorial on working with the Netezza SQL database development tool. In this post, I have created a guide to installing Netezza Aginity Workbench in a Windows environment. About Netezza Aginity: Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. Read: Access Netezza Database, Tools and Examples; Netezza Groom Tables and Databases; nzsql command and its Usage; Netezza SET CATALOG command. This SQL database…

Netezza Sequence and how to Create/Use it

A Netezza sequence is a named object in an individual Netezza database that provides a unique value each time the next value is requested. You can use a sequence to generate unique numbers that can serve as surrogate key values for primary keys. Netezza Sequence Overview: A sequence value is an integer that you can use wherever you would use numeric values. Netezza supports user sequences for the four integer types: byteint, smallint, integer, and bigint. You can even create a sequence with an initial value, an increment, a minimum and a maximum value.…

Netezza Skew and How to avoid it

You will hear a lot about "Netezza Skew" if you are developing a data warehouse on Netezza, Redshift, Teradata, Hive or Impala. The performance of the system is directly linked to the uniform distribution of user data across all of the data slices in the system. When you create a table and then load data into the system, the rows of the table should be distributed uniformly among all the data slices. If some data slices have more rows of a table than others, this scenario is called skew.…
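
To make the idea of skew concrete, here is a small Python simulation of rows being assigned to data slices by hashing a distribution key. It is a rough sketch under stated assumptions, not how Netezza works internally: the CRC32 hash is a stand-in for the system's own hash function, and NUM_DATA_SLICES, slice_for, rows_per_slice, and skew_ratio are hypothetical names chosen for illustration.

```python
import zlib
from collections import Counter

NUM_DATA_SLICES = 8  # hypothetical slice count, chosen only for the demo


def slice_for(key, num_slices=NUM_DATA_SLICES):
    """Map a distribution-key value to a data slice.

    CRC32 is only a stand-in for the real (internal) hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def rows_per_slice(keys, num_slices=NUM_DATA_SLICES):
    """Count how many rows land on each data slice for the given key values."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(counts):
    """Largest slice divided by the average slice; 1.0 means perfectly even."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg if avg else 0.0


if __name__ == "__main__":
    # A high-cardinality key (unique ids) versus a low-cardinality key.
    unique_ids = range(10_000)
    two_values = ["M" if i % 2 == 0 else "F" for i in range(10_000)]

    for name, keys in [("unique id", unique_ids), ("two-valued key", two_values)]:
        counts = rows_per_slice(keys)
        print(name, counts, round(skew_ratio(counts), 2))
```

A key with only two distinct values can occupy at most two of the eight slices, so its ratio is at least 4, whereas a unique id tends to spread rows far more evenly. That difference is the reason a high-cardinality column is usually the safer choice of distribution key.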

Importance of right Netezza Distribution key

This post is all about how data is distributed in a Netezza server (the Netezza distribution key). Feel free to make comments or suggestions to improve it, or pass it on if you like. Let’s first understand how NPS stores data on disk drives. Each Snippet Processor in the Snippet Processing Unit (SPU) has its own CPU, FPGA, RAM, and a dedicated hard drive, and the data stored on that drive is called a data slice. Read: Changing Netezza Table Distribution Key; Cluster Based Tables (CBT) in…