DWgeek.com

Hadoop Single Node Cluster Setup on Ubuntu

In this tutorial, I will explain how to set up a Hadoop single-node cluster on Ubuntu 14.04. The single-node cluster will sit on top of the Hadoop Distributed File System (HDFS). Hadoop is a Java framework for running applications on large clusters made up of commodity hardware. The Hadoop framework allows us to run MapReduce programs on data stored in the highly fault-tolerant Hadoop Distributed File System. Related readings: How to Learn Apache Hadoop; 7 Best Books to Learn Bigdata Hadoop. The main…

July 23, 2016

BigData

7 Best Hadoop Books to Learn Bigdata Hadoop

The Hadoop ecosystem is vast, and it may take a long time to learn big data and start implementing applications, so people new to Hadoop should choose the right book to start with. Here are some of the best Hadoop books you may want to consider. Hadoop is in huge demand in domains such as finance, insurance, banking, social networking, and many other platforms that deal with very large data sets. Hadoop experts are in great demand in industries that need to handle big, complicated data sets. A working knowledge of…

1 Comment

July 16, 2016

BigData

How to Learn Apache Hadoop

Many of you want to know what Apache Hadoop is, and how and where to start learning it. Here I’m going to share some of the steps I followed to learn Hadoop. Don’t worry! You don’t have to be a Java programmer to learn Hadoop. You should know a few basic Linux commands. You will learn the remaining programming languages once you log in to the cluster :-) Let’s first understand what Hadoop is. Apache Hadoop is an open-source framework for processing very large data sets (BigData). Hadoop allows the distributed storage and…

July 16, 2016

Netezza

Groom in Netezza Tables and Databases with Aginity

Before getting into how to groom Netezza tables and databases, let’s first understand what grooming is. Use the GROOM TABLE command to maintain user tables by reclaiming disk space for deleted or outdated rows. You can also use the GROOM TABLE command to reorganize tables by their organizing key columns. End users can execute DML statements such as SELECT, UPDATE, DELETE, and INSERT while data grooming is running. The SELECT operations run in parallel with the grooming operations and any INSERT, UPDATE, and DELETE operations run serially between the…

July 13, 2016

Data Mining

Mining Frequent itemsets – Apriori Algorithm

The Apriori algorithm is an algorithm for frequent itemset mining and association rule learning over transaction databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules that highlight general trends in the database. Read: Methods to Measure Data Dispersion; 9 Laws Everyone In The Data Mining Should Use; Various Data Mining Clustering Algorithms and Examples…
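The level-wise idea described in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the full post: the function name `apriori`, the `min_support` parameter (an absolute transaction count), and the sample baskets are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of
    this sketch; many implementations use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


if __name__ == "__main__":
    # Hypothetical market baskets, used only for illustration.
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]
    for itemset, support in sorted(
        apriori(baskets, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

The pruning step is what keeps the search manageable: a k-itemset is only counted if every one of its (k-1)-subsets was already found to be frequent.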

July 9, 2016

Netezza

Easy methods to Import data using Aginity

In this post, I am going to show you how to import data using Aginity Workbench. Data import in Aginity Workbench is a fairly easy way to get data into a Netezza system for those who are not familiar with nzload and external table options. Importing data is one of the important tasks in a Netezza data warehouse. Read: Installing Aginity workbench. Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. Import…

July 9, 2016

Netezza

How to Install Aginity Workbench for Netezza

I have received multiple requests to create a tutorial on working with the Netezza SQL database development tool. In this post, I have created a guide to installing Netezza Aginity Workbench in a Windows environment. About Netezza Aginity: Aginity Workbench is an easy-to-use application that enhances performance and creates efficiencies when working with MPP data warehouse appliances. Aginity Workbench provides a powerful set of GUI-based tools for developers, DBAs, and data analysts. Read: Access Netezza Database, Tools and Examples; Netezza Groom Tables and Databases; nzsql command and its Usage; Netezza SET CATALOG command. This SQL database…

4 Comments

July 8, 2016

Netezza

Netezza Sequence and how to Create/Use it

A Netezza sequence is a named object in an individual database that provides a unique value each time the next value is requested. You can use a sequence to generate unique numbers that serve as surrogate key values for primary keys. Netezza Sequence Overview: A sequence value is an integer that you can use wherever you would use numeric values. Netezza supports user sequences for the four integer types: byteint, smallint, integer, and bigint. You can even create a sequence with an initial value, an increment, a minimum and a maximum value.…

July 7, 2016

Netezza

Netezza Skew and How to avoid it

You will hear a lot about skew ("Netezza skew") if you are developing a data warehouse on Netezza, Redshift, Teradata, Hive, or Impala. The performance of the system is directly linked to the uniform distribution of user data across all of the data slices in the system. When you create a table and then load data into the system, the rows of the table should be distributed uniformly among all the data slices. If some data slices have more rows of a table than others, this scenario is called skew.…
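To make the idea of skew concrete, here is a rough Python sketch that assigns rows to data slices by hashing a candidate distribution column and then compares the fullest slice with the average. It is an illustration only: `zlib.crc32` stands in for Netezza's internal hash function, which is not described here, and the names `slice_counts` and `skew_ratio` are hypothetical helpers.

```python
import zlib


def slice_counts(key_values, n_slices):
    """Count rows per data slice for one candidate distribution column.

    zlib.crc32 is an assumed stand-in for the appliance's real hash.
    """
    counts = [0] * n_slices
    for value in key_values:
        slice_id = zlib.crc32(str(value).encode("utf-8")) % n_slices
        counts[slice_id] += 1
    return counts


def skew_ratio(counts):
    """Rows on the fullest slice divided by the average rows per slice.

    A value near 1.0 means an even spread; larger values mean more skew.
    """
    average = sum(counts) / len(counts)
    return max(counts) / average if average else 0.0


if __name__ == "__main__":
    n_slices = 4

    # A unique key such as an order id spreads rows across the slices.
    unique_key = range(1000)
    print(slice_counts(unique_key, n_slices))

    # A column where every row holds the same value puts all rows on one slice.
    constant_key = ["M"] * 1000
    print(skew_ratio(slice_counts(constant_key, n_slices)))  # 4.0
```

With four slices, a single-valued column gives a ratio of 4.0 because one slice holds every row while the other three stay empty.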

July 6, 2016

Netezza

Importance of right Netezza Distribution key

This post is all about how data is distributed (the Netezza distribution key) in a Netezza server. Feel free to make comments or suggestions to improve it, or pass it on if you like. Let’s first understand how NPS stores data on disk drives. Each Snippet Processor in the Snippet Processing Unit (SPU) has its own CPU, FPGA, RAM, and a dedicated hard drive, and the data stored on that drive is called a data slice. Read: Changing Netezza Table Distribution Key; Cluster Based Tables (CBT) in…

July 6, 2016