Load HDFS file into Netezza Table Using nzload and External Tables

The nzload command is a bulk copy utility available on the Netezza data warehouse appliance. This native Netezza command provides an easy way to use external tables to get data into the appliance. There is no straightforward option to load an HDFS file into Netezza tables using nzload, so you must use a workaround. In this article, we will look at methods to load an HDFS file into a Netezza table using nzload and external tables, with some examples. Install Netezza Drivers Before attempting to load…
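
For example, a minimal sketch of one common workaround (the host, database, table, and HDFS path below are hypothetical, and it assumes nzload reads from standard input when no data file is specified) is to stream the HDFS file into nzload:

    # Stream a delimited HDFS file straight into a Netezza table via stdin
    hadoop fs -cat /data/sales/part-00000.csv | \
      nzload -host nzhost -db SALESDB -u admin -pw password \
             -t SALES_FACT -delim ','

A named pipe filled by hadoop fs -cat works the same way if you prefer to point nzload (or an external table) at a file path instead of stdin.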


Hadoop Security – Hadoop HDFS File Permissions

Hadoop HDFS file permissions are very similar to those of a POSIX file system. On a Linux system, we usually create OS-level users and make them members of an existing operating system group. In Hadoop, however, we create a directory and associate it with an owner and a group. Hadoop HDFS File and Directory Permissions The following sections show Hadoop HDFS file and directory permissions: Just like the Linux operating system, Hadoop uses the (r, w) notation to denote read and write permissions. There is an execute (x) permission for files, but you cannot execute…
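
As an illustration, here is a minimal sketch of assigning an owner, group, and mode to an HDFS directory (the paths, user, and group names are made up):

    # Create a directory, then set its owner, group, and POSIX-style permission bits
    hdfs dfs -mkdir -p /user/etl/landing
    hdfs dfs -chown etl:analytics /user/etl/landing
    hdfs dfs -chmod 750 /user/etl/landing

    # Verify: the listing shows permissions much like ls -l on Linux
    hdfs dfs -ls /user/etl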


Migrating Netezza Data to Hadoop Ecosystem and Sample Approach

In my other post, ‘Migrating Netezza to Impala SQL Best Practices’, we discussed various best practices for migrating Netezza SQL scripts to Impala SQL. In this article, we will discuss the steps for migrating Netezza data to the Hadoop ecosystem. Migrating Netezza Data to Hadoop Ecosystem – Offload Netezza data to Hadoop HDFS Nowadays the Hadoop ecosystem is gaining popularity, and organizations with huge data volumes want to migrate to it for faster analytics, including real-time or near real-time processing. Steps to Migrating Netezza Data to Hadoop Ecosystem…
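
As a rough sketch of the offload step (assuming a transient external table unload from nzsql; the host, database, table, and paths are hypothetical):

    # 1. Unload the Netezza table to a delimited flat file using a transient external table
    nzsql -host nzhost -d SALESDB -u admin -pw password -c \
      "CREATE EXTERNAL TABLE '/tmp/sales_fact.csv' USING (DELIMITER ',') AS SELECT * FROM SALES_FACT;"

    # 2. Copy the flat file into HDFS for downstream Hive/Impala tables
    hdfs dfs -mkdir -p /data/netezza/sales_fact
    hdfs dfs -put /tmp/sales_fact.csv /data/netezza/sales_fact/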


Hadoop HDFS Architecture Introduction and Design

In this post you will learn about the Hadoop HDFS architecture and its design. The Hadoop Distributed File System (HDFS) is a Java-based distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. Hadoop HDFS Architecture Introduction HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with large data sets. HDFS has demonstrated production scalability of up to…
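
To see the fault-tolerance design in practice, you can inspect block replication from the command line (the file path below is just an example):

    # Show the configured block replication factor
    hdfs getconf -confKey dfs.replication

    # Show how a file is split into blocks and where each replica is stored
    hdfs fsck /data/netezza/sales_fact/sales_fact.csv -files -blocks -locations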


Basic Hadoop HDFS Filesystem Operations With Examples

There are many interfaces to HDFS, but the command line (CLI) is one of the simplest and, to many developers, the most familiar. You can perform most basic and advanced Hadoop HDFS filesystem operations using the CLI. Basic Hadoop HDFS Filesystem Operations Once the Hadoop HDFS filesystem is set up, you can do all of the basic HDFS filesystem operations, such as reading files, creating directories, moving files, deleting data, and listing directories. You can also perform advanced HDFS filesystem operations, such as updates and administrative tasks, from the command line.…
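
A minimal sketch of those basic operations (the paths and file names are placeholders):

    hdfs dfs -mkdir -p /user/hadoop/input                                       # create a directory
    hdfs dfs -put localfile.txt /user/hadoop/input/                             # copy a local file into HDFS
    hdfs dfs -ls /user/hadoop/input                                             # list a directory
    hdfs dfs -cat /user/hadoop/input/localfile.txt                              # read a file
    hdfs dfs -mv /user/hadoop/input/localfile.txt /user/hadoop/localfile.txt    # move/rename a file
    hdfs dfs -rm -r /user/hadoop/input                                          # delete recursively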


How to Learn Apache Hadoop

Most of you want to know what Apache Hadoop is, and how and where to start learning it. Here I'm going to share some of the steps I followed to learn Hadoop. Don't worry! You don't have to be a Java programmer to learn Hadoop; you should just know a little bit of basic Linux commands. You will pick up the remaining programming languages once you log in to a cluster :-) Let's first understand what Hadoop is. Apache Hadoop is an open source framework for processing very large data sets (BigData). Hadoop allows the distributed storage and…
