nzload Command and its Usage in Netezza

nzload is a bulk-copy command available in Netezza. It provides an easy way to use external tables to get data into the Netezza appliance.

Read:
- Guide to load data into Netezza
- Working with Netezza Zone Maps
- Commonly used Netezza Basic Commands
- Netezza COPY Command Syntax and Examples

How the nzload Command Works

The nzload command is a SQL CLI client tool that allows you to load data from a local or remote client, on all the supported client platforms (Linux/Windows). The nzload command processes command-line…
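As a quick sketch, a typical nzload invocation supplies the connection details, target table, data file, and field delimiter on the command line. The host, credentials, database, table, and file below are placeholders, not values from the original post:

```shell
# Load a comma-delimited file into a Netezza table.
# All connection values here are placeholders -- substitute your own.
nzload -host nzhost -u admin -pw password \
       -db testdb -t customer \
       -df /tmp/customer.dat -delim ','
```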


Slowly Changing Dimensions (SCD) in Data Warehouse

Slowly changing dimensions, or SCDs, are dimensions that change slowly over time rather than on a regular basis. In a data warehouse environment, there may be a requirement to keep track of changes in dimension values so that historical data can be reported at any given point in time. We can implement slowly changing dimensions (SCDs) using various approaches, such as:
- Type 0: Always retains the original value
- Type 1: Keeps the latest data; old data is overwritten
- Type 2: Keeps the history of old data by adding a new row
- Type 3: Adds a new attribute to store the changed value
- Type 4: Uses…
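A minimal in-memory sketch of the contrast between the two most common approaches, Type 1 (overwrite) and Type 2 (new row). The row layout and column names (cust_id, eff_from, active, and so on) are assumptions for illustration, not from any specific schema:

```python
from datetime import date

def scd_type1(rows, key, attr, new_val):
    """Type 1: overwrite the attribute in place -- the old value is lost."""
    for r in rows:
        if r["cust_id"] == key:
            r[attr] = new_val

def scd_type2(rows, key, attr, new_val, change_dt):
    """Type 2: expire the active row and append a new active row."""
    for r in rows:
        if r["cust_id"] == key and r["active"]:
            # Copy the row with the new value before expiring the old one.
            new_row = {**r, attr: new_val, "eff_from": change_dt,
                       "eff_to": None, "active": True}
            r["eff_to"], r["active"] = change_dt, False
            rows.append(new_row)
            break

# Type 1: still one row after the change -- no history.
rows1 = [{"cust_id": 1, "city": "Pune"}]
scd_type1(rows1, 1, "city", "Mumbai")

# Type 2: two rows after the change -- history preserved.
dim = [{"cust_id": 1, "city": "Pune", "eff_from": date(2020, 1, 1),
        "eff_to": None, "active": True}]
scd_type2(dim, 1, "city", "Mumbai", date(2021, 6, 1))
print(len(dim))  # 2
```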


Design Slowly Changing Dimension Type 2 in SQL Cont

This is a continuation of Design Slowly Changing Dimension (SCD) Type 2 in SQL. Step 4: Copy only the updated records from the LOAD table. These are the records that were updated in this load cycle. Since the updated records are already in the LOAD table, you have to compare the LOAD and TGT records on the Patient ID or SK column and copy the records that were updated. You should also keep in mind to set EFF_FROM_DT and a flag to indicate the active record. The updated records will be active from the moment we load them, so EFF_FR_DT…
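The compare-and-copy step above can be sketched with SQLite standing in for the warehouse. The TGT/LOAD table layouts, the addr column, and the '9999-12-31' open-end-date convention are all assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical TGT (target dimension) and LOAD (staging) tables.
cur.executescript("""
CREATE TABLE tgt  (patient_id INT, addr TEXT,
                   eff_from_dt TEXT, eff_to_dt TEXT, active_flag TEXT);
CREATE TABLE load (patient_id INT, addr TEXT);
INSERT INTO tgt  VALUES (100, 'Old Addr', '2019-01-01', '9999-12-31', 'Y');
INSERT INTO load VALUES (100, 'New Addr');
""")

load_dt = "2021-06-01"

# Compare LOAD and TGT on the Patient ID key; collect records whose
# attributes differ from the currently active TGT row.
updated = [pid for (pid,) in cur.execute("""
    SELECT l.patient_id FROM load l
    JOIN tgt t ON t.patient_id = l.patient_id AND t.active_flag = 'Y'
    WHERE t.addr <> l.addr""")]

# Expire the old version, then insert the new one as the active record,
# with EFF_FROM_DT set to the load date.
cur.executemany("""UPDATE tgt SET eff_to_dt = ?, active_flag = 'N'
                   WHERE patient_id = ? AND active_flag = 'Y'""",
                [(load_dt, pid) for pid in updated])
cur.executemany("""INSERT INTO tgt
                   SELECT patient_id, addr, ?, '9999-12-31', 'Y'
                   FROM load WHERE patient_id = ?""",
                [(load_dt, pid) for pid in updated])

for row in cur.execute("SELECT addr, active_flag FROM tgt ORDER BY eff_from_dt"):
    print(row)  # ('Old Addr', 'N') then ('New Addr', 'Y')
```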


Design Slowly Changing Dimension Type 2 in SQL

Dimensions in data warehousing contain relatively static data about entities such as customers, stores, and locations. Slowly changing dimensions, commonly known as SCDs, capture data that changes slowly but unpredictably rather than on a regular basis. Slowly changing dimension Type 2 is the most popular method used in dimensional modelling to preserve historical data. For example, let's take patient details. The fact table may contain information about patient expense details. Facts and dimensions are always linked by means of foreign keys. One of the dimensions may contain information about the patient (say, a patient dimension…
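The fact-to-dimension link through foreign keys can be sketched like this, with SQLite standing in for the warehouse; all table and column names here are illustrative assumptions, not from the original post:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
# Illustrative star-schema fragment: a patient dimension and an
# expense fact table joined on a surrogate key.
con.executescript("""
CREATE TABLE patient_dim (
    patient_sk   INTEGER PRIMARY KEY,   -- surrogate key
    patient_id   INT,                   -- natural/business key
    patient_name TEXT);
CREATE TABLE expense_fact (
    patient_sk   INT REFERENCES patient_dim(patient_sk),
    expense_amt  REAL);
INSERT INTO patient_dim VALUES (1, 100, 'A. Patient');
INSERT INTO expense_fact VALUES (1, 250.0);
""")
# The fact row links back to the dimension through the foreign key.
row = con.execute("""
    SELECT d.patient_name, f.expense_amt
    FROM expense_fact f JOIN patient_dim d USING (patient_sk)""").fetchone()
print(row)  # ('A. Patient', 250.0)
```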


Netezza TwinFin Architecture

Netezza is a data warehouse and big data analytics appliance. It uses an Asymmetric Massively Parallel Processing (AMPP) architecture, which combines an SMP front end with a shared-nothing MPP back end for query processing. Netezza integrates the database, processing engine, and storage in a single system. The Netezza architecture resembles a Hadoop cluster design in many ways, e.g. data distribution, active-passive nodes, data storage methods, replication, etc.

Check out:
- Key Features of Netezza Appliances
- Netezza Striper Architecture: N2001 Technical Specification
- Netezza Mako Architecture: N3001 Technical Specification
- Commonly used Netezza Utilities

Related book: Netezza Underground - This…


Import data using Apache Sqoop

Sqoop allows easy import of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision data from an external system onto HDFS and populate tables in Hive and HBase. Sqoop also integrates with Oozie, allowing you to schedule and automate import tasks.

Read:
- Sqoop import Relational Database Table into HBase Table
- Export data using Sqoop
- Sqoop Command with Secure Password
- Import Netezza Tables using Apache Sqoop

Sqoop uses a connector-based architecture which supports plugins that provide connectivity to new…
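A minimal, hedged sketch of an import from an RDBMS table into HDFS; the JDBC URL, credentials, table name, and target directory are all placeholders:

```shell
# Import one RDBMS table into HDFS; -P prompts for the password
# interactively. All connection values are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 4
```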


Export data using Apache Sqoop

In some cases, data processed by Hadoop pipelines may be needed in production systems to help run additional critical business functions. Sqoop can export a set of files from HDFS to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to user-specified delimiters. The delimiter is provided in the Sqoop options file or as a CLI option.
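A hedged sketch of such an export run, assuming the placeholder table daily_totals already exists in the target database and the HDFS files are comma-delimited:

```shell
# Export comma-delimited HDFS files into an existing RDBMS table.
# Connection string, credentials, table, and directory are placeholders.
sqoop export \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table daily_totals \
  --export-dir /user/hadoop/daily_totals \
  --input-fields-terminated-by ','
```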


Sqoop Architecture – Mappers with No Reducers

Sqoop is a tool designed to transfer data between Hadoop and various relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as Netezza, MySQL, Oracle, or SQL Server into the Hadoop Distributed File System (HDFS), transform the data and perform complex calculations in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop is based on a connector architecture which supports plugins to provide connectivity to external systems (RDBMS).
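As the title says, Sqoop transfers run as map-only MapReduce jobs: the mappers pull data in parallel and there is no reduce phase. A hedged sketch of controlling that parallelism (connection details, table, and split column are placeholders):

```shell
# Sqoop import is a map-only job: -m (alias of --num-mappers) sets how
# many mappers run, and --split-by names the column used to partition
# the rows among them. No reducers are launched.
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username dbuser -P \
  --table orders \
  --split-by order_id \
  -m 8
```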
