Design Slowly Changing Dimension Type 2 in SQL

Dimensions in data warehousing contain relatively static data about entities such as customers, stores, locations etc. Slowly changing dimensions commonly known as SCD, usually captures the data that changes slowly but unpredictably, rather than regular bases.  Slowly changing dimension type 2 is most popular method used in dimensional modelling to preserve historical data. For example, lets take the example of patient details. The fact table may contains the information about patient expense details. The fact and dimensions are always linked by means of foreign keys. One of the dimension may contain the information about patient (say, patient dimension…

Continue ReadingDesign Slowly Changing Dimension Type 2 in SQL
2 Comments

Netezza TwinFin Architecture

Netezza is a data warehouse and big data analytics appliance. It uses Asymmetric Massively Parallel Processing (AMPP) architecture, which combines an SMP front end with a shared MPP back end for query processing. Netezza is a result of  database integration , processing engine and storage in a system. Netezza architecture resembles Hadoop cluster design in may ways. e.g. Distribution, active-passive node, data storing methods, replications etc Check out: Key Features of Netezza Appliances Netezza Striper Architecture: N2001 Technical Specification Netezza Mako Architecture: N3001 Technical Specification Commonly used Netezza Utilities Related book: Netezza underground - This…

Continue ReadingNetezza TwinFin Architecture
1 Comment

Import data using Apache Sqoop

Sqoop allows easy import of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision the data from external system on to HDFS, and populate tables in Hive and HBase. Sqoop can integrates with Oozie, allowing you to schedule and automate tasks. Read: Sqoop import Relational Database Table into HBase Table Export data using Sqoop Sqoop Command with Secure Password Import Netezza Tables using Apache Sqoop Sqoop uses a connector based architecture which supports plugins that provide connectivity to new…

Continue ReadingImport data using Apache Sqoop
1 Comment

Export data using Apache Sqoop

In some cases data processed by Hadoop pipelines may be needed in production systems to help run additional critical business functions. The sqoop can exports a set of files from HDFS to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specific delimiters. Delimiter is provided in the sqoop options file or CLI options.

Continue ReadingExport data using Apache Sqoop
Comments Off on Export data using Apache Sqoop

Sqoop Architecture – Mappers with No Reducers

Sqoop is a tool designed to transfer data between Hadoop and various relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as Netezza, MySQL, Oracle or SQL Server into the Hadoop Distributed File System (HDFS), transform the data and perform complex calculations in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop is based on a connector architecture which supports plugins to provide connectivity to external systems (RDBMS) .

Continue ReadingSqoop Architecture – Mappers with No Reducers
Comments Off on Sqoop Architecture – Mappers with No Reducers