Vithal S, Author at DWgeek.com

Working with Materialized Views in Netezza

When you create a materialized view from a base table, the Netezza system stores the view definition for the lifetime of the SPM view, and the view is visible as a materialized view. The SPM view's data slices are co-located with the corresponding base table data slices, which improves query performance. A materialized view reduces the number of columns scanned in the base table; this type of view contains a small subset of frequently queried columns. When you query the table (table with large number of…

August 7, 2016

Netezza

How to Install VMware Player and Netezza Emulator on Ubuntu

The Netezza emulator is an emulated Netezza appliance that runs both a virtual host and a virtual S-Blade. It is a fully functional system intended for initial development and testing only. Never use this tool in a production environment: it is a tool for developers, and you cannot use it for performance optimization. You can use it to learn the Netezza appliance. Note: This tutorial was tested on Ubuntu 14.04. Useful Netezza Emulator Information: Short information that you might want to know before you start using the Netezza emulator. You should treat…

August 4, 2016

Data Warehouse

Data Warehouse Snowflake Schema Model and Design

The data warehouse snowflake schema is an extension of the star schema design methodology: a centralized fact table references a number of dimension tables, but one or more dimension tables are normalized, i.e. dimension tables are connected to other dimension tables. Primary keys from the dimensions flow into the fact table as foreign keys. Read: Star Schema model in Data Warehouse; Data Warehouse Fact Constellation Schema and Design. The snowflake schema increases the level of normalization in the data; the dimension table is normalized into multiple tables. This schema has a disadvantage in terms of data retrieval, we…
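
To make the snowflake idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than a real warehouse. The table names (fact_sales, dim_product, dim_category), the sample rows, and the function names are all hypothetical and chosen only for illustration. The point it shows is that a normalized dimension (product split into product and category) costs one extra join at query time.

```python
import sqlite3


def build_snowflake_schema(conn):
    """Create a tiny snowflake schema: fact -> dim_product -> dim_category.

    All table names and rows are hypothetical sample data.
    """
    conn.executescript("""
        CREATE TABLE dim_category (
            category_id   INTEGER PRIMARY KEY,
            category_name TEXT NOT NULL
        );
        -- The product dimension is normalized: it points at dim_category
        -- instead of storing the category name itself.
        CREATE TABLE dim_product (
            product_id   INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            category_id  INTEGER NOT NULL REFERENCES dim_category(category_id)
        );
        -- The dimension's primary key appears in the fact table as a foreign key.
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity   INTEGER NOT NULL,
            price      REAL NOT NULL
        );
    """)
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "Bikes"), (2, "Helmets")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Road bike", 1), (2, "Helmet", 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, 500.0), (2, 1, 1, 500.0), (3, 2, 3, 40.0)])


def sales_by_category(conn):
    """Total sales per category; note the two joins needed to reach the category."""
    return conn.execute("""
        SELECT c.category_name, SUM(f.quantity * f.price)
        FROM fact_sales f
        JOIN dim_product  p ON p.product_id  = f.product_id
        JOIN dim_category c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_snowflake_schema(connection)
    for name, total in sales_by_category(connection):
        print(name, total)
```

In a star schema the category name would sit directly in the product dimension, so the same report would need only one join.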

August 3, 2016

Data Warehouse

Step by Step Guide to Dimensional Data Modeling

In this post, you will learn a step-by-step guide to dimensional data modeling and see how to use the dimensional modeling technique in real-life scenarios. What is Dimensional Data Modeling? Dimensional data modeling is one of the data modeling techniques used in data warehouse design. The main goal of this modeling is to improve data retrieval; it is optimized for the SELECT operation. Dimensional data modeling is best suited for the data warehouse star and snowflake schemas. Dimensional data modeling in a data warehouse is different than the…

4 Comments

August 1, 2016

Data Warehouse

Data Warehouse Star Schema Model and Design

Data warehouse star schema is a popular data warehouse design and dimensional model that divides business data into facts and dimensions. In this model, a centralized fact table references many dimension tables, and primary keys from the dimension tables flow into the fact table as foreign keys. The entity-relationship diagram looks like a star, hence the name star schema. The fact holds the measurable data, and the dimensions hold descriptive attributes related to the fact data. For example, fact data includes price, quantity, and weight measurements, and related dimension attributes include product color, sales…
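
As a rough illustration of the fact and dimension split described above, the following sketch builds a tiny star schema with Python's built-in sqlite3 module. The table names (star_fact_sales, star_dim_product, star_dim_date), the rows, and the function names are hypothetical; only the shape matters: measures such as quantity and price live in the fact table, and descriptive attributes such as color live in a dimension.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (sample data)."""
    conn.executescript("""
        CREATE TABLE star_dim_product (
            product_id   INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            color        TEXT NOT NULL      -- descriptive attribute
        );
        CREATE TABLE star_dim_date (
            date_id       INTEGER PRIMARY KEY,
            calendar_date TEXT NOT NULL
        );
        -- The fact table holds the measures plus one foreign key per dimension.
        CREATE TABLE star_fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES star_dim_product(product_id),
            date_id    INTEGER NOT NULL REFERENCES star_dim_date(date_id),
            quantity   INTEGER NOT NULL,    -- measure
            price      REAL NOT NULL        -- measure
        );
    """)
    conn.executemany("INSERT INTO star_dim_product VALUES (?, ?, ?)",
                     [(1, "Road bike", "red"),
                      (2, "Helmet", "blue"),
                      (3, "City bike", "red")])
    conn.executemany("INSERT INTO star_dim_date VALUES (?, ?)",
                     [(20160730, "2016-07-30"), (20160801, "2016-08-01")])
    conn.executemany("INSERT INTO star_fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 20160730, 2, 500.0),
                      (2, 2, 20160730, 3, 40.0),
                      (3, 3, 20160801, 4, 300.0)])


def quantity_by_color(conn):
    """Aggregate a fact measure by a dimension attribute with a single join."""
    return conn.execute("""
        SELECT p.color, SUM(f.quantity)
        FROM star_fact_sales f
        JOIN star_dim_product p ON p.product_id = f.product_id
        GROUP BY p.color
        ORDER BY p.color
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for color, quantity in quantity_by_color(connection):
        print(color, quantity)
```

Each dimension is one join away from the fact table, which is what keeps typical SELECT queries on a star schema simple.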

July 30, 2016

Netezza

Working with Netezza Stored Procedures

Netezza stored procedures are used to encapsulate business logic and, at the same time, handle exceptions. SQL provides the power to get and update the database information on the host server, and the procedure language provides the logic for if-then-else branching and application processing on the data. Read: Netezza RECORD Type Variable, Usage and Examples; Netezza Stored Procedure ARRAY Variables and Examples. For example, you may want to check whether a table exists in the database before dropping it. You can achieve this by creating a stored procedure, e.g. CALL DROP_IF_EXIST(table_name); if you…

July 30, 2016

Netezza

Working with Netezza Clustered Base Tables (CBT)

A Netezza clustered base table (CBT) is a user table whose data is organized using one to four organizing key columns. You can specify a maximum of four columns in the ORGANIZE ON clause, and those columns should not be part of the DISTRIBUTE ON clause. An organizing key is a column of the table that you specify for clustering the table records; organizing the table helps Netezza save records in the same or nearby extents. You organize the records using the "ORGANIZE ON" clause. Netezza creates zone maps on the organizing columns, which accelerates the performance of queries on that…
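
The benefit of keeping related records in the same or nearby extents can be illustrated with a small conceptual model in Python. This is not Netezza's implementation: the extent size, the sample values, and the function names are hypothetical, and sorting on a single key stands in for the real clustering. The sketch keeps a (min, max) pair per extent, in the spirit of a zone map, and counts how many extents a range query would have to read.

```python
def split_into_extents(values, extent_size):
    """Chop a list of column values into fixed-size 'extents' (storage chunks)."""
    return [values[i:i + extent_size] for i in range(0, len(values), extent_size)]


def build_zone_map(extents):
    """Record the minimum and maximum value stored in each extent."""
    return [(min(extent), max(extent)) for extent in extents]


def extents_to_scan(zone_map, low, high):
    """Return indexes of extents whose [min, max] range overlaps [low, high].

    Extents that cannot contain a matching value are skipped entirely.
    """
    return [index for index, (lo, hi) in enumerate(zone_map)
            if hi >= low and lo <= high]


def compare_layouts(values, extent_size, low, high):
    """Extents read for the same range query, before and after organizing."""
    unorganized = build_zone_map(split_into_extents(values, extent_size))
    # Sorting is a simplified stand-in for organizing records on one key.
    organized = build_zone_map(split_into_extents(sorted(values), extent_size))
    return (extents_to_scan(unorganized, low, high),
            extents_to_scan(organized, low, high))


if __name__ == "__main__":
    sample = [17, 3, 42, 8, 25, 1, 39, 12, 30, 5, 21, 36]
    before, after = compare_layouts(sample, extent_size=4, low=20, high=26)
    print("extents scanned, unorganized:", before)
    print("extents scanned, organized:  ", after)
```

With scattered values every extent has a wide min/max range and must be read; once similar values sit together, most extents can be ruled out from their ranges alone.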

July 26, 2016

BigData

Hadoop Single Node Cluster Setup on Ubuntu

In this tutorial, I will explain how to set up a Hadoop single node cluster on Ubuntu 14.04. The single node cluster will sit on top of the Hadoop Distributed File System (HDFS). Hadoop is a Java framework for running applications on large clusters made up of commodity hardware. The Hadoop framework allows us to run MapReduce programs on data stored in the highly fault-tolerant Hadoop Distributed File System. Related Readings: How to Learn Apache Hadoop. Also: 7 Best Books to Learn Bigdata Hadoop. The main…

July 23, 2016

BigData

7 Best Hadoop Books to Learn Bigdata Hadoop

The Hadoop ecosystem is vast, and it may take a long time to learn big data and start implementing applications; therefore, people new to big data Hadoop technology must choose the right book to start with. Here are some of the best Hadoop books you may want to consider. Hadoop big data skills are in huge demand in domains like finance, insurance, banking, social networking and many other platforms that deal with very large data sets. Hadoop experts are in great demand in industries that need to handle big, complicated data sets. A working knowledge of…

1 Comment

July 16, 2016

BigData

How to Learn Apache Hadoop

Most of you want to know what Apache Hadoop is and how and where to start learning it. Here I’m going to share some of the steps I followed to learn Hadoop. Don’t worry! You don’t have to be a Java programmer to learn Hadoop. You should know a few basic Linux commands. You will learn the remaining programming languages once you log in to the cluster :-) Let’s first look at what Hadoop is. Apache Hadoop is an open source framework to process very large data sets (BigData). Hadoop allows the distributed storage and…

July 16, 2016