DWgeek.com

Data Warehouse Surrogate Key Design – Advantages and Disadvantages

If you are working on a data warehouse project, then you have probably heard a lot about surrogate keys. Surrogate keys are a widely accepted data warehouse design standard. In this article, we will check data warehouse surrogate key design and its advantages and disadvantages. What are surrogate keys in a data warehouse? If you are a data warehouse developer, you might be wondering what a surrogate key is, and how and where it is used. You will get answers to all these questions here. Data warehouse surrogate keys are sequentially generated meaningless numbers associated with…
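To make the idea of "sequentially generated meaningless numbers" concrete, here is a minimal Python sketch that assigns surrogate keys to natural (business) keys. It assumes an in-memory lookup and a simple counter; the `SurrogateKeyGenerator` class and its `key_for` method are hypothetical names for illustration. In a real warehouse these values are often produced by a database sequence or identity column instead.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hypothetical helper: maps natural (business) keys to sequential integers.

    The integers carry no business meaning; they only identify a row.
    """

    def __init__(self, start=1):
        self._next = count(start)  # monotonically increasing counter
        self._lookup = {}          # natural key tuple -> surrogate key

    def key_for(self, *natural_key):
        """Return the existing surrogate key, or assign the next number."""
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._next)
        return self._lookup[natural_key]


# Example: customer rows identified by (source_system, customer_id).
gen = SurrogateKeyGenerator()
rows = [("CRM", "C-100"), ("ERP", "100"), ("CRM", "C-100")]
keys = [gen.key_for(*r) for r in rows]
print(keys)  # [1, 2, 1]
```

The same customer arriving twice from the CRM source reuses key 1, while the ERP customer with a similar-looking ID gets its own key, which is one reason warehouses decouple dimension keys from source-system identifiers.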

October 31, 2017

BigData

Migrating Netezza Data to Hadoop Ecosystem and Sample Approach

In my other post, ‘Migrating Netezza to Impala SQL Best Practices’, we discussed various best practices for migrating Netezza SQL scripts to Impala SQL. In this article, we will discuss the steps for migrating Netezza data to the Hadoop ecosystem. Migrating Netezza Data to Hadoop Ecosystem – Offload Netezza Data to Hadoop HDFS: Nowadays the Hadoop ecosystem is gaining popularity, and organizations with huge data volumes want to migrate to it for faster analytics, including real-time or near-real-time analytics. Steps to Migrating Netezza Data to Hadoop Ecosystem…

October 30, 2017

Data Warehouse

Data Warehouse Project Life Cycle and Design

Building a data warehouse is no different from executing any other development project, such as a front-end application. To successfully design and implement a data warehouse project, you need to be both a technical and a business person: someone who understands the technical details along with the organization's business. In this article, we will check what the data warehouse project life cycle is and the different steps in designing a data warehouse project. Steps of Data Warehouse Project Life Cycle Design: The following steps are generally followed in any data warehouse project, and you can consider them the data warehouse life cycle: Requirements gathering…

October 30, 2017

BigData

Hive Create External Tables and Examples

A Hive external table allows you to access external HDFS files as if they were a regular managed table. You can join an external table with other external or managed tables in Hive to get the required information or to perform complex transformations involving various tables. In this article, we will check how to create Hive external tables, with examples. You create an external table the same way you create a managed table, adding the EXTERNAL keyword. You should specify LOCATION when creating an external table. LOCATION indicates the HDFS location of the flat files that you want…
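As a minimal sketch of the statement shape, the hypothetical Python helper `external_table_ddl` below assembles a HiveQL `CREATE EXTERNAL TABLE` statement for comma-delimited text files. The table name, columns, and HDFS path are assumptions for illustration; you would run the generated statement in the Hive CLI or Beeline.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement (hypothetical helper).

    table     -- table name, e.g. "sales_ext"
    columns   -- list of (column_name, hive_type) pairs
    location  -- HDFS directory that already holds the delimited flat files
    """
    cols = ",\n  ".join(f"{name} {htype}" for name, htype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


print(external_table_ddl(
    "sales_ext",
    [("sale_id", "INT"), ("region", "STRING"), ("amount", "DOUBLE")],
    "/user/hive/external/sales",
))
```

By default, dropping an external table in Hive removes only the table metadata; the files under the LOCATION directory remain in HDFS.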

October 26, 2017

General

Teradata Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT/MINUS

You can use the Teradata set operators to combine similar data sets from two or more SELECT statements in a query. The data types of the columns used in Teradata set operators should match; otherwise, explicitly cast the column values to the required data types. Set operators are similar to JOINs; the difference is that a join combines columns from different tables, whereas set operators combine rows from different tables. Read: Teradata String Functions and Examples, Commonly used Teradata Date Functions and Examples, Teradata…
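The following Python sketch shows the four set operators side by side. It uses the standard-library `sqlite3` module as a stand-in for Teradata, because SQLite supports UNION, UNION ALL, INTERSECT, and EXCEPT (Teradata additionally accepts MINUS as a synonym for EXCEPT, which SQLite does not). The `store_a`/`store_b` tables and the `run` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_a (product TEXT);
    CREATE TABLE store_b (product TEXT);
    INSERT INTO store_a VALUES ('apple'), ('banana'), ('cherry');
    INSERT INTO store_b VALUES ('banana'), ('cherry'), ('date');
""")


def run(op):
    """Combine the rows of both SELECTs with the given set operator."""
    sql = (f"SELECT product FROM store_a {op} "
           "SELECT product FROM store_b ORDER BY product")
    return [row[0] for row in conn.execute(sql)]


for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT"):
    print(f"{op:<10}", run(op))
```

Here UNION returns apple, banana, cherry, and date; UNION ALL keeps the duplicate banana and cherry rows; INTERSECT returns banana and cherry; and EXCEPT returns only apple.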

October 25, 2017

General

Commonly used Teradata Date Functions and Examples

This article provides detailed descriptions and examples of the commonly used Teradata date functions that you can use to manipulate date columns in Teradata SQL, stored procedures, or embedded SQL. In real-world scenarios, many applications manipulate date and time data types. Date types are highly formatted and complex: a date or timestamp value can contain the century, year, month, day, hour, minute, and second. Each RDBMS may employ different date functions, and there may also be differences in the syntax for each RDBMS even when the function…
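Month arithmetic is a typical example of why date handling is tricky. The Python sketch below adds whole months to a date and clamps the day to the last day of the target month, which is the behavior commonly described for Teradata's `ADD_MONTHS` when the day would overflow. The `add_months` helper is a hypothetical illustration, not Teradata's implementation, so check edge cases against your Teradata release.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's end.

    Hypothetical helper for illustration only.
    """
    total = d.year * 12 + (d.month - 1) + months  # months counted from year 0
    year, month0 = divmod(total, 12)               # month0 is 0-based
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


print(add_months(date(2017, 1, 31), 1))     # 2017-02-28
print(add_months(date(2016, 1, 31), 1))     # 2016-02-29 (leap year)
print(add_months(date(2017, 10, 25), -12))  # 2016-10-25
```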

October 25, 2017

Data Mining

Different types of Data Mining Clustering Algorithms and Examples

There are various types of data mining clustering algorithms, but only a few popular ones are widely used. Basically, clustering algorithms rely on a distance measure, where data points closer together in the data space exhibit more similar characteristics than points lying further away. Every algorithm follows a different approach to finding the ‘similar characteristics’ among the data points. Read: Methods to Measure Data Dispersion, Mining Frequent Itemsets - Apriori Algorithm, 9 Laws Everyone In The Data Mining Should Use. Let’s look at the different types of Data Mining…
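To show how a distance measure drives clustering, here is a minimal pure-Python k-means sketch. K-means is chosen only as one widely used example; the excerpt does not say which algorithms the article covers. It uses Euclidean distance via `math.dist`, and the `kmeans` function and sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points around k centroids using Euclidean distance (minimal k-means).

    Centroids are seeded with the first k points so the result is deterministic;
    production implementations typically use better seeding such as k-means++.
    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

Points near (1, 1) end up together because they are closer to that centroid than to the one near (8, 8), which is exactly the "closer points are more similar" idea described above.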

October 24, 2017

BigData

Hadoop Hive Create, Drop, Alter, Use Database Commands and Examples

Hadoop Hive is a database framework on top of the Hadoop Distributed File System (HDFS), originally developed by Facebook to analyze structured data. It supports most of the commands that a regular database supports. The Hive CREATE, DROP, ALTER, and USE database commands are database DDL commands. This article explains these commands with examples. Hive contains a default database named default. Read: Hive Create External Tables and Examples. Hadoop Hive SHOW DATABASES command: This command displays all the databases available in Hive. Below is an example of using the SHOW DATABASES command: hive> show databases;…

October 12, 2017

BigData

Hive Create Table Command and Examples

The syntax for creating a Hive table is quite similar to creating a table using SQL. This article explains the Hive CREATE TABLE command, with examples of creating tables in the Hive command line interface. You will also learn how to load data into the created Hive table. Hive Create Table Command: The Hive CREATE TABLE statement is used to create a table. You can also create a Hive table while importing data with a Sqoop command. To have Sqoop create the Hive table, specify the --create-hive-table option in the Sqoop command. You…

October 11, 2017

BigData

Teradata Analytics Functions and Examples

Teradata analytic functions compute an aggregate value based on a group of rows, optionally partitioning the rows by a given partition column. Just like other analytics systems, Teradata analytic functions work on a group of rows and can optionally ignore NULLs in the data. Teradata has also released an analytics system that provides more useful methods. Regular Teradata also provides some useful analytic functions that you can use to perform day-to-day aggregation for reporting. Read: Teradata String Functions and Examples, Teradata Set Operators: UNION, UNION ALL,…
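Here is a minimal sketch of partitioned analytic (window) functions, using Python's `sqlite3` module as a stand-in for Teradata (SQLite 3.25 or later is required for window functions). The `sales` table and its data are hypothetical. The `OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...)` clause is standard SQL window syntax with an explicit frame, but check the Teradata documentation for the exact functions and frame options your release supports.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 1, 100), ('east', 2, NULL), ('east', 3, 50),
        ('west', 1, 70),  ('west', 2, 30);
""")

query = """
    SELECT region, sale_day, amount,
           -- cumulative sum within each region, ordered by day
           SUM(amount) OVER (PARTITION BY region ORDER BY sale_day
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total,
           -- average over the whole region partition
           AVG(amount) OVER (PARTITION BY region) AS region_avg
    FROM sales
    ORDER BY region, sale_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note how the NULL amount in the east partition is ignored: the running total stays at 100 on day 2, and the region average is (100 + 50) / 2 = 75.0.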

October 11, 2017