Data Warehouse related posts

Step by Step Guide to Dimensional Data Modeling

In this post, you will learn about the step by step guide to dimensional data modeling. You will see how to use dimensional modeling technique in real life scenarios. What is Dimensional data Modeling? Dimensional data modeling is one of the data modeling techniques used in data warehouse design. The main goal of this modeling is to improve the data retrieval, it is optimized for the SELECT operation. Dimensional data modelling is best suited for the data warehouse star and snow flake schema. Dimensional data modeling in data warehouse is different than the…

Continue ReadingStep by Step Guide to Dimensional Data Modeling
4 Comments

Data Warehouse Star Schema Model and Design

Data warehouse Star schema is a popular data warehouse design and dimensional model, which divides business data into fact and dimensions. In this model, centralized fact table references many dimension tables and primary keys from dimension table flows into fact table as a foreign key. This entity-relationship diagram looks star, hence the name star schema. This model divides the business data into fact which holds the measurable data, and dimension that holds descriptive attributes related to the fact data. For examples, fact data includes price, quantity, weight measurements and related dimension attributes example includes product color, sales…

Continue ReadingData Warehouse Star Schema Model and Design
Comments Off on Data Warehouse Star Schema Model and Design

Mining Frequent itemsets – Apriori Algorithm

Apriori algorithm is an algorithm for frequent item set mining and association rule learning over transaction databases. Its followed by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database. Read: Methods to Measure Data Dispersion 9 Laws Everyone In The Data Mining Should Use Various Data Mining Clustering Algorithms and Examples…

Continue ReadingMining Frequent itemsets – Apriori Algorithm
Comments Off on Mining Frequent itemsets – Apriori Algorithm

9 Laws Everyone In The Data Mining Should Use

DATA MINING is a powerful new technology with a great potential to help companies focus on more important information by extracting the hidden predictive information from large database in their data warehouses. There are some 9 data mining laws that miner should follow when performing mining on particular data sets. Data mining provides two types of results: Business Insights Predictive models, makes predictions automatically. It includes a various methods that include, clustering, classification and market basket analysis, etc. Read: 9 Laws Everyone In The Data Mining Should Use Mining Frequent itemsets - Apriori Algorithm…

Continue Reading9 Laws Everyone In The Data Mining Should Use
Comments Off on 9 Laws Everyone In The Data Mining Should Use

Slowly Changing Dimensions (SCD) in Data Warehouse

Slowly changing dimensions or SCD are dimensions that changes slowly over time, rather than regular bases.  In data warehouse environment, there may be a requirement to keep track of the change in dimension values and are used to report historical data at any given point of time. We can implement slowly changing dimensions (SCD) using various approaches, such as; Type 0: Always retains originalType 1 : Keeps latest data, old data is overwrittenType 2 : Keeps the history of old data by adding new rowType 3 : Adds new attribute to store changed valueType 4 : Uses…

Continue ReadingSlowly Changing Dimensions (SCD) in Data Warehouse
Comments Off on Slowly Changing Dimensions (SCD) in Data Warehouse

Design Slowly Changing Dimension Type 2 in SQL Cont

This is a continuation of the Design Slowly Changing Dimension (SCD) Type2 in SQL Step 4: Copy only updated records from LOAD table These are records which are updated in this load cycle. Since updated records are already in the LOAD table, you have  to compare the LOAD and TGT records on Patient ID or SK column and copy the record which is updated. You should also keep in mind to set EFF_FROM_DT and flag to indicate active record. The updated records will be active from the moment we load them so EFF_FR_DT…

Continue ReadingDesign Slowly Changing Dimension Type 2 in SQL Cont
2 Comments

Design Slowly Changing Dimension Type 2 in SQL

Dimensions in data warehousing contain relatively static data about entities such as customers, stores, locations etc. Slowly changing dimensions commonly known as SCD, usually captures the data that changes slowly but unpredictably, rather than regular bases.  Slowly changing dimension type 2 is most popular method used in dimensional modelling to preserve historical data. For example, lets take the example of patient details. The fact table may contains the information about patient expense details. The fact and dimensions are always linked by means of foreign keys. One of the dimension may contain the information about patient (say, patient dimension…

Continue ReadingDesign Slowly Changing Dimension Type 2 in SQL
2 Comments