Hive Incremental Load Options and Examples

The incremental load is very common in a data warehouse environment. Incremental load is commonly used to implement slowly changing dimensions. When you migrate your data to the Hadoop Hive, you might usually keep the slowly changing tables to sync up tables with the latest data. In this article, we will check Hadoop Hive incremental load options and some examples. Hive Incremental Load Options There are many methods you can use. Apache Hive introduced to ACID supports since Hive 0.14. Following are the couple of methods that you can use…

Continue ReadingHive Incremental Load Options and Examples
Comments Off on Hive Incremental Load Options and Examples

Types of Dimension Tables in a Data Warehouse

A dimension is something that qualifies the quantity or measures. Dimensions store the textual descriptions of the business. With help of dimension you can easily identify the measures.The different types of dimension tables are available as below: Read: Types of Fact Tables in a Data Warehouse Data Warehouse Fact Constellation Schema and Design Data Warehouse Three-tier Architecture in Details Types of Dimension Tables in a Data Warehouse Below are the commonly used dimension tables in data warehouse: Conformed Dimension A conformed dimension is the dimension that is shared across multiple data…

Continue ReadingTypes of Dimension Tables in a Data Warehouse
Comments Off on Types of Dimension Tables in a Data Warehouse

Types of Fact Tables in a Data Warehouse

A fact table stores quantitative information for analysis and is often denormalized. A fact table holds the measures, metrics and other quantifiable information. The different types of fact tables are as explained below: Read: Data Warehouse fact-less fact and Examples Slowly changing dimension Types of Dimension Tables in a Data Warehouse Types of Facts There are three types of facts: Additive Facts Additive facts can be used with any aggregation function like Sum(), Avg() etc. Example is Quantity, sales amount etc. Semi Additive Facts Semi-additive facts are those where only…

Continue ReadingTypes of Fact Tables in a Data Warehouse
Comments Off on Types of Fact Tables in a Data Warehouse

Rapidly Changing Dimension (RCD) in Data Warehouse

A dimension is a fast changing or rapidly changing dimension if one or more of its attributes in the table changes very fast and in many rows. Handling rapidly changing dimension in data warehouse is very difficult because of many performance implications. As you know slowly changing dimension type 2 is used to preserve the history for the changes. But the problem with type 2 is, with each and every change in the dimension attribute, it adds new row to the table. If in case there are dimensions that are…

Continue ReadingRapidly Changing Dimension (RCD) in Data Warehouse
2 Comments

Slowly Changing Dimensions (SCD) in Data Warehouse

Slowly changing dimensions or SCD are dimensions that changes slowly over time, rather than regular bases.  In data warehouse environment, there may be a requirement to keep track of the change in dimension values and are used to report historical data at any given point of time. We can implement slowly changing dimensions (SCD) using various approaches, such as; Type 0: Always retains originalType 1 : Keeps latest data, old data is overwrittenType 2 : Keeps the history of old data by adding new rowType 3 : Adds new attribute to store changed valueType 4 : Uses…

Continue ReadingSlowly Changing Dimensions (SCD) in Data Warehouse
Comments Off on Slowly Changing Dimensions (SCD) in Data Warehouse

Design Slowly Changing Dimension Type 2 in SQL Cont

This is a continuation of the Design Slowly Changing Dimension (SCD) Type2 in SQL Step 4: Copy only updated records from LOAD table These are records which are updated in this load cycle. Since updated records are already in the LOAD table, you have  to compare the LOAD and TGT records on Patient ID or SK column and copy the record which is updated. You should also keep in mind to set EFF_FROM_DT and flag to indicate active record. The updated records will be active from the moment we load them so EFF_FR_DT…

Continue ReadingDesign Slowly Changing Dimension Type 2 in SQL Cont
2 Comments

Design Slowly Changing Dimension Type 2 in SQL

Dimensions in data warehousing contain relatively static data about entities such as customers, stores, locations etc. Slowly changing dimensions commonly known as SCD, usually captures the data that changes slowly but unpredictably, rather than regular bases.  Slowly changing dimension type 2 is most popular method used in dimensional modelling to preserve historical data. For example, lets take the example of patient details. The fact table may contains the information about patient expense details. The fact and dimensions are always linked by means of foreign keys. One of the dimension may contain the information about patient (say, patient dimension…

Continue ReadingDesign Slowly Changing Dimension Type 2 in SQL
2 Comments