Hive Merge Tables Statement – Alternative and Example

The SQL MERGE statement is used to perform incremental loads. With MERGE, you can perform UPDATE and INSERT in a single statement based on a condition, i.e. you can update existing records and insert new ones. MERGE statements are mainly used to implement slowly changing dimensions. Hive versions before 2.2 do not support the MERGE statement, and later versions support it only on transactional (ACID) tables. In this article, we will look at an alternative to MERGE in Hive, with an example. The combined update and insert is sometimes called UPSERT. Related article: Slowly changing dimension…
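One common way to get MERGE-like behaviour without a MERGE statement is to rebuild the target table from a staging table. The sketch below is only an illustration under stated assumptions, not necessarily the approach the full article takes: it uses Python's sqlite3 module as a stand-in for Hive, and the tables customer and customer_stage and their columns are hypothetical. In Hive, the final swap would typically be an INSERT OVERWRITE rather than a drop and rename.

```python
import sqlite3


def merge_via_intermediate(conn):
    """Emulate an upsert by rebuilding the target from target + staging."""
    conn.executescript("""
        DROP TABLE IF EXISTS customer_merged;

        -- Staging rows win: they are either updates to existing keys or new keys.
        CREATE TABLE customer_merged AS
        SELECT s.id, s.name, s.city
        FROM customer_stage s
        UNION ALL
        -- Keep only those target rows whose key does not appear in staging.
        SELECT t.id, t.name, t.city
        FROM customer t
        LEFT JOIN customer_stage s ON t.id = s.id
        WHERE s.id IS NULL;

        -- Swap the rebuilt table in (assumption: Hive would use INSERT OVERWRITE).
        DROP TABLE customer;
        ALTER TABLE customer_merged RENAME TO customer;
    """)


# Hypothetical sample data: id 2 is updated, id 3 is new, id 1 is untouched.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE customer_stage (id INTEGER, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Ann', 'Pune'), (2, 'Bob', 'Delhi');
    INSERT INTO customer_stage VALUES (2, 'Bob', 'Mumbai'), (3, 'Cid', 'Goa');
""")
merge_via_intermediate(conn)
rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
print(rows)
```

The design choice here is that a staging row always replaces the target row with the same key, which matches the update-or-insert behaviour described above.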

Impala or Hive Slowly Changing Dimension – SCD Type 2 Implementation

Slowly changing dimensions (SCDs) in a data warehouse capture data that changes slowly but unpredictably rather than on a regular basis. Slowly changing dimension type 2 is the most popular method used in dimensional modelling to preserve historical data. Since Cloudera Impala and Hadoop Hive do not support UPDATE statements on ordinary non-transactional tables, you have to implement the update using intermediate tables. In this article, we will walk through the steps of a Cloudera Impala or Hive Slowly Changing Dimension (SCD) Type 2 implementation with an example. For demonstration purposes, let's take the example…
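To make the idea of implementing SCD Type 2 through an intermediate table concrete, here is a minimal sketch that never issues an UPDATE. It is an illustration under stated assumptions rather than the article's own steps: Python's sqlite3 module stands in for Hive or Impala, and the dim_patient and stage_patient tables, their columns, the '9999-12-31' open-ended date, and the load date are all hypothetical.

```python
import sqlite3

OPEN_END = "9999-12-31"  # hypothetical marker for "still current"


def apply_scd2(conn, load_date):
    """Rebuild the dimension in an intermediate table instead of updating rows."""
    conn.execute("""
        CREATE TABLE dim_patient_new (
            patient_id INTEGER, city TEXT,
            start_date TEXT, end_date TEXT, is_current INTEGER)
    """)
    conn.execute("""
        INSERT INTO dim_patient_new
        -- Existing rows: expire a current row only if staging has a changed value.
        SELECT d.patient_id, d.city, d.start_date,
               CASE WHEN d.is_current = 1 AND s.patient_id IS NOT NULL
                         AND s.city <> d.city
                    THEN :load_date ELSE d.end_date END,
               CASE WHEN d.is_current = 1 AND s.patient_id IS NOT NULL
                         AND s.city <> d.city
                    THEN 0 ELSE d.is_current END
        FROM dim_patient d
        LEFT JOIN stage_patient s ON d.patient_id = s.patient_id
        UNION ALL
        -- Staging rows that are new keys or changed values become current rows.
        SELECT s.patient_id, s.city, :load_date, :open_end, 1
        FROM stage_patient s
        LEFT JOIN dim_patient d
               ON s.patient_id = d.patient_id AND d.is_current = 1
        WHERE d.patient_id IS NULL OR d.city <> s.city
    """, {"load_date": load_date, "open_end": OPEN_END})
    # Swap the intermediate table in place of the old dimension.
    conn.execute("DROP TABLE dim_patient")
    conn.execute("ALTER TABLE dim_patient_new RENAME TO dim_patient")
    conn.commit()


# Hypothetical sample data: patient 2 moves city, patient 3 is new.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_patient (
        patient_id INTEGER, city TEXT,
        start_date TEXT, end_date TEXT, is_current INTEGER);
    CREATE TABLE stage_patient (patient_id INTEGER, city TEXT);
    INSERT INTO dim_patient VALUES
        (1, 'Pune',  '2020-01-01', '9999-12-31', 1),
        (2, 'Delhi', '2020-01-01', '9999-12-31', 1);
    INSERT INTO stage_patient VALUES (2, 'Mumbai'), (3, 'Goa');
""")
apply_scd2(conn, "2024-06-01")
rows = conn.execute(
    "SELECT * FROM dim_patient ORDER BY patient_id, start_date").fetchall()
for row in rows:
    print(row)
```

After the run, the old row for patient 2 is closed with an end date and a new current row carries the changed value, so the history is preserved.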

Rapidly Changing Dimension (RCD) in Data Warehouse

A dimension is a fast-changing or rapidly changing dimension if one or more of its attributes change very frequently and in many rows. Handling a rapidly changing dimension in a data warehouse is difficult because of its performance implications. As you know, slowly changing dimension type 2 is used to preserve the history of changes. The problem with type 2 is that every change to a dimension attribute adds a new row to the table. If in case there are dimensions that are…

Design Slowly Changing Dimension Type 2 in SQL

Dimensions in data warehousing contain relatively static data about entities such as customers, stores, locations, etc. Slowly changing dimensions, commonly known as SCDs, capture data that changes slowly but unpredictably rather than on a regular basis. Slowly changing dimension type 2 is the most popular method used in dimensional modelling to preserve historical data. For example, let's take patient details. The fact table may contain information about patient expense details. Facts and dimensions are always linked by means of foreign keys. One of the dimensions may contain information about the patient (say, patient dimension…
