Based on what you are working and expected results, you have to use different methodologies and best practices. A data warehouse is no different, you have to use different modeling methodologies based on the type of source data and integration. Big data is a hot cake now, everybody wants to move their data to bigdata world. Traditional methods such as Kibmal’s Star schema and Inmon’s relational 3NF may not work. You have to choose a different approach based on your ecosystem and data. In this article, we will check new approach called data vault modeling methodology and its architecture.
Data Vault Modeling Methodology
Data vault modeling was originally created by Dan Linstedt in the 1990s and was released in 2000 as a public domain modeling method.
The data vault modeling is a database modeling methodology designed to store long term historical data from various heterogeneous sources.
The data vault model is relatively new approach in the data warehouse world. It is a hybrid data modeling methodology providing historical data representation from heterogeneous (multiple) sources designed to be resilient to environmental changes. Read more about data vault model in wiki
Related articles:
Data Vault Modeling Architecture
Data vault model contains three basic tables:
- Hubs
- Links
- Satellites
Hubs
This table contains unique business keys with low probability to change. The table also contains a surrogate key for each hub item and metadata information describing sources of business keys.
Links
This table establishes a relationship between business keys; These tables are basically a many-to-many relationship. Links typically link hubs, it can even link other links tables and satellites.
The links are often used to deal with changes in data granularity reducing the impact of adding a new business key to a linked Hub.
Satellites
Satellite tables hold descriptive data that can change over time. You can relate these tables as slowly changing dimension type 2 tables. In the data vault model, Hubs and Links tables, form the structure of the data model, whereas Satellites tables contain temporal and descriptive attributes. Satellite tables also contain metadata linking them to their parent Hub or Link tables.
Related articles:
Data Vault Modeling Advantages
There are many key advantages of data vault model:
- Data vault model is designed to store records. It makes the data ingestion process simpler.
- With this approach, you can easily add new data source without affecting existing schema.
- It removed the cleansing part that is usually included in Star Schema methodology.
- You can easily automate the ETL processes.
Hope this helps 🙂