Data Warehouse related posts
You will hear a lot about data warehouses and data lakes when you work on big data. Both are widely used for storing big data, but they are not interchangeable. In this article, we will check data warehouses and data lakes, their definitions, and their differences. Data Warehouse and Data Lake: As mentioned earlier, both are used for storing big data. However, they serve different purposes when it comes to data usage. Data Warehouse: A data warehouse is an electronic store of business data for analysis. It is a technique for…
Big data technologies such as Hive, HBase, and NoSQL databases are taking over the industry, thanks to their fast, distributed processing. Hadoop runs on commodity hardware, so it is cheap too. Every organization wants to move its data to the big data world. If you are reading this article, your organization may be planning to migrate your relational database to Hadoop. Hadoop works best with denormalized tables. In this article, we will check how database table denormalization works with an example. What is Table Denormalization? Before jumping into the denormalization process, let us first understand what is…
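The denormalization idea mentioned above can be illustrated with a small sketch. It assumes two hypothetical normalized tables, `customers` and `orders`, held as plain Python structures, and shows how joining them produces one flat, self-contained row per order; the function name `denormalize` and the sample data are illustrative only.

```python
# Hypothetical normalized tables: customer attributes are stored once
# and referenced from each order by customer_id.
customers = {
    1: {"name": "Alice", "city": "Pune"},
    2: {"name": "Bob", "city": "Delhi"},
}
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 250.0},
    {"order_id": 102, "customer_id": 2, "amount": 120.0},
    {"order_id": 103, "customer_id": 1, "amount": 75.5},
]


def denormalize(orders, customers):
    """Join each order with its customer so every row is self-contained."""
    flat = []
    for order in orders:
        customer = customers[order["customer_id"]]
        # Copy the customer columns into the order row; the redundancy is
        # deliberate, so queries no longer need a join.
        flat.append({
            **order,
            "customer_name": customer["name"],
            "customer_city": customer["city"],
        })
    return flat


for row in denormalize(orders, customers):
    print(row)
```

The trade-off is visible in the output: Alice's name and city are repeated on two rows, which costs storage but lets a distributed engine scan a single table.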
Depending on what you are working on and the results you expect, you have to use different methodologies and best practices. A data warehouse is no different: you have to choose a modeling methodology based on the type of source data and integration. Big data is hot right now, and everybody wants to move their data to the big data world. Traditional methods such as Kimball’s star schema and Inmon’s relational 3NF approach may not work. You have to choose a different approach based on your ecosystem and data. In this article, we will check new…
For data processing to be successful, it is essential to have an overall picture of the data. Descriptive data summarization techniques can be used to identify the typical properties of your data and to highlight which data values should be treated as noise or outliers. Therefore, it is very important to learn about the characteristics of the data and how to measure them. In this article, we will check methods to measure data dispersion. Methods to Measure Data Dispersion: Let’s look at how we can measure the dispersion, or spread, of numeric data. Below are five…
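As a rough illustration of measuring spread, the sketch below computes a few commonly used dispersion measures (range, quartiles, interquartile range, population variance, and standard deviation) with Python's standard `statistics` module. These are not necessarily the same five measures the post goes on to list. It also flags outliers with the common 1.5 × IQR rule, which is one convention among several; the helper name `dispersion_summary` is hypothetical.

```python
import statistics


def dispersion_summary(values):
    """Return a few common measures of spread for a list of numbers."""
    data = sorted(values)
    # statistics.quantiles uses the 'exclusive' method by default; other
    # quartile conventions can give slightly different Q1/Q3 values.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Common rule of thumb: values beyond 1.5 * IQR from the quartiles
    # are treated as potential outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "range": data[-1] - data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
        "outliers": [x for x in data if x < low or x > high],
    }


summary = dispersion_summary([4, 8, 15, 16, 23, 42, 100])
print(summary)
```

For this sample, the quartiles come out as 8.0 and 42.0, so the IQR is 34.0 and only the value 100 falls outside the upper fence.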
A data warehouse fact-less fact table is a fact table that does not have any measures stored in it. This table only contains keys from different dimension tables. The fact-less fact table is often used to resolve a many-to-many cardinality issue. Types of Fact-less Fact Tables in a Data Warehouse: There are two types of fact-less fact tables. Event capturing fact-less fact: This type of fact table establishes the relationship among dimension members from various dimension tables without any measured value. For example, a student attendance (student-teacher relation) capturing table…
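To make the attendance example concrete, here is a minimal sketch of an event-capturing fact-less fact table. The dimension tables, their keys, and the sample rows are hypothetical. Each fact row holds only dimension keys, so questions such as "how many days did each student attend?" are answered by counting rows rather than summing a measure.

```python
from collections import Counter

# Hypothetical dimension tables keyed by surrogate keys.
students = {1: "Asha", 2: "Ravi"}
teachers = {10: "Mr. Rao"}
dates = {20240101: "2024-01-01", 20240102: "2024-01-02"}

# Fact-less fact table: each row only records that an event happened
# (a student attended a teacher's class on a date). No measure column.
attendance_fact = [
    # (student_key, teacher_key, date_key)
    (1, 10, 20240101),
    (2, 10, 20240101),
    (1, 10, 20240102),
]

# Analysis works by counting fact rows per dimension member.
days_attended = Counter(student_key for student_key, _, _ in attendance_fact)
for student_key, count in sorted(days_attended.items()):
    print(students[student_key], count)
```

Because each row links one student to one teacher on one date, the same table also resolves the many-to-many relationship between students and teachers.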
Usually, a data warehouse adopts either a two-tier or a three-tier architecture. We have discussed the three-tier architecture in my other post, 'Data Warehouse Three-tier Architecture'. In this article, we will discuss the data warehouse two-tier architecture. Data Warehouse Two-tier Architecture: The data warehouse two-tier architecture is a client-server application. The client communicates directly with the data source server, which we call the data layer or database layer. Usually, there is no intermediate application between the client and the database layer. The diagram below depicts the data warehouse two-tier architecture: As shown in…
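The defining trait of the two-tier setup, a client talking straight to the database layer, can be sketched with Python's built-in `sqlite3` module. Here an in-memory SQLite database stands in for the warehouse server, and the `sales` table and its rows are hypothetical; a real deployment would connect to a networked RDBMS instead.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse server
# (the data layer) in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 80.0), ("North", 50.0)],
)

# Two-tier: the client sends SQL directly to the database and receives
# results; no application server sits in between.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```

Since the client holds the connection and the query logic itself, every client needs database access, which is one reason larger deployments move to a three-tier design.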
If you are working on a data warehouse project, then you might have heard a lot about surrogate keys. Surrogate keys are a widely accepted data warehouse design standard. In this article, we will check data warehouse surrogate key design, advantages, and disadvantages. What are Surrogate Keys in a Data Warehouse? If you are a data warehouse developer, you might be wondering what a surrogate key is, and how and where it is used. You will get answers to all your questions here. Data warehouse surrogate keys are sequentially generated, meaningless numbers associated with…
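The idea of sequentially generated, meaningless keys can be shown with a small in-memory generator. This is only a sketch: the class `SurrogateKeyGenerator` is hypothetical, and in practice warehouses usually rely on a database sequence or identity column. It maps each natural (business) key to the next integer and reuses that integer when the same natural key appears again.

```python
import itertools


class SurrogateKeyGenerator:
    """Assign sequential, meaningless integer keys to natural (business) keys."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lookup = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before;
        # otherwise take the next number from the sequence.
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._counter)
        return self._lookup[natural_key]


gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-A17"))  # 1
print(gen.key_for("CUST-B02"))  # 2
print(gen.key_for("CUST-A17"))  # 1 (same natural key, same surrogate key)
```

The surrogate key carries no business meaning, so a change in the source system's key format does not ripple into the fact tables that reference it.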
Building a data warehouse is no different from executing any other development project, such as a front-end application. You need people who understand both the technical details and the organization's business to successfully design and implement a data warehouse project. In this article, we will check what the data warehouse project life cycle is and the different steps in designing a data warehouse project. Steps of Data Warehouse Project Life Cycle Design: Following are the steps generally followed in any data warehouse project; you can consider these steps the data warehouse life cycle: Requirements gathering…
There are various types of data mining clustering algorithms, but only a few popular algorithms are widely used. Basically, all clustering algorithms use a distance measure, where data points that are closer in the data space exhibit more similar characteristics than points lying further away. Every algorithm follows a different approach to find the ‘similar characteristics’ among the data points. Let’s look at the different types of Data Mining…
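To show how a distance measure drives clustering, here is a minimal k-means sketch, chosen as one widely used example rather than as the specific algorithms the post covers. It uses Euclidean distance via `math.dist` (Python 3.8+) to assign each point to its nearest centroid, then moves each centroid to the mean of its points. The initial centroids are passed in explicitly to keep the example deterministic; real implementations often use random or k-means++ initialization.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Minimal k-means: assign points to the nearest centroid by Euclidean
    distance, then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: closer points are treated as more similar.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, [(1, 1), (8.5, 9)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```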
Usually, a data warehouse adopts the three-tier architecture. In this article, we will discuss the data warehouse three-tier architecture. You can read about the two-tier architecture in my other post, 'Data Warehouse Two-tier architecture in details'. Data Warehouse Three-tier Architecture: Following are the three tiers of the data warehouse architecture: Bottom Tier: The bottom tier of the architecture is the data warehouse database server. It is usually a relational database management system (RDBMS). Data from operational databases and external sources is extracted using application programming interfaces and ETL/ELT utilities. You generally use…
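The flow into the bottom tier can be sketched as a tiny extract-transform-load pipeline. In this sketch, the source rows stand in for data returned by an operational system's API, an in-memory SQLite database stands in for the RDBMS warehouse server, and the names `extract`, `transform`, `load`, and `fact_orders` are all hypothetical.

```python
import sqlite3

# Hypothetical operational source rows, e.g. as returned by an application API.
source_orders = [
    {"id": "1", "customer": " alice ", "amount": "250.00"},
    {"id": "2", "customer": "BOB", "amount": "120.00"},
    {"id": "3", "customer": "alice", "amount": "75.50"},
]


def extract():
    """Pull raw rows from the operational source."""
    return source_orders


def transform(rows):
    """Clean values and convert types before loading."""
    return [
        (int(r["id"]), r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Write the cleaned rows into the warehouse database (bottom tier)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone())
```

In an ELT variant, the raw rows would be loaded first and the cleanup would run as SQL inside the warehouse.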