Database Table Denormalization Example

Bigdata technologies such as Hive, HBase, NoSQL taking over industry, thanks to its fast and distributed processing. Hadoop works on commodity hardware, so it is cheap too. Every organization wants to move its data to Bigdata world. If you are reading this article, your organization may be planning to migrate your relational database to Hadoop. Hadoop works best with denormalized tables. In this article, we will check how database Table denormalization works with an example.

What is Table Denormalization?

Before jumping into denormalization process, let us first understand what is denormalization?

Denormalization is a database optimization technique in which we deliberately add redundant data to one or more tables. This process will help you to avoid multiple table joins. In any query engine joins are costly operations, you should use them only when absolutely necessary. Denormalization is an optimization technique that is applied after doing normalization.

In a traditional normalized database, we store data in separate logical tables and attempt to minimize redundant data. Logical tables are usually connected using primary key and foreign key constraints. For example, you can have customer and customer address tables.

In the denormalization process, you can either add redundant columns to master table or create a new denomalized table. For example, you can update customer table with largely used customer address columns such as city, state, zip code, etc.

Advantages of Denormalization

Below are some of advantages of denormalization:

Below are some of pros of denormalization.

  • Data retrieval will be faster
  • Avoid multiple table joins
  • Query will be easy to read as it will refer fewer tables

Disadvantages of Denormalization

Denormalization will comes with some disadvantages. Below are some cons of denormalization.

  • Extra storage space
  • Update and insert operations are more expensive
  • Data redundancy
  • Potential data anomalies

Table Denormalization Example

There are many types of denomalization process. For simplicity, we will check one of denormalization process called “Storing detail data in master table”.

There can be a situation when you need certain columns to be used with master tables. For example, when you query “Customer” table, you are likely to use “Customer Address” table columns such as city, state, zip code, etc. In this example we will denormalize the customer table to add required columns from the address table.

Note that, this technique proves most useful when there are few records in the detail table. If the table size is huge then it will become difficult to manage and rebuild it.

Here is the customer and customer address tables with relation.

Database Table Denormalization Example

Below is the denormalized customer table DDL.

 CREATE TABLE ADMIN.CUSTOMER_DENORMALIZED
(
	C_CUSTOMER_SK INTEGER NOT NULL,
	C_CUSTOMER_ID CHARACTER(16) NOT NULL,
	C_CURRENT_CDEMO_SK INTEGER,
	C_CURRENT_HDEMO_SK INTEGER,
	C_CURRENT_ADDR_SK INTEGER,
	C_FIRST_SHIPTO_DATE_SK INTEGER,
	C_FIRST_SALES_DATE_SK INTEGER,
	C_SALUTATION CHARACTER(10),
	C_FIRST_NAME CHARACTER(20),
	C_LAST_NAME CHARACTER(30),
	C_PREFERRED_CUST_FLAG CHARACTER(1),
	C_BIRTH_DAY INTEGER,
	C_BIRTH_MONTH INTEGER,
	C_BIRTH_YEAR INTEGER,
	C_BIRTH_COUNTRY CHARACTER VARYING(20),
	C_LOGIN CHARACTER(13),
	C_EMAIL_ADDRESS CHARACTER(50),
	C_LAST_REVIEW_DATE CHARACTER(10),
    CA_CITY CHARACTER VARYING(60),
    CA_COUNTY CHARACTER VARYING(30),
    CA_STATE CHARACTER(2),
    CA_ZIP CHARACTER(10),
    CA_COUNTRY CHARACTER VARYING(20)
)
DISTRIBUTE ON (C_CUSTOMER_SK);

Last five columns are taken from a customer address. Now, you can refer single table for all your queries that involves address related columns.

Related Articles

Hope this helps 🙂