Data warehouse
A data warehouse is a centralized place where data from many different sources can be stored.
An ETL (extract, transform, load) model extracts data from the sources, transforms it into the warehouse's format, and then loads it into the warehouse.
An ELT (extract, load, transform) model first loads the data into the warehouse and transforms it after it has been loaded.
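The difference between the two models is only the order of the steps. A minimal sketch, using a plain Python list as a toy "warehouse" and dicts as rows (all names here are illustrative, not a real library):

```python
# Toy contrast of ETL vs ELT: same steps, different order.

def transform(row):
    # Example transformation: normalize names to lowercase.
    return {"name": row["name"].lower(), "amount": row["amount"]}

def etl(source_rows, warehouse):
    # ETL: transform each row BEFORE loading it into the warehouse.
    for row in source_rows:
        warehouse.append(transform(row))

def elt(source_rows, warehouse):
    # ELT: load the raw rows first, then transform inside the warehouse.
    warehouse.extend(source_rows)
    warehouse[:] = [transform(row) for row in warehouse]

source = [{"name": "Alice", "amount": 10}, {"name": "BOB", "amount": 20}]
wh_etl, wh_elt = [], []
etl(source, wh_etl)
elt(source, wh_elt)
# Both models end with the same transformed rows in the warehouse.
```

In practice ELT is popular when the warehouse itself has enough computing power to run the transformations.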
Why it's useful
A data warehouse takes effort to create and maintain alongside the source storage, but it has several advantages:
- It lets you view data from multiple sources in one place
- It doesn't slow down the source storage while running queries
- It lets you check data history
As opposed to a data mart, a data warehouse stores data for the whole company, not just for a single topic in the company.
Attribute | Data warehouse | Data mart
---|---|---
Scope of the data | enterprise-wide | department-wide
Number of subject areas | multiple | single
How difficult to build | difficult | easy
How long it takes to build | more time | less time
Amount of storage | larger | limited
Information storage
Facts
A fact is a value or measurement, such as the number of units sold.
Dimensional versus normalized approach for storage of data
There are several approaches to storing data in a data warehouse; the two most important are the dimensional approach and the normalized approach.
Dimensional approach
A fact is a concrete piece of data. Dimensions are the metadata about the fact.
Separation between data (facts) and contextual metadata (dimensions) makes the data warehouse simpler.
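This separation is often drawn as a star schema: a central fact table whose rows hold only measurements plus keys, pointing at dimension tables that hold the descriptive metadata. A minimal in-memory sketch, with all table and field names chosen for illustration:

```python
# Dimension tables: descriptive metadata, keyed by a surrogate key.
dim_product = {1: {"name": "widget", "category": "tools"}}
dim_date = {20240101: {"year": 2024, "month": 1, "day": 1}}

# Fact table: each row is a measurement plus keys into the dimensions.
fact_sales = [
    {"product_key": 1, "date_key": 20240101, "amount": 9.99},
]

def describe_sale(fact):
    # Join a fact row to its dimensions to get a readable record.
    product = dim_product[fact["product_key"]]
    date = dim_date[fact["date_key"]]
    return (f'{product["name"]} sold for {fact["amount"]} '
            f'on {date["year"]}-{date["month"]:02d}-{date["day"]:02d}')

print(describe_sale(fact_sales[0]))  # widget sold for 9.99 on 2024-01-01
```

Keeping the measurements in one narrow fact table and the context in small dimension tables is what makes this approach easy to understand and fast to query.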
Some disadvantages of the dimensional approach are the following:
- When loading data from different systems some dimensions may be missing.
- If the organization changes its business, both facts and dimensions may change. Moving to the new structure requires either supporting reads from both the old and new structures or converting the old structure to the new one.
Normalized approach
In the normalized approach, the data in the data warehouse are stored partially according to database normalization rules. Normalization removes data duplication by splitting each piece of data into entities. Each entity (e.g. customer, product, sale) relates to other entities. Reading a piece of data requires joining these entities together, which makes reads slower.
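The read cost can be sketched with the three entities mentioned above, using in-memory dicts as tables (all names are illustrative); each key lookup stands in for a SQL JOIN:

```python
# Normalized entity tables: each piece of data lives in exactly one place.
customers = {1: {"name": "Alice"}}
products = {10: {"name": "widget", "price": 9.99}}
sales = [{"customer_id": 1, "product_id": 10, "qty": 2}]

def read_sale(sale):
    # Reading one sale needs joins to both the customer and the product.
    customer = customers[sale["customer_id"]]
    product = products[sale["product_id"]]
    return {
        "customer": customer["name"],
        "product": product["name"],
        "total": product["price"] * sale["qty"],
    }

record = read_sale(sales[0])
```

The upside of paying for these joins is that updating a product's price, for example, means changing one row in one table instead of every sale record that mentions it.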