Sizing Them Up - Data Marts vs. Data Warehouses

Although a data warehouse and a data mart are architecturally similar, they differ considerably in scope, in structure and in the information that they store.

What is a Data Mart?

A data mart is a decentralized subset of data, either drawn from a data warehouse or built as a standalone store, designed to support the requirements of a particular business unit within an organization. For instance, the finance department may have a data mart that is separate from the marketing team's data mart, and so on. Each individual department owns the hardware, software, data and programs that constitute its data mart.

There are two kinds of data marts: dependent and independent. A dependent data mart is one whose source is a data warehouse. An independent data mart is one whose source is the legacy applications environment. All dependent data marts are fed by the same source, the data warehouse. In contrast, each independent data mart is populated in its own unique and separate manner by the legacy applications environment.
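The difference in sourcing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the legacy applications, the record layout and the function names (`load_warehouse`, `dependent_mart`, `independent_mart`) are all hypothetical, and the name normalization stands in for whatever integration rules a real warehouse would apply.

```python
# Hypothetical legacy application data (illustrative only).
legacy_apps = {
    "billing": [{"customer": "ACME", "amount": 100}],
    "crm": [{"customer": "acme", "amount": 40}],
}


def load_warehouse(apps):
    """Integrate every legacy source once, applying shared rules.

    Assumption: upper-casing customer names stands in for real
    integration and cleansing logic.
    """
    return [
        {"source": name, "customer": rec["customer"].upper(), "amount": rec["amount"]}
        for name, records in apps.items()
        for rec in records
    ]


def dependent_mart(warehouse, source):
    """A dependent mart reads only from the warehouse."""
    return [row for row in warehouse if row["source"] == source]


def independent_mart(apps, source):
    """An independent mart extracts straight from a legacy application."""
    return [dict(rec) for rec in apps[source]]


warehouse = load_warehouse(legacy_apps)
finance_mart = dependent_mart(warehouse, "billing")
marketing_mart = dependent_mart(warehouse, "crm")
standalone_mart = independent_mart(legacy_apps, "crm")
```

In this sketch both dependent marts inherit the warehouse's normalized customer names, while the independent mart keeps whatever the legacy application stored, which is one way the two kinds of mart can drift apart.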

What is a Data Warehouse?

A data warehouse is an architecture used to maintain critical historical data that has been extracted from operational data stores and transformed into formats accessible for business analysis. The information within a data warehouse differs significantly from the data stored in a data mart: the data mart contains aggregated or summarized data, whereas the data warehouse contains detailed data. In addition, a data warehouse is not owned by an individual department but rather by a team spanning IT professionals, business managers and developers, who oversee its implementation and maintenance.
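The contrast between detailed and summarized data can be illustrated with a minimal Python sketch. The rows, field names and the `build_summary_mart` helper are hypothetical and chosen only for illustration; a real warehouse would hold far more detail and a real mart would be built with database tooling rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows: one record per transaction,
# covering more than one subject area (illustrative data only).
warehouse_rows = [
    {"subject_area": "sales", "month": "2024-01", "amount": 100.0},
    {"subject_area": "sales", "month": "2024-01", "amount": 50.0},
    {"subject_area": "sales", "month": "2024-02", "amount": 75.0},
    {"subject_area": "payroll", "month": "2024-01", "amount": 300.0},
]


def build_summary_mart(rows, subject_area):
    """Aggregate detailed rows for one subject area into monthly totals."""
    totals = defaultdict(float)
    for row in rows:
        if row["subject_area"] == subject_area:
            totals[row["month"]] += row["amount"]
    return dict(totals)


# The mart keeps one summarized value per month instead of every transaction.
sales_mart = build_summary_mart(warehouse_rows, "sales")
```

Here the warehouse retains every transaction across subject areas, while the mart holds only the monthly totals that one department needs.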

The following are just some of the other differences between a data warehouse and a data mart:

                          Data Warehouse                  Data Mart

Scope                     Application-neutral             Specific application requirement
                          Centralized, shared             LOB, department or user area
                          Cross LOB/enterprise            Business-process-oriented
                          Architected                     Multiple databases with redundant data

Data Perspective          Historical-detailed data        Detailed (some history)
                          Some summary                    Summarized
                          Lightly denormalized            Highly denormalized

Subjects                  Multiple subject areas          Multiple partial subject areas
                                                          Single subject
                                                          Operational source snapshot

Data Sources              Many                            Few
                          Operational, external data      Operational, external data
                                                          OLTP database snapshot
                                                          "Boot leg" data extract

Implementation Timeframe  9-18 months for first stage     4-12 months
                          (two or three subject areas)
                          Multiple-stage implementation

Characteristics           Flexible                        Restrictive
                          Durable/strategic               Short life/tactical
                          Data oriented                   Project orientation

Source: GARTNER GROUP INC.