Sizing Them Up - Data Marts vs. Data Warehouses
Although a data warehouse and a data mart are architecturally the same, they
differ greatly in structure and in the information that they store.
What is a Data Mart?
A data mart is a decentralized subset of data, either drawn from a data
warehouse or built as a standalone store, that is designed to support the
requirements of a particular business unit. For instance, the finance department has a
data mart that is separate from the marketing team's data mart, and so on. Each
individual department owns the hardware, software, data and programs that
constitute the data mart.
There are two kinds of data marts - dependent and independent. A dependent data
mart is one whose source is a data warehouse. An independent data mart is one
whose source is the legacy applications environment. All dependent data marts
are fed by the same source - the data warehouse. In contrast, each independent
data mart is populated in its own separate manner by the legacy applications
environment.
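
The distinction between the two kinds of data marts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names and the helper functions `build_dependent_mart` and `build_independent_mart` are all hypothetical and do not come from any particular product.

```python
# Hypothetical detailed records held in a central data warehouse.
WAREHOUSE = [
    {"dept": "finance", "account": "A-100", "amount": 250.0},
    {"dept": "marketing", "campaign": "spring", "amount": 90.0},
    {"dept": "finance", "account": "A-200", "amount": 75.5},
]

# Hypothetical extract pulled straight from a legacy application.
LEGACY_MARKETING_EXTRACT = [
    {"campaign": "spring", "amount": 95.0},
]


def build_dependent_mart(warehouse, dept):
    """Return copies of the warehouse rows that belong to one department."""
    return [dict(row) for row in warehouse if row["dept"] == dept]


def build_independent_mart(legacy_rows):
    """Copy rows directly from a legacy source, bypassing the warehouse."""
    return [dict(row) for row in legacy_rows]


# Both dependent marts are fed by the same source: the warehouse.
finance_mart = build_dependent_mart(WAREHOUSE, "finance")
marketing_mart = build_dependent_mart(WAREHOUSE, "marketing")

# The independent mart has its own feed and never touches the warehouse.
standalone_marketing_mart = build_independent_mart(LEGACY_MARKETING_EXTRACT)
```

Because the independent mart is loaded through a separate path, nothing in this sketch forces its figures to agree with those in the warehouse-fed marketing mart.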
What is a Data Warehouse?
A data warehouse is an architecture used to maintain critical historical data
that has been extracted from operational data storage and transformed into
formats accessible for business analysis. The information within a data
warehouse differs significantly from the data stored in a data mart. The data
mart contains aggregated or summarized data, whereas the data warehouse
contains detailed data. In addition, a data warehouse is not owned by an
individual department but rather by a team spanning IT professionals, business
managers and developers who oversee its implementation and maintenance.
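
To make the difference in granularity concrete, here is a minimal sketch of how detailed warehouse rows might be rolled up into the summarized form a data mart would hold. The sample rows, the field names and the function `summarize_by_month` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical transaction-level rows, as a warehouse might keep them.
warehouse_sales = [
    {"date": "2023-01-05", "region": "east", "amount": 120.0},
    {"date": "2023-01-19", "region": "east", "amount": 80.0},
    {"date": "2023-02-02", "region": "east", "amount": 50.0},
    {"date": "2023-01-11", "region": "west", "amount": 200.0},
]


def summarize_by_month(rows):
    """Roll detailed rows up to one total per (month, region) pair."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # keep only the "YYYY-MM" prefix
        totals[(month, row["region"])] += row["amount"]
    return dict(totals)


# The mart keeps only the aggregated totals, not the individual transactions.
mart_summary = summarize_by_month(warehouse_sales)
```

The warehouse retains every individual row, while the mart in this sketch stores a single total for each month and region.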
The following are just some of the other differences between a data warehouse
and a data mart:
| | Data Warehouse | Data Mart |
| --- | --- | --- |
| Scope | Application-neutral; Centralized, shared; Cross LOB/enterprise; Architected | Specific application requirement; LOB, department or user area; Business-process-oriented; Multiple databases with redundant data |
| Data Perspective | Historical-detailed data; Some summary; Lightly denormalized | Detailed (some history); Summarized; Highly denormalized |
| Subjects | Multiple subject areas; Multiple partial subject areas | Single subject; Operational source snapshot |
| Data Sources | Many; Operational, external data | Few; Operational, external data; OLTP database snapshot; "Boot leg" data extract |
| Implementation Timeframe | 9-18 months for first stage (two or three subject areas); Multiple-stage implementation | 4-12 months |
| Characteristics | Flexible; Durable/strategic; Data oriented | Restrictive; Short life/tactical; Project orientation |

Source: GARTNER GROUP INC.