Considerations for Using Aggregate Tables and OLAP
Many data warehousing architects and DBAs face the dual challenge of maximizing
flexibility within their data warehousing environment while also ensuring
maximum performance. One way to meet this challenge is to develop a strategy for
managing aggregates, that is, pre-calculated summaries of data. The resulting
aggregate strategy can play a critical role in a data warehousing initiative.
Aggregation refers to combining detailed data, often drawn from two or more
sources, into summarized sets. Often, this data is stored in the data warehouse
in summarized form. For example, you may wish to summarize the data by time
period, such as by month or by quarter.
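The following is a minimal Python sketch of summarizing by time period. It
assumes a hypothetical list of daily sales rows (`DETAIL_ROWS`) and a
hypothetical helper, `build_monthly_aggregate`; it rolls the day grain up to
the month grain per product, which is the kind of summary an aggregate table
would persist. It is an illustration only, not how any particular database
builds its aggregates.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail-level fact rows: (sale_date, product, amount).
# In a real warehouse these would come from a fact table, not a literal list.
DETAIL_ROWS = [
    (date(2024, 1, 3), "widget", 120.0),
    (date(2024, 1, 17), "widget", 80.0),
    (date(2024, 1, 20), "gadget", 50.0),
    (date(2024, 2, 2), "widget", 200.0),
    (date(2024, 2, 14), "gadget", 75.0),
    (date(2024, 2, 28), "gadget", 25.0),
]


def build_monthly_aggregate(rows):
    """Pre-calculate total amount and row count per (month, product).

    Each key is one summary row: the coarser time grain (month) combined
    with the dimension that is kept (product). The day grain is discarded.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for sale_date, product, amount in rows:
        month = sale_date.strftime("%Y-%m")  # roll the day up to its month
        bucket = totals[(month, product)]
        bucket[0] += amount
        bucket[1] += 1
    return {key: (total, count) for key, (total, count) in totals.items()}


if __name__ == "__main__":
    monthly = build_monthly_aggregate(DETAIL_ROWS)
    for (month, product), (total, count) in sorted(monthly.items()):
        print(f"{month} {product:<7} total={total:7.2f} rows_summarized={count}")
    print(f"{len(DETAIL_ROWS)} detail rows -> {len(monthly)} aggregate rows")
```

Storing a row count alongside each sum is a deliberate choice in this sketch:
it lets an average be derived later from the aggregate alone (total divided by
count) without going back to the detail rows.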
Aggregates are used for two main reasons. The first is to save storage space:
data warehouses can grow very large, and storing summaries in place of
detail-level data greatly reduces the space required. The second is to improve
the performance of business intelligence tools. A query against an aggregate
reads far fewer rows than the same query against detail data, so it consumes
less processing time and users get their information back more quickly.
Aggregate strategies typically rely on two technologies: OLAP and relational
aggregate tables. Aggregate tables have so far been the more prevalent choice
because they provide a practical, cost-effective way to improve query
performance, but many IT professionals are now recognizing the performance and
efficiency gains that OLAP technology can provide.
OLAP technology was once considered incompatible with front-end query tools.
Such compatibility issues are becoming less frequent, however, largely because
OLAP data is increasingly compatible with relational data.
OLAP offers several advantages. In most cases, OLAP queries run faster than
equivalent relational queries. OLAP also offers query flexibility: any
dimension, level, or member can be placed on either the rows or the columns of
a result set. This combination of speed and flexibility makes OLAP an
attractive choice.
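To make the row-or-column flexibility concrete, here is a toy in-memory sketch
in Python. The cube cells, the dimension names (`region`, `product`,
`quarter`), and the helper functions `pivot` and `render` are all hypothetical.
This is not how an OLAP server works internally; it only shows that any
dimension can be placed on rows and any other on columns of the same data.

```python
from collections import defaultdict

# Hypothetical cube cells: each fact carries one member from each of three
# dimensions (region, product, quarter) plus a single measure (sales).
CELLS = [
    {"region": "East", "product": "widget", "quarter": "Q1", "sales": 100},
    {"region": "East", "product": "gadget", "quarter": "Q2", "sales": 40},
    {"region": "West", "product": "widget", "quarter": "Q1", "sales": 70},
    {"region": "West", "product": "widget", "quarter": "Q2", "sales": 90},
    {"region": "West", "product": "gadget", "quarter": "Q1", "sales": 30},
]


def pivot(cells, row_dim, col_dim, measure="sales"):
    """Sum `measure` with `row_dim` on rows and `col_dim` on columns.

    Any dimension not named is summarized away. Returns the sorted row
    members, the sorted column members, and a nested dict of totals.
    """
    grid = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        grid[cell[row_dim]][cell[col_dim]] += cell[measure]
    row_keys = sorted(grid)
    col_keys = sorted({col for row in grid.values() for col in row})
    # Fill empty intersections with 0 so every row has every column.
    table = {r: {c: grid[r].get(c, 0) for c in col_keys} for r in row_keys}
    return row_keys, col_keys, table


def render(row_keys, col_keys, table):
    """Format a pivot result as tab-separated text."""
    lines = ["\t".join([""] + col_keys)]
    for r in row_keys:
        lines.append("\t".join([r] + [str(table[r][c]) for c in col_keys]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Same cells, two different layouts: region by quarter, then quarter by product.
    print(render(*pivot(CELLS, "region", "quarter")))
    print()
    print(render(*pivot(CELLS, "quarter", "product")))
```

Swapping the `row_dim` and `col_dim` arguments is all it takes to move a
dimension from rows to columns; in a relational aggregate table, each such
layout would typically need its own query or its own pre-built summary.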
Whichever technology you choose for your aggregate strategy, keep an open mind
about both aggregate tables and OLAP. Your data warehousing environment may
perform more efficiently with both technologies in place; for example, you
might use aggregate tables for reporting and OLAP for analysis.
Because relational and OLAP technologies differ considerably, there is no
definitive rule for when to use OLAP and when to use aggregate tables. Instead,
develop a clear understanding of which aggregates you query most often and how
current each aggregate must be (its latency requirement). With this
understanding, you will be better able to determine which combination of
technologies best suits your needs.
One last word on aggregation strategies: keep in mind how granular the data
must be to meet your business analysis objectives. Aggregation discards detail,
such as the individual, time-ordered transactions behind each total. If, for
instance, you're trying to compile a complete customer profile in order to
better understand your customers, aggregation alone may not be the answer. In
this case, you would likely want to store both detailed and aggregated
information. This approach takes up more storage space in the data warehouse,
but it gives your users maximum flexibility to examine every detail of customer
relationships while still achieving the best performance when looking at
summarized data.
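A small Python sketch can show why keeping both levels matters. It assumes
hypothetical detail rows (`DETAIL`) at customer-by-month grain and
hypothetical helpers (`build_month_aggregate`, `monthly_total`,
`customer_history`). A summary question is answered from the pre-calculated
aggregate, while a customer-profile question has to go to the detail, because
the aggregate no longer records which customer produced each amount.

```python
from collections import defaultdict

# Hypothetical detail rows kept at full grain: (customer, month, amount).
DETAIL = [
    ("alice", "2024-01", 120.0),
    ("bob", "2024-01", 80.0),
    ("alice", "2024-02", 60.0),
    ("carol", "2024-02", 200.0),
    ("alice", "2024-03", 45.0),
]


def build_month_aggregate(detail):
    """Monthly totals with the customer dimension summarized away."""
    totals = defaultdict(float)
    for _customer, month, amount in detail:
        totals[month] += amount
    return dict(totals)


# Both structures are kept: the aggregate costs extra storage on top of DETAIL.
MONTHLY = build_month_aggregate(DETAIL)


def monthly_total(month):
    """Summary question: served from the small pre-calculated aggregate."""
    return MONTHLY.get(month, 0.0)


def customer_history(customer):
    """Profile question: needs the detail rows, since MONTHLY has lost the
    customer grain. Returns (month, amount) pairs in stored order."""
    return [(month, amount) for cust, month, amount in DETAIL if cust == customer]


if __name__ == "__main__":
    print(monthly_total("2024-02"))
    print(customer_history("alice"))
```

If only `MONTHLY` were stored, the second question could not be answered at
all; that trade-off between storage and retained detail is what the paragraph
above describes.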