| To Aggregate or not to Aggregate…
A lot of DBAs and data warehouse professionals have questions about handling summarization and aggregation of data in the data warehouse or departmental data marts. Aggregation refers to the gathering of information in separate sets from two or more sources. Often, this data is stored in a data warehouse in a summarized form. For example, an organization may wish to summarize the data by various time periods. Aggregates are used for two primary reasons. One is to save storage space; data warehouses can get very large, and the use of aggregates greatly reduces the space needed to store data. The second reason is to improve the performance of business intelligence tools. When queries run faster, they take up less processing time and the users get their information back much more quickly. However, there is also a negative side to data aggregation. The process can result in the loss of time-sensitive linear data that can provide a more complete customer profile at the time of purchase: customer “A” bought this product before he bought that product and so on. If a business is trying to compile a complete customer profile in order to understand its customers, aggregation is not beneficial. Some data warehouses store both detailed information and aggregated information. This approach may take up even more space, but gives users the possibility of looking at all details while still having good query performance when looking at summaries.
|