| ETL Performance
How efficient is your ETL tool at replenishing your data warehouse? Many tools rely on straight mirroring: the entire physical database is copied again on every refresh, regardless of whether the data has changed since the last one. This method is both resource intensive and time consuming, and the larger the data warehouse, the longer each refresh takes. In some cases, the time needed to load the data begins to exceed the batch window allotted for it. Straight mirroring is inherently inefficient. In a typical environment where 20% of the data on your production systems changes every week or month, why reload the entire data warehouse every week or month? Why send 20 GB of data when you can send only the 20% of it (4 GB) that has changed since you last replenished the data warehouse? With change data capture, flow and transform technology, only the “net changes” to the data (adds, changes or deletes) are captured and flowed into the data warehouse, resulting in a more efficient replication process and reduced communication costs.
|
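To make the "net changes" idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular ETL product works: it assumes each table fits in memory as a dictionary keyed by primary key, and it finds changes by comparing two snapshots, whereas production change data capture tools commonly read the database's transaction log instead. The names `net_changes`, `apply_changes` and `transfer_size_gb` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Snapshot = Dict[int, Row]                  # primary key -> row (assumed layout)
Change = Tuple[str, int, Optional[Row]]    # (operation, primary key, new row)


def net_changes(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Return only the adds, changes and deletes between two snapshots."""
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("add", key, row))
        elif previous[key] != row:
            changes.append(("change", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def apply_changes(warehouse: Snapshot, changes: List[Change]) -> None:
    """Apply a list of net changes to the warehouse copy in place."""
    for operation, key, row in changes:
        if operation == "delete":
            warehouse.pop(key, None)
        else:
            # "add" and "change" both store the latest version of the row.
            warehouse[key] = dict(row)


def transfer_size_gb(total_gb: float, changed_fraction: float) -> float:
    """Data volume to send when only the changed fraction is replicated."""
    return total_gb * changed_fraction


if __name__ == "__main__":
    # Five source rows; the warehouse starts as an exact copy.
    source_before = {i: {"id": i, "qty": 10} for i in range(1, 6)}
    warehouse = {k: dict(v) for k, v in source_before.items()}

    # One of the five rows (20%) changes on the production system.
    source_after = {k: dict(v) for k, v in source_before.items()}
    source_after[3]["qty"] = 25

    changes = net_changes(source_before, source_after)
    apply_changes(warehouse, changes)
    print(len(changes), "change(s) sent instead of", len(source_after), "rows")
    print("warehouse in sync:", warehouse == source_after)
```

In this sketch a full mirror would resend all five rows, while the net-change approach sends only the one row that differs. The same proportion underlies the 20 GB versus 4 GB comparison above.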