Monday, July 7, 2008

Post-pre-aggregation

heuristic (hyŏŏ-rĭs'tĭk) noun.
  1. of, pertaining to, or based on experimentation, evaluation, or trial-and-error methods
The definition of multi-dimensional cubes is an exercise in successive approximation.



The traditional bottom-up design of pre-aggregated data is based on user input collected at the beginning of the project. Then, as users start to work with the system, their feedback is incorporated into re-engineering efforts. This type of system has a lot of inertia and can easily frustrate the user population.

Let's take a step back and remember why pre-aggregates are employed in the first place. Running a report or viewing a dashboard that is based on a large dataset will be slow if it is done on the fly. If we can anticipate the queries that will be executed, and run them before they are needed, we can give the appearance of a faster response.
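To make that concrete, here is a minimal sketch in Python with pandas (not tied to any particular OLAP product; the table and column names are made up for illustration). The expensive roll-up runs once, ahead of time, and the report then reads the small summary instead of scanning the raw facts.

```python
import pandas as pd

# Hypothetical raw fact table: one row per sale.
sales = pd.DataFrame({
    "sale_date": pd.to_datetime(["2008-01-15", "2008-02-03", "2008-04-20", "2008-05-11"]),
    "region":    ["East", "West", "East", "West"],
    "amount":    [1200.0, 850.0, 2300.0, 990.0],
})

# Pre-aggregation, run ahead of time (e.g., on a nightly schedule):
# roll the facts up to the level the reports are expected to need.
by_region = sales.groupby("region", as_index=False)["amount"].sum()

# At report time, the query hits the small summary instead of the full
# fact table, which is what gives the appearance of a fast response.
print(by_region)
```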

I think the trouble lies in how we interpret anticipate. The traditional model tries to address future usage at the data design stage. Consider an alternative: preparing at runtime.

Let me expand on this to clarify what I mean. In the traditional model, data is pre-aggregated, then reports are created. In the alternative model, reports are created, then data is pre-aggregated.

This means that the end products can be used to define what data is pre-calculated and how. If a report displays quarterly sales information, the sales data can be pre-aggregated at the quarter level. This takes the guesswork out of the process: every scheduled summation is actually used, and every end product can take advantage of the preprocessing.
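As a sketch of that workflow (again hypothetical Python/pandas; the shape of the report definitions is an assumption, not a real API), the finished report definitions themselves can drive which summaries get built:

```python
import pandas as pd

# Hypothetical report definitions, captured after the reports are built.
# Each one declares the grain (grouping columns) it actually displays.
reports = [
    {"name": "Quarterly sales by region", "group_by": ("quarter", "region")},
    {"name": "Quarterly sales total",     "group_by": ("quarter",)},
]

def build_preaggregates(facts: pd.DataFrame, report_defs) -> dict:
    """Roll the fact table up once per distinct grain used by the reports."""
    facts = facts.assign(quarter=facts["sale_date"].dt.to_period("Q"))
    grains = {r["group_by"] for r in report_defs}  # de-duplicate shared grains
    return {
        grain: facts.groupby(list(grain), as_index=False)["amount"].sum()
        for grain in grains
    }

sales = pd.DataFrame({
    "sale_date": pd.to_datetime(["2008-01-15", "2008-02-03", "2008-04-20"]),
    "region":    ["East", "West", "East"],
    "amount":    [1200.0, 850.0, 2300.0],
})

cubes = build_preaggregates(sales, reports)
# Each report now reads exactly the summary that was built for its grain,
# and nothing is pre-calculated that no report asks for.
print(cubes[("quarter", "region")])
```

The point is only the ordering: the grains come from the finished reports, so nothing is summarized on speculation.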
