Tuesday, July 1, 2008

Data Mashup Defined

mashup (māsh'ŭp') noun.
  1. an audio recording that is a composite of samples from other recordings, usually from different musical styles: Danger Mouse created the Grey Album, which is a mashup of The Beatles' White Album and Jay-Z's Black Album.
  2. the creation of a new work from two sources that were not initially designed to be combined
Apply this idea to the internet and you get things like Google Maps overlaid with subway lines.

Apply this idea to data and you have an alternative to data warehousing.



Some have tried to define data mashup as data federation that makes use of web services, screen scraping, and even Yahoo! Pipes. That is all well and good, but it doesn't really live up to the spirit of mashup, because it still depends on IT skills. After all, you wouldn't label ETL as "data mashup". I think the key missing component is user involvement.

From a Business Intelligence perspective, users have had ad hoc query abilities, to some degree, for years. Imagine, for a moment, a business user pulling together data from a sales system, a marketing database, and a local spreadsheet to create a dataset that will help guide future decisions.

This is certainly possible if IT populates a data warehouse with all this information, or builds the federation infrastructure to present all this data as a virtual warehouse. Either way, it is a static definition that IT needs to build and maintain. Permanent changes are passed from users to developers, and temporary or hypothetical changes go unaddressed.

Enter what I call true Data Mashup: the user-defined variety that takes ad hoc query to the next level. It gives users the Lego bricks and lets them build a car, a house, or the Taj Mahal.

The IT staff will always be responsible for defining the Lego bricks, but end users can play with the data in ways that nobody anticipated. They can combine datasets that were not initially designed to be combined.
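
To make the idea concrete, here is a rough sketch of the scenario described earlier: a user combining sales, marketing, and spreadsheet data on their own. Everything in it is hypothetical. The source names, the shared "region" key, and the figures are made up for illustration, and a real mashup tool would hide this plumbing behind a friendlier interface.

```python
import csv
import io
from collections import defaultdict

# Hypothetical "Lego bricks": three sources that were never designed to be
# combined. In practice they would come from a sales system, a marketing
# database, and a spreadsheet on the user's machine; here they are inlined.
SALES_ROWS = [
    {"region": "East", "revenue": 120000},
    {"region": "West", "revenue": 95000},
    {"region": "East", "revenue": 30000},
]
MARKETING_ROWS = [
    {"region": "East", "campaign_spend": 15000},
    {"region": "West", "campaign_spend": 19000},
]
LOCAL_SPREADSHEET_CSV = """region,next_year_target
East,180000
West,100000
"""


def mash_up(sales, marketing, spreadsheet_csv):
    """Combine the three sources on the shared 'region' key (an assumption)."""
    combined = defaultdict(
        lambda: {"revenue": 0, "campaign_spend": 0, "next_year_target": None}
    )
    for row in sales:
        combined[row["region"]]["revenue"] += row["revenue"]
    for row in marketing:
        combined[row["region"]]["campaign_spend"] += row["campaign_spend"]
    for row in csv.DictReader(io.StringIO(spreadsheet_csv)):
        combined[row["region"]]["next_year_target"] = int(row["next_year_target"])
    return dict(combined)


dataset = mash_up(SALES_ROWS, MARKETING_ROWS, LOCAL_SPREADSHEET_CSV)
for region, figures in sorted(dataset.items()):
    print(region, figures)
```

The point is not the code itself but who gets to decide the combination: IT supplies the three sources, and the user chooses, on the spot, to join them by region.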
