Monday, December 15, 2008

Speaking of Data Mashup

I've been invited to give a presentation to the Data Management Association of Minnesota on Data Mashup this Wednesday.

Also, last month I was interviewed by TDWI's Linda Briggs on the topic of Data Mashup. Read the full transcript.

In both of these venues, two of the key points are:

Data Mashup is a compromise between database administrators and Excel jockeys. It allows the flexibility and self-service power, while maintaining security, integrity, and transparency.

Data Mashup is a complement to a data warehouse, and not a replacement. A data warehouse is not a goal, but rather a solution to certain problems. Data Mashup solves some of the same problems, and some different ones. While there is some overlap, both technologies have their place.

Friday, October 10, 2008

Happy Birthday Business Intelligence and Google

Business Intelligence is celebrating 50 years since conception in a paper by Hans Luhn, and Google turned 10. Upon re-reading A Business Intelligence System from 1958, I see a great many parallels with Google and indications of their future.

The essence of Luhn's idea is a super-librarian who knows the details of all the books and documents in the library, knows the concerns and preferences of all the people with library cards, and plays matchmaker.

The first concept is an auto-abstract, essentially a summary of the document (not unlike the little blurb under a search result). With advances in natural language analysis, search engines like www.cuil.com are focussing even more on content and relevance than PageRank.

Next, Luhn mentions that after new documents are analyzed, parties who might be interested in them should be notified of their existence. I love Google Alerts, because they help me stay abreast of the latest mentions of my name, my company, and my industry.

Then there is the ability to query the librarian, which is just a search. According to the 50-year-old article, the request for information should yield a list of abstracts ordered by relevance to the user. The user can then request the complete documents they choose.

Where will Google go from here?

The system that Luhn defined has profiles of its users that are more abstract and change over time based on feedback. Imagine a search engine that learns that when you refer to "fencing" you mean the sport instead of the building supply, by tracking which results you click. Google can already capture your web history.
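To make the "fencing" idea concrete, here is a minimal sketch of a profile that learns from clicks. It is only an illustration of the feedback loop, not how Google or Luhn's system works; the UserProfile class, its method names, and the sample results are all hypothetical.

```python
from collections import Counter, defaultdict


class UserProfile:
    """Hypothetical profile that remembers which sense of a term a user clicks."""

    def __init__(self):
        # term -> Counter of senses the user has clicked for that term
        self.clicks = defaultdict(Counter)

    def record_click(self, term, sense):
        self.clicks[term][sense] += 1

    def preferred_sense(self, term):
        counts = self.clicks.get(term)
        if not counts:
            return None  # no feedback yet for this term
        return counts.most_common(1)[0][0]

    def rerank(self, term, results):
        """Move results whose sense the user clicks most often to the top.

        results is a list of (title, sense) pairs; ties keep their original order.
        """
        counts = self.clicks.get(term, Counter())
        return sorted(results, key=lambda result: -counts[result[1]])


profile = UserProfile()
profile.record_click("fencing", "sport")
profile.record_click("fencing", "sport")
profile.record_click("fencing", "building supply")

results = [
    ("Chain link fencing on sale", "building supply"),
    ("Olympic fencing schedule", "sport"),
]
reranked = profile.rerank("fencing", results)
```

After two clicks on the sport and one on the building supply, the sport result moves to the top for this user, and the profile keeps shifting as more feedback arrives.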

Also, the article talks about "internal documents", which are user created. Google owns Blogger, and YouTube, and offers services for creating web sites, documents, and spreadsheets. They could leverage this information to help develop user profiles and even connect users with similar interests.

I don't know if Hans Peter Luhn was reincarnated as Larry Page or Sergey Brin, but Google is pretty close to his vision of A Business Intelligence System.

Wednesday, September 17, 2008

Unique, Like Everyone Else

I gave a presentation at one of our partners' user conferences this morning, and stayed for a panel discussion with some industry experts. Even though the industry in question was enterprise asset management, I heard some familiar comments that I suspect are universal truths.

First was advice about getting executive support for a new project. The answer (no surprise here) was being able to demonstrate ROI. Whether BI, or an equipment maintenance initiative, executives need to see the impact on the bottom line.

Second was talk about metrics. The gist was that measuring key areas of your business, and tracking the results of attempted improvements are incredibly important. Again, because of the success of Tom Davenport's Competing on Analytics, this idea is not coming out of left field.

Third was a focus on management. The manufacturing sector's trend toward outsourcing has had a ripple effect by depleting the maintenance engineer talent pool of new workers. The fact is that young people aren't training for this industry, and therefore companies need to do more with less. Software helps to some degree, by squeezing efficiency out of every area, but the real key comes down to the leadership. You can provide a great BI tool, but with a weak manager you will not be able to save a failing business.

In summary, it doesn't matter in what vertical you work, the challenges and best practices are the same at the core.

Wednesday, September 10, 2008

Historical Quadrant

I read an enlightened article called Analysts as a lagging indicator of success that explains issues with some of the large analyst firms very clearly.

In a nutshell, companies that have deeper pockets and/or more market share get more coverage. There is nothing inherently wrong with this setup. It's just that people evaluating options for a new project (e.g. a Business Intelligence deployment) should keep this in mind when looking at their research. The most established, or historically successful companies are not necessarily the best solutions for problems you are facing right now.

Small analyst firms tend to be more focused on the real technology innovations, and customer experiences. This information is of more value to a nascent opportunity.

That being said, for a vendor it feels good to be recognized by the big name analysts. It's validation that you did your job well for the past few years.

Saturday, August 9, 2008

Agile Waterfall

When it comes to developing new features, the level of requirements is on a spectrum. At the one extreme is the Waterfall model. With waterfall, development is a one-way street that progresses from requirements to implementation to testing. At the other extreme is Cowboy coding, which leaves programmers to do what they think is best.

Where your development team falls on this scale depends on how intelligent and capable of seeing the big picture your developers are. With full requirements, the coders only need to implement what is documented. At the other extreme, Linux, Google, Apache, MySQL, and many other projects were products of cowboy coders. Ideally, every programmer would be a genius with vision, but that's just not the case.

A happy medium is Agile software development, which can apply the structure of Waterfall to short iterations where developers are given more freedom.

Wednesday, August 6, 2008

RIAs 4 B2B & B2C

Shaku Atre wrote an article for DM Review, Does BI Have to be Extroverted, Introverted, or Both?

The main point is that people (consumers and business users alike) are familiar with Rich Internet Applications, and expect to have more power through them.

She specifically states, "Providing dynamic, interactive access with rich visualization and RIAs, B2C, B2B and B2E applications will require a robust back-end server with comprehensive access to disparate data, scalability to support millions of people, reliability, security features and improved performance to provide all of this in a matter of seconds."

Time and again I am amazed at how well journalists are able to speak on behalf of InetSoft without knowing we exist.

Monday, August 4, 2008

Intelligent Dimensions

I recently read The ‘intelligence’ in Business Intelligence solutions, written by a Sanjay Shah, who I believe to be the Sanjay Shah who is CEO of Skelta Software, a Business Process Management software and services company.

I'm sure that Mr. Shah was trying to make the case for his firm's consulting services that are able to implement the "intelligent dimensions" that business users need. When I read it, I was thinking in terms of eliminating the middle man and providing these simple data manipulation capabilities directly to the user.

Yes, I'm talking again about End User Data Mashup and I am going to shamelessly describe how my employer's products address the idea of "intelligent dimensions".

The 3 points Sanjay lays out are:
  • Create Intelligent Dimensions by Observation
  • Combine Data from Related Functional Areas
  • Combine Traditionally Different Reports into One
The first is quite simply defining your own grouping. In our product, these are either range columns, simple named groups, or complex named groups. Range columns are just what they sound like: a column that groups a scalar value into ranges. Simple named groups allow you to drop the distinct values of a field into custom categories you define. Complex named groups allow you to mix these capabilities, and go beyond, defining a custom condition for each bucket.
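The three kinds of grouping can be sketched in a few lines of Python. This is only an illustration of the concepts, not how Style Intelligence implements them; the function names, the range edges, the region mapping, and the order data are all made up.

```python
from collections import defaultdict

# Hypothetical order rows: (state, order_amount). Not real data.
orders = [("MA", 120.0), ("NY", 950.0), ("CA", 40.0), ("MT", 510.0)]


def range_column(amount, edges=(100, 500)):
    """Range column: bucket a scalar value by numeric range."""
    low, high = edges
    if amount < low:
        return "small"
    if amount < high:
        return "medium"
    return "large"


# Simple named group: distinct values of a field dropped into custom categories.
REGIONS = {"MA": "East", "NY": "East", "CA": "West", "MT": "West"}


def simple_named_group(state):
    return REGIONS.get(state, "Other")


def complex_named_group(state, amount):
    """Complex named group: each bucket has its own custom condition."""
    if simple_named_group(state) == "East" and amount >= 500:
        return "Key East accounts"
    if amount < 100:
        return "Long tail"
    return "Everything else"


# Summarize order amounts by the user-defined buckets.
totals = defaultdict(float)
for state, amount in orders:
    totals[complex_named_group(state, amount)] += amount
```

The point is that the buckets come from what the business user has observed about the data, rather than from a dimension that someone designed up front.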

The second is Data Mashup. I keep saying that you don't need sophisticated ETL for the majority of situations that span data sources, so I won't dwell on it here, again.

The third is the idea behind the interactive visualization dashboards you can build with Style Intelligence. You can use the first two points to prepare sophisticated and actionable datasets, and then build a dynamic interface that allows you to slice and dice this data in various intuitive ways.

Thank you, Mr. Shah, for describing how companies can get the most out of business intelligence. I apologize if the use of our product means you see less consulting revenue.

Saturday, August 2, 2008

Innovate on Behalf of the Customer

Product Management is not a new concept, so I spend some time reading up on what others have to say on the topic. As my father often quotes his favorite fortune cookie, "Learn from experience, preferably other people's."

There's a blog on being a good product manager that covers various topics in a no-nonsense way.

Recently, I have been reaching out to customers and talking to them about their experiences with our product, and looking for ways we can improve. The strategy I've been taking is outlined very well in one of Jeff Lash's articles.

Essentially, product features, enhancements and innovations need to be rooted in customer needs, but not a direct implementation of their desires. A product manager has to consider the impact on: development; other customers; and future direction.

Wednesday, July 30, 2008

BI in the Cloud

Have you heard people talk about on-demand hosted subscription SaaS platform in the cloud grid computing?

Colin White wrote a good article that helps to make sense of all the different words and acronyms that companies use to describe these offerings - Business Intelligence in the Cloud: Sorting Out the Terminology

No matter what you call it, BI SaaS seems to be most suited to mid-size firms that meet the following criteria:
  • they are large enough to need BI
  • they are small enough to be unable to host their own
  • the data they have can be provided to the 3rd party
Maybe it's just me, but I don't see this being a large segment of the market.

SaaS is appropriate for applications that produce their own data (like CRM), but BI is all about tapping into the data you already have. It seems that BI SaaS gets a lot of hype, but let me know if you've actually deployed it and why you chose this option.

Monday, July 28, 2008

Emerging Market: India

It seems that India is the latest hot market, especially for Business Intelligence software. Just in the last few days there were announcements around servicing the subcontinent better.

My employer, InetSoft, announced a channel partnership that will provide direct sales to India, and other markets including UAE.

QlikTech is opening an office in India.

Rolta India will acquire an unnamed US-based BI firm.

Also, a number of India-based companies are starting to provide BI, like AnalyticsWorks and MAIA Intelligence.

I think that the relationship of US companies with India for outsourcing has resulted in an influx of capital that has bolstered its economy, and introduced a culture of business analytics and performance management. If this rationale is accurate, we can expect China, Russia, the Philippines, Mexico, and Ireland to follow suit.

Wednesday, July 23, 2008

Private = Stable

I read an interesting article that talked about SAS, a privately held software company.

The points made are:

A private firm can be more stable because it is not at the mercy of the market. The sacrifice is the influx of capital from an IPO, but what is gained is consistency.

Public companies can be acquired by hostile takeover, and their mission statement becomes "increasing shareholder value". By keeping the reins, the owner(s) have more control over their future.

It ends with a quote from the founder/CEO, Jim Goodnight, saying, "The capital market guys are the ones that made all these really bad investments. I'm not sure anybody should ever listen to Wall Street again. They don't know what they're doing."

There are advantages to remaining a private firm.

Monday, July 21, 2008

Geographic Charts

proportional (prə-pōr'shə-nəl) adj.
  1. properly related in size, degree, or other measurable characteristics
Visualization is supposed to make it easier to understand data, not mislead the user.



It seems like a requirement that all BI software provide the ability to display data on a map. I just read that Tableau introduced this in their latest release, but most other vendors have provided this ability for a couple years (including my employer, InetSoft).

Maps have a few things going for them. They are familiar ways of displaying geopolitical entities (States, Countries, etc.), and they represent spatial data well.

The issue is that maps are often used to display data, like sales, by filling the interior of the region with a highlight color. Why is this a problem? Because the size and shape of a region directly impacts its visual notability, but has little correlation to population, market size, and general importance.

Consider low sales in Massachusetts and high sales in Montana. A larger portion of the US map would look good, but this belies the truth. I am reminded of a model that shows what a human body would look like if each part was in proportion to the area of the brain involved in its sensory perception.

I am not as much a stickler as Stephen Few when it comes to visual display. I am in favor of nice looking graphs, but not at the expense of conveying accurate information.

Friday, July 18, 2008

Pervasive BI Hurdles

I wrote previously about the Pervasive BI report from Wayne Eckerson at TDWI.

Here I want to highlight just the issues that are cited as reasons why BI has not spread more.

"The biggest impediments to BI adoption are the time and complexity to deploy BI tools followed by the cost of BI licenses, according to our survey."

So BI needs to be easier to get up and running. This is no great surprise, and recently there has been a real push by some technology providers, including ourselves, to deliver more intuitive tools.

The cost of BI licenses is an interesting one, because it seems that most vendors charge per named user. That is, if Bob sees Mary's dashboard and wants his own, they have to pay more money to the vendor. No wonder this reduces the BI adoption rate. A fiscally conservative firm will say, "Do you really need a dashboard? Can't you just use Excel?"

"Once BI tools are in-house, the biggest impediments to greater usage are poor data quality, overly complex tools, slow query response times, lack of executive backing, and the existence of other tools, according to respondents. To accelerate usage, they recommend integrating BI with Microsoft Office, implementing dashboards, embedding BI into a business process, and delivering highly interactive and self-service BI."

The old phrase of "garbage in, garbage out" holds true. In most situations, the consumers of data are in the same department or line of business as the producers of the data. If the BI user can explore the data, drill down to the detail, and fix the problems they discover, then the data quality problem will solve itself.

The BI tool needs to be easy to deploy AND easy to use. If users don't get frustrated, they may even enjoy the time they spend with the tool.

Users complain about slow performance, because they want to use BI more interactively. If they didn't, they would just schedule the necessary reports and move on.

The lack of executive support is sort of a catch-22. If an executive is behind a BI deployment, it will get resources to do it right. If not, large BI projects are doomed to failure, and executives will have their doubts confirmed. Two solutions are: convince a C-level sponsor to take a risk; or deploy a smaller BI project first to gain momentum. One of the best ways to have a successful initial implementation is to get the users participating early and often.

The existence of other tools is a chicken-egg situation. Users may be driven to using the other tools because of the other issues with traditional BI products. Also, "other tools" means desktop tools where the user is omnipotent.

It seems pretty unanimous that in order to use BI more, people want more self-service from their BI tools.

Thursday, July 17, 2008

Future Cloud

dystopia (dĭs-tō'pē-ə) noun.
  1. a society characterized by human misery, as squalor, oppression, disease, and overcrowding
The hyperbole employed by Nicholas Carr in The Big Switch: Rewiring the World, from Edison to Google may turn out to be an accurate prediction.



The basic idea is that cloud computing is going to catch on, this time. What this means for IT staff is joblessness, and what this means for the world is: the failure of traditional businesses; rampant "big brother"-ism; a further shift of profits from producers to aggregators; the continued rotting of human culture; and growing political rifts among the populace.

Many companies, like Amazon, Google, and Salesforce, are already starting to provide grid computing to the general public. Many businesses, especially small ones, will opt for paying pennies for data storage and processing power instead of maintaining their own staff and hardware.

We will probably also see a reemergence of dumb terminals that simply access the network/internet. Also, zero-client, web 2.0 technology will go from a feature to a requirement.

It's a very interesting time for the world, and hopefully it's not the beginning of the end.

Tuesday, July 15, 2008

BI-Phone

There have been some recent press releases from Pentaho and Oracle announcing their support of the iPhone as a client for BI. This got me thinking about the role that mobile devices play in business intelligence, and business in general.

The reason to have a BlackBerry, Palm, Pocket PC, iPhone is to keep informed when away from your desk. Think of receiving urgent emails while in a meeting.

Apply this idea to BI, and it makes sense to be notified proactively about certain exceptions via email, or maybe view a scorecard (raw numbers, and a traffic light). It does not seem efficient to view reports, interact with dashboards (charts and gauges), or analyze data on a 3.5" screen.
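As a rough sketch of what I mean by a scorecard with proactive exceptions, consider the following. The thresholds, metric names, and numbers are invented for illustration, and the traffic_light and exceptions helpers are hypothetical, not part of any vendor's product.

```python
def traffic_light(actual, target, warn_ratio=0.9):
    """Map a metric to a traffic light based on how close it is to target."""
    ratio = actual / target
    if ratio >= 1.0:
        return "green"
    if ratio >= warn_ratio:
        return "yellow"
    return "red"


# Hypothetical scorecard: metric name -> (actual, target).
scorecard = {
    "Revenue": (95_000, 100_000),
    "New customers": (40, 60),
    "On-time shipments": (980, 950),
}


def exceptions(card):
    """Return only the metrics worth interrupting someone for."""
    return [
        name
        for name, (actual, target) in card.items()
        if traffic_light(actual, target) == "red"
    ]


alerts = exceptions(scorecard)
```

A short list of red items fits comfortably in an email or on a small screen, which is about as much BI as I would want to consume on a phone.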

I like gadgets as much as the next geek, but let's be practical.

Monday, July 14, 2008

Data Warehousing Optional

In my previous post, Sea Change in BI, I briefly mentioned an article by Colin White, Is Data Warehousing Essential to Business Intelligence? I'd like to highlight a few of the key parts here.

Colin reminds us of the reasons why data warehouses were introduced in the first place:
  1. the data was not usually in a suitable form for reporting
  2. the data often had quality issues
  3. decision support processing degraded business transaction performance
  4. data was often dispersed across many different systems
  5. there was a general lack of historical information
"Data warehousing was introduced to help solve these data and performance issues. While there is no question that data warehousing helped improve business decision making, it is important to realize, nevertheless, that it was introduced primarily to solve design issues in business transaction systems, and also for performance reasons."

"... at present, business intelligence is synonymous with data warehousing. This thinking is wrong and needs to be changed. Data warehousing is a component of business intelligence, but business intelligence may employ data in other data stores. In some cases, a BI application may not even use data managed in a data warehouse."

"Another issue is that people have forgotten that data warehousing was created to overcome deficiencies in business transaction systems. Many of these issues are now solvable."

"The bottom line is that data warehousing is still an important component of business intelligence, but it is no longer the foundation on which all BI projects have to be built."

It is very encouraging to hear this kind of pragmatic approach from an analyst.

Friday, July 11, 2008

BI Mythbusters

debunk (dē-bŭngk') tr. v.
  1. To expose or ridicule the falseness, sham, or exaggerated claims of
Don't get caught up in unsubstantiated buzz.



DM Review had an interesting article that debunked some very common BI claims.

Respected analysts from Boris Evelson to Neil Raden have been talking about the data explosion. The data that DM Review presents shows that the sheer volume of data used for BI has not grown significantly in the last 3 years.

The positioning of "BI for the Masses" seems as prevalent in BI today as "new and improved" is in the detergent market. This claim has amounted to little more than wishful thinking over the past 12 months. The technologies seem to be available, so maybe the lag is due to the culture trying to catch up.

Similar to the promise of pervasive business intelligence, many vendors have been throwing around "Enterprise BI" as a best practice. The fact, as presented by the article and as confirmed by InetSoft's customer base, is that BI is more often deployed successfully at the department level. This is not hard to understand, because Enterprise BI can seem like boiling the ocean.

Last is the myth that "bigger is better" when it comes to BI vendors. The data show that users are more satisfied with smaller vendors, due in large part to the better customer service. In the post-consolidation market, this is great news for the independent innovators.

Don't believe the hype.

Thursday, July 10, 2008

Spread Marts

spread mart (sprěd märt) noun.
  1. a spreadsheet that is used for data integration
Spread marts solve some problems but introduce others.



In the latest issue of DM Review, there is an article Business Intelligence The Self-Service Way by Shailesh Kosambia from Tata Consultancy Services. It talks briefly about the data issues inhibiting self-service, saying "Business users do not have access to consolidated, unified data..." Shailesh continues, "When a data warehouse does not adequately cover all lines of business, the users have to get data from different sources and consolidate it in one place, leading to several spread marts within the same organization."

Claudia Imhoff discusses ways to solve the issues of "spread marts" in How to "Excel" in Your Business Intelligence Environment:
  1. Being able to "follow the data"
  2. Scheduled data updates
  3. Expiration dates
  4. Securing certain data
  5. Preserving formats, formulas
  6. Link back to live data
Why are people so hung up on using Excel?

Neil Raden of Smart (enough) Systems explains the infatuation with Excel, saying it's because it is "subversive". To paraphrase, spreadsheets allow users to do things behind the back of IT.

What if users could get all the benefits of using Excel (easy to use; data manipulation and combination; no IT needed) without the problems around transparency, latency, authority, and consistency?

To the man with a hammer, every problem looks like a nail. For another tool, see my previous article on Data Mashup.

Wednesday, July 9, 2008

Pervasive BI by TDWI

pervasive (pər-vā'sĭv) adj.
  1. spreading or spread throughout
  2. having the quality or tendency to become spread throughout all parts of
When pervasive is used as a modifier with BI, it is always this second definition. Often it is also just wishful thinking more than reality.



Wayne Eckerson of TDWI just published a new report: Pervasive Business Intelligence - Techniques and Technologies to Deploy BI on an Enterprise Scale. Along with the 35 page report, he also gave a webinar today highlighting some of the key ideas and points he raised.

One of the main ideas he put forward, as a way of expanding usage of BI tools, was that of information sandboxes. In these interactive dashboards, users can explore and analyze data, without performing traditional ad hoc report creation. I feel that Wayne is playing catch up, because Boris Evelson at Forrester published his research on BI Workspaces (he initially called them "analytical sandboxes") last month, and vendors have been providing this capability for over a year. Better late than never.

Also, to my pleasant surprise Wayne, Mr. Data Warehouse, acknowledged that the advent of new technologies can make a data warehouse optional. I am paraphrasing, but I am glad some of these BI traditionalists are no longer in denial.

Tuesday, July 8, 2008

Self-Service vs. IT?

symbiosis (sĭm'bē-ō'sĭs) noun.
  1. any interdependent or mutually beneficial relationship between two persons, groups, etc.
Users need IT and vice versa.



The question mark in the title is on purpose. I believe it is an open question, with no right answer.

Just a few short years ago, I don't think anyone would have doubted the requirement for the IT department. Nowadays, with SaaS, appliances, and Web 2.0, many people and companies do.

Specifically in the BI space, Gartner published an article: Emerging Technologies Will Drive Self-Service Business Intelligence. Techworld cited it in a release: IT's role in business intelligence to lessen. The Gartner publication is also the focus of an article published today by DM Review called Gartner Says Emerging Technologies Will Marginalize IT's Role in Business Intelligence.

Aside: The focus of the article is on interactive visualization, and alternatives to data warehousing. I couldn't have made a better case for InetSoft products if I had written it myself.

I don't think the role of IT will be reduced. Rather, I believe that IT will spend less time on the mundane tasks like producing reports, and be able to focus on other, more advanced, BI projects. For example, predictive modeling, decision automation, and embedding BI in business processes are all still firmly in the realm of IT.

Monday, July 7, 2008

Post-pre-aggregation

heuristic (hyŏŏ-rĭs'tĭk) adj.
  1. of, pertaining to, or based on experimentation, evaluation, or trial-and-error methods
The definition of multi-dimensional cubes is an exercise in successive approximation.



The traditional bottom-up design of pre-aggregated data is based on user input collected at the beginning of the project. Then, as they start to use the system, their feedback is incorporated into re-engineering efforts. This type of system has a lot of inertia, and can easily frustrate the user population.

Let's take a step back and remember why pre-aggregates are employed in the first place. Running a report or viewing a dashboard that is based on a large dataset will be slow if it is done on the fly. If we can anticipate the queries that will be executed, and run them before they are needed, we can give the appearance of a faster response.

I think the trouble enters in the interpretation of anticipate. The traditional model aims to address future usage in the data design stage. Consider an alternative of preparing at runtime.

Let me expand on this to clarify what I mean. In the traditional model, data is pre-aggregated, then reports are created. In the alternative model, reports are created, then data is pre-aggregated.

This means that the end products can be used to define what data is pre-calculated and how. If a report displays quarterly sales information, the sales data can be pre-aggregated at the quarter level. This takes the guesswork out of the process, guaranteeing that all scheduled summations are used, and that all end products can take advantage of preprocessing.
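Here is a minimal sketch of the alternative model, in which the report definitions drive what gets pre-aggregated. The report names, the "grain" tuples, and the sample sales rows are assumptions made up for the example; a real system would do this inside the database or the BI server rather than in a script.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows: (sale_date, region, amount).
sales = [
    (date(2008, 1, 15), "East", 100.0),
    (date(2008, 2, 3), "East", 50.0),
    (date(2008, 4, 20), "West", 75.0),
    (date(2008, 5, 9), "East", 25.0),
]

# Hypothetical report definitions: each declares the grain it displays.
reports = {
    "quarterly_sales": ("quarter",),
    "quarterly_sales_by_region": ("quarter", "region"),
}


def dimension_value(row, dimension):
    """Derive the value of one grouping dimension from a detail row."""
    sale_date, region, _amount = row
    if dimension == "quarter":
        return f"{sale_date.year}-Q{(sale_date.month - 1) // 3 + 1}"
    if dimension == "region":
        return region
    raise ValueError(f"unknown dimension: {dimension}")


def build_aggregates(rows, report_defs):
    """Pre-compute one summary table per distinct grain used by any report."""
    aggregates = {}
    for grain in set(report_defs.values()):
        table = defaultdict(float)
        for row in rows:
            key = tuple(dimension_value(row, d) for d in grain)
            table[key] += row[2]
        aggregates[grain] = dict(table)
    return aggregates


aggregates = build_aggregates(sales, reports)
quarterly = aggregates[("quarter",)]
```

Because the list of grains is read from the reports themselves, no summary table is built that nothing uses, and adding a report adds its grain the next time the aggregates are refreshed.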

Thursday, July 3, 2008

Build vs. Buy

opportunity cost (ŏp'ər-tōō'nĭ-tē kôst) noun.
  1. the cost of an alternative that must be forgone in order to pursue a certain action
  2. the benefits you could have received by taking an alternative action
The choice between build and buy requires more than simple arithmetic.



In many organizations, the choice between purchasing a software package and building your own comes down to license costs. This is a very myopic/nearsighted/shortsighted view.

Hopefully some of you are already thinking in terms of TCO. This is Total Cost of Ownership, and represents the entire cost, which is not so easy to calculate.

First, it is important to understand the difference between cost and outlay. An outlay is cash exchanged for acquisition or usage (license cost) and the concept of cost is a superset that includes outlays and other expenses (maintenance, salaries).

Next, we also have to consider opportunity cost. Consider the following two scenarios: purchasing a product that is easy to deploy and maintain; or having an in-house developer use a couple open source packages to build the desired functionality. It is clear that the second option wins the "lowest license cost" battle. In some cases, a new developer must be hired, in which case his or her salary must factor into the TCO of the second option. In other cases, when a developer is already on staff, there is the opportunity cost to consider. What else could the developer work on, instead of reinventing the wheel? What benefits could that project have yielded? If there is nothing else the developer could have done, then costs can be saved in the first scenario by reducing staff. There is also opportunity cost for the user population, because they must wait for the homegrown solution.
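A back-of-the-envelope comparison might look like the sketch below. Every figure is invented purely to show the arithmetic, the total_cost helper is hypothetical, and treating opportunity cost as a flat yearly amount is a simplifying assumption.

```python
def total_cost(outlay, yearly_expenses, opportunity_cost, years):
    """TCO over a horizon: one-time outlay plus recurring expenses and forgone benefit."""
    return outlay + years * (yearly_expenses + opportunity_cost)


# All figures are made up for illustration.
# Buy: license outlay up front, plus yearly maintenance.
buy = total_cost(outlay=50_000, yearly_expenses=10_000, opportunity_cost=0, years=3)

# Build: no license outlay, but a share of a developer's salary each year,
# plus the yearly benefit of the project that developer could have worked on.
build = total_cost(outlay=0, yearly_expenses=40_000, opportunity_cost=30_000, years=3)
```

With these particular numbers, build wins on license cost (0 versus 50,000) yet comes out well behind over three years (210,000 versus 80,000). Different inputs can easily flip the result, which is exactly why the whole calculation needs to be done.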

The last item that is a little more difficult to quantify is risk. This applies to the initial creation, where developing a project internally is more likely to fail or suffer from underestimated costs. It also applies to the risk going forward of supporting and maintaining the homegrown system without the help of a vendor.

I am not saying that buy is always better than build. It is important to accurately assess the options, considering the total costs over time.

Wednesday, July 2, 2008

Sea Change in BI

sea change (sē chānj) noun. idiomatic
  1. profound transformation: "Full fathom five thy father lies: / Of his bones are coral made: / Those are pearls that were his eyes: / Nothing of him that doth fade / But doth suffer a sea change / Into something rich and strange." -Shakespeare
  2. a striking change, as in appearance, often for the better
The data warehouse has gone from a means to an end.

It's time to step back and look at the forest.



In thinking about data mashup, I asked myself why more vendors don't provide it. My initial thought was that it was an innovation that was never thought of before, but I then humbled myself and returned to reality. I now believe a combination of two factors causes the lack of availability of data mashup.

First, my vision of user-driven data mashup would not be an effective standalone product. Think of using a separate application to pull/massage/transform/combine disparate data sources. Now, what is the result? In only a few cases is the query the end of the story. Typically that dataset now needs to be presented in an interesting/useful/appealing way. To have this dataset be the input to a separate BI product would require some integration point, like a web service, and a generic way of communicating a recordset of any size in both dimensions. This starts to get complex, and requires skills that are possessed only by IT professionals.

Second, let's ask the question of why other BI vendors don't provide data mashup. It is not very hard to see that it is not because they can't, but rather they choose not to. The remainder of this article is devoted to this topic.

Tom Gonzalez wrote a blog entry called What is wrong with the Business Intelligence Industry?. As an "outsider", Tom brings a fresh perspective from his experience applying next generation technology to business problems.

Some industry analysts are even starting to challenge the status quo, which is refreshing.

Colin White wrote an article, Is Data Warehousing Essential to Business Intelligence?, that concludes, "No, it's not." He reminds us of the 5 issues that data warehouses address, and that these issues are solvable through other means.

At the risk of spreading an unconfirmed rumor, Claudia Imhoff, co-author of books on data warehousing, has allegedly admitted that some BI implementations do not require a DW. A baby step, but at least it's in the right direction.

More on this later.

Tuesday, July 1, 2008

Data Mashup Defined

mashup (măsh'ŭp') noun.
  1. an audio recording that is a composite of samples from other recordings, usually from different musical styles: Danger Mouse created the Grey Album, which is a mashup of The Beatles' White Album and Jay-Z's Black Album.
  2. the creation of a new work from two sources that were not initially designed to be combined
Apply this idea to the internet and you get things like Google Maps with Subway lines.

Apply this idea to data and you have an alternative to data warehousing.



Some have tried to define data mashup as data federation that makes use of web services, screen scraping, and even Yahoo! Pipes. This is all well and good, but doesn't really live up to the spirit of mashup because of the dependence on IT skills. After all, you wouldn't label ETL as "data mashup". I think that the key component that is missing is user involvement.

From a Business Intelligence perspective, users have had ad hoc query abilities, to some degree, for years. Imagine for a moment, a business user pulling together data from a sales system, a marketing database, and a local spreadsheet, creating a dataset that will help guide future decisions.

This is certainly possible if IT populated a data warehouse with all this information, or if IT built the federation infrastructure to present all this data as a virtual warehouse. Either way, it is a static definition that IT needs to build and maintain. Changes of a permanent nature are passed down from users to developers, and changes of a temporary/hypothetical nature are left unaddressable.

Enter what I call true Data Mashup. The user-defined variety that takes ad hoc query to the next level. It gives users the lego bricks, and lets them build a car, house, or Taj Mahal.

The IT staff will always be responsible for defining the lego bricks, but end users can play with the data in ways that are unanticipated. They can combine datasets that were not initially designed to be combined.
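To illustrate the sales, marketing, and spreadsheet scenario, here is a minimal sketch of the kind of combination I have in mind. The three sources are stand-ins with made-up data, the mashup function is hypothetical, and in a real tool the user would do this through a visual interface rather than by writing code.

```python
import csv
import io

# Hypothetical stand-ins for three sources that were never designed to be combined.
sales_system = [
    {"customer_id": 1, "revenue": 1200.0},
    {"customer_id": 2, "revenue": 300.0},
    {"customer_id": 1, "revenue": 800.0},
]
marketing_db = {
    1: {"campaign": "Spring Email"},
    2: {"campaign": "Trade Show"},
}
local_spreadsheet = io.StringIO("customer_id,account_owner\n1,Mary\n2,Bob\n")

# Read the spreadsheet (saved as CSV) into a lookup keyed by customer.
owners = {
    int(row["customer_id"]): row["account_owner"]
    for row in csv.DictReader(local_spreadsheet)
}


def mashup(sales, marketing, owner_lookup):
    """Total revenue per customer, then attach marketing and spreadsheet columns."""
    revenue = {}
    for row in sales:
        cid = row["customer_id"]
        revenue[cid] = revenue.get(cid, 0.0) + row["revenue"]
    return [
        {
            "customer_id": cid,
            "revenue": total,
            "campaign": marketing.get(cid, {}).get("campaign"),
            "account_owner": owner_lookup.get(cid),
        }
        for cid, total in sorted(revenue.items())
    ]


dataset = mashup(sales_system, marketing_db, owners)
```

The lego bricks here are the three sources that IT exposes. Which bricks get snapped together, and on which key, is left to the user.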

Monday, June 30, 2008

Introduction

Hi all,

I work as Product Manager for InetSoft, an operational BI software vendor.

I started this blog after conversing with some industry analysts and realizing that some of my ideas actually aren't that bad.

As much as possible, I want to keep this blog academic. Naturally, my experience/research/thinking in this sector is often done at work, and therefore may be colored by the focus of our product.

Lastly, I would like to encourage comments. This is really the key to the whole Web 2.0 collaboration idea. If I only wanted to publish my own ideas, I would write a book.

Thanks, and I hope you find something useful here.