Wednesday, June 2, 2010

Re: Data Visualization in a Mashed-Up World

Doing this radio show was a weird experience. I've written about Data Mashup for years, and I assumed that the other participants would be coming from a similar base of knowledge. Instead it felt like I had fallen into a time machine, and gone back to 2006. If you read the transcript you will understand what I'm talking about.

To create a data mashup there needs to be some common context, but according to the other speakers this context is limited to geography (read: Google Maps). I'll concede that universal truths, like latitude and longitude, are a convenient way to connect public data from completely unrelated sources. What I don't understand is why the radio guests (consultants and vendors) didn't consider the implications for a private organization. As I said during the show, customers and products are common contexts within a company. There is great value for a CEO to combine sales, marketing, and support information for a 360 degree view of the business. Data mashup allows this to happen, even if there's not a comprehensive data warehouse or master data solution in place.

The other interviewees also framed the state of the art as being a set of developer APIs. This also made me reminisce about the early days of mashups. Has the rest of the industry really not progressed beyond this nascent stage? It was at this point that I became a little flustered. Had none of these "experts" seen the drag-and-drop data mashup that InetSoft's been offering since 2007?

Luckily the host, Eric Kavanagh, was experienced and knowledgeable enough to understand the best practices I was putting forth. He even described my vision of enterprise data mashup (IT defining the atomic sources, meta-data, and security; and users building and sharing their own mashups) as the "ideal".

I hope that the word spreads about what data mashup can be, and we can go back to the future.

Friday, April 30, 2010

Agile BI, Not Automatic BI

Forrester's Boris Evelson recently published a paper introducing "Agile BI" with the same definition we've been pushing for years:
  1. Faster initial development
  2. Faster reaction to changing requirements
The rest of the paper goes on to discuss "metadata-generated BI" which is "one such example of a new technology supporting Agile BI". These metadata-generated BI applications essentially do the same work a traditional BI environment requires (building a data warehouse), but much of it is automated. I look forward to hearing from Boris about other Agile BI technologies, because Automated BI doesn't thrill me.

By automating the initial data work, I'm sure a lot of time and effort is saved, but you may not get what you want. This is where an application like ours steps in. Instead of doing the same old thing faster, we take a new approach. We support Agile BI by eliminating the upfront effort of creating a Data Warehouse and instead providing Data Mashup. When you have the application (reports and dashboards) designed the way you want it, you can then set up a Data Grid Cache to make it perform better. Because the data transformation is virtual, it is much faster to create and much easier to change. The tools mentioned in the paper automate the ETL work based on the metadata, before reports and dashboards are created. We automate the ETL work after the reports and dashboards are created, and only if desired.

Apparently some of these tools will also automatically generate reports based on the data. Again, is the saved effort worth anything if the finished product is not what the users want? By providing our web-based drag-and-drop tools for both interactive data visualization and publishing-quality reports, we help our customers create exactly what they want, and more efficiently than traditional tools.

Metadata-generated BI is the same old-world monolith, automated. It eliminates much of the work, and unfortunately much of the intelligence, from the process. The BI world needs better tools for skilled humans, not the same tools in the hands of robots.

Wednesday, April 28, 2010

Data Visualization in a Mashed-Up World

I've been invited to participate in a DMRadio round table about visualization and mashup. If you're interested, please visit this site to register.

Wednesday, February 17, 2010

US Job Losses

There's plenty of political posturing about the current recession and whether the American Recovery and Reinvestment Act (Obama's stimulus bill) helped. I never rely on conjecture, so I went in search of facts. These were easy enough to find from the Bureau of Labor Statistics. Then, instead of staring at raw unemployment numbers, I subtracted consecutive values to see the month-to-month differential of the job losses (in other words, the rate of change, or velocity). Out of curiosity, I also gathered the numbers for men and women. I plotted these on a chart, added an annotation for when the stimulus bill was passed, and then added trend lines (using least-squares) to smooth out the jaggedness. Here's the resulting graph, which you can click on to see the code for:
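The differencing and trend-line steps are easy to sketch in Python. The employment figures below are illustrative stand-ins, not the actual BLS series:

```python
# Month-over-month change in employment: the "velocity" of job losses.
# The figures below are illustrative stand-ins, not the actual BLS data.
employment = [135000, 134400, 133700, 133100, 132700, 132500]  # thousands
monthly_change = [b - a for a, b in zip(employment, employment[1:])]

# Least-squares trend line y = slope * x + intercept over the changes,
# to smooth out the jaggedness.
n = len(monthly_change)
xs = list(range(n))
mean_x = sum(xs) / n
mean_y = sum(monthly_change) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_change)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# A positive slope means the monthly losses are shrinking over time.
```

With these stand-in numbers the slope comes out positive, which is exactly the kind of change in velocity the annotation on the chart is meant to highlight.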


Monday, January 25, 2010

Wal-Mart Stores Over Time

Here's the latest visualization I created, showing the growth of Wal-Mart over the years. What's really interesting is the local focus for the first 20 years.


Monday, December 7, 2009

NYC District Data Analysis

The City of New York is running a contest to develop the best data visualization application using public data from various NYC departments. This is to help promote http://www.nyc.gov/data

The easiest way to participate is to build a dashboard using Visualize Free, and then submit it on http://www.nycbigapps.com/. The deadline is tomorrow.


Friday, November 20, 2009

Black Friday 2009 Offer Browser

Using Visualize Free I created an interactive Flash tool to slice and dice all the best shopping deals for this year's post-Thanksgiving buying craze.

The Black Friday dashboard allows you to filter by store, brand, category, price, discount, doorbuster, and more. It currently has over 7000 special deals, and I'm going to be adding more.

Black Friday 2009

Monday, August 3, 2009

Free Visualization

I think it's about time for another shameless plug. However, this time it's for a free service that InetSoft provides.

We just launched a website (http://visualizefree.com) that will allow you to use our software for free, without downloading anything. Better still, you can upload your own data and create your own dashboard!

Once you create your dashboard, you can easily send your friends a link, or embed it in your own web page. Take a look at the simple example I put together for the unemployment rate (if the small iframe below is too cramped, open a new window by clicking this link).

Move the ends of the slider to adjust the time period displayed. For a real kick, click the pencil icon in the upper right corner of the chart to change what fields it displays.



Wednesday, July 29, 2009

Forrester: Mighty Mashups

James Kobielus, an analyst at Forrester Research, recently published Mighty Mashups: Do-It-Yourself Business Intelligence For The New Economy.

Jim does a pretty good job covering the inefficiencies and bottlenecks that mashup can address, and he touches on the same points as Business Case for Data Mashup. Jim then goes on to list the "principal data mashup functions", including source-virtualization, publishing, and collaboration. At this point, I'm onboard and agree with everything he's saying.

Then, as Forrester is wont to do, they make sure to emphasize governance and the role that IT plays in securing the component data sources in a mashup environment. I'm still with him, and this got me thinking again about the relationship between IT and self-service.

Next, Jim lays out the Maturity Model, and I'm impressed. His characterization of the 4 levels of BI mashup makes a lot of sense, and I'm thinking that this paper is building up to a ringing endorsement of InetSoft as the only vendor who enables the whole spectrum.

Okay, let's calm down and take a step back. Kobielus explains that BI mashup isn't a panacea, and it may not fit your organization if your employees' personalities don't mesh with the self-service culture. I buy that, because I've seen it firsthand.

Then, strangely, Jim says, "Evaluate BI solutions for their integration of in-memory OLAP engines, semantic virtualization, EII, and other core mashup features." And there's a table with 4 columns: in-memory BI client; data virtualization/EII; interactive browser-based visualization; and automated source discovery. Where is this coming from?

I'm happy that InetSoft was one of only 6 vendors chosen to occupy this table (13 were interviewed), but the first and last criteria aren't on the same level as the middle two. In-memory data processing is nice (we also leverage this technique), and I'm sure that if automated source discovery works it can save a little upfront setup time, but I don't understand why these are given equal weight with the two halves of BI mashup (data mashup, and presentation mashup). What happened to governance and collaboration? I think these would be more appropriate as high level criteria.

Jim ends the paper on a high note by detailing a few best practices for organizations considering adding self-service capabilities: enforce governance; make a culture-shift; consider different roles; and enable collaboration.

Overall, it's a good paper, and validates many of the things that I've been saying for a while. I'm going to dismiss the minor digression, and focus on the overall lesson: BI mashup means providing maximal self-service, which is the fastest way to see bottom-line benefits from business intelligence.

Wednesday, June 10, 2009

BI Mashup Maturity Model

James Kobielus wrote a blog article in anticipation of his upcoming report: "Mighty Mashups: Do-It-Yourself Business Intelligence for the New Economy".

In it he lays out the 4 levels of maturity of BI mashup within the enterprise. My paraphrased and simplified list:
  1. Parameterized reports
  2. Analytic dashboards
  3. Data mashup
  4. Collaboration with governance
It is very encouraging to see Forrester's research and predictions in line with our product strategy, so I'll forgive Jim for not giving InetSoft a shout-out as the one vendor who does enable all 4 of these levels of BI mashup.

Monday, April 6, 2009

Charts and Graphs 2: Dimensional Analysis

When displaying a single variable, you should use a single dimension: a point on an axis. Among multiple points, what is being compared is each point's distance from the origin. To make this explicit, a line can be drawn from the origin to the point. With a linear scale, a line with half the length represents half the value.

To make these lines more visually salient, they are often made into bars. As long as the bars have equal width, the areas of the bars are still in the same proportion as the simple lines.

Many chart engines offer 3D bars for their visual appeal. As long as each rectangular solid has equal depth, the volumes are still in the same proportion as the bars and lines.

The same idea can be used in bubble charts, where a variable is represented by the size of a point. In visual comparison, the relevant size is the area of the point. Some engines mistakenly tie the variable to the radius of the circle, which makes the apparent differences grow as the square of the actual ones. It is hard to tell whether one point has exactly half the area of another, but in terms of visual salience, area works.
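A minimal sketch of area-correct sizing, assuming the chart engine lets you set each bubble's radius directly:

```python
import math

# Scale bubble radii so that circle AREA, not radius, is proportional
# to the value being displayed.
def radius_for_value(value, scale=1.0):
    # area = pi * r^2 = scale * value  =>  r = sqrt(scale * value / pi)
    return math.sqrt(scale * value / math.pi)

def area(radius):
    return math.pi * radius ** 2

r1 = radius_for_value(100)
r2 = radius_for_value(200)

# Doubling the value doubles the area, as it should...
assert abs(area(r2) / area(r1) - 2.0) < 1e-9
# ...whereas doubling the RADIUS instead would quadruple the area.
assert abs(area(2 * r1) / area(r1) - 4.0) < 1e-9
```

The square root is the whole trick: engines that map value to radius skip it, and the resulting bubbles exaggerate every difference.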

Now, look at the bubbles used by Advizor Analyst/X in: Multivariate analysis using parallel coordinates.

The "bubbles" are shaded to look 3D, that is, like spheres. Should we compare the volumes of the spheres? Unfortunately, the volume of a sphere is not linearly proportional to the area of its 2D projection (the circle): V = (4/3)πr³ while A = πr², so V = (4/3)·r·A. The implied volume grows faster than the drawn area, and comparison of relative value is skewed.

Visual appeal in a chart is nice to have, but not at the expense of the information it represents.

Thursday, January 8, 2009

Charts and Graphs 1: Missing Data and Irregular Intervals

Stephen Few wrote about Line Graphs and Irregular Intervals, and the debate rages on.

I think the original graph of postage stamp prices is fine. The x-axis uses regular intervals, and the points demarcate the actual known values. I agree that a step/bar graph might be preferable for some purposes, but if you want to see if the rise in stamp prices is in line with inflation, the line is better. If a step graph is used, the trend line should connect the midpoints of the bars (see my version, which includes a CPI line). In effect this spreads out the changes as though they were more continuous, and the total area under the graphs would be about the same.

However, the example of households with computers and internet access has many problems. The original is missing data, and its categorical x-axis hides the gaps; a continuous scale would make them apparent. Like the first one, it uses both points and lines. The points indicate the known data, and the lines help you to interpolate what the missing values might be. Another problem is that it seems to have a different purpose. The title of this chart says, "In 2003, more than 88% of households owning a computer were online, up 40% from 1997." To arrive at this fact requires dividing the Internet Access number by the Presence of Computer number. Instead, why not just graph this ratio on the chart? That's what I've done in the second chart below. Notice how this also lets you see that a greater percentage of computer households had internet access in 2001 than in 2003.
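The ratio in question is trivial to compute. The figures below are illustrative approximations of the Census numbers, not taken from the original chart:

```python
# Percent of ALL households with a computer and with internet access.
# Illustrative approximations of the Census figures, not the chart's data.
presence_of_computer = {1997: 36.6, 2001: 56.3, 2003: 61.8}
internet_access = {1997: 18.6, 2001: 50.4, 2003: 54.7}

# Share of computer-owning households that are online: the ratio the
# chart title is actually describing.
online_share = {
    year: internet_access[year] / presence_of_computer[year] * 100
    for year in presence_of_computer
}
```

Graphing `online_share` directly is what surfaces the 2001-versus-2003 dip that the original chart buries in two separate series.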

Here's a corrected version that I created using Style Chart. Notice the missing bars and points.

To handle missing data points, there are a few different options:

  • Drop the lines altogether. But it is easier to read the slope of a drawn line than to mentally connect two points and judge the slope yourself.

  • Don't draw a line across missing values. This is okay when you have one line, but it is hard to read when there are multiple lines.

  • Draw a different connector, e.g. a dotted line. This helps in slope analysis while making it very clear that there is missing data that has been interpolated.

  • Draw both points and lines. This is the most common and, in my opinion, the most intuitive. Line graphs always do some amount of interpolation; otherwise it's really just a point graph.
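The interpolation that a connecting line implies can be made explicit. The 1998/2000 values here are hypothetical:

```python
# Linear interpolation across a gap: this is what the connecting line in
# a line graph implicitly does for a missing year.
def interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical: values known for 1998 and 2000, 1999 missing from the survey.
y_1999 = interpolate(1998, 42.1, 2000, 51.0, 1999)  # midpoint of the gap
```

Whichever rendering option you pick, this straight-line estimate is all the chart is really offering for the missing year, which is why the dotted connector's honesty appeals to me.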

Monday, December 15, 2008

Speaking of Data Mashup

I've been invited to give a presentation to the Data Management Association of Minnesota on Data Mashup this Wednesday.

Also, last month I was interviewed by TDWI's Linda Briggs on the topic of Data Mashup. Read the full transcript.

In both of these venues, two of the key points are:

Data Mashup is a compromise between database administrators and Excel jockeys. It allows the flexibility and self-service power of the spreadsheet approach, while maintaining security, integrity, and transparency.

Data Mashup is a complement to a data warehouse, not a replacement. A data warehouse is not a goal, but rather a solution to certain problems. Data Mashup solves some of the same problems, and some different ones. While there is some overlap, both technologies have their place.

Friday, October 10, 2008

Happy Birthday Business Intelligence and Google

Business Intelligence is celebrating 50 years since its conception in a paper by Hans Luhn, and Google turned 10. Upon re-reading A Business Intelligence System from 1958, I see a great many parallels with Google and indications of its future.

The essence of Luhn's idea is a super-librarian who knows the details of all the books and documents in the library, knows the concerns and preferences of all the people with library cards, and plays matchmaker.

The first concept is an auto-abstract, essentially a summary of the document (not unlike the little blurb under a search result). With advances in natural language analysis, search engines like www.cuil.com are focusing even more on content and relevance than on PageRank.

Next, Luhn mentions that after new documents are analyzed, parties who might be interested in them should be notified of their existence. I love Google Alerts, because they help me stay abreast of the latest mentions of my name, my company, and my industry.

Then there is the ability to query the librarian, which is just a search. According to the 50 year old article, the request for information should yield a list of abstracts ordered by relevance to the user. The user can then request the complete documents they choose.

Where will Google go from here?

The system that Luhn defined has profiles of its users that are more abstract and change over time based on feedback. Imagine a search engine that learns that when you refer to "fencing" you mean the sport instead of the building supply, by tracking which results you click. Google can already capture your web history.

Also, the article talks about "internal documents", which are user created. Google owns Blogger, and YouTube, and offers services for creating web sites, documents, and spreadsheets. They could leverage this information to help develop user profiles and even connect users with similar interests.

I don't know if Hans Peter Luhn was reincarnated as Larry Page or Sergey Brin, but Google is pretty close to his vision of A Business Intelligence System.

Wednesday, September 17, 2008

Unique, Like Everyone Else

I gave a presentation at one of our partners' user conferences this morning, and stayed for a panel discussion with some industry experts. Even though the industry in question was enterprise asset management, I heard some familiar comments that I suspect are universal truths.

First was advice about getting executive support for a new project. The answer (no surprise here) was being able to demonstrate ROI. Whether BI, or an equipment maintenance initiative, executives need to see the impact on the bottom line.

Second was talk about metrics. The gist was that measuring key areas of your business, and tracking the results of attempted improvements are incredibly important. Again, because of the success of Tom Davenport's Competing on Analytics, this idea is not coming out of left field.

Third was a focus on management. The manufacturing sector's trend toward outsourcing has had a ripple effect, draining new workers from the maintenance engineering talent pool. The fact is that young people aren't training for this industry, and therefore companies need to do more with less. Software helps to some degree, by squeezing efficiency out of every area, but the real key comes down to leadership. You can provide a great BI tool, but with a weak manager you will not be able to save a failing business.

In summary, it doesn't matter in what vertical you work, the challenges and best practices are the same at the core.

Wednesday, September 10, 2008

Historical Quadrant

I read an enlightened article called Analysts as a lagging indicator of success that explains issues with some of the large analyst firms very clearly.

In a nutshell, companies that have deeper pockets and/or more market share get more coverage. There is nothing inherently wrong with this setup. It's just that people evaluating options for a new project (e.g. a Business Intelligence deployment) should keep this in mind when looking at their research. The most established or historically successful companies are not necessarily the best solutions for the problems you are facing right now.

Small analyst firms tend to be more focused on the real technology innovations, and customer experiences. This information is of more value to a nascent opportunity.

That being said, for a vendor it feels good to be recognized by the big name analysts. It's validation that you did your job well for the past few years.

Saturday, August 9, 2008

Agile Waterfall

When it comes to developing new features, the rigor of requirements falls on a spectrum. At one extreme is the Waterfall model, where development is a one-way street that progresses from requirements to implementation to testing. At the other extreme is Cowboy coding, which leaves programmers to do what they think is best.

Where your development team should fall on this scale depends on how intelligent your developers are and how well they see the big picture. With full requirements, the coders only need to implement what is documented; on the other hand, Linux, Google, Apache, MySQL, and many other projects were the products of cowboy coders. Ideally, every programmer would be a genius with vision, but that's just not the case.

A happy medium is Agile software development, which can apply the structure of Waterfall to short iterations where developers are given more freedom.

Wednesday, August 6, 2008

RIAs 4 B2B & B2C

Shaku Atre wrote an article for DM Review, Does BI Have to be Extroverted, Introverted, or Both?

The main point is that people (consumers and business users) are familiar with Rich Internet Applications, and are coming to expect the power they provide.

She specifically states, "Providing dynamic, interactive access with rich visualization and RIAs, B2C, B2B and B2E applications will require a robust back-end server with comprehensive access to disparate data, scalability to support millions of people, reliability, security features and improved performance to provide all of this in a matter of seconds."

Time and again I am amazed at how well journalists are able to speak on behalf of InetSoft without knowing we exist.

Monday, August 4, 2008

Intelligent Dimensions

I recently read The ‘intelligence’ in Business Intelligence solutions, written by a Sanjay Shah, whom I believe to be the Sanjay Shah who is CEO of Skelta Software, a Business Process Management software and services company.

I'm sure that Mr. Shah was trying to make the case for his firm's consulting services that are able to implement the "intelligent dimensions" that business users need. When I read it, I was thinking in terms of eliminating the middle man and providing these simple data manipulation capabilities directly to the user.

Yes, I'm talking again about End User Data Mashup and I am going to shamelessly describe how my employer's products address the idea of "intelligent dimensions".

The 3 points Sanjay lays out are:
  • Create Intelligent Dimensions by Observation
  • Combine Data from Related Functional Areas
  • Combine Traditionally Different Reports into One
The first is quite simply defining your own grouping. In our product, these are either range columns, simple named groups, or complex named groups. Range columns are just what they sound like: a column that groups ranges of a scalar value. Simple named groups allow you to drop the distinct values of a field into custom categories you define. Complex named groups let you mix these capabilities and go beyond, specifying custom conditions for each bucket.
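In plain Python terms, the three grouping styles look roughly like this (the actual features are configured in Style Intelligence's designer, not written as code, so this is only an illustration of the concepts):

```python
# A rough sketch of the three grouping styles in plain Python. The real
# features are configured in Style Intelligence's designer, not in code.

# Range column: bucket a scalar value into ranges.
def age_range(age):
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

# Simple named group: drop distinct field values into custom categories.
REGION = {"NY": "East", "NJ": "East", "CA": "West", "WA": "West"}

# Complex named group: custom conditions, possibly mixing several
# fields, define each bucket.
def customer_tier(state, revenue):
    if REGION.get(state) == "East" and revenue > 100_000:
        return "key-east"
    return "other"
```

The point is that each style is just a user-defined mapping from raw values to business-meaningful buckets, which is exactly what makes a dimension "intelligent".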

The second is Data Mashup. I keep saying that you don't need sophisticated ETL for the majority of situations that span data sources, so I won't dwell on it here, again.

The third is the idea behind the interactive visualization dashboards you can build with Style Intelligence. You can use the first two points to prepare sophisticated and actionable datasets, and then build a dynamic interface that allows you to slice and dice this data in various intuitive ways.

Thank you, Mr. Shah, for describing how companies can get the most out of business intelligence. I apologize if the use of our product means you see less consulting revenue.

Saturday, August 2, 2008

Innovate on Behalf of the Customer

Product Management is not a new concept, so I spend some time reading up on what others have to say on the topic. As my father often quotes his favorite fortune cookie, "Learn from experience, preferably other people's."

There's a blog on being a good product manager that covers various topics in a no-nonsense way.

Recently, I have been reaching out to customers and talking to them about their experiences with our product, and looking for ways we can improve. The strategy I've been taking is outlined very well in one of Jeff Lash's articles.

Essentially, product features, enhancements and innovations need to be rooted in customer needs, but not a direct implementation of their desires. A product manager has to consider the impact on: development; other customers; and future direction.