Monday, December 7, 2009

NYC District Data Analysis

The City of New York is running a contest to develop the best data visualization application using public data from various NYC departments. This is to help promote

The easiest way to participate is to build a dashboard using Visualize Free, and then submit it on The deadline is tomorrow.

Friday, November 20, 2009

Black Friday 2009 Offer Browser

Using Visualize Free, I created an interactive Flash tool to slice and dice all the best shopping deals for this year's post-Thanksgiving buying craze.

The Black Friday dashboard allows you to filter by store, brand, category, price, discount, doorbuster, and more. It currently has over 7000 special deals, and I'm going to be adding more.

Black Friday 2009

Monday, August 3, 2009

Free Visualization

I think it's about time for another shameless plug. However, this time it's for a free service that InetSoft provides.

We just launched a website that will allow you to use our software for free, without downloading anything. Better still, you can upload your own data and create your own dashboard!

Once you create your dashboard, you can easily send your friends a link, or embed it in your own web page. Take a look at the simple example I put together for the unemployment rate (if the small iframe below is too cramped, open a new window by clicking this link).

Move the ends of the slider to adjust the time period displayed. For a real kick, click the pencil icon in the upper right corner of the chart to change what fields it displays.

Wednesday, July 29, 2009

Forrester: Mighty Mashups

James Kobielus, an analyst at Forrester Research, recently published Mighty Mashups: Do-It-Yourself Business Intelligence For The New Economy.

Jim does a pretty good job covering the inefficiencies and bottlenecks that mashups can address, and he touches on the same points as Business Case for Data Mashup. Jim then goes on to list the "principal data mashup functions", including source virtualization, publishing, and collaboration. At this point, I'm on board and agree with everything he's saying.

Then, as Forrester is wont to do, they make sure to emphasize governance and the role that IT plays in securing the component data sources in a mashup environment. I'm still with him, and this got me thinking again about the relationship between IT and self-service.

Next, Jim lays out the Maturity Model, and I'm impressed. His characterization of the 4 levels of BI mashup makes a lot of sense, and I'm thinking that this paper is building up to a ringing endorsement of InetSoft as being the only vendor who enables the whole spectrum.

Okay, let's calm down and take a step back. Kobielus explains that BI mashup isn't a panacea, and it may not fit your organization if your employees' personalities don't mesh with the self-service culture. I buy that, because I've seen it firsthand.

Then, strangely, Jim says, "Evaluate BI solutions for their integration of in-memory OLAP engines, semantic virtualization, EII, and other core mashup features." And there's a table with 4 columns: in-memory BI client; data virtualization/EII; interactive browser-based visualization; and automated source discovery. Where is this coming from?

I'm happy that InetSoft was one of only 6 vendors chosen to occupy this table (13 were interviewed), but the first and last criteria aren't on the same level as the middle two. In-memory data processing is nice (we also leverage this technique), and I'm sure that if automated source discovery works it can save a little upfront setup time, but I don't understand why these are given equal weight with the two halves of BI mashup (data mashup, and presentation mashup). What happened to governance and collaboration? I think these would be more appropriate as high level criteria.

Jim ends the paper on a high note by detailing a few best practices for organizations considering adding self-service capabilities: enforce governance; make a culture-shift; consider different roles; and enable collaboration.

Overall, it's a good paper, and validates many of the things that I've been saying for a while. I'm going to dismiss the minor digression, and focus on the overall lesson: BI mashup means providing maximal self-service, which is the fastest way to see bottom-line benefits from business intelligence.

Wednesday, June 10, 2009

BI Mashup Maturity Model

James Kobielus wrote a blog article in anticipation of his upcoming report: "Mighty Mashups: Do-It-Yourself Business Intelligence for the New Economy".

In it he lays out the 4 levels of maturity of BI mashup within the enterprise. My paraphrased and simplified list:
  1. Parameterized reports
  2. Analytic dashboards
  3. Data mashup
  4. Collaboration with governance
It is very encouraging to see Forrester's research and predictions in line with our product strategy, so I'll forgive Jim for not giving InetSoft a shout-out as the one vendor who does enable all 4 of these levels of BI mashup.

Monday, April 6, 2009

Charts and Graphs 2: Dimensional Analysis

When displaying a single variable, you should use a single dimension. That is, a point on an axis. Among multiple points, the distance of the point from the origin is what is being compared. To make this explicit, a line can be drawn from the origin to the point. With a linear scale, a line with half the length represents half the value.

To make these lines more visually salient, they are often made into bars. As long as the bars have equal width, the areas of the bars are still in the same proportion as the simple lines.

Many chart engines offer 3D bars for their visual appeal. Since each rectangular solid has equal depth, the volumes are in the same proportion as the bars and lines.

The same idea can be used in bubble charts, where a variable is represented by the size of the point. For visual comparison, this variable should be mapped to the area of the point. Some engines mistakenly tie the variable to the radius of the circle, which makes a value twice as large look four times as big. It is hard to tell if a point is exactly half the area of another point, but in terms of visual salience, area works.
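A minimal Python sketch of area-based bubble sizing, assuming values are non-negative and the largest value is drawn with a chosen maximum radius. The function names `bubble_radius` and `naive_radius` are hypothetical, not taken from any chart engine. Because area grows with the square of the radius, the radius has to scale with the square root of the value:

```python
import math


def bubble_radius(value: float, max_value: float, max_radius: float) -> float:
    """Radius such that the circle's AREA is proportional to value.

    The largest value gets max_radius; since area ~ r^2, r ~ sqrt(value).
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


def naive_radius(value: float, max_value: float, max_radius: float) -> float:
    """The mistaken mapping: RADIUS (not area) proportional to value."""
    return max_radius * value / max_value


def area(r: float) -> float:
    """Area of a circle with radius r."""
    return math.pi * r * r


if __name__ == "__main__":
    big, small = 100.0, 50.0  # small is half of big
    for name, f in (("area-scaled", bubble_radius), ("radius-scaled", naive_radius)):
        ratio = area(f(small, big, 20.0)) / area(f(big, big, 20.0))
        print(f"{name}: small/big area ratio = {ratio:.2f}")
```

Running it prints an area ratio of 0.50 for the area-scaled bubbles and 0.25 for the radius-scaled ones, so the radius mapping makes half the value look like a quarter.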

Now, look at the bubbles used by Advizor Analyst/X in: Multivariate analysis using parallel coordinates.

The "bubbles" are shaded to look 3D, that is, like spheres. Should we compare the volumes of the spheres? Unfortunately, the volume of a sphere is not linearly proportional to the area of its 2D projection (a circle). In fact, the volume of a sphere is 4r/3 times the area of its circle (V = 4/3·πr³ = 4r/3 · πr²). Because that factor grows with the radius, comparisons of relative value are skewed.
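As a rough check of the skew, here is a small Python calculation. It is a sketch that assumes a reader judges the shaded bubbles by volume rather than by circle area. Two bubbles whose circles differ in area by a factor of 2 differ in volume by 2^1.5:

```python
import math


def circle_area(r: float) -> float:
    return math.pi * r ** 2


def sphere_volume(r: float) -> float:
    return 4.0 / 3.0 * math.pi * r ** 3


# Two bubbles whose circle areas encode values in a 2:1 ratio.
r_small = 1.0
r_big = math.sqrt(2.0) * r_small  # doubling the area multiplies r by sqrt(2)

area_ratio = circle_area(r_big) / circle_area(r_small)
volume_ratio = sphere_volume(r_big) / sphere_volume(r_small)
print(f"area ratio {area_ratio:.2f}, volume ratio {volume_ratio:.2f}")
```

It prints `area ratio 2.00, volume ratio 2.83`: a value twice as large would read as almost three times as large if the spheres' volumes were compared.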

Visual appeal in a chart is nice to have, but not at the expense of the information it represents.

Thursday, January 8, 2009

Charts and Graphs 1: Missing Data and Irregular Intervals

Stephen Few wrote about Line Graphs and Irregular Intervals, and the debate rages on.

I think the original graph of postage stamp prices is fine. The x-axis uses regular intervals, and the points demarcate the actual known values. I agree that a step/bar graph might be preferable for some purposes, but if you want to see if the rise in stamp prices is in line with inflation, the line is better. If a step graph is used, the trend line should connect the midpoints of the bars (see my version, which includes a CPI line). In effect this spreads out the changes as though they were more continuous, and the total area under the graphs would be about the same.
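The midpoint trend line for a step graph can be sketched in Python. This assumes each price takes effect at a given x position (for example, a year) and holds until the next change, with the last price holding until a chosen end point. `step_midpoints` is a hypothetical helper, and the numbers in the usage line are toy values, not actual postage rates:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def step_midpoints(changes: List[Point], end_x: float) -> List[Point]:
    """Return the midpoint of each step in a step graph.

    changes: (x, value) pairs; each value takes effect at x and holds until
    the next change. The last value is assumed to hold until end_x.
    Connecting the returned points gives the trend line.
    """
    if not changes:
        return []
    xs = [x for x, _ in changes] + [end_x]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("change points must be strictly increasing and before end_x")
    return [((xs[i] + xs[i + 1]) / 2.0, y) for i, (_, y) in enumerate(changes)]
```

For example, `step_midpoints([(0, 10), (2, 12), (5, 15)], 9)` returns `[(1.0, 10), (3.5, 12), (7.0, 15)]`; a line through those points spreads each price change across the interval it covers.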

However, the example of households with computers and internet access has many problems. The original is missing data, but the x-axis it uses is categorical. Instead, it should use a continuous scale so that the gaps are apparent. Like the first one, it uses both points and lines. The points indicate the known data, and the lines help you to interpolate what the missing values might be. Another problem with it is that it seems to have a different purpose. The title of this chart says, "In 2003, more than 88% of households owning a computer were online, up 40% from 1997." Arriving at this fact requires dividing the Internet Access number by the Presence of Computer number. Instead, why not just graph this ratio on the chart? That's what I've done in the second chart below. Notice how this allows you to also see that a greater percentage of computer households had internet access in 2001 than in 2003.
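Computing the ratio directly, while keeping a continuous year axis so that gaps stay visible, might look like this minimal Python sketch. It assumes both series are percentages keyed by year; the function name is hypothetical, and the usage numbers are placeholders, not the figures from the original chart:

```python
from typing import Dict, List, Optional, Tuple


def internet_share_of_computer_households(
    computer: Dict[int, float], internet: Dict[int, float]
) -> List[Tuple[int, Optional[float]]]:
    """Ratio internet / computer for every year on a continuous axis.

    Years missing from either series get None, so the gap stays visible
    instead of being collapsed the way a categorical axis would collapse it.
    """
    years = sorted(set(computer) | set(internet))
    if not years:
        return []
    out: List[Tuple[int, Optional[float]]] = []
    for year in range(years[0], years[-1] + 1):
        c, i = computer.get(year), internet.get(year)
        # Only divide when both values are known and the denominator is non-zero.
        out.append((year, i / c if c and i is not None else None))
    return out
```

For example, `internet_share_of_computer_households({1997: 40.0, 2000: 50.0}, {1997: 20.0, 2000: 40.0})` returns `[(1997, 0.5), (1998, None), (1999, None), (2000, 0.8)]`, and the two `None` years become a visible gap on the chart.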

Here's a corrected version that I created using Style Chart. Notice the missing bars and points.

To handle missing data points, there are a few different options:

  • Drop the lines altogether. But it is easier to judge the slope of a drawn line than the slope between two points that you have to connect mentally.

  • Don't draw a line where values are missing. This is okay when you have one line, but it is hard to read when there are several.

  • Draw a different connector, e.g. a dotted line. This still helps with slope analysis, while making it very clear that missing data has been interpolated.

  • Draw points and lines. This is the most common option and, in my opinion, the most intuitive. Line graphs always involve some amount of interpolation; otherwise they would really be point graphs.
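The second and third options above can both be derived from one pass over the data. Here is a minimal Python sketch, assuming the series is sorted by x and uses `None` for missing values; `split_for_missing` is a hypothetical helper, and the actual drawing is left to whatever chart library you use:

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def split_for_missing(
    series: List[Tuple[float, Optional[float]]]
) -> Tuple[List[List[Point]], List[Tuple[Point, Point]]]:
    """Split an x-sorted series (None = missing) into drawable pieces.

    Returns:
      solid   - runs of consecutive known points, drawn as normal lines
                (a run of one point is just a marker);
      bridges - pairs of known points on either side of each gap, drawn
                dotted to flag interpolation. Skip them to leave gaps open.
    """
    solid: List[List[Point]] = []
    bridges: List[Tuple[Point, Point]] = []
    run: List[Point] = []
    last_known: Optional[Point] = None
    in_gap = False
    for x, y in series:
        if y is None:
            if run:  # close the current solid run at the gap
                solid.append(run)
                run = []
            in_gap = True
            continue
        p = (x, y)
        if in_gap and last_known is not None:
            bridges.append((last_known, p))  # connector across the gap
        run.append(p)
        last_known = p
        in_gap = False
    if run:
        solid.append(run)
    return solid, bridges
```

For `[(0, 1), (1, None), (2, 3), (3, 4), (4, None), (5, None), (6, 2)]` this yields solid runs `[[(0, 1)], [(2, 3), (3, 4)], [(6, 2)]]` and bridges `[((0, 1), (2, 3)), ((3, 4), (6, 2))]`. Drawing markers at every known point on top of both gives the points-and-lines option.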