Wednesday, April 09, 2008

Bah, Humbug: Let's Not Forget the True Meaning of On-Demand

I was skeptical the other day about the significance of on-demand business intelligence. I still am. But I’ve also been thinking about the related notion of on-demand predictive modeling. True on-demand modeling – which to me means the client sends a pile of data and gets back a scored customer or prospect list – faces the same obstacle as on-demand BI: the need for careful data preparation. Any modeler will tell you that fully automated systems make errors that would be obvious to a knowledgeable human. Call it the Sorcerer’s Apprentice effect.

Indeed, if you Google “on demand predictive model”, you will find just a handful of vendors, including CopperKey, Genalytics and Angoss. None of these provides the generic “data in, scores out” service I have in mind. There are, however, some intriguing similarities among them. Both CopperKey and Genalytics match the input data against national consumer and business databases. Both Angoss and CopperKey offer scoring plug-ins to Salesforce.com. Both Genalytics and Angoss will also build custom models using human experts.

I’ll infer from this that the state of the art simply does not support unsupervised development of generic predictive models. Either you need human supervision, or you need standardized inputs (e.g., Salesforce.com data), or you must supplement the data with known variables (e.g. third-party databases).

Still, I wonder if there is an opportunity. I was playing around recently with a very simple, very robust scoring method a statistician showed me more than twenty years ago. (Sum of Z-scores on binary variables, if you care.) This did a reasonably good job of predicting product ownership in my test data. More to the point, the combined modeling-and-scoring process needed just a couple dozen lines of code in QlikView. It might have been a bit harder in other systems, given how powerful QlikView is. But it’s really quite simple regardless.

The only requirements are that the input contains a single record for each customer and that all variables are coded as 1 or 0. Within those constraints, any kind of inputs are usable and any kind of outcome can be predicted. The output is a score that ranks the records by their likelihood of meeting the target condition.

Now, I’m fully aware that better models can be produced with more data preparation and human experts. But there must be many situations where an approximate ranking would be useful if it could be produced in minutes with no prior investment for a couple hundred dollars. That's exactly what this approach makes possible: since the process is fully automated, the incremental cost is basically zero. Pricing would only need to cover overhead, marketing and customer support.

The closest analogy I can think of among existing products are on-demand customer data integration sites. These also take customer lists submitted over the Internet and automatically return enhanced versions – in their case, IDs that link duplicates, postal coding, and sometimes third-party demograhpics and other information. Come to think of it, similar services perform on-line credit checks. Those have proven to be viable businesses, so the fundamental idea is not crazy.

Whether on-demand generic scoring is also viable, I can’t really say. It’s not a business I am likely to pursue. But I think it illustrates that on-demand systems can provide value by letting customers do things with no prior setup. Many software-as-a-service vendors stress other advantages: lower cost of ownership, lack of capital investment, easy remote access, openness to external integration, and so on. These are all important in particular situations. But I’d also like to see vendors explore the niches where “no prior setup and no setup cost” offers the greatest value of all.

No comments: