Veterans Health Administration: EMR Foundation for Data-Mining Gains

For an industry driven by advanced knowledge and technological innovation, American health care is shockingly behind the curve on adoption of information technology. Only 1.5% of U.S. hospitals have adopted comprehensive electronic medical records systems (Jha et al., 2009). As of 2006, only 20% of U.S. hospitals had implemented electronic medical records (Arnst, 2006). The U.S. lags behind several OECD countries in per capita spending on health IT (eHealth101, 2006) and is perhaps more than a decade behind international leaders in health IT (Anderson et al., 2006). Without serious investment in health IT, most American hospitals can’t take advantage of data mining.

An exception to this absence of data-mining capability is found in the Veterans Health Administration. The VA began developing the nation’s first functioning electronic medical record system in the late 1970s (Longman, 2009) and had computerized medical records in all of its approximately 1300 facilities by 2000 (Arnst, 2006). VA hospitals using VistA (Veterans Health Information Systems and Technology Architecture) constitute nearly half of the hospitals in the U.S. that have implemented comprehensive electronic medical records (Jha et al., 2009). With VistA, the VA has become the “unlikely leader” (Rundle, 2001) in maintaining electronic records that can be mined for insights that produce significant improvements in care and cost efficiency.

The VA has used data mining to improve practices in a number of ways. VA researchers have mined VistA data to target rewards for surgical teams that beat quality and safety benchmarks (and to identify underperforming surgical teams) and have sifted through 12,000 medical records to evaluate and improve treatments for diabetes (Longman, 2009). The VA’s Center for Imaging of Neurodegenerative Diseases has used Weka to apply Random Forest and Support Vector Machine algorithms to brain imaging studies (Young, 2009). VA data mining also helped discover the link between the arthritis medication Vioxx and heart attacks (Longman, 2009).

One obstacle to optimal data mining in VistA is the diversity of local data dictionaries. Local users can customize data dictionaries to meet unique local needs. That flexibility is a significant part of the system’s success (Brown et al., 2003). However, those different data dictionaries complicate efforts to combine and analyze data across the nationwide system. The VA’s efforts to create national standard dictionaries to translate local dictionaries support not only better immediate transactions such as e-prescribing (Brown et al., 2003) but also improved large-scale data mining. The VA’s system has been sufficiently successful that other government hospitals in the U.S. and abroad are adopting and adapting VistA for their facilities (Longman, 2009).
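To make the dictionary problem concrete, here is a minimal sketch of translating local terms to a shared national code before counting records across sites. The site names, local terms, and national codes are all invented for illustration; none of them come from VistA or from Brown et al.

```python
from collections import Counter

# Hypothetical local-to-national mappings. Every site name, local term,
# and national code here is made up for this example.
LOCAL_TO_NATIONAL = {
    "site_a": {"ASA 81": "ASPIRIN_81MG_TAB", "METF 500": "METFORMIN_500MG_TAB"},
    "site_b": {"ASPIRIN LOW": "ASPIRIN_81MG_TAB", "GLUCOPH 500": "METFORMIN_500MG_TAB"},
}

def normalize(site, local_term):
    """Translate a site's local term to a national code, or None if unmapped."""
    return LOCAL_TO_NATIONAL.get(site, {}).get(local_term)

def count_national(records):
    """Count records by national code and collect unmapped entries for review."""
    counts = Counter()
    unmapped = []
    for site, term in records:
        code = normalize(site, term)
        if code is None:
            unmapped.append((site, term))
        else:
            counts[code] += 1
    return counts, unmapped

records = [
    ("site_a", "ASA 81"),
    ("site_b", "ASPIRIN LOW"),
    ("site_b", "GLUCOPH 500"),
    ("site_a", "XYZ"),  # a local term with no national mapping yet
]
counts, unmapped = count_national(records)
```

Without the translation step, "ASA 81" and "ASPIRIN LOW" would be counted as two different drugs. Keeping the unmapped terms in a separate list, rather than dropping them, shows where a national dictionary still has gaps.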


Anderson, G. F., Forgner, B. K., Johns, R. A., & Reinhardt, U. E. (2006). Health Care Spending and Use of Information Technology in OECD Countries. Health Affairs, 25(3), 819–831.

Arnst, C. (2006, July 17). The Best Medical Care in the U.S. BusinessWeek. Retrieved August 1, 2009, from

Brown, S. H., Lincoln, M. J., Groen, P. J., & Kolodner, R. M. (2003). VistA—U.S. Department of Veterans Affairs National-Scale HIS. International Journal of Medical Informatics, 69(2–3), 135–156.

eHealth 101: Electronic Medical Records Reduce Costs, Improve Care, and Save Lives. (2006). American Electronics Association. Retrieved August 1, 2009, from

Jha, A. K., DesRoches, C. M., Campbell, E. G., Donelan, K., Rao, S. R., Ferris, T. G., et al. (2009). Use of Electronic Health Records in U.S. Hospitals. New England Journal of Medicine, 360(16), 1628–1638. doi: 10.1056/NEJMsa0900592.

Longman, P. (2009, August). Code Red: How Software Companies Could Screw up Obama’s Health Care Reform. Washington Monthly. Retrieved August 1, 2009, from

Rundle, R. (2001, December 10). In the Drive to Mine Medical Data, VHA Is the Unlikely Leader. Wall Street Journal, New York, p. 1.

Young, K. (2009). Diagnostic Data Mining for Multi-modal Brain Image Studies. Veterans Health Administration Center for Imaging of Neurodegenerative Diseases. Retrieved August 2, 2009, from

[part 2 of an assignment for INFS 762]

Yahoo–Microsoft: “Scale Drives Knowledge”

A fundamental tenet of data mining is that “Data mining becomes more useful as the amount of data and variables stored by an organization increases” (Groth, 2000, p. 4). Microsoft CEO Steve Ballmer puts it more succinctly: “Scale drives knowledge” (Lohr, 2009a). Taking advantage of that principle of data mining is a big part of what the new advertising and search partnership between Microsoft and Yahoo is all about.

Microsoft is giving Yahoo a remarkable 88% share of the revenue from search-generated ads; in return, Yahoo implements Microsoft’s Bing as its search engine and gives Microsoft access to a big new chunk of search data (Lohr, 2009b). Both firms get access to a larger dataset to help them improve the targeting of online ads. The problem resembles the recommendation challenge Netflix tackles with Cinematch. Search engines pair ads with search results, hoping that users searching for particular words and phrases will be interested in clicking on ads for products related to that language (Lohr, 2009a). With more data, Microsoft and Yahoo can identify more, and more subtle, relationships between searches and ad clicks, tailor online ads to suit finer fragments of the market, and set more profitable advertising rates for a wider range of advertisers.
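One way to see why more data exposes subtler relationships is the statistics of click-through rates. The sketch below is my own illustration, not anything Microsoft or Yahoo has described: it treats each ad impression as an independent click-or-no-click trial, and the rates used are invented.

```python
import math

def ctr_standard_error(ctr, impressions):
    """Standard error of an observed click-through rate (binomial proportion)."""
    return math.sqrt(ctr * (1.0 - ctr) / impressions)

def impressions_needed(ctr_a, ctr_b, z=1.96):
    """Rough impressions per ad needed before two click-through rates
    differ by more than z standard errors of their difference."""
    variance = ctr_a * (1.0 - ctr_a) + ctr_b * (1.0 - ctr_b)
    return math.ceil(z ** 2 * variance / (ctr_a - ctr_b) ** 2)

# Hypothetical rates: quadrupling the impressions halves the uncertainty.
se_small = ctr_standard_error(0.02, 1000)
se_large = ctr_standard_error(0.02, 4000)

# A subtle difference (2.0% vs. 2.2%) needs far more impressions to detect
# than a coarse one (2.0% vs. 3.0%).
n_subtle = impressions_needed(0.020, 0.022)
n_coarse = impressions_needed(0.020, 0.030)
```

Under these assumptions the impressions required grow with the inverse square of the gap between two click-through rates, so the finest distinctions between ads are visible only to whoever has the most traffic.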

This combination of the search market’s number 2, Yahoo, and number 3, Microsoft, still doesn’t come close to outpacing number 1, Google: Microsoft brings 8% of the U.S. search market to the deal and Yahoo 20%, for a combined 28%, while Google has 65% (McDougall, 2009). But Microsoft and Yahoo have decided that, to stand any chance of seriously challenging Google’s dominance, only a combination of their own sizable data resources can provide the foundation for data-mining improvements that will draw more search customers and their ad-click revenue away from Google. Those customers are as valuable as their data to the business model, as Ballmer seeks to take advantage of network effects, the increased value of online technology as more people use it (Lohr, 2009a). The partnership also frees up resources at Yahoo to invest in other data-mining initiatives. Two days after the Microsoft-Yahoo deal was inked, the head of Yahoo Labs aired a proposal to develop a real-time search capability based on mining the contents of “live” activity like Twitter comments for topical, demographic, and geographic information. Such real-time data mining could, for example, map Twitter activity within neighborhoods affected by an earthquake (Oreskovic, 2009).

It is impossible to predict whether the augmented data-mining capability made possible by this partnership will produce the competitive advantage Microsoft and Yahoo seek. Simply having more data in one’s hands doesn’t guarantee that a company will be able to execute. In this case we are talking about a partnership between one company that managed to lose its spot at the top of the Internet search industry and another that was slow to come to the Internet party and still can’t spend or invent its way to dominance there. Still, this combination of resources will give both Yahoo and Microsoft more data to strengthen their mining algorithms and improve the services they offer their users online.

Update: AP tech writer Jessica Mintz offers some reasonable doubt on whether getting a bigger dataset will really make that much of a difference:

“They have lots of scale. They have lots of traffic. Even being the third-place player, they have huge amounts of data to understand their own relevancy,” said Danny Sullivan, editor of the search news site “I just don’t know why they keep putting that argument out.”


Groth, R. (2000). Data Mining: Building Competitive Advantage. Upper Saddle River, NJ: Prentice-Hall.

Lohr, S. (2009a, July 30). Behind Microsoft-Yahoo: The Online Economics of Scale. New York Times: Bits. Retrieved July 31, 2009, from

Lohr, S. (2009b, July 29). Microsoft and Yahoo, in Agreement on Search, Face Uncertain Reach Search Agreement. New York Times, B1. Retrieved July 31, 2009, from

McDougall, P. (2009, July 29). Microsoft, Yahoo Deal Fraught With Risk. InformationWeek. Retrieved August 2, 2009, from

Oreskovic, A. (2009, July 31). Yahoo Labs Chief Sees Real-time Search Opportunity. Reuters News. Retrieved August 2, 2009, from

[Part 1 of an assignment for INFS 762]

If Netflix did nothing more with IT than process online movie orders, they would likely still have gained significant competitive advantage against Blockbuster, Mr. Movies, and other brick-and-mortar movie vendors. I can go online, select from a 100,000+ DVD library that offers more variety than any physical store can, and get what I want by mail in two days. (I could also watch over 12,000 of those videos instantly online… if I had a slightly faster Internet connection!) They charge no late fees, a move that drove Blockbuster to ditch most late fees in 2005 and lose $400 million (Mullaney, 2006).

But Netflix has also made good use of data mining to enhance its competitive advantage. Its Cinematch recommendation engine analyzes customer rental patterns and movie ratings to help the company recommend new rentals. The system also helps Netflix make smart investments in a wider range of films. Mullaney (2006) offers one simple example: Netflix used rental patterns of the film City of God, set in Rio, and the documentary Born into Brothels to predict expected rentals and determine a reasonable fee to pay for DVD rights to Favela Rising, a documentary about musicians in Rio. Mullaney points out this sort of analysis opens the door for more independent filmmakers, as Netflix can identify more niche film markets and expand distribution for smaller-budget films without spending too much. Netflix is thus able to build its business model on “backlist” films comprising 70% of its rentals, compared to traditional video stores, where backlist films make up just 20% of rentals (Thompson, 2008). Increasing demand for lesser-known films reduces demand for big-studio blockbusters, which in turn saves Netflix money, as revenue-sharing agreements with the big studios take a bigger bite out of Netflix’s take (O’Brien, 2002).

Netflix has also been able to discover connections in movie preferences to guide its movie recommendations, from the seemingly obvious overlap between customers who like The Patriot and Pearl Harbor to more curious affinities among rentals of The Patriot, Pay It Forward, and I, Robot (Thompson, 2008). Netflix considers its recommendation system crucial to its business. The company didn’t have any such system when it opened in 1997 and didn’t feel it needed one. But as the library expanded beyond the original 1000-title collection, Netflix realized customers needed help to find films they would like. “‘I think that once you get beyond 1,000 choices, a recommendation system becomes critical,’ [said Reed] Hastings, the Netflix C.E.O…. ‘People have limited cognitive time they want to spend on picking a movie’” (Thompson, 2008).
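A simple way to surface this kind of overlap is to compare the sets of customers who rented each pair of titles. The sketch below uses Jaccard similarity on invented rental histories; it is only an illustration of the idea, not a description of how Cinematch actually works.

```python
from itertools import combinations

# Hypothetical rental histories (customer id -> titles rented), invented here.
RENTALS = {
    "c1": {"The Patriot", "Pearl Harbor", "Pay It Forward"},
    "c2": {"The Patriot", "Pearl Harbor"},
    "c3": {"The Patriot", "I, Robot"},
    "c4": {"Pay It Forward", "I, Robot"},
}

def renters_by_title(rentals):
    """Invert the histories: title -> set of customers who rented it."""
    index = {}
    for customer, titles in rentals.items():
        for title in titles:
            index.setdefault(title, set()).add(customer)
    return index

def jaccard(a, b):
    """Shared renters divided by all renters of either title."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(rentals):
    """Score every pair of titles and return them, most similar first."""
    index = renters_by_title(rentals)
    scores = {
        (x, y): jaccard(index[x], index[y])
        for x, y in combinations(sorted(index), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranked = similar_pairs(RENTALS)
top_pair, top_score = ranked[0]
```

In this toy data the strongest pairing is the "obvious" one, The Patriot with Pearl Harbor. The curious associations only emerge with far more customers and titles than four of each.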

The recommendation system also keeps people subscribing and buying movies. Cinematch provides sufficiently valuable results that in October 2006, when Netflix found it was having difficulty improving the performance of its data-mining algorithms, it announced the Netflix Prize: $1 million for the first developer who could improve the system’s performance by 10% (Thompson, 2008). The contest drew over 44,000 submissions, including a flurry during the one-month contest-ending race that contest rules triggered at the end of June 2009, when the first team reached the 10% threshold (Lohr, 2009c). Teams were able to achieve significant gains through mathematical algorithms like singular value decomposition (Thompson, 2008). And Netflix was able to take advantage of the collective inventiveness of nearly 5,000 participants to improve its data-mining algorithms for a price tag that might have covered the full-time salaries of eight entry-level developers over the same time period.
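For a feel of what the singular value decomposition approach involves, here is a toy sketch. It is not a textbook SVD and not any team's actual entry: it fits a small regularized matrix factorization by stochastic gradient descent, a technique often loosely called "SVD" in the contest context. Each user and each movie gets a short vector of learned factors, and the predicted rating is the overall mean plus their dot product. The ratings, learning rate, and other settings are invented, and performance is measured by root-mean-square error (RMSE), the measure the Netflix Prize used.

```python
import math
import random

# Toy (user, movie, rating) triples, invented for illustration: users 0-1
# favor movies 0-1, while users 2-3 favor movies 2-3.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]

def rmse(ratings, predict):
    """Root-mean-square error of predict(user, movie) over known ratings."""
    total = sum((r - predict(u, m)) ** 2 for u, m, r in ratings)
    return math.sqrt(total / len(ratings))

def train(ratings, n_users, n_movies, k=2, lr=0.01, reg=0.02,
          epochs=1000, seed=0):
    """Learn k latent factors per user and per movie by gradient descent."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    user_f = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    movie_f = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_movies)]

    def predict(u, m):
        return mean + sum(a * b for a, b in zip(user_f[u], movie_f[m]))

    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - predict(u, m)
            for f in range(k):
                uf, mf = user_f[u][f], movie_f[m][f]
                # Move both factors to shrink the error, with a small
                # penalty (reg) that keeps the factors from growing large.
                user_f[u][f] += lr * (err * mf - reg * uf)
                movie_f[m][f] += lr * (err * uf - reg * mf)
    return mean, predict

mean, predict = train(RATINGS, n_users=4, n_movies=4)
baseline_rmse = rmse(RATINGS, lambda u, m: mean)  # always guess the average
model_rmse = rmse(RATINGS, predict)
improvement = 1 - model_rmse / baseline_rmse
```

This toy measures error on the same ratings it learned from, which flatters the model. The contest scored entries on ratings withheld from contestants, which is a much harder test.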

There are still quirks of human behavior that keep data-mining methods from completely explaining movie preferences. However, as Mullaney (2006) puts it, Cinematch is able to take decisions that used to be based on gut feelings about the appeal of various films to various audiences and put them on the stronger footing of actual patterns of customer behavior.


Lohr, S. (2009c, July 28). Netflix Competitors Learn the Power of Teamwork. The New York Times. Retrieved July 30, 2009, from

Mullaney, T. J. (2006, May 25). Netflix: The Mail-Order Movie House That Clobbered Blockbuster. BusinessWeek: Small Business. Retrieved July 30, 2009, from

O’Brien, J. M. (2002, December). The Netflix Effect. Wired, 10(12). Retrieved August 2, 2009, from

Thompson, C. (2008, November 23). If You Liked This, You’re Sure to Love That. The New York Times. Retrieved August 2, 2009, from