INFS 762

Veterans Health Administration: EMR Foundation for Gains Data-Mining Benefits

For an industry driven by advanced knowledge and technological innovation, American health care is shockingly behind the curve on adoption of information technology. Only 1.5% of U.S. hospitals have adopted comprehensive electronic medical records systems (Jha et al., 2009). As of 2006, only 20% of U.S. hospitals had implemented electronic medical records (Arnst, 2006). The U.S. lags behind several OECD countries in per capita spending on health IT (eHealth 101, 2006) and is perhaps more than a decade behind international leaders in health IT (Anderson et al., 2006). Without serious investment in health IT, most American hospitals can’t take advantage of data mining.

An exception to this absence of data-mining capability is found in the Veterans Health Administration. The VA began developing the nation’s first functioning electronic medical record system in the late 1970s (Longman, 2009) and had computerized medical records in all of its approximately 1300 facilities by 2000 (Arnst, 2006). VA hospitals using VistA (Veterans Health Information Systems and Technology Architecture) constitute nearly half of the hospitals in the U.S. that have implemented comprehensive electronic medical records (Jha et al., 2009). With VistA, the VA has become the “unlikely leader” (Rundle, 2001) in maintaining electronic records that can be mined for insights that produce significant improvements in care and cost efficiency.

The VA has used data mining to improve practices in a number of ways. VA researchers have mined VistA data to target rewards for surgical teams that beat quality and safety benchmarks (and to identify underperforming surgical teams) and to sift through 12,000 medical records to evaluate and improve treatments for diabetes (Longman, 2009). The VA’s Center for Imaging of Neurodegenerative Diseases has used Weka to apply Random Forest and Support Vector Machine algorithms to brain imaging studies (Young, 2009). VA data mining also helped discover the link between the arthritis medication Vioxx and heart attacks (Longman, 2009).

One obstacle to optimal data mining in VistA is the diversity of local data dictionaries. Local users can customize data dictionaries to meet unique local needs. That flexibility is a significant part of the system’s success (Brown et al., 2003). However, those different data dictionaries complicate efforts to combine and analyze data across the nationwide system. The VA’s efforts to create national standard dictionaries to translate local dictionaries support not only better immediate transactions such as e-prescribing (Brown et al., 2003) but also improved large-scale data mining. The VA’s system has been sufficiently successful that other government hospitals in the U.S. and abroad are adopting and adapting VistA for their facilities (Longman, 2009).
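The dictionary-translation idea above can be sketched in a few lines of Python. This is only an illustration of the general technique (a crosswalk from site-specific codes to one shared code), not how VistA actually does it; the site names, local codes, and national identifiers are all invented.

```python
from collections import Counter

# Hypothetical crosswalk: (site, local code) -> national standard code.
# Every code below is made up for illustration; none is a real VistA entry.
CROSSWALK = {
    ("site_a", "ASA81"): "NAT-0001",        # low-dose aspirin
    ("site_b", "ASPIRIN-LOW"): "NAT-0001",
    ("site_a", "MET500"): "NAT-0002",       # metformin 500 mg
    ("site_b", "METF-500"): "NAT-0002",
}


def to_national(site, local_code):
    """Return the national code, or None when no mapping exists yet."""
    return CROSSWALK.get((site, local_code))


def count_by_national_code(records):
    """Tally prescriptions across sites by national code.

    Unmapped records are counted under "UNMAPPED" so that gaps in the
    crosswalk stay visible instead of silently dropping out of the analysis.
    """
    counts = Counter()
    for site, local_code in records:
        counts[to_national(site, local_code) or "UNMAPPED"] += 1
    return counts


# Hypothetical prescription records from two sites.
records = [
    ("site_a", "ASA81"),
    ("site_b", "ASPIRIN-LOW"),
    ("site_b", "METF-500"),
    ("site_b", "LOCAL-ONLY-7"),
]
print(count_by_national_code(records))
```

Once both sites' aspirin codes resolve to the same national code, a system-wide count becomes a simple tally; without the crosswalk, the same drug would show up as two unrelated items.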


Anderson, G. F., Frogner, B. K., Johns, R. A., & Reinhardt, U. E. (2006). Health Care Spending and Use of Information Technology in OECD Countries. Health Affairs, 25(3), 819–831.

Arnst, C. (2006, July 17). The Best Medical Care in the U.S. BusinessWeek. Retrieved August 1, 2009, from

Brown, S. H., Lincoln, M. J., Groen, P. J., & Kolodner, R. M. (2003). VistA—U.S. Department of Veterans Affairs National-Scale HIS. International Journal of Medical Informatics, 69(2–3), 135–156.

eHealth 101: Electronic Medical Records Reduce Costs, Improve Care, and Save Lives. (2006). American Electronics Association. Retrieved August 1, 2009, from

Jha, A. K., DesRoches, C. M., Campbell, E. G., Donelan, K., Rao, S. R., Ferris, T. G., et al. (2009). Use of Electronic Health Records in U.S. Hospitals. New England Journal of Medicine, 360(16), 1628–1638. doi: 10.1056/NEJMsa0900592.

Longman, P. (2009, August). Code Red: How Software Companies Could Screw up Obama’s Health Care Reform. Washington Monthly. Retrieved August 1, 2009, from

Rundle, R. (2001, December 10). In the Drive to Mine Medical Data, VHA Is the Unlikely Leader. Wall Street Journal, New York, p. 1.

Young, K. (2009). Diagnostic Data Mining for Multi-modal Brain Image Studies. Veterans Health Administration Center for Imaging of Neurodegenerative Diseases. Retrieved August 2, 2009, from


[part 2 of an assignment for INFS 762]

Yahoo–Microsoft: “Scale Drives Knowledge”

A fundamental tenet of data mining is that “Data mining becomes more useful as the amount of data and variables stored by an organization increases” (Groth, 2000, p. 4). Microsoft CEO Steve Ballmer puts it more succinctly: “Scale drives knowledge” (Lohr, 2009a). Taking advantage of that principle of data mining is a big part of what the new advertising and search partnership between Microsoft and Yahoo is all about.

Microsoft is giving Yahoo a remarkable 88% share of the revenue from search-generated ads; in return, Yahoo implements Microsoft’s Bing as its search engine and gives Microsoft access to a big new chunk of search data (Lohr, 2009b). Both firms get access to a larger dataset to help them improve the targeting of online ads. The problem actually resembles the recommendation challenge Netflix tackles with Cinematch. Search engines pair ads with search results, hoping that users searching for particular words and phrases will be interested in clicking on ads for products related to that language (Lohr, 2009a). With more data, Microsoft and Yahoo can identify more and more subtle relationships between searches and ad clicks, tailor online ads to suit finer fragments of the market, and set more profitable advertising rates for a wider range of advertisers.
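One way to see why more data helps with the finer fragments of the market: the uncertainty in an observed click-through rate shrinks with the square root of the number of impressions. The sketch below uses the standard error of a binomial proportion; the 2% click-through rate and the impression counts are hypothetical and come from none of the sources cited here.

```python
import math


def ctr_standard_error(ctr, impressions):
    """Standard error of an observed click-through rate (binomial proportion)."""
    return math.sqrt(ctr * (1 - ctr) / impressions)


def ctr_interval(clicks, impressions, z=1.96):
    """Approximate 95% interval for the true rate (normal approximation)."""
    ctr = clicks / impressions
    margin = z * ctr_standard_error(ctr, impressions)
    return ctr - margin, ctr + margin


# Hypothetical ad paired with one rare search phrase, clicked about 2% of the time.
for impressions in (1_000, 4_000, 16_000):
    clicks = round(0.02 * impressions)
    low, high = ctr_interval(clicks, impressions)
    print(f"{impressions:>6} impressions: CTR between {low:.4f} and {high:.4f}")
```

Quadrupling the impressions halves the margin of error, so a pooled dataset lets a search engine say something reliable about phrases that either partner alone would see too rarely to judge.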

This combination of the search market’s number 2, Yahoo, and number 3, Microsoft, still doesn’t come close to outpacing number 1 Google: Microsoft brings 8% of the U.S. search market to the deal and Yahoo 20%, for a combined 28%, while Google has 65% (McDougall, 2009). But Microsoft and Yahoo have decided that only a combination of their own sizable data resources can provide the foundation for the data-mining improvements they need to draw search customers and their ad-click revenue away from Google and seriously challenge its dominance. Those customers are as valuable as their data to the business model, as Ballmer seeks to take advantage of network effects, the increased value of online technology as more people use it (Lohr, 2009a). The partnership also frees up resources at Yahoo to invest in other data-mining initiatives, such as a proposal aired by the head of Yahoo Labs, two days after the Microsoft-Yahoo deal was inked, to develop a real-time search capability based on mining the contents of “live” activity like Twitter comments for topical, demographic, and geographic information. Such real-time data mining could provide information such as a mapping of Twitter activity within neighborhoods affected by an earthquake (Oreskovic, 2009).

It is impossible to predict whether the augmented data-mining capability made possible by this partnership will produce the competitive advantage Microsoft and Yahoo seek. Simply having more data in one’s hands doesn’t guarantee that a company will be able to execute. In this case we are talking about a partnership between one company that managed to lose its spot at the top of the Internet search industry and another that was slow to come to the Internet party and still can’t spend or invent its way to dominance there. Still, this combination of resources will give both Yahoo and Microsoft more data to strengthen their mining algorithms and improve the services they offer their users online.

Update: AP tech writer Jessica Mintz offers some reasonable doubt on whether getting a bigger dataset will really make that much of a difference:

“They have lots of scale. They have lots of traffic. Even being the third-place player, they have huge amounts of data to understand their own relevancy,” said Danny Sullivan, editor of the search news site “I just don’t know why they keep putting that argument out.”


Groth, R. (2000). Data Mining: Building Competitive Advantage. Upper Saddle River, NJ: Prentice-Hall.

Lohr, S. (2009a, July 30). Behind Microsoft-Yahoo: The Online Economics of Scale. New York Times: Bits. Retrieved July 31, 2009, from

Lohr, S. (2009b, July 29). Microsoft and Yahoo, in Agreement on Search, Face Uncertain Reach Search Agreement. New York Times, B1. Retrieved July 31, 2009, from

McDougall, P. (2009, July 29). Microsoft, Yahoo Deal Fraught With Risk. InformationWeek. Retrieved August 2, 2009, from

Oreskovic, A. (2009, July 31). Yahoo Labs Chief Sees Real-time Search Opportunity. Reuters News. Retrieved August 2, 2009, from

[Part 1 of an assignment for INFS 762]

If Netflix did nothing more with IT than process online movie orders, they would likely still have gained significant competitive advantage against Blockbuster, Mr. Movies, and other brick-and-mortar movie vendors. I can go online, select from a 100,000+ DVD library that offers more variety than any physical store can, and get what I want by mail in two days. (I could also watch over 12,000 of those videos instantly online… if I had a slightly faster Internet connection!) They charge no late fees, a move that drove Blockbuster to ditch most late fees in 2005 and lose $400 million (Mullaney, 2006).

But Netflix has also made good use of data mining to enhance its competitive advantage. Its Cinematch recommendation engine analyzes customer rental patterns and movie ratings to help the company recommend new rentals. The system also helps Netflix make smart investments in a wider range of films. Mullaney (2006) offers one simple example: Netflix used rental patterns of the film City of God, set in Rio, and the documentary Born into Brothels to predict expected rentals and determine a reasonable fee to pay for DVD rights to Favela Rising, a documentary about musicians in Rio. Mullaney points out this sort of analysis opens the door for more independent filmmakers, as Netflix can identify more niche film markets and expand distribution for smaller-budget films without spending too much. Netflix is thus able to build its business model on “backlist” films comprising 70% of its rentals, compared to traditional video stores, where backlist films make up just 20% of rentals (Thompson, 2008). Increasing demand for lesser-known films reduces demand for big-studio blockbusters, which in turn saves Netflix money, as revenue-sharing agreements with the big studios take a bigger bite out of Netflix’s take (O’Brien, 2002).

Netflix has also been able to discover connections in movie preferences to guide its movie recommendations, from the seemingly obvious overlap between customers who like The Patriot and Pearl Harbor to more curious affinities between rentals of The Patriot and of Pay It Forward and I, Robot (Thompson, 2008). Netflix considers its recommendation system crucial to its business. The company didn’t have any such system when it opened in 1997 and didn’t feel it needed one. But as the library expanded beyond the original 1000-title collection, Netflix realized customers needed help to find films they would like. “‘I think that once you get beyond 1,000 choices, a recommendation system becomes critical,’ [said Reed] Hastings, the Netflix C.E.O…. ‘People have limited cognitive time they want to spend on picking a movie’” (Thompson, 2008).

The recommendation system also keeps people subscribing and buying movies. Cinematch provides sufficiently valuable results that in October 2006, when Netflix found it was having difficulty improving the performance of its data-mining algorithms, it announced the Netflix Prize: $1 million for the first developer who could improve the system’s performance by 10% (Thompson, 2008). The contest drew over 44,000 submissions, including a flurry of submissions during a one-month contest-ending race triggered by contest rules at the end of June 2009, when the first team reached the 10% threshold (Lohr, 2009c). Teams were able to achieve significant gains through mathematical algorithms like singular value decomposition (Thompson, 2008). And Netflix was able to take advantage of the collective inventiveness of nearly 5,000 participants to improve its data-mining algorithms for a price tag that might have covered the full-time salaries of eight entry-level developers over the same time period.
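To give a flavor of the SVD-style approach, here is a minimal sketch of a latent-factor model: each user and each movie gets a short vector of learned "taste" numbers, and a predicted rating is their dot product. This is neither Cinematch nor any contestant's code. It swaps a true singular value decomposition for a simple stochastic-gradient-descent factorization, a common simplification when most ratings are missing. The ratings, the two-factor size, and the learning settings are all made up. Accuracy is scored by root-mean-square error (RMSE), the measure the Netflix Prize used.

```python
import math
import random


def predict(user_vec, item_vec):
    """Predicted rating = dot product of user and movie factor vectors."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def rmse(ratings, user_f, item_f):
    """Root-mean-square error of predictions against known ratings."""
    total = sum((r - predict(user_f[u], item_f[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


def factorize(ratings, n_users, n_items, k=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn k factors per user and per movie by stochastic gradient descent."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    history = []  # training RMSE after each pass over the data
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(user_f[u], item_f[i])
            for f in range(k):
                pu, qi = user_f[u][f], item_f[i][f]
                # Nudge both factors to shrink the error; reg keeps them small.
                user_f[u][f] += lr * (err * qi - reg * pu)
                item_f[i][f] += lr * (err * pu - reg * qi)
        history.append(rmse(ratings, user_f, item_f))
    return user_f, item_f, history


# Hypothetical (user, movie, stars) triples: users 0-1 and users 2-3
# have opposite tastes across movies 0-1 and movies 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 1),
]
user_f, item_f, history = factorize(ratings, n_users=4, n_items=4)
print(f"training RMSE: first pass {history[0]:.3f}, last pass {history[-1]:.3f}")
# Guess a rating the data never contained: user 0 on movie 3.
print(f"predicted rating, user 0 on movie 3: {predict(user_f[0], item_f[3]):.2f}")
```

The point of the exercise is the last line: the model can produce a guess for a user-movie pair it has never seen, based only on patterns in everyone else's ratings.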

There are still quirks of human behavior that defy complete explanation of movie preferences by data-mining methods. However, as Mullaney (2006) puts it, Cinematch is able to take decisions that used to be based on gut feelings about the appeal of various films to various audiences and put them on the stronger footing of actual patterns of customer behavior.


Lohr, S. (2009c, July 28). Netflix Competitors Learn the Power of Teamwork. The New York Times. Retrieved July 30, 2009, from

Mullaney, T. J. (2006, May 25). Netflix: The Mail-Order Movie House That Clobbered Blockbuster. BusinessWeek: Small Business. Retrieved July 30, 2009, from

O’Brien, J. M. (2002, December). The Netflix Effect. Wired, 10(12). Retrieved August 2, 2009, from

Thompson, C. (2008, November 23). If You Liked This, You’re Sure to Love That. The New York Times. Retrieved August 2, 2009, from

We had an assignment in INFS 762, Data Warehousing + Data Mining, to write three quick briefs on industry data warehousing projects. Here’s the third from my paper:

International Truck and Engine was bogged down in its own financial data. Monthly finances were taking two weeks to process. The company had implemented a data warehouse in 1996, but it wasn’t providing the business performance metrics executives and analysts needed to guide their decision-making (Whiting, 2003).

Therefore, in 2001, International Truck and Engine overhauled its data warehouse and developed its Key Business Indicators portal. The new system provided a “10–12% efficiency gain in the monthly close process” (Whiting, 2003). The system also gave International the ability to review historical trends, forecast demand, and give its suppliers more lead time on production orders (D’Antoni, 2005). Executives and analysts could access business performance metrics that previously could only be found in hefty three-ring binders of monthly and quarterly reports (Eckerson, 2004). The project was sufficiently successful to win The Data Warehousing Institute’s 2003 “Business Performance Management” Best Practices Award (Edwards, 2003).

International’s data warehousing overhaul also followed the vital path of phased implementation. The data flowing into the warehouse came from 32 source systems. The developers chose to implement the warehouse by source rather than work group: “This way, the team delivered enterprisewide KBIs [key business indicators] while maintaining project delivery in bitesize chunks” (Eckerson, 2004). In other words, developers were able to keep steps small while regularly and from the beginning delivering tools that would prove useful to workers across the organization.


We had an assignment in INFS 762, Data Warehousing + Data Mining, to write three quick briefs on industry data warehousing projects. Here’s the second from my paper:

Home Depot launched a data warehousing project in 2002. Before this project, the company had no data warehouse, and analysts seeking big-picture information about Home Depot’s 1500 stores had to access information from as many as 16 separate mainframe systems. One could get the information if one really wanted to, but it took too long to provide effective support for decision-making (Whiting, 2002).

Home Depot was actually behind competitors like Lowe’s in adopting data warehouse technology, but, to make lemonade out of lemons, there may have been a cost advantage to being last mover instead of first mover. Instead of having to go through conversion of even embryonic prior data warehousing efforts, Home Depot could charge straight into building a new system with the latest technology (Schwartz, 2002). Bob DeRodes, the chief information officer who led Home Depot’s IT overhaul at the time, also wisely followed the phased implementation strategy that our textbook and others (such as Schwartz, 2002) say is essential to successful data warehouse development and deployment. Home Depot launched its data warehouse with applications dedicated to “analyzing human-resource expenses” (Whiting, 2002), a key business goal for the new CEO Bob Nardelli. The intent was to add incrementally to the system, with point-of-sale data to be added the following year.

Home Depot’s data warehousing project was part of a company-wide IT overhaul that cost $2 billion and required one year and one million person-hours to complete (Webster, 2006). Measuring the success of the project requires more than a look at the bottom line; after all, even the best data warehouse could not insulate Home Depot from a 66% drop in 2008 Q1 profit due to the recession-launching collapse of the housing market (Clifford, 2008). The data warehouse implementation and the entire IT overhaul were part of a larger plan to remake the culture of the organization. When Bob Nardelli took over as CEO at the turn of the millennium, he found a corporation that was surprisingly decentralized. Store managers had vast autonomy and regularly rejected directives from corporate (Charan, 2006). Imagine the challenge of telling such staff to standardize their data for a new data warehouse. But that was part of the challenge Nardelli tackled, as he recognized that the lack of standardization, centralization, and discipline throughout the company would prevent the company from sustaining the growth it had enjoyed in the 1990s, especially now that competitors like Lowe’s were crowding their market.

One cultural change directly supported by the data warehouse was to shift the basis for decision-making from intuition and anecdote to data. Anecdotes and gut feelings, coupled with isolated, nonstandard bits of data, don’t travel well across over a thousand separate retail operations. The data warehouse provided the transparent data necessary for shared decision-making. It also provided the quantitative reports necessary to produce data-based templates for business review meetings. Previously managers would defend their positions with sketchy yet hard-to-challenge anecdotes from their own stores; the rich data from the data warehouse made such anecdotal escapes much more difficult and forced managers to acknowledge and act to correct failings in their operations (Charan, 2006). This change to data-based decision-making did not please everyone, and some employees did leave the company, but this cultural change, supported by the data warehouse, did take hold, as evidenced in part by improved employee satisfaction scores on in-house surveys (Charan, 2006).


We had an assignment in INFS 762, Data Warehousing + Data Mining, to write three quick briefs on industry data warehousing projects. Here’s the first from my paper:

XOJet is a young fractional aircraft company. Its primary customers are businesses that seek the convenience of private jet travel but lack the resources to purchase and maintain their own aircraft. Customers can charter flights on XOJet’s fleet of Cessna Citation Xs and Bombardier Challenger 300s or buy partial ownership in planes (“fractional” aircraft). With over 15,000 charter routes and a clientele expecting on-demand service, XOJet faces the logistical challenge of ensuring that it has just enough planes in just enough places at the right times to meet customer needs.

XOJet thus adopted a data warehouse in 2006 to allow better analysis of flight patterns. XOJet used its data warehouse to develop algorithms and reports to optimize use of its fleet (Tucci, 2007). This system allows XOJet to schedule maintenance around peak flight times and route its planes to arrive earlier and avoid traffic (Tucci, 2007).

The data warehouse has provided concrete benefits for XOJet. Other airlines in this part of the industry commonly experience utilization rates of 75% or less. They often must “deadhead,” i.e., fly an empty plane to a different airport to pick up a paying customer. XOJet has been able to use the knowledge discovered in its data warehouse to increase its flight utilization rate to 95–97% (Watson, 2008). A 20% increase in utilization (in one year, XOJet increased its flight hours from 1000 to 1200) can lead to a doubling of profit (Tucci, 2007). When XOJet can avoid deadheading and schedule more usable flight hours, it can charge less for hourly fees: 27% less than competitors (Watson, 2008). While the data warehouse can’t be credited with all of XOJet’s recent growth and success, its strong competitive position helped it win $2.5 billion in financing to expand its fleet for international operations in 2008 (Lattman, 2008), a year when credit was drying up for nearly everyone else.
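How can 20% more flight hours double profit? The answer is fixed costs: a plane costs roughly the same to own whether it flies or sits, so extra billable hours go almost straight to the bottom line. The arithmetic below is a sketch of that effect. The flight hours (1000 and 1200) come from the text above, but the dollar figures are hypothetical, chosen only to make the mechanism visible; they are not XOJet's actual numbers.

```python
def annual_profit(flight_hours, margin_per_hour, fixed_costs):
    """Profit = contribution from billable hours minus fixed ownership costs."""
    return flight_hours * margin_per_hour - fixed_costs


# Flight hours are from the text; both dollar figures are assumptions.
MARGIN_PER_HOUR = 1_000   # assumed revenue minus variable cost per flight hour
FIXED_COSTS = 800_000     # assumed yearly fixed costs (financing, crew, hangar)

before = annual_profit(1_000, MARGIN_PER_HOUR, FIXED_COSTS)
after = annual_profit(1_200, MARGIN_PER_HOUR, FIXED_COSTS)

hours_growth = (1_200 - 1_000) / 1_000
print(f"hours up {hours_growth:.0%}: profit {before:,} -> {after:,} ({after / before:.1f}x)")
```

Under these assumed figures, fixed costs eat 80% of the margin at 1000 hours, so the extra 200 hours add as much profit as the first 1000 did. With lower fixed costs the multiplier would be smaller; the doubling depends on how heavy the fixed costs are.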