[part 2 of an assignment for INFS 762]

Yahoo–Microsoft: “Scale Drives Knowledge”

A fundamental tenet of data mining is that “Data mining becomes more useful as the amount of data and variables stored by an organization increases” (Groth, 2000, p. 4). Microsoft CEO Steve Ballmer puts it more succinctly: “Scale drives knowledge” (Lohr, 2009a). Taking advantage of that principle of data mining is a big part of what the new advertising and search partnership between Microsoft and Yahoo is all about.

Microsoft is giving Yahoo a remarkable 88% of the revenue from search-generated ads; in return, Yahoo adopts Microsoft’s Bing as its search engine and gives Microsoft access to a large new pool of search data (Lohr, 2009b). Both firms get a larger dataset to help them improve the targeting of online ads. That targeting problem resembles the recommendation challenge Netflix tackles with its Cinematch system: search engines pair ads with search results, betting that users who search for particular words and phrases will be interested in clicking on ads for products related to that language (Lohr, 2009a). With more data, Microsoft and Yahoo can detect subtler relationships between searches and ad clicks, tailor ads to finer market segments, and set more profitable advertising rates for a wider range of advertisers.
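To make the statistical intuition behind “scale drives knowledge” a little more concrete, here is a minimal Python sketch that is not drawn from Lohr or from either company. It estimates how precisely a click-through rate can be measured from a given number of ad impressions, using the standard normal approximation to the binomial. The function name, the 2% click rate, and the impression counts are all hypothetical, chosen only for illustration.

```python
import math


def ctr_margin_of_error(clicks: int, impressions: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed click-through rate.

    Uses the normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    """
    p = clicks / impressions
    return z * math.sqrt(p * (1 - p) / impressions)


# Hypothetical figures (not from the source): one query/ad pairing clicked
# 2% of the time, observed in a smaller and a four-times-larger dataset.
for impressions in (10_000, 40_000):
    clicks = impressions // 50  # 2% of impressions
    moe = ctr_margin_of_error(clicks, impressions)
    print(f"{impressions:>6} impressions: CTR 2.00% +/- {moe * 100:.3f} points")
```

Because the margin of error shrinks with the square root of the number of impressions, quadrupling the data only halves it (here from roughly 0.27 to roughly 0.14 percentage points). Pooled data lets the partners resolve smaller differences between ads, but each further gain in precision costs disproportionately more data.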

Even combined, the search market’s number 2, Yahoo, and number 3, Microsoft, come nowhere close to overtaking number 1, Google: Microsoft brings 8% of the U.S. search market to the deal and Yahoo 20%, for a combined 28% against Google’s 65% (McDougall, 2009). But Microsoft and Yahoo have concluded that pooling their sizable data resources is the only way to build the data-mining improvements that could draw search customers, and their ad-click revenue, away from Google and seriously challenge its dominance. Those customers matter to the business model as much as their data does, because Ballmer is counting on network effects: an online technology becomes more valuable as more people use it (Lohr, 2009a).

The partnership also frees up resources at Yahoo to invest in other data-mining initiatives. Two days after the Microsoft-Yahoo deal was signed, the head of Yahoo Labs proposed developing a real-time search capability that would mine the contents of “live” activity, such as Twitter comments, for topical, demographic, and geographic information. Such real-time mining could, for example, map Twitter activity across neighborhoods affected by an earthquake (Oreskovic, 2009).

It is impossible to predict whether the augmented data-mining capability this partnership makes possible will produce the competitive advantage Microsoft and Yahoo seek. Simply having more data doesn’t guarantee that a company will be able to execute. In this case, the partners are a company that lost its spot at the top of the Internet search industry and another that was slow to come to the Internet party and still can’t spend or invent its way to dominance there. Still, the combination will give both Yahoo and Microsoft more data to strengthen their mining algorithms and improve the services they offer users online.

Update: AP tech writer Jessica Mintz raises reasonable doubt about whether a bigger dataset will really make much of a difference:

“They have lots of scale. They have lots of traffic. Even being the third-place player, they have huge amounts of data to understand their own relevancy,” said Danny Sullivan, editor of the search news site Searchengineland.com. “I just don’t know why they keep putting that argument out.”


Groth, R. (2000). Data Mining: Building Competitive Advantage. Upper Saddle River, NJ: Prentice-Hall.

Lohr, S. (2009a, July 30). Behind Microsoft-Yahoo: The Online Economics of Scale. New York Times: Bits. Retrieved July 31, 2009, from http://bits.blogs.nytimes.com/2009/07/30/behind-the-microsoft-yahoo-deal-the-internet-economics-of-scale/

Lohr, S. (2009b, July 29). Microsoft and Yahoo, in Agreement on Search, Face Uncertain Reach Search Agreement. New York Times, B1. Retrieved July 31, 2009, from http://www.nytimes.com/2009/07/30/technology/companies/30soft.html

McDougall, P. (2009, July 29). Microsoft, Yahoo Deal Fraught With Risk. InformationWeek. Retrieved August 2, 2009, from http://www.informationweek.com/news/internet/search/showArticle.jhtml?articleID=218800185

Oreskovic, A. (2009, July 31). Yahoo Labs Chief Sees Real-time Search Opportunity. Reuters News. Retrieved August 2, 2009, from http://www.reuters.com/article/technologyNews/idUSTRE57000F20090801?sp=true


Google press release, “Google and Four US States Improve Public Access to Government Websites,” April 30, 2007

  • Lots of govt info sits in databases that search engine crawlers can’t reach, so it’s harder to find
  • State tech managers working to increase amount of govt info available to Google searches
  • Sitemap Protocol is key
    • produces list of all pages on website
    • automatically sends list to search engines
  • See case study on Arizona and press info
    • Arizona: fewer than 50 tech staff hours to implement on eight major databases, which “made hundreds of thousands of public records and other pages ‘crawlable’”
  • Started with CA, AZ, UT, and VA
  • Added MI, FL
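The Sitemap Protocol itself is simple: a sitemap is an XML file in the namespace http://www.sitemaps.org/schemas/sitemap/0.9 that lists each page’s URL in a <loc> element, with optional metadata such as <lastmod>. The Python sketch below is an illustration using only the standard library, not the tooling Arizona or Google actually used; the function name build_sitemap and the example.gov URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize it as the default xmlns


def build_sitemap(entries):
    """Return a sitemap XML document as a string.

    entries: iterable of (url, lastmod) pairs. lastmod is a W3C date string
    such as "2007-04-30", or None to omit the optional <lastmod> element.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in entries:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as "&" in query strings.
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod is not None:
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages generated from a public-records database.
    pages = [
        ("https://www.example.gov/records?id=1&type=permit", "2007-04-30"),
        ("https://www.example.gov/records?id=2&type=permit", None),
    ]
    print(build_sitemap(pages))
```

The protocol caps a single sitemap at 50,000 URLs, so a site exposing hundreds of thousands of database records, as Arizona did, would split them across several sitemap files listed in a sitemap index. Delivering the file to search engines (for example, through a Sitemap: line in robots.txt or a search engine’s webmaster tools) is outside this sketch.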