Use XML to make it easier for us to read each other’s word-processing documents: pretty good idea. Too bad Microsoft now appears to own that idea:

The patent appears to cover both the creation of the XML document and the file that’s created. That would allow a certain degree of leeway in terms of interoperability, as there is nothing here that would seem to cover reading a Microsoft-generated XML document, for example. But it certainly seems that Microsoft could assert that any word processor that used this class of XML storage as a native format was violating its patents.

The key question going forward is what Microsoft chooses to do with this patent now that it has been granted. The company is under pressure in both the US and EU to increase its software’s interoperability with that of its competitors, so a rigorous enforcement of this patent would seem like an express lane to further legal trouble, something the company has seemingly been anxious to avoid [John Timmer, “Storing Text Docs in XML May Run Afoul of Microsoft Patent,” 2009.08.07].

No word on whether Microsoft is seeking patents on RDF apps….


[part 2 of an assignment for INFS 762]

Yahoo–Microsoft: “Scale Drives Knowledge”

A fundamental tenet of data mining is that “Data mining becomes more useful as the amount of data and variables stored by an organization increases” (Groth, 2000, p. 4). Microsoft CEO Steve Ballmer puts it more succinctly: “Scale drives knowledge” (Lohr, 2009a). Taking advantage of that principle of data mining is a big part of what the new advertising and search partnership between Microsoft and Yahoo is all about.

Microsoft is giving Yahoo a remarkable 88% share of the revenue from search-generated ads; in return, Yahoo implements Microsoft’s Bing as its search engine and gives Microsoft access to a big new chunk of search data (Lohr, 2009b). Both firms get access to a larger dataset to help them improve the targeting of online ads. The targeting problem resembles the recommendation challenge Netflix tackles with its Cinematch system. Search engines pair ads with search results, hoping that users searching for particular words and phrases will be interested in clicking on ads for products related to that language (Lohr, 2009a). With more data, Microsoft and Yahoo can identify more, and more subtle, relationships between searches and ad clicks, tailor online ads to finer fragments of the market, and set more profitable advertising rates for a wider range of advertisers.
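To make the “more data, more subtle relationships” point concrete, here is a minimal sketch. It assumes hypothetical click logs of (query, ad, clicked) records; it is an illustration of the statistics involved, not Microsoft’s or Yahoo’s actual method. It estimates a click-through rate (CTR) for each query-ad pair and attaches a Wilson score interval, which narrows as impressions accumulate. The names `ctr_by_pair` and `wilson_interval` are illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (search query, ad id, whether the ad was clicked).
Event = Tuple[str, str, bool]


def wilson_interval(clicks: int, impressions: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a click-through rate."""
    if impressions == 0:
        return (0.0, 1.0)  # no data: the CTR could be anything
    n = impressions
    p = clicks / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def ctr_by_pair(log: Iterable[Event]) -> Dict[Tuple[str, str], Tuple[float, Tuple[float, float]]]:
    """Aggregate click events into a CTR estimate and interval per (query, ad) pair."""
    counts: Dict[Tuple[str, str], list] = defaultdict(lambda: [0, 0])  # [clicks, impressions]
    for query, ad, clicked in log:
        tally = counts[(query, ad)]
        tally[1] += 1
        if clicked:
            tally[0] += 1
    return {
        pair: (c / n, wilson_interval(c, n))
        for pair, (c, n) in counts.items()
    }


if __name__ == "__main__":
    # Same observed 5% CTR, two very different amounts of evidence.
    small = wilson_interval(5, 100)
    large = wilson_interval(500, 10_000)
    print(f"100 impressions:    {small[0]:.3f} to {small[1]:.3f}")
    print(f"10,000 impressions: {large[0]:.3f} to {large[1]:.3f}")
```

Both cases show the same 5% click rate, but the interval for 10,000 impressions is roughly ten times narrower than the one for 100. That is the sense in which scale drives knowledge: with a pooled dataset, even small slices of the market accumulate enough impressions for their estimates to be trusted. The Wilson interval is used here because it behaves better than the simple normal approximation when counts are small and rates are low, as ad CTRs usually are.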

This combination of the search market’s number 2, Yahoo, and number 3, Microsoft, still doesn’t come close to outpacing number 1, Google: Microsoft brings 8% of the U.S. search market to the deal and Yahoo brings 20%, for a combined 28%, while Google has 65% (McDougall, 2009). But Microsoft and Yahoo have decided that to stand any chance of seriously challenging Google’s dominance, they must pool their own sizable data resources; only that combination can provide the foundation for data-mining improvements that will draw search customers, and their ad-click revenue, away from Google. Those customers matter to the business model as much as their data do: Ballmer is counting on network effects, the way an online technology becomes more valuable as more people use it (Lohr, 2009a). The partnership also frees up resources at Yahoo for other data-mining initiatives. Two days after the Microsoft-Yahoo deal was inked, the head of Yahoo Labs aired a proposal to develop a real-time search capability based on mining “live” activity, such as Twitter comments, for topical, demographic, and geographic information. Such real-time data mining could, for example, map Twitter activity within neighborhoods affected by an earthquake (Oreskovic, 2009).

It is impossible to predict whether the augmented data-mining capability made possible by this partnership will produce the competitive advantage Microsoft and Yahoo seek. Simply having more data in hand doesn’t guarantee that a company will be able to execute. In this case we are talking about a partnership between a company that managed to lose its spot at the top of the Internet search industry and another that was slow to come to the Internet party and still can’t spend or invent its way to dominance there. Still, this combination of resources will give both Yahoo and Microsoft more data to strengthen their mining algorithms and improve the services they offer their users online.

Update: AP tech writer Jessica Mintz raises reasonable doubt about whether a bigger dataset will really make that much of a difference:

“They have lots of scale. They have lots of traffic. Even being the third-place player, they have huge amounts of data to understand their own relevancy,” said Danny Sullivan, editor of the search news site. “I just don’t know why they keep putting that argument out.”


Groth, R. (2000). Data Mining: Building Competitive Advantage. Upper Saddle River, NJ: Prentice-Hall.

Lohr, S. (2009a, July 30). Behind Microsoft-Yahoo: The Online Economics of Scale. New York Times: Bits. Retrieved July 31, 2009, from

Lohr, S. (2009b, July 29). Microsoft and Yahoo, in Agreement on Search, Face Uncertain Reach Search Agreement. New York Times, B1. Retrieved July 31, 2009, from

McDougall, P. (2009, July 29). Microsoft, Yahoo Deal Fraught With Risk. InformationWeek. Retrieved August 2, 2009, from

Oreskovic, A. (2009, July 31). Yahoo Labs Chief Sees Real-time Search Opportunity. Reuters News. Retrieved August 2, 2009, from