INFS 892-SWP


cross-posted from the Madville Times… because this is too stinking cool for me to maintain any scholarly detachment!

Holy flipping cow! Tim Berners-Lee, the inventor of the World Wide Web, just commented on my blog! On Saturday, I posted on the seeming difficulty of making the Semantic Web (technology for embedding machine-readable meaning in Web content) work on a large scale. I may be biased against Semantic Web technology based on a course I took on it this summer that left me thinking, “Wow, this is hard!”

And then along comes Tim Berners-Lee, MIT professor, director of the World Wide Web Consortium and World Wide Web Foundation, and long-time explainer of the Semantic Web, to tell me I may have the Semantic Web all wrong:

There is an interesting reason for which Semantic Web does scale: that it has interesting scale-free properties. Or rather, the world has interesting scale-free properties and the Sem Web technologies allow one to take advantage of. It doesn’t require a ruling elite, just everyone doing their bit, in different contexts, which then are stitched together at the edges. See http://www.w3.org/DesignIssues/Fractal#tco and compare that to the models used by previous systems.

I feel like I just got an autograph… and didn’t have to ask for it! I’ll never wash my blog again!

And anyone who says blogs are a waste of time promoting mindless jabber that leaves us teetering on the edge of cultural catastrophe is flat wrong.

Here’s more from Berners-Lee on Semantic Web technology:

…and a longer (58 minutes!) interview from MIT.

I was reviewing Alex Wright’s 2008 New York Times article on Paul Otlet’s Mundaneum, an often-ignored precursor to the World Wide Web and predecessor even to Vannevar Bush’s memex (or was it Emanuel Goldberg’s?), when I happened upon this quote from Michael Buckland, professor at Berkeley’s School of Information (I like that name for a school… and hey! Danah Boyd knows him!):

Critics of the Semantic Web say it relies too heavily on expert programmers to create ontologies (formalized descriptions of concepts and relationships) that will let computers exchange data with one another more easily. The Semantic Web “may be useful, but it is bound to fail,” Dr. Buckland said, adding, “It doesn’t scale because nobody will provide enough labor to build it.”

The same criticism could have been leveled against the Mundaneum. Just as Otlet’s vision required a group of trained catalogers to classify the world’s knowledge, so the Semantic Web hinges on an elite class of programmers to formulate descriptions for a potentially vast range of information. For those who advocate such labor-intensive data schemes, the fate of the Mundaneum may offer a cautionary tale [Alex Wright, “The Web Time Forgot,” New York Times, 2008.06.17].

That passage must have stuck in my subconscious, since it sounds very much like the concern I expressed in our class on the Semantic Web this summer: where will we find enough expertise to properly wire and check the Semantic Web? To scale up, Semantic Web technology almost needs to be telepathic (uh oh!).

I submitted this paper last week for Assignment #4, INFS 892, Semantic Web Programming. (I’d link to the syllabus and course information, but DSU boxes all that stuff up in boring old D2L. Can anyone say World Wide Web?)

Knowledge management and decision support require tools that can prevent information overload by filtering information based on its quality and its users (Smart et al., 2005). Semantic Web technologies should help in that regard. However, quickly evolving situations, like breaking news, financial information, and battlefield intelligence may challenge the capacity of Semantic Web applications to provide advantages over traditional, organic methods of information filtering. Semantic Web technology, like existing search technology, can certainly empower users, but there appears to be a threshold of newness and immediacy that Semantic Web programmers may not be able to cross.

Two news reports on the role of social media in the current political protests in Iran got me thinking about this problem. The lead-in report (Gallafent, 2009) noted the usefulness and popularity of Twitter in covering this event: Twitter.com is blocked in Iran, but people can read and write to Twitter through so many devices besides computers that there’s still plenty of access. Reporter Laura Lynch (Werman, 2009) noted that while reporting from Tehran, faced with a government clamping down on the local media, she was relying on Twitter and other online sources, just like other observers and the protesters themselves. Even though Twitter is nigh impossible to source, there is still a wealth of clearly authentic content, especially links to photos and video. One Tweet on #iranelection drew my attention to a photo essay (Taylor, 2009) that I found sufficiently meaningful to blog and link (Heidelberger, 2009). Of course, as readers pause to investigate one such link for a minute, Twitter users may generate dozens if not hundreds of new Tweets about the Iran protests.

Human readers can hardly keep up with such a flood of information, but can the Semantic Web do any better? Lynch said that at her hotel in Tehran, one woman spent the entire day online, sifting through Twitter and other sources and pointing out highlights to Lynch and other reporters (Werman, 2009). Instead of somehow assigning RDF tags to the Twitter posts she found valuable (and really, there’s no way she could tag everything, certainly not the vast majority of Tweets that she found unimportant), she assigned meaning to bits of information by passing them on to her journalist friends. That still doesn’t make the information machine-readable, but it does filter the information and make valuable nuggets available to a larger audience. Perhaps a fast-moving news story like the Iranian protests is a strange reversal of fortune, where things happen too fast for machines to follow and we must turn to human reporters and commentators for the best story. People don’t have time to tweet in RDF. Protesters don’t have time to slap rich tags on their cellphone videos: they’re going to hit the upload button and then run from the baton-wielding riot cops. Formal Semantic Web programming seems necessarily after-the-fact, retrospective, and by then, we’re on to the next breaking event.
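To give a rough feel for what “tweeting in RDF” would ask of someone on the street, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the example.org namespace, the property names, and the sample tweet are all made up, and the output is a simplified N-Triples-style rendering that treats every value as a plain literal rather than following the full specification.

```python
# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/terms/"


def tweet_to_triples(tweet_uri, author, text, hashtag, link):
    """Return the statements a person would have to supply to describe one tweet."""
    return [
        (tweet_uri, EX + "author", author),
        (tweet_uri, EX + "text", text),
        (tweet_uri, EX + "topic", hashtag),
        (tweet_uri, EX + "linksTo", link),
    ]


def to_ntriples_like(triples):
    """Serialize as simple N-Triples-style lines (every object treated as a literal)."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes and double quotes inside the literal.
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


# Made-up sample tweet.
tweet_text = "Photos from the streets"
triples = tweet_to_triples(
    "http://example.org/tweet/1",
    "someone_in_tehran",
    tweet_text,
    "#iranelection",
    "http://example.org/photo-essay",
)
markup = to_ntriples_like(triples)

# Ratio of markup characters to characters of actual message.
overhead = len(markup) / len(tweet_text)
```

Even in this toy version, the markup runs many times longer than the message itself, which is the burden the paragraph above is worried about.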

Other fast-paced information settings may pose similar challenges. On the battlefield, will military intelligence specialists have time to convert their updates into RDF? Or would they do better to use their analytical powers to analyze and report the data from the battlefield immediately? Or consider the stock market: there is a wealth of easily quantifiable, Semantic-Web programmable information, but if things are moving fast, those tags won’t grab our attention; we will talk to each other, listen to the main commentators we trust, and go from there.

O’Connor et al. (2008) and Kim et al. (2009) both propose Semantic Web application models for real-world situations. O’Connor et al. (2008) discuss a useful application for helping doctors “explore treatment options for HIV-positive patients” by incorporating information from numerous doctors and past medical experiences. Kim et al. (2009) design a decision-support method for online purchases that would gather a variety of business information from different sources. These solutions do not offer a complete guide to dealing with real-time, evolving situations: both are relatively structured problems with clearly defined variables. In seeking important news about political protests or compiling and analyzing battlefield intelligence, we may be dealing with new actors, new variables that do not exist yet in any ontology. Real-time Semantic Web applications need somehow to deal with this possibility.

Some researchers are tackling the challenge of real-time Semantic Web applications. Blogging, with its use of tagging and RSS feeds, already produces a great deal of the metadata the Semantic Web relies on (Karger and Quan, 2005). A semantic blogging platform like Haystack (Karger and Quan, 2005) supports user creation of machine-readable semantic data with specialized forms. Such forms are immensely more workable for the majority of online content producers than raw coding of RDF (just as blogs, Twitter, and Facebook are much more accessible to users than HTML). However, even those forms add complication that may be unworkable for journalists (both professional and citizen) who are uploading blog posts, videos, and tweets on breaking news. Such forms may be helpful in a collaborative format, where eyewitnesses can use a simple fire-and-forget publishing interface while secondary users can access and edit the same content through semantic forms that allow them to annotate the original content with metadata that Semantic Web applications can manipulate.

Smart et al. (2005) explore the use of Semantic Web technology to support situational awareness in the complex interaction of military and humanitarian organizations. This particular challenge goes beyond the conventional question of battlefield awareness: where an army must maintain its own secure, unified intelligence system, a humanitarian intervention requires diverse governmental and non-governmental agencies to be able to communicate. Their model assumes prior knowledge of well-defined agents and capabilities and their relationships; such a model may be difficult to apply in a setting like the Iranian protests where the “agents” (protest leaders, online coordinators, citizen journalists) may not be well-defined in any existing ontology.

Addressing the need for ontologies to evolve and capture information from diverse sources, Köhler et al. (2006) build a model that allows reasonably easy combination and updating of ontologies. However, their content-based indexing seems limited to text that exists in sufficient context. Content like tweets and brief blog posts and links might defy their model by not providing enough surrounding text to establish the full verbal context their model requires. Scharl et al. (2008) propose a framework that would allow distributed users to collaborate in updating ontologies. Karger and Schraefel (2006) might challenge that model given its reliance on ontology visualizations, which Karger and Schraefel suggest are not the most effective representations of knowledge. Whatever form we may use to represent ontologies, perhaps a more important question is whether collaborative ontology construction adds usefully and efficiently to knowledge management in a swiftly evolving situation where numerous individuals are creating and seeking information. It may be that formal efforts at Semantic Web programming in such situations produce only marginal improvements over the meaning-making of busy journalists and the viral spread of information fostered by spontaneous, organic collective judgments.

Photos and video from the streets of Tehran are not searchable the way text is. Someone has to tag the video, embed it in a blog post with commentary, Digg it, etc., to get it into either conventional or semantic search results. Whether that video has meaning, and what sort of meaning it has for understanding what is happening in Iran, depend on the continuing, evolving aggregate judgment of the masses, readers like me, clicking, annotating, and forwarding that content. We may capture that meaning just as effectively through human means as through Semantic Web applications. And even if we can apply Semantic Web technologies to make quickly evolving situations more quickly comprehensible to decision-makers and outside observers, the coding and ontologies we apply to that video must remain open to reinterpretation and reassignment by the users to capture new circumstances and understandings.

References

Gallafent, A. (2009, June 17). Twitter’s role in Iran protests. The World. Public Radio International. Retrieved June 19, 2009, from http://www.theworld.org/?q=node/26972.

Heidelberger, C.A. (2009, June 15). Courage in the streets of Iran. Madville Times. Retrieved June 19, 2009, from http://madvilletimes.blogspot.com/2009/06/courage-in-streets-of-tehran.html.

Karger, D. R., & Quan, D. (2005). What would it mean to blog on the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3), 147-157. Retrieved June 16, 2009, from http://www.cs.uga.edu/~pdoshi/Courses/CSCI%204900_6900/KargerBlogISWC04.pdf.

Karger, D., & Schraefel, M. C. (2006). The pathetic fallacy of RDF. Retrieved June 20, 2009, from http://swui.semanticweb.org/swui06/papers/Karger/Pathetic_Fallacy.html.

Kim, H.-J., Kim, W., & Lee, M. (2009). Semantic Web Constraint Language and its application to an intelligent shopping agent. Decision Support Systems, 46(4), 882-894.

Köhler, J., Philippi, S., Specht, M., & Rüegg, A. (2006). Ontology based text indexing and querying for the semantic web. Knowledge-Based Systems, 19(8), 744-754.

O’Connor, M. J., Shankar, R. D., Tu, S. W., Nyulas, C. I., & Das, A. K. (2008). Developing a Web-Based Application using OWL and SWRL. Paper presented at the AAAI Spring Symposium.

Scharl, A., Weichselbraun, A., & Wohlgenannt, G. (2008). A web-based user interaction framework for collaboratively building and validating ontologies. In Proceedings of the VIII Brazilian Symposium on Human Factors in Computing Systems (pp. 244-247). Porto Alegre, RS, Brazil: Sociedade Brasileira de Computação. Retrieved June 20, 2009, from http://portal.acm.org/citation.cfm?id=1497470.1497498.

Smart, P. R., Shadbolt, N. R., Carr, L. A., & Schraefel, M. C. (2005). Knowledge-based information fusion for improved situational awareness. In Information Fusion, 2005 8th International Conference on (Vol. 2, p. 8 pp.). doi: 10.1109/ICIF.2005.1591969.

Taylor, A. (2009, June 15). Iran’s disputed election. The Big Picture: News Stories in Photographs. Boston.com. Retrieved June 19, 2009, from http://www.boston.com/bigpicture/2009/06/irans_disputed_election.html.

Werman, M. (2009, June 17). Deciphering the messages from Iran: Interview with Laura Lynch and Azadeh Moaveni. The World. Public Radio International. Retrieved June 19, 2009, from http://www.theworld.org/?q=node/26973.

(A speculative ramble… read at your own peril!)

One discussion in our 892 class on Semantic Web Programming got into the similarities between Resource Description Framework/Web Ontology Language (RDF/OWL*) and object-oriented programming. Yes, yes, RDF and OWL just capture meaning, while OOP makes stuff happen. There are a number of key similarities and differences.

But here’s what strikes my fancy this afternoon: object-oriented programming is cool in that it makes it possible to build programs faster with all the existing modules that have been designed to do specific things and plug into all the potential programs that might need to do them. It’s like building a computer at home: you don’t manufacture your own chips and chassis and keyboard; you order all those components and focus your creativity on wiring them up in a new, creative way to meet your unique requirements.

Ditto programs: you don’t write your own sort routines. Well, you can, if you love to code, but OOP lets you use existing chunks of code that have already been looked over by lots of brains and tested in lots of situations. Brain power that you would exert coding and checking those basic functions can be focused on creating cool new stuff.

Ditto ontologies. For our second assignment in 892, I composed my own ontology for a library (books, magazines, authors, editors, call numbers…). The effort was worth it, because I need to learn how OWL works. But if I were doing a real project, it would be silly for me to rewrite a library ontology when I could just include an ontology that’s already been built to model that concept.
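To make the reuse idea concrete, here’s a minimal sketch in Python, with plain tuples standing in for RDF triples. Everything named here is a hypothetical stand-in: the example.org namespaces, the class and property names, and the pretense that somebody has already published and vetted a shared library vocabulary. It is not my actual assignment and not a real published ontology.

```python
# Hypothetical shared vocabulary: pretend someone else already published
# and vetted these (subject, predicate, object) statements.
SHARED = "http://example.org/shared-library#"
shared_library_ontology = {
    (SHARED + "Book", "subClassOf", SHARED + "Publication"),
    (SHARED + "Magazine", "subClassOf", SHARED + "Publication"),
    (SHARED + "hasAuthor", "domain", SHARED + "Publication"),
    (SHARED + "hasCallNumber", "domain", SHARED + "Publication"),
}

# My own project: only the statements the shared vocabulary lacks.
MINE = "http://example.org/my-library#"
my_additions = {
    (MINE + "GraphicNovel", "subClassOf", SHARED + "Book"),
}

# Reuse is a set union: the existing statements plus my handful of new ones.
my_ontology = shared_library_ontology | my_additions


def superclasses(cls, ontology):
    """Follow subClassOf links upward from cls and return every ancestor."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in ontology:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


# My new class inherits its place in the hierarchy from the shared vocabulary.
ancestors = superclasses(MINE + "GraphicNovel", my_ontology)
```

I write one new statement, and the new class picks up both Book and Publication as ancestors from work somebody else already did.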

But consider this: who checks those ontologies? Sure, you can run your RDF through a validator to make sure you’ve got all your tags. But who validates the meaning?
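Here’s a toy illustration of that gap, sketched in Python. The property names and the checker are assumptions of mine (a real validator works on an actual RDF serialization), but the idea carries over: a structural check can confirm that a statement has the right shape without knowing whether it is true.

```python
# Hypothetical property names; a real validator would check RDF/XML or
# another serialization, but the idea is the same.
KNOWN_PROPERTIES = {"hasAuthor", "hasCallNumber", "locatedOnFloor"}


def is_well_formed(triple):
    """Structural check only: three non-empty strings and a known property."""
    if len(triple) != 3:
        return False
    if not all(isinstance(part, str) and part for part in triple):
        return False
    return triple[1] in KNOWN_PROPERTIES


# Isaac Asimov did write Robots and Empire.
correct = ("Robots and Empire", "hasAuthor", "Isaac Asimov")
# Same shape, wrong author: the structure is fine, the meaning is not.
wrong_meaning = ("Robots and Empire", "hasAuthor", "Arthur C. Clarke")
# Missing its object and using an unknown property.
broken = ("Robots and Empire", "writtenBy")

results = [is_well_formed(t) for t in (correct, wrong_meaning, broken)]
```

The checker accepts the first two statements and rejects only the third. It catches the missing piece, but it has no way to notice that the second statement credits the wrong author.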

Follow me for a bit: Ontologies are more complicated than a program module. A program is simply capturing some explicit process: for instance, that sort routine either alphabetizes properly or it doesn’t. An ontology is capturing meaning, and the Polanyi that I’m reading says meaning always has a tacit component. Maybe that’s not a big deal in capturing simple things like, “The library has a print copy of Isaac Asimov’s Robots and Empire on the second floor, in the PN section,” but it will complicate the creation of ontologies for complicated concepts fraught with subtleties and disagreements. It will take some serious expertise in both Semantic Web techniques and the specific knowledge domain to create appropriate ontologies for some topics.

It seems to me the folks who are going to build these ontologies are likely going to be RDF/OWL experts first and subject experts second. They’ll be Ph.D./D.Sci’s in information systems and computer science who also happen to be enthusiasts in biochemistry or photography or bhakti-yoga (perhaps an Eastern approach to enlightened artificial intelligence). Will ontologies composed by non-specialists be good enough to capture the deep knowledge of the specialization?

I wonder if we have here an inherent limit on the capability of the Semantic Web to support intelligence. Practically speaking, it will be very hard to bring together the skills needed to compose really good ontologies to capture complex meaning. It will be hard for the rest of us to check those captured meanings.

Maybe it won’t matter. Maybe we’ll be so impressed with the things our Semantic Web apps can tell us (and there will be plenty) that we won’t mind if they can’t answer our toughest questions about art, philosophy, economics, or other complex topics. Maybe they’ll save us so much time picking out groceries and arranging doctor’s appointments for us that we’ll have more time to hash out the problems that can only be solved by human application of the tacit knowledge RDF/OWL can’t get.

But even there, who will watch the Semantic Web programmers? Who will check the published ontologies to make sure they haven’t captured some bias, some value judgment unique to the developer but not applicable to all users? Or forget bias: as ontology repositories develop, who will monitor them to ensure that the meanings they have locked into code still accurately reflect the fluid, evolving meanings of the human world? Will ontologies keep up with our intellect and culture… or hold them back?

Now watch: I’ll turn to the next chapter of our textbook and find they’ve already answered that question.

————————————
*Web Ontology Language –> OWL: Why not OWL? There is a story….