Friday, June 03, 2011

Schema.org and the Semantic Web

The interweb is all atwitter today about schema.org and what it may mean for the Semantic Web. Here is my take.

There has been a long-standing argument between microformats and the Semantic Web. Many developers, and to some degree, search engines have preferred microformats because they are easy to use and to understand. Microformats are widely deployed because of this. However, there is simply no way to combine microformats on a single page. This is the Achilles heel of microformats; sooner or later someone wishes to use more than one (or a few if they play together particularly nicely) at a time and can't do it.

RDF is harder to understand (although an experiment in Germany showed that fifth graders could easily be taught RDF. It is adults who have already learned other ways to think who have trouble.) RDF is a completely general solution to the problems that microformats solve. RDF's raison d'ĂȘtre is to allow for the combination of data from multiple people (e.g. developers and search engines, or multiple relational databases or as an interchange format between proprietary system). RDF can represent any type of data, and combines easily with other RDF.

The argument between microformats and RDF can thus be thought of as an argument between short-term pragmatism and long-term planning. Those who want to solve a specific problem now use microformats. Those who want to solve more general problems in the future use RDF.

The presumption in the Semantic Web community is that the best (perhaps the only) way to combine microformats is either RDF or something very much like it. Further, people have been expressing needs to combine the use of multiple microformats on Web pages for about five years.

Microsoft is aware of the Semantic Web and, in fact, was an early supporter of the RDF standards at the World Wide Web Consortium (W3C). They even paid for some marketing; proof may be found here.

Unfortunately for those interested in open standards, Microsoft decided that Netscape's (remember Netscape??) use of RDF in their portal was threatening, so they decided to reinvent RDF internally as a proprietary technology. Microsoft's internal version of RDF has appeared in their file system, Sharepoint and other products. That's the way this story simply must play out: Use RDF or reinvent it.

Yahoo was the first search engine to support RDFa (RDF in Web pages), followed by Google. Both supported particular vocabularies of RDFa, which is the same as saying 'microformats encoded in RDF' and therefore along the lines of my earlier comments.

The new schema.org announcement is a partnership between "Google, Bing and Yahoo" or "Google, Microsoft and Yahoo" depending where you look. Since Microsoft bought Bing and Yahoo has licensed Bing for its search services, schema.org is really between Google and Microsoft.

So, I read schema.org as an attempt (actually, a further attempt) by Microsoft to reduce the impact of RDF and Semantic Web techniques on the search business specifically and their larger business in general. Time will tell whether that will work. History suggests that it will partially work by changing the places RDF is seen as threatening to big business. Another similar area to watch will be RDF and Linked Data's threat to the Data Warehousing market (a $10 billion market in 2010). That fight will be primarily between standards and Oracle.

Michael Hausenblas at DERI released Schema.org in RDF while I was writing this. Well done to Michael and his colleagues. As Michael said, "We're sorry for the delay". Awesome.

Update 2011-06-15: Google announced at a BOF at SemTech 2011 last week that they will continue to support Rich Snippets (their RDFa implementation). That's helpful and what we should be promoting.

Tuesday, February 08, 2011

Semantic Web Elevator Pitch

Eric Franzon at SemanticWeb.com asked the community to provide "elevator pitches" for the Semantic Web. Here's my attempt:

Thursday, January 06, 2011

Ian was Right and I was Wrong

Following Ian Davis' post A Guide to Publishing Linked Data Without Redirects, I followed with A(nother) Guide to Publishing Linked Data Without Redirects. In that post, I argued that resource descriptions should be separated from resource representations at the HTTP level. I see now that I was wrong.

Ian challenged me to come up with a compelling reason why HTTP should encode the difference between a resource representation and a resource description and, after some effort, I simply could not. Ian summarized his thoughts in a new post: Back to Basics with Linked Data and HTTP.

The problem in my mind has always related to the use of HTTP URIs to identify things in the real world. We can get around that easily enough by returning RDF whenever someone resolves those URIs. You get a description of a real-world thing that is as richly described as the publisher wanted it to be. Cool.