Thursday, March 05, 2009

O'Reilly Media Joins the Semantic Web

O'Reilly Media, the current name for the geek publishing giant founded by Tim O'Reilly, has finally joined the Semantic Web.  O'Reilly's coining of the term "Web 2.0" and early misunderstandings of the Semantic Web stack led some to think that he didn't see much value in machine-readable information.  That seems to have changed, at least within O'Reilly Labs.

O'Reilly Labs launched a Beta product last month called the O'Reilly Product Metadata Interface (OPMI).  The OPMI is a technical platform for the exchange of metadata between publishing trading partners.  Now that it is in RDF and publicly accessible, the rest of us can play with it, too.

It is easy to retrieve RDF/XML describing any book that O'Reilly publishes. You simply perform an HTTP GET on a URL constructed with the book's International Standard Book Number (ISBN). Every edition of a published book has an ISBN and they come in two flavors, the older 10-digit variety and the newer 13-digit version. All ISBNs issued after 1 January 2007 have been 13 digits. Some books are assigned both forms by their publishers for convenience during the transition.

For example, let's get the metadata description of an O'Reilly book I wrote, Programming Internet Email. The 13-digit ISBN for the second edition of the paperback is 9781565924796, and the 10-digit equivalent is 1-56592-479-7. The OPMI nicely works with either one, but the returned RDF uses the modern 13-digit one as canonical, as it should.
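The relationship between the two ISBN forms is mechanical, so it can be sketched in a few lines of Python. This is only an illustration of the standard conversion (prefix 978, drop the old check digit, compute a new one); the function name `isbn10_to_isbn13` is mine and has nothing to do with how the OPMI handles ISBNs internally.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit equivalent."""
    # Drop hyphens and spaces so "1-56592-479-7" and "1565924797" both work.
    chars = [c for c in isbn10 if c not in "- "]
    if len(chars) != 10:
        raise ValueError("expected 10 ISBN characters, got %d" % len(chars))

    # Keep the first nine digits (the old check digit is discarded)
    # and prepend the 978 prefix used for books.
    body = "978" + "".join(chars[:9])
    if not body.isdigit():
        raise ValueError("the first nine ISBN characters must be digits")

    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("1-56592-479-7"))  # 9781565924796
```

Running it on the 10-digit ISBN above reproduces the 13-digit one quoted for the same edition.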

The URL for any O'Reilly book is the OPMI base address followed by its ISBN, in this case 9781565924796.

An HTTP GET may be done with any Web browser, of course, or on a command line by use of the curl utility:

$ curl
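If you would rather script the request than use a browser or curl, a rough Python equivalent might look like the following. Treat it as a sketch under assumptions: `OPMI_BASE` is a placeholder that you must replace with the real OPMI address, the `Accept` header and the UTF-8 decoding are my guesses about what the service wants and returns, and the helper names `product_url` and `fetch_rdf` are hypothetical.

```python
from urllib.request import Request, urlopen

# Placeholder only (an assumption): substitute the real OPMI base address.
OPMI_BASE = "http://opmi.example.org/product/"


def product_url(isbn: str, base: str = OPMI_BASE) -> str:
    """Build a product URL by appending the hyphen-free ISBN to the base."""
    return base + isbn.replace("-", "")


def fetch_rdf(isbn: str, base: str = OPMI_BASE, timeout: float = 10.0) -> str:
    """Perform an HTTP GET on the product URL and return the body as text."""
    request = Request(
        product_url(isbn, base),
        # Assumption: asking for RDF/XML explicitly does no harm.
        headers={"Accept": "application/rdf+xml"},
    )
    with urlopen(request, timeout=timeout) as response:
        # Assumption: the response is UTF-8 encoded.
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_rdf() once the base is real.
    print(product_url("9781565924796"))
```

Since the OPMI accepts either ISBN form, `fetch_rdf("1-56592-479-7")` should be just as usable as the 13-digit call once the placeholder base is swapped out.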

The returned RDF includes a wealth of information about the book. The OPMI uses four vocabularies in its RDF: Dublin Core for describing books (title, subject, language, etc.), Friend-of-a-Friend (FOAF) for describing people associated with those books, the library community's MARC (MAchine Readable Cataloging) relator codes for relating people and books, and the Metadata Object Description Schema (MODS) for specifying the edition of a book. MARC and MODS come from the Library of Congress and are traditionally used in library cataloging systems.

Since this metadata is on the Web, we can use standard Semantic Web query tools to query it. Using SPARQLer, a SPARQL query language processor available freely on the Web, we can query the RDF to extract bits we want. A bit of playing around makes it easy to get the author's name and the unique URI assigned to the author by O'Reilly:

prefix dc:
prefix foaf:
prefix rdf:
SELECT ?work ?authorURI ?author
WHERE {
  ?work dc:creator ?authorType .
  ?authorType rdf:_1 ?authorURI .
  ?authorURI foaf:name ?author
}

The results look like this:
work authorURI author
< product:9781565924796.IP> < agent:pdb:2495> "David Wood" @en
< product:9781565924796.BOOK> < agent:pdb:2495> "David Wood" @en

There are two results because the first (.IP) is the overall URI for the work in all of its possible formats. The second (.BOOK) is the book edition of the work. If this book had been published on Safari, O'Reilly's electronic publishing forum, it would also have a URL ending in ".SAF". E-books get an ".EBOOK" and Apple iPhone applications get a ".APP".
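To see why the query's three triple patterns produce one row per product URI, it can help to run the same join by hand. The sketch below uses a tiny in-memory list of triples that I made up to match the shape of the query and its results; it is not real OPMI output, the blank-node names `_:seq1` and `_:seq2` are invented, and the identifiers are abbreviated the way they appear in the results table. The `rdf:_1` property points at the first member of an RDF container, which is why it picks out the first-listed author.

```python
# Hypothetical triples shaped like the query's patterns; not real OPMI data.
TRIPLES = [
    ("product:9781565924796.IP", "dc:creator", "_:seq1"),
    ("product:9781565924796.BOOK", "dc:creator", "_:seq2"),
    ("_:seq1", "rdf:_1", "agent:pdb:2495"),
    ("_:seq2", "rdf:_1", "agent:pdb:2495"),
    ("agent:pdb:2495", "foaf:name", "David Wood"),
]


def objects(subject, predicate, triples):
    """Return every object of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def first_authors(triples):
    """Join the three patterns: work -> creator container -> agent -> name."""
    rows = []
    for work, predicate, container in triples:
        if predicate != "dc:creator":
            continue
        for agent in objects(container, "rdf:_1", triples):
            for name in objects(agent, "foaf:name", triples):
                rows.append((work, agent, name))
    return rows


for row in first_authors(TRIPLES):
    print(row)
```

Each product URI carries its own `dc:creator` statement, so the join yields two rows that share the same author URI and name.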

O'Reilly claims to have published metadata for over 1,100 books, which is a pretty reasonable addition to the Semantic Web, even in Beta. Naturally, I now want O'Reilly to publish machine-readable metadata on their human-readable Web pages using RDFa. There has been no sign of that yet, though.

This content was cross-posted to Semantic Universe.


  1. Yep, we'd like to have our catalog pages include that too. It should happen in a few more weeks.
