Vowel Movement: November 2005

Tuesday, November 22, 2005

XTech 2006 Call for Participation

The Call for Participation for XTech 2006 (formerly XML Europe) is now available. The conference will be 16-19 May 2006 in Amsterdam (note that WWW 2006 is 23-26 May in Scotland). The track that includes SemWeb, tagging and microformats is entitled "Open Data".

Papers at this conference are selected by peer review of *abstracts*.

The important dates are:
9 January / Presentation and Tutorial Proposals Due
10 February / Accepted Speakers Notified
17 March / Late Breaking News & Product Proposals Due

I presented the Kowari overview paper at XTech 2005, which was also in Amsterdam. It was a lot like WWW, only less so. A number of the speakers overlapped, as did the content and focus. If you can't get to WWW, XTech is a reasonable choice.

Trip Report from WISE 2005's International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2005)

On Sunday, 20 November 2005, I attended the International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2005), held as part of The 6th International Conference on Web Information Systems Engineering (WISE 2005) in New York City. Whew! That is a rather fancy way of saying that I spent a weekend day working in a small conference room with twenty SemWeb geeks, about half of whom I already knew.

Semantic Web repositories were well represented: Dave Beckett, creator of Redland; Steve Harris of 3Store; and Alex Hall and I from Kowari.

I was on the program committee for the workshop and presented a short paper entitled "Scaling the Kowari Metastore" [1]. That paper surveyed the options to keep Kowari as the most scalable SemWeb database. I hope we will get the chance to implement some of them in 2006 so we can keep ahead of the game.

Dave Beckett moved from the University of Manchester to Yahoo! in October 2005. It is interesting to see Yahoo! hiring SemWeb people and reportedly using them on SemWeb projects.

The most interesting talk (to me) was on a new repository for OWL data called OWLIM [2]. The talk was given by primary author Atanas Kiryakov. OWLIM supports RDFS and OWL DLP and (yet another) variant he called "OWL Horst", based on an ISWC 2005 paper written by H.J. ter Horst [3].

OWL Horst is an RDFS-compatible OWL fragment which allows extensions with rules. It contains less Description Logic than OWL Lite, but more capability for rules extensibility. OWL Horst provides entailment over RDF graphs based on rules of triple patterns, with variables at any position (subject, object and/or predicate).
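To make "rules of triple patterns" concrete, here is a minimal sketch of naive forward chaining over a set of triples. It is only an illustration under my own assumptions, not OWLIM's or TRREE's implementation: the function names, the "?"-prefix convention for variables and the two sample rules (an RDFS-style subclass rule and an inverse-functional-property rule with a variable in the predicate position) are all made up for the example.

```python
# Triples and patterns are (subject, predicate, object) tuples of strings.
# By convention in this sketch, a term starting with "?" is a variable.

def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def match_all(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all patterns of a rule body."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        b = match(patterns[0], triple, bindings)
        if b is not None:
            yield from match_all(patterns[1:], triples, b)

def substitute(pattern, bindings):
    """Replace variables in a pattern with their bound values."""
    return tuple(bindings.get(term, term) for term in pattern)

def entail(triples, rules):
    """Apply (body, head) rules repeatedly until no new triples are inferred."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            inferred = {substitute(head, b) for b in match_all(body, closure)}
            new = inferred - closure
            if new:
                closure |= new
                changed = True
    return closure

rules = [
    # Subclass rule: instances of a class are instances of its superclass.
    ([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
     ("?x", "rdf:type", "?d")),
    # Inverse functional property: note the variable ?p in predicate position.
    ([("?p", "rdf:type", "owl:InverseFunctionalProperty"),
      ("?a", "?p", "?o"), ("?b", "?p", "?o")],
     ("?a", "owl:sameAs", "?b")),
]

facts = {
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:mbox", "rdf:type", "owl:InverseFunctionalProperty"),
    ("ex:alice", "ex:mbox", "mailto:a@example.org"),
    ("ex:al", "ex:mbox", "mailto:a@example.org"),
}

closure = entail(facts, rules)
```

After running this, closure holds the original facts plus the inferred ones, such as ex:alice being an ex:Person and ex:alice owl:sameAs ex:al. A real repository would index the triples rather than scan them for every pattern, but the shape of the computation is the same.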

Atanas said that general rules extensibility in OWLIM is "possible", which probably means that they have not exposed a generic rules engine yet. Interestingly, InverseFunctionalProperty is included as a primitive property. No cardinality constraints are included, though, which I contend limits its usefulness with ontologies dealing with business data.

OWLIM is available as a Storage and Retrieval Layer (SAIL) for Sesame. Unfortunately, although it is Open Source (LGPL), it runs on Ontotext's TRREE engine, which is freely available but proprietary to Ontotext. OWLIM reportedly has very fast upload and retrieval/query speeds, but relatively slow deletes. Its scalability becomes limited with high implicit/explicit statement ratios. They claim 30 million statements as an upper limit on reasonable hardware (64-bit Opteron), 10 million on 32-bit desktops. It is written in Java 1.5. Query speeds are claimed to be linear with data size; delete time is also linear (20 sec per million statements present).

Atanas strongly encouraged the Kowari team to head toward an implementation of OWL Horst instead of the OWL Lite plus full cardinality support that we planned. I will have to look into whether we could add full cardinality to OWL Horst. If we can, it sounds like a reasonable thing to do.

Steve Harris presented his recent work on implementing SPARQL in his 3Store RDF repository [4]. 3Store is written in C and uses MySQL as a storage backend. Steve has managed to auto-generate reasonable SQL from SPARQL to allow 3Store to handle some tens of millions (up to 35 million) of RDF statements. He did not implement SPARQL's nested optional, nested union or case-insensitive regexps, but that still makes it nearly complete with respect to the SPARQL specification.
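For readers who have not seen the idea, here is a rough sketch of how a simple SPARQL graph pattern can be turned into SQL self-joins over a table of triples. This is not Steve's translator or 3Store's schema: the single triples(s, p, o) table, the bgp_to_sql function and the sample data are assumptions of mine, and I use Python's built-in sqlite3 module instead of MySQL only so that the example runs on its own.

```python
import sqlite3

def bgp_to_sql(patterns, select_vars):
    """Translate a list of triple patterns into one SQL self-join query.

    Each pattern is an (s, p, o) tuple of strings; terms starting with "?"
    are variables. Returns the SQL text and its positional parameters.
    """
    columns = ("s", "p", "o")
    first_seen = {}          # variable -> first column reference, e.g. "t0.o"
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(columns, pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a filter with a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f'{first_seen[v]} AS "{v[1:]}"' for v in select_vars)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
])

# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?f . ?f foaf:name ?name }
sql, params = bgp_to_sql(
    [("ex:alice", "foaf:knows", "?f"), ("?f", "foaf:name", "?name")],
    ["?name"],
)
names = sorted(row[0] for row in conn.execute(sql, params))
```

Each triple pattern becomes one alias of the triples table, shared variables become join conditions and constants become filters. The hard parts Steve described (optionals, unions, regular expressions) are exactly the ones this sketch leaves out.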

I am sure that Steve's work will come in really handy for Oracle, if they look at it. No one from Oracle was in attendance.

Denis Ranger from Mind Alliance presented his work with Jean-Francois Cloutier on a query algorithm for scalable SemWeb P2P systems [5]. Their algorithm relies on Pastry and Scribe, both created by Microsoft Research. (A non-interoperable FreePastry implementation has been released under a BSD-like license from Rice University.) Data routing is handled by Pastry; peers are connected to a small set of neighbors and the connections are rebalanced automatically as peers come and go. Scribe provides publish and subscribe message- and topic-handling. They are working on a simulation and I look forward to seeing it work.

It is interesting that several researchers have been using the Lehigh University Benchmark (LUBM) data to benchmark OWL-oriented systems. It seems to be becoming a de facto standard.

References:

[1] Wood, D., Scaling the Kowari Metastore, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 193-198, 2005.

[2] Kiryakov, A., Ognyanov, D., and Manov, D., OWLIM - A Pragmatic Semantic Repository for OWL, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 182-192, 2005.

[3] ter Horst, H.J., Combining RDF and part of OWL with Rules: Semantics, Decidability, Complexity, in Proc. of ISWC 2005.

[4] Harris, S., SPARQL Query Processing with Conventional Relational Database Systems, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 235-244, 2005.

[5] Ranger, D. and Cloutier, J.F., Scalable Peer-to-Peer RDF Query Algorithm, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 266-274, 2005.

$100 Green Machine Debuts at UN

Nicholas Negroponte and Kofi Annan demoed the Green Machine last week at the United Nations (stories at BBC, New Scientist, IEEE Spectrum). This is the coolest thing I have seen in a long, long time! It is meant to bring a reasonable level of computing to the poor children of the world.

The Green Machine is a sub-$100 laptop computer powered by a hand-cranked dynamo. It contains a simple, dual-mode bright LED screen to save power (no backlighting) and includes mesh networking capabilities to share Internet connections or just create ad-hoc networks. Naturally, it uses exclusively Open Source software, both to keep the price down and to facilitate internationalization for the world's poorest countries, which are not a market force.

Professor Negroponte thinks he can get millions of these things built in short order and plans to sell them to governments, presumably opening the way for grants to help. Go, man, go!

Monday, November 14, 2005

Decent VIM RegExp Guide

I finally found a decent, readable guide to regular expressions in vim: http://www.geocities.com/volontir/. Yay!

Sunday, November 13, 2005

Guitar Lessons

Well, I've done it again. Just as my life was starting to get a bit less frenetic, I've taken on another project. My friend Naser took me to the Guitar Center in Fairfax and I bought a Yamaha six-string acoustic guitar. I haven't done a bit of work in the last several days, spending my time instead trying to wrap my short fingers around the eight basic chords described in Guitar Noise's Absolute Beginner series of articles. Naser is going to give me my first formal lesson on Friday.

I always thought computers were a terrible time sink. You start coding and hours just slip by. Guitars, at first glance, would seem to be worse.

Wednesday, November 02, 2005

Tufte Course Review

I attended a one-day course on data graphics yesterday, given by the justly-famous Edward Tufte. I highly recommend both the course and his books, but, as usual, have some comments.

The best parts of the course were his introduction to sparklines and his comments on the deleterious effects of Microsoft PowerPoint.

Tufte recommends a radical increase in the information density of documents and presentations. That gives documents and presentations more readability and assists the retention of information.

However, such a significant increase in information density means that in order to create a Tufte-approved presentation, one must take the time and effort to include all that additional information. One will not always have the time to do that. Office workers, military officers, stock traders and others in operational roles are often required to brief quickly, before ideas are fully formed. This, of course, never happens in science. Tufte notes that his first book required twelve years to write.

Blaise Pascal once apologized for writing a long letter, saying "The present letter is a very long one, simply because I had no leisure to make it shorter." In other words, he was saying that he did not choose to spend the time to radically increase the information density. That is an economic decision and, as we all know, all engineering is economics. Thanks to Brian S. for properly attributing the quote.

I noticed one other interesting phenomenon. At the end of the day, Prof. Tufte ended the lecture, the audience applauded and he waved. Then he basked in the applause, like a rock star or a politician. The last person I saw enjoy applause that much was Bill Clinton. Tufte has commented that his reviewers never include graphics, so here is my rendition of him basking in applause:

[Image: my rendition of Tufte basking in applause]

The image is, of course, a play on his "airport signal people".

Still, the course was interesting and thought provoking. I highly recommend it as an addendum to reading his books.