Tuesday, November 22, 2005

Trip Report from WISE 2005's International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2005)

On Sunday, 20 November 2005, I attended the International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2005), held as part of The 6th International Conference on Web Information Systems Engineering (WISE 2005) in New York City. Whew! That is a rather fancy way of saying that I spent a weekend day working in a small conference room with twenty SemWeb geeks, about half of whom I already knew.

Semantic Web repositories were well represented: Dave Beckett, creator of Redland; Steve Harris of 3Store; and Alex Hall and me from Kowari.

I was on the program committee for the workshop and presented a short paper entitled "Scaling the Kowari Metastore" [1]. That paper surveyed the options for keeping Kowari the most scalable SemWeb database. I hope we will get the chance to implement some of them in 2006 so we can keep ahead of the game.

Dave Beckett moved from the University of Manchester to Yahoo! in October 2005. It is interesting to see Yahoo! hiring SemWeb people and reportedly using them on SemWeb projects.

The most interesting talk (to me) was on a new repository for OWL data called OWLIM [2]. The talk was given by primary author Atanas Kiryakov. OWLIM supports RDFS and OWL DLP and (yet another) variant he called "OWL Horst", based on an ISWC 2005 paper written by H.J. ter Horst [3].

OWL Horst is an RDFS-compatible OWL fragment which allows extensions with rules. It contains less Description Logic than OWL Lite, but more capability for rules extensibility. OWL Horst provides entailment over RDF graphs based on rules of triple patterns, with variables at any position (subject, predicate and/or object).
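To make the idea of rules over triple patterns concrete, here is a minimal forward-chaining sketch in Python. It is my own illustration, not OWLIM's or TRREE's implementation. The two rules are written in the spirit of the subclass and inverseOf entailments, the "ex:" names are made up, and the helper names (match, apply_rule, closure) are hypothetical. Note that the second rule has a variable in the predicate position.

```python
def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Returns the extended bindings, or None if they conflict."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def apply_rule(premises, conclusion, graph):
    """Return every triple the rule derives from the graph."""
    solutions = [{}]
    for pattern in premises:
        extended = []
        for bindings in solutions:
            for triple in graph:
                result = match(pattern, triple, bindings)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return {tuple(b.get(term, term) for term in conclusion) for b in solutions}

def closure(graph, rules):
    """Apply all rules repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        derived = set()
        for premises, conclusion in rules:
            derived |= apply_rule(premises, conclusion, graph)
        if derived <= graph:
            return graph
        graph |= derived

# Illustrative rules only; each is (list of premises, conclusion).
RULES = [
    ([("?c", "rdfs:subClassOf", "?d"), ("?x", "rdf:type", "?c")],
     ("?x", "rdf:type", "?d")),
    ([("?p", "owl:inverseOf", "?q"), ("?x", "?p", "?y")],
     ("?y", "?q", "?x")),
]

EXPLICIT = {
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:teaches", "owl:inverseOf", "ex:taughtBy"),
    ("ex:bob", "ex:teaches", "ex:alice"),
}

INFERRED = closure(EXPLICIT, RULES) - EXPLICIT
```

Here INFERRED holds the implicit statements. A repository that materializes them at load time, as OWLIM appears to, trades upload and delete work for fast queries.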

Atanas said that general rules extensibility in OWLIM is "possible", which probably means that they have not exposed a generic rules engine yet. Interestingly, InverseFunctionalProperty is included as a primitive property. No cardinality constraints are included, though, which I contend limits its usefulness for ontologies dealing with business data.

OWLIM is available as a Storage and Retrieval Layer (SAIL) for Sesame. Unfortunately, although it is Open Source (LGPL), it runs on Ontotext's TRREE engine, which is freely available but proprietary to Ontotext. OWLIM reportedly has very fast upload and retrieval/query speeds, but relatively slow deletes. Its scalability becomes limited with high implicit/explicit statement ratios. They claim 30 million statements as an upper limit on reasonable hardware (64-bit Opteron), 10 million on 32-bit desktops. It is written in Java 1.5. Query speeds are claimed to be linear with data size; delete time is also linear (20 sec per million statements present).
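As a back-of-the-envelope check, the quoted delete figure can be turned into estimates at the two claimed size limits. This assumes my reading of the claim is right, namely that a delete costs about 20 seconds for every million statements already in the repository. The function name is mine.

```python
SECONDS_PER_MILLION = 20.0  # figure quoted in the talk

def estimated_delete_seconds(statements_present):
    """Linear estimate: 20 seconds per million statements in the repository."""
    return SECONDS_PER_MILLION * statements_present / 1_000_000

for size in (10_000_000, 30_000_000):
    print(f"{size:>11,} statements: about {estimated_delete_seconds(size):.0f} s")
```

On that reading, a delete against a full 30 million statement repository would take roughly ten minutes.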

Atanas strongly encouraged the Kowari team to head toward an implementation of OWL Horst instead of the OWL Lite plus full cardinality support that we planned. I will have to look into whether we could add full cardinality to OWL Horst. If we can, it sounds like a reasonable thing to do.

Steve Harris presented his recent work on implementing SPARQL in his 3Store RDF repository [4]. 3Store is written in C and uses MySQL as a storage backend. Steve has managed to auto-generate reasonable SQL from SPARQL to allow 3Store to handle some tens of millions (up to 35 million) of RDF statements. He did not implement SPARQL's nested optional, nested union or case-insensitive regexps, but that still makes it nearly complete with respect to the SPARQL specification.
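The general idea of generating SQL from SPARQL can be sketched in a few lines of Python. This is not 3Store's translator or schema. I assume a single three-column table named triples, use SQLite from the Python standard library instead of MySQL so the example is self-contained, and handle only a basic graph pattern (no OPTIONAL, UNION or FILTER). Each triple pattern becomes one alias of the table, constants become WHERE conditions, and a repeated variable becomes a join condition.

```python
import sqlite3

def bgp_to_sql(patterns):
    """Translate (s, p, o) triple patterns into one SELECT with self-joins.

    Terms starting with '?' are variables; everything else is a constant."""
    columns = ("s", "p", "o")
    first_seen = {}          # variable -> column where it first appeared
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(columns, pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    where.append(f"{ref} = {first_seen[term]}")  # join condition
                else:
                    first_seen[term] = ref
            else:
                where.append(f"{ref} = ?")                       # bound parameter
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

def run_example():
    """Load three made-up triples and run a two-pattern query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("ex:alice", "rdf:type", "foaf:Person"),
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:bob", "foaf:name", "Bob"),
    ])
    # Roughly: SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name }
    sql, params = bgp_to_sql([
        ("?person", "rdf:type", "foaf:Person"),
        ("?person", "foaf:name", "?name"),
    ])
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows
```

Pushing the whole pattern into one SQL statement lets the relational optimizer choose the join order, which is presumably a large part of why this approach scales to tens of millions of statements.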

I am sure that Steve's work will come in really handy for Oracle, if they look at it. No one from Oracle was in attendance.

Denis Ranger from Mind Alliance presented his work with Jean-Francois Cloutier on a query algorithm for scalable SemWeb P2P systems [5]. Their algorithm relies on Pastry and Scribe, both created by Microsoft Research. (A non-interoperable FreePastry implementation has been released under a BSD-like license from Rice University.) Data routing is handled by Pastry; peers are connected to a small set of neighbors and the connections are rebalanced automatically as peers come and go. Scribe provides publish and subscribe message- and topic-handling. They are working on a simulation and I look forward to seeing it work.
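For readers who have not met Pastry, the following toy in Python shows the one property that matters here: every key has a responsible peer, and responsibility shifts automatically as peers join and leave. This is a deliberately simplified sketch, not the authors' algorithm and not the Pastry or FreePastry API. It uses global knowledge of all peers, whereas real Pastry reaches the responsible node hop by hop through small routing tables. The class and function names are hypothetical.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS  # identifiers live on a ring of this size

def make_id(name):
    """Hash a peer name or topic name to a 128-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def ring_distance(a, b):
    """Distance between two identifiers, going the short way round the ring."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_node(key, node_ids):
    """The node whose identifier is numerically closest to the key."""
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))

class ToyOverlay:
    """Tracks live peers and answers 'who is responsible for this topic?'."""

    def __init__(self):
        self.peers = {}  # node id -> peer name

    def join(self, peer):
        self.peers[make_id(peer)] = peer

    def leave(self, peer):
        self.peers.pop(make_id(peer), None)

    def rendezvous(self, topic):
        """Peer that would act as the meeting point for a topic."""
        return self.peers[responsible_node(make_id(topic), self.peers)]

overlay = ToyOverlay()
for name in ("peer-a", "peer-b", "peer-c", "peer-d"):
    overlay.join(name)

before = overlay.rendezvous("topic:some-query")
overlay.leave(before)                      # the responsible peer disappears
after = overlay.rendezvous("topic:some-query")
```

After the original peer leaves, a different live peer takes over the topic without any central coordination, which is the behavior a publish and subscribe layer such as Scribe builds on.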

It is interesting that several researchers have been using the Lehigh University Benchmark (LUBM) data to benchmark OWL-oriented systems. It seems to be becoming a de facto standard.

References:

[1] Wood, D., Scaling the Kowari Metastore, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 193-198, 2005.

[2] Kiryakov, A., Ognyanov, D., and Manov, D., OWLIM - A Pragmatic Semantic Repository for OWL, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 182-192, 2005.

[3] ter Horst, H.J., Combining RDF and part of OWL with Rules: Semantics, Decidability, Complexity. In Proc of ISWC 2005.

[4] Harris, S., SPARQL Query Processing with Conventional Relational Database Systems, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 235-244, 2005.

[5] Ranger, D. and Cloutier, J.F., Scalable Peer-to-Peer RDF Query Algorithm, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 266-274, 2005.
