Wednesday, November 19, 2008

Mulgara on the Cloud

Chris Wilper reportedly put a Mulgara instance on Amazon EC2 and loaded a quarter-billion triples into it, just for fun. The data he used was generated from Paul Gearon's numbers RDF generator. The generator creates facts about numbers and keeps creating RDF until you stop it.
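For readers who have not seen a generator like this, here is a minimal sketch of the idea in Python. It is not Paul's code: the `http://example.org/numbers/` vocabulary, the choice of facts (value, successor, parity) and the function names are all assumptions made for illustration. The only property it shares with the real tool is that it keeps emitting triples until you stop it.

```python
from itertools import count, islice

# Hypothetical vocabulary; the real generator's terms are not given in this post.
BASE = "http://example.org/numbers/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def number_triples():
    """Yield N-Triples lines describing 1, 2, 3, ... without end."""
    for n in count(1):
        subject = f"<{BASE}{n}>"
        # The number's own value as a typed literal.
        yield f'{subject} <{BASE}value> "{n}"^^<{XSD_INTEGER}> .'
        # A link to the next number.
        yield f"{subject} <{BASE}successor> <{BASE}{n + 1}> ."
        # A simple classification: odd or even.
        parity = "Even" if n % 2 == 0 else "Odd"
        yield f"{subject} <{RDF_TYPE}> <{BASE}{parity}> ."


def first_triples(limit):
    """Take a finite slice of the endless stream."""
    return list(islice(number_triples(), limit))
```

Calling `first_triples(6)` returns the three facts about 1 followed by the three facts about 2; a real load would simply keep consuming `number_triples()` until the disk or your patience ran out.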

Chris was using the new XA version 1.1 storage layer for Mulgara. The quarter-billion triples loaded in about a day, which is about the same loading performance that Paul has seen on his laptop. The indexes were about 4.6 gigs compressed and 51 gigs uncompressed. Paul notes that the XA version 2 storage layer will store strings and URIs much more efficiently and thus reduce the index sizes considerably.
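Those figures are easier to appreciate as rates. The back-of-the-envelope sketch below assumes that "about a day" means exactly 24 hours and that a "gig" is a decimal gigabyte (10^9 bytes); both are my assumptions, so treat the results as rough.

```python
TRIPLES = 250_000_000            # a quarter-billion triples
SECONDS_PER_DAY = 24 * 60 * 60   # assumes "about a day" is exactly 24 hours
COMPRESSED_BYTES = 4.6e9         # assumes decimal gigabytes
UNCOMPRESSED_BYTES = 51e9


def load_rate(triples=TRIPLES, seconds=SECONDS_PER_DAY):
    """Average triples loaded per second."""
    return triples / seconds


def bytes_per_triple(total_bytes, triples=TRIPLES):
    """Average index bytes consumed by one triple."""
    return total_bytes / triples


if __name__ == "__main__":
    print(f"{load_rate():.0f} triples/second")
    print(f"{bytes_per_triple(UNCOMPRESSED_BYTES):.0f} bytes/triple uncompressed")
    print(f"{bytes_per_triple(COMPRESSED_BYTES):.1f} bytes/triple compressed")
```

Under those assumptions the load averaged roughly 2,900 triples per second, and each triple costs on the order of 200 bytes of uncompressed index, which gives a sense of how much room a more efficient string and URI store has to work with.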

Way to go, guys! We can't wait for XA2!

Tuesday, November 18, 2008

New PURL Code in the Wild

Several PURL installations have been seen on the 'net using the new code base:
  1. NeuroCommons
  2. The National Center for Biomedical Ontology (NCBO)
  3. Semantic Report
  4. Zepheira (naturally)
Others include at least one startup that doesn't have a Web site yet (GRACE Research Corporation) and a couple of others still in stealth. Not bad for a project that never had an official launch. Keep 'em coming, folks. I have some hope that OCLC will join that list soon, as well as a couple more startup companies.

PURL Server v1.2 Released

I released the new Persistent URL (PURL) server, version 1.2, over the weekend. This release fixes a major bug in version 1.1 whereby PURLs would not resolve for users who did not have a cookie set. Minor upgrades include better error reporting and URL handling.

All users of earlier PURL servers are encouraged to upgrade immediately.

Binary (JAR) and source code downloads are available from

Saturday, November 08, 2008

The Perfect 11-year-old Boy Birthday Party

4:00 PM: Archery (supervised, of course)

4:45 PM: A few minutes free time on a trampoline

5:00 PM: Fencing lesson (even more stringently supervised, with cardboard boxes as opponents)

6:15 PM: Make your own pizza for dinner

7:15 PM: Communal game of Crossfire on a map created by the birthday boy himself

8:30 PM: Cookies

8:35 PM: Pass-the-parcel (an Aussie game for our Aussie son)

8:40 PM: Open presents

8:50 PM: Movie night (The Princess Bride)

10:30 PM: Sleep over

8:00 AM: Breakfast of chocolate-chip pancakes

9:00 AM: Pickup by parents

Wednesday, November 05, 2008

My Last Degree

Tonight, I passed the oral defense for my Ph.D., having submitted my thesis in July. It's all administration from here :)

Thursday, October 02, 2008

Dulles TSA Lets Leatherman Squirt Slip By

Knives on a plane? Believe it.

I flew from Dulles International Airport on Monday and forgot to remove my Leatherman Squirt from my keyring. I put my keys in a bin with my shoes when passing through security and the TSA simply didn't notice. Neither did I until I was on the plane :/

Tuesday, September 16, 2008

Aidan Beats Sarah

Wow. My son, 10, gave his sister a succinct and cogent description of the Bush Doctrine on the way home from school today. He covered all the basics, including the ethics of preemptive strikes, the use of sanctions and inspections and the limitations on presidential power. Take that, Sarah Palin :)

Truck Fire on I-81 Today

Originally uploaded by prototypo
I traveled from Fredericksburg to Lexington, Virginia today in the beautiful Shenandoah Valley to kick off a seminar series that I organized at VMI. Unfortunately, an accident involving a semi rig completely shut down a section of I-81 about 8 miles south of Staunton for many hours this morning and I arrived late. Luckily, I was still able to give my talk.

I haven't seen any reporting on this accident other than this blog post, which is surprising considering how long all of the southbound lanes were closed. Traffic was still crawling through a single lane a couple of hours ago.

Friday, September 12, 2008

The Canary in Ohio

I checked in with my favorite indicators of political opinion in the swing state of Ohio today: my parents. Their political position in support of McCain was not surprising, but the details are interesting.

My mom, 81, favors abortion rights, but loves Sarah Palin. She stopped listening to the Youngstown radio station because she "couldn't believe how hard they were being on her." She suspected that the reason the media was being hard on Palin was "because she is a woman." She was amazed to hear about a poll that suggested that the women of Ohio were supportive of Obama, because everyone she discusses politics with supports McCain.

Palin's lack of understanding of the Bush Doctrine did not bother her at all: "She'll learn." When asked about Palin's religious views, especially her assertion that the war in Iraq is "God's Plan", she dismissed it as propaganda from the Obama campaign. I pointed her to the video. We'll see what she says after viewing it.

My mom also didn't care that Palin linked Iraq to 9/11. My mom does, too. She didn't fully believe me when I explained the Big Lie and that even George Bush has publicly repudiated the notion.

My dad, 82, expressed disdain for all four candidates and the electoral process in general. He was not rocked by the allegations that McCain may have used his Senate influence to tamper with a DEA investigation, saying that "they all do that sort of thing". He went on to say that he has been going to the library to rent old movies and other videos until two weeks before the election. At that point, he will try to listen to the position of both candidates and make a decision.

Thursday, September 11, 2008

China as an Island

This makes sense!

John Mauldin, an analyst at Investors Insight, came up with an awesomely useful idea: Instead of thinking of China as a huge land power, imagine them as an isolated island. Geographically, politically and demographically it makes a lot of sense.

John's idea is just part of a larger analysis of China to be found in this article, although the site was not responding today when I checked it.

The graphic is worth repeating. It is powerfully concise:

I found both the graphic and the link to the original article on Strange Maps, a blog worth watching.

Thursday, August 21, 2008

Running for President? No, Thanks.

I had to laugh at this interesting thought from my dad:

Dear Dave:

I finally decided that I have only one interest in the upcoming election and it involves your birthday.

Since you and Obama are of the same generation and about the same age, it occurred to me that your life experiences are much broader than his.

You have served in the military, have operated a business under difficult circumstances, and have never had to shape your beliefs for political purposes as all politicians do.

Just wanted you to know I am glad you are not running for president.



Thursday, August 14, 2008

Bernadette's Mom Has Passed

Bernadette's mom died this morning at 2:30 AM US EDT. She passed peacefully in her sleep at 91 years old.

The family requests that mourners consider sending a contribution to the Rhode Island Hospice or your local Hospice. Contributions are tax deductible.

Wednesday, August 06, 2008

NetKernel Architects Conference

This year's NetKernel Architects Conference is being held 25-27 September 2008 at the University of Mary Washington, right here in Fredericksburg, VA. Brian and I are helping Peter and Randy from 1060 Research to organize it.

1060 will be introducing NetKernel version 4.0 Pre-Release. Each attendee will get a copy on the first day. Brian and I had the chance to see and play with it last month in London. All I can say is, wow! Fans of resource-oriented computing are about to get a very powerful tool to add to their toolbox. The Architects Conference is a great way to learn the concepts and get a jump on the competition.

The conference schedule and price list are on the Web.

Y'all come down now, y'hear?

Tuesday, August 05, 2008

New Linked Data Blogroll

Talis has set up a blog aggregator for discussions of Linking Open Data at Good show, Talis.

Gartner Gets FLOSS; Late as Usual

A new report by Gartner on Free/Libre/Open Source Software concludes, " ... if you do not think you use it, then you use it; and if you think you do use it, then you use lots more of it than you know." No kidding. They rightly call out the fact that many common big-enterprise vendors use FLOSS components, but fail to mention FLOSS use by the dominant players like Microsoft.

The full report is accessible only by those with money, but KMWorld reported on it, as did ZDNet. Both reporters focused on the use of FLOSS by cloud computing centers, but the report is much more interesting than that. The most interesting statement to me was the positioning of SaaS for enterprises as a primary cost-cutting measure. SaaS is just another form of outsourcing, but this time to a much more level playing field.

Gartner is a funny organization. I haven't quite forgiven them for reporting in 1997 (!) that the Web was not likely to be important to businesses. They even gave that ridiculous statement a 70% confidence rating, IIRC. This time, they seem to be on track, but reporting a bit late. Why wait until FLOSS use is so ubiquitous in mainstream software before telling enterprise CIOs about it?

Monday, August 04, 2008

Digging out from Under

Whew! I've submitted my Ph.D. thesis (finally) and am now waiting upon the reviewers. It should be a couple of months and then an oral defense. I may be able to post more regularly once I find my inbox so I can start clearing it.

Tuesday, July 08, 2008

The World's First Metadata

Originally uploaded by prototypo
I spent a good bit of 29 June at the British Museum in London. Those Brits really know how to loot! Attached is a picture of Mesopotamian girginakku from 5300 years ago. Girginakku were clay tags that were hung off of tablets or rolls to provide clues to their contents. They were originally used as tokens much like an IOU, but evolved into the world's first metadata.

The keepers of the markers ("rab girginakku") were the first metadata wranglers; the intellectual forebears of SemWebbers. Cool, eh?

On the right in the photo are bullae, clay envelopes containing girginakku, presumably to keep the girginakku from being tampered with before they were needed (sort of like a modern lead seal). Unfortunately, neither the girginakku nor the bullae were described in this detail or given their proper names on the descriptive cards near the artifacts. I only recognized them for what they were because I have recently been researching the history of metadata.

Saturday, June 21, 2008

Making Glass

Originally uploaded by prototypo
Bernadette and I had the fun of learning something of the art of glass making last week. The resident glass blower at Oglebay Park, WV showed us the basics and let us make some paperweights. Great fun!

Monday, April 21, 2008

Semantic Technology Conference 2008

I will be speaking at SemTech again this year. Everyone from Zepheira will be in attendance and most of us will be speaking. Brian and I will be giving a 3-hour tutorial on Semantic Resource Oriented Architectures, and Eric and I will be talking about the new PURL capabilities.

Wednesday, April 09, 2008


Originally uploaded by prototypo
The Software Agent Maintenance Model (SWAMM) is a NetLogo model of a software maintenance methodology I have been working on. A runnable Java applet of the model is available at:

Monday, March 31, 2008

Cherry Blossoms 2008

Another view of the cherry blossoms surrounding the Tidal Basin in Washington, DC on Sunday, 30 March 2008. The Combat Air Patrol contrails are visible over the Washington Monument.

Cherry Blossoms 2008

Originally uploaded by prototypo
Bernadette and I had the great fortune to view the cherry blossoms surrounding the Tidal Basin in Washington, DC at sunrise on Sunday, 30 March 2008. They were at their peak and the light was just perfect. It rained today, so we were especially lucky not to have put it off.

Appreciation of beauty seems so antithetical to the normal activities of Washington, DC, but perhaps that should make us try harder. The contrails of Combat Air Patrol jets painted the sky behind the blossoms.

Sunday, February 24, 2008

Semantic Conference Scheduling Application

This year's Semantic Technology Conference has a nifty new semantic conference scheduling tool, courtesy of fellow Zepheirans Uche and Eric. Nicely done, guys!

The scheduler allows you to graphically fill out a calendar of sessions that you want to attend. Conflicts are immediately visible. The faceted navigation on the left allows you to find sessions based on tagged data (speaker, company, topic, day, etc). When you are happy, you can import your schedule right into your ical-compliant calendar. I wish all conferences had this.

Monday, February 18, 2008

Beyond Redirection: Rich and Active PURLs

It has been a while since I posted about the new Persistent URL work being done by Zepheira and OCLC. That is partially because the work has gone much more slowly than we had planned. We have been actively gold plating the new PURL server because we are trying to satisfy several communities. The result, though, is laying the foundations for new services for the Web.

The key to the new PURL service is the typing of PURLs. The existing public PURL service at OCLC returns an HTTP 302 (Found) response, causing a Web client to redirect to another URL. The new PURL server allows PURLs to return one of several status codes (301, 302, 303, 307, 404 and 410).
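To make the idea of typing concrete, here is a minimal sketch of how a typed PURL might be turned into an HTTP response. This is not the PURL server's code; `purl_response` and its arguments are hypothetical names. The sketch treats any 3xx type as a redirect that carries a `Location` header, and treats 404 and 410 as responses with no target.

```python
def purl_response(purl_type, target=None):
    """Return (status, headers) for a typed PURL.

    Hypothetical helper for illustration only: 3xx types redirect to
    `target`, while 404 and 410 report that the PURL has no target.
    """
    if 300 <= purl_type < 400:
        if target is None:
            raise ValueError("redirecting PURL types need a target URL")
        return purl_type, {"Location": target}
    if purl_type in (404, 410):
        return purl_type, {}
    raise ValueError(f"unsupported PURL type: {purl_type}")
```

A classic PURL is then just `purl_response(302, "http://example.org/doc")`, while a retired one is `purl_response(410)`. The point is that the type, not the server, decides what the client is told.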

We can go even further, though. The original PURL server had some internal concepts like "cloning" a PURL (basing a new PURL definition on an existing one) and "chaining" a PURL (allowing multi-person management of a URL resolution process). Combining these concepts with the choice of HTTP response codes got us thinking about arbitrary types of PURLs.

We have been experimenting with types of PURLs that combine with other services. A "Rich" PURL, for example, is the combination of a PURL and metadata. We have a prototype service that combines strong identifiers with rich metadata, providing the building blocks for other semantic applications. Rich PURLs are a combination of two related services: a PURL server for management of the resolution and RDF (or other metadata format) file hosting.

The W3C TAG finding regarding the use of HTTP 303 responses seems to suggest that we could use rich PURLs in interesting ways. For example, we could do the following:

A 301 PURL pointing to an RDF resource == Metadata about an information resource
A 303 PURL pointing to an RDF resource == Metadata about a non-information resource (i.e. a physical or conceptual resource)

That usage would be consistent with the TAG finding, even if it goes a bit beyond it.
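From a client's point of view, the convention above could be applied as in the sketch below. It encodes only the two cases proposed in this post, not an established standard, and the function name and return shape are my own assumptions.

```python
def interpret_rich_purl(status, location):
    """Classify what the RDF at `location` describes, per the proposed convention.

    Returns a dict, or None when the status code is outside the two
    cases discussed here.
    """
    if status == 301:
        about = "information resource"
    elif status == 303:
        # 303 See Other: the PURL names a physical or conceptual thing.
        about = "non-information resource"
    else:
        return None
    return {"metadata_url": location, "about": about}
```

For example, `interpret_rich_purl(303, "http://example.org/meta.rdf")` reports that the RDF describes a non-information resource, which is exactly the distinction the TAG finding is concerned with.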

NB: You can tell if you get a PURL by looking at the PURL header in an HTTP response.

But wait, there's more. What if a PURL pointed to a Web service (in the sense of dynamic content, not necessarily limited to SOA Web Services, but including them)? The combination of a Rich PURL and a metadata reference to a Web service yields an "Active PURL". That is, an Active PURL is a PURL naming a graph of metadata describing a Web service.

Consider a simple Web service like an RSS feed. Placing an Active PURL in front of that feed allows you to describe how that feed should be handled. You could name the facets that you want to use to make an Exhibit or provide any other presentation advice you desired.

Alternatively, an Active PURL might itself be a sort of Web service that provides dynamic metadata about another Web service and can either serve the metadata or redirect to its target service. Such an Active PURL could be used to name a SPARQL graph, accept query parameters for it and return metadata about it, such as a count of results or information on the meanings of columns in the result set. I believe that named graphs are very handy things and something that we as a community are paying inadequate attention to. Given that SPARQL may be the query language that finally integrates our silos of relational databases, fronting them with Active PURLs seems like a promising line of research.

Lists of URLs as proposed by Stu Weibel would be easy to implement as an Active PURL.

Perhaps the most interesting use of Active PURLs to enterprises might be the ability to provide standardized RDF metadata about SOA Web Services as well as relational databases. UDDI is so broken, we might as well fix it with existing SemWeb standards. That is not a new idea, but the application of Active PURLs to the problem is, I think.

Tuesday, February 12, 2008

Friday, February 08, 2008

GRDDL Article on DevX

Brian has written a new DevX article entitled "Gleaning Information From Embedded Metadata", explaining GRDDL. He used my home page at Zepheira as an example of a live page with embedded, machine readable metadata.

Wednesday, January 30, 2008

A Great Week for Old Technology

It seems that ships are again sailing the ocean using wind power. The inventor claims that his kites can reduce fuel consumption of most merchant ships by 10-35%. That's awesome.

Even more bizarre is the Mach 20 paper airplane that a Japanese origami expert wants to throw from the International Space Station. Really, you can't make this stuff up.

Friday, January 25, 2008

Hang onto your towel

Burt Rutan and his business partner announced the design of SpaceShipTwo. Very cool! Oddly, his business partner is apparently Zaphod Beeblebrox...

Bitten by Polymorphism

Architecture discussions can get ugly. I was participating in one when Brian suggested the domain name for our activities. A quick check on showed that someone is actually holding that domain, although nothing resolves there. The polymorphism arose when we realized that suggested some alternative domain names, including and *Sigh*

Wednesday, January 16, 2008

SPARQL Set to Change the Web

The W3C announced yesterday the standardization of SPARQL. This broken and immature standard has the capability to rapidly change business operations, Web searching, the advertising model of most Web revenue and enable a new generation of Web-based services.

SPARQL is broken and immature for the simple reason that it failed to include a way to write data. I have my gripes about the way it reads data, too, but those are less important. The longer the W3C continues to treat the Web as a read-only system, the longer the Web will primarily be a read-only system. It is bad enough that we have failed to widely implement and use HTTP PUT and DELETE (fully half of REST), but we really should know better than to create new standards in 2008 that make the Web look like an information retrieval system and not an information aggregation and creation system.

For all that, SPARQL is still an incredibly important set of standards. There are three of them: SPARQL the query language for the Semantic Web, SPARQL the protocol and SPARQL the results format in XML.
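A small sketch shows how the three pieces fit together: a query string, the protocol's `query` parameter on an HTTP GET, and a parser for the XML results format. The endpoint URL, the sample data and the function names are made up for illustration; only the `query` parameter name and the results namespace come from the standards themselves.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# 1. The query language.
QUERY = (
    "SELECT ?s ?label WHERE "
    "{ ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10"
)


def protocol_url(endpoint, query):
    """2. The protocol: a GET request carries the query in a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_results(xml_text):
    """3. The results format: turn each <result> into a {variable: value} dict."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


# Hand-written sample response so the sketch runs without a network.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/thing</uri></binding>
      <binding name="label"><literal>A thing</literal></binding>
    </result>
  </results>
</sparql>"""
```

With a real endpoint you would fetch `protocol_url(endpoint, QUERY)` with an `Accept: application/sparql-results+xml` header and hand the body to `parse_results`; here `parse_results(SAMPLE)` stands in for that round trip.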

SPARQL is important because it gives the world a standard way to perform distributed queries across disparate data sets. In other words, it allows you to treat the Web (the Semantic Web) as a database. Relational databases and other data stores can play, too. They just need a SPARQL overlay. This is something that the relational database crowd has never been able to pull off. I suspect that, due to industry competitive pressures, a standards body would have failed to create a standard for an RDB distributed query language if it had tried, but SPARQL is that critical end run. By pursuing the goal in the Semantic Web community, we now have something that will work for the RDB folks, too.

I remain fussed that Mulgara doesn't have SPARQL support yet, but Paul tells me it should come soon. I certainly haven't done anything to help, so I shouldn't complain.

Why do I think SPARQL can fundamentally change business models? Because of my experiences with this blog. I started experimenting with advertising once my readership reached reasonable levels. Impressions are few, even though readership is decent. Why? RSS. Most people read this blog via news readers or aggregators and not via the Web. They don't ever see the ads. Cool, right? Yeah. Imagine, though, the impact on the online advertising market (which financially supports Google and its competitors) when the Web is a database. Nobody will see the ads. Watch out, world. Where there is chaos, there is opportunity. I can't wait to see what happens next.

Tuesday, January 08, 2008

Tucana's Fate Sealed

Northrop Grumman Corporation's Electronic Systems Sector has been attempting to sell its Tucana RDF database software for some time. They have been quietly seeking buyers since they dropped public references to it as a "rsemd1-extweb50 Sensor1 Replacement Server" in August 2006. Unfortunately for them, potential buyers keep calling me.

Each call is similar in nature and goes something like this:

Buyer: We are very happy to tell you that Northrop Grumman has accepted our cash offer for the Tucana technology. We would like to talk to you about how we can make use of it.

Dave: Sure. I am sure Northrop must have done some work on it since they bought it. What has been added to the code base?

Buyer: They made us swear to secrecy, so we can't tell you that.

Dave: OK, I understand. Can you tell me whether it differs substantially from its Open Source baseline, Mulgara?

Buyer: Umm, there is an Open Source project? What is the URL?


Buyer: Is it an active project? Do people use it?

Dave: Oh, yes. It is used in production by a number of for-profit and non-profit companies and many researchers.

Buyer: I thought Northrop had killed the Open Source project?

Dave: Nope. They tried but failed. Mulgara was a fork to avoid any future legal disputes. It has no code contributed by Northrop or its contractors.

Buyer: Uh, perhaps we should look into that and get back to you.

Naturally, that is generally the last I hear about it until the next potential buyer calls. The last one was yesterday, but they weren't the first (or the second or the third) and probably won't be the last.

The sad thing is, of course, that if a single manager at Northrop had tried to work with the Open Source community instead of building an empire, the project could have been wildly successful in their customer base. There is still a market need for a more scalable RDF database outside of the government, as evidenced by the list of potential buyers, the life sciences community's desire to represent genomic data semantically, Garlik's creation of a custom one last year and continued funding from Mulgara users for scalability development.