Tuesday, October 08, 2013

Repairing Makerbot Replicator 2 Nozzle Insulation

The Problem: Filament Ball Tears Insulation Tape

I have finally used my Makerbot Replicator 2 (not a 2x) enough to run into a problem that is not strictly end-user serviceable.  A large filament ball stuck to the ceramic insulation tape surrounding the Replicator's nozzle and ripped the tape when the ball was removed.

Makerbot does have a video on removing a filament ball (or "blob" as they call it - must be a technical term).  Look for the video entitled "Removing a PLA Blob" on their troubleshooting page.  They suggest heating the extruder nozzle and gently lifting off the ball while being "careful not to rip the ceramic insulation tape".  Yes, indeed.  I agree that it is always best not to remove a filament ball when the nozzle is cold, but the described process wasn't sufficient for me.  I have since discovered that I can use the "Load filament" option under the Utilities menu to heat the extruder and use newly applied filament to melt the ball and gently force it downward.  That seems to make it easier to remove cleanly.

I decided to replace the ceramic insulation tape and the kapton tape that holds it in place.  This required a rather complete tear-down of the extruder assembly, but it wasn't really that bad.  Anyone reasonably careful should be able to do it themselves.


The filament ball complete with insulation tape stuck to it:


Parts

I sourced some ceramic insulation tape and some kapton tape to cover it from UltiMachine, a RepRap printer supply company in Tennessee.  Unfortunately, Makerbot doesn't supply such parts.



  • 1 x Dupont Kapton High Temp. Adhesive Tape 1/4" ($6.00), SKU UMKPTON025
  • 1 x Ceramic Insulation Tape ($1.50), SKU UMCERTAPE


The kapton tape from UltiMachine was quite thin side-to-side, much thinner than the tape used by Makerbot.  I used it anyway and just wrapped it around to ensure a complete coverage.  It seems to be fine.

Repair Instructions

The first thing to do is to carefully remove the fan assembly and extruder motor from your Makerbot Replicator 2.  These instructions will help you if you are unfamiliar with the process.  They were developed for the upgrade of the extruder assembly.  Follow them just until you remove the fan assembly and the motor, but do not take off the extruder itself. 


Next you will need to remove the side fan assembly.  Use a Phillips screwdriver to remove the two screws holding the assembly to its chassis (shown at 10 o'clock and four o'clock in the picture below).


Disconnect the electrical wire connectors so the fan assembly hangs down out of your way.  Note that you will still have two wires (in a single insulated cable) connected at this time.  Those are the thermocouple wires that are used to sense the temperature of the nozzle block.  They will be removed shortly.

Remove the two hex screws from the base of the chassis that connect the chassis to the aluminum block beneath it.


Removal of the chassis exposes the aluminum heat sink.  The nozzle assembly is attached via a bolt and hangs beneath the heat sink.

Removal of the aluminum heat sink requires you to look underneath the assembly to find two more hex screws, one on each side of the heat sink.


Once you have the heat sink entirely removed from the extruder mount, it should look something like this:


Now it is time to remove the thermocouple wires.  Unscrew the thermocouple connector from the nozzle block.  Note that the wires will twist if you leave the block in place!  Instead, you should rotate the nozzle block to remove the thermocouple connector as soon as you loosen the connection enough to do so.  That will keep the wires from breaking.


You should now have the nozzle block and heat sink completely removed from the printer.  You can move it to a more convenient working surface, such as a workbench.

Use a crescent wrench or socket wrench to remove the nut at the top of the heat sink, as shown.  You will see that the bolt has a hole through its center; that is where the filament passes on its way to the nozzle.


Remove the heat sink and set it aside.  You should also remove the nut between the nozzle block and the heat sink.


Clean off the old kapton tape and ceramic insulation tape.  Some of mine was baked on and was difficult to remove.  Carefully remove the tape so you don't damage the brass nozzle, the aluminum block or the wires.  I used a utility knife to remove most of the material and found that heating the material with a portable butane torch made it easier to remove.

Please note that a steel utility knife blade is substantially harder than the aluminum of the block or even the brass of the nozzle.  You can shave off bits of them if you aren't careful.  You only want to remove the old tape.


I placed the block into a vise to facilitate cleaning it and also (carefully) used a wire brush for the final touches.


Cut some ceramic insulation tape to the width of the nozzle block.


Cut holes in the ceramic insulation tape for the bolt and the nozzle.  The round hole on the right is for the nozzle because I wanted to ensure that the join in the tape would not be at or near the nozzle.  Wrap the tape around the nozzle block by putting the nozzle through one hole and the bolt through the other.  Obviously, it works better to start by putting the tape over the long bolt and then wrapping around to the nozzle.  Cut the end of the tape to fit the size of the block.


Wrap the nozzle block with kapton tape to hold the ceramic insulation tape in place.


Reassembly

You will need to follow the tear-down instructions in reverse.  You might follow the pictures.  I did :)

There are a couple of things to watch out for:

1.  Make certain to align your nozzle block on the heat sink so it is straight (not crooked as shown below) and make sure that the screw holes are facing toward the nozzle (so you can screw them in from the bottom when you remount the unit).


2.  Next, and this is really very important, ensure that the bolt on the top of the nozzle block-heat sink unit (shown on the right in this picture) does not protrude above the top of the nut.  If it does, the extruder will not fit over top of it.

This also seems to ensure that the nut in between the nozzle block and the heat sink does not compress the insulation.


3.  Another minor gotcha occurs when remounting the side fan to the fan housing.  The picture below shows where the two sets of black-and-white wires fit in a slot on the left side of the housing (toward the back of the Replicator 2).  The picture shows me pushing the wires into the slot with my index finger.  This will keep the wires from being kinked when the fan is screwed into place.


Getting it Working Again

Don't forget to level your build plate after the reassembly!  This is critical because you may have changed the nozzle height, however slightly.  Look under the Makerbot's Utilities menu for the "Level the build plate" option and follow the on screen directions.

Next, load some filament and try a small test print.  The "Mr. Jaws" model from the Makerbot's SD card printed flawlessly for me the first time.  Yay!

You might ask how long it took me to generate another filament ball.  Less than an hour ;-)  Fortunately, that one didn't tear my newly fixed insulation.




Friday, June 03, 2011

Schema.org and the Semantic Web

The interweb is all atwitter today about schema.org and what it may mean for the Semantic Web. Here is my take.

There has been a long-standing argument between microformats and the Semantic Web. Many developers, and to some degree, search engines have preferred microformats because they are easy to use and to understand. Microformats are widely deployed because of this. However, there is simply no way to combine microformats on a single page. This is the Achilles heel of microformats; sooner or later someone wishes to use more than one (or a few if they play together particularly nicely) at a time and can't do it.

RDF is harder to understand (although an experiment in Germany showed that fifth graders could easily be taught RDF; it is adults who have already learned other ways to think who have trouble). RDF is a completely general solution to the problems that microformats solve. RDF's raison d'être is to allow for the combination of data from multiple people (e.g. developers and search engines, or multiple relational databases, or as an interchange format between proprietary systems). RDF can represent any type of data, and combines easily with other RDF.

The argument between microformats and RDF can thus be thought of as an argument between short-term pragmatism and long-term planning. Those who want to solve a specific problem now use microformats. Those who want to solve more general problems in the future use RDF.

The presumption in the Semantic Web community is that the best (perhaps the only) way to combine microformats is either RDF or something very much like it. Further, people have been expressing needs to combine the use of multiple microformats on Web pages for about five years.

Microsoft is aware of the Semantic Web and, in fact, was an early supporter of the RDF standards at the World Wide Web Consortium (W3C). They even paid for some marketing; proof may be found here.

Unfortunately for those interested in open standards, Microsoft decided that Netscape's (remember Netscape??) use of RDF in their portal was threatening, so they decided to reinvent RDF internally as a proprietary technology. Microsoft's internal version of RDF has appeared in their file system, Sharepoint and other products. That's the way this story simply must play out: Use RDF or reinvent it.

Yahoo was the first search engine to support RDFa (RDF in Web pages), followed by Google. Both supported particular vocabularies of RDFa, which is the same as saying 'microformats encoded in RDF' and therefore along the lines of my earlier comments.

The new schema.org announcement is a partnership between "Google, Bing and Yahoo" or "Google, Microsoft and Yahoo" depending where you look. Since Microsoft bought Bing and Yahoo has licensed Bing for its search services, schema.org is really between Google and Microsoft.

So, I read schema.org as an attempt (actually, a further attempt) by Microsoft to reduce the impact of RDF and Semantic Web techniques on the search business specifically and their larger business in general. Time will tell whether that will work. History suggests that it will partially work by changing the places RDF is seen as threatening to big business. Another similar area to watch will be RDF and Linked Data's threat to the Data Warehousing market (a $10 billion market in 2010). That fight will be primarily between standards and Oracle.

Michael Hausenblas at DERI released Schema.org in RDF while I was writing this. Well done to Michael and his colleagues. As Michael said, "We're sorry for the delay". Awesome.

Update 2011-06-15: Google announced at a BOF at SemTech 2011 last week that they will continue to support Rich Snippets (their RDFa implementation). That's helpful and what we should be promoting.

Tuesday, February 08, 2011

Semantic Web Elevator Pitch

Eric Franzon at SemanticWeb.com asked the community to provide "elevator pitches" for the Semantic Web. Here's my attempt:

Thursday, January 06, 2011

Ian was Right and I was Wrong

Following Ian Davis' post A Guide to Publishing Linked Data Without Redirects, I followed with A(nother) Guide to Publishing Linked Data Without Redirects. In that post, I argued that resource descriptions should be separated from resource representations at the HTTP level. I see now that I was wrong.

Ian challenged me to come up with a compelling reason why HTTP should encode the difference between a resource representation and a resource description and, after some effort, I simply could not. Ian summarized his thoughts in a new post: Back to Basics with Linked Data and HTTP.

The problem in my mind has always related to the use of HTTP URIs to identify things in the real world. We can get around that easily enough by returning RDF whenever someone resolves those URIs. You get a description of a real-world thing that is as richly described as the publisher wanted it to be. Cool.

Tuesday, November 09, 2010

A(nother) Guide to Publishing Linked Data Without Redirects

It seems to me that we in the Linked Data community have a need to:

  1. Assign URIs to resources, be they physical, conceptual or virtual (information resources) in nature.

  2. Apply the same mechanisms for metadata description to any resource, regardless of type.

  3. Be able to traverse in obvious ways from a resource to its metadata description and from a metadata description to its resource.

Unfortunately, we can't do all that yet, at least not easily and in all circumstances. We are close, but not close enough.

Linked Data deployment is hampered by the requirement for so-called "slash" URLs to be resolved via a 303 (See Other) redirection. Unfortunately, many people wishing to publish Linked Data don't understand the subtleties of 303 redirection, nor do many of them have adequate control over their Web server configurations to implement 303 redirections. Ian Davis has been looking for a solution to this problem. Unfortunately, I don't think he has found it yet.

Ian published A Guide to Publishing Linked Data Without Redirects specifically to find a way around the confusing (and sometimes difficult) usage of 303 redirects for Linked Data. Ian's original question was: "What breaks on the web if we use status code 200 instead of 303 for our Linked Data?"

Unfortunately, the use of the Content-Location header with Linked Data raises the same questions as 303s:

  1. It requires a change of thinking regarding the meaning of 200 (OK), specifically to the http-range-14 finding.

  2. It suffers from the same problem as 303s in relation to deployment with current hosting companies/IT departments. If you don't have control over your Apache, you can't publish your Linked Data.

  3. There is an "implicit redirect", in that one may wish or need to check the URL in the Content-Location header.

The first one admittedly bothers me most. If one resolves a URL and receives a 200 (OK) response, we are currently guaranteed that both (a) our request succeeded in the way we expected and (b) that the thing we received is an information resource. We expect that the thing we received is an information resource that is a representation of the resource we requested (and identified by its URL address).

In short, I think Ian's proposal mostly but not completely solves the problems that Ian was meaning to address. Unfortunately, there is practically little difference from the status quo. Tom Heath has some of the same concerns.

If we are going to fix fundamental problems with serving Linked Data, I'd prefer to explicitly address the fundamental questions related to URI naming of physical, conceptual and information resources (the overloading of the HTTP name space), so I proposed an alternative solution on the public-lod@w3.org mailing list last week. This post expands on those thoughts with some more detail.

The use of 303 redirections by the Semantic Web and Linked Data community is a bit of a hack on top of the existing 303 functionality laid down in the early Web. The http-range-14 debate tried to end the arguments, but only slowed them down. We can't really hack at the 303 any more than we have. I explored that in 2007 and came up pretty empty.

My Proposal


I propose deprecating the 303 for use in Linked Data (only) in favor of a new HTTP status code. The new status code would state "The URI you just dereferenced identifies a resource that may be informational, physical or conceptual. The information you are being returned in this response contains a metadata description of the resource you dereferenced." This new status code would be used to disambiguate between generic information resources and the special class of information resources that describe (via metadata) an addressed URI.

The "metadata description" would generally be in some form of RDF serialization, but could also be in HTML (for human consumption) or in some future metadata representation format. Existing HTTP content negotiation approaches and Content-Type headers would be sufficient to inform both requester and Web server what they received.

I propose that the new status code be called 210 (Description Found).

Existing HTTP status codes may be found in RFC 2616 Section 10.

Example Requests and Responses


Let's start with the basics. If we resolve a URI to an information resource, we get a 200 (OK) response upon success:

# Get an information resource:
$ curl -I http://example.com/toucan.info
HTTP/1.1 200 OK
Date: Wed, 10 Nov 2010 21:37:44 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/html;charset=UTF-8
Content-Length: 1739



An information resource that supports some (any!) form of embedded RDF can easily point to its metadata description at another URL (e.g. via a link element or a POWDER description). The metadata description can easily point back to the described resource.

Physical and conceptual resources are where we have historically run into trouble on the Web of Data. A "slash" URI assigned to name a physical or conceptual resource has required a 303 redirection to another document and the semantics are unclear at best. Instead, this proposal suggests that physical and conceptual resources explicitly return a 210 (Description Found) status code, thus removing any ambiguity from the response.

The resolution of a URI to a physical resource might return:

# Get the description of a physical resource:
$ curl -I http://example.com/toucan.physical
HTTP/1.1 210 Description Found
Date: Wed, 10 Nov 2010 21:38:52 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/turtle
Content-Length: 1739



The body of the response would naturally be (in this case) an RDF document describing the physical resource. The fact that the resource is physical would be encoded in an RDF statement in the description.
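For anyone who wants to experiment locally, here is a minimal sketch of a server that behaves this way, using only Python's standard library. The handler name, the DESCRIPTION bytes and the make_server() helper are my own illustrative choices, and 210 is of course only the code proposed in this post, not a registered HTTP status.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical Turtle description; the URI and triple are illustrative only.
DESCRIPTION = (
    b"<http://example.com/toucan.physical>\n"
    b"    a <http://dbpedia.org/resource/Toucan> .\n"
)


class DescriptionHandler(BaseHTTPRequestHandler):
    """Answer URIs ending in .physical with the proposed 210 status."""

    def do_GET(self):
        if self.path.endswith(".physical"):
            # 210 (Description Found) is the proposal in this post,
            # not a status code defined by any HTTP specification.
            self.send_response(210, "Description Found")
            self.send_header("Content-Type", "text/turtle")
            self.send_header("Content-Length", str(len(DESCRIPTION)))
            self.end_headers()
            self.wfile.write(DESCRIPTION)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the operating system pick a free port."""
    return HTTPServer(("127.0.0.1", port), DescriptionHandler)
```

Calling make_server(8210).serve_forever() and then running curl -i against http://127.0.0.1:8210/toucan.physical should show the 210 status line followed by the Turtle body. The sketch only handles GET, so curl -I (a HEAD request) would need a matching do_HEAD method.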

Conceptual resources could be handled in an identical manner. The only difference would be in the requested URI and differing content returned:

# Get the description of a conceptual resource:
$ curl -I http://example.com/toucan.concept
HTTP/1.1 210 Description Found
Date: Wed, 10 Nov 2010 21:40:12 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/turtle
Content-Length: 1214



Again, the fact that the resource is conceptual would be encoded in an RDF statement in the description.

Savvy readers might note that the existing status code 300 (Multiple Choices) could be used when multiple metadata descriptions of a resource are available:

The requested resource corresponds to any one of a set of
representations, each with its own specific location, and
agent-driven negotiation information (section 12) is being
provided so that the user (or user agent) can select a
preferred representation and redirect its request to that
location.


Note that Ian's statement that when using a 303 "only one description can be linked from [a resource's URI]" is not correct; standards-compliant Web servers could use a 300 status code should they so wish (and can figure out a way to configure their Web server to do that).

Ramifications


How does my proposal stack up to Ian's? Ian proposed nine problems with the 303, the most important of which (in my opinion) were:

  • it requires an extra round-trip to the server for every request (at least, that's important to those implementing browsers, spiders and Linked Data clients and to those with limited bandwidth)

  • the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI (PURLs also suffer from this due to odd UI decisions by browser makers)

  • having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.


Additionally, three of his concerns related to the difficulties of Web server configuration:

  • its non-trivial to configure a web server to issue the correct redirect and only to do so for the things that are not information resources.

  • the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two

  • it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents



The 210 status code proposal would effectively deal with Ian's major issues. Metadata describing a resource could be returned in a single GET if the resource were physical or conceptual (that is, not an information resource). It would be reachable for information resources, although requiring two hops if the URL to the metadata is not known. The URI displayed by a browser would not change. Importantly, the 210 is conceptually much easier to explain.
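From a client's point of view, the proposal boils down to a simple dispatch on the status code. The following is only a sketch of that reasoning; the function name and the returned labels are hypothetical, and a real Linked Data client would do a good deal more.

```python
def interpret_status(status):
    """Say what the response means for a dereferenced URI under the proposal."""
    if status == 200:
        # Body is a representation of an information resource.
        return "representation"
    if status == 210:
        # Proposed code: body is a metadata description of the resource,
        # which may be informational, physical or conceptual.
        return "description"
    if status == 300:
        # Several descriptions or representations are on offer.
        return "choices"
    if status == 303:
        # Current practice: follow the Location header to find a description.
        return "redirect"
    return "other"
```

The point of the sketch is that the 210 branch needs no second request and no guesswork about what the body is, which is the round-trip and explainability argument made above.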

Support For Existing Web Servers


Web servers, even existing ones at hosting centers, can be easily configured to serve 210 content immediately. At least, via a simple hack. The one we use for 3roundstones.com (Arvixe) allows limited site configuration using cpanel. Cpanel allows Apache handlers to be associated with file extensions in URLs. One of the Apache handlers installed by default with Apache is mod_asis.

mod_asis is used to send a file "as is". A file sent that way can contain HTTP headers separated by a blank line. Using that trick, we might associate a URI (say, http://example.com/toucan.physical) with a metadata description of a physical object. The resource file served when that URL is resolved looks like this (inclusive of the 210 status code!):

Status: 210 Description Found
Date: Wed, 10 Nov 2010 15:07:14 GMT
Content-Type: text/turtle
 
<http://example.com/toucan.physical>
a <http://dbpedia.org/resource/Toucan> ;
...



The combination of mod_asis and a file (with a mapped extension) containing custom HTTP headers (including a Status pseudo header) will result in the remainder of the file being served with the designated headers. In this case, that means that we can return 210 status codes from any URL we wish using a stock Web hosting service.
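If you have many descriptions to publish, the as-is files can be generated rather than written by hand. This is a rough sketch; build_asis_file() is a hypothetical helper of my own, and it leaves out the Date header on the assumption (as I read the mod_asis documentation) that Apache adds its own Date and Server headers.

```python
def build_asis_file(turtle_body, status="210 Description Found"):
    """Return the text of a file suitable for serving with mod_asis.

    The layout is: header lines, one blank line, then the body.
    """
    headers = [
        # "Status" is the pseudo header that sets the HTTP status line.
        f"Status: {status}",
        "Content-Type: text/turtle",
    ]
    return "\n".join(headers) + "\n\n" + turtle_body
```

The returned text would be saved under whatever file extension you have mapped to the send-as-is handler.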

Some might consider the use of file extensions restrictive (or just a PITA), but the Principle of URI Opacity protects us from people like that :)

Other Considerations


It may interest some to note that common Web clients (including human-oriented browsers and command line clients such as curl and wget) do not seem to mind a non-standard 200-series status code. They return the document and the new status code without complaint.

There are some disadvantages to the 210 proposal. Most importantly, this proposal is a change to the very fabric of HTTP and thus the Web. The W3C and IETF would need to standardize the 210 status code, probably in a new IETF RFC. That will take time and effort. Web server operators would have to configure their Web servers to return the correct status code (as described above), at least until Web servers ship with 210 support by default.

Please comment. If we want to build the Semantic Web and the Linked Data community on a designed fabric instead of a series of hacks, the time to start is now. Even now is late, but it is not (yet) impossible.

Wednesday, October 20, 2010

Linking Enterprise Data Book on the Web

The book Linking Enterprise Data is now on the Web in its entirety at http://3roundstones.com/linking-enterprise-data/.

You can buy print or ebook versions at Springer or Amazon:

Thursday, October 07, 2010

Call for Chapters: Linking Government Data

I'm working on a new contributed book to be entitled Linking Government Data. Please see the Call for Chapters if you have any interest in contributing.

A primary goal of this book is to highlight both costs and benefits to broader society of the publication of raw data to the Web by government agencies. How might the use of government Linked Data by the Fourth Estate of the public press change societies? How can agencies fulfill their missions with less cost? How must intra-agency culture change to allow public presentation of Linked Data?

Starting at Talis

I am very pleased to have accepted a job offer at Talis. I'll be helping to stand up a new U.S. subsidiary for them and will continue to focus my efforts on the evolving Linked Data market, both in relation to government transparency and its use in commercial markets.

Monday, August 02, 2010

New Book in Pre-Production

Linking Enterprise Data is an edited volume contributed by worldwide leaders in Semantic Web research, standards development and early adopters of Semantic Web standards and techniques. Linking enterprise data is the application of World Wide Web architecture principles to real-world information management issues faced by commercial, not-for-profit and government enterprises.

I edited this book for Springer and the publisher has created a Web site for it as it enters production.

Springer seems to think the book won't be out until 2011, but I'm hoping for November because I'll be speaking at a conference then and would like to see it out.

I have been given the rights to put the entire book's content on the Web and plan to do so as Linked Data sometime shortly.

Leaving Zepheira

I have decided to leave Zepheira and seek employment elsewhere. Uche, Eric, Bernadette and I have worked closely together over the last couple of months to arrange a clean transition for me. With my current projects at or near an end, this seemed like a good time. My last official day as an employee of Zepheira was 31 July.

I wish Zepheira well and believe I am leaving at a time when the company is strong and their future looks bright.

The future for me is a bit less certain at the moment, but I'm speaking with a number of good people. More when a decision has been made, probably in late August around my birthday. In the meantime, I've updated my resume and Linked In profile as I make the rounds.

Feel free to contact me or leave a comment if you know of exciting opportunities.

Thursday, July 01, 2010

Introducing the Callimachus Project

Callimachus is a Semantic Web framework for easily building hyperlinked Web applications. Callimachus allows Web authors to quickly and easily create Semantically-enabled Web applications with a minimal knowledge of Semantic Web principles. James Leigh and I have been working on it for a while. A presentation is also available.

Callimachus version 0.1.1 is now available. This release includes
updated documentation and the first sample applications.
Please see the directions in the file SAMPLE-APPS.txt to understand
the sample applications. More are coming soon!

You can acquire this release either by downloading the ZIP archive
from the downloads area or by checking out the v0.1.1 tag:

svn checkout http://callimachus.googlecode.com/svn/tags/0.1.1/

Either way, follow the directions in README.txt to get started.

Have fun and please report your experiences with Callimachus to the discussion list!

Thursday, October 22, 2009

Chinese Units of Measure


Chinese Units of Measure
Originally uploaded by prototypo
I found this antique Chinese ruler in Seoul, ROK, last week. It uses the old Chinese units of length measure, the fēn, cùn and chǐ units.

The tiny fēn is about 3 mm. The cùn is traditionally the width of a person's thumb at the knuckle. The chǐ (or Chinese 'foot') is derived from the length of a human forearm, like a cubit. Or so says Wikipedia.

Those were hard-working people, to have thumbs as wide as a cùn.

The ruler is wooden, with brass inlays marking the units.

Friday, September 04, 2009

Why I will Never Own a Kindle

Fears of Internet security experts everywhere were realized today when Amazon revealed, apparently by accident, that it keeps copies of annotations made on the Kindle ebook reader.

This article at Reuters reported on damage control attempts at Amazon after it (in a delicious piece of irony) deleted copies of George Orwell's 1984 from its Kindles in July. The provider of the ebook version of 1984 apparently did not own the appropriate publication rights. Readers were naturally upset at the sudden disappearance of content from their readers, although of course they forgot to read the fine print, didn't they? You can't buy an ebook, you can only rent. Amazon was technically within their rights to delete the content.

That's hardly the full story, though. Amazon was sued by a high school student for having also removed his "copious notes" regarding the deleted novel. The Reuters story linked above showed Amazon's hand when they reported:
Amazon's email on Thursday said that the company would replace
the deleted books along with any annotations made by customers.

That's right, Kindle fans. Amazon has admitted publicly that they, like Orwell's Big Brother, keep copies of any annotations that Kindle users make on the devices. For at least months. Holy cow!

The full text of Amazon's email to affected customers is available at the WSJ.

Perhaps more amazing is that Kindle readers don't particularly seem to care (cf. comments to the WSJ blog post). Kindle notes are synced to an Amazon server and thus available to readers over the Web. That may seem like a feature to some, but not to me. I'll back up my own notes, thanks.

Friday, August 21, 2009

Just Published: 97 Things Every Project Manager Should Know

O'Reilly Media has published 97 Things Every Project Manager Should Know. My colleagues Kathy MacDougall and James Leigh also wrote for this book.

This is a new style of "collective wisdom" books from O'Reilly. An earlier one was aimed at software architects.

I was pleased to see that O'Reilly used one of my quotes at the top of their home page for the book ("Clever Code Is Hard to Maintain...and Maintenance Is Everything").

The tips I wrote for this book were:
  • Clever Code Is Hard To Maintain
  • The 60/60 Rule
  • The Fallacy Of Perfect Execution
  • The Fallacy Of Perfect Knowledge
  • The Fallacy Of The Big Round Ball
  • The Web Points The Way, For Now
Check it out if you do project management. There's some good stuff in there.

Monday, August 17, 2009

The Death of the Copenhagen Interpretation?

Wow. A climate researcher in the UK has had the guts to propose a new geometry for space-time that provides a new way of answering pesky questions in quantum mechanics. This article in Physorg (see also the article's full text) provides a good overview.

I don't know if I still have the math to slog through it, but it looks to be worth the effort.

Called the Invariant Set Postulate, the proposed law offers a geometry of space-time that resolves long-standing difficulties in quantum mechanics, including complementarity, quantum coherence, superposition and wave-particle duality. Quantum description of gravity may even be possible. Wow. That is an amazingly out-of-the-box contribution.

For the faint of heart, here is a key quote: "The Invariant Set Postulate appears to reconcile Einstein's view that quantum mechanics is incomplete, with the Copenhagen interpretation that the observer plays a vital role in defining the very concept of reality."

Monday, June 15, 2009

OK, OK, I'm back on Twitter

I'll be at the 2009 Semantic Technology Conference this week and will be twittering on @prototypo.

Friday, June 12, 2009

Freemix is Live!

Freemix is live in invitational Beta. Come check it out!

We at Zepheira will officially introduce it to the SemTech crowd next week and to the press on Tuesday.

You can see my profile on http://freemix.it/profiles/dwood/.

Whew!

Monday, June 01, 2009

Musicians and Coders

Bernadette recently gave me a CD (yeah, really, not an iTunes gift card! She's very quaint.) by Jeremy Pelt, a fantastic jazz trumpeter. In the inside cover of November, he says, "Our greatest responsibility as musicians is to live and grow... then, you might play something hip!"

Indeed.

Friday, May 29, 2009

Announcing Freemix

Zepheira has announced the forthcoming launch of Freemix, a social networking site for data and the people who use it. Freemix will be officially launched at the Semantic Technologies Conference in San Jose, California on June 16, 2009.

Zepheira partners Eric Miller, Uche Ogbuji and myself will brief representatives of the press at 12:00 US Pacific Time in the Fairmont Hotel in San Jose. Zepheira will demonstrate Freemix in a booth on the SemTech exhibit floor.

SemTech conference attendees may also attend a briefing on Freemix on Wednesday, 17 June 2009 from 5:00-6:00 PM US Pacific Time.

If you are a spreadsheet user and want to share your data more widely, Freemix is for you. Wouldn't it be nice if your data had friends, too?

Thursday, May 21, 2009

Speaking at SemTech

I will be speaking at the Semantic Technology Conference in San Jose again this year from June 14-18, 2009.

Dan McCreary and I will be giving a three-hour tutorial on entity extraction on the Monday. I'll be presenting a talk on Active PURLs: Stored Procedures for the Semantic Web on the Tuesday. Additionally, it seems likely that I will replace Uche on a panel dubiously entitled Web3-4-Web2, also on the Tuesday.

Speakers have been authorized to share coupons for up to $200 off registration fees. If you would like to get the coupon code, please contact me or leave a comment here by May 29, 2009.

Zepheira is a gold sponsor again this year and we will have a very cool announcement. We are going to officially launch Freemix at the conference. The site is still under authentication, but will be released to the public just before the conference. It should be exciting. If you care about putting real, live, useful, everyday data on the Semantic Web, come see it.





Saturday, May 16, 2009

Playing with Wolfram Alpha

Wolfram Alpha has been launched and is available for the public to try. I sat down to play with it.

Firstly (using the rare American adverb here - don't be confused), you can't expect Wolfram Alpha to act like Google. It is a new kind of search engine, as one should expect from Stephen Wolfram. Wolfram is famously the inventor of Mathematica and author of A New Kind of Science.

Wolfram Alpha seems to consist of a linguistic interpretation engine coupled to Mathematica and a growing number of databases. Google, on the other hand, is a free-text indexer of Web content. That suggests that while one might be able to type just about any word or phrase into Google that is somewhere on the Web, one must limit Wolfram Alpha queries to concepts that are in its databases or may be treated as mathematical relationships. Indeed, this seems to be the case.

Wolfram's overview video is well worth watching. It, and the example search results available from the home page, give a flavor for the powerful searches one can do with the site.

Following a lead from the video, I tried typing the female name "Bernadette" into the search box. Wolfram Alpha, as advertised, did indeed respond with a presumption that I wanted information about the name and results that included a time distribution plot of popularity. Searching for "Bernadette David" gave me a distribution plot of both names which showed the highest combined popularity did in fact occur around our birth years. Well done, Wolfram Alpha.

Changing the previous search to "Bernadette Peters" resulted in some minor information about the actress and a link to her Wikipedia entry. Wikipedia links are provided where possible, as a transparent but useful attempt to provide flesh to limited source content.

However, more general searches, such as the word "Zepheira", produced no results. Wolfram Alpha responds to null result sets with a message saying "Wolfram|Alpha isn't sure what to do with your input." That alone makes it clear that Wolfram Alpha and Google are at best complementary.

Too many users on the site result in a cute message saying "I'm sorry Dave, I'm afraid I can't do that..." - which is only mildly freaky if your name happens to be Dave. The reference naturally comes from the mutiny of the HAL 9000 computer in the film "2001: A Space Odyssey".

Math, science, engineering and finance queries work well, as expected. A Web interface to Mathematica is useful in itself. I suspect that the site will be most effectively used by college students and some working professionals. My mom and dad are unlikely to find it compelling (although my dad is a weather geek and weather data is well represented, so I might be wrong). Still, the lack of detailed weather results such as live RADAR images would more likely lead him to weather.com.

One can do funky and useless math with aplomb. Wolfram Alpha rapidly provided me with the correct interpretation, unit dimensions and unit conversions for the search "100 furlongs per microfortnight", a speed well above that of sound but under that of light.
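The arithmetic behind that query is easy to check by hand. Here is a minimal sketch in Python, assuming the international furlong (201.168 m) and a 14-day fortnight. The comparison constants are standard textbook values and the names are mine, not anything Wolfram Alpha exposes:

```python
# Convert furlongs per microfortnight to metres per second.
FURLONG_M = 201.168                    # one furlong = 660 feet = 201.168 m
FORTNIGHT_S = 14 * 24 * 60 * 60        # 14 days expressed in seconds
MICROFORTNIGHT_S = FORTNIGHT_S * 1e-6  # one millionth of a fortnight

SPEED_OF_SOUND_M_S = 343.0             # dry air at roughly 20 C (approximate)
SPEED_OF_LIGHT_M_S = 299_792_458.0     # exact by definition


def furlongs_per_microfortnight_to_m_s(value):
    """Return the speed in m/s for a value given in furlongs per microfortnight."""
    return value * FURLONG_M / MICROFORTNIGHT_S


speed = furlongs_per_microfortnight_to_m_s(100)
print(f"{speed:.0f} m/s")
print(f"about Mach {speed / SPEED_OF_SOUND_M_S:.1f}")
print(f"{speed / SPEED_OF_LIGHT_M_S:.2e} of the speed of light")
```

The result comes out at roughly 16.6 km/s, which is indeed far above the speed of sound and far below the speed of light.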

Minor misspellings were handled effectively (e.g. "area of icosehedron" was correctly interpreted as "area of icosahedron"). Similarly, "volume of icosahedron" resulted in a correct interpretation. I expected the search "distance to a star" to fail miserably, but the answer was surprisingly useful. Try it yourself to see what I mean.

The problem with this kind of interface is that interpretations of intent are notoriously hard, if not impossible, in the general case. How can Wolfram Alpha expect to know that when I typed "birth year of gandhi" that I meant Mahatma Gandhi? What if I meant Indira Gandhi? Guessing is fine as far as it goes, but most search engines chose to give up that approach a decade ago in favor of appendation of search results.

The interface style is also naturally limited by its underlying data. Searching for "the size of the World Wide Web" resulted in a suggestion to try "the size of the world wide" - which it could answer as the diameter of Earth.

I wonder how many people recall that Yahoo used to allow mathematical equations in their search engine? They seem to have removed the functionality. One can only presume that it got in the way of becoming a more general Internet search engine. I suspect there is a lesson there for Wolfram Research. Will Wolfram Alpha stay aimed at specialists or will it grow into a more general tool? Time will tell. Their promise to integrate more databases does not promise to address the inherent limitations of guessing linguistic intent.

In summary, Wolfram Alpha is an expert-friendly search system for specialists and is best used as an orthogonal complement to Google and other general search engines. Its approach is pure Wolfram - unashamedly different and unapologetically ignoring of lessons learned by others.

Sunday, April 26, 2009

Finally, Some Clues on the Domestication of Rice

The subtitle to this blog promises posts about "the origins of agriculture", although the field is so slow moving that I have not posted on the topic in three years and have not reported meaningful research here since speculating on religion as a driver.

Fortunately, others are doing active research on agricultural origins even if I am not. Dr. Dorian Fuller of the Institute of Archaeology at University College London has cracked a very special nut, indeed. He and his team have located substantial evidence of the location and timing of rice domestication in the Lower Yangtze region of Zhejiang, China.

Dr. Fuller and his colleagues discovered a location where the local diet shifted dramatically from a hunter-gatherer lifestyle to an agricultural one over a mere three hundred years. That alone is fascinating and an important discovery. Equally interesting was the dating of the shift, from 6900 to 6600 years ago. That places rice domestication in a timeframe fully two thousand years later than thought and lends serious support to diffusion theories (versus parallel development).

The process used by Fuller collected mixtures of midden material from the site, and painstakingly separated wild rice remains from domesticated rice remains. Specifically, they looked at spikelets, the place where rice seeds attach to stalks. Like other domesticated plants, rice underwent a genetic shift to retain the seeds for harvest by humans by a process of artificial selection. The shape of the spikelets is sufficiently different as to be distinguishable.

There is a nice scanning electron microscope image of a wild rice spikelet base at the Agricultural Biodiversity Weblog.

The last I heard, Londo's investigation [1] was still suggesting multiple independent origins of rice in Southeast Asia and lower China. Hopefully Fuller's paper [2] will put that to rest. Londo at least admitted that his team wasn't certain.

Wikipedia's entry on rice says, "Rice has been cultivated in Asia likely over 10,000 years." It is clearly time to correct that entry and, more broadly, correct the education of literally billions of people who are taught it. I really need to get back to work on my Origins of Agriculture summary and update it with these findings.

References:
[1] Londo, J.P., Chiang, Y-C, Hung, K-H, Chiang, T-Y and Schaal, B.A. (2006). "Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa". PNAS, http://www.pnas.org/content/103/25/9578.long

[2] Fuller, D.Q., Qin, L., Zheng, Y., Zhao, Z., Chen, X., Hosoya, L.A. and Sun, G-P (2009, March 20). "The Domestication Process and Domestication Rate in Rice: Spikelet Bases from the Lower Yangtze". Science, 323/5921, pp. 1607-1610, http://www.sciencemag.org/cgi/content/abstract/323/5921/1607

Saturday, April 25, 2009

Back to Basics

Torture is wrong, regardless of efficacy, regardless of culture, regardless of how it is justified, regardless of how the enemy is dehumanized, regardless of whether someone - anyone - thinks it is useful. Period.

Even my eight-year-old can figure this one out, all by herself and with no hints.

Tuesday, April 07, 2009

OCLC PURL Server Migrates to PURLZ

The Online Computer Library Center (OCLC) migrated http://purl.org to the PURLZ software this morning at 6 AM US EST (GMT -5).

I'll admit to some frustration that the legacy PURLs were not tested completely and some errors remain. We are working with them to iron out the relatively few remaining problems with the legacy data migration. Most of the legacy data seems to be working as expected.

Update: Nope. They rolled back again. Sigh. Maybe next time they will do it right.

Sunday, April 05, 2009

Unfortunate Names for 2009

I have to add Babelease Limited in the UK to my unofficial list of unfortunate names. Babel-ease. Yeah, that makes sense. I completely mis-parsed the syllables the first time...

Monday, March 23, 2009

With Apologies to Emerson

Rich are the Web-gods: who gives gifts but they?
They grope the Web for PURLs, but more than PURLs:
They pluck Force thence and give it to the wise.

Thursday, March 05, 2009

O'Reilly Media Joins the Semantic Web

O'Reilly Media (http://oreilly.com/), the current name for the geek publishing giant founded by Tim O'Reilly, has finally joined the Semantic Web.  O'Reilly's coining of the term "Web 2.0" and early misunderstandings of the Semantic Web stack led some to think that he didn't see much value in machine readable information.  That seems to have changed, at least within O'Reilly Labs.

O'Reilly Labs launched a Beta product last month called the O'Reilly Product Metadata Interface (OPMI), which is available at http://labs.oreilly.com/opmi.html.  The OPMI is a technical platform for the exchange of metadata between publishing trading partners.  Now that it is in RDF and publicly accessible, the rest of us can play with it, too.

It is easy to retrieve RDF/XML describing any book that O'Reilly publishes. You simply perform an HTTP GET on a URL constructed with the book's International Standard Book Number (ISBN). Every edition of a published book has an ISBN and they come in two flavors, the older 10-digit variety and the newer 13-digit version. All ISBNs issued after 1 January 2007 have been 13 digits. Some books are assigned both forms by their publishers for convenience during the transition.

For example, let's get the metadata description of an O'Reilly book I wrote, Programming Internet Email. The 13-digit ISBN for the second edition of the paperback is 9781565924796, and the 10-digit equivalent is 1-56592-479-7. The OPMI nicely works with either one, but the returned RDF uses the modern 13-digit one as canonical, as it should.

The URL for any O'Reilly book is http://opmi.labs.oreilly.com/product/ followed by its ISBN, in this case 9781565924796. The full URL is thus http://opmi.labs.oreilly.com/product/9781565924796.
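For readers who only have the older 10-digit form to hand, the conversion and the URL construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of OPMI: the function names are mine, and stripping hyphens before building the URL is my own choice rather than documented OPMI behavior. The conversion itself is the standard one: drop the old check digit, prepend 978, and recompute the check digit using alternating weights of 1 and 3.

```python
OPMI_BASE = "http://opmi.labs.oreilly.com/product/"  # base URL quoted in this post


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 (hyphens allowed) to its 13-digit equivalent."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    # Drop the ISBN-10 check character and add the 978 prefix.
    body = "978" + chars[:9]
    if not body.isdigit():
        raise ValueError("the first nine characters must be digits")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


def opmi_product_url(isbn):
    """Build the product URL for an ISBN (hypothetical helper)."""
    return OPMI_BASE + isbn.replace("-", "")


isbn13 = isbn10_to_isbn13("1-56592-479-7")
print(isbn13)                    # 9781565924796
print(opmi_product_url(isbn13))  # http://opmi.labs.oreilly.com/product/9781565924796
```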

An HTTP GET may be done with any Web browser, of course, or on a command line by use of the curl utility:

$ curl http://opmi.labs.oreilly.com/product/9781565924796


The returned RDF includes a wealth of information about the book. The OPMI uses four vocabulary descriptions in its RDF: Dublin Core for describing books (title, subject, language, etc), Friend-of-a-Friend (FOAF) for describing people associated with those books, the library community's MARC (MAchine Readable Cataloging) relator codes for relating people and books and the Metadata Object Description Schema (MODS) for specifying the edition of a book. MARC and MODS come from the Library of Congress and are traditionally used in library cataloging systems.

Since this metadata is on the Web, we can use standard Semantic Web query tools to query it. Using SPARQLer, a SPARQL query language processor available freely on the Web, we can query the RDF to extract bits we want. A bit of playing around makes it easy to get the author's name and the unique URI assigned to the author by O'Reilly:

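# Assumption: dc: is bound to the Dublin Core terms namespace.
# foaf: and rdf: are the standard namespaces; FROM names the product URL shown earlier.
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?work ?authorURI ?author
FROM <http://opmi.labs.oreilly.com/product/9781565924796>
WHERE {
  ?work dc:creator ?authorType .
  ?authorType rdf:_1 ?authorURI .
  ?authorURI foaf:name ?author
}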


The results look like this:
work                                                   authorURI                                   author
<urn:x-domain:oreilly.com:product:9781565924796.IP>    <urn:x-domain:oreilly.com:agent:pdb:2495>   "David Wood"@en
<urn:x-domain:oreilly.com:product:9781565924796.BOOK>  <urn:x-domain:oreilly.com:agent:pdb:2495>   "David Wood"@en


There are two results because the first (.IP) is the overall URI for the work in all of its possible formats. The second (.BOOK) is the book edition of the work. If this book had been published on Safari, O'Reilly's electronic publishing forum, it would also have a URL ending in ".SAF". E-books get an ".EBOOK" and Apple iPhone applications get a ".APP".
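If you want to sort query results by format, the suffix convention described above is easy to pick apart. The following is a rough Python sketch. The helper name and the human-readable labels are my own, and the URN layout is inferred only from the two results shown in this post:

```python
# Suffixes described in this post, mapped to labels of my own choosing.
FORMAT_SUFFIXES = {
    "IP": "work in all formats",
    "BOOK": "print book",
    "SAF": "Safari",
    "EBOOK": "e-book",
    "APP": "iPhone application",
}

PRODUCT_PREFIX = "urn:x-domain:oreilly.com:product:"


def parse_product_urn(urn):
    """Split a product URN into (isbn, format label). Hypothetical helper."""
    urn = urn.strip("<>").replace(" ", "")  # tolerate stray whitespace from copy-and-paste
    if not urn.startswith(PRODUCT_PREFIX):
        raise ValueError("not an O'Reilly product URN: " + urn)
    isbn, _, suffix = urn[len(PRODUCT_PREFIX):].partition(".")
    return isbn, FORMAT_SUFFIXES.get(suffix, "unknown")


print(parse_product_urn("<urn:x-domain:oreilly.com:product:9781565924796.BOOK>"))
```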

O'Reilly claims published metadata for over 1100 books, which is a pretty reasonable addition to the Semantic Web, even in Beta. Naturally, I now want O'Reilly to publish machine-readable metadata on their human-readable Web pages using RDFa. There has been no sign of that yet, though.

This content was cross-posted to Semantic Universe.

Monday, March 02, 2009

PURL Legacy Loader Now Open Source

A legacy loader is available to take old OCLC version 1 Persistent URL (PURL) database dumps and upload PURLs into the new project’s RESTful API. This is not production code, but is provided in the hope that it may be useful to operators of old PURL servers wishing to migrate to a more modern PURL server. The legacy loader has been released under an Apache 2.0 license.

To get the legacy loader, use Subversion to check it out like this:

svn co http://purlz.zepheira.com/svn/purlz/purlsbulkloader

Check out the code and follow the directions in the file README.txt.

This information is also available at the PURL Project's Download Area.

Persistent URL (PURL) Server version 1.4 Released

The PURLZ Persistent URL Server version 1.4 is now available. See the PURLZ Downloads area to get your copy now. This release improves handling of URLs with query strings and special characters. It is recommended for immediate use by all PURL server operators.

PURLs are Web addresses or Uniform Resource Locators (URLs) that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons. Details are available on the PURLZ community site.
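The idea is simple enough to sketch in a few lines of Python. This is a conceptual illustration only, not the PURLZ implementation or its API: the table, the paths and the host names are hypothetical, and a 302 redirect is just one of the response types a resolver might use.

```python
# Hypothetical PURL table: persistent path -> current location of the resource.
purl_table = {
    "/net/example/report": "http://old-host.example.org/report.html",
}


def resolve(path):
    """Return (HTTP status, Location target) as a redirecting resolver might."""
    target = purl_table.get(path)
    if target is None:
        return 404, None
    return 302, target


print(resolve("/net/example/report"))

# The resource migrates to another machine. Only the table entry changes;
# every published link to the persistent path keeps working.
purl_table["/net/example/report"] = "http://new-host.example.org/docs/report.html"
print(resolve("/net/example/report"))
```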

Please see also the README and Release Notes for version 1.4.

Saturday, February 14, 2009

Fun with Blimps

Aidan, Mikayla and I had a blast today by attaching a digital camera to a helium blimp and flying it around our neighborhood :)

Friday, February 13, 2009

No Darwin in the South

I know I live well South of the Mason-Dixon Line. The slower pace of life here, the older attitudes and the more formal politeness are often pleasant. Sure, there are prejudices and many of the public schools aren't very good (others are, naturally). There is a lot of societal stress due to Northern migration. Virginia was even a blue state in the last election. All in all, many people from many places live in Virginia and call it home.

That's why I was shocked that my kids' school didn't even mention the 200th birthday of Charles Darwin yesterday. Neither my fifth nor my second grader knew who he was, or why he was famous. They know now, though. We talked about The Voyage of the Beagle, On the Origin of Species and The Descent of Man all through dinner. Tomorrow, I plan to describe his work on worms. Kids love that sort of thing, even more than discussions of sexual selection vs. natural selection.

Shame on Fredericksburg Academy! They call themselves a college prep school? They don't even teach sex education until seventh grade! By that time, the kids have had the chance to figure it out for themselves, often in inappropriate ways. I had "the talk" with my fifth grader earlier this year. He is better for it, too. I'm honestly looking forward to talking to the head of the Lower School when she gets the rant I just sent her on Monday morning.

Monday, February 09, 2009

Desperately Seeking SKOS Vendors

A Fortune 500 customer of Zepheira's has a problem that could readily be solved with SKOS. You might think that would be sufficient to attract the attention of some tools vendors, especially since SKOS is in "last call" at the W3C and is likely to become a standard later this year. If that is so, I've missed it.

Can anyone tell me where to get decent tool support for SKOS?

Mulgara has some cool support for SKOS, as I mentioned here. Unfortunately, the state of that support still requires some care and feeding by an expert.

I approached Revelytix, hoping that they would agree to provide SKOS support in Knoodl, but they demurred until at least later this year. It should be easy for them given their existing support for OWL and their use of Mulgara.

Another alternative may be ThManager, an Open Source SKOS editor/visualizer.

Until tools vendors support SKOS directly, we are limited to existing taxonomy creation and maintenance tools, such as BiblioTech or Synaptica, to build ANSI/NISO standard thesauri (Z39.19) and then convert them to SKOS. For the moment, though, conversion tools seem to be in the same boat as editors.

SKOS in Mulgara's RLog

I have long been impressed by Paul's technical prowess. His recent implementation of SKOS definitions in Mulgara's RLog has done it again.

RLog is a logic programming language like Prolog that Paul created. RLog natively understands URIs and RDF's notions of subject-predicate-object relations. RLog's implementation of SKOS requires a mere 7 rules (!) once the 95 axioms are laid down. Naturally, those axioms and rules include huge chunks of RDFS and OWL.

RLog makes it easy (if you are a logic programmer) to make rules files for Mulgara's Krule rule engine. Support for RDFS has been provided in Krule for some time.

Paul has been talking about integrating RLog into Mulgara for over two years. I hope he can make that happen during 2009. Scalable or not, it is insanely cool. Until an integration happens, RLog must be run as a separate tool, as must Krule.

Friday, February 06, 2009

Ph.D. Thesis Published

My Ph.D. thesis, entitled Metadata Foundations for the Life Cycle Management of Software Systems, has been published on UQ eSpace, The University of Queensland's institutional digital repository. Get your copies now while they're hot :)

Interesting, at least to me, is that UQ eSpace is built on Fedora Commons, and therefore uses Mulgara. Sweet!

Tuesday, February 03, 2009

IET Software Journal Article Finally Published

The British journal IET Software finally published an article I wrote nearly three years ago. It was apparently published last August but I just recently found out.

The article is Towards a software maintenance methodology using Semantic Web techniques and paradigmatic documentation modelling.

The citation is:

Hyland-Wood, D., Carrington, D. and Kaplan, S. (2008, August). Towards a software maintenance methodology using Semantic Web techniques and paradigmatic documentation modelling, IET Software, 2/4, pp. 337-347

Wednesday, January 21, 2009

What is an Oracle-Mulgara Instance?

Paul pointed out a US government contract solicitation involving Mulgara. It mentions something intriguingly called an "Oracle-Mulgara instance". I am intensely curious what that is!

Persistent URL (PURL) Server version 1.3 Released

The PURLZ Persistent URL Server version 1.3 is now available. See the PURLZ Downloads area to get your copy now. This release contains substantial improvements for speed of indexing, stability and numerous bug fixes. It is recommended for immediate use by all PURL server operators.

PURLs are Web addresses or Uniform Resource Locators (URLs) that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons. Details are available on the PURLZ community site.

Please see also the README and Release Notes for version 1.3.

Monday, January 19, 2009

Meanwhile, Back in the Real World...

Aidan is hooked on the Mac OS X port of Nethack. Ya gotta laugh.

The Content of their Characters

Today is Martin Luther King, Jr. Day in the United States and rightfully so. We watched his "I have a dream" speech in its entirety at lunch today and I realized, in explaining his legacy to my children, how many modern-day prophets have paid the ultimate price. King, his mentor of non-violence Mohandas Gandhi and Abraham Lincoln, the three men arrayed in spirit at King's speech, were all removed from this Earth by assassins' bullets. All of them were killed for having the courage to say to small minds that people should be free.

Raised on the ideals of the American union, I was a child of King in a literal sense. King spoke at the Lincoln Memorial on the day that I was born. I grew up in prejudiced times but in hearing the conversation that he started, learned to tolerate, then to embrace, cultural differences. There are no racial differences, of course, and have not been since Neanderthals walked Europe alongside Homo Sapiens Sapiens. Such minor differences as skin color are trivial and recent evolutionary adaptations to environmental conditions that we have long worked around with forms of transportation. Culture, not race, is all that separates us.

Culture is fungible. We can change it. We have the ability if we only have the will. Do we want to live together on this increasingly tiny planet, or do we wish to let our subtle differences rip us apart? The time has come to choose. We have to work together to address the problems of our time. Climate change, energy production, medical ethics, poverty and war won't go away unless we will them to. The only way to address any of them is to live together, in peace if not always in harmony. THE challenge of our time is thus laid bare.

Tomorrow Barack Obama will become the 44th president of the United States. I am pleased that so many feel a sense of pride and accomplishment in the victory of his genes, but hope that they will remember that his victory is not about his genes, his past, or his parents. It is about our future. I, for one, support him not because he is African American, but because I believe him to be the best man for the very difficult job. I attempted to judge him, simply, not on the color of his skin, but on the content of his character.

Obama is following a dangerous path. He will need to ignore his own rock star status, to avoid offers from young women, to avoid the corrupting influences of Washington. He will need to avoid assassins' bullets. If he lives, if he stays sane, if he can just do what he has set out to do, he just might become truly great. I hope he can. I hope we can follow.

Wednesday, November 19, 2008

Mulgara on the Cloud

Chris Wilper reportedly put a Mulgara instance on Amazon EC2 and loaded a quarter-billion triples into it, just for fun. The data he used was generated from Paul Gearon's numbers RDF generator. The generator creates facts about numbers and keeps creating RDF until you stop it.

Chris was using the new XA version 1.1 storage layer for Mulgara. The quarter-billion triples loaded in about a day, which is about the same loading performance that Paul has seen on his laptop. The size of the indexes was about 4.6 gigs compressed and 51 gigs uncompressed. Paul notes that the XA version 2 storage layer will store strings and URIs much more efficiently and thus reduce the index sizes considerably.

Way to go, guys! We can't wait for XA2!

Tuesday, November 18, 2008

New PURL Code in the Wild

Several PURL installations have been seen on the 'net using the new code base:
  1. NeuroCommons
  2. The National Center for Biomedical Ontology (NCBO)
  3. Semantic Report
  4. Zepheira (naturally)
Others include at least one startup that doesn't have a Web site yet (GRACE Research Corporation) and a couple of others still in stealth. Not bad for a project that never had an official launch. Keep 'em coming, folks. I have some hope that OCLC will join that list soon, as well as a couple more startup companies.

PURL Server v1.2 Released

I released the new Persistent URL (PURL) server, version 1.2 over the weekend. This release fixes a major bug in version 1.1 whereby PURLs would not resolve for those not having a cookie set. Minor upgrades include better error reporting and URL handling.

All users of earlier PURL servers are encouraged to upgrade immediately.

Binary (JAR) and source code downloads are available from purlz.org.

Saturday, November 08, 2008

The Perfect 11-year-old Boy Birthday Party

4:00 PM: Archery (supervised, of course)

4:45 PM: A few minutes free time on a trampoline

5:00 PM: Fencing lesson (even more stringently supervised, with cardboard boxes as opponents)

6:15 PM: Make your own pizza for dinner

7:15 PM: Communal game of Crossfire on a map created by the birthday boy himself

8:30 PM: Cookies

8:35 PM: Pass-the-parcel (an Aussie game for our Aussie son)

8:40 PM: Open presents

8:50 PM: Movie night (The Princess Bride)

10:30 PM: Sleep over

8:00 AM: Breakfast of chocolate-chip pancakes

9:00 AM: Pickup by parents

Wednesday, November 05, 2008

My Last Degree

Tonight, I passed the oral defense for my Ph.D., having submitted my thesis in July. It's all administration from here :)

Thursday, October 02, 2008

Dulles TSA Lets Leatherman Squirt Slip By

Knives on a plane? Believe it.

I flew from Dulles International Airport on Monday and forgot to remove my Leatherman Squirt from my keyring. I put my keys in a bin with my shoes when passing through security and the TSA simply didn't notice. Neither did I until I was on the plane :/

Tuesday, September 16, 2008

Aidan Beats Sarah

Wow. My son, 10, gave his sister a succinct and cogent description of the Bush Doctrine on the way home from school today. He covered all the basics, including the ethics of preemptive strikes, the use of sanctions and inspections and the limitations on presidential power. Take that, Sarah Palin :)

Truck Fire on I-81 Today


I traveled from Fredericksburg to Lexington, Virginia today in the beautiful Shenandoah Valley to kick off a seminar series that I organized at VMI. Unfortunately, an accident involving a semi rig completely shut down a section of I-81 this morning about 8 miles South of Staunton for many hours and I arrived late. I was luckily still able to give my talk.

I haven't seen any reporting on this accident other than this blog post, which is surprising considering how long all of the Southbound lanes were closed. Traffic was still crawling through a single lane a couple of hours ago.

Friday, September 12, 2008

The Canary in Ohio

I checked in with my favorite indicators of political opinion in the swing state of Ohio today: my parents. Their political position in support of McCain was not surprising, but the details are interesting.

My mom, 81, favors abortion rights, but loves Sarah Palin. She stopped listening to the Youngstown radio station because she "couldn't believe how hard they were being on her." She suspected that the reason the media was being hard on Palin was "because she is a woman." She was amazed to hear about a poll that suggested that the women of Ohio were supportive of Obama, because everyone she discusses politics with supports McCain.

Palin's lack of understanding of the Bush Doctrine did not bother her at all: "She'll learn." When asked about Palin's religious views, especially her assertion that the war in Iraq is "God's Plan", she dismissed it as propaganda from the Obama campaign. I pointed her to the video. We'll see what she says after viewing it.

My mom also didn't care that Palin linked Iraq to 9/11. My mom does, too. She didn't fully believe me when I explained the Big Lie and that even George Bush has publicly repudiated the notion.

My dad, 82, expressed disdain for all four candidates and the electoral process in general. He was not rocked by the allegations that McCain may have used his Senate influence to tamper with a DEA investigation, saying that "they all do that sort of thing". He went on to say that he has been going to the library to rent old movies and other videos until two weeks before the election. At that point, he will try to listen to the position of both candidates and make a decision.

Thursday, September 11, 2008

China as an Island

This makes sense!

John Mauldin, an analyst at Investors Insight, came up with an awesomely useful idea: Instead of thinking of China as a huge land power, imagine them as an isolated island. Geographically, politically and demographically it makes a lot of sense.

John's idea is just part of a larger analysis of China to be found in this article, although the site was not responding today when I checked it.

The graphic is worth repeating. It is powerfully concise:



I found both the graphic and the link to the original article on Strange Maps, a blog worth watching.

Thursday, August 21, 2008

Running for President? No, Thanks.

I had to laugh at this interesting thought from my dad:

Dear Dave:

I finally decided that I have only one interest in the upcoming election and it involves your birthday.

Since you and Obama are of the same generation and about the same age, it occurred to me that your life experiences are much broader than his.

You have served in the military, have operated a business under difficult circumstances, and have never had to shape your beliefs for political purposes as all politicians do.

Just wanted you to know I am glad you are not running for president.

Love,

Dad

Thursday, August 14, 2008

Bernadette's Mom Has Passed

Bernadette's mom died this morning at 2:30 AM US EDT. She passed peacefully in her sleep at 91 years old.

The family requests that mourners consider sending a contribution to the Rhode Island Hospice or your local Hospice. Contributions are tax deductible.

Wednesday, August 06, 2008

NetKernel Architects Conference

This year's NetKernel Architects Conference is being held 25-27 September 2008 at the University of Mary Washington, right here in Fredericksburg, VA. Brian and I are helping Peter and Randy from 1060 Research to organize it.

1060 will be introducing NetKernel version 4.0 Pre-Release. Each attendee will get a copy on the first day. Brian and I had the chance to see and play with it last month in London. All I can say is, wow! Fans of resource-oriented computing are about to get a very powerful tool to add to their toolbox. The Architects Conference is a great way to learn the concepts and get a jump on the competition.

The conference schedule and price list are on the Web.

Y'all come down now, y'hear?

Tuesday, August 05, 2008

New Linked Data Blogroll

Talis has set up a blog aggregator for discussions of Linking Open Data at http://planet.linkeddata.org/. Good show, Talis.

Gartner Gets FLOSS; Late as Usual

A new report by Gartner on Free/Libre/Open Source Software concludes, " ... if you do not think you use it, then you use it; and if you think you do use it, then you use lots more of it than you know." No kidding. They rightly call out the fact that many common big-enterprise vendors use FLOSS components, but fail to mention FLOSS use by the dominant players like Microsoft.

The full report is accessible only by those with money, but KMWorld reported on it, as did ZDNet. Both reporters focused on the use of FLOSS by cloud computing centers, but the report is much more interesting than that. The most interesting statement to me was the positioning of SaaS for enterprises as a primary cost-cutting measure. SaaS is just another form of outsourcing, but this time to a much more level playing field.

Gartner is a funny organization. I haven't quite forgiven them for reporting in 1997 (!) that the Web was not likely to be important to businesses. They even gave that ridiculous statement a 70% confidence rating, IIRC. This time, they seem to be on track, but reporting a bit late. Why wait until FLOSS use is so ubiquitous in mainstream software before telling enterprise CIOs about it?

Monday, August 04, 2008

Digging out from Under

Whew! I've submitted my Ph.D. thesis (finally) and am now waiting upon the reviewers. It should be a couple of months and then an oral defense. I may be able to post more regularly once I find my inbox so I can start clearing it.

Tuesday, July 08, 2008

The World's First Metadata


Photo: Girginakku (originally uploaded by prototypo)

I spent a good bit of 29 June at the British Museum in London. Those Brits really know how to loot! Attached is a picture of Mesopotamian girginakku from 5300 years ago. Girginakku were clay tags that were hung off of tablets or rolls to provide clues to their contents. They were originally used as tokens much like an IOU, but evolved into the world's first metadata.

The keepers of the markers ("rab girginakku") were the first metadata wranglers; the intellectual forebears of SemWebbers. Cool, eh?

On the right in the photo are bullae, clay envelopes containing girginakku, presumably to keep the girginakku from being tampered with before they were needed (sort of like a modern lead seal). Unfortunately, neither the girginakku nor the bullae were described in this detail or given their proper names on the descriptive cards near the artifacts. I only recognized them for what they were because I have recently been researching the history of metadata.

Saturday, June 21, 2008

Making Glass


Photo: Making Glass (originally uploaded by prototypo)

Bernadette and I had the fun of learning something of the art of glass making last week. The resident glass blower at Oglebay Park, WV showed us the basics and let us make some paperweights. Great fun!

Monday, April 21, 2008

Semantic Technology Conference 2008

I will be speaking at SemTech again this year. Everyone from Zepheira will be in attendance and most of us will be speaking. Brian and I will be giving a 3-hour tutorial on Semantic Resource Oriented Architectures and Eric and I will be talking about the new PURL capabilities.


Wednesday, April 09, 2008

SWAMM


Photo: SWAMM (originally uploaded by prototypo)

The Software Agent Maintenance Model (SWAMM) is a NetLogo model of a software maintenance methodology I have been working on. A runnable Java applet of the model is available at: http://www.itee.uq.edu.au/~dwood/models/SWAMMApplet/SWAMM.html

Monday, March 31, 2008

Cherry Blossoms 2008

Another view of the cherry blossoms surrounding the Tidal Basin in Washington, DC on Sunday, 30 March 2008. The Combat Air Patrol contrails are visible over the Washington Monument.

Cherry Blossoms 2008


Photo: CherryBlossoms2008_Detail (originally uploaded by prototypo)

Bernadette and I had the great fortune to view the cherry blossoms surrounding the Tidal Basin in Washington, DC at sunrise on Sunday, 30 March 2008. They were at their peak and the light was just perfect. It rained today, so we were especially lucky not to have put it off.

Appreciation of beauty seems so antithetical to the normal activities of Washington, DC, but perhaps that should make us try harder. The contrails of Combat Air Patrol jets painted the sky behind the blossoms.

Sunday, February 24, 2008

Semantic Conference Scheduling Application

This year's Semantic Technology Conference has a nifty new semantic conference scheduling tool, courtesy of fellow Zepheirans Uche and Eric. Nicely done, guys!

The scheduler allows you to graphically fill out a calendar of sessions that you want to attend. Conflicts are immediately visible. The faceted navigation on the left allows you to find sessions based on tagged data (speaker, company, topic, day, etc). When you are happy, you can import your schedule right into your ical-compliant calendar. I wish all conferences had this.

Monday, February 18, 2008

Beyond Redirection: Rich and Active PURLs

It has been a while since I posted about the new Persistent URL work being done by Zepheira and OCLC. That is partially because the work has gone much more slowly than we had planned. We have been actively gold plating the new PURL server because we are trying to satisfy several communities. The result, though, is laying the foundations for new services for the Web.

The key to the new PURL service is the typing of PURLs. The existing public PURL service at OCLC returns an HTTP 302 (Found) response, causing a Web client to redirect to another URL. The new PURL server allows PURLs to return one of several status codes (301, 302, 303, 307, 404 and 410).
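To make the idea of typed PURLs a bit more concrete, here is a minimal sketch in Python of how a resolver might turn a PURL's type into an HTTP response. The `Purl` record, the `resolve` function and the example paths are all hypothetical names made up for illustration; this is not the code of the actual PURL server.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Status codes used in this sketch, split by whether they redirect.
REDIRECT_TYPES = {301, 302, 303, 307}
NON_REDIRECT_TYPES = {404, 410}


@dataclass(frozen=True)
class Purl:
    """Hypothetical in-memory record for one PURL (not the real server's schema)."""
    path: str
    purl_type: int
    target: Optional[str] = None


def resolve(purl: Purl) -> Tuple[int, Dict[str, str]]:
    """Return the (status, headers) pair a resolver might send for this PURL."""
    if purl.purl_type in REDIRECT_TYPES:
        if not purl.target:
            raise ValueError("a redirecting PURL needs a target URL")
        # The PURL's type is the status code; the target goes in Location.
        return purl.purl_type, {"Location": purl.target}
    if purl.purl_type in NON_REDIRECT_TYPES:
        # 404 and 410 PURLs answer directly, with nothing to redirect to.
        return purl.purl_type, {}
    raise ValueError(f"unsupported PURL type: {purl.purl_type}")
```

For example, `resolve(Purl("/net/example", 302, "http://example.org/doc"))` yields a 302 with a `Location` header, while a 410 PURL yields the bare status.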

We can go even further, though. The original PURL server had some internal concepts like "cloning" a PURL (basing a new PURL definition on an existing one) and "chaining" a PURL (allowing multi-person management of a URL resolution process). Combining these concepts with the choice of HTTP response codes got us thinking about arbitrary types of PURLs.

We have been experimenting with types of PURLs that combine with other services. A "Rich" PURL, for example, is the combination of a PURL and metadata. We have a prototype service that combines strong identifiers with rich metadata, providing the building blocks for other semantic applications. Rich PURLs are a combination of two related services: A PURL server for management of the resolution and RDF (or other metadata format) file hosting.

The W3C TAG finding regarding the use of HTTP 303 responses seems to suggest that we could use rich PURLs in interesting ways. For example, we could do the following:

A 301 PURL pointing to an RDF resource == Metadata about an information resource
A 303 PURL pointing to an RDF resource == Metadata about a non-information resource (i.e. a physical or conceptual resource)

That usage would be consistent with the TAG finding, even if it goes a bit beyond it.
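A client could apply that convention with very little code. The sketch below, in Python, is only an illustration of the 301/303 reading suggested here: `describe_rich_purl` and `fetch_status` are hypothetical helper names, and the convention itself is a proposal, not something the TAG finding mandates.

```python
import http.client


def describe_rich_purl(status: int) -> str:
    """Interpret a Rich PURL's status code under the convention proposed above."""
    if status == 301:
        return "metadata about an information resource"
    if status == 303:
        return "metadata about a non-information resource"
    return "no metadata convention defined for this status"


def fetch_status(host: str, path: str) -> int:
    """Issue a HEAD request and return the raw status code.

    http.client does not follow redirects, so a 301 or 303 is seen as-is.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Something like `describe_rich_purl(fetch_status(host, path))` would then tell you what kind of thing the metadata behind a given PURL describes.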

NB: You can tell if you get a PURL by looking at the PURL header in an HTTP response.

But wait, there's more. What if a PURL pointed to a Web service (in the sense of dynamic content, not necessarily limited to SOA Web Services, but including them)? The combination of a Rich PURL and a metadata reference to a Web service yields an "Active PURL". That is, an Active PURL is a PURL naming a graph of metadata describing a Web service.

Consider a simple Web service like an RSS feed. Placing an Active PURL in front of that feed allows you to describe how that feed should be handled. You could name the facets that you want to use to make an Exhibit or provide any other presentation advice you desired.

Alternatively, an Active PURL might itself be a sort of Web service that provides dynamic metadata about another Web service and can either serve the metadata or redirect to its target service. Such an Active PURL could be used to name a SPARQL graph, accept query parameters for it and return metadata about it, such as a count of results or information on the meanings of columns in the result set. I believe that named graphs are very handy things and something that we as a community are paying inadequate attention to. Given that SPARQL may be the query language that finally integrates our silos of relational databases, fronting them with Active PURLs seems like a promising line of research.

Lists of URLs as proposed by Stu Weibel would be easy to implement as an Active PURL.

Perhaps the most interesting use of Active PURLs to enterprises might be the ability to provide standardized RDF metadata about SOA Web Services as well as relational databases. UDDI is so broken, we might as well fix it with existing SemWeb standards. That is not a new idea, but the application of Active PURLs to the problem is, I think.

Tuesday, February 12, 2008

Friday, February 08, 2008

GRDDL Article on DevX

Brian has written a new DevX article entitled "Gleaning Information From Embedded Metadata", explaining GRDDL. He used my home page at Zepheira as an example of a live page with embedded, machine readable metadata.

Wednesday, January 30, 2008

A Great Week for Old Technology

It seems that ships are again sailing the ocean using wind power. The inventor claims that his kites can reduce fuel consumption of most merchant ships by 10-35%. That's awesome.

Even more bizarre is the Mach 20 paper airplane that a Japanese origami expert wants to throw from the International Space Station. Really, you can't make this stuff up.

Friday, January 25, 2008

Hang onto your towel

Burt Rutan and his business partner announced the design of SpaceShipTwo. Very cool! Oddly, his business partner is apparently Zaphod Beeblebrox...



Bitten by Polymorphism

Architecture discussions can get ugly. I was participating in one when Brian suggested the domain name turdpolisher.com for our activities. A quick check on register.com showed that someone is actually holding that domain, although nothing resolves there. The polymorphism arose when we realized that register.com suggested some alternative domain names, including PolishedToes.com and PolishHeritage.org. *Sigh*

Wednesday, January 16, 2008

SPARQL Set to Change the Web

The W3C announced yesterday the standardization of SPARQL. This broken and immature standard has the capability to rapidly change business operations, Web searching, the advertising model of most Web revenue and enable a new generation of Web-based services.

SPARQL is broken and immature for the simple reason that it failed to include a way to write data. I have my gripes about the way it reads data, too, but those are less important. As long as the W3C continues to treat the Web as a read-only system, the Web will remain primarily a read-only system. It is bad enough that we have failed to widely implement and use HTTP PUT and DELETE (fully half of REST), but we really should know better than to create new standards in 2008 that make the Web look like an information retrieval system and not an information aggregation and creation system.

For all that, SPARQL is still an incredibly important set of standards. There are three of them: SPARQL the query language for the Semantic Web, SPARQL the protocol and SPARQL the results format in XML.
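The results format is the easiest of the three to show in a few lines. Here is a minimal sketch in Python that reads a SPARQL XML results document using only the standard library. The sample document and the `parse_results` helper are made up for illustration; the namespace is the one defined by the results format itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical sample response with a single solution.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="class"/>
    <variable name="test"/>
  </head>
  <results>
    <result>
      <binding name="class"><uri>http://example.org/Foo</uri></binding>
      <binding name="test"><uri>http://example.org/FooTest</uri></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return one dict per solution, mapping variable name to its value text."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows
```

This sketch throws away the distinction between URIs, literals and blank nodes, which a real client would want to keep.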

SPARQL is important because it gives the world a standard way to perform distributed queries across disparate data sets. In other words, it allows you to treat the Web (the Semantic Web) as a database. Relational databases and other data stores can play, too. They just need a SPARQL overlay. This is something that the relational database crowd has never been able to pull off. I suspect that, due to industry competitive pressures, a standards body would have failed to create a standard for an RDB distributed query language if they had tried, but SPARQL is that critical end run. By pursuing the goal in the Semantic Web community, we now have something that will work for the RDB folks, too.
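On the wire, a query is just an HTTP request. The following Python sketch builds a SPARQL protocol GET URL; `query` and `default-graph-uri` are parameter names from the protocol, while the `build_query_url` helper, the endpoint and the graph URLs in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(endpoint, query, default_graphs=()):
    """Build a SPARQL protocol GET URL for the given endpoint and query.

    Each entry in default_graphs becomes one default-graph-uri parameter,
    asking the endpoint to query the merge of those graphs.
    """
    params = [("query", query)]
    params += [("default-graph-uri", graph) for graph in default_graphs]
    return endpoint + "?" + urlencode(params)
```

Calling `build_query_url("http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o }", ["http://example.org/g1"])` gives a URL that any protocol-compliant endpoint should accept, which is the point: the same request shape works whether the store behind it is RDF-native or a relational database with a SPARQL overlay.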

I remain fussed that Mulgara doesn't have SPARQL support yet, but Paul tells me it should come soon. I certainly haven't done anything to help, so I shouldn't complain.

Why do I think SPARQL can fundamentally change business models? Because of my experiences with this blog. I started experimenting with advertising once my readership reached reasonable levels. Impressions are few, even though readership is decent. Why? RSS. Most people read this blog via news readers or aggregators and not via the Web. They don't ever see the ads. Cool, right? Yeah. Imagine, though, the impact on the online advertising market (which financially supports Google and its competitors) when the Web is a database. Nobody will see the ads. Watch out, world. Where there is chaos, there is opportunity. I can't wait to see what happens next.

Tuesday, January 08, 2008

Tucana's Fate Sealed

Northrop Grumman Corporation's Electronic Systems Sector has been attempting to sell its Tucana RDF database software for some time. They have been quietly seeking buyers since they dropped public references to it as a "rsemd1-extweb50 Sensor1 Replacement Server" in August 2006. Unfortunately for them, potential buyers keep calling me.

Each call is similar in nature and goes something like this:

Buyer: We are very happy to tell you that Northrop Grumman has accepted our cash offer for the Tucana technology. We would like to talk to you about how we can make use of it.

Dave: Sure. I am sure Northrop must have done some work on it since they bought it. What has been added to the code base?

Buyer: They made us swear to secrecy, so we can't tell you that.

Dave: OK, I understand. Can you tell me whether it differs substantially from its Open Source baseline, Mulgara?

Buyer: Umm, there is an Open Source project? What is the URL?

Dave: http://mulgara.org

Buyer: Is it an active project? Do people use it?

Dave: Oh, yes. It is used in production by a number of for-profit and non-profit companies and many researchers.

Buyer: I thought Northrop had killed the Open Source project?

Dave: Nope. They tried but failed. Mulgara was a fork to avoid any future legal disputes. It has no code contributed by Northrop or its contractors.

Buyer: Uh, perhaps we should look into that and get back to you.

Naturally, that is generally the last I hear about it until the next potential buyer calls. The last one was yesterday, but they weren't the first (or the second or the third) and probably won't be the last.

The sad thing is, of course, that if a single manager at Northrop had tried to work with the Open Source community instead of building an empire, the project could have been wildly successful in their customer base. There is still a market need for a more scalable RDF database outside of the government, as evidenced by the list of potential buyers, the life sciences community's desire to represent genomic data semantically, Garlik's creation of a custom one last year and continued funding from Mulgara users for scalability development.

Tuesday, December 04, 2007

UPDATED: Helping the Ogbuji Family

Some of you have contacted me about where to send items or other forms of well wishing to the family of Chime Ogbuji. The best way to help the family is to donate to the Ogbuji Family Fund, set up by Chime's father, Dr. Linus Ogbuji. Donations may be made by clicking the "Donate" button on http://thekingdomkids.org/fund/.

Thanks in advance to anyone providing what comfort they can.

Monday, December 03, 2007

Tragedy

Chime Ogbuji, SemWebber extraordinaire and brother to Zepheira business partner Uche, has suffered the most tragic occurrence of which I can conceive. Chime lost two of his children in a fire over the weekend, and his third is in critical condition. I waited for the last two days to say something meaningful, but there just are no words for this. I am so, so sorry for Chime and his family. The sad tale is reported here. Chime, our thoughts are with you.

Sunday, November 04, 2007

The Poor State of SPARQL Implementations

*Sigh* I had a simple task. Really I did. I am putting the final touches on a journal article and wanted to expand an example to be more interesting. All I wanted to do was demonstrate (in SPARQL) that multiple RDF graphs can be pulled in from URLs and the dynamically-assembled graph queried. I wouldn't have thought that was such a big ask for 2007. Alas, I was wrong.

Here is the query:


PREFIX sec: <http://www.itee.uq.edu.au/~dwood/ontologies/sec.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?class ?test ?testresults
FROM <http://www.itee.uq.edu.au/~dwood/ontologies/sec-example.owl>
FROM <http://www.itee.uq.edu.au/~dwood/ontologies/sec-testresults.owl>
WHERE {
  ?class rdf:type sec:OOClass .
  ?test sec:isTestOf ?class .
  ?test sec:hasTestResults ?testresults
}


Redland won't do it because it does not support FROM (or FROM NAMED). The same for OpenLink Virtuoso SPARQL and JRDF. Sesame 2.0 might do it, but I got tired of looking. I'll have to get back to it tomorrow.

In the meantime, I hacked around the problem by using a little-known feature of JRDF - one can import a series of RDF or OWL files and query the subsequent graph. It is annoying, and requires local copies of the documents, but it works (kind of).

The really sad thing is that Tucana had this feature (in the iTQL query language) in 2000 or 2001. Mulgara still does, of course. Paul assures me that SPARQL support in Mulgara is finally close. That is wonderful, but it does make me feel a bit guilty for not contributing to it given its obvious need.

I still (since 2000) think that querying multiple data sources from the WEB makes the SEMANTIC WEB a bit more useful, and interesting. *Sigh* I guess I will have to either contribute more or live with it.

UPDATE: Sesame does not support SPARQL datasets according to this bug, even though a patch has apparently already been contributed.

UPDATE: OpenLink Virtuoso demos at http://demo.openlinksw.com/sparql and http://demo.openlinksw.com/isparql now both return results. However, they return four results where I expect two.

Dave Beckett claims that the latest Redland/Rasqal from svn now supports the query, but that he also gets four results.

Danny's SPARQLer now returns correct results (two).

Thanks to everyone who responded! Having proper FROM and FROM NAMED support opens a floodgate of potential new SemWeb applications.

UPDATE: Changing "SELECT" to "SELECT DISTINCT" returns the correct two results from Virtuoso. I suspect that change may be needed with others, too.
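For anyone wondering how a query like this can return the same solution more than once, here is a toy sketch in Python. It does not claim to explain what Virtuoso or Rasqal actually did; it only shows that if the same triple turns up in both source documents and the triples are concatenated rather than merged as a set, solutions repeat, and that DISTINCT (or a proper set merge) collapses them. The triples, the `match` and `select` helpers and the `ex:` names are all hypothetical.

```python
from itertools import chain

# Toy triples standing in for the two source documents (hypothetical data).
EXAMPLE = [
    ("ex:Foo", "rdf:type", "sec:OOClass"),
    ("ex:FooTest", "sec:isTestOf", "ex:Foo"),
]
TEST_RESULTS = [
    ("ex:FooTest", "sec:isTestOf", "ex:Foo"),  # repeated in both documents
    ("ex:FooTest", "sec:hasTestResults", "ex:Run1"),
    ("ex:FooTest", "sec:hasTestResults", "ex:Run2"),
]

# The three triple patterns from the WHERE clause of the query above.
PATTERNS = [
    ("?class", "rdf:type", "sec:OOClass"),
    ("?test", "sec:isTestOf", "?class"),
    ("?test", "sec:hasTestResults", "?testresults"),
]
VARIABLES = ("?class", "?test", "?testresults")


def match(triples, patterns, binding=None):
    """Yield one binding per way of satisfying all patterns ('?x' is a variable)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(patterns[0], triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound elsewhere.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(triples, patterns[1:], candidate)


def select(triples, patterns, variables, distinct=False):
    """Project each solution onto the variables, optionally dropping duplicates."""
    rows = [tuple(b[v] for v in variables) for b in match(triples, patterns)]
    return sorted(set(rows)) if distinct else sorted(rows)


concatenated = list(chain(EXAMPLE, TEST_RESULTS))  # duplicate triple kept
merged = set(concatenated)                         # set union of both graphs
```

With this data, `select(concatenated, PATTERNS, VARIABLES)` returns four rows, while either `distinct=True` or querying `merged` returns two.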