Thursday, October 22, 2009

Chinese Units of Measure

I found this antique Chinese ruler in Seoul, ROK, last week. It uses the old Chinese units of length: the fēn, the cùn and the chǐ.

The tiny fēn is about 3 mm. The cùn is traditionally the width of a person's thumb at the knuckle. The chǐ (or Chinese 'foot') is derived from the length of a human forearm, like a cubit. Or so says Wikipedia.
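The units are decimal: 1 chǐ = 10 cùn = 100 fēn. The following is a minimal Python sketch of the conversion. It assumes the modern Chinese market-system chǐ of exactly 1/3 m, so the results are only approximate for an antique ruler like this one, because historical chǐ lengths varied by dynasty and region. The function name `to_millimetres` is just illustrative.

```python
from fractions import Fraction

# Traditional decimal relationships between the units.
CUN_PER_CHI = 10
FEN_PER_CUN = 10

# ASSUMPTION: the modern market-system chǐ of exactly 1/3 metre.
# Historical chǐ lengths varied by dynasty and region.
CHI_IN_METRES = Fraction(1, 3)


def to_millimetres(chi: int = 0, cun: int = 0, fen: int = 0) -> float:
    """Convert a chǐ/cùn/fēn measurement to millimetres."""
    total_fen = (chi * CUN_PER_CHI + cun) * FEN_PER_CUN + fen
    metres_per_fen = CHI_IN_METRES / (CUN_PER_CHI * FEN_PER_CUN)
    return float(total_fen * metres_per_fen * 1000)


print(round(to_millimetres(fen=1), 2))  # 3.33
print(round(to_millimetres(cun=1), 2))  # 33.33
print(round(to_millimetres(chi=1), 2))  # 333.33
```

Under that assumption the fēn comes out at about 3.3 mm, consistent with the "about 3 mm" above.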

Those were hard-working people, to have thumbs as wide as a cùn.

The ruler is wooden, with brass inlays marking the units.

Friday, September 04, 2009

Why I Will Never Own a Kindle

Fears of Internet security experts everywhere were realized today when Amazon revealed, apparently by accident, that it keeps copies of annotations made on the Kindle ebook reader.

This article at Reuters reported on damage-control attempts at Amazon after it (in a delicious piece of irony) deleted copies of George Orwell's 1984 from its Kindles in July. The provider of the ebook version of 1984 apparently did not own the appropriate publication rights. Customers were naturally upset at the sudden disappearance of content from their devices, although of course they forgot to read the fine print, didn't they? You can't buy an ebook; you can only rent one. Amazon was technically within its rights to delete the content.

That's hardly the full story, though. Amazon was sued by a high school student for having also removed his "copious notes" regarding the deleted novel. The Reuters story linked above showed Amazon's hand when they reported:
Amazon's email on Thursday said that the company would replace
the deleted books along with any annotations made by customers.

That's right, Kindle fans. Amazon has admitted publicly that it, like Orwell's Big Brother, keeps copies of any annotations that Kindle users make on the devices. For months, at least. Holy cow!

The full text of Amazon's email to affected customers is available at the WSJ.

Perhaps more amazing is that Kindle users don't seem to care much (cf. the comments on the WSJ blog post). Kindle notes are synced to an Amazon server and are thus available to their owners over the Web. That may seem like a feature to some, but not to me. I'll back up my own notes, thanks.

Friday, August 21, 2009

Just Published: 97 Things Every Project Manager Should Know

O'Reilly Media has published 97 Things Every Project Manager Should Know. My colleagues Kathy MacDougall and James Leigh also wrote for this book.

This is a new style of "collective wisdom" book from O'Reilly. An earlier one was aimed at software architects.

I was pleased to see that O'Reilly used one of my quotes at the top of their home page for the book ("Clever Code Is Hard to Maintain...and Maintenance Is Everything").

The tips I wrote for this book were:
  • Clever Code Is Hard To Maintain
  • The 60/60 Rule
  • The Fallacy Of Perfect Execution
  • The Fallacy Of Perfect Knowledge
  • The Fallacy Of The Big Round Ball
  • The Web Points The Way, For Now
Check it out if you do project management. There's some good stuff in there.

Monday, August 17, 2009

The Death of the Copenhagen Interpretation?

Wow. A climate researcher in the UK has had the guts to propose a new geometry for space-time that offers a fresh way of answering pesky questions in quantum mechanics. This article in Physorg (see also the article's full text) provides a good overview.

I don't know if I still have the math to slog through it, but it looks to be worth the effort.

Called the Invariant Set Postulate, the proposed law offers a geometry of space-time that resolves long-standing difficulties in quantum mechanics, including complementarity, quantum coherence, superposition and wave-particle duality. A quantum description of gravity may even be possible. Wow. That is an amazingly out-of-the-box contribution.

For the faint of heart, here is a key quote: "The Invariant Set Postulate appears to reconcile Einstein’s view that quantum mechanics is incomplete, with the Copenhagen interpretation that the observer plays a vital role in defining the very concept of reality."

Monday, June 15, 2009

OK, OK, I'm back on Twitter

I'll be at the 2009 Semantic Technology Conference this week and will be twittering on @prototypo.

Friday, June 12, 2009

Freemix is Live!

Freemix is live in invitation-only beta. Come check it out!

We at Zepheira will officially introduce it to the SemTech crowd next week and to the press on Tuesday.

You can see my profile on


Monday, June 01, 2009

Musicians and Coders

Bernadette recently gave me a CD (yeah, really, not an iTunes gift card! She's very quaint.) by Jeremy Pelt, a fantastic jazz trumpeter. On the inside cover of November, he says, "Our greatest responsibility as musicians is to live and grow... then, you might play something hip!"


Friday, May 29, 2009

Announcing Freemix

Zepheira has announced the forthcoming launch of Freemix, a social networking site for data and the people who use it. Freemix will be officially launched at the Semantic Technology Conference in San Jose, California on June 16, 2009.

Zepheira partners Eric Miller, Uche Ogbuji and I will brief representatives of the press at 12:00 US Pacific Time at the Fairmont Hotel in San Jose. Zepheira will demonstrate Freemix in a booth on the SemTech exhibit floor.

SemTech conference attendees may also attend a briefing on Freemix on Wednesday, 17 June 2009, from 5:00 to 6:00 PM US Pacific Time.

If you are a spreadsheet user and want to share your data more widely, Freemix is for you. Wouldn't it be nice if your data had friends, too?

Thursday, May 21, 2009

Speaking at SemTech

I will be speaking at the Semantic Technology Conference in San Jose again this year from June 14-18, 2009.

Dan McCreary and I will be giving a three-hour tutorial on entity extraction on the Monday. I'll be presenting a talk on Active PURLs: Stored Procedures for the Semantic Web on the Tuesday. Additionally, it seems likely that I will replace Uche on a panel dubiously entitled Web3-4-Web2, also on the Tuesday.

Speakers have been authorized to share coupons for up to $200 off registration fees. If you would like to get the coupon code, please contact me or leave a comment here by May 29, 2009.

Zepheira is a gold sponsor again this year and we will have a very cool announcement. We are going to officially launch Freemix at the conference. The site is still behind a login, but will be released to the public just before the conference. It should be exciting. If you care about putting real, live, useful, everyday data on the Semantic Web, come see it.

Saturday, May 16, 2009

Playing with Wolfram Alpha

Wolfram Alpha has been launched and is available for the public to try. I sat down to play with it.

Firstly (using the rare American adverb here - don't be confused), you can't expect Wolfram Alpha to act like Google. It is a new kind of search engine, as one should expect from Stephen Wolfram. Wolfram is famously the inventor of Mathematica and author of A New Kind of Science.

Wolfram Alpha seems to consist of a linguistic interpretation engine coupled to Mathematica and a growing number of databases. Google, on the other hand, is a free-text indexer of Web content. That suggests that while one might be able to type just about any word or phrase into Google that is somewhere on the Web, one must limit Wolfram Alpha queries to concepts that are in its databases or may be treated as mathematical relationships. Indeed, this seems to be the case.

Wolfram's overview video is well worth watching. It, and the example search results available from the home page, give a flavor for the powerful searches one can do with the site.

Following a lead from the video, I tried typing the female name "Bernadette" into the search box. Wolfram Alpha, as advertised, did indeed presume that I wanted information about the name and returned results that included a plot of its popularity over time. Searching for "Bernadette David" gave me a distribution plot of both names, which showed that the highest combined popularity did in fact occur around our birth years. Well done, Wolfram Alpha.

Changing the previous search to "Bernadette Peters" resulted in some minor information about the actress and a link to her Wikipedia entry. Wikipedia links are provided where possible, a transparent but useful attempt to flesh out limited source content.

However, more general searches, such as the word "Zepheira", produced no results. Wolfram Alpha responds to null result sets with the message "Wolfram|Alpha isn't sure what to do with your input." That alone makes it clear that Wolfram Alpha and Google are at best complementary.

Too many users on the site result in a cute message saying "I'm sorry Dave, I'm afraid I can't do that..." - which is only mildly freaky if your name happens to be Dave. The reference naturally comes from the mutiny of the HAL 9000 computer in the film "2001: A Space Odyssey".

Math, science, engineering and finance queries work well, as expected. A Web interface to Mathematica is useful in itself. I suspect that the site will be most effectively used by college students and some working professionals. My mom and dad are unlikely to find it compelling (although my dad is a weather geek and weather data is well represented, so I might be wrong). Still, the lack of detailed weather results such as live RADAR images would more likely lead him to

One can do funky and useless math with aplomb. Wolfram Alpha rapidly provided me with the correct interpretation, unit dimensions and unit conversions for the search "100 furlongs per microfortnight", a speed well above that of sound but under that of light.
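A quick sanity check of that claim, as a small Python sketch. It assumes the international furlong of 201.168 m (660 feet), a fortnight of exactly 14 days and a speed of sound of roughly 343 m/s in room-temperature air.

```python
# 1 furlong = 660 international feet = 201.168 m (exact).
FURLONG_M = 201.168
# A fortnight is 14 days; a microfortnight is a millionth of that.
FORTNIGHT_S = 14 * 24 * 60 * 60             # 1,209,600 s
MICROFORTNIGHT_S = FORTNIGHT_S / 1_000_000   # 1.2096 s

SPEED_OF_SOUND_M_S = 343.0        # approximate, dry air at about 20 °C
SPEED_OF_LIGHT_M_S = 299_792_458  # exact, by definition of the metre

speed = 100 * FURLONG_M / MICROFORTNIGHT_S
print(f"{speed:.0f} m/s")  # 16631 m/s, about 16.6 km/s
print(SPEED_OF_SOUND_M_S < speed < SPEED_OF_LIGHT_M_S)  # True
```

Roughly 16.6 km/s: about fifty times the speed of sound, and nowhere near light.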

Minor misspellings were handled effectively (e.g. "area of icosehedron" was correctly interpreted as "area of icosahedron"). Similarly, "volume of icosahedron" resulted in a correct interpretation. I expected the search "distance to a star" to fail miserably, but the answer was surprisingly useful. Try it yourself to see what I mean.
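For reference, the textbook formulas for a regular icosahedron with edge length a are area = 5√3·a² (twenty equilateral triangles) and volume = (5/12)(3 + √5)·a³. A minimal Python sketch, with function names of my own choosing rather than anything from Wolfram Alpha:

```python
import math


def icosahedron_area(a: float) -> float:
    """Surface area of a regular icosahedron: 20 equilateral triangles of side a."""
    return 5 * math.sqrt(3) * a ** 2


def icosahedron_volume(a: float) -> float:
    """Volume of a regular icosahedron with edge length a."""
    return 5 / 12 * (3 + math.sqrt(5)) * a ** 3


print(round(icosahedron_area(1.0), 4))    # 8.6603
print(round(icosahedron_volume(1.0), 4))  # 2.1817
```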

The problem with this kind of interface is that interpreting intent is notoriously hard, if not impossible, in the general case. How can Wolfram Alpha know that when I typed "birth year of gandhi" I meant Mahatma Gandhi? What if I meant Indira Gandhi? Guessing is fine as far as it goes, but most search engines gave up that approach a decade ago in favor of returning a list of candidate results.

The interface style is also naturally limited by its underlying data. Searching for "the size of the World Wide Web" resulted in a suggestion to try "the size of the world wide" instead, which it could answer as the diameter of Earth.

I wonder how many people recall that Yahoo used to allow mathematical equations in its search engine. It seems to have removed the functionality. One can only presume that the equations got in the way of becoming a more general Internet search engine. I suspect there is a lesson there for Wolfram Research. Will Wolfram Alpha stay aimed at specialists, or will it grow into a more general tool? Time will tell. Wolfram's promise to integrate more databases does not address the inherent limitations of guessing linguistic intent.

In summary, Wolfram Alpha is an expert-friendly search system and is best used as an orthogonal complement to Google and other general search engines. Its approach is pure Wolfram: unashamedly different and unapologetically dismissive of lessons learned by others.

Sunday, April 26, 2009

Finally, Some Clues on the Domestication of Rice

The subtitle of this blog promises posts about "the origins of agriculture", although the field is so slow-moving that I have not posted on the topic in three years and have not reported meaningful research here since speculating on religion as a driver.

Fortunately, others are doing active research on agricultural origins even if I am not. Dr. Dorian Fuller of the Institute of Archaeology at University College London has cracked a very special nut, indeed. He and his team have located substantial evidence of the location and timing of rice domestication in the Lower Yangtze region of Zhejiang, China.

Dr. Fuller and his colleagues discovered a location where the local diet shifted dramatically from a hunter-gatherer lifestyle to an agricultural one over a mere three hundred years. That alone is fascinating and an important discovery. Equally interesting was the dating of the shift, from 6900 to 6600 years ago. That places rice domestication in a timeframe fully two thousand years later than thought and lends serious support to diffusion theories (versus parallel development).

Fuller's team collected mixtures of midden material from the site and painstakingly separated wild rice remains from domesticated rice remains. Specifically, they looked at spikelet bases, the places where rice seeds attach to the stalk. Like other domesticated plants, rice underwent a genetic shift through artificial selection so that its seeds stayed on the plant for harvest by humans. The spikelet bases of wild and domesticated rice differ enough in shape to be told apart.

There is a nice scanning electron microscope image of a wild rice spikelet base at the Agricultural Biodiversity Weblog.

The last I heard, Londo's investigation [1] was still suggesting multiple independent origins of rice in Southeast Asia and lower China. Hopefully Fuller's paper [2] will put that to rest. Londo at least admitted that his team wasn't certain.

Wikipedia's entry on rice says, "Rice has been cultivated in Asia likely over 10,000 years." It is clearly time to correct that entry and, more broadly, correct the education of literally billions of people who are taught it. I really need to get back to work on my Origins of Agriculture summary and update it with these findings.

[1] Londo, J.P., Chiang, Y-C, Hung, K-H, Chiang, T-U and Schaal, B.A. (2006). "Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa". PNAS.

[2] Fuller, D.Q., Qin, L., Zheng, Y., Zhao, Z., Chen, X., Hosoya, L.A. and Sun, G-P. (2009). "The Domestication Process and Domestication Rate in Rice: Spikelet Bases from the Lower Yangtze". Science, 20 March 2009, 323/5921, pp. 1607-1610.

Saturday, April 25, 2009

Back to Basics

Torture is wrong, regardless of efficacy, regardless of culture, regardless of how it is justified, regardless of how the enemy is dehumanized, regardless of whether someone - anyone - thinks it is useful. Period.

Even my eight-year-old can figure this one out, all by herself and with no hints.

Tuesday, April 07, 2009

OCLC PURL Server Migrates to PURLZ

The Online Computer Library Center (OCLC) migrated to the PURLZ software this morning at 6 AM US EST (GMT -5).

I'll admit to some frustration that the legacy PURLs were not tested completely and some errors remain. We are working with them to iron out the relatively few remaining problems with the legacy data migration. Most of the legacy data seems to be working as expected.

Update: Nope. They rolled back again. Sigh. Maybe next time they will do it right.

Sunday, April 05, 2009

Unfortunate Names for 2009

I have to add Babelease Limited in the UK to my unofficial list of unfortunate names. Babel-ease. Yeah, that makes sense. I completely mis-parsed the syllables the first time...

Monday, March 23, 2009

With Apologies to Emerson

Rich are the Web-gods: who gives gifts but they?
They grope the Web for PURLs, but more than PURLs:
They pluck Force thence and give it to the wise.

Thursday, March 05, 2009

O'Reilly Media Joins the Semantic Web

O'Reilly Media, the current name for the geek publishing giant founded by Tim O'Reilly, has finally joined the Semantic Web. O'Reilly's coining of the term "Web 2.0" and his early misunderstandings of the Semantic Web stack led some to think that he didn't see much value in machine-readable information. That seems to have changed, at least within O'Reilly Labs.

O'Reilly Labs launched a Beta product last month called the O'Reilly Product Metadata Interface (OPMI), which is available at  The OPMI is a technical platform for the exchange of metadata between publishing trading partners.  Now that it is in RDF and publicly accessible, the rest of us can play with it, too.

It is easy to retrieve RDF/XML describing any book that O'Reilly publishes. You simply perform an HTTP GET on a URL constructed with the book's International Standard Book Number (ISBN). Every edition of a published book has an ISBN and they come in two flavors, the older 10-digit variety and the newer 13-digit version. All ISBNs issued after 1 January 2007 have been 13 digits. Some books are assigned both forms by their publishers for convenience during the transition.

For example, let's get the metadata description of an O'Reilly book I wrote, Programming Internet Email. The 13-digit ISBN for the second edition of the paperback is 9781565924796, and the 10-digit equivalent is 1-56592-479-7. The OPMI nicely works with either one, but the returned RDF uses the modern 13-digit one as canonical, as it should.
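The two ISBN forms are related mechanically: prefix the first nine digits of the ISBN-10 with 978, drop the old check digit and compute a new one using alternating weights of 1 and 3, modulo 10. A minimal Python sketch (the function name is hypothetical and has nothing to do with the OPMI itself):

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13 equivalent."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    # Keep the first nine digits; the old check digit (possibly 'X') is dropped.
    core = "978" + digits[:9]
    if not core.isdigit():
        raise ValueError(f"unexpected characters in {isbn10!r}")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... modulo 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


print(isbn10_to_isbn13("1-56592-479-7"))  # 9781565924796
```

Running it on the ISBN-10 for Programming Internet Email yields the same 13-digit number that the OPMI treats as canonical.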

The URL for any O'Reilly book is followed by its ISBN, in this case 9781565924796. The full URL is thus

An HTTP GET may be done with any Web browser, of course, or on a command line by use of the curl utility:

$ curl

The returned RDF includes a wealth of information about the book. The OPMI uses four vocabularies in its RDF: Dublin Core for describing books (title, subject, language, etc.), Friend-of-a-Friend (FOAF) for describing people associated with those books, the library community's MARC (MAchine Readable Cataloging) relator codes for relating people and books, and the Metadata Object Description Schema (MODS) for specifying the edition of a book. MARC and MODS come from the Library of Congress and are traditionally used in library cataloging systems.

Since this metadata is on the Web, we can use standard Semantic Web query tools to query it. Using SPARQLer, a SPARQL query language processor available freely on the Web, we can query the RDF to extract bits we want. A bit of playing around makes it easy to get the author's name and the unique URI assigned to the author by O'Reilly:

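prefix dc: <http://purl.org/dc/elements/1.1/>   # assumed Dublin Core namespace
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?work ?authorURI ?author
WHERE {
  ?work dc:creator ?authorType .
  ?authorType rdf:_1 ?authorURI .
  ?authorURI foaf:name ?author
}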

The results look like this:
work authorURI author
< product:9781565924796.IP> < agent:pdb:2495> "David Wood" @en
< product:9781565924796.BOOK> < agent:pdb:2495> "David Wood" @en

There are two results because the first (.IP) is the overall URI for the work in all of its possible formats. The second (.BOOK) is the book edition of the work. If this book had been published on Safari, O'Reilly's electronic publishing forum, it would also have a URL ending in ".SAF". E-books get an ".EBOOK" and Apple iPhone applications get a ".APP".
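Since the format is encoded in the URI suffix, client code can tell the variants apart with simple string handling. A hypothetical Python helper, assuming only the suffixes listed above; because the full URI prefix O'Reilly uses is not reproduced here, the sketch keys on the `product:` segment alone.

```python
# Suffixes as described above; the descriptions are informal.
FORMAT_BY_SUFFIX = {
    "IP": "the work in all of its formats",
    "BOOK": "book edition",
    "SAF": "Safari edition",
    "EBOOK": "e-book",
    "APP": "iPhone application",
}


def product_format(uri: str) -> tuple:
    """Split '...product:<ISBN>.<SUFFIX>' into (ISBN, format description)."""
    local = uri.rsplit("product:", 1)[-1]
    isbn, _, suffix = local.partition(".")
    return isbn, FORMAT_BY_SUFFIX.get(suffix, "unknown format")


print(product_format("product:9781565924796.IP"))
print(product_format("product:9781565924796.BOOK"))
```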

O'Reilly claims published metadata for over 1100 books, which is a pretty reasonable addition to the Semantic Web, even in Beta. Naturally, I now want O'Reilly to publish machine-readable metadata on their human-readable Web pages using RDFa. There has been no sign of that yet, though.

This content was cross-posted to Semantic Universe.

Monday, March 02, 2009

PURL Legacy Loader Now Open Source

A legacy loader is available to take old OCLC version 1 Persistent URL (PURL) database dumps and upload PURLs into the new project’s RESTful API. This is not production code, but is provided in the hope that it may be useful to operators of old PURL servers wishing to migrate to a more modern PURL server. The legacy loader has been released under an Apache 2.0 license.

To get the legacy loader, use Subversion to check it out like this:

svn co

Check out the code and follow the directions in the file README.txt.

This information is also available at the PURL Project's Download Area.

Persistent URL (PURL) Server version 1.4 Released

The PURLZ Persistent URL Server version 1.4 is now available. See the PURLZ Downloads area to get your copy now. This release improves handling of URLs with query strings and special characters. It is recommended for immediate use by all PURL server operators.

PURLs are Web addresses or Uniform Resource Locators (URLs) that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons. Details are available on the PURLZ community site.
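Conceptually, a PURL server is a lookup table from stable paths to current locations, answered with HTTP redirects. The following is a minimal Python sketch of that idea, not PURLZ code. The paths and targets are hypothetical, the table lives in memory, and it covers only simple redirects, a "gone" (410) response and "partial" PURLs that append the rest of the requested path to the target.

```python
from typing import Dict, Optional, Tuple

# Hypothetical PURL table: stable path -> (HTTP status, current target).
# A path ending in "/" is treated as a partial PURL.
PURLS: Dict[str, Tuple[int, Optional[str]]] = {
    "/net/example/report": (302, "http://new-host.example.org/docs/report.html"),
    "/net/example/docs/": (302, "http://new-host.example.org/archive/"),
    "/net/example/retired": (410, None),  # deliberately gone
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a requested path."""
    if path in PURLS:
        return PURLS[path]
    # Partial PURLs: try the longest registered prefix first and append
    # the remainder of the requested path to its target.
    for prefix in sorted(PURLS, key=len, reverse=True):
        status, target = PURLS[prefix]
        if prefix.endswith("/") and target and path.startswith(prefix):
            return status, target + path[len(prefix):]
    return 404, None


print(resolve("/net/example/report"))
print(resolve("/net/example/docs/2009/notes.txt"))
print(resolve("/net/example/retired"))
```

When a resource moves, only the table entry changes; every published reference to the PURL keeps working.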

Please see also the README and Release Notes for version 1.4.

Saturday, February 14, 2009

Fun with Blimps

Aidan, Mikayla and I had a blast today by attaching a digital camera to a helium blimp and flying it around our neighborhood :)

Friday, February 13, 2009

No Darwin in the South

I know I live well south of the Mason-Dixon Line. The slower pace of life here, the older attitudes and the more formal politeness are often pleasant. Sure, there are prejudices and many of the public schools aren't very good (others are, naturally). There is a lot of societal stress due to Northern migration. Virginia was even a blue state in the last election. All in all, many people from many places live in Virginia and call it home.

That's why I was shocked that my kids' school didn't even mention the 200th birthday of Charles Darwin yesterday. Neither my fifth grader nor my second grader knew who he was or why he was famous. They know now, though. We talked about The Voyage of the Beagle, On the Origin of Species and The Descent of Man all through dinner. Tomorrow, I plan to describe his work on worms. Kids love that sort of thing, even more than discussions of sexual selection vs. natural selection.

Shame on Fredericksburg Academy! They call themselves a college prep school? They don't even teach sex education until seventh grade! By that time, the kids have had the chance to figure it out for themselves, often in inappropriate ways. I had "the talk" with my fifth grader earlier this year. He is better for it, too. I'm honestly looking forward to talking to the head of the Lower School when she gets the rant I just sent her on Monday morning.

Monday, February 09, 2009

Desperately Seeking SKOS Vendors

A Fortune 500 customer of Zepheira's has a problem that could readily be solved with SKOS. You might think that would be sufficient to attract the attention of some tools vendors, especially since SKOS is in "last call" at the W3C and is likely to become a standard later this year. If that is so, I've missed it.

Can anyone tell me where to get decent tool support for SKOS?

Mulgara has some cool support for SKOS, as I mentioned here. Unfortunately, the state of that support still requires some care and feeding by an expert.

I approached Revelytix, hoping that they would agree to provide SKOS support in Knoodl, but they demurred until at least later this year. It should be easy for them given their existing support for OWL and their use of Mulgara.

Another alternative may be ThManager, an Open Source SKOS editor/visualizer.

Until tool vendors support SKOS directly, we are limited to using existing taxonomy creation and maintenance tools, such as BiblioTech or Synaptica, to build ANSI/NISO standard thesauri (Z39.19) and then convert them to SKOS. For the moment, though, conversion tools seem to be in the same boat as editors.
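To illustrate what such a conversion involves, here is a minimal Python sketch that maps Z39.19 relationship indicators (BT, NT, RT, UF) onto SKOS and emits N-Triples. The input layout, the `http://example.org/thesaurus/` namespace and the function names are assumptions for illustration; a real converter would also need to handle scope notes, USE references and careful URI minting.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical concept namespace

# Z39.19 relationship indicators and their SKOS counterparts.
RELATION_MAP = {"BT": "broader", "NT": "narrower", "RT": "related"}


def concept(term: str) -> str:
    """Mint a (hypothetical) concept URI from a preferred term."""
    return f"<{BASE}{term.replace(' ', '_')}>"


def literal(text: str) -> str:
    """English-language N-Triples literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_skos(thesaurus: dict) -> list:
    """thesaurus maps each preferred term to {"BT": [...], "UF": [...], ...}."""
    triples = []
    for term, relations in thesaurus.items():
        s = concept(term)
        triples.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f"{s} <{SKOS}prefLabel> {literal(term)} .")
        for tag, values in relations.items():
            for value in values:
                if tag == "UF":  # "used for": a non-preferred synonym
                    triples.append(f"{s} <{SKOS}altLabel> {literal(value)} .")
                elif tag in RELATION_MAP:
                    triples.append(f"{s} <{SKOS}{RELATION_MAP[tag]}> {concept(value)} .")
    return triples


example = {"rice": {"BT": ["cereals"], "UF": ["paddy"]}, "cereals": {"NT": ["rice"]}}
for line in to_skos(example):
    print(line)
```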

SKOS in Mulgara's RLog

I have long been impressed by Paul's technical prowess. His recent implementation of SKOS definitions in Mulgara's RLog has done it again.

RLog is a logic programming language like Prolog that Paul created. RLog natively understands URIs and RDF's notions of subject-predicate-object relations. RLog's implementation of SKOS requires a mere 7 rules (!) once the 95 axioms are laid down. Naturally, those axioms and rules include huge chunks of RDFS and OWL.
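RLog's own syntax isn't reproduced here, but the flavor of such rules is easy to show. The following Python sketch (not RLog, and only three of the SKOS semantics) forward-chains until no new triples appear: skos:narrower as the inverse of skos:broader, skos:broader as a sub-property of skos:broaderTransitive, and the transitivity of skos:broaderTransitive. Triples are plain tuples with prefixed names, an assumption made for brevity.

```python
# Prefixed names stand in for full URIs to keep the sketch short.
BROADER = "skos:broader"
NARROWER = "skos:narrower"
BROADER_T = "skos:broaderTransitive"


def infer(triples: set) -> set:
    """Apply the rules repeatedly until no new triples appear (a fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == NARROWER:               # narrower is the inverse of broader
                new.add((o, BROADER, s))
            elif p == BROADER:
                new.add((o, NARROWER, s))   # ...and vice versa
                new.add((s, BROADER_T, o))  # broader is a sub-property of broaderTransitive
        transitive = [(s, o) for s, p, o in facts if p == BROADER_T]
        for s1, o1 in transitive:           # broaderTransitive is transitive
            for s2, o2 in transitive:
                if o1 == s2:
                    new.add((s1, BROADER_T, o2))
        if new <= facts:
            return facts
        facts |= new


facts = infer({("ex:rice", BROADER, "ex:cereals"), ("ex:cereals", BROADER, "ex:crops")})
print(("ex:rice", BROADER_T, "ex:crops") in facts)  # True
```

The naive nested loop is fine for a toy graph; a real rule engine such as Krule evaluates rules far more efficiently.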

RLog makes it easy (if you are a logic programmer) to make rules files for Mulgara's Krule rule engine. Support for RDFS has been provided in Krule for some time.

Paul has been talking about integrating RLog into Mulgara for over two years. I hope he can make that happen during 2009. Scalable or not, it is insanely cool. Until an integration happens, RLog must be run as a separate tool, as does Krule.

Friday, February 06, 2009

Ph.D. Thesis Published

My Ph.D. thesis, entitled Metadata Foundations for the Life Cycle Management of Software Systems has been published on UQ eSpace, The University of Queensland's institutional digital repository. Get your copies now while they're hot :)

Interesting, at least to me, is that UQ eSpace is built on Fedora Commons and therefore uses Mulgara. Sweet!

Tuesday, February 03, 2009

IET Software Journal Article Finally Published

The British journal IET Software finally published an article I wrote nearly three years ago. It was apparently published last August but I just recently found out.

The article is Towards a software maintenance methodology using Semantic Web techniques and paradigmatic documentation modelling.

The citation is:

Hyland-Wood, D., Carrington, D. and Kaplan, S. (2008, August). Towards a software maintenance methodology using Semantic Web techniques and paradigmatic documentation modelling, IET Software, 2/4, pp. 337-347

Wednesday, January 21, 2009

What is an Oracle-Mulgara Instance?

Paul pointed out a US government contract solicitation involving Mulgara. It mentions something intriguingly called an "Oracle-Mulgara instance". I am intensely curious what that is!

Persistent URL (PURL) Server version 1.3 Released

The PURLZ Persistent URL Server version 1.3 is now available. See the PURLZ Downloads area to get your copy now. This release contains substantial improvements for speed of indexing, stability and numerous bug fixes. It is recommended for immediate use by all PURL server operators.

PURLs are Web addresses or Uniform Resource Locators (URLs) that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons. Details are available on the PURLZ community site.

Please see also the README and Release Notes for version 1.3.

Monday, January 19, 2009

Meanwhile, Back in the Real World...

Aidan is hooked on the Mac OS X port of Nethack. Ya gotta laugh.

The Content of their Characters

Today is Martin Luther King, Jr. Day in the United States and rightfully so. We watched his "I have a dream" speech in its entirety at lunch today and I realized, in explaining his legacy to my children, how many modern-day prophets have paid the ultimate price. King, his mentor of non-violence Mohandas Gandhi and Abraham Lincoln, the three men arrayed in spirit at King's speech, were all removed from this Earth by assassins' bullets. All of them were killed for having the courage to say to small minds that people should be free.

Raised on the ideals of the American union, I was a child of King in a literal sense. King spoke at the Lincoln Memorial on the day that I was born. I grew up in prejudiced times but, in hearing the conversation that he started, learned to tolerate, then to embrace, cultural differences. There are no racial differences, of course, and have not been since Neanderthals walked Europe alongside Homo sapiens sapiens. Such minor differences as skin color are trivial and recent evolutionary adaptations to environmental conditions that we have long worked around with forms of transportation. Culture, not race, is all that separates us.

Culture is fungible. We can change it. We have the ability if we only have the will. Do we want to live together on this increasingly tiny planet, or do we wish to let our subtle differences rip us apart? The time has come to choose. We have to work together to address the problems of our time. Climate change, energy production, medical ethics, poverty and war won't go away unless we will them to. The only way to address any of them is to live together, in peace if not always in harmony. THE challenge of our time is thus laid bare.

Tomorrow Barack Obama will become the 44th president of the United States. I am pleased that so many feel a sense of pride and accomplishment in the victory of his genes, but hope that they will remember that his victory is not about his genes, his past, or his parents. It is about our future. I, for one, support him not because he is African American, but because I believe him to be the best man for the very difficult job. I attempted to judge him, simply, not on the color of his skin, but on the content of his character.

Obama is following a dangerous path. He will need to ignore his own rock star status, to avoid offers from young women, to avoid the corrupting influences of Washington. He will need to avoid assassins' bullets. If he lives, if he stays sane, if he can just do what he has set out to do, he just might become truly great. I hope he can. I hope we can follow.