Tuesday, October 08, 2013

Repairing Makerbot Replicator 2 Nozzle Insulation

The Problem: Filament Ball Tears Insulation Tape

I have finally used my Makerbot Replicator 2 (not a 2x) enough to run into a problem that is not strictly end-user serviceable.  A large filament ball stuck to the ceramic insulation tape surrounding the Replicator's nozzle and ripped the tape when the ball was removed.

Makerbot does have a video on removing a filament ball (or "blob" as they call it - must be a technical term).  Look for the video entitled "Removing a PLA Blob" on their troubleshooting page.  They suggest heating the extruder nozzle and gently lifting off the ball while being "careful not to rip the ceramic insulation tape".  Yes, indeed.  I agree that it is best not to remove a filament ball while the nozzle is cold, but the described process wasn't sufficient for me.  I have since discovered that I can use the "Load filament" option under the Utilities menu to heat the extruder and then use the freshly loaded filament to melt the ball and gently force it downward.  That seems to make it easier to remove cleanly.

I decided to replace the ceramic insulation tape and the kapton tape that holds it in place.  This required a rather complete tear-down of the extruder assembly, but it wasn't really that bad.  Anyone reasonably careful should be able to do it themselves.

The filament ball complete with insulation tape stuck to it:


I sourced some ceramic insulation tape and some kapton tape to cover it from UltiMachine, a RepRap printer supply company in Tennessee.  Unfortunately, Makerbot doesn't supply such parts.

  • 1 x Dupont Kapton High Temp. Adhesive Tape 1/4" ($6.00), SKU UMKPTON025
  • 1 x Ceramic Insulation Tape ($1.50), SKU UMCERTAPE

The kapton tape from UltiMachine was quite narrow, much narrower than the tape used by Makerbot.  I used it anyway and simply wrapped it around enough times to ensure complete coverage.  It seems to be fine.

Repair Instructions

The first thing to do is to carefully remove the fan assembly and extruder motor from your Makerbot Replicator 2.  These instructions will help you if you are unfamiliar with the process.  They were developed for the upgrade of the extruder assembly.  Follow them just until you remove the fan assembly and the motor, but do not take off the extruder itself. 

Next you will need to remove the side fan assembly.  Use a Phillips screwdriver to remove the two screws holding the assembly to its chassis (shown at 10 o'clock and 4 o'clock in the picture below).

Disconnect the electrical wire connectors so the fan assembly hangs down out of your way.  Note that you will still have two wires (in a single insulated cable) connected at this time.  Those are the thermocouple wires, which are used to measure the temperature of the nozzle block.  They will be removed shortly.

Remove the two hex screws from the base of the chassis that connect the chassis to the aluminum block beneath it.

Removal of the chassis exposes the aluminum heat sink.  The nozzle assembly is attached via a bolt and hangs beneath the heat sink.

Removal of the aluminum heat sink requires you to look underneath the assembly to find two more hex screws, one on each side of the heat sink.

Once you have the heat sink entirely removed from the extruder mount, it should look something like this:

Now it is time to remove the thermocouple wires.  Unscrew the thermocouple connector from the nozzle block.  Note that the wires will twist if you leave the block in place!  Instead, you should rotate the nozzle block to remove the thermocouple connector as soon as you loosen the connection enough to do so.  That will keep the wires from breaking.

You should now have the nozzle block and heat sink completely removed from the printer.  You can move it to a more convenient working surface, such as a workbench.

Use a crescent wrench or socket wrench to remove the nut at the top of the heat sink, as shown.  You will see that the bolt has a hole through its center; that is where the filament passes on its way to the nozzle.

Remove the heat sink and set it aside.  You should also remove the nut between the nozzle block and the heat sink.

Clean off the old kapton tape and ceramic insulation tape.  Some of mine was baked on and was difficult to remove.  Carefully remove the tape so you don't damage the brass nozzle, the aluminum block or the wires.  I used a utility knife to remove most of the material and found that heating the material with a portable butane torch made it easier to remove.

Please note that a steel utility knife blade is substantially harder than the aluminum of the block or even the brass of the nozzle.  You can shave off bits of them if you aren't careful.  You only want to remove the old tape.

I placed the block into a vise to facilitate cleaning it and also (carefully) used a wire brush for the final touches.

Cut some ceramic insulation tape to the width of the nozzle block.

Cut holes in the ceramic insulation tape for the bolt and the nozzle.  The round hole on the right is for the nozzle because I wanted to ensure that the join in the tape would not be at or near the nozzle.  Wrap the tape around the nozzle block by putting the nozzle through one hole and the bolt through the other.  Obviously, it works better to start by putting the tape over the long bolt and then wrapping around to the nozzle.  Cut the end of the tape to fit the size of the block.

Wrap the nozzle block with kapton tape to hold the ceramic insulation tape in place.


You will need to follow the tear-down instructions in reverse.  You might follow the pictures.  I did :)

There are a couple of things to watch out for:

1.  Make certain to align your nozzle block on the heat sink so it is straight (not crooked as shown below) and make sure that the screw holes are facing toward the nozzle (so you can screw them in from the bottom when you remount the unit).

2.  Next, and this is really very important, ensure that the bolt on the top of the nozzle block-heat sink unit (shown on the right in this picture) does not protrude above the top of the nut.  If it does, the extruder will not fit over top of it.

This also seems to ensure that the nut in between the nozzle block and the heat sink does not compress the insulation.

3.  Another minor gotcha occurs when remounting the side fan to the fan housing.  The picture below shows where the two sets of black-and-white wires fit in a slot on the left side of the housing (toward the back of the Replicator 2).  The picture shows me pushing the wires into the slot with my index finger.  This will keep the wires from being kinked when the fan is screwed into place.

Getting it Working Again

Don't forget to level your build plate after the reassembly!  This is critical because you may have changed the nozzle height, however slightly.  Look under the Makerbot's Utilities menu for the "Level the build plate" option and follow the on-screen directions.

Next, load some filament and try a small test print.  The "Mr. Jaws" model from the Makerbot's SD card printed flawlessly for me the first time.  Yay!

You might ask how long it took me to generate another filament ball.  Less than an hour ;-)  Fortunately, that one didn't tear my newly fixed insulation.

Friday, June 03, 2011

Schema.org and the Semantic Web

The interweb is all atwitter today about schema.org and what it may mean for the Semantic Web. Here is my take.

There has been a long-standing argument between microformats and the Semantic Web. Many developers and, to some degree, the search engines have preferred microformats because they are easy to use and to understand. Microformats are widely deployed because of this. However, there is simply no way to combine microformats on a single page. This is the Achilles' heel of microformats; sooner or later someone wishes to use more than one at a time (or more than a few, if they play together particularly nicely) and can't do it.

RDF is harder to understand (although an experiment in Germany showed that fifth graders could easily be taught RDF; it is adults, who have already learned other ways to think, who have trouble). RDF is a completely general solution to the problems that microformats solve. RDF's raison d'être is to allow for the combination of data from multiple parties (e.g. developers and search engines, multiple relational databases, or proprietary systems using it as an interchange format). RDF can represent any type of data, and combines easily with other RDF.

The argument between microformats and RDF can thus be thought of as an argument between short-term pragmatism and long-term planning. Those who want to solve a specific problem now use microformats. Those who want to solve more general problems in the future use RDF.

The presumption in the Semantic Web community is that the best (perhaps the only) way to combine microformats is either RDF or something very much like it. Further, people have been expressing needs to combine the use of multiple microformats on Web pages for about five years.

Microsoft is aware of the Semantic Web and, in fact, was an early supporter of the RDF standards at the World Wide Web Consortium (W3C). They even paid for some marketing; proof may be found here.

Unfortunately for those interested in open standards, Microsoft decided that Netscape's (remember Netscape??) use of RDF in their portal was threatening, so they decided to reinvent RDF internally as a proprietary technology. Microsoft's internal version of RDF has appeared in their file system, Sharepoint and other products. That's the way this story simply must play out: Use RDF or reinvent it.

Yahoo was the first search engine to support RDFa (RDF in Web pages), followed by Google. Both supported particular vocabularies of RDFa, which is the same as saying 'microformats encoded in RDF' and therefore along the lines of my earlier comments.

The new schema.org announcement is a partnership between "Google, Bing and Yahoo" or "Google, Microsoft and Yahoo", depending where you look. Since Bing is Microsoft's search engine and Yahoo has licensed Bing for its search services, schema.org is really a partnership between Google and Microsoft.

So, I read schema.org as an attempt (actually, a further attempt) by Microsoft to reduce the impact of RDF and Semantic Web techniques on the search business specifically and their larger business in general. Time will tell whether that will work. History suggests that it will partially work by changing the places RDF is seen as threatening to big business. Another similar area to watch will be RDF and Linked Data's threat to the Data Warehousing market (a $10 billion market in 2010). That fight will be primarily between standards and Oracle.

Michael Hausenblas at DERI released Schema.org in RDF while I was writing this. Well done to Michael and his colleagues. As Michael said, "We're sorry for the delay". Awesome.

Update 2011-06-15: Google announced at a BOF at SemTech 2011 last week that they will continue to support Rich Snippets (their RDFa implementation). That's helpful and what we should be promoting.

Tuesday, February 08, 2011

Semantic Web Elevator Pitch

Eric Franzon at SemanticWeb.com asked the community to provide "elevator pitches" for the Semantic Web. Here's my attempt:

Thursday, January 06, 2011

Ian was Right and I was Wrong

Following Ian Davis' post A Guide to Publishing Linked Data Without Redirects, I followed with A(nother) Guide to Publishing Linked Data Without Redirects. In that post, I argued that resource descriptions should be separated from resource representations at the HTTP level. I see now that I was wrong.

Ian challenged me to come up with a compelling reason why HTTP should encode the difference between a resource representation and a resource description and, after some effort, I simply could not. Ian summarized his thoughts in a new post: Back to Basics with Linked Data and HTTP.

The problem in my mind has always related to the use of HTTP URIs to identify things in the real world. We can get around that easily enough by returning RDF whenever someone resolves those URIs. You get a description of a real-world thing that is as richly described as the publisher wanted it to be. Cool.

Tuesday, November 09, 2010

A(nother) Guide to Publishing Linked Data Without Redirects

It seems to me that we in the Linked Data community have a need to:

  1. Assign URIs to resources, be they physical, conceptual or virtual (information resources) in nature.

  2. Apply the same mechanisms for metadata description to any resource, regardless of type.

  3. Be able to traverse in obvious ways from a resource to its metadata description and from a metadata description to its resource.

Unfortunately, we can't do all that yet, at least not easily and in all circumstances. We are close, but not close enough.

Linked Data deployment is hampered by the requirement for so-called "slash" URLs to be resolved via a 303 (See Other) redirection. Unfortunately, many people wishing to publish Linked Data don't understand the subtleties of 303 redirection, nor do many of them have adequate control over their Web server configurations to implement 303 redirections. Ian Davis has been looking for a solution to this problem. Unfortunately, I don't think he has found it yet.

Ian published A Guide to Publishing Linked Data Without Redirects specifically to find a way around the confusing (and sometimes difficult) usage of 303 redirects for Linked Data. Ian's original question was: "What breaks on the web if we use status code 200 instead of 303 for our Linked Data?"

Unfortunately, the use of the Content-Location header with Linked Data raises the same questions as 303s:

  1.  It requires a change of thinking regarding the meaning of 200 (OK), specifically with respect to the http-range-14 finding.

  2. It suffers from the same problem as 303s in relation to deployment with current hosting companies/IT departments. If you don't have control over your Apache, you can't publish your Linked Data.

  3. There is an "implicit redirect", in that one may wish or need to check the URL in the Content-Location header.

The first one admittedly bothers me most. If one resolves a URL and receives a 200 (OK) response, we are currently guaranteed that both (a) our request succeeded in the way we expected and (b) that the thing we received is an information resource. We expect that the thing we received is an information resource that is a representation of the resource we requested (and identified by its URL address).

In short, I think Ian's proposal mostly but not completely solves the problems that Ian was meaning to address. Unfortunately, in practice there is little difference from the status quo. Tom Heath has some of the same concerns.

If we are going to fix fundamental problems with serving Linked Data, I'd prefer to explicitly address the fundamental questions related to URI naming of physical, conceptual and information resources (the overloading of the HTTP name space), so I proposed an alternative solution on the public-lod@w3.org mailing list last week. This post expands on those thoughts with some more detail.

The use of 303 redirections by the Semantic Web and Linked Data community is a bit of a hack on top of the existing 303 functionality laid down in the early Web. The http-range-14 debate tried to end the arguments, but only slowed them down. We can't really hack at the 303 any more than we have. I explored that in 2007 and came up pretty empty.

My Proposal

I propose deprecating the 303 for use in Linked Data (only) in favor of a new HTTP status code. The new status code would state "The URI you just dereferenced identifies a resource that may be informational, physical or conceptual. The information you are being returned in this response contains a metadata description of the resource you dereferenced." This new status code would be used to disambiguate between generic information resources and the special class of information resources that describe (via metadata) an addressed URI.

The "metadata description" would generally be in some form of RDF serialization, but could also be in HTML (for human consumption) or in some future metadata representation format. Existing HTTP content negotiation approaches and Content-Type headers would be sufficient to inform both requester and Web server what they received.

I propose that the new status code be called 210 (Description Found).

Existing HTTP status codes may be found in RFC 2616 Section 10.

Example Requests and Responses

Let's start with the basics. If we resolve a URI to an information resource, we get a 200 (OK) response upon success:

# Get an information resource:
$ curl -I http://example.com/toucan.info
HTTP/1.1 200 OK
Date: Wed, 10 Nov 2010 21:37:44 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/html;charset=UTF-8
Content-Length: 1739

An information resource that supports some (any!) form of embedded RDF can easily point to its metadata description at another URL (e.g. via a link element or a POWDER description). The metadata description can easily point back to the described resource.
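As a sketch, the link-element approach might look like this in an information resource's HTML head (the URLs are hypothetical; "describedby" is the link relation that POWDER defines for exactly this purpose):

```html
<!-- Hypothetical: an information resource pointing at its own
     metadata description, published as Turtle at a sibling URL -->
<link rel="describedby" type="text/turtle"
      href="http://example.com/toucan.info.ttl" />
```

The metadata document would then point back at http://example.com/toucan.info with ordinary RDF statements, completing the traversal in both directions.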

Physical and conceptual resources are where we have historically run into trouble on the Web of Data. A "slash" URI assigned to name a physical or conceptual resource has required a 303 redirection to another document, and the semantics are unclear at best. Instead, this proposal suggests that physical and conceptual resources explicitly return a 210 (Description Found) status code, thus removing any ambiguity from the response.

The resolution of a URI to a physical resource might return:

# Get an information resource:
$ curl -I http://example.com/toucan.physical
HTTP/1.1 210 Description Found
Date: Wed, 10 Nov 2010 21:38:52 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/turtle
Content-Length: 1739

The body of the response would naturally be (in this case) an RDF document describing the physical resource. The fact that the resource is physical would be encoded in an RDF statement in the description.

Conceptual resources could be handled in an identical manner. The only difference would be in the requested URI and differing content returned:

# Get an information resource:
$ curl -I http://example.com/toucan.concept
HTTP/1.1 210 Description Found
Date: Wed, 10 Nov 2010 21:40:12 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/turtle
Content-Length: 1214

Again, the fact that the resource is conceptual would be encoded in an RDF statement in the description.
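To make the round trip concrete, here is a minimal sketch in Python of a server that answers with 210 (Description Found) and a client that reads it. The handler, port and description are all hypothetical; the point is only that a non-standard 2xx code passes through the HTTP machinery unchanged:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical Turtle description of a physical resource.
DESCRIPTION = (
    "<http://example.com/toucan.physical> "
    "a <http://dbpedia.org/resource/Toucan> .\n"
)

class DescriptionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = DESCRIPTION.encode("utf-8")
        # send_response accepts any integer code; 210 is non-standard.
        self.send_response(210, "Description Found")
        self.send_header("Content-Type", "text/turtle")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), DescriptionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the description; http.client returns the status code as-is.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/toucan.physical")
resp = conn.getresponse()
print(resp.status, resp.reason)        # 210 Description Found
print(resp.getheader("Content-Type"))  # text/turtle
body = resp.read().decode("utf-8")
server.shutdown()
```

As noted below, stock clients really do hand the 210 back without complaint; nothing in HTTP's framing depends on the code being registered.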

Savvy readers might note that the existing status code 300 (Multiple Choices) could be used when multiple metadata descriptions of a resource are available:

The requested resource corresponds to any one of a set of
representations, each with its own specific location, and
agent-driven negotiation information (section 12) is being
provided so that the user (or user agent) can select a
preferred representation and redirect its request to that
location.
Note that Ian's statement that when using a 303 "only one description can be linked from [a resource's URI]" is not correct; standards-compliant Web servers could use a 300 status code should they so wish (and can figure out a way to configure their Web server to do that).


How does my proposal stack up to Ian's? Ian proposed nine problems with the 303, the most important of which (in my opinion) were:

  • it requires an extra round-trip to the server for every request (at least, that's important to those implementing browsers, spiders and Linked Data clients and to those with limited bandwidth)

  • the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI (PURLs also suffer from this due to odd UI decisions by browser makers)

  • having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.

Additionally, three of his concerns related to the difficulties of Web server configuration:

  • it's non-trivial to configure a web server to issue the correct redirect, and to do so only for the things that are not information resources.

  • the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two

  • it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents

The 210 status code proposal would effectively deal with Ian's major issues. Metadata describing a resource could be returned in a single GET if the resource were physical or conceptual (that is, not an information resource). It would be reachable for information resources, although requiring two hops if the URL to the metadata is not known. The URI displayed by a browser would not change. Importantly, the 210 is conceptually much easier to explain.

Support For Existing Web Servers

Web servers, even existing ones at hosting centers, can easily be configured to serve 210 content immediately, at least via a simple hack.  The host we use for 3roundstones.com (Arvixe) allows limited site configuration using cPanel, which in turn allows Apache handlers to be associated with file extensions in URLs.  One of the handlers installed by default with Apache is mod_asis.

mod_asis is used to send a file "as is". A file sent that way can contain HTTP headers separated by a blank line. Using that trick, we might associate a URI (say, http://example.com/toucan.physical) with a metadata description of a physical object. The resource file served when that URL is resolved looks like this (inclusive of the 210 status code!):

Status: 210 Description Found
Date: Wed, 10 Nov 2010 15:07:14 GMT
Content-Type: text/turtle

<http://example.com/toucan.physical>
    a <http://dbpedia.org/resource/Toucan> .

The combination of mod_asis and a file (with a mapped extension) containing custom HTTP headers (including a Status pseudo header) will result in the remainder of the file being served with the designated headers. In this case, that means that we can return 210 status codes from any URL we wish using a stock Web hosting service.
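In Apache configuration terms, the hack amounts to a couple of lines like the following (the .physical extension is just the hypothetical one used above; any extension you map will do):

```apache
# Serve files ending in .physical "as is": their embedded headers,
# including the Status: pseudo-header, are sent verbatim by mod_asis.
AddHandler send-as-is .physical
```

On a cPanel host, the same mapping can usually be made through the Apache Handlers screen without touching the configuration files directly.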

Some might consider the use of file extensions restrictive (or just a PITA), but the Principle of URI Opacity protects us from people like that :)

Other Considerations

It may interest some to note that common Web clients (including human-oriented browsers and command line clients such as curl and wget) do not seem to mind a non-standard 200-series status code. They return the document and the new status code without complaint.

There are some disadvantages to the 210 proposal. Most importantly, this proposal is a change to the very fabric of HTTP and thus the Web. The W3C and IETF would need to standardize the 210 status code, probably in a new IETF RFC. That will take time and effort. Web server operators would have to configure their Web servers to return the correct status code (as described above), at least until Web servers ship with 210 support by default.

Please comment. If we want to build the Semantic Web and the Linked Data community on a designed fabric instead of a series of hacks, the time to start is now. Even now is late, but it is not (yet) impossible.

Wednesday, October 20, 2010

Thursday, October 07, 2010

Call for Chapters: Linking Government Data

I'm working on a new contributed book to be entitled Linking Government Data. Please see the Call for Chapters if you have any interest in contributing.

A primary goal of this book is to highlight both costs and benefits to broader society of the publication of raw data to the Web by government agencies. How might the use of government Linked Data by the Fourth Estate of the public press change societies? How can agencies fulfill their missions with less cost? How must intra-agency culture change to allow public presentation of Linked Data?

Starting at Talis

I am very pleased to have accepted a job offer at Talis. I'll be helping to stand up a new U.S. subsidiary for them and will continue to focus my efforts on the evolving Linked Data market, both in relation to government transparency and its use in commercial markets.

Monday, August 02, 2010

New Book in Pre-Production

Linking Enterprise Data is an edited volume contributed by worldwide leaders in Semantic Web research, standards development and early adopters of Semantic Web standards and techniques. Linking enterprise data is the application of World Wide Web architecture principles to real-world information management issues faced by commercial, not-for-profit and government enterprises.

I edited this book for Springer and the publisher has created a Web site for it as it enters production.

Springer seems to think the book won't be out until 2011, but I'm hoping for November because I'll be speaking at a conference then and would like to see it out.

I have been given the rights to put the entire book's content on the Web and plan to do so as Linked Data shortly.

Leaving Zepheira

I have decided to leave Zepheira and seek employment elsewhere. Uche, Eric, Bernadette and I have worked closely together over the last couple of months to arrange a clean transition for me. With my current projects at or near an end, this seemed like a good time. My last official day as an employee of Zepheira was 31 July.

I wish Zepheira well and believe I am leaving at a time when the company is strong and their future looks bright.

The future for me is a bit less certain at the moment, but I'm speaking with a number of good people. More when a decision has been made, probably in late August around my birthday. In the meantime, I've updated my resume and Linked In profile as I make the rounds.

Feel free to contact me or leave a comment if you know of exciting opportunities.

Thursday, July 01, 2010

Introducing the Callimachus Project

Callimachus is a Semantic Web framework for easily building hyperlinked Web applications. Callimachus allows Web authors to quickly and easily create Semantically-enabled Web applications with a minimal knowledge of Semantic Web principles. James Leigh and I have been working on it for a while. A presentation is also available.

Callimachus version 0.1.1 is now available. This release includes updated documentation and the first sample applications. Please see the directions in the file SAMPLE-APPS.txt to understand the sample applications. More are coming soon!

You can acquire this release either by downloading the ZIP archive from the downloads area or by checking out the v0.1.1 tag:

svn checkout http://callimachus.googlecode.com/svn/tags/0.1.1/

Either way, follow the directions in README.txt to get started.

Have fun and please report your experiences with Callimachus to the discussion list!

Thursday, October 22, 2009

Chinese Units of Measure

Chinese Units of Measure
Originally uploaded by prototypo
I found this antique Chinese ruler in Seoul, ROK, last week. It uses the old Chinese units of length measure, the fēn, cùn and chǐ units.

The tiny fēn is about 3 mm. The cùn is traditionally the width of a person's thumb at the knuckle. The chǐ (or Chinese 'foot') is derived from the length of a human forearm, like a cubit. Or so says Wikipedia.

Those were hard-working people, to have thumbs as wide as a cùn.

The ruler is wooden, with brass inlays marking the units.

Friday, September 04, 2009

Why I will Never Own a Kindle

Fears of Internet security experts everywhere were realized today when Amazon revealed, apparently by accident, that it keeps copies of annotations made on the Kindle ebook reader.

This article at Reuters reported on damage control attempts at Amazon after it (in a delicious piece of irony) deleted copies of George Orwell's 1984 from its Kindles in July. The provider of the ebook version of 1984 apparently did not own the appropriate publication rights. Readers were naturally upset at the sudden disappearance of content from their readers, although of course they forgot to read the fine print, didn't they? You can't buy an ebook; you can only rent one. Amazon was technically within their rights to delete the content.

That's hardly the full story, though. Amazon was sued by a high school student for having also removed his "copious notes" regarding the deleted novel. The Reuters story linked above showed Amazon's hand when they reported:
Amazon's email on Thursday said that the company would replace
the deleted books along with any annotations made by customers.

That's right, Kindle fans. Amazon has admitted publicly that they, like Orwell's Big Brother, keep copies of any annotations that Kindle users make on the devices. For at least months. Holy cow!

The full text of Amazon's email to affected customers is available at the WSJ.

Perhaps more amazing is that Kindle readers don't particularly seem to care (cf. comments to the WSJ blog post). Kindle notes are synced to an Amazon server and thus available to readers over the Web. That may seem like a feature to some, but not to me. I'll back up my own notes, thanks.

Friday, August 21, 2009

Just Published: 97 Things Every Project Manager Should Know

O'Reilly Media has published 97 Things Every Project Manager Should Know. My colleagues Kathy MacDougall and James Leigh also wrote for this book.

This is a new style of "collective wisdom" books from O'Reilly. An earlier one was aimed at software architects.

I was pleased to see that O'Reilly used one of my quotes at the top of their home page for the book ("Clever Code Is Hard to Maintain...and Maintenance Is Everything").

The tips I wrote for this book were:
  • Clever Code Is Hard To Maintain
  • The 60/60 Rule
  • The Fallacy Of Perfect Execution
  • The Fallacy Of Perfect Knowledge
  • The Fallacy Of The Big Round Ball
  • The Web Points The Way, For Now
Check it out if you do project management. There's some good stuff in there.

Monday, August 17, 2009

The Death of the Copenhagen Interpretation?

Wow. A climate researcher in the UK has had the guts to propose a new geometry for space-time that provides a new way of answering pesky questions in quantum mechanics. This article in Physorg (see also the article's full text) provides a good overview.

I don't know if I still have the math to slog through it, but it looks to be worth the effort.

Called the Invariant Set Postulate, the proposed law offers a geometry of space-time that resolves long-standing difficulties in quantum mechanics, including complementarity, quantum coherence, superposition and wave-particle duality. Quantum description of gravity may even be possible. Wow. That is an amazingly out-of-the-box contribution.

For the faint of heart, here is a key quote: "The Invariant Set Postulate appears to reconcile Einstein’s view that quantum mechanics is incomplete, with the Copenhagen interpretation that the observer plays a vital role in defining the very concept of reality."

Monday, June 15, 2009

OK, OK, I'm back on Twitter

I'll be at the 2009 Semantic Technology Conference this week and will be twittering on @prototypo.

Friday, June 12, 2009

Freemix is Live!

Freemix is live in invitational Beta. Come check it out!

We at Zepheira will officially introduce it to the SemTech crowd next week and to the press on Tuesday.

You can see my profile on http://freemix.it/profiles/dwood/.


Monday, June 01, 2009

Musicians and Coders

Bernadette recently gave me a CD (yeah, really, not an iTunes gift card! She's very quaint.) by Jeremy Pelt, a fantastic jazz trumpeter. In the inside cover of November, he says, "Our greatest responsibility as musicians is to live and grow... then, you might play something hip!"


Friday, May 29, 2009

Announcing Freemix

Zepheira has announced the forthcoming launch of Freemix, a social networking site for data and the people who use it. Freemix will be officially launched at the Semantic Technologies Conference in San Jose, California on June 16, 2009.

Zepheira partners Eric Miller, Uche Ogbuji and I will brief representatives of the press at 12:00 US Pacific Time in the Fairmont Hotel in San Jose. Zepheira will demonstrate Freemix in a booth on the SemTech exhibit floor.

SemTech conference attendees may also attend a briefing on Freemix on Wednesday, 17 June 2009 from 5:00-6:00 PM US PST.

If you are a spreadsheet user and want to share your data more widely, Freemix is for you. Wouldn't it be nice if your data had friends, too?

Thursday, May 21, 2009

Speaking at SemTech

I will be speaking at the Semantic Technology Conference in San Jose again this year from June 14-18, 2009.

Dan McCreary and I will be giving a three-hour tutorial on entity extraction on the Monday. I'll be presenting a talk on Active PURLs: Stored Procedures for the Semantic Web on the Tuesday. Additionally, it seems likely that I will replace Uche on a panel dubiously entitled Web3-4-Web2, also on the Tuesday.

Speakers have been authorized to share coupons for up to $200 off registration fees. If you would like to get the coupon code, please contact me or leave a comment here by May 29, 2009.

Zepheira is a gold sponsor again this year and we will have a very cool announcement. We are going to officially launch Freemix at the conference. The site is still under authentication, but will be released to the public just before the conference. It should be exciting. If you care about putting real, live, useful, everyday data on the Semantic Web, come see it.

Saturday, May 16, 2009

Playing with Wolfram Alpha

Wolfram Alpha has been launched and is available for the public to try. I sat down to play with it.

Firstly (using the rare American adverb here - don't be confused), you can't expect Wolfram Alpha to act like Google. It is a new kind of search engine, as one should expect from Stephen Wolfram. Wolfram is famously the inventor of Mathematica and author of A New Kind of Science.

Wolfram Alpha seems to consist of a linguistic interpretation engine coupled to Mathematica and a growing number of databases. Google, on the other hand, is a free-text indexer of Web content. That suggests that while one might be able to type just about any word or phrase into Google that is somewhere on the Web, one must limit Wolfram Alpha queries to concepts that are in its databases or may be treated as mathematical relationships. Indeed, this seems to be the case.

Wolfram's overview video is well worth watching. It, and the example search results available from the home page, give a flavor for the powerful searches one can do with the site.

Following a lead from the video, I tried typing the female name "Bernadette" into the search box. Wolfram Alpha, as advertised, did indeed respond with a presumption that I wanted information about the name and results that included a time distribution plot of popularity. Searching for "Bernadette David" gave me a distribution plot of both names which showed the highest combined popularity did in fact occur around our birth years. Well done, Wolfram Alpha.

Changing the previous search to "Bernadette Peters" resulted in some minor information about the actress and a link to her Wikipedia entry. Wikipedia links are provided where possible, as a transparent but useful attempt to provide flesh to limited source content.

However, more general searches, such as the word "Zepheira", produced no results. Wolfram Alpha responds to null result sets with a message saying "Wolfram|Alpha isn't sure what to do with your input.". That alone makes it clear that Wolfram Alpha and Google are at best complementary.

Too many users on the site result in a cute message saying "I'm sorry Dave, I'm afraid I can't do that..." - which is only mildly freaky if your name happens to be Dave. The reference naturally comes from the mutiny of the HAL 9000 computer in the film "2001: A Space Odyssey".

Math, science, engineering and finance queries work well, as expected. A Web interface to Mathematica is useful in itself. I suspect that the site will be most effectively used by college students and some working professionals. My mom and dad are unlikely to find it compelling (although my dad is a weather geek and weather data is well represented, so I might be wrong). Still, the lack of detailed weather results such as live RADAR images would more likely lead him to weather.com.

One can do funky and useless math with aplomb. Wolfram Alpha rapidly provided me with the correct interpretation, unit dimensions and unit conversions for the search "100 furlongs per microfortnight", a speed well above that of sound but under that of light.
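The furlongs-per-microfortnight claim is easy to verify with a quick back-of-the-envelope calculation. Here is a sketch in Python using the standard unit definitions (the constant names are my own, not anything from Wolfram Alpha):

```python
# Sanity-check "100 furlongs per microfortnight" as a speed.
FURLONG_M = 201.168                        # metres per furlong (exact)
MICROFORTNIGHT_S = 14 * 24 * 3600 * 1e-6   # 14 days / 1e6 = 1.2096 s

speed = 100 * FURLONG_M / MICROFORTNIGHT_S  # metres per second

SPEED_OF_SOUND = 343.0        # m/s, approx. at sea level
SPEED_OF_LIGHT = 299_792_458  # m/s, exact

print(round(speed))                              # -> 16631 m/s
print(SPEED_OF_SOUND < speed < SPEED_OF_LIGHT)   # -> True
```

At roughly 16.6 km/s, the result is indeed well above the speed of sound and far below the speed of light, just as Wolfram Alpha reported.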

Minor misspellings were handled effectively (e.g. "area of icosehedron" was correctly interpreted as "area of icosahedron"). Similarly, "volume of icosahedron" resulted in a correct interpretation. I expected the search "distance to a star" to fail miserably, but the answer was surprisingly useful. Try it yourself to see what I mean.

The problem with this kind of interface is that interpretations of intent are notoriously hard, if not impossible, in the general case. How can Wolfram Alpha know that when I typed "birth year of gandhi" I meant Mahatma Gandhi? What if I meant Indira Gandhi? Guessing is fine as far as it goes, but most search engines chose to give up that approach a decade ago in favor of simply presenting ranked lists of results.

The interface style is also naturally limited by its underlying data. Searching for "the size of the World Wide Web" resulted in a suggestion to try "the size of the world wide" - which it could answer as the diameter of Earth.

I wonder how many people recall that Yahoo used to allow mathematical equations in their search engine? They seem to have removed the functionality. One can only presume that it got in the way of becoming a more general Internet search engine. I suspect there is a lesson there for Wolfram Research. Will Wolfram Alpha stay aimed at specialists or will it grow into a more general tool? Time will tell. Their plan to integrate more databases does not address the inherent limitations of guessing linguistic intent.

In summary, Wolfram Alpha is an expert-friendly search system for specialists and is best used as an orthogonal complement to Google and other general search engines. Its approach is pure Wolfram - unashamedly different and unapologetically ignorant of lessons learned by others.

Sunday, April 26, 2009

Finally, Some Clues on the Domestication of Rice

The subtitle to this blog promises posts about "the origins of agriculture", although the field is so slow moving that I have not posted on the topic in three years and have not reported meaningful research here since speculating on religion as a driver.

Fortunately, others are doing active research on agricultural origins even if I am not. Dr. Dorian Fuller of the Institute of Archaeology at University College London has cracked a very special nut, indeed. He and his team have located substantial evidence of the location and timing of rice domestication in the Lower Yangtze region of Zhejiang, China.

Dr. Fuller and his colleagues discovered a location where the local diet shifted dramatically from a hunter-gatherer lifestyle to an agricultural one over a mere three hundred years. That alone is fascinating and an important discovery. Equally interesting was the dating of the shift, from 6900 to 6600 years ago. That places rice domestication in a timeframe fully two thousand years later than thought and lends serious support to diffusion theories (versus parallel development).

Fuller's team collected mixtures of midden material from the site and painstakingly separated wild rice remains from domesticated rice remains. Specifically, they looked at spikelets, the places where rice seeds attach to stalks. Like other domesticated plants, rice underwent a genetic shift, through artificial selection, to retain its seeds for harvest by humans. The shape of the spikelets differs enough between wild and domesticated rice to be distinguishable.

There is a nice scanning electron microscope image of a wild rice spikelet base at the Agricultural Biodiversity Weblog.

The last I heard, Londo's investigation [1] was still suggesting multiple independent origins of rice in Southeast Asia and lower China. Hopefully Fuller's paper [2] will put that to rest. Londo at least admitted that his team wasn't certain.

Wikipedia's entry on rice says, "Rice has been cultivated in Asia likely over 10,000 years." It is clearly time to correct that entry and, more broadly, correct the education of literally billions of people who are taught it. I really need to get back to work on my Origins of Agriculture summary and update it with these findings.

[1] Londo, J.P., Chiang, Y-C, Hung, K-H, Chiang, T-U and Schaal, B.A. (2006). "Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa". PNAS, http://www.pnas.org/content/103/25/9578.long

[2] Fuller, D.Q., Qin, L., Zheng, Y., Zhao, Z., Chen, X., Hosoya, L.A. and Sun, G-P (2009, March 20). The Domestication Process and Domestication Rate in Rice: Spikelet Bases from the Lower Yangtze, Science 20 March 2009, 323/5921, pp. 1607-1610, http://www.sciencemag.org/cgi/content/abstract/323/5921/1607