Friday, October 29, 2004

Why Information Technology hasn't Transformed Life Sciences

Bob Robbins from the Fred Hutchinson Cancer Research Center has a nice
theory on why information technology hasn't transformed life sciences
research yet:

UPS had an annual budget of US$20 billion (five year old numbers) and
spent about $1 billion per year on IT. That is about 5%. The National
Institutes of Health have a combined annual budget of $26 billion and
the National Cancer Institute, part of NIH, has an annual budget of $8
billion. However, they are only spending $20 million in the first year
of caBIG. That is one quarter of a percent on IT for the largest life
sciences effort.

If the UPS model of a centralized, coordinated and efficient
distribution system is a reasonable basis for a cost model for a
distributed, uncoordinated and inefficient life sciences data sharing
environment, then information technology for life sciences research is
seriously underfunded. The reality is probably much worse.

So, Bob says, information technology has not transformed life sciences
research because IT spending by research organizations is underfunded
by one to two orders of magnitude.
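
For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. The variable names are mine, and treating first-year caBIG funding as the whole of NCI's IT spend is the same simplification the argument itself makes.

```python
# Figures quoted in the post, in US dollars.
ups_budget = 20e9        # UPS annual budget (five-year-old number)
ups_it = 1e9             # UPS annual IT spend
nci_budget = 8e9         # NCI annual budget as quoted
cabig_first_year = 20e6  # first-year caBIG funding

ups_share = ups_it / ups_budget            # fraction of budget spent on IT
nci_share = cabig_first_year / nci_budget  # caBIG as a fraction of NCI budget
gap = ups_share / nci_share                # how many times larger the UPS share is

print(f"UPS IT share:       {ups_share:.2%}")
print(f"caBIG share of NCI: {nci_share:.2%}")
print(f"Gap:                {gap:.0f}x")
```

On these numbers alone the gap is about 20x, which sits at the low end of Bob's range. His larger estimate rests on the point that a distributed, uncoordinated environment should cost more to serve than UPS's centralized one.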

Thursday, October 28, 2004

Day 2 at the SW/LS Workshop

Day two of the W3C Semantic Web in Life Sciences Workshop (agenda).

I watched the full lunar eclipse last night. We had a perfect view of it from Providence, Rhode Island where my sister-in-law lives. One can just imagine how scary that kind of event must have been to primitive agricultural societies with celestially-oriented mythologies. The moon turned a beautiful blood red at passage. The goddess certainly appeared angry, but I doubt the harvest will fail this year.

The keynote this morning was given by Ken Buetow (Director, National Cancer Institute Center for Bioinformatics) on the Cancer Bioinformatics Grid (caBIG). This is a huge project funded by the National Cancer Institute (NCI) to the tune of US$20M for the first year and $30M for the second year. This is a pilot study and will attract further funding if successful. caBIG is probably the best-funded semantic project on the planet at the moment.

Sean Martin (IBM) said, "We haven't even begun using I.T. in the biology research community yet." Scary as it may sound, it corroborates the comments from Ted Slator about Excel and Powerpoint yesterday. IBM's paper on their Semantic Layered Research Platform is here. Their system purports to do a lot of things Tucana does currently, but it was described as a prototype, with no plans to release it for the moment. It will be interesting to keep an eye on it.

There has been a lot of discussion today on Life Science Identifiers (LSIDs). The community does not seem to grasp the difference between URIs and URNs. LSIDs are URNs (and therefore URIs), but some people seem to want them to be URLs (inclusive of location information).
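
To make the name-versus-location distinction concrete, here is a minimal sketch that splits an LSID into its parts. It assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout; the `Lsid` type, the `parse_lsid` function and the sample identifiers are my own illustrations, not part of any LSID toolkit.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # who issued the name (looks like a DNS name)
    namespace: str           # authority-defined grouping
    object_id: str           # identifier within the namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its components, or raise ValueError."""
    parts = text.split(":")
    is_lsid = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not is_lsid:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Illustrative identifier only; it is not guaranteed to resolve to anything.
example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
print(example)
```

Nothing in the parsed result says where to fetch the object. The authority names who issued the identifier, and finding a copy of the data is a separate resolution step, which is exactly what a URL would have baked in.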



Wednesday, October 27, 2004

Esoterica from the W3C Semantic Web in Life Sciences Workshop

Slides from my talk with Ben Lund from Nature Publishing on the integration of the Urchin RSS aggregator and Kowari are here. The full paper, with Taowei David Wang and Kendall Clark from the University of Maryland is here.

It was great to see that the University of Colorado at Denver Health Sciences Center (UCHC) Center for Computational Pharmacology is using Kowari, according to Ian Wilson. His paper was Exploring Semantic Web Infrastructure for Life Science Knowledge-bases.

Life Sciences Industry Perspectives on the Semantic Web

Still at the W3C Semantic Web in Life Sciences Workshop (agenda).

The most consistent message from the pharma industry has been that the data management and discovery problems they face are complex. Otto Ritter from AstraZeneca said, "Drug discovery is a complex, costly, risky, information-driven enterprise". Not a bad quote, but it doesn't make you feel the truth like Ted Slator's comments about Pfizer. Pfizer is the industry's largest R&D organization. They have 12,500 employees and plan to spend US$7.9 billion on R&D alone in 2004. They have hundreds of ongoing R&D efforts in 18 therapeutic areas. Now, take that money-to-person ratio and combine it with the fact that the current state of knowledge management is driven by M$ Excel and Powerpoint. That is, data is collected in Excel and shown to other researchers solely (in most cases) via Powerpoint. Wow.

According to Eric Neumann (Global Head of Knowledge Management, Aventis Pharmaceutical), the primary concerns when developing drugs are safety, efficacy (will it do what it is supposed to do), cost effectiveness and timeliness. It strikes me that the same list could be applied to software engineering projects; they are simply statements of economics. However, software projects that violate the rules are often fielded, anyway.

A fundamental problem of the application of semantic techniques to the life sciences industry is that basic terms are not well defined. Even simple terms like "protein" and "gene" are the subject of much argument. This would definitely hamper the development of ontologies. Still, many people are doing it in the best spirit of just getting on with things.

Two subjects of discussion in the Semantic Web Best Practices Working Group have been highlighted here: provenance and transitive relationships. Biology is complex, and the statement that a gene encodes a protein may only be true within a certain context (including species of the genome, the gene sequence used, version info, etc). That makes transitive relationships suspect, and infers (there's that word again) that they should be made only when context is very clear. Even simple in silico experiments rarely capture the software versions and operating environments they ran in. That situation gets worse when biological experiments fail to encode full provenance.
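
Here is a rough sketch of what "only when context is very clear" might look like in practice: each statement carries a context label, and two statements are chained only when their contexts match. The `Statement` type, the `part_of` relation and the context strings are all hypothetical, and this performs a single chaining step rather than a full transitive closure.

```python
from typing import Iterable, NamedTuple, Set


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # hypothetical provenance label, e.g. species plus data version


def chain_once(statements: Iterable[Statement], predicate: str) -> Set[Statement]:
    """Derive A->C from A->B and B->C, but only within a single context."""
    facts = [s for s in statements if s.predicate == predicate]
    inferred = set()
    for first in facts:
        for second in facts:
            same_context = first.context == second.context
            if first.obj == second.subject and same_context:
                inferred.add(
                    Statement(first.subject, predicate, second.obj, first.context)
                )
    return inferred


# Made-up data: the third statement comes from a different context.
known = [
    Statement("geneA", "part_of", "pathwayX", "human-v1"),
    Statement("pathwayX", "part_of", "processY", "human-v1"),
    Statement("pathwayX", "part_of", "processZ", "mouse-v1"),
]
derived = chain_once(known, "part_of")
print(derived)
```

The human statements chain to give geneA part_of processY, while the mouse statement is left alone because its context differs. Exact matching is the most conservative choice; a real system would need some notion of compatible contexts, which is where the hard work is.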

The industry currently has no information supply chain data exchange standards, nor are they likely to come soon. Trust issues and funding sources ensure that data is simply not shared. This could result in semantic techniques being applied solely within companies in the short and medium terms. I would love to see some of the pharmas get together to define common ontologies, though, even if the instance data is purely internal.

Overall, the industry is drowning in data and unable to get its hands around it. Ted Slator (Pfizer) says, "Our domain is too big to fit in our heads", and yet data integration is generally being attempted that way. It is no wonder that this workshop attracted such a large attendance.

TBL Keynote at W3C Semantic Web in Life Sciences Workshop

Tim Berners-Lee presented the keynote at the W3C Semantic Web in Life Sciences Workshop, held in Cambridge, MA. The workshop agenda is here. Tim's slides are here.

True to form, Sir Tim ensured that everyone present understood that they should all use URIs for everything. I wonder what his sock drawer looks like. His example was to define a URI for a concept like "colour". He also railed against software patents (who doesn't??) and pushed SVG as a standards-based replacement for PDF.

The problem with using SVG is, of course, that Adobe Acrobat is free and works, while SVG still relies on (often poorly implemented) browser plug-ins. Similarly, the Semantic Web suffers from a lack of applications which absorb and produce RDF. Only time and effort can change that. At least we are seeing progress in academia, the open source community and the commercial market.

Tim was asked where to find a browser for the Semantic Web (again). The difference this year is that there is starting to be a good answer to that question. Have a look at Longwell and Haystack. He also pointed to Ontaria, which is more of a directory than a browser. Haystack is slow, but quite cool. I haven't downloaded Longwell, yet. Ontaria needs input.

One of the life sciences guys (Bob Robbins from the Fred Hutchinson Cancer Research Center) asked Tim about using semantic techniques for both descriptive data ("What is this plant in my backyard?") and active research ("What are the opinions in the community about how this plant may be used?"). I don't think his answer was great (summary: This is an exciting area of research), but there is a good answer. Jim Hendler likes to make the point that the difference between OWL and traditional AI data descriptions is that OWL allows (indeed, encourages) differences of opinion. The Web Ontology Working Group admitted this user requirement up front, which is anathema to traditional AI. Add Jen Golbeck's work on trust descriptions and I think we are getting there.

Friday, October 15, 2004

Endangered Species

More than one-third of the world's amphibians are threatened in a new round of mass extinctions, according to a new study. Amphibians, such as frogs and salamanders, may reflect greater environmental damage than previously expected. Their porous skins make them vulnerable to environmental changes, including pollution. That makes them better than canaries at telling us something. Anyone listening?

At the same time, global trade is causing the spread of many super-species from their own hostile environments to the cosy corners of the world. Snakeheads have been found in the Great Lakes for the first time, indicating a failure to contain the Chinese fish to the Southeastern US. Fire ants are threatening my house in Australia. They came on a ship from South America.

I discovered today that I am an endangered species, too. This article in USA Today (I found it on /.) notes that US programmers are going the way of the dodo, err, amphibians. In the best quip from Slashdot, scientists will be forced to "set up reserves with massive attempts to create offspring". Heh. If only it were that simple. We're from the government and we're here to help you...

The failure of governments to address any of these problems comes down to one phenomenon: short-term thinking. They are reacting to events one at a time, failing to see long-term trends and putting the economy first. What do they think is going to happen to the economy when the environment has crashed?

The funny thing is, it is not their fault. Really. Memetics would expect governments and most individuals to fail to react to a long-term crisis. Human history is littered with examples, from the collapse of grain production in Libya to the Cuban Missile Crisis. Evolution has created species that are over-specialized and hence vulnerable, just as it rewards good short-term planning (until an extinction event).

The only way to create a governmental policy to effectively deal with long-term issues is to deal with these very human failings up front. Much as we institutionalize punishment for murder so that personal revenge and feuds are avoided, we must institutionalize the response to environmental degradation and job migration. Otherwise, short-term incentives will rule. Shell Oil will leave a mess in Africa, IBM will outsource to India and frogs will die.

Thursday, October 14, 2004

Selling Your Soul

When we shifted business models from software services to software products, we had to sell our soul to the venture capital community. It was always going to be different, but I didn't realize that we would refer to our investors as "the syndicate", our customers as "users" and periodically want to "shoot someone in the head". Every time we have a board meeting it is like spending the day in an episode of The Sopranos.

Wednesday, October 13, 2004

Apple Powerbooks and External Displays

I love my 17" Powerbook. When at home, I connect it to a 23" cinema display. Unfortunately, when connecting and disconnecting the external display from a suspended state, the Powerbook would occasionally appear to freeze. I believe that the state of the display hardware was confused. The machine was up, but simply had no active display. This problem can be solved by a change in usage pattern: I now ensure that the machine is awake before disconnecting the external display. One does not have to be logged in; simply pressing a key to show the login box is sufficient (if you require a password after suspension). Oddly, it appears that you may connect an external display without worrying about the suspended state.

The State of Health Care in the US

I, a medically retired veteran of the US Navy, am watching the US presidential debate and just heard President Bush say that veterans have excellent health care. I am counting to ten. One, two, three, ARRRRGGGGHHHHHH! What an asshole. That man should spend some time in VA hospitals.