Musings on books, the near future, the process of writing, the Semantic Web, the origins of agriculture, evolutionary meme theories, the venture capital process and the occasional political rant; not necessarily in that order.
See my books at http://hyland-wood.org.
Sunday, October 19, 2014
Book Review: On Intelligence by Jeff Hawkins
On Intelligence [Amazon, Goodreads] purports to explain human intelligence and point the way to a new approach toward artificial intelligence. It partially succeeds on the former and knocks it out of the park on the latter.
This is the only book that Jeff Hawkins has written. Silicon Valley insiders may remember Hawkins as the creator of the PalmPilot back in the 1990s; when the company's owners restricted his vision, he left to create Handspring. Both companies made a lot of money, which is all that matters on the Sand Hill Road side of Silicon Valley. The tech side of the Valley cares more about the fact that Hawkins succeeded in the handheld computing market where the legendary Steve Jobs had failed (with the Newton).
Hawkins' journalist co-author Sandra Blakeslee, on the other hand, has an Amazon author page that scrolls and scrolls. She has co-authored ten books, several of which relate to the mind, consciousness and intelligence. Her most recent book, Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions, was published as recently as 2011 with neuroscientists Stephen L. Macknik and Susana Martinez-Conde and was an international best seller. She has seemingly made a career out of helping scientists effectively communicate thought-provoking ideas.
Hawkins focuses all of his attention on uncovering the algorithm implemented by the human neocortex. Where that is impossible due to lack of agreement or basic science, he makes some (hopefully) reasonable assumptions and proceeds without slowing down. That will strike most neuroscientists as inexcusable. It makes perfect sense to an engineer.
Albert Einstein once said, "Scientists investigate that which already is; Engineers create that which has never been." Or, to quote myself, scientists look at the world and ask, "How does this work?". Engineers look at the world and say, "This sucks! How can we make it better?" There is a fundamental difference in philosophy required of scientists and engineers. Hawkins proves himself to be an engineer through and through, even when he bends over backward attempting to do some science.
There is a particularly useful review on Goodreads that drives a crowbar through the core of the book as if it were the left frontal lobe of Phineas Gage. The reviewer, who goes solely by the name Chrissy, rightly points out Hawkins' overfocus on the neocortex:
It became clear that Hawkins was so fixated on the neocortex that he was willing to push aside contradictory evidence from subcortical structures to make his theory fit. I've seen this before, from neuroscientists who fall in love with a given brain region and begin seeing it as the root of all behaviour, increasingly neglecting the quite patent reality of an immensely distributed system.
Chrissy is correct. Hawkins' work is nevertheless critically important. Although the cortex is without doubt only part of the brain and only part of the "seat" of consciousness, his work to define a working theory of the "cortical learning algorithm" has led directly to a new branch of machine learning. It is one that has borne substantial fruit since the book's 2004 debut.
It shouldn't surprise anyone that Hawkins' reviewers confuse science and engineering. Professionals are often confused about the separation themselves. Any such categorization is arbitrary and people have the flexibility to change their perspective, and thus their intent, on demand. To make matters worse, computer science is neither about computers nor science. It is the Holy Roman Empire of the engineering professions. Computer science involves the creation and implementation of highly and increasingly abstract algorithms to solve highly and increasingly abstract problems of information manipulation. It is certainly different from computer engineering, which actually does involve building computers, and it is also generally different from its subfield software engineering. Of course reporters and even scientists get confused.
Writing On Intelligence has not made Hawkins into a neuroscientist. That does not seem to have been his goal. Hawkins' goal was to build a more intelligent computer program - one that "thinks" more like a human thinks. His explorations of the human brain have had that goal constantly in mind.
Hawkins himself states his goal differently, but I stand by my interpretation. Why? Consider what he says (pp. 90):
What has been lacking is putting these disparate bits and pieces into a coherent theoretical framework. This, I argue, has not been done before, and it is the goal of this book.
That makes him sound like a scientist. But he went on to do exactly what I claim. He described a framework and then implemented it as a computer program. That's engineering.
It seems almost strange that it took fully five years from the book's publication for Hawkins' group at the Redwood Neuroscience Institute (now called the Redwood Center for Theoretical Neuroscience at UC Berkeley) to publish a more technical white paper detailing the so-called cortical learning algorithm (CLA) described in the book. The white paper provides sufficient detail to create a computer program that works the way that Hawkins understands the human neocortex to work. Again surprisingly, another four years passed before an implementation of that algorithm became available for download by anyone interested. The Internet generally works faster than that when a good idea comes along. The only reasonable explanation is that a fairly small team has been working on it.
You can, since early 2013, download an implementation of the CLA yourself and run it on your own computer to solve problems that you give it. Programmers normally love this sort of thing. It is interesting to note that the Google self-driving car uses exactly the traditional artificial intelligence techniques that Hawkins denigrates in his first chapter. Hawkins may have come too late for easy acceptance of his ideas. There are entrenched interests in AI research and Moore's Law ensures that they can still find success with their existing approaches. A specialist might note that the machine learning algorithms in the Google car have stretched traditional neural networking well beyond its initial boundaries and toward many of the aspects described by Hawkins, without ever quite buying into his approach.
The implementation is called the Numenta Platform for Intelligent Computing (NuPIC). It is dual licensed under a commercial license and the GNU GPL v3 Open Source license. That means that you can use it for free or they will help you if you want to pay. You can choose.
Hawkins lists and briefly critiques the major branches of artificial intelligence, specifically expert systems, neural networks, auto-associative memories and Bayesian networks. He is right to criticize all of them for not having looked more carefully at the brain's physical structure before jumping to simple algorithmic approaches. The closest of the lot is perhaps neural networks, which is notionally based on composing collections of software-implemented "neurons". These artificial neurons are rather gross simplifications of biological neurons and the networks, with their three-tier structure, are poor substitutes for the complex relationships known to exist in the brain of even the most primitive animals. Still, the timing of Hawkins' book was unfortunate in that its publication occurred at the beginning of our current golden age of neuroscience. AI is back and AI research is suddenly well funded again. So-called deep learning networks currently contain many more than the three traditional layers, up to eight or even more. IBM has recently moved neural networks to hardware with the announcement of their SyNAPSE chip that "has one million neurons and 256 million synapses" implemented in silicon. All approaches are currently blooming for AI and are being applied to everything from voice and facial recognition to automatically filling spreadsheet cells to autonomous robots. There is currently less reason for the AI community to investigate, or lobby for hardware implementing, a brand new general approach. None of that makes Hawkins wrong. The human brain is still the only conscious system we know of and neuroscience is still doing a bad job of looking at its structures from the top down.
The largest single criticism of On Intelligence from me is that the cortex Hawkins describes is a blank slate, also called a tabula rasa. We know that the human brain is not. The idea that a mind is empty until filled solely by experience dates back at least to Aristotle. The Persian philosopher Ibn-Sīnā, popularly called Avicenna in Europe (a name still taught in Western universities), coined the term tabula rasa a thousand years ago as he interpreted and translated Aristotle's De Anima. We have known for decades that we are born with a number of innate functions, such as facial perception, so the brain is not a blank slate. Other animals have their own innate behavior, such as the fear that many bird species have for the shape of a hawk. Hawkins does address the changing nature of brain function during life but does not even peripherally describe how innate functions fit into his theory.
Hawkins is often criticized for failing to provide a collated list of his assumptions. They are indeed buried in the prose. Hawkins does, however, follow the book's last chapter with an appendix that lists eleven predictions. They are all testable given the right science. Scientists are explicitly asked to validate or repudiate those predictions. A decade later, I am not aware of a comprehensive attempt to do so.
I have attempted to find all of Hawkins' presumptions and have listed them here in the hope that they will help both other reviewers and neuroscientists who might pick away at them. All page numbers are from the 2004 St. Martin's Griffin paperback edition. All indications of emphasis are in the original text unless otherwise marked. The assumptions generally flow from the highest level of abstraction to the lowest, as Hawkins mostly does.
1. "We can assume that the human neocortex has a similar hierarchy [to a monkey cortex]" pp. 45. This one not only seems reasonable but is an assumption held by many scientists. It is in line with the many independent threads of evidence from evolutionary theory. Hawkins was intentionally careful when he used the word "similar".
2. "We don't even have to assume the cortex knows the difference between sensation and behavior, to the cortex they are both just patterns." pp. 100. This is actually a negative assumption in that he is not making one. This kind of thinking, determining what assumptions are necessary to a system, is in keeping with Hawkins' coding background. It is an engineering necessity.
3. "Prediction is not just one of the things your brain does. It is the primary function of the neocortex, and the foundation of intelligence." pp. 89. This is Hawkins' central idea and the one that informs not only the book and the implementation of NuPIC but the philosophic approach to his understanding of the brain and its functions. Hawkins relates the traditional AI approach of artificial auto-associative memories and declares, "We call this chain of memories thought, and although its path is not deterministic, we are not fully in control of it either." pp. 75. He proposes that "the brain uses circuits similar to an auto-associative memory to [recall memories]" pp. 31.
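The auto-associative memory Hawkins invokes is not spelled out in code in the book, but the classic version he is referring to can be sketched as a small Hopfield-style network: patterns are stored as pairwise correlations between units, and a corrupted cue settles back to the nearest stored pattern. Everything below (function names, the toy pattern) is my own illustration, not text from the book or NuPIC.

```python
# A minimal Hopfield-style auto-associative memory over bipolar (+1/-1)
# patterns. Storage is Hebbian: each weight records how often two units
# agree across the stored patterns. Recall repeatedly pulls each unit
# toward the "field" exerted by the others until the pattern stabilizes.

def train(patterns):
    """Build a Hebbian weight matrix from a list of bipolar patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, cue, steps=5):
    """Settle a (possibly noisy) cue toward the nearest stored pattern."""
    state = list(cue)
    for _ in range(steps):
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state

stored = [1, 1, 1, -1, -1, -1, 1, -1]
w = train([stored])
noisy = list(stored)
noisy[0] = -noisy[0]                 # corrupt one bit of the cue
print(recall(w, noisy) == stored)    # True: the full pattern is recovered
```

Recovering a whole memory from a degraded fragment is exactly the property Hawkins leans on when he proposes that "the brain uses circuits similar to an auto-associative memory."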
Here is also where Hawkins is forced to leave the cortex and venture into its relationships with another area of the brain. He notes the large number of connections between the cortex and the thalamus and the delay inherent in passing signals that way. He declares that the cortex-thalamus circuit is "exactly like the delayed feedback that lets auto-associative memory models learn sequences." pp. 146. He is onto something here, but one must question his oversimplification. The thalamus is also known to be involved in the regulation of sleep and thus almost assuredly implements more than just a delayed communication loop with the cortex.
Eventually he is able to bring his prediction model into sharp focus: "If the cortex saw your arm moving without the corresponding motor command, you would be surprised. The simplest way to interpret this would be to assume your brain first moves the arm and then predicts what it will see. I believe this is wrong. Instead I believe the cortex predicts seeing the arm, and this prediction is what causes the motor commands to make the prediction come true. You think first, which causes you to act to make your thoughts come true." pp. 102. This focus on the predictive nature of the neocortex is key to Hawkins' understanding. Either the neocortex implements an algorithm really quite similar to the CLA as described by Hawkins and is therefore a "memory-prediction framework" or he has got it wrong. The predictive abilities of NuPIC suggest that he is on the right track in spite of his many assumptions.
4. Hawkins makes two interesting and useful assumptions for the purposes of developing a top-down theory: "For now, let’s assume that a typical cortical area is the size of a small coin" pp. 138 (he does acknowledge there is substantial variation), and "I believe that a column is the basic unit of prediction" pp. 141. Why does it matter to Hawkins how large a cortical area is, much less a typical one? It shouldn't matter to a typical neuroscientist. They take the anatomy the way they find it. Remember, though, that Hawkins' purpose is to build a more intelligent computer program. He betrays his intent in making assumptions that all cortical regions have fundamentally the same structure (in spite of minor variations that he readily admits are in the literature) and in setting a typical size for an area of cortex. These assumptions will help him to design a computer program that learns in a new way. He is on better footing with the purpose of a cortical column. Cortical columns are indeed very regular in their construction and distribution, a fact that Hawkins dug out of 1970s research and relies upon heavily. It is striking and probably key to any successful high-level theory.
From this point forward Hawkins' assumptions get progressively more technical as he moves toward something that he can implement using existing technology. This may be the most important criticism of On Intelligence even though I personally find it perfectly excusable. Those seeking new neuroscience will be disappointed. Those seeking new and more general ways to approach artificial intelligence will be rapt.
Any review attempting to list Hawkins' more technical assumptions will need to pause to introduce new vocabulary for the general reader. A cortex, animal or human, is the outer layer of the brain. It consists of valleys and folds in order to increase its surface area in the small space afforded it in the skull. Its basic structure is a "cortical column" of six layers. The human brain has "some 100,000 neurons to a single cortical column and perhaps as many as 2 million columns." The Blue Brain Project of the Brain and Mind Institute of the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland is currently attempting to model a complete brain, or at least the cortex. They have already succeeded in modeling a rat's cortical column. This is much more than Hawkins attempted, but a top-level theory of cortical function has yet to emerge from the project.
The six layers of a cortical column have many connections to other layers, other columns, other regions of the cortex and other areas of the brain. It is a complex network. Each layer consists of differently shaped cells. Hawkins collected the many, many neurons in a cortical column into functions at each layer. That alone may be a very valuable contribution if it can be shown that this level of abstraction is possible without sacrificing higher-level function.
It will be useful and fascinating to see what emerges from a study of the Blue Brain Project's cortical column models. In the meantime, Hawkins has provided us with a roadmap of questions to ask.
5. Noting the obvious disparity between streams of sensory inputs and highly abstract thought, Hawkins illustrates how a hierarchical set of relationships between cortical areas could produce abstractions ("invariant representations") at the higher levels. "The transformation—from fast changing to slow changing and from spatially specific to spatially invariant—is well documented for vision. And although there is a smaller body of evidence to prove it, many neuroscientists believe you’d find the same thing happening in all the sensory areas of your cortex, not just in vision." pp. 114. Hawkins goes on to take this as written, which is just what he needs to do in the absence of established science in order to build a system.
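The hierarchy Hawkins describes can be caricatured in a few lines: a lower level maps fast-changing raw inputs onto a smaller set of feature names, and an upper level maps combinations of those features onto a single stable "invariant" name. The two lookup tables below are invented toy data, purely my illustration of the idea; real cortex (and NuPIC) learns these mappings rather than hard-coding them.

```python
# A toy two-level hierarchy in the spirit of "invariant representations".
# Lower level: raw, spatially specific inputs -> general features.
# Upper level: a set of features -> one slowly changing object name.

LOW_LEVEL = {                 # raw pattern -> low-level feature name
    "edge@10deg": "edge",
    "edge@12deg": "edge",     # slightly different input, same feature
    "corner-ul": "corner",
}

HIGH_LEVEL = {                # feature set -> invariant object name
    frozenset(["edge", "corner"]): "square",
}

def perceive(raw_inputs):
    """Climb the hierarchy: raw inputs -> features -> invariant name."""
    features = frozenset(LOW_LEVEL[r] for r in raw_inputs)
    return HIGH_LEVEL.get(features, "unknown")

# Two different fixations on the same object yield the same top-level name:
print(perceive(["edge@10deg", "corner-ul"]))   # square
print(perceive(["edge@12deg", "corner-ul"]))   # square
```

The point of the sketch is the asymmetry Hawkins emphasizes: the bottom of the hierarchy changes with every glance while the top barely changes at all.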
6. Continuing with the vision system, possibly the best-studied area of the brain to date, Hawkins discusses some of the key regions called by neuroscientists V1, V2 and so on. He says, "I have come to believe that V1, V2, and V4 should not be viewed as single cortical regions. Rather, each is a collection of many smaller subregions." pp. 122. Hawkins is making a rather classic reductionist argument here. The question is not how arbitrary regions are defined or what they are called. The problem in front of our engineer is how they are connected. He needs that information to make reasonable (not necessarily physiologically accurate) assumptions if he is to uncover the mechanisms of the brain's learning system.
7. A region of cortex, says Hawkins, "has classified its input as activity in a set of columns." pp. 148. It is hard to argue with this suggestion given the success of Hawkins' artificial CLA in making predictions without the traditional training necessary to other forms of AI. Further, the cortex gets around limits on variation handling found in early artificial auto-associative memories, "partly by stacking auto-associative memories in a hierarchy and partly by using a sophisticated columnar architecture." pp. 164.
8. There are several assumptions about the detailed workings of a cortical column. "Let's also assume that one class of cells, called layer 2 cells, learns to stay on during learning sequences", says Hawkins (pp. 152). He makes no judgement as to whether that "learning" is innate or acquired during life. He doesn't even know that it is really there. Something like it must be in order to make his theory work. That is no criticism! It is instead a testable hypothesis and thus the very model of scientific advancement. It also allows him to build something.
"Next, let’s assume there is another class of cells, layer 3b cells, which don’t fire when our column successfully predicts its input but do fire when it doesn’t predict its activity. A layer 3b cell represents an unexpected pattern. It fires when a column becomes active unexpectedly. It will fire every time a column becomes active prior to any learning. But as a column learns to predict its activity, the layer 3b cell becomes quiet." pp. 152. This might seem unjustified. What would make Hawkins jump to a conclusion in the apparently complete absence of supportive science? The answer is that the engineer clearly sees the necessity of feedback when it is presented to him. There simply must be a mechanism that fills the role or no learning could occur. Hawkins merely suggests a reasonable place for it and encourages the neuroscience community to look for it.
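The behavior Hawkins assigns to layer 3b cells can be sketched as a column that keeps a first-order model of which input follows which: the "3b cell" fires whenever the prediction fails, and falls quiet once the sequence has been learned. The class below is my own caricature of that description, not Hawkins' actual mechanism or NuPIC code.

```python
# A toy cortical column with a "layer 3b" surprise signal. The column
# learns first-order transitions (which input tends to follow which) and
# reports surprise only when its prediction of the next input fails.

class Column:
    def __init__(self):
        self.transitions = {}   # previous input -> predicted next input
        self.prev = None

    def feed(self, inp):
        predicted = self.transitions.get(self.prev)
        surprise = predicted != inp           # the 3b cell "fires" on this
        if self.prev is not None:
            self.transitions[self.prev] = inp  # learn the observed sequence
        self.prev = inp
        return surprise

col = Column()
song = ["do", "re", "mi", "do", "re", "mi"]
fired = [col.feed(note) for note in song]
print(fired)   # [True, True, True, True, False, False]
```

Exactly as in the quote, the cell fires every time prior to learning and then goes quiet once the column predicts its own activity.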
As for the lowest level, layer 6: "cells in layer 6 are where precise prediction occurs." pp. 201.
9. Finally, Hawkins rightly notes some differences between biological neurons and the artificial neurons used in neural networking models. It makes one wonder what IBM implemented on their SyNAPSE chip. How biologically correct were they? Hawkins says, "neurons behave differently from the way they do in the classic model. In fact, in recent years there has been a growing group of scientists who have proposed that synapses on distant, thin dendrites can play an active and highly specific role in cell firing. In these models, these distant synapses behave differently from synapses on thicker dendrites near the cell body. For example, if there were two synapses very close to each other on a thin dendrite, they would act as a 'coincidence detector.' That is, if both synapses received an input spike within a small window of time, they could exert a large effect on the cell even though they are far from the cell body. They could cause the cell body to generate a spike." pp. 163. This is exactly the sort of thing that can have great biological effect and cause great trouble for overly simplistic implementers. It would seem that Hawkins was careful to avoid this oversimplification even while embracing others.
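The coincidence detector in that quote reduces to a very small piece of logic: two distal synapses drive the cell only when their input spikes arrive within a short time window of each other. The window value and function names below are illustrative assumptions of mine, not figures from the book.

```python
# Two synapses on a thin, distal dendrite acting as a coincidence
# detector: the dendrite generates a spike only when a spike on synapse A
# lands within `window` milliseconds of a spike on synapse B.

WINDOW_MS = 5.0   # illustrative coincidence window, not a measured value

def dendritic_spike(spike_times_a, spike_times_b, window=WINDOW_MS):
    """True if any pair of spikes (one per synapse) coincides in time."""
    return any(abs(a - b) <= window
               for a in spike_times_a for b in spike_times_b)

print(dendritic_spike([10.0, 40.0], [12.5]))   # True: 10.0 and 12.5 coincide
print(dendritic_spike([10.0], [30.0]))         # False: spikes too far apart
```

A classic-model artificial neuron, which simply sums weighted inputs, has no way to express this timing-dependent AND; that is the gap between the simplification and the biology that Hawkins is pointing at.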
Hawkins has also uncovered something really quite important and almost painfully subtle. Philosophers of mind, psychologists and priests have for centuries argued that the mind is fundamentally different from the body. We moderns have become comfortable with considering huge swaths of the body as mechanistic in nature. We can replace an arm, a leg, a kidney, even a heart for a while. We can insert a pacemaker, or a hearing aid. Surgery can cut, sew and sometimes almost magically repair, replace or augment much of our bodily infrastructure. As a natural result, we tend to view the body as a mechanism, however complicated. The brain, though, the mind, is a different matter. All the neuroscience conducted to date fails to convince most of us that the brain implements an algorithm. We cannot, so it is said, be reduced to an algorithm because that would imply that we could - one day - make a machine with all the abilities of people. Perhaps it would need to have all the rights, too. That scares people badly.
Parts of the brain have come to be accepted as algorithmic. Are you aware that a computerized cerebellum has been created for a rat? That was in 2011. Scientists and engineers are starting to soberly discuss creating such a device for paralyzed human beings.
The slow, painfully slow, admission that the body is a series of devices, each of which chemically implements algorithms, has been a long time coming. Parts of the brain have now unarguably fallen to the algorithmic worldview. First the ears, the eyes, the entire vision system. The cerebellum. The pineal gland. Hormonal balances. Most of the pons. Hawkins takes on the neocortex and, in spite of Chrissy's complaint, he did find it necessary to include the thalamus in his model. The bottom line is that the cortical learning algorithm is an algorithm. Philosophers of mind fear such a finding.
The idea that thinking is a form of computation dates from 1961 when Hilary Putnam first expressed it publicly. It has become known as the Computational Theory of Mind or CTM. Although CTM has its detractors (especially John Searle's Chinese Room, although that has been debunked to my personal satisfaction) it has become the basis for current thinking in evolutionary and cognitive psychology. The so-called new synthesis of CTM is roughly a combination of the ideas of Charles Darwin's evolution, mathematician Alan Turing's universal computation and limits-to-computability proofs, and linguist Noam Chomsky's rationalist epistemology. The basic idea is still the same: that human thought in human brains is algorithmic, even if the algorithms are quite complex ones that we haven't fully deconstructed. The new synthesis is about proving that theory.
"The dissociation between mind and matter in men and machines is very striking", observed David Berlinski in his book The Advent of the Algorithm [Amazon, Goodreads], "it suggests that almost any stable and reliable organization of material objects can execute an algorithm and so come to command some form of intelligence."
We know what to do with algorithms. We implement them. It doesn't really matter how. We can implement algorithms in computer software or by creating DNA from a vat of chemicals or by lining up sticks and stones in clever ways. The only difference is the efficiency of the implemented algorithm. Electronic computers give us a way to perform calculations - implement algorithms - blindingly fast but they aren't the fastest way to implement all algorithms. Optical computers can do some things faster. Bodily chemistry, too. Or quantum computing. Each is just another way to implement algorithms be they designed by people or discovered by the search algorithm that we call evolution.
Discovering that the brain is algorithmic is arguably the most important realization of this or any other century. It means we can make more of them by any means we choose. That will shatter many world views even if Hawkins only got us part way there.