Thanks to John Schwenkler for the invitation to guest-blog this week about my new book Surfing Uncertainty: Prediction, Action, and the Embodied Mind (Oxford University Press NY, 2016).
From Rag-bags to (Unified) Riches?
Is the human brain just a rag-bag of different tricks and stratagems, slowly accumulated over evolutionary time? For many years, I thought the answer to this question was most probably ‘yes’. Sure, brains were fantastic organs for adaptive success. But the idea that there might be just a few core principles whose operation lay at the heart of much neural processing was not one that had made it on to my personal hit-list. Seminal work on Artificial Neural Networks had, of course, opened many theoretical and practical doors. But the cumulative upshot was not (and is not) a unifying vision of the brain so much as a plethora of cool engineering solutions to specific problems and puzzles.
Meantime, the sciences of the mind (and especially robotics) have been looking increasingly outwards, making huge strides in understanding how bodily form, action, and the canny use of environmental structure co-operate with neural processes. That was a step in a very promising direction. But without a satisfying picture of the role of the biological brain, ‘embodied cognition’ was (I fear) never going to look very much like a systematic, principled science.
Ever the optimist, I think we may now be glimpsing the shape of just such a science. It will be a science that will take many cues from an emerging vision of the brain as a multi-layer probabilistic prediction machine. In these posts, I want to run over some of the core territory that makes this vision such a good fit – or so I claim – with the agenda of embodied cognitive science, take a look at some far horizons concerning conscious experience, and review some potential trouble-spots.
First though, a mega-rapid recap of the basic story, and then a worry about how best to describe it.
The Strange Architecture of Predictive Processing
In a 2012 paper, the AI pioneer Patrick Winston wrote about the puzzling architecture of the brain – an architecture in which “Everything is all mixed up, with information flowing bottom to top and top to bottom and sideways too,” adding that “It is a strange architecture about which we are nearly clueless.”
It is a strange architecture indeed. But that state of cluelessness is mostly past. A wide variety of work – now spanning neuroscience, psychology, robotics, and artificial intelligence – is converging on the idea that one key role of that downward-flowing influence is to enable higher levels (level-by-level, and as part of a multi-area cascade) to predict lower-level activity and response. That predictive cascade leads all the way to the sensory peripheries, so that the guiding task becomes the ongoing prediction of our own evolving flows of sensory stimulation. The idea that the brain is (at least in part, and at least sometimes) acting as some form of prediction engine has a long history, stretching from early work on perception all the way to recent work in ‘deep learning’.
In Surfing Uncertainty I focus on one promising subset of such work: the emerging family of approaches that I call “predictive processing”. ‘Predictive processing’ plausibly represents the last and most radical step in the long retreat from a passive, feed-forward, input-dominated view of the flow of neural processing. According to this emerging class of models, biological brains are constantly active, trying to predict the streams of sensory stimulation before they arrive. Systems like that are most strongly impacted by sensed deviations from their predicted states. It is these deviations from predicted states (prediction errors) that now bear much of the information-processing burden, informing us of what is salient and newsworthy within the dense sensory barrage. When you see that steaming coffee-cup on the desk in front of you, your perceptual experience reflects the multi-level neural guess that best reduces visual prediction errors. To visually perceive the scene, if this story is on track, your brain attempts to predict the scene, allowing the ensuing error (mismatch) signals to refine its guessing until a kind of equilibrium is achieved.
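That equilibrium-seeking process can be caricatured in a few lines of code. This is my own toy sketch, not anything from the book: a single ‘level’ holds a guess about the hidden cause of its input, and repeatedly nudges that guess to shrink the mismatch between predicted and actual sensation.

```python
# Toy sketch (not from the book): perception as iterative error reduction.
# A single "level" holds a guess about the hidden cause of its input and
# nudges that guess until its predictions roughly match the sensation.

def perceive(sensation, generative_model, guess=0.0, lr=0.1, steps=200):
    """Refine `guess` so the model's prediction matches the sensation."""
    for _ in range(steps):
        prediction = generative_model(guess)
        error = sensation - prediction   # the prediction error signal
        guess += lr * error              # nudge the guess to reduce the error
    return guess

# Hypothetical generative model: assume the world turns causes into
# sensations by doubling them, and the system has learned this mapping.
estimate = perceive(sensation=6.0, generative_model=lambda cause: 2.0 * cause)
print(round(estimate, 2))  # settles near 3.0, the cause that best predicts 6.0
```

The point of the caricature is only that nothing here ‘receives’ the input passively: the system converges on the percept by guessing and correcting.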
Learning in Bootstrap Heaven, and Other Benefits
To appreciate the benefits, first consider learning. Suppose you want to predict the next word in a sentence. You would be helped by a knowledge of grammar. But one way to learn a surprising amount of grammar, as work on large-corpus machine learning clearly demonstrates, is to try repeatedly to predict the next word in a sentence, adjusting your future responses in the light of past patterns. You can thus use the prediction task to bootstrap your way to the world-knowledge that you can later use to perform apt prediction.
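A toy illustration of that bootstrapping (my own, not the book’s): even crude bigram counting over a tiny corpus extracts usable word-order knowledge from nothing but the prediction task itself.

```python
from collections import Counter, defaultdict

# Toy sketch (mine, not the book's): learn which word tends to follow
# which, purely by tallying successors in a tiny hypothetical corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1          # count each observed bigram

def predict_next(word):
    """Predict the most frequently observed successor of `word`."""
    return transitions[word].most_common(1)[0][0]

print(predict_next("sat"))  # the counts alone reveal that "on" follows "sat"
```

No grammar was supplied in advance; the regularity falls out of repeatedly checking predictions against what actually came next.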
Importantly, multi-level prediction machinery then delivers a multi-scale grip on the worldly sources of structure in the sensory signal. In such architectures, higher levels learn to specialize in predicting events and states of affairs that are – in an intuitive sense – built up from the kinds of features and properties (such as lines, shapes, and edges) targeted by lower levels. But all that lower-level response is now modulated, moment-by-moment, by top-down predictions. This helps make sense of recent work showing that top-down effects (expectation and context) impact processing even in early visual processing areas such as V1. Recent work in cognitive neuroscience has begun to suggest some of the detailed ways in which biological brains might implement such multi-level prediction machines.
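As a rough two-level caricature of such an architecture (my own, and far simpler than any real proposal in the literature): the higher level tries to predict the lower level’s activity, the lower level tries to predict the sensory input, and lower-level response is shaped by both pressures at once.

```python
# Toy two-level sketch (mine, not the book's formalism): the higher level
# predicts the lower level's activity, the lower level predicts the
# sensory input, and both settle by reducing their prediction errors.

def run_hierarchy(sensation, lr=0.05, steps=2000):
    low, high = 0.0, 0.0                  # level-1 and level-2 estimates
    for _ in range(steps):
        sensory_error = sensation - low   # mismatch at the sensory boundary
        top_down_error = high - low       # how far off level 2's prediction is
        low += lr * (sensory_error + top_down_error)  # weigh both pressures
        high -= lr * top_down_error       # level 2 learns to track level 1
    return low, high

low, high = run_hierarchy(1.0)
print(round(low, 3), round(high, 3))  # both settle near the sensed value
```

Even in this cartoon, the lower level’s final state is not fixed by the input alone: it is modulated, moment-by-moment, by what the level above predicts.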
Perception of this stripe involves a kind of understanding too. To perceive the hockey game, using multi-level prediction machinery, is already to be able to predict distinctive patterns as the play unfolds. The more you know about the game and the teams, the better those predictions will be. Perception here shades seamlessly into understanding. What we quite literally see, as we watch a game, is constantly informed and structured by what we know and what we are thus already busy (consciously and non-consciously) expecting.
This, as has recently been pointed out in a New York Times piece by Lisa Feldman Barrett, has real social and political implications. You might really seem to start to see your beloved but recently deceased pet enter the room, when the curtain moves in just the right way. The police officer might likewise really seem to start to see the outline of a gun in the hands of the unarmed, cellphone-wielding suspect. In such cases, the full swathe of good sensory evidence should soon turn the tables – but that might be too late for the unwitting suspect.
On the brighter side, a system that has learnt to predict and expect its own evolving flows of sensory activity in this way is one that is already positioned to imagine its world. For the self-same prediction machinery can also be run ‘offline’, generating the kinds of neuronal activity that would be expected (predicted) in some imaginary situation. The same apparatus, more deliberately seeded and run, may enable us to try out problem solutions in our mind’s eye, thus suggesting a bridge between offline prediction and more advanced (‘simulation-based’) forms of reasoning.
Thinking about perception as tied intimately to multi-level prediction is also delivering new ways to think about the emergence of delusions, hallucinations, and psychoses, as well as the effects of various drugs, and the distinctive profiles of non-neurotypical (for example, autistic) agents. In such cases, the delicate balances between top-down prediction and the use of incoming sensory evidence may be disturbed. As a result, our grip on the world loosens or alters in remarkable ways.
But there is a problem, or at least, a potential hiccup. These ways of pitching the story, though perfectly correct as far as they go, can sometimes give a rather misleading impression. They can give the impression that the brain is in the business of searching for the hypothesis that best explains the sensory data. And that in turn (unless it is heard very carefully indeed) can make it sound as if brains are organs whose guiding rationale is representational – as if they are restless organs forever seeking to find the picture of the world that best accommodates the sensory evidence, given what they already know about the world.
The trouble with this way of pitching the story is pretty evident – at least from the standpoint of a more embodied approach to the mind. It makes action play second fiddle to something like representational fidelity. But a moment’s reflection ought to convince us that it is action – not perception – that real-world systems really need to get right. Perceiving a structured scene is adaptively pointless unless it enables you to do something better, or to avoid doing something bad. And that, in an often hostile, time-pressured world, means that fidelity needs to be traded against speed. Indeed, even if you remove the time-pressure, it makes adaptive sense to devote as little energy as you can get away with to encoding a picture of the world. Because all that really matters (for any adaptive purpose that I can think of) is what you do, not what you see or perceive.
This poses an interesting puzzle. For predictive processing stories do seem to make perception (via the reduction of sensory prediction error) paramount. Even the accompanying accounts of action (next post) make action depend on the reduction of a select subset of sensory prediction error.
Fortunately, this tension is merely apparent. Properly understood, predictive processing is all about efficiently translating energetic stimulation into action. In the next post I’ll try to sketch that part of the story, and thus motivate a slightly different way of talking about these strange architectures.