I’m an optimist by nature, and so Surfing Uncertainty mostly explores the positives, laying out the surprising reach and potential of the ‘predictive brain picture’ and celebrating its near-perfect interlock with work on the embodied, extended, and enactive mind. But there’s no doubt that the picture still has plenty of holes in it. There is much that is incomplete and unclear, and much that may well be proven false as the science unfolds. To round off this short run of blog posts, then, I thought I should visit the three biggest potential ‘accident black-spots’ on our (otherwise quite scenic) route.
- Predictive Coding?
Surfing Uncertainty has a kind of double flavor. Much of it is really about a quite general vision of the brain as a multi-level engine of prediction. But it also explores a more specific proposal about the detailed shape and character of that multi-level prediction machine. This is the proposal known as predictive coding. Predictive coding depicts the brain as relying heavily upon a strategy (familiar from real-world data compression techniques such as motion-compressed video) in which predicted signal elements are ignored or suppressed and only residual errors (prediction errors) get to drive further response and learning. This is a neat story that builds useful bridges with work on neural efficiency. But it is not yet known whether this is the strategy that is always, or even mostly, used by multi-level prediction machinery in the brain. There are many ways in which to combine top-down predictions and bottom-up information. Predictive coding characterizes just one small corner of this large and varied space.
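To make the residual-coding strategy concrete, here is a minimal Python sketch under assumptions of our own: the ‘signal’ is a list of numbers, the predictor is a single learned weight applied to the previous sample, and learning uses a simple error-driven (LMS-style) rule. Only the residuals are passed on, and the weight changes only in response to them. The function names are hypothetical, and this is a toy illustration of the compression idea, not a model of any particular neural circuit.

```python
def predictive_code(signal, lr=0.1):
    """Encode a signal as prediction errors; the predictor learns only from those errors."""
    w = 0.0           # weight of the (hypothetical) one-step linear predictor
    previous = 0.0
    residuals = []
    for sample in signal:
        prediction = w * previous
        error = sample - prediction        # the residual: all that gets passed on
        residuals.append(error)
        w += lr * error * previous         # LMS-style update driven solely by the error
        previous = sample
    return residuals, w


def decode(residuals, lr=0.1):
    """Rebuild the signal by running the same predictor on the same errors."""
    w = 0.0
    previous = 0.0
    signal = []
    for error in residuals:
        sample = w * previous + error      # prediction plus residual
        signal.append(sample)
        w += lr * error * previous         # identical update keeps the decoder in step
        previous = sample
    return signal


static_scene = [1.0] * 50                  # an unchanging input, like a static video region
residuals, w = predictive_code(static_scene)
print(round(residuals[1], 3), round(residuals[-1], 4))
```

For a perfectly static ‘scene’ the residuals shrink towards zero as the prediction improves, so less and less needs to be transmitted; the decoder can still rebuild the signal because it runs the same predictor on the same errors.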
- Full Unity, Restricted Unity, or Meta-Unity?
Nor is it known to what extent multi-level prediction machinery (whether or not it implements predictive coding) characterizes all, or even most, aspects of the neural economy. This is perhaps most clearly evident when we move away from the realm of cortical processing to confront the multiple sub-cortical structures that are responsible for huge swathes of biological response.
Even where multi-level prediction machinery exists, it may not always be used in online response. It seems entirely possible – even within cortical processing regimes – that certain contexts may recruit simple feed-forward strategies if they can enable some desired behavior. So is predictive processing playing some role throughout much (or even all) of the neural economy, or is it really just one more trick in the brain’s ever-expanding toolkit? The jury is out.
To make this issue even more challenging, it seems increasingly likely (see previous post) that decisions about what cognitive strategy to use, and when, may themselves rely upon various forms of self-estimated uncertainty. That means that even if some specific online strategy does not itself involve top-down prediction, the larger apparatus of uncertainty estimation may be implicated in the selection of that strategy at that time (perhaps courtesy of variable precision-weighting mechanisms). This would provide for a kind of meta-level unity, in which core estimations of relative uncertainty select among strategies some of which do not involve prediction or (additional) top-down influence.
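One way to picture the meta-level proposal, purely as an illustrative sketch: each available strategy keeps a record of its recent errors, the system estimates each strategy’s precision (here, crudely, the inverse of its mean squared error), and the most precise strategy is selected, whether or not that strategy itself uses top-down prediction. The strategy names, the error histories, and the use of inverse mean squared error as a stand-in for precision are all assumptions made for the example.

```python
from statistics import fmean


def precision(errors):
    """Crude self-estimate of reliability: inverse of the mean squared error."""
    mse = fmean(e * e for e in errors)
    return float("inf") if mse == 0 else 1.0 / mse


def select_strategy(error_history):
    """Choose the strategy whose recent errors imply the highest precision.

    The selected strategy may itself be purely feedforward; only the
    selection step relies on uncertainty estimation.
    """
    return max(error_history, key=lambda name: precision(error_history[name]))


# Hypothetical recent errors for two strategies in some context.
history = {
    "feedforward": [0.1, -0.2, 0.1],
    "top_down_prediction": [0.5, -0.4, 0.6],
}
print(select_strategy(history))  # prints: feedforward
```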
- Motivation, Novelty, and Exploration
Finally, imagine you are (just as the full predictive processing story suggests) constantly self-organizing so as to reduce your own (most salient, high-precision) prediction errors. Where does that leave us with regard to the fullness and richness of a human life? Wouldn’t that constant drive to minimize prediction error make you fundamentally novelty-averse, unable to explore, fatally attracted (perhaps) by quiet dark corners where your sad, simple predictions – of darkness and silence and lack of food – all come true? This is the worry sometimes glossed as the apparent fatal allure – for the predictive brain – of the ‘darkened room’.
The official response to this worry involves an appeal to the shaping forces of evolution, and hence to the nature of preferred embodied exchanges with the world. For the deep point of minimizing prediction error is to stay viable in a complex and changing world. That, in turn, may mean being set up so as to ‘predict’ food, water, and general flourishing.
On the face of it, this seems plainly false. Surely I may come to predict lack of food (e.g. during a famine). But that may be to stop at too superficial a level. For although I, the conscious reporting agent, may not be expecting any food, perhaps my brain (or parts of it, that have special links to action) ‘thinks’ otherwise. Perhaps I am a prediction machine that at some level cannot help but predict being fed and watered, that cannot help but expect to be provided with warmth, love, and mates. My search for food, despite my conscious prediction of continuing hunger, may thus reflect a deeper kind of fossilized or sedimented prediction, one most likely merely implicit in the interaction of many simple (‘homeostatic’) mechanisms.
It is by no means clear, however, that these kinds of bedrock evolved ‘predictions’ are really predictions at all. Instead, they seem more like deep structural features (more akin perhaps to healing skin than cascading prediction) that help ‘fit’ the active organism to its niche. In the book, I treat these features not as implicit predictions but rather as part of the larger landscape against which faster time-scale processes of (genuine) prediction and prediction-error minimization must operate. Other aspects of that larger landscape include, for example, the animal’s morphology (shape) and the materials from which it is built.
Sometimes, I wonder whether I am being too unimaginative here. Someone used to thinking about Newtonian space and time might have had trouble accepting the usage involved in thinking about curved space-time. But the unifying pay-off, at least in the case of general relativity (!), is well worth stretching our initial understanding for. Might it be the same with life, prediction, and mind?
Whichever route we favour (my ‘backdrop for prediction and surprise minimization’ notion or the more extended uses of prediction and surprise minimization found in the literature), the simplest worry about darkened rooms is resolved. But a related question remains. It is the question of how best to factor human motivation into the overall story. Are human motivations (specific desires for this or that, or even general desires for play, novelty, and pleasure) best understood as disguised implicit or explicit predictions – predictions that there will be such-and-such, or that there will be play, novelty, and pleasure? Or is this a class of cases in which the vision of the predictive, surprise-minimizing brain needs to be complemented by some other kind of vision – traditionally, one that highlights value, desire, and cost? Here too there is promising and provocative work, much of it turning on the way desire and motivation might implicate varying precision estimations. But the jury is again out.
The Future of Prediction
In many ways, these worries are all variations on a theme. That theme is scope.
The potential reach and unifying power of these accounts is nothing short of staggering. By self-organizing around prediction error, and by learning a generative rather than a merely discriminative (i.e. pattern-classifying) model, these approaches realize many of the dreams of previous work in artificial neural networks, robotics, dynamical systems theory, and even (dare I say it) classical cognitive science. They perform unsupervised learning using a multi-level architecture, and acquire a satisfying grip – courtesy of the problem decompositions enabled by their hierarchical form – upon structural relations within a domain. They do this, moreover, in ways that are firmly grounded in patterns in their own sensorimotor encounters. Courtesy of precision-based restructuring of webs of effective connectivity, they can nest simplicity within complexity, and make as much (or as little) use of body and world as task and context dictate. They thus have the potential to deliver shallow (‘model-free’) and deeper (‘model-rich’) modes of response within a single integrated regime, and may reveal the deep principles underlying the co-construction of perception, action, reason, and imagination.
This is a tempting package indeed. But is prediction really the main, or even the most important, tool underlying fluid adaptive intelligence? Or is multi-level neural prediction just one small cog in the complex evolved adaptive machine? I’m betting on it being way, way more than just one small cog. But beyond that, your (multi-level, precision-modulated) guess is as good as mine.
Do a Find/Replace on this blogpost, simply replacing “brain” by “mind”.
Does the result seem weird? Does it make claims that one would want to challenge, when one would not want to challenge the corresponding claim in the original document?
If the answer is No, what should we conclude?
Dear Max,
How lovely to hear from you – and with an elegant parlour game to boot.
I did the search and replace and (so far) can only find one place where the result seemed odd. I’ll deal with that first, then turn to what may be your real question!
The odd result is where I write:
“For although I, the conscious reporting agent, may not be expecting any food, perhaps my brain (or parts of it, that have special links to action) ‘thinks’ otherwise. Perhaps I am a prediction machine that at some level cannot help but predict being fed and watered, that cannot help but expect to be provided with warmth, love, and mates.”
Replacing (as per your suggestion) ‘brain’ with ‘mind’ we get:
“For although I, the conscious reporting agent, may not be expecting any food, perhaps my mind (or parts of it, that have special links to action) ‘thinks’ otherwise.”
That does seem odd I agree (and that’s why I said ‘brain’ not ‘mind’, and also why ‘thinks’ is in scare quotes). But I am comfortable with the notion that there may be a mismatch between some of the contents endorsed by me (as a conscious reflective agent) and some of the stuff computed by my brain, that nonetheless guides aspects of my behaviour. This seems especially likely where powerful homeostatic mechanisms are involved.
Now, perhaps that is the very point you are making. Is there something fishy about the ease of replacement? For that post, I don’t think so. Most of what I say about the brain in that post could indeed be said loosely (but not incorrectly) of the mind. That’s unsurprising since worries about the ‘predictive brain’ story would make people wonder about claims that we deploy a ‘predictive mind’. Nonetheless, I think there is more to minds than what brains do, and I can’t really make sense of talk about hierarchical probabilistic processing in the mind. That’s a processing story that seems to me to target a mechanism, and that mechanism is the brain.
Well, those are my first thoughts. To recall another parlour game – am I getting warm yet?
Hi Andy,
thanks for another great post. It made me think of a looming question that came up several times at a recent (and excellent) workshop organized by David Chalmers and Ned Block at NYU. The question is whether PP is falsifiable and — if so — in what way it is falsifiable. It seems that some of the issues you raise here — for example in 3 — may be partly conceptual. Whether human motivations, such as specific desires, can be understood as disguised predictions may be — at least partly — a conceptual matter. But some of the issues may be empirical — for example, whether the brain actually does predictive coding (as described in 1). How do you see this issue? In what way is PP falsifiable, if it is?
(I realize that you may just repeat what you said in the post in response to this, but I wanted to make this question explicit).
Thanks again,
Nico
Dear Nico,
Great question! I waver a little on this issue.
There is a very large-scale picture here that strikes me as verging on the tautological – that creatures that persist avoid surprising exchanges with the environment. On some readings of ‘surprising’ this is probably a tautology.
But that doesn’t make it uninteresting. One potent and flexible (learning-enabled) way to avoid such exchanges is to be a system organized in the broad way suggested by PP. But I can’t see how to test PP itself. That’s because it is a mechanism-schema more than a mechanism. What we CAN test, however, are specific proposals concerning the implementation of PP by the brain.
That’s what we see in work by Lars Muckli, by Bruno Kopp, by Egner, Monti, and Summerfield, etc. If we don’t get good evidence that some PP-implementation is instantiated by the brain, we should obviously abandon PP – and perhaps look elsewhere in the large space of prediction/generative model involving scenarios.
Yes, great question Nico!
Andy – my understanding is that Friston himself tends to view most (all?) of that stuff in the “large space of prediction/generative model involving scenarios” you refer to as versions of PP, which does rather raise the spectre of tautology, or at least triviality in some weaker sense. It would be helpful to have an example on the table that can’t (or at least shouldn’t) be given a PP reading. Then we can try to force a PP reading on it and see if it works; and if it doesn’t, see why not.
Dear Dan,
I don’t think there is any shortage of possible exemplars! Take, for example, Hinton’s digit recognition network. This is trained using prediction-learning, but then runs in purely feedforward fashion. So that’s a case of a system where there is probabilistic generative-model based learning but where subsequent behaviour is not generated in the PP fashion.
One way to try to suck that kind of case into PP would be to go meta – perhaps by suggesting that even a purely feedforward flow of influence can be selected and enabled (in context) by setting the value of PE regarding certain inputs to zero. Prediction error for those inputs would then be systemically inert. I’m not sure how (or if) this would work in practice though.
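The contrast can be illustrated with a deliberately tiny linear stand-in (this is not Hinton’s network, and the weights are made up). A generative model predicts inputs as x ≈ W z. A ‘feedforward’ recogniser maps inputs to causes in a single pass with fixed recognition weights R, while a PP-style system settles on z by repeatedly sending down predictions and passing precision-weighted prediction error back up. Setting an input’s precision to zero, one reading of the ‘going meta’ suggestion, makes that input’s error systemically inert. All names and numbers here are illustrative assumptions.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


W = [[2.0, 0.0], [1.0, 1.0]]   # hypothetical generative weights: x is predicted as W z
R = [[0.5, 0.0], [-0.5, 1.0]]  # hypothetical recognition weights (here just W's inverse)


def feedforward(x):
    """One pass, no top-down prediction: how a trained recognition net can run."""
    return matvec(R, x)


def pp_inference(x, precision=(1.0, 1.0), lr=0.1, steps=500):
    """Settle on causes z by descending precision-weighted prediction error."""
    z = [0.0, 0.0]
    wt = transpose(W)
    for _ in range(steps):
        prediction = matvec(W, z)                       # top-down prediction of x
        weighted = [p * (xi - pred)                     # precision-weighted error
                    for p, xi, pred in zip(precision, x, prediction)]
        step = matvec(wt, weighted)                     # error passed back up
        z = [zi + lr * si for zi, si in zip(z, step)]
    return z


print(feedforward([2.0, 3.0]), pp_inference([2.0, 3.0]))
# With zero precision on the second input, changing that input changes nothing:
print(pp_inference([2.0, 3.0], precision=(1.0, 0.0)),
      pp_inference([2.0, 100.0], precision=(1.0, 0.0)))
```

Both routes recover the same causes here; the difference is that the PP route gets there by iterated prediction and error correction, and its sensitivity to each input is governed by precision.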
Hi Andy,
Thank you for the great posts!
Do you have any thoughts on how solutions to the “dark room” problem would allow for the “prediction error minimization” thesis to account for aesthetically driven actions/behaviors/responses to stimuli that seem to seek out novelty?
For example, a well-done plot twist in a movie frustrates our expectations, but seems enjoyable.
Also, music appreciation sometimes involves novel tweaks to familiar structures that “surprise” the ear.
It seems that in both these cases it is the frustration of prediction error minimization that drives us to see certain movies or listen to certain music.
It isn’t clear to me that these examples are just specific versions of the “darkened room” scenario, but it does seem that they would need to be explained nonetheless.
Do you feel that they could be explained generally with a solution to “darkened room” or might they admit of a different explanation which could imply that minimization of prediction error is not the complete story for human brains?
Again, thank you for writing!
If I have misconstrued anything, I apologize.
Dear John,
That’s a great worry to pursue. I say quite a bit about this in Chapters 8 and 9 of the book, but here’s a taster.
It seems very plausible that evolution and lifetime learning (acting individually and together) will install policies that actively favour increasingly complex forms of novelty-seeking and exploration. A policy, in this sense, is just a rather general action selection rule – one that entails whole sequences of actions rather than a single act. The simplest such policy is one that reduces the value of a state the longer that state is occupied.
From the PP perspective, this corresponds to actively predicting that we will not stay in the same state for extended periods of time. As our trajectories through space and time unfold, potentially stable stopping points (attractors, in the language of dynamical systems) constantly arise and dissolve, often under the influence of our own evolving inner states and actions.
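The simplest such policy can be sketched in a few lines. In this hypothetical Python toy (the decay rate and state values are invented for illustration), a state’s value shrinks geometrically with time spent in it, and the agent moves on as soon as some other state looks better. With no decay the agent settles permanently in its initially best state, a miniature ‘darkened room’; with decay it keeps moving.

```python
def dwell_value(base_value, time_in_state, decay=0.8):
    """Hypothetical policy: a state's value shrinks the longer it is occupied."""
    return base_value * decay ** time_in_state


def simulate(base_values, steps=10, decay=0.8):
    """Stay in the current state until some other state looks more valuable.

    Assumes at least two states; base_values[i] is state i's undecayed value.
    """
    state, dwell, path = 0, 0, []
    for _ in range(steps):
        path.append(state)
        dwell += 1
        current = dwell_value(base_values[state], dwell, decay)
        others = [s for s in range(len(base_values)) if s != state]
        best_other = max(others, key=lambda s: base_values[s])
        if base_values[best_other] > current:
            state, dwell = best_other, 0   # move on; the new state starts fresh
    return path


print(simulate([1.0, 0.9], decay=1.0))  # no decay: never leaves state 0
print(simulate([1.0, 0.9]))             # with decay: keeps moving between states
```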
Here’s an example drawn from the book. In a paper aptly called ‘The Goldilocks Effect’, Kidd and colleagues conducted a series of experiments with 7- and 8-month-old infants, measuring attention to sequences of events of varying (and well-controlled) complexity. Infant attention, they found, was focused upon events presenting an intermediate degree of predictability – neither too easily predictable, nor too hard to predict. The probability of an infant looking away was thus greatest when complexity (calculated as negative log probability) was either very high or very low.
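The complexity measure itself is easy to state: an event that occurs with probability p carries a surprisal of -log p. The sketch below computes it (in bits, which is our assumption; the paper’s exact formulation may differ) and pairs it with a purely illustrative U-shaped look-away curve whose ‘sweet spot’ and width are invented parameters. It shows the shape of the finding, not Kidd and colleagues’ model or data.

```python
import math


def surprisal(p):
    """Complexity of an event as negative log probability (base 2, so in bits)."""
    return -math.log2(p)


def look_away_prob(p, sweet_spot=2.0, width=1.5):
    """Purely illustrative U-shape: lowest at a hypothetical 'sweet spot' of surprisal."""
    d = (surprisal(p) - sweet_spot) / width
    return 1.0 - math.exp(-d * d)


for p in (0.99, 0.25, 0.01):   # very predictable, intermediate, very surprising
    print(p, round(surprisal(p), 2), round(look_away_prob(p), 2))
```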
Such tendencies to seek out ‘just-novel-enough’ situations would cause active agents to self-structure the flow of information in ways ideally suited to the incremental acquisition and tuning of an informative generative model of their environment. More generally still, agents that inhabit complex, changing worlds would be well-served by a variety of policies that drive them to explore those worlds, even when no immediate gains or rewards are visible.
Extending this perspective, Schwartenbeck et al., in a recent piece called ‘Exploration, novelty, surprise, and free energy minimization’, suggest that certain agents may acquire policies that positively value the opportunity to visit new states. For such agents, the value of some current state is partially determined by the number of possible other states that it allows them to visit. Policies such as these could be generated and nuanced in different ways in different domains.
This brings us finally to the kinds of domain you mention. For the complex human-built environments of art, literature, and science are prime examples of domains in which we deliberately train human minds to expect certain kinds of exploration and novelty. In structuring our worlds so as to install generative models apt for science, art, and literature, we thus structure ourselves in ways that promote increasingly rarefied patterns of exploration and novelty-seeking.
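A toy version of that idea: treat the world as a graph of states, and give each state a bonus proportional to how many other states can be reached from it. The graph, the ‘extrinsic’ values, the weighting, and the choice to count all reachable states (rather than, say, immediate neighbours) are hypothetical choices for illustration, not Schwartenbeck et al.’s formulation.

```python
from collections import deque


def reachable(graph, start):
    """Return every state reachable from start, excluding start itself (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def state_value(graph, state, extrinsic, novelty_weight=0.5):
    """Hypothetical: extrinsic value plus a bonus for the options a state keeps open."""
    return extrinsic.get(state, 0.0) + novelty_weight * len(reachable(graph, state))


world = {
    "dark_room": [],                      # a dead end: nothing further to visit
    "doorway": ["dark_room", "hall"],
    "hall": ["garden", "library"],
    "garden": [],
    "library": [],
}
extrinsic = {"dark_room": 1.0, "doorway": 0.5}
print(state_value(world, "dark_room", extrinsic), state_value(world, "doorway", extrinsic))
```

Even though the doorway is worth less in itself here, the options it opens up make it the more valuable place to be.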
This incremental cultural self-scaffolding is perhaps the culmination of humanity’s escape from the lure of the darkened room!
Hi Andy
Great post, thank you.
Predictive processing is a hot topic at the moment, and I think that one area it could benefit from is neuroeconomics (specifically Glimcher on value). As it is, predictive processing explains processes by looking at the nature and function of the system, but it says very little about the role of neurotransmitters and hormones. Looking at work done by Schultz and Mirenowicz in the late 90’s, we see that value is processed in the dopaminergic system in exactly the same manner as we now propose perceptual stimuli are processed in the visual system.
This can then also answer John’s question on why surprising plot twists are more enjoyable. When we engage in a narrative, we are solving a problem that is presented to the characters on screen. When there is a twist in the plot, prediction error is greater and our cognitive system is under ‘stress’. The greater the stress, the greater the relief when the problem in the narrative is solved resulting in a greater activation of the dopaminergic system.
I agree with your point that we are explorers and that we seek stimuli that can update the models we have of our world. This could also explain why one film could be more enjoyable than another. If a film presents a problem relevant to its target market with a novel solution, it is expected that that film will be more enjoyable than a film presenting a relevant problem with an expected solution (predictable ending?).
I’d like to hear your thoughts on this as it is a project I am working on at the moment and the ideas are still under construction.
Dear Elmarie,
Great to hear from you – and sorry for the delay (standard start of semester madness). I don’t quite agree that neurotransmitters currently play little role in PP, as dopamine is centrally implicated in precision-weighting, which is a core PP mechanism. But the issue of how best to incorporate ‘value’ remains a hard and crucial one.
Friston and colleagues think that dopaminergic variation helps encode the precision of prediction error. The idea here is that “dopamine might not encode the prediction error on value (Schultz et al., 1997) but instead the value of prediction error, i.e., the precision-weighting of prediction errors (Friston, 2009).” (quote from Mathys, C., Daunizeau, J., Friston, K. J., & Stephan, K. E. (2011). A Bayesian foundation for individual learning under uncertainty. Frontiers in Human Neuroscience, 5(May), 39. doi:10.3389/fnhum.2011.00039). Could our intuitive notions of ‘value’ be reconstructed in terms of varying precision estimations in different areas and at multiple time-scales? It’s not yet clear.
The other issues you raise are equally challenging. Does high prediction error automatically equate to some form of (I guess conscious, agentive) stress? It’s not obvious. In visual perception, for example, I might generate a swathe of high-precision PE before settling on a percept, with no systemic stress at all. But perhaps persisting unresolved high-precision prediction error does cause something like systemic stress. This is one of the key ‘pressure points’ at which we might try to explore the complex relations between PP as a story about sub-personal processing and PP as a story that bears on personal-level experience. That’s an area that desperately needs more attention, and your project sounds as if it could be very helpful in this regard. Look forward to hearing more about it as it unfolds.
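The contrast between ‘the prediction error on value’ and ‘the value of prediction error’ can be made concrete with the standard one-step Gaussian belief update, in which the same raw prediction error moves the estimate more or less depending on the precision assigned to the incoming signal. This is offered only to show what precision-weighting does; it is a simplification, not the full hierarchical model in the Mathys et al. paper, and it says nothing about how dopamine might implement it.

```python
def update_belief(mu, pi_prior, x, pi_input):
    """One Gaussian belief update with precision-weighted prediction error.

    mu, pi_prior: current estimate and its precision (inverse variance)
    x, pi_input:  new observation and the precision assigned to it
    """
    delta = x - mu                    # raw prediction error
    pi_post = pi_prior + pi_input     # precisions add
    weight = pi_input / pi_post       # how much this error is allowed to count
    return mu + weight * delta, pi_post


# The same raw prediction error (4.0) under three precision settings:
for pi_input in (0.0, 1.0, 3.0):
    print(pi_input, update_belief(0.0, 1.0, 4.0, pi_input))
```

The identical error of 4.0 shifts the estimate by 0.0, 2.0, or 3.0 depending solely on the precision it is assigned.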
Nico, Andy and Dan,
apologies for chipping in far too late. On the worries about tautology, I think Andy is spot on, minus the wavering.
Specifically, the very large picture does entail a tautology, but one doesn’t need to find this worrying at all, on the contrary, it’s a plus.
Consider natural selection. At its core, you can highlight the tautological side of the concept:
What is better equipped to persist and reproduce will persist and reproduce better than what isn’t.
However, serious people won’t fling tautological mud at evolutionary theory (ET). Why not? Because ET starts from natural selection and builds a theory on top of it; the end result is a mechanistic theory of extraordinary explanatory power. In this context, the tautological core provides a solid, unassailable foundation: as Andy points out, the debate rightly focuses on the level of the mechanistic explanations produced, and the tautological core is not, and should not be, questioned.
So, for the predictive brain (and predictive life, à la Friston), the real question becomes “can we produce mechanistic explanations that carry a useful amount of explanatory power?”. If you unpack that question you get: “can we produce specific models of brain mechanisms based on prediction and prediction error minimisation, and do these models produce verifiable (preferably surprising) new predictions (i.e. gain new predictive power)?”.
Once again, Andy hits the mark in pointing out that this second sort of question is still largely to be answered, but nevertheless, people are getting excited because the potential explanatory power (as a function of the unifying potential, I presume) seems huge.
Which sends me back home: I don’t think we need to worry about the unfalsifiable side of prediction-based understanding of brains; on the contrary, I see the tautological side as the source of all the hope associated with the idea. Added bonus: it allows us to question and challenge simplistic views of the scientific method, which is usually good fun and generally useful.
FWIW, I’ve written some more articulated thoughts on this here.
Dear Sergio,
Thanks for this. I agree 100%. Neat blog-post too.
Maybe worth noting, though, that the tautology (about avoiding certain kinds of surprising exchange with the environment) could be true even if moment-by-moment the brain was not generating anything like predictions at all. For example, a very simple system using only feedforward resources could be so well evolved as to be avoiding surprising exchanges with the environment. This speaks to a possible gap between ‘meaty’ visions of a predictive brain and the more all-encompassing (and tautologically-grounded) vision.
Dear Andy,
thanks so much! Very nice to see that we’re on the same page.
Your additional comment immediately planted the seed of two new ideas (new to me!). Both are embryonic, so I apologise for the public wild-speculation that follows.
The first idea is perhaps a bit mundane, and is merely unpacking your main point:
The moment we include action in our broad picture, it becomes possible to model virtually any organism as a Prediction Error Minimisation (PEM) Engine – as per the canonical thermostat example. Thus, at the largest scale, the picture shows its tautological side. On the other hand, if we restrict the scope, and look only at the brain, or even better, only within some narrowly defined brain subsystem (be it a brain region, a particular neural network, and so forth), then we can propose the hypothesis “this particular subsystem acts as a PEM Engine in ecologically relevant circumstances”. In turn, this second kind of hypothesis is not necessarily tautological, and is the kind of work where the general idea may find (if confirmed) its real-world usefulness.
I would even go one step further and propose a corollary, probably inspired by my own expectations, sensibility and no doubt biases (given my personal background): it seems to me that picking smaller and smaller subsystems would make a successful attempt more and more interesting (useful and surprising). Suppose we attempt to interpret whole-brain behaviour (in terms of I/O) in this way. If the hypothesis that the vast majority of an organism’s behaviours are controlled by the brain is correct, it should follow that the attempt has to be possible; the only interest in this operation would lie in picking the right interpretation of the I/O signals (not a small thing in itself). Similarly, if we “just” consider a single hemisphere, I would still expect the tautological side to be close enough to guarantee some sort of success. However, if we drill down, across brain regions, all the way to neural architectures (in terms of connectivity patterns between neurons of various types), and we still find a convincing way to describe the dynamics of neural firing in terms of PEM, this operation will be very interesting and come with plenty of new explanatory insights.
In other words, the tautological side pertains to whole systems/organisms, and the more we drill down into the details of subsystems, the less it applies. Thus the informative potential increases as we home in on finer and finer subsystems. (If you like: the prior probability of successfully finding a way to describe a subsystem in terms of prediction error minimisation decreases as our subsystem of choice gets smaller.)
This way of looking at the problem should pacify even the most militant “falsificationists” (nothing wrong with them!), and it also helps me understand why I don’t get too excited when I read of fMRI results interpreted in PEM terms…
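The canonical thermostat example mentioned above can be written as a minimal sketch of a PEM engine that works through action: its setpoint plays the role of the ‘prediction’, and instead of revising that prediction it acts on the world until the sensed temperature comes closer to it. The parameter names and values (gain, leak rate, temperatures) are invented for the example.

```python
def run_thermostat(setpoint=20.0, outside=10.0, temp=12.0, steps=200,
                   gain=0.5, leak=0.1):
    """A thermostat read as a PEM engine that minimises error by acting.

    The setpoint is its fixed 'prediction'; all parameters are illustrative.
    """
    for _ in range(steps):
        error = setpoint - temp                      # prediction error
        heating = max(0.0, gain * error)             # act to reduce the error (heat only)
        temp += heating - leak * (temp - outside)    # heat in, heat lost to the outside
    return temp


print(round(run_thermostat(), 2))
```

With purely proportional heating the room settles where heat supplied matches heat lost, at about 18.3 degrees rather than exactly 20, so the error is reduced but not eliminated; adding an integral term would remove that offset.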
The other idea, on the other hand, is just the embryo of something which might work. It’s grossly underdeveloped and stems from your “[a] system using only feedforward resources could be so well evolved as to be avoiding surprising exchanges with the environment” statement. I’ve been struggling to neatly describe the relation between good old evolutionary theory and prediction error minimisation. That’s because I have the intuition that they both refer to the same core kind of process, but whenever I try to put them together in a sort of unifying meta-construct, the result is messy and ultimately unconvincing. The problem is to define what carries the feedback error signal in an evolutionary mechanism, and to do so without creating over-complicated conceptualisations that inevitably smell of “ad-hoc”, or “for the sake of it” wishful thinking. In this context, your phrase points to the Baldwin effect and its supposed “simplifying” drive towards “well evolved” (very specifically adapted) organisms. In turn, this points to the significance of the speed of ecological variation. The faster the changes, the more each single individual would benefit from being adaptable, and thus, the more need there is for feedback “prediction error” signals. I don’t really know where to go with this tiny proto-idea (I would be surprised if it is new!), but it seems something worth exploring.
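Purely as a toy to make the proto-idea tangible (nothing here models evolution or the Baldwin effect): a world whose optimum drifts at a fixed rate, a ‘well evolved’ fixed strategy tuned to the starting world, and an adaptive strategy corrected each step by its own prediction error. The drift rates, learning rate, and names are assumptions made for illustration.

```python
def mean_abs_error(drift, steps=100, lr=0.3):
    """Compare a fixed strategy with an error-driven one in a world whose optimum drifts."""
    target = 0.0
    fixed_guess = 0.0          # 'well evolved' for the starting world, never updated
    adaptive_guess = 0.0       # corrected each step by a feedback prediction-error signal
    fixed_err = adaptive_err = 0.0
    for _ in range(steps):
        target += drift
        fixed_err += abs(target - fixed_guess)
        error = target - adaptive_guess
        adaptive_err += abs(error)
        adaptive_guess += lr * error
    return fixed_err / steps, adaptive_err / steps


for drift in (0.0, 0.1, 1.0):
    print(drift, mean_abs_error(drift))
```

In a static world the two strategies do equally well; the faster the world drifts, the larger the adaptive strategy’s advantage becomes.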