Thanks to John Schwenkler for the invitation to guest-blog this week about my new book Surfing Uncertainty: Prediction, Action, and the Embodied Mind (Oxford University Press NY, 2016).
In the previous post, I spoke about the emerging view of the perceiving brain as a prediction machine. Brains like that are not cognitive couch-potatoes, passively awaiting the next waves of sensory stimulation. Instead, they are pro-active prediction engines constantly trying to anticipate the shape of the incoming sensory signal. I sketched a prominent version of this kind of story (for a fuller sketch try here). Today, I want to contrast two ways of understanding that story.
First Way: Conservative Predictive Processing (CPP)
Suppose we ask: What’s going on when a predictive processing (PP) system deals with sensory inputs? What happens, CPP says, is best seen as the selection of the hypothesis best able (given prior knowledge and current estimated sensory uncertainty) to explain the sensory data. Prediction error, on this account, signifies information not yet explained by an apt hypothesis.
This kind of vision dominated the early history of work on the predictive brain, and is visible in many recent treatments (including some of my own) too. It is not exactly wrong. But I increasingly believe that it is at least potentially misleading.
Second Way: Radical Predictive Processing (RPP)
RPP flows, it seems to me, from taking some further aspects of the PP framework very seriously indeed.
The place to start is with Karl Friston’s notion of ‘active inference’. The core idea is that there are two ways for brains to match their predictions to the world. Either find the prediction that best accounts for the current sensory signal (perception) or alter the sensory signal to fit the predictions (action). If I predict I am seeing my cat, and error ensues, I might recruit a different prediction (e.g. ‘I am seeing the computer screen’). Or I might move my head and eyes so as to bring the cat (who as it happens is right here beside me on the desk) into view. Importantly, that flow of action can itself be brought about, some of this work suggests, by a select sub-set of predictions. Action is thus (see also Lotze, 1852; James, 1890) a kind of self-fulfilling prophecy (an idea that has resonances in contemporary sports-science). The resulting picture is one in which perception and action are manifestations of a single adaptive regime geared to the reduction of organism-salient prediction error.
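To make those two routes concrete, here is a deliberately toy sketch (the one-dimensional set-up, the numbers, and the function names are mine, purely for illustration – nothing in PP is committed to anything this simple). Error shrinks either because the prediction moves toward the evidence, or because action moves the evidence toward the prediction:

```python
# Toy illustration only: one scalar 'world state', one prediction.
# Prediction error can shrink by perception (revise the prediction)
# or by action (change the world that the senses are sampling).

def prediction_error(prediction, sensed):
    return sensed - prediction

def perceive(prediction, sensed, rate=0.5):
    # perceptual inference: nudge the prediction towards the evidence
    return prediction + rate * prediction_error(prediction, sensed)

def act(world, prediction, gain=0.5):
    # active inference: change the world so the evidence moves towards the prediction
    return world + gain * (prediction - world)

world, prediction = 10.0, 2.0                  # e.g. cat off to one side; 'I am seeing my cat'
for _ in range(10):
    prediction = perceive(prediction, world)   # a bit of perception...
    world = act(world, prediction)             # ...and a bit of action (move head and eyes)

print(round(prediction, 2), round(world, 2))   # the two converge; error is quashed
```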
This already hints at a beguiling picture of embodied flow in which action and perception are continuously co-constructed around the same basic computational routine. For perception and action are now locked in a circular causal embrace. Perceptual states inform actions that help elicit, from the world, the very streams of multi-modal stimulation that they predict.
This circular causal embrace is further structured by an endemic drive towards frugality. This is because the goodness of a predictive model, within PP, is a joint function of the success of the model in guiding apt actions and responses, and the frugality of the model itself (a model with fewer parameters is, ceteris paribus, more frugal than one with more parameters). Ultimately, this flows from a deeper imperative to maintain organismic viability at minimal (long-term average) energetic cost. But proximally, the effect is to favor the least complex (fewest-parameter) models that will serve our needs.
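One familiar way to cash out the frugality pressure (my own illustrative gloss – the PP literature itself frames this as a complexity term inside free energy) is a model score that explicitly charges for parameters, such as the Bayesian Information Criterion:

```python
from math import log

# Illustration only: BIC as a familiar stand-in for the accuracy/complexity
# trade-off. All the numbers below are invented.
def bic(log_likelihood, n_params, n_obs):
    return n_params * log(n_obs) - 2.0 * log_likelihood

rich   = bic(log_likelihood=-120.0, n_params=12, n_obs=100)  # fits slightly better...
frugal = bic(log_likelihood=-124.0, n_params=3,  n_obs=100)  # ...but needs far fewer parameters
print(round(rich, 1), round(frugal, 1))  # 295.3 vs 261.8 -- lower is better, so the frugal model wins
```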
It is at about this point, it seems to me, that CPP and RPP really start to diverge. CPP seems to suggest that action makes no fundamental difference to the basic (Helmholtz/Gregory-style) story. It claims we can (and should) still think about the prediction-error-minimizing regime in terms of neurally-generated perceptual hypotheses, and inferences to the best such hypotheses. Moreover (still according to CPP) we should see the role of the prediction-generating system – the ‘generative model’ – as that of recapitulating the causal structure of the world so that the brain becomes an ‘internal mirror of nature’. Good mirrors, one might say, make the best predictions about the shape and structure of the reflected world.
I think the ‘mirror’ picture and the ‘best hypothesis’ picture are importantly in tension with the suggestions concerning frugality and the circular causality binding perception and action. To see this, reflect that the predictive brain simultaneously drives perception and action so as to reduce organism-salient prediction error. That means action gets called upon to reduce complexity too. Consider, for example, the way some diving seabirds (gannets) predict time-to-impact according to the relative rate of expansion of the image in the optic array. Such strategies involve the use of cheaply computed cues, available only to the active organism, and selected for the control of specific types of action (such as pre-impact wing-folding).
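The gannet case can be made concrete with the classic ‘tau’ variable from ecological optics: time-to-contact falls out of the optic array alone, as the looming image’s angular size divided by its rate of expansion. The numbers and the wing-fold threshold below are invented for illustration:

```python
# Illustration of a 'cheap, action-ready cue' (numbers and threshold invented):
# time-to-contact estimated from the optic array alone, as angular size divided
# by its rate of expansion -- no need to recover distance or approach speed.

def time_to_contact(angular_size, expansion_rate):
    return angular_size / expansion_rate     # the classic 'tau' variable

tau = time_to_contact(angular_size=0.20, expansion_rate=0.25)   # 0.8 s to impact
WING_FOLD_THRESHOLD = 1.0                                       # hypothetical trigger (seconds)
if tau < WING_FOLD_THRESHOLD:
    print("fold wings")   # the action is keyed directly to the frugal cue
```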
It can sometimes seem as if these types of strategy are the polar opposites of the generative-model-based strategies highlighted by PP. A major goal of the book is to argue that this is not the case. Instead, such strategies fall neatly into place once we re-orient our thinking around the role of frugal predictions for the control of action.
Better yet, the full PP apparatus provides a potential mechanism for toggling between richer (intuitively model-based) and less rich (frugal, or ‘shallow model-based’) strategies, according to self-estimated sensory uncertainty. Roughly, you continuously estimate which predictive strategy best reduces organism-salient sensory uncertainty here-and-now, given task and context. There are now promising explorations of such uses of ‘precision-weighting’ to accomplish strategy switching.
At the limits, precision-weighted re-balancings could temporarily select a purely feed-forward strategy, essentially switching off higher-level influence whenever the raw sensory data can unambiguously specify the correct response. PP thus implements (within a single processing regime) strategies that can be knowledge-rich, knowledge-sparse, and all points in between. This hints at a possible reconciliation between key tenets of ecological psychology and the broader commitments of an information-processing paradigm.
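The formal core of such precision-weighted re-balancing is simple, and a one-shot Gaussian toy (my own, for exposition only) shows the feed-forward limit directly: as estimated sensory precision comes to dominate prior precision, the response is driven almost entirely by the incoming signal:

```python
# Minimal one-shot Gaussian illustration (for exposition): the relative precisions
# of prior prediction and sensory evidence set how much each contributes.
# As sensory precision dominates, the estimate is driven almost wholly by the input
# -- the 'feed-forward' limit gestured at above.

def precision_weighted_estimate(prior_mean, prior_precision, sense, sense_precision):
    gain = sense_precision / (prior_precision + sense_precision)
    return prior_mean + gain * (sense - prior_mean)

prior_mean, sense = 0.0, 1.0
for sense_precision in (0.1, 1.0, 100.0):       # from unreliable to unambiguous signal
    estimate = precision_weighted_estimate(prior_mean, 1.0, sense, sense_precision)
    print(sense_precision, round(estimate, 3))  # 0.091, 0.5, 0.99
```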
There’s much more to say about all this. Of special importance is the way PP turns out (or so I argue) to deliver an affordance-based model of perception and action. But the blog-moral of the day is just that we should resist the common thought that the prediction error signal encodes the sensory information that is as-yet-unexplained, and that its role is to recruit – or help learn – a better hypothesis, or to install or activate a ‘mirror of nature’. Such glosses obscure a key effect of the whole PP framework: mashing perception, cognition, and action together within a single, action-oriented, frugality-obsessed processing regime.
What might we say instead? I think it would be better to say (and here comes the promised Radical Predictive Processing gloss) that prediction error signals sensory information that is not yet leveraged to guide a rolling set of apt engagements with the world. I mean this to be true not just when prediction error is resolved (or even resoluble) by action. Rather, the idea is that even in rare cases of ‘pure’ perception, the way the world is parsed reflects nothing so much as our ongoing needs for action and intervention. The job of the prediction error signal, I want to say, is thus to provide an organism-computable anchor for self-organizing processes apt for the control of action, motion, and environmental intervention.
By controlling (or better, by helping to control) our rolling engagements with the world, prediction-error minimizing routines become positioned to ‘fold in’ work done beyond the brain, and beyond the body. By calling upon action to resolve salient (high precision) prediction error, PP allows arbitrary amounts of work to be done by bodily form (morphology) and by the use of all manner of environmental features and ‘scaffoldings’. These range from counting on our fingers, to using an abacus or an iPhone, all the way to co-operating with other agents. (Quick aside for anyone wondering about my other work, on the so-called Extended Mind – this is why the RPP vision, although not itself an argument for extended minds, is exactly as compatible with existing arguments for the extended mind as any other dynamical, self-organizing, story.)
To wrap up, I believe that the RPP ‘rebranding’ is both kind of cosmetic and kind of important. It’s cosmetic in that the (active inference) mathematics and algorithms are unaltered. So if this is what was already meant when glossing that story with talk of hypotheses and mirrors of nature, that’s fine (albeit potentially misleading). But it’s important in that the RPP gloss helps us see the deep complementarity between this view of what brains do and the large bodies of work that stress the complex interplay between brain, body, and world.
In the next post, I’ll be continuing the positive story about PP as an account of embodied experience, before finally turning to the dark side and examining some potential trouble spots.
Hi Andy! Congratulations on the book, it’s a really stimulating read.
I have a question that’s been bothering me ever since you first drew the distinction between ‘radical’ and ‘conservative’ predictive processing: how radical do you actually want to go? Or to put it more precisely – do you envision your brand of predictive processing as part of the wider radical embodied approach to cognition, one which eschews (non-linguistic) representational contents other than mere covariance? I’m simply curious about the kind of representational commitments hiding behind statements like the one in which you say that “prediction error signals sensory information that is not yet leveraged to guide a rolling set of apt engagements with the world”.
Hi Krzysztof – great to hear from you.
The short answer is, I don’t mean to be so radical as to reject the idea that we are here dealing with systems that trade in internal representations. For example, some folk (maybe Engel, A., Maye, A., Kurthen, M., and König, P. (2013). Where’s the Action? The Pragmatic Turn in Cognitive Science. Trends in Cognitive Sciences, 17(5), 202–9. doi:10.1016/j.tics.2013.03.006) might reject all talk of generative models and internal representations, and speak only of an achieved action-centric ‘grip’ upon the world. Others speak of grip upon a ‘field of affordances’ (Bruineberg, J. & Rietveld, E. (2014). Self-organization, free energy minimization, and optimal grip on a field of affordances. Frontiers in Human Neuroscience, 8(599), 1–14). This probably means there is a third option beyond CPP and RPP: RRPP – Really Radical Predictive Processing!!
RRPP is a more purely cybernetic vision (see e.g. Anil Seth’s Open-Mind.net piece on ‘The Cybernetic Bayesian Brain’). But what can it say about why the PP organization works? As far as I can tell, it can only say something like this –
(RRPP) Downward (and laterally) flowing influence aims to induce recurrent flows that alter the impact of forwards (and laterally) flowing influence so as to maintain the organism within certain bounds (basically, the bounds of viability and persistence).
But this seems to me to sell short the understanding we actually possess of what is going on. For example, PP is hugely committed to some kind of functional asymmetry between ‘predictions’ and ‘PE signals’ (if there is no deep asymmetry here, PP simply fails as a distinctive empirical account). Conceptually, at the heart of that asymmetry lies the idea that something is being compared to something, so as to deliver residual error. And at the heart of that idea lies the claim that the stuff that is being compared (‘top-down predictions’) can be generated using a probabilistic model that is learnt, or tuned, by exposure to the training data.
The rock-bottom core notion is thus that of a probabilistic generative model (PGM), acting as the prediction-maker. But RRPP seems to require ditching the idea that predictions have contents rooted in the knowledge-base of the PGM. I think if we subtract that guiding vision, we lose useful information. What remains – as mentioned in my reply to Inman Harvey – is just a picture of complex (and mysteriously asymmetric) looping dynamics spanning brain, body, and world.
One reason RRPP can seem attractive, I think, is because it is very likely that the contents of the predictions (especially at middle and lower levels) will be very unfamiliar and ‘alien’. For example, PP co-computes percepts and actions in a way that is constantly informed by changing estimates of precision that track our own sensory uncertainty.
What is being computed moment-by-moment is thus an odd admixture of estimated partial (task-relevant) states of the world, estimates of our own sensory uncertainty, and recipes for acting so as to alter the sensory signal itself. And instead of simply describing ‘how the world is’ these models – even when considered at the ‘higher’ more abstract levels – are primarily geared to engaging those aspects of the world that matter to us. The contents at issue are thus alien, non-linguaform, and typically mush together aspects of perceiving, thinking, and acting.
Another reason you might be drawn to RRPP is because the whole PP story is at root a story about self-organizing neuronal dynamics. Self-organizing around prediction error, these systems entrain perception and action, folding in the exploitation of bodily and environmental structure and opportunities along the way. That indeed has a wonderfully enactive (and embodied, and perhaps even extended) feel to it. I pursue those parallels in the final chapters of the book.
Hi Andy, great posts so far! I’ll be sure to check back throughout the week for the rest.
[Apologies if this comment appears several times. My computer is currently a source of large amounts of prediction-error.]
Firstly, I agree with you in as much as the conservative PP story is potentially misleading when making reference to ‘hypotheses’ and other overly-intellectual notions that seem to suggest a reconstructive role for perception. I think the move to radical predictive processing brings some much needed terminological clarity to the framework, and helps to demonstrate (some of) the connections between the RPP framework and closely related work in enactivism, which both emphasise the fundamental role of action in subserving cognition (and perception).
I have a question regarding the appeal to Friston’s notion of active inference as the starting point for RPP, and the way in which we should understand how prediction-error (or free-energy) can be minimised.
In some recent work Friston (et al., Active Inference and Epistemic Value) has argued that active inference can minimise prediction error (free-energy) in two ways, which correspond to ‘pragmatic’ actions, and ‘epistemic’ actions – reminiscent of Kirsh and Maglio’s distinction, though it doesn’t appear as though Friston is using the terms in the same way as Kirsh and Maglio did. Epistemic actions, Friston claims, move the organism away from the goal of prediction-error minimisation temporarily (exploration), but in some cases this may be the best policy (e.g. when a known source of food has been depleted). Pragmatic actions by contrast refer to strategies known by the organism to be effective at minimising prediction-error (exploitation).
This appears to be quite different to the approach in PP that claims that prediction-error is minimised either by updating beliefs (perceptual inference) or by resampling sensory signals (active inference), and my question is: how do you see the two accounts connecting (if at all)?
I can think of a couple of responses:
1) The separation of active inference into epistemic and pragmatic actions is simply a finer-grained distinction contained within the PP account of action as resampling.
2) The notion of perception, as PP defines it, should be understood in a stronger ‘active’ sense, which is closer to the pragmatic action that Friston defines. This is likely related to some embodied and enactive approaches that attempt to recast perception in more active terms.
3) The two accounts are describing different phenomena entirely.
Of course, there’s always option 4) “something else entirely”.
I think 1) is probably the intuitive response, but I think 2) is worth exploring, and may affect how we understand the distinction between perceptual inference (PI) and active inference (AI) in PP.
It has been claimed that PI and AI relate to perception and action respectively, in ways that honour the traditional distinction in psychology. However, it is also possible that PI and AI are inseparable except in formal terms, and this may be what motivates Friston’s emphasis on active inference. I think it could also be a natural route to pursue if we wish to avoid seeing perception as a reconstructive process.
I’d be interested in knowing what your thoughts are.
Dear Chris,
Thanks for the characteristically deep and challenging questions! I’ll have a go, but my self-estimated confidence here is only about 60%.
I don’t think the notion of active inference is used consistently throughout this literature. I don’t even think that I have used it consistently, but I’m working on it! For example, I think it is probably a mistake to contrast ‘active inference’ and ‘perceptual inference’. Instead, ‘active inference’ should be used to name the unified mechanism by which perceptual and motor systems conspire to reduce prediction error using the twin strategies of altering predictions to fit the world, and altering the world to fit the predictions. We here confront a single process (distributed across many brain areas and implicating action and world in rolling cycles).
If we lose sight of this, it can seem as if the organism always has a kind of extra task – deciding whether to opt for perceptual or active inference. In other words: shall I make the world conform to my predictions, or alter my predictions to conform to the world (so as better to accommodate the current sensory signal)? But although I agree that these are indeed two distinct ways to reduce prediction error (PE), I don’t think they operate one at a time or even (for the most part) independently of one another. Instead, at every moment we respond to the sensory barrage by recruiting a flow of top-down influence (prediction) that simultaneously delivers a bit of perception and a bit of action – even if that action is merely a saccade or an alteration of the distribution of attention (precision).
Does that make sense? I thus see this story as finally delivering on that 1994 call by Churchland, Ramachandran, and Sejnowski (in their ground-breaking ‘Critique of Pure Vision’) for a truly moto-centric vision of perception and action – a vision in which motor and perceptual systems are integrated throughout the processing. Indeed, PP may well be (as Michael Anderson once said to me) the ultimate expression of the ‘active perception’ program in A.I. and cognitive science.
Hi Andy, thanks for the response.
It’s interesting to think of the alteration of precision-weighting (attention) as an action. Max was telling me about a study this morning, concerning top-down factors in the sensory cortex of mice being responsible for differentiating between identical (but highly uncertain) stimuli, where the presence of top-down signalling can make the difference between whether a stimulus is perceived or not.
I’ve yet to read the full article in Nature Neuroscience (only the Science Daily commentary!), but it sounds like it could be an interesting case for this idea, and the author postulates a role for attention in accounting for the difference between perceived and non-perceived trials.
Here’s the link to the science daily summary, which is a short read:
https://www.sciencedaily.com/releases/2015/12/151207131743.htm
Thanks Chris – cool stuff. Should fit nicely.
Hi Andy, thanks for the very interesting post and discussion! I know little about this area but ever since I’ve come across this stuff, I’ve been bothered by exactly this: “the organism always has a kind of extra task – deciding whether to opt for perceptual or active inference.” I was hoping you might elaborate.
As I understand it, the problem with this is that to make this choice between acting or updating its hypotheses the organism will need some further decision procedures, and those procedures won’t (or may not) be captured by PP, undermining its claim to universality.
But I wonder how this could possibly be resolved. Consider especially the sorts of things we ordinarily think of as relatively long-term desires, intentions and so on. For states of that sort, the whole point seems to be that we stay in them for a while, despite consistent high error (mismatch with their satisfaction conditions). I could eliminate the mismatch between my desire for food and my empty stomach by predicting that my stomach will stay empty, but that won’t do me any good; better to stay in a high error state until I find food.
Is there some standard answer to this in the literature?
Hi Markos,
That’s a nice way of raising the issue. I totally agree (as per my reply to Krzysztof) that for the most part, an organism ought not have an ‘extra decision’ of this kind to make.
It may help to think about skilled performance in sports. The well-trained sportsperson has learnt routines in which her rolling predictions yield a variety of prediction error (PE) signals. Some will concern the orientation of her body in space, some may concern other states of her body, and states of the world. In addition, she is constantly attending to the scene in ways that bring forth or highlight the features that are key for the target performance. That means there is a medley of PE signals, each of which is already (just in virtue of being that error signal in that context) demanding to be resolved in a certain kind of way. Learning to be expert at a sport crucially involves learning what kinds of error signal need to be remedied when, and in what kinds of way (knowledge that will be reflected in changing estimations of the precision of different PE signals). So ‘on the day’ there is no additional task of deciding how to deal with such and such a PE.
Most of daily life, it seems to me, is an expression of this kind of expert flow – flow in which PE arises and is quashed by a specific mixture of alterations to the body, to the world, and to our own ‘take’ on the body and the world. When immediate fluid trained response fails, there are (I agree) more decisions to be made. But these will often be handled by further layers of expertise. The skilled bobsledder (to opt for a festive example) knows how to respond when things go wrong too – that’s part of what her training installs.
So the key point, I think, is that specific PE signals (in context) demand specific modes of resolution. They do not simply imply ‘remove me’. Rather, they must be removed by altering specific states of the organism or the world. This may be clearest in the case of simple ‘homeostatic’ PE – salient PE concerning blood sugar levels needs to be remedied by increasing blood sugar levels, not by altering the predicted level! Of course, there may be several ways to increase our blood sugar level. You might go and eat something you have already stored away, or go search for new food. So it is not as if there are no decisions to be made at all. These kinds of decisions are, however, very nicely treated in the same broad framework (see the ‘Active Inference and Epistemic Value’ paper cited by Krzysztof).
Hi Andy
Fantastic post, much of which I am totally on board with! I want, however, to press you a little on your response to Krzysztof. You say that the real radicals do not have as rich a story to tell about the functional asymmetry between predictions and PE signals. Let me try to briefly respond to this worry. We (Erik Rietveld, Jelle Bruineberg and myself) take PE signals to be a measure of disattunement between internal and external environmental dynamics. This is to say we take PE signals to measure the extent to which an organism deviates from optimal grip in its skilled interaction with the environment.
What is being compared with what in our interpretation of predictive processing? We take internal dynamics to be richly nested layers of action-readiness formed on the basis of learned habits and skills. We take habit and skill (or know-how) to do the work of your probabilistic models. It is know-how that we refer to as internal dynamics. (Some of what you say above in response to Markos suggests you may also be on board with this. If so, that would make you a real radical by our counts.)
Prediction errors arise then when there is a mismatch between what you are expecting (what you are ready to do) based on your habits and skills, and current interoceptive and exteroceptive sensory states. This is what we describe in terms of deviation from optimal grip. The agent then acts so as to reduce this deviation and bring about the sensory states it expects. Minimising prediction error is what the organism does in reducing its disattunement with the environment, and it is action (or responding to affordances) that moves the organism closer towards being in equilibrium with the environment.
This would all seem to fit very snugly with what you say above about sensation sometimes immediately eliciting a response. We say this happens for experts that are smoothly coping, able to anticipate what will happen before it happens. However this is rare and more often there will be obstacles that obstruct the smooth pursuit of one’s goals. This is where the work Mark and I are doing on precision-processing becomes relevant but this is another story.
Interested to hear for now why you think our story requires a richer more representation-hungry architecture. We think we have a story about the functional asymmetry you point to that avoids the need for such an architecture. This also relates to Nico’s question (in response to your previous post). Nico asks you whether rejecting the conservative version of predictive processing might also mean that you give up on predictive processing as an implementation of Bayes. She suggests perhaps your view allows for a motley of different strategies (skills in our terms) for minimising prediction error.
Hi Andy,
Thanks a lot for these posts and discussions! This post is meant as a follow-up to Krzysztof’s question and your and Julian’s reply. I feel a bit uneasy being pushed into the Really Radical Predictive Processing camp – no need to pile up adjectives. I guess one of the main differences between your account and ours is how central a place self-organization has, and how much the free-energy principle and predictive coding (as a mechanistic proposal) can come apart.
In the infamous “Life as we Know it” paper, Friston gives an account of how predictive processing emerges (in a non-spooky way) out of coupled dynamical systems. What is distinctive here, is that no prediction-error is calculated or represented, but just that there is a disattunement between the coupled dynamics on a macroscopic level. The functional role of predicting or anticipating can then be assumed only on a macroscopic level, while the dynamics on the microscopic level only express their intrinsic dynamics modulo some local constraints. There is no need to encode or compute prediction-error, because the disattunement emerges out of the local interaction (at least in simple cases, in a pretty non-spooky way). I guess this is also the difference between our account and Anil Seth’s cybernetic account that you point to.
The main difference then is, I think, the way to understand grip: whether as some internal estimate of one’s current worldly doings, or as the disattunement of neurodynamics and worldly dynamics. Although I am not sure the former leads to a homuncular position, as Inman Harvey seemed to suggest in a comment to your last post, the two are importantly different. The main intuition pump here is “general entrainment” (à la Huygens’ synchronizing pendulum clocks). This allows one to rethink terms like generative model (nothing more than that the clocks need to have a similar swinging time), Markov blankets (just the wobbly table the clocks stand on), and active inference (just the synchronization of the two clocks), and I guess it makes talk of a ‘hypothesis’ look ridiculous. Surely this metaphor will not do all the explanatory work, but combined with ecological psychology and phenomenology, it might get us further. To add to this, the mysterious asymmetry you pointed out here is the asymmetry between intrinsic neurodynamics and the control parameters triggering the self-organization of the brain; I don’t think that is very mysterious.
I am looking forward to reading the rest of your book! The blog and discussion here are already a promising start!
Dear Julian and Jelle, and with a sidebar to Inman too!
Thanks for these extremely useful reactions. They really help sharpen the debate. I’m replying to you all at once as the issues you are raising seem related.
Julian and Jelle – you both want to unpack the notion of prediction error (PE) signals as (quoting from Julian) “a measure of disattunement between internal and external environmental dynamics”. And that, in turn, is unpacked as deviation from optimal grip, which you say implies only layers of ‘action-readiness’ and a bunch of know-how. I’m guessing that Inman (commenting on Post 1) might be happier with the disattunement story, so I am including him here too.
Julian and Jelle: I’m on board with the notion of disattunement. But I think (and this is where I suspect we disagree) that that story comes in two very different flavours. The first version is the one very neatly glossed by Jelle who says, speaking of the Friston work on very basic lifeforms, that “What is distinctive here, is that no prediction-error is calculated or represented, but just that there is a disattunement between the coupled dynamics on a macroscopic level.” In such cases, Jelle rightly notes, there is no need for the system’s brain (hell, it may not even HAVE a brain) to compute and then respond to prediction errors. I suspect Inman may like that too.
I think this is true and very important. A (literally) toy example can further illustrate the case. In my office at work I have a little toy car that you wind up and that backs off as it approaches a table edge. It does so because when the front (unpowered) wheels go over the table edge, the middle of the vehicle drops and a sideways facing wheel (usually raised and impotent) makes contact with the table surface. The clockwork drive (that had been moving the car towards disaster) now moves the car away from the edge, the front wheels re-engage, and off it goes again. This is neat behaviour, achieved without any traditional sensors or computation. If survival and reproduction depended on such behaviour, we might properly describe this in terms of brief disattunement and a rapidly remedied deviation from optimal grip. Smooth coping, even. Similarly, assuming once again that staying on the table-top helps the creature survive, temporarily resisting the second law, the wheels and clockwork arrangement will be a free energy minimizing device in good standing.
But are such devices in the business of prediction and prediction error minimization?
I don’t think so. I agree 100% that such ‘computation- and representation-free’ strategies play a huge role in adaptive success. But I am simply not convinced (nor, it seems, is Jelle) of the value of describing such cases using the apparatus of predictions and prediction errors. The same is of course true (and here I am again agreeing with Jelle) of the dynamically coupled pendula.
By contrast, at least some of what human brains do looks to involve the computation, at various timescales, of explicit prediction error signals that are used to learn, and later to exploit, a multi-level probabilistic generative model that reflects structured patterns in some target domain. Or at any rate, that (for my money) is the substantial empirical claim upon which PP must stand or fall.
At this point, Inman (to add another respondent to this heady mix) may press a very reasonable question. Inman may ask how one can tell when the right story requires only the recognition of neat tricks and dynamical attunements, and when we should invoke the apparently more extravagant PP machinery. But a reasonable response, it seems to me, is simply to point to the very existence of huge swathes of apparently progressive neuroscientific and neurocomputational research that finds these kinds of models to offer the best overall fit with converging bodies of data (from various kinds of neuro-imaging, some single-cell recordings, and large bodies of work in cognitive psychology, computational neuroscience, and psychophysics). A nice example here is Egner, Monti and Summerfield’s (2010, Journal of Neuroscience) fMRI and model-comparison study of population responses to face stimuli in visual cortex. The authors claim that their study “formally and explicitly demonstrate[s] that population responses in visual cortex are in fact better characterized as a sum of feature expectation and surprise responses than by bottom-up feature detection (with or without attention)” (Egner et al. 2010, p. 16607). To reach this conclusion, the authors took their imaging data and conducted multiple careful model-comparisons. A PP-style model unambiguously provided the best fit to the data.
For my own part, I don’t think it should be all that surprising if it turns out that some of the core principles that govern how humans (and many other animals) think and reason involve strategies not found in very simple cases of adaptive response. I sometimes fear that the waters here get muddied by the larger free-energy minimizing framework. That would be a shame, since the free energy minimizing framework provides an invaluable bridge linking the simpler cases (that do not involve the computation of predictions and PE signals) to the progressively more complex.
Finally, I may as well put one more card on the table. I think that some creatures experience a structured and meaningful world while others (single-celled organisms, for example) do not. The apparatus of hierarchical predictive processing may be part of the mosaic that enables us, but not the single-celled protozoan, to know (and not merely respond to) the world.
Thanks, Julian, for the really useful comments. I’ve written a joint reply to you and Jelle Bruineberg that appears below the Bruineberg post. For some reason, I can’t get this little note to you to appear after your own post!!
Hi Andy
Thanks for taking the time to write this really helpful reply. The issue we seem to be converging on here is, I suspect, that old chestnut of the life-mind continuity thesis. Now we can all agree that self-organizing systems come in greater and lesser degrees of complexity in terms of their organization. We want to say, however, that in the end it is basically a matter of skill all the way up. The skills in question concern improving grip on the environment. Basic life-forms and humans are not in the end doing anything qualitatively different – both are coping with what their environment offers in terms of affordances. Both are, relative to their repertoire of skills, acting in such a way as to improve grip on the environment. This is the continuity claim.
That being said, of course, there is a massive quantitative difference in skills. There is also a massive qualitative difference on the side of the ecology of these different lifeforms – in their respective forms of life, as we would put it. Differences in skill and corresponding differences in niche scale up to the sort of qualitative differences that we might think call for explanation in terms of thinking and reason and the computation of complex nested probabilistic models. We think, however, that there is no need to explain these qualitative differences in this way. This, in broad brush-strokes, is the story we are pursuing here in Amsterdam.
Now back to the issue of computation- and representation-free strategies. We all agree that smooth coping can be described in computation- and representation-free terms. Is it only the toy car and creatures without brains that do smooth coping? We don’t think so. We want to treat smooth coping as what cognition is all about.
Now if there is no need to describe smooth coping in terms of the calculation and representation of PE then we want to argue there is no need to describe cognition in these terms either.
This is not to say that minimization of surprisal, as a stand-in for free energy, might not be incredibly important for smooth coping. On the contrary, we think it is fundamental. So what is the relationship between minimization of surprisal and minimization of PE?
This depends on what you mean by PE. This is where we try to get away with just talking in terms of disattunement between internal and external dynamics. The difference between the human case and that of other lifeforms comes down to the internal and external dynamics – the skills on the one side and the environmental niche on the other.
Now about the neurocomputational models and their nice fit to the data. We don’t want to dispute any of this but rather to appropriate all of this fine work under the umbrella of our own non-representational and non-computational framework. This we think we can do by offering a deflationary description of informational surprise and inference along the lines sketched above. This is all work in progress of course, and maybe we are not going to get away with this kind of bait and switch. But I’m optimistic!
Hi Julian,
Yes, I think we are indeed once more circling around those dizzying life-mind continuity issues!
You write:
“Now if there is no need to describe smooth coping [in the toy car case, for example] in terms of the calculation and representation of PE then we want to argue there is no need to describe cognition in these terms either.”
But just because there can be some smooth coping without prediction and PE, that (of course) doesn’t mean that all smooth coping can be explained or supported that way. And smooth coping in an expert game of chess is intuitively much more demanding than anything in the table-top-car scenario. It must be an empirical question what strategies are in play in different cases.
I think there is one further consideration that I ought to mention. That consideration is hierarchical structure. A major feature of the PP accounts I favour is that they profit from the complex nestings made available by hierarchical form. That form is important in learning. And it is important in enabling the system to deal with real-world complexity. For one way to finesse some of the complexity (in search, learning, and deployment) is to exploit the way large-scale regularities are constructed out of smaller-scale regularities. Words form clauses that form whole sentences that form parts of larger discourses. Trees (I mean real ones, not the trees of grammar) have branches, branches have leaves, leaves have veins. In worlds that exhibit such structure, generative models that need to get to grips with complex real-world phenomena can benefit from the dimensionality reductions that structured (nested) internal encodings provide.
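A deliberately tiny toy example (mine, not a model from the book) may help make the parameter-counting point vivid: when regularities nest, the model stores a few reusable parts rather than every whole pattern, and the nesting does the compressive work:

```python
import random

# Toy nested generative model: a 'discourse' is built from clauses, clauses from words.
# The low-level pieces are defined once and reused, so the number of stored patterns
# grows far more slowly than enumerating every whole discourse would.

words = {"NP": ["the cat", "the screen"], "VP": ["sits", "glows"]}
clause = ["NP", "VP"]                 # one clause template, reused everywhere
discourse = ["clause", "clause"]      # higher level: a sequence of clauses

def generate_clause():
    return " ".join(random.choice(words[slot]) for slot in clause)

def generate_discourse():
    return ". ".join(generate_clause() for _ in discourse)

print(generate_discourse())
# Stored: 4 word entries and 2 small templates; expressible: 4 distinct clauses
# and 16 distinct two-clause discourses.
```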
Here too, I think there is a processing vision that gets lost if we try to re-tell the story without appeal to contents. The best we might then do, it seems to me, is to try to re-discover the benefits of hierarchical structure by focussing on the separation of time-scales that often emerges in multi-level recurrent dynamical systems. But once again, I fear that ditching talk of structured and nested contents robs us of an appealing account of why that separation is adaptive.
Once again, we end up with complex dynamics that do indeed help everything go smoothly. But there is more that we can say about why this is so, if we allow ourselves to depict the levels as uncovering nested structure in some target domain. So why deprive ourselves of that further understanding?
Thanks, Andy, this is useful. I recall really enjoying a paper of yours making similar points with regard to the outfielder problem and how elegantly it is handled in this framework. Let me press a bit more, however.
You say:
“So the key point, I think, is that specific PE signals (in context) demand specific modes of resolution. They do not simply imply ‘remove me’.”
I agree that this is the key point. The question is how the RPP framework can handle it. On the most ambitious reading, the RPP framework seems to say something like this: all that cognitive systems ever do is minimise prediction error, subject to (i) some global domain-general constraints like frugality, and (I suppose) (ii) application specific brute physiological constraints.
This seems to be in tension with the idea that PE signals in different applications might mean different things (where, I am assuming, the issue is not just brute physiological constraints but genuine alternative ways to minimise error).
Even in the bobsled case, why do we assume that the bobsledder won’t minimise error by simply predicting she will crash and then crashing? If all she cares about is minimising error in the most frugal way, it looks like a mystery why this doesn’t happen.
Your suggestion may be that this is taken care of by context, but I am not sure I see how. I agree that in a context where *we assume that* updating to an internal model of herself crashing is already ruled out, PE signals will imply different things compared to a context where this constraint is not in place. But this part of “context” seems to be eminently cognitive (intuitively, the bobsledder doesn’t want to crash), and at least at first blush outside the RPP framework.
Anyway, thanks a lot for engaging, I look forward to reading the book.
Dear Markos,
You ask “Even in the bobsled case, why do we assume that the bobsledder won’t minimise error by simply predicting she will crash and then crashing? If all she cares about is minimising error in the most frugal way, it looks like a mystery why this doesn’t happen”.
There’s more on this in Post 4. But at root, the answer has to be that we do not care simply about minimizing error but about minimizing errors defined with respect to specific predictions. And the bobsledder will be frantically trying to reduce error relative to the high-level prediction that the crash is avoided and she wins the race.
What if there is a single [type of] causal process running in the/a system, namely subtraction of present input and output values from past values? No fundamental distinction can then be made between bottom-up and top-down information processing (whatever such distinctions there may be neurally and societally). Interactions between such processes can be more sensory/perceptual or more intentional/actional (or indeed more ‘thoughtful’) according to what ecological content is dominating / salient / attended to at the moment.
We have been presenting such past-minus-present processes for some while. Amusingly, the math is determinate; it’s the action and perception affording environment that can be uncertain, as well as of course a particular prediction being less than perfect.
Dear David Booth,
That sounds really interesting. Can you point me to a good thing to read on this?
Andy, submissions of my theoretical updates have been delayed by relocation. The latest published update shows how the “present minus past” theory works out on a variety of multisensory data (with a philosophical postscript) – Seeing and Perceiving (2011) 24, 485-511, 639. I’ll try to email you (and anyone else who asks) the PDF of a three-part theoretical spiel that doesn’t fit any review journal, called “How a mind works.”
My earlier three-part theoretical statement is now mounted as a Working Paper on ResearchGate.
Cool, thanks David. I’ll try and get it now.
Andy
Can’t seem to find it! maybe you could just email it to me
andy.clark@ed.ac.uk
thanks!
Andy
Great posts and discussion so far – sorry to step in this late.
Let’s assume that if a system minimizes prediction error, then it approximates Bayesian inference (where ‘approximates’ means that the estimates of the system get closer and closer to what explicitly following Bayes’ rule would achieve: not every step of the system will be what Bayes ordains, but in the long run the system will settle close to the Bayesian estimate).
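A toy Gaussian case may help unpack that sense of ‘approximates’ (the numbers and the gradient scheme below are only illustrative): an estimate revised solely via precision-weighted prediction errors settles on the very value that explicit application of Bayes’ rule prescribes for the posterior mean.

```python
# Toy Gaussian case, purely to unpack 'approximates': updating only by
# precision-weighted prediction errors converges on the Bayesian posterior mean.

prior_mean, prior_prec = 0.0, 1.0     # what the system predicted
obs, obs_prec = 2.0, 4.0              # what the senses report

exact = (prior_prec * prior_mean + obs_prec * obs) / (prior_prec + obs_prec)

mu, step = prior_mean, 0.05           # estimate, revised only by error signals
for _ in range(500):
    err_obs = obs - mu                # error relative to the sensory input
    err_prior = mu - prior_mean       # error relative to the prior prediction
    mu += step * (obs_prec * err_obs - prior_prec * err_prior)

print(round(mu, 3), round(exact, 3))  # both 1.6: not a coincidence, as noted above
```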
Also, that if a system approximates Bayesian inference, then there is something right about describing what it does in inferential terms. After all, it is not a coincidence that it ends up on the Bayesian estimate. It must end up with an internal model that represents (‘mirrors’) the states it is trying to estimate, otherwise the approximation-to-Bayes-aspect just falls away. If it does this through prediction error minimization, then it must – again ‘in some sense’ – be correct to say that there are predictions and prediction errors involved: it must be possible to reasonably interpret some part of the mechanism doing this approximation to Bayes in terms of representations, predictions, errors; and the function of the mechanism must be to ‘explain away’ evidence, again since this is what happens in Bayesian inference.
If we throw out all this inferential, intellectualizing vernacular, then we throw out the idea that a prediction error minimizing system approximates Bayes, and if we throw out that, then there is no predictive processing. I prefer to keep it and make it central to understanding it all, while of course being aware that the realization of this in the brain should not be understood as literally doing inference in the way I might do inference with pen and paper.
So the challenge to Andy (and perhaps Julian and others) takes the shape of a dilemma: either the brain minimizes prediction error and is inferential in this rather substantial sense, or it is not inferential at all (all the inferential interpretation should be thrown out) and then it is (oxymoronically) predictive processing without prediction error minimization.
Will action help with dealing with this dilemma? No, not at all. The reason is simple: action minimizes prediction error so therefore action approximates Bayesian inference. Hence ‘active inference’. So action is just as inferential as perception.
If action and perception are both inferential, then the push is towards internalism, it seems to me. The brain sits behind the sensory veil and cares not if it does perception or action, it just chases the best (most precise) long term prediction error minimization. When it happens through action, the brain enslaves the body, which is just one (albeit a welcome one) cause amongst others in the environment. Andy is right that there is much of interest in the way the brain picks and chooses amongst the ways of doing prediction error minimization – but the examples are all reducible to a question of what the Bayesian learning rate is. For example, being totally open to the “raw” sensory data is just having a very high learning rate (this doesn’t make the data more raw than when the learning rate is low). This is, in my optics, not Radical but rather sophisticated Conservatism.
Yours sincerely,
Staunch Internalist, Melbourne ☺
Dear Jakob (and maybe of some interest to Julian and Jelle too),
Thanks for coming by! As usual, you get right to the heart of the matter.
I occupy a kind of middle ground between you and Julian. It seems likely that each of you will try to show me that no such middle ground exists – that I must be sucked into one of your two camps. But I like this middle ground, so let me try and at least say a bit more about it.
You write:
“So the challenge to Andy (and perhaps Julian and others) takes the shape of a dilemma: either the brain minimizes prediction error and is inferential in this rather substantial sense, or it is not inferential at all (all the inferential interpretation should be thrown out) and then it is (oxymoronically) predictive processing without prediction error minimization.”
I am happy with the idea that the brain is a distributed, frugality obsessed, action-oriented inference machine. But then you say:
“If action and perception are both inferential, then the push is towards internalism, it seems to me.”
That’s where we diverge. At least, if by internalism you mean to suggest something deeply at odds with the kinds of work on embodied cognition that I’ve long been attracted by. There are lots of threads here, and I try to weave them together in a forthcoming paper called Busting Out: Predictive Brains, Embodied Minds, and the Puzzle of the Evidentiary Veil, to appear in Nous. I’ll pop a draft version up on ResearchGate later today.
Meantime, let me pick up on your comments about the brain “sitting behind the sensory veil”. In reply to that, I’d want to question the idea that there is a fixed, privileged sensory veil. Just as a creature may be born blind but slowly develop the ability to see, so a human being might be born without (say) infra-red (IR) vision but later come to possess that capacity thanks to some technological intervention. If an IR-sensitive lens were permanently attached to one of my retinas, and its outputs properly integrated in downstream neural processing, it would seem strangely unmotivated to insist that my true visual evidentiary boundaries remain those of the bare (IR-insensitive) biological system. Closer to home, we can easily imagine IR patterns being made available via a mediating wearable technology such as Google Glass.
Faced with such examples you may insist that the biological boundaries are privileged because they are implicated in the longer-term minimization of prediction error (minimization over whole lifetimes and at more evolutionary timescales). This is correct, but I don’t think it establishes the relevant conclusion. If we wish to understand my current capacity to identify living beings in the dark using their IR signatures (thermal imaging) we are not primarily concerned with my ancient metabolic boundaries or whatever more local sensory shields they may imply. Instead, we are free to locate the bounds of sensing according to a target set of capacities of interest – capacities that, in the case of IR-sensitive seeing, now implicate larger bio-technologically hybrid forms.
The perspective I am recommending may seem challenging insofar as it invites us to contemplate agents whose sensory boundaries are not fixed and even (although I haven’t argued for this here – but see the Nous paper) whose effective cognitive architectures may sometimes extend beyond the biological brain. But perhaps this should not surprise us unduly. For we have already encountered a potent inner equivalent in the PP account of attention and variable precision-weighting itself. Attentional mechanisms, that story suggested, alter patterns of effective connectivity so as to implement minimal short-lived cognitive circuits that are highly specialized for the task at hand. Attention, if this is correct, imposes a kind of transient organizational form upon the brain. Attentional mechanisms may thus be driving the formation and dissolution of transient partitionings within the neural economy itself, temporarily insulating some aspects of on-board processing from others according to the changing demands of task and context. So the idea of transient organizational forms is already there in PP. Seeing beyond staunch internalism then only requires us to add that those transient neuronal ensembles can recruit (and also be recruited by) shifting coalitions of bodily and worldly elements, resulting in the repeated construction (or ‘soft-assembly’) of temporary task-specific devices that span brain, body, and world.
All that is consistent, it seems to me, with the image of the brain (the ‘head ganglion’ as W. Ross Ashby liked to call it) as a distributed, frugality obsessed, action-oriented inference machine.
Hi Andy
Thank you for your posts.
My worry relates to an issue raised by Carrie Figdor, and touches on things talked about by Jelle Bruineberg and Julian Kiverstein.
Perhaps I’m missing something, but I’m still not clear what exactly is ‘predictive’ about RPP. The distinction between conservative predictive processing (CPP) and radical predictive processing (RPP) can be framed in terms of informational content. Informational content, by most people’s lights, is that which makes a claim about the world. Such content has, at a bare minimum, a property like truth. CPPers, like Hohwy, think that PP frameworks do have internal states with bona fide semantic or linguistic contents. RPPers, like yourself, deny that PP frameworks have states with contentful properties. (I take it that this is (partly) what earns RPP the title ‘radical’.)
RPP then, unlike CPP, avoids the onerous task of explaining how a property like truth can be naturalized via a mechanism engaging in minimising prediction error. This should, I think, count as a significant mark in favour of RPP. But this also ensures that RPP now bears a distinct affinity with Hutto and Myin’s Radical Enactive Cognition (REC) view. Hutto and Myin make it clear that, on their REC account, nothing in the brain rises to the level of content. Content is instead scaffolded by particular linguistic practices. This entails that while REC allows that the brain can manipulate statistical regularities or covariances between states of affairs, none of this can be couched in terms of ‘prediction’, at least if we are following the ordinary usage of this term.
However, I think this affinity between RPP and REC points towards a significant tension within RPP. For if there is no content in the mechanism, then what exactly is ‘predictive’ about RPP? As you point out in your reply to Krzysztof, there is a functional asymmetry at the heart of PP: something must be compared with something else, in order to generate error. Yet if mechanisms ‘predict’ in the manner favoured by RPP, then they are not ‘predicting’ in any normal, everyday sense of the term ‘prediction’. But why then call what they are doing ‘prediction’ at all, and not something else? In the end, what is distinctively ‘predictive’ about what goes on within a RPP mechanism?
Dear Victor,
The points you make are good ones. But they apply not to RPP (the view I defend) but to RRPP (Really Radical…).
What makes RPP radical is not that it depicts predictions without contents. I don’t think (see, for example, the last few exchanges with Inman Harvey re Post 1) that view is ultimately defensible – it collapses into complex dynamics alone.
Instead, RPP is radical in that it makes action the pivot rather than descriptive fidelity. But since actions can succeed or fail, and since actions require some kind of ‘take’ on how the world is, it seems possible to combine an action-centric viewpoint with a substantive notion of predictions (as having contents that must answer to how things are).
Does this help?
Andy
Sorry for being so late to the debate. Just a quick(ish) comment on RPP vs RRPP. To my mind there is an important difference between the overall goal of (R)RPP and how it is mechanistically realised. While CPP can be cartooned as seeking an accurate ‘mirror’ of external causal structure, both RPP and RRPP emphasize organism-salience. (The appeal to ‘cybernetics’ is here evident in the notion of allostatic regulation of essential variables.) It seems to me that organism-salience is orthogonal to whether the relevant ‘prediction errors’ are explicitly computed w.r.t. a probabilistic generative model, or whether they are merely a way of describing a non-mechanistically-inferential bag-of-tricks. The Cybernetic Bayesian Brain (CBB, why not) approach emphasizes allostatic regulation but also allows (and encourages) that the mechanism may be explicitly inferential. Historically this appeals to Conant & Ashby’s good regulator theorem, that ‘every good regulator of a system must be a model of that system’. This idea leaves open the important question of distinguishing between BEING a model and HAVING a model, which is equivalent to the distinction between an explicitly inferential mechanism vs a bag of tricks. What then motivates the ‘Bayesian’ (inferential) part of CBB? Here we go right back to Helmholtz and the need to deconvolve complex mixtures of hidden causes – which is equally relevant to perception and regulation. To put it more bluntly, when radical enactivists talk about reducing ‘disattunement’ and the like, there is almost never any mechanism provided. In contrast, an explicitly probabilistic generative model can provide *counterfactual* predictions about sensory signals given particular actions, which can guide the organism back into states of attunement (homeostasis) with its milieu. At least, that’s what I hope we will be able to show in models and experiments to come. A key testing ground will be to examine how ‘active inference’ plays out differently in interoceptive and exteroceptive contexts.
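To make the counterfactual point concrete, here is a minimal sketch (the set-point, the action repertoire, and all the numbers are invented for illustration, not a claim about any actual model): the generative model is queried with ‘what would I sense if I did X?’, and the action whose predicted sensory consequence best restores the regulated variable is selected:

```python
# Toy sketch of counterfactual prediction in the service of regulation.
# Everything here (set-point, actions, effect sizes) is hypothetical.

set_point = 37.0          # an essential variable, e.g. core temperature
current = 35.5

effects = {"shiver": +1.2, "seek_shade": -0.8, "do_nothing": 0.0}

def predicted_outcome(action):
    # hypothetical forward model: expected sensed value after each candidate action
    return current + effects[action]

best = min(effects, key=lambda a: abs(predicted_outcome(a) - set_point))
print(best)               # 'shiver' -- the counterfactual that best restores attunement
```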
These ideas are fleshed out a bit more coherently in my reply to Wanja Wiese’s comment on the CBB idea: https://open-mind.net/papers/inference-to-the-best-prediction
Sorry again for being so rudely late to the game.
Hi Anil (and this might be of interest to Julian and Jelle and Inman too)
Great to hear from you! Your comments are really useful. I agree that those issues can, and should, be distinguished. In terms of laying bets, I was struck by this part of your comments:
“In contrast, an explicitly probabilistic generative model can provide *counterfactual* predictions about sensory signals given particular actions, which can guide the organism back into states of attunement (homeostasis) with its milieu.”
This links nicely to some very recent remarks by Karl Friston (personal communication). Karl suggests that things start to look more strongly representational (and the really radical perspectives fail) when, and only when, we move from thinking about simple systems to thinking about systems that select actions on the basis of predictions about the resulting shape of the future, given some possible action. This is ‘counterfactual prediction’, right?
It is interesting that this move replicates the way earlier discussions (in the ‘philosophy of dynamical systems’ literature that grew up around Tim van Gelder’s Watt Governor case etc) went – roughly, the strongest cases for explicit representing/predicting involve action selection in the absence of adequate signals from the current environment.
Maybe we have come full circle to Clark and Toribio’s 1994 piece ‘Doing Without Representing’?
Dear Andy (and others),
Much has been said already, but I think it is useful to distinguish two directions in RRPP (still don’t like the label 🙂 ). On the one hand there is the class of questions pursued by Victor Laughlin and Inman Harvey concerning the relation between prediction and content. The radical enactive debates on teleo-semantics and teleo-semiotics might be applied to PP here. On the other hand, there is a genuine body of work using complex systems science to understand anticipatory dynamics, animal-environment coupling and even intentionality. I think both sets of questions are worth pursuing and might in the end be complementary, but I am mainly interested in the latter.
Good examples of research pursuing the latter question are Tschacher and Haken’s (2007) work, Scott Kelso’s work, Varela et al. (2001), and also some of Friston’s work (see: Friston, K. J. (1997). Transients, Metastability, and Neuronal Dynamics. NeuroImage, 5(2), 164–171. https://doi.org/10.1006/nimg.1997.0259). The explanans here is large-scale neurodynamics, phase-synchronization… one knows the buzzwords. These large-scale dynamics no doubt play a very important role in achieving the ‘right’ context-sensitive effective connectivity (where ‘right’ is understood in terms of selective openness to the affordances that make the animal tend towards grip).
The big question, I think, is how more locally mechanistic and computational PP accounts tie in with this broader complex-systems perspective on anticipatory dynamics. I don’t think a purely PP model of the brain is feasible, if only because explicit generative models become intractable very quickly. I find this complex-systems perspective (and the obvious continuity with it in Friston’s work) lacking in both your and Jakob’s accounts (although I have not fully read your book yet).
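To give a purely illustrative sense of the sort of blow-up I have in mind (the numbers below are invented and not taken from any specific PP implementation): if an explicit model had to evaluate every sequence of discrete actions over even a modest planning horizon, the number of candidate policies would grow exponentially.

```python
# Purely illustrative back-of-the-envelope calculation: exhaustively scoring
# every sequence of discrete actions (every 'policy') under an explicit
# generative model scales exponentially in the planning horizon.
# The numbers are invented, not drawn from any particular PP implementation.

n_actions = 5                      # discrete actions available at each time step
for horizon in (2, 5, 10, 15):     # length of the action sequences considered
    n_policies = n_actions ** horizon
    print(f"horizon={horizon:2d}  candidate policies={n_policies:,}")
```

With five actions and a fifteen-step horizon that is already over thirty billion candidate policies, which is why explicit models have to be heavily pruned, factorized or otherwise approximated in practice.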
In the earlier reply, you made the distinction between simple life-forms and complex ones, with predictive processing applying only to the latter. I am not sure whether that is a useful distinction. For one, entrainment, synchronization and the like seem to be going on in the human brain and seem to play a functional role. Second, the focus on predictive processing now seems to offer exactly an opportunity to find a continuity between cognitive science, theoretical physics and complex systems. The free-energy principle, as well as Haken’s work, seems to make that possible, as does, for example, the following paper: Still, S., Sivak, D. A., Bell, A. J., & Crooks, G. E. (2012). Thermodynamics of Prediction. Physical Review Letters, 109(12), 120604. https://doi.org/10.1103/PhysRevLett.109.120604
All these papers are pretty hard to understand, but getting clear on the theoretical distinctions between ‘embodying a model’ and ‘having a model’, ‘predicting’ and ‘anticipating’, ‘inference’ and ‘synchronization’, ‘computing explicitly’ and ‘computing by means of its dynamics’ is, I think, crucial for a sensible debate in the philosophy of mind on predictive processing.
The charm of FEP is exactly that it is able to “muddy the waters” and at least conceptually integrate PP accounts, complex-systems accounts and any fancy combination of the two. By moving away a bit from PP, one does not merely “deprive oneself from further understanding”; one also remains open to (non-mechanistic) alternatives/complements that do aid anticipation. In that sense, the Frontiers paper of ours that you mentioned is a start for thinking in that direction, where what is ‘lost’ by getting rid of Helmholtz and hypothesis testing is won back by gaining insights from ecological psychology and the phenomenology of skilled action.
Dear Jelle,
I like your suggestion to distinguish the two sets of issues, one centered very much on the justification of content-ascriptions and the other on complex dynamics. Though for Inman (and anti-representationalists in general), the notion of ‘anticipatory dynamics’ ought (I think) to be every bit as fraught as that of prediction or predictive dynamics. After all, talk of anticipations or anticipatory dynamics (to be worthy of the name) seems every bit as implicitly content-invoking as talk of ‘predictions’ or ‘predictive dynamics’. In each case, there needs to be something that is being predicted/anticipated, and so the content-question can be raised.
Regarding explicit generative models, you are of course right that the computations here can rapidly become expensive. That’s why the models need to be as frugal as possible. But I don’t think frugal is the same as ‘merely structurally implicit’. That said, I think it’s actually rather hard to tell these options apart – for (as you note) there is no generally agreed-upon way of demarcating explicit and inexplicit representings.
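For what it’s worth, one standard way of making ‘frugal’ precise (assuming the usual variational formulation, rather than anything specific to this exchange) is the decomposition of variational free energy into a complexity term and an accuracy term:

```latex
\underbrace{F}_{\text{free energy}}
  \;=\;
\underbrace{D_{\mathrm{KL}}\!\left[\, q(z) \,\Vert\, p(z) \,\right]}_{\text{complexity}}
  \;-\;
\underbrace{\mathbb{E}_{q(z)}\!\left[ \log p(x \mid z) \right]}_{\text{accuracy}}
```

Minimising F favours posteriors (and, at a slower timescale, models) that explain the sensory data well while departing as little as possible from prior expectations; that complexity penalty is the sense of frugality at issue, and it applies whether or not the relevant quantities are explicitly encoded.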
Additionally, Friston himself (personal communication) suggests that once we enter the realm of anticipations/predictions concerning (not simply the present sensory signal but) the future, and hence targeting what is currently counterfactual or non-existent, the need for stronger forms of internal encoding increases. As I said in my earlier reply to Anil Seth, it is striking that this move brings us full circle to earlier discussions (in the ‘philosophy of dynamical systems’ literature) that suggested that the strongest cases for internal representation involve action selection in the absence of adequate guiding signals from the current environment. But I fully agree with your comment that:
“getting clear on the theoretical distinctions between ‘embodying a model’ and ‘having a model’, ‘predicting’ and ‘anticipating’, ‘inference’ and ‘synchronization’, ‘computing explicitly’ and ‘computing by means of its dynamics’ [will be] crucial for a sensible debate in the philosophy of mind on predictive processing.”
I agree too that exploring the frugal/ecological dimensions is crucial. My own bet is that much of human skilled behavior flows from the development and flexibly-frugal use of multi-level models that separate out interacting distal (or bodily) causes operating at varying scales of space and time. For that gives us a neat way of dealing with domains rich in articulated structure.
Finally, in suggesting that PP may not describe very simple FEM (free-energy minimizing) systems, such as bacteria, I didn’t mean to imply that there is no interesting continuity between the simpler and more complex strategies. Indeed, the continuity may well be precisely that both are ways of minimizing free energy, though PP (as I pitch it) does so in ways that make greater use of lifetime learning and multi-time-scale dynamics.
Thanks for the constructive and engaging comments. I’m really looking forward to seeing lots more of your important work exploring the ecological/dynamical manifestations of FEM.