Is prediction error minimization all there is to the mind?

The prediction error minimization theory (PEM) says that the brain continually seeks to minimize its prediction error – minimize the difference between its predictions about the sensory input and the actual sensory input. It is an extremely simple idea but from it arises a surprisingly resourceful conception of brain processing. In this post, I’ll try to motivate the somewhat ambitious idea that PEM explains everything about the mind.

The first objection people have when learning about the idea of prediction error minimization is that it must obviously be false. Minimizing prediction error is minimizing surprise, and the best way to minimize surprise, when it comes to sensory input, is to not have any sensory input. If we minimize prediction error we should therefore all seek out dark rooms and stay there. But we obviously don’t, so PEM is false.

This objection rests on a misunderstanding about what the theory says. It is crucial to see that it concerns prediction error minimization on average and in the long run. The brain is doing lots of things to maintain its ability to minimize prediction error reasonably well at the current time and over time. This means that just seeking out the dark room and staying there is not going to work, since after a while in the dark room prediction error is going to increase. Hunger, thirst and loneliness are all states we don’t expect in the long run, so they are surprising. Similarly, concerned family members, council workers and landlords are going to come knocking, creating prediction error that staying in the room cannot deal with. (See here for a great paper on the dark room).

What the dark room problem tells us is that prediction error minimization always happens given a model, a set of expectations. We will find an organism chronically in the dark room only if this is the kind of creature that on average is expected to be found in a dark room.

There is much more to say about the idea that prediction error minimization always is given a model (see my book and Andy Clark’s terrific BBS paper for introductions). In many ways, here it would make sense to change to talking about the free energy principle and its relation to self-organized systems.

In this post however I will use the idea that prediction error minimization happens over time to motivate the range of activities the brain engages in to safeguard its ability to minimize error.

Assume the brain harbours models of the environmental causes of its sensory input. On the basis of these models, it generates predictions about what the next sensory input will be. These predictions occur concurrently on several time scales, ordered hierarchically up through the cortex. In this way the sensory input is predicted under expectations of what might happen very soon and also under expectations about how what may happen very soon is influenced by what happens at slower time scales (I might expect sensory input as of a leaf dropping to the ground, and that expectation is modulated by my expectations about the windy conditions, and by my expectations about the lighting at this time of day).

Perception happens, then, as the model generates expectations that anticipate the actual sensory input. The parameters of the model predictions will be updated in the light of any prediction error in approximate Bayesian inference: prediction error is weighted by how much we already know (the prior precision) and by how much we are learning from the input (the likelihood precision). What we perceive is then determined by the currently best performing predictions.
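To make the precision weighting concrete, here is a minimal Python sketch of a single update step. It assumes the simplest possible case, which the post itself does not spell out: one hidden cause, a Gaussian prior and a Gaussian likelihood. The function name and arguments are hypothetical and chosen only for illustration.

```python
def precision_weighted_update(prior_mean, prior_precision,
                              observation, likelihood_precision):
    """One Bayesian update of a Gaussian belief about a single hidden cause.

    Assumption: prior and likelihood are both Gaussian, so precision
    (inverse variance) is all we need to weight the prediction error.
    """
    # The prediction error is the mismatch between input and prediction.
    prediction_error = observation - prior_mean

    # Precisions add: the posterior is more certain than either source alone.
    posterior_precision = prior_precision + likelihood_precision

    # The error counts for more when the input is precise relative to the prior.
    gain = likelihood_precision / posterior_precision
    posterior_mean = prior_mean + gain * prediction_error
    return posterior_mean, posterior_precision


# A confident prior barely moves; a vague prior moves most of the way.
confident, _ = precision_weighted_update(0.0, 9.0, 2.0, 1.0)
vague, _ = precision_weighted_update(0.0, 1.0, 2.0, 9.0)
```

With the same prediction error in both calls, the belief shifts much less when the prior precision dominates, which is the sense in which the error is "weighted by how much we already know".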

It is clear that if predictions are not informed by longer-term expectations, then it will be hard to predict sensory input very well (the trajectory of the leaf will be hard to anticipate, and the precision of the sensory input at dusk may be confounding). This follows from the simple observation that we live in a world where causes interact (cf. the famous cat behind the fence). This kind of ‘convolving’ of expectations based on causal regularities at different time scales ensures that perceptual inference, via prediction error minimization, can capture the full, integrated richness of perception. This is made possible by incorporating a long term perspective on prediction error minimization. Here we immediately get a learning (and memory) perspective because building up a hierarchical model then requires extracting causal regularities over various time scales and using them to predict better.

The idea so far is that there is a rich, top-down cascade of interwoven predictions, which seeks to dampen down the sensory input. This can be conceived as driving top-down messages. But we also need a specific type of modulating top-down messages. This is because levels of uncertainty change in the world and thereby the trustworthiness or precision of the bottom-up prediction error changes in a context or state-dependent manner. (To take a long-term example, even if I have learnt to expect certain kinds of rainfalls in certain seasons I might have to adjust this in the light of a shift from La Niña to El Niño, such that a bit of rain can’t be trusted as a sign of a good season). This is important because if model parameters are to be updated optimally in the light of prediction error then there needs to be an estimation of the precision of that prediction error (it is no good to change the hypothesis in the light of an imprecise measurement). This calls for building up expectations of the precisions of prediction errors, that is, expectations about the contexts in which prediction errors tend to be trustworthy. This is just more prediction error minimization, but of a higher statistical order (for example, we can be surprised at the precision of prediction error). Rather interestingly, this gives us attention. This is because attention is allocation of resources to worthwhile signals, and expectations of precisions can guide prediction error efforts to worthwhile or precise signals. There is of course much more to say about this idea but it is enormously appealing because it brings attention in at the ground level and as separate from, yet intricately related to, perception.
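A toy Python sketch of this second-order learning might look as follows. Everything in it is an illustrative assumption rather than part of the theory as stated here: the class name, the learning rate, the two named contexts, and the choice to share out "attention" in direct proportion to expected precision.

```python
class PrecisionEstimator:
    """Tracks the expected precision (inverse variance) of one error signal."""

    def __init__(self, initial_variance=1.0, rate=0.1, floor=1e-6):
        self.variance = initial_variance
        self.rate = rate      # hypothetical learning rate for the variance estimate
        self.floor = floor    # keeps the variance strictly positive

    def update(self, prediction_error):
        # Second-order learning: move the expected squared error towards
        # the squared error that was just observed.
        self.variance += self.rate * (prediction_error ** 2 - self.variance)
        self.variance = max(self.variance, self.floor)

    @property
    def expected_precision(self):
        return 1.0 / self.variance


def attention_weights(estimators):
    """Share out 'attention' in proportion to expected precision."""
    total = sum(e.expected_precision for e in estimators.values())
    return {name: e.expected_precision / total for name, e in estimators.items()}


channels = {"daylight": PrecisionEstimator(), "dusk": PrecisionEstimator()}
for _ in range(50):
    for error in (0.1, -0.1):
        channels["daylight"].update(error)   # small, consistent errors
    for error in (2.0, -2.0):
        channels["dusk"].update(error)       # large, erratic errors

weights = attention_weights(channels)
```

After these updates the "daylight" channel carries nearly all of the weight, because its prediction errors have been learned to be trustworthy. A real hierarchical model would estimate precisions per level and per context; this sketch only shows the direction of the effect.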

This gives us perception, learning and attention direct from PEM. Next think about what I will (somewhat artificially) call understanding. I conceive of understanding as having a reasonable model for making sense of a domain, even if there is still uncertainty about the states of the domain. The opposite of understanding is confusion, which is not knowing which model can reasonably be appealed to. Whereas perception seeks to minimize uncertainty about what causes the sensory input, understanding is concerned with selecting a reasonable model with which to minimize uncertainty. For example, if you see a die showing some number of pips, confusion will ensue if you think of that input under a coin tossing model. It is clear that prediction error minimization is helped by selecting good models since the wrong model will be no good at anticipating the next input. A good model also captures the given input in a minimally complex fashion, without too many unnecessary parameters. If the model is overfitted then it will be poor at predicting what the next series of input is going to be. Overfitting may give decent momentary or short term prediction error minimization but is bound to fail in the long run. Hence we should expect the prediction error minimizing brain to be engaged in model selection and complexity reduction, in other words, that it will aim at understanding.
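The die and coin example can be turned into a small Python sketch of model selection with a complexity penalty. Note the assumptions: the Bayesian information criterion (BIC) is used here purely as a convenient stand-in for "accuracy minus complexity" and is not a scoring rule proposed in this post, and the rolls and the three candidate models are made up for illustration.

```python
import math
from collections import Counter


def log_likelihood(observations, probabilities):
    """Log probability of the observations under a categorical model."""
    total = 0.0
    for outcome in observations:
        p = probabilities.get(outcome, 0.0)
        if p == 0.0:
            # The model rules this outcome out entirely: total confusion.
            return float("-inf")
        total += math.log(p)
    return total


def bic(observations, probabilities, free_parameters):
    """Lower is better: a complexity penalty minus twice the log-likelihood."""
    return (free_parameters * math.log(len(observations))
            - 2.0 * log_likelihood(observations, probabilities))


rolls = [3, 5, 1, 6, 2, 4, 3, 1, 6, 2, 5, 3]   # hypothetical data
counts = Counter(rolls)

# Each model is (outcome probabilities, number of fitted parameters).
models = {
    "coin": ({"heads": 0.5, "tails": 0.5}, 0),
    "fair_die": ({face: 1 / 6 for face in range(1, 7)}, 0),
    # Overfitted: probabilities copied straight from this short sample.
    "fitted_die": ({face: counts[face] / len(rolls) for face in counts}, 5),
}

scores = {name: bic(rolls, probs, k) for name, (probs, k) in models.items()}
best = min(scores, key=scores.get)
```

On this sample the coin model cannot account for the input at all, and the fitted die explains the sample slightly better than the fair die but pays for its five extra parameters, so the fair die is selected.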

The last thing to add is action. The whole story here is a little more involved even though the basic idea is utterly simple. If a hypothesis has predictions that don’t hold up, then the hypothesis can be changed to fit the input or the input can be changed to fit the hypothesis. So far the discussion has been about the first direction of fit but of course it is possible to minimize prediction error under the other direction of fit too, and this is action. This is central to PEM (and more generally to the free energy principle). This was implicit in the discussion of the dark room problem above: we need to act in the world to minimize prediction error in the long run. If we ever only update our model parameters in the light of the error, then we will not be able to maintain ourselves in low surprise states (given our model). More concretely, we have to conceive of action in terms of a competition of hypotheses about what the true state of the world is. For example, one (actually true) hypothesis is that my hand is close to the keyboard and another (actually false) hypothesis is that my hand is on the cup of coffee next to me. Action ensues when the actually false hypothesis begins to win, which it does when I increasingly mistrust the actual sensory input: the false hypothesis is then made true by minimizing its prediction error as my hand reaches out for the cup. It sounds rather intricate but is a compelling idea, which does away with cost functions and motor commands. Currently, we are having a lot of fun with this notion of ‘active inference’ both in self-tickle experiments with George van Doorn and reaching tasks in studies of autism spectrum disorder with Colin Palmer, Peter Enticott and Bryan Paton.

So now, from the sparse beginnings of PEM, we get integrated conceptions of perception, learning, attention, understanding and action. Moreover, we get this at multiple, interwoven timescales stretching from the lowest sensory attributes to the most frontal, long-term representation in the brain. Every one of these aspects of PEM is being investigated in labs around the world.

The PEM mechanism takes care of the problem of perception since high mutual information between brain and world is ensured on the basis of comparison of two quantities that the brain actually has access to, namely predictions and sensory input. It also relates nicely to ideas about what it takes to be a living organism.

Viewed as this kind of package, PEM has the promise to account for everything mental. What more could you want than perception, learning, attention, understanding and action? This promise is strengthened when these aspects of PEM are applied to different areas, such as interoception (yielding emotion) and self (viewed as a parameter that helps explain the evolution of the sensory input), etc. (for more, see my book, Andy Clark’s recent series of papers, work by Anil Seth, and from Karl Friston’s group).

For these reasons, and also for a few other reasons, I think prediction error minimization is all the brain ever does.

Of course, there are very many questions to ask about PEM. Like: what’s the evidence for it? Is it a good thing if something like PEM unifies work on the mind? In what sense does PEM explain, and how is PEM implemented in the brain? And so on. This post is conveniently long enough now however. In the next post, which will hopefully be shorter, the plan is to talk about how PEM might relate to embodied cognition, and then in the last post say a little about consciousness and PEM.

25 Comments

Bill Skaggs

June 22, 2014 at 9:17 am 11 years ago

Hi Jakob,

I hate to say anything critical here, because as a theoretical neuroscientist I think the concept of PEM is likely to be extremely important for understanding some aspects of the brain, and I wouldn’t want to discourage work on it.

But it seems to me that the idea of PEM explaining *everything about the mind* is a non-starter, and can only steer the discussion in unproductive directions. To explain everything about the mind, PEM would have to explain action by saying that all of our actions are chosen solely for the purpose of minimizing our prediction errors. Since there is no direct relationship between prediction error and adaptive fitness, that hypothesis strikes me as surely insufficient.

Or have I leaped to spurious conclusions?

Best regards, Bill Skaggs
- Bryan Paton
  
  June 22, 2014 at 10:09 pm 11 years ago
  
  Hi Bill,
  
  With regard to there not being a direct link between prediction error and adaptive fitness, have a read of these papers:
  
  https://www.fil.ion.ucl.ac.uk/~karl/Life%20as%20we%20know%20it.pdf
  
  https://www.fil.ion.ucl.ac.uk/~karl/A%20Free%20Energy%20Principle%20for%20Biological%20Systems.pdf
  
  https://www.fil.ion.ucl.ac.uk/~karl/Free%20Energy%20Value%20and%20Attractors.pdf
  
  Bryan.
  - Jakob Hohwy
    
    June 23, 2014 at 12:35 am 11 years ago
    
    Thanks Bill and Bryan,
    The argument that I am playing with here is that PEM is not a non-starter as an explanation of the mind. I think you’re right Bill, that action is one of the most severe obstacles. There is however a very direct way to link action and adaptive fitness (set out in the papers Bryan links to) but going that route involves accepting the free energy principle as being at the root of PEM. I think this is compelling but not everyone wants to go there (e.g., Clark might not). I think it is the right way because it is the only way long term PEM makes sense (and as the post says, PEM has to be long term). The idea here is that everything we do helps maintain us in our expected states, where expected states define our phenotype and thereby relate to adaptive fitness. In this way of thinking, the model relative to which prediction error is minimized is just the agent. Friston and Stephan in Synthese from 2007 describe this well. This is to say that there is in fact a direct link from action to adaptive fitness under PEM – it is not to say that this is therefore an idea that should be accepted. But I do think it overcomes the obstacle and ensures PEM is not a non-starter.
- Neil Howard
  
  July 15, 2014 at 11:19 pm 11 years ago
  
  Hi,
  
  I think Jakob’s perspective on PEM ‘explaining *everything about the mind*’ is right. PEM will not be sufficiently right in detail but it is a better avenue for exploration than anything else, on what will be a very long scientific journey. I make a comparison with Lamarckism…
  
  (The following is taken from https://headbirths.wordpress.com/talks/intelligence-and-the-brain/)
  
  “I would suggest that Free Energy[/(PEM)] is currently where evolution was in around the year 1800 – around the time of Lamarckian evolution. … The point here: Lamarckism may be a wrong theory but it provided an example of what a correct theory might look like, 150 years before all the science was put in place. On a scale from ‘creationism’ to ‘DNA’, Lamarckism is right next to ‘DNA’ – it is right on the big issues and wrong on the details. Now, it may be 150 years before we have a comparable scientific theory for intelligence[/mind/brain]. None of us alive today will be around then. Here, I want to provide a glimpse at what that (correct) theory of intelligence[/mind/brain] might look like.”
  - Jakob Hohwy
    
    July 17, 2014 at 2:39 am 11 years ago
    
    Hi Neil – that is an interesting take on the state of the debate. I think there are probably lots of analogies between evolution and free energy. Some of these analogies are historical/methodological, some are deeper, I suspect. Cool overall post on FEP too!
Dan Ryder

June 22, 2014 at 3:48 pm 11 years ago

Hi Jakob – very glad you’re doing this series! I am convinced of the importance of PEM, though skeptical of the claim that it explains everything. (Worth giving it a shot, though!) I also think it fits nicely with SINBAD, if you’re familiar with that. (Though SINBAD isn’t so strictly hierarchical about the direction of prediction, I think it offers a mechanism for model-building that can suit the needs of PEM.)

Like Bill Skaggs, I’d also like to put pressure on the action side of things: at first glance, it doesn’t really seem to accommodate desires and other pro-attitudes in the right way. Here’s a simple example that’s meant to show that it’s pro-attitudes that drive action, and not PEM. I want her to go out with me, and I fully expect her to say no. But I had better not act so as to make that expectation even more probable! This problem will generalize at least to cases where the preferred outcome isn’t the expected one. So… I assume there must be some way that PEM handles the assignment of value to fix this.

Thanks!
- Jakob Hohwy
  
  June 23, 2014 at 1:00 am 11 years ago
  
  hi Dan,
  Exactly! PEM is as you say “worth giving a shot” as an explanation of everything about the mind. This is why I am excited about it. And I can’t think of any other theoretical framework that comes even close to this.
  
  As I said in response to Bill, I agree that action may require a bit more work than some of the other elements (even though action is at the heart of PEM). I’ll begin by making a slightly weaselly move and say that PEM will explain everything about the mind but that we should expect to learn something new about the mind through PEM too; in other words, PEM will be revisionary with respect to some aspects of mind, or drive our conception of mind in certain directions. So we should not expect PEM to account for all commonsense notions and categories of the mind.
  
  I say this because I think some of our preconceptions about attitudes may come under pressure. In particular, I think the categorical distinction between beliefs and desires begins to wash out, as does the distinction between perception and belief. It is a nice question what to do with other attitudes (Zoe Jenkin raised this on the Birmingham blog and I speculated in response about hope and fear, for example: https://philosophybirmingham.blogspot.com.au/search/label/Hohwy).
  
  It is true that I might act on something I think is unlikely to happen. The way to accommodate this is to consider the hierarchy of time scales and the uncertainty about how to minimize prediction error in action. There might be a very long term, very confident expectation that one will partner up, which guarantees that we act upon it. However, there is very much uncertainty about what the policy is for getting to this expected state (partly because we know it is competitive, and partly because it involves complex modelling of the interacting causes that are several other agents). This means that we might have to act on a policy that is riddled with uncertainty (I might assign a very low probability to anyone wanting to go on a date with me), or that we might have to explore the environment in the hope of learning new patterns of sensory input.
  
  So I think there are resources for handling cases like this within PEM. We need to think about expected states, and expectations for how to get there (these are all priors in the Bayesian sense).
  - Dan Ryder
    
    June 23, 2014 at 1:28 pm 11 years ago
    
    Thanks, that’s very helpful! It sounds like the hierarchy of time scales (and the hierarchy from concrete-perceptual to abstract-conceptual?) might help with a lot of apparent problems. But I imagine it also causes its own problems, especially Quine-Duhem type problems. The more we increase the scope and complexity of the theories relevant to our prediction, the harder it will be to assign blame in updating the model. I’m not so much worried about the epistemological problem here as I am about the mechanistic one. While the proposals in the perception literature seem straightforward enough that they could be implemented neurophysiologically, I’m worried that your more ambitious proposal takes us away from anything realistically implementable.
    
    A related worry is about falsifiability. Yes, you can accommodate cases like the dating one by appealing to the “right” level of the temporal hierarchy, but it starts to sound like you can accommodate any data by appealing to the “right” level. What would falsify the account? (Maybe just that it isn’t what the mechanism is actually doing, connecting back to the Quine-Duhem/implementation problem.)
    
    Finally, I find myself having trouble wrapping my mind around such a profound departure from folk psychology. But no doubt from your perspective the departure doesn’t seem that profound, at least nothing to worry about. I probably just need to live with the theory for a while!
    - Jakob Hohwy
      
      June 25, 2014 at 10:40 am 11 years ago
      
      hi Dan,
      sorry I didn’t spot this response until now. I think the Quine-Duhem worry is an important one. In fact, the whole story seems quite Quinesque, in so far as on the PEM story sensory input literally impinges (causally) on the periphery and is filtered (in the computational sense) up through the hierarchy (I’ll talk more about this in the next post, see also Bryan’s comments below). I think there are indeed epistemological issues here since it will be a non-trivial task to recruit the right level of the hierarchy to deal with the input in a given situation – such decisions will only be tested over time, as the long-term average of prediction error minimization is impacted by processing biases. We think this is crucially involved in various mental and developmental disorders. In particular, we have studied clinical autism and healthy individuals high in autism-like traits from this perspective. We believe we find that there is less context-dependence as autism-like traits add up. Given that context is tied to time-scales this suggests a Quine-Duhem style problem being at the core here.
      
      I mention this also because our experiments suggest that it is possible to explore this empirically – re your point about falsifiability. In general I think it is right that we have to worry about just-so stories for Bayesian accounts. So positing a specific set of priors and likelihoods (at a given temporal scale) should ideally be backed up by experimental evidence or at least by the existence of similar priors in other contexts.
      
      You point out that the Quine-Duhem problem might relate to an implementational issue. I might need to learn more about this. For now notice that PEM comes with considerably more structural constraints than the Quinean network of belief. As we work inwards from the periphery successive processing steps must increase in time-scale (and decrease in level of detail). As we work outwards from the stable rules at the center, we see how deep priors work as control parameters on ever lower levels. And levels are characterised by conditional independence. In this way the implementation of the hierarchy is regularised, which helps in claiming biological plausibility.
      
      You’re right about the challenge to folk psychology! Sometimes I fear that eliminativism lurks just around the corner. Certainly, once PEM gets under your skin it is tempting to view people’s behaviour in terms of priors, likelihoods and precision-weighted inference! Happiness is just the absence of prediction error, after all… However, I do think that this is what makes PEM worth the effort. I would be much less interested in it if it left everything as is. In my view it is exciting to use a completely general theory to challenge folk-psychological notions of perception, belief, desire, decision (and much more). Time to shake things up a bit.
Glenn

June 22, 2014 at 9:10 pm 11 years ago

Hi Jakob
I’m trying to work out what I think about this, so please forgive a few requests for clarification. (I hate numbering things, but I will for clarity)

1. You say “Hunger, thirst, loneliness are all states we don’t expect in the long run, so they are surprising.” But that doesn’t strike me as obviously true: don’t I expect hunger to occur? After all it occurs several times a day. Does this mean I don’t expect it to occur and not be acted on?

Moreover, the affective/visceral nature of hunger (etc) seems sufficient to explain why such states act as motivations. What does it add to say by becoming satiated we reduce prediction error?

2. How could PEM deal with scenarios where people actively seek novelty or surprising situations?

3. If predictions are made on Bayesian grounds, can we have a non-arbitrary naturalistic account of how priors are set?

4. So far as I can gather, and I may well be wrong, the main argument for PEM seems to be that it gives us a unifying principle for accounting for perception, action, attention, learning and understanding. But, don’t we already have such a principle in representation?
cheers
Glenn
Assaf

June 23, 2014 at 3:03 am 11 years ago

Hi Jakob,

Taking the point of view of evolutionary psychologists, the mind is supposed to contain many modules, each of which has evolved to solve a specific information-processing problem, often using heuristics that apply to this specific problem but not to others. So, at least on its face, one would expect to find diversity in the basic (nontrivial) principles governing these modules (with some modules doing PEM and others using some other information-processing strategy), but if PEM explains *everything* about the mind, all these modules obey the same (nontrivial) basic principle – they all do prediction error minimization. So instead of diversity we find uniformity, or so it appears. How do you think this (apparent) tension should be resolved? (sorry if this is a well known point)

Thanks!
- Bryan Paton
  
  June 23, 2014 at 4:11 am 11 years ago
  
  Hi Assaf,
  
  If you take the modular hypothesis to be true then all of the modules should be doing prediction error minimisation. Even though there is one mechanism there are different routes to achieve that same mechanistic goal. For instance, action and perception both aim to minimise prediction errors but each has a different route to get there. But even at lower physiological levels prediction errors can be minimised in different ways; for instance, the visual cortex and auditory cortex share some similarities but have very different architectures. This architecture, how best to configure the prediction error minimisation apparatus, will depend heavily on what task the physiology has.
  
  Even at a more cognitive level, which might be closer to the level of description you are aiming for, there will be differences. How to solve a quadratic equation is going to require different cognitive mechanisms than how to interpret the facial expressions of a friend to whom you have just told a joke (but maybe not depending on your personality). Both mechanisms are in the service of reducing prediction error but the means of doing so differ.
  
  Does this fit with what you were thinking of?
  
  Bryan.
  - Assaf Weksler
    
    June 23, 2014 at 4:31 am 11 years ago
    
    Thanks Bryan,
    You successfully show how the idea that all modules do PEM is compatible with diversity in cognitive architecture. But what I had in mind is a somewhat different tension: Given that PEM is not the only possible way to successfully solve information-processing problems, and given that each module has evolved to solve a local, specific information-processing problem, we should expect to find some modules that are not governed by PEM.
    - Jakob Hohwy
      
      June 23, 2014 at 10:31 pm 11 years ago
      
      Hi Assaf – This is a nice challenge since if PEM can’t explain modularity, then it can’t explain something quite important about the mind. I agree with Bryan though that modularity is probably the upshot of different bits of brain latching on to different statistical properties of the world: it still requires PEM to extract those regularities however.
      
      In general, the notion of modularity is of course still debated. I think people like Fodor would not be too hostile to the idea that there is PEM within a module; for Fodor it is more a question of denying something like top-down modulation into low level modules.
      
      One thing that is often under-discussed in the modularity debates is the question of connectivity between modules. PEM, in Friston’s, Hinton’s and others’ treatment, speaks to this by having essential roles for both functional segregation (modularity) and functional connectivity.
      
      In the book (ch 7), I offer some further considerations for why there might be modularity, namely in terms of the probabilistic value of having a few conditionally independent sources of evidence for a given hypothesis (like witnesses in a courtroom).
Jakob Hohwy

June 23, 2014 at 3:29 am 11 years ago

Hi Glenn – good to see some itemizing:)

1. You’re right we expect thirst/hunger etc to occur but if you lock yourself in the dark room, then your expectations for when and how they occur are violated. Let’s say we have learned that interoceptive prediction error, which can be gotten rid of by eating, occurs about every 8 hours; then if it occurs all the time, this will be a prediction error. And the affective/visceral nature of hunger etc is to be understood as prediction error (a la James-Lange, appraisal theory, my take on emotion/sensation in Mind & Language 2011, Anil Seth on interoception etc.).

2. This is another classic question (also related to the comments by Bill and Dan). Whereas we don’t explore so much that we threaten homeostasis too much, we do explore more gently. This is because the world is a changing place where staying put for too long will incur prediction error cost; so we explore under the expectation that we’ll encounter precise prediction error.

3. Where do the priors come from? They are honed in empirical Bayes, that is they are learned from prior experience. It is like a running update of a mean, with an optimal learning rate; every time we’ve updated we use that as the prior for the next sample. The priors over time come to recapitulate the real structure of the world. Very naturalistic.

4. The problem, I think, with an appeal to representation is that it doesn’t explain anything. It just asserts that there is representation, but not how it comes about (it assumes the problem of perception can be solved). Nor does it explain how attention arises or how it relates to perception, or learning; and of course as you say, there is no relation to action. PEM is attractive because it allows us to see how the processing might go. It also makes representation an upshot of prediction error minimization, rather than a goal in itself: we must be recapitulating the structure of the world if we act to maintain ourselves in our expected states. But maybe I am not getting what you have in mind with the appeal to representation here?
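The running update of a mean mentioned in point 3 can be sketched in a few lines of Python. This is only an illustration under a strong assumption, namely that the quantity being tracked is stationary, in which case a learning rate of one over the number of samples reproduces the ordinary sample mean. The function name and the `prior_count` argument are hypothetical.

```python
def running_mean(samples, prior_mean=0.0, prior_count=0):
    """Update an estimated mean one sample at a time.

    Each posterior becomes the prior for the next sample. prior_count
    says how many earlier samples the prior is worth; 0 means the first
    sample simply replaces the prior.
    """
    mean, count = prior_mean, prior_count
    for sample in samples:
        count += 1
        learning_rate = 1.0 / count        # shrinks as experience accumulates
        prediction_error = sample - mean
        mean += learning_rate * prediction_error
    return mean


# Hypothetical usage: hours between bouts of hunger.
estimate = running_mean([7.0, 8.0, 9.0])
# A prior worth two earlier samples moves only a third of the way to new data.
shifted = running_mean([4.0], prior_mean=10.0, prior_count=2)
```

In a changing world a fixed or precision-dependent learning rate would be more appropriate than one that shrinks towards zero; the 1/n schedule is just the simplest case where "optimal" has a clear meaning.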
- Bryan Paton
  
  June 23, 2014 at 3:54 am 11 years ago
  
  I just want to add to what Jakob has said regarding where priors come from. In addition to those priors learned empirically, some constraints (enabling and otherwise) on priors will result from phylogenetic and environmental factors, e.g. fish will have different priors for movement and falling objects than non-aquatic creatures. Those constraints on priors that naturally arise out of learning will themselves be hyperparameterised by the phylogenetic and/or environmental priors.
  
  Bryan.
Lars Marstaller

June 23, 2014 at 7:09 am 11 years ago

Hi Jakob,

I think Glenn has a point as far as representations go. If you define representations as internal models based on which predictions are made, the minimization of prediction error amounts to the development of an accurate/useful internal model. Check out our recent paper where we show how such representations arise from evolution: https://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00475

Cheers,
Lars
- Jakob Hohwy
  
  June 23, 2014 at 9:41 pm 11 years ago
  
  Hi Lars – that is a great paper!
  
  To be clear, PEM relies on internal models, i.e. representations. These models are honed in prediction error minimization (which is Bayesian updating of hypotheses). In this sense, there is no conflict between PEM and representation: predictions are made on the basis of generative models. This is why I was confused about the appeal to representations in a criticism of PEM: it can’t be that representations rather than PEM explain stuff, since PEM is representational.
  
  There might of course be other theories of representation than PEM. In my view none of them are very good since they don’t really seem to explain self-supervised systems. Related question: is evolutionary search self-supervised?
  
  One interesting question is whether reliance on evolutionary algorithms is different from PEM. I don’t know enough to tell for sure. PEM has room for Bayesian model selection, based on model evidence (though perhaps this goes beyond the vanilla free energy principle). I find it difficult to understand evolutionary search if it is not an approximation to Bayes, and if it is, then it begins to look much like PEM (but maybe I don’t know enough about evolutionary algorithms).
  
  One difference seems to be that PEM needs to be hierarchical, in the sense that hyperpriors and hyperparameters work top-down to ensure an optimal learning rate. On the other hand, levels in the hierarchy are conditionally independent, which speaks to Markov aspects. Functional segregation (discussed in response to Assaf’s comment) will also serve to make the hierarchical structure seem more messy.
  - Lars Marstaller
    
    June 23, 2014 at 10:06 pm
    
    Thanks Jakob,
    the GA is completely self-supervised in the sense that we only define a fitness criterion. I always felt that there is considerable similarity between GAs and Bayesian model selection, and I think that Chris Thornton’s paper (https://www.ncbi.nlm.nih.gov/pubmed/21585510) is in a way similar to ours, but I can’t really explain why 🙂
    
    Cheers,
    Lars
    - Bryan Paton
      
      June 25, 2014 at 5:09 am
      
      In a trivial sense, but also in a deeper teleological sense, the free energy principle (we could even call it a criterion) is by definition a fitness criterion. Only those organisms that can successfully reduce their prediction errors can survive. That specific appeal, however, does not help much, as it is difficult to falsify.
      
      Thanks.
      
      Bryan.
      - Lars Marstaller
        
        June 25, 2014 at 5:43 am
        
        Hi Bryan,
        
        I think you miss the point here. A perfect mirror system is not necessarily a perfect survival machine. The nice twist about our paper is that the fitness criterion was to solve a specific task, not to evolve representations or minimize prediction error. The representations evolved as a viable solution to the problem of surviving in a particular niche, but they didn’t evolve necessarily.
        
        Lars
      - Bryan Paton
        
        June 25, 2014 at 9:08 am
        
        Hi Lars,
        
        I might have missed the point; sorry if I have missed it again, but I am not sure I did. I take the necessity point. In terms of a free energy or prediction error scheme, representations (whatever those might be) do indeed seem to be necessary. The Free Energy idea goes deeper, though. The “Life as we know it” paper I linked to above presents an interesting scenario. Starting with the assumption that an organism wants to maintain its own states (homoeostasis) in the face of environmental perturbation, and is further faced with finite resources, finite, fallible perception and a horrendously complex environment, the entire Free Energy (predictive coding) apparatus develops. Representation is thus a necessity in this picture, but in a very speculative way Free Energy also seems to be a necessity for life in general.
        
        A prediction error minimisation system (scheme) does not aim for perfect mirroring; to do so would lead to an unfit system, as you point out. A system need only minimise prediction errors sufficiently to stave off degradation of its own internal states/systems, in other words to resist entropy. Such systems live far from equilibrium; a system designed to be a perfect mirror would very quickly descend to equilibrium with its environment and ultimately expire. The surest way to ensure equilibrium with the environment is to open the cell wall to the external environment.
        
        Thanks.
        
        Bryan.
Chris Thornton

June 24, 2014 at 4:55 am

Hi Lars, Jakob and others in this thread…

It’s true that my cogsci paper on ‘Representation recovers Representation’ from 2008 is along the lines of PEM (but was written before I’d really become acquainted with this new stuff). I too feel PEM is a very promising way of explaining things about the mind. My difficulties come from being more an engineer (actually an AI hacker) than a philosopher. So I’m searching for the implementable method, and my problem with the Friston story is that I don’t really see it showing me something I can do in practice.
Minimizing prediction error in structured probabilistic representation has to be a good idea, however you tell the story. The problem is how to make it work in non-trivial structure. It may be the secret is somewhere in Karl Friston’s formulae but I’m too much of a mathematical ignoramus to be sure!
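For contrast, the trivial, unstructured case is easy to write down. What follows is only a minimal sketch under loud assumptions: a single scalar Gaussian belief, a known observation variance, and hypothetical names (`update_belief`, `run`) that do not come from Friston's formulae or from any paper mentioned in this thread. It shows Bayesian updating written as a correction by prediction error, where the gain acts like a learning rate set by the relative precision of belief and input.

```python
def update_belief(mean, variance, observation, obs_variance):
    """One Gaussian Bayesian update, written as a prediction error correction.

    All names and the scalar Gaussian setup are illustrative assumptions.
    """
    prediction_error = observation - mean
    # The gain is large when the belief is uncertain relative to the input,
    # so it behaves like a precision-dependent learning rate.
    gain = variance / (variance + obs_variance)
    new_mean = mean + gain * prediction_error
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance


def run(observations, prior_mean=0.0, prior_variance=1.0, obs_variance=1.0):
    """Update the belief on each observation and record the absolute errors."""
    mean, variance = prior_mean, prior_variance
    errors = []
    for obs in observations:
        # Error of the current prediction, measured before the update.
        errors.append(abs(obs - mean))
        mean, variance = update_belief(mean, variance, obs, obs_variance)
    return mean, variance, errors


if __name__ == "__main__":
    final_mean, final_variance, errors = run([2.0] * 5)
    print(final_mean, final_variance, errors)
```

With a constant input, the recorded errors shrink step by step as the estimate moves toward the input and the belief becomes more certain. Nothing here addresses the hard part raised above, namely making this work in non-trivial, structured representations.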

Chris Thornton
- Lars Marstaller
  
  June 25, 2014 at 5:44 am
  
  🙂
  - Jakob Hohwy
    
    June 25, 2014 at 7:34 am
    
    Chris is too modest to mention his own very nice and cute AI-hacker style Braitenberg love-bot using predictive processing. http://www.sussex.ac.uk/Users/christ/demos/vid1-big.mp4.
    (and accompanying paper: http://www.sussex.ac.uk/Users/christ/drafts/draft-p.pdf)