Reconstructing the movie in your head, Redux

I’ve been thinking about the paper and movie linked at the previous post. Have a look at that if you haven’t, because it’s neat.

Here’s what you might think about the movie. You might look at the clip on the left and the movie on the right, and think “Wow! Those look pretty similar!” You might further think “Gee, they must look similar because visual areas contain a remarkably accurate map of what people are seeing, and these guys have figured out a way to show me that mapping! That’s cool!” (If you tend towards Fodorian crankiness, you might also think “Who cares? We knew this already! I’ve learned nothing!”)

Watch a little closer, though. Why do the elephants at 0:12 look like the inkblots at 0:06? Why does what appears to be a mattress and some text show up at 0:07 when there’s nothing like that on the left? Why does the African-American dude with the stethoscope at 0:20 suddenly turn into a distinctly un-stethoscoped white woman at 0:21?

There’s a good answer to these questions. Here’s the simple version: the movies on the right are not brain data. They are a bunch of YouTube clips superimposed on top of each other. Which clips were used is based on the brain data: they are the 100 most likely clips from the test set according to their model. Since the model is quite good at picking clips that subjects had seen, there’s a lot of overlap. But you’re not seeing the “movie in your head” in any important sense.

Here is my best guess at the details, having puzzled through the methods section and the supplementary online material. (Some caveats: I haven’t had time to work through the details of the modeling. The motion-energy part of it would be beyond my pay grade anyway. It’s late, I’m on a diet, and so I’m ornery. So this is a pretty bird’s-eye view; if I’ve gotten anything obviously wrong, please let me know.)

Step 1: Take your movie clips, split them into short chunks, downsample the heck out of them, and throw out chromatic information.

Step 2: Feed those movie-chunks into a fancy model designed to extract motion components.

Step 3: Take the results of step 2 and convolve them with a hemodynamic response function for each movie-chunk. That gives you a prediction for the BOLD response in a voxel if the subject were seeing that chunk.

Step 4: In each voxel, determine the goodness of fit for each hemodynamic response function you got in step 3. That gives you the posterior probabilities for each chunk at each voxel. Then take the most discriminating voxels and combine those predictions to make predictions about which chunk was being viewed at that time.
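Here is a toy sketch of steps 3 and 4 as I read them. It is not the authors' code: the random "feature" time courses stand in for the motion-energy output of step 2, and the gamma-shaped HRF, the Gaussian noise model, the flat prior, and every function name below are assumptions made for illustration. It also skips the voxel-selection part and simply pools all voxels.

```python
import numpy as np

def toy_hrf(n_samples=12, peak=5.0):
    """Gamma-shaped stand-in for a hemodynamic response function (assumed shape)."""
    t = np.arange(n_samples, dtype=float)
    h = (t ** peak) * np.exp(-t)  # rises, peaks at t == peak, then decays
    return h / h.sum()

def predict_bold(features, weights, hrf):
    """Step 3: map features to a per-voxel drive, then convolve with the HRF.

    features: (n_timepoints, n_features), hypothetical output of step 2
    weights:  (n_features, n_voxels), hypothetical per-voxel feature weights
    returns:  (n_timepoints, n_voxels) predicted BOLD
    """
    drive = features @ weights
    n_t, n_vox = drive.shape
    return np.stack(
        [np.convolve(drive[:, v], hrf)[:n_t] for v in range(n_vox)], axis=1
    )

def clip_posterior(observed, predictions, noise_sd=1.0, prior=None):
    """Step 4: posterior over candidate clips from goodness of fit.

    observed:    (n_timepoints, n_voxels) measured BOLD
    predictions: (n_clips, n_timepoints, n_voxels) predicted BOLD per clip
    Assumes independent Gaussian noise; voxels are pooled, not selected.
    """
    resid = predictions - observed[None]
    log_lik = -0.5 * np.sum(resid ** 2, axis=(1, 2)) / noise_sd ** 2
    if prior is None:
        prior = np.full(len(predictions), 1.0 / len(predictions))
    log_post = log_lik + np.log(prior)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Simulated example: five candidate clips, and the subject "saw" clip 2.
rng = np.random.default_rng(0)
n_clips, n_t, n_feat, n_vox = 5, 20, 4, 3
weights = rng.normal(size=(n_feat, n_vox))
clip_features = rng.normal(size=(n_clips, n_t, n_feat))
hrf = toy_hrf()
predictions = np.stack([predict_bold(f, weights, hrf) for f in clip_features])
observed = predictions[2] + 0.05 * rng.normal(size=(n_t, n_vox))
posterior = clip_posterior(observed, predictions, noise_sd=0.05)
print(posterior.round(3), posterior.argmax())
```

The point to notice is what the output is: a probability for each whole clip in the candidate set, and nothing that looks like an image.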

It turns out that those predictions are impressively good. That is a cool feature of this paper: the BOLD signal is usually thought to be too sluggish to get any kind of fine temporal information out of it, especially with this sort of stimulus design. So I’m not downplaying what they did. It’s neat. I wouldn’t have thought it would work, but there you go. But where do the movies come from? So far as I can tell, each voxel is making a prediction about whole movie chunks. Well:

Step 5: Then take the movie clips—the original clips, not the brain data—and average them together, weighted by the posterior probabilities you got in step 4.

That’s why the movies on the right are all eerie and ghostly: each bit you’re seeing is an average of a whole bunch of YouTube clips. That’s why there are lots of odd artifacts: sometimes the model misfires, and so clips that have nothing at all to do with the original movie get included. That’s why it seems to work pretty well with talking heads: those are plentiful in their data set, so the model’s mistakes tend to be other talking heads. That’s why it doesn’t work so well with elephants and inkblots—the misfires tend to pick out unrelated chunks. That’s why the text bits are unintelligible but text-y. They’re using movie trailers, and the bits of text in each are (I’m guessing) pretty hard to tell apart using this kind of model, so it’s just kind of glomming all of them together.
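Step 5 is simple enough to sketch in a few lines. Again, this is my reading and not the authors' implementation: the function name, the array shapes, and the renormalization over the retained clips are assumptions, and the only number taken from above is the cutoff of 100 clips.

```python
import numpy as np

def average_clips(clips, posterior, top_k=100):
    """Blend the top_k most probable clips, weighted by posterior probability.

    clips:     (n_clips, n_frames, height, width) grayscale pixel values
    posterior: (n_clips,) probabilities from the decoding step
    returns:   (n_frames, height, width) blended movie
    """
    clips = np.asarray(clips, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    k = min(top_k, len(posterior))
    top = np.argsort(posterior)[::-1][:k]        # indices of the k best clips
    w = posterior[top] / posterior[top].sum()    # renormalize over those clips
    return np.tensordot(w, clips[top], axes=1)   # weighted sum over clips

# Three flat "clips" with pixel values 0, 1, and 4.
clips = np.stack([np.full((2, 4, 4), v) for v in (0.0, 1.0, 4.0)])
posterior = np.array([0.5, 0.3, 0.2])
blend = average_clips(clips, posterior)
top_one = average_clips(clips, posterior, top_k=1)
print(blend[0, 0, 0], top_one[0, 0, 0])
```

In this toy case the blend comes out at 1.1 everywhere, a value that none of the three source clips contains, while `top_k=1` just hands back the single most probable clip. Every pixel in the result comes from the clip library; the brain data only sets the weights.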

In short, this is not reading movies off of your brain. This is using a complicated model to predict what movies someone saw, and then averaging together the original clips corresponding to the best predictions. The movies on the right are a snazzy, misleading way of representing the posterior probability distributions over the set of possible clips. Since many of the wrong guesses still match up well with the original stimulus, it partly validates the model’s ability to pick out clips with similar visual features at a relatively fine time scale. (More cynically, if they’d used only the top prediction of the model, which was usually pretty accurate, it would be obvious that they were just showing you YouTube clips.)

There’s a lot of cool stuff in the paper aside from the methods, especially about selectivity in V1. It’s an impressive proof of concept. But we’re still pretty far away from reading minds.

15 Comments

Dan Ryder

September 30, 2011 at 4:12 pm

Bingo! Nice post, Colin.
Eric Thomson

September 30, 2011 at 6:16 pm

All good points. Good to see someone take up the gauntlet and start a serious discussion! Now that it has begun, I have a few comments.

1. Picking a basis set
I tried to get at my (and your) main concern with the study when I wrote that the “influence of the limited training data can clearly be seen.” You can easily see the undue influence of the closest movies (highest posteriors) in the training data, and it is clear how the reconstruction sinks toward certain modes that are obviously way off the mark.

Ultimately researchers will get very good at this stimulus reconstruction once the set of training movies used for the attempted reconstructions is improved. It is very hard to find a good basis set for movies, because the set of possible stimuli is effectively infinite. They ended up with a fairly arbitrary grab-bag of movies (e.g., talking head, bouncing ball, etc.). I think movies that are less naturalistic (e.g., moving edges) would perform way better and generalize better to arbitrary stimuli, because they match the filters used by the lower-level sensory areas from which they are recording. I predict a study will come out in the next few years doing exactly this, as it is just too obvious!

2. Reconstructing movies in the head?
Looking at the forest, we have a study that does what pretty much every sensory neuroscientist does in their papers: attempt to reconstruct, based on neuronal data, the stimulus actually presented. We really don’t mean to say that the brain reconstructs the stimulus, but that we observers of the brain, using often very technical mathematical tools, can predict the stimulus (with such-and-such error) given the brain signals.

There is not much to worry about in talking about ‘reconstructing the movie in the head,’ as when we focus on the content (not the vehicles) of the signals in the visual system, they basically are movies (or, really, scenes). And that’s what they are attempting to reconstruct: the content of the brain’s responses to the world. It would be like saying they are reconstructing the ‘songs in the head’ if they were to perform the analysis on auditory cortex. I’m OK with that.

Whether this is the best way to recover content from vehicle is of course debatable.

3. Consciousness
Despite my joking about consciousness, the study combined information indiscriminately from the occipital pole (multiple early visual areas including V1), and likely the same reconstructions would work in subjects that were not conscious (e.g., someone under anesthesia). It would be very cool for them to apply these algorithms to bistable percepts (e.g., structure from motion) and see how well they track percepts versus stimuli.

4. The other half of the study
Focusing too much on the movie-reconstruction work (Figure 4 in the paper) would miss one of the coolest results in the study: regardless of the movie context in which it occurs, localized motion can be predicted very well with individual voxels from the MRI (Figures 2/3 in the paper). This result reinforces my hunch that using more localized, moving-edge-type stimuli would yield even more impressive performance in the movie reconstruction part of the paper.

At any rate, given the extremely limited spatial and temporal resolution of MRI, this result is really impressive. However, ultimately MRI will still be limited by its low spatial and temporal resolution. Sure, it might be better than we would intuitively predict, but blood flows aren’t action potentials, and the latter are the fine-grained computational workhorses of the brain.
Colin Klein

September 30, 2011 at 6:48 pm

Eric: All good points. I agree with you that the really interesting application is probably going to involve much simpler moving stimuli and trying to home in on the areas that are specific to local motion. I hadn’t thought of using bistable stimuli, but that would also be a cool way to go.
I should say again that I have no *deep* objection to the study (just to the hype!). It seems to me like the sort of thing people have already been doing with multivoxel pattern analysis, though in a new and cool way. I do think that the spatial/temporal resolution of fMRI is ultimately going to put a limit on this—but on the other hand, I would have thought that this couldn’t be done either, so who knows!
Eric Thomson

September 30, 2011 at 7:10 pm

I think you are right to want to rein in the hype a bit. My comment about the Cartesian theater didn’t help matters.

Ultimately what they did is not qualitatively surprising. Your curmudgeonly Fodorian response seems effectively right. Paul Bloom likes to say that the only reason people act surprised by these types of results is because they are intuitively dualists, that naturalism about mind hasn’t sunk in quite yet.
Richard Brown

October 1, 2011 at 4:39 pm

I don’t really get the Bloom remark…dualists can be just as happy with these results as naturalists…all one needs is some kind of connection between non-physical mental episodes and brain episodes (a causal connection would do)…even worse, property dualists have absolutely no issues accounting for this (especially those like Dave who endorse the idea that consciousness is a ‘functional invariant’)…I like to say that these kinds of remarks (those like Bloom’s) are due to not really understanding (whether willfully or not) the opponent’s views…
Arnold Trehub

October 1, 2011 at 7:32 pm

Richard: “… all one needs is some kind of connection between non-physical mental episodes and brain episodes (a causal connection would do)”

On what principled grounds could a causal connection exist between non-physical mental episodes and biophysical brain episodes?
djc

October 1, 2011 at 7:37 pm

in other work bloom has been explicit that by dualism he just means something like, cartesian substance dualism with a separate soul that does the thinking and perceiving. he says that more sophisticated modern forms of dualism, such as property dualism, aren’t subject to his critique.
Arnold Trehub

October 2, 2011 at 2:34 pm

Do you take dual-aspect monism to be a form of property dualism?
Richard Brown

October 3, 2011 at 2:08 pm

There are lots of principled grounds, but the most obvious is some kind of Hume-inspired view about causation…if causation is just constant conjunction then there is no problem with there being a causal connection between material and non-material entities…but even more generally the causal issue really only arises if one is thinking in terms of 17th/18th century notions of causation as some kind of ‘bumping’…
Richard Brown

October 3, 2011 at 2:20 pm

Ah, thanks for this clarification dave!

But even so, does that justify the claim above? The images on my screen and the processing in the cpu are distinct but yet we still expect to be able to ‘read out’ the contents of the cpu and could ‘reconstruct’ the images on my screen from that data…and we also expect that damage to the cpu will result in a change in screen imagery…so how does the two things being distinct result in the Bloom remark? It seems to me that one would have to be thinking of substance dualism as the view that there is never any interaction or causal dependence between the two substances…then it would be a surprise to find out that changing one affects the other…

By the way, there is some reason to believe that Descartes held some kind of view like that the body was necessary for conscious experiences of colors and tastes (the disembodied mind of Descartes was merely capable of abstract rational thought)…this is generally of a piece with his claims that the mind/body connection is ‘tight’ and not like ‘a sailor to his vessel’…I have always thought that these remarks of Descartes are in tension with his conceivability argument for substance dualism (or at least the part where we conceive of ourselves as *having the very same experience I am right now* (say while looking at a red patch, having a headache, and eating a chocolate brownie) and yet not having a body)…but that is a different story
Arnold Trehub

October 3, 2011 at 3:57 pm

Richard: “.. if causation is just constant conjunction then there is no problem with there being a causal connection between material and non-material entities.. “

Do you believe that correlation (constant conjunction?) is causation? This is not the way causation is understood in science.

Richard: “but even more generally the causal issue really only arises if one is thinking in terms of 17th/18th century notions of causation as some kind of ‘bumping’ …”

No need for a causal connection to make things bump. Since you posited a *causal connection* between non-physical mental episodes and biophysical brain episodes, what kind of causal connection between the non-physical and the physical do you have in mind?
Richard Brown

October 5, 2011 at 3:23 pm

Ok, I’ll bite; how is causation understood in the sciences? Whatever your answer is that is the kind of causal connection that *could* hold between the non-physical and the physical (unless it is some kind of bumping account, which is outdated on anyone’s account).

btw, have you read the relevant Hume? That scientists haven’t taken to heart the philosophical lessons of the Treatise is not really the point…you asked what principled grounds there might be for a position and I gave you those principled grounds. Many people think a Humean account of causation is plausible…but that is not the only possible principled reason…ultimately the causal issue for dualism is a red herring fostered by out-dated notions about causation…
Arnold Trehub

October 6, 2011 at 1:58 pm

You claim that a non-physical mind can cause a change in the biophysical brain. So, in your view, substance dualism poses no problem for a scientific understanding of consciousness, and to question this claim is not justified (a red herring). Is this a widely held opinion among contemporary philosophers of mind?
Eric Thomson

October 7, 2011 at 4:56 pm

I wouldn’t read too much into Bloom-type claims, which he tends to apply to “folk,” not people that think about this stuff for a living. I just threw it out there as a contrast to the Fodorian curmudgeon.