3. Learning to Perceive in a Multisensory Way

Suppose you are at a live jazz show. The drummer begins a drum solo. You see the cymbal jolt. You hear a clang. And you are aware that the jolt and the clang are part of the same event. This is a case of multisensory perception. In my book, I argue that multisensory perception can be understood only against a background of perceptual learning. Here I’m going to sketch some of the motivation for that view, which turns out to have important implications for a long-standing philosophical question: Molyneux’s question.

It is tempting to think of the multisensory cymbal case as one in which our perceptual system just automatically binds together the jolt and the clang because the jolt and the clang occur at the same place and time. And in fact many psychologists have thought just that. But I think this account neglects the learning dimension of these kinds of cases. To see this, consider the following amusing case.

The internet is filled with videos of animals making funny noises, such as dogs that sound like fire engines or goats that scream like humans. These cases are surprising and humorous in part because we just don’t expect that audio-visual combination. If you think about it, you have probably experienced some variation of these cases in real life, where an odd and unexpected sound is not experienced as coming from its actual source because that sound violates our prior associations.

Now consider a slightly more complicated example. Suppose you are listening to music at a friend’s house with their dog nearby. A song comes on that you haven’t heard before. You happen to glance over at the dog, who appears to be moving its mouth in sync with the vocal track. Then you realize that what you thought were the vocals are actually coming from the dog. This is a case of “illusory lip-synching.” You experience the dog as lip-synching to the song, but it isn’t. In reality, the dog itself is producing the sound.

It may seem like our perceptual systems just automatically bind together properties from different sense modalities that occur at the same place and time. But such a view does not account for cases like the singing dog case. If there were this automatic spatio-temporal binding, then you would have experienced the sound as coming from the dog, but you didn’t. The dog case makes sense, however, if we hold that our learned associations guide multisensory perception. Through your prior experience, your perceptual system associates the sound that the dog makes more readily with the song. That explains why your perceptual system couples those two properties together in your multisensory experience.

In the book, I argue that this has important implications for a long-standing philosophical question: Molyneux’s question. Molyneux’s question asks whether a person born blind, who can distinguish a cube from a sphere by touch, could distinguish those shapes by sight upon having their sight restored. If multisensory perception is learned, then we have a straightforward “no” answer to Molyneux’s question. The person sees the cube for the first time, so no learning has yet taken place between sight and touch. Thus the person cannot recognize which is the cube and which is the sphere.

I want to return now to the first blog post in this series. In that post, I mentioned that my book defends a thesis about the function of perceptual learning that I call the “Offloading View.” On this view, perceptual learning serves to offload tasks that would normally be done in a controlled and cognitive manner onto the perceptual system, thereby freeing up cognitive resources for other tasks. Multisensory perception is a paradigm case of this. Instead of having to see the jolt of the cymbal, hear the clang, and infer that they are both part of the same event, your perceptual system does this efficiently for you based on your prior learning. Seeing the jolt, hearing the clang, and then drawing the inference would simply take longer. Perceptual learning offloads that task onto our quick perceptual system. We get the same information, that the jolt and the clang are part of the same event, without having to make inferences to get there. This frees up cognition for other, more sophisticated inferences, such as inferences about the time signature of the piece.

My book explores perceptual learning in multisensory perception as well as in four other perceptual domains: natural kind recognition, sensory substitution, speech perception, and color perception. However, perceptual learning has relevance for our thinking far beyond philosophy of mind. In my fourth and final post, I present an initial sketch of how we can apply knowledge of perceptual learning beyond philosophy of mind.