Commentary on Bence Nanay, Mental Imagery

Fabrizio Calzavarini, University of Turin

Bence Nanay’s Mental Imagery (OUP) is an impressive work that provides a comprehensive neurofunctional exploration of the idea that mental imagery is “perceptual representation that is not directly triggered by sensory input” (p. 4). Bence’s ability to combine sophisticated philosophical analyses with careful discussion of a huge amount of empirical data is absolutely remarkable, and, without a doubt, this book is one of the most important contributions to the study of this topic in recent years and beyond. I was lucky to have the opportunity to discuss this book with Bence at the final conference of his ERC project in Antwerp (short paper here). In this short commentary I would like to reiterate the point I made on that occasion.

Bence’s definition of multimodal mental imagery, where perceptual representations in one modality are triggered by stimulation in another, requires a way to distinguish sensory modalities in the human brain, as Bence himself admits (p. 100). Without such a distinction, it becomes difficult to determine whether an instance of mental imagery is truly multimodal or simply an activation of a more modality-independent representation. In my opinion, this might not be an easy task to accomplish.

The traditional model of perception assumes that sensory modalities are neuroanatomically distinct: vision in the occipital cortex, audition in the temporal cortex, touch in the somatosensory cortex, etc. In some of my previous work (Calzavarini 2024a, 2024b, 2025), including a target paper for commentaries (here), I have claimed that this model might be obsolete. Recent research in the neuroscience of perception has shown that most of the putative early modality-specific regions of the brain (e.g., “visual” regions such as LOC, hMT+/V5, FFA, PPA, as well as significant parts of the “auditory”, the “motor”, and the “limbic” cortex) appear to be supramodal in nature – that is, they can process specific information (e.g., shape, motion, frequency) in multiple sensory modalities and in both normal and sensory-deprived individuals. Even primary sensory areas appear to show supramodal responses; V1, for instance, is activated in spatial processing via touch and audition in blind individuals. According to a strong or global interpretation of these data, the entire human brain has a metamodal (Pascual-Leone & Hamilton 2001), supramodal (Ricciardi et al. 2014) or task-specific and modality-invariant organization (Heimler et al. 2015).

Although caution is needed for various reasons, this undoubtedly complicates the attempt to sort the senses at the neural level. Bence seems to be aware of this difficulty. In chapter 14, he explicitly rejects a neuroanatomical criterion for distinguishing sensory modalities, arguing that brain plasticity allows sensory areas to be repurposed after sensory deprivation—for example, occipital areas in blind individuals processing touch and audition instead of vision (pp. 100-101).

While this marks a significant step forward, I think it does not fully address the supramodality challenge, which is not merely a matter of neural plasticity but rather of functional preservation. The key issue is that many so-called modality-specific areas (such as LOC, hMT+, or even V1 and A1) do not undergo a fundamental modality-to-modality transformation (e.g., from visual to tactile processing) after sensory deprivation. Instead, their original computational role remains intact: LOC continues to process shape, hMT+ continues to process motion, V1 and A1 continue to process space and frequency, respectively, and so on, regardless of whether input comes from vision, touch, or audition. This suggests that these regions are not strictly tied to a single sensory modality but are modality-invariant processors from the outset, even in normal individuals (see Makin and Krakauer 2023 for a powerful exploration of the limits of cortical plasticity).

Bence makes a very interesting proposal to define sensory modalities functionally rather than neuroanatomically (pp. 100-101; see also Nanay’s response to Daniel Munro). Here “function” means computation or algorithm in David Marr’s sense (not implementation): in this perspective, “visual processing could be identified as something like helping small-scale spatial discrimination or transforming input in a way that preserves the spatial homomorphism between the input and the perceptual processing” (p. 101). Similarly, one can imagine, auditory processing could be defined as a function that specializes in fine-grained temporal discrimination, extracting and organizing sequential patterns, pitch variations, and rhythmic structures from auditory input.

In principle, as observed by Bence (p. 100, note 1), this functional characterization is consistent with the metamodal or supramodal view. The challenge for Bence, I contend, is that many of the computations that are traditionally associated with a given sense appear to be actually modality-invariant rather than uniquely tied to a single sensory system. Motion perception, for instance, is processed in hMT+ whether derived from visual, auditory, or tactile stimuli; spatial representations emerge in V1 even when blind individuals rely on touch or echolocation (Ricciardi & Pietrini 2020); temporal discrimination and rhythm perception are similarly encoded in A1 in both normal and deaf individuals, regardless of whether the input comes from sound, vibration, or visual cues (Heimler & Amedi 2020). If sensory processing is fundamentally computation-based rather than modality-based, what justifies keeping traditional sensory distinctions such as “visual processing” or “auditory processing”?

Note that this, in turn, challenges the very notion of modality-specific imagery. If mental imagery is defined in terms of perceptual processing, but perception itself is structured around computational properties rather than distinct modalities, then the traditional classification of imagery as “visual,” “auditory,” or “tactile” may need to be reconsidered in favour of a computation-specific framework that better reflects the supramodal nature of the brain (e.g., shape imagery, motion imagery, rhythm imagery).

Needless to say, I do not claim that the supramodality paradigm is definitively confirmed. In my own work, I have acknowledged certain limitations of this view, and some scholars have argued that this paradigm is not easily applied to the primary cortices (Xiong, Chen, & Bi 2023; Reilly & Peelle 2023) or is inconsistent with data about anatomical connectivity (Cappa 2024). Some other scholars have proposed that representational format might be key to distinguishing sensory modalities, although current methodologies struggle to fully determine whether supramodal areas retain an underlying modality-specific structure (Kiefer et al. 2023; Martin 2024).

Having said that, it seems to me that, if we adopt a functional characterization of sensory processing, as Bence is doing, it is difficult to avoid the conclusion that the classical distinctions between sensory modalities may be very difficult to ground at the level of the brain, and that a more precise taxonomy of mental imagery should be based on the specific computational properties being processed rather than on modality-based labels. I look forward to hearing what Bence thinks about this issue.

References

Calzavarini, F. (2024a). Rethinking modality-specificity in the cognitive neuroscience of concrete word meaning: A position paper. Language, Cognition and Neuroscience, 39(7), 815–837.

Calzavarini, F. (2024b). Rethinking modality-specificity in the cognitive neuroscience of concrete word meaning: Responses to commentators. Language, Cognition and Neuroscience, 39(7), 878–890.

Calzavarini, F. (2025). The conceptual format debate and the challenge from (global) supramodality. The British Journal for the Philosophy of Science.

Heimler, B., & Amedi, A. (2020). Are critical periods reversible in the adult brain? Insights on cortical specializations based on sensory deprivation studies. Neuroscience and Biobehavioral Reviews, 116, 494–507.

Heimler, B., Amedi, A., & Striem-Amit, E. (2015). Origins of task-specific sensory-independent organization in the visual and auditory brain: Neuroscience evidence, open questions, and clinical implications. Current Opinion in Neurobiology, 35, 169–177.

Kiefer, M., Kuhnke, P., & Hartwigsen, G. (2023). Distinguishing modality-specificity at the representational and input level: A commentary on Calzavarini (2024). Language, Cognition and Neuroscience, 39(7), 862–866.

Makin, T. R., & Krakauer, J. W. (2023). Against cortical reorganisation. eLife, 12, e84716.

Pascual-Leone, A., & Hamilton, R. (2001). The metamodal organization of the brain. Progress in Brain Research, 134, 427–445.

Reilly, J., & Peelle, J. E. (2023). Modality-specificity is not a necessary condition for grounded semantic cognition: Commentary on Calzavarini (2023). Language, Cognition and Neuroscience, 39(7), 854–858.

Ricciardi, E., & Pietrini, P. (2020). Does (lack of) sight matter for V1? New light from the study of the blind brain. Neuroscience and Biobehavioral Reviews, 118, 1–2.

Ricciardi, E., Bonino, D., Pellegrini, S., & Pietrini, P. (2014). Mind the blind brain to understand the sighted one! Is there a supramodal cortical functional architecture? Neuroscience and Biobehavioral Reviews, 41, 64–77.

Xiong, Z., Chen, H., & Bi, Y. (2023). Not clear what properties are found or should be: A commentary on Calzavarini (2024). Language, Cognition and Neuroscience, 39(7), 859–861.
