I want to thank John Schwenkler for inviting me to blog about my new book The Perception and Cognition of Visual Space (Palgrave, 2018). In this first post I outline the two major concerns of 3D vision: (1) inconstancy, and (2) inconsistency, and suggest that inconsistency can be avoided by reformulating the perception/cognition boundary.
1. Theories of Visual Space
3D vision is going through a period of transition. In the space of 20 years it has gone from being a central concern of predictive models of the brain (Knill & Richards, 1996) to not being mentioned at leading conferences (https://sites.google.com/view/tpbw2018/). If you think of 3D vision in Representational terms (as the extraction and representation of the 3D properties of the physical environment), it would appear that most of the interesting questions have been resolved and the discipline is in decline.
And yet 3D vision remains plagued by two fundamental concerns:
- Inconstancy: The 3D properties of an object or scene can vary depending on (a) the distance it is viewed from, and (b) whether it is viewed with one eye or two.
- Inconsistency: The 3D properties of an object or scene can vary depending on the question being asked or the task being performed.
Representational accounts struggle to fully address these concerns, and so in recent years a number of alternative accounts of 3D vision have emerged:
Apart from Direct Realism and Representationalism, the common theme is a shift away from the veridical representation of the 3D properties of the physical environment (see Albertazzi et al., 2010; Wagemans, 2015). In the case of the accounts marked with a *, 3D vision is refined for action in, rather than knowledge of, the environment (reflecting a more general ‘pragmatic turn’ in cognitive science: see Engel et al., 2013; Engel et al., 2016).
2. The Perception/Cognition Divide
In this initial post I want to use the problem of inconsistency to introduce the distinction between ‘perception’ and ‘cognition’ that runs through my book. This distinction is not one that vision science has traditionally embraced. There are three reasons for this:
- Historical: Under behaviourism, perception was equated with ‘discrimination’ (see Miller, 2003). Although the cognitive revolution of the 1950s emphasised the importance of perception, it nonetheless remained tied to cognition under the catch-all term ‘perceptual judgement’. Also a commitment to viewing ‘the organism’ holistically (Goldstein, 1934) has continued to have influence.
- Practical: It is extraordinarily difficult to isolate our perceptual experiences from our judgements about them. Experimenters can either rely on (a) the subject’s own evaluation of the stimulus or (b) a behavioural measure (such as reaching or grasping). But both subjective evaluations and behaviour can be affected by cognitive as well as perceptual influences.
- Theoretical: Finally, vision scientists have self-consciously avoided the question. For instance, Marr (1982) was only concerned with the true judgements we could extract about the environment, and consequently relied heavily upon computer vision where there is simply no distinction between perception and cognition (computer ‘vision’ simply is the fact that a computer ‘judges’ x).
It is true that in recent years vision science has begun to take the distinction between perception and cognition more seriously, at least in two contexts:
- Whether we can distinguish the decision strategies adopted by subjects from their actual perception: see Witt, Taylor, Sugovic, & Wixted (2015).
- Whether there is any meaningful distinction between perception and our beliefs, desires, and emotions: see Firestone & Scholl (2016).
But in the first chapter of my book (which is freely available: https://linton.vision/download/ch1uncorrected.pdf) I argue that such treatments don’t go far enough: First, I argue that there is a distinct level of cognition that is post-perceptual but pre-decisional, and so cannot be captured by the false dichotomy between ‘perception’ and ‘decision strategies’. Second, I argue that the distinction between perception and this species of cognition is surprisingly low-level (much lower than the ‘top-down’ influences Firestone & Scholl have in mind) and this leads us to question the fundamental building-blocks of vision itself, such as pictorial cues and cue integration.
Underpinning my account is the belief that we cannot distinguish ‘perception’ and ‘cognition’ in the abstract. Asking whether we ‘see’ higher-level properties such as causation, intention, or morality is meaningless until we have a proper litmus-test by which to understand just how low-level cognition can be. This can only really be achieved in the context of 3D vision, where we can tease out the lowest-level conflicts between what we think we see and what we actually see.
Readers of this blog will be familiar with the idea that our depth judgements are inconsistent (https://philosophyofbrains.com/2017/12/12/new-action-based-theory-spatial-perception.aspx). I want to consider three examples of this inconsistency:
- Todd & Norman (2003): Subjects were asked to judge the depth of a rotating surface defined by (a) monocular motion (motion viewed with one eye), (b) binocular disparity (the difference in the images between the two eyes) or (c) the combination of binocular disparity and motion (disparity and motion viewed with two eyes). They ranked the depth in the monocular motion display as greater than the depth in the combined display. But when subjects were asked to close one eye whilst viewing the combined display (turning it back it into the monocular motion display) they reported a reduction in depth. Todd & Norman rightly recognised that this demonstrated a conflict between our perceptual experience (combined display > monocular motion) and our perceptual judgement (monocular motion > combined display). They explain it as a consequence of conscious decision strategies; a reflection of the fact that subjects had to convert the perceived depth into metric measurements. But this concern applies equally to both of the displays, so there is no reason why it should have inverted the depth ordering between them.
- González, Allison, Ono, & Vinnikov (2010): Subjects were asked to judge the impression of looming from (a) a dot whose looming motion was specified by binocular disparity, and (b) a background texture whose looming motion was specified by a rapid increase in size. When the two cues were combined in the same display subjects (a) judged the looming from binocular disparity to be equal in magnitude to the looming from the size cue (‘they appeared to move a similar amount in depth’), and yet (b) their visual experience was that the motion of the dot could be discriminated distinctly, and in front of, the motion of the texture.
- Doorschot, Kappers, & Koenderink (2001): Subjects were asked to judge how varying (a) shading and (b) binocular disparity affects the perceived depth in a stereo-photograph. Doorschot et al. found that 94.5% of the depth was perceived no matter how shading and disparity were varied, and that binocular disparity could only account for 1.2% of the change in depth. But as Doorschot et al. recognise, the difference in depth from closing one eye when viewing a stereo-photograph seems much more than 1.2%. Instead, it seems to transform the image.
How should we explain the conflicting reports in each of these studies? One common suggestion (see Norman et al., 2005; Wagner & Gambino, 2016; Glennerster & Stazicker, 2017) is that we ought to give equal weight to each of these conflicting reports, and so we have to give up on the idea that 3D vision is independent of our task or motivation.
By contrast, I argue that in each of these experiments what we find is a conflict between (a) our perceptual judgements (our impression of depth magnitude) and (b) our actual visual experience (as revealed by a simple comparison: e.g. closing one eye in Todd & Norman, 2003 and Doorschot et al., 2001, or attending to the difference in depth between the dot and the texture in González et al., 2010). The only way to account for this conflict between visual experience and perceptual judgement is to posit the existence of a low-level layer of post-perceptual (i.e. cognitive) processing that is automatic (we don’t have to do anything), unconscious (we are unaware of it), and involuntary (we cannot overrule it).
The purpose of my book is to explore experimental strategies that might tease apart our genuine visual experience from this post-perceptual processing, and I argue that the unconscious inferences attributed to 3D vision by a long tradition stretching from al-Haytham (c.1028-38) to Cue Integration (Trommershäuser, Körding, & Landy, 2011) are better thought of in terms of automatic post-perceptual processing.
Albertazzi, L., van Tonder, G. J., & Vishwanath, D. (2010). Perception Beyond Inference (Cambridge, MA: MIT Press).
al-Haytham. (c.1028–1038). Book of Optics. In Smith, A. M. (ed.) (2001). Transactions of the American Philosophical Society, 91(5).
Domini, F., & Caudek, C. (2013). ‘Perception and action without veridical metric reconstruction: an affine approach.’ In Dickinson, S. J., & Pizlo, Z. (eds.), Shape Perception in Human and Computer Vision (London: Springer).
Doorschot, P. C. A., Kappers, A. M. L., & Koenderink, J. J. (2001). ‘The combined influence of binocular disparity and shading on pictorial shape.’ Perception & Psychophysics, 63(6), 1038-1047.
Engel, A. K., Maye, A., Kurthen, M., & König, P. (2013). ‘Where’s the action? The pragmatic turn in cognitive science.’ Trends in Cognitive Sciences, 17(5), 202-9.
Engel, A. K., Friston, K. J., & Kragic, D. (2016). The Pragmatic Turn (Cambridge, MA: MIT Press).
Erkelens, C. J. (2017). ‘Perspective Space as a Model for Distance and Size Perception.’ i-Perception, Nov-Dec 2017.
Firestone, C., & Scholl, B. J. (2016). ‘Cognition does not affect perception: Evaluating the evidence for “top-down” effects.’ Behavioral & Brain Sciences, 39, e229.
Foley, J. M., Ribeiro-Filho, N. P., & Da Silva, J. A. (2004). ‘Visual perception of extent and the geometry of visual space.’ Vision Research, 44, 147-156.
Glennerster, A., & Stazicker, J. (2017). ‘Perception and action without 3D coordinate frames.’ PhilSci Archive: https://philsci-archive.pitt.edu/13494/
Gillam, B., & Chambers, D. (1985). ‘Size and position are incongruous: measurements on the Müller-Lyer figure.’ Perception & Psychophysics, 37(6), 549-556.
Goldstein, K. (1934). The Organism: A Holistic Approach to Biology Derived from Pathological Data (New York: Zone Books).
González, E. G., Allison, R. S., Ono, H., & Vinnikov, M. (2010). ‘Cue conflict between disparity change and looming in the perception of motion in depth.’ Vision Research, 50(2), 136-43.
Hatfield, G. (2011). ‘Philosophy of Perception and the Phenomenology of Visual Space.’ Philosophic Exchange, 42(1), 3.
Hibbard, P. B. (2008). ‘Can appearance be so deceptive? Representationalism and binocular vision.’ Spatial Vision, 21(6), 549-559.
Hoffman, D. D., Singh, M., & Prakash, C. (2015). The interface theory of perception. Psychonomic Bulletin & Review, 22, 1480-1506.
Knill, D., & Richards, W. (1996). Perception as Bayesian Inference (Cambridge: Cambridge University Press).
Koenderink, J. J. (2013). ‘World, environment, Umwelt, and innerworld: A biological perspective on visual awareness.’ Human Vision and Electronic Imaging XVIII, 8651.
Linton, P. (2017). The Perception and Cognition of Visual Space (London: Palgrave).
Loomis, J. M., Philbeck, J. W., & Zahorik, P. (2002). ‘Dissociation between location and shape in visual space.’ Journal of Experimental Psychology: Human Perception and Performance, Vol 28(5), 1202-1212.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Cambridge, MA: MIT Press).
Miller, G. A. (2003). ‘The cognitive revolution in historical perspective.’ Trends in Cognitive Sciences, 7(3), 141-144.
Morgan, M. J. (1980). ‘Phenomenal Space.’ In Josephson, B., & Ramachandran, V. (eds.), Consciousness and the Physical World: https://philpapers.org/archive/JOSCAT-2.pdf
Morgan, M. J. (1989). ‘Vision of Solid Objects.’ Nature, 339, 101-103.
Norman, J. F., Crabtree, C. E., Clayton, A. M., & Norman, H. F. (2005). ‘The perception of distances and spatial relationships in natural outdoor environments.’ Perception, 34, 1315-1324.
Rogers, B. (2018). Perception: A Very Short Introduction (Oxford: Oxford University Press).
Smeets, J. B. J., Sousa, R., & Brenner, E. (2009). ‘Illusions can warp visual space.’ Perception, 38, 1467-1480.
Todd, J. T., & Norman, J. F. (2003). ‘The visual perception of 3-D shape from multiple cues: Are observers capable of perceiving metric structure?’ Perception & Psychophysics, 65(1), 31-47.
Trommershäuser, J., Körding, K., & Landy, M. S. (eds.) (2011). Sensory Cue Integration (Oxford: Oxford University Press).
Vishwanath, D. (2014). ‘Towards a New Theory of Stereopsis.’ Psychological Review, 121(2), 151-78.
Wagemans, J. (2015). ‘Alternatives to Veridicalism in Vision Theory’: https://www.gestaltrevision.be/pdfs/jw_lectures/ECVP_2015_veridicality_Wagemans.pdf
Wagner, M., & Gambino, A. J. (2016). ‘Variations in the Anisotropy and Affine Structure of Visual Space: A Geometry of Visibles with a Third Dimension.’ Topoi, 35(2), 583-598.
Witt, J. K., Taylor, J. E., Sugovic, M., & Wixted, J. T. (2015). ‘Signal Detection Measures Cannot Distinguish Perceptual Biases from Response Biases.’ Perception, 44(3), 289-300.