My account of 3D vision attempts to preserve many of the traditional commitments of naïve realism, whilst rejecting its central tenet of mind-independence. In this fourth post I explain why this provides a more satisfactory solution to variations in scene geometry with viewing conditions than recent ‘four-dimensional’ accounts.
1. Naïve Idealism
My account bears more than a passing resemblance to naïve realism:
1. I define vision in terms of the perception of objects, citing Gibson (1950) and Strawson (1979).
2. I insist the senses are silent and make no claims or representations, citing Austin (1962) and Travis (2004; 2013).
3. I argue that the cues to 3D vision are purely optical, and therefore articulate my account as a ‘purely optical account of stereopsis’.
4. I reject the importance of cognitive inferences for 3D vision.
5. I deny the existence of depth illusions, explaining them in purely cognitive terms.
6. I even advance a purely cognitive account of pictorial space.
It is therefore no surprise that a number of commentators have mistaken my account for Gibsonian ‘direct realism’. Indeed, given I reject the idea that vision conveys meaning, I might be accused of being more Gibsonian (in his commitment to direct realism) than Gibson! Although this characterisation of my account is a mistake, I do think naïve realism makes one key observation, namely that in vision we do not have (a) the experience of an intermediary non-object (sensationalism), or even (b) the experience ‘as of’ an object (representationalism), but simply (c) the experience of an object:
‘The claim is that one’s experience is, so to speak, diaphanous or transparent to the objects of perception, at least as revealed to introspection.’ (Martin, 2002).
I’d also agree with the further claim that such objects are ‘external’, if by ‘external’ we mean ‘over there’ rather than (pointing to my head) ‘in here’. But the claim of naïve realism is more specific, namely that ‘introspection of one’s perceptual experience reveals only … mind independent objects’ (Martin, 2002), and this is where I depart from naïve realism: I don’t think the mind-independence of these objects is conveyed by introspection (vision remains as silent on mind-dependence as anything else), and so mind-dependence remains an open question so far as our visual experience is concerned.
Instead, as unfashionable as it sounds, I think the question of mind-dependence is determined by Pearson’s (1892) telephone exchange analogy: there are no means by which we could have the requisite immediate relation to physical objects, and so the objects of perception must be phenomenal objects.
Indeed, as soon as we move from ‘physical object x’ to a system’s (such as the visual system’s) ‘information about physical object x’ (or even just ‘response to x’) the game is lost: the objects of perception could no more be physical objects than the person who exits Parfit’s (1984) teleporter could be the person who enters. This view is supposed to fall foul of three related fallacies about (1) objects (the sense-datum fallacy), (2) subjects (the homunculus fallacy), and (3) location (the Cartesian theatre fallacy). But so long as we are clear that vision makes no claims about the physical world, specifically that we in no sense ‘see’ physical objects via phenomenal objects, then these concerns can be avoided.
This isn’t Berkeleyan idealism or even transcendental idealism because I do not doubt (a) the existence of physical processes, (b) the causal dependence of visual experience upon those physical processes, or even (c) our ability to specify those physical processes. Indeed, it is the job of the psychophysicist to explore how changes in our visual experience are correlated with changes in the physical stimulus.
But just as the naïve realist is not ignorant of the mental processes that causally precede vision, and only claims that these mental processes fall out of our account because they are not conveyed to us by our visual experience, so the naïve idealist is not ignorant of the physical processes that causally precede vision, and only claims that these physical processes fall out of our account because they are not conveyed to us by our visual experience. In this sense it is a perceptual, rather than metaphysical or epistemic, version of idealism.
2. Depth Inconstancy
What are the implications for vision science? One of the most surprising things about Cue Integration (the leading theory of 3D vision) is that it cannot explain the depth inconstancies I introduced in my first post: the fact that objects and scenes (a) appear to get flatter with distance, and (b) appear to get flatter when we close one eye. For instance, Sacks (2006) describes viewing St Paul’s Cathedral in the distance as ‘a flat semicircle on the horizon’ until binocular disparity is reintroduced using a hyper-stereoscope (which increases the distance between the eyes using mirrors), at which point he sees it ‘in its full rotundity, projecting towards me.’
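The effect of the hyper-stereoscope can be made concrete with the standard small-angle approximation for binocular disparity. This is textbook geometry rather than anything specific to this account: for an interocular separation I, a viewing distance D, and a depth interval much smaller than D, the relative disparity is roughly I × depth / D². The sketch below assumes that approximation; the function name and the example values (a 6.5 cm interocular separation, a 30 m depth interval viewed from 1 km) are illustrative assumptions, not measurements of St Paul’s.

```python
import math

def relative_disparity_arcmin(interocular_m, distance_m, depth_interval_m):
    """Approximate relative disparity (in arcmin) between two points separated in depth.

    Small-angle approximation: disparity ~ I * depth / D**2 (in radians),
    valid only when depth_interval_m is much smaller than distance_m.
    """
    disparity_rad = interocular_m * depth_interval_m / distance_m ** 2
    return math.degrees(disparity_rad) * 60.0

# Hypothetical values: a 30 m depth interval viewed from 1 km.
natural = relative_disparity_arcmin(0.065, 1000.0, 30.0)  # unaided eyes
widened = relative_disparity_arcmin(0.65, 1000.0, 30.0)   # mirrors widen the baseline tenfold
print(natural, widened)
```

Under this approximation disparity falls with the square of the viewing distance and grows in proportion to the separation between the two viewpoints, which is why widening the effective baseline with mirrors reintroduces disparity for a distant object.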
Depth information from binocular stereopsis is impoverished at far distances, but the question is what ‘impoverished’ means. If it means ‘inaccurate’ (as in biased or distorted) then Cue Integration theorists have to admit that the visual system represents the physical geometry of the world as varying (a) with distance, and (b) with whether we view the world with one eye or two. This would be a massively embarrassing concession for a representational theory, and so they often interpret ‘impoverished’ to mean ‘imprecise’ (as in vague or undefined) at far distances. But the problem with this solution is that it doesn’t explain why this should lead to a demonstrably false percept of flatness rather than vagueness. It’s like the visual system being unsure whether the correct depth of St Paul’s dome is 50m or 100m, so setting it at 0m instead.
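The difference between ‘imprecise’ and ‘inaccurate’ can be illustrated with the textbook inverse-variance weighting rule usually associated with Cue Integration. This is a minimal sketch assuming independent cues with Gaussian noise; the function name, the cue labels, and all the numbers are hypothetical, chosen only to echo the 50m or 100m example above.

```python
def combine_cues(estimates, variances):
    """Inverse-variance weighted average of independent cue estimates.

    Returns the combined estimate and its variance (assumes independent,
    unbiased cues with Gaussian noise).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total

# Hypothetical depth estimates (metres): disparity says 75, perspective says 70.
for disparity_variance in (1.0, 100.0, 1e6):
    depth, variance = combine_cues([75.0, 70.0], [disparity_variance, 1.0])
    print(disparity_variance, depth, variance)

# Disparity as the only cue: the estimate stays at 75, only its variance grows.
print(combine_cues([75.0], [1e6]))
```

On this rule, making the disparity cue less reliable moves the combined estimate towards the other cue, or leaves a single-cue estimate where it was with a larger variance. Nothing in the weighting itself pulls the estimate towards zero depth.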
This problem is recognised, and indeed challenged, by two accounts that I listed in my first post as ‘4D’ (or ‘four-dimensionalism’): Hibbard (2008) and Vishwanath (2010; 2014). From the outset it has to be emphasised that these are two very different accounts: Hibbard is attempting to ‘square the circle’ for Cue Integration, whilst Vishwanath uses his discussion to argue for a whole new (and non-veridical) conception of visual space. But what they have in common is an insistence that the 3D properties of the visual scene are invariant, no matter whether they are viewed at far distances or with one eye. Instead, what changes is a 4th dimension of visual space. For both authors this 4th dimension conveys the ‘precision’ information that is so important to Cue Integration, but in very different ways:
1. For Hibbard this 4th dimension isn’t a spatial property at all, but merely the visual system’s way of informing us of how reliable our current visual representations are.
2. For Vishwanath this 4th dimension is a spatial quality, but a special kind of spatial quality that relates to action (‘tangibility’); specifically, the precision of the egocentric distance information required for interaction.
But such attempts to quarantine 3D shape from variations in stereo-depth seem difficult to sustain. The classic study is Johnston (1991), who asked subjects to set the depth of a cylinder defined solely by binocular disparity to match its width: whilst the shape of the cylinder was veridical at a viewing distance of 107cm, she found it was elongated towards the viewer at closer distances, and flattened away from the viewer at further distances. Attempts have been made by Scarfe & Hibbard (2011) to quarantine this effect by introducing perspective cues, but this doesn’t alter the fundamental link between stereo-depth and shape. Similarly, studies of closing one eye affecting the shape of an ellipse go back to Thouless (1931a; 1931b) and continue to this day (Elner & Wright, 2015).
Although I am committed to naïve idealism for the reasons outlined earlier, it is also worth noting how the concern that Hibbard and Vishwanath are trying to address disappears: we simply embrace the fact that the 3D shape of visual objects varies with viewing conditions.
A third position advanced by Erkelens (2013) suggests that binocular disparity only affects 3D shape in some contexts but not others, and concludes: ‘Why binocular disparity is included in some cases and not in others is still a mystery.’ But it is only a mystery if we fail to take the distinction between perception and cognition seriously. Under my account binocular disparity will always contribute to the perception of 3D depth, but our automatic appreciation (or cognition) of the 3D depth in the scene often appeals to shorthand cognitive cues such as linear perspective and the familiarity of objects. This is no surprise: if the 3D shape of visual objects varies with viewing conditions, then decoupling our cognition of 3D shape from our perception of 3D shape enables us to understand the visual scene as invariant, even if our perception of it is not.
References
Austin, J. L. (1962). Sense and Sensibilia. (Oxford: Oxford University Press).
Elner, K. W., & Wright, H. (2015). ‘Phenomenal regression to the real object in physical and virtual worlds.’ Virtual Reality, 19(1), 21-31.
Erkelens, C. J. (2013). ‘Virtual slant explains perceived slant, distortion and motion in pictorial scenes.’ Perception, 42, 253-270.
Gibson, J. J. (1950). The perception of the visual world. (Boston: Houghton Mifflin).
Hibbard, P. (2008). ‘Can appearance be so deceptive? Representationalism and binocular vision.’ Spatial Vision, 21(6), 549-559.
Johnston, E. B. (1991). ‘Systematic distortions of shape from stereopsis.’ Vision Research, 31(7-8), 1351-1360.
Martin, M. G. F. (2002). ‘The Transparency of Experience.’ Mind & Language, 17(4), 376-425.
Parfit, D. (1984). Reasons and Persons. (Oxford: Oxford University Press). https://commonweb.unifr.ch/artsdean/pub/gestens/f/as/files/4610/17613_101712.pdf
Pearson, K. (1892). The Grammar of Science. (London: Walter Scott).
Sacks, O. (2006). ‘Stereo Sue.’ In O. Sacks, The Mind’s Eye (New York: Picador). https://www.newyorker.com/magazine/2006/06/19/stereo-sue
Scarfe, P., & Hibbard, P. B. (2011). ‘Statistically optimal integration of biased sensory estimates.’ Journal of Vision, 11(7), 1-17.
Strawson, P. F. (1979). ‘Perception and its objects.’ In G. F. Macdonald (ed.), Perception and identity: Essays presented to A. J. Ayer, with his replies (London: Macmillan).
Thouless, R. H. (1931a). ‘Regression to the real object I.’ British Journal of Psychology, 21(4), 339-359.
Thouless, R. H. (1931b). ‘Regression to the real object II.’ British Journal of Psychology, 22(1), 1-30.
Travis, C. S. (2004). ‘The silence of the senses.’ Mind, 113(449), 57-94.
Travis, C. S. (2013). ‘The silence of the senses.’ In Perception: Essays after Frege. (Oxford: Oxford University Press).
Vishwanath, D. (2010). ‘Visual information in surface and depth perception: Reconciling pictures and reality.’ In L. Albertazzi, G. van Tonder, & D. Vishwanath (Eds.), Perception beyond inference: The informational content of visual processes. (Cambridge, MA: MIT Press).
Vishwanath, D. (2014). ‘Towards a new theory of stereopsis.’ Psychological Review, 121(2), 151-178.