In my first post I argued that inconsistencies in visual space reflect a conflict between visual experience and perceptual judgement. In this second post I argue that the same approach can be applied to (a) the integration of depth cues, and (b) illusions of visual space, to show that they too operate at the level of cognition rather than perception.
1. Cue Integration
The organising principles of vision science over the last 20 years can be articulated as:
1. The 3D environment is specified by ‘cues’ such as (a) binocular disparity (the difference between the images in the two eyes), (b) perspective, and (c) shading.
2. The signal from each individual cue is accurate but imprecise; that is, it provides us with an unbiased signal that is corrupted by random noise.
3. The real problem of vision science is therefore the reduction of this random noise.
4. The best way to reduce this random noise is to take a weighted-average of the various signals according to how noisy they are.
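To make step 4 concrete, here is a minimal sketch of a noise-weighted average. It assumes the formulation commonly used in this literature: each cue's noise is independent and Gaussian, and each cue is weighted by its reliability (the inverse of its variance). The function name `integrate_cues` and the numbers in the example are hypothetical and chosen purely for illustration; they are not taken from any of the studies discussed here.

```python
import math


def integrate_cues(estimates, sigmas):
    """Combine cue estimates by a reliability-weighted average.

    Assumption (not from the post): each cue is unbiased with independent
    Gaussian noise of standard deviation sigma, so its weight is
    proportional to 1 / sigma**2.
    """
    if not estimates or len(estimates) != len(sigmas):
        raise ValueError("need one positive sigma per estimate")
    if any(s <= 0 for s in sigmas):
        raise ValueError("sigmas must be positive")

    # Reliability of each cue is the inverse of its noise variance.
    reliabilities = [1.0 / (s ** 2) for s in sigmas]
    total = sum(reliabilities)

    # Normalise so that the weights sum to one.
    weights = [r / total for r in reliabilities]

    # The combined estimate is the weighted average of the single-cue estimates.
    combined = sum(w * e for w, e in zip(weights, estimates))

    # Under these assumptions the combined noise is lower than any single cue's.
    combined_sigma = math.sqrt(1.0 / total)
    return combined, combined_sigma, weights


# Hypothetical example: disparity signals a depth of 10 units (sigma 1),
# perspective signals a depth of 14 units (sigma 2).
combined, combined_sigma, weights = integrate_cues([10.0, 14.0], [1.0, 2.0])
print(weights, combined, combined_sigma)
```

In this hypothetical case the less noisy cue (disparity) receives most of the weight, so the combined estimate sits closer to 10 than to 14, and the combined noise is smaller than that of either cue alone. That reduction in noise is the whole point of step 3.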
It is no mistake that the early advocates of this conception of vision science, known as Cue Integration, were primarily concerned with 3D vision: conceived in this way, the inference of the third dimension from two 2D retinal images is best resolved by aggregating over numerous sources of depth information.
Whether the sources of depth information really are unbiased (see Domini & Caudek, 2011; Scarfe & Hibbard, 2011) or really are integrated in an optimal fashion (see Rahnev & Denison, forthcoming) are increasingly important questions in vision science. But what is never questioned is the idea that the various sources of depth information are integrated at the level of vision rather than cognition, and this is what the second chapter of my book (which is freely available) seeks to challenge:
If the various sources of depth information really are integrated at the level of perception then, as Ernst et al. (2000) and Hillis et al. (2002) observe, we should expect perceptual fusion: that is, the various sources of information should be integrated into a single coherent percept. After all, what would be the point of our visual system presenting us with the individual noisy signals rather than a single weighted average?
Indeed, Ernst et al. (2000) and Hillis et al. (2002) find evidence of perceptual fusion. For instance, when the following stereo-images from Ernst et al. (2000) are cross-fused:
Subjects are liable to judge the slant of the surface to be somewhere between 0° and 30°, rather than either the 0° or 30° specified by disparity and texture. Similarly, Likova & Tyler (2003) found that a small amount of convex binocular disparity could be nulled by the presence of a concave shading profile.
I agree with Ernst et al., Hillis et al., and Likova & Tyler that these results demonstrate that the presence of pictorial cues can bias our evaluation of depth from disparity. Note, for instance, the leftward shift of VJB and LTL in the following diagram from Likova & Tyler:
But I would argue that it is a further (and open) question whether this bias is perceptual or cognitive. And, as currently constituted, these stimuli cannot determine that question.
But perhaps they can with some minor amendment. Cue Integration is meant to capture how we see the world, but in the real world we very rarely see single surfaces in isolation. There should therefore be no objections to introducing small reference frames (what I call ‘scaffolding’) into these stimuli. For instance, in the following amended version of Ernst et al.’s stimulus, a single horizontal bar is added to demarcate the fronto-parallel plane:
What does this achieve? The hope is that, just as with Todd & Norman (2003), González et al. (2010), and Doorschot et al. (2001) in my first post, this will transform the task from a cognitive evaluation of depth magnitude into a simple comparison (‘is the surface slanted relative to the reference bar?’), thereby revealing our perception whilst eradicating any cognitive bias. The impression is stark: the surface and horizontal bar are clearly rotated in depth relative to one another when the rotation is specified by disparity, but not when the rotation is specified by pictorial cues. From a Cue Integration perspective there is no reason why this should be the case: both the surface and the horizontal bar are defined by their own combination of pictorial and disparity cues. If the perceptual fusion that is so central to Cue Integration evaporates as soon as it comes into contact with another object, we seriously have to question whether it was really there to begin with.
If this conclusion is correct then it also has interesting implications for neuroscience. Qiu & von der Heydt (2005) have demonstrated that depth cue conflicts are reconciled relatively early in visual processing (V2). If this reconciliation can be shown to be cognitive rather than perceptual then it really does narrow visual experience down to V1.
2. 3D Visual Illusions
Some of the most convincing 3D visual illusions are cue conflict stimuli:
1. Hollow-Face Illusion: https://www.michaelbach.de/ot/fcs-hollowFace/index.html
2. Patrick Hughes’ Reverspectives: https://www.michaelbach.de/ot/sze-reverspective/index.html
The question is: what happens when we introduce similar vertical and horizontal bars into the hollow-face illusion?
Interestingly the illusion persists, but occurs behind the bars rather than protruding through them as we would expect. Indeed, the illusory percept and stereoscopic space seem to talk past one another. I argue that it is very difficult to make sense of this phenomenon unless we think in terms of stereoscopic space operating at the level of perception and the illusion operating at the level of cognition. In which case it would be more accurate to refer to the hollow-face illusion and Reverspectives as ‘delusions’ rather than ‘illusions’.
This phenomenon needs to be experimentally confirmed (this is a current project), but if it is correct it has an interesting implication for schizophrenia. Those under the influence of psychosis are often immune to the hollow-face illusion (see the work of researchers at Rutgers: https://www.youtube.com/watch?v=BlKlpx50Avs). The Rutgers team explain this immunity in perceptual terms: those with schizophrenia see the world more clearly through psychosis (see Keane et al., 2016). But if my explanation is correct, then their immunity to the hollow-face illusion is not reflective of a perceptual process, but an inability to automatically attribute post-perceptual meaning to what they see.
3. 2D Visual Illusions
Can we extend the argument and apply ‘scaffolding’ to 2D illusions? I didn’t develop this argument in my book (see p.63). But consider the following example of size-constancy: https://imgur.com/WBAzkuI. Most people are astonished that the cars have the same x- and y-axis size in the image, and often reach for a ruler to confirm it. That this effect should occur at all is rather puzzling. It would be one thing to say the car in the background looks bigger because (a) it is the same size in the image, but (b) it is depicted as further away. But this is not what we experience. Instead, what we seem to experience are the images of the cars actually being different sizes. (Nor is this effect a consequence of the reduced z-axis depth of pictures: see https://www.psy.ritsumei.ac.jp/~akitaoka/stereo4e.html).
But is this size-constancy effect perceptual or merely cognitive? Do we literally see the images of the cars as being different sizes? Or merely misjudge them as being so? We can frame the question as the distinction between illusions and delusions:
Illusion: We literally see the images of the cars as being different sizes.
Delusion: We merely misjudge the images of the cars as being different sizes.
Again ‘scaffolding’ can help us resolve this question. For instance, if I add distinctive red squares to the cars (https://imgur.com/EXkF0c3), the squares look the same size but the cars still look different sizes. And I would argue that the persistence of the illusion in spite of the fact that we see the squares as being equal size is indicative of a cognitive effect:
1. The argument starts with the visual field. The visual field is a two-dimensional plot of your visual experience in the x-axis and y-axis (Smythies, 1996). Perhaps one of the most famous depictions of the visual field is Ernst Mach’s ‘view from the left eye’: https://farm2.staticflickr.com/1619/24993192252_0c837a126a_z.jpg
2. If you agree that the squares are the same size, then you agree that each of the squares takes up the very same amount of the visual field.
3. Since the squares simply demarcate the x-axis and y-axis extent of the cars, then each of the cars must also take up the very same amount of the visual field, even if this fact is inaccessible when you try and judge their x-axis and y-axis extent directly.
4. And since the visual field is simply your visual experience of x-axis and y-axis extent, it follows that the cars are the same size in your visual experience.
5. You might accept this, but try and argue: ‘I see the cars as the same size, but I see the further car as bigger.’ This is not a coherent claim.
6. What is a coherent claim is the following: ‘The cars take up the same amount of the visual field, but the further car seems to take up more of the visual field than the nearer car.’ I think this is the correct interpretation, and the ‘seems’ can be explained as an inability to accurately evaluate our own visual experience.
7. This bias kicks in before we’ve consciously engaged in an evaluative judgement, and so it doesn’t appear to be a decisional bias. We therefore need to move beyond the distinction, typical in much of the literature, between ‘perceptual bias’ and ‘decisional bias’ (see Witt, Taylor, Sugovic, & Wixted, 2015).
8. Instead, size-constancy appears to be a post-perceptual ‘type-1’ error (automatic, unconscious inference: see Kahneman, 2011; Melnikoff & Bargh, 2018). But unlike Kahneman I see size-constancy as a cognitive rather than perceptual bias that occurs after perception but before any particular task-based decision.
9. Finally, I would appeal to ‘cognitive phenomenology’ to explain the experiential quality of size constancy (the fact that we experience the further car as bigger). Cognitive phenomenology explains how our experience of visual stimuli can change even if our visual experience of the stimuli remains the same.
Even critics of cognitive phenomenology, such as Block (2014) (with regard to ‘seeing-as’), recognise its intuitive appeal:
‘But can we be sure from introspection that those ‘looks’ are really perceptual, as opposed to primarily the ‘cognitive phenomenology’ of a conceptual overlay on perception, that is, partly or wholly a matter of a conscious episode of perceptual judgment rather than pure perception?’
Applying the same reasoning to other 2D visual illusions, we might ask whether the tails of the Müller-Lyer illusion bias our perception or our cognition.
Is the visual system fooled by the Müller-Lyer illusion, or merely our type-1 cognition? Again, we can distinguish between:
Illusion: We literally see the lines shrinking and expanding.
Delusion: We merely misjudge the lines to be shrinking and expanding.
When I add concentric circles to the circumference of the lines in the dynamic Müller-Lyer illusion, the circles remain fixed. Just as with the size-constancy illusion, this suggests that the extent of the lines in the visual field, and therefore the extent of the lines in our visual experience, remains fixed even though the ‘illusion’ persists:
The only way to make sense of this is to suggest that the Müller-Lyer illusion deceives our cognition (the tails of the lines biasing our type-1 understanding of what we see), rather than hijacking our visual system to distort the visual field itself.
In conclusion, if we admit both (a) type-1 cognitive biases, and (b) cognitive phenomenology, each of which seems relatively unobjectionable, then the real challenge for the perception/cognition divide will be differentiating the combination of these effects from (c) visual illusions (true perceptual biases). Whether this analysis can be applied to visual illusions more generally, including colour and contrast illusions, remains an open question.
Block, N. (2014). ‘Seeing-As in Light of Vision Science.’ Philosophy and Phenomenological Research, 89(3), 560-572.
Domini, F., & Caudek, C. (2011). ‘Combining Image Signals before Three-Dimensional Reconstruction: The Intrinsic Constraint Model of Cue Integration.’ In Trommershäuser, J., Körding, K., & Landy, M. S. (eds.) (2011). Sensory Cue Integration (Oxford: Oxford University Press).
Doorschot, P. C. A., Kappers, A. M. L., & Koenderink, J. J. (2001). ‘The combined influence of binocular disparity and shading on pictorial shape.’ Perception & Psychophysics, 63(6), 1038-1047.
Ernst, M. O., Banks, M. S., & Bülthoff, H. H. (2000). ‘Touch can change visual slant perception.’ Nature Neuroscience, 3(1), 69-73.
González, E. G., Allison, R. S., Ono, H., & Vinnikov, M. (2010). ‘Cue conflict between disparity change and looming in the perception of motion in depth.’ Vision Research, 50(2), 136-143.
Hillis, J. M., Ernst, M. O., Banks, M. S., & Landy, M. S. (2002). ‘Combining sensory information: mandatory fusion within, but not between, senses.’ Science, 298(5598), 1627-1630.
Kahneman, D. (2011). Thinking, Fast and Slow (New York: Farrar, Straus and Giroux).
Keane, B. P., Silverstein, S. M., Wang, Y., Roché, M. W., & Papathomas, T. V. (2016). ‘Seeing more clearly through psychosis: Depth inversion illusions are normal in bipolar disorder but reduced in schizophrenia.’ Schizophrenia Research, 176(2-3), 485-492.
Likova, L. T., & Tyler, C. W. (2003). ‘Peak localization of sparsely sampled luminance patterns is based on interpolated 3D surface representation.’ Vision Research, 43(25), 2649-2657.
Melnikoff, D. E., & Bargh, J. A. (2018). ‘The Mythical Number Two.’ Trends in Cognitive Sciences, 22(4), 280-293.
Qiu, F. T., & von der Heydt, R. (2005). ‘Figure and ground in the visual cortex: V2 combines stereoscopic cues with Gestalt rules.’ Neuron, 47(1), 155-166.
Rahnev, D., & Denison, R. N. (forthcoming). ‘Suboptimality in Perceptual Decision Making.’ Forthcoming in Behavioral and Brain Sciences.
Scarfe, P., & Hibbard, P. (2011). ‘Statistically optimal integration of biased sensory estimates.’ Journal of Vision, 11(7), 12.
Smythies, J. (1996). ‘A Note on the Concept of the Visual Field in Neurology, Psychology, and Visual Neuroscience.’ Perception, 25(3), 369-371.
Todd, J. T., & Norman, J. F. (2003). ‘The visual perception of 3-D shape from multiple cues: Are observers capable of perceiving metric structure?’ Perception & Psychophysics, 65(1), 31-47.
Witt, J. K., Taylor, J. E., Sugovic, M., & Wixted, J. T. (2015). ‘Signal Detection Measures Cannot Distinguish Perceptual Biases from Response Biases.’ Perception, 44(3), 289-300.
Interesting argument. I wonder how you would deal with the image I have uploaded at https://i.imgur.com/tCWpPIT.png ? (I can’t include it here, unfortunately.) This is an effect that I discovered accidentally several years ago — a pattern of parallel sine waves that produce a Necker-cube-like effect, except that instead of the image flipping as a whole, the lines flip individually, leading to a whole percept that is globally inconsistent. (It is also interesting that if you rotate this image by 90 degrees, it becomes globally consistent and does not flip.)
Thank you Bill, that’s a fantastic image!
I think I can get the effect you have in mind by focusing on the outlines: if I focus on two of the peaks on the left as peaks in depth, then the trough between them should be a trough in depth, but if I follow it along, sometimes as I approach the right edge and see the right peak associated with it, it begins to feel like a peak in depth. I get a similar feeling with impossible figures, e.g. https://www.fink.com/papers/trident.gif – solid cognition of the shape at either edge, and then a sort of fading away in the middle, rather than simply an inability to make sense of it because of the inconsistency. You’re quite right, those occasional glimpses where it all seems to go wrong seem to rely on depth in the image being computed locally in spite of the fact that the eventual outcome is inconsistency. I know Koenderink, van Doorn, & Wagemans (2015): https://journals.sagepub.com/doi/abs/10.1177/2041669515615713 have explored the extent to which pictorial depth is computed locally, even though the eventual outcome is inconsistent, so it would be interesting to explore to what extent this also holds true of illusory depth.
There is a second question as to whether the illusory depth in your image is best thought of as perception or cognition. On this point I would make the same argument that I will make in tomorrow’s post, which is that we don’t have to rely on introspection. Instead, we can test it: if this is genuinely visual depth, then it should be the very same visual depth that stereo disparity occupies. So we could introduce points with various degrees of disparity into the image when it is viewed binocularly and test it. But my expectation is that this will reveal the image as flat so far as visual space is concerned: the peaks won’t reach up to touch points with disparity above the surface, nor the troughs reach down to touch points with disparity below the surface. In which case, in what sense is this depth visual, if all of the image is seen to be on the same plane in depth?
Thank you ever so much for your question, and your image! I really enjoyed it.