Seeing Depth with One Eye and Pictorial Space

In my second post I questioned whether the integration of pictorial cues and binocular disparity occurs at the level of perception. In this third post, I push the argument further by questioning whether pictorial cues contribute to 3D vision at all.

1. ‘Monocular Stereopsis’ (Seeing Depth with One Eye)

It is often assumed that, according to vision science, ‘monocular vision’ (vision with one eye) must be flat, since stereoscopic depth (the vivid impression of depth with two eyes) is the product of binocular disparity (the difference in perspective between the two eyes). This may have been the consensus in the 1950s-60s (see Ogle, 1959), but it was thoroughly rejected in the mid-1990s by two papers: (1) Koenderink, van Doorn, & Kappers (1994) and (2) Landy, Maloney, Johnston, & Young (1995). Landy et al. (1995) is one of the defining papers of Cue Integration discussed yesterday. By contrast, Koenderink et al. (1994) is responsible for reviving a long-forgotten tradition of ‘monocular stereoscopy’ from the early 20th century (see O’Shea, 2017), which suggests that once the surface cues of pictures (such as frames and reflections) have been removed, the remaining pictorial cues – such as perspective, shading, and occlusion – provide a vivid impression of depth that is qualitatively (and perhaps also quantitatively) equivalent to binocular stereopsis (two-eyed depth perception).

There are a few voices of dissent, most notably Westheimer (1994), who insists that ‘real stereo sensation is absent with monocular viewing’, and Parker (2016), who appears to suggest that ‘a direct sense of depth’ only emerges with binocular vision.

In my book I investigate a third alternative, namely that monocular vision produces visual depth when we look at the real world, but not when we look at a 2D image (even when surface cues have been removed). I argue that (a) pictorial cues only function at the level of cognition, and so (b) in order to explain the experience of depth with one eye in the real world we have to rely on something missing in pictures: the optical defocus blur caused by objects in the real world being at different distances.

a. Monocular Depth from Pictorial Cues

There is significant empirical evidence that subjects experience true stereoscopic depth when viewing 2D images with one eye through an aperture: see Vishwanath & Hibbard (2013) and Volcic, Vishwanath, & Domini (2014): https://www.sciencenews.org/article/3-d-effects-may-require-one-eye-only

But my concern with this literature is twofold:

1. Cue-conflict: In yesterday’s post I suggested that although cue-conflict studies appear to provide evidence of binocular disparity and pictorial cues feeding into a common perceptual process, this evidence evaporates with minor tweaks of the stimulus.

2. Synoptic viewing: The effect of ‘monocular stereopsis’ is reportedly enhanced by synoptic viewing (using mirrors to direct an identical view of the same image to both eyes): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5405891/figure/fig1-2041669517699220/

Although there are some great papers on synoptic viewing (Koenderink et al., 1994; Wijntjes et al., 2016; Wijntjes, 2017), I don’t believe this hypothesis has been fully evaluated. If depth from pictorial cues really does produce the same visual depth as binocular disparity, then we should be able to introduce points with binocular disparity into the image and see how they relate to depth in the picture (which is defined purely by pictorial cues). Do the peaks in the picture really reach out to touch points with disparity above the surface, and do the troughs in the picture really reach down to touch points with disparity below it?

My prediction is that they do not. Partial confirmation of this comes from another test of pictorial depth that Koenderink et al. (2015) employ, namely Tissot gauges (circles that can be rotated in perspective about a point in the image until they look flush with the surface being depicted):

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTHjXNkDovj2IB_Sy_VH6PvrUtISxn0crp8OA2bmBg6TxRAf10DGg

Tissot gauges function well when the circles are defined by perspective alone, but once disparity cues are added most of the reported depth disappears (see Bernhard et al., 2016).

b. Monocular Depth from Optical Defocus Blur

But when we look at the world with one eye, the world doesn’t appear flat. Is this also merely a cognitive effect? I don’t believe so. Instead, we can appeal to the fact that, unlike objects in a picture, objects in the real world are at different distances, and so are in and out of focus to different degrees. I believe the visual system can take advantage of this optical defocus to specify depth with one eye, so long as we recognise:

1. Optical not pictorial cue: What the visual system is relying upon is the actual optical stimulation of the retina, not our interpretation of defocus blur as a pictorial cue. It is for this reason that only genuine optical defocus blur, not simulated or pictorial defocus blur, gives rise to convincing depth: see Zannoli et al. (2016).

2. Sub-threshold: The amount of defocus blur is often too small to subjectively notice. Indeed, it is only apparent 4% of the time, which leads Sprague et al. (2016) to argue that it is a cue of very limited application. By contrast, I argue that the visual system is able to take advantage of sub-threshold blur (blur that we do not, and could not, subjectively notice), opening up defocus blur as a cue with general application.

2. Pictorial Space

Since I regard pictorial cues as merely cognitive, it follows that my account of pictorial space must be merely cognitive as well. I would describe viewing pictures as an ‘understanding experience’ (or ‘cognitive phenomenology’) akin to reading words in a familiar language. For instance, when Mike May, who was blinded in early childhood, regained his sight, he couldn’t recognise drawings of a cube for 6+ months, describing them as ‘squares with lines’ (Fine et al., 2003; see also Gregory & Wallace, 1963). Pictures remained an unfamiliar language to him even though, I would argue, his visual experience was the same as ours, in much the same way that words on a page were unfamiliar to him, even though his visual experience of them was the same as ours.

This is a wider application of ‘cognitive phenomenology’ than is encountered in the literature, which tends to presuppose that pictures fall on the ‘perceptual’ rather than the ‘cognitive’ side of the divide (see Bayne & Montague, 2011). Other ‘perceptual’ phenomena that I would regard as instances of ‘cognitive phenomenology’ include (a) the recognition of icons and symbols (such as ✞), which lie somewhere between pictures and words; (b) the different interpretations of human gestures such as nodding, waving, etc. by different cultures (note: both (a) and (b) avoid the imputation of an internal voice, which has plagued cognitive phenomenology accounts of reading); (c) ‘seeing’ moves in mathematics or on a chessboard; and even (d) Gibsonian affordances such as ‘seeing’ that my car will fit through a gap or into a parking space.

Articulating pictorial space in merely cognitive terms is as unfashionable in vision science as it is in philosophy. But there is a slight, but significant, difference in the way the two disciplines approach this issue. Philosophical discussions tend to distinguish between (a) seeing a 2D picture surface and (b) seeing a 3D depicted scene (or, at least, a 3D scene that the image conveys: Briscoe, 2016; Nanay, 2018). By contrast, vision scientists tend to go directly to the 3D scene being conveyed and notice that the 3D scene itself is subject to cue-conflict: perspective and/or shading telling me that it has depth, but binocular disparity cues *in the 3D scene itself* telling me that it is flat. I think this is an important distinction: when we notice the flatness of a 2D film compared to a 3D film, we are attending to the flatness of the objects *in the 3D scene itself*, and not the flatness of the 2D picture surface. Understood in this way, picture perception is not special, but merely an instance of the cue-conflict we discussed yesterday (see Koenderink, 1998; cf. Vishwanath, 2010; 2014).

References

Bayne, T., & Montague, M. (2011). ‘Cognitive Phenomenology: An Introduction.’ In T. Bayne & M. Montague (eds.), Cognitive phenomenology. (Oxford: Oxford University Press).

Bernhard, M., Waldner, M., Plank, P., Soltészová, V., & Viola, I. (2016). ‘The accuracy of gauge-figure tasks in monoscopic and stereo displays.’ IEEE Computer Graphics and Applications, 36(4), 56-66.

Briscoe, R. (2016). ‘Depiction, Pictorial Experience, and Vision Science.’ Philosophical Topics, 44(2), 41-87.

Fine, I., Wade, A. R., Brewer, A. A., May, M. G., Goodman, D. F., Boynton, G.M., et al. (2003). ‘Long-term deprivation affects visual perception and cortex.’ Nature Neuroscience, 6(9), 915-916.

Gregory, R. L., & Wallace, J. G. (1963). ‘Recovery from early blindness: A case study.’ Experimental Psychology Society Monograph, no. 2.

Koenderink, J. J. (1998). ‘Pictorial relief.’ Philosophical Transactions of the Royal Society A, 356, 1071-1086.

Koenderink, J. J., van Doorn, A. J., & Kappers, A. M. L. (1994). ‘On so-called paradoxical monocular stereoscopy.’ Perception, 23, 583-594.

Koenderink, J., van Doorn, A., & Wagemans, J. (2015). ‘Part and Whole in Pictorial Relief.’ i-Perception, 6(6), 1-21.

Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). ‘Measurement and modeling of depth cue combination: In defense of weak fusion.’ Vision Research, 35, 389-412.

Nanay, B. (2018). ‘Threefoldness.’ Philosophical Studies, 175(1), 163-182.

Ogle, K. (1959). ‘The theory of stereoscopic vision.’ In S. Koch (ed.). Psychology: A study of a science, vol. I, sensory, perceptual and physiological formulations. (New York: McGraw Hill).

O’Shea, R. P. (2017). ‘Claparède (1904) on Monocular Stereopsis: History, Theory, and Translation.’ i-Perception, 8(5), 1-10.

Parker, A. J. (2016). ‘Vision in our three-dimensional world.’ Philosophical Transactions of the Royal Society B, 371(1697), 20150251.

Sprague, W. W., Cooper, E. A., Reissier, S., Yellapragada, B., & Banks, M. S. (2016). ‘The natural statistics of blur.’ Journal of Vision, 16(10), 23, 1-27.

Vishwanath, D. (2010). ‘Visual information in surface and depth perception: Reconciling pictures and reality.’ In L. Albertazzi, G. van Tonder, & D. Vishwanath (Eds.), Perception beyond inference: The informational content of visual processes. (Cambridge, MA: MIT Press).

Vishwanath, D. (2014). ‘Towards a new theory of stereopsis.’ Psychological Review, 121(2), 151-178.

Vishwanath, D., & Hibbard, P. B. (2013). ‘Seeing in 3D with just one eye: Stereopsis without binocular vision.’ Psychological Science, 24(9), 1673-1685.

Volcic, R., Vishwanath, D., & Domini, F. (2014). ‘Reaching into pictorial spaces.’ Proceedings of SPIE, 9014: Human Vision and Electronic Imaging XIX.

Westheimer, G. (1994). ‘The Ferrier Lecture, 1992. Seeing depth with two eyes: stereopsis.’ Proceedings of the Royal Society B, 257(1349), 205-214.

Wijntjes, M. W. A., Füzy, A., Verheij, M. E. S., Deetman, T., & Pont, S. C. (2016). ‘The synoptic art experience.’ Art & Perception, 4(1-2), 73-105.

Wijntjes, M. W. A. (2017). ‘Ways of Viewing Pictorial Plasticity.’ i-Perception, 8(2), 1-10.

Zannoli, M., Love, G. D., Narain, R., & Banks, M. S. (2016). ‘Blur and the Perception of Depth at Occlusions.’ Journal of Vision, 16(6), 17.