The Necker cube

Hi everyone,

You probably know these lines from Vision: 

“… To be sure, part of the explanation of [the Necker cube’s] perceptual reversal must have to do with a bistable neural network (that is, one with two distinct stable states) somewhere inside the brain… ” (Marr, 1982, pp. 25-26)
How do current neurocognitive theories explain the Necker cube? Anyone?



  1. Bistable perception of ambiguous stimuli is directly related to “binocular rivalry”, a topic within cognitive neuroscience that deals with which parts of the brain represent which parts of objects, when, and how. Many paradigms are used to untangle this problem, ranging from the statistics of directional data to neural modelling and physiology… MIT Press recently published a volume entitled “Binocular Rivalry”.

    One of the main conclusions is that there is a competition between two visual representations. Our perceptual mechanisms cannot decide which to favour, so the bistable percept never settles on one or the other and instead alternates over time.

    From a philosophical perspective, Fiona Macpherson is studying this phenomenon to work out what makes visual experience contentful.
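    The competition between two representations described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions: two units that inhibit each other and slowly fatigue, with every parameter value (`inhibition`, `adapt_gain`, `tau_r`, `tau_a`, `noise`) chosen for illustration. It is not the model from any of the works mentioned in this thread.

```python
import random

def simulate_rivalry(steps=20000, dt=0.01, inhibition=2.0, adapt_gain=3.0,
                     tau_r=0.02, tau_a=2.0, noise=0.05, seed=0):
    """Return, for each time step, which unit (0 or 1) is currently dominant.

    All parameters are hypothetical values picked for this illustration.
    """
    rng = random.Random(seed)
    r = [0.6, 0.4]  # activity of the two competing representations
    a = [0.0, 0.0]  # slow adaptation (fatigue) of each representation
    dominant = []
    for _ in range(steps):
        new_r = []
        for i in (0, 1):
            j = 1 - i
            # Constant input, minus inhibition from the rival, minus own fatigue.
            drive = 1.0 - inhibition * r[j] - adapt_gain * a[i]
            drive += noise * rng.gauss(0.0, 1.0)
            target = max(0.0, drive)  # activity cannot be negative
            new_r.append(r[i] + dt * (target - r[i]) / tau_r)
        for i in (0, 1):
            # Adaptation slowly tracks each unit's own activity.
            a[i] += dt * (r[i] - a[i]) / tau_a
        r = new_r
        dominant.append(0 if r[0] > r[1] else 1)
    return dominant

def count_switches(dominant):
    """Count how often the dominant unit changes."""
    return sum(1 for prev, cur in zip(dominant, dominant[1:]) if prev != cur)

trace = simulate_rivalry()
print("switches:", count_switches(trace))
print("fraction of time unit 0 dominates:", trace.count(0) / len(trace))
```

    In this sketch neither unit stays on top for good: the winner tires, the loser recovers, and dominance keeps alternating. That is one simple way to read "cannot decide which to favour".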

  2. anna-mari rusanen

    What am I doing wrong? The blog does not recognize the link sent by Chris. I've tried and tried to change it, but nothing happened. Why?

  3. anna-mari


    I think that I know the connectionist model you are referring to. Now, the next question, of course, is: How does it model the Necker Cube phenomenon…

    How are the “cubes” A and B distinguished in that model? By interpretation or by the network itself… I'd say by interpretation.

  4. Anna-Mari


    No more links, please :)… (I tried, for at least 45 minutes, to fix that link problem yesterday. I tried everything that I could imagine, but I could not fix it.) Hence: this blog is just an *evil* creature. It does not understand that I have *FEELINGS*.

    OK, and now back to yesterday's nice post. What do you think (honestly, please) about this one: “… two visual representations and therefore our perceptual mechanisms cannot decide which to favour…”? How should this be explained? What is your opinion? Can it be explained “only neurally”, or does it (necessarily, perhaps) require something else?

    Perhaps this question is (more or less) absurd, let's see.

  5. It's not an absurd question. But my guess with respect to “visual tricks” (“visual illusions”…) is that we have to look at the neural level if we want a satisfactory answer. I think we can class “visual tricks” among low-level mechanisms devoid of any cognitive processing, where more cognitive theories have no proper say. This is my guess, but perhaps I'm wrong.

  6. I'm not wrong, I'm completely wrong. One promising avenue for exploring perceptual illusions such as the Necker cube could be the “vision-like-touch metaphor” (contemporary proponents include A. Noë, among others).

    Maybe, after all, people like Herbert Spencer or the Irish bishop George Berkeley were right. They thought that the manipulation of abstract objects by volition or intention is akin to the operations of the hand in the manipulation of tangible objects.

    What is clear is that the Necker cube is a two-dimensional figure, but this is an anti-natural image, not present in the world. The retinal input is also a two-dimensional image, but it needs to be interpreted in a three-dimensional way to capture reality. So to perceive stereoscopically we have to interpret. How we interpret is perhaps based on knowledge of sensorimotor contingencies or regularities, or perhaps works via a set of rules encoded in the genome and expressed in the visual system.

  7. Anna-Mari


    This is cool: “I'm not wrong, I'm completely wrong”! Wow.

    Two short comments: What does this “vision-like-touch metaphor” mean?

    And the second one: If I understand this “What is clear is that the Necker cube is a two-dimensional figure, but this is an anti-natural image, not present in the world. The retinal input is also a two-dimensional image, but it needs to be interpreted in a three-dimensional way to capture reality” correctly, I agree. And this is precisely the reason why I think that the Necker cube is so fascinating.

    However, did you really mean this: “How we interpret is perhaps based on knowledge of sensorimotor contingencies or regularities, or perhaps works via a set of rules encoded in the genome and expressed in the visual system.” Could you unpack this a bit, please?

  8. To unpack this a bit: suppose we agree with the main thesis of the enactive approach to perception (Noë 2004), namely that perception is not something that only happens inside our “heads” but also something that we create in accordance with knowledge of the sensorimotor regularities or contingencies that hold when we move, or when objects move and we sense them in motion. Then we acquire depth perception quite literally from the fact that when we grab an object with our hands we can rotate it to see it from multiple points of view.

    Perhaps our inner mental ability to rotate objects in our mind's eye is an evolved generalization of the hand's ability to manipulate, co-opted by the brain's visual system (and therefore has something to do with the way the hands pick up and manipulate those very objects to extract the invariant angles in every geometrical figure). In other words, the mechanism that brains use to disambiguate objects in vision probably derived from earlier resources in other sensory modalities that first encountered the same problems of ambiguity. In fact, the somaesthetic senses (touch, position or proprioception, temperature sensitivity) appear first in ontogenetic development, and the later senses, audition and vision, come after them in that order.

    The kind of interpretation involved in sensorimotor processes owes a great deal to unconscious mechanisms, so this saves us from over-intellectualizing the mind (always in accordance with the enactive view).

    This seems plausible, because if we want hard-nosed evidence, we have it! The temporal lobe, where some vision-related areas of the brain are located and where invariance (that is, stereoscopy) is extracted, also has reciprocal connections with other parts of the brain, particularly with the inferior frontal gyrus, or Broca's area, the site not only of executive parts of language but also of some receptive fields for hand representation.
    The eminent neuroscientist Edmund T. Rolls corroborates this in his latest book, on pages 71-72.

  9. kenneth aizawa

    Anibal writes,

    “What is clear is that the Necker cube is a two-dimensional figure, but this is an anti-natural image, not present in the world,”

    suggesting that perhaps the illusion has to do with the two-dimensional nature of the image. But I happen to have in my office a 3D stick cube. It too produces the flipping illusion. So, the illusion is not simply due to the use of a two-dimensional image.

  10. Anna-Mari

    Hi Ken,

    A really, really interesting comment. However, I have to defend Anibal here. I am not sure whether he really meant to say that the flipping illusion is always due to the use of two-dimensional data.

    It seems to me that he is just saying that the source of the “ambiguity” is not present in the world, but in the brains… I mean, if I had written what Anibal wrote, I would have put it this way (I have to admit, this is copy-pasted from a paper that Otto and I sent to a conference a couple of weeks ago):

    “It is the interpretations which explain what the bistable networks inside the brain are all about. What this also means is that the phenomenon of (mechanisms for) bistable representational states may be explained with reference to the computational task of deriving three-dimensional descriptions of objects from two-dimensional data (and the ensuing ambiguity).”

    Does it really make a crucial difference from the philosophical perspective if there are other forms of flipping illusion… Of course, from an empirical perspective it is really interesting, but… Okay, I have to think about this. And check some details about retinal images…

    Thanks, a good point.

  11. anna-mari rusanen


    After a few minutes of (hard) thinking I have to add this:

    It does not really matter whether the actual “stimulus” is three-dimensional (a 3D cube) or two-dimensional (a drawing of the Necker cube). The _retinal “image”_ is two-dimensional in both cases, right?

  12. kenneth aizawa

    Yes, but it seemed to me that perhaps Anibal was suggesting that it does matter whether the environmental stimulus is a 3D cube or a 2D drawing.  Maybe I’m reading him incorrectly.

  13. kenneth aizawa

    You write,

    It seems to me that he is just saying that the source of the “ambiguity” is not present in the world, but in the brains…

    Here I am not sure I am in agreement with you. It is true that a 3D cube in the physical world is not ambiguously oriented. There is a physical fact of the matter about how it is oriented. The 2D drawing is also not ambiguous. It’s not a cube at all. But, consider the pattern of light passing through, say, the pupil. (You might prefer the “retinal image”.) This pattern is ambiguous about what gave rise to it. That is, that pattern of light could have been produced by a 3D cube in one orientation or a 3D cube in another orientation or a 2D drawing. The Necker cube is a case where humans perceive both interpretations of the ambiguous light passing through the pupil. So, the perceptual fact is in the brain. On this last score, there are, however, cases of ambiguity of the light passing through the pupil in which we do not perceive the alternative interpretations. This fact is “in the brain.”

    Think of the Ames room. What one sees in the Ames room could be the product of, say, one twin being much larger than another in a normal room or two twins of the same size in a distorted room. Here we perceive only one alternative of the ambiguity. So, the distal world, so to speak, is not ambiguous. (You have twins of the same size in a distorted room.) The light passing through the pupil is ambiguous as to what is producing it (i.e. either the normal room or the distorted room). The interpretation human brains place on it, finally, is univocal. On this last score, it is disanalogous to the Necker cube.
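    The point that one pattern of light can come from a cube in either of two orientations can be checked with a few lines of geometry. The following is a rough sketch, assuming an idealized orthographic projection (depth is simply dropped) and a viewer on the +z side, so that larger z means nearer. The viewing angles and all function names are made up for the illustration.

```python
import math

# Corners of a wireframe cube, and the 12 edges joining corners that
# differ in exactly one coordinate.
CORNERS = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(CORNERS[i], CORNERS[j])) == 1]

def rotate(p, yaw, pitch):
    """Rotate a point about the y axis (yaw), then about the x axis (pitch)."""
    x, y, z = p
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x, y, z)

def image(vertices):
    """Orthographic 'retinal image': the set of 2D line segments, depth dropped."""
    return {frozenset([vertices[i][:2], vertices[j][:2]]) for i, j in EDGES}

def nearest_corner_2d(vertices):
    """2D position of the corner closest to the viewer (largest z, by assumption)."""
    x, y, _ = max(vertices, key=lambda v: v[2])
    return (x, y)

# Cube A: an arbitrarily chosen (hypothetical) viewing angle.
cube_a = [rotate(c, yaw=0.5, pitch=0.4) for c in CORNERS]
# Cube B: the depth-reversed solid. A mirrored cube is still a cube,
# just in a different orientation.
cube_b = [(x, y, -z) for x, y, z in cube_a]

print("same 2D image:", image(cube_a) == image(cube_b))
print("same nearest corner:", nearest_corner_2d(cube_a) == nearest_corner_2d(cube_b))
```

    Under these assumptions the two solids draw exactly the same line segments, yet they disagree about which corner points toward the viewer. The image alone does not settle the orientation.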

  14. Anna-Mari


    This is a bit hilarious, but I can actually accept both parts of your post.

    I can accept the Ames room story, since I think, as you write, that it is “disanalogous” to the Necker cube. The task of the visual system is different in Ames, and so are the neurocognitive visual mechanisms (as far as I know). (And the next episode will be this: A naughty cogneuroscientist will come and tell me that I am completely wrong here.)

  15. anna-mari


    I am glad too. However, I may be disagreeing with myself… (It happens every now and then…)

    Some questions; I'd really love to hear your opinion about them.

    (i) Would you agree with me if I claimed that the main task of the visual system is, in both cases, to produce 3D images from 2-dimensional data?

    (ii) Do you agree with me here: in order to find the differences between Ames-task and Necker-task, one must decompose the main task into its subtasks?

  16. kenneth aizawa

    Regarding (i), I think it is probably too strong to suppose that the visual system produces a 3D image from 2D data. A weaker claim is that the visual system produces a representation of some 3D information from 2D data. The difference here is between the mind producing an image, which I take to be a particular kind of representation of 3D information, and the mind producing just some, maybe sentential, form of representation of 3D information.

    Regarding (ii), I am not sure what you have in mind.  Talk of “the differences between Ames-task and Necker-task” is a little underspecified.  Clearly there are differences of the obvious sort, say, one involves a room and another a cubelike thing.  Those are uninteresting differences.  So, you would need to specify the interesting differences.  That seems like at least a minor challenge.

    Suppose, however, you do that.  The options for finding “the difference” that come to my mind are that perhaps there is one set of visual processing principles that apply to both stimuli, but yield one output in one case and another output in another case.  Alternatively, it could be that the two stimuli for some reason engage different visual processing principles, hence produce different outputs. 

  17. anna-mari rusanen

    (i) Yes, you are right. It would be much more appropriate to talk about “visual representations” in the case of vision, not “images”. I did not mean it quite literally.

    However, the issue that actually puzzles me is this: If one considers the Necker and Ames cases, there are at least two angles of attack.

    1. If one thinks about this “from a rational model of cognition point of view”, in the case of Necker there are two equally rational interpretations (of the stimulus), and thus the flip-flopping happens. The visual system just cannot “decide” which interpretation it should prefer. (And there is no asymmetry between the truth-values of these interpretations.)

    However, in the case of Ames there is just one interpretation, which is clearly false i.e. untrue, but still very rational and “likely”. 

    2. And now, if one asks where the difference between these two cases lies from the neurocognitive point of view, the differences must be somewhere inside the subtasks of the visual system (if the main task is in both cases to produce 3D-whatevertheyare). In the case of Ames the illusion is produced by manipulating the lighting, shadows, and so on. In Necker there is nothing like that, except the drawing.

    And my problem is that I just do not get this “right”. Why is the Necker case different, how is the task analysis done at the “lower levels”, and what on earth has rationality got to do with this…
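    One way to make the "rational model" reading in point 1 concrete is a toy Bayesian comparison. This is a sketch only: the priors and likelihoods below are invented numbers, and nothing here claims that the visual system actually computes this. It just shows how two equally supported interpretations differ from one strongly favoured one.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a small list of candidate interpretations."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Necker cube: both orientations fit the image equally well, and (by
# assumption) neither is more expected beforehand.
necker = posterior(priors=[0.5, 0.5], likelihoods=[1.0, 1.0])

# Ames room: both scenes fit the image equally well, but (hypothetical
# numbers) rectangular rooms are far more expected than distorted ones.
ames = posterior(priors=[0.99, 0.01], likelihoods=[1.0, 1.0])

print("Necker (orientation A, orientation B):", necker)
print("Ames (normal room, distorted room):", ames)
```

    On these made-up numbers the Necker posterior is a tie, which leaves nothing to choose between the two readings, while the Ames posterior comes down heavily on the familiar but false one.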
