A Lesson from a Vision Scientist

Brendan Ritchie January 19, 2009 July 13, 2016 Uncategorized

Gualtiero’s recent post “Classicism, Connectionism, and the Harmonic Mind” has produced some great discussion, and I thought I would pick up a challenge that Edouard made in one of the comments: provide a concrete example where we seem to know a lot about “how the representations involved in higher cognition are implemented in the brain!” Well, sort of. My example is not from higher cognition but from vision science. Still, I think the example, and the moral to be drawn from it, is relevant nonetheless.

I recently had the pleasure of hearing a vision scientist talk about some seminal work in vision science from the 1980s: that of David Marr and Irving Biederman. Marr’s Vision is of course a classic in cognitive science, largely because of the methodology he described and his three “levels” of analysis: the computational, the algorithmic, and the implementational. For many philosophers with an eye towards cognitive science, the implementational level is the one of least interest. But as I learned at the lecture, it is not always appreciated how much of Marr’s own work, both the parts he got right and the parts he got wrong, depended on the implementational level.

What Marr got right. Marr described several different stages of visual representation after light hits the retina: the grey-level representation, the raw primal sketch, the primal sketch, the 2½-D sketch and the 3D model. Where did the idea of these several stages come from? To a large extent, the answer is the groundbreaking work on the cat visual system by Hubel and Wiesel. Marr was, after all, an accomplished neuroscientist, and, as it was explained to me, the first four stages were in a sense his articulation and synthesis of several important results obtained from neuroscientific work on the visual system.

What Marr got wrong. Marr’s own theory posited primitives, specifically generalized cones, from which perspective-independent (invariant) representations of objects are constructed. (Biederman’s famous “Recognition-by-Components” (RBC) theory also makes use of primitives, which he calls geons; in fact, he may have been inspired by Marr, though I cannot recall.) During the lecture, it was reported that theories of vision positing such primitives are, to a large extent, no longer serious contenders for understanding visual processes (despite continuing defense by their original proponents). The explanation was wonderfully simple: the theories made certain predictions about how the representations would be implemented in the brain, and these have not been borne out. One problem, for example, is that representations of visual objects seem always to be perspective-dependent, contra those theories that employ primitives.

So, this would be a fair, if overly simple, characterization of what the lecturer said about Marr: what Marr got right, he got right because different kinds of visual representations in the brain had been made manifest by careful research at the implementational level; what he got wrong, he got wrong because his theory made predictions about the structure of the brain that have not been borne out. Vision seems to me a concrete example where we know a lot about how representations are implemented in the brain (though it is not an example from higher cognition).

I also think the example is relevant to the discussion generated by Gualtiero’s post because of how we came to know how visual representations are implemented: through careful neuroscience. Even assuming we don’t know a whole lot about how representations in higher cognition are implemented in the brain (which I do not think is true), there are important lessons to be drawn:

First, many philosophers and cognitive scientists are still overly dismissive of neuroscience, either (1) because it concerns “mere implementation”, or (2) because we just don’t know a lot, or “enough”, about the brain (when would “enough” be?). Both views are mistaken, and they have been mistaken since Marr wrote Vision, if not earlier. Indeed, Marr stands out as an excellent example of how one can have a theory of a kind of representation that, even at more abstract levels, is largely shaped and informed by contemporary neuroscience. And we sure know a heck of a lot more about the brain now than when Marr wrote!!!

Second, I think many (myself included) are sceptical of classical theories (like LOT) because these theories often (i) fail to engage with neuroscientific research, because their proponents endorse either (1) or (2); or (ii) when one tries to reconcile them with contemporary neuroscientific research, they neither aid explanation nor seem empirically well supported (much like the primitives posited by Marr and Biederman).

19 Comments

gualtiero

January 19, 2009 at 2:51 pm

Exactly! 🙂
Arnold Trehub

January 19, 2009 at 3:40 pm

One of the negative consequences of Marr’s influence was to fuel decades of work characterized by computational theories of vision and cognition that were unconstrained by the mere detail of biological plausibility. Theoretical models based on the structure and dynamics of putative brain mechanisms composed of neurons with credible biological properties were largely ignored. For a brief discussion of this issue, see *The Cognitive Brain* (MIT Press 1991), “Levels of Explanation”, pp. 3-5.
Martin Roth

January 19, 2009 at 4:37 pm

In addition to issues concerning implementation, vision science may also have something to teach us about how we should understand representational content. Versions of “indicator semantics” are quite popular, but as Rob Cummins and I argue in a forthcoming paper (“Meaning and Content in Cognitive Science”), a different picture of the relation between indication and representation, and of the composition of complex representations generally, emerges if we examine the role that indicators of the sort discovered by Hubel and Wiesel play in the construction of visual images. One account of this is found in recent research by David Field and Bruno Olshausen.
Natural images contain much statistical structure as well as redundancy, but early visual processing effectively retains the information present in the visual signal while reducing the redundancy. In the 1950s, Stephen Kuffler discovered the center-surround structure of retinal ganglion cells’ responses, and Joseph Atick showed that this arrangement serves to decorrelate these cells’ responses. As Horace Barlow had suspected, sensory neurons are assembled to maximize the statistical independence of their responses. Olshausen and Field recently showed that the same is true of neurons in the primary visual cortex. While Hubel and Wiesel discovered that neurons in the primary visual cortex are sensitive to edges (hence their functional description as edge detectors), they did not know what the functional relevance of this structure was. According to Olshausen and Field, edge detection allows neurons in the primary visual cortex to respond in a maximally independent way to visual signals, thus producing sparsely coded representations of the visual field. They demonstrated this by constructing an algorithm that could identify the minimal set of maximally independent basis functions capable of describing natural images in a way that preserves all the information present in the visual signal. Because natural images typically contain edges, and because there are reliable higher-order correlations (three-point and higher) between pixels along an edge, it turns out that natural images can be fully described as composites of about a hundred such basis functions. Given the statistical structure of natural images, there was bound to be such a set of functions; the striking thing is that these basis functions are similar to those Hubel and Wiesel found 40 years earlier: spatially localized and oriented edges.
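A minimal sketch of the decorrelation point, assuming Python with NumPy. ZCA whitening is a standard textbook way to remove pairwise correlations between neighbouring "pixels"; it is offered only as a loose analogue of what Atick attributed to center-surround receptive fields, not as his model (which also dealt with noise). The synthetic data, the `zca_whiten` helper, and the 0.9 neighbour coupling are illustrative assumptions.

```python
import numpy as np


def zca_whiten(x, eps=1e-8):
    """Decorrelate the columns of x (samples x features) by ZCA whitening.

    Returns the whitened data and the whitening matrix. The feature
    covariance of the returned data is approximately the identity.
    """
    x_centered = x - x.mean(axis=0)
    cov = np.cov(x_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes, rescale each to unit variance, rotate back.
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return x_centered @ w, w


rng = np.random.default_rng(0)
n_pixels = 8
# Hypothetical "image rows": each pixel is its own noise term plus 0.9 x the
# noise term of its left neighbour, so adjacent pixels are strongly correlated.
mix = np.eye(n_pixels) + 0.9 * np.eye(n_pixels, k=1)
signals = rng.standard_normal((5000, n_pixels)) @ mix
whitened, w = zca_whiten(signals)

print("neighbour correlation before:", np.corrcoef(signals, rowvar=False)[0, 1].round(2))
print("max |off-diagonal covariance| after:",
      np.abs(np.cov(whitened, rowvar=False) - np.eye(n_pixels)).max())
```

ZCA is chosen over plain PCA whitening because it keeps the output in "pixel" coordinates, which is closer in spirit to each output unit still corresponding to a location.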
Recently, O’Reilly and Munakata showed how to train a neural network using conditional principal components analysis (CPCA) to generate a similar set of basis functions.
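For readers unfamiliar with CPCA, here is a minimal sketch of the Hebbian rule as we understand it from O’Reilly and Munakata’s presentation: Δw_i = ε·y·(x_i − w_i), which drives each weight toward the probability that input i is active given that the receiving unit is active. The toy binary inputs, the probabilities, and the `cpca_update` name are hypothetical; this does not reproduce their simulation that yields oriented edge detectors from natural images.

```python
import numpy as np


def cpca_update(w, x, y, lr=0.005):
    """One CPCA Hebbian step for a single receiving unit: dw_i = lr * y * (x_i - w_i).

    Weights only change when the receiving unit is active (y > 0), and then
    move toward the current input, so w_i tracks P(x_i = 1 | y = 1).
    """
    return w + lr * y * (x - w)


rng = np.random.default_rng(1)
# Hypothetical setup: four binary input lines; when the receiving unit is
# active, line i is on with probability p_given_active[i].
p_given_active = np.array([0.9, 0.7, 0.2, 0.05])
w = np.full(4, 0.5)
for _ in range(20000):
    y = float(rng.random() < 0.5)  # receiving unit active on about half the trials
    if y:
        x = (rng.random(4) < p_given_active).astype(float)
    else:
        x = (rng.random(4) < 0.5).astype(float)  # ignored: no update when y == 0
    w = cpca_update(w, x, y)

print(np.round(w, 2))  # close to p_given_active, up to sampling noise
```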
To see how visual representations are assembled out of these basis functions, consider a vector of V1 cortical cells connected to the same retinal area via the same small subset of LGN cells. Each cell has a receptive field similar to one in the minimal set of basis functions. An incorrect way to understand the assembly of…
edouard machery

January 19, 2009 at 6:16 pm

Mmh – that’s really a strange dialectical move. I challenge you to give me a detailed example of the implementation of higher cognition, and you give me an example related to vision science.

But, look, my point was not that neuroscience cannot constrain our theories of mental states. Nor was it that we do not know anything about […]. We know some things, and there are interesting hypotheses to consider.

I was merely replying to what I saw as some exaggerated claims made by Gualtiero about the extent of our knowledge about how higher cognitive states and processes are realized in the brain. Just open any textbook in neuropsychology focused on decision, choice, judgment, and it will be clear that our knowledge is very imperfect.
Arnold Trehub

January 19, 2009 at 7:01 pm

I doubt if any here would deny that our knowledge is imperfect. I think the issue is where we look to improve our knowledge. Past emphasis on abstract computational explanations has led to a dead end. Explanations of higher cognition will be found in the structure and dynamics of competent brain mechanisms and systems. Cognitive-brain theory in which task-relevant models of neuronal mechanisms are detailed and tested seems to be the most promising approach.
Brendan

January 19, 2009 at 10:54 pm

Edouard,

I guess I figured that if Sarah Palin can ignore the question she is asked and answer another one, then so can I ;) … I talked about vision for other reasons as well: for example, Marr is an important figure, but as Arnold points out, many seem to have taken the wrong things away from his work. If what you mean by “higher cognition” is the sort of thing you listed, then those are certainly topics where neuroscience textbooks merely point to frontal cortex and say something vague about “executive function”. However, three examples that I think also qualify as higher cognition are (1) linguistic processing, (2) memory (of various forms), and (3) theory of mind/mindreading. While we do not know how these functions are implemented to the same degree as we do in the case of vision, I think we DO know a lot about the representations involved (even if not to the level of single cells and receptive fields, etc.).

Martin,

I agree that vision can teach us a lot about representational content; that sounds like a paper I need to read! However, a modest concern: there are many features of how the visual system represents things that do not hold true for other perceptual systems. 1. Other “maps” in the brain, e.g. the tonotopic map in A1, are not spatially organized in the same neat way as the retinotopic maps found in the visual system. 2. The visual system first integrates information from both eyes in layers 2/3 of V1, but in audition this happens much earlier, in the brain stem. These differences in architecture matter for how these systems represent things (and seem to be based on the specific structure of what they represent), so I am a little wary of generalizing about representational content from such a specific form of representation in the brain. But I look forward to the paper!
Martin Roth

January 20, 2009 at 12:05 am

An incorrect way to understand the assembly of such visual representations is as the activation of a subset of such basis functions whose content is based solely on the information each cell receives from the relevant retinal region. If this were the case, cells in the primary visual cortex would serve to indicate activity in the LGN and, ultimately, features in the visual field. Indeed, it would be simple to determine what is currently observed if the basis functions were completely independent and the noise factor were known, but neither is true. As a result, many distinct visual representations are compatible with the information present in the visual field. Lewicki and Olshausen have shown, however, that it is possible to infer a unique visual representation if the system has information about the probability of observed states of affairs and the probability of visual representations given observed states of affairs. Instead of assembling a visual representation from a set of indicator signals alone, the visual system may construct the representation from indicator signals and relevant probabilistic information about the visual environment.
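To make the inference point concrete, here is a minimal sketch under stated assumptions (Python/NumPy; random unit-norm "basis functions", a hypothetical 3-sparse cause, and ISTA as the solver). With an overcomplete set of basis functions, many coefficient vectors reconstruct the same input exactly. Assuming Gaussian noise and a sparse (Laplacian) prior on the coefficients turns the problem into maximum a posteriori estimation, minimizing ½‖x − Φa‖² + λ‖a‖₁, which picks out a single sparse solution. This is in the spirit of, not a reproduction of, Lewicki and Olshausen’s method; the names `map_sparse_code` and `soft_threshold` and the value of λ are illustrative.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (the proximal step for an L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def map_sparse_code(x, phi, lam=0.05, n_iter=2000):
    """MAP coefficients a for x ~ phi @ a + Gaussian noise, with a Laplacian prior on a.

    Minimizes 0.5 * ||x - phi @ a||^2 + lam * ||a||_1 by ISTA
    (iterative shrinkage-thresholding).
    """
    step = 1.0 / np.linalg.norm(phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        grad = phi.T @ (phi @ a - x)
        a = soft_threshold(a - step * grad, lam * step)
    return a


rng = np.random.default_rng(2)
# Hypothetical overcomplete dictionary: 32 unit-norm "basis functions" for 16-pixel inputs.
phi = rng.standard_normal((16, 32))
phi /= np.linalg.norm(phi, axis=0)
true_a = np.zeros(32)
true_a[[3, 17, 25]] = [1.5, -1.0, 0.8]  # hypothetical sparse cause
x = phi @ true_a

# One of infinitely many exact solutions: the minimum-norm one, which is dense.
a_dense = np.linalg.pinv(phi) @ x
# The MAP estimate under the sparse prior: nearly exact, with few active coefficients.
a_map = map_sparse_code(x, phi)

print("active coefficients, min-norm:", np.sum(np.abs(a_dense) > 1e-2))
print("active coefficients, MAP:     ", np.sum(np.abs(a_map) > 1e-2))
print("MAP reconstruction error:", np.linalg.norm(phi @ a_map - x).round(3))
```

In Bayesian terms, λ plays the role of the noise variance divided by the scale of the Laplacian prior, so a stronger prior (or noisier input) yields sparser codes.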
The account that emerges here consists in the assembly of an image from a set of indicator signals that are surprisingly few in number, that indicate multi-point correlations between adjacent pixels in the whitened version of the input, and whose success as detectors of their proprietary targets is due, to a large extent, to recurrent circuitry that effectively computes a Bayesian function in which the prior probabilities are determined by the features of neighboring areas of the developing image. Most important for our purposes, perhaps, their representational content is semantically disjoint from the image they compose in the same way that pixel values are semantically disjoint from the representational content of a computer graphic. Like maps and scale models, such representations have their content as a consequence of their geometry rather than their origins. Because such representations have the source-independence that indicator signals lack, they are candidates for the disciplined, structure-sensitive transformations that representations afford. Since such representations literally share the static and dynamic structure of their targets, properties of the targets can be learned by transforming the representations in ways that reflect the ways in which nature constrains the structure of, and structural changes in, the targets. As a result, faces can be aged, objects can be rotated or “zoomed” and three-dimensional projections can be computed. None of this is possible in a system in which visual representations are semantically composed from constituents whose contents are determined by their indicator role.
Eric Thomson

January 20, 2009 at 4:55 am

I don’t have much time, I’m in DC right now partying. Interesting enough topic to throw my crap at the wall…

But I tend to agree with the sceptics about how little we know about how things are implemented, even visual representations. For instance, some think we have perspective-invariant representations (e.g., the Jennifer Aniston cells discovered by Koch in humans), but in the post you say that all visual representations are perspective-dependent. Of course we have many interesting data, and lots of cool speculation about how it may all fit together, but there is basic disagreement about how distributed the code is: while ten years ago the ‘distributed population code’ was all the rage, recently the ‘sparse’ code (the extreme of which is the grandmother cell) has regained popularity.

Things are better than 20 years ago, but we really have a long way to go even in basic vision, even in the retina! Too bad philosophers are so corticocentric: if they studied model systems more (the retina, C. elegans, the fly, the leech), they might realize we don’t really know shit even about those systems. Of course this doesn’t mean that Fodor is right. Fodor’s anti-neuro stance seems just wrongheaded, and for all we know (it is premature to get all cocky) his LOT theory is bungled. If I want to understand what the digestive system does, it is probably a good idea to study the digestive system. I don’t see why the nervous system would be any different. If anything, since it is more complicated than the digestive system, we should be even more wary of getting attached to theories about its functional decomposition before the stuff doing the functioning is better understood.
Eric Thomson

January 20, 2009 at 5:09 am

Dretske should be able to handle the work of Lewicki etc.

I don’t buy that they’ll have their content as a function of their “geometry” alone. Origins are hard to get rid of. Churchland has this problem too in his state space semantics. How do we tell which region of the world this map represents? There are many possible mappings from map to world. Origins or interactions seem important. Neuroscientists don’t learn how area A represents B by just looking at A, but by looking at how A responds to B, C, D, etc.

Also, I am not convinced you can’t do the transformations you talk about in an indicator semantics. Dretske is fine with such things as long as there is a baptismal period where the neural structures get their original content, but then the brain can activate and transform these structures arbitrarily even in the absence of the proximal stimuli.

At any rate, I’d need to see more of your story to really judge it. It always worries me when philosophers put too much stock in provocative theories in neuroscience that are almost guaranteed to be wrong (Hodgkin-Huxley notwithstanding). Of course, what else are you to do at this early stage of speculative neuroscience?

In DC partying, so I will have to come back to this when I get back.
Anibal

January 20, 2009 at 9:23 am

Isn’t the heterogeneous tissue of the prefrontal cortex an example of how higher-level cognition is implemented in the brain?

The frontal cortex is the most recently evolved part of the brain and is roughly the seat of human social behaviour and executive functions (working memory, attention shift, inhibition of prepotent responses…)

The different subdivisions of the frontal cortex (orbitofrontal, ventrolateral and dorsolateral) perform distinct functions, and damage to one particular subdivision impairs a given cognitive task.

For example, damage to a specific site of the prefrontal cortex causes stimulus-bound or environment-dependent syndromes, in which the individual is captured by the stimuli coming from the environment and acts upon them without any reflection.

With damage to the prefrontal cortex, we cannot evaluate or adjust our motor plans when error-correcting feedback is present.

Damage to the prefrontal cortex also disrupts working memory, the ability to hold information online, which is critical for cognitively demanding tasks.

And I think there are many cases where we know something about how higher-level cognition is implemented in the brain, even though we don’t have enough knowledge about the brain at the moment.

Reference: Miller, E.K. and Wallis, J.D. (2009) Executive function and higher-order cognition: Definitions and neural substrates. In: Encyclopedia of Neuroscience, Volume 4, Squire LR (Ed.), pp 99-104. Oxford: Academic Press

(article available from E. K. Miller’s website)
Brendan

January 21, 2009 at 2:29 am

Eric,

Fair enough, concerning perspective invariance. I was probably speaking too strongly and loosely. I was just trying to quickly mention one sort of reason why one might doubt the existence of the sorts of primitives posited by Marr and Biederman.

Eric + Edouard,

Regarding scepticism about how much we know about implementation: I completely agree with your points about our general lack of knowledge (even in seemingly “well understood” areas like vision). But I was trying to make a relative claim.

Gualtiero claimed in the initial thread that we know a lot about the implementation of representations, and that it does not look classical. I took your challenge, Edouard, to be: provide an example to back this up. As far as the question of whether our cognitive architecture is classical goes, saying we know a lot about such implementation is not an overstatement; I do think we know enough to defend Gualtiero’s point.

However, speaking generally, we do not know a lot in the details about how representations are implemented in the brain. And in this regard, it is easy to overstate things.

So, regarding whether our architecture is classical or not, I do think we know a lot. However, as you say Eric, “we don’t know shit” about lots of model systems, and our own brain, in detail; and this applies to the implementation of representations as well.

So, in short: I think healthy scepticism about what we know so far is consistent with the point I was trying to make.
Eric Thomson

January 21, 2009 at 4:03 am

That all seems reasonable.
Arnold Trehub

January 21, 2009 at 2:55 pm

Sure, we should all adopt a healthy skepticism. But I think we make a serious mistake if we frame our understanding of cognition simply in terms of knowledge. It’s more a matter of provisional belief than a matter of fixed knowledge. Scientific understanding advances within a changing framework of candidate theories and models judged by supporting and negating evidence gathered from all relevant sources.

There *is* a detailed theoretical model about how representations are implemented in the brain and it has considerable supporting evidence.
Martin Roth

January 23, 2009 at 4:14 pm

Hi Eric,

I tend to think that talk of baptism is talk of how representational targets are fixed. For example, Millikan distinguishes the content of a representation from the thing the representation is supposed to represent. The latter is fixed via selection, etc., but the former is not. To illustrate, consider maps. It might be that a mechanism that produces representations is supposed to produce representations that accurately portray Chicago. In that sense, the map’s “content” is Chicago, insofar as it is with respect to Chicago that the accuracy of the map is assessed. But the representation produced may be a rather inaccurate map of Chicago, and to talk of the accuracy or inaccuracy of the map requires some notion of content independent of the target. I think it is reasonable to say that the relevant content here is determined by the geometry of the map itself.
Eric Thomson

January 23, 2009 at 6:08 pm

An interesting view; I’ll have to think about that. Ultimately I don’t see you being able to escape having Chicago interactions somewhere in the explanation of why this map represents Chicago and not Twin Chicago.
Martin Roth

January 24, 2009 at 2:13 pm

Depends on how we individuate the relevant interactions. If what drove selection were interactions with street structure, then the target would be that structure, a structure that Twin Chicago shares with Chicago (surely, in the typical case it is not individuals that are targets, but properties that can be possessed by many individuals). I think the more interesting cases to wonder about are those where a representing mechanism produces a rather inaccurate representation R of some target X, but where R is rather accurate if applied to some other target Y. For example, suppose that in generating a map of Chicago, the mechanism produces a map that is isomorphic to the street structure of Madison. If the creature that harbored such a map were dropped off in Madison and used that map to get around successfully, I would want to say the creature was accurately representing the street structure of Madison, but it looks like the selection account would deny this (since, according to those views, the street structure of Madison could not be a target at that point; the mechanism was not selected to produce representations of that structure). To my mind, this is a knock on selectionist accounts of target fixation!
Eric Thomson

January 24, 2009 at 4:58 pm

The (internalist) map view has all the problems that other internalist accounts have, just transpose them to maps instead of semantic atoms.

For your case I could say it is using a crappy map of Chicago to get around Madison.

And still that doesn’t get at my original question, individuating Chicago and Twin Chicago. Also, what of Johnny and Twin Johnny? The (internalist) map type views can’t deal with individuals, and that’s because no internalist theory can deal with individuals. Making the smallest semantically viable unit the map rather than the symbol doesn’t get rid of problems with internalism.

Of course there are two different issues here: maps versus symbols, and internalist versus externalist. You can have a map-based view of representation and be an externalist; that’s what I’ve been pushing for at this blog for a while: a kind of marriage of Dretske and the Churchlands. Maps carry information about their target, but they carry information about infinitely many targets. How do we narrow things down (as I said originally, how do we get the right mapping from map to world)? Interactions of some sort: information, causality, whatever. Import the Dretskian apparatus, originally developed for symbols, into something more neural like Churchland’s state space semantics (which, for silly reasons in my opinion, he insists on keeping internalist).

I wrote about this here.

I had forgotten I came up with a good example there: “When we (neuroscientists) study how neural assemblies represent the world, we don’t just study the neurons and how they relate to one another and then try to find things in the world to map onto them. We quantify the relationship using information-like quantities to more directly see how the world and brain hook together. Sure, we often look at the metric structure of the neuronal space and see how it maps onto the metric structure of the world-space, but this is typically done in conjunction with a more Dretske-like approach.

If the auditory cortex had a map that looked like the map of color space, we still wouldn’t say it was representing color space if we didn’t see responses to visual stimuli, but did see responses to auditory stimuli!”
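A minimal sketch of the kind of “information-like quantity” mentioned above, assuming Python/NumPy and made-up trial counts: the mutual information between stimulus category and a binned response, computed from a joint count table. The tables and the `mutual_information` helper are hypothetical: a site whose responses track tone pitch carries information about tones, while one whose responses are independent of colour carries none about colour. Real analyses also correct this plug-in estimate for limited-sampling bias.

```python
import numpy as np


def mutual_information(joint_counts):
    """Plug-in mutual information (bits) between stimulus (rows) and response bin (columns)."""
    p = np.asarray(joint_counts, dtype=float)
    p = p / p.sum()
    p_stim = p.sum(axis=1, keepdims=True)  # marginal over responses
    p_resp = p.sum(axis=0, keepdims=True)  # marginal over stimuli
    independent = p_stim @ p_resp          # joint distribution if stimulus and response were unrelated
    nz = p > 0                             # treat 0 * log 0 as 0
    return float(np.sum(p[nz] * np.log2(p[nz] / independent[nz])))


# Hypothetical trial counts for one auditory-cortex site.
# Rows: stimulus (low tone, high tone); columns: response (weak, strong).
tones = np.array([[45, 5],
                  [5, 45]])
# Rows: stimulus (red patch, green patch); columns: response (weak, strong).
colours = np.array([[25, 25],
                    [25, 25]])

print("I(tone; response)   =", round(mutual_information(tones), 3), "bits")
print("I(colour; response) =", round(mutual_information(colours), 3), "bits")
```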
Martin Roth

January 28, 2009 at 2:55 pm

Manny and Mary enter a map making contest. The winner is the person who makes the most accurate map of Chicago. Manny and Mary both make maps, and because both of their maps have Chicago as their target, both maps are maps of Chicago. Manny wins because his is a more accurate map. Now, if the contents of the maps were determined solely by their targets, then both maps would have the same content. But if both maps have the same content, and both have as their target Chicago, in what sense could Manny’s be a more accurate representation? To assess relative accuracy, we have to have a notion of content independent of targets.

Regarding the Madison example, yes, we can say that we are using a crappy map of Chicago to get around Madison. But to say that it is a map “of Chicago” potentially conflates content with target. Now, in the case where the map is being used to get around Madison, should we say that it is a map “of Madison”? I want to say yes. Here’s why. Appeals to representation in cognitive science are supposed to be explanatory. Why am I so successful in getting around Madison? Because I have an accurate map of it. If one were to insist that, since the map was produced for the purpose of getting around Chicago, it cannot have Madison as its target, one would be in the position of having to say that the map was not an accurate map of Madison (since Madison is not the target of the map, and accuracy is relative to target). But then one could not explain successful navigation of Madison by citing the accuracy of the map. But this looks plain wrong. Imagine that in the middle of the night someone steals your street map of Chicago from your car. Miraculously, however, a physical duplicate of your map suddenly materializes in your glove box (“swampmap”). You know nothing about this happening, and in the morning you go to your car and pull out the new “map.” The map is just as helpful in getting around Chicago as the original map would have been. If we insist that swampmap does not *really* represent anything, then I think we’ve lost sight of why appeals to representations are important in cognitive science. Note, too, that it is no good to say that swampmap gets content as it is being used to get around the city, because content is supposed to explain how you manage to get around the city so well. It does not have content because it was used with success; it was used with success because it had the right content.

As for auditory cortex, I agree we “wouldn’t say it was representing color space” if it were not responding to visual stimuli. I think this means: the content of the auditory cortex is not being used to represent certain targets. Again, to say x represents y could mean that x has y as its target. Since targets cannot be determined by inspecting neural regions alone, of course we have to use other means to figure out targets.
Eric Thomson

January 28, 2009 at 10:09 pm

An ant could walk around on an etch-a-sketch and trace out the shape ‘Snow is white’, but that doesn’t mean the shape has semantic properties. Origins matter. If the ant etched out something that you used to get around Madison, that doesn’t mean that it was a map of Madison. Perhaps after you come to use it as such, treat it as such, it derivatively acquires such contents.

I don’t know how much we should trust our semantic intuitions about things like swampmen, especially since reasonable people’s intuitions tend to differ. But clearly he would have internal states that carry information about certain features of the world, and he uses these internal states to navigate those features of the world. That is, he has what I have called ‘biorepresentations’ (I wrote a long post about this here and am a big fan of such representations).

Most importantly for me, you haven’t really answered my original question (Twin Earth-type worries applied to individuals rather than natural kinds). You could get around Twin Earth just fine. You’ll navigate with Twin Mom in Twin Chicago really well with your maps of Earth. When you think “Mom sure is great,” are you thinking of Mom, or Twin Mom? We could explain why you are navigating so well: the map you learned that has the content Mom/Chicago happens to map well onto the Twin Mom/Twin Chicago world, even though the map isn’t actually referring to the Twins. Or we could use your strategy: your neural states suddenly undergo a dramatic transposition of contents when you are shifted to Twin Earth. Such instability seems a bad consequence.