Applying top-down analysis to the visual system

We can get a better sense of how this general model of top-down analysis works in practice by looking at how Marr applied it in thinking about human vision. The first point to note is that Marr’s model is very interdisciplinary. His thinking at the computational level about what the visual system does was strongly influenced by research into brain-damaged patients carried out by clinical neuropsychologists. In his book he explicitly refers to Elizabeth Warrington’s work on patients with damage to the left and right parietal cortex – areas of the brain that when damaged tend to produce problems in perceptual recognition.

Warrington noticed that the perceptual deficits of the two classes of patient are fundamentally different. Patients with right parietal lesions are able to recognize and verbally identify familiar objects provided that they can see them from familiar or “conventional” perspectives. From unconventional perspectives, however, these patients would not only fail to identify familiar objects but would also vehemently deny that the shapes they perceived could possibly correspond to the objects that they in fact were. Figure 2.11 provides an example of conventional and unconventional perspectives.

The three levels at which any machine carrying out an information-processing task must be understood

Computational theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?

Representation and algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation?

Hardware implementation: How can the representation and algorithm be realized physically?

Figure 2.10 A table illustrating the three different levels that Marr identified for explaining information-processing systems. Each level has its own characteristic questions and problems. (From Marr 1982)

Patients with left parietal lesions showed a diametrically opposed pattern of behavior. Although left parietal lesions are often accompanied by language problems, patients with such lesions tend to be capable of identifying the shape of objects. One index of this is that they are as successful as normal subjects on matching tasks. They have little difficulty, for example, in matching conventional and unconventional representations of the same object.

Marr drew two conclusions about how the visual system functions from Warrington’s neuropsychological observations. He concluded, first, that information about the shape of an object must be processed separately from information about what the object is for and what it is called and, second, that the visual system can deliver a specification of the shape of an object even when that object is not in any sense recognized. Here is Marr describing how he used these neuropsychological data to work out the basic functional task that the visual system performs.

Elizabeth Warrington had put her finger on what was somehow the quintessential fact about human vision – that it tells us about shape and space and spatial arrangement. Here lay a way to formulate its purpose – building a description of the shapes and positions of things from images. Of course, that is by no means all that vision can do; it also tells us about the illumination and about the reflectances of the surfaces that make the shapes – their brightnesses and colors and visual textures – and about their motion. But these things seemed secondary; they could be hung off a theory in which the main job of vision was to derive a representation of shape. (Marr 1982: 7)

Figure 2.11 The image on the left is a familiar or conventional view of a bucket. The image on the right is an unfamiliar or unconventional view of a bucket. (From Warrington and Taylor 1973)

So, at the computational level, the basic task of the visual system is to derive a representation of the three-dimensional shape and spatial arrangement of an object in a form that will allow that object to be recognized. Since ease of recognition is correlated with the ability to extrapolate from the particular vantage point from which an object is viewed, Marr concluded that this representation of object shape should be on an object-centered rather than an egocentric frame of reference (where an egocentric frame of reference is one centered on the viewer). This, in essence, is the theory that emerges at the computational level.

Exercise 2.9 Explain in your own words why Marr drew the conclusions he did from Elizabeth Warrington’s patients.

Moving to the algorithmic level, clinical neuropsychology drops out of the picture and the emphasis shifts to the very different discipline of psychophysics – the experimental study of perceptual systems. When we move to the algorithmic level of analysis we require a far more detailed account of how the general information-processing task identified at the computational level might be carried out. Task-analysis at the computational level has identified the type of inputs and outputs with which we are concerned, together with the constraints under which the system is operating. What we are looking for now is an algorithm that can take the system from inputs of the appropriate type to outputs of the appropriate type. This raises a range of new questions. How exactly is the input and output information encoded? What are the system’s representational primitives (the basic “units” over which computations are defined)? What sort of operations is the system performing on those representational primitives to carry out the information-processing task?

A crucial part of the function of vision is to recover information about surfaces in the field of view – in particular, information about their orientation; how far they are from the perceiver; and how they reflect light. In Marr’s theory this information is derived from a series of increasingly complex and sophisticated representations, which he terms the primal sketch, the 2.5D sketch, and the 3D sketch.

The primal sketch makes explicit some basic types of information implicitly present in the retinal image. These include distributions of light intensity across the retinal image – areas of relative brightness or darkness, for example. The primal sketch also aims to represent the basic geometry of the field of view. Figure 2.12 gives two illustrations. Note how the primal sketch reveals basic geometrical structure – an embedded triangle in the left figure and an embedded square in the right.

The next information-processing task is to extract from the primal sketch information about the depth and orientation of visible surfaces from the viewer’s perspective. The result of this information processing is the 2.5D sketch. The 2.5D sketch represents certain basic information for every point in the field of view, such as the point’s distance from the observer and the orientation of the surface at that point. Figure 2.13 is an example from Marr’s book.
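The kind of per-point information that the 2.5D sketch carries can be made concrete with a small Python sketch. This is only an illustration under strong assumptions, not Marr’s own procedure. It assumes that a grid of depth values is already available (in Marr’s theory this has to be worked out from the primal sketch), and it estimates surface orientation crudely as the rate at which depth changes between neighboring points. The names SurfacePoint and sketch_2_5d are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SurfacePoint:
    """Viewer-centered information stored for one point in the field of view."""
    depth: float                          # distance from the viewer
    slant: Optional[Tuple[float, float]]  # (change in depth per step in x, per step in y)


def sketch_2_5d(depth_map: List[List[float]]) -> List[List[SurfacePoint]]:
    """Attach a crude orientation estimate to every point of a depth grid.

    Assumption: orientation is approximated by central differences of depth.
    Border points have no neighbors on one side, so their slant is None.
    """
    rows, cols = len(depth_map), len(depth_map[0])
    sketch = []
    for y in range(rows):
        row = []
        for x in range(cols):
            if 0 < x < cols - 1 and 0 < y < rows - 1:
                dz_dx = (depth_map[y][x + 1] - depth_map[y][x - 1]) / 2.0
                dz_dy = (depth_map[y + 1][x] - depth_map[y - 1][x]) / 2.0
                slant = (dz_dx, dz_dy)
            else:
                slant = None
            row.append(SurfacePoint(depth_map[y][x], slant))
        sketch.append(row)
    return sketch


# A flat surface receding to the right: depth grows by 0.5 per column.
tilted_plane = [[5.0 + 0.5 * x for x in range(4)] for _ in range(3)]
sketch = sketch_2_5d(tilted_plane)
```

Every value in the result is defined relative to the viewer. If the viewer moved, the depth grid and the slants would change with it, which is why a further, viewer-independent stage is needed.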

Figure 2.12 Two examples of Marr’s primal sketch, the first computational stage in his analysis of the early visual system. The primal sketch contains basic elements of large-scale organization (the embedded triangle in the left-hand sketch, for example). (Adapted from Marr 1982)

Figure 2.13 An example of part of the 2.5D sketch. The figure shows orientation information, but no depth information. (Adapted from Marr 1982)

The final information-processing stage produces the representation that Marr claims it is the job of the early visual system to produce. The 2.5D sketch is viewer-centered. It depends upon the viewer’s particular vantage point. One of the crucial things that the visual system allows us to do, though, is to keep track of objects even though their visual appearance changes from the viewer’s perspective (because either the object or the viewer is moving, for example). This requires a stable representation of object shape that is independent of the viewer’s particular viewpoint. This viewer-independent representation is provided by the 3D sketch, as illustrated in Figure 2.14.
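A toy model in Python shows what viewer-independence amounts to. It is a sketch under stated assumptions rather than Marr’s own formalism. Each part of the object is located relative to the part that contains it, so the description of the whole shape never mentions the viewer. The part names follow Figure 2.14. The numerical offsets, the names Part, object_centered, and viewer_centered, and the restriction to a viewer who moves without rotating are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Part:
    """One component of a shape, positioned relative to its parent part."""
    name: str
    offset: Vec
    children: List["Part"] = field(default_factory=list)


def object_centered(part: Part, origin: Vec = (0.0, 0.0, 0.0)) -> Dict[str, Vec]:
    """Position of every part in a frame of reference fixed to the object."""
    position = tuple(o + d for o, d in zip(origin, part.offset))
    positions = {part.name: position}
    for child in part.children:
        positions.update(object_centered(child, position))
    return positions


def viewer_centered(positions: Dict[str, Vec], viewer: Vec) -> Dict[str, Vec]:
    """The same parts as located from a viewer standing at `viewer`.

    Assumption: the viewer only translates; rotation is ignored for brevity.
    """
    return {
        name: tuple(p - v for p, v in zip(pos, viewer))
        for name, pos in positions.items()
    }


# Hypothetical offsets for the hierarchy Human > Arm > Forearm > Hand.
hand = Part("Hand", (0.0, -0.25, 0.0))
forearm = Part("Forearm", (0.0, -0.25, 0.0), [hand])
arm = Part("Arm", (0.25, 1.5, 0.0), [forearm])
human = Part("Human", (0.0, 0.0, 0.0), [arm])

shape = object_centered(human)                         # never mentions the viewer
view_a = viewer_centered(shape, (0.0, 0.0, -2.0))      # viewer straight ahead
view_b = viewer_centered(shape, (1.0, 0.0, -2.0))      # viewer has stepped sideways
```

The two viewer-centered descriptions differ, while the object-centered description from which both are computed stays the same. That stability is what allows an object to be tracked as the viewer or the object moves.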

These are the three main stages of visual information processing, according to Marr. Analysis at the algorithmic level explains how this information processing takes place.

[Figure 2.14 component labels: Human, Arm, Forearm, Hand]

Figure 2.14 An illustration of Marr’s 3D sketch, showing how the individual components are constructed. The 3D sketch gives an observer-independent representation of object shape and size. (Adapted from Marr 1982)

At the algorithmic level the job is to specify these different representations and how the visual system gets from one to the next, starting with the basic information arriving at the retina. Since the retina is composed of cells that are sensitive to light, this basic information is information about the intensity of the light reaching each of those cells. In thinking about how the visual system might work we need (according to Marr) to think about which properties of the retinal information might provide clues for recovering the information we want about surfaces.

What are the starting-points for the information processing that will yield as its output an accurate representation of the layout of surfaces in the distal environment? Marr’s answer is that the visual system needs to start with discontinuities in light intensity, because these are a good guide to boundaries between objects and other physically relevant properties. Accordingly the representational primitives that he identifies are all closely correlated with changes in light intensity. These include zero-crossings (registers of sudden changes in light intensity), blobs, edges, segments, and boundaries. The algorithmic description of the visual system takes a representation formulated in terms of these representational primitives as the input, and endeavors to spell out a series of computational steps that will transform this input into the desired output, which is a representation of the three-dimensional perceived environment.
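A zero-crossing can be illustrated with a single row of light intensities. The Python sketch below is a simplified illustration rather than Marr’s algorithm. The operator that Marr developed with Ellen Hildreth works on two-dimensional images, whereas this version smooths one row with a Gaussian, takes the second difference of the result, and reports where that second difference changes sign. The function names and parameter values are hypothetical.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], kernel: List[float]) -> List[float]:
    """Weighted average around each sample, repeating the end values at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


def zero_crossings(intensity: List[float], sigma: float = 1.0,
                   radius: int = 3) -> List[int]:
    """Return each index i where the second difference of the smoothed
    signal changes sign between samples i and i + 1."""
    s = smooth(intensity, gaussian_kernel(sigma, radius))
    # second[k] is the second difference centered on sample k + 1
    second = [s[i - 1] - 2.0 * s[i] + s[i + 1] for i in range(1, len(s) - 1)]
    crossings = []
    for k in range(len(second) - 1):
        if second[k] * second[k + 1] < 0:
            crossings.append(k + 1)
    return crossings


# A dark region (intensity 10) meeting a bright region (intensity 200).
row = [10.0] * 10 + [200.0] * 10
edges = zero_crossings(row)
```

For this row the only sign change falls between samples 9 and 10, exactly where the dark region meets the bright one. A uniformly lit row produces no zero-crossings at all, which is why such discontinuities are a useful guide to boundaries.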

Moving down to the implementational level, a further set of disciplines comes into play. In thinking about the cognitive architecture within which the various algorithms computed by the visual system are embedded we will obviously need to take into account the basic physiology of the visual system – and this in turn is something that we will need to think about at various different levels. Marr’s own work on vision contains relatively little discussion of neural implementation. But the table from his book shown here as Figure 2.15 illustrates where the implementational level fits into the overall picture. Figure 2.16 is a more recent attempt at identifying the neural structures underlying the visual system.

Marr’s analysis of the visual system, therefore, gives us a clear illustration not only of how a single cognitive phenomenon can be studied at different levels of explanation, but also of how the different levels of explanation can come together to provide a unified analysis. Marr’s top-down approach clearly defines a hierarchy of explanation, both delineating the respective areas of competence of different disciplines and specifying ways in which those disciplines can speak to each other. It is not surprising that Marr’s analysis of the visual system is frequently taken to be a paradigm of how cognitive science ought to proceed.

[Figure 2.15 diagram labels: Everyday experience, coarse psychophysical demonstrations; Representational problem; Nature of information to be made explicit; Specific representation (can be programmed); Specific neural mechanism; Computational problem; Computational theory processes and constraints; Specific algorithm (can be programmed); Specific neural mechanism; Detailed psychophysics; Detailed neurophysiology and neuroanatomy]

Figure 2.15 The place of the implementational level within Marr’s overall theory. Note also the role he identifies for detailed experiments in psychophysics (the branch of psychology studying how perceptual systems react to different physical stimuli). (Adapted from Marr 1982)

[Figure 2.16 diagram labels: processing levels Low, Intermediate, High; areas V1, V3, MT/V5, FST, TE/LOC, LIP, MST, V4/V8]

Figure 2.16 An illustration of the hierarchical organization of the visual system, including which parts of the brain are likely responsible for processing different types of visual information. (From Prinz 2012)

Key:

V1–V8: areas of the visual cortex in the occipital lobe (the back of the head). V1 produces the color and edges of the hippo but no depth. V2 produces the boundaries of the hippo. V3 produces depth. V4/V8 produces color and texture.

MT: middle temporal area (often used interchangeably with V5). Responsible for representing motion.

MST: medial superior temporal area. Responsible for representing size of the hippo as it gets nearer in space.

LIP: lateral intraparietal area. Registers motion trajectories.

FST: fundus of the superior temporal sulcus. Discerns shape from motion.

TE: temporal area. Along with LOC, is responsible for shape recognition.

LOC: lateral occipital complex.

Summary

This chapter has continued our historical overview of key steps in the emergence and evolution of cognitive science. We have reviewed three case studies: Terry Winograd’s SHRDLU program for modeling natural language understanding; the explorations into the representational format of mental imagery inspired by the mental rotation experiments of Roger Shepard and others; and the multilevel analysis of the early visual system proposed by David Marr. Each of these represented a significant milestone in the emergence of cognitive science. In their very different ways they show how researchers brought together some of the basic tools discussed in Chapter 1 and applied them to try to understand specific cognitive capacities.

Checklist

Winograd’s SHRDLU

(1) SHRDLU is more sophisticated than a conversation-simulating chatterbot because it uses language to report on the environment and to plan action.

(2) SHRDLU illustrated how abstract grammatical rules might be represented in a cognitive system and integrated with other types of information about the environment.

(3) The design of SHRDLU illustrates a common strategy in cognitive science, namely, analyzing a complex system by breaking it down into distinct components, each performing a circumscribed information-processing task.

(4) These information-processing tasks are implemented algorithmically (as illustrated by the flowcharts that Winograd used to explain SHRDLU’s different procedures).

The imagery debate

(1) The experiments that gave rise to the imagery debate forced cognitive scientists to become much more reflective about how they understand information and information processing.

(2) The imagery debate is not a debate about conscious experiences of mental imagery. It is about the information processing underlying those conscious experiences.

(3) The mental rotation and scanning experiments were taken by many cognitive scientists to show that some information processing involves operations on geometrically encoded representations.

(4) The debate is about whether the different effects revealed by experiments on mental imagery can or cannot be explained in terms of digital information-processing models.

Marr’s theory of vision

(1) Marr identified three different levels for analyzing cognitive systems.

(2) His analysis of vision is a classic example of the top-down analysis of a cognitive system. The analysis is driven by a general characterization at the computational level of the information-processing task that the system is carrying out.

(3) This general analysis at the computational level is worked out in detail at the algorithmic level, where Marr explains how the information-processing task can be algorithmically carried out.

(4) The bottom level of analysis explains how the algorithm is actually implemented. It is only at the implementational level that neurobiological considerations come directly into the picture.

C H A P T E R T H R E E

In document Cognitive Science: An Introduction to the Science of Mind (Page 83-94)
