Theories of Visual Object Recognition

Demonstration 2.2 A Feature-Analysis Approach

Eleanor Gibson proposed that letters differ from each other with respect to their distinctive features. The demonstration below includes an abbreviated version of a table she proposed. Notice that the table shows whether a letter of the alphabet contains any of the following features: four kinds of straight lines, a closed curve, an intersection of two lines, and symmetry. As you can see, P and R share many features. However, W and O share only one feature. Compare the following pairs of letters to determine which distinctive features they share: A and B; E and F; X and Y; I and L.

Features        A E F H I L V W X Y Z B C D G J O P R Q
Straight
  horizontal    + + + +   +         +       +
  vertical        + + + + +       +   +   +       + +
  diagonal /    +           + + + + +
  diagonal \    +           + + + +                 + +
Closed curve                          +   +     + + + +
Intersection    + + + +         +     +           + + +
Symmetry        + +   + +   + + + +   + + +     +
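The pairwise comparison requested in this demonstration amounts to intersecting two sets of features. The following Python sketch illustrates the idea for the two pairs discussed in the text. It is only an illustration: the names FEATURES and shared_features are hypothetical, and the feature sets are a transcription for four letters that should be checked against the table itself.

```python
# Distinctive features for four letters.
# ASSUMPTION: transcribed for illustration; verify each entry against the table.
FEATURES = {
    "P": {"vertical", "closed curve", "intersection"},
    "R": {"vertical", "diagonal \\", "closed curve", "intersection"},
    "W": {"diagonal /", "diagonal \\", "symmetry"},
    "O": {"closed curve", "symmetry"},
}


def shared_features(a, b):
    """Return the set of distinctive features that letters a and b have in common."""
    return FEATURES[a] & FEATURES[b]


if __name__ == "__main__":
    for pair in [("P", "R"), ("W", "O")]:
        print(pair, sorted(shared_features(*pair)))
```

With these entries, P and R overlap on several features, whereas W and O overlap on symmetry alone, which matches the description given in the demonstration.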

[...] designed a model based on feature analysis that correctly recognized an impressive 95% of the numbers written in street addresses and zip codes.

The feature-analysis theories are also compatible with evidence from neuroscience (Gordon, 2004; Palmer, 2002). As described in Chapter 1, the research team of Hubel and Wiesel used the single-cell recording technique to insert small wires into the visual cortex of anesthetized animals (Hubel, 1982; Hubel & Wiesel, 1965, 1979, 2005). Next, they presented a simple visual stimulus—such as a vertical bar of light—directly in front of each animal’s eyes. Hubel and Wiesel then recorded how a particular neuron responded to that visual stimulus. In this fashion, they tested how a variety of neurons in the primary visual cortex responded to visual stimuli.

Hubel and Wiesel’s results showed that each neuron responded especially vigorously when a bar was presented to a specific retinal region and when the bar had a particular orientation. For example, suppose that a bar of light is presented to a particular location on the animal’s retina. One neuron might respond strongly when the bar has a vertical orientation. Another neuron, just a hairbreadth away within the visual cortex, might respond most vigorously when the bar is rotated about 10 degrees from the vertical.

Furthermore, one small patch of the primary visual cortex could contain a variety of neurons, some especially responsive to vertical lines, some to horizontal lines, and some to specific diagonal lines. In fact, the visual system contains feature detectors that are present when we are born (Gordon, 2004). These detectors help us recognize certain features of letters and simple patterns.
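The orientation selectivity described above can be pictured as a tuning curve: each neuron fires most strongly at its preferred orientation and less strongly as the bar is rotated away from it. The Python sketch below is a toy illustration, not a model taken from Hubel and Wiesel. The bell-shaped (Gaussian) curve, the 15-degree width, the maximum firing rate of 50, and the function name response are all assumptions chosen for clarity.

```python
import math


def response(bar_deg, preferred_deg, width_deg=15.0, max_rate=50.0):
    """Toy firing rate of an orientation-selective neuron.

    Orientations are in degrees away from vertical (0 = vertical).
    ASSUMPTION: a Gaussian tuning curve with hypothetical width and peak rate.
    """
    # A bar's orientation repeats every 180 degrees, so wrap the
    # difference into the range [-90, 90).
    diff = (bar_deg - preferred_deg + 90.0) % 180.0 - 90.0
    return max_rate * math.exp(-0.5 * (diff / width_deg) ** 2)


if __name__ == "__main__":
    for bar in (0, 10, 45, 90):
        print(bar, round(response(bar, 0), 1), round(response(bar, 10), 1))
```

In this sketch, a neuron that prefers vertical bars responds more to a vertical bar than its neighbor that prefers a 10-degree tilt, and the reverse holds when the bar is tilted by 10 degrees.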

However, the feature-analysis approach has several problems. First, a theory of object recognition should not simply list the features contained in a stimulus; it must also describe the physical relationship among those features (Groome, 1999). For example, in the letter T, the vertical line supports the horizontal line. In contrast, the letter L consists of a vertical line resting at the side of the horizontal line.
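The T and L example can be made concrete with a small Python sketch. It is only illustrative: the class LetterDescription and the two comparison functions are hypothetical names, and the relation strings simply restate the wording of the paragraph above. The point is that a bare list of features cannot tell the two letters apart, whereas a description that also records the relationship between the features can.

```python
from typing import NamedTuple


class LetterDescription(NamedTuple):
    features: frozenset  # which line segments are present
    relation: str        # how those segments are arranged (hypothetical encoding)


LETTER_T = LetterDescription(
    features=frozenset({"vertical line", "horizontal line"}),
    relation="vertical line supports the horizontal line",
)
LETTER_L = LetterDescription(
    features=frozenset({"vertical line", "horizontal line"}),
    relation="vertical line rests at the side of the horizontal line",
)


def same_feature_list(a, b):
    """Compare two letters using only the list of features."""
    return a.features == b.features


def same_structure(a, b):
    """Compare two letters using both the features and their relationship."""
    return a == b


if __name__ == "__main__":
    print(same_feature_list(LETTER_T, LETTER_L))  # the feature lists match
    print(same_structure(LETTER_T, LETTER_L))     # the arrangements differ
```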

In addition, bear in mind that the feature-analysis theories were constructed to explain the relatively simple recognition of letters. In contrast, the shapes that occur in nature are much more complex (Kersten et al., 2004). How can you recognize a horse? Do you analyze the stimulus into features such as its mane, its head, and its hooves? Wouldn’t any important perceptual features be distorted as soon as the horse moved, or as soon as you moved? Horses and other objects in our environment contain far too many lines and curved segments, and the task is far more complicated than letter recognition (Palmer, 2003; Vecera, 1998). The final approach to object recognition, which we discuss next, specifically addresses how people recognize these more complex kinds of stimuli found in everyday life.

The Recognition-by-Components Theory. Irving Biederman and his colleagues have developed a theory to explain how humans recognize three-dimensional shapes (Biederman, 1990, 1995; Hayworth & Biederman, 2006; Kayaert et al., 2003). The basic assumption of their recognition-by-components theory (also called the structural theory) is that a specific view of an object can be represented as an arrangement of simple 3-D shapes called geons. Just as the letters of the alphabet can be combined into words, geons can be combined to form meaningful objects.

You can see five of the proposed geons in Part A of Figure 2.5. Part B of this figure shows six of the objects that can be constructed from the geons. As you know, letters of the alphabet can be combined to form words with different meanings, depending upon the specific arrangements of the letters. For example, "no" has a different meaning from "on." Similarly, geons 3 and 5 from Figure 2.5 can be combined to form different meaningful objects. A cup is different from a pail, and the recognition-by-components theory emphasizes the specific way in which these two geons are combined.
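The cup and pail example can be sketched in Python as a lookup keyed on both the geons and the way they are joined. This is a rough illustration rather than Biederman's formal model. The class Arrangement, the table KNOWN_OBJECTS, and the function recognize are hypothetical names, and the two relation strings (side versus top attachment) are assumptions introduced here to stand for "the specific way in which these two geons are combined."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrangement:
    geons: frozenset  # geon numbers, as labeled in Figure 2.5
    relation: str     # how the geons are joined (hypothetical encoding)


# ASSUMPTION: the relation wording is illustrative, not taken from the text.
KNOWN_OBJECTS = {
    Arrangement(frozenset({3, 5}), "geon 5 attached to the side of geon 3"): "cup",
    Arrangement(frozenset({3, 5}), "geon 5 attached to the top of geon 3"): "pail",
}


def recognize(geons, relation):
    """Look up an object by its geons and by how those geons are arranged."""
    return KNOWN_OBJECTS.get(Arrangement(frozenset(geons), relation), "unknown")


if __name__ == "__main__":
    print(recognize([3, 5], "geon 5 attached to the side of geon 3"))
    print(recognize([3, 5], "geon 5 attached to the top of geon 3"))
```

The same two geons yield two different objects, just as the same two letters yield two different words.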

In general, an arrangement of three geons gives people enough information to classify an object. Notice, then, that Biederman’s recognition-by-components theory is essentially a feature-analysis theory for the recognition of three-dimensional objects.

Biederman and his colleagues have conducted fMRI research with humans and single-cell recording studies with monkeys. Their findings show that areas of the cortex beyond the primary visual cortex respond to geons like those in Figure 2.5A (Hayworth & Biederman, 2006; Kayaert et al., 2003).

FIGURE 2.5 Five of the Basic Geons (A) and Representative Objects That Can Be Constructed from the Geons (B). Source: Biederman, 1990.

However, the recognition-by-components theory requires an important modification because people recognize objects more quickly when those objects are seen from a standard viewpoint, rather than a much different viewpoint (Friedman et al., 2005; Graf et al., 2005; O’Reilly & Munakata, 2000). Notice, for instance, how your own hand would be somewhat difficult to recognize if you look at it from an unusual perspective. One modification of the recognition-by-components theory is called the viewer-centered approach; this approach proposes that we store a small number of views of three-dimensional objects, rather than just one view (Mather, 2006). Suppose that we see an object from an unusual angle, and this object does not match any object shape we have stored in memory. We must then mentally rotate the image of that object until it matches one of the views that is stored in memory (Dickinson, 1999; Tarr & Vuong, 2002; Vecera, 1998). This mental rotation may require a second or two, and we may not even recognize the object. (Chapter 7 discusses mental rotation in more detail.)
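The viewer-centered idea of rotating an image until it matches a stored view can be sketched as a nearest-match calculation. The Python sketch below is a deliberate simplification: it assumes that each view can be summarized by a single viewing angle in degrees, which is a hypothetical encoding, and the function names are illustrative. The larger the rotation needed, the more slowly we would expect the object to be recognized.

```python
def angular_distance(a_deg, b_deg):
    """Smallest rotation, in degrees, that turns one viewing angle into another."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def rotation_needed(seen_deg, stored_views_deg):
    """Rotation required to bring the seen view into line with the closest stored view.

    ASSUMPTION: views are represented by one angle each; real views are far richer.
    """
    return min(angular_distance(seen_deg, v) for v in stored_views_deg)


if __name__ == "__main__":
    stored = [0, 90, 180]  # hypothetical stored views of one object
    for seen in (0, 100, 350):
        print(seen, rotation_needed(seen, stored))
```

A view that coincides with a stored view needs no rotation, whereas an unusual view needs a rotation that grows with its distance from the nearest stored view.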

At present, both the feature-analysis theory and the recognition-by-components theory (modified to include the viewer-centered approach) can explain some portion of our remarkable skill in recognizing objects. In addition, researchers must explore whether these theories can account for our ability to recognize objects that are more complicated than isolated cups and pails. For example, how were you able to immediately identify numerous complex objects in the scene you viewed on your television screen in Demonstration 2.1? The theoretical explanations will become more detailed, as researchers continue to explore how we recognize real-world objects and scenes, using increasingly sophisticated research methods (Gordon, 2004; Henderson, 2005; Hollingworth, 2006a, 2006b; Tarr & Vuong, 2002).

Section Summary: Background on Visual Object Recognition

1. Perception uses previous knowledge to gather and interpret the stimuli registered by the senses; in object recognition, we identify a complex arrangement of sensory stimuli.

2. Visual information from the retina is transmitted to the primary visual cortex; other regions of the cortex are active when we recognize complex objects.

3. According to gestalt principles, people tend to organize their perceptions, even when they encounter ambiguous figure-ground stimuli and even in illusory-contour stimuli, when no boundary actually separates the figure from the background.

4. Researchers have proposed several theories of object recognition. The oldest of these, the template-matching theory, can be rejected because it cannot account for the complexity and flexibility of object recognition.

5. Feature-analysis theory is supported by research showing that people require more time to make decisions about letters of the alphabet when those letters share many critical features. This theory is also supported by neuroscience research using the single-cell recording technique.

6. The recognition-by-components theory argues that objects are represented in terms of an arrangement of simple 3-D shapes called geons. Furthermore, according to the viewer-centered approach, we also store several alternate views of these 3-D shapes, as viewed from different angles.