Additional evidence for top-down control over stimulus-driven capture

1.2 Selective visual attention

1.2.5 Additional evidence for top-down control over stimulus-driven capture

selective attention using low-level tasks. These tasks generally consist of a search for a single item among similar distracters, allowing the initial allocation of attention to the search display to be measured. Such tasks do not provide evidence for how attention is controlled in more natural environments and they are unable to show how attention moves through the visual field (beyond orienting towards the most relevant or the most salient regions). Experiments using more naturalistic scenes are able to bridge this gap, revealing how attention moves through the visual field on the basis of visual salience (bottom-up) and semantic information (top-down). These studies generally make use of pictures of natural scenes, or in some instances line drawings of natural scenes; participants are asked to search the scenes, and eye movements are measured. The critical issue is whether guidance of attention (and the eyes) is through low-level salience, or whether observers search on the basis of top-down knowledge about the scene (see footnote 4).

Early studies by Yarbus (1967) provided support for a strong influence of top-down control, revealing that scan paths (sequences of fixations) in natural scenes varied with task demands; in his experiments participants altered their search patterns to meet the requirements of the task. Several studies which monitor eye movements in everyday tasks have also found evidence for top-down guidance. For example, Hayhoe, Land, and Shrivastava (1999) and Shinoda, Hayhoe, and Shrivastava (2001) found that individuals have very stereotypical search patterns when completing certain (habitually repeated) tasks (e.g., driving). This is indicative of a search strategy which is based on previous experience with the task (i.e., top-down information). They also claim that these strategies provide a possible solution to the ‘scheduling problem’ (Hayhoe, 2000; also known as the ‘initial access problem’ [Ullman, 1984]). The scheduling problem is as follows: how does an observer know where to look in a scene when they do not know what the scene contains before looking? Shinoda et al. (2001) propose that an observer will use pre-existing knowledge of the scene to look towards the region most likely to contain task-relevant information (i.e., which region was most relevant on previous occasions?).

Footnote 4: Again, although there is an argument that the eyes can move independently from attention, there is good reason to believe that the following evidence represents overt shifts of attention. For example, in several cases there is evidence that attention moves before the eyes (e.g., Hayhoe, 2000; Land & Horwood, 1995), revealing that the eyes follow attention.

This fits well with the suggestion that attention is guided by ‘spatial priors’. Spatial priors are knowledge about the location of relevant information in a scene, acquired through previous experience of the scene. For instance, Tatler (2007) found that observers have a ‘central fixation bias’ whereby they initially fixate the centre of a scene before making an eye movement elsewhere. He posits that this may be because the centre is an ideal place to begin a search, or because the centre is most informative, but suggests that the bias is independent of stimulus properties and task demands. Building on this account, Einhäuser, Spain, and Perona (2008) hypothesise that spatial priors may be initially elicited by bottom-up influences (the first time one encounters a scene), but that the subsequent application of spatial priors (on successive viewings) is top-down and dependent upon the task demands.

The use of ‘stereotypical’ search patterns (e.g., Shinoda et al., 2001) is also consistent with the notion of ‘contextual cuing’, which shows that when observers see the same display on a number of occasions they learn the spatial configuration of the display, allowing them to search more quickly than in a novel display (Chun & Jiang, 1998). This means that individuals use previous knowledge about the spatial layout of the display to guide their attention (Chun, 2000) and this influences the pattern of visual search. More recently, Brockmole and Henderson (2006) found evidence for contextual cuing in natural scenes, reporting that participants locate a target letter in repeated scenes faster than in novel scenes, and revealing that pre-existing knowledge about the spatial configuration of the scene guides attention. Land and Hayhoe (2001) have developed an ‘orienting hypothesis’ which accounts for the influence of previous knowledge over patterns of visual search. They suggest that when first viewing a scene an observer will create an initial analysis, allowing them to represent the gist of the scene. This representation will then guide attention and eye movements and can be updated over time as more information is derived from the scene. If, in the initial glimpse, the context is familiar, attention will be guided on the basis of previous experience. Castelhano and Henderson (2007) provide support for this hypothesis, showing that if participants are provided with a preview of a natural scene prior to completing a search of the scene, they search more efficiently than when no preview is provided. This illustrates that observers learn something about the context of the scene from the preview, and that this facilitates subsequent search.

The importance of context in guiding attention and eye movements has also been demonstrated by Loftus and Mackworth (1978). They presented participants with line drawings of natural scenes while eye movements were recorded. An item was placed in each scene, and this item could be either consistent with the scene (e.g., a tractor in a farmyard) or inconsistent with the scene (e.g., an octopus in a farmyard). They found that participants fixated inconsistent items earlier than consistent items, and suggested that observers first make an initial analysis of the scene, after which attention is allocated to the inconsistent item because it does not fit with the semantic context of the scene. Further work in this area has yielded mixed results. For example, in a similar study De Graef, Christiaens, and d’Ydewalle (1990) found no difference between fixations made to consistent and inconsistent items, and Hollingworth and Henderson (1998) found that participants fixated consistent items earlier than inconsistent items (Experiment 1). Henderson, Weeks, and Hollingworth (1999) subsequently developed a ‘saliency map framework’, proposing that covert attention will always initially be driven by salience, after which ‘cognitive’ factors take effect and the semantic context of the scene will guide attention.

In line with Henderson et al. (1999), other researchers also propose that attention is initially allocated on the basis of stimulus properties. Building on the model of attention developed by Koch and Ullman (1985), Itti and Koch (2000) have developed a saliency model which attempts to account for the control of selective attention in the visual field. Again they specify a ‘saliency map’ whereby the relative difference between regions in the visual field is computed for several different features (they outline a total of 42 individual feature maps). Itti and Koch state that these 42 feature maps are then combined into three ‘conspicuity maps’ for intensity, orientation, and colour. As in the earlier model, a saliency map is created from these three conspicuity maps and attention is allocated to the most salient area of the scene. Once attention has shifted to this location, inhibition of return occurs (e.g., Posner, Cohen, & Rafal, 1982) to ensure that this area will not attract attention again, and attention moves to the next most salient region. Importantly, this model takes no account of top-down control over the initial guidance of attention, suggesting that in a natural scene visual search is initially based on the stimulus properties; the most salient items will always be fixated first. Evidence for saliency-driven search comes from a variety of sources (e.g., Baddeley & Tatler, 2006; Parkhurst & Niebur, 2003; Tatler, Baddeley, & Gilchrist, 2005). However, this does not explain why other researchers have found that task demands modulate visual search patterns (e.g., Yarbus, 1967), or why the addition of ‘spatial priors’ to saliency maps provides a better prediction of fixation patterns than salience alone (e.g., Torralba, Oliva, Castelhano, & Henderson, 2006). As illustrated by this selection of literature, even in more applied research there is still no consensus regarding the control of selective attention.
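The selection cycle that the saliency-map account describes (combine conspicuity maps, attend to the most salient location, inhibit it, move on) can be illustrated with a toy sketch. This is not Itti and Koch's implementation: the 3 x 4 maps, the plain averaging used in place of their normalisation scheme, the function names (`mean_map`, `apply_prior`, `scan_path`) and the multiplicative `prior` are all assumptions made purely for illustration.

```python
def mean_map(maps):
    """Average equally sized 2-D maps (assumed stand-in for map combination)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


def apply_prior(saliency, prior):
    """Weight each location by a hypothetical spatial prior (element-wise)."""
    return [[s * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(saliency, prior)]


def scan_path(saliency, n_fixations, ior_radius=1):
    """Winner-take-all selection followed by inhibition of return."""
    sal = [row[:] for row in saliency]  # work on a copy
    rows, cols = len(sal), len(sal[0])
    path = []
    for _ in range(n_fixations):
        # Winner-take-all: attend to the currently most salient location.
        br, bc = max(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: sal[rc[0]][rc[1]])
        path.append((br, bc))
        # Inhibition of return: suppress the attended neighbourhood.
        for r in range(max(0, br - ior_radius), min(rows, br + ior_radius + 1)):
            for c in range(max(0, bc - ior_radius), min(cols, bc + ior_radius + 1)):
                sal[r][c] = float("-inf")
    return path


# Toy conspicuity maps (made-up values, rows x columns).
intensity = [[0.1, 0.2, 0.1, 0.0], [0.0, 0.9, 0.1, 0.0], [0.0, 0.1, 0.0, 0.3]]
colour = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.0, 0.0], [0.0, 0.0, 0.0, 0.9]]
orientation = [[0.0, 0.0, 0.0, 0.3], [0.0, 0.3, 0.0, 0.0], [0.0, 0.0, 0.0, 0.3]]

saliency = mean_map([intensity, colour, orientation])
bottom_up_path = scan_path(saliency, 3)  # [(1, 1), (2, 3), (0, 3)]

# A hypothetical prior that favours the top row of the scene.
prior = [[1.0] * 4, [0.1] * 4, [0.1] * 4]
primed_path = scan_path(apply_prior(saliency, prior), 3)
```

In the purely bottom-up run the first fixation lands on the most salient location, (1, 1). With the assumed prior the first fixation shifts to the top row, which is the kind of change in predicted fixation order that adding spatial priors to a saliency map is meant to capture.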
