Chapter 2 Visual attention and selection
2.2 Visual attention and selection
2.2.3 Spatial selection in visual attention
Spatial selection might be thought of as the means by which we select which objects in our visual environment we attend to as part of everyday tasks (e.g., finding a specific car in a crowded car park, or locating a set of keys). By this, we can
conceptualize a process that allows us to allocate attention to a specific item amongst the vast number of objects existing in the space around us. Note that, although this has been
phrased to suggest that top-down or intentional expectation/experience-based processing is at work (e.g., Di Lollo, Kawahara, Zivic, & Visser, 2000; Folk,
Remington, & Johnston, 1992), this is not necessarily the case; properties of the objects in the environment can also control how our attention is deployed (i.e. bottom-up, stimulus-driven processing; e.g., Theeuwes, 1991, 1992, 1994; Abrams & Christ, 2003; Rauschenberger, 2003). This distinction will be explored in more detail below (see Section 2.2.8.1).
That said, suggesting that selection may serve awareness or
perception reminds us that we rarely recognize consciously that we are not attending to every object visible in the real world. Indeed, reality may be far from this. For example, Duncan (1984) suggested that, although many objects may be processed prior to our attentional engagement (i.e. preattentively), effectively segmenting the visual field into objects on the basis of their low-level properties (e.g., spatial proximity, colour, motion), we are only able to allocate focal attention to one item at a time. Thus, the first question we might ask is: how does attention operate to select that item?
2.2.4 A metaphor for attention: The attentional spotlight
A straightforward metaphor for the operation of attentional selection is that of a spotlight (Eriksen & Hoffman, 1973) that can highlight one region of space and apply attention only to those items falling within that spatial field. Originally, this "spotlight" was understood to be of fixed size and resolution, without the flexibility to focus on one item within a perceptual group (Eriksen & Eriksen, 1974). However, further research identified more flexibility within the mechanism; for example, the ability to focus in on
a single letter of a larger word (LaBerge, 1983; see also Eriksen & Yeh, 1985; Eriksen & St James, 1986).
LaBerge's findings (1983) also highlighted the ability to "zoom" into a particular aspect of a stimulus (for example, the middle letter in a five-letter word) and the impact this had on attentional resolution. In this instance, LaBerge found that if priority was given to a single letter, processing of letters outside that focus was impaired. In contrast, where focus was given to the entire word, all letters were processed equally effectively. In turn, this led to the comparison of the selection mechanism with a zoom-lens
(Eriksen, 1990), the focus being adjusted according to task demands, and attentional resources being deployed according to that focus. Resolution was also inversely related to the width of the focus; i.e. when attentional focus was wide, resolution was low, and vice versa.
However, contemporary evidence suggested that the spotlight metaphor was over-simplistic. For example, when Neisser and Becklen (1975) superimposed two scenes in a display, they found that observers could preferentially attend to one scene over the other. This directly contradicts the attentional spotlight and zoom-lens accounts, under which two scenes occupying the same spatial field should receive the same processing focus and attentional resources.
Moreover, it might be said that the spotlight metaphor is suggestive of a serial mechanism, moving attention from one location to the next, in turn (see also Eriksen & Webb, 1989). However, other evidence indicates that when items can be grouped according to a shared perceptual feature, attentional selection can be made on this basis (e.g., colour, orientation, motion, texture or similarity; Treisman & Gelade, 1980;
Nakayama & Silverman, 1986; Duncan & Humphreys, 1989; Bravo & Blake, 1990). An alternative to a more sharply defined attentional spotlight was proposed by LaBerge & Brown (1989). This comprised the notion of a spatially applied gradient of attentional resources, where resources were more highly concentrated in the centre (falling off towards the edges), and which could vary in size. LaBerge and colleagues (1989) also proposed that deployment of resources could be the product of prior attentional activity; that is, resources can both accumulate and decay on a spatial basis (see also LaBerge, Carlson, Williams & Bunney, 1997). Thus, despite the simple intuitive appeal of the spotlight metaphor, understanding of selective attention appears to have
significantly outgrown its comparison with either a spotlight or zoom-lens.
2.2.5 The visual search task
One method by which visual selection processes can be systematically evaluated (for example, where a particular target is selected from our environment) is the visual search task. From one perspective, this is a paradigm that is straightforward to
investigate (and manipulate) in laboratory-based studies. However, this task also encompasses the process by which we search for designated targets in our rich (and often, cluttered) visual environment, allowing us to investigate what specific elements of this environment make a particular search easy or difficult.
In these terms, it would be hard to identify a simpler, more real-world example of the visual system in action. That said, the parameters of the search process per se, and the impact of associated processing (e.g., object representation, attention capture,
studies over the last 30 years. This provides an extensive literature by which to understand these basic mechanics of visual processing.
In essence, a typical visual search task might comprise detecting a particular target (for example, a blue vertical block) amongst a number of distractor items (for example, a group of green vertical blocks; but see Treisman & Gelade, 1980, for other examples of commonly used stimuli).
Figure 2.1 Examples of a visual search task: a) single feature search; b) conjunction search
a) A single feature search task, where the target is distinguished from the distractor set by a single unique feature (i.e. colour).
b) A conjunction search task, where the target is defined by the conjunction of two or more features it shares with the distractor set (i.e. colour and orientation).
This would present a relatively straightforward search; in fact, given that the target can be distinguished from the remainder of the array on the basis of a single unique feature (for more details regarding basic features, see e.g., Treisman & Gelade, 1980; Treisman & Gormican, 1988; Treisman & Souther, 1985; Nothdurft, 1993; Sagi & Julesz, 1987; Wolfe, 2003; see Wolfe, 1998, for a review), the target would be expected to effectively "pop out" from the surrounding display (e.g., Treisman & Gelade, 1980; and see Figure 2.1 above). This presents a single feature search. Conversely, when the target is surrounded with distractors that, as a set, combine two or more of the target-defining features (or a conjunction of these low-level properties), the search becomes more difficult (see Treisman & Souther, 1985; but cf. Wolfe, 1994). There is no longer the same sense of effortless target detection that one gets when a single feature makes our target distinct from the distractors around it (i.e. where the target "pops out").
Further manipulations include varying the total number of items displayed in a trial, whether the target is present or absent in the display, or the nature of the distractors presented (e.g., their degree of similarity to the target, or heterogeneity as a set). In turn, any of these manipulations might make search harder or easier, which is evaluated according to how search performance varies under each manipulation. Several highly influential models have been proposed to account for the relative ease or difficulty of some search conditions compared with others. However, before these are outlined, it is necessary to review some of the parameters of
2.2.6 How can search be designated efficient or inefficient?
Figure 2.1 above may give an intuitive "feel" for the relative ease of a particular search task; however, the parameters for efficient or inefficient search have been precisely defined. Whilst mean correct reaction times (RTs) may be used as a
straightforward performance indicator, regressing RT data against increasing set size allows the derivation of a search slope function (for example, x ms/item). This serves two purposes. Firstly, it allows search performance to be represented as the time taken to search within that specific search context for each additional item added to the display (i.e. a numerical measure of search ease or difficulty, per se). Secondly, it gives a measure of search efficiency that is directly comparable between different search conditions (e.g., Smilek, Eastwood, & Merikle, 2000).
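As a minimal sketch of how such a slope might be derived, the following fits an ordinary least-squares line to mean correct RTs across set sizes. The function name `search_slope` and all RT values are hypothetical and chosen purely for illustration; they are not data from any of the studies cited here.

```python
def search_slope(set_sizes, mean_rts):
    """Fit RT = intercept + slope * set_size by ordinary least squares.

    Returns (slope, intercept); the slope is in ms/item when RTs are in ms.
    """
    n = len(set_sizes)
    if n != len(mean_rts) or n < 2:
        raise ValueError("need at least two (set size, RT) pairs of equal length")
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    # Sum of squared deviations of set size, and of cross-products with RT.
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mean correct RTs (ms) at four display sizes.
set_sizes = [4, 8, 12, 16]
conjunction_rts = [520.0, 620.0, 720.0, 820.0]  # RT rises with set size
feature_rts = [450.0, 452.0, 449.0, 451.0]      # RT roughly flat

conj_slope, conj_intercept = search_slope(set_sizes, conjunction_rts)
feat_slope, feat_intercept = search_slope(set_sizes, feature_rts)
print(f"conjunction: {conj_slope:.1f} ms/item, intercept {conj_intercept:.1f} ms")
print(f"feature:     {feat_slope:.1f} ms/item, intercept {feat_intercept:.1f} ms")
```

Because both conditions are reduced to the same unit (ms/item), the two slopes can be compared directly even though the overall RT levels differ.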
Panels: a) Single feature search; b) Conjunction search. Each panel plots mean RT (ms) against display size, with separate lines for target present and target absent trials.
Figure 2.2 Examples of typical search slopes for single feature and conjunction searches, with target present and target absent trials shown separately
Figure 2.2 above shows typical search slopes for feature and conjunction searches (i.e. relatively easy and difficult search tasks, respectively).
These would typically demonstrate different search slopes if depicted
individually (see Treisman & Gelade, 1980, for examples). Moreover, of particular note in Treisman & Gelade's seminal work (1980), whilst target absent and target present trials showed little or no overall difference in RTs (or search efficiency) in single feature search, a 2:1 ratio of search slopes for target absent trials to target present trials was demonstrated in conjunction search. This is held to reflect the operation of a serial self-terminating search in conjunction search (i.e. a process that checks items one at a time and stops once the target is found, and that must therefore search exhaustively through every item on target absent trials, whereas on target present trials the target is found, on average, after about half of the items have been checked).
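The 2:1 ratio follows from simple arithmetic, which the sketch below works through under stated assumptions: items are inspected one at a time in random order, each inspection takes a fixed time, and search stops as soon as the target is found. The per-item time (`ms_per_item`) and base time (`base_ms`) are hypothetical values, and the function names are illustrative only.

```python
from fractions import Fraction


def expected_items_inspected(set_size, target_present):
    """Expected number of items checked by a serial self-terminating search."""
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    if target_present:
        # The target is equally likely to be the 1st, 2nd, ..., Nth item checked,
        # so the expectation is the mean of 1..N, i.e. (N + 1) / 2.
        return Fraction(sum(range(1, set_size + 1)), set_size)
    # With no target, every item must be checked before responding "absent".
    return Fraction(set_size)


def predicted_rt(set_size, target_present, ms_per_item=50, base_ms=400):
    """Predicted mean RT (ms); ms_per_item and base_ms are hypothetical values."""
    return base_ms + ms_per_item * expected_items_inspected(set_size, target_present)


def predicted_slope(target_present, small=4, large=16):
    """Slope (ms/item) between two display sizes for the model above."""
    rise = predicted_rt(large, target_present) - predicted_rt(small, target_present)
    return rise / (large - small)


present_slope = predicted_slope(target_present=True)
absent_slope = predicted_slope(target_present=False)
print(f"target present: {float(present_slope)} ms/item")
print(f"target absent:  {float(absent_slope)} ms/item")
print(f"absent:present ratio = {float(absent_slope / present_slope)}")
```

Under these assumptions the ratio is 2:1 regardless of the per-item time chosen, since that time multiplies both slopes equally.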
Search slope functions with a value around 0 ms/item can be considered very efficient (i.e. RTs are independent of increasing set size), with values up to around 10 ms/item designated as efficient. Values of 20-30 ms/item are usually taken to indicate inefficient search, with those exceeding 30 ms/item suggesting very inefficient
search (Wolfe, 1998). In turn, this categorization feeds into other concepts important for exploring the search mechanism. For example, where RT is not related to increasing display size, it is generally held to indicate that search is on the basis of a perceptual feature that is available preattentively (e.g., Treisman & Gelade, 1980; Treisman & Gormican, 1988; Treisman & Souther, 1985; see also Wolfe, 1998; and cf. Wolfe & Horowitz, 2004). Such features are suggested to include orientation, colour, size and motion (e.g., Treisman, 1985; Wolfe, 1994; Treisman & Gormican, 1988; Sagi & Julesz, 1987).
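The bands above can be expressed as a simple lookup, sketched below. Two points are assumptions rather than part of the categorization described in the text: the tolerance used to decide that a slope is "around 0" (`NEAR_ZERO_MS`), and the treatment of slopes between 10 and 20 ms/item, which the text leaves unclassified and which are therefore labelled as intermediate here.

```python
# Assumed tolerance for treating a slope as "around 0 ms/item" (illustrative only).
NEAR_ZERO_MS = 2.0


def classify_search_slope(slope_ms_per_item):
    """Map a search slope (ms/item) onto the efficiency bands described in the text."""
    if slope_ms_per_item <= NEAR_ZERO_MS:
        # Flat (or slightly negative) slopes: RT is independent of set size.
        return "very efficient"
    if slope_ms_per_item <= 10:
        return "efficient"
    if slope_ms_per_item < 20:
        # Assumption: the text gives no label for the 10-20 ms/item range.
        return "intermediate (not classified in the text)"
    if slope_ms_per_item <= 30:
        return "inefficient"
    return "very inefficient"


for slope in (0.0, 8.0, 15.0, 25.0, 45.0):
    print(f"{slope:5.1f} ms/item -> {classify_search_slope(slope)}")
```

The boundaries are best read as rough guides rather than sharp cut-offs, in keeping with the "around" and "usually" of the description above.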
Preattentive mechanisms are believed to operate where stimuli (or their component features) are processed independently of the current focus of visuo-spatial attention, without capacity limitation. In addition, these are held to operate in parallel fashion (e.g., Treisman & Gelade, 1980; Duncan, 1984; Kahneman & Henik, 1981; Treisman, Kahneman, & Burkell, 1983; Neisser, 1967) outside conscious awareness, and generally elicit efficient search when utilized in a particular search task (but cf. Joseph, Chun & Nakayama, 1997, who demonstrated that attention was needed for the detection of even basic features). In contrast, where search performance (RT) is
positively related to increasing display size, this not only reflects an inefficient search but is also taken to indicate the serial application of attention from one item to the next.
A subcategory of visual search methodology has been used frequently to establish whether a particular feature is processed preattentively, or as a diagnostic for preattentive processing of separable features (e.g., Treisman & Gormican, 1988; Treisman & Souther, 1985; Wolfe, 2001). The search asymmetry method evaluates search performance when a given stimulus is presented as the target (surrounded by another stimulus type, acting as a distractor set) and in the reverse configuration. When the defining feature of an item is preattentively processed, it should result in target "pop out" when that item is the target, but serial search when that item forms the distractor set.