
CHAPTER 1. GENERAL INTRODUCTION

1.1. BRIEF INTRODUCTION TO MULTISENSORY INTEGRATION AND VISUAL

1.1.2. Visual attention

1.1.2.4. The role of top-down and bottom-up factors in control of visual selection

While the distinction between exogenous, stimulus-driven and endogenous, goal-driven visual selection seems conceptually clear-cut, the relative importance of bottom-up and top-down factors in the control of attentional shifts has been a major point of debate in attention research over the past twenty years (Beck & Kastner, 2009; Desimone & Duncan, 1995).

According to some researchers (Theeuwes, Atchley, & Kramer, 2000; Theeuwes, 2010), local differences in contrast among neighbouring stimuli in the visual field always determine which location is selected first, irrespective of whether the identity of the current target is known. Theeuwes (1991) devised a variant of the visual search task, now known as the ‘additional singleton paradigm’, in which he demonstrated that search for a known target (a shape singleton, e.g., a unique diamond) among uniformly shaped distractors (e.g., circles) is delayed if an irrelevant but salient feature-singleton distractor (e.g., a red circle) is present in the search array. Theeuwes (1991) regarded these results as evidence that attention is always initially captured by the most salient item in the visual field. However, these findings contrasted sharply with the results of Folk and colleagues (Folk, Remington, & Johnston, 1992), who showed that attentional selection of items in the visual field is determined by current behavioural goals (i.e., the ‘task set’). Folk and colleagues designed the spatial cueing paradigm, in which the search display on each trial was preceded by a cue array. In this paradigm, the ability of stimuli to attract attention involuntarily is measured through spatial cueing effects, i.e., shorter RTs on trials in which the target appears at the cued location than on trials in which it appears at another, uncued location (cf. Theeuwes, 1991). Folk et al. (1992) demonstrated that salient feature singletons, e.g., red cues, triggered reliable cueing effects when the targets were defined by colour, but not when they were defined by abrupt onset. These results led Folk and colleagues to propose the so-called ‘task-set contingent involuntary orienting’ hypothesis, according to which salient distractors capture attention only when they are task-relevant, i.e., when they share features with the target.
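Both paradigms reduce attentional capture to a difference between mean RTs: a capture cost in the additional singleton paradigm (distractor present minus distractor absent) and a cueing effect in the spatial cueing paradigm (uncued minus cued). The minimal Python sketch below computes both measures. The function names and all RT values are hypothetical; the numbers only illustrate the contingent-capture pattern described by Folk et al. (1992) and are not data from any study.

```python
from statistics import mean


def capture_cost(rts_distractor_present, rts_distractor_absent):
    """Additional singleton paradigm: RT cost (ms) of a salient distractor.

    Positive values mean search was slower when the distractor was present.
    """
    return mean(rts_distractor_present) - mean(rts_distractor_absent)


def cueing_effect(rts_cued, rts_uncued):
    """Spatial cueing paradigm: RT benefit (ms) at the cued location.

    Positive values mean responses were faster when the target
    appeared at the cued location.
    """
    return mean(rts_uncued) - mean(rts_cued)


# Hypothetical RTs in ms (illustrative only): red cues combined with
# colour-defined targets versus red cues combined with onset-defined targets.
colour_targets = {"cued": [480, 470, 490], "uncued": [520, 515, 525]}
onset_targets = {"cued": [450, 455, 460], "uncued": [452, 458, 455]}

if __name__ == "__main__":
    # Red cue, colour target: the cue shares the target feature, so a cueing effect appears.
    print(cueing_effect(colour_targets["cued"], colour_targets["uncued"]))  # 40
    # Red cue, onset target: the cue does not match the task set, so no cueing effect.
    print(cueing_effect(onset_targets["cued"], onset_targets["uncued"]))    # 0
```

Both measures are written as plain differences of means so that they share a sign convention: a positive value always indicates that the salient item influenced performance.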

Various solutions were proposed to reconcile these apparently contradictory results. On the one hand, it was argued that the delayed RTs in the additional singleton paradigm reflect the extra time required by filtering mechanisms that are necessary to perform the search efficiently but are not associated with actual shifts of attention (Folk & Remington, 1998). On the other hand, Theeuwes et al. (2000) proposed that the most salient item in the display is always selected first, but that attention can be rapidly disengaged once the item is identified as a nontarget. This would explain the absence of cueing effects reported by Folk and colleagues (1992, 1994). The most plausible theoretical solution was offered by Bacon and Egeth (1994), who argued that the additional singleton paradigm and the spatial cueing task encourage two different ‘search modes’. In the Theeuwes (1991) study, the target could be detected effectively merely by monitoring the display for an ‘odd one out’ (‘singleton-detection mode’, SDM). The costs of attentional capture by the singleton distractor were likely to be minimal because the search array remained on screen until response, allowing participants to reorient to the feature-singleton target.

Bacon and Egeth (1994) argued that the low demands that SDM places on the cognitive system make it a ‘default’ search strategy. In contrast, in the study of Folk et al. (1992), the cue preceded the target by 150 ms and the target was presented for only 50 ms, leaving no time to inspect the target if attention had been captured by a salient distractor. In this context, the low cognitive demands of SDM would be outweighed by substantial costs in search performance, forcing observers to search for a specific feature in a pre-defined dimension, e.g., red targets (‘feature-search mode’, FSM). In support of this account, Bacon and Egeth (1994, Experiment 2) showed that a colour singleton produced no RT costs during search for a target diamond in a version of Theeuwes’ (1991) search task in which other uniquely shaped distractors (e.g., triangles, squares) were also present in the array, rendering SDM ineffective. In other words, task demands, not local contrast differences, determined whether salient distractors captured attention. This is consistent with a primary role of goal-based mechanisms in attentional control (but see Dalton & Lavie, 2004, for task-set-independent capture of auditory attention by within-modality feature singletons). More recently, this hierarchy between top-down and bottom-up factors in the control of visual attention shifts has been further supported by studies that manipulated these factors systematically (Eimer, Kiss, Press, & Sauter, 2009; Eimer & Kiss, 2008, 2010; Lamy & Egeth, 2003; Lamy, Leber, & Egeth, 2004; Lien, Ruthruff, Goodin, & Remington, 2008). Converging behavioural and electrophysiological evidence from these studies indicates that stimulus-driven factors play at best an indirect or secondary role in the control of visual selection (see Section 1.4.3.3 for details).
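The logic of Bacon and Egeth’s (1994) two search modes can be illustrated with a deliberately simplified sketch. This is a toy illustration, not a model proposed in the sources cited here. Each display item is reduced to a hypothetical (colour, shape) pair. Singleton-detection mode returns every item that is unique on any dimension, and feature-search mode returns only items carrying the pre-defined target shape. The display compositions and function names are assumptions made for illustration, and the sketch ignores salience strength and timing.

```python
from collections import Counter

# Hypothetical, minimal stimulus description: (colour, shape).
Item = tuple[str, str]


def singleton_detection(display: list[Item]) -> list[int]:
    """SDM (toy version): indices of items unique in colour or in shape."""
    colour_counts = Counter(colour for colour, _ in display)
    shape_counts = Counter(shape for _, shape in display)
    return [i for i, (colour, shape) in enumerate(display)
            if colour_counts[colour] == 1 or shape_counts[shape] == 1]


def feature_search(display: list[Item], target_shape: str) -> list[int]:
    """FSM (toy version): indices of items carrying the target feature."""
    return [i for i, (_, shape) in enumerate(display) if shape == target_shape]


# Theeuwes-style display: diamond target, red-circle distractor, green circles.
theeuwes_display = [("green", "diamond"), ("red", "circle"),
                    ("green", "circle"), ("green", "circle"), ("green", "circle")]

# Display modelled loosely on Bacon & Egeth (1994, Exp. 2):
# other uniquely shaped distractors are added.
mixed_display = [("green", "diamond"), ("red", "circle"),
                 ("green", "triangle"), ("green", "square"), ("green", "circle")]

if __name__ == "__main__":
    # SDM finds the target but also the colour singleton, so capture is possible.
    print(singleton_detection(theeuwes_display))     # [0, 1]
    # With several shape singletons, 'odd one out' no longer identifies the target.
    print(singleton_detection(mixed_display))        # [0, 1, 2, 3]
    # FSM isolates the target regardless of the colour singleton.
    print(feature_search(mixed_display, "diamond"))  # [0]
```

In the first display, singleton detection is a cheap and sufficient strategy, but it also flags the colour singleton. In the second display, it returns several candidates, so only a feature-based strategy isolates the target. This mirrors the argument that task demands push observers from SDM into FSM.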

The wealth of findings indicating the prevalence of top-down mechanisms in the control of visual selection was integrated in the ‘biased competition model’ of visual attention (BCM; Desimone & Duncan, 1995; Duncan, Humphreys, & Ward, 1997). The novel perspective on visual selective attention proposed by this model built on the older idea that processing in most visual brain systems is competitive. On the basis of neurophysiological and behavioural findings (e.g., Bundesen, Shibuya, & Larsen, 1985; Chelazzi, Miller, Duncan, & Desimone, 1993; Duncan, 1984), Desimone and Duncan (1995) proposed that in multi-stimulus contexts, stimuli compete with each other for control over the responses of visual cortical neurons and, thereby, for representation in the visual cortex. This competition is reflected in the overall lower number of spikes a neuron fires in response to displays containing two stimuli, compared with each stimulus presented alone, which suggests inhibitory interactions between the two stimuli.

Furthermore, competition can occur at any level of cortical processing at which several stimuli fall within the receptive field of a single neuron. Competitive interactions are therefore stronger at higher levels of the cortical hierarchy (e.g., the inferotemporal cortex and posterior parietal cortex), because neurons at these stages have the largest receptive fields, often encompassing both the left and right hemifields.

A critical tenet of BCM is that competition is biased not only by factors related to the observer’s goals but also by the bottom-up salience of stimuli in the external environment. In other words, numerous neural mechanisms (see Section 1.2.2) favour stimuli that match current task requirements, with the medium of selection flexibly changing between objects, locations and features. Competition is also biased towards stimuli that are novel or differ strongly from their surroundings. In multi-stimulus contexts in which one or both mechanisms give a competitive advantage to one of the objects (e.g., ‘pop-out’ displays in which only targets of pre-defined identity are presented), competition is resolved in favour of this object. This is reflected in firing rates that approximate the sum of the firing rates to the stimuli presented alone.

BCM also proposes that the resolution of competition between multiple stimuli is frequently integrated across different levels of the cortical hierarchy. Therefore, an object that wins the competition at the level of perceptual processing should also be the one that ‘wins’ at the stages at which stimuli compete to be encoded into short-term memory or to control (covert or overt) shifts of visuo-spatial attention. The presence of biased competition in multi-stimulus arrays and the integration of competition resolution across neural systems have been substantiated by fMRI studies (Beck & Kastner, 2005, 2007; Kastner et al., 1998; Reynolds & Desimone, 2003) and recent ERP studies (e.g., Kiss, Grubert, & Eimer, 2013). The critical novel contribution of BCM to our understanding of selective attention is how it defines attention. Attention is not a ‘spotlight’ that moves across distinct locations in empty space to facilitate perception or bind separate features. Instead, it is an emergent state of resolved competition across the neural networks involved in perceptual processing and behavioural control (see Beck & Kastner, 2009, for a related view on the nature of attention within the BCM).

In summary, early research on visual attention demonstrated that attention facilitates perception and behaviour through an interaction of two separate but interlinked attentional systems: the endogenous attention system that enables us to focus on stimuli relevant to our current behavioural goals, and the exogenous attention system that diverts our attention to potentially important events in the periphery. More recent findings have highlighted the flexibility of goal-based control mechanisms in guiding attention towards goal-related information on the basis of locations (both in space and time), objects or features. These discoveries have jointly paved the way for investigations into how attention operates in real-life environments, where multiple objects compete for selection at any point in time, and many of these objects are defined across different modalities.

1.2 Neural substrates of multisensory integration and