5.3 Primed Flanker Task: Scope and Implications
5.3.4 Lessons from Hardwired Binding: Feedforward Pathways and Context
With respect to the recognition of natural and familiar objects (or scenes), VanRullen (2009) proposed a distinction between two possible processing modes that is strikingly similar to the distinction between base and incremental grouping. The author distinguishes between two types of binding, that is, of the process within perceptual grouping that determines which elements of the visual field are perceived together: (1) hardwired binding of frequently encountered, natural objects, which is feedforward and does not rely on attention, and (2) on-demand binding of more arbitrary or meaningless feature conjunctions, which depends on feedback and is mediated by attention (VanRullen, 2009). Thus, hardwired and on-demand binding on the one hand and base and incremental grouping on the other are different conceptualizations of the same basic processes in visual cognition. Combining the insights from both theoretical frameworks can contribute to understanding the flexible nature of perceptual grouping.
First, the incremental grouping theory states that base grouping is implemented by the activation of cardinal cells or multi-feature detectors but does not take a strong position on how this activation is achieved. There are different notions about the type of neural coding by which information is transmitted in visual processing hierarchies. However, because the first feedforward wave is based on neurons that fire at most a single spike (i.e., an action potential) before the next level of the hierarchy is activated (Lamme & Roelfsema, 2000), rapid processing cannot, for example, be based on neuronal firing rates, which can only be estimated from several spikes (cf. Gautrais & Thorpe, 1998). In the context of hardwired binding, different modes of neural coding were compared in modeling studies with artificial networks. Interestingly, it was shown that most of the stimulus-relevant information could be extracted from the temporal distribution of the very first spikes in the feedforward wavefront (e.g., Gollisch & Meister, 2008; Guyonneau, VanRullen, & Thorpe, 2004; Serre, Oliva, & Poggio, 2007) and that this spike-timing code can be shaped by learning processes, namely by spike-timing-dependent plasticity (STDP; Guyonneau, VanRullen, & Thorpe, 2005; Masquelier, Guyonneau, & Thorpe, 2009). The richness of information in first-spike timing and its susceptibility to perceptual learning make it a likely candidate for the neural coding involved in base grouping.
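The argument against rate coding in the first feedforward wave can be illustrated with a minimal sketch. It assumes a toy integrate-to-threshold neuron whose first-spike latency is inversely proportional to its input drive; the function names, the threshold of 10 units, and the 12 ms readout window are hypothetical choices made for illustration and are not taken from any of the cited modeling studies. If every neuron fires at most one spike inside the readout window, spike counts cannot separate two stimuli, whereas the order of the first spikes can.

```python
def first_spike_latencies(drive, threshold=10.0):
    """Toy neurons (assumption): stronger drive reaches threshold sooner,
    so latency in ms is threshold / drive."""
    return [threshold / d for d in drive]


def spike_counts(latencies, window):
    """Rate-style readout: spikes per neuron inside the window.
    Each neuron is assumed to fire at most once in the window."""
    return [1 if t <= window else 0 for t in latencies]


def rank_order(latencies):
    """Temporal readout: neuron indices sorted by first-spike time."""
    return sorted(range(len(latencies)), key=lambda i: latencies[i])


# Two hypothetical stimuli that drive the same three neurons differently.
stim_a = [1.0, 2.0, 4.0]
stim_b = [4.0, 2.0, 1.0]
window = 12.0  # ms, short enough that no neuron spikes twice (assumption)

lat_a = first_spike_latencies(stim_a)
lat_b = first_spike_latencies(stim_b)

counts_a = spike_counts(lat_a, window)
counts_b = spike_counts(lat_b, window)
rank_a = rank_order(lat_a)
rank_b = rank_order(lat_b)

print("spike counts:", counts_a, counts_b)  # identical for both stimuli
print("rank order:  ", rank_a, rank_b)      # differs between stimuli
```

In this sketch both stimuli yield the same spike counts, so a downstream unit that only counts spikes could not tell them apart, while a unit sensitive to the order of arrival could. Real neurons are noisy and the cited models are considerably richer; the example only makes the logic of the single-spike constraint concrete.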
Second, the incremental grouping theory does not detail the characteristics of the feedforward pathways that are activated in base grouping. For example, Schmidt and Schmidt (2009) presented natural images of animals and objects in a response priming paradigm. In one task, participants were instructed to identify the target image containing an animal (or object). In another task, participants were instructed to identify the target containing a small (or large) animal/object. Only in the first task was the time course of the priming effects in pointing responses in accordance with a feedforward processing account (Schmidt & Schmidt, 2009). This suggests that the multi-feature detectors that categorize animals and objects are part of a feedforward visuomotor processing pathway but those that categorize large and small are not. The research on processing of natural images in categorization tasks offers some background for this finding (for a review see Fabre-Thorpe, 2011). Generally, stimuli can be categorized on a superordinate level (an animal, a vehicle), a basic level (a bird, a farm vehicle), or a subordinate level (a robin, a tractor). Interestingly, research on the rapid categorization of natural images (i.e., on hardwired binding) shows that superordinate object or scene categories are rapidly available, suggesting feedforward processing, and are available faster than the more detailed basic-level representations (Grill-Spector & Kanwisher, 2005; Macé, Joubert, Nespoulous, & Fabre-Thorpe, 2009). At the same time, basic-level representations can be activated without the need for focused attention (Poncet, Reddy, & Fabre-Thorpe, 2012), suggesting that they are also coded as base groupings but use feedforward pathways that are slower than those for superordinate categorizations. However, on the subordinate level, feedforward processing is normally no longer found.[25] In other words, while base grouping can rapidly group together all stimuli that are identified by specialized multi-feature detectors (e.g., a robin or a tractor), the speed of their visuomotor processing depends on the task: Rapid motor responses only occur in those tasks where base grouping meets an established visuomotor feedforward response pathway (cf. Haberkamp, Schmidt, & Schmidt, 2013).
Indeed, it is likely that superordinate categorizations are so fast because they can be based on the coarse magnocellular information carried by the fast dorsal pathway (Fei-Fei, Iyer, Koch, & Perona, 2007).[26] Because the resulting stimulus representations are relatively coarse, prior knowledge about the object is critical: The more detailed and substantial it is, the faster the categorizations. In neurophysiological terms, this might be achieved by facilitation of low-level feature grouping by feedback projections from higher, object-selective visual cortex (Jeurissen, Self, & Roelfsema, 2013). The critical defining object features that enable rapid categorizations can be discovered experimentally: The performance in categorization tasks with animals in natural images depends on whether the animal is in a canonical posture, on its relative size within the image, and on the presence of diagnostic animal features (Delorme, Richard, & Fabre-Thorpe, 2010). In terms of the incremental grouping theory, this means that a task has to rely on incremental grouping if two conditions are met. First, it depends on the successful parsing of natural images into different objects (by combining low-level and high-level features that belong to one perceptual object; Korjoukov et al., 2012). Second, it cannot be solved by matching existing object knowledge to coarse stimulus representations (Korjoukov et al., 2012). With respect to the findings by Schmidt and Schmidt (2009), this suggests that the categorization of large vs. small relies on high-level information (e.g., because the decision boundary is inherently relative) that is not part of a specific object template and thus not part of a feedforward processing pathway.[27]
[25] However, these effects are also subject to influences of perceptual learning. For example, car experts show an early availability of subordinate categorizations of cars (Curby & Gauthier, 2009). It follows that even categorizations on subordinate levels can be implemented by feedforward processing when the respective pathways are established by experience.
[26] Note, though, that these categorizations are not based on the analysis of simple image statistics (i.e., the relative amount of high spatial-frequency energy in the vertical and horizontal orientation; Wichmann, Drewes, Rosas, & Gegenfurtner, 2010).
[27] Interestingly, Bacon-Macé, Kirchner, Fabre-Thorpe, and Thorpe (2007) found that the earliest phase of motor responses does not depend on whether participants respond to natural images in a categorization task or in a discrimination task (i.e., deciding which of two simultaneously presented images contains an animal). This indicates that both tasks are based on feedforward processing.

Another relevant finding with respect to the processing of natural images is the dependency of even the earliest motor responses on high-level context. Joubert, Fize, Rousselet, and Fabre-Thorpe (2008) asked their participants in a categorization task to respond to the presence of animals in natural (congruent) or urban (incongruent) contexts. They found that even in the earliest phase of the behavioral responses, performance depended on context congruency. The authors argue for a model in which the neuronal populations of the ventral stream that respond selectively to animals co-activate other neuronal populations. Specifically, these other populations are those that are activated by stimuli usually occurring together with animals in our visual environment (i.e., neurons specific for natural contexts). This co-activation should facilitate responses. In contrast, when an animal is presented within an incongruent urban context, the respective neuronal populations of animal and context would compete for the motor response, which would impede responses. This conflict might be present all along the visual stream due to bidirectional interactions between neuronal populations and could thus be part of the first feedforward phase of processing (Joubert et al., 2008). In terms of the incremental grouping theory, this would imply that base grouping depends not only on the object and the low-level features of its context but also on information about a class of contexts irrespective of their different low-level features (e.g., the class of natural contexts).[28]
In sum, feedforward transmission of visuomotor responses depends not only on base vs. incremental grouping of the respective stimuli but also on the available processing pathways. The type of neural coding in the feedforward pathways of base grouping is most probably temporal coding rather than rate coding (i.e., based on the temporal distribution of the first spikes in the wavefront and not on firing rates). Finally, incremental grouping and base grouping are not dichotomous classes of grouping but the two ends of one continuum, and one can be transformed into the other by processes of perceptual learning as well as by stimulus and context factors.
Based on the latter insight that base and incremental grouping are located on a continuum, I discuss why the primed flanker task is particularly suited to study perceptual grouping in the framework of the incremental grouping theory.