3. The Use of Visual Context for Language Processing
3.2 Information Extraction from the Visual Context
Our visual-cognitive system controls our eye-movements and directs our gaze to informative regions in our visual field. One factor that is crucial for visual and cognitive processing is attention, as we fixate the regions in a scene that draw our attention. Hence, eye-movements provide a behavioral measure of the ongoing processes in our attentional system (Henderson, 2003).
However, we may sometimes miss visual changes in a scene, even if we have inspected the scene attentively. In a flicker paradigm, Rensink, O'Regan, and Clark (1997) asked participants to look at scenes that repeatedly alternated with a modified version of the same scene. The modified scene contained a visual change, and participants were asked to name the type of change and the part of the scene that was changing. Even though the changes were substantial, participants often failed to detect them or needed considerable time to spot them. However, change detection improved significantly for salient scene information when the change happened while the changing object was the current center of attention. Performance did not improve when the object had been attended at an earlier point in time (Rensink et al., 1997), supporting the tight coupling between attention and vision.
Furthermore, attention and vision are highly task-driven processes in that task knowledge can be used to control and guide eye-movements (Rensink et al., 1997). The cognitive goals of the observer depend on the task s/he is performing. The task in turn influences the observer’s eye-movement behavior: people, for example, fixate an object immediately before reaching for it (Salverda, Brown, & Tanenhaus, 2011).
Nevertheless, it is not only task-related knowledge that guides our visual attention, and our ability to control our gaze does not rely on the present visual input alone. We can also access information from our short- and long-term memory of a scene. Additionally, we use our world knowledge to determine which objects in our visual field are salient and relevant for us to look at. We can, for example, detect objects more quickly if they are consistent with the scene, i.e., we identify a kettle more quickly in a kitchen than in a bathroom scene, because our stereotypical knowledge tells us that a kettle is more likely to be found in a kitchen than in a bathroom (Henderson, 2003; Palmer, 1975).
However, even though we draw on our world knowledge to control and guide our eye-movements, it seems that we still rely more on information that is actually depicted in a scene than on what we believe is stereotypical. In a visual-world eye-tracking study, Knoeferle and Crocker (2006) presented participants with agent-action clip-art scenes. Participants listened to sentences containing a verb that identified two agents: one was a stereotypical agent of the verb (e.g., a detective for spying). The other agent was not a stereotypical agent of the verb (e.g., a wizard) but was depicted as performing the action mentioned by the verb (spying). Thus, participants had to choose between two conflicting thematic role cues. The eye-movement data suggested that participants were able to use both cues but preferred only one of them when the two conflicted. In the conflicting condition, they fixated the non-stereotypical agent performing the action mentioned in the sentence more than the stereotypical agent who performed an unrelated action that was not mentioned. In assigning thematic roles, participants hence prioritized the depicted event information (the spying action) over their stereotypical world knowledge (Knoeferle & Crocker, 2006).
It thus seems that our notion of the visual context must be relatively broad. We use information not just from the immediate visual context, but also from past visual experiences. We draw on our world knowledge and stereotypical knowledge and make extensive use of cognitive goals and task demands in directing our attention to objects and events in the environment (see also Knoeferle & Guerra, 2012, for more details on the notion of non-linguistic context in language comprehension).
Nevertheless, the various types of information in the visual context seem to differ, both in themselves and in how and to what extent we exploit them during language comprehension. If we want to account for these different kinds of visual cues in language processing accounts, we need to be aware of their differences. The next sections will hence discuss how two groups of visual contextual cues can be used in language comprehension, namely direct visual cues (see Section 3.3 for our definition of ‘direct cues’) and more indirect visual social cues (see Section 3.4 for our definition of ‘indirect cues’). Additionally, we do not yet know whether these different visual cue types can be used to the same extent and with the same time course, or whether they can rapidly interact with each other. Section 3.5 will hence discuss the cumulative use of different direct and indirect cues. As these sections will show, not all visual cues are processed in the same way, nor can all of them be processed by children and adults alike. Following up on this, we will argue more explicitly that direct and indirect visual cues are processed differently (Section 3.6) and will then review and discuss this issue further in Section 3.7. Using emotions as an indirect social cue, we will demonstrate that children and older adults, but not younger adults, prefer positive over negative emotional material. Thus, Section 3.7 will focus on age differences in emotion recognition and interpretation.