
The Interplay between Language Processing and Non-linguistic Visual Context

4. Arguments for the Inclusion of Social Cues into Real-time Situated Language Processing Accounts

4.1 The Interplay between Language Processing and Non-linguistic Visual Context

As Section 2 has shown, incoming linguistic input is integrated and processed incrementally and in real time. It can influence the processing of, and be influenced by, direct and indirect visual cues. A reason for this is that in situated language processing our eye-movements reflect language processing systematically, though not directly (Huettig, Rommers, & Meyer, 2011). In interpreting language-mediated eye-movements, we assume that they are guided by (active) mental representations (Andersson et al., 2011). Hence, eye-movements indicate a shift in attention towards the to-be-fixated object (Rayner, McConkie, & Ehrlich, 1978; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). Moreover, as language comprehension is, at least for adults, a highly routine skill, the eye-movements mediated by language arguably reflect different degrees of automaticity, depending on task demands, the situational context and the incoming linguistic information (Mishra, Olivers, & Huettig, 2013). For adults, even in complex visual scenes, it takes only a few hundred milliseconds to launch an eye-movement to a depicted referent after the linguistic input has been processed (Allopenna, Magnuson, & Tanenhaus, 1998; Andersson et al., 2011). In turn, the visual context also rapidly affects language processing and comprehension (e.g., Knoeferle et al., 2005; Knoeferle et al., 2008).

Recall that Knoeferle et al. (2005), for example, showed that visually depicted event relations, realized through depicted actions between agent and patient characters, influence how linguistic input is processed and interpreted on-line. In this case, eye-movements towards objects occur before the linguistic input has referred to these objects, i.e., anticipatory eye-movements can be observed. Recall also that Knoeferle and Crocker (2006) demonstrated that not only visual and linguistic information but also real-world knowledge is used to form an event interpretation. Given our world knowledge, we hence generate expectations on the basis of visual and linguistic contextual information and shift our attention to (visual) objects that are in line with these expectations (Chambers, Tanenhaus, & Magnuson, 2004; McRae et al., 2005).

Moreover, not only adults as language experts, but also children and even very young infants direct their eye-movements in response to linguistic and visual input (e.g., Bergelson & Swingley, 2012). Children’s eye-movement behavior resembles that of adults in that they launch an eye-movement to the corresponding visual target almost immediately (within about 150-200 ms; Allopenna et al., 1998; Tanenhaus et al., 1995) after hearing the visual referent’s name (Snedeker & Trueswell, 2004; Trueswell et al., 1999). Recall also that children can already predict upcoming linguistic input given a constraining visual display. For example, when they look at a visual display of one edible object among distractor objects, hearing "The boy will eat the…" elicits anticipatory eye-movements at the verb region towards the edible object before this object is mentioned (e.g., Mani & Huettig, 2012; Nation et al., 2003).

That the processing of language and visual context information is tightly coupled in both adults and children is also supported by ERP evidence. Showing adults and 19-month-old infants pictures of objects that were named either congruently or incongruently, Friedrich and Friederici (2004) found a similar N400 effect for incongruous (vs. congruous) object names in infants and adults, although topography and latency differed. Hence, 19-month-old infants already show the beginnings of adult-like mechanisms of semantic interpretation.

However, simply measuring people’s eye-movements while they inspect a given scene and listen to language does not allow us to draw conclusions about the underlying nature of language processing. In order to do so, we need to specify a so-called linking hypothesis that describes the relationship between eye-movements and linguistic processes in more detail, i.e., we set up an assumption about the relationship between our eye-tracking data and the underlying cognitive processes (Knoeferle, 2015a), depending on the type of hypothesis we want to investigate. Our interrogation of the visual environment is highly sensitive to pragmatic, semantic, and referential processes, and even to the interpretation of abstract language. As a result, eye gaze can provide a very useful tool for investigating visually situated language comprehension (Knoeferle, 2015a). Making use of such a linking hypothesis should therefore answer questions about when, how and why a shift in visual attention towards visual contextual information occurs and what the nature of its relationship to the unfolding linguistic input is (Salverda et al., 2011).

Nevertheless, investigating how and when language processing (of different linguistic structures) and the visual context (i.e., the perception of different visual contextual cues) interact and influence each other in real time only provides the basis for developing a real-time language processing account that will eventually model how language comprehension functions.

Even though many psycholinguistic processing accounts, models and frameworks have been developed over the past three decades, not all of them consider language processing in real time. Moreover, even those that look at on-line language processing are mostly language-centric (see Knoeferle, 2015a for a related argument) or only include the influence and impact of direct referential non-linguistic visual cues.

However, as Section 3 has argued, visual contextual cues are highly diverse and do not just comprise visual contextual information such as depicted characters and actions. Even though we are aware of the impact that our social environment and indirect social visual cues (such as emotional facial expressions) have on the way we process linguistic input, real-time language processing accounts have to date remained largely silent on the integration of these cues.

Integrating these cues is, however, essential when the goal is to develop an account that can explain how language processing, with all its associated real-world influences, happens in real time in the mind of the comprehender. One way this could be achieved will be exemplified in Section 10.8.1. Using our own findings, we will demonstrate how the CIA (Knoeferle & Crocker, 2006; 2007) could be extended to also include more indirect visual social cues, as well as listener characteristics such as age. The following section (4.2) will therefore present the most prominent language processing accounts, frameworks and models and point out their limitations. We will further argue for an adaptation of real-time language processing accounts regarding the integration of social (visual) information (4.3), based not only on the findings discussed in Section 3 but also on a developmental perspective (4.3.1). Following our argument that language processing and social information are tightly linked from the beginning of our lives, we will discuss the CIA as a suitable candidate to be enriched with the influence of social cues and listener characteristics (4.4).