4. Arguments for the Inclusion of Social Cues into Real-time Situated Language Processing Accounts
4.2 Language Processing Accounts and their Limitations
To date, no real-time language processing account has considered the impact that social factors (such as those discussed in Section 3) or listener characteristics such as age might have on our language processing mechanisms. This is true for constraint-based models of language processing (cf., MacDonald, 1994), for situation models (cf., Zwaan, 2014), for expectancy generation models (cf., McRae et al., 2005), for situated theories of cognition (cf., Myachykov, Scheepers, Fischer, & Kessler, 2013), and for processing accounts of situated language comprehension (CIA, Knoeferle & Crocker, 2006, 2007). However, if language is a means of maintaining social behavior (Semin, 1995), and if the knowledge we have about social interaction is inseparably linked to language (Semin & Fiedler, 1992), then what we need is an interface between, or an integration of, accounts of language processing and accounts of social information processing.
Theories of language comprehension and communication have contributed insights into how context affects language processing. Communication Accommodation Theory (CAT; Giles, Coupland, & Coupland, 1991) focuses on how interlocutors adapt to one another, e.g., through speech, vocal patterns, gestures, and accents, but also through social norms and social situations. According to CAT, people converge with or diverge from their communicative partners. The more closely the partners' behavior and speech patterns come to resemble each other, the more they converge and the more they sympathize with one another.
Likewise, the account of linguistic style matching (LSM) proposed by Niederhoffer and Pennebaker (2002) holds that a speaker's word use unconsciously primes the listener's response, and thus the listener's word use. According to Niederhoffer and Pennebaker (2002), this unconscious priming implies that conversation partners who match their linguistic styles organize their minds psychologically in a similar manner.
Even though CAT relies heavily on social factors, and even though we seem to match our linguistic styles unintentionally when we interact with an interlocutor, little attention has been paid to the effects of the visual context (e.g., visual features hinting at social status, gender, race, emotional expression, etc.) on language processing. Moreover, CAT and LSM focus on how communication works as a social phenomenon rather than on the underlying language processing mechanisms. The indexical hypothesis by Glenberg and Robertson (1999) already highlights the importance of experiential components (which imply social constructs, beliefs, and experiences) and their indexical nature for language comprehension, but it does not make any claims regarding the time course with which this kind of information is integrated into language processing.
Situation models (Zwaan, 2014), expectancy generation models (e.g., McRae et al., 2005), and models of interactive alignment in dialog (Garrod & Pickering, 2009; Pickering & Garrod, 2009) take world knowledge, the situational (visual and non-visual, linguistic and non-linguistic) context, the expectancies we generate, our experiences, and the behavior of our interaction partner into account when incrementally integrating and processing information. In addition, subsequent input can change how we interpret previous input. Information processing in these latter accounts occurs on-line, and the resulting interpretation can be updated continuously as new input arrives (see also Anderson et al., 2011).
Although these approaches make use of rich social and situational cues, they lack detailed assumptions about how language processing unfolds, i.e., which information is processed when during language comprehension, and how exactly these different cues affect the comprehension and interpretation of an utterance in real time.
However, a situation model as discussed by Zwaan (2014) might be a suitable higher-level framework for accommodating a more specific account of language processing such as the CIA. This issue will be addressed further in Section 10.8.1.1.
Conversely, most serial language comprehension accounts (e.g., Frazier & Fodor, 1978; Friederici, 2002) include little contextual information and postpone context effects to relatively late processing 'stages'. Other theories foreground a rapid interaction between syntactic and non-syntactic information sources (e.g., MacDonald, Pearlmutter, & Seidenberg, 1994; Trueswell & Tanenhaus, 1994). In these probabilistic theories, candidate analyses of the input are ranked according to probabilistic constraints that compete with each other. However, constraint-based accounts apply only when a linguistic competitor is present. Extralinguistic information is not directly linked to the linguistic input, and the interpretation of a sentence cannot be updated. Rather, these accounts model the competition between two linguistic interpretations, not the way interpretations are built incrementally.
Yet one could argue that the interpretation of a sentence can be updated through the competing constraints: how much support a specific interpretation receives during sentence processing can change as a function of the constraints that become available at each word. Nevertheless, these accounts do not explicitly include a rich non-linguistic context that could provide social information. Furthermore, even though they might be extended to include social and visual information, they do not directly model language comprehension in real time. Thus, a real-time language processing account arguably provides a more suitable basis for enrichment with direct and indirect visual cues.
Recall that Tanenhaus et al. (1995) investigated the rapid interaction of available structural and visual constraints during language processing. Using eye tracking, they showed that people shift their gaze towards an object immediately after it is mentioned, suggesting that comprehension is incremental. Additionally, the visual context resolved local structural ambiguity in the linguistic input. These results support a theory of language comprehension that combines extralinguistic and linguistic information in real time (see also Novick, Thompson-Schill, & Trueswell, 2008).
As mentioned, however, existing approaches to real-time language processing have not yet considered social cues. Some accounts (Knoeferle & Crocker, 2006, 2007) and computational models (Crocker, Knoeferle, & Mayberry, 2010) do include at least actions and events, in addition to objects, as visual context (but no indirect social cues of the kinds discussed in Section 3.4). Visual cues can help listeners to rapidly disambiguate sentences and assign thematic roles (see also Altmann & Mirković, 2009). Still, none of these approaches includes social aspects in the processing cycle.
4.3 The Importance of Including Social Cues in Real-time Language