Visual Cues - Exploiting referential gaze for uncertainty reduction in situated language proces

without the visual stimuli, participants heard sentences including the target referent (In 1969 Neil Armstrong was the first man to set foot on the moon), its shape competitor (tomato), or an unrelated distractor (rice). They found that the N400 amplitude was significantly weaker for the semantic violation including the referent of similar shape than the unrelated distractor. The authors argue that predictive language processing is not limited to functional semantic information, but can also involve the activation of visual representations.

2.2 Visual Cues

2.2.1 The Gaze Cue

Eye gaze is a constantly available source of information that is frequently used to understand another person’s intentions and emotions (see Baron-Cohen, Wheelwright, & Jolliffe, 1997). It is a very expressive social cue, commonly interpreted as such even without being intention- ally used as a communicative tool. Also, it is unique as a communication cue because it is difficult to suppress and fake its expression, unlike a smile, for instance. Some aspects of its use and interpretation are culture-dependent, such as establishing mutual gaze, reading emotions from one’s eyes, and the use of mutual gaze for turn-taking in communicative contexts (see e.g. LaFrance & Mayo, 1976; Greenbaum, 1985; Schofield, Parke, Castañeda, & Coltrane, 2008).

A study comparing American and Japanese participants found that in an attempt to interpret emotions, the Americans focused on the person’s mouth, while the Japanese focused more on the eyes (Yuki, Maddux, & Masuda, 2007). The authors argue that this is due to cultural differences, namely, whether emotions are more readily expressed verbally, or ought to be suppressed and are understood only from subtle spontaneous involuntary signals, such as those made by the eyes.

Moreover, mutual gaze depends on the relationship between the interlocutors, in terms of intimacy and power relations. Closer physical proximity has been shown to accompany less mutual gaze, which seems to reflect higher intimacy (Argyle & Dean, 1965). Also, power relations are an important predictor of the amount of mutual gaze. The more people were looking at their interlocutor while speaking and less while listening, the higher the social power they were seen as having (Dovidio & Ellyson, 1982).

The use of mutual gaze for turn-taking in face-to-face communication is commonly such that the speaker tends to look away when planning and uttering longer utterances, but reestablishes eye-contact when approaching the end of her turn, and wishing to give the listener the opportunity to speak (Kendon, 1967). Planning an utterance requires effort;

18 Sentence Processing and Visual Cues

hence, by looking away the speaker ignores the current informational input and concentrates on the output she is trying to produce. Therefore, listeners are more likely to start talking after a mutual gaze with the speaker. However, other cues are also used to establish the appropriate time to start talking, such as intonation, voice pitch and gestures (Duncan, 1972).

Importantly for our present purpose, eye-gaze is often synchronized with language production. While speaking about objects present in their visual context, speakers tend to look at a relevant object before mentioning it. Object-directed, that is, referential speaker gaze usually appears 800–1000 ms prior to uttering the linguistic reference (Meyer et al., 1998; Griffin & Bock, 2000). Meyer et al. (1998) found that people look longer at objects that require low frequency words to be labeled. The authors argued that objects are inspected not only until they are identified, but until their phonological form is retrieved. Also, objects of low codability (having multiple similarly dominant labels) are found to be inspected longer before being named (Griffin, 2001).

Listeners tend to reflexively attend to the direction of the speaker’s visual attention. The understanding, from the side of the listener, that looking at an object means seeing it, and is thus directly related to visual attention, can lead to joint attention – the listener actively attending to the same thing that the speaker is attending to. If aware of this, the speaker may choose to use their eye gaze in order to subtly point towards a certain direction, as a communicative act in relation to the listener – shared attention (Emery, 2000). The ability to engage in joint and shared attention is argued to have evolutionary relevance (Baron- Cohen, Baldwin, & Crowson, 1995; Flom et al., 2007; Emery, 2000). If we understand that looking at something means perceiving it, what is relevant for our “neighbor” is potentially of importance to us too, be it food, or danger of some kind.

Moreover, listeners tend to use the direction of speaker gaze to identify the target referent prior to its explicit mention, thereby facilitating reference resolution. In their seminal study, Hanna and Brennan (2007) found that listeners utilized the direction of speaker gaze even in a condition of mirrored visual displays, where the speaker’s gaze cue towards left, for them, actually meant that the relevant object was to be found on their right side. In general, findings about gaze-following suggest that listeners tend to have an initial reflexive response, after which the intention is being read from the gaze (Friesen & Kingstone, 1998; Driver et al., 1999; Langton & Bruce, 1999).

A group of studies have shown that gaze-following aids sentence processing. Aiming for a fully controlled environment, Staudte and colleagues conducted a series of experiments where they utilized robots and virtual agents as speaker figures and examined how listeners interpret referential gaze coupled with speech (Staudte & Crocker, 2011; Staudte et al., 2014). Listeners engaged in gaze-following and used the direction of robot gaze to anticipate the

2.2 Visual Cues 19

upcoming referent. As measured by response times at sentence validation, congruent robot gaze facilitated comprehension (relative to neutral gaze), while incongruent gaze disrupted it (Staudte & Crocker, 2011).

Moreover, Knoeferle and Kreysa (2012) found evidence that gaze-following is robust to observing the speaker from an angle (rather than frontally). The facilitatory effect of gaze varied depending on the syntactic structure and thematic role relations in the sentence (SVO vs. SOV), and it extended to the post-trial comprehension processes. The authors conclude that the effects of speaker gaze-following contribute to visual attention and comprehension processes and should be accommodated by accounts of situated language processing.

Finally, complementing the mentioned findings from behavioral studies, based on the ob- servation that gaze-following facilitates (or disrupts) listeners’ performance on a subsequent task, Jachmann et al. (2017) examined the effect of gaze-following more directly. In an ERP study, they manipulated gaze congruency and found an increased N400 for the incongruent gaze, as well as for the neutral (uninformative) gaze. The incongruent gaze condition also showed a late (sustained) positivity that reflects the need to update the previously created sit- uational model. This study provides further evidence that listeners exploit speaker referential gaze in order to form expectations for the upcoming referent.

2.2.2 Reflexive Cue Following

Much evidence has been gathered establishing eye gaze as a means to communicate visual attention (Frischen, Bayliss and Tipper, 2007), as well as socially relevant information, like focus of interest, thoughts and intentions (e.g. Baron-Cohen et al., 1997). Moreover, gaze-following also enables language acquisition, and the development of theory of mind (Tomasello, 1995). Inspired by such findings, it has been argued that gaze-following might represent a unique attentional process. Even though many studies have contrasted the gaze cue with other visual directional cues, such as arrows, conflicting results have been collected, leading some researchers to propose that gaze attentional effects are at least partially driven by a domain-general attentional processing (e.g. Santiesteban, Catmur, Hopkins, Bird, & Heyes, 2014).

Friesen et al. (2004) found gaze-following to be more resistant to voluntary control than the following of directional arrow cues. In the counterpredictive cuing paradigm (target more likely to appear in the direction opposite the cued one), better performance was found for the cued location when gaze was the examined cue, while in the case of an arrow, the cued location was not considered. However, Tipples (2008) found that both cues inspire reflexive attention shifts. Moreover, Ricciardelli, Bricolo, Aglioti, and Chelazzi (2002) found differences in overt orienting between gaze and arrow, while Kuhn and colleagues found only

20 Sentence Processing and Visual Cues

small or no differences (Kuhn & Benson, 2007; Kuhn & Kingstone, 2009). Bayliss, Paul, Cannon, and Tipper (2006) found that objects inspected by other people are perceived as more likable than others, an effect which was not found for arrows. Similarly, an improvement in memory accuracy for the cued information was found with gaze, but not with arrows (Dodd, Weiss, McDonnell, Sarwal, & Kingstone, 2012; Gregory & Jackson, 2017).

Importantly to our present cause, a series of experiments contrasted referential gaze and arrows as visual cues used simultaneously with speech (Staudte et al., 2014). The authors aimed at understanding whether the influence of referential speaker gaze on sentence processing goes beyond visual cuing. They found that when gaze is made as precise as arrows, no qualitative difference is to be found between the two cues’ influence on the comprehension of linguistic material.

Finally, Böckler et al. (2011) contrasted gaze from schematic faces vs. gaze from photographs of humans. They set out to examine whether observing two people engaging in eye contact (or not) modulates the observer’s following of their synchronous gaze cue. Evidence was found for the facilitatory effects of gaze following on participants’ reaction times only for the gaze cues that followed mutual gaze between the two faces shown. Moreover, and more interestingly for our present cause, in their study Böckler et al. (2011) also compared scenes where actual photographs of human faces were used with schematic representations of faces. The abovementioned effect was present in both representation styles, showing that people engage in gaze-following even with more symbolic representations of the eyes.

In document Exploiting referential gaze for uncertainty reduction in situated language processing : an information-theoretic approach (Page 43-46)