Attention with visual secondary programme content

The visual presentation of SPC raises interesting questions regarding how users direct their attention. The user must choose whether to look at the secondary device or the main screen. As part of this decision, the user must sacrifice the information on the other, unattended screen. Researchers have sought to understand how users divide their attention in these scenarios, using either eye-tracking (Holmes et al., 2012; Brown et al., 2014) or video analysis of gaze direction (Dowell et al., 2015). The manner in which users distribute their attention across screens may be affected by their level of interest in the content on the main screen. In their study using a nature magazine show (“Autumnwatch”), Brown et al. (2014) noted that participants were more likely to view the second screen when the programme cut to the studio than during segments shot on location. In this context, users were apparently less visually engaged by the studio footage than by the prospect of the secondary content. Dowell et al. (2015) suggest, however, that the content of the auditory channel may also be a factor in when users choose to orient attention to the secondary device. In traditional television or film production, the audience’s attention is directed by the programme maker using auditory and visual cues (e.g., focus, lighting and framing) (Carroll, 2003). Giving the audience a choice between two concurrent streams breaks this tradition and introduces the possibility of missing information. The choice can itself cause difficulties, with users not knowing where to look, as was reported by Dowell et al. (2015). It also raises the issue of how to notify the user of the presence of new VSPC. Neate et al. (2015) compared the use of notifications on the main screen (still or shaking) and non-speech auditory notifications (earcon or auditory icon) with content appearing on the secondary device (still or shaking). Earcons were found to elicit the shortest reaction times, significantly shorter than those for the visual notification on the main screen, and were considered more effective at attracting attention than content appearing with no notification. Participants’ preference ratings, however, indicated that auditory icons and the TV-based alerts were preferred to the condition with no alert.

While the cost of diverting visual attention away from the visual information in the MPC is apparent, the effect of VSPC on the information from the soundtrack is less clear. The multiple resource theory of attention suggests that the degree of interference that occurs between concurrent tasks is governed by the extent to which they draw from the same pools of resources (Wickens, 1980, 1984). The model of multiple resources has four dimensions: modality, stage of processing (perception, cognition or response), coding (spatial or verbal) and visual channel (focal or ambient) (Wickens, 2002) (see Figure 6.1). It follows that the modal separation between the visual SPC and the auditory MPC soundtrack does not in itself rule out interference between the streams. In fact, it seems likely that the amount of interference that occurs is dependent on the nature of the information present in the MPC and SPC at any time. VSPC is likely to comprise text and/or images, whilst the MPC soundtrack is likely to comprise speech, music and sound effects (i.e., sounds of on-screen action and ambient sounds in the scene) (Butler, 2007). Butler (2007) describes four purposes of television audio: attracting and maintaining the audience’s attention, aiding understanding of the narrative and meaning, creating smoother transitions when cutting between scenes, and providing the illusion of continuity across shots within a scene that have been shot non-sequentially. The type of information conveyed by sound, its importance and the type of mental processing it demands are therefore dependent on the type of sound and its usage. For example, an on-screen conversation between two main characters may convey important plot information and require the audience to process the speech and comprehend it. Conversely, off-screen speech of minor characters in a scene may be present to convey context (e.g., background chatter in a busy cafe) and is required only for the audience to perceive its presence.

Figure 6.1: Multiple resource model of attention (adapted from Wickens (2002, p. 163))

The presentation of images alongside music, speech and effects is a device commonly exploited within the televisual experience. From the multiple resource model of attention (Wickens, 1980, 1984, 2002), the benefits of this bimodal presentation are argued to arise from the reduced cognitive demands compared to speech and the streams not drawing on common resources. It is therefore assumed, when displaying a slide show of photos on a secondary device, that the two streams of information do not impede each other. Care is necessary here, as this is not to say that the two streams do not have an effect on each other. The editorial use of sound in television to support, contradict, or emphasise elements of the footage (Butler, 2007) clearly displays the potential of sound to impact a user’s interpretation of an image. Of the possible scenarios posed by VSPC experiences, the presentation of additional text alongside MPC dialogue or narration seems most likely to lead to disruption. Both speech and text are verbal codes and require cognition for processing. Multiple resource theory would therefore suggest that a significant amount of interaction is to be expected. In the study of interactions between visual processing of textual information and verbal stimuli, many authors have made use of list recall measures (e.g., Salamé & Baddeley, 1987, 1989; Jones et al., 1992; LeCompte, 1994). This, however, is not necessarily representative of the process that would be utilised during normal reading (Baddeley, 1997). Experiments in which participants have been asked to process different speech and text stimuli simultaneously appear to show that participants are unable to attend to both streams (Mowbray, 1953, 1954). It seems likely, therefore, that when users are reading textual material in SPC they are choosing to process the textual materials at the cost of the concurrently presented speech.
Even in selective attention tasks where participants have been asked to read text in the presence of irrelevant speech, researchers have found detrimental effects on comprehension (Martin et al., 1988; Oswald et al., 2000; Sörqvist et al., 2010) or, when time was not limited, increased reading times (Cauchard et al., 2012). Furthermore, work specifically focussing on the impact on reading of background television content containing a lot of spoken content has found detrimental effects on reading comprehension (Armstrong et al., 1991; Armstrong & Chung, 2000; Ylias & Heaven, 2003). It would therefore seem that reading textual SPC during the presentation of MPC containing speech is likely to have a detrimental impact on the comprehension of the SPC compared to an asynchronous experience.

Though users are likely to be employing selective attention in concurrent speech and reading scenarios, some information from the ignored speech stream may still be processed. The finding that the detrimental effect of speech on reading comprehension is dependent on the speech being meaningful implies that some semantic processing is occurring (Martin et al., 1988). Brown et al. (2014) report that users returned their gaze to the main screen after an exclamation in the MPC soundtrack. This could be seen as a sign that users were able to process speech from the MPC. It seems likely, however, that a salient auditory event such as a vocal exclamation would draw attention regardless of whether the rest of the speech was being processed. Furthermore, while textual content was present in the SPC, it was generally quite short and was presented alongside images. Demands on users’ linguistic resources may therefore have been sufficiently small as to have had minimal effect.

The impact of music playing during reading is somewhat more complex. Firstly, in terms of dual task performance it is difficult to assess what informational content a listener in the real world would extract from music. There is, however, some evidence of added music altering emotional evaluation of the text (Cassidy & MacDonald, 2007). Considering the multiple resource model of attention (Wickens, 1980, 1984), one would not expect instrumental music and reading to interact to the same degree as speech due to the non-verbal nature of instrumental music. Studies on the impact of background music on reading have, however, still found detrimental effects from both lyrical (Martin et al., 1988; Furnham & Strbac, 2002; Avila et al., 2012; Perham & Currie, 2014) and instrumental music (Avila et al., 2012) when compared with quiet conditions. Nevertheless, in comparison to speech, it seems likely that instrumental music is less disruptive to reading (Martin et al., 1988; Cauchard et al., 2012). This said, music is hugely variable in both its spectro-temporal characteristics and its affective qualities, which makes generalised comments on the effect of music problematic. For example, there is some indication that music which is more aggressive (Cassidy & MacDonald, 2007), or has a higher tempo and greater intensity (Thompson et al., 2012), is more detrimental to reading.

Considering the sound effects and ambient content in an MPC soundtrack, there exists considerably less applicable research. The atmospheric components of the soundtrack face similar difficulties to music in that it is hard to characterise the information they convey or to generalise their acoustic characteristics. Research looking at the impact of environmental noise on reading has revealed some detrimental effects which are similar to those caused by music (Furnham & Strbac, 2002; Cassidy & MacDonald, 2007). It is noteworthy, however, that these studies included some speech within the environmental noise (mumbling in the case of Furnham & Strbac (2002)). It is difficult to determine how applicable this would be to the atmospheric sounds in television. Also, as most reading occurs in the presence of some ambient sound, it seems that the presence of some environmental noise is not overly disruptive to reading.

To summarise, it appears that with the presentation of textual SPC the user must decide whether to miss the MPC speech, or the information in the SPC. Furthermore, should they choose to devote attentional resources to the SPC, reading the content will be more difficult than if it was read in isolation.
