Chapter Three Methodology

3.3 Capturing Multimodal Relationships

As explored in Chapter 2, the sign language interpreter’s source text is a multimodal one, and her task is one of audiovisual translation. The performance text constructs meaning through the interplay of various resources, such as the spoken words, diegetic sound, the actors’ movements, and their interactions with the set and props. The interpreter’s rendition is based on the particular topography of space, movement and speech contained in the performance; the cross-modal translation is delivered as a live simultaneous interpretation.

The analytical framework must enable the capture of specific features of the play. The physical orientations of the characters and their directions of address as they deliver their lines of text, for example, must be identified and captured separately. However, as Shi et al (2004:1) state, ‘for multimodal communication, temporal synchrony and relationships are critical’, and here the analysis of these features is supported by their intersection on the timeline of the annotation tool ELAN, as elaborated in 3.3.1. The relevant features in the rendition are captured in the same way, so that the intersection of the annotated features in the performance and the rendition, and the relations between them, may be identified and compared.
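The notion of features intersecting on a shared timeline can be illustrated with a simple interval-overlap computation. The Python sketch below is illustrative only: the tier contents, labels and millisecond times are hypothetical rather than drawn from the corpus, and it makes no claim about how ELAN itself computes overlaps.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start_ms: int   # start time on the shared timeline
    end_ms: int     # end time on the shared timeline
    value: str      # the annotated feature


def overlaps(tier_a, tier_b):
    """Return (value_a, value_b, start, end) for every pair that shares time."""
    result = []
    for a in tier_a:
        for b in tier_b:
            start = max(a.start_ms, b.start_ms)
            end = min(a.end_ms, b.end_ms)
            if start < end:  # a non-empty shared stretch of the timeline
                result.append((a.value, b.value, start, end))
    return result


# Hypothetical tiers: lines of text and directions of address.
speech = [Annotation(0, 2500, "line 1"), Annotation(3000, 5200, "line 2")]
address = [Annotation(0, 1800, "to character B"),
           Annotation(1800, 5200, "to audience")]

for row in overlaps(speech, address):
    print(row)
```

Each printed row pairs a line of text with the direction of address that co-occurs with it, together with the stretch of time the two share; the same comparison could in principle be made between a performance tier and a rendition tier.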

3.3.1 The Annotation Tool

As noted in 3.2, the corpus is made up of audiovisual recordings of each performance and the respective signed renditions. The study requires a platform that enables two separate video streams (one of the performance, and one of the interpreter) to be temporally synchronised, allowing independent annotation of each stream.

This investigation uses ELAN (the EUDICO Linguistic Annotator, EUDICO standing for European Distributed Corpora), a multimedia annotation tool developed at the Max Planck Institute for Psycholinguistics. Whilst there is a range of tools available for the annotation of video, such as ANVIL (Cassidy and Schmidt, 2017), ELAN facilitates the streaming of both the performance film and the interpreter film, time-aligned as they happened in the event; allows the user to create, edit, visualise, and search annotations for video and audio data; and, in particular, is designed for the analysis of language, sign language and gesture (Drew and Ney, 2008).

Over the past decade ELAN has become the most widely used annotation tool in the study of sign languages, language and gesture, and multimodal texts (Crasborn et al, 2006:82; Schembri et al, 2013; Meyerhoff et al, 2015; Nagy and Meyerhoff, 2015; Cruz et al, 2015; Turchyn et al, 2018), due to its functionality and flexibility. It automatically time-aligns media and annotations and allows the user to work with an unlimited number of annotation tiers, and multiple tiers can be assigned to each participant in a video file. Crucially, this allows us to record data on multiple tiers, and the file can be saved as a template so that future files can be created with the same participant structure. Searching is also very flexible: the user can search multiple files for very specific ‘tokens’, jump from search results to the corresponding points in the texts, and export the resulting concordance to a text file.
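For readers who wish to work with annotation files outside the ELAN interface, the sketch below shows how tiers and time-aligned annotations might be read from an ELAN file and searched for a token. It assumes the XML layout of ELAN's EAF format (time slots referenced by alignable annotations within tiers); the embedded fragment is heavily simplified, and the tier names, participants and annotation values are hypothetical.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical EAF fragment; real files carry further metadata.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="400"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2900"/>
  </TIME_ORDER>
  <TIER TIER_ID="Perf-Address" PARTICIPANT="Performance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>to-audience</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="Interp-Address" PARTICIPANT="Interpreter">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>to-audience</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Map each tier ID to a list of (start_ms, end_ms, value) tuples."""
    root = ET.fromstring(eaf_text)
    # Assumes every time slot is aligned, i.e. has a TIME_VALUE attribute.
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
             for s in root.iter("TIME_SLOT")}
    tiers = {}
    for tier in root.iter("TIER"):
        tiers[tier.get("TIER_ID")] = [
            (slots[a.get("TIME_SLOT_REF1")],
             slots[a.get("TIME_SLOT_REF2")],
             a.findtext("ANNOTATION_VALUE", default=""))
            for a in tier.iter("ALIGNABLE_ANNOTATION")
        ]
    return tiers


def search(tiers, token):
    """Return (tier_id, start_ms, end_ms, value) for annotations containing token."""
    return [(tier_id, start, end, value)
            for tier_id, rows in tiers.items()
            for start, end, value in rows
            if token in value]


tiers = read_tiers(SAMPLE_EAF)
for hit in search(tiers, "to-audience"):
    print(hit)
```

The list of hits is a rudimentary concordance: each entry names the tier on which the token occurs and the time span it covers.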

ELAN facilitates the implementation of the annotation scheme. It allows multimodal audiovisual texts to be deconstructed into tiers, so that chosen sections can be examined. Through this deconstruction and examination of elements, it is possible to make finely detailed comparisons of the texts. The ELAN interface can be seen in Fig. 3.3.1, below.

ELAN, the User Guide, and a beginners’ ‘Getting Started Guide’ can be downloaded for free from the Max Planck website, and the platform and user guide are updated regularly. There is an active support network and forum of community users and developers, accessible via the Max Planck website.

3.4 Segmentation

Whilst the rationale for the corpus size and selection has been given earlier in the chapter, there must also be a rationale for the segmentation of the performances recorded. The annotated segments are based around plot- or situation-developing articulations in the drama (as elaborated in 2.4), motivated by Esslin’s assertion that the audience must share in the ‘consensus on what happened to whom in the drama’ [Esslin’s italics] (Esslin, 1987:128); the very minimum we would expect of the audience is a fundamental understanding of the progression of the plot and dramatic situation. To identify these moments, I have asked the question in each case ‘if this incident did not happen, would the outcome of the play be the same?’ (see Aristotle, c.335BC/1996:15). The moments are considered essential to the development of the drama if the answer to the question is ‘no’.

An alternative approach that may have been of benefit would be a discussion with the director, to ascertain his or her choice of salient moments in the development of the drama, since, as a maker of the production, the director would have a deep insight into the intentions of the piece. Typically, however, by the time the theatre interpreter begins work on a translation for a production, the director’s work is complete and s/he is no longer involved in the project; in all three of the cases investigated here, the interpreted performances were in the middle or towards the end of the production’s tour. Practically, then, it would have been too time-consuming to attempt to make contact and to request and arrange a meeting in which the director is asked to list the salient moments in a production that they are no longer working on, nor indeed would there be any guarantee that a director would agree to that meeting. Since it is rare that the interpreter of a theatrical performance has contact with the director of the piece, it is typically the interpreter alone who decides what information is fundamental for the audience’s understanding of the development of the drama.

Fig. 3.3.1. Example of ELAN interface

The number of plot- or situation-developing moments contained within a production is dependent on the play itself and can only be identified through initial analysis of each separate performance. From the initial analysis sweep of each play, the numbers of such moments identified were: Goodnight Mister Tom, 68; Gravity, 74; and Blackberry Trout Face, 81. Examples from the case studies are:

• Plot articulation: Mister Tom arrives at school with a letter for Willie - his mother wants him to go back to London immediately (Goodnight Mister Tom).

• Situation developing moment: Jakey forces Cameron to box in an attempt to toughen him up (Blackberry Trout Face).

The number of segments chosen for analysis was restricted because the process of manually annotating a multimodal text is labour-intensive and time-consuming (Abuczki and Ghazaleh, 2013:87; Cassidy and Schmidt, 2017:2010). In addition, the lengths of the plays varied considerably: Goodnight Mister Tom included a 20-minute interval, the first half running for 62 minutes and the second for 51 minutes, a total running time of 113 minutes; Gravity also included a 20-minute interval, the first half running for 69 minutes and the second for 49 minutes, a total of 118 minutes; Blackberry Trout Face, however, ran for 78 minutes in total without an interval. Rather than selecting segments to annotate from either side of the interval (thus allowing the interpreters time to rest) for two performances, and from one uninterrupted performance for the other, I chose to select segments for annotation from the first half of Goodnight Mister Tom (62 minutes) and Gravity (69 minutes), and from the 78 minutes of Blackberry Trout Face; this would give a spread of segments from an uninterrupted stretch of performance/rendered text of over 1 hour. From my own professional experience, delivering the rendition of a live performance for over an hour requires a great deal of concentration and is physically and mentally taxing; annotating segments selected from over 1 hour of uninterrupted text for each performance avoids the issue of an interval allowing interpreters to rest in two of the performances and not in the other, potentially distorting results which might have been affected by interpreter fatigue. For clarity I will refer to each uninterrupted stretch of film containing the annotated segments as the ‘performance’.
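The running times reported above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (in minutes, intervals excluded); the variable names are illustrative.

```python
# Running times in minutes for each uninterrupted part, as reported in the text.
running_times = {
    "Goodnight Mister Tom": [62, 51],
    "Gravity": [69, 49],
    "Blackberry Trout Face": [78],
}

# Total running time of each play, excluding any interval.
totals = {title: sum(parts) for title, parts in running_times.items()}

# The 'performance' used for annotation is the first uninterrupted stretch.
performances = {title: parts[0] for title, parts in running_times.items()}

for title in running_times:
    print(f"{title}: total {totals[title]} min, "
          f"annotated stretch {performances[title]} min")
```

In each case the annotated stretch exceeds 60 minutes, which is the basis for treating the three performances as comparable with respect to interpreter fatigue.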

From each performance, five plot- or situation-developing moments were sampled at various points: near the beginning, around the middle and towards the end. A sample near the beginning of the performance was selected because a great deal of establishing information, essential for the spectator’s ‘grounding’ in the drama, is typically presented at the beginning of a play, and this is also the section of the performance in which the interpreter may be ‘settling in’ to the rhythms of the particular performance. A moment was selected towards the end of each performance, to account for any effects of interpreter fatigue in each case. One moment was selected around the midpoint of each performance, and two further moments were selected, one between the first and middle moments and another between the middle and final moments; the spacing between the selections was approximate.
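The spread of the five selections can be pictured with a small sketch. It is purely illustrative: the selections in this study were made by hand with approximate spacing, whereas the target fractions (10%, 30%, 50%, 70% and 90% of the performance), the function name and the candidate moment times below are all hypothetical.

```python
def pick_spread(moment_times, duration, fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """For each target fraction of the duration, pick the nearest unused moment."""
    remaining = sorted(moment_times)
    chosen = []
    for fraction in fractions:
        target = fraction * duration
        best = min(remaining, key=lambda t: abs(t - target))
        chosen.append(best)
        remaining.remove(best)  # each moment can be selected only once
    return chosen


# Hypothetical moment times (in minutes) within a 62-minute performance.
moments = [3, 8, 15, 19, 27, 31, 40, 44, 52, 58]
print(pick_spread(moments, 62))
```

With these invented values the sketch selects the moments at 8, 19, 31, 44 and 58 minutes, giving one selection near the beginning, one around the middle, one towards the end, and one in each of the gaps between them.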

Once each of the five moments had been identified, I chose a segment of the text extending between one and two minutes either side of it, leading up to and away from the moment, and beginning and ending at naturally appropriate points (for example, not beginning or ending midway through a stretch of dialogue or an enactment). These 2-4 minute segments became the sections to be annotated.
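The relationship between a selected moment and its segment boundaries can be expressed as a simple calculation. The sketch below is an illustration under stated assumptions: the function name and the example times are hypothetical, and in practice the boundaries were chosen by judging where dialogue or enactment naturally began and ended, not computed.

```python
def segment_window(moment_s, before_s, after_s):
    """Return (start, end) in seconds for a segment around a selected moment.

    Each side must span between one and two minutes (60 to 120 seconds),
    so the whole segment lasts between two and four minutes.
    """
    if not (60 <= before_s <= 120 and 60 <= after_s <= 120):
        raise ValueError("each side should span one to two minutes")
    start = moment_s - before_s
    if start < 0:
        raise ValueError("segment would begin before the performance starts")
    return start, moment_s + after_s


# Hypothetical moment 31 minutes into the performance, with the boundaries
# placed 75 seconds before and 110 seconds after it.
start, end = segment_window(31 * 60, 75, 110)
print(start, end, (end - start) / 60)
```

Because each side is constrained to one to two minutes, any segment produced this way falls within the 2-4 minute range described above.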

The selections chosen were based around plot or situation developing moments of the recorded play, without reference to the accompanying signed rendition. These sections provided enough data to test the robustness of the annotation scheme and enable analysis and discussion of interpreter activities in the respective renditions. Details of the selected sections annotated in each play, such as length, number of characters, and a description of the events, are summarised in Appendix 1.

In document A Method for the Analysis of British Sign Language Interpreted Theatrical Texts (Page 117-122)