Recording experiments

Two types of recording experiments were used in the work described in this dissertation. The first type consists of unsolicited, unconstrained dialogues that were recorded with the subjects situated in the booths. The second type consists of spontaneous dialogues elicited by mood induction procedures. Both types are discussed in the next two sections (6.4.1 and 6.4.2). Detailed information on the dialogues can be found in appendix A.

13 The installation of the described audio recording laboratory was a collaborative undertaking within the SALERO project (www.salero.info), which was funded by the EU. The laboratory has been used for other projects, such as the acquisition and annotation of an emotional speech corpus (Cullen 2008a).

6.4.1 Unconstrained dialogues

These dialogues were recorded while loosely acquainted or well-acquainted subjects (mostly DMC¹⁴ and DIT staff and students) conversed in pairs from within the isolation booths. The conversations were primarily recorded for a language learning research project called FLUENT¹⁵. In FLUENT, these recordings aim to provide a non-native language learner with audio material from native speakers, in three gradually “ascending” stages: (1) short, scripted conversations, (2) “role-playing” dialogues (such as ticket-booking), and (3) unconstrained dialogues. Dialogues acquired with method (3) were selected for the analysis of inter-speaker accommodation, based on a quality rating given to each dialogue by the FLUENT research group. These dialogues comprise unconstrained speech that is not organized in any way, and they can be characterized as “friendly chats”. They contain many unpredictable topic changes and a fair number of spontaneous dialogue acts (interruptions, laughter, disfluencies, repairs), which classify this speech as spontaneous. The dialogues were therefore considered appropriate for studying inter-speaker accommodation.

However, it could be argued that the subjects in these experiments were not as “relaxed” as they would be in a real-life setting, owing to the presence of the recording equipment and their awareness of being recorded. To overcome this, one needs to turn to experimental settings that require subjects to participate in a task, since task requirements have been found to distract speakers from the recording setting and allow them to communicate more freely (Gross and Levenson 1995; Fernandez and Picard 2000; Picard et al. 2001).

6.4.2 Elicited spontaneity

A variety of experimental scenarios for eliciting spontaneous speech were considered in the design phase (Vaughan et al. 2006; 2007). These were primarily designed to elicit human emotions. However, since the chosen method of emotion elicitation was to encourage spontaneous speech, these scenarios were considered for analysis of inter-speaker accommodation.

The first of these experimental designs was a LEGO® puzzle, which was also used by Kehrein (2002). In this scenario, one of the subjects is given the instructions for constructing an object (in this case a fire engine), while the other subject is given the LEGO pieces. In the simplest case, this encourages the two subjects (who are situated in the two separate booths and have no visual contact with each other) to get involved in the construction of the puzzle, a process which provides for natural interaction between the participants. An extension of this idea is to provide the subjects with fewer pieces and/or misleading instructions, which is better suited to inducing mild frustration for the purpose of recording spontaneous emotions (Cullen et al. 2006). An important point in this scenario, from an accommodation point of view, is that the two subjects have distinct roles (information giver vs information receiver). While this is perhaps also relevant from an SDS application point of view, it was considered that, for the purpose of studying inter-speaker accommodation, any task should be “symmetrical” for the two subjects.

14 Digital Media Center, www.dmc.dit.ie

Another proposed scenario was a dice game known as “Mexican” or “Bluff”, in which players take turns rolling two dice. Each player has to claim a roll higher than the opponent's previous roll, and the opponent may call the claim a bluff. If the claim was indeed a bluff, the player who made it loses a “life”, while if the roll was actually the one claimed, the player who called the bluff loses a life. While this scenario is symmetrical and also suitable for acquiring spontaneous emotions, it was considered that the lexical variety in the corpus would be small (mainly digits that describe rolls) and that the game itself has a short duration with only two players, unless they are given a large number of lives, in which case it becomes very repetitive.

A third idea, proposed in (Johnstone 1996), was to record subjects while they were playing a computer game (Gears of War®¹⁶, a combat-style game). Actual sessions were recorded using this method. This required the additional installation of two Microsoft XBOX II® gaming consoles, which were connected to the monitors in the booths. The subjects played in the same game area (via a LAN connection) and had to combat each other in-game. Although this method is suitable for obtaining spontaneous emotions, it is less suitable for obtaining spontaneous conversation, since the subjects tended to remain silent for long periods of time. Most of the speech material occurred in “bursts” along with laughter or other non-verbal expressions, typically when a significant event happened in the game. The few conversations that did occur were sparse and of very short duration. Thus, these recordings were not used in the study of inter-speaker accommodation.
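Returning briefly to the dice game considered earlier in this section, its bluff-calling rule can be sketched in a few lines. This is only an illustration of the rule as described in the text: the function names and the representation of a roll as a pair of integers are assumptions, and the ordering of rolls used in the actual game is not modelled.

```python
def normalize(roll):
    """Treat a roll of two dice as an unordered pair, e.g. (2, 3) == (3, 2)."""
    return tuple(sorted(roll))


def resolve_bluff_call(actual_roll, claimed_roll, claimer, caller):
    """Return the player who loses a life when `caller` challenges `claimer`.

    If the claim matches the actual roll, the caller was wrong and loses a life;
    otherwise the claim was a bluff and the claimer loses a life.
    """
    if normalize(actual_roll) == normalize(claimed_roll):
        return caller
    return claimer


def apply_call(lives, actual_roll, claimed_roll, claimer, caller):
    """Return a new lives table after one bluff call (hypothetical helper)."""
    loser = resolve_bluff_call(actual_roll, claimed_roll, claimer, caller)
    updated = dict(lives)
    updated[loser] -= 1
    return updated
```

With only two outcomes per call and a vocabulary dominated by the digits of the claimed roll, the sketch also makes the concern about low lexical variety easy to see.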

6.4.3 The “shipwrecked” scenario

The experience from the early efforts described in the previous section led to the conclusion that the experimental design should comprise a task that engages the subjects and has a number of desired properties: (a) it must require discussion, thus encouraging spontaneous conversation, (b) it must be symmetrical, i.e. experienced equally by both participants, (c) it must not constrain the subjects to any specific linguistic content (as in the case of the dice game), and (d) it should provide some motivation for the subjects to get involved with the task promptly.

The above specification led to the design of the “shipwrecked” scenario (see Figure 6.4). In this experiment, the two subjects experienced a hypothetical shipwreck, which they had to survive. To accomplish this, they had to agree on which of the items shown on-screen were the most essential and in what order. Thus they had to rank the 15 objects shown in the picture in order of importance for surviving the hazard. In addition, a time limit of ten minutes was imposed, so as to encourage quick involvement from the subjects. As a result, the conversations were relatively focused, which eliminated the problem of long stretches of silence that was encountered in the computer game experiment. In an earlier version of the experiment the subjects were given a list of the objects on paper and a pen to write down the ranks. However, this was found to introduce noise in the recordings. The use of pictures instead of object names required the subjects to name the objects themselves. Thus, the corpus can be used for investigating lexical accommodation, in addition to a/p and temporal features. Based on the same procedure, two more “hazard” scenarios were implemented: an expedition in the Himalayas, in which the subjects had lost their guide and path, and a space mission, in which the subjects had to abandon their spaceship and get into a rescue pod. The task in both these cases was identical (ranking a set of 15 objects relevant to the scenario). These two sets of objects are shown in appendix A.

A further expansion of this experimental design was the inclusion of an on-line performance score. This score was automatically assigned and shown on-screen by an “intelligent” system, based on the “correct” ranking. This was actually a Wizard-of-Oz implementation, in which the changes in the score shown were always the same regardless of the choices that the subjects made (there was no “correct” solution). The purpose of this was to record the subjects' reactions when they thought they were doing well with the task or when their score was dropping. Since this expansion did not alter the task and recording conditions significantly (in reality it only made the task appear more difficult), these recordings were also used in the study of inter-speaker accommodation.
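The behaviour of such a Wizard-of-Oz score can be illustrated with a minimal sketch. The names `START_SCORE`, `SCRIPTED_DELTAS` and `ScriptedScore`, the numeric values, and the object names in the usage note are all hypothetical and are not taken from the actual experiment software. The only property carried over from the description above is that the displayed score follows a fixed script and ignores the subjects' choices.

```python
from dataclasses import dataclass, field

# Hypothetical values: the score trajectory used in the experiment is not given in the text.
START_SCORE = 50
SCRIPTED_DELTAS = [+5, +10, -15, +5, -20, +10]


@dataclass
class ScriptedScore:
    """Shows a pre-scripted score that does not depend on the subjects' ranking."""
    score: int = START_SCORE
    step: int = 0
    history: list = field(default_factory=list)

    def update(self, ranking):
        # `ranking` is accepted so the interface resembles a real scorer,
        # but it is deliberately ignored: there is no "correct" solution.
        if self.step < len(SCRIPTED_DELTAS):
            self.score += SCRIPTED_DELTAS[self.step]
            self.step += 1
        self.history.append(self.score)
        return self.score


def run_session(rankings):
    """Return the sequence of scores displayed over one session."""
    scorer = ScriptedScore()
    return [scorer.update(r) for r in rankings]
```

Two sessions with entirely different choices, for example `run_session([["rope"], ["water"], ["map"]])` and `run_session([["map"], ["rope"], ["water"]])`, display the same score sequence, which is what keeps the stimulus comparable across dialogues.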

In total, 30 dialogues were recorded using all methods, as shown in Table 6.2. The recording experiments for some categories are ongoing; the table lists only those dialogues that were analyzed for inter-speaker accommodation.

Method                        Number of dialogues   Average duration (min)   Total duration (min)
Unconstrained                 8                     20                       161
Shipwrecked                   14                    8                        108
Shipwrecked + ranking score   8                     9                        76
Total                         30                    -                        345

Table 6.2: Recorded dialogues
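The totals in Table 6.2 can be checked with a short calculation. The sketch below uses only the figures from the table. The names `ROWS`, `totals` and `mean_durations` are illustrative. The per-method means computed from the whole-minute totals (about 20.1, 7.7 and 9.5 minutes) fall within one minute of the reported averages, presumably because the underlying durations were rounded.

```python
# Rows of Table 6.2: (method, number of dialogues, total duration in minutes)
ROWS = [
    ("Unconstrained", 8, 161),
    ("Shipwrecked", 14, 108),
    ("Shipwrecked + ranking score", 8, 76),
]

# Average durations (min) as reported in Table 6.2
REPORTED_AVERAGES = {
    "Unconstrained": 20,
    "Shipwrecked": 8,
    "Shipwrecked + ranking score": 9,
}


def totals(rows):
    """Return (total number of dialogues, total duration in minutes)."""
    n_dialogues = sum(n for _, n, _ in rows)
    minutes = sum(total for _, _, total in rows)
    return n_dialogues, minutes


def mean_durations(rows):
    """Return the mean dialogue duration per method, in minutes."""
    return {method: total / n for method, n, total in rows}


if __name__ == "__main__":
    print(totals(ROWS))
    for method, mean in mean_durations(ROWS).items():
        print(f"{method}: {mean:.1f} min (reported {REPORTED_AVERAGES[method]})")
```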

In document A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications (Page 112-116)