The Interaction Perspective - Exploring Wizard of Oz Experimentation

3.5 Exploring Wizard of Oz Experimentation

3.5.3 The Interaction Perspective

Finally, the interaction perspective aimed at generating new insight into the challenges of creating and running WOZ experiments. The goal was to focus on the task of the wizard and on the challenges a researcher needs to overcome when employing this form of evaluation. While all the previous parts of this research methodology focused on understanding WOZ in order to build a flexible tool, the interaction perspective looked specifically at using this tool. As we followed an integrated development process, which used realistic experiments conducted with different versions of the tool as a dedicated source of information, the interaction perspective served two purposes. Firstly, it was used to drive the development process, initiating the integration of new functions as well as the optimization of existing ones. Secondly, it aimed at expanding on what we already knew about WOZ prototyping. While all the conducted experiments were motivated by external research interests, they also helped to explore different aspects of the wizard task. In this respect, we looked at third parties using different versions of our tool (i.e. one version per experiment) to answer dedicated research questions. This created the necessary context to realistically evaluate wizard workload, wizard consistency, and the effort dedicated to constructing a WOZ experiment. In the following, we describe in more detail why we consider these three areas important factors with the potential to significantly influence the success of an experiment, and how we explored them.

Wizard Workload

Previous research has highlighted that convincingly managing the task of the wizard is cognitively highly demanding (Salber and Coutaz [1993a]). To help with this task, the wizard usually operates a multi-purpose interface that is customised to fit the particularities of a specific test setting. Our goal was to identify interface elements that might support the wizard beyond the frontiers of a single test scenario. To do so, formal usability tests (cf. Dumas and Redish [1999]) with small groups of users, each testing a different prototype of our wizard interface, were used to identify difficulties and suggest improvements. Here the goal was to improve the intuitiveness of our solution without spending too much time on lengthy studies. While conceptual problems were difficult to identify in these walk-up-and-use tests, we wanted to ensure that basic interface glitches were fixed before any in-depth analyses were started. In a way, the tests can be seen as a number of trial runs before an actual experiment was conducted. Aiming at more solid feedback, we then looked at three real WOZ studies, each of which employed a different wizard over the span of various experiments, and analysed how wizards performed over time. Doing so not only helped to see whether wizards got used to operating the interface but also allowed us to identify those aspects of the wizard task that remained challenging. In summary, this mixture of low-fidelity one-off usability testing and the analysis of long-term wizard performance (over the course of a whole study consisting of several WOZ sessions with real test participants) produced results that consist of both direct user feedback and observed user behaviour.

Wizard Consistency

A second interesting aspect of running WOZ experiments concerns the consistency with which a wizard simulates system behaviour. While a technological solution would usually perform within a defined margin of consistency, humans are susceptible to changing behaviour, especially if the tasks they are supposed to perform are cognitively demanding. By analysing the log files of three different WOZ studies, we wanted to understand whether more experience leads to better performance. Both timing and consistency in terms of simulated system behaviour were evaluated. Statistical measures were employed to look for learning effects which might lead to faster response times. Furthermore, an in-depth analysis of the content wizards produced was conducted, examining how this content can vary throughout the course of an experiment. In summary, the focus of this round of analyses was on understanding those aspects of the wizard task where experience helps and those aspects where even extensive training sessions fail to harmonize wizard actions.
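The text does not state which statistical measures were applied to the log files. Purely as an illustration, the following minimal Python sketch shows one simple way to look for a learning effect: averaging a wizard's response times per session and fitting a least-squares trend over the session number. The log records, their format, and all function names are hypothetical and are not taken from the tool described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (session_number, response_time_in_seconds).
# The real log format of the tool is not described in the text.
LOG = [
    (1, 6.0), (1, 8.0),
    (2, 5.0), (2, 7.0),
    (3, 4.0), (3, 6.0),
]


def mean_response_time_per_session(records):
    """Group response times by session and return {session: mean time}."""
    by_session = defaultdict(list)
    for session, seconds in records:
        by_session[session].append(seconds)
    return {s: mean(times) for s, times in sorted(by_session.items())}


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


per_session = mean_response_time_per_session(LOG)
slope = least_squares_slope(list(per_session), list(per_session.values()))
print(per_session)  # mean response time for each session
print(slope)        # change in mean response time per additional session
```

A negative slope would be consistent with wizards responding faster as they gain experience. With only a handful of sessions, such a trend is merely suggestive, so an actual analysis would additionally require an appropriate significance test.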

Experiment Construction

A final series of evaluations looked at the construction process of WOZ experiments. Software tools that allow for the simulation of system behaviour are often considered throwaway applications (Dow et al. [2005c]). While resources spent on building test-relevant environments seem a necessary investment in order to improve the overall quality of technology, it remains a challenge to convince stakeholders of their importance. Our goal of building a generic tool for WOZ experimentation was to reduce the development and set-up time needed for running evaluations. Editing functionalities similar to the ones found in web-based content management systems were meant to offer a straightforward and configurable solution for a variety of different WOZ settings, ranging from pure text-based interactions to more sophisticated speech and multi-modal scenarios. In order to evaluate whether these functionalities were effective and whether potential wizards were able to use them, we ran a series of analyses in which participants were asked to design an experiment. Different groups of potential wizard users were analysed so that we were able to understand their distinct requirements and identify missing functionalities. We looked at their success rate in terms of creating experiments and evaluated the complexity of their creations. Furthermore, we collected their feedback with respect to the perceived usability of the tool as well as the difficulty level of the task. In summary, we aimed for a low entry barrier, and so this analysis looked at whether people were able to create WOZ experiments with our prototyping platform without first undergoing a dedicated training session.

In document Supporting Wizard of Oz experimentation for language technology applications (Page 58-60)