5.2 System Requirements
5.2.2 Technology-dependent Requirements Gathering
After having analysed user- and role-dependent requirements, we also need to consider the technological aspects of WOZ support. To do so, we first look at technologies used in existing tools and then discuss a general tool architecture that aims to provide the flexibility necessary to cover the various use cases and WOZ set-ups presented in earlier sections, and to support the different roles a wizard might need to perform when running an experiment (cf. 5.4). Furthermore, we discuss how this type of architecture may be supported by current (and future) technologies and services.
Technology Platforms of Existing Tools
If we look at some of the existing WOZ tools, we find a variety of technology platforms that have been used to create wizard interfaces. From a typological perspective, we can generally differentiate between two kinds of applications. On the one hand, we find a small number of tools, such as SUEDE and the CSLU toolkit, that work as stand-alone applications where the simulation is achieved by a single computer. This type of application, however, provides only limited interaction options for a test participant, so experiments are usually based on pure speech input. On the other hand, we find network-based client/server architectures where separate interfaces for the wizard and the test participant are available. In addition to these different tool typologies, developers have employed various technology platforms (i.e. programming languages) to build the relevant interfaces. While the TCL/TK platform¹ and JAVA² seem to be the favourite choices for stand-alone applications, for networked solutions we also find VISUAL C#³ and ADOBE DIRECTOR⁴ solutions. Table 5.5 lists a number of WOZ tools and the technology platform that was used to implement each of them.
Tool               Type            Technology Platform   Reference
CSLU Toolkit       Standalone      Tcl/Tk                Sutton et al. [1998]
R. B.’s WOZ tool   Standalone      Tcl/Tk                Online⁵
NEIMO              Client/Server   Appletalk             Salber and Coutaz [1993a]
Polonius           Client/Server   ROS Framework         Lu et al. [2011]
SUEDE              Standalone      Java                  Klemmer et al. [2000]
MDWOZ              Client/Server   Java                  Munteanu and Boldea [2000]
WOZ Pro            Client/Server   Java                  Hundhausen et al. [2007]
BrickRoad          Client/Server   Java                  Liu and Li [2007]
Topiary            Client/Server   Java                  Li et al. [2004]
EPFL Dial. Pl.     Client/Server   Java                  Rajman et al. [2006]
SketchWizard       Client/Server   Visual C#             Davis et al. [2007]
QuickWoZ           Client/Server   Visual C#             Smeddinck et al. [2010]
Domer              Client/Server   Visual C#             Villano et al. [2011]
Ozlab              Client/Server   Adobe Director        Pettersson and Siponen [2002]
Dart               Client/Server   Adobe Director        MacIntyre et al. [2004]
Olympus            Client/Server   Web Technologies      Bohus et al. [2007]
Jaspis             Client/Server   Web Technologies      Turunen and Hakulinen [2000]
Table 5.5 – Existing WOZ tools and the technology platforms that were used to implement them.
As can be seen from the table, JAVA and VISUAL C# seem to be the most popular platforms for implementation, whereas ADOBE DIRECTOR is predominantly used for supporting media-heavy experiments. Furthermore, we can see that most of the listed client/server tools follow a classic thick-client model, which requires the installation of dedicated software components on all of the experiment computers. A different model is used by OLYMPUS and JASPIS, which employ open-source web technologies to offer various language-related services. Neither, however, provides a dedicated graphical interface a wizard could use to interact with a test participant.
¹ http://www.tcl.tk [Accessed: August 14th 2012]
² http://www.oracle.com/technetwork/java/javase/downloads/index.html [Accessed: August 14th 2012]
³ http://msdn.microsoft.com/en-us/vstudio/hh388566.aspx [Accessed: August 14th 2012]
⁴ http://www.adobe.com/products/director/ [Accessed: August 14th 2012]
Another tool aspect that seems important, especially for applications that are based on proprietary software platforms, is their support for different file formats. Import and export of data is crucial, not only in cases where a chosen tool does not support all the experiment tasks that need to be accomplished, but also in situations where third-party products may allow for a more efficient generation of relevant source and design files. Of the tools we could test, most offer import and export features at least to some extent. A positive example is SUEDE, which supports designing a dialogue, running it, and afterwards examining the results in analysis mode. Its experiments are also saved in an XML format, which makes it easier to work with them in third-party applications. Similar functionality is offered by RICHARD BREUER’S WOZ TOOL. While it has no analysis mode, the software supports various export formats and also lets designers save tested dialogues, including their grammar, in VoiceXML format. The CSLU TOOLKIT allows for saving the trial logs in two different text formats, and DART, as explained earlier, uses the ADOBE DIRECTOR format. Overall, when it comes to the export of generated data and log files, existing tools offer relevant functionality. The import of data structures, however, seems more complicated. Import functionality is generally available in all the tools we tested. In all cases, however, special formatting is required, which makes it more difficult to use alternative software tools to design experiments. While for small experiments the design functionality offered by the tools is usually sufficient, more extensive research interests can require complex and versatile designs, for which alternative tools often provide better design support.
A final aspect of interoperability can be found in the possible integration of external products and services during runtime. Here, too, we find some support in existing tools, although the offered functionality is rather limited. The CSLU TOOLKIT, for example, allows for changing the integrated talking head and also offers access to, and integration of, web-based information resources. RICHARD BREUER’S WOZ TOOL lets the user access media stored on the file system but does not offer any additional integration functionality, and SUEDE can be seen as an entirely closed environment. Client/server tools are generally more open to expansion. OLYMPUS, for example, is an aggregation of modules offering different services and therefore also offers various interfaces for expansion. Similarly, the XML-based architecture of the JASPIS dialogue manager makes it easy to integrate the software with other services. In both cases, however, the upfront engagement with the tool that is necessary to understand its features and Application Programming Interfaces (APIs) can be time-consuming. A standard format for integrating components into WOZ experiments is currently missing.
In summary, combining the flexibility of open web technologies with the interface quality as well as the import and export functionality of some of the tools discussed earlier, we see potential for a new category of application. Open access would allow different researchers and designers to use the tool, to improve it, and eventually to adapt it to their own requirements. Using web technologies, on the other hand, would significantly increase the potential user base, as most designers are capable of building web-based tools and interfaces, and would furthermore expand the possibilities for testing interaction scenarios beyond the traditional computer workstation setting. However, for such a tool to be employable in a variety of settings, we first need to define an architecture that on the one hand covers a range of possible use cases and on the other hand allows for a flexible integration of external services. Only if the architecture enables researchers to develop their own components and integrate them using standard APIs would such a tool go beyond what is currently available.
A Comprehensive Tool Architecture
Looking at the design space outlined earlier, it appears that a tool that aims to support the application of the WOZ method more comprehensively would need to offer a way of combining existing technologies in a more flexible manner. Some of those technologies might be fully working; others, however, might still be at an early stage of development and would rely on a wizard to raise their quality to an acceptable level. To make this possible, a software architecture is required which supports a flexible use of technology. Ideally, this would provide a modular, ‘pluggable’ framework that allows for components to be integrated or replaced easily. From an architectural point of view, we need to define each of the components, the different stages of development they can be in, and their relationships to each other.
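To make the idea of a ‘pluggable’ framework more concrete, the following is a minimal sketch, not the architecture of any particular tool. It assumes that every component can be modelled as a function from text to text; the class name Pipeline, the method names plug and run, and the slot names are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List

# Assumption for illustration: a component maps text to text.
Component = Callable[[str], str]


class Pipeline:
    """Ordered slots (e.g. 'ASR', 'MT') whose implementations can be swapped."""

    def __init__(self, slots: List[str]) -> None:
        self._slots = list(slots)
        self._impl: Dict[str, Component] = {}

    def plug(self, slot: str, component: Component) -> None:
        """Integrate a component into a slot, replacing any previous one."""
        if slot not in self._slots:
            raise KeyError(f"unknown slot: {slot}")
        self._impl[slot] = component

    def run(self, data: str) -> str:
        """Pass data through all plugged slots in order; empty slots are skipped."""
        for slot in self._slots:
            component = self._impl.get(slot)
            if component is not None:
                data = component(data)
        return data


pipeline = Pipeline(["ASR", "MT", "TTS"])
pipeline.plug("MT", str.upper)           # stand-in for an early prototype
first = pipeline.run("hello")
pipeline.plug("MT", lambda s: s[::-1])   # replaced without touching other slots
second = pipeline.run("hello")
```

Keeping the slot order separate from the slot implementations is what allows a component to be exchanged while the rest of the pipeline stays untouched.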
As regards the different task responsibilities a wizard can take on within the interaction pipeline, one can define several modes a technology component can be in. A component is either relevant for a given setting, i.e. it is needed and therefore has to be represented in some form (e.g. ASR in a hands-busy-eyes-busy situation), or it is irrelevant, in which case it must be possible to turn it off (e.g. MT in a monolingual setting). Where a technology is needed, one can further distinguish between three different states. In the best circumstances, the technology is of production quality and can therefore be used in a black-box manner, producing results either for the wizard or for a test participant. If, on the other hand, the performance of a component is not sufficient, the wizard’s task can be to enhance its quality. This type of scenario is particularly useful when the goal of an experiment is to investigate the improvement in quality that is needed for a technology to be acceptable, and it therefore requires some sort of correction mode. Finally, in a setting where a component is needed but not available, it is usually the task of the wizard to simulate the missing functionality completely.
In summary, a comprehensive WOZ tool should enable a wizard to complement existing technology on a continuum by permitting her to simulate and correct technology, before finally using it as a black-box. Likewise, Dow et al. [2005c] argue that a wizard might first take on the role of a ‘controller’ who simulates technology, then, in a second stage, act as a ‘moderator’ who approves technology output, before finally moving on to being a ‘supervisor’ who only overrides output in cases where it is really needed.
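The four component modes described above can be illustrated with a small dispatch function. This is a sketch under stated assumptions rather than a prescribed design: components and the wizard are modelled as plain text functions, a switched-off component is assumed to pass its input through unchanged, and all names (Mode, run_component, weak_asr, wizard) are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

Technology = Callable[[str], str]
# The wizard sees the component input and, when correcting, the technology's proposal.
Wizard = Callable[[str, Optional[str]], str]


class Mode(Enum):
    OFF = "OFF"  # not relevant for the use case
    ON = "ON"    # production quality, used as a black box
    COR = "COR"  # technology output corrected by the wizard
    SIM = "SIM"  # functionality fully simulated by the wizard


def run_component(mode: Mode, data: str,
                  technology: Optional[Technology] = None,
                  wizard: Optional[Wizard] = None) -> str:
    """Produce the component output according to its mode."""
    if mode is Mode.OFF:
        return data  # assumption: a switched-off component is a pass-through
    if mode in (Mode.ON, Mode.COR) and technology is None:
        raise ValueError(f"{mode.name} needs a technology")
    if mode in (Mode.COR, Mode.SIM) and wizard is None:
        raise ValueError(f"{mode.name} needs a wizard")
    if mode is Mode.ON:
        return technology(data)
    if mode is Mode.COR:
        return wizard(data, technology(data))
    return wizard(data, None)  # SIM: no proposal is available


def weak_asr(audio: str) -> str:
    """Stand-in for an imperfect recogniser that mishears one word."""
    return audio.replace("flight", "fright")


def wizard(heard: str, proposal: Optional[str]) -> str:
    """Keep a usable proposal; otherwise type in what was actually said."""
    return proposal if proposal is not None and "fright" not in proposal else heard


outputs = {m.name: run_component(m, "book a flight", weak_asr, wizard) for m in Mode}
```

In this toy run only the ON mode lets the recognition error through; in COR and SIM mode the wizard's output reaches the test participant instead.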
By looking at these different modes and carrying them over to the LTC level, it is possible to deduce a set of rules that govern the relationship between consecutive technology components. The first rule defines a fully working component as a black-box, which can therefore be preceded as well as followed by components in any state. If a component is simulated by the wizard, however, it needs to be followed by a working component. In cases where two or more consecutive components need to be simulated, they merge into one single task for the wizard (e.g. simulated ASR followed by simulated MT). Likewise, when a corrected component follows a simulated one, both components merge into a single simulation task for the wizard, as it makes little sense to first simulate the input for a component and then correct its output. Similarly, when one or more simulations follow a correction, all of them merge into an integrated simulation. Finally, a component can only be in correction mode when its preceding component is fully working or when it receives its input directly from a test participant. Table 5.6 illustrates some of the possible component-state combinations and the related task of the wizard. Integrating those rules into a software architecture should allow for a more flexible use of technology when running WOZ experiments.

                          Input        Processing       Output
Example                   Text  ASR    MT   DM   MT     TTS  Text
Kelley [1984]             ON    OFF    OFF  ON   OFF    OFF  COR
Bederson et al. [2010]    ON    OFF    OFF  SIM  SIM    OFF  SIM
Gould et al. [1983]       OFF   SIM    OFF  SIM  OFF    OFF  SIM
Geutner et al. [2002]     OFF   SIM    OFF  SIM  OFF    ON   OFF
Schneider et al. [2010]   OFF   SIM    OFF  SIM  ON     ON   OFF
Karpov et al. [2008]      OFF   COR    OFF  ON   ON     ON   OFF
ON ... The technology component is relevant for the given use case. A working solution is available.
OFF ... The technology component is not relevant for the given use case and is therefore not considered.
SIM ... The technology component is relevant for the given use case. No solution is available, so the wizard has to simulate the component.
COR ... The technology component is relevant for the given use case. A solution is available but does not produce satisfactory results. The wizard augments the technology by changing or overriding component output where necessary.
Table 5.6 – Some examples from the literature showing possible component/state combinations.
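The combination rules can be read as a grouping procedure over the pipeline, which the following sketch tries to capture. It makes two assumptions that the text does not state explicitly: components that are switched off are skipped, so that ‘consecutive’ refers to the remaining active components, and two directly adjacent corrections, a case the rules do not cover, are treated as one merged simulation. The function name wizard_tasks and the column labels used in the example rows are hypothetical.

```python
from typing import List, Tuple

ON, OFF, SIM, COR = "ON", "OFF", "SIM", "COR"


def wizard_tasks(pipeline: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Derive the wizard's tasks as (kind, component names) in pipeline order."""
    # Assumption: OFF components are ignored when looking for neighbours.
    active = [(name, state) for name, state in pipeline if state != OFF]
    tasks: List[Tuple[str, List[str]]] = []
    run: List[Tuple[str, str]] = []  # consecutive components needing the wizard
    # A sentinel working component flushes a run that reaches the pipeline end.
    for name, state in active + [("<end>", ON)]:
        if state != ON:
            run.append((name, state))
            continue
        if run:
            names = [n for n, _ in run]
            if len(run) == 1 and run[0][1] == COR:
                # A lone correction follows a working component or the participant.
                tasks.append(("correct", names))
            else:
                # Any run containing a simulation merges into a single simulation.
                tasks.append(("simulate", names))
            run = []
    return tasks


columns = ["Text in", "ASR", "MT in", "DM", "MT out", "TTS", "Text out"]
kelley = list(zip(columns, [ON, OFF, OFF, ON, OFF, OFF, COR]))
gould = list(zip(columns, [OFF, SIM, OFF, SIM, OFF, OFF, SIM]))
karpov = list(zip(columns, [OFF, COR, OFF, ON, ON, ON, OFF]))

kelley_tasks = wizard_tasks(kelley)
gould_tasks = wizard_tasks(gould)
karpov_tasks = wizard_tasks(karpov)
```

Applied to three rows of Table 5.6, the sketch yields a single correction task for the Kelley [1984] and Karpov et al. [2008] set-ups and one merged simulation task covering ASR, DM and text output for Gould et al. [1983].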
Wizard of Oz and the Web
An increasing number of traditional software applications are now offered in a web-based form, and the applications available are becoming more complex. The almost ubiquitous availability of high-speed internet has been an important factor, but recent advances in web technologies have also been critical in supporting this transition from locally installed software to cloud-based web applications. While some of the WOZ experiment environments presented in the literature were built to some extent using web technologies (e.g. Turunen and Hakulinen [2000]), the majority are based on conventional software tools. The lack of simple support for web-based speech input and output has been a major obstacle, leading to the use of locally installed software, with the associated installation effort, software dependencies and compatibility problems. Recent advances in web technologies, however, provide better support for dealing with speech. Modern web browsers are able to process audio and video data in real time and without the need for additional plug-ins. Upcoming web standards (i.e. the forthcoming HTML5 standard⁶) go further by giving access to computer hardware through the browser. These standards open up new possibilities for WOZ experimentation. We are now able to integrate speech input and output into web-based platforms, which significantly reduces the set-up requirements for an experiment environment. Furthermore, by using web services it is possible to build flexible tool architectures, such as the one just presented, in a way that allows for components to be integrated and replaced easily and on the fly.
As well as removing problems associated with installation, there is also a benefit in terms of interoperability with other platforms, i.e. it is easy to integrate WOZ experiments into existing web-based software environments. For example, if a new interaction modality for a web-based help system needs to be tested, a WOZ client can quickly be added to an already existing interface. From the point of view of the wizard, it is also possible to add further information channels, such as video of the user or location data, which allows for the evaluation of not only speech but also multi-modal interaction. Finally, the possibility of running WOZ experiments on different platforms with different form factors (e.g. smartphones, tablets, media centres) represents another significant advantage that web-based solutions have over traditional software.