

3.1.2 Dialog systems

After looking into some phenomena present in human dialogs that are an important basis for a computational modeling of dialog, we are now going to present the foundations of building computer dialog systems that attempt to provide an interface for humans based on human conversation. We start by discussing the motivation for developing dialog systems and provide a general description of architectures and design features and issues. We then discuss in more detail the functionality of crucial components.

Motivations and applications

Motivations and applications for natural language dialog systems are manifold. A common, underlying goal for many systems is to make the interaction with a computer more natural and human-like and thus easier or more fun to use. Apart from that, there are also more practical concerns that justify the use of dialog systems, in particular speech-based ones. There are application contexts in which more traditional interfaces based on visual displays and/or manual operation are impractical, dangerous, or impossible. This applies, for instance, to phone-based systems, or to scenarios where users are driving vehicles, controlling other devices, or operating as surgeons. Related to that, speech-based systems may also assist users who cannot use other devices due to disabilities. Finally, dialog systems may be used in systems in which natural language is the only feasible medium to impart knowledge (tutorial dialog systems) or is even at the center of instruction, as in systems that support learning a language.

3.1. DIALOG 29

Figure 3.1 – Architecture for dialog systems

With the exception of purely conversational systems, most dialog systems serve practical purposes based on some task domain. In this view, natural language is considered as another possible interface, as an alternative or in addition to traditional user interfaces. A representation of the specific task and application domain of a dialog system must connect to the dialog-specific modules much like the logic of a regular software application must connect to the mouse gestures and dynamic screen content of a graphical user interface (GUI). Task-related knowledge may consist of a database for an information retrieval system, map data for navigation systems, rich environmental information for robotic systems, or domain and didactic knowledge for tutorial systems. In most cases, the domain knowledge will be changeable; thus, the dialog system needs to have access to the latest state and also be able to trigger state changes. As a simple example, consider a booking application, in which a successful booking leads to the unavailability of the item in question. Depending on the application domain, management of the task can range from a trivial passing through of commands to the back-end application to highly complex models of collaborative multi-agent problem-solving (Allen et al., 2000). Collaborative approaches may also include the attempt to recognize user intentions, which requires more than just the literal interpretation of user utterances (Allen et al., 2001).
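The booking example can be made concrete with a small sketch. The class BookingTaskManager and its methods are hypothetical names chosen for illustration only; the sketch merely assumes that the dialog system can both query the latest domain state and trigger a state change.

```python
class BookingTaskManager:
    """Hypothetical back-end that holds changeable domain state."""

    def __init__(self, items):
        # Every item starts out available.
        self._available = set(items)

    def is_available(self, item):
        # The dialog system always reads the latest state.
        return item in self._available

    def book(self, item):
        # A successful booking triggers a state change:
        # the item is unavailable afterwards.
        if item not in self._available:
            return False
        self._available.remove(item)
        return True


manager = BookingTaskManager(["room_101", "room_102"])
first_attempt = manager.book("room_101")   # succeeds and changes the state
second_attempt = manager.book("room_101")  # fails, the item is already taken
```

A dialog manager built on such an interface would query is_available before confirming a request, so that its answers reflect bookings made earlier in the same or in another dialog.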

Architecture

Despite the differences among the variety of dialog systems, there is a common set of components for the universal tasks. Figure 3.1 shows an overview of these components and the information flow between them. End-to-end dialog systems for human-computer conversation require an interface for input and output. The users can either type in their contributions or speak to the system; the latter relies on a module for automatic speech recognition (ASR). Likewise, the system needs an output interface, which can be based on text or speech; the latter requires a module for text-to-speech (TTS) synthesis. Based on the result of the ASR module or the type-written input, the module for natural language interpretation analyzes the input and provides a formal semantic representation of the user utterance. This representation is handed to the dialog manager, which decides how to react based on the current state of the dialog and task-related context. The dialog manager interfaces with the task manager, which maintains knowledge related to the task of the dialog system and any relevant context outside of the central conversation. Based on the dialog state and external context, the dialog manager issues a communicative goal to the natural language generation (NLG) module. The NLG module is then in charge of finding a linguistic realization of the communicative goal and passes it on to the synthesis module or to simple text output.

30 CHAPTER 3. DIALOG FOR LANGUAGE LEARNING

Information flow

While most architectures for dialog systems share these components in one form or another, they differ with regard to how the modules are connected and how the information flow is organized between them. The processing in simpler architectures follows a pipeline model, in which the information is passed in a linear fashion through ASR/text input, interpretation, dialog manager, generation and text/speech output.
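A pipeline of this kind can be sketched as a chain of function calls for typed input and text output. All function names, the dictionary-based semantic frame, and the keyword matching are assumptions made for illustration; real interpretation and generation modules are far richer.

```python
def interpret(text):
    # Toy interpretation: map the surface text to a semantic frame.
    if "book" in text.lower():
        return {"act": "request", "task": "booking"}
    return {"act": "unknown"}


def manage(frame, state):
    # Toy dialog manager: pick a communicative goal from frame and state.
    state["turns"] = state.get("turns", 0) + 1
    if frame["act"] == "request":
        return {"goal": "confirm", "task": frame["task"]}
    return {"goal": "clarify"}


def generate(goal):
    # Toy generation: realize the communicative goal as text.
    if goal["goal"] == "confirm":
        return f"Okay, I will start the {goal['task']}."
    return "Sorry, could you rephrase that?"


def run_pipeline(user_text, state):
    # Strictly linear flow: each stage sees only its predecessor's output.
    frame = interpret(user_text)
    goal = manage(frame, state)
    return generate(goal)


state = {}
reply = run_pipeline("I want to book a room", state)
```

The sketch also shows the main limitation of the model: no stage can consult a later stage or revise its output once it has been passed on.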

More advanced architectures allow some additional exchange of information in a blackboard style, where each module can consult and contribute simultaneously to a central management component that stores the state of the dialog and external contexts. These approaches are also conceptualized as agent-based architectures, referring to the different modules that work independently but collaboratively (Kerminen and Jokinen, 2003; Ferguson and Allen, 2005). Advantages of these more sophisticated architectures are that they allow for continuous interpretation of user input and are therefore better suited to allow flexible initiative from user and system. Furthermore, they allow for the integration of different independent agents with different types of knowledge regarding the linguistic interpretation, domain knowledge, as well as collaborative concepts like a model of beliefs, desires and intentions (Ferguson and Allen, 2005).
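The blackboard idea can be illustrated with a minimal sketch. It is not the design of any of the cited systems: the Blackboard class, the module functions, and the entry keys are hypothetical, and the modules are polled one after another here, whereas a real agent-based system would let them run concurrently.

```python
class Blackboard:
    """Central store for dialog state and external context."""

    def __init__(self):
        self.entries = {}

    def post(self, key, value):
        self.entries[key] = value

    def read(self, key, default=None):
        return self.entries.get(key, default)


def interpreter(board):
    # Contributes a semantic frame once user text is available.
    text = board.read("user_text")
    if text is not None and board.read("frame") is None:
        act = "request" if "book" in text else "unknown"
        board.post("frame", {"act": act})
        return True
    return False


def dialog_manager(board):
    # Contributes a communicative goal once a frame is available.
    frame = board.read("frame")
    if frame is not None and board.read("goal") is None:
        board.post("goal", "confirm" if frame["act"] == "request" else "clarify")
        return True
    return False


def generator(board):
    # Contributes a reply once a goal is available.
    goal = board.read("goal")
    if goal is not None and board.read("reply") is None:
        board.post("reply", "Okay." if goal == "confirm" else "Sorry?")
        return True
    return False


def run(board, modules):
    # Poll all modules until none of them can contribute anything new.
    progress = True
    while progress:
        progress = any([module(board) for module in modules])


board = Blackboard()
board.post("user_text", "please book a room")
# The order of the modules does not matter: each one acts when its input exists.
run(board, [generator, dialog_manager, interpreter])
reply = board.read("reply")
```

Because every module reads from and writes to the shared store, a further agent (for example one that tracks user intentions) could be added without changing the existing modules.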

Initiative

Depending on the specific application and task domain, the dialog system will implement a specific policy for initiative, which puts requirements on the architecture. Many systems implement a model which allows either the system or the user to initiate and drive the dialog, whereas the respective partner only reacts and responds to the initiator's utterances. In system-initiative dialog systems, the system asks questions or makes announcements and waits for the user to respond, while in systems that implement user-initiative, the system awaits the user's questions or commands and reacts. More sophisticated dialog systems provide mixed-initiative dialogs where both system and user can initiate in a more flexible manner. Mixed-initiative approaches are more natural but also more complex to implement.
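The three initiative policies can be contrasted in a deliberately simplified sketch. The names Initiative and system_move, the two boolean inputs, and the move labels are assumptions for illustration; an actual mixed-initiative system has to decide among many more options.

```python
from enum import Enum


class Initiative(Enum):
    SYSTEM = "system"
    USER = "user"
    MIXED = "mixed"


def system_move(policy, user_spoke, system_has_agenda):
    """Return the system's move for this turn under the given policy."""
    if policy is Initiative.SYSTEM:
        # The system drives the dialog while it has open questions
        # or announcements; user input only counts as an answer.
        return "ask" if system_has_agenda else "wait"
    if policy is Initiative.USER:
        # The system only reacts to user questions or commands.
        return "respond" if user_spoke else "wait"
    # Mixed initiative: react to the user, otherwise raise own items.
    if user_spoke:
        return "respond"
    return "ask" if system_has_agenda else "wait"
```

Under these assumptions, a user-initiative system stays silent when the user says nothing, even if it has something to contribute, whereas a mixed-initiative system uses that opportunity to ask.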

Multiple threads

Natural conversation can comprise multiple topics, or threads, that can be embedded in one another or sometimes even interleaved. Humans usually have little problem managing thread switches. In terms of dialog management, a few approaches have been proposed (Rosé et al., 1995; Larsson, 2002; Lemon et al., 2002; Lemon and Gruenstein, 2004). Often, multiple threads arise out of multiple tasks that the dialog system and user are pursuing concurrently. The ability to handle multiple threads and tasks increases the flexibility of a dialog system. At the same time, it places additional demands on the interpretation module and management, since the range of possible user input widens and the system must keep track of the different threads.
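One simple way to keep track of embedded and interleaved threads is a stack in which the active thread sits on top. The following sketch is an assumption for illustration and does not reproduce the method of any of the approaches cited above; ThreadTracker and its methods are hypothetical names.

```python
class ThreadTracker:
    """Keeps open dialog threads; the last element is the active one."""

    def __init__(self):
        self._stack = []

    @property
    def active(self):
        return self._stack[-1] if self._stack else None

    def open(self, topic):
        # A new or resumed topic becomes active. If it is already open,
        # it is moved to the top, which covers interleaved threads.
        if topic in self._stack:
            self._stack.remove(topic)
        self._stack.append(topic)

    def close(self):
        # Finish the active thread and fall back to the interrupted one.
        if self._stack:
            self._stack.pop()
        return self.active


tracker = ThreadTracker()
tracker.open("booking")
tracker.open("weather")       # embedded thread interrupts the booking
during_interruption = tracker.active
after_closing = tracker.close()  # the booking thread is active again
```

The interpretation module could use the active thread as the preferred context for resolving an ambiguous user utterance and fall back to the other open threads if that fails.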

Incrementality

Another method of making a dialog system more flexible and faster is the incremental processing of utterances. While the standard approach to processing language is to consider a complete utterance at once and pass it through the different processing steps, it has been proposed more recently to start processing with smaller units at the sub-utterance level. This can increase the reactivity of a system and make the conversation more natural, as it is better suited to model phenomena like back-channels, fast turn-taking, self-corrections or collaborative utterance construction (Schlangen and Skantze, 2009). Further, an incremental approach to processing is also more similar to the way the human mind processes language.
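The difference to utterance-level processing can be sketched with a generator that yields a partial interpretation after every incoming word. This is only an illustration of the principle, not the model of Schlangen and Skantze (2009); the function name and the two keyword-triggered slots are assumptions.

```python
def interpret_incrementally(words):
    """Yield a partial interpretation after every incoming word."""
    slots = {}
    previous = None
    for word in words:
        # Toy rule: the word after "from" or "to" fills a slot.
        if previous == "from":
            slots["origin"] = word
        elif previous == "to":
            slots["destination"] = word
        previous = word
        # Yield a copy so that later updates do not alter earlier results.
        yield dict(slots)


utterance = "a ticket from Paris to Rome".split()
partials = list(interpret_incrementally(utterance))
```

Here the origin slot is already filled after the fourth word, so a system could produce a back-channel or start a database lookup before the user has finished speaking.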

Multiple modalities

While dialog systems use spoken or written language as their main modality, additional modalities for input and output are possible and can be useful for different applications. On the one hand, non-verbal channels that play a crucial role in human communication, such as gestures, gaze, or facial expressions, can be added. On the other hand, other conventional or novel user interfaces such as GUIs, touch, or body movements can be used to accommodate processing constraints or other physical constraints of the environment (Wahlster, 2006). Additional modalities increase the complexity of the system and add challenges to the overall processing and integration of all input and output channels.