3.5 Motivation
3.6.2 Dialog System APIs
Virtually all approaches to dialog systems claim to be easily configurable in one way or another. One of the first approaches that explicitly addressed the issue of API usability and rapid development of dialog applications was the CSLU toolkit [SNC+96, SCd+98],
which provides a graphical editor and a toolkit environment for creating finite-state based dialog systems (cf. section 2.1.2).
Another toolkit approach is the WIT toolkit for building spoken dialog systems [NMY+00]. The WIT toolkit likewise pursues a whole-system approach and provides an
environment for integrating speech recognition, language understanding and generation, and speech output. For each of the components, a domain-specific knowledge source needs to be defined. Based on a user-defined semantic frame specification of the domain, an integrated parsing and discourse processing method plans the output using a unification grammar [NMH+99]. The WIT toolkit relies on a more sophisticated dialog model than the
CSLU toolkit, but presumably requires more expertise from the application developer.
In a similar way, most approaches achieve easy reconfigurability by separating domain-specific from domain-independent knowledge. Some approaches emphasize the definition of task models, while others focus on the identification of generic dialog strategies. Examples of the first category are Collagen with its Recipes (cf. section 2.1.3) and WITAS with its Activity Model (cf. section 2.2.2), while RavenClaw (cf. section 2.1.2) focuses on the identification of domain-independent dialog strategies for error handling and grounding.
Another approach that focuses on describing domain-independent dialog strategies has been proposed in the context of the ARIADNE dialog system [Den02]. It relies on the slot-filling approach and uses an explicit dialog state (similar to the information state approach described in section 2.1.3). To develop a spoken dialog application with it, the developer must specify a number of domain-specific knowledge sources, most notably an ontology and a set of service descriptions that specify, for each back-end application, what kind of information is necessary to invoke that service. During interaction, the dialog state keeps track of the goals that are compatible with the information gathered so far. To control the dialog, the system relies on generic dialog processing algorithms, which are
also called Interaction Patterns. These are procedures that entail sequences of utterances. Four types of Interaction Patterns are incorporated into the system: the Question pattern requests information from the user, the Undo pattern removes information, the Correction pattern corrects a piece of information, and the State pattern handles help requests. Denecke’s Interaction Patterns are specified declaratively, and their execution is based on constraint logic. Depending on the dialog state and the compatible goals, the system instantiates the appropriate Interaction Patterns. As with the Interaction Patterns proposed in the present work, the shape of the patterns (i.e. the specific sequence of utterances they include) varies and is determined as the dialog develops. Another commonality between the two concepts of Interaction Patterns is that both model not only sequences of utterances, but also system operations. However, while Denecke’s Interaction Patterns operate purely at the information level by updating the dialog information state, the Interaction Patterns proposed here additionally operate at the domain level by updating back-end tasks through the Task State Protocol. Also, Denecke’s Interaction Patterns do not serve as an API specification for dialog designers (which is one of their basic functions in the present approach), but can rather be seen as built-in system capabilities that are triggered automatically as appropriate.
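The interplay between service descriptions, compatible goals and the Question pattern can be illustrated with a minimal sketch. It is not taken from ARIADNE: the service names, slot names, the `DialogState` class and the `Invoke` step are hypothetical, and the constraint-based execution described in [Den02] is replaced by plain set operations.

```python
from dataclasses import dataclass, field

# Hypothetical service descriptions: for each back-end service, the slots
# that must be filled before the service can be invoked.
SERVICES = {
    "book_flight": {"origin", "destination", "date"},
    "book_hotel": {"city", "date", "nights"},
}


@dataclass
class DialogState:
    """Explicit dialog state holding the information gathered so far."""
    slots: dict = field(default_factory=dict)

    def compatible_goals(self):
        # A goal stays compatible as long as every filled slot belongs to it.
        filled = set(self.slots)
        return [name for name, required in SERVICES.items() if filled <= required]

    def missing_slots(self, goal):
        return sorted(SERVICES[goal] - set(self.slots))


def next_pattern(state):
    """Pick the next generic step from the dialog state (illustrative only)."""
    goals = state.compatible_goals()
    if len(goals) != 1:
        # Zero or several compatible goals: ask the user to disambiguate.
        return ("Question", "goal")
    missing = state.missing_slots(goals[0])
    if missing:
        # Question pattern: request one piece of missing information.
        return ("Question", missing[0])
    # Hypothetical final step: all required information is available.
    return ("Invoke", goals[0])


def undo(state, slot):
    """Undo pattern: remove a piece of information from the dialog state."""
    state.slots.pop(slot, None)
```

In this sketch the pattern to instantiate is derived from the state alone, which mirrors the point that such patterns are built-in capabilities triggered by the system rather than an API used by the dialog designer.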
A similar approach has been proposed by Bui and colleagues [BRM04], also in the domain of slot-filling applications. In their approach, the domain is modeled as a set of relational database tables. The dialog model consists of a set of interconnected Generic Dialog Nodes (GDNs), each of which refers to a column in the database. The application developer configures each GDN with a grammar for interpreting the user input and with the prompts the system will utter. Based on this configuration, each GDN performs a simple interaction with the purpose of obtaining a value for the associated attribute from the user. The local dialog flow management is handled by a single GDN. Each GDN can handle five situations: OK, Repeat, Help Request, No Input and No Match. More general strategies determine the global dialog flow management, e.g. how to deal with inconsistencies. The proposed approach is embedded into a process model for developing spoken dialog applications, which includes conducting WOz studies as well as internal and external field studies.
Gandhe and colleagues have introduced an approach to rapidly developing dialog capabilities for virtual characters based on the Information State approach [GDR+08]. Like
the ones described above, this approach operates on a domain specification describing the objects and characters of the domain, as well as their attributes and possible values, or their goals. This authoring process is supported by a graphical user interface. From the domain description, the dialog acts that may occur during interaction are generated automatically. For example, for a specification of an object with certain attributes and possible values, an associated assert dialog act is generated. During interaction, the dialog manager updates the information state according to the occurring dialog acts, and generates the content of the response. The agent’s conversational obligations – i.e., the sequences of dialog acts – and the rules according to which the information state is
updated are implemented as finite state machines. Figure 3.14 shows the finite state machine for an offer subdialog. Thus, the finite state machines model the local discourse coherence, while the global coherence is determined by the system’s information state. In this respect, they are similar to the Interaction Patterns proposed in the present work. Also, both concepts are modeled as a kind of finite state machine. Gandhe’s obligation descriptions, however, model exclusively the dialog act sequences, but not the associated system actions (such as updates of the information state).

[Figure 3.14: Finite state machine modeling the agent Hassan’s conversational obligations associated with an offer subdialog (after [GDR+08]). States: offer not elicited, offer elicited, offer given. Transition labels: hassan.elicit-offer, player.offer, hassan.response-offer or hassan.assert. Not shown are the conditions and updates to the information state.]
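To make the idea of obligations as a finite state machine concrete, the following minimal sketch encodes part of an offer subdialog as a transition table. The state and dialog act names follow the labels of Figure 3.14, but which arc connects which pair of states is an assumption made for illustration, and only a subset of arcs is modeled. In line with the description of [GDR+08] above, the table contains dialog act sequences only and no updates to the information state.

```python
# State and act names follow the labels of Figure 3.14; the source and
# target state of each arc are assumptions made for this sketch.
TRANSITIONS = {
    ("offer-not-elicited", "hassan.elicit-offer"): "offer-elicited",
    ("offer-elicited", "player.offer"): "offer-given",
}


def step(state, dialog_act):
    """Advance the obligation FSM by one dialog act."""
    try:
        return TRANSITIONS[(state, dialog_act)]
    except KeyError:
        raise ValueError(
            f"{dialog_act!r} is not expected in state {state!r}"
        ) from None


def run(dialog_acts, state="offer-not-elicited"):
    """Feed a sequence of dialog acts through the FSM; return the final state."""
    for act in dialog_acts:
        state = step(state, act)
    return state


def expected_acts(state):
    """Dialog acts that keep the subdialog locally coherent in this state."""
    return sorted(act for (source, act) in TRANSITIONS if source == state)
```

A call such as `run(["hassan.elicit-offer", "player.offer"])` walks the table from the initial state, while an act that has no arc in the current state raises a `ValueError`, which corresponds to a locally incoherent continuation.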