

7.1.1 Lessons Learned from the CeBit Setup

Section 6.3.2 has given a detailed analysis of the different causes of speech recognition errors in the CeBit setup, especially of non-understandings. In this section, the consequences for the interaction strategy of the current iteration are discussed. Strategies are derived for reducing the number of errors and for coping with the inevitable ones.

As detailed in section 6.3.2, in contrast to non-understandings, misunderstandings are by definition non-detectable. However, uncorrected misunderstandings may have serious consequences, as they may result in the system performing erroneous operations or using faulty parameters. In the CeBit setup, for instance, the system might learn wrong labels. Since misunderstandings represent more than a quarter of all recognition errors, the system must provide interaction strategies to recover from them and to repair them. The Curious Flobi scenario was already implemented based on the PaMini framework, whose interaction patterns incorporate a range of recovery strategies, such as confirming an action before beginning it, canceling it, and explicit and implicit confirmation of information, giving the user the possibility to correct it. Besides applying appropriate strategies within the interaction, recovery of information needs to be considered at the system level as well: if a wrong label has been learnt, the responsible component (e.g. the object recognition) must allow it to be overwritten with the correct one later, or at least allow the representations to be merged.

In order to reduce the number of non-understandings that are caused by valid user utterances, the speech processing was revised. Both in the Curious Robot system and in the CeBit setup, a speech understanding component was used that represented utterances as hierarchical, linked frame-slot structures, each rated based on the completeness of the structures [HWS06]. Utterances for which the linking algorithm fails to construct a reliable representation are rejected. However, many of the rejected utterances, though unparseable, still provide the information required (e.g. an object label) and are thus rejected unnecessarily. In short, the fundamental deficiency of the approach is that it does not take the dialog context into account when parsing the utterances.

1 In detail: I have developed the study setup and proposed the subjective measures. The objective measures were defined in collaboration with Ingo Lütkebohle, who also realized an automated calculation of the objective measures. Nina Riether calculated the regression analysis of the data.

As a consequence, it was decided not to employ the above speech understanding algorithm, but to use the speech recognition result directly as input for the dialog system. This becomes possible because the HMM-based speech recognizer used in all of our scenarios integrates a language model in the form of an LR(1) grammar [Fin99], providing not only the recognized chain of words, but also the corresponding grammar tree. By matching conditions over the nonterminal symbols of the grammar tree within the dialog system, robust key-word matching can be achieved, driven by the dialog expectation. As it turned out during the user study described in section 7.4, this approach worked surprisingly well and enabled correct dialog decisions even if an utterance contained many incorrectly classified words.

Out-of-capability utterances have been identified as the largest error source. Considering the somewhat restricted capabilities of the CeBit setup, this is not surprising. A large proportion of these errors were due to users’ attempts to demonstrate objects themselves, which strongly suggests that they would prefer to take a more active role in the interaction. As a consequence, the current iteration was extended to allow for more user initiative. In particular, users can now demonstrate objects on their own initiative, and they can ask test questions about a specific object.
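The expectation-driven key-word matching described above can be illustrated with a minimal sketch. It is not the implementation used in the system: the tree representation, the nonterminal names (`OBJECT_LABEL`, `CONFIRM_YES`, `FILLER`) and the functions `find` and `interpret` are assumptions made for illustration only. The idea shown is that the dialog system only looks for the nonterminals it currently expects and ignores the rest of the utterance.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Grammar tree node: a nonterminal with children, or a leaf holding a word."""
    symbol: str  # nonterminal name, or the word itself for a leaf
    children: list["Node"] = field(default_factory=list)

    def words(self) -> list[str]:
        """Collect the words covered by this subtree, left to right."""
        if not self.children:
            return [self.symbol]
        return [w for child in self.children for w in child.words()]


def find(node: Node, nonterminal: str) -> Optional[Node]:
    """Depth-first search for the first subtree rooted at `nonterminal`."""
    if node.children and node.symbol == nonterminal:
        return node
    for child in node.children:
        hit = find(child, nonterminal)
        if hit is not None:
            return hit
    return None


def interpret(tree: Node, expected: list[str]) -> Optional[tuple[str, str]]:
    """Return (nonterminal, covered text) for the first expected nonterminal
    present in the tree, or None if the utterance does not match the
    current dialog expectation."""
    for nonterminal in expected:
        hit = find(tree, nonterminal)
        if hit is not None:
            return nonterminal, " ".join(hit.words())
    return None


# Hypothetical recognition result: the carrier phrase was misrecognized,
# but the object label is still present in the grammar tree.
tree = Node("UTTERANCE", [
    Node("FILLER", [Node("um"), Node("these")]),
    Node("OBJECT_LABEL", [Node("lemon")]),
])

# The dialog has just asked for a label, so only OBJECT_LABEL is expected.
result = interpret(tree, ["OBJECT_LABEL"])
```

In this sketch, the same tree yields no match when the dialog expects a confirmation instead, which is how the dialog context determines what counts as a usable utterance.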

Moreover, a WOz study on object teaching (cf. section 7.1.2) served as a foundation for studying typical demonstration behavior. The analysis revealed that not only task-related but also social elements are crucial in such interactions. Thus, the results of the WOz study have fundamentally influenced the design of the interaction strategy.

Another significant cause of error was out-of-vocabulary utterances. The number of out-of-vocabulary utterances decreases with grammar size, but so does in-grammar accuracy. Thus, a balance between wide grammar coverage and good in-grammar accuracy had to be found. Again, the analysis of the aforementioned WOz study served as a guide for the design of the speech recognition grammar, giving insights into the verbal strategies users apply in demonstrating objects, as well as into their social interaction behavior. Additionally, the resulting speech recognition grammar was evaluated and fine-tuned in a pre-test, enabling iterative grammar improvements (cf. section 7.1.3).
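The trade-off between grammar coverage and in-grammar accuracy can be made concrete with a small sketch of how two such figures might be computed from annotated pre-test data. This is an illustration under stated assumptions, not the evaluation procedure of section 7.1.3: the `Sample` record, the `evaluate` function and the sentence-level notion of accuracy (exact match of transcript and hypothesis) are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    transcript: str  # what the user actually said (manual annotation)
    hypothesis: str  # what the recognizer returned
    covered: bool    # whether the grammar can generate the transcript


def evaluate(samples: list[Sample]) -> tuple[float, float]:
    """Return (out-of-grammar rate, in-grammar accuracy).

    Assumption: accuracy is measured at sentence level, i.e. a covered
    utterance counts as correct only if the hypothesis equals the transcript.
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    covered = [s for s in samples if s.covered]
    out_of_grammar_rate = 1 - len(covered) / len(samples)
    if covered:
        correct = sum(s.transcript == s.hypothesis for s in covered)
        in_grammar_accuracy = correct / len(covered)
    else:
        in_grammar_accuracy = 0.0
    return out_of_grammar_rate, in_grammar_accuracy


# Invented pre-test data, for illustration only.
samples = [
    Sample("this is a lemon", "this is a lemon", True),
    Sample("what is this", "what is this", True),
    Sample("this is an apple", "this is a lemon", True),
    Sample("look at my new watch", "what is that", False),
]
rate, accuracy = evaluate(samples)
```

Computing both figures after each grammar revision would show whether an extension that lowers the first figure does so at an acceptable cost to the second.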

Out-of-context utterances occurred rarely in the CeBit setup, which is probably due to the rather restricted interaction capabilities. With increasing system capabilities, in particular regarding user initiative, out-of-context utterances are expected to occur more frequently unless they are addressed. Fortunately, PaMini’s interleaving interaction patterns facilitate the implementation of a non-restrictive interaction strategy that allows for interjections and social feedback. However, unrestricted interleaving may not always be appropriate, e.g. interleaving two object demonstration patterns may confuse the user.

Finally, a minor problem in the evaluation of the CeBit study was meta commentary, i.e. utterances that addressed the experimenter. These can easily be reduced in future studies by making sure that the experimenter is not present during the interaction. Table 7.1 summarizes the strategies proposed in this section.

Cause of Error       Breakdown into subcategories    Strategy
Misunderstanding     -                               Dialog provides recovery strategies
Non-understanding    Valid utterances                Expectation-based key-word matching
Non-understanding    Out-of-capability               Extension of system capabilities
Non-understanding    Out-of-vocabulary               Pre-evaluation of speech recognition grammar
Non-understanding    Out-of-context                  Flexible interleaving of interaction patterns
Non-understanding    Meta commentary                 Experimenter not present

Table 7.1: Strategies to deal with the different speech recognition error sources.
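The restriction on interleaving discussed in this section, namely that two object demonstration patterns should not run interleaved while interjections and social feedback remain possible, could be expressed as a simple admission rule. The following sketch is hypothetical: the pattern names, the `EXCLUSIVE` set and the function `may_interleave` are assumptions for illustration and do not reflect PaMini's actual configuration or API.

```python
# Hypothetical pattern names; patterns listed here may not be interleaved
# with another active instance of the same pattern.
EXCLUSIVE = {"object_demonstration"}


def may_interleave(active: list[str], incoming: str) -> bool:
    """Admit the incoming pattern unless it is exclusive and an instance
    of the same pattern is already active."""
    return not (incoming in EXCLUSIVE and incoming in active)


# Social feedback may interrupt a running demonstration ...
feedback_allowed = may_interleave(["object_demonstration"], "social_feedback")
# ... but a second demonstration may not start while one is in progress.
second_demo_allowed = may_interleave(["object_demonstration"], "object_demonstration")
```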

In document Modeling Human-Robot-Interaction based on generic Interaction Patterns (Page 125-127)