Multimodal Shared-Control Interaction
for Mobile Robots in AAL Environments
Cui Jian
Cumulative dissertation
submitted for the degree of Doctor of Engineering
– Dr.-Ing. –
Submitted to Faculty 3 (Mathematics and Computer Science)
Universität Bremen
September 2013
Date of the doctoral colloquium: 18 December 2013
Reviewers
Prof. Dr. Bernd Krieg-Brückner (Universität Bremen)
Prof. Dr. John Bateman (Universität Bremen)
Abstract
This dissertation investigates the design, development and implementation of cognitively adequate, safe and robust, spatially-related, multimodal interaction between human operators and mobile robots in Ambient Assisted Living (AAL) environments from both theoretical and practical perspectives. By focusing on different aspects of the concept Interaction, the essential contribution of this dissertation is divided into three main research packages, namely Formal Interaction, Spatial Interaction and Multimodal Interaction in AAL. In the principal package, Formal Interaction, research effort is dedicated to developing a formal language based solution process for interaction modelling and management, and a unified dialogue modelling approach. This package aims to enable a robust, flexible and context-sensitive, yet formally controllable and tractable interaction. This type of interaction can be used to support the interaction management of any complex interactive system, including the ones covered in the other two research packages. In the second research package, Spatial Interaction, a general multi-level conceptual model based on qualitative spatial knowledge is developed and proposed. The goal is to support spatially-related interaction in human-robot collaborative navigation. With a model-based computational framework, the proposed conceptual model has been implemented and integrated into a practical interactive system, which has been evaluated in empirical studies. It has been tested in particular with respect to a set of high-level, model-based conceptual strategies for resolving the frequent spatially-related communication problems in human-robot interaction. Finally, in Multimodal Interaction in AAL, attention is drawn to the design, development and implementation of multimodal interaction for elderly persons. In this elderly-friendly scenario, ageing-related characteristics are carefully considered for an effective and efficient interaction. Moreover, a standard model based empirical framework for evaluating multimodal interaction is provided. This framework was applied in particular to evaluate a meticulously developed and systematically improved elderly-friendly multimodal interactive system through a series of empirical studies with groups of elderly persons.
Summary
This doctoral thesis investigates the conception, development and implementation of cognitively adequate, safe and robust, spatially-related multimodal interaction between humans and mobile robot systems in the context of Ambient Assisted Living (AAL), from both theoretical and practical perspectives. In line with the different aspects of the central concept Interaction, the essential contribution of this work is divided into three research packages, namely Formal Interaction, Spatial Interaction and Multimodal Interaction in the context of AAL. In the foundational package, Formal Interaction, a large part of the research consists in developing a solution process that is based on a formal language and can be used for modelling and managing general interaction, as well as a general hybrid approach to dialogue modelling. This package aims to enable a robust, flexible and context-sensitive, yet formally controllable and tractable interaction, which can then be used to support the interaction management of complex interactive systems, including the systems covered in the other two research packages. In the second research package, Spatial Interaction, a general multi-level conceptual model based on qualitative spatial knowledge is developed and proposed. The goal is to support spatially-related interaction in cooperative navigation by humans and robots. The conceptual model was implemented with a model-based framework and integrated into a practical interactive system, which was then evaluated through empirical experiments. The evaluation focused above all on a set of high-level, model-based conceptual strategies used to handle the frequent spatially-related communication problems in human-robot interaction.
The research in the package Multimodal Interaction in Ambient Assisted Living concentrates on the design, development and implementation of multimodal interaction for elderly persons. Age-related characteristics were carefully considered for an effective and efficient interaction in an elderly-friendly environment. In addition, an empirical framework based on the standard model was developed for evaluating multimodal interaction. This framework was then applied specifically to evaluate a comprehensively developed and systematically improved elderly-friendly multimodal interactive system through a series of empirical experiments with groups of elderly persons.
Contents
1. General Introduction
1.1. General Overview
1.2. Related Work
1.2.1. Interaction Management
1.2.2. Interaction in Spatially-related Applications
1.2.3. Interaction in AAL
1.3. Contributions of this Work
2. Formal Interaction
2.1. The Formal Language based Dialogue Modelling and Management
2.1.1. The Formal Language based Interaction and Modelling Solution Process
2.1.2. The Formal Dialogue Development Framework
2.2. The Unified Dialogue Modelling and Management Approach
2.3. Contribution of the Corresponding Publications
2.4. Possible Future Work
3. Spatial Interaction
3.1. QSBM: A general Qualitative Spatial Beliefs Model
3.2. A DCC-based QSBM
3.2.1. The Conceptual Level
3.2.2. The Application Level
3.2.3. The Strategy Level
3.3. SimSpace: A Computational Framework to Support QSBM
3.4. Empirical Studies of QSBM based Spatial Interaction
3.5. Contribution of the Corresponding Publications
3.6. Possible Future Work
4. Multimodal Interaction in AAL
4.1. Foundation of Design and Development of Multimodal Interaction for Elderly Persons
4.1.1. Design Guidelines of Multimodal Interaction for Elderly Persons
4.1.2. The Unified Dialogue Model
4.2. MIGSEP: the Multimodal Interactive Guidance System for Elderly Persons
4.3. Empirical Evaluation of MIGSEP
4.4. Contribution of the Corresponding Publications
Full List of Publications by the Author
Bibliography
A. Accumulated Publications
A.1. SimSpace: A Tool to Interpret Route Instructions with Qualitative Spatial Knowledge
A.2. Qualitative Spatial Modelling of Human Route Instructions to Mobile Robots
A.3. Deep Reasoning in Clarification Dialogues with Mobile Robots
A.4. Evaluation of a Unified Dialogue Model for Human-Computer Interaction
A.5. Towards Effective, Efficient and Elderly-friendly Multimodal Interaction
A.6. Evaluating A Spoken Language Interface of A Multimodal Interactive Guidance System for Elderly Persons
A.7. Touch and Speech: Multimodal Interaction for Elderly Persons
A.8. Better Choice? Combining Speech And Touch In Multimodal Interaction For Elderly Persons
A.9. Resolving Conceptual Mode Confusion with Qualitative Spatial Knowledge in Human-Robot Interaction
A.10. Modality Preference in Multimodal Interaction for Elderly Persons
A.11. A Conceptual Model for Human-Robot Collaborative Spatial Navigation
Chapter 1.
General Introduction
In the context of Ambient Assisted Living (AAL), intelligent mobile robots, i.e. (semi-)automated mobile systems capable of navigating human operators through complex spatial environments, are attracting increasing interest in both academic and industrial research (e.g. see [Lankenau and Röfer, 2000], [Röfer et al., 2009]). Various so-called intelligent assistants concerned with different behaviours of controlling and navigating mobile robots have been developed and evaluated ([17]): some can help human operators avoid obstacles by taking control themselves if necessary ([Lankenau and Röfer, 2001]), while others can follow preassigned routes or travel to predefined locations fully autonomously ([Röfer and Lankenau, 2002]). Since the mobile robots are collaboratively controlled by these intelligent systems, and the human operators usually have only a naive theory of the systems, problems inevitably arise when the human operators and the mobile robots interact with each other.
Motivated by the need to bridge the communication gap between human operators and mobile robots, the research work¹ reported in this dissertation has focused on designing, developing and implementing cognitively adequate, safe and robust, spatially-related interaction between human operators and mobile robots in AAL environments. This chapter first gives a general overview of the research work done by the author, then characterizes related research efforts concerned with the aspects in the focus of this dissertation, and finally describes the major contributions of the reported work in the relevant areas.
1.1. General Overview
As illustrated in Figure 1.1, by focusing on three different aspects of the concept Interaction, the essential contribution of this dissertation is divided into three major research packages: the principal package Formal Interaction and two domain-dependent packages, Spatial Interaction and Multimodal Interaction in AAL, each of which is briefly introduced below.
Figure 1.1.: The overview of the reported research work. (Diagram labels: Formal Interaction, Spatial Interaction, Multimodal Interaction in AAL, Interaction.)
¹ This work has been funded by the German Research Foundation (DFG) in the context of the Sonderforschungsbereich/Transregio 8 Spatial Cognition, projects I3-[SharC] and I5-[DiaSpace], as well as the German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI).
Formal Interaction has been focusing on developing robust, flexible, context-sensitive yet formally controllable and tractable interaction. This research package consists of two core aspects: a) a solution process highlighting formal language based dialogue modelling and management; and b) a unified dialogue modelling approach to enable a flexible and context-sensitive yet easily manageable interaction. In this package, both theoretical and practical interaction models and frameworks have been delivered for developing and implementing formal dialogue managers in complex interactive systems. Furthermore, they were also used to support the other two research packages in this dissertation.
Spatial Interaction has been aiming at the area of human-robot collaborative navigation within complex spatial environments. Specifically, this research package has addressed the problem of how to enable human operators to interact with mobile robots in order to go from one location to another, while helping to resolve the communication problems that frequently occur during the interaction. A multi-level conceptual model based on qualitative spatial knowledge is developed and proposed. Furthermore, with a model-based computational framework, the conceptual model has been implemented and integrated into a practical interactive system, which was evaluated in empirical studies with respect to a set of high-level, model-based conceptual strategies.
Multimodal Interaction in AAL has been concentrating on effective, efficient and elderly-friendly multimodal interaction in the context of Ambient Assisted Living. This research package consists of two parts: a) the design, development and implementation of multimodal interaction for elderly persons while carefully considering age-related characteristics; and b) the general-framework-based empirical evaluation of a meticulously developed and systematically improved elderly-friendly multimodal interactive guidance system. The evaluation focused on the effectiveness, efficiency and user acceptance of the entire system as well as of the different input modalities, and was supported by a series of empirical studies with several groups of elderly persons.
1.2. Related Work
The three research packages address work in the areas of interaction management, interaction in spatially-related applications and interaction in Ambient Assisted Living. This section therefore introduces other research work in these areas.
1.2.1. Interaction Management
In the context of natural language processing, interaction management, also known as dialogue management, is the controlling process of an interactive system: it accepts input from the interaction partner, decides upon the next system actions according to the maintained interaction context, and outputs the system responses at a conceptual level. According to how the interaction flow is controlled, two classic approaches have been proposed: structure-based and principle-based interaction management.
Typical examples of structure-based interaction management can be found in the systems presented in [Peckham, 1993, McTear, 1998, Lamel et al., 1999], where the interaction involved usually has clearly defined structures and goals and can therefore be modelled as a finite state transition network, which enables straightforward and effective development of interaction management. However, management based on a finite state transition network can only control an inflexible interaction flow.
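To make the structure-based idea concrete, the following is a minimal sketch of a dialogue controlled by a finite state transition network. It is not taken from any of the cited systems: the ticket-booking scenario, the state names, the prompts and the input classes are all assumptions chosen for illustration.

```python
# Hypothetical ticket-booking dialogue modelled as a finite state transition network.
# State names, prompts and input classes are illustrative assumptions.
TRANSITIONS = {
    ("ask_destination", "city"): "ask_time",
    ("ask_time", "time"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_destination",
}

PROMPTS = {
    "ask_destination": "Where do you want to go?",
    "ask_time": "When do you want to leave?",
    "confirm": "Shall I book the ticket?",
    "done": "Your ticket is booked.",
}


def step(state, input_class):
    """Follow the matching edge; stay in the same state (re-prompt) otherwise."""
    return TRANSITIONS.get((state, input_class), state)


def run(inputs, state="ask_destination"):
    """Feed a sequence of already classified user inputs through the network."""
    visited = [state]
    for input_class in inputs:
        state = step(state, input_class)
        visited.append(state)
    return visited


if __name__ == "__main__":
    # The user follows the scripted order and reaches the final state.
    print(run(["city", "time", "yes"]))
    # The user volunteers the time first: the network cannot use it and re-prompts.
    print(run(["time", "city", "time", "yes"]))
```

The second call shows the inflexibility mentioned above: information offered out of the scripted order is simply ignored, because the network has no edge for it in the current state.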
To overcome this limitation, other research has investigated principle-based interaction management. For example, the interaction management approaches presented in [Chu-Carroll, 1999, Seneff and Polifroni, 2000, Zue et al., 2000] share one principle: the context of the interaction is fixed and can be represented as a set of slots that need to be filled during the interaction, such as the departure time of a train for a ticket reservation system or the goal of a route to be planned. The interaction is not controlled by a predefined structure but by a frame-based mechanism, in which specific tasks can be performed only once the slots of the frame are filled. Going beyond a limited set of predefined tasks, [Larsson and Traum, 2000, Traum and Larsson, 2003] proposed another principle-based interaction management method, the information state update approach. It manages an interaction flow by defining a set of informational components for functional aspects of interaction, such as questions under discussion and common ground, together with a set of update rules and update strategies for managing the interaction context, so that the interaction is managed from the perspective of a human. This approach is now widely used in interactive systems (e.g., [Lemon and Liu, 2006, Varges et al., 2008]) for its ability to deal with flexible and context-sensitive interaction.
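The information state update idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the formalism of the cited work: the component names (`qud`, `common_ground`, `agenda`), the two update rules and the "first applicable rule wins" strategy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    """Minimal information state; the component names are illustrative."""
    qud: list = field(default_factory=list)            # questions under discussion (stack)
    common_ground: dict = field(default_factory=dict)  # facts both parties accept
    agenda: list = field(default_factory=list)         # system moves still to perform


def rule_integrate_answer(state, move):
    """Update rule: an answer resolves the topmost question under discussion."""
    if move["type"] == "answer" and state.qud:
        question = state.qud.pop()
        state.common_ground[question] = move["content"]
        return True
    return False


def rule_integrate_question(state, move):
    """Update rule: a user question is pushed onto QUD and scheduled for a reply."""
    if move["type"] == "ask":
        state.qud.append(move["content"])
        state.agenda.append(("respond", move["content"]))
        return True
    return False


# Update strategy assumed here: apply the first rule whose precondition holds.
RULES = [rule_integrate_answer, rule_integrate_question]


def update(state, move):
    """Apply one dialogue move; return the name of the rule that fired, if any."""
    for rule in RULES:
        if rule(state, move):
            return rule.__name__
    return None


if __name__ == "__main__":
    s = InformationState(qud=["destination"])
    print(update(s, {"type": "answer", "content": "kitchen"}))
    print(update(s, {"type": "ask", "content": "arrival_time"}))
    print(s)
```

Unlike the fixed network, the next system behaviour here depends only on the current information state and on which rule preconditions hold, which is what makes this style of management context-sensitive.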
Furthermore, the interaction management research community has also worked on stochastic dialogue modelling using reinforcement learning (RL), where dialogue modelling based on statistical data is applied to allow dynamic changes to the dialogue strategy (e.g., [Lecoeuche, 2001, Li et al., 2009, Pietquin et al., 2011]). However, this approach is not yet mature enough for developing practical interactive systems, because it requires sufficiently large natural language dialogue corpora to cover the correspondingly large state space and policy space.
1.2.2. Interaction in Spatially-related Applications
Since research on interaction in spatially-related applications usually involves many aspects of human-computer interaction, cognitive science and robotics, much effort has been invested in these aspects from different perspectives.
Some research has concentrated on the most straightforward way of collecting and analyzing empirical corpora of human-robot interaction, e.g. natural language route instructions in spatial navigation ([Bugmann et al., 2004, Koulouri and Lauria, 2009, Shi and Tenbrink, 2009]). This work also identified the conceptual and spatially-related difficulties that human operators face in providing route instructions and that mobile robots face in processing them. Meanwhile, building on empirical findings, effort has also been put into studying the relationship between language and the functional properties of spatial environments (e.g., [Hirtle, 2008]), as well as the natural language route directions or instructions used during the interaction (e.g., [Kollar et al., 2010, Pappu and Rudnicky, 2012]); some work even tried to build the conceptual mapping between natural language route instructions and procedures executable by a mobile robot ([Lauria et al., 2002]).
Apart from the research based on empirical data and natural language, considerable attention has also been paid to how spatial knowledge can be represented appropriately to support spatially-related applications. For example, in mobile robotics, metrical spatial data has been related to semantic information using probabilistic models to resolve spatial localization problems based on object recognition (e.g., [Galindo et al., 2005, Vasudevan et al., 2007]). Meanwhile, in cognitive science, [Werner et al., 2000] proposed the Route Graph as a classic conceptual model, which provides a simple, abstract, yet powerful formalism to serve as the basis of complex navigational knowledge and to support route-based navigation. This model was further improved and adapted to different application aspects, e.g., with an ontology-based specification in [Krieg-Brückner et al., 2004]. Following principles similar to those of the route graph, conceptual models with different levels of information were proposed for various applications; e.g., [Zender et al., 2008, Martínez Mozos, 2010] developed and improved a multi-layered conceptual model based on topological information, corresponding to the spatial and functional properties of typical indoor environments, to support a mobile robot's indoor navigation.
Furthermore, given the formal algebraic properties of qualitative spatial knowledge and its important role in spatially-related interaction, much research has also been carried out on qualitative spatial representation and reasoning (QSR). Based on the most prominent spatial calculi, such as the cardinal direction calculus ([Frank, 1996]), the double cross calculus ([Freksa, 1992]), the region connection calculi ([Cohn et al., 1997]) and many others, general or domain-specific QSR frameworks and models have been developed to support various spatially-related applications. For example, [Wallgrün et al., 2007] proposed SparQ, a general toolbox for qualitative spatial reasoning in applications, and [Bhatt et al., 2011] developed a declarative spatial reasoning framework and demonstrated its applicability to computer-aided architecture design. Many applications based on qualitative spatial knowledge also involve human-system interaction: in [Shi et al., 2006], spatial actions based on the double cross calculus were used as the fundamental unit in processing natural language route instructions and interpreting them by fuzzy operations on a Voronoi graph to support human-robot collaborative spatial navigation; in [Schultz et al., 2006], qualitative spatial reasoning was integrated into query tools used by non-expert users of geographic information systems; and more recently, [Bellotto et al., 2013] proposed an approach based on the qualitative trajectory calculus to abstract and design robot behaviours for spatial interactions with a mobile robot.
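As an illustration of what a qualitative spatial relation looks like in practice, the sketch below classifies a point `c` relative to a directed segment from `a` to `b`, using the three reference lines on which the double cross calculus is built (the line through `a` and `b`, and the perpendiculars at `a` and at `b`). It is a simplified illustration only: the function name and the relation labels are assumptions, not the notation of [Freksa, 1992] or the API of any toolbox mentioned above.

```python
def classify(a, b, c):
    """Qualitatively locate point c relative to the directed segment a -> b.

    Returns a (longitudinal, lateral) pair. The label names are illustrative
    and are not taken from the double cross calculus literature.
    """
    ax, ay = a
    bx, by = b
    cx, cy = c
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        raise ValueError("a and b must be distinct points")

    # Sign of the 2D cross product: which side of the line a -> b is c on?
    cross = dx * (cy - ay) - dy * (cx - ax)
    lateral = "left" if cross > 0 else "right" if cross < 0 else "on_line"

    # Projection of c onto a -> b, scaled so that a maps to 0 and b maps to 1.
    # Exact comparisons suit integer coordinates; noisy data needs a tolerance.
    t = (dx * (cx - ax) + dy * (cy - ay)) / (dx * dx + dy * dy)
    if t < 0:
        longitudinal = "behind_a"
    elif t == 0:
        longitudinal = "level_with_a"
    elif t < 1:
        longitudinal = "between"
    elif t == 1:
        longitudinal = "level_with_b"
    else:
        longitudinal = "beyond_b"
    return longitudinal, lateral


if __name__ == "__main__":
    # Walking north from (0, 0) to (0, 2); where is a landmark at (-1, 3)?
    print(classify((0, 0), (0, 2), (-1, 3)))
```

The point of such a representation is that metric coordinates are reduced to a small set of symbolic relations that can be matched against expressions like "ahead on the left" in a route instruction.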
1.2.3. Interaction in AAL
The mechanisms of typical interaction, with either single or multiple modalities, are usually only suitable for users who are sufficiently familiar with information technology, whereas the potential user group in Ambient Assisted Living environments consists mostly of elderly persons or persons with physical and mental impairments. Therefore, special focus has been given to research on interaction that takes AAL-centered characteristics into account from different perspectives.
Empirical studies have been conducted to collect objective and subjective data to motivate and support the development and improvement of interaction and interactive systems in AAL environments. For example, [Takahashi et al., 2003] reported a Wizard of Oz (WOZ) experiment in which elderly persons interacted with a home health care system, and provided hints for natural language understanding for elderly persons. In [Möller et al., 2008], dialogue corpora were obtained from interactions of older and younger users with a smart-home system, and the analysis confirmed significant differences between the two groups in both speaking style and vocabulary. In [Ivanecky et al., 2011], empirical studies were conducted on the usability of a mobile phone used by elderly or disabled people as the communication medium to control intelligent house environments, and provided evidence for the feasibility of the interactive system.
Combining empirical results and AAL-centered characteristics, several efforts have been invested in developing and adapting different modalities to support interaction within AAL environments. For example, in [Becker et al., 2009] a voice recognition system was developed within an assisted environment equipped with multiple sensors to build a health care monitoring system for elderly persons; in [Goetze et al., 2010], acoustic user interfaces were developed especially for elderly persons in the context of AAL, and the implementation was demonstrated with a multimedia reminder and calendar system. As another important modality, intuitive gestures in the AAL context were investigated in [Nazemi et al., 2011] to identify common interaction scenarios in an AAL environment with elderly persons; simple gesture-based interaction was also developed and integrated into a framework that uses the three-dimensional acceleration sensor information of the Nintendo WiiMote in smart home environments ([Nesselrath et al., 2011]). Furthermore, new interfaces have been developed to meet the special requirements of severely disabled persons; e.g. in [Mandel et al., 2009], a brain-computer interface was developed and used by disabled persons to steer an automated wheelchair.
Moreover, in order to improve the accessibility, flexibility and usability of interaction in AAL environments in general, considerable research effort has also been concentrated on developing multimodal interaction. Some work focused on multimodal input, e.g., [Goetze et al., 2012] proposed a mobile communication and assistance system on a robot platform featuring acoustic, visual and haptic input modalities to be used by elderly persons in home-care environments. Other work focused on multimodal output, such as [Boll et al., 2010], where a multimodal reminder system using acoustic, visual and tactile outputs as system responses was developed and used by elderly persons in their residential home. In addition, modality fusion, the most basic aspect of multimodal interaction, is performed in different ways: in the previous two examples, the multimodal fusion was implemented at the dialogue management level, while others performed the fusion at the grammar level by integrating formal grammars and logical calculus as a multimodal language specification (e.g., [D’Ulizia et al., 2007]). Further research has built on this principle of fusing multimodal events at the grammar level, e.g. [D’Andrea et al., 2009] proposed a multimodal pervasive framework for Ambient Assisted Living that uses a multimodal grammar specification to support the interpretation of multimodal input, the management of the multimodal interaction and the generation of multimodal output.
1.3. Contributions of this Work
This dissertation investigates the design, development and implementation of formal language based, spatially-related, multimodal interaction in AAL environments. According to the research work addressed in the three research packages introduced above, the major contributions of the reported work are summarized as follows:
Formal Interaction. In general, this research package has been focusing on the modelling and management of interaction, in both theory and practice.
The first contribution of this package is the solution process centering on a formal language based dialogue modelling approach, together with a formal language based computational framework for dialogue modelling and management called FormDia, the Formal Language Based Development Toolkit (see Section 2.1). Interaction processes, with either single or multiple modalities, can be specified in the formal language CSP ([Hoare, 1978, Roscoe et al., 1997]) as an abstract interaction model; the CSP specification can then be validated with the model checker FDR2 ([Roscoe, 1994, (Europe), 2010]) and verified with the simulator provided by the FormDia framework; finally, the validated and verified model can be integrated into a practical interactive system to support formally tractable and extensible interaction management.
As the second contribution, a unified dialogue modelling approach is developed by considering the limitations of the conventional finite-state transition based dialogue modelling approach and of the classic agent-based theory, i.e., the information state update based method (see Section 2.2). This approach benefits from the advantages of both classic models and can support easily tractable, flexible and context-sensitive interaction in any complex interactive system. With the FormDia framework, several unified dialogue models were developed, implemented and integrated into the interactive systems covered in the other two research packages of this thesis. Furthermore, the unified dialogue model implemented in a multimodal interactive system was evaluated in an empirical study. The effectiveness of the unified dialogue model was highlighted by the positive empirical results based on a standard statistical method.
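The workflow of specifying an interaction model, checking it exhaustively and then replaying event traces against it can be illustrated with a small stand-in. The sketch below is neither CSP nor FDR2 nor the FormDia API: it assumes the interaction model is given as a plain labelled transition system in a Python dictionary, and all state and event names are hypothetical. It shows one property a model checker would typically examine (absence of deadlock) and the kind of trace replay a simulator offers.

```python
from collections import deque

# Hypothetical interaction model as a labelled transition system:
# state -> {event: next_state}. All state and event names are illustrative.
MODEL = {
    "idle": {"user_request": "clarify"},
    "clarify": {"user_confirm": "execute", "user_reject": "idle"},
    "execute": {"task_done": "idle"},
}


def reachable_states(model, start):
    """Breadth-first exploration of every state reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for target in model.get(state, {}).values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def deadlocks(model, start):
    """Reachable states that offer no event at all."""
    return sorted(s for s in reachable_states(model, start) if not model.get(s))


def accepts(model, start, trace):
    """Simulate a trace of events; True if every event is offered in turn."""
    state = start
    for event in trace:
        if event not in model.get(state, {}):
            return False
        state = model[state][event]
    return True


if __name__ == "__main__":
    print(deadlocks(MODEL, "idle"))
    print(accepts(MODEL, "idle", ["user_request", "user_confirm", "task_done"]))
```

Removing the outgoing events of `execute` makes `deadlocks` report that state, which is the kind of modelling error that is cheaper to catch on the abstract model than in a running interactive system.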
Spatial Interaction. In general, this research package has treated in depth the following aspects of Human-Robot Interaction and Cognitive Science: the management and formalization of, and reasoning with, spatially-related knowledge.
The most important contribution of this package is the development of a general four-level conceptual model based on qualitative spatial representation and reasoning (see Section 3.1). This model is called QSBM, the Qualitative Spatial Beliefs Model. It is used to support effective, efficient and user-friendly interaction between human operators and mobile robots performing spatial navigation tasks. Specifically, the conceptual model can be used by mobile robot systems to represent spatial environments based on qualitative spatial knowledge; with model-based qualitative spatial reasoning, application-dependent low-level update rules can be implemented to manage the state of the situated environment; based on the low-level update rules, model-based conceptual strategies can be developed for high-level spatially-related human-robot interaction; and finally, the multi-level structure ensures high development flexibility and extensibility to support broader application possibilities.
As the next contribution, a DCC-based QSBM was developed by combining the conventional route graph ([Werner et al., 2000]) and the double cross calculus (DCC) ([Freksa, 1992]) (see Section 3.2). As a result, this model benefits from the topological structure of a route graph for global navigation and from the qualitative spatial DCC relations for intuitive communication with human operators. Accordingly, a set of low-level update rules was defined based on qualitative spatial representation and reasoning with DCC. These rules correspond to the most atomic route instructions one can use to instruct a mobile robot. Furthermore, following the principle of the general QSBM, a set of high-level conceptual strategies was developed and applied to manage the low-level update rules. Finally, these strategies are used to generate clarification dialogues for resolving frequently occurring conceptual mode confusions caused by the disparity between the human's mental and the robot's internal representations of spatial environments.
To support the implementation of the QSBM, the low-level update rules and the high-level conceptual strategies, a computational framework called SimSpace was developed (see Section 3.3). On the one hand, SimSpace can be used as a stand-alone system for implementing, visualizing, simulating and testing QSBM-based instances of spatial environments and the QSBM-based functions; on the other hand, SimSpace is also well encapsulated as a domain-dependent model component, i.e., it can be directly integrated into an interactive system and used by a mobile robot to support spatially-related interaction with human operators.
Last but not least, empirical studies were conducted to evaluate an interactive system that implemented the QSBM-based models and functions (see Section 3.4). The evaluation was conducted in particular to compare the implemented set of high-level conceptual strategies. The positive results regarding the effectiveness, efficiency and user satisfaction of the interactive system confirmed the contributions of the QSBM-based model, the computational framework SimSpace and the conceptual strategies.
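The interplay of a route graph, an atomic route instruction and a clarification request can be sketched as follows. This is a toy illustration under stated assumptions, not the QSBM, its update rules or the SimSpace interface: the places, their coordinates, the 45 and 135 degree thresholds and the function names are all hypothetical.

```python
import math

# Hypothetical route graph: named places with metric coordinates and the
# directed route segments leaving each place. All names are illustrative.
PLACES = {"hall": (0, 0), "junction": (0, 5), "kitchen": (-4, 5), "lab": (0, 9)}
SEGMENTS = {"hall": ["junction"], "junction": ["kitchen", "lab", "hall"]}


def turn_direction(prev, here, nxt):
    """Qualitative direction of the segment here -> nxt when arriving from prev."""
    (px, py), (hx, hy), (nx, ny) = PLACES[prev], PLACES[here], PLACES[nxt]
    heading = math.atan2(hy - py, hx - px)
    bearing = math.atan2(ny - hy, nx - hx)
    # Signed turn angle in degrees, normalised to the range [-180, 180).
    angle = (math.degrees(bearing - heading) + 180) % 360 - 180
    if abs(angle) < 45:      # assumed threshold
        return "straight"
    if abs(angle) > 135:     # assumed threshold
        return "back"
    return "left" if angle > 0 else "right"


def interpret(prev, here, instruction):
    """Map an atomic instruction to a move, or ask for clarification.

    If two segments share a direction, this toy version keeps only the last one.
    """
    options = {turn_direction(prev, here, n): n for n in SEGMENTS.get(here, [])}
    if instruction in options:
        return ("move", options[instruction])
    # The operator's mental map and the robot's route graph disagree here.
    return ("clarify", sorted(options))


if __name__ == "__main__":
    print(interpret("hall", "junction", "left"))
    print(interpret("hall", "junction", "right"))
```

When the instruction has no counterpart in the robot's representation, the function returns the directions that are actually available, which is the raw material a clarification dialogue would present to the operator.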
Multimodal Interaction in AAL. In general, this research package has dealt extensively with the following aspects of Multimodal Interaction and Ambient Assisted Living: the development and evaluation of multimodal and AAL-centered interaction.
The first contribution of this package is the general support of design, development and implementation of multimodal interaction for elderly persons in AAL environments (see Section 4.1). As the general foundation of this contribution, two important aspects were highlighted: a) a list of elaborated design and development guidelines based on the consideration of the traditional design principles of conventional multimodal inter-active systems, and the most common age-related decline of sensory, perceptual, motor and cognitive abilities of elderly persons; and b) a formal language supported unified dialogue model that combines a recursive transition network based generalized dia-logue model and a classic agent based management method, which is used to support flexible and context-sensitive, yet formally tractable and extensible multimodal inter-action for elderly persons. According to the two development foundations, MIGSEP, the Multimodal Interactive Guidance System for Elderly Persons was developed and implemented (see Section 4.2). The MIGSEP system runs on a portable touch-screen tablet PC and serves as the interactive assistant; it is intended to be used by an elderly or handicapped person seated in an electronic wheelchair that can navigate its user within complex spatial environments autonomously.
As the second contribution (see Section 4.3), via the cooperation with the department of medical psychology, medical sociology and neurology at the university medical center Göttingen, a series of empirical studies was conducted with groups of elderly persons. These studies systematically evaluated the carefully developed multimodal interactive system with respect to the touch-screen, the spoken language and the combination of both as input modalities, while enabling a continuously improved development process based on the subjective and objective data of each empirical study. Furthermore, a general model based evaluation framework was accordingly developed and proposed to analyze and compare the empirical multimodal data. The overall positive results showed high effectiveness of task performance, high efficiency of interaction and good user satisfaction with the interactive system. These findings also provided evidence for the systematically developed and empirically improved design and development guidelines, foundations, interaction models and frameworks for supporting effective, efficient and elderly-friendly multimodal interaction in AAL environments.
Chapter 2.
Formal Interaction
As the principal aspect of the dissertation, this research package investigated the design and development of general models and frameworks to support robust, formally tractable and manageable, flexible and context-sensitive interaction. On the one hand, these models and frameworks can be applied to any possible application domain involved with interaction or interactive systems; on the other hand, they can also be used to support the development of the interaction management component for the other two research packages in this dissertation. In this package, the major work effort has been concentrated on two essential research aspects: a) a complete solution process featuring a formal language based dialogue modelling and management framework to enable the development of a formally tractable and manageable interaction modelling and management; and b) the development of a unified dialogue modelling approach that combines generalized dialogue modelling and the classic agent-based information state update management theory to support an easily tractable, flexible and context-sensitive interaction.
This chapter briefly introduces the contributed work as follows: the solution process with the formal language based dialogue modelling and management is presented in section 2.1, the development and implementation of the unified dialogue modelling and management approach is introduced in section 2.2, then the corresponding publications contributing to this research package are summarized in section 2.3, and finally the possible future work related to this package is given in section 2.4.
2.1. The Formal Language based Dialogue Modelling and
Management
Correctness and robustness are two of the most important properties of interaction or interactive systems. However, to test whether an interactive system is correct or robust is usually a cumbersome and costly process. As introduced in subsection 1.2.1, interaction models can be represented as finite state transition networks. Meanwhile, formal languages can be used to specify finite state transition networks, and the formal language specification can be analyzed, tested and validated by theorem provers and model checkers (cf. [Roscoe et al., 1997, (Europe), 2010]). Therefore, an interaction modelling and management solution process featuring a formal language based development framework is developed and used to support the design, development and implementation of a formally tractable and manageable interaction and its integration into interactive systems.
Figure 2.1.: The formal language based solution process.
2.1.1. The Formal Language based Interaction and Modelling Solution
Process
Figure 2.1 illustrates the formal language based interaction modelling and management solution process, which consists of the following four important steps:
1) Semantic Interaction Modelling: Based on empirical data, interaction models can be constructed and illustrated as finite-state transition networks with straightforward interaction structures, which also ease the development process of semantic interaction models in a preliminary manner. The semantic models can be quite corpus dependent and contain details about the interaction context within the given corpus, or be built at the illocutionary level without references to any direct surface indicators (cf. e.g. [Sitter and Stein, 1992]).
2) Formal Specification: To bridge the gap between semantic models and machine-readable code, the formal language Communicating Sequential Processes (abbr. CSP, cf. [Hoare, 1978]) is used to specify the semantic model based finite state transition networks with an abstract, yet highly readable and easily maintainable logic formalization (see the sample CSP specification of the simple semantic model in step 2) of figure 2.1).
3) Testing and Validation: CSP specifications can be loaded into the model checker FDR ([Roscoe, 1994]) for validating the functional concurrent properties, enabling the further development and improvement of the CSP specifications, and therefore increasing the tractability of the semantic interaction models.
Figure 2.2.: The architecture of the FormDia framework.
4) Integration: Finally, the CSP specifications can be imported into the formal dialogue development framework (abbr. FormDia) to support a further verification, simulation, development and implementation process as well as a direct integration into interactive systems such as the shown speech-enabled home device (e.g. [Heise.de, 2009]), a simple interaction assistant for an ambient assisted living environment (e.g. [Krieg-Brückner, 2013]), or a system enabling multimodal interaction in vehicles. The concrete details of FormDia are introduced in the next subsection.
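As an illustration of steps 1) to 3), the following Python sketch encodes a simple assertion dialogue as a finite-state transition table and runs two elementary checks on it. It is a minimal sketch, not the FormDia implementation: the state names, the event names, the helper functions run and stuck_states, and the assumption that accepting or agreeing ends the exchange are all introduced here for illustration. The check for stuck states only hints at the much richer analyses that a model checker such as FDR performs on a CSP specification.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# All state and event names are illustrative labels, not FormDia identifiers.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Assert(A,B)", "A.assert"): "React(B)",
    ("React(B)", "B.accept"): "Done",         # assumption: accepting ends the exchange
    ("React(B)", "B.agree"): "Done",          # assumption: agreeing ends the exchange
    ("React(B)", "B.reject"): "Assert(B,A)",  # a rejection swaps the roles
    ("Assert(B,A)", "B.assert"): "React(A)",
    ("React(A)", "A.accept"): "Done",
    ("React(A)", "A.agree"): "Done",
    ("React(A)", "A.reject"): "Assert(A,B)",
}
FINAL_STATES: Set[str] = {"Done"}


def run(trace: Sequence[str], start: str = "Assert(A,B)") -> Optional[str]:
    """Follow a sequence of events; return the reached state, or None if a step is not allowed."""
    state = start
    for event in trace:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return None
        state = next_state
    return state


def stuck_states() -> List[str]:
    """List non-final states without any outgoing transition (a toy deadlock check)."""
    sources = {source for source, _ in TRANSITIONS}
    targets = set(TRANSITIONS.values())
    return sorted((sources | targets) - sources - FINAL_STATES)
```

With this table, a trace in which B rejects and then asserts in turn is followed through the role swap, while a trace that starts with a reaction is refused because no assertion has been made yet.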
2.1.2. The Formal Dialogue Development Framework
Based on the previous research work on the development and implementation of formal language based dialogue models (cf. [Shi et al., 2005, Shi and Bateman, 2005]), the formal dialogue development framework (abbr. FormDia) was further developed. Figure 2.2 illustrates the improved architecture of the FormDia framework. The current FormDia comprises six functional resources/components according to the development process of a formal language based dialogue model from a practical perspective, which includes its development, implementation and integration as an interaction management component into a practical interactive system. Specifically:
1. CSP Specification: As introduced, e.g., in figure 2.1, every dialogue model can be illustrated as a finite state transition network and accordingly, the structure of the finite state transition network can be specified as a CSP specification, i.e., a machine-readable CSP program.
2. Validator: The CSP specification can be validated by the model checking toolkit called Failures-Divergence Refinement (abbr. FDR, cf. [(Europe), 2010]). This toolkit can be used to validate the functional properties of any CSP specification.
3. Generator: According to the validated CSP specification, machine-readable finite state automata can be generated by the Generator.
4. Channels: Based on the finite state automata, communication channels regarding all the generated finite states can be defined with domain specific information and handling mechanisms. These channels are only black boxes at the beginning, which will then be implemented with deterministic behaviour of concrete components with respect to their application contexts.
5. Simulator: The Simulator uses the generated finite state automata to simulate dialogue scenarios via an external graphical interface (uDrawGraph, cf. [BKB, 2005]), which can visualize the dialogue model as a finite state transition network based directed graph. With the corresponding communication channels, either black boxes or implemented ones, a set of utility functions is also provided by the Simulator to generate dialogue events and trigger the state transition for the advanced verification of the dialogue model within simulated dialogue scenarios.
6. Dialogue Management Driver: After the validation and verification, the dialogue model based finite state automata and the communication channels are integrated into the dialogue management driver, which can then be directly used by a practical interactive dialogue system as the interaction management component.
The FormDia framework can be used as a general interaction modelling framework to develop and implement a formal language based dialogue model to enable formally tractable and extensible interaction. Furthermore, the framework can also be used to support the unified dialogue modelling and management approach, by implementing the Dialogue Management Driver and the Communication Channels with information state update based components (see section 2.2).
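The interplay of automaton, channels and driver described above can be sketched in a few lines of Python. This is a hedged illustration and not the FormDia API: the class DialogueDriver, its methods bind and fire, the function black_box, and the simplification of attaching one channel to each event are hypothetical choices made for this sketch.

```python
from typing import Callable, Dict, List, Tuple

Channel = Callable[[str], str]


def black_box(event: str) -> str:
    """Placeholder channel: records the event without any domain behaviour."""
    return f"[stub] {event}"


class DialogueDriver:
    """Hypothetical driver that steps an automaton and calls one channel per event."""

    def __init__(self, transitions: Dict[Tuple[str, str], str], start: str) -> None:
        self.transitions = transitions
        self.state = start
        # Every event starts with a black-box channel, as during early simulation.
        self.channels: Dict[str, Channel] = {event: black_box for (_, event) in transitions}
        self.log: List[str] = []

    def bind(self, event: str, channel: Channel) -> None:
        """Replace the black box of an event with a concrete implementation."""
        if event not in self.channels:
            raise KeyError(f"unknown event: {event}")
        self.channels[event] = channel

    def fire(self, event: str) -> bool:
        """Trigger a transition if the automaton allows it in the current state."""
        target = self.transitions.get((self.state, event))
        if target is None:
            return False
        self.log.append(self.channels[event](event))
        self.state = target
        return True
```

In this reading, simulation and deployment differ only in which channels are bound: the same automaton runs first against black boxes and later against application-specific handlers.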
2.2. The Unified Dialogue Modelling and Management
Approach
As a typical finite state transition network based approach, generalized dialogue models (cf. [Sitter and Stein, 1992]) were developed by structuring dialogues at the illocutionary level (cf. [Alston, 2000]) to enable surface-independent dialogue modelling. However, this modelling approach is criticized for lacking flexibility in handling dynamic information exchange. Meanwhile, the information state update based dialogue management theory was proposed by [Traum and Larsson, 2003]; it provides a powerful mechanism to deal with dynamic information and therefore achieves a context-sensitive dialogue management. Nevertheless, such models are usually very difficult to manage and extend ([Ross et al., 2005]). Thus, a unified dialogue modelling approach was developed by combining the generalized dialogue models with the information state update based theories.
Figure 2.3 illustrates how a unified dialogue model is developed based on a generalized dialogue model with information state update rules. Specifically:
Figure 2.3.: The development process of a unified dialogue model.
• Figure 2.3 a) shows a simple generalized dialogue model as a finite state based recursive transition network (abbr. RTN). It describes the dialogue situations where an agent A is making an assertion at the beginning, followed by the agent B’s reaction with one of the three possible actions: accepting, agreeing on or rejecting A’s assertion; if B rejects A’s assertion, then B makes a follow-up assertion to A and triggers the recursive transition.
• The generalized dialogue model in figure 2.3 a) is a non-deterministic model, i.e., no mechanism is defined about how B reacts to A’s assertion. However, in order to build a feasible interaction model, deterministic behaviour should be assured for the interaction flow. Thus, conditional transitions are introduced to modify the original dialogue model into the one in figure 2.3 b), where checkAssert is a function to check whether an assertion holds within the knowledge base of B: if the assertion holds, B agrees with it; if it does not hold, B rejects it and initiates further discussion with a follow-up assertion; and if the assertion is not known by B, B accepts it. As a result, the original dialogue model was modified into a conditional RTN with conditional transitions that can only be triggered if the relevant condition is fulfilled with respect to the concerned checking function.
• Although the conditional RTN based generalized dialogue model specifies a deterministic illocutionary structure, it does not provide the mechanism to integrate discourse information, such as the assertion during the interaction. Thus, information state update based theory was accordingly applied by a) ignoring the typical element in the original information state update theory, i.e. the AGENDA for containing the next planned dialogue moves, since such information is already captured by the structure of the generalized dialogue model; and b) complementing the illocutionary structure with information state based update rules, which are associated with the information state of the discourse context and can update the information state if necessary. As a result, a unified dialogue model is constructed as shown in figure 2.3 c), where four update rules, ASSERT, ACCEPT, AGREE and REJECT, are added and used to access the information state regarding context while performing updates accordingly. E.g. the update rule ACCEPT is used to add a new assertion into the knowledge base of B and considers this assertion as known by B from then on; or the update rule AGREE is used to insert the acknowledgement of the assertion into the topic under discussion.
In general, a unified dialogue model is developed as a recursive transition network with the following three essential features: a) it is built at the illocutionary level of interaction processes as a generalized dialogue model; b) its state transitions can only be triggered by fulfilled conditions concerning the information state; and c) a set of information state based update rules are defined and accordingly invoked during state transitions to update the information state if necessary. Therefore, a unified dialogue model benefits from both the generalized dialogue model for an easily tractable and manageable dialogue management and the information state update based theory for a flexible and context-sensitive interaction control.
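The combination of conditional transitions and update rules can be made concrete with a small Python sketch of B's reaction to an assertion. It is a minimal sketch under stated assumptions: the class InformationState, the functions check_assert and react, and the string encoding of entries in the topic under discussion are hypothetical. The effects of ACCEPT and AGREE follow the description above, whereas the effects shown for ASSERT and REJECT are assumptions, since the text does not spell them out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InformationState:
    knowledge: Dict[str, bool] = field(default_factory=dict)  # knowledge base of B
    topic: List[str] = field(default_factory=list)            # topic under discussion


def check_assert(state: InformationState, proposition: str) -> Optional[bool]:
    """True if the assertion holds for B, False if it does not, None if it is unknown."""
    return state.knowledge.get(proposition)


def react(state: InformationState, proposition: str) -> str:
    """Choose B's dialogue move via the condition and apply the matching update rule."""
    state.topic.append(f"assert:{proposition}")        # ASSERT (assumed effect)
    verdict = check_assert(state, proposition)
    if verdict is None:
        state.knowledge[proposition] = True            # ACCEPT: assertion becomes known to B
        return "B.accept"
    if verdict:
        state.topic.append(f"ack:{proposition}")       # AGREE: acknowledgement enters the topic
        return "B.agree"
    state.topic.append(f"rejected:{proposition}")      # REJECT (assumed effect)
    return "B.reject"
```

The transition taken depends only on the information state, so the network stays deterministic while the same illocutionary structure yields different moves in different contexts.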
With the introduced formal language based solution process and the FormDia framework in section 2.1, unified dialogue models can be developed and implemented with corresponding CSP specifications and domain-dependent channel drivers, and integrated into practical interactive systems for supporting unified dialogue model based interaction management. E.g., a unified dialogue model was used to incorporate spatial knowledge as information state context to support a spatially related interaction for human-robot collaborative navigation (see section 3.4); another unified dialogue model was implemented into a multimodal interactive guidance system for elderly persons (see section 4.1.2) and evaluated with a series of empirical studies. Furthermore, an evaluation was conducted specifically on the effectiveness of a unified dialogue model with a standard statistical method at the dialogue model level (see section 2.3).
2.3. Contribution of the Corresponding Publications
Major effort has been put into a general interaction modelling and management solution process with a formal language based dialogue modelling and management framework, as well as the development, implementation and empirical evaluation of a unified dialogue modelling approach to support a robust, flexible and context-sensitive, yet formally manageable and extensible interaction. Specifically,
• Based on the previous work on using a formal method to support dialogue management ([Shi et al., 2005, Shi and Bateman, 2005]), a formal language based development toolkit for dialogue modelling was developed and improved, which enables an intuitive design of interaction models with a formal language, easy validation and verification for the formal language specified interaction models, and a straightforward integration into practical interactive systems ([7]).
• By combining the conventional recursive transition network based modelling and agent-based dialogue theory, a unified dialogue modelling and management approach was proposed. Then as a practical example, a unified dialogue model was implemented and integrated with the formal language based toolkit into a practical interactive system as the interaction model to support multimodal interaction ([7, 4, 5, 2]).
• Using a standard statistical method, the kappa coefficient, the task success of the implemented unified dialogue model was evaluated through an empirical study in [11]. The positive results showed that the unified dialogue model is highly effective.
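For readers unfamiliar with the kappa coefficient mentioned in the last item, the following Python sketch computes Cohen's kappa from a confusion matrix as (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed agreement and P(E) the agreement expected by chance. This is a generic textbook formulation given as an assumption: the exact variant, the matrix layout and the data used in [11] are not stated here, and the function name kappa is hypothetical.

```python
from typing import List


def kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: expected, columns: observed)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    size = len(confusion)
    # P(A): share of cases on the diagonal, i.e. where both sides agree.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # P(E): agreement expected by chance from the row and column marginals.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(size)
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement, while a value of 0 indicates agreement no better than chance.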
2.4. Possible Future Work
The reported work aimed to provide general methods, approaches and frameworks to support the development process of interaction management. Relating to the conducted research, further work effort can be concentrated upon the following aspects:
• Currently, the first step of the formal language based solution process, i.e., the semantic modelling of interaction, is usually performed in a hand-crafted way, where developers construct the semantic models based on the subjective evaluation of empirical data. In this situation, not only can difficulties arise with larger empirical corpora, but unforeseen modelling errors are also likely to occur due to individual subjectiveness. Therefore, machine learning techniques can be applied to learn the semantic models from empirical data, e.g. with the semantically annotated empirical corpora based on the work of [Shi et al., 2010].
• Although the unified dialogue model can support a flexible and context-sensitive interaction management with the integrated information state update theory, interaction with adaptive behaviours has been gaining increasing interest in recent years. Much research has been focusing on using reinforcement learning to optimize interaction behaviour either with collected empirical data or during the interaction with real users (cf. [Lecoeuche, 2001, Li et al., 2009, Pietquin et al., 2011]). Based on this research work, different reinforcement learning methods can also be implemented into the dialogue management driver of FormDia to support not only formally tractable and manageable, but also context-tailored adaptive interaction.
Chapter 3.
Spatial Interaction
This research package has focused essentially on the following two important issues in the scenarios of human-robot collaborative spatial navigation:
• To control the route along which a mobile robot should go, human operators usually use natural language instructions that contain only qualitative spatial relations and conceptual landmarks (see e.g., [Werner et al., 1997, Hirtle, 2008]), while the mobile robot uses a quantitative representation as the internal model of the spatial environment and can usually accept route instructions consisting of only quantitative data;
• It is a rather complex process for human operators to provide a sequence of instructions to a mobile robot for route planning, since spatially-related communication problems could easily occur if a route direction is mistakenly given or spatial objects are incorrectly localized ([Reason, 1990, Bugmann et al., 2004, Marge and Rudnicky, 2010]).
Therefore, in order to a) bridge the communication gap between human operators and mobile robots, which is caused by the qualitative and quantitative disparity of their representations of space, and b) support the collaborative negotiation of spatially-related communication problems in the sequence of route instructions, a qualitative spatial knowledge based four-level conceptual model, the Qualitative Spatial Beliefs Model (abbr. QSBM), was developed and used as the foundation of this research package for supporting intuitive human-robot spatially-related interaction.
This chapter briefly presents the major focus of this research package as follows: the general Qualitative Spatial Beliefs Model is introduced in section 3.1, followed by a qualitative spatial calculus dependent instance of QSBM in section 3.2; then a computational framework that implements the QSBM model and the model based functions is presented in section 3.3; the empirical studies conducted on resolving spatially-related communication problems using QSBM are reported in section 3.4; finally, the contribution of the corresponding publications is summarized in section 3.5 and the possible future work is given in section 3.6.
3.1. QSBM: A general Qualitative Spatial Beliefs Model
From the perspective of human operators, spatial environments are not represented with quantitative data as a mobile robot does, but with conceptual objects or places and their qualitative spatial relations. For human operators to communicate with mobile robots for spatial navigation tasks, an intermediate knowledge representation is needed. Accordingly, the Qualitative Spatial Beliefs Model (QSBM), a qualitative spatial knowledge based four-level conceptual model, was developed to model a mobile robot’s beliefs to support human-robot collaborative navigation. The general QSBM is illustrated in Figure 3.1 and introduced as follows:
Figure 3.1.: The General Qualitative Spatial Beliefs Model.
• The basic level QSR Model level refers to the most basic theoretical foundation of a QSBM: the qualitative spatial calculi to meet the requirement of different application scenarios, such as Double-Cross Calculus ([Freksa, 1992]), Cardinal Directions ([Frank, 1996]), 9+ Intersection ([Kurata, 2008]), etc.
• The bottom level Conceptual Level holds the fundamental conceptual model of the QSBM: a conceptual map with only conceptual objects and qualitative spatial relations regarding a chosen qualitative spatial calculus. It facilitates the most basic calculating and reasoning operations related to the connection between the chosen calculus and the spatial environment. It is used as a black box holding a conceptual qualitative spatial knowledge based representation of a spatial environment. It provides two basic functions: Qualify for qualifying quantitative data into calculus-based qualitative relations, and CalculateRelation for calculating qualitative spatial relations between objects using calculus-based qualitative spatial reasoning.
• The middle level Application Level consists of a set of application-dependent update rules corresponding to the most atomic route instructions one can use to instruct a mobile robot in human-robot collaborative spatial navigation. For instance, the update rule Reorientation refers to the instruction “turn left”, Redirection interprets “take the next junction on the right”, Feature-based Motion concerns instructions with features of objects or landmarks, such as “go around the big room” ([12]), and Learning-based Motion represents those instructions enabling the robot to update its conceptual knowledge by acquiring new landmarks from given instructions, such as “the third office is the directory’s office, pass by it”, etc. According to the formal definition, each update rule is used to update the conceptual map on the conceptual level, based on a chosen calculus and the related qualitative spatial reasoning on the QSR Model level.
• The top level Strategy Level includes a set of high-level conceptual strategies for interpreting a sequence of route instructions and, if possible, providing qualitative spatial knowledge based information to resolve the spatially-related communication problems during the human-robot collaborative spatial navigation. In practice, each conceptual strategy defines its own mechanism for appropriately choosing and applying atomic update rules on the application level to update the conceptual map on the conceptual level.
In general, QSBM a) provides a conceptual model with qualitative spatial knowledge to represent spatial environments; b) applies qualitative spatial reasoning to support application-dependent low-level update rules to update the conceptual representation; and c) offers high-level conceptual strategies to manage the atomic update rules to support high-level spatially-related human-robot collaborative navigation. Benefiting from the flexibility and extensibility of the multi-level structure, different qualitative spatial calculi can be used on the QSR model level to support various application scenarios, application-dependent atomic actions can be easily defined and extended on the application level, or different high-level conceptual strategies can also be developed with respect to different ways of applying update rules for resolving communication problems, while each of these changes/extensions requires only limited adaptation on the other levels in QSBM.
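The idea behind the Qualify function of the conceptual level, i.e. turning quantitative coordinates into a qualitative relation, can be illustrated with a coarse geometric sketch in Python. The sketch is a deliberate simplification and not an implementation of a specific calculus: it distinguishes only nine labels relative to a directed segment from a to b, whereas a calculus such as the Double-Cross Calculus draws finer distinctions. The function name qualify, the coordinate representation and the label names are assumptions made for this sketch.

```python
from typing import Tuple

Point = Tuple[float, float]


def qualify(a: Point, b: Point, p: Point, eps: float = 1e-9) -> str:
    """Coarsely describe where p lies relative to the directed segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq <= eps:
        raise ValueError("a and b must be distinct points")
    # Sign of the cross product: positive means p is to the left of a->b.
    cross = dx * (py - ay) - dy * (px - ax)
    # Normalized projection onto a->b: 0 at a, 1 at b.
    along = (dx * (px - ax) + dy * (py - ay)) / length_sq
    side = "Left" if cross > eps else "Right" if cross < -eps else ""
    zone = "Front" if along > 1 + eps else "Back" if along < -eps else ""
    return (side + zone) or "OnSegment"
```

For a robot at (0, 0) facing (0, 1), a point at (-1, 2) is qualified as LeftFront and a point at (1, 2) as RightFront, which is the kind of relation a conceptual map stores instead of coordinates.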
3.2. A DCC-based QSBM
According to the requirement of the focused scenario of this research package, double-cross calculus (DCC) is used on the basic QSR model level. Accordingly, a DCC-based Qualitative Spatial Beliefs Model is developed and a brief introduction to the other levels of the DCC-based QSBM is given as follows:
3.2.1. The Conceptual Level
On the one hand, as a common knowledge base of a mobile robot involved in spatial navigation, the Route Graph was proposed in [Werner et al., 2000], which models the conceptual topological knowledge on the cognitive level in navigation space from a human’s perspective. Route graphs can be used as metrical maps in global navigation for mobile robots and ease the interaction with human operators to a certain extent. However, lacking qualitative spatial relations between objects, conventional route graphs are not suitable for supporting natural language based human-robot collaborative navigation. On the other hand, the Double-Cross Calculus (DCC) was proposed in [Freksa, 1992] for qualitative spatial representation and reasoning using the conceptual orientation grids, where a directed segment divides the 2-dimensional space into disjoint grids and can define 15 meaningful qualitative spatial relations. The DCC model can therefore describe the relative relations between objects in the local navigation map from an egocentric perspective. However, the conventional DCC model does not consider the topological relations within global navigation maps.
To benefit from the two well-accepted conceptual models, the conceptual route graph (abbr. CRG) was developed by combining the topological structure of conventional route graph and the conceptual orientation grids of Double-Cross Calculus. Instead of quantitative information, DCC qualitative spatial relations are used to describe all the relative relations between route graph nodes and route graph segments. Formally, a CRG is defined as a tuple of four elements, (M, P, V, R), where
• M is a set of conceptual landmarks in a spatial environment, each of which is located at a place in P.
• P is a set of topological places on the conceptual level of a spatial environment.
• V is a set of vectors from a source place to a target place, both of which belong to P.
• R is a set of DCC based qualitative spatial relation-pairs, describing the qualitative spatial relations between each place and related vectors that define the orientation grids of DCC.
As a simple example, a CRG can be represented as the following specification:
crg = ( M = {kitchen : p1, printer: p2}, P = {p1, p2, A, B},
V = {AB, BA},
R = {<AB, RightFront, p1>, <AB, LeftBack, p2>} )
This specification describes a spatial environment containing two landmarks, kitchen, located at p1, and printer, located at p2, and two vectors AB and BA. The relation-pairs state that the kitchen is at the right-front of AB and the printer is at the left-back of AB.
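The tuple (M, P, V, R) and the example above map naturally onto a small data structure. The following Python sketch is one possible encoding, given as an assumption rather than as the representation used in the implemented system: the class ConceptualRouteGraph, its field names and the methods problems and relation_to are hypothetical, and vectors are encoded as pairs of place names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Vector = Tuple[str, str]            # (source place, target place)
Relation = Tuple[Vector, str, str]  # (vector, relation name, place)


@dataclass
class ConceptualRouteGraph:
    landmarks: Dict[str, str]  # M: landmark name -> place where it is located
    places: Set[str]           # P
    vectors: Set[Vector]       # V
    relations: Set[Relation]   # R

    def problems(self) -> List[str]:
        """Return consistency problems; an empty list means the tuple is well formed."""
        found: List[str] = []
        for name, place in sorted(self.landmarks.items()):
            if place not in self.places:
                found.append(f"landmark {name} is located at unknown place {place}")
        for source, target in sorted(self.vectors):
            if source not in self.places or target not in self.places:
                found.append(f"vector {source}{target} leaves the set of places")
        for vector, name, place in sorted(self.relations):
            if vector not in self.vectors or place not in self.places:
                found.append(f"relation {name} refers to an unknown vector or place")
        return found

    def relation_to(self, vector: Vector, place: str) -> Optional[str]:
        """Look up the stored relation of a place with respect to a vector."""
        for stored_vector, name, stored_place in self.relations:
            if stored_vector == vector and stored_place == place:
                return name
        return None


# The example specification from the text.
crg = ConceptualRouteGraph(
    landmarks={"kitchen": "p1", "printer": "p2"},
    places={"p1", "p2", "A", "B"},
    vectors={("A", "B"), ("B", "A")},
    relations={(("A", "B"), "RightFront", "p1"), (("A", "B"), "LeftBack", "p2")},
)
```

Looking up the kitchen's place with respect to the vector AB then yields RightFront, matching the reading of the specification given above.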
The model of the Conceptual Route Graph provides a semantic framework for supporting human-robot collaborative navigation: it allows the intuitive interpretation of human route instructions and the straightforward presentation of internal feedback from a mobile robot through DCC-based qualitative spatial representation and reasoning. At the same time, it can be used as a direct interface with the low-level mobile robot system for performing navigation tasks via the topological structure of the conventional route graph.
With the conceptual route graph as the conceptual map on the conceptual level of the DCC-based QSBM, the state of a DCC-based QSBM can be specified as a tuple of two elements: (crg, pos), where “crg” represents the conceptual route graph, and “pos” represents a vector of the current position and orientation of a mobile robot in the given conceptual route graph.
3.2.2. The Application Level
In order to support human-robot spatially-related interaction, natural language based route instructions from a human operator should be interpreted to update the state of a mobile robot’s QSBM instance, so that possible feedback regarding the interpretation can be transferred back to the human operator. According to the empirical studies on human-robot collaborative navigation (cf. [Bugmann et al., 2004, Roger et al., 2007, Shi and Tenbrink, 2009]) and the previous research effort related to natural language, cognitive models and route instructions (cf. [Denis, 1997, Tversky and Lee, 1998, Lauria et al., 2002]), a set of update rules regarding the most atomic route instructions one can use to instruct a mobile robot were developed.
Formally, each update rule is defined with the following three elements:
• RULE refers to the name that identifies an atomic type of route instructions.
• PRE is a set of preconditions, under which this update rule can be applied.
• EFF describes how the state of the QSBM is updated after applying this update rule.
For brevity, two update rules are presented as examples as follows, while the other update rules can be found in the contributing publications in section 3.5.
Reorientation refers to the simplest route instructions, which are used to change the orientation of a robot regarding its current position. “Turn left”, “Turn right” and “Turn around” are the typical expressions of such instructions. The precondition for Reorientation is that the robot can find a CRG place in the current state of QSBM that satisfies the following two conditions: 1. it is connected with the current position, and 2. it has the desired spatial relation with the current position; the effect is that the robot faces the found CRG place after the reorientation. Formally it is described as:
RULE: Reorientation
PRE: pos = ab,
     ∃ ac ∈ V . <ab, dir, c>
EFF: pos = ac
Concretely, this rule indicates that the robot is currently at the place a and facing the place b (ab is a CRG vector). If there exists a CRG vector ac with a target place c, such that the spatial relation of c with respect to the vector ab (i.e. the current position) is the desired direction dir to turn, i.e., <ab, dir, c>, then the current position is updated to ac after applying this update rule.
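The Reorientation rule can be read as a search over the vectors that start at the robot's current place. The following Python sketch shows this reading under simplifying assumptions: relations are looked up as stored triples instead of being derived by qualitative spatial reasoning, and the function name reorientation, the tuple encoding and the relation labels Left and Right in the example data are hypothetical.

```python
from typing import Optional, Set, Tuple

Vector = Tuple[str, str]            # (source place, target place)
Relation = Tuple[Vector, str, str]  # (vector, relation name, place)


def reorientation(
    vectors: Set[Vector],
    relations: Set[Relation],
    pos: Vector,
    direction: str,
) -> Optional[Vector]:
    """Return the new position ac if PRE holds, or None if the rule is not applicable."""
    current_place, _facing = pos
    for source, target in sorted(vectors):
        connected = source == current_place               # ac starts where the robot stands
        if connected and (pos, direction, target) in relations:
            return (source, target)                       # EFF: pos = ac
    return None


# Illustrative data: from place a, c lies to the left and d to the right of ab.
VECTORS: Set[Vector] = {("a", "b"), ("a", "c"), ("a", "d")}
RELATIONS: Set[Relation] = {(("a", "b"), "Left", "c"), (("a", "b"), "Right", "d")}
```

Returning None when the precondition fails gives a calling component the chance to report the failed interpretation back to the operator instead of silently ignoring the instruction.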
Passing Motion relates to the route instructions containing an external landmark to be passed by, e.g. “pass the kitchen” or “pass the printer on the right” with directional information. For these route instructions, the robot should first identify the landmark and then check whether the landmark can be passed by along the current directed path. Furthermore, the desired passing direction should be considered as well, if the direction for passing the landmark is given. Accordingly the update rule PassingLeft for passing a landmark on the left is specified as:
RULE: PassingLeft
PRE: pos = ab,
     ∃ cd ∈ V . (landmark : l)
       ∧ <ab, LeftFront, l> ∧ <cd, LeftBack, l>
       ∧ <ab, Front, c> ∧ <ab, Front, d>
EFF: pos = cd
This rule checks whether the desired landmark exists together with a vector cd, such that the landmark is located at a place l which is to the left front of the robot with respect to the current position ab, and to the left back of the robot with respect to the updated route segment cd after executing the update rule. In addition, both places c and d must lie in front of the current position ab.
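The precondition check of PassingLeft can be sketched in the same way. The function name passing_left, the dictionary of landmarks and the example state are hypothetical and serve only to make the four conjuncts of the precondition explicit.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class Vec(NamedTuple):
    """A directed CRG vector from place `start` to place `end`."""
    start: str
    end: str


Relation = Tuple[Vec, str, str]  # <vector, direction, place>


def passing_left(pos: Vec, landmark: str, vectors: Set[Vec],
                 relations: Set[Relation],
                 landmarks: Dict[str, str]) -> Optional[Vec]:
    """Apply PassingLeft and return the new position cd, or None."""
    place = landmarks.get(landmark)  # (landmark : l)
    if place is None or (pos, "LeftFront", place) not in relations:
        return None
    for cd in sorted(vectors):
        if ((cd, "LeftBack", place) in relations
                and (pos, "Front", cd.start) in relations
                and (pos, "Front", cd.end) in relations):
            return cd  # EFF: pos = cd
    return None


# Hypothetical example state: the kitchen lies at place l beside the path.
ab, cd = Vec("a", "b"), Vec("c", "d")
state_vectors = {ab, cd}
state_landmarks = {"kitchen": "l"}
state_relations = {
    (ab, "LeftFront", "l"), (cd, "LeftBack", "l"),
    (ab, "Front", "c"), (ab, "Front", "d"),
}
result = passing_left(ab, "kitchen", state_vectors,
                      state_relations, state_landmarks)
```

If any conjunct fails, for example because the landmark is unknown or lies to the right, the sketch returns None, which corresponds to an unsatisfied precondition.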
3.2.3. The Strategy Level
With the update rules defined on the application level, single route instructions can be interpreted. However, in human-robot collaborative navigation, human operators usually give a sequence of route instructions to the mobile robot instead of one single instruction. In this case, before human operators can find the appropriate terms for giving the instructions, they first need to correctly locate the robot’s current position and the desired goal location, and then mentally travel along the expected route, which may involve mental rotations on the way. In this complicated process, a wrongly located place or turning can happen quite often ([Shi and Krieg-Brückner, 2008, Shi et al., 2006]). These errors can cause the interpretation of the following route instructions to fail and consequently lead to so-called conceptual mode confusion situations, where the mobile robot goes along an undesired path or simply cannot execute the desired instructions. In order to cope with these problems, a set of high-level conceptual strategies was developed on the strategy level. These strategies choose and apply the low-level update rules of the application level according to different principles for resolving conceptual mode confusion.
Deep Reasoning can deal with one of the most typical conceptual mode confusions, called spatial relation or orientation mismatch. This type of conceptual mode confusion occurs if a spatial object is incorrectly located in the operator’s mental representation, such as “pass the kitchen on the left”, where the kitchen is currently located on the right; or “take the second junction on the left”, where the second junction leads only to the right.
In this situation, the strategy of deep reasoning finds the suitable low-level update rules and then checks the preconditions of the chosen update rules against the currently observed state of the QSBM using qualitative spatial reasoning. If a precondition of an update rule turns out to be unsatisfiable, this situation can be reported back to the human operator appropriately. Furthermore, if possible, a corrected spatial relation can be inferred by checking the update rule corresponding to the route instruction that led to the unsatisfiable situation, and this relation can be used to build a suggestion.
Therefore, this strategy tries to resolve the problematic situation in one of two ways. It either gives a reason regarding the current spatial situation to help the human operator reorganize the route instructions, such as “you cannot pass the kitchen on the left, because it is now behind you”, or it makes a suitable suggestion if one exists, such as “you cannot take a right turn here, but maybe you mean to take a left turn?”
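The two possible outcomes of deep reasoning, a reason or a suggestion, can be sketched for turning instructions as follows. The function deep_reasoning_turn, the OPPOSITE table and the wording of the fallback reason are hypothetical; in the actual system the available directions would be inferred from the QSBM by qualitative spatial reasoning.

```python
from typing import Set

# Assumption for this sketch: only left and right turns have an opposite.
OPPOSITE = {"left": "right", "right": "left"}


def deep_reasoning_turn(desired: str, available: Set[str]) -> str:
    """Check a turn instruction against the directions available here.

    Returns a confirmation, a suggestion (if the opposite turn exists),
    or a reason explaining why the instruction cannot be interpreted.
    """
    if desired in available:
        return f"ok: turning {desired}"
    alternative = OPPOSITE.get(desired)
    if alternative in available:
        # A corrected spatial relation exists, so a suggestion is built.
        return (f"you cannot take a {desired} turn here, "
                f"but maybe you mean to take a {alternative} turn?")
    # No correction can be inferred, so only a reason is given.
    return (f"you cannot take a {desired} turn here, "
            f"because no path leads to the {desired}")


message = deep_reasoning_turn("right", {"left", "straight"})
```

With only a left turn and a straight continuation available, the sketch produces the suggestion quoted in the text above.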
Figure 3.2.: A sequence of route instructions with a wrong instruction in the middle.
Deep Reasoning with Backtracking can not only handle the situations covered by deep reasoning, but also cope with situations where the failure to interpret an instruction is caused by a previous wrong instruction, e.g. the situation in figure 3.2. The robot is located at the thick red arrow and the instructions are: “go straight ahead, then go left, and then turn right, and go until the printer on the right.” The check fails when interpreting the fourth instruction “go until the printer on the right”, because there is no printer ahead after taking a right turn as demanded by the previous instruction. However, by taking one step backwards and changing the third instruction from turning right to turning left, the last instruction can be interpreted as well.
Thus, this strategy interprets the route instructions as deep reasoning does, by checking every precondition of the chosen update rules. In addition, after applying each update rule, the updated state of the QSBM is saved in an interpretation history. Once an instruction cannot be interpreted, the previous state of the QSBM can be reloaded to replace the current state, and a suggestion can then be made for the previous instruction if possible, such as “turn left” instead of “turn right” in the example in figure 3.2. As a result, the interpretation of the remaining route instructions can be resumed based on the suggested route instruction. Instead of only giving a reason or a suggestion regarding a certain problematic instruction, deep reasoning with backtracking can thus locate the previous wrong instruction and find a successful interpretation of the entire sequence of route instructions, if one exists.
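The interplay of interpretation history and one-step backtracking can be sketched on the example of figure 3.2. The transition table, the state names and the ALTERNATIVES mapping are hypothetical stand-ins for the QSBM and its update rules; the sketch backtracks exactly one step, as in the example.

```python
from typing import List, Optional

# Hypothetical toy world: (state, instruction) -> state after the update.
TRANSITIONS = {
    ("s0", "go straight"): "s1",
    ("s1", "go left"): "s2",
    ("s2", "turn right"): "s3_right",
    ("s2", "turn left"): "s3_left",
    ("s3_left", "go until the printer on the right"): "goal",
}
# Assumed corrections that may be suggested for a previous instruction.
ALTERNATIVES = {"turn right": "turn left", "turn left": "turn right"}


def interpret_with_backtracking(start: str,
                                instructions: List[str]) -> Optional[List[str]]:
    """Return the (possibly corrected) instruction sequence, or None."""
    history = [start]          # saved states, one per applied update rule
    accepted: List[str] = []   # instructions interpreted so far
    for instruction in instructions:
        nxt = TRANSITIONS.get((history[-1], instruction))
        if nxt is None and accepted:
            # Reload the previous state and try a corrected instruction.
            suggestion = ALTERNATIVES.get(accepted[-1])
            retry = TRANSITIONS.get((history[-2], suggestion))
            resumed = TRANSITIONS.get((retry, instruction))
            if resumed is not None:
                history[-1], accepted[-1] = retry, suggestion
                nxt = resumed
        if nxt is None:
            return None  # no interpretation found, even after backtracking
        history.append(nxt)
        accepted.append(instruction)
    return accepted


route = interpret_with_backtracking(
    "s0",
    ["go straight", "go left", "turn right",
     "go until the printer on the right"],
)
```

For the instruction sequence of figure 3.2, the sketch replaces the third instruction by “turn left” and then resumes with the fourth instruction.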
QSR-Value Tuples based Searching was developed to cope with a new type of conceptual mode confusion regarding a wrongly located starting or turning position (called “conceptual turning location mismatch”). Figure 3.3a illustrates an example of this conceptual mode confusion, where the robot is located at the thick red arrow and the instructions are “go straight, then left, then go until the printer on the right”. From the perspective of the human operator, the printer is located directly on the right side after taking a left turn, and therefore the operator simply ignores a turning point which is not in his or her mental representation. However, after taking the left turn, the last instruction “go until the printer on the right” cannot be interpreted, because there is no way to continue in the current state of the QSBM.
Figure 3.3.: a) An example of conceptual turning location mismatch; b) An abstract view of the QSR-Value Tuples based Searching.
These problems cannot be solved by the other strategies, because they can only provide suggestions if there exists a wrong route instruction, while in this situation one route instruction (e.g. “turn right” as a third instruction) is missing. Thus, the strategy “QSR-Value Tuples based Searching” defines a QSR-weighted value tuple for each outgoing direction of each turning node in a conceptual route graph as:
(ROUTE, INSTRUCTIONS, QSR-V)
Here ROUTE represents the currently chosen route, INSTRUCTIONS includes all the interpreted instructions along this route, and QSR-V is the cumulative value calculated by
$$\text{QSR-V} = \sum_{i=0}^{n} MR_i \cdot SR_i$$
where the sum runs over the route instructions interpreted so far (up to index $n$), $MR_i$ is the matching rate obtained by comparing the taken qualitative spatial direction with the current route direction while interpreting the i-th route instruction, and $SR_i$ is the success rate of interpreting the route instruction at that point.
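As a small worked example of the cumulative value, the following Python sketch sums the products of matching rate and success rate over the interpreted instructions. The function name qsr_value and the numeric rates are purely illustrative; that the rates lie between 0 and 1 is an assumption of this sketch.

```python
from typing import Sequence


def qsr_value(matching_rates: Sequence[float],
              success_rates: Sequence[float]) -> float:
    """Cumulative QSR-V: sum of MR_i * SR_i over interpreted instructions."""
    if len(matching_rates) != len(success_rates):
        raise ValueError("one MR and one SR value per instruction expected")
    return sum(mr * sr for mr, sr in zip(matching_rates, success_rates))


# Hypothetical rates for three interpreted instructions.
mr = [1.0, 0.5, 1.0]   # how well each taken direction matches the route
sr = [1.0, 1.0, 0.5]   # how successfully each instruction was interpreted
value = qsr_value(mr, sr)  # 1.0*1.0 + 0.5*1.0 + 1.0*0.5 = 2.0
```

Under this reading, a candidate route whose directions match the instructions well and whose instructions are interpreted successfully accumulates a higher value.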
With the definition of the QSR-weighted value tuple, finding an appropriate interpretation (namely a route) that corresponds to a sequence of natural language route instructions is illustrated in figure 3.3b as:
• An empty set of QSR-weighted value tuples is initialized at the current robot position (the black point in the middle of the network in figure 3.3b).
• This value-tuple set is automatically updated by the QSBM manager, e.g. in the 3 directions with (r1, i1, v1), (r2, i1, v2), (r3, i1, v3) in figure 3.3b, where (rx, ii, vx) indicates the tuple of the covered route rx, the interpreted instructions ii and the currently calculated QSR-weighted value vx.
• Searching agents of the QSBM manager then travel along all paths according to the branching of the current point on the current QSBM. The