Robots that are to assist humans in their real-world environments must be equipped with a variety of capabilities, ranging from perceiving the environment, detecting and recognizing humans, and navigating in dynamic environments to localizing and manipulating objects. These activities are initiated, and can be supported, by interaction with the human, in which human and robot exchange information about their common environment and collaborate on joint tasks in a mixed-initiative interaction that goes beyond a command-and-control style. From the human’s point of view, the dialog system provides a gateway to the robot’s capabilities. It takes the human’s commands and forwards them to the components that carry them out, keeps the human informed about the robot’s current state, and reports on execution success while – in more complex settings – the human can stop or revise the robot’s actions at any time. Conversely, it enables the robot to verify a given command or to ask for missing information. As argued before, from a software engineering point of view, this requires close collaboration between the dialog system and the components that carry out the robot’s activities. To address this issue, a general communication model is used: the Task State Protocol. It provides a fine-grained coordination mechanism for robot activities and, at the same time, serves as a well-defined component interface to the dialog system.
The basic concept is that components execute and request tasks, acting as servers and clients, respectively. A task consists of its specification (which is relevant for execution) and the state it is in (which is relevant for coordination). While the task specification depends on the type of the task, the set of possible states is the same for all tasks. The task life cycle can be described by the finite-state machine depicted in figure 3.6: Normally, a task is initiated, accepted, and finally completed. Alternatively, it might fail, or be rejected right away. During execution, it might already deliver intermediate results, or be updated or canceled (which again might fail). Table 3.1 lists the semantics of those task state updates. The updates cause event notifications that are delivered to the participating components, together with the current task specification. However, not every task needs to support all state changes.
[Figure 3.6: finite-state machine of the task life cycle. Labels recoverable from the figure: initiated, accepted, rejected, running, intermediate result, completed, failed, update, update requested, update accepted / failed, cancel, cancel requested, cancel accepted, cancel failed, DONE, CANCELLED.]
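The life cycle described above can be sketched as a transition table. The following is a minimal illustration in Python, not the toolkit's actual implementation: the transitions follow Table 3.1, while the names `State`, `TRANSITIONS`, `apply_update`, `replay`, and `InvalidUpdate` are hypothetical. Modeling `rejected` and `failed` as separate terminal states is an assumption, since the text does not name the states those updates lead to.

```python
from enum import Enum


class State(Enum):
    INITIAL = "initial"
    INITIATED = "initiated"
    RUNNING = "running"
    UPDATE_REQUESTED = "update requested"
    CANCEL_REQUESTED = "cancel requested"
    DONE = "done"
    CANCELLED = "cancelled"
    REJECTED = "rejected"  # assumption: modeled as its own terminal state
    FAILED = "failed"      # assumption: modeled as its own terminal state


# (current state, update operation) -> next state, following Table 3.1.
TRANSITIONS = {
    (State.INITIAL, "initiated"): State.INITIATED,
    (State.INITIATED, "accepted"): State.RUNNING,
    (State.INITIATED, "rejected"): State.REJECTED,
    (State.RUNNING, "intermediate result"): State.RUNNING,
    (State.RUNNING, "completed"): State.DONE,
    (State.RUNNING, "failed"): State.FAILED,
    (State.RUNNING, "update"): State.UPDATE_REQUESTED,
    (State.RUNNING, "cancel"): State.CANCEL_REQUESTED,
    (State.UPDATE_REQUESTED, "update accepted"): State.RUNNING,
    (State.UPDATE_REQUESTED, "update failed"): State.RUNNING,
    (State.CANCEL_REQUESTED, "cancel accepted"): State.CANCELLED,
    (State.CANCEL_REQUESTED, "cancel failed"): State.RUNNING,
}


class InvalidUpdate(Exception):
    """Raised when an update operation is not allowed in the current state."""


def apply_update(state, operation):
    """Return the successor state, or raise InvalidUpdate if none exists."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise InvalidUpdate(
            f"{operation!r} is not allowed in state {state.value!r}"
        ) from None


def replay(operations, state=State.INITIAL):
    """Apply a sequence of update operations, starting from `state`."""
    for operation in operations:
        state = apply_update(state, operation)
    return state
```

For instance, `replay(["initiated", "accepted", "completed"])` walks through the normal case, whereas `apply_update(State.INITIATED, "completed")` raises `InvalidUpdate`, because a task cannot be completed before it has been accepted.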
The concept of task-based coordination is not new: it has been identified as a common design pattern for coordination in robotics, and variations of it are applied in several robot architectures [LPP+11]. However, its application to, and benefit for, human-robot interaction have not yet been investigated in the robotics community.
Technically, tasks are represented as XML documents that contain a STATUS element indicating their current state (cf. figure 3.7). Events are delivered asynchronously through an event bus provided by the middleware. The Task State Protocol has so far been implemented on top of both the XCF [FW07] and the RSB [WW11] robotic middleware (but can in principle be realized with any middleware that supports asynchronous communication). Lütkebohle has developed a toolkit that supports task management on both the server and the client side [LPP+11]. It offers functionality for requesting tasks and updating their states (and possibly their specification as well) via a task object that encapsulates the detailed XML handling. Most importantly, the toolkit makes sure that the update operations comply with the state machine shown in figure 3.6. It also takes on error management, e.g. by detecting and recovering from concurrent updates or delayed delivery.
The Task State Protocol in the form described here evolved within the Home-Tour scenario, where a mobile robot assistant has to become acquainted with a human’s living environment by interacting with the human during a guided tour (see Chapter 6.1). In this context, a very first version of the Task State Protocol was used for modeling the communication between the dialog system and a room representation component.
Task state         Update operation      Semantics
initial            initiated             The client initiates the task.
initiated          accepted              The server begins execution.
                   rejected              The task will not be executed.
running            intermediate result   The server has updated the task specification.
                   completed             The server has completed the task.
                   failed                The task could not be completed.
                   update                The client has updated the task specification.
                   cancel                The client requests termination of execution.
update requested   update accepted       The server conforms to the updated task specification.
                   update failed         The server continues execution with the previous task specification.
cancel requested   cancel accepted       The server stops execution.
                   cancel failed         The execution will be continued.
Table 3.1: The semantics of the task state updates.
<GRASP>
  <STATUS value="initiated"/>
  <Region varianceFirstMajorAxis="434" varianceSecondMajorAxis="433" pixelCount="4384">
    <coord ref="image" kind="relative" x="158" y="67" width="0.4" height="0.8"/>
    <Object detectorLabel="apple">
      <Grip type="TwoFingerSpecial"/>
    </Object>
  </Region>
</GRASP>
Figure 3.7: Example task specification for a grasp operation.
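To illustrate what handling such a task document involves, the following sketch reads and updates the STATUS element of the grasp task from figure 3.7 using Python's standard `xml.etree.ElementTree` module. It is not the toolkit's API: the helper names `get_state` and `set_state` are hypothetical, and the toolkit described in the text hides this XML handling behind a task object.

```python
import xml.etree.ElementTree as ET

# The task specification from figure 3.7.
TASK_XML = """
<GRASP>
  <STATUS value="initiated"/>
  <Region varianceFirstMajorAxis="434" varianceSecondMajorAxis="433" pixelCount="4384">
    <coord ref="image" kind="relative" x="158" y="67" width="0.4" height="0.8"/>
    <Object detectorLabel="apple">
      <Grip type="TwoFingerSpecial"/>
    </Object>
  </Region>
</GRASP>
"""


def get_state(task):
    """Return the current task state stored in the STATUS element."""
    return task.find("STATUS").get("value")


def set_state(task, state):
    """Overwrite the task state; the specification is left untouched."""
    task.find("STATUS").set("value", state)


task = ET.fromstring(TASK_XML.strip())   # root element is <GRASP>
label = task.find("Region/Object").get("detectorLabel")
grip = task.find("Region/Object/Grip").get("type")
```

The state lives in a single attribute, separate from the task-specific payload, so a component can coordinate on the state without needing to understand the rest of the specification.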
The room representations were based on laser-range data, and the robot had to turn by 360° to acquire these. As this normally takes some time, it was desired that the robot acknowledge the execution at the beginning of the learning process to provide feedback about its internal action state (“I will have a look at it”). For this purpose, a basic version of the task life cycle was used, comprising the events initiated, accepted (marking the start of execution) or rejected (when the hardware was otherwise busy), and completed.
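The basic life cycle used in the Home-Tour scenario can be sketched as follows. This is an illustrative sketch only: the function names `room_learning_server` and `dialog_feedback` are hypothetical, and of the utterances only “I will have a look at it” is taken from the text, while the other two are invented placeholders.

```python
# Verbal feedback per task event. Only the "accepted" utterance comes from
# the text; the other two are hypothetical placeholders.
FEEDBACK = {
    "accepted": "I will have a look at it.",
    "rejected": "Sorry, I cannot do that right now.",   # hypothetical
    "completed": "I have finished looking around.",     # hypothetical
}


def room_learning_server(hardware_busy):
    """Yield the task state updates the server emits for one request."""
    if hardware_busy:
        yield "rejected"      # the task will not be executed
        return
    yield "accepted"          # marks the start of execution
    # The 360 degree turn and laser scan acquisition would happen here.
    yield "completed"


def dialog_feedback(events):
    """Map incoming task events to the utterances the dialog system produces."""
    return [FEEDBACK[event] for event in events if event in FEEDBACK]
```

With `hardware_busy=False`, the dialog system acknowledges the request as soon as the task is accepted, well before the turn has finished, which is the feedback behavior the paragraph above motivates.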
The concept was further refined within the Curious Robot scenario, an interactive object learning and manipulation scenario where a humanoid robot learns labels of objects and how to grasp them, assisted by a human tutor (see Chapter 6.2). In particular, the robot’s grasping operation provided an interesting use case for coordinating more complex actions: The human can intervene at any time to stop or to correct an ongoing grasping operation. Moreover, during an ongoing grasping action, the human can bring up new topics, e.g. by asking the system about information it has, which requires coordinating multiple tasks in parallel. To meet these demands, the Task State Protocol was iteratively extended. The final version shown in figure 3.6 has proven to be detailed enough to support the desired functionality and, at the same time, general enough to be applicable to a variety of use cases in very diverse scenarios. Its role as a general coordination mechanism has been investigated in [LPP+11, Lüt11]. Accordingly, in the Curious Robot system not only the communication between the dialog system and the back-end, but the interplay of all components relies on the concept of task states.
Based on the experiences with the Home-Tour and the Curious Robot scenarios, which were both implemented with the Sunshine dialog system in use at the time, concepts for the reusable and customizable PaMini (Pattern-based Mixed Initiative) dialog framework were developed. As a consequence, the role of the Task State Protocol as a generic interface became more important. From the dialog system’s point of view, it establishes a uniform interface to the back-end – and in particular to action execution – which allows the dialog system to treat all tasks in a uniform manner. From the back-end’s point of view, the Task State Protocol represents a well-defined component interface to the dialog system. It provides architectural guidelines for component developers, thus facilitating integration. The architecture it entails is schematically shown in figure 3.8: The dialog