Skill System - Representing Qualitative Action Models for Learning in Complex Virtual Worlds

The skill system implements a high-level action interface that allows an agent (or user) to request atomic action executions on target objects and have them realised in the simulation using low-level API calls. The requirements of the skill system are that it executes actions atomically (i.e. actions are executed between observation snapshots and never across them), that it executes ‘high level’ actions which are assumed to have been previously learned (i.e. the agent can request a ‘pick up object X’ action and not be concerned with the underlying fine level control of arm and gripper actuators), and that the outcome of actions is non-deterministic (for example, if the agent pushes a stack of objects, sometimes they fall over and sometimes they do not). The agent is assumed to have a skillful manipulator arm that can interact with objects in a similar manner to a human.

The skill system is implemented as an independent thread within the simulation process. This allows it to execute actions asynchronously with respect to simulation updates and observation snapshots. Semaphores are used to communicate with the observer thread, allowing the timing of observation snapshots and action executions to be co-ordinated when required. The skill system acts as a listener and waits for action execution requests. The requests can come from either the agent process or directly from user input. The actual execution of an action and its subsequent observation are identical whether the request was generated by the agent or a user. This enables the user to guide the agent by effectively choosing actions for it. The skill system implements action execution requests by using the simulation’s API to move, apply force to, or update the internal state of objects.
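The listener arrangement can be illustrated with a minimal sketch. It is not the implementation described here: the class name `SkillSystem`, its methods, and the use of a single semaphore as an observer gate are all assumptions made for illustration, and the simulation API calls are replaced by appending to a list.

```python
import queue
import threading


class SkillSystem(threading.Thread):
    """Hypothetical listener thread that executes requested actions one at a time."""

    def __init__(self, observer_gate):
        super().__init__(daemon=True)
        self.requests = queue.Queue()       # requests from the agent or the user
        self.observer_gate = observer_gate  # semaphore shared with the observer
        self.executed = []                  # stand-in for simulation API calls

    def request(self, source, action, target):
        """Queue a request; the source does not change how it is executed."""
        self.requests.put((source, action, target))

    def shutdown(self):
        self.requests.put(None)  # sentinel that stops the listener loop

    def run(self):
        while True:
            item = self.requests.get()  # block until a request arrives
            if item is None:
                break
            _source, action, target = item
            with self.observer_gate:    # observer cannot snapshot mid-action
                self.executed.append((action, target))


gate = threading.Semaphore(1)
skills = SkillSystem(gate)
skills.start()
skills.request("agent", "pick-up", "block1")
skills.request("user", "put-down", "block1")
skills.shutdown()
skills.join()
```

In this sketch an observer thread would acquire the same semaphore before taking a snapshot, so a snapshot can never fall in the middle of an action. Because a single worker drains the queue, requests are also executed strictly one after another.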

Actions may be executed at any time during the simulation, but they are prevented from occurring simultaneously: the agent cannot multi-task. Some actions may be implemented as composite actions; for example, the put-on action is implemented as a sequence of pick-up, move and put-down actions. This composition is useful for implementation purposes but it is not visible to the agent. The agent is not aware that a put-on action is implemented as a decomposition; however, there is nothing preventing the agent from learning and implementing its own action compositions if required.
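One way to picture a hidden decomposition is a lookup table from composite action names to primitive sequences, as in the sketch below. The table `COMPOSITES` and the function `execute` are hypothetical names; only the put-on decomposition itself comes from the text.

```python
# Hypothetical table: composite action name -> sequence of primitive actions.
COMPOSITES = {"put-on": ["pick-up", "move", "put-down"]}


def execute(action, primitive_log):
    """Run an action and return the single event name the agent observes."""
    for step in COMPOSITES.get(action, [action]):
        primitive_log.append(step)  # stand-in for the low-level API calls
    return action                   # the decomposition is never reported


log = []
observed = execute("put-on", log)
```

Here `log` records the three primitive steps, while `observed` holds only the name `put-on`, which is all the agent would see.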

To ensure atomicity, actions are always completed before the next observation snapshot. This normally requires the observation snapshot to be delayed because actions are typically longer than the observation interval. The delay does not need to be explicitly observed by the agent because it assumes a similar but non-fixed interval between states.
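The delay rule can be stated as a small calculation. The sketch below assumes that snapshot times and action completion times are available as numbers in the same units; the function name and parameters are hypothetical.

```python
def next_snapshot_time(last_snapshot, interval, action_end=None):
    """Return when the next snapshot may be taken.

    The snapshot is taken at the scheduled time unless an action is still
    running, in which case it is delayed until that action has completed.
    """
    scheduled = last_snapshot + interval
    if action_end is None:
        return scheduled
    return max(scheduled, action_end)


no_action = next_snapshot_time(10.0, 0.5)          # nothing running
long_action = next_snapshot_time(10.0, 0.5, 12.0)  # action outlasts the interval
```

With these inputs the first snapshot is taken on schedule, while the second is pushed back to the moment the action finishes.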

Non-deterministic outcomes are implemented by two mechanisms: firstly by ensuring certain actions are executed slightly differently every time, and secondly by relying on the non-determinism of object interactions in the underlying simulation. The ‘move’ action is an example of an action that is executed slightly differently each time because the object is moved to the approximate target location rather than the precise location. This can lead to different outcomes depending on the other objects in the vicinity of the target location.
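As a rough illustration of the first mechanism, a move target could be perturbed by a small random offset. The text does not say how the approximation arises for the move action, so the uniform horizontal offset, the `tolerance` parameter, and the function name below are assumptions.

```python
import random


def approximate_target(target, tolerance, rng):
    """Return a point near the requested target (assumed uniform offset).

    Only the horizontal coordinates are perturbed in this sketch; the
    height is left unchanged. Both choices are assumptions.
    """
    x, y, z = target
    dx = rng.uniform(-tolerance, tolerance)
    dy = rng.uniform(-tolerance, tolerance)
    return (x + dx, y + dy, z)


rng = random.Random(0)  # seeded only so the example is repeatable
first = approximate_target((1.0, 2.0, 0.5), 0.05, rng)
second = approximate_target((1.0, 2.0, 0.5), 0.05, rng)
```

Two requests for the same target yield slightly different end points, which is enough to change what the moved object touches when other objects are close by.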

Occasionally invalid actions will be requested. This can happen if a target object has ceased to exist in the time between choosing an action and its execution. In these cases no action is performed and no action is observed. Invalid actions are distinct from actions which ‘fail’ in some way; such failing actions are observed normally and are not explicitly flagged as failed. For example, a pick-up action may fail when it drops the target object during the action execution; in this case the agent will observe a ‘successful’ action with an unusual outcome.
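The distinction between invalid and failing actions can be sketched as follows. The function `request_action` and the representation of the world as a set of object names are hypothetical.

```python
def request_action(action, target, world, snapshot_events):
    """Return True if the action was executed, False if it was invalid."""
    if target not in world:
        return False  # invalid: nothing is executed and nothing is observed
    # The action would be carried out here, whatever its outcome turns out to be.
    snapshot_events.append((action, target))  # no success or failure flag
    return True


world = {"block1", "block2"}
events = []
world.discard("block2")  # block2 ceases to exist before execution
ran_invalid = request_action("pick-up", "block2", world, events)
ran_valid = request_action("pick-up", "block1", world, events)
```

Only the valid request leaves an event in the snapshot, and that event carries no indication of whether the pick-up went as intended.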

Multiple actions are allowed to occur between snapshots, but this rarely happens because most actions are longer than the observation interval. If the situation does arise then all the actions are observed in the following snapshot as having occurred ‘simultaneously’, with no ordering between them. The agent is unable to observe the order of events at a resolution greater than the observation interval.

This lack of temporal resolution results in misleading observations when action executions coincide with qualitative state transitions generated by the simulation dynamics. There is insufficient information in the snapshots to enable the observer to disambiguate the causal effects of the action from the ongoing effects of the underlying world dynamics (this is similar to the problem of missed qualitative states discussed in section 4.3). Figure 4.11 illustrates an example of this problem with respect to action timing. The scenario involves a sink that is emptying of water through a drain. The insert-plug action is executed during the same observation interval in which the last of the water drains from the sink. The resulting observation snapshot indicates that inserting the plug caused the sink to become empty. These misleading observations must be resolved by the agent if it is to learn useful models.

Figure 4.11: Simultaneous Observation of Dynamic Transition and Action. (Timeline showing the observation snapshots, the water height reaching its 0 transition because all water has drained, and the insert-plug action.)
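The loss of ordering can be made concrete with a small sketch of what one observation interval reports. The state variables, their qualitative values, and the function `summarise_interval` are hypothetical stand-ins for the sink scenario.

```python
def summarise_interval(before, after, timed_actions):
    """Build what the observer sees for one observation interval."""
    actions = frozenset(name for _time, name in timed_actions)  # order is lost
    changes = {
        key: (before[key], after[key])
        for key in before
        if before[key] != after[key]
    }
    return actions, changes


before = {"water_height": "positive", "plug": "out"}
after = {"water_height": "zero", "plug": "in"}
early = summarise_interval(before, after, [(0.2, "insert-plug")])
late = summarise_interval(before, after, [(0.9, "insert-plug")])
```

Both summaries are identical: each pairs the insert-plug action with the water height changing to zero, and nothing in them records when within the interval the plug went in relative to the drain transition.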

4.4.1 Example Action Implementation

This section describes a typical action execution in detail to show how the simulation API is used to execute the action. The example action is the put-on action, which has two target objects: the object to move, ‘X’, and the object on which it will be placed, ‘Y’. The following steps are carried out to execute the action:

1. The observation thread is paused.

2. Object X is grabbed by the agent. This involves changing X from a dynamic object which responds to physical forces that act upon it, to a static object which collides with and moves other objects but cannot be moved itself by the physics engine. Making the object static simulates the effect of the agent holding the block securely. (The physics engine is suspended while the object is changed, to avoid unrealistic interactions between objects when X is suddenly replaced by a new static version with different physical properties.)

3. Object X is picked up. This involves moving the static object at a constant rate until a specified height is reached. Any objects on top of X are picked up with it. A timer is started to periodically check if the object is in the approximate location. The use of the timer introduces some imprecision in the action because there is some variability in the delay between position checks.

4. The centroid of object X is aligned above object Y. Again a timer is used to check when the object is in position. During this process the object may collide with other objects, and any objects on top of X may fall off due to the effects of momentum (depending on the amount of friction between the objects).

5. Object X is lowered until it is close to touching an object. It cannot be lowered all the way because X is still a static object and this would be equivalent to the agent moving X down with zero give, causing the underlying object to be forced away (explosively due to limitations in the physics engine).

6. Object X is dropped. X is changed back from a static object to a dynamic object. Gravity takes effect and X falls the remaining short distance to land (hopefully) on top of Y.

7. The observation thread is restarted. The put-on action event is added to the current observation snapshot.

It can be seen from the example that the actual outcome of the put-on action is dependent on a variety of complex interactions between control commands, other objects, and the physics engine dynamics.
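The seven steps of the put-on action can be summarised in a short sketch. The simulation API is not specified in the text, so `RecordingSim` and every call name below are hypothetical; the stand-in only records the order of calls so that the sequence can be inspected, and the `lift_height` and `gap` defaults are arbitrary.

```python
class RecordingSim:
    """Hypothetical stand-in for the simulation API; it only records calls."""

    def __init__(self):
        self.calls = []

    def call(self, name, *args):
        self.calls.append((name,) + args)


def put_on(sim, x, y, lift_height=1.0, gap=0.05):
    """Issue the put-on steps in order (call names are illustrative only)."""
    sim.call("pause_observer")               # 1. pause the observation thread
    sim.call("pause_physics")                # 2. grab X: swap to a static body
    sim.call("make_static", x)               #    while physics is suspended
    sim.call("resume_physics")
    sim.call("raise_until", x, lift_height)  # 3. lift, timer checks the height
    sim.call("align_above", x, y)            # 4. centre X above Y
    sim.call("lower_until_gap", x, gap)      # 5. stop just short of contact
    sim.call("make_dynamic", x)              # 6. drop: gravity finishes the move
    sim.call("resume_observer")              # 7. restart the observation thread
    sim.call("record_event", "put-on", x, y) #    and add the event to the snapshot


sim = RecordingSim()
put_on(sim, "A", "B")
```

The sketch captures only the ordering of the steps. The collisions, timer jitter and physics effects that make the real outcome variable all happen inside the calls that are stubbed out here.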
