
Definition 8 (Verification Operators). Verification operators represent the knowledge of which perceptions and expectations are appropriate for the preconditions and postconditions of which actions.

Each verification operator maps each assertion, which has its own expected value in the current situation, to a set of sensing actions. For example, a grasp action for a robot requires that the robot's arm is in the correct position and is not currently holding an object. These assertions are translated into sensing actions, such as a vision sensor reading (to check the arm's position) or a force sensor reading (to check what the arm is holding). The expected value is then compared with the value observed from the sensor data to indicate whether the action executed successfully. Finally, the interpretation step decides how failed actions affect the rest of the plan.
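The mapping from assertions to sensing actions can be illustrated with a minimal sketch of the grasp example. The class `Assertion`, the table `VERIFICATION_OPERATORS`, and the two stub sensor functions are hypothetical names introduced only for illustration; they are not taken from [30], and a real system would query actual sensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Assertion:
    name: str         # condition to verify, e.g. "arm_in_position"
    expected: object  # value expected in the current situation


# Stub sensing actions (hypothetical); a real system would query hardware.
def vision_arm_in_position() -> bool:
    return True   # vision sensor: the arm is where it should be


def force_gripper_empty() -> bool:
    return False  # force sensor: the gripper is already holding something


# A verification operator maps an assertion to the sensing actions that check it.
VERIFICATION_OPERATORS: Dict[str, List[Callable[[], object]]] = {
    "arm_in_position": [vision_arm_in_position],
    "gripper_empty": [force_gripper_empty],
}


def failed_assertions(assertions: List[Assertion]) -> List[str]:
    """Return names of assertions whose observed value differs from the expected one."""
    failed = []
    for assertion in assertions:
        observations = [sense() for sense in VERIFICATION_OPERATORS[assertion.name]]
        if any(obs != assertion.expected for obs in observations):
            failed.append(assertion.name)
    return failed


grasp_preconditions = [
    Assertion("arm_in_position", True),
    Assertion("gripper_empty", True),
]
print(failed_assertions(grasp_preconditions))  # ['gripper_empty']
```

In this sketch the interpretation step is left out: the function only reports which assertions failed, and deciding how that failure affects the rest of the plan would be a separate component.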

As mentioned before, it is intractable to monitor all assertions in the plan because of limited computational power and time constraints. Several criteria for selecting appropriate assertions at run-time are discussed in [30].

1. Uncertainty Criteria. Uncertainty can exist in the world model or in action outcomes. A stochastic action with multiple outcomes might need more verification operators to detect its failure than a deterministic action.

2. Dependency Criteria A critical path in a plan represents those postconditions that will be required by actions later in the plan. In other words, postconditions that are not used by later actions in the plan can be ignored and need not be monitored because they will not affect continuing execution of the rest of the plan. This again can be characterised as determining relevance of the effects of the actions based on the validity of the plan.

3. Importance Criteria These criteria are largely related to the previous dependency criteria. Conditions of the actions can be prioritised based on metrics such as the number of subsequent actions that need these conditions.

4. Recovery Ease Criteria These criteria focus on how easily the executing system can recover from the failure of an action. If recovery from the failure of an action is difficult, the assertions of that action might need to be examined more closely.
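The dependency and importance criteria above can be sketched together for a simple sequential plan. This is an illustrative simplification, not the procedure of [30]: the `Action` class and both function names are hypothetical, conditions are plain strings, and the count ignores the possibility that a later action re-establishes a condition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Action:
    name: str
    preconditions: Set[str] = field(default_factory=set)
    postconditions: Set[str] = field(default_factory=set)


def importance(plan: List[Action]) -> Dict[Tuple[str, str], int]:
    """For each (action, postcondition), count the later actions that require it."""
    scores = {}
    for i, action in enumerate(plan):
        for cond in action.postconditions:
            scores[(action.name, cond)] = sum(
                cond in later.preconditions for later in plan[i + 1:]
            )
    return scores


def assertions_to_monitor(plan: List[Action]) -> List[Tuple[str, str]]:
    """Drop postconditions no later action needs; rank the rest by importance."""
    scores = importance(plan)
    needed = [key for key, count in scores.items() if count > 0]
    return sorted(needed, key=lambda key: -scores[key])


plan = [
    Action("pick_up", {"arm_empty"}, {"holding_cup", "arm_raised"}),
    Action("move", {"holding_cup"}, {"at_table"}),
    Action("put_down", {"holding_cup", "at_table"}, {"cup_on_table"}),
]
print(assertions_to_monitor(plan))
# [('pick_up', 'holding_cup'), ('move', 'at_table')]
```

Here `arm_raised` is never required later, so it is not monitored, while `holding_cup` is needed by two subsequent actions and is ranked first. The uncertainty and recovery ease criteria would need additional domain information and are not modelled.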

3.1.1.4 Monitoring Policy Execution

The work of Fritz et al. [39] focuses on monitoring policies for MDP problems (details in Chapter 2). They apply execution monitoring to MDP policies because the model of the planning domain is incomplete, so unexpected states can occur at any time step. In particular, unexpected situations affect not only the validity of the current best plan but also its optimality. For instance, a branch of the policy that was originally sub-optimal might become optimal after an unexpected situation occurs. This idea of checking the optimality of plans was first introduced by Veloso et al. [108] and is shown in Section 3.1.3.4. Their execution monitoring technique therefore needs to decide, at execution time, whether the current best policy is still optimal. They claim that re-planning for every unexpected state is costly and often unnecessary [39], so the main contribution of this work is finding the relevant conditions that affect the optimality of the current policy. With these conditions identified, execution monitoring can ignore unexpected states that differ only in irrelevant conditions and so avoid expensive re-planning.
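The effect of filtering on relevant conditions can be shown with a small sketch. It assumes the set of relevant conditions is already available; in [39] this set is derived from the annotated search tree, which is not modelled here. The function name and the dictionary-based state representation are hypothetical.

```python
from typing import Dict, Set


def needs_replanning(expected: Dict[str, object],
                     observed: Dict[str, object],
                     relevant: Set[str]) -> bool:
    """Return True only if a discrepancy touches a condition marked as relevant."""
    discrepancies = {
        cond
        for cond in expected.keys() | observed.keys()
        if expected.get(cond) != observed.get(cond)
    }
    return bool(discrepancies & relevant)


expected = {"door_open": True, "light_on": False, "battery_ok": True}
relevant = {"door_open", "battery_ok"}  # assumed given for this illustration

# Only an irrelevant condition differs, so the current policy is kept.
print(needs_replanning(expected, {**expected, "light_on": True}, relevant))   # False
# A relevant condition differs, so the policy must be re-examined.
print(needs_replanning(expected, {**expected, "door_open": False}, relevant)) # True
```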

It is worth noting that they consider forward search-based MDP solvers rather than the standard dynamic programming explained in Chapter 2. As described in [26], a forward search-based MDP solver is an on-line solver that starts with a root node containing only the initial state S0 and gradually expands its successors until a certain horizon is reached. Forward search-based MDP solvers require a heuristic estimate (V0) of the optimal value function (V∗) for all states in the domain, so that these values can be backed up from the leaf nodes of the search tree to the root using Bellman backup operators. This provides a better estimate of the value function for the states in the tree. Given this search tree, the best action can be selected greedily for the current state and for its successor states. An example of the search tree is illustrated in Figure 12, where circles represent states in the MDP and rectangles represent action choices. Note also that all states and actions are represented in the situation calculus. The initial state S0 is the root of the search tree, and N[a1, S0] represents the execution of action a1 in the initial state. Since actions in the MDP have stochastic outcomes, the selection of an action outcome is referred to as nature's choice, and the notation N[do(a'_{i,j}, s)] indicates the jth outcome of action i. As mentioned in [38], situation nodes N[s] are annotated with rewards, and edges E[a'_{i,j}, S] are associated with the cost and probability of that outcome.
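The forward search with Bellman backups described above can be sketched as a depth-limited recursion. This is a generic illustration rather than the solver used in [39] or [26]: states are plain strings instead of situation calculus terms, rewards are attached to transitions rather than to nodes and edges separately, and the names `Model` and `forward_search` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = str
Action = str
# model[s][a] is a list of (probability, next_state, reward) outcomes ("nature's choices").
Model = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def forward_search(model: Model, state: State, horizon: int,
                   v0: Callable[[State], float],
                   gamma: float = 1.0) -> Tuple[float, Optional[Action]]:
    """Expand `state` to `horizon` and back up values; return (value, greedy action)."""
    # Leaf node: fall back on the heuristic estimate V0.
    if horizon == 0 or not model.get(state):
        return v0(state), None
    best_value, best_action = float("-inf"), None
    for action, outcomes in model[state].items():
        # Bellman backup: expectation over the outcomes of this action.
        q = sum(
            p * (reward + gamma * forward_search(model, nxt, horizon - 1, v0, gamma)[0])
            for p, nxt, reward in outcomes
        )
        if q > best_value:
            best_value, best_action = q, action
    return best_value, best_action


model: Model = {
    "S0": {
        "a1": [(0.8, "goal", 10.0), (0.2, "S0", 0.0)],  # two stochastic outcomes
        "a2": [(1.0, "safe", 1.0)],                     # deterministic outcome
    }
}
value, action = forward_search(model, "S0", horizon=1, v0=lambda s: 0.0)
print(action)
```

With a horizon of 1 and a zero heuristic, action a1 has a backed-up value of about 8 and a2 a value of 1, so the greedy choice at the root is a1. A deeper horizon or a different heuristic can change which branch is preferred, which is why the optimality of the chosen branch has to be re-checked when unexpected states occur.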

Figure 12: An example of annotated search tree for MDP monitoring, adapted from [37]. (Recoverable labels: a root state node S; two action nodes of the form N[a, s]; four outcome nodes N[do(a'1,1, S)], N[do(a'1,2, S)], N[do(a'2,1, S)] and N[do(a'2,2, S)]; and four edges E[a'1,1, S], E[a'1,2, S], E[a'2,1, S] and E[a'2,2, S].)

In the context of execution monitoring, Fritz et al. [39] want to determine whether the current unexpected situation affects the validity and optimality of the policy. Initially, the forward search-based MDP solver only produces a policy (a contingency plan) that contains the best action to take in the current state and in its successors. The policy alone is not enough to decide this, because it is extracted from the search tree and carries no information about how the optimal or near-optimal policy was selected. Fritz et al. [39] therefore annotate the policy with the search tree. The annotation associates the root node of the policy with the complete search tree and each following node with the corresponding sub-search tree. At execution time, it is then only necessary to check whether the unexpected state affects the current annotating (sub-search) tree. This is done by regression, which is defined as follows [39]:

Definition 9 (Regression). Regression of a formula ψ through an action

In document Monitoring plan execution in partially observable stochastic worlds (Page 63-66)