POMDP Executive - Dynamically Extending Planning Models using an Ontology

The Executive component implements the top-level receding horizon control loop, coordinating the activities of the planner and sensor components. Algorithm 1 shows pseudocode for the Executive. The algorithm begins by initializing the belief state according to the a priori probabilities, and performing other initialization (Lines 1-4). The receding horizon generative planner requires, as one of its inputs, the current state in a deterministic form. The belief state, however, is represented in a probabilistic form, so it first has to be converted to a deterministic form using MostLikelyState (Line 6). The input to the generative planner includes the determinized state, as well as the goal state set. The Executive generates domain and problem PDDL files, and then executes the planner (Line 7). The planner generates a result file containing a plan, which the Executive reads.

The planner may fail to generate a plan, in which case the algorithm returns failure (Lines 8-9). If the planner succeeds in generating a plan, the Executive dispatches the first action (Lines 10-11). The action (a sensor operation) generates an observation, and the belief state is updated based on this observation (Line 12). The algorithm terminates if the goal has been achieved, or if the maximum number of iterations has been exceeded.

Algorithm 1: Executive

Input: a-priori-belief-state-probs, goal-state-set
Output: goal-achieved?

    /* Perform initialization. */
 1  belief-state ← InitializeBeliefState(a-priori-belief-state-probs);
 2  goal-achieved? ← false;
 3  max-iterations ← 1000;
 4  iteration ← 0;
    /* Begin control loop. */
 5  while not goal-achieved? do
 6      current-state ← MostLikelyState(belief-state);
 7      plan? ← GeneratePlan(current-state, goal-state-set);
 8      if not plan? then
 9          return;
10      action ← First(plan?);
11      observation ← Dispatch(action);
12      belief-state ← UpdateBeliefState(observation);
13      goal-achieved? ← CheckGoalAchieved(belief-state, goal-state-set);
14      iteration ← iteration + 1;
15      if iteration > max-iterations then
16          return;
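The control loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the planner, sensor dispatch, belief update, and goal check are passed in as callables, and the toy two-state problem at the bottom (including the error rates `P_FN` and `P_FP` and the 0.95 goal threshold) is entirely hypothetical. The return values are also an assumption, since the pseudocode does not show them: the sketch returns `False` on planner failure and the current goal status when the iteration cap is hit.

```python
from typing import Callable, Dict, List, Optional, Set

Belief = Dict[str, float]  # state label -> probability


def most_likely_state(belief: Belief) -> str:
    """Determinize the belief state by picking the most probable state."""
    return max(belief, key=lambda s: belief[s])


def run_executive(
    prior: Belief,
    goal_states: Set[str],
    generate_plan: Callable[[str, Set[str]], Optional[List[str]]],
    dispatch: Callable[[str], bool],
    update_belief: Callable[[Belief, str, bool], Belief],
    goal_achieved: Callable[[Belief, Set[str]], bool],
    max_iterations: int = 1000,
) -> bool:
    belief = dict(prior)                      # Line 1
    achieved = False                          # Line 2
    iteration = 0                             # Line 4
    while not achieved:                       # Line 5
        current = most_likely_state(belief)   # Line 6
        plan = generate_plan(current, goal_states)  # Line 7
        if not plan:                          # Lines 8-9: planner failed
            return False
        action = plan[0]                      # Line 10: first action only
        observation = dispatch(action)        # Line 11
        belief = update_belief(belief, action, observation)  # Line 12
        achieved = goal_achieved(belief, goal_states)         # Line 13
        iteration += 1                        # Line 14
        if iteration > max_iterations:        # Lines 15-16
            return achieved
    return True


# --- Hypothetical toy problem: is an object "present" or "absent"? ---
P_FN, P_FP = 0.2, 0.1  # assumed false-negative / false-positive rates


def toy_planner(current_state: str, goal_states: Set[str]) -> List[str]:
    return ["look"]  # stand-in for a call to an external PDDL planner


def toy_dispatch(action: str) -> bool:
    return True  # stand-in sensor that always reports a detection


def toy_update(belief: Belief, action: str, observation: bool) -> Belief:
    # Exact Bayes update for a binary hypothesis.
    like_present = (1 - P_FN) if observation else P_FN
    like_absent = P_FP if observation else (1 - P_FP)
    present = like_present * belief["present"]
    absent = like_absent * belief["absent"]
    total = present + absent
    return {"present": present / total, "absent": absent / total}


def toy_goal(belief: Belief, goal_states: Set[str]) -> bool:
    return sum(belief[s] for s in goal_states) >= 0.95


achieved = run_executive(
    {"present": 0.5, "absent": 0.5}, {"present"},
    toy_planner, toy_dispatch, toy_update, toy_goal,
)
print(achieved)
```

Only the first action of each plan is executed before replanning, which is what makes the loop receding-horizon: the rest of the plan is discarded once a new observation has changed the belief state.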

Planner

The Planner component was originally implemented using Fast Downward (Helmert 2006), a state-of-the-art generative planner that accepts problems formulated in the PDDL language (McDermott et al. 1998). A limitation of this planner

son. 2015). This became unwieldy, so we are now using Metric FF, which supports real-valued variables and linear relations among them. This is an improvement, though the restriction to linear relations requires linearization of the belief update formulas. We have also tried some planners that support non-linear relations, but the ones we tried have not reached a level of maturity to be usable. This is still an important area of ongoing research in the community.

In order for the Planner component to operate properly, it must compute belief state updates. This allows it to predict the effect of actions on the belief state, a prediction needed for planning purposes. Note that this computation is distinct from the belief state update performed by the Belief State Update component after an observation. The limitations of current planners make it impossible, during planning, to implement a completely accurate belief state update, as specified by Eq. 6. Instead, we use an approximation (a linearization in this case), and accept that this approximation is not as accurate as the update performed by the Belief State Update component after an observation. The key point here is that the goal of the Planner component is to make a good choice for the next action, not to predict completely accurately what will happen in the future. The approximation that we use is good enough for our current purposes. More testing is needed to determine whether a more accurate approximation would be beneficial for more complex problems.

We linearize Eq. 7 using a first-order approximation of the form

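$$b(s_1) \approx f_0(\epsilon) + f_1(\epsilon)\,\bigl(\epsilon - \bar{b}(s_1)\bigr) \tag{16}$$

where $\epsilon$ is the linearization point, the belief value about which Eq. 7 is being linearized, and

$$f_0(\epsilon) = \frac{\bigl(1 - p_{fn}(s_1, a_1)\bigr)\,\epsilon}{\bigl(1 - p_{fn}(s_1, a_1)\bigr)\,\epsilon + p_{fp}(s \neq s_1, a_1)\,(1 - \epsilon)} \tag{17}$$

$$f_1(\epsilon) = \frac{d f_0(\epsilon)}{d\epsilon} \tag{18}$$

$$f_1(\epsilon) = \frac{p_{fp}(s \neq s_1, a_1)\,\bigl(1 - p_{fn}(s_1, a_1)\bigr)}{\epsilon^2 \bigl(1 - p_{fn}(s_1, a_1)\bigr)^2 + 2\epsilon\,\bigl(1 - p_{fn}(s_1, a_1)\bigr)\,p_{fp}(s \neq s_1, a_1)\,(1 - \epsilon) + p_{fp}(s \neq s_1, a_1)^2\,(1 - \epsilon)^2} \tag{19}$$

We currently use a single linearization, with $\epsilon$ manually chosen to be 0.5, the average probability. Using multiple linearizations, with different $\epsilon$ values, would allow for approximating the nonlinear belief state update more accurately, with piecewise linearizations.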
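As a quick sanity check on Eqs. 17 and 19, the closed-form derivative can be compared against a numerical derivative of f0. The sketch below is illustrative only: `eps` stands for the linearization point (named after the `eps` function in the PDDL fragment), and the error rates `P_FN` and `P_FP` are hypothetical values, not taken from the source.

```python
def f0(eps: float, p_fn: float, p_fp: float) -> float:
    """Eq. 17: value of the belief update at the linearization point."""
    hit = (1.0 - p_fn) * eps
    return hit / (hit + p_fp * (1.0 - eps))


def f1(eps: float, p_fn: float, p_fp: float) -> float:
    """Eq. 19: closed-form derivative of f0 with respect to eps.

    The denominator of Eq. 19 is the expanded square of the
    denominator of Eq. 17, so it is computed here as a square.
    """
    denom = (1.0 - p_fn) * eps + p_fp * (1.0 - eps)
    return p_fp * (1.0 - p_fn) / denom ** 2


# Hypothetical false-negative / false-positive rates (assumptions).
P_FN, P_FP = 0.2, 0.1
EPS = 0.5  # the single linearization point used in the text

# Central-difference estimate of d f0 / d eps, per Eq. 18.
h = 1e-6
numeric_f1 = (f0(EPS + h, P_FN, P_FP) - f0(EPS - h, P_FN, P_FP)) / (2 * h)

print(round(f0(EPS, P_FN, P_FP), 4))  # 0.8889
print(round(f1(EPS, P_FN, P_FP), 4))  # 0.3951
print(abs(f1(EPS, P_FN, P_FP) - numeric_f1) < 1e-6)  # True
```

Because f0 and f1 depend only on the sensor error rates and the chosen linearization point, they can be evaluated once per problem and passed to the planner as constants.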

A PDDL problem formulation consists of a domain file and a problem file. The domain file specifies types of actions that can be used across a domain of application such as logistics, manufacturing assembly, or in this case, finding wheels in an image. The domain file is fixed; it does not change for different problems within the domain. Thus, this part of the formulation was written manually, and is not modified by the Executive. The problem file, on the other hand, contains problem-specific information such as initial and goal states. Therefore, it must be generated specifically for each new problem. The Executive generates this file automatically, based on knowledge of the goal and belief states. The following PDDL domain file fragment shows the definition of the SURF-match action.

    (:action SURF-match
      :parameters (?w ?p ?bsv)
      :precondition (and (belief-state-variable ?bsv)
                         (pose ?p)
                         (wheel ?w)
                         (for-wheel ?w ?bsv)
                         (at-pose ?p ?bsv)
                         (> (belief-level ?bsv) 0.1))
      :effect (and (increase (belief-level ?bsv)
                             (- (+ (f0)
                                   (* (f1) (- (eps) (belief-level ?bsv))))
                                (belief-level ?bsv)))
                   (increase (total-cost) (feature-observation-cost ?p))))

The precondition clause specifies that the belief state variable value for a particular pose and wheel must exceed 0.1 in order for this operation to be tried. The effect clause specifies that the belief level is updated according to Eq. 16: the increase is the difference between the linearized new value and the current value. The cost of the operation is also added to the total cost. The other sensor actions are specified in the PDDL domain file in a similar manner.
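The automatic generation of the problem file by the Executive can be sketched as follows. This is a hypothetical illustration, not the authors' generator: the predicate and function names are taken from the SURF-match fragment, but the domain name `wheel-finding`, the object names, the form of the goal (a threshold on one belief level), and the helper `make_problem` are all assumptions.

```python
from typing import Dict, Tuple


def make_problem(
    name: str,
    domain: str,
    bsvs: Dict[str, Tuple[str, str, float]],  # bsv -> (wheel, pose, belief)
    pose_costs: Dict[str, float],             # pose -> observation cost
    f0: float,
    f1: float,
    eps: float,
    goal_bsv: str,
    goal_level: float,
) -> str:
    """Build PDDL problem text from the current belief state (a sketch)."""
    wheels = sorted({wheel for wheel, _, _ in bsvs.values()})
    poses = sorted(pose_costs)
    objects = " ".join(wheels + poses + sorted(bsvs))

    init = [f"(wheel {w})" for w in wheels]
    for p in poses:
        init.append(f"(pose {p})")
        init.append(f"(= (feature-observation-cost {p}) {pose_costs[p]})")
    for bsv, (wheel, pose, level) in sorted(bsvs.items()):
        init.append(f"(belief-state-variable {bsv})")
        init.append(f"(for-wheel {wheel} {bsv})")
        init.append(f"(at-pose {pose} {bsv})")
        init.append(f"(= (belief-level {bsv}) {level})")
    # Linearization constants used by the action effects (Eq. 16).
    init += [f"(= (f0) {f0})", f"(= (f1) {f1})", f"(= (eps) {eps})",
             "(= (total-cost) 0)"]

    init_text = "\n    ".join(init)
    # Assumed goal form: one belief level must reach a threshold.
    goal = f"(>= (belief-level {goal_bsv}) {goal_level})"
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {objects})\n"
        f"  (:init\n    {init_text})\n"
        f"  (:goal {goal})\n"
        f"  (:metric minimize (total-cost)))\n"
    )


# Hypothetical belief state: one wheel, two candidate poses.
problem_text = make_problem(
    name="find-wheel-1",
    domain="wheel-finding",
    bsvs={"bsv1": ("wheel1", "pose1", 0.5), "bsv2": ("wheel1", "pose2", 0.2)},
    pose_costs={"pose1": 1.0, "pose2": 2.0},
    f0=0.8889, f1=0.3951, eps=0.5,
    goal_bsv="bsv1", goal_level=0.9,
)
print(problem_text)
```

Regenerating this text from the latest belief state on every iteration of the control loop is what lets a fixed domain file serve a changing problem.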
