Block-based Macro Generation System - Using Plan Decomposition for Continuing Plan Optimisation

Our macro generation process is based on block-deordered plans, where a plan is divided into meaningful non-interleaving subplans, called blocks. Recall that a more meaningful subplan is relatively more coherent, more encapsulated, less interfering with the other parts of the plan, and often represents a purposeful activity. An example of meaningful subplans, found by the block deordering of a plan (an optimal plan, in fact) for a problem in the Gripper domain, is shown in Figure 6.3.

The Gripper problem is an easy transportation type problem with a single mobile (a robot) with a fixed capacity of holding two objects (balls) using two grippers

§6.3 Block-based Macro Generation System 99

Figure 6.3: Block-deordered plan of a gripper problem with eight balls.

(which gives the domain its name). There are exactly two locations (called rooma and roomb) connected to each other. The robot can move between the rooms and pick up or drop balls with either of his two arms. Initially, all balls and the robot are in the first room. We want the balls to be in the second room.

In Figure 6.3, each of the blocks represents a purposeful activity: “comprehensive delivery”, where the robot uses all its arms to perform a fully loaded delivery (of two balls) in one move. Also each block encapsulates its internal activities to perform the comprehensive delivery, and restricts the interference of the activities not within the block (which, in fact, allows the blocks b1 to b3 to be unordered). Because of this nature, each of the blocks represents a meaningful subplan, and (in this case) is a good candidate by itself for a macro. However, a block alone is not always a good candidate for a macro; a block can be too small (e.g., containing one step only) or too large (e.g., containing several component blocks) to represent a purposeful activity. Therefore, we need some principles (based on the structural properties of a block deordered plan) that group together logically connected blocks as a larger subplan resembling a single-minded activity frequently used in plans (e.g. mixing a cocktail and cleaning the shaker afterwards – as observed in the Barman domain). We call such a group of blocks amacro candidate(described in detail in the following subsection).

The overall procedure of our block-based macro generation is implemented in a system, called BloMa, whose high-level design is shown in Algorithm 6. In the first step (line 1), given a set of training planning problems (simpler but not trivial),

Algorithm 6BloMa– a block-based macro generation system 1: Π←GenerateTrainingPlans() 2: B ←GenerateMacroCandidates(Π) 3: M←ExtractMacros(B) 4: EnhanceDomain(M) 5: Π←GenerateTrainingPlans() 6: FilterMacros(Π) 7: LearnEntanglements(Π)

training plans are generated using off the shelf planners. Next, macro candidates are generated from the training plans based on some rules (line 2) described in the following subsection, and the macro candidates that are generated frequently by the rules are assembled into macros (line 3). Then the formulated macros are added into the domain model (line 4), and training plans are re-generated using this macro enhanced domain model (line 5). Macros that do not appear or appear infrequently in the re-generated training plans are filtered out (line 6) from the domain description. In fact, we let the planner “decide” which macros are useful for it. The same strategy was used by MacroFF [Botea et al., 2005]. As a final step of BloMa, like MUM [Chrpa et al., 2014], macro-specific entanglements (i.e. those where only macros are involved) are learnt (line 7) for efficient pruning of unnecessary instances of macros. For learning entanglements, we check whether for each operator and related predicates the entanglement conditions (described in 6.2.2) are satisfied in all the training plans. Some error rate (i.e., when the entanglement conditions are violated) is allowed (typically 10%). For example, a Logistics problem might suggest that passengers can board the plane only in their initial locations and debark in their final destinations. If only a small number of passengers have to change the flights, we might still be within the "error ratio" boundary. Our entanglement learning process is similar to that of MUM (for details, see how entanglements are learnt by MUM [Chrpa et al., 2014]). Note that pruning of unnecessary instances of macros through learning entanglements does not compromise completeness because the original operators remain intact and can be used instead of incorrectly pruned instances of macros.

Whether a macro candidate or macro is “frequent” is determined relatively. Let

fb(m) be a number of occurrences of a macro candidate min the set of macro candidates B. Then, a macro candidate m ∈ M (M is the set of macros) is considered as frequent if fb₍_m₎_≥ _p

bmaxx∈M fb(x), where 0 < pb ≤ 1. Similarly, let fp(o)be a

number of occurrences of an operator or macroo in the set of macro-enhanced training plansΠ. Then, a macromis considered as frequent if fp(m)≥ ppmaxx∈O fp(x),

where 0 < pp ≤ 1 and O is a set of operators (including macros) defined in the

planning domain. Clearly, setting p_b,pp too high might cause filtering some useful

§6.3 Block-based Macro Generation System 101

6.3.1 Formulation of Macro Candidates

As mentioned, a block (in a block deordered plan) itself is a good candidate for a macro but not always; a block can be too small (e.g., containing one step only) to represent a purposeful combined activity. Hence we extend the blocks to larger blocks, called macro candidates, by capturing different structural relations among them in order to provide larger subplans that can capture more complex activities that are frequently occurred in plans.

We use a fixed set of rules based on structural relations among the blocks of a block deordered planπbdp in order to formulate macro candidates. Like windowing

strategies (described in Chapter 4), we define relations of immediate predecessorand

immediate successor of a block in π_bdp. Let IP(b) and IS(b) be sets of blocks in π_bdp

being ordered, respectively, immediately before and after bwith respect to the tran- sitively reduced orderings of blocks. Orderings between blocks are determined by any of the necessary reasons (PC, CD, and DP) as described in Section 2.2.

In macro candidate formulation, we also make use of thecausal follower blocksof a producer pof an atom m, CFBhm,pi(according to Definition 17 on page 73). In brief,

the causal followers of a producer p with respect to an atom m, CFhm,pi, are a set of

steps {p,sj, ...,sk} \ {sI,sG} (sI and sG are the “init” and “goal” steps respectively)

such that {hp,m,s_ji, ...,hp,m,s_ki} are causal links. The causal follower blocks with respect to a producer pof an atomm, CFBhm,pi, is the set of blocks, where each block

contains at least one element of CFhm,pi.

We use both basic andextended blocks of a block deordered plan for macro candidate generation. Recall that basic blocks are the blocks generated by the block deordering of a plan, while extended blocks are generated by encapsulating non- branching subsequences of basic blocks (as described in Algorithm 4 on page 66). In other words, extended blocks put together blocks that are strictly consecutive in block deordered plans. Thus, they are supposed to capture larger activities (e.g. shaking a cocktail and cleaning the shaker). It should be noted that if no deordering is possible, there will be a single extended block encapsulating the whole plan.

Having described the structural relations (among blocks) above, we use the following eight rules to construct macro candidates based on those relations.

R1 :{b} R5 : R2(R4)

R2 :{b} ∪IP(b) R6 : R3(R4)

R3 :{b} ∪IS(b) R7 : R4(R4)

R4 :{b} ∪IP(b)∪IS(b) R8 : CFBhm,pi

A rule is applied over each basic block bto construct one single macro candidate. Rule R1, for example, constructs a macro candidate by simply taking a single block

b, whereas R2, R3, R4 capture sequences of neighbouring blocks ofbdetermined by IP(b), IS(b), or both. Unlike windowing strategies (described in Chapter 4), we do not use any rule to capture the unordered blocks of a block b(Un(b)) into a macro

Figure 6.4: Formation of extended blocks (ii) and macro candidates (iii) from basic- blocks (1). (The macro candidateb3∪b4∪b5is formulated by applying the rule R3

over the basic block b3.)

candidate. Generally speaking, accumulating unordered blocks can be useful for plan optimisation, but usually not for macro generation. For example, the compound activity “delivering packages by separate trucks” can be useful to consider within a window for optimisation (which might be optimised by better use of trucks), but not as a macro because of not being internally consistent (i.e., delivering separate packages by separate trucks does not signify a single minded activity, rather increases the preconditions of the resultant macro).

The above rules can be divided into three groups, namely the primary rules (R1 to R4), the secondary rules (R5 to R7), and the causal rule (R8). The primary and secondary rules generally produce different sizes of macro candidates (measured by the number of steps they consist of). Secondary rules concatenate multiple primary rules to achieve larger macro candidates. The R5 rule (as a secondary rule), for example, says that R2 is applied over the macro candidate constructed by R4 to achieve the resultant macro candidate. The causal rule (R8) constructs macro candidates based on causal links. This rule is applied over each producer p of each atom mto group the causal followers of m(with respect to p) in a block deordered plan. The macro candidates produced by R8 are not limited to any fixed length. An example of a macro candidate formulated by applying R3 over a basic block in a block deordered plan is shown in Figure 6.4. Note that this macro candidate cannot be captured by the extended blocks.

The resultant macro candidate, after applying any rule, may not capture some intermediate blocks, i.e., blocks that are ordered in between (formally described in Definition 10 on page 68). This is illustrated in Figure 6.5 where an example of a block deordered plan is shown. In that plan, applying the R4 rule over bp results in a set of blocks not capturing the intermediately placed block bq. We incorporate

In document Using Plan Decomposition for Continuing Plan Optimisation and Macro Generation (Page 118-123)