[Figure 5.3 plot. x-axis: Time (hours), 0 to 6. y-axis: Average Quality Score (Relative IPC Quality Score / Coverage), 0.915 to 0.963. Series: BDPO2 (PNGS+IBCS: Bandit); BDPO2 (PNGS+IBCS: Alternating); BDPO2 (PNGS only); BDPO2 (IBCS only); BDPO2 (LAMA only).]

Figure 5.3: Average IPC quality score as a function of time per problem in five different runs of BDPO2: one with each of the three subplanners alone, and two combining IBCS and PNGS, one with UCB1 and one without it (using a simple alternation policy instead). This experiment was run with setup 3 as described in Section 3.3 (on page 42). Note that the y-axis is truncated (starting at 0.907) and the x-axis shows the runtime of BDPO2 (which started at 2 hours in the overall experiment

been experimentally evaluated. A summary of the experimental results is given in Figure 4.8 (on page 81).

5.6 Summary

This chapter described on-line adaptation, an important component of our plan optimisation system, in which the system is auto-tuned to the current problem over the course of optimisation in order to achieve better-quality plans. To do so, we apply UCB1, a popular multi-armed bandit learning technique, to exploit the most useful subplanners as much as possible, since subplanners’ productivity varies from domain to domain. Our experiments suggest that, across the entire time scale, the bandit policy improves overall plan quality across the problems (measured by the average IPC plan quality score) more than using any single subplanner for all problems. We also described how we adapt the ranking policies to the current problem to prioritise the subplans that are more likely to be improvable by individual subplanners.
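The UCB1 selection step summarised above can be illustrated with a minimal sketch. It uses the textbook UCB1 index (mean reward plus an exploration bonus); the reward signal, the arm names as dictionary keys, and the helper names `ucb1_select` and `record` are assumptions made for illustration, not the implementation used in BDPO2.

```python
import math


def ucb1_select(pulls, total_reward):
    """Pick the arm (here: a subplanner) with the highest UCB1 index.

    pulls[arm]        -- number of times the arm has been tried
    total_reward[arm] -- sum of rewards observed for that arm
    """
    # Try every arm once before trusting any estimate.
    for arm, n in pulls.items():
        if n == 0:
            return arm

    total = sum(pulls.values())

    def index(arm):
        mean = total_reward[arm] / pulls[arm]
        bonus = math.sqrt(2.0 * math.log(total) / pulls[arm])
        return mean + bonus

    return max(pulls, key=index)


def record(pulls, total_reward, arm, reward):
    """Update the statistics after running the chosen arm.

    The reward is assumed to lie in [0, 1], e.g. 1.0 if the subplanner
    improved a subplan and 0.0 otherwise (a hypothetical choice).
    """
    pulls[arm] += 1
    total_reward[arm] += reward


# Hypothetical bookkeeping for two subplanners.
pulls = {"PNGS": 0, "IBCS": 0}
total_reward = {"PNGS": 0.0, "IBCS": 0.0}
arm = ucb1_select(pulls, total_reward)
record(pulls, total_reward, arm, 1.0)
```

The exploration bonus shrinks as an arm is tried more often, so a subplanner that is productive in the current domain is chosen most of the time, while the others are still retried occasionally.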

Recall that we have three contributions in this thesis: plan decomposition, continuing plan quality optimisation, and macro generation. We have discussed the first two contributions up to this chapter. The next chapter explains the third contribution: macro generation from block deordered plans, and how it is useful for improving planners’ efficiency.

Chapter 6

Block-based Macro Generation

This chapter describes the third contribution of this thesis: block-based macro generation. Macros are a well-known and widely studied kind of planning knowledge that can be easily encoded in the domain model, and can thus be exploited by standard planning engines to improve their efficiency and coverage.

The macro generation process, like our window generation process (described in Chapter 4) for plan optimisation, is based on block-deordered plans. In such a deordering, plans are divided into meaningful subplans, called blocks, by encapsulating more effects and preconditions within a subplan and reducing interference with steps outside the subplan. Because of their nature, blocks can be straightforwardly transformed into purposeful macros. Note that our macro generation process, unlike the window generation process, is not concerned with plan cost, but focuses on finding macros that help planners to find plans efficiently.

This chapter is structured as follows. Section 6.2 describes the preliminaries and terminology used in macro generation. Our overall block-based macro generation process, implemented in a system named BloMa, is described in detail in Section 6.3. This description is followed by an analysis of our experimental results in Section 6.4. Finally, existing work related to our macro generation process is discussed in Section 6.5.

This chapter describes my contribution to the work published as “Exploiting Block Deordering for Improving Planners Efficiency” [Chrpa and Siddiqui, 2015] in more detail.

6.1 Introduction

Capturing and exploiting structural knowledge of planning problems has been shown to be a successful strategy for making the planning process more efficient. Solutions (i.e., plans) to a planning problem, which show the trajectory of achieving goals in the problem landscape, are good sources of such structural knowledge. These solutions can be used for automatically re-engineering a domain model in order to reduce the problem complexity, and thus can help to improve planners’ efficiency.

One common approach to domain remodelling is to add macro-operators (“macros”, for short). A macro-operator is formed by assembling a group of planning operators, generally on the basis of training solutions, for example, by exploring groups of actions (grounded instances of these operators) that are often placed successively in the training solutions. The rationale is that if a sequence of actions occurs frequently in solution plans, it is likely to be a good sequence for the planner to consider. Macros can be encoded in the same format as the original planning operators in the domain model, and thus can be used in a planner-independent way.
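The idea of looking for operators that often appear successively in training solutions can be sketched as a simple frequency count. This is only an illustration of the general idea, not the method of any particular system; the plan representation (a list of `(operator_name, *arguments)` tuples), the function name `frequent_adjacent_pairs`, and the threshold `min_count` are assumptions.

```python
from collections import Counter


def frequent_adjacent_pairs(plans, min_count=2):
    """Count how often one operator directly follows another.

    plans     -- list of totally ordered plans; each plan is a list of
                 ground actions written as (operator_name, *arguments)
    min_count -- hypothetical threshold for keeping a pair
    """
    counts = Counter()
    for plan in plans:
        operators = [action[0] for action in plan]
        # Pair each operator with its immediate successor in the plan.
        counts.update(zip(operators, operators[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical training plans from a toy transport domain.
training_plans = [
    [("pick", "a"), ("move", "x", "y"), ("drop", "a")],
    [("pick", "b"), ("move", "y", "z"), ("drop", "b"),
     ("pick", "c"), ("move", "z", "x")],
]
candidates = frequent_adjacent_pairs(training_plans)
```

Because the count only sees adjacent positions, operators that belong together but are separated in a totally ordered plan are missed by this kind of analysis.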

Macros date back to the 1970s, when they were used, for example, in STRIPS [Fikes and Nilsson, 1971] and REFLECT [Dawson and Siklóssy, 1977]. Korf [1985] later motivated the use of macros by showing that they can reduce the problem complexity in a number of cases. Since then, many successful macro learning techniques have been developed (see Section 6.5).

Generating macros from training plans so that the macros are useful for improving planners’ efficiency is not simple. Totally ordered training plans, for example, often hide some promising candidates for macros, since the actions of a useful macro may not be adjacent in these plans. MUM [Chrpa et al., 2014] is a system that can form macros from non-adjacent actions of a totally ordered plan by investigating the pair-wise action dependencies in the plan. However, the technique is often unable to detect unnecessary orderings among groups of actions, which can limit its ability to capture promising macro candidates. A block deordered plan (as described in Chapter 2), on the other hand, is a decomposition into meaningful non-interfering subplans with a significantly reduced number of ordering constraints among them. A block in a block deordered plan, because of its nature, often represents a single compound activity useful for efficient planning, and is therefore a good candidate by itself for a macro.

In this chapter, we first describe how to extract subplans that are promising candidates for macros from a block deordered plan by utilising structural relations among blocks. These macro candidates often have longer subplans representing important high-level activities. Then we describe our macro generation system (BloMa), which automatically extracts domain-specific macros from the macro candidates extracted from the block-decomposed training plans. BloMa can generate useful longer macros in problems whose structure relies on repetitive application of larger sets of actions. Traditional macro learning techniques that are based on “operator chaining” approaches (i.e., assembling operators one by one) are often not able to find such long macros. BloMa is evaluated using the IPC benchmarks with state-of-the-art planning engines, and shows considerable improvements (in terms of IPC score and coverage) in some domains.
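To make the notion of chaining operators concrete, the following minimal sketch composes two ground STRIPS actions into a single macro action using the standard sequential-composition rule. It works on ground atoms written as plain strings and ignores parameters and unification; the class `Action`, the function `chain`, and the Blocksworld-style atoms are assumptions for illustration only and do not reproduce BloMa's procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # atoms that must hold before execution
    add: frozenset      # atoms made true
    delete: frozenset   # atoms made false


def chain(a, b):
    """Compose 'a then b' into one macro action, or return None if b can
    never be applied directly after a."""
    # a destroys a precondition of b and does not restore it.
    if b.pre & (a.delete - a.add):
        return None
    pre = a.pre | (b.pre - a.add)        # b's needs not supplied by a
    add = (a.add - b.delete) | b.add     # what is true at the end
    delete = (a.delete | b.delete) - add
    return Action(f"{a.name}+{b.name}", pre, add, delete)


# Hypothetical ground actions in a Blocksworld-like domain.
pick_a = Action(
    "pick-a",
    pre=frozenset({"handempty", "clear a", "ontable a"}),
    add=frozenset({"holding a"}),
    delete=frozenset({"handempty", "clear a", "ontable a"}),
)
stack_a_b = Action(
    "stack-a-b",
    pre=frozenset({"holding a", "clear b"}),
    add=frozenset({"on a b", "clear a", "handempty"}),
    delete=frozenset({"holding a", "clear b"}),
)
macro = chain(pick_a, stack_a_b)
```

The resulting macro has the same form as an ordinary action (preconditions, add effects, delete effects), which is why macros can be added to a domain model without changing the planner. Building long macros this way requires repeating the step once per operator.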