
2.4 Miscellaneous systems

2.4.9 Gap Solver

Gap Solver is the algorithm most closely related to TADPOLE and was developed concurrently with it [20]. It learns new HTN methods, extending the domain coverage of HTN planners.

Standard HTN planners do not include preconditions of atomic operators and do not have goals describing what each method tries to achieve. This means that an HTN planner has no way of determining plan correctness, and it must rely on its decomposition rules to ensure that the generated plans are correct. However, this stringent requirement makes the rules prohibitively difficult to learn. To get around this difficulty, Gap Solver extends the standard HTN representation to include preconditions on atomic operators and goals on methods. Because of these extensions, a modified HTN planner can determine whether the plan it generated is correct and, if it is not, backtrack and try alternative methods or bindings. This flexibility means that the planner no longer relies on perfectly crafted method rules, so Gap Solver can get away with learning minimal method preconditions (just enough to bind all of the variables in the method decomposition).
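The correctness check that these extensions make possible can be sketched in a few lines of Python. This is a minimal illustration, assuming a STRIPS-style model in which each atomic operator has a precondition set, an add set, and a delete set, and a method's goal is a set of literals. The names Operator and plan_achieves_goal and the kettle literals are hypothetical; they are not Gap Solver's actual representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """An atomic operator with STRIPS-style preconditions and effects (assumed model)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def plan_achieves_goal(state, plan, goal):
    """Simulate plan from state. Return True only if each operator's
    preconditions hold when it is applied and the goal holds at the end."""
    current = set(state)
    for op in plan:
        if not op.pre <= current:
            return False  # a precondition is violated, so the planner should backtrack
        current = (current - op.delete) | op.add
    return goal <= current


# Hypothetical kettle example: the method's goal is that the water is hot.
fill = Operator("fill-kettle", frozenset({"kettle-empty"}),
                frozenset({"kettle-full"}), frozenset({"kettle-empty"}))
boil = Operator("switch-on-kettle", frozenset({"kettle-full"}),
                frozenset({"water-hot"}), frozenset())
start = {"kettle-empty"}
goal = frozenset({"water-hot"})

print(plan_achieves_goal(start, [fill, boil], goal))  # True
print(plan_achieves_goal(start, [boil], goal))        # False: kettle not yet full
```

A planner equipped with such a check can reject the second plan and backtrack to another method or binding; a standard HTN planner, lacking operator preconditions and method goals, has nothing to check the plan against.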

Gap Solver learns new method rules from lessons demonstrated by a teacher, where each lesson consists of a goal the teacher achieved and the sequence of atomic actions that achieved it. Gap Solver performs a top-down and a bottom-up parse simultaneously, and then fills any gaps between the two parses with newly learned methods. The top-down parse is simply the set of decomposition hierarchies produced by a backtracking HTN planner (as much of each hierarchy as it can generate with its current rule set). The bottom-up parse results from applying inverted methods: rules for composing sub-tasks into a super-task, which inherit the precondition of the method they invert. It is unclear whether this kind of search scales well (Gap Solver has only been partially implemented, so this question remains unresolved). Because the top-down planner backtracks, it produces multiple top-down decomposition hierarchies, and the bottom-up parser likewise produces multiple bottom-up hierarchies. Together these can yield a large number of candidate combined hierarchies. Selecting the best candidate (the one that will result in the most reusable learned methods) is an important part of the algorithm, but it has only been partially addressed. Gap Solver currently uses a single heuristic: it prefers candidates whose newly learned methods include more primitive tasks in their method decompositions. After selecting a candidate decomposition hierarchy, Gap Solver uses explanation-based generalization to learn the minimal preconditions of the new methods. Finally, it normalizes the learned methods by changing the common constants into variables.
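Two of the later steps, candidate selection and normalization, are simple enough to sketch. The Python below is a hypothetical illustration: tasks are encoded as tuples of the form (name, arg1, arg2, ...), the primitive-task heuristic is scored as a plain total count, and normalization is assumed to replace every constant with a variable, mapping repeated occurrences of the same constant to the same variable. The explanation-based generalization step is not shown, and none of the names come from Gap Solver itself.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """A learned method: a head task and its decomposition (hypothetical encoding).
    Tasks are tuples of the form (name, arg1, arg2, ...)."""
    head: tuple
    subtasks: list


def primitive_count(candidate, primitives):
    """Heuristic score: primitive tasks across a candidate's newly learned methods."""
    return sum(1 for m in candidate for t in m.subtasks if t[0] in primitives)


def select_candidate(candidates, primitives):
    """Prefer the candidate whose new methods include the most primitive tasks."""
    return max(candidates, key=lambda c: primitive_count(c, primitives))


def normalize(method):
    """Replace constants with variables; the same constant always maps to the same variable."""
    mapping = {}

    def lift(task):
        args = []
        for const in task[1:]:
            if const not in mapping:
                mapping[const] = f"?x{len(mapping)}"
            args.append(mapping[const])
        return (task[0], *args)

    return Method(lift(method.head), [lift(t) for t in method.subtasks])


primitives = {"pick-up", "move", "put-down"}
# Two hypothetical candidates, each contributing one new method for the same lesson.
cand_a = [Method(("serve", "cup1", "table2"),
                 [("pick-up", "cup1"), ("move", "table2"), ("put-down", "cup1")])]
cand_b = [Method(("serve", "cup1", "table2"),
                 [("fetch", "cup1"), ("put-down", "cup1")])]

best = select_candidate([cand_a, cand_b], primitives)
print(normalize(best[0]))
```

Candidate cand_a wins because all three of its sub-tasks are primitive, and normalization turns cup1 and table2 into the variables ?x0 and ?x1 throughout the method.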

An important limitation of Gap Solver is that it is unclear whether learning only minimal method rules will scale. One of the driving motivations behind HTNs was to improve upon the inefficiency of classical planners by using domain-specific decomposition rules that encode search-control knowledge. Using only minimal decomposition rules discards much of this search-control knowledge. For any goal or sub-goal there may be multiple methods for achieving it, most of which are not appropriate in the current state. Minimalist rules that do not guide the planner toward appropriate methods force it to backtrack and try alternative methods and bindings. It remains to be seen whether this is tractable in rich domains.
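The cost of minimal rules can be seen in a toy backtracking search. In this hypothetical Python sketch, a plan consists of one method choice per task, correctness is checked only on complete plans, and the appropriate method for every task happens to be tried last. The numbers illustrate worst-case growth under these assumptions; they are not measurements of Gap Solver.

```python
def search(num_tasks, candidates, is_correct):
    """Depth-first search over method choices, one task at a time.

    candidates(i) returns the methods the rules allow for task i.
    Returns (plan, number of complete plans checked); plan is None on failure.
    """
    checked = 0

    def extend(prefix):
        nonlocal checked
        if len(prefix) == num_tasks:
            checked += 1  # correctness is only checked on complete plans here
            return prefix if is_correct(prefix) else None
        for method in candidates(len(prefix)):
            result = extend(prefix + [method])
            if result is not None:
                return result
        return None  # every alternative failed: backtrack to the caller

    return extend([]), checked


def all_appropriate(plan):
    """Toy correctness check: the appropriate method for every task is 'm2'."""
    return all(m == "m2" for m in plan)


def minimal_rules(i):
    return ["m0", "m1", "m2"]  # preconditions only bind variables: no guidance


def rich_rules(i):
    return ["m2"]  # preconditions rule out inappropriate methods


print(search(4, minimal_rules, all_appropriate))  # (['m2', 'm2', 'm2', 'm2'], 81)
print(search(4, rich_rules, all_appropriate))     # (['m2', 'm2', 'm2', 'm2'], 1)
```

With three unguided methods for each of four tasks, the search checks 3^4 = 81 complete plans before finding the correct one, while rules whose preconditions admit only the appropriate method succeed on the first check. Checking preconditions incrementally would prune some branches, but without guiding preconditions the number of alternatives still multiplies with every task.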

Another important limitation of Gap Solver is that it relies on complete knowledge of how operators affect the domain. Gap Solver's modified HTN planner needs complete operator knowledge to determine whether a generated plan is correct, and therefore when to backtrack; Gap Solver in turn relies on the planner's ability to backtrack to get away with learning only minimal rules. However, complete operator knowledge is an unreasonable demand to place on a learner in rich, complex domains. For example, to completely understand the atomic action of turning on an electric kettle, the agent would need to understand electronic circuits.

TADPOLE avoids the limitations of Gap Solver by relaxing the requirement of ensuring plan correctness (a requirement that is impossible to satisfy in an unpredictable domain anyway). It does only a bottom-up parse, and it learns rich preconditions for both atomic and non-atomic rules based on what the teacher usually does. This results in sensible, useful rules, without requiring complete understanding of the domain.
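As a rough illustration of learning preconditions from what a teacher usually does, rather than from complete operator knowledge, the Python below keeps every state literal that held in at least a given fraction of the situations in which the teacher chose a rule. The frequency threshold is an assumption made for this example and usual_conditions is a hypothetical name; TADPOLE's actual learning procedure is the subject of Chapter 5.

```python
from collections import Counter


def usual_conditions(observed_states, threshold=0.75):
    """Return the literals that held in at least `threshold` of the states in
    which the teacher chose a given rule (a hypothetical frequency criterion)."""
    n = len(observed_states)
    counts = Counter(lit for state in observed_states for lit in set(state))
    return {lit for lit, c in counts.items() if c / n >= threshold}


# Hypothetical states observed each time the teacher started making tea.
observed = [
    {"kettle-full", "mug-clean", "morning"},
    {"kettle-full", "mug-clean"},
    {"kettle-full", "mug-clean", "morning"},
    {"mug-clean", "morning"},
    {"kettle-full", "mug-clean"},
]
print(sorted(usual_conditions(observed)))  # ['kettle-full', 'mug-clean']
```

Nothing here requires knowing why a full kettle matters; the condition is kept simply because it usually held when the teacher acted, which is the sense in which such rules need no complete understanding of the domain.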

Representation

This chapter describes how HOPPER (Chapter 4) and TADPOLE (Chapter 5) represent states, goals, and tasks, and explains why decoupling tasks from goals results in more reusable decomposition rules.

Previous systems that have used decomposition rules, such as Icarus and HTNs (described in Chapter 2), have used a wide range of terms such as goals, tasks, skills, and methods to describe their decomposition rules and the components the rules are made up of. To add to the confusion, different systems use semantically different decomposition rules made up of different components. This has a marked impact on the algorithms that make use of the different decomposition rules.

The purpose of this chapter is to clearly define the terms that will be used in subsequent chapters of the thesis. The chapter also describes the task-to-sub-goal decomposition rules used by HOPPER and learned by TADPOLE, contrasts them with the decomposition rules used by other systems, and argues that task-to-sub-goal decomposition rules are the most appropriate for the Human Planning Domain (HPD).
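To make the contrast concrete, here is a hypothetical Python sketch of the two kinds of rule. The class names, field names, and the achievers lookup table are illustrative assumptions, not HOPPER's or TADPOLE's representation (Sections 3.4 and 3.5 describe that). A sub-task rule fixes which tasks are performed, whereas a sub-goal rule only states which conditions must hold, so the way each sub-goal is achieved can change without editing the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Something the agent does, e.g. make-tea."""
    name: str
    args: tuple = ()


@dataclass(frozen=True)
class Goal:
    """A condition on the state that should hold, as a set of literals."""
    literals: frozenset


@dataclass
class SubtaskRule:
    """HTN-style rule: the head task decomposes directly into other tasks."""
    head: Task
    subtasks: list


@dataclass
class SubgoalRule:
    """Task-to-sub-goal rule: the head task decomposes into goals to achieve."""
    head: Task
    subgoals: list


def expand(rule, achievers, state):
    """Choose a task for each sub-goal that does not already hold in state."""
    return [achievers[g] for g in rule.subgoals if not g.literals <= state]


hot = Goal(frozenset({"water-hot"}))
steeped = Goal(frozenset({"tea-in-mug"}))
htn_rule = SubtaskRule(Task("make-tea"), [Task("boil-kettle"), Task("add-teabag")])
goal_rule = SubgoalRule(Task("make-tea"), [hot, steeped])

achievers = {hot: Task("boil-kettle"), steeped: Task("add-teabag")}
print(expand(goal_rule, achievers, {"tea-in-mug"}))  # only boil-kettle is needed

# A new way of heating water is reused without editing goal_rule;
# htn_rule would have to be rewritten to use it.
achievers[hot] = Task("use-microwave")
print(expand(goal_rule, achievers, set()))
```

Because the sub-goal rule names conditions rather than actions, it also skips sub-goals that already hold, something a fixed list of sub-tasks cannot express on its own.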

Because TADPOLE learns in a rich domain, the states it observes and the decomposition rules it learns are extensive. For the sake of clarity, this chapter presents only very simplified examples of states, goals, tasks, and decomposition rules. A fully detailed example of a state and a decomposition rule (including its head-task and sub-goals) can be found in the appendix.

Organization of the chapter

• Section 3.1 defines the terms goal and task and clarifies the distinction between the two.

• Section 3.2 explores the various ways that decomposition rules can be constructed from goals and tasks, the effect the type of decomposition rule has on the algorithms that use it, and which kind of decomposition rule is most appropriate in the Human Planning Domain.

• Section 3.3 describes how HOPPER and TADPOLE represent states, the passage of time, and state differences.

• Section 3.4 describes how HOPPER and TADPOLE represent goals, tasks, and goal-state differences, and describes their internal structure.

• Section 3.5 describes the representation of the decomposition rules used by HOPPER and learned by TADPOLE in terms of the head-task and sub-goals that make up each rule.

• Section 3.6 concludes the chapter by describing limitations of the representation scheme used by TADPOLE and HOPPER and the ways in which it will need to be extended if it is to scale to the Human Planning Domain.

3.1 Distinction between goals and tasks

Goals and tasks are similar, and it is easy to confuse the two. However, they are semantically distinct and play different roles in decomposition rules and in the algorithms that make use of them. This section distinguishes the two terms.

In document Acting and Learning with Goal and Task Decomposition (Page 69-74)