Experimental Evaluation - A computational framework for behaviour adaptation: the case for agen

The purpose of the experiment we designed is to evaluate the complexity of building the predicted traces. The ⊕ operator could be implemented in multiple ways (the space of alternative design decisions is very large), and we do not suggest that this particular implementation is to be preferred over other possible ones (such claims can only be made after a series of substantive comparative studies, and the answer is most likely context dependent). The possible implementations include (but are not limited to):

1. modeled and solved as a MAX-SAT problem,

2. modeled and solved as a MAX-SAT problem with hard clauses,

3. implemented using a theorem prover,

4. implemented using a theorem prover and a SAT solver,

5. implemented using answer set programming (ASP),

6. . . .

However, the particular implementation in this chapter only aims to provide an adequate basis for making a preliminary determination of whether this approach is practical.

Figure 3.2: PROMELA model of pt₃

According to Definition 3.1, given a consistent state description s and an effect e, s ⊕ e is a set of new states, in which each new state s′ is a maximum consistent subset of s ∪ e (according to clauses (1), (2) and (4) in Definition 3.1), and the effect e must be a part of the new state s′ (clause (3)). Therefore, the set of new states is the set of all solutions of the MAX-SAT problem of s ∪ e in which e also holds. For example, given a knowledge base with a rule p ∧ q ⇒ r, s = {p, q} and e = {¬r}, there are two new states according to Definition 3.1: s′₁ = {¬r, p} and s′₂ = {¬r, q}.
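The example above can be reproduced with a small brute-force sketch. This is not the SAT4J-based implementation used in the evaluation: it enumerates truth assignments directly, which is only feasible for a handful of symbols. The string encoding of literals (`"~p"` for ¬p), the clause-list encoding of the knowledge base, and the function names `consistent` and `update` are assumptions made for illustration. Maximality is taken here with respect to set inclusion, which coincides with maximum cardinality in this example.

```python
from itertools import combinations, product

# Assumed encoding: a literal is a string, and "~p" is the negation of "p".
def atom(lit):
    return lit.lstrip("~")

def holds(lit, model):
    value = model[atom(lit)]
    return not value if lit.startswith("~") else value

def consistent(literals, kb_clauses):
    """Brute-force check that the literals and the CNF knowledge base share a model."""
    atoms = sorted({atom(l) for l in literals}
                   | {atom(l) for clause in kb_clauses for l in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(holds(l, model) for l in literals) and all(
            any(holds(l, model) for l in clause) for clause in kb_clauses
        ):
            return True
    return False

def update(s, e, kb_clauses):
    """Return every inclusion-maximal subset of s ∪ e that contains e and is consistent."""
    s, e = frozenset(s), frozenset(e)
    rest = sorted(s - e)
    results = []
    # Larger subsets of s are tried first, so any later candidate that is
    # strictly contained in an earlier solution cannot be maximal.
    for size in range(len(rest), -1, -1):
        for kept in combinations(rest, size):
            candidate = e | frozenset(kept)
            if any(candidate < found for found in results):
                continue
            if consistent(candidate, kb_clauses):
                results.append(candidate)
    return results

# The rule p ∧ q ⇒ r written as the clause ¬p ∨ ¬q ∨ r.
kb = [["~p", "~q", "r"]]
new_states = update({"p", "q"}, {"~r"}, kb)
```

For s = {p, q} and e = {¬r}, `new_states` holds the two sets {¬r, p} and {¬r, q}, matching the example. The candidate {¬r, p, q} is rejected because it violates the rule.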

The first design decision that has to be made is the language used for state descriptions and the computational implementation of ⊕. In the following evaluation, a propositional language is used and ⊕ is implemented using the SAT4J SAT solver [9], with a procedure inspired by [78] used to compute all maximally satisfiable subsets in which e always holds. This choice of formal language and operator implementation is mainly a matter of convenience. Choosing a first-order or higher-order logic language introduces issues such as semi-decidability that are not yet addressed in this chapter.

The next step is to collect a set of agent programs with annotated effects of goals and actions. Here we use a collection of systematically generated programs with effects, controlled by the following variables:

Language Size from {20, 30, 50} is the total number of primitive symbols in the language L.

Context Size from {2, 3, 5} is the number of conjunctive clauses in each context.

Knowledge Size from {20, 40} is the number of conjunctive clauses in the knowledge base KB, given KB is generated in conjunctive normal form.

Rule Size from {2, 5} is the maximum number of disjunctive literals in every conjunctive clause in KB.

Effect Size from {1, 3, 5} is the number of conjunctive literals in effect e, as the disjunction can be represented in an alternative effect (i.e. the set of effects E of an action can be seen as disjunctive normal form).

Number of Goals from {2, 5, 10} is the total number of goals in a plan library.

Number of Plans from {1, 2, 3} is the number of plans that achieve one goal in the plan library.

Plan Body Size from {5, 10} is the total number of actions and sub-goals within a plan.

Number of Sub-goals from {1, 2, 3, 5} is the maximum number of sub-goals that are allowed in a plan body; there should always be plans that do not contain any sub-goal.

Also, semantic consistency is checked during generation: (1) KB is always consistent, (2) the contexts of plans are consistent with KB, and the contexts of plans achieving the same sub-goal are different but not necessarily mutually exclusive, and (3) the plans do not contain plan-sub-goal cycles, that is, each plan library can be represented as finite goal-plan trees. With all combinations of the values of these variables, the minimal number of plan libraries that can be generated is 7776. A subset of 1390 plan libraries is used in the evaluation, and for every state description at a plan selection, there is at least one plan that can be selected.
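The figure of 7776 is the number of value combinations of the nine variables listed above, which can be checked with a short sketch. The dictionary keys are illustrative names chosen here, not identifiers from the original generator.

```python
from math import prod

# Value sets of the nine generation variables, as listed in the text.
# The key names are hypothetical labels for illustration.
grid = {
    "language_size": [20, 30, 50],
    "context_size": [2, 3, 5],
    "knowledge_size": [20, 40],
    "rule_size": [2, 5],
    "effect_size": [1, 3, 5],
    "number_of_goals": [2, 5, 10],
    "number_of_plans": [1, 2, 3],
    "plan_body_size": [5, 10],
    "number_of_subgoals": [1, 2, 3, 5],
}

# One plan library per combination gives the minimal number of libraries.
combination_count = prod(len(values) for values in grid.values())
print(combination_count)  # 7776
```

That is, 3 · 3 · 2 · 2 · 3 · 3 · 3 · 2 · 4 = 7776, so the 1390 libraries used in the evaluation cover roughly 18% of the combinations.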

Every plan library and its corresponding effects are then used as input to Algorithm 3.1 to construct traces. Every run of Algorithm 3.1 is timed and the resulting traces are recorded. The generation of traces for each plan library is run 10 times, with each run timed separately, to reduce the errors that could be introduced by the hardware and software environment.
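The repeated-timing procedure can be sketched as follows. The evaluation itself was run in Java; this Python version only illustrates the measurement pattern. The names `time_runs` and `dummy_build_traces` are hypothetical, and `dummy_build_traces` is a placeholder standing in for Algorithm 3.1, not an implementation of it.

```python
import time

def time_runs(build_traces, plan_library, runs=10):
    """Time each of `runs` executions separately.

    Returns the list of durations in seconds and the traces of the last run.
    """
    durations = []
    traces = None
    for _ in range(runs):
        start = time.perf_counter()
        traces = build_traces(plan_library)
        durations.append(time.perf_counter() - start)
    return durations, traces

# Placeholder for Algorithm 3.1 (assumption, for illustration only):
# it returns a single trace containing the library's items in order.
def dummy_build_traces(plan_library):
    return [list(plan_library)]

durations, traces = time_runs(dummy_build_traces, ["a1", "a2"])
```

Keeping the ten durations separate, rather than timing one loop of ten runs, makes it possible to inspect the spread of the measurements before summarising them.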

The evaluation is run on an Intel® Core™ i5-4440 with 16 GB of memory, under Ubuntu 12 and Java SE 7.

Figure 3.3 shows some relations between the time Algorithm 3.1 spends on each plan library and the variables that represent the structure of the plan library.

For every plan library, the trace generation time is illustrated in Figure 3.3a. Since plan libraries differ in complexity, a more complex plan library could lead to a longer execution time. Therefore, we count the total number of state descriptions within the traces of every plan library to compute the average time spent on computing a state description for that plan library, which is illustrated in Figure 3.3b. It can be seen that most of the population is distributed within 200 seconds for generating all traces, and within 12.5 seconds for generating a single state description. Figure 3.3c and Figure 3.3d show the total time taken to explore the traces of a plan library compared with the number of traces and state descriptions explored for the same plan library. Generally speaking, the maximum time taken for a plan library is stable as the number of traces and state descriptions increases, while the minimum time taken increases. Thus, a more complex plan library, i.e. one with more possible run-time instances (traces) and more state descriptions (the result of a longer plan body, more sub-goals, etc.), possibly requires a longer time to explore all possible traces.
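The per-state average plotted in Figure 3.3b can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the original: T(P) is the total time to generate all traces of plan library P, and N(P) is the total number of state descriptions within those traces.

```latex
\bar{t}(P) = \frac{T(P)}{N(P)}
```

Dividing by N(P) normalises for the size of the plan library, so that libraries with many traces and libraries with few can be compared on the cost of a single state description.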

Moreover, the stable maximum time means that even a simple plan library with only one trace may take a long time to explore. This happens when the semantics of the plan library (i.e. the knowledge base and effects) define a difficult SAT problem, so that each ⊕ operation takes longer to solve.

The variable that is directly connected to the average time of exploring a state description is the effect size, i.e. the number of conjunctive literals used in each effect, as shown in Figure 3.4a. Because the ⊕ operation is implemented as solving maximum satisfiability problems, a larger effect possibly means a more difficult problem.

On the other hand, the context does not affect the average time by much (Figure 3.4b), as plan selection is only an evaluation of consistency between a state description and the context, and the operator involved only takes the union of two sets of assertions. We found no strong evidence that any of the other variables affects the overall time spent or the average time spent.

Overall, our approach for evaluating agent programs is plausible in practice, as we have explored all possible execution instances of 1390 plan libraries, each of which contains from 2 to 30 plans, within 3.9 hours.

Figure 3.3: Time Spent in Constructing Traces for each Plan Library. (a) Distribution of Total Time of Generating Traces. (b) Distribution of Average Time of Generating a State

In document A computational framework for behaviour adaptation: the case for agents and business processes (Page 79-84)