3.7 Offline vs. Online Execution

In the context of sensing actions let us make some remarks on the issue of offline vs. online execution of ALPs. Basically, action logic programs can be used to solve two complementary tasks: They can be used to control an agent online, where the agent actually executes the actions that are part of a derivation; or the agent may use them offline, to infer a plan that helps it achieve its goals.

If we intend to use ALPs for the online control of an agent we can do this by non-deterministically picking one path in the proof tree of an action logic program. Essentially, this approach to online agent control is the same as the one taken in Golog and Flux.

In the case of the CLP(D) calculus, derivations must be restricted to non-disjunctive substitutions, and the constraint rule may not be used at all. These restrictions are necessary because otherwise the agent would in general not know which action it is executing, or whether the action is even executable; an agent that acts online has to behave more cautiously. Summing up, in the case of online execution the LP(D) and the CLP(D) proof calculi are identical. The resulting proof calculus is sound, but for action domains that are not query-complete, completeness of course does not hold.

But there is something more to the online execution of ALPs in the presence of sensing: The sensor axioms enumerate all the possible sensing results, and if we evaluate a sense fluent against the sensor axiom we obtain a disjunctive substitution.

In the online execution of an ALP, however, the idea is that we evaluate the sense fluent against the “real world”, not against our axiomatization. The real world determines a single sensing result that applies.

In the CLP(D) proof calculus this corresponds to considering only a single case of the disjunctive substitution to continue the derivation with. Unfortunately, if such a derivation is successful it is not even sound wrt. the domain axiomatization D and the program P — all that we can say is that it is consistent with D ∪ P. On the other hand, such a derivation is sound wrt. D ∪ P augmented by the real world; or, more precisely, D ∪ P augmented by the observed sensing results.

This is a problem shared by all approaches to the online control of agents equipped with sensors: Beforehand it is impossible to say which sensing results will be observed. But by adding an actual sensing result to our formal world model D we obtain a new world model D′. Because our action theories are based on first order logic, which is monotonic, in D′ we do not lose any consequences from D, but we possibly gain some new ones. This is the reason why a derivation in CLP(D) for domains with sensing is only sound wrt. the program and the action domain augmented by the sensing results.
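The difference between the two readings of a sense fluent can be sketched in plain Prolog. This is only an illustration under simplifying assumptions, not part of the ALP proof calculi: the world model is approximated by a list of ground facts, and the names possible_result/2, offline_results/2, online_update/4 and the door sensor are hypothetical.

```prolog
% Hypothetical sensor axiom: sensing the door yields one of these results.
possible_result(sense_door, open(door)).
possible_result(sense_door, closed(door)).

% Offline reading: collect every result the axiomatization admits.
% This corresponds to the disjunction of all cases.
offline_results(Sense, Results) :-
    findall(R, possible_result(Sense, R), Results).

% Online reading: the world reports a single result. It must be one of
% the cases of the sensor axiom, and it is added to the world model D,
% which yields the augmented model D'. Nothing is removed from D.
online_update(Sense, Observed, D, [Observed|D]) :-
    possible_result(Sense, Observed).
```

For instance, the query online_update(sense_door, open(door), [at(agent,hall)], D1) binds D1 to [open(door), at(agent,hall)]: every fact of the old model is retained, which mirrors the monotonicity argument, while a result that the sensor axiom does not list is rejected.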

Now assume that an action domain contains disjunctive or merely existential information wrt. some properties of the world: An ALP that is meant for online agent control may try to ensure with the help of sensing actions that the actual values of these properties are known at run-time. Observe that for this it is of paramount importance that the values of fluents mentioned in a sensor axiom never contradict the agent’s world model.

Let us conclude this section with the remark that the combination of online execution and the computation rule of negation as finite failure is not unproblematic: We read negated goals as a goal not being achievable by the execution of actions. In the online setting this does not seem to make much sense. Hence, for this setting, negation as finite failure can only safely be used for program atoms that do not depend on the time-dependent special atoms.

Propositional Fluent Calculus Domains with Sensing

In this chapter we present ALPprolog¹, an implementation of the ALP framework atop action theories in a version of the Fluent Calculus that

• uses (a notational variant of) propositional logic for describing state properties;

• is restricted to actions with ground deterministic effects; and

• includes sensing actions.

The intended application domain for ALPprolog is the online control of agents in dynamic domains with incomplete information — hence all the remarks on online reasoning from section 3.7 apply.

ALPprolog is inspired by, and closely related to, dialects of the two prominent action programming languages Flux [Thielscher, 2005a] (cf. also section 6.2) and Golog [Levesque et al., 1997] (cf. also section 6.1).

In a sense it covers the middle ground between Special and full Flux. In Special Flux the expressivity is limited to conjunctions of ground, positive atoms; negated atoms are expressed using the closed world assumption. ALPprolog also uses a ground state representation, but in addition admits disjunctive state knowledge, negation, sensing, and incomplete knowledge (open world semantics). It does not reach the expressivity of full Flux in that it supports neither arbitrary, possibly non-ground terms, nor quantifiers, nor an explicit notion of knowledge. It transcends the expressivity of Flux in that it fully supports disjunction: Flux (depending on the dialect) allows only one negative literal in disjunctions (or none at all).

From the variant of Golog that supports open-world planning [Reiter, 2001a], ALPprolog takes the representation of state knowledge via prime implicates. Here, the major difference is that ALPprolog uses progression whereas Golog uses regression.
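To make the prime implicate representation concrete, here is a minimal sketch in plain Prolog. It is not ALPprolog's actual code: it assumes that a state is a list of prime implicates, that each prime implicate is a list of ground literals, and that a negative literal is written neg(F). All predicate names are hypothetical. The sketch relies on two standard properties of prime implicates: a non-tautological clause is entailed exactly if some prime implicate subsumes it, and forgetting a fluent amounts to dropping every prime implicate that mentions it.

```prolog
% occurs_in(+X, +Ys): X is identical to some element of Ys.
occurs_in(X, [Y|Ys]) :- ( X == Y -> true ; occurs_in(X, Ys) ).

% subset_of(+Xs, +Ys): every literal of Xs occurs in Ys.
subset_of([], _).
subset_of([X|Xs], Ys) :- occurs_in(X, Ys), subset_of(Xs, Ys).

% entails(+State, +Clause): some prime implicate subsumes Clause.
entails([PI|PIs], Clause) :-
    ( subset_of(PI, Clause) -> true ; entails(PIs, Clause) ).

% fluent_of(+Literal, -Fluent): strip an optional negation.
fluent_of(neg(F), F) :- !.
fluent_of(F, F).

% mentions(+Clause, +Fluent): Clause contains Fluent or its negation.
mentions([L|Ls], F) :-
    ( fluent_of(L, G), G == F -> true ; mentions(Ls, F) ).

% drop_mentioning(+State, +Fluent, -Kept): forget all about Fluent.
drop_mentioning([], _, []).
drop_mentioning([C|Cs], F, Kept) :-
    ( mentions(C, F) -> Kept = Rest ; Kept = [C|Rest] ),
    drop_mentioning(Cs, F, Rest).

% progress(+State, +Effect, -NewState): progression through one ground,
% deterministic effect literal. The old knowledge about the affected
% fluent is forgotten and the effect is added as a unit clause.
progress(State, Effect, [[Effect]|Kept]) :-
    fluent_of(Effect, F),
    drop_mentioning(State, F, Kept).
```

With the state [[at(agent,l1)], [at(gold,l2), at(gold,l3)]] the agent's location is known while the gold is at l2 or l3: entails/2 succeeds for [at(agent,l1)] and fails for [at(gold,l2)]. An action with several effects can be handled by calling progress/3 once per effect literal.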

The fragment of the Fluent Calculus that ALPprolog supports is tailored for an efficient implementation using Prolog list operations; the actual implementation is based on ECLiPSe Prolog [ECLiPSe Implementors Group, 2009]. One major design objective for ALPprolog was to both extend the expressivity of Special Flux and to retain (some of) its practical efficiency.

¹ The name refers to both the implementation language Prolog, and the underlying logic (Propositional Logic).

4.1 ALPprolog Programs

An ALPprolog program is an ALP that respects the following restrictions on the ?(φ) atoms in the program:

• All occurrences of non-fluent expressions in φ are positive.

• So-called sense fluents S(x⃗) that represent the interface to a sensor may only occur in the form ?(s(X)). Sense fluents are formally introduced below.

The following will be our running example of an ALPprolog program throughout this section:

Example 4.1. Consider an agent whose task is to find gold in a maze. For the sake of simplicity, the states of the environment shall be described by a single fluent (i.e., state property): At(u, x) to denote that u ∈ {Agent, Gold} is at location x. The agent can perform the action Go(y) of going to location y, which is possible if y is adjacent to, and accessible from, the current location of the agent. The fluent and action are used as basic elements in the following agent logic program. It describes a simple search strategy based on two parameters: a given list of locations (choice points) that the agent may visit, and an ordered collection of backtracking points.

    explore(Choicepoints,Backtrack) :-             % finished, if
        ?(at(agent,X)), ?(at(gold,X)).             % gold is found

    explore(Choicepoints,Backtrack) :-
        ?(at(agent,X)),
        select(Y,Choicepoints,NewChoicepoints),    % choose a direction
        do(go(Y)),                                 % go in this direction
        explore(NewChoicepoints,[X|Backtrack]).    % store the choice made

    explore(Choicepoints,[X|Backtrack]) :-         % go back one step
        do(go(X)),
        explore(Choicepoints,Backtrack).

    select(X,[X|Xs],Xs).
    select(X,[Y|Xs],[Y|Ys]) :- select(X,Xs,Ys).

Suppose we are given a list of choice points C, then the query :- explore(C,[]) lets the agent systematically search for gold from its current location: the first clause describes the base case where the agent is successful; the second clause lets the agent select a new location from the list of choice points and go to this location (the declarative semantics and proof theory for do(α) will require that the action is possible at the time of execution); and the third clause sends the agent back using the latest backtracking point.

Because ALPprolog programs are meant for online execution the programmer must ensure that no backtracking over action executions occurs, by inserting cuts after all action occurrences. Observe that this applies to sensing actions, too. It is readily checked that — after the insertion of cuts — the ALP from example 4.1 satisfies all of the above conditions.
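One way to place the cuts in the program of Example 4.1 is sketched below. To make the sketch runnable in plain Prolog it comes with a toy stand-in for the ALP machinery, and everything in that stand-in is an assumption made for illustration: the maze layout (adjacent/2), the initial facts, and the definitions of ?/1 and do/1, which simply look up and update dynamic facts and therefore model complete knowledge only. Only the three explore/2 clauses and select/3 stem from the example.

```prolog
:- dynamic at/2.

% Hypothetical initial situation and maze layout.
at(agent, l1).
at(gold, l3).
adjacent(l1, l2).  adjacent(l2, l1).
adjacent(l2, l3).  adjacent(l3, l2).

% Toy stand-in for fluent queries: look the fluent up in the database.
?(F) :- call(F).

% Toy stand-in for action execution: go(Y) is possible if Y is adjacent
% to the agent's location; its effect is applied destructively.
do(go(Y)) :-
    at(agent, X), adjacent(X, Y),
    retract(at(agent, X)), assertz(at(agent, Y)).

explore(_Choicepoints, _Backtrack) :-            % finished, if
    ?(at(agent,X)), ?(at(gold,X)).               % gold is found
explore(Choicepoints, Backtrack) :-
    ?(at(agent,X)),
    select(Y, Choicepoints, NewChoicepoints),    % choose a direction
    do(go(Y)), !,                                % commit to the executed action
    explore(NewChoicepoints, [X|Backtrack]).     % store the choice made
explore(Choicepoints, [X|Backtrack]) :-          % go back one step
    do(go(X)), !,                                % commit to the executed action
    explore(Choicepoints, Backtrack).

select(X, [X|Xs], Xs).
select(X, [Y|Xs], [Y|Ys]) :- select(X, Xs, Ys).
```

In this toy maze the query explore([l2,l3],[]) moves the agent from l1 via l2 to l3, where the gold is found. Each cut sits directly after do/1, so Prolog may still try another choice point while an action is not yet executed, but it never backtracks over an action that has been carried out. Going back to an earlier location remains possible because the third clause performs it as a new action.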

In document Action Logic Programs: How to Specify Strategic Behavior in Dynamic Domains Using Logical Rules (Page 86-90)