• No results found

Model-based contrastive explanations for explainable planning

N/A
N/A
Protected

Academic year: 2019

Share "Model-based contrastive explanations for explainable planning"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

Model-Based Contrastive Explanations for Explainable Planning

Benjamin Krarup

, Michael Cashmore

, Daniele Magazzeni

, Tim Miller

Abstract

An important type of question that arises in Explainable Plan-ning is a contrastive question, of the form “Why action A instead of action B?”. These kinds of questions can be an-swered with acontrastive explanationthat compares proper-ties of the original plan containing A against the contrastive plan containing B. An effective explanation of this type serves to highlight the differences between the decisions that have been made by the planner and what the user would expect, as well as to provide further insight into the model and the planning process. Producing this kind of explanation requires the generation of the contrastive plan. This paper introduces domain-independent compilations of user questions into con-straints. These constraints are added to the planning model, so that a solution to the new model represents the contrastive plan. We introduce a formal description of the compilation from user question to constraints in a temporal and numeric PDDL2.1 planning setting.

1

Introduction

Explainable AI (XAI) is an emerging and important research area within AI. Recent work has shown that AI Planning is an important tool in XAI, as its decision-making mecha-nisms are model-based and so in principle more transparent. This recent work includes many approaches towards provid-ing explanations in AI plannprovid-ing.

Chakraborti et al. (2019) gives an in-depth overview of this work and different terms used within the XAI landscape. In particular, Zhang et al. (2017) shows that if an AI sys-tem behaves “explicably” there is less of a need for explana-tions. However, this is not always possible and explanation is sometimes required. Chakraborti et al. (2017) tackles ex-planation as a model reconciliation problem, arguing that the explanation must be a difference between the human model and AI model. Seegebarth et al. (2012) show that by repre-senting plans as first order logic formulae generating expla-nations is feasible in real time. In contrast, in this paper we focus on contrastive“why”questions. Fox, Long, and Mag-azzeni (2017) highlight some important questions in XAIP and discuss possible answers, and also describe how these “why”questions are especially important. Smith (2012) out-lines the approach to planning as an iterative process for

bet-∗

King’s College London, UK,{firstname.lastname}@kcl.ac.uk †

[image:1.612.372.507.214.370.2]

University of Melbourne, Australia,[email protected]

Figure 1: The four-stage process for generating a contrastive explanation from a user question. The hypothetical model is created by compiling the formal question into the planning model (in PDDL 2.1).

ter modelling preferences and providing explanations. We propose to follow this same approach.

The aim of explanations is to improve the user’s lev-els of understanding and trust in the system they are us-ing. These explanations can be local (regarding a specific plan) or global (concerning how the planning system works in general). In this paper we focus on local explanations of temporal and numeric planning problems, introducing an approach for explaining why a planner has made a cer-tain decision. Through active exploration of these specific cases, the user may also gain global insight into the way in which the planner makes decisions. (See (Lipton 1990; 2016; Ribeiro, Singh, and Guestrin 2016)).

(2)

questions arecontrastive questions, because the user is ask-ing why some action was selected rather than some other action that was not.

Instead, Miller argues that all such questions can be asked as contrastive questions of the form “Why action A rather than action B?” (Miller 2018a). Contrastive questions cap-ture the context of the question; they more precisely iden-tify the gaps in the user’s understanding of a plan that needs to be explained (Lewis 1986). A contrastive question about a plan can be answered by acontrastive explanation. Con-trastive explanations will compare the original plan against a contrastive plan that accounts for the user expectation. Pro-viding contrastive explanations is not only effective in im-proving understanding, but is simpler than providing a full causal analysis (Miller 2019).

Following the approach of Smith (2012) we propose an approach to contrastive explanations through a dialogue with the user. The proposed approach consists of an iterative four-stage process illustrated in figure 1. First the user asks a con-trastive question in natural language. Second, a constraint is derived from the user question, in the following we re-fer to this constraint as theformal question. Third a hypo-thetical model (HModel) is generated which encapsulates this constraint. A solution to this model is the hypotheti-cal plan (HPlan) that can be compared to the original plan to show the consequence of the user suggestion. The user can compare plans and iterate the process by asking further questions, and refining the HModel. This allows the user to combine different compilations to create a more constrained HModel, producing more meaningful explanations, until the explanation is satisfactory. Each stage of this process rep-resents a vital research challenge. This paper describes and formalises the third stage of this process: compiling the for-mal question into a hypothetical model for temporal and nu-meric planning.

We are interested in temporal and numeric planning prob-lems, for which optimal solutions are difficult to find. There-fore, while the process described above serves for explana-tion, the insight of the user can also result in guiding the planning process to a more efficient solution. As noted by (Smith 2012), the explanations could also give the user the opportunity to improve the plan with respect to their own preferences. The user could have hidden preferences which have not been captured in the model. The user could ask questions which enforce constraints that favour these ences. The new plan could be sub-optimal, but more prefer-able to the user.

The contribution of this paper is a formalisation of domain-independent and planner-agnostic compilations from formal contrastive questions to PDDL2.1 (Fox and Long 2003), necessary for providing contrastive explana-tions. The compilations shown are not exhaustive. However, they do cover an interesting set of questions which users would commonly have about both classical and temporal plans. The paper is organised as follows. The next section describes the planning definitions we will use throughout the paper. In Section 3 we describe the running example that we use to demonstrate our compilations throughout the paper. In Section 4 we list the set of formal questions that we are

interested in, and formalise the compilations of each of these into constraints. Finally, we conclude the paper in Section 5 whilst touching on some interesting future work.

2

Background

Our definition of a planning model follows the definition of PDDL2.1 given by (Fox and Long 2003), extended by a set of time windows as follows.

Definition 1 Aplanning modelis a pairΠ = hD, P robi. The domainD=hP s, V s, As, arityiis a tuple whereP sis a finite set of predicate symbols,V sis a finite set of function symbols, As is a set of action schemas, called operators, andarityis a function mapping all of these symbols to their respective arity. The problem P rob = hOs, I, G, Wiis a tuple whereOsis the set of objects in the planning instance,

Iis the initial state,Gis the goal condition, andW is a set of time windows.

A set of atomicpropositionsPis formed by applying the predicate symbolsP sto the objectsOs(respecting arities). One propositionpis formed by applying an ordered set of objectso ⊆O to one predicateps, respecting its arity. For example, applying the predicate (block on?a?b) with ar-ity2to the ordered set of objects{blockA, blockB}forms the proposition (block on blockA blockB). This process is called “grounding” and is denoted with:

ground(ps, χ) =p

whereχ ⊆Ois an ordered set of objects. Similarly the set of primitive numeric expressions(PNEs)V are formed by applying the function symbolsV stoOs.

A statesconsists of a timet∈R, a logical partsl ⊆P, and a numeric partsvthat describes the values for the PNE’s at that state. The initial stateIis the state at timet= 0.

The goal G = g1, ..., gn is a set of constraints over P andV that must hold at the end of an action sequence for a plan to be valid. More specifically, for an action sequence

φ=ha1, a2, . . . , anieach with a respective time denoted by

Dispatch(ai), we use the definition of plan validity from (Fox and Long 2003) (Definition 15 “Validity of a Simple Plan”). A simple plan is the sequence of actionsφwhich de-fines a happening sequence,ti=0...kand a sequence of states,

si=0...k+1such thats0 =Iand for eachi= 0. . . k,si+1is the result of executing the happening at timeti. The simple planφis valid ifsk+1|=G.

Each time windoww∈W is a tuplew=hwlb, wub, wvi wherewvis a proposition which becomes true or a numeric effect which acts upon somen ∈ V.wlb ∈ Ris the time at which the proposition becomes true, or the numeric effect is applied. wub ∈ Ris the time at which the proposition becomes false. The constraintwlb < wub must hold. Note that the numeric effect is not effected atwub.

Similar to propositions and PNEs, the set of ground ac-tionsAis generated from the substitution of objects for op-erator parameters with respect to it’s arity. Each ground ac-tion is defined as follows:

Definition 2 A ground action a ∈ A has a duration

(3)

(define (domain turtlebot_demo) (:types waypoint robot)

(:predicates

(robot_at ?v - robot ?wp - waypoint) (connected ?from ?to - waypoint) (visited ?wp - waypoint))

(:functions

(travel_time ?wp1 ?wp2 - waypoint)) (:durative-action goto_waypoint

:parameters (?v - robot ?from ?to - waypoint) :duration(= ?duration

(travel_time ?from ?to)) :condition (and

(at start (robot_at ?v ?from)) (over all (connected ?from ?to))) :effect (and

(at start (not (robot_at ?v ?from))) (at end (visited ?to))

[image:3.612.317.554.53.246.2]

(at end (robot_at ?v ?to)))))

Figure 2: The robotics domain used as a running example.

pass between the start and end of a; a start (end) con-dition P re`(a) (P rea(a)) which must hold at the state

thatastarts (ends); an invariant conditionP re↔(a)which

must hold throughout the entire execution of a; add ef-fects Eff(a)+`,Eff(a)+a ⊆ P that are made true at the start and ends of the action respectively; delete effects Eff(a)−`,Eff(a)−a ⊆Pthat are made false at the start and end of the action respectively; and numeric effectsEff(a)n`, Eff(a)n

↔,Eff(a)nathat act upon somen∈V.

3

Running Example

We use as a running example the following planning model. figure 2 shows the domainD. The domain describes a sce-nario in which a robot is able to move between connected waypoints and mark them as visited. The domain contains three predicate symbols (robot at, connected, visited) with arities2,2, and1 respectively. The domain includes only a single function symbol travel time with arity 2. There is a single operatorgoto waypoint.

Figure 3 shows the problemP rob. The problem speci-fies 7 objects: wp0, wp1, wp2, wp3, wp4, wp5 andkenny. The initial state specifies which propositions are ini-tially true, such as the current location of the robot

(robot at kenny wp0), and the initial values of the PNEs, e.g.(= (travel time wp5wp3) 4.68). Finally, the goal is specified as a constraint overP∪V, in this example that the robot has visited all of the locations.

Finally, figure 4 shows an example plan that solves this problem. This plan might appear sub-optimal. The robot moves from waypointwp2towp1and then immediately re-turns towp2. This second action might seem redundant to the user. However, upon closer inspection of the connectiv-ity of waypoints (shown in figure 5) we can see that the plan is in fact the optimal one. Visiting waypointwp1 is a goal of the problem, and it is only connected to waypointswp0

andwp2, both of which have already been visited. Waypoint

(define (problem task) (:domain turtlebot_demo) (:objects

wp0 wp1 wp2 wp3 wp4 wp5 - waypoint kenny - robot)

(:init

(robot_at kenny wp0) (visited wp0) (connected wp0 wp2) (connected wp0 wp4) (connected wp1 wp0) (connected wp1 wp2) (connected wp2 wp1) (connected wp2 wp4) (connected wp2 wp5) (connected wp3 wp5) (connected wp5 wp0) (connected wp5 wp2) (connected wp5 wp3)

(= (travel_time wp0 wp2) 1.45) (= (travel_time wp0 wp4) 2) ...

[image:3.612.321.556.291.372.2]

(:goal (and (visited wp1) (visited wp2) (visited wp3) (visited wp4) (visited wp5) )))

Figure 3: Example Problem with some travel time functions omitted for space.

0.00: (goto_waypoint kenny wp0 wp2) [1.45] 1.45: (goto_waypoint kenny wp2 wp1) [2.00] 3.45: (goto_waypoint kenny wp1 wp2) [2.00] 5.45: (goto_waypoint kenny wp2 wp5) [2.00] 7.45: (goto_waypoint kenny wp5 wp3) [4.68] 12.13: (goto_waypoint kenny wp3 wp5) [4.68] 16.81: (goto_waypoint kenny wp5 wp0) [0.99] 17.80: (goto_waypoint kenny wp0 wp4) [2.00]

Figure 4: Plan generated from the example domain and prob-lem. The cost of the plan is its duration (19.80).

wp0is only connected to waypointswp2andwp4,wp2has been visited andwp4is a dead end. For these reasons com-bined, the only logical option is to move back towp2after completing the goal of visitingwp1. This type of behaviour similarly happens between waypointswp3andwp5.

A graphical representation such as figure 5 is not al-ways available, and so even for this simple model and plan, deducing the reasoning behind the planned ac-tions is not trivial. This is an example of where XAIP is useful. Using our proposed approach the user could have asked the question: “Why do we use the action

(goto waypoint kenny wp1wp2), rather than not using it?” From this question we could generate a contrastive plan with this constraint enforced (shown in figure 6). Comparing the actions and costs of the original and the new plan could shed light on why the action needed to be used. The user can carry on asking questions until they were satisfied.

4

Formalisation

Definition 3 An explanation problem is a tuple E =

hΠ, φ, Qi, in which Π is a planning model (Definition 1),

(4)
[image:4.612.104.252.58.142.2]

Figure 5: Waypoint connectivity in the running example. The robot is only allowed to move along the directed arrows.

0.00: (goto_waypoint kenny wp0 wp2) [1.45] 1.45: (goto_waypoint kenny wp2 wp5) [2.00] 3.45: (goto_waypoint kenny wp5 wp3) [4.68] 8.13: (goto_waypoint kenny wp3 wp5) [4.68] 12.81: (goto_waypoint kenny wp5 wp2) [2.00] 14.81: (goto_waypoint kenny wp2 wp1) [2.00] 16.81: (goto_waypoint kenny wp1 wp0) [2.00] 18.81: (goto_waypoint kenny wp0 wp4) [2.00]

Figure 6: The hypothetical plan that accounts for the user’s suggestion, avoiding the action of moving fromwp1towp2. The cost of the plan is its duration (20.81).

In this paper, we assume that the user knows the model

Πand the planφ, so answers such as stating the goal of the problem will not increase their understanding. Given this, we propose the following set of questions, and provide a for-mal description for compilations of this set of forfor-mal ques-tions of temporal plans:

1. Why is actionaused in states, rather than actionb? (Sec-tion 4.1)

2. Why is action anot used in the plan, rather than being used? (Section 4.2)

3. Why is action aused in the plan, rather than not being used? (Section 4.3)

4. Why is actionaused at timet, rather than at least some timet0after/beforet? (Section 4.4)

5. Why is actionaused outside of time window w, rather than only being allowed withinw? (Section 4.5)

6. Why is actionanot used in time windoww, rather than being used withinw? (Section 4.6)

7. Why is action a not performed before (after) action b, rather than a being performed after (before) b? (Sec-tion 4.7)

These questions were derived by systematically assessing ways that counterfactual situations could occur in plans, and choosing those that would be useful over many applications. This is not an exhaustive list of possible constraints that can be enforced upon the original model, however, it does repre-sent a list of questions that would be useful in specific con-texts and applications.

Part of being able to answer these questions is the abil-ity to reason about what would happen in the counterfactual cases. We approach this problem by generating plans for the

counterfactual cases via compilations. A compilation of a planning instance where the model is given byΠ, and a ques-tion is given by Qis shown asCompilation(Π, Q) = Π0 where:

Π0 =hhP s0, V s, As0, arity0i,hOs, I0, G0, W0ii We callΠ0thehypothetical model, or HModel.

However,Π0 can also be used as the input model so that the user can iteratively ask questions about some model, i.e:

Compilation(Compilation(Π, Q), Q0)

This allows the user to stack questions, further increasing their understanding of the plan through combining compi-lations. Combining compilations this way provides a much wider set of possible constraints.

After the HModel is formed, it is solved to give the HPlan. Any new operators that are used in the compilation to en-force some constraint are trivially renamed back to the orig-inal operators they represent. For each iteration of compila-tion the HPlan is validated against the original modelΠ.

4.1

Replacing an Action in a State

Given a planφ, a formal questionQis asked of the form: Why is the operatorowith parametersχused in states, rather than the operatornwith parametersχ0? where

o6=norχ6=χ0

For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp2wp5) used, rather than(goto waypoint kenny wp2wp4)?”

They might ask this because a goal of the problem is to visit

wp4. As the robot visitswp5fromwp3later in the plan, it might make sense to the user that the robot can visit wp4

earlier, aswp5will be visited at a later point.

To generate the HPlan, a compilation is formed such that the ground actionb =ground(n, χ0)appears in the plan in place of the actionai = ground(o, χ). Given the example above b = ground(goto waypoint,{kenny, wp2, wp4}), and ai = ground(goto waypoint,{kenny, wp2, wp5}). Given a plan:

φ=ha1, a2, . . . , ani

The ground actionaiat statesis replaced withb, which is executed, resulting in stateI0, which becomes the new ini-tial state in the HModel. A time window is created for each durative action that is still executing in states. These model the end effects of the concurrent actions. A plan is then gen-erated from this new state with these new time windows for the original goal, which gives us the plan:

φ0 =ha01, a02, . . . , a0ni

The HPlan is then the initial actions of the original planφ

concatenated withband the new planφ0: ha1, a2, . . . , ai−1, b, a01, a02, . . . , a0ni Specifically, the HModelΠ0is:

[image:4.612.55.289.189.270.2]
(5)

• I0 is the final state obtained by executing1 ha1, a2, . . . , ai−1, bifrom stateI.

• Cis a set of time windowswx, for each durative actionaj that is still executing in the stateI. For each such action,

wx specifies that the end effects of that action will be-come true at the time point at which the action is sched-uled to complete. Specifically:wx = hDispatch(aj) +

Dur(aj)−Dispatch(b), inf, uiwhereu=Eff(aj)−a ∪ Eff(aj)+a ∪Eff(aj)na.

In the case in which an actionaj that is executing in state

I0 has an overall condition that is violated, this is detected when the plan is validated against the original model. As an example, given the user question above, the new initial state

I0from the running example is shown below:

(:init

(robot at kenny wp4) (visited wp2) (visited wp1) (visited wp4)

(connected wp0 wp2) (connected wp0 wp4) (connected wp1 wp0) (connected wp1 wp2) ...)

(:goal (and (visited wp1)(visited wp2) (visited wp3)(visited wp4) (visited wp5) )))

This captures the stateI0, resulting from executing the actionsa1, a2, a3, andb:

0.00: (goto_waypoint kenny wp0 wp2) [1.45]

1.45: (goto_waypoint kenny wp2 wp1) [2.00] 3.45: (goto_waypoint kenny wp1 wp2) [2.00]

5.45: (goto_waypoint kenny wp2 wp4) [2.00]

In this state the robot has visited the waypointswp2,wp1, andwp4, and is currently atwp4. This new initial state is then used to plan for the original goals to get the planφ0, which, along with bandφ, gives the HPlan. However, the problem is unsolvable from this state as there are no con-nections fromwp4to any other waypoint. By applying the user’s constraint, and showing there are no more applicable actions, it answers the above question: “because by doing this there is no way to complete the goals of the problem”.

This compilation keeps the position of the replaced ac-tion in the plan, however, it may not be optimal. This is be-cause we are only re-planning after the inserted action has been performed. The first half of the plan, because it was originally planned to support a different set of actions, may now be inefficient, as shown by Borgo, Cashmore, and Mag-azzeni (2018).

If the user instead wishes to replace the action without necessarily retaining its position in the plan, then the fol-lowing constraints on adding and removing an action from the plan can be applied iteratively, as mentioned previously.

4.2

Add an Action to the Plan

Given a planφ, a formal questionQis asked of the form: Why is the operator o with parameters χ not used, rather than being used?

1

We use VAL to validate this execution. We use the add and delete effects of each action, at each happening (provided by VAL), up to the replacement action to computeI0.

For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp2wp4) not used, rather than being used?”

They might ask this because a goal of the problem is to visit

wp4. As the robot is atwp2early in the plan, and you can visitwp4fromwp2, it might make sense to the user for the robot to visitwp4at that time.

To generate the HPlan, a compilation is formed such that the actiona = ground(o, χ)must be applied for the plan to be valid. The compilation introduces a new predicate

has done a, which represents which actions have been ap-plied. Using this, the goal is extended to include that the user suggested action has been applied. The HModelΠ0is:

Π0=hhP s0, V s, As0, arity0i,hOs, I, G0, Wii

where

• P s0=P s∪ {has done a} • As0 ={o0} ∪As\ {o}

• arity0(x) =arity(x),xarity

• arity0(has done a) =arity0(o0) =arity(o)

• G0=G∪ {ground(has done a, χ)}

where the new operator o0 extends o with the add effect

has done awith corresponding parameters, i.e.

Eff+a(o0) =Eff+a(o)∪ {has done a}

For example, given the user question above, the opera-torgoto waypointfrom the running example is extended to

goto waypoint0with the additional add effecthas done a:

(:durative-action goto_waypoint’ :parameters (?v - robot

?from ?to - waypoint) :duration( = ?duration

(travel_time ?from ?to))

:condition (at start (robot_at ?v ?from) (over all (connected ?from ?to)) :effect (and (at end (visited ?to))

(at start (not (robot_at ?v ?from))) (at end (robot_at ?v ?to))

(at end (has done goto waypoint’ ?v ?from ?to)))))

and the goal is extended to include the proposition:

(has done goto waypoint kenny wp2wp4).

4.3

Remove a Specific Grounded Action

Given a planφ, a formal questionQis asked of the form: Why is the operatorowith parametersχused, rather than not being used?

For example, given the example plan in figure 4 the user might ask:

(6)

A user might ask this because the robot has already sat-isfied the goal to visitwp2before this point with the action

(goto waypoint kenny wp0wp2). The user might think the second action(goto waypoint kenny wp1wp2) seems re-dundant.

The specifics of the compilation is similar to the compi-lation in Section 4.2. The HModel is extended to introduce a new predicatenot done actionwhich represents actions that have not yet been performed. The operatorois extended with the new predicate as an additional delete effect. The ini-tial state and goal are then extended to include the user se-lected grounding ofnot done action. Now, when the user selected action is performed it deletes the new goal and so invalidates the plan. This ensures the user suggested action is not performed.

For example, given the user question above, an HPlan is generated that does not include the action

(goto waypoint kenny wp1wp2), and is shown in figure 6.

4.4

Delay/Advance an Action

Given a planφ, a formal questionQis asked of the form: Why is the operatorowith parametersχused at time

t, rather than at least some durationt0after/beforet? For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp2wp5) used at time5.45, rather than at least4seconds earlier?”

A user might ask this question in general because they expected an action to appear earlier or later in a plan. This could happen for a variety of reasons. In domains with re-sources that are depleted by specific actions, and are replen-ished by others, such as fuel for vehicles, these questions may arise often. A user might want an explanation for why a vehicle was refueled earlier or later than what was expected. In this case the refuel action can be delayed or advanced to answer this question.

For this particular example the user might want the action

(goto waypoint kenny wp2wp5) to be advanced nearer the start of the plan. The user might see that in the original plan the the robot goes fromwp2towp1 at time1.45and then instantly goes back again. The user might think that a better action would be to go fromwp2towp5before this. The user might notice thatwp5is connected to more way-points thanwp1. Having these extra options might prevent redundant actions that revisit waypoints.

To generate the HPlan, the planning model is compiled such that the ground actiona = ground(o, χ)is forced to be used in time windowwwhich is at leastt0 before/after

t. This compilation is an example of a combination of two other compilations: adding an action (in Section 4.2) and for-bidding the action outside of a time window (in Section 4.5). The latter enforces that the action can only be applied within the user specified time window, while the former enforces that the action must be applied. Specifically, the XModelΠ0

is:

• P0=P∪ {can do a, not done a, has done a} • A0={oa, o¬a} ∪A\ {o}

• arity0(x) =arity(x),∀x∈arity

• arity0(can do a) =arity0(not done a) =

arity0(has done a) =arity0(o

a) =

arity0(o¬a) =arity(o)

• I0=I∪ {ground(not done a, χ)}

• G0=G∪ { ground(not done a, χ),

ground(has done a, χ) }

• W0 =W∪

bef ore:h0, tReal, ground(can do a, χ)i

af ter:htReal, inf, ground(can do a, χ)i where the new operatorsoaando¬aboth extendo. The latter with the delete effectnot done a, whileoaextends owith the preconditioncan do aand add effecthas done a; i.e.:

Eff−a(o¬a) =Eff−a(o)∪ {not done a}

P re↔(oa) =P re↔(o)∪ {can do a}

Eff+a(oa) =Eff+a(o)∪ {has done a}

This ensures that the ground actiona = ground(oa, χ) must be present in the plan between the times0andtReal, ortRealandinf, depending on the user question, and be-tween those times only. In addition, the user selected ac-tion is forced to be performed using the same approach as in Section 4.2. For example, given the user question above, the operator goto waypoint from figure 2 is extended to

goto waypoint notaandgoto waypoint a. The resulting HPlan is:

0.00: (goto_waypoint kenny wp0 wp2) [1.45] 1.45: (goto waypoint a kenny wp2 wp5) [2.00] 3.45: (goto_waypoint kenny wp5 wp3) [4.68] 8.13: (goto_waypoint kenny wp3 wp5) [4.68] 12.81: (goto_waypoint kenny wp5 wp2) [2.00] 14.81: (goto_waypoint kenny wp2 wp1) [2.00] 16.81: (goto_waypoint kenny wp1 wp0) [2.00] 18.81: (goto_waypoint kenny wp0 wp4) [2.00]

4.5

Forbid an Action Outside a Time Window

Given a planφ, a formal questionQis asked of the form: Why is the operatorowith parametersχused outside of time lb < t < ub, rather than only being allowed within this time window?

For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp0wp4) used out-side of times 0 and 2, rather than being restricted to that time window?”

A user can ask this because the action

(goto waypoint kenny wp0wp4) is used at the end of the plan, the robot starts at wp0 and must visit wp4 to satisfy a goal. The user might think that satisfying this goal earlier in the plan will free up time for the robot to complete the other goals.

(7)

Operator o¬a replaces the original operator o for all other actions ground(o, χ0), where χ0 6= χ. The action

ground(o¬a, χ) cannot be used (this is enforced using the compilation for forbidding an action described in Sec-tion 4.3). Operatoroaacts as the operatorospecifically for the actiona=ground(o, χ), which has an added constraint that it can only be performed betweenlbandub. Specifically, the HModelΠ0is:

Π0 =hhP s0, V s, As0, arity0i,hOs, I0, G0, W0ii

where:

• P s0=P s∪ {can do a, not done a} • As0 ={oa, o¬a} ∪As\ {o}

• arity0(x) =arity(x),∀x∈arity

• arity0(can do a) = arity0(not done a) =

arity0(oa) =arity0(o¬a) =arity(o) • I0=I∪ {ground(not done a, χ)} • G0=G∪ {ground(not done a, χ)} • W0=W ∪ {hlb, ub, ground(can do a, χ)i}

where the new operatorso¬aandoaextendowith the delete effectnot done a and the preconditioncan do a, respec-tively. i.e:

Eff−`(o¬a) =Eff−`(o)∪ {not done a}

P re`(oa) =P re`(o)∪ {can do a}

As the proposition ground(can do a, χ)must be true for

ground(oa, χ)to be performed, this ensures that the action

acan only be performed within the timeslbandub. Other actions from the same operator can still be applied at any time using the new operatoro¬a. As in Section 4.3 we make sure the ground actionground(o¬a, χ)can never appear in the plan.

For example, given the user question above, the operator

goto waypointfrom Figure 2 is extended too¬a andoaas shown below:

(:durative-action goto_waypoint_nota :parameters (?v - robot

?from ?to - waypoint) :duration(= ?duration

(travel_time ?from ?to)) :condition (and

(at start (robot_at ?v ?from)) (over all (connected ?from ?to))) :effect (and

(at end (visited ?to))

(at start (not (robot_at ?v ?from))) (at end (robot_at ?v ?to))

(at start (not (not done goto waypoint ?v ?from ?to)))))

(:durative-action goto_waypoint_a :parameters (?v - robot

?from ?to - waypoint) :duration(= ?duration

(travel_time ?from ?to)) :condition (and (at start

(can do goto waypoint ?v ?from ?to)) (at start (robot_at ?v ?from)))

:effect (and (at end (visited ?to)) (at start (not (robot_at ?v ?from))) (at end (robot_at ?v ?to))))

The initial state is extended to include the proposition

(not done goto waypoint kenny wp0wp4) and the time window h0,2,(can do goto waypoint kenny wp0wp4)i. This time window enforces that the proposition

(can do goto waypoint kenny wp0wp4) is true between times 0 and 2. The resulting HPlan is:

0.00: (goto_waypoint kenny wp0 wp2) [1.45] 1.45: (goto_waypoint kenny wp2 wp1) [2.00] 3.45: (goto_waypoint kenny wp1 wp2) [2.00] 5.45: (goto_waypoint kenny wp2 wp5) [2.00] 7.45: (goto_waypoint kenny wp5 wp3) [4.69] 12.14: (goto_waypoint kenny wp3 wp5) [4.69] 16.83: (goto_waypoint kenny wp5 wp2) [2.00] 18.84: (goto_waypoint kenny wp2 wp4) [2.98]

Following the user suggestion, the action is no longer ap-plied outside of the time window, and in fact does not appear in the plan at all.

4.6

Add an Action Within a Time Window

Given a planφ, a formal questionQis asked of the form: Why is the operatoro with parametersχnot used at timelb < t < ub, rather than being used in this time window?

For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp0wp4) not used between times 0 and 2, rather than being used in this time window?”

The HPlan given in Section 4.5 shows the user that there is a better plan which does not have the action in this time window. However, the user may only be satisfied once they have seen a plan where the action is performed in their given time window. To allow this the action may have to appear in other parts of the plan as well.

This constraint differs from Section 4.5 in two ways: first the action is now forced to be applied in the time window, and second the action can be applied at other times in the plan. This constraint is useful in cases such as a robot that has a fuel level. As fuel is depleted when travelling between waypoints, the robot must refuel, possibly more than once. The user might ask “why does the robot not refuel between the timesxandy(as well as the other times it refuels)?”.

To generate the HPlan, the planning model is compiled such that the ground actiona= ground(o, χ)is forced to be used between timeslbandub, but can also appear at any other time. This is done using a combination of the com-pilation in Section 4.2 and a variation of the comcom-pilation in Section 4.5. Simply, the former ensures that new action

(8)

a =ground(o, χ)to be applied at other times in the plan. Given this, the HModelΠ0is:

Π0 =hhP s0, V s, As0, arity0i,hOs, I, G0, W0ii

where:

• P s0=P s∪ {can do a, has done a} • As0 ={oa} ∪As

• arity0(x) =arity(x),∀x∈arity

• arity0(can do a) =arity0(has done a) =arity0(oa) =arity(o)

• G0=G∪ {ground(has done a, χ)} • W0=W ∪ {hlb, ub, ground(can do a, χ)i}

Aswp4is a dead end there is no valid HPlan following this suggestion.

4.7

Reordering Actions

Given a planφ, a formal questionQis asked of the form: Why is the operatoro with parametersχ used before (after) the operatornwith parametersχ0, rather than after (before)? whereo6=norχ6=χ0

For example, given the example plan in figure 4 the user might ask:

“Why is (goto waypoint kenny wp2wp1) used be-fore(goto waypoint kenny wp2wp5), rather than af-ter?”

A user might ask this because there are more connections fromwp5 thanwp2. The user might think that if the robot has more choice of where to move to, the planner could make a better choice, giving a more efficient plan.

The compilation to the HModel is performed in the fol-lowing way.

First, a directed-acyclic-graph (DAG) hN, Eiis built to represent each ordering between actions suggested by the user. For example the ordering ofQisa ≺ bwherea =

[image:8.612.103.241.513.565.2]

ground(o, χ) andb = ground(n, χ0). The DAG is illus-trated in Figure 7.

Figure 7: The directed-acyclic-graph of the ordering con-straint enforced by the formal questionQ. Each ground ac-tion is represented by a node.

This DAG is then encoded into the modelΠto createΠ0. For each edge(a, b) ∈ E two new predicates are added:

orderedab representing that an edge exists betweenaand

bin the DAG, andtraversedab representing that the edge between actionsaandbhas been traversed.

For each node representing a ground actiona ∈ N, the action is disallowed using the compilation from Section 4.3.

Also, for each such action a new operatoroais added to the domain, with the same functionality of the original operator

o. The arity of the new operator,arity(oa)is the combined arity of the original operator plus the arity of all ofa’s sink nodes. Specifically, the HModelΠ0is:

Π0=hhP s0, V s, As0, arity0i,hOs, I0, G0, Wii where:

• P s0=P s∪ {orderedab} ∪ {traversedab},∀(a, b)∈E • As0 ={oa} ∪As,∀a∈N

• arity0(x) =arity(x),∀x∈arity

• arity0(oa) =arity(o) +P(a,b)∈Earity(b),∀a∈N • arity0(orderedab) =arity(a) +arity(b),∀(a, b)∈E • arity0(traversedab) =arity(b),∀(a, b)∈E

• I0=I∪ground(orderedab, χ+χ0),∀(a, b)∈E, where

χandχ0are the parameters ofaandb, respectively. In the above, we abuse the arity notation to specify the arity of an action to mean the arity of the operator from which it was ground; e.g.arity(a) = arity(o)wherea=

ground(o, χ).

Each new operatoroaextendsowith the precondition that all incoming edges must have been traversed, i.e. the source node has been performed. The effects are extended to add that its outgoing edges have been traversed. That is:

P re`(oa) =P re`(o)∪ {orderedab∈P s0,∀b} ∪ {traversedca∈P s0,∀c} Eff+a(oa) =Eff+a(o)∪ {traversedab∈P s0,∀b} This ensures that the ordering the user has selected is maintained within the HPlan.

As the operator oa has a combined arity of the orig-inal operator plus the arity of all of a’s sink nodes, there exists a large set of possible ground actions. How-ever, for all b ∈ N, orderedab is a precondition of oa; and for each edge (a, b) ∈ E the ground proposition

ground(orderedab, χ, χ0)is added to the initial state to rep-resent that the edge exists in the DAG. Therefore, the only grounding of the operator that can be performed is the action with parametersχ+χ0. This drastically reduces the size of the search space.

For example given the user question above, two new op-erators node goto waypoint kenny wp2 wp1 (shown in figure 8) and node goto waypoint kenny wp2 wp5 are added to the domain. These extend operatorgoto waypoint

from figure 2 as described above. The HPlan generated is shown below:

0.00: (goto_waypoint kenny wp0 wp2) [1.45] 1.45: (node goto waypoint kenny wp2 wp5 kenny kenny wp2 wp5 wp2 wp1) [2.00]

3.45: (goto_waypoint kenny wp5 wp3) [4.68] 8.13: (goto_waypoint kenny wp3 wp5) [4.68] 12.81: (goto_waypoint kenny wp5 wp2) [2.00] 14.81: (node goto waypoint kenny wp2 wp1 kenny wp2 wp1) [2.00]

(9)

(:durative-action

node_goto_waypoint_kenny_wp2_wp5 :parameters (?v1 ?v2 - robot

?from1 ?to1 ?from2 ?to2 - waypoint) :duration ( = ?duration

(travel_time ?from1 ?to1)) :condition (and (at start

(robot_at ?v1 ?from1))

(over all (connected ?from ?to)) (at start (ordered wp2 wp5 wp2 wp1 ?v1 ?v2 ?from1 ?to1 ?from2 ?to2)))

:effect (and (at end (visited ?to1)) (at start (not (robot_at ?v1 ?from1))) (at end (robot_at ?v1 ?to1))

(at end (traversed v2 from2 to2 ?v2

[image:9.612.54.279.63.218.2]

?from2 ?to2)) ))

Figure 8: An operator added to the original domain to cap-ture an ordering constraint between actions. The operator ex-tends the originalgoto waypointoperator.

5

Conclusion

In this paper we have presented an approach to compiling a set of formal contrastive questions into domain independent constraints. These are then used within the XAI paradigm to provide explanations. We have described how these com-pilations form a part of a series of stages which start with a user question and end with an explanation. This paper formalises and provides examples of these compilations in PDDL 2.1 for temporal and numeric domains and planners.

We have defined a series of questions which we believe a user may have about a plan in a PDDL2.1 setting. These questions cover a large set of scenarios, and can be stacked to create new interesting constraints which may answer a much richer set of questions.

We acknowledge that the questions we provide compila-tions for do not cover the full set of contrastive quescompila-tions one may have about a plan. For example the question, “Why is the operatorowith parametersχused at timelb < t < ub, rather than not being used in this time window?”, can be an-swered using a variant of Section 4.6. For future work we plan to investigate which compilations will form an atomic set whose elements can be stacked to cover the full set of possible contrastive questions. We also acknowledge that the compilations we have formalised may have equivalent com-pilations. However, the ones we have described have proven successful for explanations.

In future work, we will look to extend this work in several ways. First, while we define how to calculate plans for con-trastive cases, we do not take full advantage of concon-trastive explanations by explaining thedifferencebetween two plans (Miller 2018b). In particular, we will look to extend the pre-sentation beyond just plans into showing the difference be-tween two causal chains as well. Second, we will provide both functional and human-behavioural evaluations of our framework, to ensure that the provided explanations are both satisfactory from a user perspective, and also that they pro-vide actionable insight into the plan.

References

Borgo, R.; Cashmore, M.; and Magazzeni, D. 2018. Towards providing explanations for AI planner decisions. IJCAI-18 Workshop on Explainable AI.

Chakraborti, T.; Sreedharan, S.; Zhang, Y.; and Kambham-pati, S. 2017. Plan explanations as model reconciliation: Moving beyond explanation as soliloquy. InIJCAI. Chakraborti, T.; Kulkarni, A.; Sreedharan, S.; Smith, D. E.; and Kambhampati, S. 2019. Explicability? legibility? pre-dictability? transparency? privacy? security? the emerging landscape of interpretable agent behavior. InICAPS. Fox, M., and Long, D. 2003. PDDL2.1: An extension to pddl for expressing temporal planning domains. Journal of Artificial Intelliigence Research20:61–124.

Fox, M.; Long, D.; and Magazzeni, D. 2017. Explain-able planning. IJCAI-17 workshop on Explainable AI abs/1709.10256.

Haynes, S. R.; Cohen, M. A.; and Ritter, F. E. 2009. Designs for explaining intelligent agents. International Journal of Human-Computer Studies67(1):90 – 110.

Lewis, D. 1986. Causal explanation. In Lewis, D., ed., Philosophical Papers Vol. Ii. Oxford University Press. 214– 240.

Lim, B. Y.; Dey, A. K.; and Avrahami, D. 2009. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, 2119–2128.

Lipton, P. 1990. Contrastive explanation. Royal Institute of Philosophy Supplement27:247266.

Lipton, Z. C. 2016. The mythos of model interpretability. CoRRabs/1606.03490.

Miller, T. 2018a. Contrastive explanation: A structural-model approach.arXiv preprint arXiv:1811.03163. Miller, T. 2018b. Contrastive explanation: A structural-model approach.CoRRabs/1811.03163.

Miller, T. 2019. Explanation in artificial intelligence: In-sights from the social sciences.Artificial Intelligence267:1– 38.

Mueller, S. T.; Hoffman, R. R.; Clancey, W. J.; Emrey, A.; and Klein, G. 2019. Explanation in human-ai systems: A lit-erature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI.CoRRabs/1902.01876. Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. ”why should I trust you?”: Explaining the predictions of any clas-sifier.CoRRabs/1602.04938.

Seegebarth, B.; M¨uller, F.; Schattenberg, B.; and Biundo, S. 2012. Making hybrid plans more clear to human users — a formal approach for generating sound explanations. In ICAPS, 225–233.

Figure

Figure 1: The four-stage process for generating a contrastiveexplanation from a user question
Figure 3: Example Problem with some travel time functionsomitted for space.
Figure 6: The hypothetical plan that accounts for the user’ssuggestion, avoiding the action of moving from wp1 to wp2.The cost of the plan is its duration (20.81).
Figure 7: The directed-acyclic-graph of the ordering con-straint enforced by the formal question Q
+2

References

Related documents