
Experimental Evaluation

Goal Recognition as Path-Planning †

3.3 Experimental Evaluation

We have seen that, in the context of path-planning, the R&G model of GR can be re-formulated to arrive at single-observation recognition and that the single-observation cost difference formula at the heart of our single-observation account can then be used to calculate the RMP around a goal in discrete or continuous domains.

In this section, we report on the results of our experimentation. We tested the performance of GR in path-planning using R&G’s complex cost difference (RG1), the simpler version (3.1) (which does not reason negatively about observations), and the single-observation formula (3.2), which we have proposed for online and offline recognition (to generate a probabilistic heatmap) and as the basis of the RMP formula (3.6). Tests were conducted using problems adapted from the well-known Moving-AI²⁰ path-planning benchmarks (Sturtevant, 2012), which discretise the underlying maps and groundplans to a 512 × 512 grid. We experimented in a discrete environment and have relied on our theoretical conclusions to support the applicability of our formulas in the continuous domain.²¹

²⁰ http://movingai.com/

Our aim was to develop an experimental framework for GR in path-planning to empirically confirm: (i) that the case of exclusive optimality (as in Theorem 3) is rare and, otherwise, the simpler formula (3.1) yields identical posterior probability distributions to (RG1); (ii) that all three accounts return posterior probability distributions that rank goals the same; and (iii) that use of either formula (3.1) or (3.2) cuts processing time by more than half.

3.3.1 Experimental setup

We generated over 2500 individual probability distributions from a problem set of 774, built on 43 scenarios selected at random from two sets of Moving-AI benchmarks (Sturtevant, 2012):²² game landscapes from StarCraft; and connected room layouts (chosen for their similarity to internal locations, such as airport terminals or shopping centres). Since the scenarios are intended for path-planning, we adapted them for GR as follows. First, we added two to five additional (reachable) candidate goals at random locations. Second, to generate the observations, we used Weighted-A* (Pohl, 1970) to build three full continuous paths from the start location to the real goal, differing in quality: one optimal, one suboptimal, one greedy. We then extracted observation sequences varying two further dimensions. The first was ‘observation density’, that is, the proportion of the continuous path used in the extracted observation sequence (sparse 20%, medium 50%, dense 80%). The second was ‘observation strategy’, that is, the method of extracting the observations, which was either random (taking the required density of observations from random locations anywhere along the path) or prefix (taking the required density as a consecutive sequence of nodes from the start location on).
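The two observation dimensions can be illustrated with a short example. This is a minimal sketch and not the code used in our experiments. The function name `extract_observations`, the rounding of the observation count, and the assumption that `path[0]` is the start location are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def extract_observations(path: Sequence[Cell], density: float,
                         strategy: str, rng: random.Random) -> List[Cell]:
    """Draw an observation sequence from a full path (assumes path[0] = start).

    density:  fraction of path nodes to keep, e.g. 0.2, 0.5 or 0.8.
    strategy: "prefix" keeps consecutive nodes from the start location on;
              "random" samples nodes from anywhere along the path and keeps
              them in path order.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    k = max(1, round(density * len(path)))  # assumed rounding rule
    if strategy == "prefix":
        return list(path[:k])
    if strategy == "random":
        chosen = sorted(rng.sample(range(len(path)), k))
        return [path[i] for i in chosen]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Take a 10-node path as an example. A 20% prefix yields its first two nodes. A 50% random draw yields five nodes scattered along the path, still in path order.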

In preliminary tests, we had found that the probability distribution formula (RG2) was particularly sensitive to any small variation in cost difference. This was especially noticeable with the large negative values often returned by our single-observation formula (3.2) for the most probable goal in a distribution (where the optimal cost of a complete path is subtracted from the much smaller cost of reaching that goal from the most recent observation).
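The source of these large negative values can be made explicit. We assume, following the descriptions in this section, that (3.1) takes the form $\mathit{optc}(s, \vec{o}, g) - \mathit{optc}(s, g)$. We assume (3.2) takes the form $\mathit{optc}(o_n, g) - \mathit{optc}(s, g)$, where $o_n$ is the most recent observation. We also assume that a path embeds $\vec{o}$ when it visits the observations in order. Under these assumptions, the cheapest embedding path splits at $o_n$, and the two cost differences differ by a single term:

```latex
\begin{aligned}
\mathit{optc}(s, \vec{o}, g) &= \mathit{optc}(s, \vec{o}, o_n) + \mathit{optc}(o_n, g)\\
\mathit{costdif}_{(3.1)} - \mathit{costdif}_{(3.2)}
  &= \bigl[\mathit{optc}(s, \vec{o}, g) - \mathit{optc}(s, g)\bigr]
   - \bigl[\mathit{optc}(o_n, g) - \mathit{optc}(s, g)\bigr]\\
  &= \mathit{optc}(s, \vec{o}, o_n)
\end{aligned}
```

The offset $\mathit{optc}(s, \vec{o}, o_n)$ is the cost of reaching the last observation, and it does not depend on $g$. Both formulas therefore rank goals identically, while (3.2) is shifted down by that cost.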

Note. We initially attributed the sensitivity of R&G’s probability distribution formula (RG2) to the use of exponential values in generating posterior probabilities. Subsequent examination, however, revealed a ‘quirk’ in the formula: although it reliably enforces the intuition that lower cost difference results in higher probability across goals, it does not enforce that intuition with respect to the probability values for any one goal. We conducted this examination while investigating the formula's performance with excessively suboptimal paths, which we report on in Part II.

²¹ Arguably, a grid can be regarded as a middle way between continuous and discrete domains. Algorithmically, it behaves like a graph (i.e., we can use variations of Dijkstra’s algorithm or A* to find shortest paths). Yet it can equally be used to discretise almost any continuous space, adjusting the size of the grid to whatever granularity of solution is required.

²² Experiments were conducted on an i7 3.4GHz dual core with 10GB RAM in a virtual Linux environment; preliminary and manual tests were conducted on a similar 1.8GHz machine.

In fact, as we will show (p.105), the higher the cost difference for any one goal, the higher its individual probability value. Correspondingly, our unusually low (negative) cost differences from the single-observation formula produced low probability values. This exaggerated the delta we recorded between the value returned by the single-observation formula and the value returned using complex cost difference.

To compensate for the unexpected behaviour noted above, we added a fourth distribution to the three derived using formula (RG2). Those three used the original (baseline) formula (PRG), the simpler formula (P1) and the single-observation formula (P2). The fourth, a variation on P2 denoted P2′, was obtained by adding a large constant value (which we set at 800) to the cost difference returned by the single-observation formula. Recall that the β constant is a rate parameter, which modulates the shape of the distribution (as further discussed in Part II, p.111). For the automated tests, we adopted a β value of 0.1 throughout. With β = 1, PRG and P1 tended to return 1 for the most probable goal and 0 otherwise. With β = 0.01, the distribution tended to even out and, with loss of precision on test equipment, returned, for example, 0.33 for each of three goals, 0.25 for each of four, and so on. We used the usual uniform-cost approach for grids, with horizontal and vertical moves costed at 1 and diagonal moves at √2; and we made the simplifying assumption that priors were equal.
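The interaction between β and the additive constant can be illustrated with a minimal sketch. It assumes that (RG2) takes R&G’s sigmoid form $P(\vec{o} \mid g) = 1/(1 + e^{\beta \cdot \mathit{costdif}})$ with equal priors. The function name `rg_posterior`, its `shift` parameter and the cost-difference values in the usage note are hypothetical.

```python
import math
from typing import Dict


def rg_posterior(costdifs: Dict[str, float], beta: float = 0.1,
                 shift: float = 0.0) -> Dict[str, float]:
    """Posterior over goals from per-goal cost differences.

    Assumes the sigmoid likelihood P(o|g) = 1 / (1 + exp(beta * costdif))
    and equal priors, so normalising the likelihoods gives the posterior.
    `shift` is added to every cost difference first (800 mimics P2').
    Very large beta * (costdif + shift) values overflow math.exp.
    """
    likelihood = {g: 1.0 / (1.0 + math.exp(beta * (d + shift)))
                  for g, d in costdifs.items()}
    alpha = 1.0 / sum(likelihood.values())  # normalising constant
    return {g: alpha * p for g, p in likelihood.items()}
```

Consider hypothetical P1-style cost differences of 0, 4 and 10, for which the posteriors are roughly 0.43, 0.34 and 0.23. Shifting all three down by a goal-independent 200 (P2-style) saturates the sigmoid and flattens the result to about 0.33 each. Adding 800 to those shifted values (P2′-style) gives roughly 0.49, 0.33 and 0.18, much closer to P1. In every case the ranking of goals is unchanged.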

In addition to the auto-generated problem set, we manually set up individual experiments to trial the various cost difference formulas against completely open landscapes and ‘single-pixel’ mazes (through which there is typically only one path from any given starting point to goal). For simplicity, and given that planners are meant to be used off the shelf, optimal costs for paths with, and without, waypoints were calculated using a standard A* algorithm (Hart et al., 1968).²³ To obtain the cost of an optimal path that did not embed the observations, inspired by the technique used for R&G, we modified A* so that each search node, in addition to a location indicator, also included an observation counter. When the counter reached the total number of observations (meaning all observations had been encountered), the search node representing that last observation was pruned, along with the associated path that embedded all the observations.

²³ We used our own Python-based infrastructure, originally designed as a simulator and testbed for path-planning algorithms (https://tinyurl.com/p4sim).
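The counter-based pruning described above can be sketched as an A* search over (cell, counter) states, where the counter records how many observations have been matched in order. This is a hypothetical reimplementation for illustration, not our Python infrastructure; `astar_obs` and `cost_differences` are invented names. The sketch assumes an 8-connected grid with moves costed at 1 and √2, an octile-distance heuristic and no corner-cutting checks. It also assumes that (3.1) and (3.2) take the forms $\mathit{optc}(s, \vec{o}, g) - \mathit{optc}(s, g)$ and $\mathit{optc}(o_n, g) - \mathit{optc}(s, g)$ respectively.

```python
import heapq
import math
from typing import Dict, Sequence, Set, Tuple

Cell = Tuple[int, int]
SQRT2 = math.sqrt(2.0)
# 8-connected moves: horizontal/vertical cost 1, diagonal cost sqrt(2).
MOVES = [(dx, dy, SQRT2 if dx and dy else 1.0)
         for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]


def octile(a: Cell, b: Cell) -> float:
    """Consistent heuristic for the 8-connected move costs above."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (SQRT2 - 1.0) * min(dx, dy)


def astar_obs(blocked: Set[Cell], width: int, height: int, start: Cell,
              goal: Cell, obs: Sequence[Cell], embed: bool) -> float:
    """A* over (cell, counter) states; counter = observations matched in order.

    embed=True:  cheapest path to goal that embeds all of obs (optc).
    embed=False: any state whose counter reaches len(obs) is pruned, so the
                 result is the cheapest path that does NOT embed obs.
    Returns math.inf when no qualifying path exists.
    """
    n = len(obs)

    def advance(cell: Cell, k: int) -> int:
        # Move the counter on if this cell is the next expected observation.
        return k + 1 if k < n and cell == obs[k] else k

    k0 = advance(start, 0)
    if not embed and k0 == n:
        return math.inf  # every path already embeds obs at the start cell
    best: Dict[Tuple[Cell, int], float] = {(start, k0): 0.0}
    frontier = [(octile(start, goal), 0.0, start, k0)]
    while frontier:
        _, g, cell, k = heapq.heappop(frontier)
        if g > best[(cell, k)]:
            continue  # stale entry superseded by a cheaper one
        if cell == goal and (not embed or k == n):
            return g
        for dx, dy, step in MOVES:
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            nk = advance(nxt, k)
            if not embed and nk == n:
                continue  # prune: this path now embeds every observation
            ng = g + step
            if ng < best.get((nxt, nk), math.inf):
                best[(nxt, nk)] = ng
                heapq.heappush(frontier, (ng + octile(nxt, goal), ng, nxt, nk))
    return math.inf


def cost_differences(blocked: Set[Cell], width: int, height: int,
                     start: Cell, goal: Cell, obs: Sequence[Cell]) -> Dict[str, float]:
    """(RG1), (3.1) and (3.2) for one goal; formula shapes are assumptions."""
    args = (blocked, width, height)
    optc = astar_obs(*args, start, goal, (), True)          # optc(s, g)
    optc_obs = astar_obs(*args, start, goal, obs, True)     # optc(s, o, g)
    optc_neg = astar_obs(*args, start, goal, obs, False)    # optc_neg(s, o, g)
    optc_last = astar_obs(*args, obs[-1], goal, (), True)   # optc(o_n, g)
    return {"RG1": optc_obs - optc_neg,
            "3.1": optc_obs - optc,
            "3.2": optc_last - optc}
```

Consider an open 3 × 3 grid with a start at (0, 0), a goal at (2, 0) and a single observation at (1, 0). This gives 2 − 2√2 ≈ −0.83 for (RG1), 0 for (3.1) and −1 for (3.2). In a corridor one cell high, no path avoids the observation, so the pruned search fails and (RG1) is −∞, as in the single-pixel maze.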

3.3.2 Results

Our theoretical results were confirmed. In hand-crafted problems using maps with open landscapes, the formulas performed exactly as predicted. Formulas (RG1) and (3.1) returned identical results, and all four formulas ranked goals identically by probability. After the first iteration in a domain with the same start location and goals, formulas (3.1) and (3.2) returned in half the time of formula (RG1).

In the single-pixel maze, again as predicted, the implementation based on formula (RG1) was unable to return a probability distribution (because the most probable goal gave a cost difference of −∞), whereas formulas (3.1) and (3.2) returned in 0.005 and 0.002 seconds, respectively, and successfully identified the real goal.

The corner case of exclusive optimality, in which observations conform to the only optimal path to goal, did not arise in any of the randomly generated scenarios. This is perhaps unsurprising, given the symmetries found in a two-dimensional grid. Nevertheless, we were able to reproduce the condition by exactly replicating the example scenario. That is, we set up an environment in which diagonal moves were prohibited and there were three goals. Observations were on the optimal path to all of them, but one goal lay in a straight line from the start location. In the resulting probability distributions, formula (RG1) returned 0.329, 0.342, 0.329 (for L, M and R in Figure 3.13, respectively), whereas formulas (3.1) and (3.2) returned 0.333 for all three goals.

Tables 3.3 and 3.4 summarise the results of our automated tests. Column Obs displays the percentage of nodes from the full path that were included in the observation sequence. P indicates that the observations were extracted using the continuous path prefix strategy, and R that they were extracted using the random strategy, that is, randomly drawn from the length of the path. Column Time displays the average time taken per GR problem in seconds. Column Match shows the percentage of probability distributions where the probability value for the target goal exactly matched that generated using cost difference formula (RG1). Where a difference was recorded (only in the cases of P2 and P2′), column ∆ displays the average difference. Our main findings were as follows.

• The implementation using cost-difference (RG1) performed even more slowly than we expected.

– In the room layouts (Table 3.3), it frequently exceeded our three-minute time-out. The longer the paths (i.e., the larger the set of observations) and the more optimal the path they were extracted from, the longer the algorithm took.

This is explained by the difficulty of identifying an alternative optimal path. Indeed, when we randomly selected problems that timed out and let them run to completion, the cost difference for the target goal ultimately returned zero.

The problem was exacerbated in room layouts, which are significantly more restrictive than landscapes (doorways are only one pixel wide).

Table 3.3: Rooms.

Obs    PRG time (s)  P1 time (s)  P2, P2′ time (s)  P2 Match  P2 ∆   P2′ Match  P2′ ∆

Optimal
20%P   94.538        7.223        3.400             6.7%      0.202  40.0%      0.031
20%R   68.086        3.316        2.918             10.0%     0.340  50.0%      0.041
50%P   180+          3.075        2.723             0%        0.487  36.7%      0.030
50%R   83.381        3.473        3.068             16.7%     0.313  50.0%      0.040
80%P   180+          3.360        2.967             3.3%      0.475  50.0%      0.052
80%R   180+          3.716        2.991             16.7%     0.332  50.0%      0.037

Suboptimal
20%P   94.609        7.210        3.457             10.0%     0.190  36.7%      0.023
20%R   61.842        3.456        2.993             13.3%     0.344  56.7%      0.025
50%P   180+          3.319        2.782             0%        0.417  40.0%      0.021
50%R   74.184        3.593        3.073             16.7%     0.290  60.0%      0.026
80%P   180+          3.435        2.993             3.3%      0.415  50.0%      0.030
80%R   88.831        3.729        3.100             13.3%     0.332  60.0%      0.022

Greedy
20%P   92.260        7.193        3.287             10.0%     0.202  56.7%      0.014
20%R   58.117        3.346        2.919             13.3%     0.382  80.0%      0.032
50%P   58.667        3.231        2.634             0%        0.410  66.7%      0.014
50%R   70.057        3.548        2.996             13.3%     0.367  80.0%      0.031
80%P   61.732        3.278        2.655             3.3%      0.448  70.0%      0.024
80%R   91.231        3.675        2.983             10.0%     0.399  83.3%      0.029

540 problems. Average goals: 4.9. Average optimal path cost: 372. Times are averages per GR problem in seconds; 180+ indicates that the three-minute time-out was exceeded. Probabilities calculated using formula (RG2) with β value of 0.1. We obtained P2′ as P2 but adding a constant (800) to the corresponding cost difference (see discussion inline). The Match column indicates the percentage of cases where probability values matched exactly. The ∆ column indicates average difference in non-matching values.

– Observations presented as a path prefix took, in some cases, twice as long to solve as those presented randomly. This seems to be because the pruning algorithm backtracks so, if observations are consecutive, it repeatedly reaches the final observation via multiple different routes.

– Ultimately, the relative slowness may be a symptom of the calculation’s inherent complexity. We note that the Easy IPC Grid experiments reported by Ramirez and Geffner (2010) took, on average, over three minutes to complete problems with observation densities of 50%. These experiments include a significant navigational element, though in the more demanding context of general task-planning.²⁴ They used an optimal planner comparable to A*, on problems with average optimal path lengths of just 17 steps. Ramirez and Geffner improved performance by using a suboptimal planner. We did try a suboptimal (much faster) algorithm but, although it returned approximately equivalent probability distributions, it failed to preserve the corner cases, which were of interest to us.

²⁴ Easy IPC Grids have far fewer cells than Moving-AI maps, so they are easier to navigate and optimal paths are shorter. Problems are more complex, however, as they include task-planning elements, e.g., keys may be required to access particular cells.

Table 3.4: Landscapes.

Obs    PRG time (s)  P1 time (s)  P2, P2′ time (s)  P2 Match  P2 ∆   P2′ Match  P2′ ∆

Optimal
20%P   34.385        3.444        1.344             0%        0.140  7.7%       0.043
20%R   19.135        1.749        1.646             7.7%      0.317  69.2%      0.009
50%P   51.433        1.541        1.379             0%        0.247  30.8%      0.034
50%R   37.100        1.907        1.672             15.4%     0.299  69.2%      0.009
80%P   56.109        1.917        1.515             7.7%      0.284  46.2%      0.027
80%R   49.645        2.015        1.687             15.4%     0.344  69.2%      0.009

Suboptimal
20%P   35.183        3.300        1.446             15.4%     0.143  38.5%      0.041
20%R   18.939        1.797        1.690             15.4%     0.347  69.2%      0.010
50%P   51.395        1.625        1.450             15.4%     0.227  46.2%      0.028
50%R   35.180        1.898        1.669             15.4%     0.324  69.2%      0.011
80%P   55.780        1.912        1.564             7.7%      0.247  46.2%      0.017
80%R   48.455        1.922        1.731             15.4%     0.335  69.2%      0.011

Greedy
20%P   35.400        3.342        1.451             15.4%     0.146  38.5%      0.038
20%R   16.678        1.781        1.679             15.4%     0.351  69.2%      0.011
50%P   50.662        1.725        1.421             15.4%     0.250  46.2%      0.013
50%R   33.433        1.827        1.706             15.4%     0.337  69.2%      0.011
80%P   54.790        1.952        1.597             7.7%      0.268  46.2%      0.011
80%R   48.024        2.020        1.729             15.4%     0.345  69.2%      0.011

234 problems. Average goals: 4. Average optimal path cost: 233.04. Probabilities calculated with β of 0.1 and P2′ constant of 800, as in Table 3.3. We note that in room layouts, all traversable locations are accessible from one another. In landscapes, by contrast, automatically generated goal locations were frequently inaccessible from the start location, resulting in fewer usable scenarios.

• Use of formula (3.1) cut processing time by more than an order of magnitude, even in landscapes (Table 3.4). We should note that, in our experiments, the 20% density, prefix observations were always the first to be tested in each new problem set. This meant that optimal costs to each goal were always calculated (and stored for future use) when running the 20%P test. This is reflected in the results, which clearly show the simple formula taking approximately twice as long on that problem as on subsequent problems.

• Although time-savings were on nothing like the same scale, we note that average timings for the single-observation formula (3.2) were consistently lower than those for formula (3.1).

• Whereas the probabilities based on cost difference formula (3.1) always exactly matched those based on cost difference formula (RG1), probabilities generated using formula (3.2) were usually different.

– This is because the actual values returned by that formula are different; it is the relative cost differences that are maintained. This is the anomalous effect discussed above (see p.75) and in our concluding note below.

– As can be seen, delta values for P2 were sometimes quite high, owing to the typically large negative cost difference for the most probable goal. In P2′, as discussed, we compensated for this effect by adding a large constant to the function’s output, which always raised it above zero. This significantly reduced the delta.

– In any event, observe that, whatever the delta, relative rank is always preserved.

In particular, whether or not the constant is added, in all cases, use of the single-observation formula successfully identified the same goal as having the highest, or equal highest, posterior probability as either of the other formulas.
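The Match and ∆ columns of Tables 3.3 and 3.4 can be computed with a small helper. This is a hypothetical sketch. It assumes that ∆ is the mean absolute difference in the target goal’s posterior probability over the non-matching cases only. The function name and the sample values in the test are illustrative.

```python
from typing import Sequence, Tuple


def match_and_delta(baseline: Sequence[float],
                    other: Sequence[float]) -> Tuple[float, float]:
    """Return (Match %, mean absolute difference over non-matching cases).

    Each pair holds the target goal's posterior in one GR problem, computed
    from (RG1) in `baseline` and from another cost difference in `other`.
    """
    if len(baseline) != len(other) or not baseline:
        raise ValueError("expected two equal-length, non-empty sequences")
    diffs = [abs(b - o) for b, o in zip(baseline, other)]
    mismatches = [d for d in diffs if d != 0.0]  # "exactly matched" test
    match_pct = 100.0 * (len(diffs) - len(mismatches)) / len(diffs)
    delta = sum(mismatches) / len(mismatches) if mismatches else 0.0
    return match_pct, delta
```

Exact equality is used deliberately, mirroring the tables’ definition of a match. A tolerance-based comparison would report higher Match values.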

Note. We conclude by noting that the anomaly discussed here with respect to probability distribution formula (RG2) is resolved by our reformulation of the formula in Part II (see Equation (4.5), p.110). We predict that use of our revised formula will return identical probability distributions for each of the three cost differences examined above without requiring any additional manipulation (i.e., there will be no need for the addition of an extra constant to bring P2 more or less in line with P1). We leave experimental confirmation of this prediction for future work.

3.4 Discussion

In the previous three sections of this chapter, we have transposed the R&G model of GR from task-planning to path-planning and have demonstrated the considerable efficiencies that can be achieved in this context. This section discusses some of the broader issues that arise in relation to our work: first, the special case where the single-observation formula ranks goals differently from Ramirez and Geffner’s original cost difference formula (which did not arise in testing), then the extent to which our results apply in a general task-planning domain and, finally, its relationship with plan (as opposed to goal) recognition.

3.4.1 Corner Case: Exclusive Optimality and Negative Reasoning

There is only one corner case where our simpler and single-observation formulas (3.1) and (3.2) rank goals differently from R&G’s original complex cost difference formula (RG1).

This case arises only when observations conform to the optimal path for multiple goals and are exclusively optimal for at least one of them. Using our formulas (3.1) or (3.2), all such goals are ranked equally; using formula (RG1), which depends on negative reasoning, rankings may differ depending on the length of alternative (non-optimal) paths to the various goals.

Figure 3.13: Observation $o_1$ is on an optimal path to all three goals but, by formula (RG1), M is the most probable.

Ramirez and Geffner (2010) support the use of negative reasoning by reference to the following example, where observations are optimal for all three goals but exclusively optimal for only one.

Example 1. Consider the situation depicted in Figure 3.13. An agent operates in a discrete gridworld environment where the only legal moves are horizontal or vertical and all steps cost 1. There are three possible goals, $g_l, g_m, g_r \in G$, labelled L, M and R respectively. All goals are north of the start location, $n_s$. Observations $\vec{o}$ track directly north through the marked observation $o_1$ and satisfy an optimal path to all three goals. In the case of L and R, there are multiple optimal paths to goal. The optimal path that embeds the observations therefore has the same cost (15) as one that does not, and there is no cost difference. Hence $\mathit{costdif}(n_s, g_l, \vec{o}) = 0$ and $\mathit{costdif}(n_s, g_r, \vec{o}) = 0$ (see Equation RG1). In the case of M, however, which lies directly north of $n_s$ and $o_1$, there is only one optimal path to goal: the one that embeds the observations. In order to take a path that does not embed them, it is necessary to take a longer route. In the example, $\mathit{optc}(n_s, \vec{o}, g_m) = 10$, whereas $\mathit{optc}^{\neg}(n_s, \vec{o}, g_m) = 12$. Thus, $\mathit{costdif}(n_s, g_m, \vec{o}) = 10 - 12 = -2$. The lower the cost difference, the higher the probability, making $g_m$ (M) the most probable goal.

Although cited as an “illustration” of the distinction between cost difference formulas (RG1) and (3.1) (Ramirez & Geffner, 2010, p.1123), this scenario in fact represents the only distinction, a case of exclusive optimality, as proved in Theorem 3. Formula (RG1) requires considerable additional computational work. It is therefore worth noting that this special case concerns only the set of goals in which the probabilistic account is least interested. These are the goals for which observations are on the optimal path, that is, the case already handled in the non-probabilistic account (Ramirez & Geffner, 2009). Nevertheless, let us consider what is lost (and gained) by substituting either the simpler or single-observation formula for (RG1).


To arrive at a probability distribution, R&G appeals to Bayes’ Rule, which, using our notation, can be given as $P(G \mid \vec{o}) = \alpha P(\vec{o} \mid G) \cdot \mathit{Prob}$, where $\alpha$ is a normalising constant. Assuming that prior probabilities in $\mathit{Prob}$ are given, the challenge is to account for $P(\vec{o} \mid G)$. The authors assert that it is correct for $P(\vec{o} \mid g_m)$ to exceed the probabilities of either of the other goals because “goal M predicts the observations better than either L or R” (p.1123). The intuition is that, in order to reach $g_m$ optimally, an agent from $n_s$ must pass through the observation $o_1$; whereas, to reach $g_l$ or $g_r$ optimally, the agent might (or might not) pass through $o_1$.

This reasoning seems to link probabilities to the number of available paths to goal.

Indeed, Ramirez and Geffner (2010) acknowledge that there are situations where it would be preferable to count the number of paths but their framework does not support it. Thus, one might think that, if there had been four optimal paths and the agent had been seen on one of them, the probability of the goal would be correspondingly lower (because the goal predicts the observations less well than if there had been only one optimal path); and that, if there had been 100 optimal paths, it would be lower still. This is not the case, however. In fact, as soon as there is a second optimal path to goal, the account fails to follow the intuition, as in the following counter-example.

Example 2. Consider the domain depicted in Figure 3.14. Here we have added a rectangular block (the patterned green cells, (2,2) to (5,10)), which is not traversable. Again there are three goals, labelled L, M and R, but the change has now made $g_l$ very like $g_m$ and very different from $g_r$. There are just two optimal paths to $g_l$, only one more than to $g_m$. Meanwhile (owing to the notorious symmetry of gridworlds), there are 3003 optimal paths to $g_r$, as before. Yet Equation (RG1) can no more distinguish between $g_l$ and $g_r$ (which are non-exclusively optimal) than can the simpler formula (3.1). In this scenario, based on how well each goal predicts the observations, the probability of $g_l$ should be very much greater than that of $g_r$ (and only a little less than that of $g_m$). However, for both $g_l$ and $g_r$, all three cost difference formulas now return zero. So, in all three cases, by the posterior probability calculation (RG2), $g_l$ and $g_r$ appear to be equally likely (unlikely).

Figure 3.15: The Circle Line. Using negative reasoning, an agent who boarded a train at Edgware Road and was observed at Great Portland Street is more likely to be travelling to Moorgate than Liverpool Street.

The authors explain that this apparent anomaly arises because their distribution depends on an approximation whereby “probabilities corresponding to different plans for the same goal are not added up” (Ramirez & Geffner, 2010, p.1124). Our point here is not that it is unreasonable to assume that both goals are equally likely; rather, it would have been just as reasonable to assume that all three goals are equally likely.
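The path count quoted in Example 2 can be checked with a standard breadth-first count of shortest paths on a 4-connected, unit-cost grid. This sketch is ours and is not part of R&G’s framework, which, as noted, does not count paths. The function name is hypothetical, and the coordinates in the test are assumptions. They place R five columns east and ten rows north of $n_s$, consistent with an optimal cost of 15 and 3003 = C(15, 5) optimal paths.

```python
from collections import deque
from typing import AbstractSet, Dict, Tuple

Cell = Tuple[int, int]


def count_optimal_paths(width: int, height: int, start: Cell, goal: Cell,
                        blocked: AbstractSet[Cell] = frozenset()) -> int:
    """Number of distinct shortest paths from start to goal on a
    4-connected grid where every horizontal or vertical step costs 1."""
    dist: Dict[Cell, int] = {start: 0}
    ways: Dict[Cell, int] = {start: 1}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            if nxt not in dist:
                # First time reached: this BFS layer fixes its distance.
                dist[nxt] = dist[cell] + 1
                ways[nxt] = ways[cell]
                queue.append(nxt)
            elif dist[nxt] == dist[cell] + 1:
                # Another shortest route into nxt: add its path count.
                ways[nxt] += ways[cell]
    return ways.get(goal, 0)
```

Because breadth-first search finishes each distance layer before starting the next, a cell’s count is final by the time it is dequeued. The same function can therefore be reused with obstacles, such as the block added in Example 2.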

We offer the following more extreme, and perhaps more commonplace, example of exclusive optimality, which arises in the context of a transport network, such as a road network or the London Underground system.²⁵ Exclusive optimality occurs routinely in this environment. Having decided which line to travel on and in which direction, the agent is frequently on the optimal path towards multiple goals, namely any stop on the line between the station where they boarded and
