Deception as Path-Planning †
5.2 Future Work
In Part I, we showed that single-observation recognition ranked goals in the same order as the Ramirez and Geffner (2010) model. However, to ensure that the probability values were similar, we found it necessary to add a large constant to our cost difference results. In Part II we showed that this anomalous behaviour resulted from using the Boltzmann equation as a probability distribution formula, and that our non-sigmoidal variation would return probabilities at the limit of that formula. Our interest there was to amend our self-modulating goal recognition formula in such a way that it would provide a correct and consistent baseline to be adjusted strictly according to our level of confidence (flattening the probability distribution as the agent’s behaviour becomes less rational). With this improved understanding, it would be useful now to explicitly extend our model for single-observation recognition, using our improved version of the probability distribution formula. With that minor change (which corrects an anomaly in Ramirez and Geffner’s original output), the probability distribution achieved using single-observation recognition should become identical to that obtained using Ramirez and Geffner’s formula, not only in terms of rank but in terms of the probability values themselves. This applies in all but the one corner case which, as discussed in Section 3.4.1, differentiates between goals in a subtle way that can sometimes be more misleading than useful.
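The relationship between a sigmoidal form and its limit can be illustrated numerically. The following is a minimal sketch rather than the formulas of Part II: it assumes uniform priors, a logistic likelihood 1 / (1 + exp(β(d + c))) applied to a cost difference d with an added constant c, and a plain exponential exp(−βd) as the non-sigmoidal counterpart. The function names and the example cost differences are hypothetical.

```python
import math


def sigmoidal_posterior(cost_diffs, beta=1.0, offset=0.0):
    """Normalise logistic likelihoods 1 / (1 + exp(beta * (d + offset))).

    `offset` plays the role of a constant added to every cost difference.
    Uniform priors over goals are assumed.
    """
    likelihoods = [1.0 / (1.0 + math.exp(beta * (d + offset))) for d in cost_diffs]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def exponential_posterior(cost_diffs, beta=1.0):
    """Normalise plain exponentials exp(-beta * d), with no sigmoid.

    Subtracting the minimum cost difference leaves the result unchanged
    but keeps the exponentials in a numerically safe range.
    """
    smallest = min(cost_diffs)
    weights = [math.exp(-beta * (d - smallest)) for d in cost_diffs]
    total = sum(weights)
    return [value / total for value in weights]


# Hypothetical cost differences for three candidate goals.
example_diffs = [0.0, 2.0, 5.0]

no_offset = sigmoidal_posterior(example_diffs, beta=1.0, offset=0.0)
large_offset = sigmoidal_posterior(example_diffs, beta=1.0, offset=20.0)
limit = exponential_posterior(example_diffs, beta=1.0)
flat = exponential_posterior(example_diffs, beta=0.0)
```

Under these assumptions, adding a large constant pushes the normalised sigmoidal values towards the normalised exponential ones, the goal ranking is the same in both cases, and setting β to 0 flattens the distribution to uniform.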
A second—and perhaps even more useful—extension would be to fully generalise our solution back to task-planning. As discussed in Section 3.4.2, this involves either the (unreasonable) assumption of full observability or determining—in a domain-independent fashion—precisely which fluents need to be observed in order to extrapolate accurate costs from the most recently observed action to each candidate goal. Although this is a substantial piece of work, it may be achievable. As a brute-force solution, for example, one might work back from each goal through all possible preconditions to the problem’s initial state to arrive at a definition of partial observability that would suffice.
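The brute-force idea of working back from a goal through preconditions can be sketched as a simple backward closure. This is only an illustration under strong assumptions: actions are STRIPS-like with a set of preconditions and a set of add effects, delete effects are ignored, and the `Action` type, `relevant_fluents` function and the toy domain are all hypothetical. The result is an over-approximation of the fluents that might need to be observed, not a minimal set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-like action (hypothetical representation, no delete effects)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset


def relevant_fluents(goal, actions):
    """Collect every fluent reachable by regressing from the goal.

    Starting from the goal fluents, repeatedly add the preconditions of any
    action that can achieve a fluent already known to be relevant.
    """
    relevant = set(goal)
    frontier = set(goal)
    while frontier:
        fluent = frontier.pop()
        for action in actions:
            if fluent in action.add_effects:
                new_fluents = action.preconditions - relevant
                relevant |= new_fluents
                frontier |= new_fluents
    return relevant


# Toy domain: opening a door needs a key; painting a wall is unrelated.
toy_actions = [
    Action("pick_key", frozenset({"at_key_room"}), frozenset({"has_key"})),
    Action("open_door", frozenset({"has_key", "at_door"}), frozenset({"door_open"})),
    Action("paint_wall", frozenset({"has_brush"}), frozenset({"wall_painted"})),
]

door_fluents = relevant_fluents({"door_open"}, toy_actions)
```

In this toy domain the closure for `door_open` includes the key-related and location fluents but excludes `has_brush`, which is the kind of pruning a definition of sufficient partial observability would need to provide.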
Building on the above, a heatmap or a radius of maximum probability for task-planning is an interesting proposition: the idea of being able to ‘look up’ an action and immediately determine an agent’s most likely objective is an attractive one.
Our self-modulating formula for goal recognition does not represent a typical approach to adversarial recognition. Taking a cost-based approach, we identify behaviour that is suboptimal with respect to every goal as being symptomatic of irrational (or rational but deceptive) behaviour. This could be characterised as domain-independent anomaly-detection—and, indeed, it can be used that way—but this was not our primary objective.
Rather, we wanted to factor into keyhole recognition the possibility of encountering an adversarial agent. The crucial difference is that, whereas an adversarial recognition system has achieved its objective as soon as the anomaly has been detected, our system simply proceeds as before but with less confidence in its predictions. This means that when, eventually, the deceptive agent does begin to approach her real goal, our system is still operating. In future work, therefore, there is scope to consider how confidence could be restored if, after a period of irrationality or erratic behaviour, the observed agent seems once again to be ‘back on track’.
The self-modulating formula depends on a sequence of observations to assess an agent’s degree of rationality. This is quite different from single-observation recognition, which minimises the number of observations required. Recall, however, that single-observation recognition does use the β parameter (a rate or ‘heat’ parameter, which we use to modulate the shape of the distribution). This means that, although the two processes cannot be unified (we cannot have self-modulating single-observation recognition, for example), we could use a rationality measure obtained during one event as the β value for single-observation recognition in another. Using this approach, it should be possible to determine the most likely destination of a ‘highly rational’ agent, on the basis of fewer than usual observations.
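As a rough illustration of carrying a rationality estimate from one event into another, the sketch below assumes a very simple rationality measure: the ratio of optimal cost to observed cost on a previously completed path, clipped to at most 1. That measure, the function names and the numbers are assumptions made for the example; they are not the self-modulating formula itself. The distribution is a normalised exponential over cost differences with uniform priors.

```python
import math


def rationality(optimal_cost, observed_cost):
    """Hypothetical rationality measure: optimal cost / observed cost, at most 1."""
    if observed_cost <= 0:
        raise ValueError("observed_cost must be positive")
    return min(1.0, optimal_cost / observed_cost)


def goal_distribution(cost_diffs, beta):
    """Normalised exp(-beta * d) over goals, assuming uniform priors."""
    smallest = min(cost_diffs)
    weights = [math.exp(-beta * (d - smallest)) for d in cost_diffs]
    total = sum(weights)
    return [value / total for value in weights]


# Earlier events: one agent took an optimal path, another wandered.
beta_rational = rationality(optimal_cost=10.0, observed_cost=10.0)
beta_erratic = rationality(optimal_cost=10.0, observed_cost=40.0)

# New event: a single observation yields these cost differences for two goals.
new_cost_diffs = [0.0, 4.0]
confident = goal_distribution(new_cost_diffs, beta_rational)
cautious = goal_distribution(new_cost_diffs, beta_erratic)
```

With these assumed values, the same single observation produces a sharper distribution for the agent previously judged rational than for the erratic one, while the ranking of goals is unchanged.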
In relation to deception, several opportunities for further research were flagged in Section 4.4: full incorporation of magnitude into our model, development of more sophisticated strategies that exploit other aspects of Bell and Whaley’s theory of deception, and the potential for inverting our model of deceptive path-planning so that it can be applied to truthful path-planning (i.e., intended recognition). There is also scope to develop optimised algorithms to implement the strategies already proposed.
Another promising aspect of deceptive path-planning concerns exploitation of known (or suspected) psychological idiosyncrasies and biases when it is known that the observer is human. Recall that (on the negative side) we noted, in Section 4.3, that a ‘pure’ simulation strategy (which heads straight towards a bogus goal, then diverts towards the real goal) ceases to deceive a human observer almost the moment the diversion towards the real goal begins; whereas, for a computerised goal recognition system, deception (i.e., the probability of the bogus goal) may persist for some considerable distance. The flipside of this human tendency to jump quickly to conclusions was noted by Baker et al. (2011) in their work on Bayesian theory of mind. In humans, they observed that opinions formed early tend to dominate opinions that are formed late. We could take advantage of this computationally, for example, by introducing a discount factor (or value gain) for paths that ‘simulate’ strongly at the start. Other possibilities of a similar nature include the idea of incorporating rewards (or discounts) for consecutive moves in the same direction, based on the idea (untested) that continuity increases the perception of intent.
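One way such a discount factor might be applied is sketched below. It assumes that, for each step of a candidate path, some observer model has already supplied the probability assigned to the bogus goal; the scoring function, the discount value and the two example sequences are hypothetical and serve only to show the effect of weighting early steps more heavily.

```python
def discounted_deception(bogus_goal_probs, gamma=0.8):
    """Discounted average of the bogus goal's probability along a path.

    Step t is weighted by gamma ** t, so earlier steps count for more
    when gamma < 1. With gamma == 1 this is the ordinary mean.
    """
    if not bogus_goal_probs:
        raise ValueError("at least one step is required")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    weights = [gamma ** step for step in range(len(bogus_goal_probs))]
    weighted_sum = sum(w * p for w, p in zip(weights, bogus_goal_probs))
    return weighted_sum / sum(weights)


# Two hypothetical paths with the same per-step values in opposite order.
simulates_early = [0.9, 0.8, 0.3, 0.2]
simulates_late = list(reversed(simulates_early))

early_score = discounted_deception(simulates_early, gamma=0.8)
late_score = discounted_deception(simulates_late, gamma=0.8)
```

Under this scoring, the path that simulates strongly at the start is preferred even though both paths have the same undiscounted mean, which is the kind of bias towards early impressions described above.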
If we were to enrich the domain so that it became capable of dealing with more complete motion-planning and extended navigational scenarios, we could consider the impact of speed. For example, is fast, direct movement more persuasive of intentionality than slow, indirect movement, even if both movements are generally tending towards the same goal? There may be many other psychological factors that could be brought to bear on the deceptive path-planning problem. Cognitive science has already had considerable influence on goal recognition (e.g., in the work of Baker et al., 2011 and Vered et al., 2016), much of which may also apply to deception, particularly when treated, as it has been by us, as an inversion of the goal recognition problem.
Our approach to goal recognition and deceptive path-planning has involved specialising from task-planning to path-planning in the hope that, by reducing complexity, we might gain new insights. Perhaps the greatest potential for future work lies in following through on an attempt to generalise those insights back to task-planning.