
5.2 Robust Extension to CBBA

5.2.1 Computing Stochastic Scores

Within the distributed CBBA framework, agents select assignments that optimize their own local score functions and then perform consensus with one another to resolve conflicts. These score calculations use each agent's local understanding of the planning parameters, and CBBA is guaranteed to converge even when agents have differing situational awareness. The robust extension to CBBA proposed in this section leverages this property: within Robust CBBA, agents use their own local situational awareness of the uncertain planning parameters and the associated distributions when building their bundles. As described in the previous section, this requires agents to know only the parameters that affect them (e.g. agent-specific parameters, task parameters for relevant tasks, and environment parameters for relevant operating areas); information that does not affect a particular agent's scores can safely be ignored. Note that when agent scores are coupled (for example, through coupled tasks with other agents, or through environment parameters affecting all agents), the distributed CBBA framework can still be used and is still guaranteed to converge, although team performance may degrade if the coupling is not explicitly considered.

When optimizing agent scores, one main difficulty in evaluating the stochastic planning metrics described in the previous section is that the agent score functions are sums of heterogeneous, non-independent task scores. Computing the distribution of an agent's score therefore involves combining the task score distributions for all tasks in the agent's assignment in nontrivial ways (e.g. convolution if the task scores are independent), which is tractable only for very specific distribution types (e.g. i.i.d. random variables, Gaussian distributions, exponential-Erlang, etc.). As a result, analytically computing the integrals, convolutions, and coupling effects associated with the stochastic metrics in closed form is usually impossible unless several limiting assumptions are made on the allowable distributions, uncertainty models, and score functions. A further issue particular to this problem, which complicates these computations even more, is that evaluating scores for agents' paths implicitly involves selecting the optimal task execution times along the current agent's path. Recall that, in the deterministic case, the score that agent $i$ obtains for a given path $p_i$ is

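$$J_{p_i} = \sum_{j=1}^{N_t} c_{ij}\big(\tau_{ij}^{\star}(p_i), \theta\big)\, x_{ij} \tag{5.10}$$

where $x_{ij} = 1$ for all tasks in the path, and where the optimal task execution times $\tau_i^{\star}$ are found by solving

$$\tau_i^{\star} = \operatorname*{argmax}_{\tau_i} \sum_{j=1}^{N_t} c_{ij}\big(\tau_{ij}(p_i), \theta\big)\, x_{ij} \tag{5.11}$$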
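To make Equations 5.10 and 5.11 concrete, the following Python sketch evaluates a deterministic path score under an assumed task model: each task has a position, a reward, a window start time, a service duration, and an exponential decay rate, and the agent travels in straight lines at a known speed. The `Task` class, `task_score`, and `path_score` are hypothetical names introduced for illustration only. Because each assumed score is non-increasing once a task's window opens, the argmax in Equation 5.11 reduces to starting every task as early as feasible; general score functions would require an actual search over $\tau_i$.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task model used only to illustrate eqs. 5.10-5.11."""
    x: float
    y: float
    reward: float     # maximum reward R_j
    start: float      # time at which the task window opens
    duration: float   # time spent servicing the task
    decay: float      # exponential decay rate once the window is open


def task_score(task, tau):
    """Assumed c_ij(tau): zero before the window opens, exponential decay afterwards."""
    if tau < task.start:
        return 0.0
    return task.reward * math.exp(-task.decay * (tau - task.start))


def path_score(path, speed, origin=(0.0, 0.0)):
    """Deterministic J_{p_i} (eq. 5.10) with tau* from eq. 5.11.

    With the assumed non-increasing scores, starting each task as early as
    feasible maximizes every term, so no explicit search over tau is needed.
    """
    t, pos, total, times = 0.0, origin, 0.0, []
    for task in path:
        arrival = t + math.dist(pos, (task.x, task.y)) / speed
        tau = max(arrival, task.start)   # wait if the window is not yet open
        times.append(tau)
        total += task_score(task, tau)
        t, pos = tau + task.duration, (task.x, task.y)
    return total, times


# Usage: one task 5 units away whose window is already open, agent speed 1.
near = Task(3.0, 4.0, reward=10.0, start=0.0, duration=1.0, decay=0.1)
total, times = path_score([near], speed=1.0)
print(times, round(total, 4))  # prints: [5.0] 6.0653
```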

In stochastic environments, the task execution times are usually random variables which are subject to the uncertainty in the system. This makes the step of computing the “optimal” execution times nontrivial, since these times may be different for different realizations of the uncertainty. Therefore, when optimizing stochastic path scores, the computation must take into account that different optimal execution times may result for a given path subject to the uncertainty in the environment. For example, using the expected-value metric, the stochastic path score is given by,

Jpi = Eθ    Nt X j=1 cij(τij?(pi), θ) xij    = Z θ∈Θ   Nt X j=1 cij(τij?(pi), θ) xij  f (θ) dθ (5.12)

where, for each realization of the uncertainty $\theta \in \Theta$, the optimal task execution times $\tau_i^{\star}$ must be determined. Computing these effects analytically is very difficult, which motivates the use of sampling methods to approximate the stochastic score calculations. An example of the sampling process used within the Robust CBBA framework is provided in Algorithm 6. Here, numerical techniques that efficiently sample from $f(\theta)$ can be used to approximate the distribution of $\theta$, generating a set of representative samples $\{\theta_1, \ldots, \theta_N\}$ (Alg. 6, line 1) with corresponding probability weights $\{w_1, \ldots, w_N\}$ (Alg. 6, line 2), where $\sum_{k=1}^{N} w_k = 1$.
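Why $\tau^{\star}$ must be re-solved under every realization can be seen in a small sketch. It assumes a single task whose window opens at time 8 and whose reward decays afterwards, with the agent speed as the only uncertain parameter and a two-point distribution standing in for $f(\theta)$; all constants are hypothetical. Evaluating Equation 5.12 exactly over the discrete distribution gives a much lower score than solving Equation 5.11 once with the mean speed, because a slow realization delays the task while a fast one cannot start it before the window opens.

```python
import math

# Hypothetical single-task example: the task lies DIST away, its window opens at
# START, and its reward decays at rate DECAY once the window is open.
DIST, START, REWARD, DECAY = 10.0, 8.0, 10.0, 0.1


def optimal_time(speed):
    """tau*(theta): earliest feasible start for one realization of the speed."""
    return max(DIST / speed, START)


def score(speed):
    """c_ij(tau*(theta), theta) for the assumed time-decaying reward."""
    return REWARD * math.exp(-DECAY * (optimal_time(speed) - START))


# Discrete stand-in for f(theta): speed is 0.5 or 2.0 with equal probability.
speeds = [0.5, 2.0]
probs = [0.5, 0.5]

# Eq. 5.12 with the integral reduced to a finite sum (exact for a discrete f).
expected_score = sum(p * score(v) for v, p in zip(speeds, probs))

# Naive alternative: solve for tau* once with the mean speed and score that plan.
mean_speed = sum(p * v for v, p in zip(speeds, probs))
score_at_mean = score(mean_speed)

print(f"E[J] = {expected_score:.3f}, J(E[theta]) = {score_at_mean:.3f}")
# prints: E[J] = 6.506, J(E[theta]) = 10.000
```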

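Algorithm 6 Compute-Stochastic-Path-Score($p_i$) - (Expected Value)
1: $\{\theta_1, \ldots, \theta_N\} \sim f(\theta)$
2: $\{w_1, \ldots, w_N\} \leftarrow \{w_1, \ldots, w_N\} / \sum_{k=1}^{N} w_k$
3: for $k \in \{1, \ldots, N\}$ do
4:     $\tau_i^{\star} = \operatorname*{argmax}_{\tau_i} \sum_{j=1}^{N_t} c_{ij}(\tau_{ij}(p_i), \theta_k)\, x_{ij}$
5:     $J_{p_i}^{k} = \sum_{j=1}^{N_t} c_{ij}(\tau_{ij}^{\star}(p_i), \theta_k)\, x_{ij}$
6: end for
7: $J_{p_i} = \sum_{k=1}^{N} w_k J_{p_i}^{k}$
8: return $J_{p_i}$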
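Algorithm 6 can be sketched in Python as a function that is generic in the uncertainty model. The sketch assumes two user-supplied callables, `sample_theta` (one draw from $f(\theta)$) and `solve_path` (a deterministic solver returning the path score under the optimal times for one realization); both names, and the toy straight-line model with exponentially decaying rewards used to exercise them, are illustrative assumptions rather than the thesis implementation. Equal weights are used when none are given, which corresponds to plain Monte Carlo sampling.

```python
import math
import random


def stochastic_path_score_expected(path, sample_theta, solve_path, n_samples,
                                   weights=None, rng=None):
    """Sampling approximation of the expected-value path score (Algorithm 6).

    sample_theta(rng) -> theta   one draw from f(theta)                    (line 1)
    solve_path(path, theta)      deterministic score under tau* for theta  (lines 4-5)
    weights                      optional unnormalized sample weights      (line 2)
    """
    rng = rng if rng is not None else random.Random()
    thetas = [sample_theta(rng) for _ in range(n_samples)]           # line 1
    raw = weights if weights is not None else [1.0] * n_samples
    total = sum(raw)
    w = [wk / total for wk in raw]                                   # line 2
    sample_scores = [solve_path(path, theta) for theta in thetas]    # lines 3-6
    return sum(wk * jk for wk, jk in zip(w, sample_scores))          # line 7


# Hypothetical toy model: a task is (x, y, reward, decay), theta is the agent
# speed, and rewards decay with arrival time, so earliest arrival is optimal.
def solve_path(path, speed):
    t, pos, score = 0.0, (0.0, 0.0), 0.0
    for x, y, reward, decay in path:
        t += math.dist(pos, (x, y)) / speed   # tau*_ij for this realization
        score += reward * math.exp(-decay * t)
        pos = (x, y)
    return score


path = [(3.0, 4.0, 10.0, 0.1)]               # one task 5 units away
rng = random.Random(0)
est = stochastic_path_score_expected(
    path, lambda r: r.choice([1.0, 2.0]), solve_path, n_samples=20000, rng=rng)
exact = 0.5 * 10 * math.exp(-0.5) + 0.5 * 10 * math.exp(-0.25)
print(f"sampled {est:.3f} vs exact {exact:.3f}")
```

Passing non-uniform `weights` would correspond to weighted sampling schemes in which line 2 of Algorithm 6 is not trivial.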

Using sampling, an approximation to the expected-value score for path $p_i$ is given by

$$\hat{J}_{p_i} = \sum_{k=1}^{N} w_k \left[\sum_{j=1}^{N_t} c_{ij}\big(\tau_{ij}^{\star}(p_i), \theta_k\big)\, x_{ij}\right]$$

where the expected-value integral is replaced by a summation over the individual weighted samples. Note that within this sampling approximation, the optimal times $\tau_{ij}^{\star}$ can be computed deterministically for each realization of the uncertainty $\theta_k$ (Alg. 6, line 4), along with the associated path scores (Alg. 6, line 5). These sample path scores can then be used to approximate the stochastic metric; for the expected-value path score, the approximation is a sum over weighted score samples (Alg. 6, line 7). In addition to keeping the score computation tractable, sampling has another advantage: although stochastic planning increases the computational complexity of the planning process relative to the deterministic formulation, the number of samples can be adjusted to the available computational resources. There is therefore a trade-off between the accuracy of the approximation and the time required for the algorithm to run, and real-time convergence guarantees can be preserved by reducing the number of samples used. In particular, the robust extension to CBBA proposed in this chapter preserves polynomial-time convergence (although the planning time increases roughly linearly with the number of samples $N$).
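The accuracy/run-time trade-off can be illustrated numerically. The sketch below assumes a hypothetical one-task path and an agent speed uniformly distributed on [0.5, 1.5]; it repeats the equal-weight estimate of the expected-value score 200 times for several sample counts $N$ and reports the spread of the estimates. The spread shrinks roughly as $1/\sqrt{N}$, while the number of deterministic path evaluations per estimate grows linearly with $N$.

```python
import math
import random
import statistics


def sample_path_score(speed, dist=5.0, reward=10.0, decay=0.1):
    """Hypothetical one-task path: score at the earliest (optimal) arrival time."""
    return reward * math.exp(-decay * dist / speed)


def estimate(n_samples, rng):
    """Equal-weight estimate of the expected-value score, speed ~ U(0.5, 1.5) (assumed)."""
    return sum(sample_path_score(rng.uniform(0.5, 1.5))
               for _ in range(n_samples)) / n_samples


rng = random.Random(1)
spreads = {}
for n in (10, 100, 1000):
    runs = [estimate(n, rng) for _ in range(200)]   # 200 independent estimates
    spreads[n] = statistics.stdev(runs)
    print(f"N={n:5d}  spread of estimate={spreads[n]:.4f}")
```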

When computing the worst-case path score given the uncertainty in $\theta$, the sampling step can be avoided if intuition about how the uncertainty affects the score function is available. For example, if $\theta$ represents uncertainty in task durations and/or travel times, then $\theta_{\text{worst}}$ can be chosen as the longest task durations and the slowest travel times. This is illustrated in Algorithm 7, where the worst-case path score can be computed analytically given the parameter realization $\theta_{\text{worst}}$.

Algorithm 7 Compute-Stochastic-Path-Score($p_i$) - (Worst-Case Value)
1: $\tau_i^{\star} = \operatorname*{argmax}_{\tau_i} \sum_{j=1}^{N_t} c_{ij}(\tau_{ij}(p_i), \theta_{\text{worst}})\, x_{ij}$
2: $J_{p_i} = \sum_{j=1}^{N_t} c_{ij}(\tau_{ij}^{\star}(p_i), \theta_{\text{worst}})\, x_{ij}$
3: return $J_{p_i}$

If the score functions are more complex and intuitive predictions of how the uncertainty will propagate are hard to make, the sampling approach of Algorithm 6 can be employed instead, with line 7 replaced by $J_{p_i} = \min_k J_{p_i}^{k}$. One issue with using sampling to represent worst-case performance is that many more samples are required to ensure that low-probability catastrophic events are adequately represented. This increases the computational complexity of the algorithm and thus the convergence time (higher $N$). The field of rare-event simulation has addressed this issue with sampling methods that concentrate samples in the low-probability regions of the distribution (e.g. importance sampling [11]). Such methods could be used to sample efficiently when intuitive predictions of worst-case performance are hard to make.
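The sketch below contrasts Algorithm 7 with the sampled minimum. It assumes a hypothetical two-task path in which the agent speed and task durations are uncertain but bounded, and rewards decay with start time. Because the assumed score is monotone in both parameters, $\theta_{\text{worst}}$ (slowest speed, longest durations) gives the true worst case in a single evaluation, whereas the minimum over random samples can only approach it from above.

```python
import math
import random

# Hypothetical path: tasks at (x, y) with a time-decaying reward. theta bundles the
# uncertain speed and task durations, each assumed to lie in a known interval.
PATH = [(3.0, 4.0, 10.0), (6.0, 8.0, 10.0)]   # (x, y, reward)
DECAY = 0.1
SPEED_RANGE = (0.5, 1.5)
DURATION_RANGE = (1.0, 3.0)


def solve_path(speed, durations):
    """Deterministic path score under the earliest (optimal) start times."""
    t, pos, score = 0.0, (0.0, 0.0), 0.0
    for (x, y, reward), d in zip(PATH, durations):
        t += math.dist(pos, (x, y)) / speed
        score += reward * math.exp(-DECAY * t)
        t += d
        pos = (x, y)
    return score


# Algorithm 7: with scores decreasing in time, theta_worst is the slowest speed
# combined with the longest durations, so one deterministic evaluation suffices.
worst = solve_path(SPEED_RANGE[0], [DURATION_RANGE[1]] * len(PATH))


def sampled_worst(n_samples, rng):
    """Algorithm 6 with line 7 replaced by a minimum over the sample scores."""
    return min(
        solve_path(rng.uniform(*SPEED_RANGE),
                   [rng.uniform(*DURATION_RANGE) for _ in PATH])
        for _ in range(n_samples))


rng = random.Random(0)
for n in (10, 1000):
    print(f"N={n:4d}: sampled min {sampled_worst(n, rng):.3f} >= worst case {worst:.3f}")
```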

As discussed in Section 4.1.3, one primary concern with using the distributed CBBA framework is that score functions within CBBA must satisfy a submodularity condition, referred to as diminishing marginal gains (DMG), to ensure algorithm convergence [58]. If the score functions do not satisfy DMG, the algorithm can cycle between agents' assignments and fail to converge (see Section 4.1.3 for an example). Unfortunately, explicit coupling between tasks in the score functions can often violate this submodularity condition. This is especially true in stochastic scenarios, where task scores are coupled through the uncertainty in the planning parameters. Moreover, even when the analytic stochastic metrics employed do satisfy submodularity, using numerical sampling techniques to compute stochastic path scores can violate DMG, in which case CBBA is no longer guaranteed to converge.
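A brute-force DMG check can help diagnose whether a given score function (deterministic or sampled) is safe to use with the original CBBA. The sketch below assumes that an agent's bid for task $j$ is its best-insertion marginal score given its current path, as in CBBA bundle construction, and flags any case where adding another task raises that bid. The helper names `best_insertion` and `dmg_violations` are hypothetical, and exhaustive enumeration is practical only for a handful of tasks.

```python
import itertools


def best_insertion(score, path, task):
    """Return (marginal gain, new path) for inserting `task` at its best position."""
    base = score(path)
    candidates = [path[:k] + [task] + path[k:] for k in range(len(path) + 1)]
    best = max(candidates, key=score)
    return score(best) - base, best


def dmg_violations(score, tasks, tol=1e-12):
    """List (path, added, j) cases where adding `added` increases task j's bid."""
    violations = []
    for r in range(len(tasks)):
        for path in itertools.permutations(tasks, r):
            path = list(path)
            for added in tasks:
                if added in path:
                    continue
                _, grown = best_insertion(score, path, added)
                for j in tasks:
                    if j in path or j == added:
                        continue
                    before, _ = best_insertion(score, path, j)
                    after, _ = best_insertion(score, grown, j)
                    if after > before + tol:
                        violations.append((tuple(path), added, j))
    return violations


values = {"a": 1.0, "b": 2.0, "c": 3.0}
print(dmg_violations(lambda p: sum(values[t] for t in p), list(values)))  # prints: []
print(len(dmg_violations(lambda p: float(len(p) ** 2), list(values))) > 0)  # prints: True
```

A modular score (a fixed value per task) passes the check, whereas a score in which each added task makes the next one more valuable, such as `len(p) ** 2`, is flagged.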

Figure 5-1: Example UAV mission with 1 agent and 2 tasks.

Consider, for example, a mission with constant-velocity agents, where task scores decrease as a function of time. Figure 5-1 shows an example of this scenario, in which an agent chooses between a far task and a closer task. Agent $i$ takes longer to reach Task 2 than Task 1, so the score it computes for Task 2 is lower than that of Task 1, and Task 1 is added to the bundle first. The agent then recomputes its score for Task 2 (which is solely a function of time). By the triangle inequality, the travel distance to Task 2 via Task 1 is at least as long as the direct distance used before Task 1 was in the bundle, so the new score for Task 2 can be no higher than before, and the DMG condition is satisfied. In uncertain planning environments, however, if the agent velocity is stochastic and sampling methods are employed, the agent may by chance draw “low-velocity samples” when computing the original score for Task 2 and “high-velocity samples” when computing the second score for Task 2. The algorithm could then produce a higher score for Task 2 after Task 1 is added to the bundle, violating the DMG property required by CBBA. As the number of samples goes to infinity, the expected-value score functions may satisfy submodularity; with a fixed number of samples, however, the DMG property is not guaranteed, and therefore Robust CBBA is not guaranteed to converge (as shown in Section 4.1.3, even a small violation of DMG can cause cycles between agents). In these stochastic settings, designing heuristic approximate score functions that ensure submodular bids is a nontrivial exercise, which limits the use of the original CBBA algorithm.

This property was identified as part of this work, and numerical simulations demonstrated that it is a crucial issue, leading to cycles within the planner in which the algorithm fails to converge [172]. To address this issue, recent work by Johnson et al. [106] proposed a key algorithmic extension that embeds the DMG condition into the algorithmic framework itself, enabling the use of CBBA with arbitrary (nonsubmodular) score functions. This extension was leveraged within the Robust CBBA framework proposed in this thesis, enabling the use of stochastic metrics while guaranteeing algorithm convergence in distributed stochastic environments. The extension to enable CBBA with nonsubmodular score functions is briefly described in the following section.
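The Figure 5-1 failure mode can be reproduced with a short simulation. The geometry, the uniform speed distribution, and the decay rate below are assumptions chosen for illustration. Task 2's bid is computed first from the direct distance and then from the longer distance via Task 1, each time with fresh, independent speed samples, as in the scenario described above. Although the exact expected bid always decreases, small sample counts frequently produce an increase, and the violation rate falls as $N$ grows.

```python
import math
import random

# Hypothetical geometry in the spirit of Figure 5-1: agent at the origin,
# Task 1 nearby, Task 2 farther away. Rewards decay with arrival time.
AGENT, TASK1, TASK2 = (0.0, 0.0), (2.0, 0.0), (4.0, 2.0)
REWARD, DECAY = 10.0, 0.1

direct = math.dist(AGENT, TASK2)                                 # bid with empty bundle
via_task1 = math.dist(AGENT, TASK1) + math.dist(TASK1, TASK2)    # bid after Task 1


def task2_bid(distance, speeds):
    """Sample-average score of Task 2 reached after travelling `distance`."""
    return sum(REWARD * math.exp(-DECAY * distance / v) for v in speeds) / len(speeds)


def dmg_violation_rate(n_samples, trials, rng):
    """Fraction of trials in which independent samples make Task 2's bid increase."""
    violations = 0
    for _ in range(trials):
        first = task2_bid(direct, [rng.uniform(0.5, 1.5) for _ in range(n_samples)])
        second = task2_bid(via_task1, [rng.uniform(0.5, 1.5) for _ in range(n_samples)])
        if second > first:
            violations += 1
    return violations / trials


rng = random.Random(0)
rates = {n: dmg_violation_rate(n, trials=500, rng=rng) for n in (5, 50, 500)}
for n, rate in rates.items():
    print(f"N={n:3d}: Task 2 bid increased in {rate:.1%} of trials")
```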

In document Robust distributed planning strategies for autonomous multi-agent teams (Page 119-124)