The Dec-MCTS algorithm presented so far naively communicates at every iteration. However, in practice this may not be possible, for example because doing so would exceed bandwidth limits. In this section, we propose a communication scheduling algorithm that selects which communication messages should be sent by reasoning over the information value of the messages.
More specifically, in this section we are interested in addressing the problem of deciding when to communicate, and to whom, while the robots are performing decentralised planning. We aim to find a balance between minimising the use of limited communication resources and satisfying the planning algorithm objectives. This problem is challenging because it is difficult to efficiently predict how communicating the current plan will impact the coordination performance in the long term (Becker et al., 2009). We propose a novel planning algorithm that reasons over the value of communication messages to decide when and to whom each robot should communicate. We present this algorithm in the context of Dec-MCTS, although the methods presented here could be adapted for other online coordination algorithms. The aim is to minimise communication while maintaining bounds on the uncertainty of the reward distribution. Our approach predicts the value of future communication messages, then uses these predictions to plan a sequence of communication requests. The predictions are performed using a particle filter, and the optimal communication schedule is found using dynamic programming. Notably, the approach collapses the decision tree into a directed acyclic graph, enabling polynomial runtime.
Overall, the approach achieves drastically reduced communication at the cost of a modest overhead in computation time. We have evaluated our approach in a multi-robot information gathering scenario similar to that in the experiments of Section 3.5. Our results show large reductions in channel utilisation with little impact on task performance. This demonstrates that our approach is suitable for communication planning in real-world multi-robot scenarios.
Our approach here is analogous to the belief-space planner of Ondruska et al. (2015), which was developed for scheduling the use of localisation hardware to conserve energy during path-following scenarios. The general approach also has similarities to the dynamic programming algorithm proposed later in Chapter 5.
3.7.1 Summary of approach
We present a brief summary of our communication scheduling formulation and algorithm as follows. Full details may be found in Best et al. (2018c).
Formulation and algorithm
We propose a decentralised algorithm for deciding when to communicate, and to whom, while the robots are performing Dec-MCTS. During each iteration n, robot i updates its plan, schedules sequences of communication requests, then performs the selected requests. This continues until the robots execute their plans.
In the decentralised planning phase, robot i updates its plan using Dec-MCTS while considering the most recently communicated plans of the other robots. As described in the preceding sections, this update involves selecting a subset of possible action sequences $\hat{\mathcal{X}}^i_n \subset \mathcal{X}^i$ using Monte Carlo tree search, and then optimising a probability distribution $q^i_n$ over the selected subset $\hat{\mathcal{X}}^i_n$.
Then, robot i decides whether or not to request communication from each robot j by considering how much its own plan $q^i_n$ depends on the plan of robot j. This dependency is measured as the uncertainty $\sigma^j_n$ of the expected local utility $f^i$ for robot i that is caused by not knowing the plan $q^j_n$ of robot j. This uncertainty is described as

$$\sigma^j_n = \operatorname{stdev}_{B^j_n(q^j_n)} \, \mathbb{E}_{q^j_n \cup q^{(j)}_n}\left[ f^i\left(x^j \cup x^{(j)}\right) \right], \tag{3.14}$$

where $x^{(j)}$ denotes the plans of all robots other than j, and this standard deviation is measured with respect to the uncertain belief $B^j_n$ of the distribution $q^j_n(x^j)$ of robot j. The decision is made by first evolving the belief $B^j_n$ over a finite time horizon, then finding the optimal communication schedule using dynamic programming, summarised as follows.
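As an illustration of Eq. (3.14), the following minimal Python sketch computes $\sigma^j_n$ for discrete plans. It assumes that the belief $B^j_n$ is represented by a list of particles (each particle one hypothesised distribution over robot j's candidate plans), that the plans of the remaining robots $x^{(j)}$ are summarised by a single independent discrete distribution, and that $f^i$ is a plain Python callable. The function names and the toy utility are hypothetical and are not taken from the Dec-MCTS implementation.

```python
import statistics
from itertools import product


def expected_local_utility(q_j, q_others, f_i):
    """E_{q^j ∪ q^(j)}[f^i(x^j ∪ x^(j))] for discrete, independent plan distributions."""
    return sum(
        p_j * p_o * f_i(x_j, x_o)
        for (x_j, p_j), (x_o, p_o) in product(q_j.items(), q_others.items())
    )


def coupling_uncertainty(particles, q_others, f_i):
    """sigma^j_n: std. dev. of robot i's expected utility over the belief particles.

    Each particle is one hypothesised distribution q^j_n (plan label -> probability).
    """
    values = [expected_local_utility(q_j, q_others, f_i) for q_j in particles]
    return statistics.pstdev(values)


if __name__ == "__main__":
    # Hypothetical utility: robot i is rewarded only if robot j avoids the
    # plan chosen by the other robots.
    f_i = lambda x_j, x_o: 1.0 if x_j != x_o else 0.0
    belief = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}]
    print(coupling_uncertainty(belief, {"A": 1.0}, f_i))  # 0.5
```

In the toy usage the two particles disagree completely about robot j's plan, so robot i's expected utility is either 0 or 1 and the uncertainty is 0.5; if every particle is the same distribution, the uncertainty is zero and there is little value in requesting robot j's plan.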
The belief evolution is implemented using a particle filter that predicts the distribution $q^j_t$ of robot j at future iterations $t > n$. If a communication request is made at a given timestep, then the set of particles at that timestep has uncertainty $\sigma^j_t = 0$. Typically, $\sigma^j_t$ then increases at each iteration until the next communication request. When $\sigma^j_t$ increases beyond a threshold $\theta$, the robots must communicate. This belief evolution manifests as a prediction graph that describes valid request sequences.
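The particle-based belief evolution can be sketched as follows. This is a minimal illustration under stated assumptions: the motion model that perturbs each particle (moving a random share of probability mass between two of robot j's candidate plans) is hypothetical and is not the prediction model of Best et al. (2018c), and robot i's expected utility is simplified to a fixed utility per plan of robot j. The sketch shows that all particles coincide immediately after a request ($\sigma = 0$) and reports the first future step at which $\sigma$ would exceed $\theta$.

```python
import random
import statistics


def evolve_belief(q_known, num_particles, steps, drift, rng):
    """Predict robot j's distribution over `steps` future iterations with particles.

    Index 0 holds the just-communicated distribution, so every particle is identical.
    Hypothetical motion model: at every iteration each particle moves a random share
    (at most `drift`) of one plan's probability mass onto another plan.
    """
    particles = [dict(q_known) for _ in range(num_particles)]
    history = [[dict(p) for p in particles]]
    plans = list(q_known)
    for _ in range(steps):
        for p in particles:
            src, dst = rng.sample(plans, 2)
            moved = p[src] * rng.uniform(0.0, drift)
            p[src] -= moved
            p[dst] += moved
        history.append([dict(p) for p in particles])
    return history


def sigma_per_step(history, utility):
    """Std. dev. over particles of robot i's expected utility at each predicted step."""
    return [
        statistics.pstdev([sum(p[x] * utility[x] for x in p) for p in particles])
        for particles in history
    ]


def first_violation(sigmas, theta):
    """First step whose predicted sigma exceeds theta (a request is needed by then)."""
    return next((t for t, s in enumerate(sigmas) if s > theta), None)


if __name__ == "__main__":
    rng = random.Random(0)
    history = evolve_belief({"A": 0.5, "B": 0.3, "C": 0.2}, 200, 8, 0.3, rng)
    sigmas = sigma_per_step(history, {"A": 1.0, "B": 0.0, "C": 0.5})
    print([round(s, 3) for s in sigmas])
    print("request needed by step", first_violation(sigmas, theta=0.05))
```

Because the particles only diverge as iterations pass without new information, the predicted uncertainty usually grows with the number of steps since the last request, which is the behaviour the threshold test relies on.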
Dynamic programming is then used to find the optimal sequence of communication decisions for T future iterations. This sequence minimises the number of requests while satisfying the constraints $\sigma^j_t \le \theta$.
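The dynamic program for a single pair of robots (i, j) can be sketched in Python as below. It assumes a hypothetical callable `sigma(s, t)` that returns the predicted uncertainty at future step t when the most recent request was made at step s (s = 0 standing for the current belief), with a request at step t resetting the uncertainty to zero. Indexing the state by (t, s) is one illustrative reading of how the decision tree collapses into a directed acyclic graph: if the predicted belief depends only on when the last request was made, rather than on the full history of decisions, there are O(T^2) states instead of 2^T branches.

```python
import math


def schedule_requests(sigma, T, theta):
    """Fewest communication requests over future steps 1..T subject to sigma <= theta.

    sigma(s, t) is the predicted uncertainty at step t when the most recent request
    was made at step s < t (s = 0 stands for the current belief). A request at step t
    resets the uncertainty to zero. Returns (min_requests, decisions), where
    decisions[t - 1] is True if a request is scheduled at step t.
    """
    # cost[t][s]: fewest requests needed over steps t..T given the last request at s.
    # Row T + 1 is the terminal layer of the DAG and costs nothing.
    cost = [[0] * (T + 1) for _ in range(T + 2)]
    request_at = [[False] * (T + 1) for _ in range(T + 2)]
    for t in range(T, 0, -1):
        for s in range(t):
            request = 1 + cost[t + 1][t]
            wait = cost[t + 1][s] if sigma(s, t) <= theta else math.inf
            request_at[t][s] = request < wait  # ties favour not communicating
            cost[t][s] = min(request, wait)
    # Follow the optimal decisions forward from the current belief (s = 0).
    decisions, s = [], 0
    for t in range(1, T + 1):
        decisions.append(request_at[t][s])
        if request_at[t][s]:
            s = t
    return cost[1][0], decisions


if __name__ == "__main__":
    # Toy prediction: uncertainty grows by 0.1 per iteration since the last request.
    linear_sigma = lambda s, t: 0.1 * (t - s)
    n_requests, plan = schedule_requests(linear_sigma, T=4, theta=0.25)
    print(n_requests, plan)  # 1 [False, False, True, False]
    print("request now?", plan[0])
```

In a receding-horizon use of this sketch, only `plan[0]` would be acted on, and the schedule would be recomputed at the next iteration with the updated belief; robot i would run one such program for each other robot j.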
At the end of each iteration, robot i requests the plans $(\hat{\mathcal{X}}^j_n, q^j_n)$ of the selected robots according to the first decision in the communication schedule. The received information is used to improve coordination in future planning iterations.
Analysis
The schedule is optimal with respect to the belief and is guaranteed to satisfy the constraints. The algorithm has polynomial runtime, with complexity $O(BT^2RE)$, where B is the number of particles at each decision node, T is the number of steps in the planning horizon, R is the number of robots, and E is the time taken to compute the expected utility. We note that, due to our construction, the total number of particles generated during each round of communication scheduling is $O(BT^2R)$, which is polynomial in the time horizon (rather than exponential, as would occur in a typical decision tree). Typically, we expect E to be large for non-trivial problems; therefore, B and T should be selected to strike a balance between runtime and the desired accuracy of the predictions.
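The gap between polynomial and exponential growth can be made concrete by counting nodes. The sketch below assumes, as a hypothetical indexing, that each node of the prediction graph is identified by the current step t and the step s of the most recent request (0 ≤ s ≤ t ≤ T), and compares this with a full binary tree of request/no-request decisions. Using B particles per node and one prediction structure for each of the R − 1 other robots are further assumptions made for illustration.

```python
def dag_nodes(T):
    """Nodes (s, t) with 0 <= s <= t <= T: last request at s, current step t."""
    return sum(t + 1 for t in range(T + 1))  # equals (T + 1)(T + 2) / 2


def tree_nodes(T):
    """Nodes of a full binary tree of request/no-request decisions over T steps."""
    return 2 ** (T + 1) - 1


def particles_per_round(B, T, R, count_nodes):
    """B particles per node, one prediction structure per other robot (R - 1)."""
    return B * count_nodes(T) * (R - 1)


if __name__ == "__main__":
    for T in (4, 10, 20):
        print(T, particles_per_round(100, T, 8, dag_nodes),
              particles_per_round(100, T, 8, tree_nodes))
```

With B = 100 and R = 8, a horizon of T = 10 gives 46,200 particles under the graph indexing versus 1,432,900 for the full tree, and the ratio keeps widening because the tree doubles with each extra step while the graph grows quadratically.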
3.7.2 Experiments
We analyse the performance and behaviour of our proposed communication scheduling algorithm in the context of Dec-MCTS and the information gathering scenario from Section 3.5. Overall, the results show that the performance of Dec-MCTS can be maintained, even with significantly reduced communication rates, by judiciously selecting which communication messages to transmit.
Although it is difficult to identify alternative algorithms that can be directly compared to ours, we do provide comparisons to a suite of alternative communication schemes. One of these is full (effectively all-to-all) communication. This scheme is not feasible in practice for our systems of interest, which are field robots with significant channel contention arising from sources such as RTK GPS corrections, e-stop heartbeat messages, and telemetry. However, the full communication case provides a quality benchmark against which we can measure relative coordination (task) performance.
Comparison scenarios
In the following experimental results, we compare five different scenarios. The All-to-all scenario makes the unrealistic assumption of perfect communication, and the robots communicate their intentions at every iteration. Random represents a scenario where only 20 % of the packets are successfully received due to uniform-random message loss (e.g., to model excessive contention on the communication channel). We compare two versions of our approach: Horizon 4 plans with a planning horizon of T = 4, while Greedy only looks one time-step ahead. In the Horizon 4 and Greedy scenarios, θ is selected such that they have a 20 % average communication rate. As a baseline comparison, the None scenario assumes all communication fails and no messages are successfully received.
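Two details of this experimental setup can be sketched in Python: the uniform message loss of the Random scenario, and choosing θ to hit a target average communication rate. Both are illustrations under assumptions. The bisection search presumes a hypothetical `comm_rate(theta)` function (for example, measured from simulation runs) that does not increase as θ grows; this section does not specify how θ was actually tuned, and the linear rate curve in the usage is a toy stand-in.

```python
import random


def random_channel(rng, success_prob=0.2):
    """Random scenario: each message is delivered independently with probability 0.2."""
    return rng.random() < success_prob


def calibrate_theta(comm_rate, target, lo, hi, iters=60):
    """Bisection for the threshold theta whose communication rate matches `target`.

    Assumes comm_rate(theta) is non-increasing: a larger threshold tolerates more
    uncertainty, so fewer requests are needed.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if comm_rate(mid) > target:
            lo = mid  # communicating too often: raise the threshold
        else:
            hi = mid  # at or below target: try a lower threshold
    return hi


if __name__ == "__main__":
    rng = random.Random(0)
    delivered = sum(random_channel(rng) for _ in range(100_000))
    print(f"delivered fraction: {delivered / 100_000:.3f}")
    # Toy stand-in for a measured rate curve: rate falls linearly as theta grows.
    toy_rate = lambda theta: max(0.0, 1.0 - theta)
    print(f"theta for a 20 % rate: {calibrate_theta(toy_rate, 0.2, 0.0, 2.0):.3f}")
```

Returning the upper end of the bracket means the chosen θ yields a communication rate at or just below the target, which matches the intent of capping channel usage.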
Results
As a baseline, we observe that communication is important for coordination. In Figure 3.7a, the robots take advantage of the perfect setting of full communication (All-to-all) to coordinate their plans effectively. In Figure 3.7b, there is no communication (None) and therefore no coordination, resulting in multiple visits to the same regions. Table 3.1 compares the planning performance for the different communication scenarios. All-to-all naturally resulted in the highest reward, but the partial communication scenarios performed well despite using 80 % less communication. As expected, None resulted in the poorest performance. Planning with a horizon of T = 4 achieved higher rewards than Greedy, showing the advantage of planning over a time horizon. Both of these scenarios outperformed Random, which highlights the practical benefit of performing informative communication planning. The planned communication scenarios achieved better results than Random because the proposed approach chose to communicate more frequently between pairs of robots with a larger coupling between their local utilities. For the T = 4 scenarios, the highest communication rate (62 %) is between the blue and pink robots in the bottom left of Figure 3.7a. We expect this pair to communicate more since their reachable regions significantly overlap. The yellow robot in the bottom right received the fewest requests (11 %) since it is relatively isolated. The algorithm also selects when to communicate; requests tended to be more frequent during earlier iterations, when successful coordination is most important.

(a) Communication (All-to-all) results in successful coordination.
(b) No communication (None) results in poor coordination.
Figure 3.7 – Information gathering problem instance for the communication experiments, with example solution paths (coloured lines). Arrows show the start location and orientation for 8 robots. Green disks are reward regions (weighted by reward).