
Summary and limitations

This chapter surveyed the literature related to multi-robot active perception and generic planning algorithms for sequential decision making. We began by discussing perception models and objective functions for typical active perception problems. We then discussed methods for informative path planning in single- and multi-robot scenarios. Finally, we introduced several generic planning algorithms to provide background for the algorithms presented in this thesis.

This thesis presents planning algorithms that optimise paths with respect to perception objectives. The literature reviewed in Section 2.1 demonstrates there is a variety of existing perception models suitable as objective functions. In our experimental sections we use several standard perception models to demonstrate the performance of our algorithms. The mission monitoring scenario is a new problem and thus we develop new prediction models and objective functions for this case.

Section 2.2 reviewed existing approaches to informative path planning. The TSP is insufficient for modelling active perception problems, but several variants of the TSP, particularly the GTSP and the OP formulations, can be useful approximations of more complex perception models and admit efficient planning algorithms; we propose a new algorithm for this class of problems in Chapter 4.

While submodularity guarantees can lead to efficient greedy solutions, there are still significant benefits in performing long-horizon planning, particularly when path costs are considered. There has been work in developing non-myopic solutions, although few have been demonstrated to work well with rich perception models, and even fewer are applicable to the decentralised multi-robot planning setting. In decentralised settings in particular, it is important to consider the role of communication, either in the objective function or within the actual planning algorithm. Chapter 3 proposes a new generic decentralised planning algorithm with analytical convergence guarantees and addresses these various communication considerations. Chapter 5 and Chapter 6 formulate new communication-aware motion planning problems that are motivated by the important role of communication in monitoring AUV missions. The decentralised planning literature has focussed on developing efficient problem-specific solutions; similarly, our algorithms in Chapter 5 and Chapter 6 are designed to exploit geometric properties of the mission monitoring problem to efficiently find good solutions.

The new algorithms we propose for robotics problems are strongly motivated by algorithms found in the wider artificial intelligence community for general sequential decision problems; we introduced and reviewed relevant algorithms in Section 2.3. Our algorithm in Chapter 3 particularly builds on MCTS and PC. MCTS has recently become popular in robotics, but we present here the first decentralised generalisation of MCTS. PC has been used for decentralised decision making but has so far been limited to myopic planning.

Our algorithm in Chapter 4 is a new variant of SOM that is generalised for centralised multi-robot planning. The SOM literature shows that these algorithms do not perform well on the standard TSP, but are particularly advantageous when it is required to plan directly over continuous space. Our extensive simulated experiments show the applicability of a generalised OP formulation and a new SOM solution algorithm for robotics scenarios.

Our algorithm in Chapter 5 solves a new active perception formulation that is a generalisation of the optimal stopping problem to multiple dimensions. Our solution algorithm and associated analysis borrow ideas from computational geometry to efficiently prune and search over the space of trajectories. Finally, Chapter 6 combines ideas from Chapter 5 and Chapter 3 to efficiently solve a new decentralised active perception problem.

Overall, this chapter has introduced relevant background material and identified gaps in the literature in order to set the context of the new multi-robot active perception planning algorithms proposed in this thesis.

Decentralised Monte Carlo tree search

In this chapter we propose the decentralised Monte Carlo tree search (Dec-MCTS) algorithm as a general decentralised coordination algorithm suitable for any objective function defined over the action sequences of the robots. This chapter addresses a decentralised formulation of the general multi-robot active perception planning problem stated in Problem 1.1, proposes a solution algorithm with strong analytical properties, presents empirical results for several example active perception formulations, and presents an extended algorithm that also considers communication limitations.

3.1 Overview

Dec-MCTS is essentially a novel decentralised variant of Monte Carlo tree search (MCTS). At a high level, the Dec-MCTS algorithm alternates between exploring each robot’s individual action space and optimising a probability distribution over the joint-action space. In any particular iteration of the algorithm, we first use a new variant of MCTS to find locally favourable sequences of actions for each robot. These favourable action sequences are selected with respect to probabilistic estimates of other robots’ actions that evolve during planning time. The main novelty is our new tree expansion policy, motivated by discounted-UCB (Garivier and Moulines, 2011), that accounts in general for changing reward distributions.
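To give some intuition for why discounting helps when reward distributions change, the following is a minimal sketch of a discounted-UCB-style selection rule for a flat multi-armed bandit. It is not the Dec-MCTS tree expansion policy itself. The index broadly follows the form of discounted UCB (a discounted empirical mean plus an exploration bonus computed from discounted counts), and the class name `DiscountedUCB`, the parameter names, and the default values are assumptions made purely for illustration.

```python
import math


class DiscountedUCB:
    """Illustrative discounted-UCB-style bandit (hypothetical helper class)."""

    def __init__(self, n_arms, gamma=0.9, bound=1.0, xi=0.5):
        self.gamma = gamma  # discount factor in (0, 1); assumed value
        self.bound = bound  # assumed upper bound on rewards
        self.xi = xi        # exploration constant; assumed value
        self.counts = [0.0] * n_arms  # discounted pull counts per arm
        self.sums = [0.0] * n_arms    # discounted reward sums per arm

    def select(self):
        # Try every arm once before using the index.
        for arm, count in enumerate(self.counts):
            if count == 0.0:
                return arm
        total = sum(self.counts)

        def score(arm):
            mean = self.sums[arm] / self.counts[arm]
            # Guard the logarithm so the bonus is always well defined.
            log_total = math.log(max(total, 1.0))
            bonus = 2.0 * self.bound * math.sqrt(
                self.xi * log_total / self.counts[arm]
            )
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        # Discount all past observations, then record the new one.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


if __name__ == "__main__":
    # Two arms whose rewards swap halfway through the run.
    policy = DiscountedUCB(n_arms=2)
    for t in range(400):
        arm = policy.select()
        best = 0 if t < 200 else 1
        policy.update(arm, 1.0 if arm == best else 0.0)
    print("arm preferred after the switch:", policy.select())
```

Because old observations are down-weighted geometrically, the estimate for an arm whose reward has changed is revised after a few new pulls. This is the property that matters when other robots’ plans, and therefore the local reward distributions, keep evolving during planning.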

Next, during each planning iteration, the robots periodically attempt to asynchronously communicate a highly compressed version of their local search trees which, together, correspond to a product distribution approximation of the joint plan. These communicated distributions are used to estimate the underlying joint distribution for the team’s plan. The estimates are probabilistic, unlike the deterministic representation of joint actions typically used in multi-robot coordination algorithms. Optimising a product distribution is similar in spirit to the mean-field approximation from variational inference, and also has a natural game-theoretic interpretation (Rezek et al., 2008; Wolpert and Bieniawski, 2004).
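As a small illustration of the product distribution idea, the sketch below represents each robot’s compressed plan as a categorical distribution over a few candidate action sequences, treats the joint distribution as the product of these marginals, and evaluates one robot’s candidate by taking the expectation over the other robots’ distributions. The data layout (a list of dictionaries), the function names, and the toy reward are assumptions for illustration only. The exhaustive enumeration is practical only for very small candidate sets; a sampling-based estimate would be the natural substitute for larger ones.

```python
from itertools import product


def joint_probability(distributions, joint_choice):
    """Probability of a joint choice under the product of the marginals."""
    prob = 1.0
    for dist, choice in zip(distributions, joint_choice):
        prob *= dist[choice]
    return prob


def expected_reward(robot, candidate, distributions, reward_fn):
    """Expected joint reward when `robot` commits to `candidate`.

    The other robots' choices are marginalised out using their
    communicated distributions (exact enumeration).
    """
    others = [r for r in range(len(distributions)) if r != robot]
    total = 0.0
    for combo in product(*(distributions[r].keys() for r in others)):
        joint = [None] * len(distributions)
        joint[robot] = candidate
        prob = 1.0
        for r, choice in zip(others, combo):
            joint[r] = choice
            prob *= distributions[r][choice]
        total += prob * reward_fn(tuple(joint))
    return total


if __name__ == "__main__":
    # Hypothetical example: two robots each choose region "A" or "B";
    # the team reward is the number of distinct regions covered.
    dists = [{"A": 0.5, "B": 0.5}, {"A": 0.9, "B": 0.1}]

    def distinct_regions(joint):
        return float(len(set(joint)))

    for option in ("A", "B"):
        print(option, expected_reward(0, option, dists, distinct_regions))
```

In this toy case robot 0 should prefer region "B", because robot 1 is believed to be heading for "A" with high probability. The same kind of reasoning lets each robot adapt its plan to probabilistic estimates of its teammates’ plans rather than to a single deterministic joint action.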

Dec-MCTS is a powerful new method of decentralised coordination for any objective function defined over the robot action sequences. Notably, this implies that Dec-MCTS is suitable for complex perception tasks that are highly viewpoint-dependent, which are the motivation for this thesis. Further, communication is assumed to be intermittent, and the amount of data sent over the network is small in comparison to the raw data generated by typical range sensors and cameras. Our method also inherits important properties from MCTS, such as the ability to compute anytime solutions and to incorporate prior knowledge about the environment. Moreover, our method is suitable for online replanning to adapt to changes in the objective function or team behaviour.

We provide an extensive theoretical analysis of the algorithm that leverages results from probability theory and game theory. Our main analytical result establishes convergence rates for the expected payoff at the root of the search tree towards the optimal payoff sequence. Thus, the proposed MCTS tree expansion policy balances exploration and exploitation while the reward distributions are changing. We prove this result in Best et al. (2018a) by extending the MCTS analysis of Kocsis et al. (2006) for the context of switching bandit problems (Garivier and Moulines, 2011). Our second result leverages Wolpert et al. (2006) to show that the product distribution optimisation phase locally minimises the KL divergence to the optimal joint probability distribution. Given the difficulty of the problem, these results do not directly yield guarantees of global optimality; nevertheless, the analysis provides strong motivation for the use of these components in our algorithm for decentralised, long-horizon planning with general objective function definitions.

We empirically evaluate our algorithm in two scenarios: generalised team orienteering and online active object recognition. These experiments are run in simulation, where the robots traverse a PRM (Kavraki et al., 1996) with a Dubins motion model (Dubins, 1957), and the second scenario uses range sensor data collected a priori by real robots. We show that our decentralised approach performs as well as or better than centralised MCTS even with a significant rate of communication message loss. We also show the benefits of our algorithm in performing long-horizon and online planning.

Further empirical analyses for Dec-MCTS are also presented in later chapters: the results in Chapter 4 compare Dec-MCTS to our proposed SOM algorithm, and the results in Chapter 6 compare Dec-MCTS to the proposed decentralised mission monitoring algorithm that is motivated by Dec-MCTS.

Communication is fundamental to the coordinated behaviour that emerges from Dec-MCTS and other similar decentralised planning algorithms (Farinelli et al., 2008), as robots need to develop decision strategies that take into account the actions of other robots. Communication is typically treated as an unlimited resource, but in practice it is often limited, unreliable, or susceptible to interference (Hollinger et al., 2011b; Williamson et al., 2008; Fitch et al., 2017). In the experiments of Section 3.5 we show that Dec-MCTS is robust to communication loss: reasonable task performance is achieved even if communication packets are lost. In Section 3.7, we take this one step further by explicitly planning how to use communication resources effectively. We present a communication scheduling algorithm as an extension of Dec-MCTS that aims to mitigate these various communication issues while also maintaining task performance.

3.1.1 Chapter outline

The remainder of this chapter is organised as follows. Section 3.2 formally defines the decentralised planning problem considered in this chapter. Section 3.3 presents our proposed Dec-MCTS algorithm. Section 3.4 provides a theoretical analysis of Dec-MCTS, based on our extended results and proofs presented in Best et al. (2018a). Sections 3.5 and 3.6 present an empirical analysis of our algorithm for two example active perception problems. Section 3.7 presents an algorithm for scheduling communication that can be used within Dec-MCTS for scenarios where communication bandwidth is severely limited. Finally, Section 3.8 summarises the chapter.