Dispatch-and-Search: Dynamic Multi-Ferry Control in Partitioned Mobile Networks∗

Ting He†, Ananthram Swami‡, and Kang-Won Lee†

†IBM Research, Hawthorne, NY 10532 USA

{the,kangwon}@us.ibm.com

‡Army Research Laboratory, Adelphi, MD 20783 USA

[email protected]

ABSTRACT

We consider the problem of disseminating data from a base station to a sparse, partitioned mobile network by controllable data ferries with limited ferry-node and ferry-ferry communication ranges. Existing solutions to data ferry control mostly assume the nodes to be stationary, which reduces the problem to designing fixed ferry routes. In the more challenging scenario of mobile networks, existing solutions have focused on single-ferry control and left out the important issue of ferry cooperation in the presence of multiple ferries. In this paper, we jointly address the issues of ferry navigation and cooperation using the approach of stochastic control. Under the assumption that ferries can communicate within each partition, we propose a hierarchical control system called Dispatch-and-Search (DAS), consisting of a global controller that dispatches ferries to individual partitions and local controllers that coordinate the search for nodes within each partition. Formulating the global and the local control as Partially Observable Markov Decision Processes (POMDPs), we develop efficient control policies to optimize the (discounted) total throughput, which significantly improve on the performance of their predetermined counterparts in cases of limited prior knowledge.

Categories and Subject Descriptors

C.2.1 [Computer-Communication Networks]: Network Architecture and Design - store and forward networks; G.4 [Mathematical Software]: algorithm design and analysis; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search - control theory; I.2.9 [Artificial Intelligence]: Robotics - autonomous vehicles

∗ Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-06-3-0001. The views and conclusions are those of the authors and do not represent the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding copyright notation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

MobiHoc’11, May 16–20, 2011, Paris, France.

General Terms

Algorithms, Design, Performance

Keywords

Data ferry control, Partially Observable Markov Decision Processes, Myopic control policies

1. INTRODUCTION

The development of robotic technology has led to improvements in many applications, a recent one being the use of controllable mobile nodes to assist communications in sparse networks. Sparse network topologies are routinely encountered in a broad variety of applications, e.g., sensor/actuator networks, search-and-rescue operations, underwater networks, spacecraft networks (“interplanetary communications”), vehicular ad hoc networks (VANETs), and military mobile ad hoc networks (MANETs). Such sparse networks pose significant challenges to traditional communication schemes, since existing in-network solutions such as routing and delay-tolerant communications are only designed for (intermittently) connected networks and will fail if the network is permanently partitioned. In such partitioned networks, the only remedy is to look for external help, and mobile helper nodes called data ferries, usually mounted on controllable platforms such as UAVs, stand out as a preferred solution in many scenarios due to their agility in contrast to fixed infrastructures. The problem, though, is that proper control has to be in place to leverage these data ferries effectively.

Previous solutions to data ferry control have mostly focused on designing fixed ferry routes [3, 5, 12, 13] for static networks. Although the approach still applies to mobile networks [9], the performance is generally suboptimal, as fixed routes can only leverage long-term statistics of node movements. To fully utilize the ferries’ capability, we have shown in [2] that each ferry needs to dynamically adjust its movements based on run-time observations. As in [2], we assume a ferry can only observe nodes within a certain distance defined by the ferry-node communication range, referred to as partial observations¹. Although it shows promising performance improvements over fixed routes, the solution in [2] is limited by (i) only considering a single ferry and (ii) only focusing on contact rate optimization. The availability of multiple ferries raises a new challenge because one ferry’s optimal movements will depend on the movements of the other ferries, which induces a need for the ferries to cooperate. While this is not an issue under unlimited ferry-ferry communications, global cooperation can be expensive, especially for networks with scattered partitions. Furthermore, although contact rates are performance indicators, the ultimate goal of the ferries should be to optimize the end-to-end communication performance perceived by nodes, which also depends on other factors such as traffic demands. These new challenges call for a new solution that can efficiently utilize multiple ferries to optimize the end-to-end performance without excessive cooperation overhead.

¹ This is in contrast to assuming complete observations of

1.1 Related Work

Designated communication nodes, typically called data ferries or data mules, have been considered as an effective solution to support communications in sparse networks. Data ferries physically carry data either between task (i.e., non-ferry) nodes [3, 11–14] or between nodes and a base station [1, 4–6]. The main issue in using such ferries is how to control their mobility, where existing solutions can be divided into single-ferry control [1, 2, 4, 9, 14] and multi-ferry control [3, 5, 6, 12, 13]. An in-between case was studied in [11], where nodes can switch between the roles of ferry and non-ferry. Most existing work focuses on ferry route design under the assumption that either the task nodes are stationary [3–5, 12, 13], or their locations are always known by the ferries if they are mobile (i.e., complete observations) [11, 14]. A related problem of static but randomly distributed nodes has been studied: using predetermined routes in [6] and dynamic route adaptation based on complete observations in [1]². The problem of stochastic node movements and partial observability was first studied in [9] using predetermined routes and later improved in [2] using a control policy that navigates the ferry dynamically.

1.2 Our Approach and Results

We consider controlling multiple ferries in partitioned mobile networks to optimize the total throughput. We present our solution in the scenario of data dissemination, although our approach extends naturally to the cases of data harvesting and peer-to-peer communications (see discussions in Section 7). Our specific contributions are:

Flexible control framework: To avoid long-range ferry-ferry communications, we propose a hierarchical control framework called Dispatch-and-Search (DAS), which divides the problem into the global control of allocating ferries among partitions and the local control of searching for nodes within each partition. Since the global control can be implemented by the base station (BS), ferries only need to communicate within each partition to cooperate in local control.

Rigorous problem formulation: Under Markovian node mobility, we show that both the global and the local control problems can be formulated as special cases of the Partially Observable Markov Decision Process (POMDP). The POMDP model provides a comprehensive representation of the available information, control options, and performance objective in a dynamic system, which provides a foundation for rigorous optimization.

Efficient solutions: Based on the POMDP formulation, we develop solutions to global and local control in the form of dynamic control policies. Due to the intrinsic hardness of POMDP, we focus on an efficient heuristic called the myopic policy. In particular, in global control, even the myopic policy can be computationally expensive; hence, we also develop two approximations with similar performance but much lower complexity than the exact solution.

² Both works treat each message as a new node which arrives randomly in space and time; [1] also assumes locations and arrival times of pending messages are known at run time.

[Figure 1 diagram labels: Base Station, domain 1, domain 2, gateway node, ferry, dispatch, search, return, r]

Figure 1: Example: K = 3 ferries serve D = 2 domains in dispatch-and-search mode. r: (projected) ferry communication radius.

Performance insights: We evaluate the proposed policies through both analysis and simulations. We obtain closed-form upper and lower bounds on the optimal local performance, as well as a lower bound on the optimal global performance which can be evaluated numerically. Moreover, we compare our solutions with predetermined control through extensive simulations, which show significant performance gain (e.g., 100%+ on discounted throughput and 40%+ on undiscounted throughput) in cases of minimum prior knowledge.

We assume Markovian mobility models to rigorously represent the impact of past actions on future decisions. Although a limiting assumption, it includes popular models such as random walk/waypoint and their variations as special cases and can be viewed as a second-order approximation of general mobility models.

The rest of the paper is organized as follows. Section 2 specifies the network model and the DAS framework, Section 3 gives a brief overview of POMDP, Sections 4 and 5 address the local and the global control problems respectively, Section 6 presents simulation evaluation, and Section 7 concludes the paper.

2. PROBLEM FORMULATION

2.1 Network Model

Suppose that there are a total of $K$ ferries and $D$ network partitions, each partition occupying a disjoint region called a domain, as illustrated in Fig. 1. A base station (BS) generates traffic for all nodes at constant rates³ $\lambda = (\lambda_d)_{d=1}^{D}$ (fluid model), which is then delivered to the respective domains by the ferries in a delay-tolerant manner. In each domain, we select a node as the gateway to the ferries, which then disseminates received traffic within the domain using in-network routing; we will simply call these gateway nodes “nodes”. According to the ferry-node communication range, each domain is partitioned into $N_d$ ($d = 1, \ldots, D$) cells denoted by a set $\mathcal{S}_d$ ($\mathcal{S}_d \cap \mathcal{S}_{d'} = \emptyset$ for $d \neq d'$), such that the transmission of each ferry can cover one cell at a time. We assume that node mobility in domain $d$ can be modeled as a Markov chain on $\mathcal{S}_d$ with transition matrix $P_d$. We also assume that ferry-ferry communications can span an entire domain but not across domains. The total network field is denoted by $\mathcal{S} := \bigcup_{d=1}^{D} \mathcal{S}_d$, of size $|\mathcal{S}| = N = \sum_{d=1}^{D} N_d$. We do not restrict the relationships between $K$, $D$, and $N_d$, although restrictions may apply in later analysis (e.g., Section 5.3).

[Figure 2 diagram labels: time axis; round of ∆ search slots followed by one return slot; dispatch at the start of each round]

Figure 2: DAS procedure: Dispatch control occurs at the time-scale of rounds (∆ + 1 slots) and search control at the time-scale of slots.

2.2 System Architecture

Due to the ferry-ferry communication constraint, ferries can only cooperate locally within a domain. Accordingly, we propose a hierarchical control system called “Dispatch-and-Search” (DAS), consisting of a global dispatch controller $\pi_g$ at the BS in charge of allocating ferries to individual domains, and a local search controller $\pi_{l,d}$ for each domain $d$, located at an arbitrarily selected lead ferry, in charge of jointly navigating the ferries dispatched to that domain. Specifically, the lead ferry will collect observations from the other ferries in its domain and broadcast navigation actions determined by the search controller, in terms of the set of cells to cover (and their matching with the ferries) in each slot, in order to contact the gateway node (by any ferry) and deliver traffic. We assume that all ferries dispatched to the same domain carry the same traffic. As illustrated in Fig. 2, dispatch control operates periodically with period ∆ + 1, called a round (∆ is a design parameter); the first ∆ slots are used to search for the gateway nodes in the individual domains; the last slot is used for all the ferries to return to the BS and pick up new traffic, after which the next round begins⁴.

2.3 Main Problem: DAS Control

Let $\lambda^{\pi}_{\beta} := E[\sum_{t=1}^{\infty} N^{\pi}(t)\,\beta^{t}]$ denote the discounted total throughput for a given discount factor $\beta \in (0, 1)$ under a DAS control policy $\pi := (\pi_g, (\pi_{l,d})_{d=1}^{D})$, where $N^{\pi}(t)$ is the amount of data delivered (over all domains) at time $t$ under policy $\pi$. Our goal is to design a policy $\pi$ that maximizes $\lambda^{\pi}_{\beta}$.

Remarks: The discounts are used to model the objective of delivering the data as soon as possible. The above definition is a cumulative throughput, but it represents a delivery rate, defined as the ratio of the delivered traffic and the delivery time, $\bar{\lambda}^{\pi}_{\beta} := E[(\sum_{t=1}^{\infty} N^{\pi}(t)\,\beta^{t}) / (\sum_{t=1}^{\infty} \beta^{t})]$, in that $\bar{\lambda}^{\pi}_{\beta} = \lambda^{\pi}_{\beta}(1-\beta)/\beta$ (since $\sum_{t=1}^{\infty} \beta^{t} = \beta/(1-\beta)$).

3. PRELIMINARY

3.1 POMDP Recap

POMDP [8] is a general framework to model the control of a stochastic system. We provide a quick overview of POMDP in this section. A POMDP is represented by a tuple $(\mathcal{X}, \mathcal{A}, \mathcal{O}, r, P_x, P_o)$, where $\mathcal{X}$ represents the possible states of the controlled system, $\mathcal{A}$ the set of control actions, $\mathcal{O}$ the observations of the controller, and $r(x, a, o)$ a reward function modeling the performance criterion based on the state $x \in \mathcal{X}$, the action $a \in \mathcal{A}$, and the observation $o \in \mathcal{O}$. The other elements $P_x$ and $P_o$ describe how the state evolves ($P_x(x, a, x') := \Pr\{x_{t+1} = x' \mid x_t = x, a_t = a\}$) and how the observation is generated ($P_o(x, a, o) := \Pr\{o_t = o \mid x_t = x, a_t = a\}$), and are derived from the first three elements in a given problem setting. Under partial observability, the controller does not know the true state $x$ precisely, but can only estimate it up to a distribution $b = (b(x))_{x \in \mathcal{X}}$, called the belief, based on the action and observation history. The objective of the controller is to find a policy $\pi$ that determines the action to take based on the current belief so as to maximize the total reward over a time window. With an infinite horizon, a common form of the total reward is the discounted reward

$$R^{\pi}_{\infty} := E\Big[\sum_{t=1}^{\infty} \beta^{t}\, r(x_t, a^{\pi}_t, o_t)\Big] \tag{1}$$

for a fixed discount factor $\beta \in (0, 1)$. The optimal policy $\pi^{*}$ that maximizes $R^{\pi}_{\infty}$ is known to be the solution to the Bellman equation:

$$V_{\infty}(b) = \beta \max_{a} E\big[r(b, a, o) + V_{\infty}(T(b, a, o))\big], \tag{2}$$

where $V_{\infty}(b) := \max_{\pi} R^{\pi}_{\infty}|_{b_1 = b}$, called the value function, is the maximum reward starting from a given belief state $b$, the expectation is over $P_o(b, a, o) := \sum_{x} b(x) \cdot P_o(x, a, o)$, and the function $b' = T(b, a, o)$ generates a new belief for the next step based on the previous belief, the action, and the observation. The optimal action at belief $b$ is the maximizer of the right-hand side of (2). It is known that solving (2) for the optimal policy is PSPACE-hard in general [7]. A popular alternative is to use the myopic policy, in which one only maximizes the average immediate reward, given by

$$\pi^{MY}(b) := \arg\max_{a} \sum_{x} b(x)\, E[r(x, a, o)], \tag{3}$$

where the expectation is over $P_o(x, a, o)$. The myopic policy is easy to implement and has exhibited excellent performance in single-ferry control [2]. We will focus on this policy in the sequel and examine its different forms and performance at different levels of control.

⁴ The above assumes that the ferries can move between the BS and any domain in one slot. Longer BS-domain distances can be modeled by replacing the last slot by $T$ slots during which a round trip can be completed; all the results hold by replacing ∆ + 1 with ∆ + $T$.

3.2 Applying POMDP to Multi-Ferry Control

The use of POMDP is natural for the problem under consideration. Under the DAS architecture, the goal of multi-ferry control is to jointly design the global dispatch controller and the local search controllers such that the overall (discounted) throughput is optimized. The challenge is that both controllers face a dynamic environment that changes over time, e.g., the search controller relies on node distributions, which evolve due to node mobility, and the dispatch controller relies on (predicted) contact processes as well as traffic demands for each domain. Moreover, characteristics of this dynamic environment are only partially observable to the controllers due to the limited ferry-node communication range. POMDP is well suited for addressing these challenges using its model of dynamic systems under partial observability.

The local and the global controllers have inherent connections: the global action affects the local control because the number of ferries dispatched to a domain will largely determine how long it takes to contact the node; the local performance also affects the global control because the dispatch controller needs to weigh the impact of dispatching more/fewer ferries to each domain to make a balanced decision. The other parameters such as domain sizes, node mobility models, and traffic rates all play a role in the performance. How can we capture all these factors in the POMDP framework? What reward functions should we use for local/global control so that together they achieve throughput optimization? These are the main questions we will answer in the sequel.

4. LOCAL CONTROL: SEARCH

Consider a domain $d$ to which $k \in \{0, \ldots, K\}$ ferries have been dispatched in the current round; assume $k \leq N$, the size of the domain (subscript $d$ is dropped). Given the initial node distribution $b_0$ and mobility model $P$, the goal of the local controller $\pi_l$ is to jointly navigate the $k$ ferries to contact the node and deliver traffic. Moreover, due to the discount in throughput calculation, it is intuitively desirable to make the contact as soon as possible. In this section, we cast the problem of local control as a POMDP and establish lower and upper bounds on the optimal reward, where the lower bound is provided by the myopic policy. Based on the bounds, we discuss conditions under which the myopic policy is close to optimal.

4.1 Local Control as a POMDP

In the language of POMDP, the problem can be formulated as follows:

1. Local state: $x^{l}_t = (s_t, \delta_t)$, where $s_t \in \mathcal{S}$ is the node’s location at time $t$, and $\delta_t \in \{0, 1\}$ indicates whether the next contact will be the first in the current round ($\delta_t = 1$) or not ($\delta_t = 0$);

2. Local action: $a^{l}_t \subseteq \mathcal{S}$ with $|a^{l}_t| \equiv k$ denotes the set of cells for the ferries to cover in slot $t$ (no two ferries cover the same cell);

3. Local observation: $o^{l}_t = z_t \in \{0, 1\}$ is a joint contact indicator, i.e., $z_t = I_{s_t \in a^{l}_t}$, where $I_{\cdot}$ denotes the indicator function;

4. Local reward: the one-time reward $r_l(x^{l}, a^{l}, o^{l}) = \delta z$ gives a unit reward for the first contact, after which $\delta$ transits to 0 and no further reward occurs; the overall reward $R^{\pi_l}_{\infty}$ is defined as in (1) for $r_l$.

We elaborate on the preceding POMDP formulation: let $\Upsilon^{\pi_l}$ denote the time till (the first) contact (TTC) under policy $\pi_l$. It is easy to see that $R^{\pi_l}_{\infty} = E[\beta^{\Upsilon^{\pi_l}}]$. Since $\beta^{x}$ is a decreasing function of $x$, maximizing $R^{\pi_l}_{\infty}$ leads to minimizing the TTC in a discounted average sense⁵. Intuitively, the local reward $R^{\pi_l}_{\infty}$ represents the weight each unit of delivered traffic will contribute to the discounted total throughput $\lambda_{\beta}$, starting from the current round. Later we will show that local policies optimizing this reward will help to optimize the overall $\lambda_{\beta}$ under the DAS framework (see Section 5.1).

⁵ The convexity of $\beta^{x}$ implies that the mean TTC satisfies $E[\Upsilon^{\pi_l}] \geq \log R^{\pi_l}_{\infty} / \log \beta$.

4.2 Local Control Policy

The true state $x^{l}_t$ is not directly known to the controller since the node location $s_t$ is unknown. Instead, the controller observes another state $y^{l}_t$, which consists of the belief $b_t$ of $s_t$ ($b_t(s) := \Pr\{s_t = s \mid a^{l}_1, \ldots, a^{l}_{t-1}, o^{l}_1, \ldots, o^{l}_{t-1}\}$) and the indicator $\delta_t$. At the end of the slot, the new state $y^{l}_{t+1} = (b_{t+1}, \delta_{t+1})$ transits as⁶

$$b_{t+1} = P^{T}\big(z_t\, e_{s_t} + (1 - z_t)\, b_t \backslash a^{l}_t\big), \qquad \delta_{t+1} = \delta_t (1 - z_t), \tag{4}$$

where $e_{s_t}$ or $b_t \backslash a^{l}_t$ is the updated belief based on observation $z_t$, and multiplying it by the transition matrix $P^{T}$ gives the predicted belief for the next slot. The myopic search policy is the one that maximizes the probability of immediate contact:

$$\pi^{MY}_l(b) = \arg\max_{a^{l}} \sum_{s \in a^{l}} b(s), \tag{5}$$

i.e., the $k$ ferries will search the cells with the top $k$ probabilities in the belief.
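The belief update (4) and the myopic search rule (5) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, beliefs are plain lists indexed by cell, `P[i][j]` is assumed to be the probability of moving from cell `i` to cell `j`, and the posterior after a miss is assumed to be the usual Bayes update (zero the searched cells, renormalize the rest).

```python
def myopic_action(belief, k):
    """Eq. (5): pick the k cells with the largest belief (ties go to the lower index)."""
    order = sorted(range(len(belief)), key=lambda s: (-belief[s], s))
    return sorted(order[:k])


def miss_posterior(belief, action):
    """Posterior after a miss, b\\a: searched cells get zero mass, the rest is
    renormalized (assumed Bayes update)."""
    searched = set(action)
    rest = [0.0 if s in searched else p for s, p in enumerate(belief)]
    total = sum(rest)
    if total == 0.0:
        raise ValueError("a miss is impossible if the searched cells hold all the belief")
    return [p / total for p in rest]


def predict(P, belief):
    """One-step prediction P^T b, with P[i][j] = Pr{next cell = j | current cell = i}."""
    n = len(belief)
    return [sum(P[i][j] * belief[i] for i in range(n)) for j in range(n)]


def update_belief(P, belief, action, contact, node_cell=None):
    """Eq. (4): belief for the next slot after a contact (z = 1) or a miss (z = 0).
    On a contact, node_cell is the cell in which the node was found."""
    if contact:
        unit = [1.0 if s == node_cell else 0.0 for s in range(len(belief))]
        return predict(P, unit)
    return predict(P, miss_posterior(belief, action))
```

For example, with belief `[0.5, 0.3, 0.2]` and `k = 1`, `myopic_action` selects cell 0; if that search misses, the posterior before prediction is `[0.0, 0.6, 0.4]`.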

4.3 Performance of Local Control

For a given search policy $\pi$ (subscript $l$ is dropped for simplicity), let $\Upsilon^{\pi}$ denote the TTC under $\pi$ for a given initial belief $b_0$ and $k$ ferries. The distribution of the random variable $\Upsilon^{\pi}$ can be characterized as follows. Conditioned on the event that no contact has occurred before $t$ (i.e., the previous $t-1$ slots are misses), let $a^{\pi}_t$ and $b^{\pi}_t$ denote the action and the belief under policy $\pi$ in slot $t$, and $p^{\pi}_t$ the conditional probability of contact. By definition, we have $p^{\pi}_t = \sum_{s \in a^{\pi}_t} b^{\pi}_t(s)$, and $a^{\pi}_t$, $b^{\pi}_t$ can be computed from the following updates: starting from $b^{\pi}_1 = P^{T} b_0$,

$$a^{\pi}_t = \pi(b^{\pi}_t), \qquad b^{\pi}_{t+1} = P^{T}\, b^{\pi}_t \backslash a^{\pi}_t, \qquad t = 1, 2, \ldots \tag{6}$$

Based on $p^{\pi}_t$, it is easy to see that the distribution of $\Upsilon^{\pi}$ and the corresponding $R^{\pi}_{\infty}$ are given by

$$\Pr\{\Upsilon^{\pi} = t\} = p^{\pi}_t \prod_{j=1}^{t-1} (1 - p^{\pi}_j), \tag{7}$$

$$R^{\pi}_{\infty} = \sum_{t=1}^{\infty} \beta^{t} \Pr\{\Upsilon^{\pi} = t\} = \sum_{t=1}^{\infty} \beta^{t}\, p^{\pi}_t \prod_{j=1}^{t-1} (1 - p^{\pi}_j). \tag{8}$$

In particular, plugging in $a^{\pi}_t = \pi^{MY}_l(b^{\pi}_t)$ for $\pi^{MY}_l$ in (5) yields the reward of the myopic policy, $R^{MY}_{\infty}$.
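As a numerical illustration of (6)-(8), the sketch below evaluates the myopic search reward by iterating the miss-conditioned belief and truncating the infinite sum at a finite horizon. The function name, the list-based belief, the convention `P[i][j] = Pr{i -> j}`, and the renormalizing miss update are assumptions made for this example only.

```python
def myopic_reward(P, b0, k, beta, horizon):
    """Truncated evaluation of eq. (8) under the myopic search policy (5).
    Returns (reward, pmf) where pmf[t-1] approximates Pr{TTC = t}, eq. (7)."""
    n = len(b0)

    def step(b):
        # P^T b with P[i][j] = Pr{next = j | current = i}
        return [sum(P[i][j] * b[i] for i in range(n)) for j in range(n)]

    belief = step(b0)        # b_1 = P^T b_0
    survive = 1.0            # prod_{j<t} (1 - p_j): probability of no contact so far
    reward = 0.0
    pmf = []
    for t in range(1, horizon + 1):
        action = sorted(range(n), key=lambda s: (-belief[s], s))[:k]
        p = sum(belief[s] for s in action)       # conditional contact probability p_t
        pmf.append(survive * p)
        reward += beta ** t * survive * p
        if p >= 1.0:
            break                                 # contact is certain in this slot
        survive *= 1.0 - p
        # posterior after a miss (assumed Bayes update), then one-step prediction
        miss = [0.0 if s in action else belief[s] / (1.0 - p) for s in range(n)]
        belief = step(miss)
    return reward, pmf
```

With a memoryless uniform mobility model (`P[i][j] = 1/N` for all `i, j`), every slot has contact probability `k/N`, so the reward is the geometric sum `beta*k / (N*(1-beta) + beta*k)`; the truncated recursion reproduces this value up to the neglected tail.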

Although the above method will give the reward for any given policy, it does not indicate how close its performance is to the optimal. Characterization of the optimal reward $R^{*}_{\infty}$ is an open question, since it is computationally intractable to compute the optimal policy. Hence, we derive the following lower and upper bounds on the optimal reward. The lower bound comes naturally from the myopic policy. For the upper bound, consider the passive belief transitions without any observation: $b^{(t)} := (P^{T})^{t} b_0$. Let $B_{t,k} := \max_{|a| = k} \sum_{s \in a} b^{(t)}(s)$ be the sum of the $k$ largest elements in $b^{(t)}$. Define

$$\bar{R}_{\infty} := \beta^{T_0}\Big(1 - \sum_{t=1}^{T_0 - 1} B_{t,k}\Big) + \sum_{t=1}^{T_0 - 1} \beta^{t} B_{t,k}, \tag{9}$$

where $T_0 := \inf\{t \geq 1 : 1 - \sum_{j=1}^{t-1} B_{j,k} \leq B_{t,k}\}$. We have the following result.

Theorem 4.1. The reward $R^{*}_{\infty}$ of the optimal search policy satisfies: $R^{MY}_{\infty} \leq R^{*}_{\infty} \leq \bar{R}_{\infty}$.

⁶ Here $P^{T}$ denotes matrix transpose, $e_s$ the unit vector with one in the $s$th element, and $b \backslash a$ the posterior belief after a miss given by $b \backslash a(s) = 0$ for all $s \in a$ and

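To prove the theorem, we introduce the following lemmas; their proofs can be found in the Appendix.

Lemma 4.2. For any policy $\pi$, its reward $R^{\pi}_{\infty}$ is monotone increasing with the conditional contact probability $p^{\pi}_t$ for any $t$.

Lemma 4.3. For any $t$ and any $\pi$,

$$p^{\pi}_t \leq \frac{B_{t,k}}{\max\big(1 - \sum_{j=1}^{t-1} B_{j,k},\; B_{t,k}\big)} =: \bar{p}_t. \tag{10}$$

Proof: (Theorem 4.1) The lower bound holds trivially. For the upper bound, Lemma 4.2 implies that substituting $\bar{p}_t$ in Lemma 4.3 into (8) will give an upper bound on the reward:

$$R^{\pi}_{\infty} \leq \sum_{t=1}^{T_0} \beta^{t}\, \bar{p}_t \prod_{j=1}^{t-1} (1 - \bar{p}_j) = \beta^{T_0}\Big(1 - \sum_{t=1}^{T_0 - 1} B_{t,k}\Big) + \sum_{t=1}^{T_0 - 1} \beta^{t} B_{t,k} = \bar{R}_{\infty},$$

which implies $R^{*}_{\infty} \leq \bar{R}_{\infty}$ as the bound holds for any $\pi$. Note that $\prod_{j=1}^{t-1} (1 - \bar{p}_j) = 1 - \sum_{j=1}^{t-1} B_{j,k}$ for $t \leq T_0$, and $\bar{p}_{T_0} = 1$.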
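The upper bound (9) depends only on the passive beliefs $b^{(t)}$, so it can be evaluated without simulating any policy. The following sketch does this directly from the definitions of $B_{t,k}$ and $T_0$; the function name, the list representation, and the `max_steps` safety cap are assumptions of this example.

```python
def search_upper_bound(P, b0, k, beta, max_steps=10_000):
    """Evaluate the upper bound in eq. (9) from the passive beliefs b^(t) = (P^T)^t b0."""
    n = len(b0)
    belief = list(b0)
    used = 0.0     # sum_{j<t} B_{j,k}
    bound = 0.0    # sum_{j<t} beta^j B_{j,k}
    for t in range(1, max_steps + 1):
        # passive prediction: b^(t) = P^T b^(t-1), with P[i][j] = Pr{i -> j}
        belief = [sum(P[i][j] * belief[i] for i in range(n)) for j in range(n)]
        B = sum(sorted(belief, reverse=True)[:k])   # B_{t,k}: sum of the k largest entries
        if 1.0 - used <= B:                          # first such t is T_0
            return bound + beta ** t * (1.0 - used)
        bound += beta ** t * B
        used += B
    raise RuntimeError("T_0 was not reached within max_steps")
```

For a uniform belief over `N = 4` cells with `k = 1` and `beta = 0.9`, `T_0 = 4` and the bound evaluates to about `0.773775`, compared with about `0.6923` for the myopic reward under the memoryless uniform mobility model.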

Closed-form bounds can be obtained by further bounding $R^{MY}_{\infty}$ and $\bar{R}_{\infty}$ (see Appendix for proofs).

Corollary 4.4. For $k$ ferries and a domain of size $N$ (cells), $R^{MY}_{\infty} \geq \beta k / [N(1-\beta) + \beta k]$.

In general, there is no non-trivial closed-form upper bound, i.e., $R^{*}_{\infty}$ can approach $\beta$ arbitrarily closely. Under certain conditions, however, we have the following result.

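Corollary 4.5. If the initial belief is the steady-state distribution $b^{*}$ ($b^{*} = P^{T} b^{*}$), then

$$\bar{R}_{\infty} = \beta^{T_0}\big[1 - B_{*,k}(T_0 - 1)\big] + \frac{B_{*,k}(\beta - \beta^{T_0})}{1 - \beta}, \tag{11}$$

where $B_{*,k} := \max_{|a| = k} \sum_{s \in a} b^{*}(s)$ and $T_0 = \lceil B_{*,k}^{-1} \rceil$. Moreover, if $P$ is doubly stochastic⁷, then the above reduces to $\bar{R}_{\infty} \leq \beta k / [(1-\beta) N]$.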
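The step from (9) to (11) can be spelled out as follows. This is a sketch of one possible argument based only on the definitions above, since the Appendix proof is not reproduced here.

```latex
% Steady-state start: b^{(t)} = (P^T)^t b^* = b^*, hence B_{t,k} = B_{*,k} for all t.
\begin{align*}
T_0 &= \inf\{t \ge 1 : 1 - (t-1)B_{*,k} \le B_{*,k}\}
     = \inf\{t \ge 1 : t\,B_{*,k} \ge 1\}
     = \lceil B_{*,k}^{-1} \rceil, \\
\bar{R}_\infty
    &= \beta^{T_0}\bigl[1 - (T_0-1)B_{*,k}\bigr] + B_{*,k}\sum_{t=1}^{T_0-1}\beta^t
     = \beta^{T_0}\bigl[1 - (T_0-1)B_{*,k}\bigr] + \frac{B_{*,k}(\beta - \beta^{T_0})}{1-\beta}.
\end{align*}
% Doubly stochastic P: b^* is uniform, so B_{*,k} = k/N. By the definition of T_0,
% 1 - (T_0-1)B_{*,k} \le B_{*,k}, which gives
\begin{align*}
\bar{R}_\infty
    \le B_{*,k}\,\beta^{T_0} + \frac{B_{*,k}(\beta - \beta^{T_0})}{1-\beta}
    = \frac{B_{*,k}\,(\beta - \beta^{T_0+1})}{1-\beta}
    \le \frac{\beta k}{(1-\beta)N}.
\end{align*}
```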

Remark: For a large domain ($N \gg k$), Corollary 4.4 implies that $R^{*}_{\infty} \gtrsim \beta k / [(1-\beta) N]$, achievable by the myopic search policy. The worst case is when the node moves equally likely in all directions (i.e., $P$ is doubly stochastic) and the initial distribution is uniform, under which Corollary 4.5 says $\beta k / [(1-\beta) N]$ is also an upper bound. In this case, the optimal reward $R^{*}_{\infty} \approx \beta k / [(1-\beta) N]$. Note that the steady-state distribution $b^{*}$ is uniform if and only if $P$ is doubly stochastic.

⁷ That is, each column of $P$ also sums up to one.

Similar analysis can also be performed for the mean TTC $E[\Upsilon^{\pi}]$. Given a policy $\pi$, the mean TTC can be computed by $E[\Upsilon^{\pi}] = \sum_{t=1}^{\infty} t\, p^{\pi}_t \prod_{j=1}^{t-1} (1 - p^{\pi}_j)$. Analogous to Lemma 4.2, it can be shown that $E[\Upsilon^{\pi}]$ is monotone decreasing with each $p^{\pi}_t$. Accordingly, we can bound the minimum mean TTC by

$$\underline{E}[\Upsilon] \leq \min_{\pi} E[\Upsilon^{\pi}] \leq E[\Upsilon^{MY}], \tag{12}$$

where $E[\Upsilon^{MY}]$ is the mean TTC for the myopic policy, bounded by $E[\Upsilon^{MY}] \leq N/k$, and $\underline{E}[\Upsilon]$ is an analytical lower bound analogous to $\bar{R}_{\infty}$, given by $\underline{E}[\Upsilon] = T_0 (1 - \sum_{t=1}^{T_0 - 1} B_{t,k}) + \sum_{t=1}^{T_0 - 1} t\, B_{t,k}$. Under the conditions that $b_0 = b^{*}$ and $P$ is doubly stochastic, $\underline{E}[\Upsilon]$ can be relaxed to $\underline{E}[\Upsilon] \geq \frac{1}{2}(1 + \frac{N}{k})$. Note that the optimal policy that maximizes $R^{\pi}_{\infty}$ may not achieve the minimum $E[\Upsilon^{\pi}]$.
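As a quick sanity check of the relaxed bound, consider the special case where $N/k$ is an integer (an assumption made here only to keep the arithmetic short). The analytical lower bound then meets $\frac{1}{2}(1 + N/k)$ with equality:

```latex
% Uniform steady state: B_{t,k} = k/N for all t. Write m := N/k (assumed integer), so T_0 = m.
\begin{align*}
\underline{E}[\Upsilon]
  &= T_0\Bigl(1 - \sum_{t=1}^{T_0-1} B_{t,k}\Bigr) + \sum_{t=1}^{T_0-1} t\,B_{t,k}
   = m\Bigl(1 - \frac{m-1}{m}\Bigr) + \frac{1}{m}\cdot\frac{m(m-1)}{2} \\
  &= 1 + \frac{m-1}{2}
   = \frac{1}{2}\Bigl(1 + \frac{N}{k}\Bigr).
\end{align*}
```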

5. GLOBAL CONTROL: DISPATCH

The global control operates on top of local control at the macro time scale of rounds. Given local search policies, the goal of the global dispatch policy is to allocate the ferries among the domains at the beginning of each round to maximize the total throughput. In this section, we formulate the global control as another POMDP, based on which we develop an approximate myopic dispatch policy with a significantly lower complexity than the original and analyze its performance.

5.1 Global Control as a POMDP

The global control problem can be formulated as the following POMDP:

1. Global state: $x^{g}_{\tau} = (s_{\tau}, m_{\tau})$, where $s_{\tau} = (s_{\tau}(d))_{d=1}^{D}$ still denotes node locations in each domain, and $m_{\tau} = (m_{\tau}(d))_{d=1}^{D}$ denotes the buffer state of the BS, where $m_{\tau}(d)$ is the amount of traffic generated for domain $d$, all at the beginning of round $\tau$ (i.e., at time $(\Delta + 1)(\tau - 1)$);

2. Global action: $a^{g}_{\tau} = (a^{g}_{\tau}(d))_{d=1}^{D}$ represents the distribution of the ferries among the domains in round $\tau$ (i.e., $a^{g}_{\tau}(d) \in \mathbb{N}$ and $\sum_{d=1}^{D} a^{g}_{\tau}(d) = K$);

3. Global observation: $o^{g}_{\tau} = (\upsilon_{\tau}, b'_{\tau})$, where $\upsilon_{\tau} = (\upsilon_{\tau}(d))_{d=1}^{D}$ denotes the TTCs for each domain, and $b'_{\tau} = (b'_{\tau,d})_{d=1}^{D}$ the updated beliefs of node locations at the end of round $\tau$ (at time $(\Delta + 1)\tau$); if there is no contact with a domain, either because no ferry serves the domain ($a^{g}_{\tau}(d) = 0$) or because the ferries fail to contact the node within time ∆, define $\upsilon_{\tau}(d) := \infty$;

4. Global reward: the one-time reward is defined as $r_g(x^{g}, a^{g}, o^{g}) := \sum_{d=1}^{D} m(d)\, \beta^{\upsilon(d)}$, whereas the overall reward has a slightly different form from (1):

$$R^{\pi_g}_{\infty} := E\Big[\sum_{\tau=1}^{\infty} \beta^{(\Delta+1)(\tau-1)}\, r_g(x^{g}_{\tau}, a^{\pi_g}_{\tau}, o^{g}_{\tau})\Big]. \tag{13}$$

We now explain the meaning of this reward function.

Lemma 5.1. The global reward $R^{\pi_g}_{\infty}$ is equal to the discounted total throughput $\lambda_{\beta}$ (see Section 2.3) under the

Proof: For a domain $d$, delivery occurs in round $\tau$ if and only if there is a contact within ∆, each delivery containing $m_{\tau}(d)$ data (assuming the entire ferry buffer content can be delivered in one shot) at time $(\Delta + 1)(\tau - 1) + \upsilon_{\tau}(d)$. Its throughput is thus $E[\sum_{\tau=1}^{\infty} m_{\tau}(d) \cdot \beta^{(\Delta+1)(\tau-1) + \upsilon_{\tau}(d)}]$. Summing over all domains gives $R^{\pi_g}_{\infty} = \lambda_{\beta}$.

Remarks: Our control objective of maximizing $\lambda_{\beta}$ is thus converted into maximizing $R^{\pi_g}_{\infty}$ at the global level via ferry allocation $a^{g}_{\tau}$ and maximizing the weights $E[\beta^{\upsilon(d)}]$ at the local level via ferry navigation $a^{l}_t$ in each domain (under given $a^{g}_{\tau}$). Note that the latter is consistent with the local reward⁸ in Section 4.1.

Here the period ∆ is introduced to facilitate synchronization among ferries. Its selection involves a tradeoff: a smaller ∆ means lower probabilities of contact per round but more rounds during a given time, whereas a larger ∆ means higher probabilities of contact per round but fewer rounds. The exact value can be tuned to optimize the overall performance (see Fig. 8 and 11).

5.2 Global Control Policy

Let $y^{g}_{\tau}$ denote the observed state. Since one part ($s_{\tau}$) of the state $x^{g}_{\tau}$ is partially observable and the other part ($m_{\tau}$) is completely observable, we have a mixed state $y^{g}_{\tau} := (b_{\tau}, m_{\tau})$, where $b_{\tau} = (b_{\tau,d})_{d=1}^{D}$ are the beliefs of node locations $s_{\tau}$ as reported by the local controllers. At the end of each round, the new beliefs $b_{\tau+1}$ will go through a sequence of transitions depending on the actions and observations of the local controllers by (4). Instead of repeating those details, we simply consider the output $b'_{\tau}$ as part of the global observation such that $b_{\tau+1} = b'_{\tau}$. The BS buffer state $m_{\tau+1}$ transits differently for each domain depending on whether the domain receives service or not:

$$m_{\tau+1}(d) = \lambda_d(\Delta + 1) + m_{\tau}(d)\, I_{\upsilon_{\tau}(d) > \Delta}. \tag{14}$$
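The buffer transition (14) is a one-line update per domain. The sketch below is illustrative only; the function name is hypothetical, and a missed domain is represented by a TTC of `math.inf`, mirroring the convention $\upsilon_{\tau}(d) := \infty$.

```python
import math


def next_buffer(m, lam, delta, ttc):
    """Eq. (14): new traffic lam[d]*(delta+1) always arrives; the old backlog m[d]
    is kept only if domain d was not contacted within delta slots (ttc[d] > delta)."""
    return [
        lam[d] * (delta + 1) + (m[d] if ttc[d] > delta else 0.0)
        for d in range(len(m))
    ]


# Domain 0 is contacted in slot 2 (backlog cleared); domain 1 is missed (backlog kept).
print(next_buffer([4.0, 2.0], [1.0, 0.5], 3, [2, math.inf]))
```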

The global control is much harder to solve than the local control due to the large solution space with $|\mathcal{A}| = \binom{K+D-1}{K}$ actions. The myopic dispatch policy is given by:

$$\pi^{MY}_g(y^{g}) := \arg\max_{a^{g}} \sum_{d=1}^{D} m(d)\, E[\beta^{\upsilon(d)}]. \tag{15}$$

Let $\Upsilon(d)$ denote the TTC in domain $d$ without deadline constraint, whose distribution is given by (7) for initial node distribution $b_d$ and $a^{g}(d)$ ferries. By definition, we can compute $E[\beta^{\upsilon(d)}]$ as a truncated average $\sum_{t=1}^{\Delta} \beta^{t} \Pr\{\Upsilon(d) = t\}$.

The problem is that even the myopic policy can become complex if the action space is large. To address this issue, we look at simpler approximations. First, we note that the global reward is related to the local reward as follows.

Lemma 5.2. For each domain with $a^{g}(d) > 0$ and local control reward $R^{\pi_l}_d$, $R^{\pi_l}_d - \beta^{\Delta} \leq E[\beta^{\upsilon(d)}] \leq R^{\pi_l}_d$.

Proof: See Appendix.

The lemma indicates that for sufficiently large ∆, the myopic policy for global control will allocate the ferries to maximize the weighted sum of local control rewards $\sum_{d} m(d)\, R^{\pi_l}_d$. Assuming myopic search policies are used for local control, from the previous analysis (Corollary 4.4), we can replace $R^{\pi_l}_d$ by its lower bound $R^{\pi_l}_d \geq \beta a^{g}(d) / [(1-\beta) N_d + \beta a^{g}(d)]$, which gives an approximate myopic dispatch policy:

$$\pi^{MY}_g(y^{g}) \approx \arg\max_{a^{g}} \sum_{d=1}^{D} \frac{m(d)\, a^{g}(d)}{(1-\beta) N_d + \beta a^{g}(d)}. \tag{16}$$

⁸ A subtle difference is that $\upsilon(d)$ is the TTC truncated at ∆, although the difference will be small for large ∆; see Lemma 5.2.

A nice property of this approximation is that the function $f_d(a) := m(d)\, a / [(1-\beta) N_d + \beta a]$ is increasing and concave in $a$, i.e., extra ferries for a domain only provide diminishing gain compared with existing ferries. Because the marginal gains are non-increasing, instead of searching all $\binom{K+D-1}{K}$ possible values of $a^{g}$, we can sequentially dispatch one ferry at a time to maximize the reward gain among all the domains. This property implies the following algorithm for implementing (16). Starting from $a^{g} = 0$, repeat for $k = 1, \ldots, K$:

1. find $d^{*} = \arg\max_{d = 1, \ldots, D} f_d(a^{g}(d) + 1) - f_d(a^{g}(d))$;

2. update $a^{g}(d^{*}) \leftarrow a^{g}(d^{*}) + 1$.

Compared with the $O(\Delta D \binom{K+D-1}{K})$ complexity⁹ (per round) of the myopic dispatch policy, this approximation significantly reduces the complexity to $O(KD)$.
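The two-step procedure above translates directly into code. The following is a minimal sketch with a hypothetical function name: `m` and `N` are lists of buffer levels and domain sizes, and ties are broken in favor of the lowest domain index (a choice the text leaves open).

```python
def greedy_dispatch(m, N, K, beta):
    """Greedy implementation of the approximate myopic dispatch policy (16):
    hand out K ferries one at a time, each to the domain with the largest marginal gain."""
    D = len(m)

    def f(d, a):
        # f_d(a) = m(d) * a / ((1 - beta) * N_d + beta * a)
        return m[d] * a / ((1.0 - beta) * N[d] + beta * a)

    alloc = [0] * D
    for _ in range(K):
        # step 1: domain with the largest gain from one more ferry
        best = max(range(D), key=lambda d: f(d, alloc[d] + 1) - f(d, alloc[d]))
        # step 2: assign the ferry
        alloc[best] += 1
    return alloc


# Two identical domains and two ferries: diminishing returns split the ferries.
print(greedy_dispatch([10.0, 10.0], [5, 5], 2, 0.9))
```

Each of the `K` iterations scans the `D` domains once, which matches the stated $O(KD)$ cost.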

In the special case of large domains ($N_d \gg K$, $\forall d$), we can further simplify (16) into an asymptotically approximate myopic dispatch policy:

$$\pi^{MY}_g(y^{g}) \approx \arg\max_{a^{g}} \sum_{d=1}^{D} \frac{m(d)\, a^{g}(d)}{N_d}, \tag{17}$$

for which the optimal action is simply to dispatch all the ferries to domain $d^{*} = \arg\max_{d} m(d)/N_d$. In contrast, for smaller domains, the ferries may be split among multiple domains in a round. In fact, under (16), they will be dispatched to the same domain if and only if $\exists d$ such that $f_d(K) - f_d(K-1) \geq f_{d'}(1) - f_{d'}(0)$ for any $d' \neq d$.
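Because the objective in (17) is linear in the allocation, its maximizer puts every ferry on a single domain. A minimal sketch, with a hypothetical function name and ties broken by the lowest domain index:

```python
def asymptotic_dispatch(m, N, K):
    """Policy (17): send all K ferries to the domain with the largest backlog per cell."""
    best = max(range(len(m)), key=lambda d: m[d] / N[d])
    return [K if d == best else 0 for d in range(len(m))]


# Backlog-per-cell ratios are 0.30, 0.25 and about 0.33, so domain 2 gets all ferries.
print(asymptotic_dispatch([3.0, 1.0, 2.0], [10, 4, 6], 5))
```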

5.3

Performance of Global Control

To analyze the performance of global control, it is necessary to characterize the evolution of buffer states $\{m_\tau\}_{\tau=1}^{\infty}$. For each domain $d$, if $q_\tau(d)$ denotes the delivery probability in round $\tau$, then we see from (14) that $\{m_\tau(d)\}_{\tau=1}^{\infty}$ follows a Markovian-like transition diagram in Fig. 3. However, it is not a real Markov chain, as $q_\tau(d)$ depends on the node beliefs and the policies, making it intractable for analysis. In this section, we derive bounds for the closed-form dispatch policy (17) to obtain insights under the assumption of $N_d \ge K$, $\forall d$.

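[Transition diagram over the states $\lambda_d\Delta$, $2\lambda_d\Delta$, $3\lambda_d\Delta$, ..., with transitions labeled $q_\tau(d)$ and $1 - q_\tau(d)$.]

Figure 3: Evolution of buffer state $m_\tau(d)$.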

Under this policy, the delivery probability $q_\tau(d)$ and the average immediate reward $E[\beta^{\upsilon_\tau(d)}]$ can be bounded in closed form as follows.

Lemma 5.3. Under myopic search policies and the dispatch policy in (17), let $d_\tau := \arg\max_d m_\tau(d)/N_d$ denote the domain served in round $\tau$. We have

\[ q_\tau(d_\tau) \ge 1 - \Big(1 - \frac{K}{N_{d_\tau}}\Big)^{\Delta} =: \underline{q}_\tau, \tag{18} \]

\[ E[\beta^{\upsilon_\tau(d_\tau)}] \ge \frac{\beta K \Big[1 - \beta^{\Delta}\Big(1 - \frac{K}{N_{d_\tau}}\Big)^{\Delta}\Big]}{N_{d_\tau} - \beta(N_{d_\tau} - K)}. \tag{19} \]

⁹ By precomputing $E[\beta^{\upsilon(d)} \mid a_g(d) = k]$ for each $d = 1, \ldots, D$ and $k = 1, \ldots, K$, we can reduce the complexity to $O\big(DK\Delta + D\binom{K+D-1}{K}\big)$.

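These bounds suggest the following evolution of $m_\tau$:¹⁰

\[ m_{\tau+1} = \begin{cases} m_\tau + \lambda(\Delta+1) - m_\tau(d_\tau)\, e_{d_\tau} & \text{w.p. } \underline{q}_\tau, \\ m_\tau + \lambda(\Delta+1) & \text{o.w.} \end{cases} \tag{20} \]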

Since $d_\tau$ is a function of $m_\tau$, we see that the process $\{m_\tau\}_{\tau=1}^{\infty}$ following evolution (20) is a Markov chain. This chain also determines the reward: since the expected immediate reward is $m_\tau(d_\tau) E[\beta^{\upsilon_\tau(d_\tau)}]$, applying (19) gives a lower bound on the reward

\[ r(m_\tau) := \frac{\beta K\, m_\tau(d_\tau) \Big[1 - \beta^{\Delta}\Big(1 - \frac{K}{N_{d_\tau}}\Big)^{\Delta}\Big]}{N_{d_\tau} - \beta(N_{d_\tau} - K)}, \tag{21} \]

which is only a function of $m_\tau$. These results lead to the following performance bound.

Theorem 5.4. Under myopic search policies and the approximate myopic dispatch policy in (17), the discounted total throughput is lower bounded by

\[ R^{\pi_g}_\infty \ge E\Big[\sum_{\tau=1}^{\infty} \beta^{(\Delta+1)(\tau-1)}\, r(m_\tau) \,\Big|\, m_1 = 0\Big] =: \underline{R}^{\pi_g}_\infty, \tag{22} \]

where the expectation is over the Markov chain $\{m_\tau\}_{\tau=1}^{\infty}$ specified in (20).

Generally, (22) does not have a closed-form solution as it depends on the transient statistics of the Markov chain. Hence, we develop a numerical method to approximate it to arbitrary accuracy. Let $\mathcal{M}_\tau := \{(m_\tau^{(i)}, p_\tau^{(i)})\}$ denote the set of all possible values of $m_\tau$, each paired with the probability $p_\tau$ of reaching it from the initial buffer state $m_1$. Then Algorithm 1 computes a $T$-step approximation of $\underline{R}^{\pi_g}_\infty$ by computing $\mathcal{M}_\tau$ ($\tau = 1, \ldots, T+1$) iteratively. In each iteration (lines 2–8), it enumerates all elements of $\mathcal{M}_\tau$ (line 4), computes the possible next states and their probabilities in $\mathcal{M}_{\tau+1}$ (line 7), and accumulates the corresponding reward (line 8). Obviously, $\underline{R}^{\pi_g}_T$ is a lower bound of $\underline{R}^{\pi_g}_\infty$. On the other hand, it is easy to see that assuming immediate delivery for every domain in rounds $\tau > T$ gives an upper bound:

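\[ \underline{R}^{\pi_g}_\infty - \underline{R}^{\pi_g}_T \le \beta^{(\Delta+1)T} \sum_{(m_{T+1},\, p_{T+1}) \in \mathcal{M}_{T+1}} p_{T+1} \Big(\sum_{d=1}^{D} m_{T+1}(d)\Big) + \frac{\beta^{(\Delta+1)(T+1)}(\Delta+1)}{1 - \beta^{\Delta+1}} \Big(\sum_{d=1}^{D} \lambda_d\Big). \tag{23} \]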

Replacing line 2 by a stopping rule requiring the above bound $\le \epsilon$ for a constant $\epsilon > 0$ will guarantee the resulting $\underline{R}^{\pi_g}_T$ to be $\epsilon$-close to $\underline{R}^{\pi_g}_\infty$. Note that the size of $\mathcal{M}_\tau$ grows exponentially as $|\mathcal{M}_\tau| = 2^{\tau-1}$, and thus the accuracy of the above approximation will be limited by the complexity constraint.

¹⁰ Here “w.p.” stands for “with probability” and “o.w.” for “otherwise”.

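Algorithm 1 Evaluate Global Reward

Require: Domain sizes $(N_d)_{d=1}^{D}$, data rate $\lambda$, number of ferries $K$, round length $\Delta$, discount factor $\beta$, initial BS buffer state $m_1$, and horizon $T$.

Ensure: Return the $T$-step approximated reward lower bound $\underline{R}^{\pi_g}_T$.

1: $\mathcal{M}_1 \leftarrow \{(m_1, 1)\}$, $\underline{R}^{\pi_g}_0 = 0$
2: for $\tau = 1$ to $T$ do
3:    $\mathcal{M}_{\tau+1} \leftarrow \emptyset$, $\underline{R}^{\pi_g}_\tau \leftarrow \underline{R}^{\pi_g}_{\tau-1}$
4:    for all $(m_\tau, p_\tau) \in \mathcal{M}_\tau$ do
5:       $d_\tau \leftarrow \arg\max_d m_\tau(d)/N_d$
6:       $\underline{q}_\tau \leftarrow 1 - (1 - K/N_{d_\tau})^{\Delta}$
7:       add $(m_\tau + \lambda(\Delta+1) - m_\tau(d_\tau) e_{d_\tau},\ p_\tau \underline{q}_\tau)$ and $(m_\tau + \lambda(\Delta+1),\ p_\tau(1 - \underline{q}_\tau))$ to $\mathcal{M}_{\tau+1}$
8:       $\underline{R}^{\pi_g}_\tau \leftarrow \underline{R}^{\pi_g}_\tau + \beta^{(\Delta+1)(\tau-1)} p_\tau\, r(m_\tau)$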
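A minimal Python sketch of Algorithm 1 is given below, under a few assumptions that are ours rather than the paper's: the function name `evaluate_global_reward` is hypothetical, buffer states are stored as tuples in a dictionary so that identical states are merged (which leaves the sum unchanged but can shrink the enumeration), ties in the arg max go to the lowest domain index, and the function additionally returns the right-hand side of (23) so that a caller can apply the stopping rule.

```python
from typing import Dict, Sequence, Tuple


def evaluate_global_reward(
    n: Sequence[int], lam: Sequence[float], k: int, delta: int,
    beta: float, m1: Sequence[float], horizon: int,
) -> Tuple[float, float]:
    """Return (T-step reward lower bound, gap bound from (23))."""
    num_domains = len(n)
    arrivals = [rate * (delta + 1) for rate in lam]  # traffic added per round

    def reward(m: Tuple[float, ...], d: int) -> float:
        # r(m) in (21) for the served domain d.
        miss = 1.0 - k / n[d]
        return beta * k * m[d] * (1.0 - (beta * miss) ** delta) / (n[d] - beta * (n[d] - k))

    states: Dict[Tuple[float, ...], float] = {tuple(m1): 1.0}
    total = 0.0
    for tau in range(1, horizon + 1):
        nxt: Dict[Tuple[float, ...], float] = {}
        discount = beta ** ((delta + 1) * (tau - 1))
        for m, p in states.items():
            d = max(range(num_domains), key=lambda i: m[i] / n[i])  # line 5
            q = 1.0 - (1.0 - k / n[d]) ** delta                      # line 6
            missed = tuple(m[i] + arrivals[i] for i in range(num_domains))
            served = tuple(arrivals[i] if i == d else missed[i] for i in range(num_domains))
            for state, prob in ((served, p * q), (missed, p * (1.0 - q))):  # line 7
                nxt[state] = nxt.get(state, 0.0) + prob
            total += discount * p * reward(m, d)                     # line 8
        states = nxt

    # Right-hand side of (23), evaluated on the final state set M_{T+1}.
    gap = beta ** ((delta + 1) * horizon) * sum(p * sum(m) for m, p in states.items())
    gap += (beta ** ((delta + 1) * (horizon + 1)) * (delta + 1)
            / (1.0 - beta ** (delta + 1)) * sum(lam))
    return total, gap


if __name__ == "__main__":
    print(evaluate_global_reward([25, 25], [1.0, 1.0], 5, 3, 0.9, [0.0, 0.0], 10))
```

A caller can increase `horizon` until the returned gap falls below a chosen tolerance, which mirrors the stopping rule described in the text.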

6. SIMULATION AND COMPARISON WITH PREDETERMINED CONTROL

Having developed the DAS policies, we now evaluate their performance against appropriate benchmarks. A benchmark of particular interest is the class of predetermined control policies specified by fixed ferry routes. In the sequel, we will derive the optimal predetermined policies for local and global control and compare them with the proposed policies.

6.1 Local Control

For the predetermined local controller, knowledge of node location is fixed at the steady-state distribution $b^*$. Thus, to maximize the chance of contact, the best policy for $k$ ferries is to wait for the node at the $k$ most probable cells $s^{(i)}$ ($i = 1, \ldots, k$) such that $b^*(s^{(1)}) \ge b^*(s^{(2)}) \ge \ldots$. We will call this the waiting policy.

We now compare the waiting policy with the myopic search policy. Suppose the node follows a 2-D random walk on a grid with the level of mobility controlled by an activeness parameter $\alpha := \sum_{j \ne i} P(i,j)$ (i.e., probability of moving to a different cell), starting from the steady-state distribution (uniform distribution). As illustrated in Fig. 4–5, the performance of both policies improves (reward increases while TTC decreases) as the number of ferries $k$ increases, and the myopic policy clearly outperforms the waiting policy. The performance gap, however, varies depending on the level of mobility, with a larger gap in low mobility cases. Intuitively, this is because the waiting policy relies on node mobility to create contact opportunities, whereas the myopic policy will actively search for the node and is thus less affected. We also plot the lower and upper bounds on the myopic reward (Corollaries 4.4–4.5) and the corresponding bounds on its mean TTC; we note that both bounds track the actual performance consistently and that the bounds are fairly tight.
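To make the activeness parameter concrete, the sketch below builds one possible transition matrix for such a walk. The boundary handling is not specified in the text, so this example assumes a wraparound (torus) grid in which the node stays put with probability $1 - \alpha$ and moves to each of its four neighbors with probability $\alpha/4$; under that assumption the matrix is doubly stochastic, which is consistent with the uniform steady state mentioned above. The function name `torus_random_walk` is hypothetical.

```python
from typing import List


def torus_random_walk(side: int, alpha: float) -> List[List[float]]:
    """Transition matrix P for a random walk on a side x side wraparound grid.

    Assumption (not from the paper): the node stays with probability 1 - alpha
    and otherwise moves to one of its four neighbors uniformly at random.
    """
    n = side * side
    P = [[0.0] * n for _ in range(n)]
    for r in range(side):
        for c in range(side):
            i = r * side + c
            P[i][i] += 1.0 - alpha
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                j = ((r + dr) % side) * side + (c + dc) % side
                P[i][j] += alpha / 4.0
    return P


if __name__ == "__main__":
    P = torus_random_walk(side=5, alpha=0.1)  # N = 25 cells, low mobility
    # Activeness of cell 0: total probability of leaving the cell.
    print(sum(P[0][j] for j in range(len(P)) if j != 0))
```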

We have also conducted similar simulations under biased mobility (Fig. 6–7), modeled by a localized random walk with tightness parameter $\eta$ [10]. This model allows non-uniform transition probabilities $P(i,j) \propto e^{-\eta\|j-h\|}$ biased toward a home cell $h$ (if $\eta \ge 0$), where $\|j-h\|$ denotes the taxicab distance between cells $j$ and $h$, and $\eta$ controls the level of bias ($\eta = 0$ for standard random walk). In the simulations, the home cell was at the center of the grid. The results indicate that performance trends and comparisons are similar to those under symmetric mobility, but with a smaller gap between the two policies, as biased mobility provides more information to the waiting policy through its non-uniform steady-state distribution. The analytical bounds are looser in these cases.

Figure 4: Myopic search policy vs. waiting policy: low-mobility random walk ($\alpha = 0.1$, $\eta = 0$, $N = 25$, $\beta = 0.9$, 5000 Monte Carlo runs). Panels: (a) local reward $R^{\pi_l}_\infty$ vs. $k$; (b) mean TTC $E[\Upsilon^{\pi_l}]$ vs. $k$. Curves: waiting, myopic, upper bound, lower bound.

Figure 5: Myopic search policy vs. waiting policy: high-mobility random walk ($\alpha = 0.9$, $\eta = 0$, rest as in Fig. 4). Panels and curves as in Fig. 4.

Figure 6: Myopic search policy vs. waiting policy: low-mobility, localized random walk ($\alpha = 0.1$, $\eta = 0.5$, rest as in Fig. 4). Panels and curves as in Fig. 4.

In all of the above simulation results, we observe that the reward and the TTC metrics both improve monotonically as the number of ferries increases, but the marginal gain becomes smaller, particularly in the case of TTC; this is in accordance with the theoretical results developed in Section 4.3.

6.2 Global Control

The problem of predetermined global control is analogous to that of dynamic global control in Section 5.1 except that the beliefs of node locations are fixed at their steady states, and the BS buffer state is replaced by its expectation $\bar{m}_\tau$. For fair comparison with the myopic dispatch policy (15), we consider the following myopic predetermined dispatch policy:

\[ \pi^0_g(\bar{m}) = \arg\max_{a_g} \sum_{d=1}^{D} \bar{m}(d)\, E[\beta^{\upsilon^0(d)}], \tag{24} \]

where $\upsilon^0(d)$ is the TTC in domain $d$ under the waiting policy, starting from the steady-state distribution. After one round, the expected buffer state will evolve as $\bar{m}_{\tau+1}(d) = \lambda_d(\Delta+1) + \bar{m}_\tau(d) \Pr\{\upsilon^0(d) > \Delta\}$. A crucial difference between (24) and (15) is that $E[\beta^{\upsilon^0(d)}]$ and $\Pr\{\upsilon^0(d) > \Delta\}$ are fixed for any given $a_g(d)$. Therefore, given the initial buffer state $\bar{m}_1 = m_1$, the entire sequences of $\{\bar{m}_\tau\}_{\tau=1}^{\infty}$ and $\{a_{g,\tau}\}_{\tau=1}^{\infty}$ are determined.

Figure 7: Myopic search policy vs. waiting policy: high-mobility, localized random walk ($\alpha = 0.9$, $\eta = 0.5$, rest as in Fig. 4). Panels: (a) local reward $R^{\pi_l}_\infty$ vs. $k$; (b) mean TTC $E[\Upsilon^{\pi_l}]$ vs. $k$. Curves: waiting, myopic, upper bound, lower bound.

To complete the policy specification, we need to evaluate $E[\beta^{\upsilon^0(d)}]$ and $\Pr\{\upsilon^0(d) > \Delta\}$ for a given $a_g(d) = k \in \{1, \ldots, K\}$. Let $H_k$ (index $d$ is dropped for simplicity) denote the hitting time for the $k$ most probable cells $\{s^{(i)}\}_{i=1}^{k}$ in the steady state $b^*$, starting from $b^*$. Then $E[\beta^{\upsilon^0(d)}] = \sum_{t=1}^{\Delta} \beta^t \Pr\{H_k = t\}$ and $\Pr\{\upsilon^0(d) > \Delta\} = \Pr\{H_k > \Delta\}$. By analysis similar to that in Section 4.3, we have $\Pr\{H_k = t\} = p^0_t \prod_{j=1}^{t-1}(1 - p^0_j)$, where $p^0_t$ is the conditional contact probability given by $p^0_t = \sum_{i=1}^{k} b^0_t(s^{(i)})$, for $b^0_1 = b^*$ and $b^0_{t+1} = P^T b^0_{t \setminus \{s^{(i)}\}_{i=1}^{k}}$.
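The recursion above can be sketched numerically as follows. This is an illustration under stated assumptions: we read $b_{t \setminus \{s^{(i)}\}}$ as the belief conditioned on no contact at the waiting cells (their entries set to zero and the rest renormalized), the function name `waiting_policy_stats` is hypothetical, and ties among equally probable cells are broken by index.

```python
from typing import List, Sequence, Tuple


def waiting_policy_stats(
    P: Sequence[Sequence[float]], b_star: Sequence[float],
    k: int, delta: int, beta: float,
) -> Tuple[float, float]:
    """Return (E[beta^v0], Pr{v0 > delta}) for k ferries waiting at fixed cells."""
    n = len(b_star)
    # The k most probable cells under the steady state (ties broken by index).
    cells = set(sorted(range(n), key=lambda s: -b_star[s])[:k])
    belief: List[float] = list(b_star)   # b_1^0 = b*
    survive = 1.0                        # prod_{j < t} (1 - p_j^0)
    reward = 0.0
    for t in range(1, delta + 1):
        p_t = sum(belief[s] for s in cells)   # conditional contact probability
        reward += beta ** t * survive * p_t   # beta^t * Pr{H_k = t}
        survive *= 1.0 - p_t
        if p_t >= 1.0:
            break  # contact is certain; no later slot contributes
        # Condition on a miss, then apply one mobility step (P^T b).
        post = [0.0 if s in cells else belief[s] / (1.0 - p_t) for s in range(n)]
        belief = [sum(post[i] * P[i][j] for i in range(n)) for j in range(n)]
    return reward, survive


if __name__ == "__main__":
    # Toy chain: the node jumps to a uniformly random cell in every slot.
    uniform = [[0.25] * 4 for _ in range(4)]
    print(waiting_policy_stats(uniform, [0.25] * 4, k=1, delta=3, beta=0.9))
```

In the toy chain the conditional contact probability stays at k/N in every slot, so the output coincides with the closed forms in (18) and (19).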

We now compare the performance of the above predetermined policy with that of the myopic dispatch policy (15) and its approximations (16–17). We assume 2-D random walk for each domain with identical size, activeness, and traffic rate. We first evaluate the performance of these policies for different values of the round length $\Delta$; see Fig. 8 (“approx myopic 1” for (16) and “approx myopic 2” for (17)), where for each $\Delta$, we simulate the policies for enough time ($\ge 100$ slots) to approximate the total reward $R^{\pi_g}_\infty$. Interestingly, although it takes much longer to guarantee a contact (at least 5 slots even if nodes do not move), all the dispatch policies achieve their best performance at a smaller $\Delta$: $\Delta = 3$ for the myopic policy (15) and its approximation (16), $\Delta = 2$ for its other approximation (17), and $\Delta = 1$ for the predetermined policy (24).

Based on the above results, we compare the policies under their best $\Delta$ with respect to (wrt) the discounted throughput $\lambda_{\beta,T} := E[\sum_{t=1}^{T} N(t)\beta^t]$ (Fig. 9), where $N(t)$ denotes the amount of traffic delivered in slot $t$, and wrt the undiscounted throughput $\lambda_{\beta=1,T}$ (Fig. 10). The myopic policy significantly outperforms the predetermined policy on both metrics (by 125% on $\lambda_{\beta,T}$ and by 47% on $\lambda_{1,T}$). Moreover, the two approximations provide close-to-myopic performance at much reduced complexity, and (16) even outperforms the myopic policy on $\lambda_{\beta,T}$. This shows that the myopic policy is suboptimal for global control. We also evaluate an $\epsilon$-approximation (with $\epsilon = 0.25$) of the analytical bound in (22) by Algorithm 1 (“myopic lower bound”), which gives a conservative lower bound for myopic policies.
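For concreteness, the throughput metric can be estimated from simulation traces as sketched below. The function names and the representation of a run as a list of per-slot delivered amounts $N(1), \ldots, N(T)$ are assumptions made for this example only.

```python
from typing import Sequence


def discounted_throughput(deliveries: Sequence[float], beta: float) -> float:
    """Sum of N(t) * beta^t over t = 1..T for one simulated run."""
    return sum(n_t * beta ** t for t, n_t in enumerate(deliveries, start=1))


def mean_discounted_throughput(runs: Sequence[Sequence[float]], beta: float) -> float:
    """Monte Carlo estimate of lambda_{beta,T}: average over independent runs."""
    return sum(discounted_throughput(run, beta) for run in runs) / len(runs)


if __name__ == "__main__":
    runs = [[0, 4, 0, 2], [0, 0, 6, 0]]
    print(mean_discounted_throughput(runs, beta=0.5))  # discounted metric
    print(mean_discounted_throughput(runs, beta=1.0))  # undiscounted metric
```

Setting `beta=1.0` recovers the undiscounted throughput, so the same routine covers both metrics compared in the text.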

Figure 8: Optimizing $\Delta$ ($D = 2$, $N_d \equiv 25$, $\alpha_d \equiv 0.5$, $\eta_d \equiv 0$, $K = 5$, $\lambda_d \equiv 1$, $\beta = 0.9$, 500 Monte Carlo runs). Axes: $R^{\pi_g}_\infty$ vs. $\Delta$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined.

Figure 9: Myopic vs. predetermined dispatch policy: discounted throughput ($\Delta$ optimized by Fig. 8, rest as in Fig. 8). Axes: $\lambda_{\beta,T}$ vs. $T$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined, myopic lower bound.

Figure 10: Myopic vs. predetermined dispatch policy: undiscounted throughput (same as Fig. 9). Axes: $\lambda_{1,T}$ vs. $T$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined.

of different mobility parameters and traffic rates for each domain under the localized random walk model defined in Section 6.1; see Fig. 11–13 (the optimal $\Delta$ is 2 for the myopic policies and 1 for the predetermined policy). The results show similar trends, except that the performance gap between the myopic policies and the predetermined policy shrinks (the myopic policy outperforms the predetermined policy by 68% on $\lambda_{\beta,T}$ and by 26% on $\lambda_{1,T}$). Intuitively, this is because in heterogeneous cases, prior knowledge such as domain sizes, mobility parameters, and traffic rates already provides good distinction between domains, which helps the predetermined dispatch policy just as biased mobility helps the waiting policy in local control.

7. CONCLUSION

We have considered the control of multiple data ferries in partitioned wireless networks. Compared with the literature, our solution deals with a more challenging case of stochastically moving nodes and limited ferry-node and ferry-ferry communication ranges. Using the approach of stochastic control, we develop a fully dynamic solution via the hierarchical policy of “dispatch” and “search”, which only requires local cooperation between ferries and provides substantial performance improvements over existing predetermined control in high uncertainty scenarios (e.g., uniform node steady-state distributions, identical partitions).

Although we have focused on data dissemination in presenting the detailed solution, our approach is applicable to other communication scenarios as well. Under other communication modes, the local control remains the same whereas the global control needs to be modified. For example, a global POMDP similar to that in Section 5.1, with $m_\tau(d)$ denoting the traffic generated by domain $d$ and reward $r_g(x_g, a_g, o_g) := \sum_{d=1}^{D} m(d)\, \beta^{\Delta+1} I_{\upsilon(d) \le \Delta}$, corresponds to data harvesting (with delayed pickup). A more complicated variation can model peer-to-peer communication via the relay of BS by tracking the buffer states of each source domain and the BS with a set of state variables and using the BS buffer state $m_\tau$ to compute the reward $r_g(x_g, a_g, o_g)$ as in Section 5.1. Detailed studies are left to future work.

8. REFERENCES

[1] G. D. Celik and E. Modiano. Dynamic vehicle routing for data gathering in wireless networks. In IEEE CDC, December 2010.

[2] T. He, K.-W. Lee, and A. Swami. Flying in the dark: Controlling autonomous data ferries with partial observations. In ACM MobiHoc, 2010.

[3] D. Henkel and T. Brown. On controlled node mobility in delay-tolerant networks of unmanned aerial vehicles. In ISART, 2006.

[4] D. Henkel and T. Brown. Towards autonomous data ferry route design through reinforcement learning. In IEEE/ACM WoWMoM, 2008.

[5] D. Jea, A. Somasundara, and M. Srivastava. Multiple controlled mobile elements (data mules) for data collection in sensor networks. In DCOSS’05.

[6] V. Kavitha and E. Altman. Analysis and design of message ferry routes in sensor networks using polling models. In IEEE WiOpt, May 2010.

[7] C. Papadimitriou and J. Tsitsiklis. The complexity of Markov decision processes. Math. of Operations Research, 1987.

[8] E. Sondik. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. OR, 1978.

[9] M. Tariq, M. Ammar, and E. Zegura. Message ferry route design for sparse ad hoc networks with mobile nodes. In ACM MobiHoc, 2006.

[10] B. Walker, T. Clancy, and J. Glenn. Using localized random walks to model delay-tolerant networks. In IEEE MILCOM, 2008.

[11] J. Wu, S. Yang, and F. Dai. Logarithmic store-carry-forward routing in mobile ad hoc networks. IEEE Trans. PDS, 2007.

[12] Z. Zhang and Z. Fei. Route design for multiple ferries in delay tolerant networks. In IEEE WCNC, 2007.

[13] W. Zhao, M. Ammar, and E. Zegura. Controlling the mobility of multiple data transport ferries in a delay-tolerant network. In IEEE INFOCOM’05.

[14] W. Zhao, M. Ammar, and E. Zegura. A message ferrying approach for data delivery in sparse mobile ad hoc networks. In ACM MobiHoc, 2004.

APPENDIX

Proof of Lemma 4.2

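Based on (8), we have

\[ \frac{\partial R^{\pi}_\infty}{\partial p^{\pi}_t} = \beta^t \prod_{j=1}^{t-1}(1 - p^{\pi}_j) - \sum_{s=t+1}^{\infty} \beta^s p^{\pi}_s \prod_{j=1,\, j \ne t}^{s-1} (1 - p^{\pi}_j) = \Big[\prod_{j=1}^{t-1}(1 - p^{\pi}_j)\Big] \Big(\beta^t - E[\beta^{\Upsilon^{\pi}} \mid \Upsilon^{\pi} > t]\Big) \ge 0. \]

Therefore, $R^{\pi}_\infty$ is increasing in $p^{\pi}_t$.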

Figure 11: Optimizing $\Delta$ ($D = 3$, $N := (N_d)_{d=1}^{D} = (16, 25, 36)$, $\alpha = (0.9, 0.5, 0.1)$, $\eta = (0, 0.2, 0.5)$, $K = 5$, $\lambda = (1, 0.75, 0.5)$, $\beta = 0.9$, 500 Monte Carlo runs). Axes: $R^{\pi_g}_\infty$ vs. $\Delta$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined.

Figure 12: Myopic vs. predetermined dispatch policy: discounted throughput ($\Delta$ optimized by Fig. 11, rest as in Fig. 11). Axes: $\lambda_{\beta,T}$ vs. $T$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined, myopic lower bound.

Figure 13: Myopic vs. predetermined dispatch policy: undiscounted throughput (same as Fig. 12). Axes: $\lambda_{1,T}$ vs. $T$. Curves: myopic, approx myopic 1, approx myopic 2, predetermined.

Proof of Lemma 4.3

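First, we prove by induction that (recall $b^{(t)} = (P^T)^t b_0$)

\[ b^{\pi}_t(i) \le \frac{b^{(t)}(i)}{1 - \sum_{j=1}^{t-1} \sum_{s \in a^{\pi}_j} b^{(j)}(s)}, \quad \forall i \in S,\ t \ge 1. \tag{25} \]

For $t = 1$, $b^{\pi}_1 = P^T b_0 = b^{(1)}$. For $t > 1$,

\[ b^{\pi}_t(i) = \frac{\sum_{j \notin a^{\pi}_{t-1}} b^{\pi}_{t-1}(j)\, P_{j,i}}{1 - \sum_{s \in a^{\pi}_{t-1}} b^{\pi}_{t-1}(s)} \le \frac{b^{(t)}(i)}{1 - \sum_{j=1}^{t-1} \sum_{s \in a^{\pi}_j} b^{(j)}(s)}, \]

obtained by applying (25) for $b^{\pi}_{t-1}(j)$. This proves (25).

Then, we apply the above to $p^{\pi}_t = \sum_{s \in a^{\pi}_t} b^{\pi}_t(s)$:

\[ p^{\pi}_t \le \frac{\sum_{s \in a^{\pi}_t} b^{(t)}(s)}{1 - \sum_{j=1}^{t-1} \sum_{s \in a^{\pi}_j} b^{(j)}(s)} \le \bar{p}_t \]

for $\bar{p}_t$ defined as in (10).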

Proof of Corollary 4.4

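The proof is based on the fact that $p^{MY}_t \ge k/N$, $\forall t$. By Lemma 4.2, we will lower bound $R^{MY}_\infty$ by replacing $p^{MY}_t$ with $k/N$, which yields

\[ R^{MY}_\infty \ge \sum_{t=1}^{\infty} \beta^t \frac{k}{N} \Big(1 - \frac{k}{N}\Big)^{t-1} = \frac{\beta k}{N(1-\beta) + \beta k}. \]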
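For readers who want the intermediate step, the closed form follows from a geometric series. The shorthand $x := \beta(1 - k/N)$ is introduced here for illustration only and is not notation from the paper; the derivation assumes $0 < \beta < 1$ so that the series converges.

```latex
\begin{align*}
\sum_{t=1}^{\infty} \beta^{t}\,\frac{k}{N}\Bigl(1-\frac{k}{N}\Bigr)^{t-1}
  &= \frac{\beta k}{N}\sum_{t=0}^{\infty} x^{t},
     \qquad x := \beta\Bigl(1-\frac{k}{N}\Bigr) \in [0,1) \\
  &= \frac{\beta k}{N}\cdot\frac{1}{1-x}
   = \frac{\beta k}{N-\beta(N-k)}
   = \frac{\beta k}{N(1-\beta)+\beta k}.
\end{align*}
```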

Proof of Corollary 4.5

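If $b_0 = b^*$, then $b^{(t)} \equiv b^*$, $\forall t$. Accordingly, $B_{t,k} \equiv B_{*,k}$, and $T_0 = \lceil B_{*,k}^{-1} \rceil$. Plugging these into (9) gives (11).

If, in addition, $P$ is doubly stochastic, then it is known that $b^*$ is uniform, i.e., $B_{*,k} = k/N$, applying which to (11) gives

\[ R_\infty = \beta^{\lceil N/k \rceil} \Big(1 - \frac{k}{N}\Big(\Big\lceil \frac{N}{k} \Big\rceil - 1\Big)\Big) + \frac{k\,(\beta - \beta^{\lceil N/k \rceil})}{N(1-\beta)}. \tag{26} \]

If we maximize the right-hand side of (26) with respect to $\lceil N/k \rceil$, calculation shows that the maximum is achieved at $\lceil N/k \rceil = \frac{N}{k}$, $\frac{N}{k} + 1$, or $\frac{N}{k} - \frac{1}{1-\beta} - \frac{1}{\log\beta} + 1$. At the first two values, the right-hand side is

\[ \frac{k\beta(1 - \beta^{N/k})}{N(1-\beta)} < \frac{\beta k}{(1-\beta)N}; \]

at the third value, it is

\[ \frac{k}{N} \Big( \frac{\beta^{\frac{N}{k} + 1 - \frac{1}{1-\beta} - \frac{1}{\log\beta}}}{\log\beta} + \frac{\beta}{1-\beta} \Big) < \frac{\beta k}{(1-\beta)N}. \]

Combining both gives $R_\infty \le \beta k/[(1-\beta)N]$.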
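The calculation behind the third candidate can be sketched by relaxing $\lceil N/k \rceil$ to a continuous variable $x \in [N/k,\, N/k + 1]$. This relaxation and the symbol $g(x)$ for the right-hand side of (26) are our own shorthand, assumed here to match the argument the proof has in mind.

```latex
\begin{align*}
g(x) &:= \beta^{x}\Bigl(1-\tfrac{k}{N}(x-1)\Bigr)
        + \frac{k\,(\beta-\beta^{x})}{N(1-\beta)}, \\
g'(x) &= \beta^{x}\Bigl[\log\beta\,\Bigl(1-\tfrac{k}{N}(x-1)
        -\tfrac{k}{N(1-\beta)}\Bigr)-\tfrac{k}{N}\Bigr], \\
g'(x)=0 &\iff x = \frac{N}{k}-\frac{1}{1-\beta}-\frac{1}{\log\beta}+1 .
\end{align*}
```

A maximum over the interval is therefore attained either at an endpoint or at this stationary point, which gives the three candidate values examined in the proof.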

Proof of Lemma 5.2

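The upper bound holds because $R^{\pi_l}_d = E[\beta^{\Upsilon(d)}]$ while $E[\beta^{\upsilon(d)}]$ is a truncated average. The lower bound is because

\[ R^{\pi_l}_d - E[\beta^{\upsilon(d)}] = \sum_{t=\Delta+1}^{\infty} \beta^t \Pr\{\Upsilon(d) = t\} \le \beta^{\Delta} \Pr\{\Upsilon(d) > \Delta\} \le \beta^{\Delta}. \tag{27} \]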

Proof of Lemma 5.3

Let $p_t$ be the conditional contact probability in domain $d_\tau$. By definition, we have $p_t \ge K/N_{d_\tau}$ for the myopic search policy. From the analysis of local control (Section 4.3), we have $q_\tau(d_\tau) = 1 - \prod_{t=1}^{\Delta}(1 - p_t)$ and

\[ E[\beta^{\upsilon_\tau(d_\tau)}] = \sum_{t=1}^{\Delta} \beta^t \Pr\{\upsilon_\tau(d_\tau) = t\} = \sum_{t=1}^{\Delta} \beta^t p_t \prod_{j=1}^{t-1}(1 - p_j), \]

both increasing with $p_t$. Plugging in the lower bound of $p_t$ yields the results.
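As a quick numerical sanity check of the plug-in step, the sketch below evaluates the truncated sum with $p_t = K/N$ for every $t$ and compares it with the closed forms in (18) and (19). The function names are hypothetical and the parameter values are chosen only for illustration.

```python
def delivery_prob_bound(k: int, n: int, delta: int) -> float:
    """Right-hand side of (18): 1 - (1 - K/N)^Delta."""
    return 1.0 - (1.0 - k / n) ** delta


def reward_bound_closed_form(k: int, n: int, delta: int, beta: float) -> float:
    """Right-hand side of (19)."""
    return beta * k * (1.0 - (beta * (1.0 - k / n)) ** delta) / (n - beta * (n - k))


def reward_bound_by_sum(k: int, n: int, delta: int, beta: float) -> float:
    """Truncated sum of beta^t p (1 - p)^(t - 1) over t = 1..Delta with p = K/N."""
    p = k / n
    return sum(beta ** t * p * (1.0 - p) ** (t - 1) for t in range(1, delta + 1))


if __name__ == "__main__":
    k, n, delta, beta = 5, 25, 3, 0.9
    print(delivery_prob_bound(k, n, delta))
    print(reward_bound_closed_form(k, n, delta, beta), reward_bound_by_sum(k, n, delta, beta))
```

The two reward values agree up to floating-point error, since (19) is the finite geometric sum evaluated in closed form.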

Proof of Theorem 5.4

Due to discount, the total throughput $R^{\pi_g}_\infty$ is an increasing function of the probability of delivery $q_\tau(d_\tau)$ and is thus lower bounded if $q_\tau(d_\tau)$ is replaced by its lower bound in (18), making $m_\tau$ evolve according to (20).

Moreover, for given $(b_\tau, m_\tau)$, the expected immediate reward under the dispatch policy (17) is $m_\tau(d_\tau) E[\beta^{\upsilon_\tau(d_\tau)}]$, which is lower bounded by $r(m_\tau)$ defined in (21) for any $b_\tau$