Stochastic Dynamic Programming Based Approaches to Sensor Resource Management
R.B. Washburn, M.K. Schneider, J.J. Fox Fusion Technology and Systems Division
Alphatech, Inc.
Burlington, MA
fbob.washburn, michael.schneider, john.foxg@alphatech.com
Abstract – This paper describes a stochastic dynamic programming based approach to solve Sensor Resource Management (SRM) problems such as occur in tracking multiple targets with electronically scanned, multi-mode radar. Specifically, it formulates the SRM problem as a stochastic scheduling problem and develops approximate solutions based on the Gittins index rule. Novel results in- clude a hybrid state stochastic model for the information dynamics of tracked targets, an exact index rule solution of the SRM problem for a simplified tracking model, and use of approximate dynamic programming to extend the index rule solution to more realistic models.
Keywords: Sensor resource management, dynamic pro- gramming, Gittins index, multi-armed bandit.
1 Introduction
Modern sensor suites contain sensors that are agile, al- lowing fast redirection (e.g., electronic scanning) and fast switching between modes (e.g., radar modes such as SAR and MTI), giving a large number of control options. How- ever, allocation of sensor resources is also subject to a va- riety of constraints: there may be many things to look at (complex environments with multiple targets), limited time to sense targets because of deadlines on the usefulness of information (urgent requests) or exposure to threat environ- ments, sensors may have a limited field of view, and so on.
Developing sensor control algorithms that satisfy such con- straints and maximize the value of information obtained is the goal of Sensor Resource Management (SRM) research.
Stochastic dynamic programming [1] provides a princi- pled, systematic approach to formulating and solving SRM problems [8]. However, its application requires paying the price of developing mathematical models for the problem formulation and then finding efficient algorithmic solutions.
In this paper we describe new results in developing effi- cient solutions based on stochastic dynamic programming to solve SRM problems such as occur in tracking multiple
targets with electronically scanned, multi-mode radar.
Specifically, we formulate the SRM problem as a stochastic scheduling (or multi-armed bandit) problem and develop approximate solutions based on the Gittins index rule [2]. Although there has been much research on index rules [7], [6], [3], [9], [5], there appear to be no applications to SRM except for the recent work by Krishnamurthy and Evans [4]. The reason for this is that the SRM problem, formulated as stochastic scheduling, doesn’t usually satisfy the conditions necessary for the existence of an index rule solution. Thus, one is forced to develop approximations (e.g., the “restless bandit” work of [9], [5]). In this paper we show how to relax one of the “usual” conditions and to develop index rule-based SRM algorithms for kinematic tracking.
The new contributions of this paper include:
1. Stochastic scheduling formulation of SRM and a cor- responding extension of the index rule solution to more general reward functions, more suitable for SRM problems.
2. Hybrid state stochastic model of information dynam- ics for tracking problems, which facilitates efficient solutions of stochastic dynamic programs and allevi- ates the “curse of dimensionality.”
3. Exact index rule solution for a simple tracking model.
This solution provides a basis for efficient approxima- tions of more realistic tracking models.
4. Lookahead approximation of the stochastic dynamic program solution based on an index rule solution, ap- plicable to more realistic tracking models.
Section 2 formulates the SRM problem as a stochastic scheduling problem and shows how to extend the index rule solution to more general types of reward functions.
This section also discusses generalization of the formula- tion to controlling sensors with variable dwell times and
selectable modes. Section 3 describes hybrid state mod- els of information dynamics to use for tracking problems.
These models use discrete-valued random variables to drive a continuous-valued information state. This approach helps reduce the dimensionality of the resulting stochastic dy- namic programming equations. In Section 4 we find the op- timal index rule solution for a simple version of the tracking problem and use this solution in Section 5 as the basis for approximate solutions of more complex models. Section 5 also describes numerical results of the lookahead approxi- mation for a multi-target tracking example.
2 SRM Problem Formulation
2.1 Stochastic Scheduling Formulation
We formulate SRM as a stochastic control problem in which sensor actions u(t) are the controls and the con- trolled state, denotedx( t), is an information state. Con- ceptually,x( t)denotes the accuracy of the information in the track. Most generally, it can be the conditional proba- bility of the target state given previous observations (as in [4]). In our problemx(t)will be an estimate of the track error covariance. In this paper we will assume that the sen- sor action applies to one target at a time and that time is discrete (e.g.,t =1;2;:::). The controlu(t)=iindicates which targeti to look at timet. The information state is initiallyxi
(0)=
i, andxi
( t+1)is given by a stochastic equation
x
i
( t+1)=f
i (x
i (t);w
i
( t);u(t)) (1) fori = 1;:::;n, wherewi
(t)are random variables repre- senting uncertainties in target dynamics and sensor mea- surements. We assume thatwi
(t)are independent random variables with distributionpi
( w
i ).
At each timetwe must determine an actionu(t)for the sensor to take depending only on the information state
x(t)=(x
1
(t);:::;x
n
(t)) (2)
of all targets. The SRM problem is to determineu(t)as a function ofx( t)which maximizes the expected value of the total discounted reward, namely
J()=max
E
(
1
X
k =0
t n
X
i=1 R
i (x
i
( t))dtjx( 0)= )
(3) where the optimization is taken over all stationary control policies(i.e.,u(t)=( x(t))). The parameteris the discount factor. The dynamic programming equation is
J()=
max
u (
n
X
i=1 R
i (
i )+
Z
J(f(;w;u) )p(w)dw )
(4)
where
p(w)= n
Y
i=1 p
i (w
i
): (5)
The optimal control policy is given by finding the action
u=()which achieves the maximumJ().
2.2 SRM and Index Rules
In the conventional stochastic scheduling problem (for which there are index rule solutions), one receives the dis- counted reward
t
R
u( t) x
u(t) ( t)
(6) at timet. In the SRM problem one receives the reward
t
n
X
i=1 R
i (x
i
(t)): (7)
In the conventional scheduling problem, the statexi (t)of a machine evolves via
x
u(t)
( t+1)=f
u(t) x
u(t) (t);w
u(t) (t)
(8) and
x
i
( t+1)=x
i
( t) ifi6=u(t): (9) In the SRM problem (8) is still true, where the equation represents the prediction and update of the target track state over one time step. The noisewi
(t)expresses uncertainty in how the target state actually changes as well as uncer- tainties in the measurement of the target state. However, if the sensor doesn’t look at the target, the evolution of the state is
x
i
(t+1)=g
i (x
i (t);w
i
( t)) ifi6=u(t) (10) where the equation represents the prediction of the target track state over one time step, without a measurement up- date. The noisewi
(t)expresses uncertainty in how the tar- get state changes in one time step. In general,gi
(x
i
;w
i )6=
x
i.
Thus, the SRM problem differs from the stochastic scheduling problem in two critical respects:
1. The reward depends on unsensed targets (in the conventional scheduling problem idle machines con- tribute no reward).
2. The state of an unsensed target can change (in the conventional scheduling problem the state of idle ma- chines remain frozen).
In some SRM problems it is possible to have
g
i (x
i
;v
i )=x
i (11)
so that the state of unsensed targets remains the same, just as in the scheduling problem. For example, this is the case
ifxiare statistics of a static parameter (e.g., target classifi- cation problems, wherexiis a probability vector of possible target types). If (11) is true, we can reformulate the SRM problem with an equivalent reward function so that the re- ward doesn’t depend on unsensed targets, again just like the scheduling problem, and thus we can get index rules which are optimal solutions of the SRM problem.
Theorem 1. Suppose that
x
i
( t+1)=f
i (x
i (t);w
i
( t);u(t)) (12) for i = 1;:::;n; wherewi
(t)are independent for all i;t andwi
(t)has distributionpi (w
i
)for alli;t. Suppose that the controlu(t)2f1;:::;ngandu(t)depends on the state
x( t). Suppose that
f
i (x
i
;w
i
;u)=x
i (13)
for allu6=i. Define
~
R
i ( x
i )=
Z
R
i (f
i (x
i
;w
i
;i))p
i (w
i )dw
i R
i (x
i ):
(14) Then a policy optimizes
E (
1
X
t=0
t
~
R
u(t) x
u( t) (t)
jx( 0) )
(15)
if and only if it optimizes
E (
1
X
t=0
t n
X
i=1 R
i ( x
i
(t))jx( 0) )
: (16)
Proof. For any controlu(t)show that
E (
1
X
t=0
t
~
R
u(t) x
u( t) (t)
jx( 0) )
= aE (
1
X
t=0
t n
X
i=1 R
i (x
i
(t))jx( 0) )
+b
for constantsa= 1 1
>0;b.
Corollary 2. If the target state remains unchanged when the target is not viewed by the sensor, then there is an index rule which is an optimal solution of the SRM problem. That is, there are functionsmi
(x
i
), for each targeti, such that the optimal SRM policy is given by
u(t)=argmax
i fm
i (x
i
( t))g: (17) IfJi
( x
i
;M)is the solution of
J
i (x
i
;M)=
max
M;
~
R
i ( x
i )
+ R
J
i ( f
i (x
i
;w
i );M)p
i (w
i )dw
i
; (18)
where
~
R
i ( x
i )=
Z
R
i (f
i (x
i
;w
i ))p
i (w
i )dw
i R
i (x
i ); (19) then the index function is defined by
m
i ( x
i
)=minfM:J
i (x
i
;M)=Mg: (20) Proof. Follows from the theorem above and results for find- ing index functions in [1].
The result shows that the assumption that idle machines do not contribute to the reward is not a critical assump- tion. As long as the states of idle machines are frozen, the theorem shows that it is possible to find an equivalent reward such that idle machines do not contribute. Thus,
‘frozen states of idle machines’ is the key assumption in or- der to have an optimal index rule solution to the stochastic scheduling problem. Of course, the frozen state assump- tion is not realistic for tracking SRM problems. However, one can use index rules as the basis for good suboptimal approximations of the more general problem.
2.3 Extension of SRM Formulation
It is necessary to extend the problem formulation to model more realistic situations in which sensors have se- lectable modes and in which different modes require differ- ent amounts of time to complete (e.g., consider the differ- ence between SAR and MTI modes for a radar). In such situations we formulate the scheduling problem so that the controlu(t) and the information state xi
(t)are constant over the time interval(k) t < (k+1). Letxi
(k)
denote the information state of targetiat stagek. Stagek begins at time(k)and ends at time (k+1). At each stagekwe must determine a controlu(k)for the sensor to use over the period(k)t <( k+1). In this case the controlu(k)has the form
u(k)=(( k);(k)) (21) whereis the targeti=1;:::;nto look at andis a sensor mode to use. Time(0)=0;and(k+1)is given by
(k+1)=( k)+( (k)); (22) so that the time duration(( k))of stagekdepends on the sensor mode(k)used. This controlu(k)depends only on the information state at the beginning of the state, namely,
x( k)=(x
1
(k);:::;x
n
( k)): (23) The information state evolves according to
x
i
( k+1)=f(x
i (k);w
i
(k);u(k)) (24) We can assume thatwi
( k)depends only onxi
( k)andu(k) and has conditional distributionpi
(w
i jx
i
;u).
The SRM problem is to determineu(t)as a function of
x( t)which maximizes the expected value of the total dis- counted reward, namely
J( )=max
E
(
Z
1
0 e
t n
X
i=1 R
i (x
i
(t))dtjx( 0)= )
(25) where the optimization is taken over all stationary control policies(i.e.,u(t)=( x(t))). The parameteris the discount rate. Because the information state is piecewise constant, the total discounted value is
Z
1
0 e
t n
X
i=1 R
i (x
i (t) )dt
= 1
X
k =0 e
(k )
e
(k +1)
n
X
i=1 R
i ( x
i (k)):
Thus, we obtain an equivalent discrete time problem with the dynamic programming equation
J()=
max
u (
1 e (u)
P
n
i=1 R
i (
i )
+e (u)
R
J(f( ;w;u))p(wj;u)dw )
(26) where
p(wj;u)= n
Y
i=1 p
i (w
i j
i
;u): (27)
3 Model of Information Dynamics for Tracking Problems
3.1 Modeling Detection Uncertainty
To design the SRM algorithm we need to model how the information state responds to sensor actions, as modeled by the equation
x
i
(t+1)=f
i ( x
i (t);w
i
(t);u(t)): (28) Modeling the information state as a conditional probabil- ity distribution of the target state variable is one possible model, but for many problems it gives a large dimensional stochastic evolution equation and a correspondingly large dimensional dynamic program to solve. Thus, we try to model the information dynamics with smaller number of salient statistics in order to obtain a tractable stochastic dy- namic program.
For tracking problems, in which the SRM algorithm is trying to maximize the quality of kinematic tracking for multiple targets, the track error covariance is an effective information statexi
(t)and hybrid state equations, in which
w
i
( t)takes a finite number of values, provide a low com- plexity description of the information dynamics.
The simplest instance of the model haswi
(t)just indi- cate target detections so thatwi
(t)takes just two possible values (denote, wi
(t) = D if the target is detected and
w
i ( t) =
D if the target isn’t detected). Leti denote the probability that targetiis detected, assuming that the sensor looks at targeti. Thenfihas the following form.
f
i (x
i
;D;i)=
F
i
x
i (t)
1
+H T
i R
1
i H
i
1
F T
i +G
i Q
i G
T
i
(29)
f
i x
i
;
D;i
=F
i x
i (t)F
T
i +G
i Q
i G
T
i
(30) If the sensor doesn’t look at targeti(u6=i), then
f
i (x
i
;w
i
;u)=F
i x
i (t)F
T
i +G
i Q
i G
T
i
(31) whetherwi
=DorD. Herexi
(t)denotes an error covari- ance and the matricesFi,Gi,Qi,Hi,Ri model the effect of prediction and update on the error covariance.
3.2 Modeling Association Uncertainty
We can also include the effect of misassociations by con- sidering a four state processwi
(t)as follows. Suppose that
w
ican be one of the four states
(A;D); A;
D
;
A ;D
;
A ;
D
(32) whereDstands for detection,Dfor missed detection,Afor association, andAfor misassociation. Thenfi
(x
i
;w
i
;u)
has the following form.
f
i (x
i
;(A;D);i)=
F
i
x
i (t)
1
+H T
i R
1
i H
i
1
F T
i +G
i Q
i G
T
i
(33)
f
i x
i
; A;
D
;i
=F
i x
i ( t)F
T
i +G
i Q
i G
T
i
(34)
f
i x
i
;
A;D
;i
=
f
i x
i
;
A ;
D
;i
=F
i (x
i (t)+S
i )
1
F T
i +G
i Q
i G
T
i
(35) The idea is thatSi
0models the bias induced by a mis- association. If the sensor doesn’t look at targeti(u 6=i), then
f
i ( x
i
;w
i
;u)=F
i x
i (t)F
T
i +G
i Q
i G
T
i
: (36) A simple model forSiis
S
i
= a
2
i
d
i +2
K
i
^
R
i K
T
i
(37) whereai is a gate size parameter (e.g.,ai
=3orai
=4),
d
i is the dimension of the measurement,Kiis the Kalman gain
K
i
=
x
i (t)
1
+H T
i R
1
i H
i
1
H T
i R
1
i
(38)
and
^
R
i
=H
i x
i (t)H
T
i +R
i (39)
is the predicted measurement covariance.
The probabilities of the discrete events are given by
Prfw
i
(t)=(A;D) g=P
AjD
i
: (40)
Pr
w
i
( t)= A;
D
=P
Aj
D (1
i
): (41)
Pr
w
i (t)=
A;D
= 1 P
AjD
i
: (42)
Pr
w
i (t)=
A;
D
= 1 P
Aj
D
( 1
i
): (43) Fordi
=2;the association probability models are given by
P
AjD
= 1
2 1
2 e
1
2 a
2
i
e a
2
i bi
b
i +
1
2
(44)
and
P
Aj
D
=1 e bia
2
i
; (45)
where
b
i
=
i
r
det
^
R
i
; (46)
whereai is the desired gate size, R^is the predicted mea- surement covariance, andiis the expected density of con- fuser objects in the measurement gate of targeti. Note that
b
i is the expected number of confusers in the one-sigma predicted measurement gate. This is sometimes referred to as the “contention” for the target.
4 Tracking SRM Index Rule
4.1 Target Location Example
Suppose that each targetihas a location andxi
(t)is the root mean square error of its location estimate. Define the goal setGto be
G=[0;Æ] (47)
for someÆ >0. Define the individual target track reward function as
R( x
i
)=1(x
i
2G) (48)
where1(C)denotes the indicator function, equal to1if the conditionCis true and equal to0otherwise.
The sensor looks at just one target,i=u(t)at timet. If the sensor looks at targetiand the target is detected, then
x
i
(t+1)=
x
i (t)
2
+ 2
1=2
: (49)
Assume that is the probability of detecting the target, if the sensor looks at it. If the target is not detected or if the sensor doesn’t look at targeti, then
x
i
(t+1)=x
i
(t): (50)
Letdenote the initial location error–i.e.,xi
(0)=.
4.2 Equations for Index Function
Transformxi tozi
= x 2
i
. Then we can express the reward as
R(z
i )=1 z
i 2[Æ
2
;1)
(51) with the evolution equation
f(z
i
;w
i )=
z
i +
2if target is detected
z
iif target is not detected : (52) To find the index rule to a sum of individual target re- wards, we need to replace the reward functionR(zi
)with
~
R(z
i
)defined by
~
R(z
i ) =
Z
R(f( z
i
;w
i ))p
i (w
i )dw
i R( z
i )
=
ifÆ 2 2z<Æ 2
0ifz<Æ 2 2orzÆ 2 : To find the index function corresponding to R~( zi
), we solve
J
i (z
i
;M)=
max 8
<
:
M;1 z
i 2[Æ
2
2
;Æ 2
)
+J
i z
i +
2
;M
+(1 )J
i (z
i
;M)
9
=
; : (53) by successive approximation, noting that the value function at any iteration is piecewise constant. The successive ap- proximation iterations converge to the optimal value func- tion
J
i ( z
i
;M)=
0
X
k >
a
k
(M)1(z
i
2[a+k;a+(k+1)))
+M1(z
i
2[a+;1))
where
k (M)=
(
jk j+1
1
+ jk j +1
MifM jk j+1 1
1 jk j+1
MifM >
jk j+1
1
1 jk j +1
(54) fork0and
=
1 (1 )
<1: (55)
4.3 Index Rule
The index functionm(zi
)is defined as the smallestM such that
J
i ( z
i
;M)=M: (56)
Forzi
2 [a+k;a+(k+1)), this is the smallestM such thatk
( M)=M. Thus,
m(z
i )=
jk j+1
1
jk j+1
(57)
forzi
2[a+k;a+( k+1)). Similarly, forzi 2[a+
;1), we have
m(z
i
)=0: (58)
In terms of the parametersÆand, we can express the index function as
m(z
i )=
8
<
:
jk j +1
1
1
jk j+1 ifÆ 2+(k 1) 2zi
<Æ 2
+k
2
;k0
0ifzi
Æ 2
:
(59) Thus, the index rule control is
u(t)=argmaxfm(z
i
) g; (60)
which is equivalent to
u(t)=argmax
z
i (t):z
i (t)<Æ
2
: (61) If we assign different goalsÆias well as different values
V
ito each target, then it is easy to see that the corresponding index functions aremi
(z
i
)given by
m
i (z
i )=
8
>
<
>
: V
i
jk j+1
i
1
1 jk j+1
i
ifÆ 2
i
+(k 1) 2
z
i
<Æ 2
i +k
2
;k0
0ifzi
Æ 2
i
:
(62) where
i
=
i
1 (1
i )
<1: (63)
4.4 Extension to Higher Dimensions
We can extend the previous scalar results to more gen- eral situations where the information state is a covariance matrix. Thus, letxibe the target track error covariance for targeti. Assume thatxievolves according to
f(x
i
;w
i
;u)=
f(x
i
) if=iandwi
=1
x
iotherwise (64)
wherewi
(t)is a0 1random variable and
Prfw
i
(t)=1g=
i
: (65)
Assume thatf( x)satisfies
f( x)x (66)
in the sense of positive definite matricesxandf(x). Define the reward functionRi
(x
i )by
R
i (x
i )=
V
iifi (x
i )Æ
i
0otherwise : (67)
Assume thatiis also monotonic in the sense that
i (x
i )
i ( x
0
) (68)
ifxi
x 0
i
in the sense of positive definite matrices. For example,i
( x
i
)is monotonic in this sense if it is the trace or maximum eigenvalue of the matrixxi. Finally, define the integer functioni
( x
i
)to be the minimumnsuch that
i ( f
n
( x
i ))Æ
i
: (69)
The index rule for the scheduling problem is given by
m
i (x
i )=
8
<
: V
i
i (x
i )
i
1
1
i (x
i )
i
ifi (x
i )>0
0otherwise
(70)
whereiis defined as above.
Proof. One can prove this extension by using the same ap- proach as for the simple scalar case–e.g., by solving the dy- namic programming equation by successive approximation and computing the index rule. A more elegant approach can be obtained by using the insightful interpretation of the in- dex functionmi
(x
i
)given by [7] as the maximization over stopping times:That is,
m
i (x
i
)=max
>0 E
n
P
1
t=0
t
~
R
i (x
i (t))jx
i
(0)=x
i o
E n
P
1
t=0
t
jx
i
( 0)=x
i o
;
(71) where for the SRM problemR~iis the transformed reward function.
5 Approximations Based on Index Rule and Examples
5.1 Lookahead Approximation
The index rule is optimal only if the state of unsensed targets remains unchanged, which is not a realistic assump- tion for tracking problems. However, as noted in [4], it is a reasonable approximation to make when the state changes slowly. In our problem formulation, we require the track error covariance to grow slowly over the time scale 1 (the discount rate defines a “soft” time horizon for the prob- lem). If the error covariance grows more rapidly over this time horizon, we can use the index rule as a basis for a bet- ter approximation of the stochastic dynamic programming problem. For example, we have used the index rule’s value function to compute n-step lookahead approximations to the dynamic programming equations [1].
The state of the dynamic program is the vectorx( t)of target tracking error covariances (xi
( t)is the covariance for targeti). The dynamic programming equation has the form
J(x)=max
u
R( x)+ Z
J(F(x;u;w) )p(w)dw
(72)
whereR( x)is the immediate reward
R(x)= n
X
i=1 R
i (x
i
); (73)
andF(x;u;w)is the dynamic equation
x(t+1)=F(x( t);u(t);w(t)) (74) with noisew( t). IfJ0
( x)denotes an initial approximation for the value function, then
J
n
(x)=max
u
R(x)+ Z
J
n 1
( F( x;u;w))p(w)dw
(75) defines then-step lookahead approximation forn1.
We will use the index rule to defineJ0and compute the one- and two-step lookahead approximations numerically from it. J0 is computed by assuming that the undetected target covariancexi don’t change. Letmi
( x
i
)denote the index function of targeti, and definei andi
(x
i )as in the previous section. Moreover, let i
(x
i ) = m
i (x
i ) if
i (x
i
) >0andi (x
i
) =1ifi (x
i
)=0. If we re-order the target indicesi so that i > i0 implies thati
( x
i )
i 0
( x
i 0
), we can show that
J
0
( x)=(1 ) 1
n
X
i=1 V
i i
Y
j=1
i(xi)
i
: (76) Using this expression forJ0 we can computeJn numer- ically from (75) forn = 1;2. The following examples il- lustrates the performance for the lookahead approximations and compares them to the performance of the pure index rule (n=0) approximation.
5.2 Example
The one- and two-step lookahead and index rule controls are tested on a simple scenario with10targets, one of which has a high value and others which have low values. Each target moves according to a simple one-dimensional Brow- nian motion model. The simulation parameters are given in Table 1. Performance is measured for each target in two ways. The first is the time-averaged relative mean square error
1
(T T
s +1)Æ
T
X
t=Ts ( x
i (t) x^
i (t))
2 (77)
whereT is the length of the run,Tsis the time from which the system is assumed to be in steady-state,Æ is the error goal,xiis the true target position, andx^iis the tracker po- sition estimate for a simple Kalman filter. For low-value targets, the MSE is averaged over the targets. The second performance measure is the time averaged frequency that the error goal is attained,
1
(T T
s +1)Æ
T
X
t=T 1(jx
i ( t) x^
i
( t)j<Æ) (78)
For low-value targets, the frequency of goal attainment is averaged over the targets. These performance measures are computed for different values of the moving process noise.
The results are plotted in Figures 1-4. Note that the one and two-step lookahead controls keep the high-value target errors down better than the index rule for moderate to low values of the process noise. At high values, the two-step lookahead does slightly worse. This is because the two- step lookahead control is devoting some time to low value targets, dramatically decreasing their errors.
Parameter Value
Number of Monte Carlo Runs 100
Time of Run (T) 200
Steady-state Start Time (Ts) 100 Discount Factor () 0:9 High Target Value (V1) 100 Low Target Value (Vi
;i=2::10) 1 High-Value Target Error Variance Goal (Æ1) 0:25
Low -Value Target Error Variance Goal (Æi
;i=2::10) 4
Measurement Noise Variance (q) 1 Probability of Detection () 0:8 Initial Position Variance (xi
( 0)) 20
Table 1: Simulation Parameters.
0 0.05 0.1
0 1 2 3 4 5
Moving Process Noise
Relative MSE
High−value Target Index Rule
Lookahead 1 Step Lookahead 2 Steps
Figure 1: High-Value Target MSE.
References
[1] D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, Massachusetts, 2001.
0 0.05 0.1 0
1 2 3 4 5
Moving Process Noise
Relative MSE
Low−value Targets Index Rule
Lookahead 1 Step Lookahead 2 Steps
Figure 2: Low -Value Target MSE.
0 0.05 0.1
0 0.2 0.4 0.6 0.8 1
Moving Process Noise
Goal Attainment Frequency
High−value Target
Index Rule
Lookahead 1 Step Lookahead 2 Steps
Figure 3: High-Value Target Goal Attainment Frequency.
[2] J.C. Gittins, “Bandit processes and dynamic allocation indices,” Journal of the Royal Statistical Society, B 41, 148-164, 1979.
[3] M.N. Katehakis and A.F. Veinott, Jr.,“The multiarmed bandit problem: decomposition and computation,”
Mathematics of Operations Research, Vol. 12, No. 2, pp. 262-268, 1987.
[4] Vikram Krishnamurthy and Robin J. Evans, “Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking,” IEEE
0 0.05 0.1
0 0.2 0.4 0.6 0.8 1
Moving Process Noise
Goal Attainment Frequency
Low−value Targets
Index Rule
Lookahead 1 Step Lookahead 2 Steps
Figure 4: Low -Value Target Goal Attainment Frequency.
Trans. on Signal Processing, Vol. 49, 2893-2907, De- cember, 2001.
[5] J. Ni˜no-Mora, “Restless bandits, partial conservation laws and indexability,” Dept. of Economics and Busi- ness, Universitat Pompeu Fabra, Barcelona, Spain, De- cember 13, 1999.
[6] J.N. Tsitsiklis, “A Lemma on the Multiarmed Bandit Problem,” IEEE Transactions on Automatic Control, AC-31, 576-577, 1986.
[7] P.P. Varaiya, J.C. Walrand, and C. Buyukkov, “Exten- sions of the Multiarmed Bandit Problem: The Dis- counted Case,” IEEE Transactions on Automatic Con- trol, AC-30, 426-439, 1985.
[8] R. Washburn, A. Chao, D. Castanon, and R. Malho- tra, “Stochastic Dynamic Programming for Far-Sighted Sensor Management,” Proc. of the IRIS National Sym- posium on Sensor and Data Fusion, Vol. 2, 277-295, 1997.
[9] P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” in J. Gani (editor), A Celebration of Applied Probability, Journal of Applied Probability, 25A, 287-298, 1988.