3.2 Model-free Source Seeking
3.2.2 Single-robot Source Seeking
Instead of a sensor network, a single robot may travel to several sensing locations in order to collect the measurements needed for a finite-difference gradient approximation. Thm. 3.5 requires visiting $d_x + 1$ locations with a single robot, which may result in costly maneuvers.
In this section, we emphasize that the robot can instead use a random direction stochastic approximation (RDSA, Kushner and Yin 2003) to the gradient. The idea of RDSA is to estimate the gradient along a single direction at time $t$ instead of in all $d_x$ dimensions. We show that if the directions are chosen consistently over time, this random direction gradient ascent still converges. In detail, we use the following gradient estimate:
$$\hat{g}_t(x_t) := \sqrt{d_x}\,\frac{z_t^+ - z_t^-}{\|x_t^+ - x_t^-\|_2}\, T_t a_t, \tag{3.28}$$
where $a_t \in \mathbb{R}^{d_x}$ is a random direction vector with probability distribution $p_t : \mathbb{R}^{d_x} \to [0,1]$, and $T_t$ is a rotation matrix, whose role is explained below. Also, $x_t^+ := x_t + c_t^+ T_t a_t$ and $x_t^- := x_t - c_t^- T_t a_t$ are two measurement locations determined by the gain coefficients $c_t^-, c_t^+ \in \mathbb{R}_{\geq 0}$. At iteration $t$ of the algorithm, the signal field is sampled at the points $x_t^-$ and $x_t^+$ to obtain the measurements $z_t^-$ and $z_t^+$, and the estimate is then moved along the direction of the gradient estimate to generate the next estimate of the source location $x_{t+1}$.
Repeating this procedure generates a sequence of states $\{(x_t, x_t^-, x_t^+)\}$, which represents a trajectory in the world coordinate frame. The role of the matrices $T_t$ is to transform this trajectory to the body coordinate frame of the robot. The sampling points at time $t$, represented in the robot body frame, are:
$$x_t^{b,-} = -c_t^- R(-\theta_t) T_t a_t, \qquad x_t^{b,+} = c_t^+ R(-\theta_t) T_t a_t, \qquad x_{t+1}^{b} = -\gamma_t R(-\theta_t)\, \hat{g}_t(0),$$
where $\theta_t$ is the robot orientation and $R(-\theta_t)$ is a rotation matrix. Choosing $T_t = R(\theta_t)$ allows the robot to run the algorithm without the need for localization:
$$x_{t+1}^{b} = -\gamma_t \sqrt{d_x} \left[ \frac{z_t^+ - z_t^-}{\|x_t^{b,+} - x_t^{b,-}\|_2} \right] a_t. \tag{3.29}$$
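One iteration of the body-frame update can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `rdsa_body_frame_step` and the callback `measure`, which stands in for driving the robot to a body-frame point and reading the sensor there, are hypothetical and do not appear in the text. The sign of the step is copied from (3.29) as printed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def rdsa_body_frame_step(
    measure: Callable[[Vector], float],  # hypothetical sensor callback
    a: Sequence[float],                  # direction vector a_t
    c_minus: float,                      # sample point gain c_t^-
    c_plus: float,                       # sample point gain c_t^+
    gamma: float,                        # gradient gain gamma_t
) -> Tuple[Vector, Vector, Vector]:
    """Return (x_minus_b, x_plus_b, x_next_b) for one iteration of (3.29).

    All points are expressed in the robot body frame, whose origin is the
    current estimate x_t, so no global localization is used.
    """
    d_x = len(a)
    # Sampling points with T_t = R(theta_t), so that R(-theta_t) T_t = I.
    x_plus_b = [c_plus * ai for ai in a]
    x_minus_b = [-c_minus * ai for ai in a]
    baseline = math.dist(x_plus_b, x_minus_b)
    if baseline <= 0.0:
        raise ValueError("the two sampling points must be distinct")
    z_plus = measure(x_plus_b)
    z_minus = measure(x_minus_b)
    # Bracketed finite difference in (3.29), scaled by sqrt(d_x).
    slope = math.sqrt(d_x) * (z_plus - z_minus) / baseline
    # Sign convention copied from (3.29) as printed.
    x_next_b = [-gamma * slope * ai for ai in a]
    return x_minus_b, x_plus_b, x_next_b
```

For a linear field such as $h(p) = 2p_1 + 3p_2$ and $a_t = (1, 1)$, the value of `slope` equals the directional derivative $\nabla h^{\top} a_t = 5$ regardless of the two gains, which is a convenient sanity check for an implementation.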
The design parameters of the algorithm are $\gamma_t$, $c_t^-$, $c_t^+$, and the pdf $p_t$ of the random directions $a_t$. Successful application of this algorithm in a real environment requires a careful choice of the design parameters that takes the robot capabilities and the geometric constraints of the environment into account, in order to ensure that the resulting sequence $\{(x_t, x_t^-, x_t^+)\}$ can be followed by the robot.
Proposition 3.6. If the sensor state space $X$ is bounded and convex, the random direction stochastic gradient ascent in (3.29) converges a.s. to a local maximum of the signal field $h(\cdot, y)$ under the following assumptions:
(Signal field) $h(x, y)$ is bounded and three times continuously differentiable in $x$
(Sample point gains) $c_t := \max\{c_t^-, c_t^+\} > 0$, $c_t \to 0$
(Gradient gains) $\gamma_t > 0$, $\gamma_t \to 0$, $\sum_{t=0}^{\infty} \gamma_t = \infty$, and $\sum_{t=0}^{\infty} \gamma_t^2 / c_t^2 < \infty$
(Direction vectors) $a_t$ are i.i.d. with distribution $p_t$, which is symmetric with respect to reflections about the coordinate axes, and satisfy $\mathbb{E}[\|a_t\|_2] = \sqrt{d_x}$ and $\mathbb{E}[a_t a_t^{\top}] = I$.
Proof. The result follows from a slight modification of the proof for the RDSA algorithm (Kushner and Yin 2003, Chapters 5.6 and 10.7), which takes into account that $c_t^- T_t a_t \neq c_t^+ T_t a_t$. Note that:
$$\mathbb{E}\left[\frac{\|x_t^+ - x_t^-\|_2}{\sqrt{d_x}}\right] = (c_t^+ + c_t^-)\,\frac{\|T_t\|_2\, \mathbb{E}[\|a_t\|_2]}{\sqrt{d_x}} = (c_t^+ + c_t^-).$$
The rest of the assumptions are satisfied because the measurement noise $v_t$ is zero mean and square integrable and the boundedness of $X$ guarantees that $\sup_t \|x_t\| < \infty$.
Remark. The smoothness condition on $h$ guarantees that $\hat{g}_t$ is almost surely an unbiased estimate of $g_t$ to within an $O(c_t^2)$ error, which is guaranteed to be small by the assumption that $c_t \to 0$.
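To see where an error of order $c_t^2$ can come from, the following sketch expands noise-free measurements $z_t^{\pm} = h(x_t \pm c_t^{\pm} u_t, y)$ to second order. The shorthand $u_t := T_t a_t$, the Hessian symbol $H$, and the simplifying assumptions that $\|a_t\|_2 = \sqrt{d_x}$ holds exactly and that the gains are deterministic are introduced here for illustration only.

```latex
\begin{align*}
z_t^+ - z_t^-
  &= (c_t^+ + c_t^-)\,\nabla_x h^{\top} u_t
   + \tfrac{1}{2}\bigl((c_t^+)^2 - (c_t^-)^2\bigr)\, u_t^{\top} H u_t + O(c_t^3), \\
\hat{g}_t(x_t)
  &= \frac{z_t^+ - z_t^-}{c_t^+ + c_t^-}\, u_t
   = (u_t u_t^{\top})\,\nabla_x h
   + \tfrac{1}{2}(c_t^+ - c_t^-)\,(u_t^{\top} H u_t)\, u_t + O(c_t^2), \\
\mathbb{E}\bigl[\hat{g}_t(x_t)\bigr]
  &= \nabla_x h + O(c_t^2).
\end{align*}
```

The last line uses $\mathbb{E}[u_t u_t^{\top}] = T_t\, \mathbb{E}[a_t a_t^{\top}]\, T_t^{\top} = I$ and the fact that every third moment $\mathbb{E}[a_{t,i}\, a_{t,j}\, a_{t,k}]$ vanishes for a distribution that is symmetric under reflections about the coordinate axes. The term proportional to $c_t^+ - c_t^-$ therefore drops out in expectation even when $c_t^+ \neq c_t^-$.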
Since our goal is to apply the algorithm in an environment with obstacles, we disregard the assumption that the state space $X$ is convex. Of course, the theoretical almost sure convergence guarantee in Prop. 3.6 is lost, but we would like to show that the algorithm still works in practice with an appropriate choice of parameters. The rest of this section concentrates on the choice of $\gamma_t$, $p_t$, $c_t^-$, and $c_t^+$ with two goals in mind: first, to take the geometric constraints of the environment and the robot characteristics into account so that the generated sampling points $\{(x_t, x_t^-, x_t^+)\}$ are easy to follow and, second, to provide only a few high-level and intuitive parameters to the user of the algorithm. The parameter choice is simplified to the following two constants:
• Aggressiveness factor $r > 0$: a constant determining the size of the steps of the algorithm. Intuitively, $r$ is the number of meters by which the position of the robot is expected to change in the early iterations. The further away the source is expected to be, the larger the value of $r$ should be.
• Stability factor $s \geq 0$: a constant which allows for large steps in the early iterations of the algorithm without causing instabilities. It should be set to 5 to 10 percent of the expected number of iterations of the algorithm.
Choosing the direction vectors $a_t$
Several choices for the pdf of $a_t$ have been considered in the literature (Spall 2003, Kushner and Yin 2003, Le Ny and Pappas 2010), with a Bernoulli distribution in each coordinate being preferred in applications. We note that the Bernoulli distribution is an optimal choice only for signal fields that are aligned with the coordinate axes in such a way that their third cross-derivatives $\partial^3 h / \partial x_i \partial x_j \partial x_k$ are all zero (Theiler and Alper 2006). Since the signal field will not be axis aligned in a non-convex $X$, it is beneficial to choose $a_t$ uniformly from all possible directions. In particular, we let $p_t$ be a shell distribution, which is defined as follows: choose $a_i \sim \mathcal{N}(0,1)$ for $i = 1, \ldots, d_x$ and then rescale the vector to guarantee that its magnitude is $\sqrt{d_x}$, as required in Prop. 3.6.
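The construction of the shell distribution just described can be sketched in a few lines of Python. The function name `sample_shell` and the optional `rng` argument are illustrative choices, not part of the original text.

```python
import math
import random
from typing import List, Optional


def sample_shell(d_x: int, rng: Optional[random.Random] = None) -> List[float]:
    """Draw a_t uniformly from the sphere of radius sqrt(d_x) in R^{d_x}."""
    rng = rng or random.Random()
    while True:
        # Independent standard normal coordinates give an isotropic vector.
        g = [rng.gauss(0.0, 1.0) for _ in range(d_x)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:  # guard against the (measure-zero) zero vector
            break
    # Rescale so that the magnitude is exactly sqrt(d_x).
    scale = math.sqrt(d_x) / norm
    return [scale * v for v in g]
```

Because the normal vector is rotationally symmetric, the rescaled vector is uniform over directions. Its squared norm is exactly $d_x$, so each coordinate has unit second moment, in agreement with the requirement $\mathbb{E}[a_t a_t^{\top}] = I$ of Prop. 3.6.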
Choosing the gradient gains $\gamma_t$
The usual form used for the gradient gain coefficients in the literature (Spall 2003, Kushner and Yin 2003) is:
$$\gamma_t = \frac{\gamma}{(t + 1 + s)^{\alpha}}, \qquad t = 0, 1, \ldots, \tag{3.30}$$
where $\gamma > 0$ is a constant, $s$ is the stability factor mentioned earlier, and $\alpha > 0$ governs the decay rate for the gains and can be set to $\alpha = 0.602$ as suggested in (Spall 2003, Ch. 6).
A modification to this choice is required for our application. If the numerator $\gamma$ is constant, the gain coefficients $\gamma_t$ are monotonically decreasing, which is not desirable because the robot will take decreasing steps along the gradient and might get trapped in a location where the magnitude of the gradient estimate is small. We replace $\gamma$ with a time-varying numerator $\gamma'_t > 0$, which is inversely proportional to the magnitude of the gradient estimate. This is beneficial because when the magnitude of the gradient estimate is large, the robot takes small steps in a controlled manner towards the source, but if the magnitude of the gradient estimate decreases, the gain coefficients increase, allowing the robot to follow the gradient even if the signal field is very flat. Based on these observations we propose:
$$\gamma'_t = \frac{r (1 + s)^{\alpha}}{\dfrac{1}{w} \sum_{j = t+1-w}^{t} \dfrac{1}{d_x} \|\hat{g}_j(x_j)\|_1}, \qquad t = 0, 1, \ldots, \tag{3.31}$$
where $r$ is the aggressiveness factor and $w \in \mathbb{N}$ is a window over which the mean magnitude of the elements of $\hat{g}_t$ is averaged. The size of $w$ determines the speed at which $\gamma_t$ reacts to changes in the magnitude of the gradient estimate. We used $w = 10$ in the experiments in Sec. 3.2.4.
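A minimal Python sketch of the gain schedule (3.30) with the adaptive numerator (3.31) is given below. The class name `AdaptiveGradientGain` is hypothetical. Two details are assumptions of this sketch rather than statements from the text: during the first $w - 1$ iterations the average is taken over the estimates available so far, and the average is floored by a small constant to avoid division by zero.

```python
from collections import deque
from typing import Deque, Sequence


class AdaptiveGradientGain:
    """Gradient gain of (3.30) with the adaptive numerator of (3.31)."""

    def __init__(self, r: float, s: float, alpha: float = 0.602,
                 window: int = 10, floor: float = 1e-12) -> None:
        if r <= 0 or s < 0 or window < 1:
            raise ValueError("require r > 0, s >= 0 and window >= 1")
        self.r = r            # aggressiveness factor
        self.s = s            # stability factor
        self.alpha = alpha    # decay rate of the gains
        self.floor = floor    # assumption: lower bound on the average
        # Keeps only the last `window` mean element magnitudes.
        self._magnitudes: Deque[float] = deque(maxlen=window)

    def update(self, t: int, grad_estimate: Sequence[float]) -> float:
        """Record the gradient estimate for iteration t and return gamma_t."""
        d_x = len(grad_estimate)
        # (1 / d_x) * ||g_hat||_1: mean magnitude of the elements.
        self._magnitudes.append(sum(abs(g) for g in grad_estimate) / d_x)
        # Assumption: average over what is available while t < w - 1.
        mean_magnitude = sum(self._magnitudes) / len(self._magnitudes)
        mean_magnitude = max(mean_magnitude, self.floor)
        numerator = self.r * (1.0 + self.s) ** self.alpha / mean_magnitude
        return numerator / (t + 1.0 + self.s) ** self.alpha
```

At $t = 0$ the factors $(1 + s)^{\alpha}$ cancel, so $\gamma_0$ times the mean element magnitude of the gradient estimate equals $r$. This matches the interpretation of $r$ as the expected displacement in the early iterations.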
Choosing the sample point gains $c_t^-$ and $c_t^+$
A typical schedule used in the stochastic approximation literature (Spall 2003, Kushner and Yin 2003) for the sample point gain coefficients is:
$$c_t^- = \frac{c_0^-}{(t + 1)^{\beta}}, \qquad c_t^+ = \frac{c_0^+}{(t + 1)^{\beta}}, \qquad t = 0, 1, \ldots, \tag{3.32}$$
where $c_0^-, c_0^+ \in \mathbb{R}_{\geq 0}$ are constants and $\beta \in \mathbb{R}_{>0}$ is the gain decay rate. In practical applications, a slow decay leads to better finite sample performance and a good choice is $\beta = 0.101$ (see Spall 2003). The constants $c_0^-$ and $c_0^+$ are typically set to the standard deviation of the measurement noise $v_t$, which is estimated by measuring the signal several times at the current position of the robot. The gains $c_t^-$, $c_t^+$, and the direction vector $a_t$ affect the position of the sampling points $x_t^-$ and $x_t^+$ as specified in (3.28). While the choice in (3.32) is applicable to the obstacle-free case, in a general environment it needs to be modified to accommodate the constraints introduced by the obstacles. Let $b_t^+$ be the value for $c_t^+$ originally suggested in (3.32). Suppose that the robot is traveling from its previous estimate $x_{t-1}$ towards $x_t$. As soon as $x_t$ is in the robot's field of view $F$, we choose $x_t^+$ in $F$ to ensure that it is reachable. Alg. 8 with $x \leftarrow x_t$ and $r \leftarrow b_t^+$ shows how to sample $a_t$ and simultaneously choose $c_t^+$ with a magnitude as close to $b_t^+$ as possible.
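The obstacle-free schedule (3.32) and the noise-based initialization can be sketched as follows. The function names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption of this sketch.

```python
import statistics
from typing import Sequence, Tuple


def initial_gain_from_readings(readings: Sequence[float]) -> float:
    """Estimate c_0 as the sample standard deviation of repeated readings
    taken at a fixed robot position (assumption: at least two readings)."""
    return statistics.stdev(readings)


def sample_point_gains(t: int, c0_minus: float, c0_plus: float,
                       beta: float = 0.101) -> Tuple[float, float]:
    """Return (c_t^-, c_t^+) from the schedule (3.32), before any
    modification for obstacles."""
    decay = (t + 1) ** beta
    return c0_minus / decay, c0_plus / decay
```

With $\beta = 0.101$ the decay is slow: the gains shrink to about half of their initial values only after roughly 1000 iterations. The value $b_t^+$ passed to Alg. 8 corresponds to the second element of the returned pair.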
Algorithm 8 Rejection Sampling of $a_t$
1: Input: Position $x \in \mathbb{R}^{d_x}$ in the robot body frame, radius $r \in \mathbb{R}_{\geq 0}$, and field of view $F \subseteq X$
2: Output: Step size $c_t^+$ and direction vector $a_t$
3: Let $dS$ be a small area element on the surface of the hypersphere of radius $\sqrt{d_x}$ centered at $x$
4: count ← 1
5: repeat
6:     Sample $a_t$ from the shell distribution
7:     if $(1 - P(\{a_t \in dS\}))^{\text{count}} < 0.05$ then
8:         decrease $r$; count ← 1
9:     else
10:        count ← count + 1
11: until $r = 0$ or there is a path between $x$ and $(x + r a_t)$ in $F$
12: return $a_t$ and $c_t^+ \leftarrow r$
The shell distribution is sampled for a direction $a_t$, which selects a possible sample point $q = x_t + r a_t$. Line 11 checks whether the path from $x_t$ to $q$ is within the robot's field of view and, if so, the chosen values for $a_t$ and $c_t^+ \leftarrow r$ are returned. Otherwise, another sample for $a_t$ is chosen. Thus, the allowable values for $x_t^+ = x_t + c_t^+ a_t$ lie on the intersection of the field of view $F$ and the sphere of radius $r\sqrt{d_x}$ centered at $x_t$. Due to the obstacles, there might not be a feasible choice for $a_t$ with the specified radius $r$. When the probability of having sampled a point in any given small region $dS$ on the surface of the sphere exceeds 95% (line 7) but a suitable direction has not been chosen yet, the radius is decreased (line 8). Once the direction $a_t$ is known, $c_t^-$ is chosen as the maximum distance that can be traveled along $-a_t$ starting from $x_t$, up to $b_t^+$ or until an obstacle is reached.
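The control flow of Alg. 8 can be sketched in Python as follows. Several names are assumptions introduced for illustration: `path_is_free` is a hypothetical callback standing in for the field-of-view path check of line 11, `cell_probability` stands in for $P(\{a_t \in dS\})$, and `shrink` and `min_radius` make the unspecified "decrease $r$" step concrete by halving the radius and snapping it to zero once it becomes negligible.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def sample_shell(d_x: int, rng: random.Random) -> Vector:
    """Uniform direction on the sphere of radius sqrt(d_x)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d_x)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [math.sqrt(d_x) * v / norm for v in g]


def rejection_sample_direction(
    x: Vector,
    radius: float,
    path_is_free: Callable[[Vector, Vector], bool],  # hypothetical check
    cell_probability: float = 0.01,  # assumed value of P({a_t in dS})
    shrink: float = 0.5,             # assumed rule for "decrease r"
    min_radius: float = 1e-3,        # assumed cutoff below which r := 0
    rng: Optional[random.Random] = None,
) -> Tuple[Vector, float]:
    """Return (a_t, c_t_plus) following the loop structure of Alg. 8."""
    rng = rng or random.Random()
    count = 1
    while True:
        a = sample_shell(len(x), rng)                     # line 6
        if (1.0 - cell_probability) ** count < 0.05:      # line 7
            radius *= shrink                              # line 8
            if radius < min_radius:
                radius = 0.0
            count = 1
        else:
            count += 1                                    # line 10
        target = [xi + radius * ai for xi, ai in zip(x, a)]
        if radius == 0.0 or path_is_free(x, target):      # line 11
            return a, radius                              # line 12
```

As a usage example, `path_is_free` could wrap a ray cast against the robot's current occupancy map. With `cell_probability = 0.01`, the radius is reduced after roughly 300 consecutive rejected directions.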