Theoretical Foundations and Properties of Evolutionary Computations
B2.3
Modes of stochastic convergence
Günter Rudolph
Abstract
The purpose of this section is to introduce the notion of stochastic convergence of sequences of random variables and to present some interrelationships between various modes of stochastic convergence. Building on this foundation a precise definition of global convergence of evolutionary algorithms is given.
The term convergence is used in classical analysis to describe the limit behavior of deterministic numerical sequences. It is natural to expect that a similar concept ought to exist for random sequences. Such a concept does exist, but with one difference: since random sequences are defined on probability spaces, stochastic convergence must take the probability measure into account. As a consequence, various modes of stochastic convergence must be distinguished, depending on the manner in which the probability measure enters the definition.
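Definition B2.3.1. Let $X$ be a random variable and $(X_t : t \ge 0)$ a sequence of random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. Then $(X_t)$ is said:
(i) to converge completely to $X$, denoted as $X_t \xrightarrow{c} X$, if for any $\epsilon > 0$
$$\lim_{t \to \infty} \sum_{i=0}^{t} P\{|X_i - X| > \epsilon\} < \infty \qquad \text{(B2.3.1)}$$
(ii) to converge almost surely to $X$, denoted as $X_t \xrightarrow{\text{a.s.}} X$, if
$$P\{\lim_{t \to \infty} |X_t - X| = 0\} = 1$$
(iii) to converge in probability to $X$, denoted as $X_t \xrightarrow{P} X$, if for any $\epsilon > 0$
$$\lim_{t \to \infty} P\{|X_t - X| > \epsilon\} = 0 \qquad \text{(B2.3.2)}$$
(iv) to converge in mean to $X$, denoted as $X_t \xrightarrow{m} X$, if
$$\lim_{t \to \infty} E[|X_t - X|] = 0.$$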
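The difference between conditions (B2.3.1) and (B2.3.2) can be made concrete with a small numerical sketch. It assumes two hypothetical sequences of indicator variables with limit $X = 0$, one with $P\{X_t = 1\} = 1/t$ and one with $P\{X_t = 1\} = 1/t^2$ for $t \ge 1$; these sequences and the function names are illustrative only. In both cases the tail probabilities vanish, so both sequences converge in probability, but only the second has a finite series as required for complete convergence.

```python
import math

def tail_probability(t, power):
    """P{|X_t - 0| > eps} for an indicator X_t with P{X_t = 1} = t**(-power), 0 < eps < 1."""
    return t ** (-power)

def partial_sum(n, power):
    """Partial sum of the tail probabilities for t = 1..n, as in (B2.3.1)."""
    return sum(tail_probability(t, power) for t in range(1, n + 1))

if __name__ == "__main__":
    # The 1/t series keeps growing (roughly like log n); the 1/t^2 series levels off.
    for n in (10, 1_000, 100_000):
        print(n, tail_probability(n, 1), partial_sum(n, 1), partial_sum(n, 2))
```

The partial sums for the first sequence grow without bound, whereas those for the second stay below $\pi^2/6$.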
Theorem B2.3.1 (Lukacs 1975, pp 33–6, 51–2, Chow and Teicher 1978, pp 43–4). Let $X$ be a random variable and $(X_t : t \ge 0)$ a sequence of random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. The following implications are valid:
$$X_t \xrightarrow{c} X \;\Rightarrow\; X_t \xrightarrow{\text{a.s.}} X \;\Rightarrow\; X_t \xrightarrow{P} X \quad \text{and} \quad X_t \xrightarrow{m} X \;\Rightarrow\; X_t \xrightarrow{P} X.$$
The reverse implications are not true in general. But if $\Omega$ is countable then convergence in probability is equivalent to almost sure convergence.
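As a brief illustration of one of these implications, convergence in mean yields convergence in probability in a single step via Markov's inequality, a standard result that is not stated in this section. For any $\epsilon > 0$:

```latex
P\{|X_t - X| > \epsilon\} \;\le\; \frac{E[|X_t - X|]}{\epsilon} \;\longrightarrow\; 0
\qquad (t \to \infty).
```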
Evidently, if the probabilities in (B2.3.2) converge to zero sufficiently fast that the series in (B2.3.1) is finite, then convergence in probability implies complete convergence. But which additional conditions must be fulfilled such that some of the first three modes of convergence given in definition B2.3.1 imply convergence in mean? In other words, when may one interchange the order of taking a limit and expectation such that
$$\lim_{t \to \infty} E[X_t] = E[\lim_{t \to \infty} X_t]\,?$$
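That the interchange can fail is seen from a standard counterexample, sketched here under the assumption of a hypothetical sequence $X_t = t \cdot 1\{U < 1/t\}$ with $U$ uniform on $[0, 1)$ and $t \ge 1$ (this sequence is not taken from the text). It converges to zero in probability, since $P\{X_t > \epsilon\} = 1/t$, yet $E[X_t] = 1$ for every $t$. The Monte Carlo estimates below are only approximate.

```python
import random

def sample_x(t, u):
    """X_t = t if u < 1/t, else 0, for a uniform draw u in [0, 1)."""
    return float(t) if u < 1.0 / t else 0.0

def estimate(t, n_samples=100_000, seed=0):
    """Monte Carlo estimates of E[X_t] and P{X_t > 0.5}."""
    rng = random.Random(seed)
    draws = [sample_x(t, rng.random()) for _ in range(n_samples)]
    mean = sum(draws) / n_samples
    prob = sum(1 for x in draws if x > 0.5) / n_samples
    return mean, prob

if __name__ == "__main__":
    # The estimated mean stays near 1 while the tail probability shrinks like 1/t.
    for t in (1, 10, 100, 1000):
        mean, prob = estimate(t)
        print(f"t={t:5d}  E[X_t]~{mean:.3f}  P{{X_t > 0.5}}~{prob:.4f}")
```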
To answer the question one has to introduce the notion of uniform integrability of random variables.
Definition B2.3.2. A collection of random variables $(X_t : t \ge 0)$ is called uniformly integrable if
$$\sup\{E[|X_t|] : t \ge 0\} < \infty$$
and for every $\epsilon > 0$ there exists a $\delta > 0$ such that $P\{A_t\} < \delta$ implies $|E[X_t \cdot 1_{A_t}]| < \epsilon$ for every $t \ge 0$.
Now the following result is provable.
Theorem B2.3.2 (Chow and Teicher 1978, p 100). A sequence of random variables converges in mean if and only if the sequence is uniformly integrable and converges in probability.
Since the defining condition of uniform integrability is rather unwieldy, sufficient but simpler conditions are often useful.
Theorem B2.3.3 (Williams 1991, pp 127–8). Let $Y$ be a nonnegative random variable and $(X_t : t \ge 0)$ be a collection of random variables on a joint probability space. If $|X_t| < Y$ for all $t \ge 0$ and $E[Y] < \infty$ then the random variables $(X_t : t \ge 0)$ are uniformly integrable.
Evidently, the above result remains valid if the random variable $Y$ is replaced by some nonnegative finite constant. Another useful convergence condition is given below.
Theorem B2.3.4 (Chow and Teicher 1978, pp 98–9). If $(X_t : t \ge 0)$ are random variables with $E[|X_t|] < \infty$ and
$$\lim_{t \to \infty} \sup_{s > t} E[|X_s - X_t|] = 0$$
then there exists a random variable $X$ with $E[|X|] < \infty$ such that $X_t \xrightarrow{m} X$, and conversely.
The last mode of stochastic convergence considered here is related to convergence of distribution functions.
Definition B2.3.3. Let $\{F_X(x), F_{X_t}(x) : t \ge 0\}$ be a collection of distribution functions of random variables $X$ and $(X_t : t \ge 0)$ on a probability space $(\Omega, \mathcal{A}, P)$. If
$$\lim_{t \to \infty} F_{X_t}(x) = F_X(x)$$
for every continuity point $x$ of $F_X(\cdot)$, then the sequence $F_{X_t}$ is said to converge weakly to $F_X$, denoted as $F_{X_t} \xrightarrow{w} F_X$. In such an event, the sequence of random variables $(X_t : t \ge 0)$ is said to converge in distribution to $X$, denoted as $X_t \xrightarrow{d} X$.
This concept has a simple relationship to convergence in probability.
Theorem B2.3.5 (Lukacs 1975, pp 33, 38). Let $X$ and $(X_t : t \ge 0)$ be random variables on a joint probability space. Then $X_t \xrightarrow{P} X \Rightarrow X_t \xrightarrow{d} X$. Conversely, if $X_t \xrightarrow{d} X$ and $F_X$ is degenerate (i.e. $X$ is a constant) then $X_t \xrightarrow{P} X$.
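That a degenerate limit is needed for the converse can be seen from a small example, given here only as an illustration under the assumption of a hypothetical fair coin variable $X$. The sequence $X_t = 1 - X$ has the same distribution as $X$ for every $t$, so it converges in distribution to $X$, but it never comes close to $X$ itself:

```latex
P\{X = 0\} = P\{X = 1\} = \tfrac{1}{2}, \qquad X_t = 1 - X \quad (t \ge 0)
\;\Longrightarrow\;
F_{X_t} = F_X \ \text{ and } \ P\{|X_t - X| > \epsilon\} = 1 \ \text{ for all } 0 < \epsilon < 1 .
```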
After these preparatory statements one is in the position to establish the connection between stochastic convergence of random variables and the term global convergence of evolutionary algorithms. For this purpose let $A_x$ be the object variable space of the optimization problem (see section B2.1)
$$\min\{f(x) : x \in A_x\} \quad \text{or} \quad \max\{f(x) : x \in A_x\}$$
where $f : A_x \to \mathbb{R}$ is the objective function. An individual is an element of the space $I = A_x \times A_s$ where $A_s$ is the (possibly empty) space of strategy parameters. Thus, the population $P_t$ of individuals at generation $t \ge 0$ of some evolutionary algorithm is an element of the product space $I^\mu$ where $\mu$ is the size of the parent population. Since the genetic operators are stochastic, the sequence $(P_t : t \ge 0)$ generated by some evolutionary algorithm (EA) is a stochastic trajectory through the space $I^\mu$. The behavior of this trajectory, even in the limit $t \to \infty$, may be very complicated in general. For the purpose of optimization, however, one is less interested in the trajectory itself than in whether the sequence of populations contains admissible solutions of the optimization problem that become successively better and, ideally, are globally optimal in the end. Therefore it suffices to observe the behavior of the trajectory of the best solution contained in the populations $(P_t : t \ge 0)$. For this purpose let
$b : I^\mu \to A_x$ be a map that extracts the best solution represented by some individual of a population. Thus, the stochastic sequence $(B_t : t \ge 0)$ with $B_t = b(P_t)$ is a trajectory through the space $A_x$. But even this stochastic sequence generally exhibits too complex a behavior to formulate a simple definition of global convergence. For example, it may oscillate between globally optimal solutions, and much more complex dynamics are imaginable. To avoid these difficulties one could restrict the observations to the behavior of the sequence $(f(B_t) : t \ge 0)$ of the best objective function values contained in a population. For
this purpose set $X_t = |f(b(P_t)) - f^*|$ where $f^*$ is the global minimum or maximum of the optimization problems above. Provided that the sequence of random variables $(X_t : t \ge 0)$ converges in some mode to zero, one can be sure that the population $P_t$ will contain better and better solutions of the optimization problem for increasing $t$. Therefore it appears reasonable to agree upon the following convention.
Definition B2.3.4. Let $(P_t : t \ge 0)$ be the stochastic sequence of populations generated by some evolutionary algorithm. The EA is said to converge completely (almost surely, in probability, in mean, in distribution) to the global optimum if the sequence $(X_t : t \ge 0)$ with $X_t = |f(b(P_t)) - f^*|$ converges completely (almost surely, in probability, in mean, in distribution) to zero.
There are some immediate conclusions. For example, if one can show that some EA converges in distribution to the global optimum, theorem B2.3.5 ensures that the EA is globally convergent in probability, because the limit zero is a constant. Moreover, if it is known that $f$ is bounded, i.e. $\sup\{|f(x)| : x \in A_x\} < \infty$, then the $X_t$ are bounded by a finite constant and one may conclude, owing to theorems B2.3.2 and B2.3.3, that the EA converges in mean to the global optimum as well.
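Definition B2.3.4 can be explored empirically with a minimal sketch. It assumes a hypothetical elitist steady-state EA on bit strings with the OneMax objective, so that $f^*$ equals the string length; the algorithm, its parameters and all function names are illustrative choices, not a method prescribed in this section. The sketch records $X_t = |f(b(P_t)) - f^*|$ over independent runs and estimates $P\{X_t > \epsilon\}$ as in (B2.3.2). Such an estimate can only suggest convergence in probability; it does not prove it.

```python
import random

def f(x):
    """Objective: number of ones (OneMax), to be maximized; f* = len(x)."""
    return sum(x)

def b(population):
    """Map b: extract the best individual of a population."""
    return max(population, key=f)

def mutate(x, rng):
    """Flip each bit independently with probability 1/len(x)."""
    rate = 1.0 / len(x)
    return [bit ^ 1 if rng.random() < rate else bit for bit in x]

def run_ea(n_bits, mu, generations, rng):
    """Return the trajectory (X_t : t = 0..generations) of one elitist EA run."""
    f_star = n_bits
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(mu)]
    trajectory = [abs(f(b(population)) - f_star)]
    for _ in range(generations):
        child = mutate(rng.choice(population), rng)
        worst = min(range(mu), key=lambda i: f(population[i]))
        if f(child) >= f(population[worst]):  # elitist replacement of the worst
            population[worst] = child
        trajectory.append(abs(f(b(population)) - f_star))
    return trajectory

def estimate_tail(n_runs=200, n_bits=10, mu=3, generations=500, eps=0.5, seed=1):
    """Estimate P{X_t > eps} at each t as a fraction of independent runs."""
    rng = random.Random(seed)
    runs = [run_ea(n_bits, mu, generations, rng) for _ in range(n_runs)]
    return [sum(1 for r in runs if r[t] > eps) / n_runs
            for t in range(generations + 1)]

if __name__ == "__main__":
    tail = estimate_tail()
    for t in (0, 50, 100, 250, 500):
        print(f"t={t:3d}  estimated P{{X_t > 0.5}} = {tail[t]:.3f}")
```

Because the replacement rule never discards the current best individual, each recorded trajectory is nonincreasing, which is a property of this particular sketch rather than of evolutionary algorithms in general.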
Finally, it should be remarked that the probabilistic behavior of the sequence of populations can be modeled as a stochastic process; in fact, in most cases these stochastic processes are Markov chains (see sections B2.2 and B2.2.2). Then the state space of the processes is not necessarily the product space $I^\mu$, because the order of the individuals within a population is of no importance. However, this does not affect the general concept given above: only the actual implementation of the map $b(\cdot)$ has to be adjusted before convergence properties of evolutionary algorithms can be derived (see section B2.2.5).
References
Chow Y S and Teicher H 1978 Probability Theory (New York: Springer)
Lukacs E 1975 Stochastic Convergence 2nd edn (New York: Academic)