
From (4.1.28) and (4.1.31) by the local Lipschitz continuity of it follows that

Remark 4.2.2 Results in Sections 4.1 and 4.2 are proved for the case where the two-sided randomized differences

are used, where and are given by (4.1.3) and (4.1.4), respectively. However, all results presented in Sections 4.1 and 4.2 are also valid for the case where the one-sided randomized differences

are used, where and are given by (4.1.3) and (4.1.6), respectively.

In this case, in (4.1.27), (4.1.28) and in the expression of should be replaced by 1, and (4.1.29)–(4.1.32) disappear. Accordingly, (4.1.36) changes to

Theorems 4.1.1–4.1.4 and 4.2.1 remain unchanged. The conclusion of Theorem 4.2.2 also remains valid if in Condition iv)

changes to

4.3. Global Optimization

As pointed out at the beginning of the chapter, the KW algorithm may lead to a local minimizer of L. Before the 1980s, random search, alone or combined with a local search method, was the main stochastic approach to reaching the global minimum when the values of L can be observed exactly, without noise. When the structural property of L is used for local search, a rather rapid convergence rate can be derived, but it is hard to escape a local attraction domain. The random search has a chance to fall into any attraction domain, but its convergence rate decreases exponentially as the dimension of the problem increases.
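The dimension effect can be illustrated by a rough count. This is a simplified sketch under assumptions not made in the text: samples are drawn uniformly from the unit cube $[0,1]^d$, and the attraction domain of the global minimizer is taken to be a cube of side $\varepsilon$.

```latex
P(\text{a single sample falls into the target cube}) = \varepsilon^{d},
\qquad
E[\text{number of samples until the first hit}] = \varepsilon^{-d}.
% Example with \varepsilon = 0.1:
%   d = 2  gives 10^{2} samples on average,
%   d = 10 gives 10^{10} samples on average.
```

Under these assumptions the expected effort grows exponentially in $d$ for fixed $\varepsilon$.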

Simulated annealing is an attractive method for global optimization, but it provides only convergence in probability rather than path-wise convergence. Moreover, simulation shows that for functions with a few local minima, simulated annealing is not efficient. This motivates one to combine a KW-type method with random search. However, a simple combination of SA and random search does not work: in order to reach the global minimum, one has to reduce the noise effect as time goes on.

A hybrid algorithm composed of a search method and the KW algorithm is presented in the sequel, with the main effort devoted to designing easily realizable switching rules and to providing an effective noise-reducing method.

We define a global optimization algorithm consisting of three parts: search, selection, and optimization. To be specific, let us discuss the global minimization problem. In the search part, we choose an initial value and carry out a local search using the KW algorithm with randomized differences and expanding truncations described in Section 4.1 to approach the bottom of the local attraction domain. At the same time, the average of the observations of L serves as an estimate of the local minimum of L in this attraction domain. In the selection part, the estimates obtained for the local minima of L are compared with each other, and the smallest one is selected together with the corresponding minimizer given by the KW algorithm. Then the optimization part takes place, where the local search is carried out again, i.e., the KW algorithm without any truncations is applied to improve the estimate of the minimizer. At the same time, the corresponding minimum of L is re-estimated by averaging the noisy observations. After this, the algorithm returns to the search part.
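The averaging used to estimate a local minimum can be written recursively. The notation below is hypothetical ($y_i$ for the $i$-th noisy observation, $\bar{L}_n$ for the running average, $\xi_i$ for the noise), and the independence assumption is made only for this illustration; the exact update used in the text is the one given in step (GO2).

```latex
\bar{L}_{n} = \frac{1}{n}\sum_{i=1}^{n} y_i
\quad\Longleftrightarrow\quad
\bar{L}_{n} = \bar{L}_{n-1} - \frac{1}{n}\bigl(\bar{L}_{n-1} - y_{n}\bigr),
\qquad \bar{L}_{0} = 0 .
% If y_i = L(x_i) + \xi_i with i.i.d. zero-mean noise of variance \sigma^2, then
\operatorname{Var}\Bigl(\frac{1}{n}\sum_{i=1}^{n}\xi_i\Bigr) = \frac{\sigma^{2}}{n}.
```

Under these assumptions, the noise contribution to the estimate shrinks as more observations are averaged, which is what makes comparing estimates from different attraction domains meaningful.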

For the local search, we use observations (4.1.3) and (4.1.4), or (4.1.5) and (4.1.6). To be specific, let us use (4.1.5) and (4.1.6).

In the sequel, by KW algorithm with expanding truncations we mean the algorithm defined by (4.1.11) and (4.1.12) with

where and are given by (4.1.5) and (4.1.6), respectively. Similarly to (4.1.9) and (4.1.10), we have

where

By KW algorithm we mean

with defined by (4.3.2).

It is worth noting that unlike (4.1.8), is used in (4.3.1). Roughly speaking, this is because in the neighborhood of a minimizer of is increasing, and in (4.1.11) should be an observation on
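A minimal sketch of a KW step with one-sided randomized differences is given below. It is an illustration under assumptions, not the text's definition: the function names, the ±1 perturbation, and the sign convention (the estimate targets the gradient and the step subtracts it) are hypothetical choices, and the displayed formulas (4.3.1) and (4.3.2) may differ in form.

```python
import itertools
import numpy as np

def one_sided_rd_estimate(f, x, c, delta):
    """Gradient estimate from two evaluations, f(x + c*delta) and f(x).

    delta is a vector with entries +1 or -1 (assumed perturbation law);
    the division by delta is componentwise.
    """
    return (f(x + c * delta) - f(x)) / (c * delta)

def kw_step(f, x, a, c, rng):
    """One descent step x - a * (gradient estimate) with a random direction."""
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    return x - a * one_sided_rd_estimate(f, x, c, delta)

# Check on a linear function: averaging the estimate over all sign
# patterns recovers the true gradient b, because cross terms cancel.
b = np.array([1.0, -2.0, 0.5])
f_lin = lambda x: float(b @ x)
x0 = np.zeros(3)
estimates = [one_sided_rd_estimate(f_lin, x0, 0.1, np.array(s))
             for s in itertools.product([-1.0, 1.0], repeat=3)]
mean_estimate = np.mean(estimates, axis=0)
```

Only two function evaluations are needed per step regardless of the dimension, which is the practical appeal of randomized differences.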

In order to define switching rules, we have to introduce integer-valued and increasing functions and such that

and

Define

In the sequel, by the search period we mean the part of the algorithm starting from the test of selecting the initial value up to the next selection of the initial value. At the end of the search period, we are given and being the estimates for the global minimizer and the minimum of L, respectively. Variables such as

and etc. in the search period are equipped with superscript etc.

The global optimization algorithm is defined by the following five steps.

(GO1) Starting from at the search period, the initial value

is chosen according to a given rule (deterministic or random), and then is calculated by the KW algorithm with expanding truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which , step sizes and and used for truncation are defined as follows:

where c > 0 and are fixed constants, and are two sequences of positive real numbers increasingly diverging to infinity.

(GO2) Set the initial estimate

for

and update the

estimate for by

where is the noise when observing After steps, is obtained.

(GO3) Let be a given sequence of real numbers such that

and as Set For if

then set Otherwise, keep unchanged.

(GO4) Improve to by the KW algorithm with expanding

truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which

where in (4.1.11) and (4.1.12) may be an arbitrary sequence of numbers increasingly diverging to infinity, and

At the same time, update the estimate for by

where is the noise when observing At the end of this step, and are derived.

(GO5) Go back to (GO1) for the search period.
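The five steps can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the algorithm as defined in (GO1)-(GO5): the test objective, the step sizes `a0/(i + k)` and `c0/(i + k)**delta`, the period lengths, the threshold `lam`, the truncation bounds, and the rule for initial values are all hypothetical choices, and the same routine is reused for the search and the improvement step for brevity.

```python
import numpy as np

def L(x):
    # Hypothetical objective: local minimum near 0.65, global minimum near -0.75.
    return x**4 - x**2 + 0.2 * x

def kw_run(x0, k, n_steps, rng, a0=0.2, c0=0.5, delta=0.25, noise_std=0.1):
    """Local search from x0 in period k; returns (final point, averaged noisy value)."""
    x, sigma, l_est = x0, 0, 0.0
    for i in range(1, n_steps + 1):
        a = a0 / (i + k)             # assumed step size, shrinking in both i and k
        c = c0 / (i + k) ** delta    # assumed difference width, shrinking in both i and k
        d = rng.choice([-1.0, 1.0])  # random perturbation direction
        y0 = L(x) + noise_std * rng.standard_normal()
        y_plus = L(x + c * d) + noise_std * rng.standard_normal()
        grad_est = (y_plus - y0) / (c * d)   # one-sided randomized difference
        x_new = x - a * grad_est
        if abs(x_new) > 2.0 + sigma:         # expanding truncation: reset, enlarge bound
            x_new, sigma = x0, sigma + 1
        x = x_new
        l_est += (y0 - l_est) / i            # running average of noisy observations
    return x, l_est

def hybrid_global_search(n_periods=8, seed=0):
    rng = np.random.default_rng(seed)
    starts = [-1.0, 1.0, -0.5, 0.5]   # assumed deterministic rule for initial values
    x_best, l_best = None, None
    for k in range(1, n_periods + 1):
        n_steps = 200 + 50 * k        # assumed period length, growing with k
        # Search (GO1, GO2): local search plus averaged estimate of the local minimum.
        x_k, l_k = kw_run(starts[(k - 1) % len(starts)], k, n_steps, rng)
        # Selection (GO3): switch only if the new estimate is smaller by a margin.
        lam = 0.1 / k
        if x_best is None or l_k + lam < l_best:
            x_best, l_best = x_k, l_k
        # Optimization (GO4): refine the selected point and re-estimate its value.
        x_best, l_best = kw_run(x_best, k, n_steps, rng)
        # (GO5): the loop returns to the search part for the next period.
    return x_best, l_best

x_star, l_star = hybrid_global_search()
```

In this sketch the period index `k` enters the denominators of both step sizes, so later periods take smaller, less noisy steps; the margin `lam` keeps small fluctuations of the averaged estimates from triggering a switch.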

We note that for the search period is added to and (see (4.3.7) and (4.3.8)). The purpose of this is to diminish the effect of the observation noise as increases. Therefore, and both tend to zero, not only as but also as The following example shows that adding an increasing to the denominators of

and is necessary.

Example 4.3.1 Let

It is clear that the global minimizer is and are two local minima. Furthermore, and are attraction domains for –1 and +1, respectively.

Since is linear, for local search we apply the ordinary KW algorithm without truncation

Here, no randomized differences are introduced, because this is a one-dimensional problem.

Assume

where

and and are mutually independent and both are sequences of iid random variables with

Let us start from (GO1) and take

(not tending to infinity),

If then, by noticing one of and

must belong to Elementary calculation shows that

Paying attention to (4.3.13), we see

and

i.e.,

This means that is located in one of the attraction domains and Furthermore, by (4.3.12) and (4.3.13), the observations carried out in these domains are free of noise. Let us consider the further development of the algorithm once has fallen into the interval or To be specific, let us assume

For we have

or which implies

If say, then since

It suffices to consider the case where i.e., because for the case we again have (4.3.14) and

Simple computation shows that starting from the observations are free of noise, and the algorithm becomes

As a result of computation, we have

Then, starting from the algorithm will be iterated according to (4.3.14), and hence

For the case it can similarly be shown that

Therefore, whatever initial value is chosen, will never converge to the global minimizer if in (GO1) does not diverge to infinity.

Let us introduce conditions to be used.

Since we are seeking global minima of L, Condition A4.1.2’ should be modified.

A4.3.1 is locally Lipschitz continuous,

and L(J) is nowhere dense, where the set of

extremes of L.

Note that, for seeking minima of L, the corresponding part of A4.1.2’ should be modified as follows: used in (4.1.11) is such that

A4.3.2

A4.3.3 For any convergent subsequence of

where denotes given by (4.3.3) with replaced by denotes used for the search period, and

A4.3.4 For any convergent subsequence

where is given by (1.3.2).

It is worth emphasizing that each in the sequence is used only once when we form and

We now give sufficient conditions for A4.1.2, A4.3.3, and A4.3.4. For this, we first need to define generated by estimates and derived up to the current time. Precisely, for running in the search period of Step (GO1) define
