Chapter 2_handout.pdf

Academic year: 2020

CHAPTER 2

DIRECT METHODS FOR STOCHASTIC SEARCH

• Organization of chapter in ISSO
– Introductory material
– Random search methods
  • Attributes of random search
  • Blind random search (algorithm A)
  • Two localized random search methods (algorithms B and C)
– Random search with noisy measurements
– Nonlinear simplex (Nelder-Mead) algorithm
  • Noise-free and noisy measurements

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Some Attributes of Direct Random Search with Noise-Free Loss Measurements

• Three random search algorithms discussed in ISSO (algorithm A, blind random search, and algorithms B and C) share desirable attributes:
• Ease of programming
• Use of only $L$ values (vs. gradient values)
– Avoid “artful contrivance” of more complex methods
• Reasonable computational efficiency
• Generality
– Algorithms apply to virtually any function
• Theoretical foundation


Formal Convergence of Random Search Algorithms

• Well-known results on convergence of random search
– Applies to convergence of $\theta$ and/or $L$ values
– Applies when noise-free $L$ measurements used in algorithms
• Algorithm A (blind random search) converges under very general conditions
– Applies to continuous or discrete functions
• Conditions for convergence of algorithms B and C somewhat more restrictive, but still quite general
– ISSO presents theorem for continuous functions
– Other convergence results exist
• Convergence rate theory also exists: how fast to converge?
– Algorithm A generally slow in high-dimensional problems


Algorithm A: Simple (“Blind”) Random Search

Step 0 (initialization) Choose an initial value of $\theta = \hat{\theta}_0$ inside of $\Theta$. Set $k = 0$.

Step 1 (candidate value) Generate a new independent value $\theta_{\text{new}}(k+1) \in \Theta$, according to the chosen probability distribution. If $L(\theta_{\text{new}}(k+1)) < L(\hat{\theta}_k)$, set $\hat{\theta}_{k+1} = \theta_{\text{new}}(k+1)$. Else take $\hat{\theta}_{k+1} = \hat{\theta}_k$.

Step 2 (return or stop) Stop if maximum number of $L$ evaluations has been reached or user is otherwise satisfied with the current estimate for $\theta$; else, return to step 1 with the new $k$ set to the former $k+1$.
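The steps of algorithm A can be sketched in a few lines of Python. This is a minimal illustration rather than the ISSO implementation: it assumes a box-shaped domain, uniform sampling over that box as the "chosen probability distribution", and a fixed budget of loss evaluations as the stopping rule. The function names, the `seed` parameter, and the quadratic test loss are hypothetical choices made for the example.

```python
import random


def blind_random_search(loss, lower, upper, theta0, n_evals, seed=0):
    """Sketch of algorithm A on a box domain (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    theta_hat = list(theta0)          # Step 0: initial estimate inside the domain
    best = loss(theta_hat)            # counts as the first loss evaluation
    for _ in range(n_evals - 1):
        # Step 1: candidate drawn independently of the current estimate
        cand = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        val = loss(cand)
        if val < best:                # keep the candidate only if it lowers the loss
            theta_hat, best = cand, val
    # Step 2: stop once the evaluation budget is spent
    return theta_hat, best


def quadratic(theta):
    """Illustrative loss: sum of squared components."""
    return sum(t * t for t in theta)


theta_hat, best = blind_random_search(
    quadratic, [1.0, 1.0], [3.0, 3.0], [2.0, 2.0], n_evals=1000
)
print(theta_hat, best)
```

Because candidates ignore the current estimate, the only state carried between iterations is the best point seen so far, which is what makes the method easy to program and also slow when the dimension is high.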


First Several Iterations of Algorithm A on Problem with Constraints and Quadratic Loss Function (Example 2.1 in ISSO)

| Iteration $k$ | $\theta_{\text{new}}(k)^T$ | $L(\theta_{\text{new}}(k))$ | $\hat{\theta}_k^T$ | $L(\hat{\theta}_k)$ |
|---|---|---|---|---|
| 0 | - | - | [2.00, 2.00] | 8.00 |
| 1 | [2.25, 1.62] | 7.69 | [2.25, 1.62] | 7.69 |
| 2 | [2.81, 2.58] | 14.55 | [2.25, 1.62] | 7.69 |
| 3 | [1.93, 1.19] | 5.14 | [1.93, 1.19] | 5.14 |
| 4 | [2.60, 1.92] | 10.45 | [1.93, 1.19] | 5.14 |
| 5 | [2.23, 2.58] | 11.63 | [1.93, 1.19] | 5.14 |
| 6 | [1.34, 1.76] | 4.89 | [1.34, 1.76] | 4.89 |

• Simple quadratic loss function $L(\theta) = \theta^T\theta$ on domain $\Theta = [1,3] \times [1,3]$
– Unique value $\theta^* = [1,1]^T$ with $L(\theta^*) = 2.0$
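As a quick arithmetic check on the first rows of the table, assuming the quadratic loss is $L(\theta) = \theta^T\theta$ (which matches the tabulated values), the initial point and the first candidate evaluate as follows:

$$L([2.00, 2.00]^T) = 2.00^2 + 2.00^2 = 8.00, \qquad L([2.25, 1.62]^T) = 2.25^2 + 1.62^2 = 5.0625 + 2.6244 = 7.6869 \approx 7.69$$

Since 7.69 < 8.00, the candidate is accepted at $k = 1$. The $k = 2$ candidate has loss 14.55 > 7.69, so it is rejected and the estimate stays at [2.25, 1.62].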

Global Convergence of Algorithm A

• Theorem 2.1 (Sect. 2.2 of ISSO) shows almost sure (a.s.) convergence of algorithm A to $\theta^*$ under three key conditions
– Theorem uses concept of infimum (inf) of a function: greatest lower bound on specified domain


Functions for Convergence and Nonconvergence of Algorithm A (Blind Random Search)

• Functions that do ((a) and (b) below) or do not ((c) below) satisfy condition (2.2) of Theorem 2.1:

(a) Continuous $L(\theta)$; probability density for $\theta_{\text{new}}$ is > 0 on $\Theta = [0, \infty)$

(b) Discrete $L(\theta)$; discrete sampling for $\theta_{\text{new}}$ with $P(\theta_{\text{new}} = i) > 0$ for $i = 0, 1, 2, \ldots$

(c) Noncontinuous $L(\theta)$; probability density for $\theta_{\text{new}}$ is > 0 on $\Theta = [0, \infty)$


Algorithm B: Localized Random Search

Step 0 (initialization) Choose an initial value of $\theta = \hat{\theta}_0$ inside of $\Theta$. Set $k = 0$.

Step 1 (candidate value) Generate a random $d_k$. Check if $\hat{\theta}_k + d_k \in \Theta$. If not, generate new $d_k$ or move $\hat{\theta}_k + d_k$ to nearest valid point. Let $\theta_{\text{new}}(k+1)$ be $\hat{\theta}_k + d_k \in \Theta$ or the modified point.

Step 2 (check for improvement) If $L(\theta_{\text{new}}(k+1)) < L(\hat{\theta}_k)$, set $\hat{\theta}_{k+1} = \theta_{\text{new}}(k+1)$. Else take $\hat{\theta}_{k+1} = \hat{\theta}_k$.

Step 3 (return or stop) Stop if maximum number of $L$ evaluations has been reached or if user satisfied with current estimate; else, return to step 1 with new $k$ set to former $k+1$.
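A minimal Python sketch of algorithm B follows. It assumes a box-shaped domain, a Gaussian perturbation with independent components of common standard deviation, and the "move to nearest valid point" option for constraint handling (for a box this is componentwise clipping). The names `localized_random_search`, `rho`, `seed`, and the quadratic test loss are illustrative assumptions, not taken from ISSO.

```python
import random


def localized_random_search(loss, lower, upper, theta0, n_evals, rho=0.5, seed=0):
    """Sketch of algorithm B on a box domain with Gaussian perturbations (assumed)."""
    rng = random.Random(seed)
    theta_hat = list(theta0)          # Step 0: initial estimate inside the domain
    best = loss(theta_hat)
    for _ in range(n_evals - 1):
        # Step 1: perturb the current estimate by a random deviation vector d_k
        cand = [t + rng.gauss(0.0, rho) for t in theta_hat]
        # Constraint handling: project onto the box, i.e. nearest valid point
        cand = [min(max(c, lo), hi) for c, lo, hi in zip(cand, lower, upper)]
        # Step 2: accept only if the loss strictly improves
        val = loss(cand)
        if val < best:
            theta_hat, best = cand, val
    # Step 3: stop once the evaluation budget is spent
    return theta_hat, best


def quadratic(theta):
    """Illustrative loss: sum of squared components."""
    return sum(t * t for t in theta)


theta_hat, best = localized_random_search(
    quadratic, [1.0, 1.0], [3.0, 3.0], [2.0, 2.0], n_evals=1000
)
print(theta_hat, best)
```

The only difference from blind random search is that the candidate is centered on the current estimate, so the value of `rho` has to be tuned to the scale of the problem.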


Comments on Algorithm B

• Algorithm B useful in many practical problems: easy to apply with reasonable efficiency when $p > 1$ (even $p \gg 1$)
• Relative to algorithm A, search in algorithm B more localized in neighborhood of current estimate
– Better exploitation of information acquired about shape of $L(\theta)$
– “Localized” terminology not to be confused with global vs. local algorithms discussed in Chapter 1
• Algorithm B finds global optimum (“in probability”) per Theorem 2.2 in ISSO
• User free to set distribution of deviation vector $d_k$, although $N(0, \rho^2 I_p)$ is most common in continuous problems
– Distribution should have mean zero and each component should have variation (e.g., standard deviation) consistent with magnitudes of corresponding components in $\theta$
• Often better if variability of $d_k$ reduced as $k$ increases

Algorithm C: Enhanced Localized Random Search

• Similar to algorithm B
• Exploits knowledge of good/bad directions
• If move in one direction produces decrease in loss, add bias to next iteration to continue algorithm moving in “good” direction
• If move in one direction produces increase in loss, add bias to next iteration to move algorithm in opposite way
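The bias idea can be sketched as follows. The slides describe algorithm C only qualitatively, so everything specific here is an assumption made for illustration: the bias vector `bias`, the update weights 0.2, 0.4, and 0.5, the choice to try the opposite direction before giving up on a step, the box domain with clipping, and the function and parameter names.

```python
import random


def enhanced_localized_search(loss, lower, upper, theta0, n_evals, rho=0.5, seed=0):
    """Sketch of a bias-augmented localized search (all coefficients are assumed)."""
    rng = random.Random(seed)

    def clip(v):
        # Move a point to the nearest point of the box domain
        return [min(max(x, lo), hi) for x, lo, hi in zip(v, lower, upper)]

    theta_hat = list(theta0)
    best = loss(theta_hat)
    bias = [0.0] * len(theta_hat)
    evals = 1
    while evals < n_evals:
        d = [rng.gauss(0.0, rho) for _ in theta_hat]
        # Try the biased move in the sampled direction
        fwd = clip([t + b + di for t, b, di in zip(theta_hat, bias, d)])
        val = loss(fwd)
        evals += 1
        if val < best:
            theta_hat, best = fwd, val
            # Loss decreased: bias the next step toward this "good" direction
            bias = [0.2 * b + 0.4 * di for b, di in zip(bias, d)]
            continue
        if evals >= n_evals:
            break
        # Loss did not decrease: try the opposite direction
        back = clip([t + b - di for t, b, di in zip(theta_hat, bias, d)])
        val = loss(back)
        evals += 1
        if val < best:
            theta_hat, best = back, val
            bias = [b - 0.4 * di for b, di in zip(bias, d)]
        else:
            # Neither direction helped: shrink the bias
            bias = [0.5 * b for b in bias]
    return theta_hat, best


def quadratic(theta):
    """Illustrative loss: sum of squared components."""
    return sum(t * t for t in theta)


theta_hat, best = enhanced_localized_search(
    quadratic, [1.0, 1.0], [3.0, 3.0], [2.0, 2.0], n_evals=1000
)
print(theta_hat, best)
```

Compared with the unbiased localized search, each iteration may cost two loss evaluations, so comparisons with algorithms A and B should be made on a common number of loss evaluations rather than iterations.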


Examples 2.3 and 2.4 in ISSO: Comparison of Algorithms A, B, and C

• Relatively simple $p = 2$ problem used elsewhere (Styblinski and Tang, 1990) to test simulated annealing algorithms

– Quartic loss function (plot on next slide)

• One global solution; several local minima/maxima

• Started all algorithms at common initial condition and compared based on common number of loss evaluations

– Algorithm A needed no tuning

– Algorithms B and C required “trial runs” to tune algorithm coefficients


Examples 2.3 and 2.4 in ISSO (cont’d): Sample Means of Terminal Values $L(\hat{\theta}_k) - L(\theta^*)$ in Multimodal Loss Function (with Approximate 95% Confidence Intervals)

Notes:

Each sample mean is from 40 independent runs of relevant algorithm

Confidence intervals for algorithms B and C overlap slightly since 0.51 < 0.67

Examples 2.3 and 2.4 in ISSO (cont’d): Typical Adjusted Loss Values ($L(\hat{\theta}_k) - L(\theta^*)$) and $\theta$ Estimates in Multimodal Loss Function (One Typical Run)


Random Search Algorithms with Noisy Loss Function Measurements

• Basic implementation of random search above assumes perfect (noise-free) values of $L$
• Some applications require use of noisy measurements: $y(\theta) = L(\theta) + \text{noise}$
• Simplest modification is to form average of $y$ values at each iteration as approximation to $L$
• Alternative modification is to set threshold $\tau > 0$ for improvement before new value is accepted in algorithm
• Thresholding in algorithm B with modified step 2:

Step 2 (modified) If $y(\theta_{\text{new}}(k+1)) < y(\hat{\theta}_k) - \tau$, set $\hat{\theta}_{k+1} = \theta_{\text{new}}(k+1)$. Else take $\hat{\theta}_{k+1} = \hat{\theta}_k$.

• Very limited convergence theory with noisy measurements
– In fact, random search generally nonconvergent with noisy loss measurements
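Both modifications (averaging and thresholding) can be combined in one short sketch of a localized search. This is an illustration under stated assumptions, not the ISSO code: the search is unconstrained, the perturbation is Gaussian, and the names `noisy_localized_search`, `n_avg`, `tau`, `rho`, and the noisy quadratic measurement function are hypothetical.

```python
import random


def noisy_localized_search(y, theta0, n_iters, rho=0.3, n_avg=1, tau=0.0, seed=0):
    """Localized random search using noisy measurements y(theta) (a sketch)."""
    rng = random.Random(seed)

    def y_bar(theta):
        # Averaging: mean of n_avg noisy measurements approximates L(theta)
        return sum(y(theta) for _ in range(n_avg)) / n_avg

    theta_hat = list(theta0)
    y_hat = y_bar(theta_hat)
    for _ in range(n_iters):
        cand = [t + rng.gauss(0.0, rho) for t in theta_hat]
        y_cand = y_bar(cand)
        # Thresholding: require an improvement of more than tau before accepting
        if y_cand < y_hat - tau:
            theta_hat, y_hat = cand, y_cand
    return theta_hat


# Hypothetical measurement model: quadratic loss plus Gaussian noise
noise_rng = random.Random(1)


def noisy_quadratic(theta):
    return sum(t * t for t in theta) + noise_rng.gauss(0.0, 0.1)


theta_hat = noisy_localized_search(
    noisy_quadratic, [2.0, 2.0], n_iters=500, n_avg=10, tau=0.05
)
print(theta_hat)
```

Averaging reduces the noise variance at the cost of `n_avg` measurements per candidate, while the threshold guards against accepting a candidate that only looks better because of a favorable noise draw. Neither change restores a general convergence guarantee.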


Nonlinear Simplex (Nelder-Mead) Algorithm

• Nonlinear simplex method is popular search method (e.g., fminsearch in MATLAB)
• Simplex is convex hull of $p + 1$ points in $\mathbb{R}^p$
– Convex hull is smallest convex set enclosing the $p + 1$ points
– For $p = 2$ convex hull is triangle
– For $p = 3$ convex hull is pyramid
• Algorithm searches for $\theta^*$ by moving convex hull within $\Theta$
• If algorithm works properly, convex hull shrinks/collapses onto $\theta^*$
• No injected randomness (contrast with algorithms A, B, and C), but allowance for noisy loss measurements


Steps of Nonlinear Simplex Algorithm

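Step 0 (Initialization) Generate initial set of $p + 1$ extreme points in $\mathbb{R}^p$, $\theta^i$ ($i = 1, 2, \ldots, p + 1$), vertices of initial simplex.

Step 1 (Reflection) Identify where max, second highest, and min loss values occur; denote them by $\theta_{\max}$, $\theta_{2\max}$, and $\theta_{\min}$, respectively. Let $\theta_{\text{cent}}$ = centroid (mean) of all $\theta^i$ except for $\theta_{\max}$. Generate candidate vertex $\theta_{\text{refl}}$ by reflecting $\theta_{\max}$ through $\theta_{\text{cent}}$ using $\theta_{\text{refl}} = (1 + \alpha)\theta_{\text{cent}} - \alpha\theta_{\max}$ ($\alpha > 0$).

Step 2a (Accept reflection) If $L(\theta_{\min}) \le L(\theta_{\text{refl}}) < L(\theta_{2\max})$, then $\theta_{\text{refl}}$ replaces $\theta_{\max}$; proceed to step 3; else go to step 2b.

Step 2b (Expansion) If $L(\theta_{\text{refl}}) < L(\theta_{\min})$, then expand reflection using $\theta_{\text{exp}} = \gamma\theta_{\text{refl}} + (1 - \gamma)\theta_{\text{cent}}$, $\gamma > 1$; else go to step 2c. If $L(\theta_{\text{exp}}) < L(\theta_{\text{refl}})$, then $\theta_{\text{exp}}$ replaces $\theta_{\max}$; otherwise reject expansion and replace $\theta_{\max}$ by $\theta_{\text{refl}}$. Go to step 3.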

Steps of Nonlinear Simplex Algorithm (cont’d)

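Step 2c (Contraction) If $L(\theta_{\text{refl}}) \ge L(\theta_{2\max})$, then contract simplex: Either case (i) $L(\theta_{\text{refl}}) < L(\theta_{\max})$, or case (ii) $L(\theta_{\max}) \le L(\theta_{\text{refl}})$. Contraction point is $\theta_{\text{cont}} = \beta\theta_{\max/\text{refl}} + (1 - \beta)\theta_{\text{cent}}$, $0 \le \beta \le 1$, where $\theta_{\max/\text{refl}} = \theta_{\text{refl}}$ if case (i), otherwise $\theta_{\max/\text{refl}} = \theta_{\max}$. In case (i), accept contraction if $L(\theta_{\text{cont}}) \le L(\theta_{\text{refl}})$; in case (ii), accept contraction if $L(\theta_{\text{cont}}) < L(\theta_{\max})$. If accepted, replace $\theta_{\max}$ by $\theta_{\text{cont}}$ and go to step 3; otherwise go to step 2d.

Step 2d (Shrink) If $L(\theta_{\text{cont}}) \ge L(\theta_{\max})$, shrink entire simplex using a factor $0 < \delta < 1$, retaining only $\theta_{\min}$. Go to step 3.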

Step 3 (Termination) Stop if convergence criterion or
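The reflection, expansion, contraction, and shrink steps can be put together in a compact Python sketch. Several details are assumptions made for illustration: the default coefficient values (1, 2, 0.5, 0.5), the stopping rule (small spread between the largest and smallest vertex loss, or an iteration cap), the shrink formula that pulls every vertex toward the best one, and all function and variable names. The loss is re-evaluated freely to keep the code short, so this is not an evaluation-efficient implementation.

```python
def combo(a, u, b, v):
    """Return the vector a*u + b*v."""
    return [a * ui + b * vi for ui, vi in zip(u, v)]


def nelder_mead(loss, simplex, alpha=1.0, gamma=2.0, beta=0.5, delta=0.5,
                max_iter=500, tol=1e-10):
    """Sketch of the nonlinear simplex steps (coefficients and stopping rule assumed)."""
    pts = [list(v) for v in simplex]        # Step 0: p + 1 vertices in R^p
    p = len(pts[0])
    for _ in range(max_iter):
        pts.sort(key=loss)                  # pts[0] = min, pts[-2] = 2max, pts[-1] = max
        t_min, t_2max, t_max = pts[0], pts[-2], pts[-1]
        if loss(t_max) - loss(t_min) < tol: # Step 3: assumed convergence criterion
            break
        # Step 1: centroid of all vertices except the worst, then reflection
        cent = [sum(v[j] for v in pts[:-1]) / p for j in range(p)]
        refl = combo(1 + alpha, cent, -alpha, t_max)
        if loss(t_min) <= loss(refl) < loss(t_2max):
            pts[-1] = refl                  # Step 2a: accept reflection
        elif loss(refl) < loss(t_min):
            # Step 2b: try expanding further along the reflection direction
            expd = combo(gamma, refl, 1 - gamma, cent)
            pts[-1] = expd if loss(expd) < loss(refl) else refl
        else:
            # Step 2c: contraction, outside (case i) or inside (case ii)
            case_i = loss(refl) < loss(t_max)
            base = refl if case_i else t_max
            cont = combo(beta, base, 1 - beta, cent)
            if case_i:
                accepted = loss(cont) <= loss(refl)
            else:
                accepted = loss(cont) < loss(t_max)
            if accepted:
                pts[-1] = cont
            else:
                # Step 2d: shrink every vertex toward the best one
                pts = [t_min] + [combo(delta, v, 1 - delta, t_min) for v in pts[1:]]
    return min(pts, key=loss)


def bowl(theta):
    """Illustrative loss with minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


print(nelder_mead(bowl, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

For practical work, an established routine such as MATLAB's fminsearch (mentioned above) is usually preferable to a hand-written version like this one.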


Illustration of Steps of Nonlinear Simplex Algorithm with $p = 2$

[Figure: panels for $p = 2$ with vertices $\theta_{\max}$, $\theta_{2\max}$, $\theta_{\min}$ and points $\theta_{\text{cent}}$, $\theta_{\text{refl}}$, $\theta_{\text{exp}}$, $\theta_{\text{cont}}$. Panel titles: Reflection; Expansion when $L(\theta_{\text{refl}}) < L(\theta_{\min})$; Contraction when $L(\theta_{\text{refl}}) \ge L(\theta_{\max})$ (“inside”); Contraction when $L(\theta_{\text{refl}}) < L(\theta_{\max})$ (“outside”); Shrink after failed contraction when $L(\theta_{\text{refl}}) < L(\theta_{\max})$.]
