
CHAPTER 6

STOCHASTIC APPROXIMATION AND THE FINITE-DIFFERENCE METHOD

• Organization of chapter in ISSO
– Contrast of gradient-based and gradient-free algorithms
– Motivating examples
– Finite-difference algorithm
– Convergence theory
– Asymptotic normality
– Selection of gain sequences
– Numerical examples
– Extensions and segue to SPSA in Chapter 7

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Motivation for Algorithms Not Requiring Gradient of Loss Function

• Primary interest here is in optimization problems for which we cannot obtain direct measurements of $\partial L/\partial\theta$
– Cannot use techniques such as Robbins-Monro SA, steepest descent, etc.
– Can (in principle) use techniques such as Kiefer and Wolfowitz SA (Chapter 6), genetic algorithms (Chapters 9–10), …
• Many such “gradient-free” problems arise in practice
– Generic difficult parameter estimation


Model-Free Control Setup (Example 6.2 in ISSO)

• As usual, want to minimize $L(\theta)$ in presence of noisy measurements of $L(\theta)$: $y(\theta) = L(\theta) + \text{noise}$
• Here, noisy measurements $y(\theta) = Q(\theta, V)$ represent simulation output
• Want to optimize parameters $\theta$ in simulation, where $\theta$ also has direct physical meaning in real system
– Run simulation to determine best $\theta$ for use in real system
• Cannot easily use stochastic gradient methods due to inability to calculate $\partial Q/\partial\theta$, so need gradient-free method
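The measurement model above can be sketched in a few lines of Python. This is only an illustration: the function `simulate`, its quadratic form, and the distribution of the random input V are hypothetical choices, not taken from ISSO. The point is that the optimizer sees only the noisy output y(θ) = Q(θ, V), while L(θ) is the mean of that output.

```python
import random

def simulate(theta, rng):
    """Hypothetical simulation Q(theta, V): one Monte Carlo run.

    V is a random input drawn inside the simulation (assumed N(1, 0.5^2)).
    For this toy model, L(theta) = E[Q(theta, V)] = (theta - 1)^2 + 0.25.
    """
    v = rng.gauss(1.0, 0.5)
    return (theta - v) ** 2

def mean_output(theta, n_runs, rng):
    """Average many runs to approximate L(theta); usually too costly in practice."""
    return sum(simulate(theta, rng) for _ in range(n_runs)) / n_runs

rng = random.Random(0)
y_single = simulate(2.0, rng)          # one noisy measurement y(theta)
y_avg = mean_output(2.0, 20000, rng)   # close to L(2) = 1.25 for this toy model
print(y_single, y_avg)
```

A single run can differ substantially from the average, which is why the SA methods in this chapter are designed to work directly with noisy measurements.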

Simulation-Based Optimization (Example 6.3 in ISSO)

[Figure: diagram relating the SA method, the Monte Carlo simulation, its inputs, and the output $y(\theta)$]


Finite Difference SA (FDSA) Method

• FDSA has standard “first-order” form of root-finding (Robbins-Monro) SA

– Finite difference approximation replaces direct gradient measurement (Chap. 5)

– Resulting algorithm sometimes called Kiefer-Wolfowitz SA

• Let $\hat{g}_k(\hat{\theta}_k)$ denote FD estimate of $g(\theta)$ at $k$th iteration (next slide)
• Let $\hat{\theta}_k$ denote estimate for $\theta$ at $k$th iteration
• FDSA algorithm has form

$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k)$$

where $a_k$ is nonnegative gain value
• Under conditions, $\hat{\theta}_k \to \theta^*$ in stochastic sense (a.s.)
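As a minimal sketch of the recursion, the loop below applies the update θ̂_{k+1} = θ̂_k − a_k ĝ_k(θ̂_k) with a pluggable gradient estimator. The test loss (a noisy scalar quadratic), the noise level, and the gain constants are assumptions chosen for illustration, not values from ISSO.

```python
import random

def fdsa_step(theta, g_hat, a_k):
    """One update: theta_{k+1} = theta_k - a_k * g_hat_k(theta_k), componentwise."""
    return [t - a_k * g for t, g in zip(theta, g_hat)]

def run_recursion(theta0, grad_estimator, gain, n_iter):
    """Iterate the SA recursion for n_iter steps from theta0."""
    theta = list(theta0)
    for k in range(n_iter):
        theta = fdsa_step(theta, grad_estimator(theta, k), gain(k))
    return theta

rng = random.Random(1)

def noisy_loss(theta):
    # Hypothetical loss L(theta) = theta_1^2, observed with N(0, 0.1^2) noise.
    return theta[0] ** 2 + rng.gauss(0.0, 0.1)

def fd_estimate(theta, k):
    # Two-sided finite difference for p = 1 with an assumed c_k = 0.1 / (k+1)^(1/6).
    c_k = 0.1 / (k + 1) ** (1.0 / 6.0)
    diff = noisy_loss([theta[0] + c_k]) - noisy_loss([theta[0] - c_k])
    return [diff / (2.0 * c_k)]

# Assumed gain a_k = 0.5 / (k + 1); the minimizer of the toy loss is 0.
theta_final = run_recursion([2.0], fd_estimate, lambda k: 0.5 / (k + 1), 500)
print(theta_final)
```

Only loss measurements enter the loop; no gradient of the loss is ever evaluated directly.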

Finite Difference Gradient Approximation

• Classical method for approximating gradients in Kiefer-Wolfowitz SA is by finite differences

• FD gradient approximation used in SA recursion as gradient measurement (previous slide)

• Standard two-sided gradient approximation at iteration $k$ is

$$\hat{g}_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{y(\hat{\theta}_k + c_k \xi_1) - y(\hat{\theta}_k - c_k \xi_1)}{2 c_k} \\ \vdots \\ \dfrac{y(\hat{\theta}_k + c_k \xi_p) - y(\hat{\theta}_k - c_k \xi_p)}{2 c_k} \end{bmatrix}$$

where $\xi_j$ is $p$-dimensional with 1 in $j$th entry, 0 elsewhere
• Each computation of FD approximation takes $2p$ measurements $y(\cdot)$
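The two-sided approximation can be written as a short function that perturbs one coordinate at a time. The sketch below assumes a noise-free quadratic test function so the result can be checked by hand; the counter is there only to show that one gradient estimate costs 2p loss measurements.

```python
def fd_gradient(y, theta, c_k):
    """Two-sided finite-difference gradient estimate.

    Component j is (y(theta + c_k*xi_j) - y(theta - c_k*xi_j)) / (2*c_k),
    where xi_j is the unit vector with 1 in the j-th entry.
    """
    g_hat = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += c_k
        minus[j] -= c_k
        g_hat.append((y(plus) - y(minus)) / (2.0 * c_k))
    return g_hat

counter = {"calls": 0}

def y(theta):
    # Hypothetical noise-free measurement: sum of squares, with a call counter.
    counter["calls"] += 1
    return sum(t * t for t in theta)

theta = [1.0, -2.0, 0.5]           # p = 3
g_hat = fd_gradient(y, theta, 0.1)
print(g_hat, counter["calls"])     # gradient of sum of squares is 2*theta; 2p = 6 calls
```

For a quadratic, the two-sided difference is exact up to rounding, which is why it is a convenient check here; with noisy measurements the estimate is itself random.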


Selection of Gain Sequences $a_k$ and $c_k$

• Effective practical implementation requires “intelligent” selection of coefficients in gain sequences in SA algorithm and FD gradient estimate:

$$a_k = \frac{a}{(k + 1 + A)^{\alpha}} \quad \text{and} \quad c_k = \frac{c}{(k + 1)^{\gamma}}$$

where coefficients $a$, $c$, $\alpha$, and $\gamma$ are strictly positive and stability constant $A \ge 0$ is same as in Sect. 4.4
• Asymptotically optimal $\alpha = 1$, $\gamma = 1/6$ not always best
• “Trial and error” sometimes used for gain selection
• Semi-automatic method (Sect. 6.6): $\alpha = 0.602$, $\gamma = 0.101$, $c \approx$ standard deviation of noise, $A \approx$ 10% (or less) of total number of iterations, and $a$ chosen such that change in SA estimate does not exceed desired magnitude of change in early iterations
‒ Choosing $a$ requires sample gradient estimates at initial $\theta$
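The gain formulas and the semi-automatic guidelines can be sketched as follows. The helper `choose_a` is one plausible reading of the rule for a (scale a so that the first step has a desired size given a typical gradient magnitude at the initial θ); it is an assumption for illustration, as are the numerical inputs (iteration count, noise level, gradient magnitude, desired step).

```python
def a_gain(k, a, A, alpha):
    """a_k = a / (k + 1 + A)^alpha."""
    return a / (k + 1 + A) ** alpha

def c_gain(k, c, gamma):
    """c_k = c / (k + 1)^gamma."""
    return c / (k + 1) ** gamma

def choose_a(desired_step, mean_abs_grad, A, alpha):
    """Assumed rule: pick a so that a_0 * mean_abs_grad equals desired_step."""
    return desired_step * (1 + A) ** alpha / mean_abs_grad

# Hypothetical setting for illustration only.
n_iter = 1000
alpha, gamma = 0.602, 0.101
A = 0.10 * n_iter        # 10% of the total number of iterations
c = 0.1                  # assumed standard deviation of the measurement noise
mean_abs_grad = 5.0      # assumed typical |g_hat| component at the initial theta
a = choose_a(0.1, mean_abs_grad, A, alpha)

first_step = a_gain(0, a, A, alpha) * mean_abs_grad
print(a, first_step, c_gain(0, c, gamma))
```

In practice `mean_abs_grad` would come from averaging several FD gradient estimates at the initial θ, which is why the guideline requires sample gradient estimates.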

Example: Wastewater Treatment Problem (Example 6.5 in ISSO)

• Small-scale problem with p = 2

– Aim is to optimize water cleanliness and methane gas byproduct

– Evaluated algorithms with 50 realizations of $N = 2000$ measurements
• Used FDSA with gains $a_k = a/(1 + k)$ and $c_k = 1/(1 + k)^{1/6}$
– Asymptotically optimal decay rates found “best”

• Gain tuning chooses a; naïve gain sets a = 1

• Also compared with random search algorithm B from Chapter 2

• Algorithms use noisy loss measurements (same noise


Mean values of $L(\hat{\theta}_k)$ with 95% Confidence Intervals

                         FDSA with “naïve” gains    FDSA with tuned gains
N = 100 (25 iters.)      0.11 [0.087, 0.140]        0.083 [0.057, 0.108]
N = 2000 (500 iters.)    0.023 [0.017, 0.028]       0.021 [0.016, 0.026]

• Above numbers much lower than random search algorithm B: best value at N = 2000 is 0.38
• Shows value of approximating gradient in FDSA
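The iteration counts in the table follow from the measurement budget: each FDSA iteration uses 2p loss measurements, so N measurements allow N/(2p) iterations. A small sketch of that arithmetic (the function name is hypothetical):

```python
def fdsa_iterations(n_measurements, p):
    """Number of FDSA iterations affordable with a given measurement budget.

    Each two-sided FD gradient estimate costs 2*p loss measurements.
    """
    return n_measurements // (2 * p)

iters_small = fdsa_iterations(100, 2)    # budget N = 100 with p = 2
iters_large = fdsa_iterations(2000, 2)   # budget N = 2000 with p = 2
print(iters_small, iters_large)
```

With p = 2 this gives 25 and 500 iterations, matching the row labels above.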

Example: Skewed-Quartic Loss Function (Examples 6.6 and 6.7 in ISSO)

• Larger-scale problem with $p = 10$:

$$L(\theta) = \theta^T B^T B \theta + 0.1 \sum_{i=1}^{p} (B\theta)_i^3 + 0.01 \sum_{i=1}^{p} (B\theta)_i^4$$

where $(B\theta)_i$ is the $i$th component of $B\theta$, and $pB$ is an upper triangular matrix of ones
• Used $N = 1000$ measurements; 50 replicates
• Used FDSA with gains $a_k = a/(1 + k + A)^{\alpha}$ and $c_k = c/(1 + k)^{\gamma}$
• “Semi-automatic” and manual gain tuning
• Also compared with random search algorithm B
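The skewed-quartic loss is easy to evaluate without forming B explicitly: since pB is upper triangular with ones, the ith component of Bθ is the sum of θ_i through θ_p divided by p. A minimal sketch in plain Python (noise-free; the evaluation point of all ones is just an example input):

```python
def skewed_quartic(theta):
    """L(theta) = theta^T B^T B theta + 0.1*sum((B theta)_i^3) + 0.01*sum((B theta)_i^4).

    p*B is upper triangular with all ones, so (B theta)_i = sum(theta[i:]) / p.
    Note theta^T B^T B theta equals the sum of squares of the entries of B theta.
    """
    p = len(theta)
    b_theta = [sum(theta[i:]) / p for i in range(p)]
    quadratic = sum(v ** 2 for v in b_theta)
    cubic = 0.1 * sum(v ** 3 for v in b_theta)
    quartic = 0.01 * sum(v ** 4 for v in b_theta)
    return quadratic + cubic + quartic

loss_at_ones = skewed_quartic([1.0] * 10)
loss_at_zero = skewed_quartic([0.0] * 10)
print(loss_at_ones, loss_at_zero)
```

The cubic term makes the loss asymmetric about the origin, which is the "skew" in the name; a noisy measurement y(θ) would add a random term to the returned value.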


Algorithm Comparison with Skewed-Quartic Loss Function ($p$ = 10) (Example 6.6 in ISSO)

Example with Skewed-Quartic Loss: Mean Terminal Values and 95% Confidence Intervals for $\|\hat{\theta}_k - \theta^*\| / \|\hat{\theta}_0 - \theta^*\|$

FDSA: semi-automatic gains    FDSA: manually tuned gains    Random search B
0.427 [0.411, 0.443]          0.531 [0.502, 0.561]          1.285 [1.190, 1.378]

• FDSA semi-automatic is best with respect to $\theta$ error
• Random search algorithm B produces solution further from $\theta^*$ than initial condition!
– But loss value is better than initial condition
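Assuming the tabulated quantity is the terminal distance to θ* normalized by the initial distance, a value below 1 means the algorithm ended closer to θ* than it started and a value above 1 means it ended further away. A minimal sketch with made-up vectors (none of these numbers come from ISSO):

```python
import math

def normalized_error(theta_k, theta_0, theta_star):
    """Ratio ||theta_k - theta_star|| / ||theta_0 - theta_star|| (Euclidean norm)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_k, theta_star)))
    den = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_0, theta_star)))
    return num / den

theta_star = [0.0, 0.0]   # hypothetical minimizer
theta_0 = [1.0, 1.0]      # hypothetical initial condition

ratio_closer = normalized_error([0.5, 0.5], theta_0, theta_star)    # below 1: improved
ratio_farther = normalized_error([1.5, 1.5], theta_0, theta_star)   # above 1: got worse
print(ratio_closer, ratio_farther)
```

A ratio above 1 can still coincide with a lower loss value than at the initial condition, since distance in θ and loss value measure different things.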
