
Chapter 10_handout.pdf


Academic year: 2020


CHAPTER 10

EVOLUTIONARY COMPUTATION II: GENERAL METHODS AND THEORY

• Organization of chapter in ISSO

– Introduction

– Evolution strategy and evolutionary programming; comparisons with GAs

– Schema theory for GAs

– What makes a problem hard?

– Convergence theory

– No free lunch theorems

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Methods of EC

• Genetic algorithms (GAs), evolution strategy (ES), and evolutionary programming (EP) are the most common EC methods

• Many modern EC implementations borrow aspects from one or more EC methods

– Ant colony optimization, differential evolution, particle swarm optimization, etc.


ES Algorithm with Noise-Free Loss Measurements

Step 0 (initialization) Randomly or deterministically generate initial population of N values of θ ∈ Θ and evaluate L for each of the values.

Step 1 (offspring) Generate λ offspring from current population of N candidate θ values such that all λ values satisfy direct or indirect constraints on θ.

Step 2 (selection) For (N+λ)-ES, select N best values from combined population of N original values plus λ offspring; for (N,λ)-ES, select N best values from population of λ > N offspring only.

Step 3 (repeat or terminate) Repeat steps 1 and 2 or terminate.
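Below is a minimal Python sketch of Steps 0 to 3. Only the selection rule that separates (N+λ)-ES from (N,λ)-ES comes from the slide. Everything else is assumed for illustration: a scalar θ, Θ = [-10, 10], a hypothetical quadratic loss, offspring made by Gaussian perturbation of a randomly chosen parent, clipping to enforce the constraint, and a fixed iteration count as the stopping rule.

```python
import random

def loss(theta):
    # Hypothetical noise-free loss, minimized at theta = 3.0
    return (theta - 3.0) ** 2

def evolution_strategy(N=5, lam=20, plus=True, iters=100, sigma=0.5,
                       lo=-10.0, hi=10.0, seed=0):
    if not plus and lam <= N:
        raise ValueError("(N,lambda)-ES needs lambda > N")
    rng = random.Random(seed)
    # Step 0: random initial population of N values in Theta = [lo, hi]
    pop = [rng.uniform(lo, hi) for _ in range(N)]
    for _ in range(iters):
        # Step 1 (assumed offspring rule): Gaussian perturbation of a random parent,
        # clipped to [lo, hi] so every offspring satisfies the constraint on theta
        offspring = [min(hi, max(lo, rng.choice(pop) + rng.gauss(0.0, sigma)))
                     for _ in range(lam)]
        # Step 2: (N+lam) selects from parents plus offspring; (N,lam) from offspring only
        pool = pop + offspring if plus else offspring
        pop = sorted(pool, key=loss)[:N]
    # Step 3 (assumed stopping rule): terminate after a fixed number of iterations
    return min(pop, key=loss)
```

Calling `evolution_strategy(plus=False)` runs the (N,λ) variant. It raises an error unless `lam > N`, which mirrors the λ > N condition in Step 2.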

Schema Theory for GAs

• Key innovation in Holland (1975) is a form of theoretical

foundation for GAs based on schemas

– Represents first attempt at serious theoretical analysis

– But not entirely successful, as a “leap of faith” is required to relate schema theory to actual convergence of GA

• “GAs work by discovering, emphasizing, and recombining good ‘building blocks’ of solutions in a highly parallel fashion.” (Melanie Mitchell, An Introduction to Genetic Algorithms [p. 27], 1996, paraphrasing John Holland)

– Statement above more intuitive than formal


Schema Theory for GAs (cont’d)

• Schema is template for chromosomes in GAs

• Example: [* 1 0 * * * * 1], where the * symbol represents a “don’t care” (or free) element

– [11001101] is specific instance of this schema

• Schemas sometimes called building blocks of GAs

• Two fundamental results: schema theorem and implicit parallelism

• Schema theorem says that better templates dominate the population as generations proceed

• Implicit parallelism says that GA processes >> N schemas

at each iteration

• Schema theory is controversial

– Not connected to algorithm performance in same direct way as usual convergence theory for iterates of algorithm
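The sketch below checks schema membership and counts the distinct schemas a population samples. The helper names are hypothetical. At each position of a B-bit chromosome you can either keep the bit or replace it with *, so every chromosome is an instance of 2^B schemas. A population of N chromosomes therefore samples between 2^B and N·2^B distinct schemas. This illustrates only the counting idea behind implicit parallelism; it does not implement the schema theorem.

```python
from itertools import product

def matches(schema, chrom):
    # chrom is an instance of schema if it agrees on every fixed (non-'*') position
    return all(s == '*' or s == c for s, c in zip(schema, chrom))

def schemas_of(chrom):
    # Keep or "star out" each position independently: 2**B schemas per chromosome
    return {''.join(ch if keep else '*' for ch, keep in zip(chrom, mask))
            for mask in product([True, False], repeat=len(chrom))}

def schemas_in_population(pop):
    # Union of the schemas sampled by each chromosome in the population
    out = set()
    for chrom in pop:
        out |= schemas_of(chrom)
    return out
```

With the slide's example, `matches('*10****1', '11001101')` holds. Two complementary 8-bit chromosomes share only the all-* schema, so together they sample 256 + 256 - 1 = 511 schemas.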

Convergence Theory via Markov Chains

• Schema theory inadequate

– Mathematics behind schema theory not fully rigorous

– Unjustified claims about implications of schema theory

• More rigorous convergence theory exists

– Pertains to noise-free loss (fitness) measurements

– Pertains to finite representation (e.g., bit coding or floating point representation on digital computer)

• Convergence theory relies on Markov chains

• Each state in chain represents possible population

• Markov transition matrix P contains all information for …


GA Markov Chain Model

• GAs with binary bit coding can be modeled as (discrete state) Markov chains

• Recall states in chain represent possible populations

• ith element of probability vector $p_k$ represents probability of achieving ith population at iteration k

• Transition matrix: The i, j element of P represents the probability of population i producing population j through

the selection, crossover and mutation operations

– Depends on loss (fitness) function, selection method, and reproduction and mutation parameters

Given transition matrix P, it is known that

$$p_{k+1}^T = p_k^T P$$
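The recursion $p_{k+1}^T = p_k^T P$ is a vector-matrix product applied once per GA iteration. Here is a small sketch using a hypothetical 3-state matrix. Real GA chains have vastly more states, but the mechanics are identical.

```python
def step(p, P):
    # p_{k+1}^T = p_k^T P, i.e. new[j] = sum_i p[i] * P[i][j]
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def propagate(p0, P, k):
    # Apply the one-step recursion k times to get p_k from p_0
    p = list(p0)
    for _ in range(k):
        p = step(p, P)
    return p

# Hypothetical transition matrix; each row sums to 1
P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.1, 0.9]]
p0 = [1.0, 0.0, 0.0]  # start in population (state) 0 with certainty
```

Starting from state 0, `propagate(p0, P, 2)` gives approximately [0.33, 0.41, 0.26].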

Rudolph (1994) and Markov Chain Analysis for Canonical GA

• Rudolph (1994, IEEE Trans. Neural Nets.) uses Markov

chain analysis to study “canonical GA” (CGA)

• CGA includes binary bit coding, crossover, mutation, and “roulette wheel” selection

– CGA is focus of seminal book, Holland (1975)

• CGA does not include elitism; lack of elitism is critical aspect of theoretical analysis

• CGA assumes mutation probability 0 < Pm < 1 and single-point crossover probability 0 ≤ Pc ≤ 1

• Key preliminary result: CGA is an ergodic Markov chain
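One standard way to see ergodicity: with 0 < Pm < 1, mutation can turn any population into any other in a single step with positive probability. Every entry of P is therefore positive. A finite chain with a strictly positive transition matrix has a unique limiting distribution that does not depend on the starting vector. The sketch below uses a hypothetical strictly positive 3-state matrix in place of a real CGA matrix. It iterates the chain from two different initial vectors and reaches the same limit.

```python
def step(p, P):
    # p_{k+1}^T = p_k^T P
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def iterate(p0, P, k=500):
    # Run the chain for k steps from initial distribution p0
    p = list(p0)
    for _ in range(k):
        p = step(p, P)
    return p

# Hypothetical strictly positive transition matrix standing in for a CGA matrix
P = [[0.90, 0.05, 0.05],
     [0.10, 0.80, 0.10],
     [0.02, 0.08, 0.90]]

limit_a = iterate([1.0, 0.0, 0.0], P)  # start in state 0
limit_b = iterate([0.0, 0.0, 1.0], P)  # start in state 2
```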


Rudolph (1994) and Markov Chain Analysis for CGA (cont’d)

• Ergodicity for CGA provides a negative result on convergence in Rudolph (1994)

• Let $\hat{L}_{\min,k}$ denote lowest of N (= population size) loss values within population at iteration k

– $\hat{L}_{\min,k}$ represents loss value for θ in population k that has maximum fitness value

• Main theorem: CGA satisfies

$$\lim_{k\to\infty} P\left(\hat{L}_{\min,k} = L(\theta^*)\right) < 1$$

(above limit on left-hand side exists by ergodicity)

• Implies CGA does not converge to the global optimum
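A short derivation shows why the limit in the main theorem falls below 1. It uses the slide's notation plus two symbols introduced here: $\bar{p}_j$ for the limiting probability of population j, and J for the set of populations that contain a chromosome achieving $L(\theta^*)$. Ergodicity gives $\bar{p}_j > 0$ for every population j. Assuming at least one chromosome is nonoptimal, some population lies outside J (for example, N copies of that chromosome). Hence

$$\lim_{k\to\infty} P\left(\hat{L}_{\min,k} = L(\theta^*)\right) = \sum_{j\in J}\bar{p}_j = 1 - \sum_{j\notin J}\bar{p}_j < 1.$$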

Rudolph (1994) and Markov Chain Analysis for CGA (cont’d)

• Fundamental problem with CGA is that optimal solutions are found but then lost

• CGA has no mechanism for retaining optimal solution

• Rudolph discusses modification to CGA yielding positive convergence results

• Appends “super individual” to each population

– Super individual represents best chromosome so far

– Not eligible for GA operations (selection, crossover, mutation)

– Not same as elitism
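The super-individual idea can be sketched as pure bookkeeping. All names here are hypothetical, and a mutation-only step stands in for the full selection/crossover/mutation cycle. The super individual is stored outside the working population and is never passed to the GA step. It is replaced only when a strictly better chromosome appears, so its loss can never increase. This sketch covers the retention mechanism only, not Rudolph's modified algorithm in full.

```python
import random

def zeros(chrom):
    # Hypothetical loss: number of 0 bits (the all-ones string is optimal)
    return chrom.count(0)

def mutate_only_step(pop, rng, pm=0.1):
    # Hypothetical stand-in for the GA operations: independent bit flips with prob pm
    return [tuple(b ^ (rng.random() < pm) for b in chrom) for chrom in pop]

def run_with_super_individual(loss, B=8, N=4, generations=50, pm=0.1, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(B)) for _ in range(N)]
    super_ind = min(pop, key=loss)
    history = [loss(super_ind)]
    for _ in range(generations):
        # GA operations act on the N working members only; super_ind is excluded
        pop = mutate_only_step(pop, rng, pm)
        best = min(pop, key=loss)
        # The super individual changes only by comparison, never by GA operations
        if loss(best) < loss(super_ind):
            super_ind = best
        history.append(loss(super_ind))
    return pop, super_ind, history
```

Unlike elitism, the stored chromosome here never re-enters the population. The working population can still lose good solutions while the best-so-far record is kept.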


Contrast of Suzuki (1995) and Rudolph (1994) in Markov Chain Analysis for GA

• Suzuki (1995, IEEE Trans. Systems, Man, and Cyber.)

uses Markov chain analysis to study GA with elitism

– Same as CGA of Rudolph (1994) except for elitism

• Suzuki (1995) only considers unique states (populations)

– Rudolph (1994) includes redundant states

• With N = population size and B = no. of bits/chromosome:

$$\binom{N + 2^B - 1}{N} = \frac{(N + 2^B - 1)!}{N!\,(2^B - 1)!}$$

unique states in Suzuki (1995), and $2^{NB}$ states in Rudolph (1994) (much larger than number of unique states above)

• Above affects bookkeeping; does not fundamentally change relative results of Suzuki (1995) and Rudolph (1994)
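The two state counts can be checked directly. This sketch treats a Suzuki-style state as a multiset of N chromosomes, ignoring order and allowing repetition. A Rudolph-style state is the NB-bit pattern of an ordered population. The function names are hypothetical.

```python
from math import comb

def unique_states(N, B):
    # Multisets of size N drawn from the 2**B possible chromosomes
    return comb(N + 2**B - 1, N)

def ordered_states(N, B):
    # Ordered populations: every one of the N*B bits is free
    return 2 ** (N * B)
```

For N = B = 6 this gives 119,877,472 unique states (about 1.2 × 10^8) versus 68,719,476,736 ordered ones (about 6.9 × 10^10).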

Convergence Under Elitism

• In both CGA case (Rudolph, 1994) and case with elitism (Suzuki, 1995) the limit exists:

$$\bar{p}^T = \lim_{k\to\infty} p_0^T P^k$$

(dimension of $\bar{p}$ differs according to definition of states, unique or nonunique as on previous slide)

• Suzuki (1995) assumes each population includes one elite element and that crossover probability Pc = 1

• Let $\bar{p}_j$ represent jth element of $\bar{p}$, and J represent indices j where population j includes chromosome achieving L(θ*)

• Then from Suzuki (1995):

$$\sum_{j\in J} \bar{p}_j = 1$$
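Here is a toy illustration of why elitism changes the limit. It uses a hypothetical 4-state chain in which states 2 and 3 play the role of J. Because the elite element is kept, the chain never leaves J once it enters, so in the limit J collects all the probability mass. Contrast this with a strictly positive CGA-style matrix, where mass remains on states outside J.

```python
def step(p, P):
    # p_{k+1}^T = p_k^T P
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical chain: states 0, 1 lack the optimum; states 2, 3 form J.
# Rows 2 and 3 put zero probability on states 0 and 1 (elitism keeps the optimum).
P = [[0.6, 0.2, 0.1, 0.1],
     [0.3, 0.5, 0.1, 0.1],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.4, 0.6]]
J = [2, 3]

p = [1.0, 0.0, 0.0, 0.0]  # start from a population without the optimum
for _ in range(200):
    p = step(p, P)
mass_on_J = sum(p[j] for j in J)
```

Within J, the mass settles on the stationary split of the 2 × 2 block, here 4/7 and 3/7.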


Calculation of Stationary Distribution

• Markov chain theory provides useful conceptual device

• Practical calculation difficult due to explosive growth of number of possible populations (states)

• Growth is in terms of factorials of N and bit string length (B)

• Practical calculation of $p_k$ usually impossible due to difficulty in getting P

• Transition matrix can be very large in practice

– E.g., if N = B = 6, P is a $10^8 \times 10^8$ matrix!

– Real problems have N and B much larger than 6

• Ongoing work attempts to severely reduce dimension by limiting states to only most important (e.g., Spears, 1999; Moey and Rowe, 2004)
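For chains small enough to store, you can compute the limiting vector of an ergodic chain (written $\bar{p}$ here) directly instead of iterating. Solve $\bar{p}^T P = \bar{p}^T$ together with $\sum_j \bar{p}_j = 1$. The sketch below does this with pure-Python Gaussian elimination and partial pivoting. The function name and test matrices are hypothetical. Dense elimination needs on the order of S^3 arithmetic operations for S states, so it is far out of reach at S ≈ 10^8. That is the practical obstacle noted above.

```python
def stationary(P):
    n = len(P)
    # Row j encodes sum_i p_i * P[i][j] - p_j = 0; one such equation is redundant,
    # so the last row is replaced by the normalization sum_i p_i = 1
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Forward elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (b[r] - sum(A[r][c] * p[c] for c in range(r + 1, n))) / A[r][r]
    return p
```

For example, `stationary([[0.9, 0.1], [0.5, 0.5]])` returns approximately [5/6, 1/6].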

Example 10.2 from ISSO: Markov Chain Calculations for Small-Scale Implementation

• Consider L() =   = [0,15]

• Function has local and global minimum; plot on next slide • Several GA implementations with very small population

sizes (N) and numbers of bits (B)

• Small scale implementations imply Markov transition matrices are computable

– But still not trivial, as matrix dimensions range from approximately 20002000 to 40004000

 


Loss Function for Example 10.2 in ISSO

Markov chain theory provides probability of finding solution (θ = 15) in given number of iterations

Example 10.2 (cont’d): Probability Calculations for Very Small-Scale GAs

Probability that GA with elitism produces population containing optimal solution (Pc = crossover probability, Pm = mutation probability, N = population size, B = bit length)

GA iteration k                      | 0    | 5    | 10   | 20   | 30   | 40   | 50   | 100  | 150
Pc = 1.0, Pm = 0.05, N = 2, B = 6   | 0.03 | 0.08 | 0.15 | 0.32 | 0.48 | 0.62 | 0.74 | 0.97 | 1.00
Pc = 1.0, Pm = 0.05, N = 4, B = 4   | 0.21 | 0.51 | 0.69 | 0.92 | 1.00 | --   | --   | --   | --
Pc = 1.0, …                         | …
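You can check the iteration-0 column by hand under two assumptions the slide does not state. First, the initial population is drawn uniformly from the unique populations (multisets). Second, exactly one B-bit chromosome achieves θ = 15. Under those assumptions, the probability that the initial population contains the optimum simplifies to N/(N + 2^B - 1). The number of unique populations gives the Markov matrix dimension. The function names are hypothetical.

```python
from math import comb

def unique_populations(N, B):
    # Number of multisets of N chromosomes drawn from 2**B bit strings
    return comb(N + 2**B - 1, N)

def p_optimum_at_start(N, B):
    # ASSUMPTIONS: uniform initial distribution over unique populations and a
    # single optimal chromosome; fraction of multisets that contain it
    without = comb(N + 2**B - 2, N)  # multisets drawn from the other 2**B - 1 strings
    return 1 - without / unique_populations(N, B)
```

This gives 2080 and 3876 unique populations (the approximately 2000 × 2000 and 4000 × 4000 matrices), with starting probabilities 2/65 ≈ 0.03 and 4/19 ≈ 0.21. Drawing each of the N chromosomes independently and uniformly would instead give 1 - (1 - 2^-B)^N. That is about 0.23 for N = 4, B = 4, which does not match the table.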


Summary of GA Convergence Theory

• Schema theory (Holland, 1975) was the most popular method for theoretical analysis until approximately the mid-1990s

– Schema theory not fully rigorous and not fully connected to actual algorithm performance

• Markov chain theory provides more formal means of

convergence—and convergence rate—analysis

• Rudolph (1994) used Markov chains to provide largely negative result on convergence for canonical GAs

– Canonical GA does not converge to optimum

• Suzuki (1995) considered GAs with elitism; unlike Rudolph

(1994), GA is now convergent

• Challenges exist in practical calculation of Markov transition matrix

No Free Lunch Theorems (Reprise, Chap. 1)

• No free lunch (NFL) Theorems apply to EC algorithms

– Theorems imply there can be no universally efficient EC algorithm

– Performance of one algorithm when averaged over all problems is identical to that of any other algorithm

• Suppose EC algorithm A applied to loss L

– Let $\hat{L}_n$ denote lowest loss value from most recent N population elements after n ≥ N unique function evaluations

• Consider the probability that $\hat{L}_n$ equals a given value $\ell$ after n unique evaluations of the loss:

$$P\left(\hat{L}_n = \ell \mid L, A\right)$$
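Here is a small exhaustive check of the NFL statement. The assumptions are chosen only for illustration: a hypothetical 4-point search space, loss values in {0, 1, 2}, and N = n, so that $\hat{L}_n$ is the lowest of all n values seen. Two deterministic algorithms are compared, and neither ever re-evaluates a point. Summed over all 3^4 = 81 loss functions, each value of $\hat{L}_n$ occurs equally often for a fixed-order scan and for an adaptive rule that branches on observed losses.

```python
from collections import Counter
from itertools import product

X = range(4)  # hypothetical finite search space
Y = range(3)  # hypothetical finite set of loss values

def fixed_order(visited, ys):
    # Algorithm 1: scan points in index order, ignoring observed losses
    return next(x for x in X if x not in visited)

def adaptive(visited, ys):
    # Algorithm 2: start at the last point, then branch on the latest observed loss
    unvisited = [x for x in X if x not in visited]
    if not ys:
        return unvisited[-1]
    return unvisited[0] if ys[-1] > 0 else unvisited[-1]

def best_after(alg, L, n):
    # Run alg for n unique evaluations of loss L (a tuple indexed by x)
    visited, ys = [], []
    for _ in range(n):
        x = alg(visited, ys)
        assert x not in visited  # NFL setting: no repeated evaluations
        visited.append(x)
        ys.append(L[x])
    return min(ys)

def histogram(alg, n):
    # Frequency of each value of hat{L}_n over every loss function L: X -> Y
    return Counter(best_after(alg, L, n) for L in product(Y, repeat=len(X)))
```

For n = 2, both algorithms give the counts {0: 45, 1: 27, 2: 9}. Each ordered pair of observed values arises from exactly 3^2 = 9 loss functions, whichever non-repeating rule chose the points.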

References
