PapadimitriouTCSplus.ppt

(1)

Satisfiability and Evolution

Adi Livnat, Christos Papadimitriou, Aviad Rubinstein, Greg Valiant,

(2)

“One curious aspect of evolution is that everybody thinks he understands it!”

Jacques Monod

“Nothing makes sense in life except in the light of evolution”

(3)

(4)

Waddington’s Experiment (1952)

Generation 0

(5)

Waddington’s Experiment (1952)

Generation 1

Temp: 40 °C

(6)

Generation 2

Temp: 40 °C

(7)

Generation 3

Temp: 40 °C

(8)

(…)

Generation 20

Temp: 40 °C

(9)

Surprise!

Generation 20

Temp: 20 °C

(10)

Genetic Assimilation

(11)

Is There a Genetic Explanation?

Suppose:

•The “red phenotype” depends on genes x₁, …, x_n plus h = “high temp”

•“red” = F(x, h), Boolean

(12)

How do Allele Frequencies Change from Generation to Generation?

• Suppose x_i = 1 with probability p_i

• Next generation?

(13)

A Genetic Explanation?

Wanted: Boolean function F ( x, h ) with these properties:

•Initially, Prob_{x ~ p[0]} [F(x, h = 0)] ≈ 0%

•Then Prob_{x ~ p[0]} [F(x, h = 1)] ≈ 15%

•After breeding, Prob_{x ~ p[1]} [F(x, h = 1)] ≈ 60%

(14)

A Genetic Explanation!

“red” = “x₁ + x₂ + … + x₁₀ + 3h ≥ 10”

•T = 0, ~0%

•T = 1, ~15%

•T = 2, ~60%

•T = 20, ~99%

•h = 0, ~25%
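The threshold rule above can be explored numerically. What follows is a minimal sketch, assuming that only red flies breed, that recombination keeps the population a product distribution, and that all ten genes start at the same allele frequency. The helper names and the starting value 0.5 are illustrative assumptions, not taken from the talk.

```python
from math import comb

N_GENES = 10  # genes x_1..x_10 in the threshold rule

def prob_red(p, h):
    """P[x_1 + ... + x_10 + 3h >= 10] when each x_i = 1 independently w.p. p."""
    need = N_GENES - 3 * h  # number of ones required among the genes
    return sum(comb(N_GENES, k) * p**k * (1 - p)**(N_GENES - k)
               for k in range(need, N_GENES + 1))

def next_freq(p, h=1):
    """p' = P[x_i = 1 | red]; by symmetry this is E[#ones | red] / 10."""
    need = N_GENES - 3 * h
    weighted = sum(k * comb(N_GENES, k) * p**k * (1 - p)**(N_GENES - k)
                   for k in range(need, N_GENES + 1))
    return weighted / (N_GENES * prob_red(p, h))

def breed(p0, generations):
    """Allele-frequency trajectory under heat (h = 1), breeding only red flies."""
    traj = [p0]
    for _ in range(generations):
        traj.append(next_freq(traj[-1]))
    return traj

traj = breed(0.5, 20)  # 0.5 is an assumed starting frequency
for t in (0, 1, 2, 20):
    # generation, fraction red with heat, fraction red without heat
    print(t, round(prob_red(traj[t], 1), 3), round(prob_red(traj[t], 0), 3))
```

The printed percentages depend on the assumed starting frequency, so they need not match the figures on the slide. The qualitative pattern is what matters: the red fraction under heat rises each generation, and eventually a noticeable fraction is red even without heat.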

(15)

Stepping Back now:

Evolution Today

• A powerful and prestigious theory

• Founded on the ideas of Darwin and Mendel

• Informed by sophisticated math models developed in the early 20th century (mainly, population genetics)

(16)

Yet many important mysteries remain:

• What is the role of sex/recombination?

• Why is there so much genetic diversity within species?

(17)

Now back to Waddington’s Experiment

•The “red phenotype” seems to be a complex trait which actually does emerge…

(18)

So, how about generalizing it

• We have an arbitrary Boolean function of genes (no environmental variable h)

• Suppose the satisfying genotypes have a small fitness advantage (1 vs. 1 + ε, say)

• (Instead of a 0–1 advantage as in Waddington’s experiment)

(19)

Perhaps Monotone Functions?

• Recall: p_i’ = prob_p [x_i = 1 | F(x, h)]

• Now, p_i’ ≈ (1 – ε) × p_i + ε × prob_p [x_i = 1 | F(x)]

• If F(x) is monotone, and x_i has some influence, then p_i’ > p_i
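The weak-selection update can be checked by brute force on a small example. This is a sketch under stated assumptions: the function names, the three-gene monotone function, and the starting frequencies are hypothetical, and the enumeration over all 2^n genotypes is only practical for small n.

```python
from itertools import product

def cond_freqs(F, p):
    """Return (sigma, [P[x_i = 1 | F(x)]]) for x drawn from the product distribution p.

    Assumes sigma = P[F(x)] > 0, otherwise the conditioning is undefined.
    """
    n = len(p)
    sigma, hits = 0.0, [0.0] * n
    for x in product((0, 1), repeat=n):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else 1 - pi  # probability of genotype x
        if F(x):
            sigma += w
            for i, xi in enumerate(x):
                hits[i] += w * xi
    return sigma, [h / sigma for h in hits]

def weak_selection_step(F, p, eps):
    """Approximate update from the slide: p_i' = (1 - eps) p_i + eps P[x_i = 1 | F]."""
    _, cond = cond_freqs(F, p)
    return [(1 - eps) * pi + eps * ci for pi, ci in zip(p, cond)]

# Hypothetical monotone function in which every gene has some influence.
monotone_F = lambda x: (x[0] and x[1]) or x[2]

p = [0.3, 0.5, 0.2]
p_next = weak_selection_step(monotone_F, p, eps=0.05)
print(p_next)  # every frequency moves up for this monotone example
```

Conditioning on a monotone event can only raise the chance that an influential gene is set to 1, which is why each coordinate increases here.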

(20)

Monotone Functions (cont.)

• After exponentially many steps, done

Theorem: n / (ε × σ₀) steps suffice

σ₀ = initial satisfaction probability

Proof: Boolean Fourier analysis

(21)

A parenthesis: Genetic Algorithms

• My road to Evolution

• In life, sex is successful and ubiquitous

• Why do GAs perform so poorly when compared to Simulated Annealing?

• Answer: Evolution is not a good metaphor for heuristics

(22)

Back to monotone functions:

But wait a minute…

• Why are we assuming a product distribution?

• Isn’t there linkage (correlation) in genetics?

• For example, imagine F = … (x₁ = x₂) …

• Prob [x₁ = x₂] = ¾ ≠ ½ × ½ + ½ × ½

(23)

Nagylaki’s Theorem

Theorem [N 1993]: After O(log n) generations, LD = O(ε)

(24)

Why? Trace genes in ancestor tree

… 3 log n

(25)

Bounding Linkage Disequilibrium

• But if there is selection, sampling is not quite uniform

• ~ ε bias introduced at each generation

• Therefore, LD = O(ε log n)

(26)

So, Assumption Justified!

• OK to assume product distribution.

• (Since fitness values are 1 and 1 + ε)

(27)

Arbitrary Boolean Functions!

Main Theorem: Any Boolean function of genes which confers a small evolutionary advantage will be

(28)

Wait a minute, this is wrong!

• XOR: Suppose that F = (x₁ ≠ x₂)

• What will happen if we start uniform?

• No change!

• Also, if, e.g., F = “exactly k out of n are true” and we start at p_i = k/n
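Both stuck cases can be verified by direct enumeration. In this minimal sketch the helper `conditional_freqs` and the concrete choices (k = 1, n = 4, and the nudged starting point) are assumptions for illustration. If P[x_i = 1 | F] equals p_i for every gene, conditioning on F leaves the allele frequencies where they were.

```python
from itertools import product

def conditional_freqs(F, p):
    """P[x_i = 1 | F(x)] for each i, with x drawn from the product distribution p."""
    n = len(p)
    sigma, hits = 0.0, [0.0] * n
    for x in product((0, 1), repeat=n):
        w = 1.0
        for xi, pi in zip(x, p):
            w *= pi if xi else 1 - pi  # probability of genotype x
        if F(x):
            sigma += w
            for i in range(n):
                hits[i] += w * x[i]
    return [h / sigma for h in hits]

xor = lambda x: x[0] != x[1]
exactly_one = lambda x: sum(x) == 1  # "exactly k out of n" with k = 1

stuck_xor = conditional_freqs(xor, [0.5, 0.5])         # uniform start
stuck_k = conditional_freqs(exactly_one, [0.25] * 4)   # start at p_i = k/n
nudged = conditional_freqs(xor, [0.6, 0.5])            # slightly off uniform
print(stuck_xor, stuck_k, nudged)
```

In the nudged case the second gene's conditional frequency drops below its starting value, so any small perturbation of the uniform point is enough for selection to start moving the population.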

(29)

Main Theorem:

Precise statement

• To form the next generation:

1. Sample N individuals from the current product distribution {p_i}; let the empirical distribution be {q_i} (gets you unstuck in XOR etc.)

2. Then apply Selection: p_i’ ≈ q

(30)

Main Theorem:

Detailed Parameters

• n = number of genes involved

• σ₀ = initial satisfaction probability

• ε = selection strength, must be < 1/n

• N = population > n³ / σ₀⁴
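The sample-then-select process with these parameters can be sketched as below. This is an illustration under assumptions rather than the construction from the proof: the selection step uses the exact marginal update for fitness 1 + ε F(x) under a product distribution, q_i (1 + ε f(q | x_i = 1)) / (1 + ε f(q)), and all function names and the example values are hypothetical.

```python
import random
from itertools import product

def sat_prob(F, q):
    """f(q) = P[F(x)] for x drawn from the product distribution q (brute force)."""
    total = 0.0
    for x in product((0, 1), repeat=len(q)):
        if F(x):
            w = 1.0
            for xi, qi in zip(x, q):
                w *= qi if xi else 1 - qi
            total += w
    return total

def sample_step(p, N, rng):
    """Step 1: draw N individuals from p and return the empirical frequencies q."""
    return [sum(rng.random() < pi for _ in range(N)) / N for pi in p]

def select_step(F, q, eps):
    """Step 2 (assumed form): marginals after selection with fitness 1 + eps * F(x)."""
    f = sat_prob(F, q)
    out = []
    for i, qi in enumerate(q):
        f_i = sat_prob(F, q[:i] + [1.0] + q[i + 1:])  # f conditioned on x_i = 1
        out.append(qi * (1 + eps * f_i) / (1 + eps * f))
    return out

def evolve(F, p, eps, N, generations, seed=0):
    """Alternate sampling and selection for a fixed number of generations."""
    rng = random.Random(seed)
    p = list(p)
    for _ in range(generations):
        p = select_step(F, sample_step(p, N, rng), eps)
    return p

xor = lambda x: x[0] != x[1]
# n = 2, so eps = 0.4 < 1/n; sigma_0 = 1/2, so N = 200 > n^3 / sigma_0^4 = 128.
print(evolve(xor, [0.5, 0.5], eps=0.4, N=200, generations=2000))
```

Without the sampling step the XOR population would stay at the uniform point forever; the sampling noise is what lets selection begin to act.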

(31)

ε < 1/n ?!?

• How come a theorem about the effectiveness of selection seems to fail when selection is strong?

• Intuitive explanation: In the interior of the cube the process is close to gradient descent

(32)

Main Open Problem

Greg’s Conjecture: Result remains true even if the fitness is 0–1

•Evidence from experiments

(33)

Main Theorem: Proof

• We want to show that the sample-select process leads to a satisfying population

• We track the expected fitness f[p]

• Core of the proof: We bound the variance introduced by sampling by the expected fitness increase in the selection step:

p_i’ = q_i + ε × prob_q [x_i = 1 | F(x)]

(34)

Main Theorem: Proof (cont.)

p_i’ = q_i + ε × prob_q [x_i = 1 | F(x)]

•We show (variance introduced on the left, fitness increase on the right):

E_q[(f(q) – f(p))²] ≤ E[f(p’) – f(q)] / (N(1 – nε))

(35)

Main Theorem: Proof (cont.)

E_q[(f(q) – f(p))²] ≤ E

(36)

E_q[(f(q) – f(p))²] ≤ E_q[Σ_i (F_q{i})²] / N

(linear mass of the q-biased Fourier transform of f)

(37)


(38)


(39)

E_q[(f(q) – f(p))²] ≤ Σ_i (F_q{i})²

(40)

• Next: the total effect of the variance steps is small

• Idea: Σ_t [f(q^{t+1}) – f(p^t)] is a martingale

• But no obvious upper bound

• Need specialized martingale inequality

(41)

Main Theorem: Proof (cont.)

• Finally: the process gets so close to the boundary that the increase is minuscule

• Random walk (with absorbing boundaries!) will eventually get stuck at a vertex of the cube

• End of proof

(42)

Discussion

• An interesting and nontrivial algorithmic fact about satisfiability

• Remember Greg’s conjecture

• Parameter bounds should be very improvable

• Monotone functions bound essentially tight

(43)

Implications for Evolution?

• Interesting new mechanism for the emergence of complex traits

(44)

Implications for Evolution? (cont.)

(45)

Implications for Evolution? (cont.)

• [CLPV, PNAS 2014]: Evolution under sex is tantamount to a repeated coordination game played by the genes: the strategies are the alleles, the probabilities are the frequencies in the population, the utility is fitness, and the game is played through multiplicative weights!
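The correspondence can be illustrated with a small sketch for two genes. It assumes a product distribution over allele pairs and fitness 1 + ε Δ[a][b] for the pair (a, b); the function name, the matrix Δ, and ε = 0.1 are hypothetical. Each gene reweights its alleles by 1 + ε times their expected payoff against the other gene's current frequencies, which is a multiplicative weights step in a game where both players share the same utility.

```python
def mwu_step(x, y, delta, eps):
    """One generation: each gene reweights its alleles by (1 + eps * expected payoff)."""
    m, k = len(x), len(y)
    # Expected payoff of each allele against the other gene's mixed strategy.
    u_x = [sum(delta[a][b] * y[b] for b in range(k)) for a in range(m)]
    u_y = [sum(delta[a][b] * x[a] for a in range(m)) for b in range(k)]
    # Shared utility: population mean of delta, the same normalizer for both genes.
    mean = sum(x[a] * u_x[a] for a in range(m))
    new_x = [x[a] * (1 + eps * u_x[a]) / (1 + eps * mean) for a in range(m)]
    new_y = [y[b] * (1 + eps * u_y[b]) / (1 + eps * mean) for b in range(k)]
    return new_x, new_y

# Hypothetical coordination payoffs: allele pair (0, 0) is best, (1, 1) second best.
delta = [[1.0, 0.0],
         [0.0, 0.5]]

x, y = [0.5, 0.5], [0.5, 0.5]
history = [(x, y)]
for _ in range(200):
    x, y = mwu_step(x, y, delta, eps=0.1)
    history.append((x, y))
print(history[1], history[-1])
```

In this example both genes start at equal frequencies and drift together toward the allele pair with the highest payoff, which is the coordination behaviour the slide describes.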

(46)

Soooo, Evolution and TCS

Remember the three mysteries of Evolution (not the only ones, btw):

•What is the role of sex/recombination?

•Why is there so much genetic diversity within species?

• How do complex traits emerge?

(47)