from a Bayesian perspective
Matteo Ruggiero
University of Pavia, Italy
Joint with Stephen G. Walker, University of Kent, UK
Outline
Background
• Bayes nonparametrics and Gibbs sampler
• Fleming-Viot processes
Bayesian construction of Fleming-Viot diffusions
• Basic construction: neutrality
• Extension to selective models
Dirichlet process
Definition (Ferguson, 1973). Given a finite non-null measure $\alpha$ on $(\mathbb{X},\mathcal{X})$, a random probability measure $\mu$ on $(\mathbb{X},\mathcal{X})$ is said to be a Dirichlet process with parameter $\alpha$, denoted $\mu\sim\mathcal{D}_\alpha$, if for every $K=1,2,\dots$ and for every measurable partition $(B_1,\dots,B_K)$ of $\mathbb{X}$,
$$\big(\mu(B_1),\dots,\mu(B_K)\big)\sim\mathrm{Dirichlet}\big(\alpha(B_1),\dots,\alpha(B_K)\big).$$
Conjugate posterior process (Ferguson, 1973). Let $\mu$ be a Dirichlet process on $(\mathbb{X},\mathcal{X})$ with parameter $\alpha$, and let $X_1,\dots,X_n$ be a sample of size $n$ from $\mu$. Then
$$\mu\mid X_1,\dots,X_n\sim\mathcal{D}_{\alpha+\sum_{i=1}^n\delta_{X_i}},$$
where $\delta_x$ denotes a point mass at $x$.
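On a finite partition, conjugacy reduces to a count update. Each cell's Dirichlet parameter $\alpha(B_k)$ grows by the number $n_k$ of observations that fall in $B_k$. The sketch below assumes each observation has already been encoded as the index of its cell. The helper names are hypothetical.

```python
from collections import Counter


def posterior_partition_params(alpha_masses, sample_cells):
    """Dirichlet parameters of (mu(B_1), ..., mu(B_K)) given the sample.

    alpha_masses[k] is alpha(B_k); sample_cells[i] is the index k of the
    cell B_k containing the observation X_i (an encoding assumed here).
    """
    counts = Counter(sample_cells)
    # alpha + sum_i delta_{X_i} gives cell B_k one extra unit of mass
    # for every observation that falls in it.
    return [a + counts.get(k, 0) for k, a in enumerate(alpha_masses)]


def posterior_mean(alpha_masses, sample_cells):
    """E[mu(B_k) | X_1..X_n] = (alpha(B_k) + n_k) / (alpha(X) + n)."""
    params = posterior_partition_params(alpha_masses, sample_cells)
    total = sum(params)
    return [p / total for p in params]


# alpha(X) = 2 spread evenly over four cells; six observations.
alpha = [0.5, 0.5, 0.5, 0.5]
cells = [0, 0, 1, 3, 0, 3]
print(posterior_partition_params(alpha, cells))  # [3.5, 1.5, 0.5, 2.5]
print(posterior_mean(alpha, cells))              # [0.4375, 0.1875, 0.0625, 0.3125]
```

Empty cells keep their prior mass, so the posterior still puts positive probability on regions with no observations.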
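Connection with the GEM distribution. Sethuraman (1994) proposed a series representation for the Dirichlet process, the so-called stick-breaking construction. It has locations $Y_i\overset{iid}{\sim}\alpha/\alpha(\mathbb{X})$ and weights $p_i$ with GEM distribution:
$$\mathcal{D}_\alpha=\mathcal{L}\Big(\sum_{i=1}^\infty p_i\,\delta_{Y_i}\Big),\qquad p_i=V_i\prod_{j=1}^{i-1}(1-V_j),\qquad V_i\overset{iid}{\sim}\mathrm{Beta}(1,\alpha(\mathbb{X})),\quad Y_i\overset{iid}{\sim}\frac{\alpha}{\alpha(\mathbb{X})}.$$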
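The stick-breaking series can be sampled approximately by truncating it. The truncation level and the "leftover stick" atom are an approximation device and are not part of Sethuraman's construction. Taking $\nu_0=\mathrm{Uniform}(0,1)$ is a hypothetical choice made for illustration.

```python
import random


def stick_breaking(total_mass, base_sampler, n_atoms, rng=random):
    """Approximate draw from D_alpha by truncated stick-breaking.

    total_mass is alpha(X) and base_sampler() returns Y ~ alpha / alpha(X).
    The series is cut after n_atoms terms and the leftover stick
    prod_j (1 - V_j) is put on one extra atom, so the weights sum to 1.
    """
    atoms, weights = [], []
    remaining = 1.0  # prod_{j < i} (1 - V_j): stick length not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, total_mass)   # V_i ~ Beta(1, alpha(X))
        weights.append(v * remaining)          # p_i = V_i prod_{j<i} (1 - V_j)
        atoms.append(base_sampler())           # Y_i ~ alpha / alpha(X)
        remaining *= 1.0 - v
    weights.append(remaining)                  # truncation remainder
    atoms.append(base_sampler())
    return atoms, weights


rng = random.Random(1)
# Hypothetical choice: alpha = 2 * Uniform(0, 1), i.e. alpha(X) = 2.
atoms, weights = stick_breaking(2.0, rng.random, n_atoms=100, rng=rng)
print(len(atoms), round(sum(weights), 12))  # 101 1.0
```

Larger $\alpha(\mathbb{X})$ makes each $V_i$ smaller, so more atoms are needed before the remainder becomes negligible.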
A Polya urn for the Dirichlet process
Blackwell and MacQueen (1973)
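Let again $\alpha$ be a finite measure on $(\mathbb{X},\mathcal{X})$, and let $\{X_n\}_{n\ge1}$ be such that
$$X_1\sim\frac{\alpha}{\alpha(\mathbb{X})},\qquad X_{n+1}\mid X_1,\dots,X_n\sim\frac{\alpha+\sum_{i=1}^n\delta_{X_i}}{\alpha(\mathbb{X})+n},\quad n\ge1.$$
The observed colours have higher probability of being drawn again. If $\alpha$ is non-atomic, there is a continuum of colours. Then $\{X_n\}_{n\ge1}$ is called a Polya sequence with parameter $\alpha$, and
(a) $\dfrac{\alpha+\sum_{i=1}^n\delta_{X_i}}{\alpha(\mathbb{X})+n}\Longrightarrow\mu^*$ a.s., $\mu^*$ being a discrete measure;
(b) $\mu^*\sim\mathcal{D}_\alpha$;
(c) $X_1,X_2,\dots\mid\mu^*\overset{iid}{\sim}\mu^*$.
Rephrasing, we can write the joint law of $X_1,\dots,X_n$, for $n\ge1$, as
$$P(X_1\in dx_1,\dots,X_n\in dx_n)=\mathbb{E}\Big[\prod_{i=1}^n\mu(dx_i)\Big]=\int_{\mathcal{P}(\mathbb{X})}\prod_{i=1}^n\mu(dx_i)\,\mathcal{D}_\alpha(d\mu).$$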
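The urn can be simulated directly. The sketch below takes $\alpha=\theta\nu_0$, so $\alpha(\mathbb{X})=\theta$, and uses $\nu_0=\mathrm{Uniform}(0,1)$ as a hypothetical non-atomic base. Ties can then only come from copying a previous draw. The function name is illustrative.

```python
import random


def polya_sequence(n, theta, base_sampler, rng=random):
    """Draw X_1, ..., X_n from the Blackwell-MacQueen urn with alpha = theta * nu0.

    Given k previous draws, X_{k+1} is a fresh value from nu0 with probability
    theta / (theta + k); otherwise it copies one of the k previous values,
    each with probability 1 / (theta + k). Requires theta > 0.
    """
    xs = []
    for k in range(n):
        if rng.random() < theta / (theta + k):
            xs.append(base_sampler())   # new colour from nu0
        else:
            xs.append(rng.choice(xs))   # re-draw an observed colour
    return xs


rng = random.Random(0)
xs = polya_sequence(1000, theta=2.0, base_sampler=rng.random, rng=rng)
print(len(set(xs)))  # number of distinct colours, of order theta * log(n)
```

The small number of distinct values reflects the discreteness of $\mu^*$ in (a). The expected count is $\sum_{k=0}^{n-1}\theta/(\theta+k)$, which grows like $\theta\log n$.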
Gibbs sampling
Geman and Geman (1984)
Special case of a Metropolis-Hastings algorithm, broadly used in Bayesian inference.
Suppose we want to sample from some joint distribution $p_{X,Y}(x,y)$, but this is infeasible. Given the initial value $(x_0,y_0)$, it is usually easier to sample from the (full) conditional distributions
$$X_1\sim p_{X\mid Y}(x\mid y_0),\quad Y_1\sim p_{Y\mid X}(y\mid x_1),\quad X_2\sim p_{X\mid Y}(x\mid y_1),\ \dots$$
and so on. Then $\{(x_n,y_n)\}_{n\ge1}$ is a Markov chain with stationary distribution $p_{X,Y}(x,y)$. Taking $M$ such chains $\{(x^i_n,y^i_n)\}_{n\ge1}$, $i=1,\dots,M$, for sufficiently large $N\ge1$ we can approximate
$$\frac1M\sum_{i=1}^M f(x^i_N,y^i_N)\approx\int f(x,y)\,p_{X,Y}(x,y)\,dx\,dy.$$
If the coordinates are updated in a random order and each is visited infinitely often, the chain is also reversible with respect to $p_{X,Y}(x,y)$.
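A minimal illustration of the scheme above uses a random-scan Gibbs sampler run as $M$ independent chains. Only the state at step $N$ of each chain is averaged. The standard bivariate normal target with correlation $\rho$ is a stand-in example and does not come from the slides. Its full conditionals are $X\mid Y=y\sim N(\rho y,1-\rho^2)$ and symmetrically for $Y$.

```python
import math
import random


def random_scan_gibbs(rho, n_steps, x0, y0, rng):
    """Random-scan Gibbs sampler for a standard bivariate normal with
    correlation rho: at each step one coordinate, chosen with probability 1/2,
    is redrawn from its full conditional N(rho * other, 1 - rho^2)."""
    sd = math.sqrt(1.0 - rho * rho)
    x, y = x0, y0
    for _ in range(n_steps):
        if rng.random() < 0.5:
            x = rng.gauss(rho * y, sd)   # X ~ p(x | y)
        else:
            y = rng.gauss(rho * x, sd)   # Y ~ p(y | x)
    return x, y


def monte_carlo_estimate(f, rho, n_chains=2000, n_steps=100, seed=0):
    """Average f over the N-th state of M independent chains."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_chains):
        # Deliberately poor starting point (x0, y0) = (5, -5).
        x, y = random_scan_gibbs(rho, n_steps, 5.0, -5.0, rng)
        total += f(x, y)
    return total / n_chains


print(monte_carlo_estimate(lambda x, y: x * y, rho=0.8))  # close to E[XY] = 0.8
```

The random scan matches the last remark on the slide: it is this version of the sampler that is reversible with respect to $p_{X,Y}$.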
Fleming-Viot processes
Fleming and Viot (1979)
A Fleming-Viot process is a probability-measure-valued diffusion that describes the evolution in time of an infinite population subject to mutation, resampling and, possibly, selection and recombination.
Among its main features:
• individuals are labeled by points in a complete separable metric space $\mathbb{X}$, called the type space (for simplicity we assume $\mathbb{X}$ is compact);
• it takes values in the set $\mathcal{P}(\mathbb{X})$ of Borel probability measures on $\mathbb{X}$;
• it has sample paths in the space $C_{\mathcal{P}(\mathbb{X})}([0,\infty))$ of continuous functions from $[0,\infty)$ to $\mathcal{P}(\mathbb{X})$.
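The neutral version has infinitesimal generator
$$A_0\varphi(\mu)=\sum_{i=1}^m\langle P_if,\mu^m\rangle+\frac12\sum_{1\le k\ne i\le m}\langle\Phi_{ki}f-f,\mu^m\rangle,\qquad\langle f,\mu\rangle=\int f\,d\mu,$$
with domain
$$\mathcal{D}(A_0)=\big\{\varphi\in B(\mathcal{P}(\mathbb{X})):\ \varphi(\mu)=\langle f,\mu^m\rangle,\ f\in C(\mathbb{X}^m),\ m\in\mathbb{N}\big\},$$
where $P$ is the generator of a Feller mutation process on $\mathbb{X}$, $P_i$ acts on $x_i$ in $f(x_1,\dots,x_m)$, and $\Phi_{ki}$ changes $x_k$ to $x_i$ in $f$.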
When
$$Pf(x)=\frac\theta2\int\big[f(y)-f(x)\big]\,\nu_0(dy),\qquad\nu_0\text{ non-atomic},\ \theta>0,$$
its stationary distribution is $\mathcal{D}_\alpha$, with $\alpha=\theta\nu_0$ (Ethier and Kurtz, 1986).
Its transition function is given by
$$P(t,\mu,d\nu)=\sum_{m=0}^\infty d_m(t)\int_{\mathbb{X}^m}\mathcal{D}_{\alpha+\sum_{i=1}^m\delta_{x_i}}(d\nu)\,\mu(dx_1)\cdots\mu(dx_m),$$
where $d_m(t)=P(D_t=m)$, $D_t$ is a death process starting a.s. from $\infty$, and $\mathcal{D}_{\alpha+\sum_{i=1}^m\delta_{x_i}}$ is a posterior Dirichlet process (Ethier and Griffiths, 1993).
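The mixture form of $P(t,\mu,d\nu)$ suggests a sampler for a discrete $\mu$. First draw $D_t=m$, then draw $m$ types i.i.d. from $\mu$, and finally draw $\nu$ from the posterior Dirichlet process. This is a sketch under assumptions that the slide does not state:
• the death rate of $D_t$ from state $m$ is taken to be $m(m+\theta-1)/2$, the usual lineage-counting rate for this mutation operator, which matches the $\lambda_n$ used below for the particle system;
• the entrance from $\infty$ is replaced by a large finite start;
• $\nu$ is drawn by truncated stick-breaking.
Function names are hypothetical.

```python
import random


def death_process_at(t, theta, start=10_000, rng=random):
    """Approximate draw of D_t for the death process started 'from infinity'.

    Assumed death rate from state m: m * (m + theta - 1) / 2. The entrance
    from infinity is replaced by a large finite `start`. Requires theta > 0.
    """
    m, clock = start, 0.0
    while m > 0:
        clock += rng.expovariate(0.5 * m * (m + theta - 1.0))
        if clock > t:
            return m      # the next death falls after time t
        m -= 1
    return 0


def fv_transition_sample(mu_atoms, mu_weights, t, theta, nu0_sampler, rng,
                         n_atoms=200):
    """Approximate draw from P(t, mu, .) for a discrete mu.

    D_t = m; X_1..X_m iid from mu; then nu ~ D_{theta nu0 + sum_i delta_{X_i}},
    drawn by truncated stick-breaking with total mass theta + m.
    """
    m = death_process_at(t, theta, rng=rng)
    xs = rng.choices(mu_atoms, weights=mu_weights, k=m)
    total = theta + m

    def posterior_base():
        # One draw from (theta * nu0 + sum_i delta_{X_i}) / (theta + m).
        if rng.random() < theta / total:
            return nu0_sampler()
        return rng.choice(xs)

    atoms, weights, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, total)        # V_i ~ Beta(1, theta + m)
        atoms.append(posterior_base())
        weights.append(v * remaining)
        remaining *= 1.0 - v
    atoms.append(posterior_base())             # truncation remainder
    weights.append(remaining)
    return atoms, weights


rng = random.Random(7)
atoms, weights = fv_transition_sample([0.2, 0.8], [0.5, 0.5], t=0.5, theta=1.0,
                                      nu0_sampler=rng.random, rng=rng)
print(round(sum(weights), 9))  # 1.0
```

For small $t$, $D_t$ is large and $\nu$ stays close to $\mu$. In that regime `n_atoms` should grow with $\theta+m$, otherwise the remainder atom carries too much mass.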
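If we add selection, the FVP has generator
$$A_\sigma\varphi(\mu)=\sum_{i=1}^m\langle P_if,\mu^m\rangle+\frac12\sum_{1\le k\ne i\le m}\langle\Phi_{ki}f-f,\mu^m\rangle+\sum_{i=1}^m\langle\sigma_i(\cdot)f-\sigma_{m+1}(\cdot)f,\mu^{m+1}\rangle,$$
where $\sigma_i(\cdot)=\sigma(x_i)$ is the selection coefficient. Its stationary distribution is proportional to $e^{2\langle\sigma,\mu\rangle}\mathcal{D}_\alpha(d\mu)$.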
Gibbs sampling the Polya urn
Given an exchangeable vector $\mathbf{X}_n=(X_1,\dots,X_n)$, define a Gibbs-sampler-driven Markov chain $\{\mathbf{X}_n(k)\}_{k\ge1}$ such that at each transition
• $x_i$ is removed from $\mathbf{x}_n=(x_1,\dots,x_n)$ with probability $1/n$;
• a replacement $X_i'$ is sampled from the Blackwell-MacQueen prediction scheme with $\alpha=\theta\nu_0$, where $\theta=\alpha(\mathbb{X})$ and $\nu_0=\alpha/\alpha(\mathbb{X})$ is non-atomic, namely
$$X_i'\mid\mathbf{x}_{(-i)}\sim\frac{\theta}{\theta+n-1}\,\nu_0(dx_i')+\frac{1}{\theta+n-1}\sum_{k\ne i}\delta_{x_k}(dx_i')\qquad(1)$$
• the arrival state is $(x_1,\dots,x_{i-1},x_i',x_{i+1},\dots,x_n)$.
This amounts to performing a Gibbs sampler on $(X_1,\dots,X_n)$ with a random scan (the index $i$ of the updated $X_i$ is chosen at random) and full conditionals $P^\alpha_{X_i\mid X_{(-i)}}(dx_i\mid x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ given by (1).
This produces a reversible Markov chain $\{(X_1(k),\dots,X_n(k))\}_{k\ge1}$ with stationary distribution
$$P^\alpha_{X_1,\dots,X_n}(dx_1,\dots,dx_n)=\nu_0(dx_1)\,\frac{\theta\nu_0(dx_2)+\delta_{x_1}(dx_2)}{\theta+1}\cdots\frac{\theta\nu_0(dx_n)+\sum_{k=1}^{n-1}\delta_{x_k}(dx_n)}{\theta+n-1}.$$
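One transition of this chain can be sketched as follows: remove a uniformly chosen coordinate and redraw it from (1) given the remaining $n-1$ values. As before, $\nu_0=\mathrm{Uniform}(0,1)$ is a hypothetical choice, and the function name is illustrative.

```python
import random


def polya_gibbs_step(xs, theta, base_sampler, rng=random):
    """One transition of the Gibbs-sampler-driven chain: pick i with prob. 1/n,
    remove x_i, and redraw it from the predictive (1) given the others."""
    n = len(xs)
    i = rng.randrange(n)                      # index chosen with probability 1/n
    others = xs[:i] + xs[i + 1:]              # x_{(-i)}
    if rng.random() < theta / (theta + n - 1):
        new = base_sampler()                  # fresh value from nu0
    else:
        new = rng.choice(others)              # copy one of the other n - 1 values
    return xs[:i] + [new] + xs[i + 1:]        # arrival state


rng = random.Random(3)
xs = [rng.random() for _ in range(10)]        # start from 10 distinct values
for _ in range(1000):
    xs = polya_gibbs_step(xs, theta=1.0, base_sampler=rng.random, rng=rng)
print(len(set(xs)))  # typically only a few distinct values survive
```

Since the chain is reversible with respect to $P^\alpha_{X_1,\dots,X_n}$, the number of distinct values should settle near its stationary mean $\sum_{k=0}^{n-1}\theta/(\theta+k)$, whatever the starting configuration.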
The particle process
Embed it in continuous time in $D_{\mathbb{X}^n}[0,\infty)$, with $\mathrm{Exp}(\lambda_n)$ sojourn times, and let $\lambda_n=n(\theta+n-1)/2$.
Remarks
a) $\lambda_n$ replaces an explicit time rescaling.
b) For $\theta=0$ there is no mutation, and $\lambda_n=n(n-1)/2$ is the transition rate of Kingman's coalescent.
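The generator of the $\mathbb{X}^n$-valued process is
$$A_nf(\mathbf{x})=\sum_{i=1}^n\frac{\lambda_n}{n(\theta+n-1)}\int\big[f(\eta_i(\mathbf{x}\mid y))-f(\mathbf{x})\big]\Big(\theta\nu_0+\sum_{k\ne i}\delta_{x_k}\Big)(dy),$$
where $\eta_i(\mathbf{x}\mid y)=(x_1,\dots,x_{i-1},y,x_{i+1},\dots,x_n)$.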
Define the process of empirical measures $\{\mu_n(t)\}_{t\ge0}:=\{\frac1n\sum_{i=1}^n\delta_{x_i(t)}\}_{t\ge0}$, with càdlàg sample paths in $D_{\mathcal{P}(\mathbb{X})}[0,\infty)$.
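The continuous-time particle process and its empirical measure can be sketched as follows. Holding times are $\mathrm{Exp}(\lambda_n)$, and at each jump a uniformly chosen particle is redrawn from (1). With $\lambda_n=n(\theta+n-1)/2$, the coefficient $\lambda_n/(n(\theta+n-1))$ in $A_n$ equals $1/2$. The helper names and the choice $\nu_0=\mathrm{Uniform}(0,1)$ are hypothetical.

```python
import random


def particle_process(x0, theta, t_end, base_sampler, rng=random):
    """Simulate the n-particle process on [0, t_end].

    Holding times are Exp(lambda_n), lambda_n = n (theta + n - 1) / 2; at each
    jump a uniformly chosen x_i is redrawn from the predictive (1).
    Requires theta > 0 or n >= 2 so that lambda_n > 0.
    """
    xs = list(x0)
    n = len(xs)
    rate = 0.5 * n * (theta + n - 1)
    t, jump_times = 0.0, []
    while True:
        t += rng.expovariate(rate)          # Exp(lambda_n) sojourn time
        if t > t_end:
            return xs, jump_times
        i = rng.randrange(n)                # particle i chosen w.p. 1/n
        if rng.random() < theta / (theta + n - 1):
            xs[i] = base_sampler()          # mutation: fresh type from nu0
        else:
            xs[i] = rng.choice(xs[:i] + xs[i + 1:])  # copy another particle
        jump_times.append(t)


def empirical_measure(xs):
    """mu_n = (1/n) sum_i delta_{x_i}, returned as a dict {type: mass}."""
    masses = {}
    for x in xs:
        masses[x] = masses.get(x, 0.0) + 1.0 / len(xs)
    return masses


rng = random.Random(11)
state, jumps = particle_process([rng.random() for _ in range(20)], theta=1.0,
                                t_end=2.0, base_sampler=rng.random, rng=rng)
mu_n = empirical_measure(state)   # a discrete probability measure on [0, 1)
```

Recording `empirical_measure` at each jump time gives one càdlàg path of $\mu_n(t)$.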
Convergence and stationarity
Neutral diffusion model
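Define
$$\varphi_m(\mu)=\langle f,\mu^{(m)}\rangle,\qquad\mu^{(m)}=\frac{(n-m)!}{n!}\sum_{1\le i_1\ne\dots\ne i_m\le n}\delta_{(x_{i_1},\dots,x_{i_m})}.$$
Then $A_n\varphi_m(\mu)=\langle A_nf,\mu^{(m)}\rangle$ is the generator of the measure-valued process, and
$$\|A_n\varphi_m(\mu)-A_0\phi_m(\mu)\|\xrightarrow[n\to\infty]{}0,\qquad\phi_m(\mu)=\langle f,\mu^m\rangle,$$
where $A_0$ is the generator of a FV process. The linear span of the functions $\phi_m$ is a core for $A_0$ in $C(\mathcal{P}(\mathbb{X}))$, and both $A_n$ and $A_0$ generate strongly continuous contraction semigroups. This implies $\{\mu_n(t)\}\Rightarrow\{\mu_\infty(t)\}$ in $D_{\mathcal{P}(\mathbb{X})}[0,\infty)$, where $\{\mu_\infty(t)\}$ is a FV process.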
From de Finetti's theorem, with probability one $\mu_n(t)\Rightarrow\mu_\infty(t)$ for every $t$, and $\mu_\infty(t)\sim\mathcal{D}_{\theta\nu_0}$. From the well-posedness of the martingale problem for $A_0$, it follows that the stationary distribution of $\{\mu_\infty(t)\}$ is the Dirichlet process $\mathcal{D}_{\theta\nu_0}$.
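The functions $\varphi_m$ pair $f$ with the symmetrised measure $\mu^{(m)}$. This measure averages over ordered $m$-tuples of distinct particles, whereas $\mu_n^m$ also counts tuples with repeated indices. For bounded $f$ the two pairings differ by $O(m^2/n)$, so they agree in the limit. The helper names below are hypothetical.

```python
from itertools import permutations, product


def mu_m_pairing(f, xs, m):
    """<f, mu^(m)>: average of f over ordered m-tuples of distinct indices,
    i.e. (n - m)! / n! times the sum over i_1 != ... != i_m."""
    tuples = list(permutations(xs, m))   # permutations acts on positions
    return sum(f(*tup) for tup in tuples) / len(tuples)


def mu_power_pairing(f, xs, m):
    """<f, mu_n^m>: average of f over all n^m tuples (sampling with replacement)."""
    tuples = list(product(xs, repeat=m))
    return sum(f(*tup) for tup in tuples) / len(tuples)


def same_type(x, y):
    return 1.0 if x == y else 0.0


xs = [0, 0, 1, 1]
print(mu_m_pairing(same_type, xs, 2))      # 4 of 12 ordered distinct pairs match: 1/3
print(mu_power_pairing(same_type, xs, 2))  # 8 of 16 pairs match: 0.5
```

In the example, the gap between $1/3$ and $1/2$ comes entirely from the diagonal pairs $(x_i,x_i)$, which $\mu^{(m)}$ excludes.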
A generalised Polya urn scheme
Consider the exchangeable law
$$Q^{\alpha,\beta_n}_{X_1,\dots,X_n}(dx_1,\dots,dx_n)\propto P^\alpha_{X_1,\dots,X_n}(dx_1,\dots,dx_n)\,\beta_n(x_1)\cdots\beta_n(x_n),\qquad(2)$$
where we assume $\beta_n\in B(\mathbb{X})$ for all $n$.
Remark
It can be shown that $Q^{\alpha,\beta_n}_{X_1,\dots,X_n}$ admits a representation in terms of a Dirichlet process mixture (Lo, 1984), a model widely used for Bayesian density estimation.
From (2), the predictive law for $x_i$ is
$$Q^{\alpha,\beta_n}_{X_i\mid X_{(-i)}}(dx_i\mid x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)\propto\theta\beta_n(x_i)\,\nu_0(dx_i)+\sum_{l\ne i}\beta_n(x_l)\,\delta_{x_l}(dx_i),$$
and for $\beta_n(x)\equiv1$ it reduces to the Blackwell-MacQueen case
$$Q^{\alpha,1}_{X_1,\dots,X_n}(dx_1,\dots,dx_n)\propto P^\alpha_{X_1,\dots,X_n}(dx_1,\dots,dx_n)=\prod_{i=1}^n\frac{\theta\nu_0(dx_i)+\sum_{l<i}\delta_{x_l}(dx_i)}{\theta+i-1}.$$
Gibbs sampling again
Similarly to the neutral case, define a Markov chain $\{\mathbf{X}_n(k)\}_{k\ge1}$ such that at each transition
• $x_i$ is removed from $\mathbf{x}_n=(x_1,\dots,x_n)$ with probability $1/n$;
• a replacement is sampled from the generalized Blackwell-MacQueen predictive
$$Q^{\alpha,\beta_n}_{X_i\mid X_{(-i)}}(dx_i\mid x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)\propto\theta\beta_n(x_i)\,\nu_0(dx_i)+\sum_{l\ne i}\beta_n(x_l)\,\delta_{x_l}(dx_i).$$
This produces a chain reversible with respect to $Q^{\alpha,\beta_n}_{X_1,\dots,X_n}(dx_1,\dots,dx_n)$. Embed it in continuous time in $D_{\mathbb{X}^n}[0,\infty)$ with $\mathrm{Exp}(\lambda_{n,i})$ sojourn times such that
$$\lambda_{n,i}=\frac n2\Big(\theta\int\beta_n(u)\,\nu_0(du)+\sum_{l\ne i}\beta_n(x_l)\Big),$$
and note that $\beta_n\equiv1\Rightarrow\lambda_{n,i}=n(\theta+n-1)/2$.
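The generalized update and its rate can be sketched under these assumptions:
• the rate is read as $\lambda_{n,i}=\frac n2(\theta\int\beta_n\,d\nu_0+\sum_{l\ne i}\beta_n(x_l))$, so that $\beta_n\equiv1$ gives $n(\theta+n-1)/2$;
• $\int\beta_n\,d\nu_0$ is known (passed as `base_mass`);
• $0\le\beta_n\le$ `beta_max`, so the $\beta_n$-tilted $\nu_0$ can be sampled by rejection.
The fertility weights $\beta_n(x)=1+\frac2n\sigma(x)$ with $\sigma(x)=x$ and $\nu_0=\mathrm{Uniform}(0,1)$ are a hypothetical example. All function names are illustrative.

```python
import random


def lambda_ni(xs, i, theta, beta, base_mass):
    """Sojourn rate (n/2) (theta * int beta dnu0 + sum_{l != i} beta(x_l));
    base_mass stands for int beta_n dnu0, assumed known."""
    n = len(xs)
    rest = sum(beta(x) for l, x in enumerate(xs) if l != i)
    return 0.5 * n * (theta * base_mass + rest)


def tilted_base_draw(beta, beta_max, base_sampler, rng):
    """Draw from beta(y) nu0(dy) / int beta dnu0 by rejection sampling,
    assuming 0 <= beta(y) <= beta_max for all y."""
    while True:
        y = base_sampler()
        if rng.random() * beta_max < beta(y):
            return y


def generalized_gibbs_step(xs, theta, beta, beta_max, base_mass, base_sampler, rng):
    """Remove a uniformly chosen x_i and redraw it from the predictive
    proportional to theta beta(y) nu0(dy) + sum_{l != i} beta(x_l) delta_{x_l}(dy)."""
    n = len(xs)
    i = rng.randrange(n)
    others = xs[:i] + xs[i + 1:]
    w_new = theta * base_mass                 # mass of the continuous part
    w_old = [beta(x) for x in others]         # masses of the atoms x_l, l != i
    if rng.random() * (w_new + sum(w_old)) < w_new:
        new = tilted_base_draw(beta, beta_max, base_sampler, rng)
    else:
        new = rng.choices(others, weights=w_old, k=1)[0]
    return xs[:i] + [new] + xs[i + 1:]


n, theta = 20, 1.0


def beta(x):
    # Hypothetical fertility weights beta_n(x) = 1 + (2/n) sigma(x), sigma(x) = x.
    return 1.0 + 2.0 * x / n


rng = random.Random(5)
xs = [rng.random() for _ in range(n)]
for _ in range(500):
    # With nu0 = Uniform(0, 1): int beta dnu0 = 1 + 1/n and beta_max = 1 + 2/n.
    xs = generalized_gibbs_step(xs, theta, beta, 1.0 + 2.0 / n, 1.0 + 1.0 / n,
                                rng.random, rng)
```

With $\beta_n\equiv1$, `generalized_gibbs_step` reduces to the neutral update (1), and `lambda_ni` returns $n(\theta+n-1)/2$.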
Remark
Convergence
Fleming-Viot process with selection
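When the weights in $Q^{\alpha,\beta_n}_{X_1,\dots,X_n}$ have the form $\beta_n(x)=1+\frac2n\sigma(x)$, with $\sigma\in B(\mathbb{X})$, the particle process has generator
$$A^\sigma_nf(\mathbf{x})=\sum_{i=1}^n\frac12\theta\int\big[f(\eta_i(\mathbf{x}\mid y))-f(\mathbf{x})\big]\Big\{1+\frac2n\sigma(y)\Big\}\nu_0(dy)+\frac12\sum_{1\le k\ne i\le n}\big[f(\eta_i(\mathbf{x}\mid x_k))-f(\mathbf{x})\big]+\frac1n\sum_{1\le k\ne i\le n}\sigma(x_k)\big[f(\eta_i(\mathbf{x}\mid x_k))-f(\mathbf{x})\big].$$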
Remark
σ represents the fitness of the offspring, acting as fertility selection.
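Since $\langle A^\sigma_nf,\mu^{(m)}\rangle\to A_\sigma\phi_m(\mu)$ strongly, it can be shown that the process of empirical measures converges in distribution in $D_{\mathcal{P}(\mathbb{X})}[0,\infty)$ to the FV process with fertility selection:
$$\Big\{\frac1n\sum_{i=1}^n\delta_{x_i(t)},\ t\ge0\Big\}\underset{n\to\infty}{\Longrightarrow}\{\mu^\sigma_\infty(t),\ t\ge0\}\quad\text{in }D_{\mathcal{P}(\mathbb{X})}[0,\infty),$$
where $\mu^\sigma_\infty$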
Diploid case
For a diploid population, take a bivariate selection function $\beta_n(x,y)\in B_{sym}(\mathbb{X}^2)$ and consider the law, joint with pairings $P_n$,
$$Q^{\alpha,\beta_n}_{X_1,\dots,X_n,P_n}(dx_1,\dots,dx_n,P_n)\propto P^\alpha_{X_1,\dots,X_n}(dx_1,\dots,dx_n)\prod_k\beta_n(x_k,x_{j_k}).$$
With appropriate modifications, the same procedure leads to a generalized urn scheme with conditional law proportional to
$$\theta\sum_{j\ne i}\beta_n(x_i,x_j)\,\nu_0(dx_i)+\sum_{k\ne i}\sum_{j\ne i}\beta_n(x_i,x_j)\,\delta_{x_k}(dx_i),$$
and to a particle process with Poisson rate
$$\lambda_{n,i}=\frac n2\Big(\theta\int\sum_{j\ne i}\beta_n(x_i,x_j)\,\nu_0(dx_i)+\sum_{k\ne i}\sum_{j\ne i}\beta_n(x_k,x_j)\Big),$$
whose process of empirical measures converges to a FV process with diploid selection. When $\beta_n(x)=\int\beta_n(x,y)\,\mu(dy)$ we recover the haploid case. When $\beta_n(x,y)\equiv1$ we recover $n(\theta+n-1)/2$ and $\theta\nu_0(dx_i)+\sum_{k\ne i}\delta_{x_k}(dx_i)$.
Stationarity
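We exploit the representation of $Q^{\alpha,\beta_n}_{X_1,\dots,X_n}(dx_1,\dots,dx_n)$ in terms of a Dirichlet process mixture model. Let
$$z_i\mid x_i\overset{ind}{\sim}K_n(\cdot\mid x_i),\qquad x_i\mid\mu\overset{iid}{\sim}\mu,\qquad\mu\sim\mathcal{D}_{\theta\nu_0},$$
so that
$$\mathcal{L}(X_1,\dots,X_n\mid z_1,\dots,z_n)\propto K_n(z_1\mid x_1)\cdots K_n(z_n\mid x_n)\,P^\alpha_{X_1,\dots,X_n}.$$
Assuming $K_n(1\mid x_i)=\beta_n(x_i)$, $Q^{\alpha,\beta_n}_{X_1,\dots,X_n}$ is the stationary distribution of $(x_1,\dots,x_n\mid z^n=1)$, where $z^n=1$ means $z_1=\dots=z_n=1$. Consider the Gibbs sampler extended to $(x_1,\dots,x_n,\mu\mid z^n=1)$, alternating updates to
$$(x_1,\dots,x_n\mid\mu,z^n=1)\quad\text{and}\quad(\mu\mid x_1,\dots,x_n,z^n=1).$$
Hence $(\mu\mid z^n=1)$ is a measure-valued chain with stationary distribution $\mathcal{L}(\mu\mid z^n=1)$. From Bayes' theorem,
$$\mathcal{L}(\mu\mid z^n=1)\propto\mathcal{L}(z^n=1\mid\mu)\,\mathcal{D}_{\theta\nu_0}(d\mu)\propto\Big[\int\beta_n(y)\,\mu(dy)\Big]^n\mathcal{D}_{\theta\nu_0}(d\mu)=\Pi_n(d\mu).$$
The limit of $\Pi_n$ will be the de Finetti measure of the sequence $(x_1,x_2,\dots\mid z^\infty=1)$, since
$$(x_1,\dots,x_n\mid\mu,z^n=1)\overset{iid}{\sim}\mu,\qquad\mu\sim\Pi_n,$$
from which $(x_1,\dots,x_n\mid z^n=1)\sim Q^{\alpha,\beta_n}_{X_1,\dots,X_n}$ implies
$$\frac1n\sum_{i=1}^n\delta_{x_i}\Longrightarrow\mu^*\ \text{a.s.},\qquad\mu^*\sim\Pi_\infty\ \text{(if it exists)}.\qquad(3)$$
When $\mathbb{X}$ is compact, $\mathcal{P}(\mathbb{X})$ (with the topology of weak convergence) is compact. Hence $\{\Pi_n\}$ is tight and $\Pi_\infty$ is well defined. If $\beta_n(x)=1+\frac2n\sigma(x)$ we have
$$\Pi_\infty(d\mu)\propto\lim_n\Big[1+\frac2n\int\sigma(y)\,\mu(dy)\Big]^n\mathcal{D}_{\theta\nu_0}(d\mu)\propto e^{2\int\sigma\,d\mu}\,\mathcal{D}_{\theta\nu_0}(d\mu).$$
Since the martingale problem for $A_\sigma$ is well-posed, (3) is enough to conclude that $\Pi_\infty$ is the stationary distribution of the FV process with selection.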
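Two numerical checks of the last step are sketched below. The first shows that the density $[1+\frac2n s]^n$ of $\Pi_n$ with respect to $\mathcal{D}_{\theta\nu_0}$ increases to $e^{2s}$, where $s=\int\sigma\,d\mu$. The second is a self-normalised importance-sampling estimate of a $\Pi_\infty$ expectation, with $\mathcal{D}_{\theta\nu_0}$ as the proposal. It uses truncated stick-breaking, $\nu_0=\mathrm{Uniform}(0,1)$ and $\sigma(x)=x$. All of these are hypothetical choices, not part of the construction.

```python
import math
import random


def pi_n_density(s, n):
    """dPi_n / dD_{theta nu0} at mu, up to a constant, where s = int sigma dmu
    and beta_n = 1 + 2 sigma / n."""
    return (1.0 + 2.0 * s / n) ** n


def dp_sigma_functional(theta, sigma, rng, n_atoms=100):
    """s = int sigma dmu for mu ~ D_{theta nu0} drawn by truncated
    stick-breaking, with the hypothetical choice nu0 = Uniform(0, 1)."""
    s, remaining = 0.0, 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, theta)
        s += v * remaining * sigma(rng.random())
        remaining *= 1.0 - v
    return s + remaining * sigma(rng.random())


def selective_expectation(g, theta, sigma, n_samples=2000, seed=0):
    """Self-normalised importance sampling of E[g(s)] under
    Pi_inf(dmu) proportional to exp(2 <sigma, mu>) D_{theta nu0}(dmu)."""
    rng = random.Random(seed)
    draws = [dp_sigma_functional(theta, sigma, rng) for _ in range(n_samples)]
    weights = [math.exp(2.0 * s) for s in draws]
    return sum(w * g(s) for w, s in zip(weights, draws)) / sum(weights)


for n in (10, 100, 10_000):
    print(n, pi_n_density(0.3, n), math.exp(0.6))  # approaches e^{0.6}
print(selective_expectation(lambda s: s, theta=1.0, sigma=lambda x: x))
# expected above the prior mean int sigma dnu0 = 0.5: selection favours fitter measures
```

Reweighting $\mathcal{D}_{\theta\nu_0}$ draws is only practical when $\sigma$ is small. For strong selection the weights $e^{2s}$ degenerate, and sampling $\Pi_\infty$ directly through the particle system is the better route.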
Antoniak, C.E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2.
Blackwell, D. and MacQueen, J.B. (1973). Ferguson distributions via Polya urn schemes. Ann. Statist. 1.
Dawson, D.A. and Greven, A. (1999). Hierarchically interacting Fleming-Viot processes with selection and mutation: multiple space time scale analysis and quasi-equilibria. Electron. J. Probab. 4.
Dawson, D.A., Greven, A. and Vaillancourt, J. (1995). Equilibria and quasi-equilibria for infinite collections of interacting Fleming-Viot processes. Trans. Amer. Math. Soc. 347.
Donnelly, P. and Kurtz, T.G. (1996). A countable representation of the Fleming-Viot measure-valued diffusion. Ann. Probab. 24.
Donnelly, P. and Kurtz, T.G. (1999). Genealogical processes for Fleming-Viot models with selection and recombination. Ann. Appl. Probab. 9.
Ethier, S.N. and Griffiths, R.C. (1993). The transition function of a Fleming-Viot process. Ann. Probab. 21.
Ethier, S.N. and Kurtz, T.G. (1986). Markov processes: characterization and convergence. Wiley.
Ethier, S.N. and Kurtz, T.G. (1994). Convergence to the Fleming-Viot process in the weak atomic topology. Stoch. Proc. Appl. 54.
Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1.
Fleming, W.H. and Viot, M. (1979). Some measure-valued processes in population genetics theory. Indiana University Mathematics J. 28.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Patt. Anal. Mach. Intelligence 6.
Lo, A.Y. (1984). On a class of Bayesian nonparametric estimates I: density estimates. Ann. Statist. 12.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4.