Combined use of importance weights and resampling weights in sequential Monte Carlo methods


Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071912

COMBINED USE OF IMPORTANCE WEIGHTS AND RESAMPLING WEIGHTS IN SEQUENTIAL MONTE CARLO METHODS ∗, ∗∗

François Le Gland ¹

Abstract. A particle approximation of Feynman–Kac distributions is presented here, that combines SIS and SIR algorithms in the sense that only a fraction of the importance weights is used for resampling, and two different approaches are proposed to analyze its performance. The first approach is based on a representation in terms of path–space distributions, and could be used to analyze the joint particle approximation of distributions for a reference model and for several alternate models at the same time. The second approach, which is of independent interest and seems very promising, is based on a representation in terms of a multiplicative functional, and could be used to analyze particle approximation with adaptive resampling schemes.

(Updated version from 18 December 2007)

Introduction

Consider the unnormalized and normalized Feynman–Kac distributions defined on the set $E$ by

$$\langle\gamma_n,\phi\rangle = E\Big[\phi(X_n)\prod_{k=0}^n g_k(X_k)\Big] \quad\text{and}\quad \langle\mu_n,\phi\rangle = \frac{\langle\gamma_n,\phi\rangle}{\langle\gamma_n,1\rangle},$$

respectively, where $\{X_k,\ k=0,1,\dots,n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $\eta_0(dx)$,

• and its transition probabilities $Q_k(x,dx')$, for any $k=1,\dots,n$,

and where $g_k(x)$ is a bounded nonnegative function for any $k=0,1,\dots,n$, and more generally

$$\langle\gamma_n,\phi\rangle = \int_E\cdots\int_E \phi(x_n)\,\gamma_0(dx_0)\prod_{k=1}^n R_k(x_{k-1},dx_k),$$

which includes the previous case, with

$$\gamma_0(dx) = g_0(x)\,\eta_0(dx) \quad\text{and}\quad R_k(x,dx') = Q_k(x,dx')\,g_k(x'),$$

∗ This work was partially supported by CNRS, under the MathSTIC project Chaînes de Markov Cachées et Filtrage Particulaire, and under the AS–STIC project Méthodes Particulaires (AS 67), and by Électricité de France R&D.

∗∗ This paper is dedicated to the memory of Natacha Caylus (1977–2006).

¹ IRISA / INRIA, Campus de Beaulieu, 35042 RENNES Cédex, France

© EDP Sciences, SMAI 2007


for any $k=1,\dots,n$. Associated with this probabilistic (or more generally, integral) representation is the recurrence relation $\gamma_k = \gamma_{k-1}R_k$, for any $k=1,\dots,n$. In full generality, it is always possible to decompose the nonnegative measure

$$\gamma_0(dx) = W_0(x)\,p_0(dx), \tag{1}$$

in terms of a nonnegative function and a normalized probability distribution, and to decompose the nonnegative kernel

$$R_k(x,dx') = W_k(x,x')\,P_k(x,dx'), \tag{2}$$

in terms of a nonnegative function and a normalized Markov kernel, for any $k=1,\dots,n$.

Assuming a further factorization of the nonnegative functions

$$W_0(x) = W_0^{\mathrm{imp}}(x)\,W_0^{\mathrm{red}}(x) \quad\text{and}\quad W_k(x,x') = W_k^{\mathrm{imp}}(x,x')\,W_k^{\mathrm{red}}(x,x'), \tag{3}$$

for any $k=1,\dots,n$, the following decompositions hold

$$\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\,W_0^{\mathrm{red}}(x)\,p_0(dx) \quad\text{and}\quad R_k(x,dx') = W_k^{\mathrm{imp}}(x,x')\,W_k^{\mathrm{red}}(x,x')\,P_k(x,dx'), \tag{4}$$

for any $k=1,\dots,n$, and the main contribution of this paper is to exploit this decomposition to design and study particle approximations of the form

$$\mu_k^N = \sum_{i=1}^N u_k^i\,w_k^i\,\delta_{\xi_k^i} \quad\text{with}\quad \sum_{i=1}^N w_k^i = 1 \quad\text{and}\quad \sum_{i=1}^N u_k^i\,w_k^i = 1,$$

which combine SIS and SIR algorithms [10, Section 3.4.4], with two species of nonnegative weights

• the weights $(w_k^1,\dots,w_k^N)$ are used for resampling,

• the importance weights $(u_k^1,\dots,u_k^N)$ are used for weighting.

One possible motivation is the joint particle approximation of Feynman–Kac distributions associated with reference and alternate models, under the following absolute continuity assumptions

$$\gamma_0(dx) = r_0(x)\,\gamma_0^0(dx) \quad\text{and}\quad R_k(x,dx') = r_k(x,x')\,R_k^0(x,dx'), \tag{5}$$

for any $k=1,\dots,n$. In this case indeed, assuming a decomposition of the reference nonnegative measure

$$\gamma_0^0(dx) = W_0^0(x)\,p_0^0(dx),$$

in terms of a nonnegative function and a normalized probability distribution, and assuming a decomposition of the reference nonnegative kernel

$$R_k^0(x,dx') = W_k^0(x,x')\,P_k^0(x,dx'),$$

in terms of a nonnegative function and a normalized Markov kernel, for any $k=1,\dots,n$, the following decompositions hold

$$\gamma_0(dx) = r_0(x)\,W_0^0(x)\,p_0^0(dx) \quad\text{and}\quad R_k(x,dx') = r_k(x,x')\,W_k^0(x,x')\,P_k^0(x,dx'), \tag{6}$$

for any $k=1,\dots,n$, which clearly are of the form (4), and it is possible to design particle approximations of the form

$$\mu_k^N = \sum_{i=1}^N u_k^i\,w_k^{0,i}\,\delta_{\xi_k^{0,i}} \quad\text{with}\quad \sum_{i=1}^N w_k^{0,i} = 1 \quad\text{and}\quad \sum_{i=1}^N u_k^i\,w_k^{0,i} = 1,$$

where

• the particle positions $(\xi_k^{0,1},\dots,\xi_k^{0,N})$ and the resampling weights $(w_k^{0,1},\dots,w_k^{0,N})$ depend on the reference model only,

• the importance weights $(u_k^1,\dots,u_k^N)$ depend on both the reference and alternate models.

Clearly, the two points of view are mathematically equivalent, upon suitable substitution of the different nonnegative measures and nonnegative functions, and which point of view to adopt usually depends on the application.

The paper is organized as follows. An original particle approximation is presented in Section 1, that combines importance weights and resampling weights. To analyze its performance, and in particular to derive a central limit theorem as the number $N$ of particles goes to infinity, with an explicit expression for the asymptotic variance, two different approaches are proposed. A first representation is introduced in Section 2, in terms of path–space distributions, and a path–particle approximation is presented in Section 3, with a central limit theorem stated in Theorem 3.2 and with an explicit expression for the asymptotic variance given in Proposition 3.5. A second representation is introduced in Section 4, in terms of the multiplicative functional of a Markov chain, and an extended particle approximation is presented in Section 5, where importance weights are treated as particles, with a central limit theorem stated in Theorem 5.2 and with an explicit expression for the asymptotic variance given in Proposition 5.5. It is checked that all three particle approximations actually correspond to the same algorithm, and that the two expressions obtained for the asymptotic variance are actually equal. The respective merits of the two different approaches are discussed in the Conclusion.

The following abuse of notation $W_0(x,x') = W_0(x')$, $W_0^{\mathrm{imp}}(x,x') = W_0^{\mathrm{imp}}(x')$ and $W_0^{\mathrm{red}}(x,x') = W_0^{\mathrm{red}}(x')$ will be used throughout the paper.

1. Particle Approximation with Combined Weighting and Resampling

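Recall that the unnormalized distributions satisfy the following recurrence relation

$$\gamma_k = \gamma_{k-1}R_k = (\mu_{k-1}R_k)\,\langle\gamma_{k-1},1\rangle,$$

for any $k=1,\dots,n$. Recall also the decomposition (1), and introducing a particle approximation of the form

$$\mu_k \approx \mu_k^N = \sum_{i=1}^N u_k^i\,w_k^i\,\delta_{\xi_k^i} \quad\text{with}\quad \sum_{i=1}^N w_k^i = 1 \quad\text{and}\quad \sum_{i=1}^N u_k^i\,w_k^i = 1,$$

and using the decomposition (2), yields

$$\mu_{k-1}^N R_k(dx') = \sum_{i=1}^N \underbrace{u_{k-1}^i\,W_k(\xi_{k-1}^i,x')}_{w_k^i(x')}\ \underbrace{w_{k-1}^i\,P_k(\xi_{k-1}^i,dx')}_{m_k^i(dx')},$$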

which can be interpreted as the marginal nonnegative measure on $E$ associated with a nonnegative measure on the product set $\{1,\dots,N\}\times E$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by

$$\gamma_0^N = \frac{1}{N}\sum_{i=1}^N W_0(\xi_0^i)\,\delta_{\xi_0^i} \quad\text{and}\quad \gamma_k^N = \Big[\frac{1}{N}\sum_{i=1}^N u_{k-1}^{\tau_k^i}\,W_k(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,\delta_{\xi_k^i}\Big]\,\langle\gamma_{k-1}^N,1\rangle,$$

where $(\xi_0^1,\dots,\xi_0^N)$ are i.i.d. random variables taking values in $E$ and with common probability distribution $p_0(dx)$, and where $((\tau_k^1,\xi_k^1),\dots,(\tau_k^N,\xi_k^N))$ are i.i.d. random variables taking values in the product set $\{1,\dots,N\}\times E$ and with common probability distribution $(m_k^1(dx'),\dots,m_k^N(dx'))$, or equivalently

$$\tau_k^i \sim (w_{k-1}^1,\dots,w_{k-1}^N) \quad\text{and}\quad \xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i},dx'),$$

independently for any $i=1,\dots,N$. Using the further factorization (3) results in the following approximations

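$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^N W_0^{\mathrm{imp}}(\xi_0^i)\,W_0^{\mathrm{red}}(\xi_0^i)\,\delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{imp}}(\xi_0^j)\,W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^N u_0^i\,w_0^i\,\delta_{\xi_0^i},$$

and

$$\mu_k^N = \frac{\displaystyle\sum_{i=1}^N u_{k-1}^{\tau_k^i}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,\delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^N u_{k-1}^{\tau_k^j}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)\,W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)} = \sum_{i=1}^N u_k^i\,w_k^i\,\delta_{\xi_k^i},$$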

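for the normalized distributions, which defines implicitly the resampling weights

$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)},$$

and the importance weights

$$u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^N w_0^j\,W_0^{\mathrm{imp}}(\xi_0^j)} \quad\text{and}\quad u_k^i = \frac{u_{k-1}^{\tau_k^i}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)}{\displaystyle\sum_{j=1}^N w_k^j\,u_{k-1}^{\tau_k^j}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)},$$

for any $i=1,\dots,N$.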
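The recursion of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function names (`combined_smc`, `_normalize`), the callback signatures and the toy Gaussian model are hypothetical, and multinomial selection of the ancestors is one possible choice. Ancestors are drawn from the resampling weights only, while the importance weights are propagated along the ancestral lines and renormalized so that the sum of the products of importance and resampling weights equals one.

```python
import math
import random


def combined_smc(n_steps, n_particles, sample_p0, sample_pk,
                 w0_red, w0_imp, wk_red, wk_imp, rng):
    """Return final particles, resampling weights w and importance weights u."""
    xs = [sample_p0(rng) for _ in range(n_particles)]
    w, u = _normalize([w0_red(x) for x in xs], [w0_imp(x) for x in xs])
    for k in range(1, n_steps + 1):
        # Ancestors tau_k^i are selected with the resampling weights only.
        ancestors = rng.choices(range(n_particles), weights=w, k=n_particles)
        new_xs = [sample_pk(k, xs[a], rng) for a in ancestors]
        red = [wk_red(k, xs[a], x) for a, x in zip(ancestors, new_xs)]
        # Importance weights are inherited from the ancestor, not reset.
        imp = [u[a] * wk_imp(k, xs[a], x) for a, x in zip(ancestors, new_xs)]
        xs = new_xs
        w, u = _normalize(red, imp)
    return xs, w, u


def _normalize(red, imp):
    """Scale so that sum(w) == 1 and sum(u_i * w_i) == 1."""
    total_red = sum(red)
    w = [r / total_red for r in red]
    norm = sum(wj * vj for wj, vj in zip(w, imp))
    u = [vj / norm for vj in imp]
    return w, u


# Toy model (an assumption for this sketch): Gaussian random walk,
# Gaussian-shaped resampling weight, bounded positive importance weight.
rng = random.Random(0)
xs, w, u = combined_smc(
    n_steps=5, n_particles=200,
    sample_p0=lambda r: r.gauss(0.0, 1.0),
    sample_pk=lambda k, x, r: x + r.gauss(0.0, 1.0),
    w0_red=lambda x: math.exp(-0.5 * x * x),
    w0_imp=lambda x: 1.0 + 0.5 * math.tanh(x),
    wk_red=lambda k, x, y: math.exp(-0.5 * y * y),
    wk_imp=lambda k, x, y: 1.0 + 0.5 * math.tanh(y),
    rng=rng,
)
# Estimate of the mean of the normalized distribution at the final step.
estimate = sum(ui * wi * xi for ui, wi, xi in zip(u, w, xs))
```

Taking the importance weight function identically equal to one gives constant importance weights and reduces the sketch to a plain SIR scheme.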

Remark 1.1. If the substitution $W_0^{\mathrm{imp}} \leftarrow r_0$, $W_0^{\mathrm{red}} \leftarrow W_0^0$ and $p_0 \leftarrow p_0^0$ is made, and if the substitution $W_k^{\mathrm{imp}} \leftarrow r_k$, $W_k^{\mathrm{red}} \leftarrow W_k^0$ and $P_k \leftarrow P_k^0$ is made for any $k=1,\dots,n$, with the notations of (6), then clearly the particle positions $(\xi_k^1,\dots,\xi_k^N)$ and the resampling weights $(w_k^1,\dots,w_k^N)$ depend on the reference model only, whereas the importance weights $(u_k^1,\dots,u_k^N)$ depend on both the reference and alternate models. If in addition the derivatives in (5) are continuous or differentiable w.r.t. some parameter of the model, then the importance weights will automatically inherit the same property, as suggested in [8]. This idea has been used in Monte Carlo maximum likelihood estimation [1, 7] or in smooth particle approximation of Feynman–Kac distributions [3, 4, 6, 9]. Using a single particle system for the reference value, with different importance weights corresponding to different values, makes the resulting approximation regular, but also inaccurate for values too far from the reference. In contrast, using a different particle system for each different value would make the resulting approximation very irregular, but also uniformly accurate for all values.
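The trade-off described in Remark 1.1 can be illustrated at time $k=0$ with a minimal Python sketch. The model is an assumption made for this illustration only: the reference distribution is the standard Gaussian, the alternate model indexed by a hypothetical parameter `theta` is the Gaussian with mean `theta` and unit variance, and the derivative in (5) is the corresponding density ratio. A single reference sample is reused for every value of `theta`, so the estimate depends smoothly on `theta`, but it is expected to degrade as `theta` moves away from the reference value.

```python
import math
import random

rng = random.Random(0)
# Reference model (assumption): p_0^0 = N(0, 1) with W_0^0 = 1.
particles = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Resampling weights depend on the reference model only.
w = [1.0 / len(particles)] * len(particles)


def alternate_mean(theta):
    """Estimate the mean under N(theta, 1) by reweighting the reference sample."""
    # r_0(x): density of N(theta, 1) with respect to N(0, 1).
    r = [math.exp(theta * x - 0.5 * theta * theta) for x in particles]
    norm = sum(wj * rj for wj, rj in zip(w, r))
    u = [rj / norm for rj in r]  # importance weights depend on theta
    return sum(ui * wi * x for ui, wi, x in zip(u, w, particles))


estimates = {theta: alternate_mean(theta) for theta in (0.0, 0.5, 1.0, 3.0)}
```

At `theta = 0.0` the importance weights are constant and the estimate is the plain sample mean of the reference particles.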


2. Representation in Path–Space

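Using the decompositions (4) yields

$$\begin{aligned}
\langle\gamma_n,\phi\rangle &= \int_E\cdots\int_E \phi(x_n)\,\gamma_0(dx_0)\prod_{k=1}^n R_k(x_{k-1},dx_k)\\
&= \int_E\cdots\int_E \phi(x_n)\,W_0^{\mathrm{imp}}(x_0)\,W_0^{\mathrm{red}}(x_0)\,p_0(dx_0)\prod_{k=1}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\,W_k^{\mathrm{red}}(x_{k-1},x_k)\,P_k(x_{k-1},dx_k)\\
&= E\Big[\phi(X_n)\prod_{k=0}^n W_k^{\mathrm{imp}}(X_{k-1},X_k)\prod_{k=0}^n W_k^{\mathrm{red}}(X_{k-1},X_k)\Big],
\end{aligned}$$

where $\{X_k,\ k=0,1,\dots,n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $p_0(dx)$,

• and its transition probabilities $P_k(x,dx')$, for any $k=1,\dots,n$.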

Consider the unnormalized and normalized Feynman–Kac distributions defined on the path–space $E_{0:n} = E\times\cdots\times E$ by

$$\langle\gamma_n^\bullet,f_n\rangle = E\Big[f_n(X_n^\bullet)\prod_{k=0}^n g_k^\bullet(X_k^\bullet)\Big] \quad\text{and}\quad \langle\mu_n^\bullet,f_n\rangle = \frac{\langle\gamma_n^\bullet,f_n\rangle}{\langle\gamma_n^\bullet,1\rangle},$$

respectively, where $\{X_k^\bullet,\ k=0,1,\dots,n\}$ is a path–space valued Markov chain defined by $X_k^\bullet = (X_0,\dots,X_k) = X_{0:k}$ for any $k=0,1,\dots,n$ and characterized by

• its initial probability distribution $\eta_0^\bullet(dx_0) = p_0(dx_0)$,

• and its transition probabilities $Q_k^\bullet(x_{0:k-1},dx'_{0:k}) = \delta_{x_{0:k-1}}(dx'_{0:k-1})\,P_k(x_{k-1},dx'_k)$, for any $k=1,\dots,n$,

and where $g_k^\bullet(x_{0:k}) = W_k^{\mathrm{red}}(x_{k-1},x_k)$ for any $k=0,1,\dots,n$. Associated with this path–space probabilistic representation is the recurrence relation $\gamma_k^\bullet = \gamma_{k-1}^\bullet R_k^\bullet$, with

$$R_k^\bullet(x_{0:k-1},dx'_{0:k}) = Q_k^\bullet(x_{0:k-1},dx'_{0:k})\,g_k^\bullet(x'_{0:k}) = \delta_{x_{0:k-1}}(dx'_{0:k-1})\,P_k(x_{k-1},dx'_k)\,W_k^{\mathrm{red}}(x_{k-1},x'_k),$$

for any $k=1,\dots,n$. In particular for any function of the form

$$f_n(x_{0:n}) = F\Big(x_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\Big),$$

defined on path–space, it holds

$$\langle\gamma_n^\bullet,f_n\rangle = E\Big[F\Big(X_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(X_{k-1},X_k)\Big)\prod_{k=0}^n W_k^{\mathrm{red}}(X_{k-1},X_k)\Big],$$

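and for instance, for the function

$$T_n\phi(x_{0:n}) = \phi(x_n)\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k) \quad\text{then clearly}\quad \langle\gamma_n^\bullet,T_n\phi\rangle = \langle\gamma_n,\phi\rangle, \tag{7}$$

or in other words $\gamma_n = \gamma_n^\bullet\,T_n$ in terms of a transformed distribution, and for the function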

T_nφ(x0:n) =φ0(xn)| n

k=0


with the unnormalized and normalized Feynman–Kac distributions deﬁned on the setE by

γ_n, φ=E[φ(Xn)|

n

k=0

W_kimp(Xk−1, Xk)|2

n

k=0

W_kred(Xk−1, Xk) ] and µn, φ= γ_n, φ

γ_n,1 ,

respectively.

3. Particle Approximation in Path–Space

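Recall that the unnormalized distributions satisfy the recurrence relation

$$\gamma_k^\bullet = \gamma_{k-1}^\bullet R_k^\bullet = g_k^\bullet\,(\mu_{k-1}^\bullet Q_k^\bullet)\,\langle\gamma_{k-1}^\bullet,1\rangle,$$

for any $k=1,\dots,n$. Introducing a weighted particle approximation of the form

$$\mu_k^\bullet \approx \mu_k^{\bullet,N} = \sum_{i=1}^N w_k^i\,\delta_{\xi_k^{\bullet,i}} \quad\text{with}\quad \sum_{i=1}^N w_k^i = 1,$$

where $\xi_k^{\bullet,i} = (\xi_{0,k}^i,\dots,\xi_{k,k}^i)$ is a path–space particle with terminal position $\xi_k^i = \xi_{k,k}^i$ for any $i=1,\dots,N$, yields

$$\mu_{k-1}^{\bullet,N} Q_k^\bullet(dx_{0:k}) = \sum_{i=1}^N w_{k-1}^i\,\delta_{\xi_{k-1}^{\bullet,i}}(dx_{0:k-1})\,P_k(x_{k-1},dx_k).$$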

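The corresponding SIR algorithm can be described by

$$\gamma_0^{\bullet,N} = \frac{1}{N}\sum_{i=1}^N g_0^\bullet(\xi_0^{\bullet,i})\,\delta_{\xi_0^{\bullet,i}} \quad\text{and}\quad \gamma_k^{\bullet,N} = \Big[\frac{1}{N}\sum_{i=1}^N g_k^\bullet(\xi_k^{\bullet,i})\,\delta_{\xi_k^{\bullet,i}}\Big]\,\langle\gamma_{k-1}^{\bullet,N},1\rangle,$$

where $(\xi_0^{\bullet,1},\dots,\xi_0^{\bullet,N})$ are i.i.d. random variables on the set $E$ with common probability distribution $p_0(dx_0)$, and where $g_0^\bullet(\xi_0^{\bullet,i}) = W_0^{\mathrm{red}}(\xi_{0,0}^i)$ for any $i=1,\dots,N$, and where $(\xi_k^{\bullet,1},\dots,\xi_k^{\bullet,N})$ are i.i.d. random variables on the path–space $E_{0:k} = E\times\cdots\times E$ with common probability distribution $\mu_{k-1}^{\bullet,N}Q_k^\bullet(dx_{0:k})$, or equivalently

$$\tau_k^i \sim (w_{k-1}^1,\dots,w_{k-1}^N) \quad\text{and}\quad (\xi_{0,k}^i,\dots,\xi_{k-1,k}^i) = (\xi_{0,k-1}^{\tau_k^i},\dots,\xi_{k-1,k-1}^{\tau_k^i}) \quad\text{and}\quad \xi_{k,k}^i \sim P_k(\xi_{k-1,k}^i,dx_k),$$

independently for any $i=1,\dots,N$, and where $g_k^\bullet(\xi_k^{\bullet,i}) = W_k^{\mathrm{red}}(\xi_{k-1,k}^i,\xi_{k,k}^i) = W_k^{\mathrm{red}}(\xi_{k-1,k-1}^{\tau_k^i},\xi_{k,k}^i)$ for any $i=1,\dots,N$. This results in the following approximations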

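$$\mu_0^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,\delta_{\xi_0^{\bullet,i}}}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^N w_0^i\,\delta_{\xi_0^{\bullet,i}} \quad\text{and}\quad \mu_k^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,\delta_{\xi_k^{\bullet,i}}}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)} = \sum_{i=1}^N w_k^i\,\delta_{\xi_k^{\bullet,i}},$$

for the normalized distributions, in terms of the terminal positions of the path–space particles, which defines implicitly the resampling weights as

$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)},$$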

for any $i=1,\dots,N$. In particular for the function

$$T_k\phi(x_{0:k}) = \phi(x_k)\prod_{p=0}^k W_p^{\mathrm{imp}}(x_{p-1},x_p),$$

defined on path–space and already introduced in (7), it holds

$$T_k\phi(\xi_k^{\bullet,i}) = \phi(\xi_{k,k}^i)\prod_{p=0}^k W_p^{\mathrm{imp}}(\xi_{p-1,k}^i,\xi_{p,k}^i),$$

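hence

$$\langle\gamma_0^{\bullet,N},T_0\phi\rangle = \frac{1}{N}\sum_{i=1}^N g_0^\bullet(\xi_0^{\bullet,i})\,T_0\phi(\xi_0^{\bullet,i}) = \frac{1}{N}\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,v_0^i\,\phi(\xi_0^i),$$

where

$$v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),$$

for any $i=1,\dots,N$, and

$$\langle\gamma_k^{\bullet,N},T_k\phi\rangle = \Big[\frac{1}{N}\sum_{i=1}^N g_k^\bullet(\xi_k^{\bullet,i})\,T_k\phi(\xi_k^{\bullet,i})\Big]\,\langle\gamma_{k-1}^{\bullet,N},1\rangle = \Big[\frac{1}{N}\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_k^i\,\phi(\xi_k^i)\Big]\,\langle\gamma_{k-1}^{\bullet,N},1\rangle,$$

where

$$\begin{aligned}
v_k^i &= \prod_{p=0}^k W_p^{\mathrm{imp}}(\xi_{p-1,k}^i,\xi_{p,k}^i)\\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k}^i,\xi_{k,k}^i)\prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i,\xi_{p,k}^i)\\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k-1}^{\tau_k^i},\xi_{k,k}^i)\prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k-1}^{\tau_k^i},\xi_{p,k-1}^{\tau_k^i})\\
&= W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_{k-1}^{\tau_k^i},
\end{aligned}$$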

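for any $i=1,\dots,N$. Notice the underlying recursive structure, in the form of a multiplicative functional of a Markov chain up to resampling. In view of the interpretation $\gamma_n = \gamma_n^\bullet\,T_n$ in terms of a transformed distribution, this results in the following approximations

$$\gamma_0^N = \frac{1}{N}\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,v_0^i\,\delta_{\xi_0^i} \quad\text{and}\quad \gamma_k^N = \Big[\frac{1}{N}\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_k^i\,\delta_{\xi_k^i}\Big]\,\langle\gamma_{k-1}^{\bullet,N},1\rangle,$$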

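for the unnormalized distributions, hence

$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,v_0^i\,\delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)\,v_0^j} = \sum_{i=1}^N u_0^i\,w_0^i\,\delta_{\xi_0^i} \quad\text{and}\quad \mu_k^N = \frac{\displaystyle\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_k^i\,\delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)\,v_k^j} = \sum_{i=1}^N u_k^i\,w_k^i\,\delta_{\xi_k^i},$$

for the normalized distributions, which defines implicitly the importance weights as

$$u_k^i = \frac{v_k^i}{\displaystyle\sum_{j=1}^N v_k^j\,w_k^j} \quad\text{with}\quad v_0^i = W_0^{\mathrm{imp}}(\xi_0^i) \quad\text{and}\quad v_k^i = W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_{k-1}^{\tau_k^i},$$

for any $i=1,\dots,N$.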

Remark 3.1. It is clear that the particle positions (defined as the terminal positions of the path–space particles) and resampling weights defined here are the same as the particle positions and resampling weights defined in Section 1, and it is easy to check by induction that the importance weights defined here are the same as the importance weights defined in Section 1.
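The induction mentioned in Remark 3.1 can also be checked numerically. The following Python sketch is an illustration only: random positive numbers stand in for the values of the weight functions (an assumption, since no specific model is needed for the check), and the variable names are hypothetical. It propagates the normalized importance weights with the recursion of Section 1 and the unnormalized weights with the multiplicative recursion of this section, along the same ancestors, and records the largest discrepancy after normalization.

```python
import random

rng = random.Random(1)
N, n_steps = 6, 4


def normalize(values):
    total = sum(values)
    return [x / total for x in values]


# Hypothetical positive numbers stand in for the weight-function values.
w = normalize([rng.uniform(0.1, 1.0) for _ in range(N)])   # w_0^i
v = [rng.uniform(0.5, 2.0) for _ in range(N)]              # v_0^i = W_0^imp
norm = sum(wj * vj for wj, vj in zip(w, v))
u = [vi / norm for vi in v]                                # u_0^i (Section 1)

worst_gap = 0.0
for k in range(1, n_steps + 1):
    tau = rng.choices(range(N), weights=w, k=N)            # ancestors tau_k^i
    imp = [rng.uniform(0.5, 2.0) for _ in range(N)]        # W_k^imp values
    w = normalize([rng.uniform(0.1, 1.0) for _ in range(N)])  # w_k^i

    # Section 1 recursion: u_k^i is proportional to u_{k-1}^{tau} W_k^imp.
    raw = [u[t] * g for t, g in zip(tau, imp)]
    norm = sum(wj * rj for wj, rj in zip(w, raw))
    u = [ri / norm for ri in raw]

    # Section 3 recursion: v_k^i = W_k^imp v_{k-1}^{tau}, then normalize.
    v = [v[t] * g for t, g in zip(tau, imp)]
    norm_v = sum(wj * vj for wj, vj in zip(w, v))
    u_from_v = [vi / norm_v for vi in v]

    worst_gap = max(worst_gap, max(abs(a - b) for a, b in zip(u, u_from_v)))
```

The two sets of weights differ only by a common scaling factor before normalization, so `worst_gap` stays at the level of floating-point rounding.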

For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].

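Theorem 3.2.

$$\sqrt{N}\,\frac{\langle\gamma_n^{\bullet,N}-\gamma_n^\bullet,f_n\rangle}{\langle\gamma_n^\bullet,1\rangle} \Longrightarrow N(0,V_n^\bullet(f_n)),$$

in distribution as $N\uparrow\infty$, with asymptotic variance

$$V_n^\bullet(f_n) = \frac{\mathrm{var}(g_0^\bullet\,R_{1:n}^\bullet f_n,\eta_0^\bullet)}{\langle\eta_0^\bullet,g_0^\bullet\,R_{1:n}^\bullet 1\rangle^2} + \sum_{k=1}^n \frac{\mathrm{var}(g_k^\bullet\,R_{k+1:n}^\bullet f_n,\eta_k^\bullet)}{\langle\eta_k^\bullet,g_k^\bullet\,R_{k+1:n}^\bullet 1\rangle^2},$$

where $\eta_k^\bullet = \mu_{k-1}^\bullet Q_k^\bullet$ for any $k=1,\dots,n$, and where

$$R_{k+1:n}^\bullet f_n(x_{0:k}) = R_{k+1}^\bullet\cdots R_n^\bullet f_n(x_{0:k}) = E\Big[f_n(X_n^\bullet)\prod_{p=k+1}^n g_p^\bullet(X_p^\bullet)\ \Big|\ X_k^\bullet = x_{0:k}\Big],$$

for any $k=0,1,\dots,n$, with $R_{n+1:n}^\bullet f_n(x_{0:n}) = f_n(x_{0:n})$ by convention.

Remark 3.3 (CLT for normalizing constants). It follows from the identity

$$\frac{\langle\gamma_n^N-\gamma_n,1\rangle}{\langle\gamma_n,1\rangle} = \frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n,1\rangle}\ \frac{\langle\gamma_n^{\bullet,N}-\gamma_n^\bullet,T_n 1\rangle}{\langle\gamma_n^\bullet,1\rangle},$$

and from Theorem 3.2 that

$$\sqrt{N}\,\frac{\langle\gamma_n^N-\gamma_n,1\rangle}{\langle\gamma_n,1\rangle} \Longrightarrow N\Big(0,V_n^\bullet\Big(\frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n,1\rangle}\,T_n 1\Big)\Big),$$

in distribution as $N\uparrow\infty$.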

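Remark 3.4 (CLT for normalized distributions). It follows from the decomposition

$$\langle\mu_n^N-\mu_n,\phi\rangle = \frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n,1\rangle}\ \frac{\langle\gamma_n^\bullet,T_n 1\rangle}{\langle\gamma_n^{\bullet,N},T_n 1\rangle}\ \frac{\langle\gamma_n^{\bullet,N}-\gamma_n^\bullet,\ T_n(\phi-\langle\mu_n,\phi\rangle)\rangle}{\langle\gamma_n^\bullet,1\rangle},$$

from Theorem 3.2 and from the Slutsky lemma, that

$$\sqrt{N}\,\langle\mu_n^N-\mu_n,\phi\rangle \Longrightarrow N\Big(0,V_n^\bullet\Big(\frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n,1\rangle}\,T_n(\phi-\langle\mu_n,\phi\rangle)\Big)\Big),$$

in distribution as $N\uparrow\infty$.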

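As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n^\bullet(f_n)$ for functions of the form $f_n = T_n\phi$ defined on the path–space $E_{0:n} = E\times\cdots\times E$. Let

$$R_{k+1:n}\phi(x) = R_{k+1}\cdots R_n\phi(x) = E\Big[\phi(X_n)\prod_{p=k+1}^n W_p(X_{p-1},X_p)\ \Big|\ X_k = x\Big],$$

for any $k=0,1,\dots,n$, with $R_{n+1:n}\phi(x) = \phi(x)$ by convention. Then

$$\frac{\mathrm{var}\Big(g_0^\bullet\,R_{1:n}^\bullet\,T_n\Big(\dfrac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n,1\rangle}\,\phi\Big),\eta_0^\bullet\Big)}{\langle\eta_0^\bullet,g_0^\bullet\,R_{1:n}^\bullet 1\rangle^2} = \frac{\mathrm{var}(W_0\,R_{1:n}\phi,p_0)}{\langle p_0,W_0\,R_{1:n}1\rangle^2},$$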

and

var(g_k•R•_k_+1:_nT_n(γ •

n,1 γn,1 φ), η

•

k) η_k•, g•_kR_k•_+1:_n12 =

γ_k₋₁,1 γ_k•₋₁,1

γ_k₋₁,12

var(W_k (R_k+1:nφ)◦π, µk−1⊗Pk) µ_k₋₁, R_k_:_n12

+ [γ

k−1,1 γ•k−1,1 γ_k₋₁,12

µ_k₋₁, R_k:nφ2 µ_k₋₁, R_k_:_n12 −

µ_k₋1, Rk:nφ2 µ_k₋₁, R_k_:_n12 ],

for anyk= 1,· · ·, n.

Proposition 3.5. In particular for the normalizing constant

√

N γ N

n −γn,1

γ_n,1 =⇒N(0, Vn),

V_n = var(W0R1:nφ, p0)

p0, W0R1:n12

+

n

k=1

γ_k₋₁,1 γ_k•₋₁,1

γk−1,12

var(W_k (R_k+1:n1)◦π, µk−1⊗Pk) µk−1, Rk:n12

+

n

k=1

[γ

k−1,1 γk•−1,1 γk−1,12

µ_k₋₁, R_k:n12 µk−1, Rk:n12 −

1 ].

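Remark 3.6. In the extreme case where only resampling weights are used, i.e. if $W_k^{\mathrm{red}} = W_k$ and $W_k^{\mathrm{imp}}\equiv 1$ for any $k=0,1,\dots,n$, then the last sum cancels out, and

$$V_n = \frac{\mathrm{var}(W_0\,R_{1:n}1,p_0)}{\langle p_0,W_0\,R_{1:n}1\rangle^2} + \sum_{k=1}^n \frac{\mathrm{var}(W_k\,(R_{k+1:n}1)\circ\pi,\ \mu_{k-1}\otimes P_k)}{\langle\mu_{k-1},R_{k:n}1\rangle^2},$$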


4. Representation in Terms of a Multiplicative Functional

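Starting rather from the absolute continuity assumptions

$$\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\,\gamma_0^0(dx) \quad\text{and}\quad R_k(x,dx') = W_k^{\mathrm{imp}}(x,x')\,R_k^0(x,dx'), \tag{8}$$

which define implicitly, in view of the decompositions (4)

$$\gamma_0^0(dx) = W_0^{\mathrm{red}}(x)\,p_0(dx) \quad\text{and}\quad R_k^0(x,dx') = W_k^{\mathrm{red}}(x,x')\,P_k(x,dx'), \tag{9}$$

for any $k=1,\dots,n$, consider the unnormalized and normalized Feynman–Kac distributions defined on the product set $E\times[0,\infty)$ by

$$\langle\gamma_n^e,F\rangle = \int_E\cdots\int_E F\Big(x_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\Big)\,\gamma_0^0(dx_0)\prod_{k=1}^n R_k^0(x_{k-1},dx_k) \quad\text{and}\quad \langle\mu_n^e,F\rangle = \frac{\langle\gamma_n^e,F\rangle}{\langle\gamma_n^e,1\rangle},$$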

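respectively. In particular

$$\langle\gamma_k^e,\phi\otimes e_0\rangle = \int_E\cdots\int_E \phi(x_k)\,\gamma_0^0(dx_0)\prod_{p=1}^k R_p^0(x_{p-1},dx_p) = \langle\gamma_k^0,\phi\rangle,$$

where $e_0(v)\equiv 1$ by definition, and using the absolute continuity assumptions (8)

$$\begin{aligned}
\langle\gamma_k^e,\phi\otimes e\rangle &= \int_E\cdots\int_E \phi(x_k)\,W_0^{\mathrm{imp}}(x_0)\prod_{p=1}^k W_p^{\mathrm{imp}}(x_{p-1},x_p)\ \gamma_0^0(dx_0)\prod_{p=1}^k R_p^0(x_{p-1},dx_p)\\
&= \int_E\cdots\int_E \phi(x_k)\,\gamma_0(dx_0)\prod_{p=1}^k R_p(x_{p-1},dx_p) = \langle\gamma_k,\phi\rangle,
\end{aligned}$$

where $e(v)\equiv v$ by definition. Associated with these two integral representations are the two recurrence relations $\gamma_k^0 = \gamma_{k-1}^0 R_k^0$ and $\gamma_k = \gamma_{k-1}R_k$ respectively, for any $k=1,\dots,n$. Finally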

γ_ke, φ⊗e2 =

E· · ·

E

φ(x_n)|W₀imp(x₀)

n

k=1

W_kimp(x_k₋₁, x_k)|2γ₀0(dx₀)

n

k=1

R0_k(x_k₋₁, dx_k)

=

E· · ·

E

φ(x_n)γ₀(dx₀)

n

k=1

R_k(x_k₋₁, dx_k) =γ_k, φ,

where

γ₀(dx) =|W₀imp(x)|2γ₀0(dx) and R_k(x, dx) =|W_kimp(x, x)|2R_k0(x, dx),

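for any $k=1,\dots,n$. In particular for $\phi(x)\equiv 1$, it holds

$$\langle\gamma_k^e,1\rangle = \langle\gamma_k^e,1\otimes e_0\rangle = \langle\gamma_k^0,1\rangle \quad\text{and}\quad \langle\gamma_k^e,1\otimes e\rangle = \langle\gamma_k,1\rangle,$$

for the normalizing constants, and

$$\langle\mu_k^e,\phi\otimes e_0\rangle = \frac{\langle\gamma_k^e,\phi\otimes e_0\rangle}{\langle\gamma_k^e,1\rangle} = \frac{\langle\gamma_k^0,\phi\rangle}{\langle\gamma_k^0,1\rangle} = \langle\mu_k^0,\phi\rangle, \tag{10}$$

$$\langle\mu_k^e,\phi\otimes e\rangle = \frac{\langle\gamma_k^e,\phi\otimes e\rangle}{\langle\gamma_k^e,1\rangle} = \frac{\langle\gamma_k,\phi\rangle}{\langle\gamma_k^0,1\rangle} = \frac{\langle\gamma_k,1\rangle}{\langle\gamma_k^0,1\rangle}\,\langle\mu_k,\phi\rangle, \tag{11}$$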

and

µe_k, φ⊗e2= γ

e

k, φ⊗e2 γe

k,1

= γ

k, φ γ0

k,1

= γ

k,1 γ0

k,1 µ_k, φ,

for the normalized Feynman–Kac distributions. In other words, the extended unnormalized Feynman–Kac distribution encodes all the different Feynman–Kac distributions, normalized or unnormalized, for the reference and alternate models, and in particular

$$\langle\gamma_n^e,\phi_0\otimes e_0+\phi\otimes e\rangle = \langle\gamma_n^0,\phi_0\rangle+\langle\gamma_n,\phi\rangle.$$

It follows from the definition that

$$\gamma_0^e(dx,dv) = \gamma_0^0(dx)\,\delta_{W_0^{\mathrm{imp}}(x)}(dv),$$

and introducing the extended nonnegative kernel

$$R_k^e(x,v,dx',dv') = R_k^0(x,dx')\,\delta_{v\,W_k^{\mathrm{imp}}(x,x')}(dv'),$$

defined on the product set $E\times[0,\infty)$, for any $k=1,\dots,n$, it is easily seen that

$$\begin{aligned}
\langle\gamma_n^e,F\rangle &= \int_E\cdots\int_E F\Big(x_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\Big)\,\gamma_0^0(dx_0)\prod_{k=1}^n R_k^0(x_{k-1},dx_k)\\
&= \int_E\cdots\int_E\Big[\int_0^\infty\cdots\int_0^\infty F(x_n,v_n)\,\delta_{W_0^{\mathrm{imp}}(x_0)}(dv_0)\prod_{k=1}^n \delta_{v_{k-1}W_k^{\mathrm{imp}}(x_{k-1},x_k)}(dv_k)\Big]\,\gamma_0^0(dx_0)\prod_{k=1}^n R_k^0(x_{k-1},dx_k)\\
&= \int_E\int_0^\infty\cdots\int_E\int_0^\infty F(x_n,v_n)\,\gamma_0^e(dx_0,dv_0)\prod_{k=1}^n R_k^e(x_{k-1},v_{k-1},dx_k,dv_k),
\end{aligned}$$

which justifies the interpretation of this nonnegative measure as an unnormalized Feynman–Kac distribution. Associated with this integral representation is the recurrence relation $\gamma_k^e = \gamma_{k-1}^e R_k^e$, for any $k=1,\dots,n$.

Furthermore, using the decompositions (9) yields

$$\begin{aligned}
\langle\gamma_n^e,F\rangle &= \int_E\cdots\int_E F\Big(x_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\Big)\,\gamma_0^0(dx_0)\prod_{k=1}^n R_k^0(x_{k-1},dx_k)\\
&= \int_E\cdots\int_E F\Big(x_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(x_{k-1},x_k)\Big)\prod_{k=0}^n W_k^{\mathrm{red}}(x_{k-1},x_k)\ p_0(dx_0)\prod_{k=1}^n P_k(x_{k-1},dx_k)\\
&= E\Big[F\Big(X_n,\prod_{k=0}^n W_k^{\mathrm{imp}}(X_{k-1},X_k)\Big)\prod_{k=0}^n W_k^{\mathrm{red}}(X_{k-1},X_k)\Big],
\end{aligned}$$

where $\{X_k,\ k=0,1,\dots,n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $p_0(dx)$,

• and its transition probabilities $P_k(x,dx')$, for any $k=1,\dots,n$.

Introducing

$$M_0 = W_0^{\mathrm{imp}}(X_0) \quad\text{and}\quad M_k = W_k^{\mathrm{imp}}(X_{k-1},X_k)\,M_{k-1},$$

for any $k=1,\dots,n$, defines a multiplicative functional $\{M_k,\ k=0,1,\dots,n\}$ associated with the Markov chain $\{X_k,\ k=0,1,\dots,n\}$, and yields yet another representation as

$$\langle\gamma_n^e,F\rangle = E\Big[F(X_n,M_n)\prod_{k=0}^n W_k^{\mathrm{red}}(X_{k-1},X_k)\Big] \quad\text{with}\quad M_n = \prod_{k=0}^n W_k^{\mathrm{imp}}(X_{k-1},X_k),$$

where jointly $\{(X_k,M_k),\ k=0,1,\dots,n\}$ form another Markov chain taking values in the product set $E\times[0,\infty)$, characterized by

• its initial probability distribution $p_0^e(dx,dv) = p_0(dx)\,\delta_{W_0^{\mathrm{imp}}(x)}(dv)$,

• and its transition probabilities $P_k^e(x,v,dx',dv') = P_k(x,dx')\,\delta_{v\,W_k^{\mathrm{imp}}(x,x')}(dv')$, for any $k=1,\dots,n$.

Finally, let $W_0^e(x,v) = W_0^{\mathrm{red}}(x)$ and $W_k^e(x,v,x',v') = W_k^{\mathrm{red}}(x,x')$ for any $k=1,\dots,n$.
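The extended Markov chain can be simulated directly, which gives a plain Monte Carlo (non-particle) estimate of the extended unnormalized distribution. The Python sketch below is an illustration under stated assumptions: the function names, the callback signatures and the toy model are hypothetical, and the importance weight function is taken constant (equal to 2) only so that the relation between the two normalizing constants can be checked exactly.

```python
import math
import random


def simulate_extended_chain(n_steps, sample_p0, sample_pk, w_imp, w_red, rng):
    """Simulate (X_n, M_n) and the accumulated product of W^red along one path.

    w_imp(k, x_prev, x) and w_red(k, x_prev, x) receive x_prev=None at k = 0,
    mirroring the convention W_0(x, x') = W_0(x').
    """
    x = sample_p0(rng)
    m = w_imp(0, None, x)        # M_0 = W_0^imp(X_0)
    red = w_red(0, None, x)
    for k in range(1, n_steps + 1):
        x_new = sample_pk(k, x, rng)
        m *= w_imp(k, x, x_new)  # M_k = W_k^imp(X_{k-1}, X_k) M_{k-1}
        red *= w_red(k, x, x_new)
        x = x_new
    return x, m, red


def estimate_gamma_e(F, n_paths, **model):
    """Monte Carlo estimate of E[F(X_n, M_n) prod_k W_k^red]."""
    total = 0.0
    for _ in range(n_paths):
        x, m, red = simulate_extended_chain(**model)
        total += F(x, m) * red
    return total / n_paths


def make_model(seed):
    # Toy model (assumption): Gaussian random walk, constant W^imp = 2.
    return dict(
        n_steps=3,
        sample_p0=lambda r: r.gauss(0.0, 1.0),
        sample_pk=lambda k, x, r: x + r.gauss(0.0, 1.0),
        w_imp=lambda k, x_prev, x: 2.0,
        w_red=lambda k, x_prev, x: math.exp(-0.5 * x * x),
        rng=random.Random(seed),
    )


# F = 1 (x) e_0 targets the reference constant, F = 1 (x) e the alternate one.
ref = estimate_gamma_e(lambda x, v: 1.0, 500, **make_model(0))
alt = estimate_gamma_e(lambda x, v: v, 500, **make_model(0))
```

With the same seed both estimates use the same trajectories, and since the multiplicative functional equals $2^4 = 16$ on every path of this toy model, `alt` is 16 times `ref`.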

5. Particle Approximation Using a Multiplicative Functional

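Recall that the unnormalized distributions satisfy the recurrence relation

$$\gamma_k^e = \gamma_{k-1}^e R_k^e = (\mu_{k-1}^e R_k^e)\,\langle\gamma_{k-1}^e,1\rangle,$$

for any $k=1,\dots,n$. Using the decompositions (9), notice that

$$\gamma_0^e(dx,dv) = W_0^{\mathrm{red}}(x)\,p_0(dx)\,\delta_{W_0^{\mathrm{imp}}(x)}(dv) = W_0^e(x,v)\,p_0^e(dx,dv),$$

and

$$R_k^e(x,v,dx',dv') = W_k^{\mathrm{red}}(x,x')\,P_k(x,dx')\,\delta_{v\,W_k^{\mathrm{imp}}(x,x')}(dv') = W_k^e(x,v,x',v')\,P_k^e(x,v,dx',dv'),$$

and introducing a weighted particle approximation of the form

$$\mu_k^e \approx \mu_k^{e,N} = \sum_{i=1}^N w_k^i\,\delta_{(\xi_k^i,v_k^i)} \quad\text{with}\quad \sum_{i=1}^N w_k^i = 1,$$

yields

$$\mu_{k-1}^{e,N}R_k^e(dx',dv') = \sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^i,x')\ \underbrace{w_{k-1}^i\,P_k(\xi_{k-1}^i,dx')\,\delta_{v_{k-1}^i W_k^{\mathrm{imp}}(\xi_{k-1}^i,x')}(dv')}_{m_k^i(dx',dv')},$$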

which can be interpreted as the marginal nonnegative measure on the product set $E\times[0,\infty)$ associated with a nonnegative measure defined on the product set $\{1,\dots,N\}\times E\times[0,\infty)$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by

$$\gamma_0^{e,N} = \frac{1}{N}\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,\delta_{(\xi_0^i,v_0^i)} \quad\text{and}\quad \gamma_k^{e,N} = \Big[\frac{1}{N}\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,\delta_{(\xi_k^i,v_k^i)}\Big]\,\langle\gamma_{k-1}^{e,N},1\rangle,$$

where $((\xi_0^1,v_0^1),\dots,(\xi_0^N,v_0^N))$ are i.i.d. random variables taking values in the product set $E\times[0,\infty)$ and with common probability distribution $p_0^e(dx,dv)$, or equivalently

$$\xi_0^i \sim p_0(dx) \quad\text{and}\quad v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),$$

independently for any $i=1,\dots,N$, and where $((\tau_k^1,\xi_k^1,v_k^1),\dots,(\tau_k^N,\xi_k^N,v_k^N))$ are i.i.d. random variables taking values in the product set $\{1,\dots,N\}\times E\times[0,\infty)$ and with common probability distribution $(m_k^1(dx',dv'),\dots,m_k^N(dx',dv'))$, or equivalently

$$\tau_k^i \sim (w_{k-1}^1,\dots,w_{k-1}^N) \quad\text{and}\quad \xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i},dx') \quad\text{and}\quad v_k^i = v_{k-1}^{\tau_k^i}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i),$$

independently for any $i=1,\dots,N$. In view of the relation (10), this results in the following approximations

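$$\mu_0^{0,N} = \frac{\displaystyle\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,\delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^N w_0^i\,\delta_{\xi_0^i} \quad\text{and}\quad \mu_k^{0,N} = \frac{\displaystyle\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,\delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)} = \sum_{i=1}^N w_k^i\,\delta_{\xi_k^i},$$

for the normalized distributions under the reference model, which defines implicitly the resampling weights as

$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)},$$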

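for any $i=1,\dots,N$. Similarly, in view of the relation (11), this results in the following approximations

$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^N W_0^{\mathrm{red}}(\xi_0^i)\,v_0^i\,\delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^N W_0^{\mathrm{red}}(\xi_0^j)\,v_0^j} = \sum_{i=1}^N u_0^i\,w_0^i\,\delta_{\xi_0^i} \quad\text{and}\quad \mu_k^N = \frac{\displaystyle\sum_{i=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)\,v_k^i\,\delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^N W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)\,v_k^j} = \sum_{i=1}^N u_k^i\,w_k^i\,\delta_{\xi_k^i},$$

for the normalized distributions under the alternate model, which defines implicitly the importance weights as

$$u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^N w_0^j\,W_0^{\mathrm{imp}}(\xi_0^j)} \quad\text{and}\quad u_k^i = \frac{v_{k-1}^{\tau_k^i}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i},\xi_k^i)}{\displaystyle\sum_{j=1}^N w_k^j\,v_{k-1}^{\tau_k^j}\,W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j},\xi_k^j)},$$

for any $i=1,\dots,N$.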

Remark 5.1. It is clear that the particle positions, resampling weights and importance weights deﬁned here are the same as the particle positions, resampling weights and importance weights deﬁned in Sections 1 and 3.

For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].

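Theorem 5.2.

$$\sqrt{N}\,\frac{\langle\gamma_n^{e,N}-\gamma_n^e,F\rangle}{\langle\gamma_n^e,1\rangle} \Longrightarrow N(0,V_n(F)),$$

in distribution as $N\uparrow\infty$, with asymptotic variance

$$V_n(F) = \frac{\mathrm{var}(W_0^e\,R_{1:n}^e F,p_0^e)}{\langle p_0^e,W_0^e\,R_{1:n}^e 1\rangle^2} + \sum_{k=1}^n \frac{\mathrm{var}(W_k^e\,(R_{k+1:n}^e F)\circ\pi,\ p_k^e)}{\langle p_k^e,W_k^e\,(R_{k+1:n}^e 1)\circ\pi\rangle^2},$$

where $p_k^e = \mu_{k-1}^e\otimes P_k^e$ for any $k=1,\dots,n$, and where

$$R_{k+1:n}^e F(x,v) = R_{k+1}^e\cdots R_n^e F(x,v) = E\Big[F(X_n,M_n)\prod_{p=k+1}^n W_p^{\mathrm{red}}(X_{p-1},X_p)\ \Big|\ X_k = x,\ M_k = v\Big],$$

for any $k=0,1,\dots,n$, with $R_{n+1:n}^e F(x,v) = F(x,v)$ by convention.

Remark 5.3 (CLT for normalizing constants). It follows from the identity

$$\lambda_0\,\frac{\langle\gamma_n^{0,N}-\gamma_n^0,1\rangle}{\langle\gamma_n^0,1\rangle} + \lambda\,\frac{\langle\gamma_n^N-\gamma_n,1\rangle}{\langle\gamma_n,1\rangle} = \Big\langle\frac{\gamma_n^{e,N}-\gamma_n^e}{\langle\gamma_n^e,1\rangle},\ \lambda_0+\lambda\,\frac{\langle\gamma_n^0,1\rangle}{\langle\gamma_n,1\rangle}\,(1\otimes e)\Big\rangle,$$

and from Theorem 5.2 that

$$\sqrt{N}\,\Big[\lambda_0\,\frac{\langle\gamma_n^{0,N}-\gamma_n^0,1\rangle}{\langle\gamma_n^0,1\rangle} + \lambda\,\frac{\langle\gamma_n^N-\gamma_n,1\rangle}{\langle\gamma_n,1\rangle}\Big] \Longrightarrow N\Big(0,V_n\Big(\lambda_0+\lambda\,\frac{\langle\gamma_n^0,1\rangle}{\langle\gamma_n,1\rangle}\,(1\otimes e)\Big)\Big),$$

in distribution as $N\uparrow\infty$.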

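Remark 5.4 (CLT for normalized distributions). It follows from the decomposition

$$\langle\mu_n^{0,N}-\mu_n^0,\phi_0\rangle + \langle\mu_n^N-\mu_n,\phi\rangle = \Big\langle\frac{\gamma_n^{e,N}-\gamma_n^e}{\langle\gamma_n^e,1\rangle},\ \frac{\langle\gamma_n^e,1\rangle}{\langle\gamma_n^{e,N},1\rangle}\,(\phi_0-\langle\mu_n^0,\phi_0\rangle)\otimes e_0 + \frac{\langle\gamma_n^e,1\otimes e\rangle}{\langle\gamma_n^{e,N},1\otimes e\rangle}\,\frac{\langle\gamma_n^0,1\rangle}{\langle\gamma_n,1\rangle}\,(\phi-\langle\mu_n,\phi\rangle)\otimes e\Big\rangle,$$

from Theorem 5.2 and from the Slutsky lemma, that

$$\sqrt{N}\,\big[\langle\mu_n^{0,N}-\mu_n^0,\phi_0\rangle + \langle\mu_n^N-\mu_n,\phi\rangle\big] \Longrightarrow N\Big(0,V_n\Big((\phi_0-\langle\mu_n^0,\phi_0\rangle)\otimes e_0 + \frac{\langle\gamma_n^0,1\rangle}{\langle\gamma_n,1\rangle}\,(\phi-\langle\mu_n,\phi\rangle)\otimes e\Big)\Big),$$

in distribution as $N\uparrow\infty$.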

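As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n(F)$ for functions of the form $F = \phi_0\otimes e_0+\phi\otimes e$ defined on the product set $E\times[0,\infty)$. Let

$$R_{k+1:n}^0\phi_0(x) = R_{k+1}^0\cdots R_n^0\phi_0(x) \quad\text{and}\quad R_{k+1:n}\phi(x) = R_{k+1}\cdots R_n\phi(x),$$

for any $k=0,1,\dots,n$, with $R_{n+1:n}^0\phi_0(x) = \phi_0(x)$ and $R_{n+1:n}\phi(x) = \phi(x)$ by convention. Then

$$\begin{aligned}
\frac{\mathrm{var}\Big(W_0^e\,R_{1:n}^e\Big(\phi_0\otimes e_0+\dfrac{\langle\gamma_n^0,1\rangle}{\langle\gamma_n,1\rangle}\,\phi\otimes e\Big),p_0^e\Big)}{\langle p_0^e,W_0^e\,R_{1:n}^e 1\rangle^2}
&= \frac{\mathrm{var}(W_0^{\mathrm{red}}\,R_{1:n}^0\phi_0,p_0)}{\langle p_0,W_0^{\mathrm{red}}\,R_{1:n}^0 1\rangle^2}\\
&\quad + 2\,\frac{\mathrm{cov}(W_0^{\mathrm{red}}\,R_{1:n}^0\phi_0,\ W_0\,R_{1:n}\phi,\ p_0)}{\langle p_0,W_0^{\mathrm{red}}\,R_{1:n}^0 1\rangle\,\langle p_0,W_0\,R_{1:n}1\rangle}\\
&\quad + \frac{\mathrm{var}(W_0\,R_{1:n}\phi,p_0)}{\langle p_0,W_0\,R_{1:n}1\rangle^2},
\end{aligned}$$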

and

var(W_ke(R_ke_+1:_n(φ0⊗e0+

γ0

n,1

γ_n,1 φ⊗e))◦π, µ

e

k−1⊗Pke) µe

k−1⊗Pke, Wke(Rke+1:n1)◦π2

=var(W

red

k (R0k+1:nφ0)◦π, µ0_k₋1⊗Pk) µ0

k−1, R0k:n12

+ 2cov(W

red

k (R0k+1:nφ0)◦π, Wk(Rk+1:nφ)◦π, µk−1⊗Pk) µ0

k−1, Rk0:n1 µk−1, Rk:n1

+ 2µk−1, R

0

k:nφ0 − µ0k−1, R0k:nφ0 µ0

k−1, R0k:n1

+γ

k−1,1 γk0−1,1 γk−1,12

var(W_k(R_k+1:nφ)◦π, µk−1⊗Pk) µk−1, Rk:n12

+ [γ

k−1,1 γk0−1,1 γk−1,12

µ_k₋₁, R_k:nφ2 µk−1, Rk:n12 −

µ_k₋1, Rk:nφ2 µk−1, Rk:n12

],

for anyk= 1,· · ·, n.

Proposition 5.5. In particular for the normalizing constant

√

N γ N

n −γn,1

γn,1 =⇒N(0, Vn),

V_n = var(W0R1:nφ, p0)

p₀, W₀R_1:_n12 +

n

k=1

γ_k₋₁,1 γ_k0₋₁,1

γ_k₋₁,12

var(Wk (Rk+1:n1)◦π, µ_k₋1⊗Pk) µ_k₋₁, R_k_:_n12

+

n

k=1

[γ

k−1,1 γk0−1,1 γ_k₋₁,12

µ_k₋₁, R_k:n12 µ_k₋₁, R_k_:_n12 −1 ].

Remark 5.6. It is clear that the expression for the asymptotic variance given here is the same as the expression given in Proposition 3.5.

6. Conclusion

Even though the two different approaches used here to obtain a central limit theorem for the proposed particle approximation do actually provide the same explicit expression for the asymptotic variance, they differ in the following aspects.

In the first approach, based on a representation in path–space, the importance weight functions appear only in the test function, and it is therefore easy to analyze the joint particle approximation of unnormalized distributions (and normalizing constants and normalized distributions, as a by–product) for the reference model and for several alternate models at the same time, just by choosing the appropriate test function in the central limit theorem. In view of (7), the idea is simply to use one test function per alternate model, i.e.

$$\frac{\langle\gamma_n^{a,N}-\gamma_n^a,\phi_a\rangle}{\langle\gamma_n^a,1\rangle} = \frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n^a,1\rangle}\ \frac{\langle\gamma_n^{\bullet,N}-\gamma_n^\bullet,\ T_n^a\phi_a\rangle}{\langle\gamma_n^\bullet,1\rangle},$$

hence

$$\sqrt{N}\,\sum_{a\in A}\frac{\langle\gamma_n^{a,N}-\gamma_n^a,\phi_a\rangle}{\langle\gamma_n^a,1\rangle} \Longrightarrow N\Big(0,V_n\Big(\sum_{a\in A}\frac{\langle\gamma_n^\bullet,1\rangle}{\langle\gamma_n^a,1\rangle}\,T_n^a\phi_a\Big)\Big),$$

in distribution as $N\uparrow\infty$, with a correlation structure reflected in the asymptotic covariance matrix, easily obtained by polarization. In other words, this first approach seems appropriate to analyze particle approximations in statistical models depending on some parameter, in sensitivity analysis, etc.

In the second approach, based on a representation in terms of a multiplicative functional, the importance weight functions appear in the extended Markov model, and it would be necessary to change the Markov model to analyze the joint particle approximation of unnormalized distributions for the reference model and several alternate models at the same time. On the other hand, it would be easy with this Markov interpretation to analyze particle approximation with adaptive resampling schemes, where the decision to use resampling weights only vs. importance weights only is made dependent on an empirical criterion (effective number of particles, entropy of the sample, etc.) evaluated using the current particle approximation. The idea would be simply, as a generalization of (3), to introduce a factorization depending on the current normalized or unnormalized distribution on the product set $E\times[0,\infty)$: this would result in a representation in terms of a McKean model, and the associated particle approximation could be easily analyzed [5].
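The empirical criteria mentioned above can be computed from the current weights in a few lines. The Python sketch below is an illustration only: it uses the common definitions of the effective number of particles (squared sum of the weights divided by the sum of their squares) and of the Shannon entropy of the normalized weights, and both the function names and the threshold of one half are assumptions, not choices made in the paper.

```python
import math


def effective_sample_size(weights):
    """Effective number of particles: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(wi * wi for wi in weights)


def weight_entropy(weights):
    """Shannon entropy of the normalized weights (zero weights are skipped)."""
    total = sum(weights)
    probs = [wi / total for wi in weights if wi > 0.0]
    return -sum(p * math.log(p) for p in probs)


def should_resample(weights, threshold=0.5):
    """Hypothetical rule: resample when ESS drops below threshold * N."""
    return effective_sample_size(weights) < threshold * len(weights)


uniform = [1.0, 1.0, 1.0, 1.0]
degenerate = [1.0, 0.0, 0.0, 0.0]
ess_uniform = effective_sample_size(uniform)        # equals N for equal weights
ess_degenerate = effective_sample_size(degenerate)  # equals 1 for one dominant weight
```

In an adaptive scheme of the kind described above, such a test would be evaluated at each step on the current weights to decide how the weight functions are split between resampling and weighting.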

References

[1] Olivier Cappé, Randal Douc, Éric Moulines, and Christian P. Robert. On the convergence of the Monte Carlo maximum likelihood method for latent variable models. Scandinavian Journal of Statistics, 29(4):615–635, December 2002.

[2] Olivier Cappé, Éric Moulines, and Tobias Rydén. Inference in Hidden Markov Models. Springer Series in Statistics. Springer–Verlag, New York, 2005.

[3] Natacha Caylus, Arnaud Guyader, François Le Gland, and Nadia Oudjane. Application du filtrage particulaire à l’inférence statistique des HMM. In Actes des 36èmes Journées de Statistique, Montpellier. SFdS, May 2004.

[4] Frédéric Cérou, François Le Gland, and Nigel J. Newton. Stochastic particle methods for linear tangent filtering equations. In José-Luis Menaldi, Edmundo Rofman, and Agnès Sulem, editors, Optimal Control and Partial Differential Equations. In honour of professor Alain Bensoussan’s 60th birthday, pages 231–240. IOS Press, Amsterdam, 2001.

[5] Pierre Del Moral. Feynman–Kac Formulae. Genealogical and Interacting Particle Systems with Applications. Probability and its Applications. Springer–Verlag, New York, 2004.

[6] Arnaud Doucet and Vladislav B. Tadić. Parameter estimation in general state–space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2):409–422, June 2003.

[7] Charles J. Geyer. On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56(1):261–274, 1994.

[8] Charles J. Geyer. Estimation and optimization of functions. In Walter R. Gilks, Sylvia Richardson, and David J. Spiegelhalter, editors, Markov Chain Monte Carlo in Practice, chapter 14, pages 241–258. Chapman & Hall, London, 1996.

[9] Arnaud Guyader, François Le Gland, and Nadia Oudjane. A particle implementation of the recursive MLE for partially observed diffusions. In Proceedings of the 13th Symposium on System Identification (SYSID), Rotterdam, pages 1305–1310. IFAC / IFORS, August 2003.

[10] Jun S. Liu. Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer–Verlag, New York, 2001.

[11] Michael K. Pitt and Neil Shephard. Filtering via simulations : auxiliary particle filter. Journal of the American Statistical