
Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071912

COMBINED USE OF IMPORTANCE WEIGHTS AND RESAMPLING WEIGHTS IN SEQUENTIAL MONTE CARLO METHODS ∗,∗∗

François Le Gland¹

Abstract. A particle approximation of Feynman–Kac distributions is presented that combines the SIS and SIR algorithms, in the sense that only a fraction of the importance weights is used for resampling, and two different approaches are proposed to analyze its performance. The first approach is based on a representation in terms of path–space distributions, and could be used to analyze the joint particle approximation of distributions for a reference model and for several alternate models at the same time. The second approach, which is of independent interest and seems very promising, is based on a representation in terms of a multiplicative functional, and could be used to analyze particle approximation with adaptive resampling schemes.

(Updated version from 18 December 2007)

Introduction

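Consider the unnormalized and normalized Feynman–Kac distributions defined on the set $E$ by

\[
\langle \gamma_n, \phi \rangle = \mathbb{E}\Big[ \phi(X_n) \prod_{k=0}^{n} g_k(X_k) \Big]
\quad\text{and}\quad
\langle \mu_n, \phi \rangle = \frac{\langle \gamma_n, \phi \rangle}{\langle \gamma_n, 1 \rangle},
\]

respectively, where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $\eta_0(dx)$,

• and its transition probabilities $Q_k(x, dx')$, for any $k = 1, \cdots, n$,

and where $g_k(x)$ is a bounded nonnegative function for any $k = 0, 1, \cdots, n$, and more generally

\[
\langle \gamma_n, \phi \rangle = \int_E \cdots \int_E \phi(x_n)\, \gamma_0(dx_0) \prod_{k=1}^{n} R_k(x_{k-1}, dx_k),
\]

which includes the previous case, with

\[
\gamma_0(dx) = g_0(x)\, \eta_0(dx)
\quad\text{and}\quad
R_k(x, dx') = Q_k(x, dx')\, g_k(x'),
\]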

∗ This work was partially supported by CNRS, under the MathSTIC project Chaînes de Markov Cachées et Filtrage Particulaire, and under the AS–STIC project Méthodes Particulaires (AS 67), and by Électricité de France R&D.

∗∗ This paper is dedicated to the memory of Natacha Caylus (1977–2006).

¹ IRISA / INRIA, Campus de Beaulieu, 35042 RENNES Cédex, France

© EDP Sciences, SMAI 2007

for any $k = 1, \cdots, n$. Associated with this probabilistic (or more generally, integral) representation is the recurrence relation $\gamma_k = \gamma_{k-1} R_k$, for any $k = 1, \cdots, n$. In full generality, it is always possible to decompose the nonnegative measure

\[
\gamma_0(dx) = W_0(x)\, p_0(dx), \tag{1}
\]

in terms of a nonnegative function and a normalized probability distribution, and to decompose the nonnegative kernel

\[
R_k(x, dx') = W_k(x, x')\, P_k(x, dx'), \tag{2}
\]

in terms of a nonnegative function and a normalized Markov kernel, for any $k = 1, \cdots, n$.

Assuming a further factorization of the nonnegative functions

\[
W_0(x) = W_0^{\mathrm{imp}}(x)\, W_0^{\mathrm{red}}(x)
\quad\text{and}\quad
W_k(x, x') = W_k^{\mathrm{imp}}(x, x')\, W_k^{\mathrm{red}}(x, x'), \tag{3}
\]

for any $k = 1, \cdots, n$, the following decompositions hold

\[
\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\, W_0^{\mathrm{red}}(x)\, p_0(dx)
\quad\text{and}\quad
R_k(x, dx') = W_k^{\mathrm{imp}}(x, x')\, W_k^{\mathrm{red}}(x, x')\, P_k(x, dx'), \tag{4}
\]

for any $k = 1, \cdots, n$, and the main contribution of this paper is to exploit this decomposition to design and study particle approximations of the form

\[
\mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i}
\quad\text{with}\quad
\sum_{i=1}^{N} w_k^i = 1
\quad\text{and}\quad
\sum_{i=1}^{N} u_k^i\, w_k^i = 1,
\]

which combine SIS and SIR algorithms [10, Section 3.4.4], with two species of nonnegative weights:

• the weights $(w_k^1, \cdots, w_k^N)$ are used for resampling,

• the importance weights $(u_k^1, \cdots, u_k^N)$ are used for weighting.

One possible motivation is the joint particle approximation of Feynman–Kac distributions associated with reference and alternate models, under the following absolute continuity assumptions

\[
\gamma_0(dx) = r_0(x)\, \gamma_0^0(dx)
\quad\text{and}\quad
R_k(x, dx') = r_k(x, x')\, R_k^0(x, dx'), \tag{5}
\]

for any $k = 1, \cdots, n$. In this case indeed, assuming a decomposition of the reference nonnegative measure

\[
\gamma_0^0(dx) = W_0^0(x)\, p_0^0(dx),
\]

in terms of a nonnegative function and a normalized probability distribution, and assuming a decomposition of the reference nonnegative kernel

\[
R_k^0(x, dx') = W_k^0(x, x')\, P_k^0(x, dx'),
\]

in terms of a nonnegative function and a normalized Markov kernel, for any $k = 1, \cdots, n$, the following decompositions hold

\[
\gamma_0(dx) = r_0(x)\, W_0^0(x)\, p_0^0(dx)
\quad\text{and}\quad
R_k(x, dx') = r_k(x, x')\, W_k^0(x, x')\, P_k^0(x, dx'), \tag{6}
\]

for any $k = 1, \cdots, n$, which clearly are of the form (4), and it is possible to design particle approximations of the form

\[
\mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^{0,i}\, \delta_{\xi_k^{0,i}}
\quad\text{with}\quad
\sum_{i=1}^{N} w_k^{0,i} = 1
\quad\text{and}\quad
\sum_{i=1}^{N} u_k^i\, w_k^{0,i} = 1,
\]

where

• the particle positions $(\xi_k^{0,1}, \cdots, \xi_k^{0,N})$ and the resampling weights $(w_k^{0,1}, \cdots, w_k^{0,N})$ depend on the reference model only,

• the importance weights $(u_k^1, \cdots, u_k^N)$ depend on both the reference and alternate models.

Clearly, the two points of view are mathematically equivalent, upon suitable substitution of the different nonnegative measures and nonnegative functions, and which point of view to adopt usually depends on the application.

The paper is organized as follows. An original particle approximation that combines importance weights and resampling weights is presented in Section 1. To analyze its performance, and in particular to derive a central limit theorem as the number $N$ of particles goes to infinity, with an explicit expression for the asymptotic variance, two different approaches are proposed. A first representation is introduced in Section 2, in terms of path–space distributions, and a path–particle approximation is presented in Section 3, with a central limit theorem stated in Theorem 3.2 and with an explicit expression for the asymptotic variance given in Proposition 3.5. A second representation is introduced in Section 4, in terms of the multiplicative functional of a Markov chain, and an extended particle approximation is presented in Section 5, where importance weights are treated as particles, with a central limit theorem stated in Theorem 5.2 and with an explicit expression for the asymptotic variance given in Proposition 5.5. It is checked that all three particle approximations actually correspond to the same algorithm, and that the two expressions obtained for the asymptotic variance are actually equal. The respective merits of the two different approaches are discussed in the Conclusion.

The following abuse of notation $W_0(x, x') = W_0(x')$, $W_0^{\mathrm{imp}}(x, x') = W_0^{\mathrm{imp}}(x')$ and $W_0^{\mathrm{red}}(x, x') = W_0^{\mathrm{red}}(x')$ will be used throughout the paper, so that products over $k = 0, 1, \cdots, n$ of terms of the form $W_k(x_{k-1}, x_k)$ are well defined.

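1. Particle Approximation with Combined Weighting and Resampling

Recall that the unnormalized distributions satisfy the following recurrence relation

\[
\gamma_k = \gamma_{k-1} R_k = (\mu_{k-1} R_k)\, \langle \gamma_{k-1}, 1 \rangle,
\]

for any $k = 1, \cdots, n$. Recall also the decomposition (1), and introducing a particle approximation of the form

\[
\mu_k \approx \mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i}
\quad\text{with}\quad
\sum_{i=1}^{N} w_k^i = 1
\quad\text{and}\quad
\sum_{i=1}^{N} u_k^i\, w_k^i = 1,
\]

and using the decomposition (2), yields

\[
\mu_{k-1}^N R_k(dx') = \sum_{i=1}^{N}
\underbrace{u_{k-1}^i\, W_k(\xi_{k-1}^i, x')}_{w_k^i(x')}\
\underbrace{w_{k-1}^i\, P_k(\xi_{k-1}^i, dx')}_{m_k^i(dx')},
\]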

which can be interpreted as the marginal nonnegative measure on $E$ associated with a nonnegative measure on the product set $\{1, \cdots, N\} \times E$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by

\[
\gamma_0^N = \frac{1}{N} \sum_{i=1}^{N} W_0(\xi_0^i)\, \delta_{\xi_0^i}
\quad\text{and}\quad
\gamma_k^N = \Big[ \frac{1}{N} \sum_{i=1}^{N} u_{k-1}^{\tau_k^i}\, W_k(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i} \Big]\, \langle \gamma_{k-1}^N, 1 \rangle,
\]

where $(\xi_0^1, \cdots, \xi_0^N)$ are i.i.d. random variables taking values in $E$ and with common probability distribution $p_0(dx)$, and where $((\tau_k^1, \xi_k^1), \cdots, (\tau_k^N, \xi_k^N))$ are i.i.d. random variables taking values in the product set $\{1, \cdots, N\} \times E$ and with common probability distribution $(m_k^1(dx'), \cdots, m_k^N(dx'))$, or equivalently

\[
\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N)
\quad\text{and}\quad
\xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i}, dx'),
\]

independently for any $i = 1, \cdots, N$. Using the further factorization (3) results in the following approximations

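\[
\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{imp}}(\xi_0^i)\, W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{imp}}(\xi_0^j)\, W_0^{\mathrm{red}}(\xi_0^j)}
= \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i},
\]

and

\[
\mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} u_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} u_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)}
= \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},
\]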

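for the normalized distributions, which defines implicitly the resampling weights

\[
w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)}
\quad\text{and}\quad
w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},
\]

and the importance weights

\[
u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} w_0^j\, W_0^{\mathrm{imp}}(\xi_0^j)}
\quad\text{and}\quad
u_k^i = \frac{u_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} w_k^j\, u_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},
\]

for any $i = 1, \cdots, N$.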
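The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function name `combined_filter`, the callback signatures, and the toy Gaussian model in the usage part are all hypothetical, and multinomial selection via `random.Random.choices` is one possible way to draw the ancestor indices $\tau_k^i$.

```python
import math
import random


def _normalise(red, v):
    """Return (w, u) with sum(w) == 1 and sum(u * w) == 1."""
    total_red = sum(red)
    w = [r / total_red for r in red]
    mass = sum(wi * vi for wi, vi in zip(w, v))
    u = [vi / mass for vi in v]
    return w, u


def combined_filter(n_steps, n_particles, sample_p0, sample_pk,
                    w0_red, w0_imp, wk_red, wk_imp, rng):
    """Hypothetical sketch of the combined weighting/resampling recursion."""
    # Initial step: xi_0^i ~ p_0, with W_0 split into its two factors.
    xi = [sample_p0(rng) for _ in range(n_particles)]
    w, u = _normalise([w0_red(x) for x in xi], [w0_imp(x) for x in xi])

    for k in range(1, n_steps + 1):
        # Selection: ancestors are drawn from the resampling weights only.
        tau = rng.choices(range(n_particles), weights=w, k=n_particles)
        # Mutation: xi_k^i ~ P_k(xi_{k-1}^{tau_k^i}, .).
        new_xi = [sample_pk(k, xi[t], rng) for t in tau]
        red = [wk_red(k, xi[t], x) for t, x in zip(tau, new_xi)]
        # Importance weights are inherited from the ancestor, then updated.
        v = [u[t] * wk_imp(k, xi[t], x) for t, x in zip(tau, new_xi)]
        xi = new_xi
        w, u = _normalise(red, v)
    return xi, w, u


# Usage on an assumed toy model (all numerical choices are illustrative).
rng = random.Random(0)
xi, w, u = combined_filter(
    n_steps=5, n_particles=200,
    sample_p0=lambda rng: rng.gauss(0.0, 1.0),
    sample_pk=lambda k, x, rng: rng.gauss(0.9 * x, 1.0),
    w0_red=lambda x: math.exp(-0.5 * x * x),
    w0_imp=lambda x: 1.0 + 0.1 * math.tanh(x),
    wk_red=lambda k, x, y: math.exp(-0.5 * y * y),
    wk_imp=lambda k, x, y: 1.0 + 0.1 * math.tanh(y),
    rng=rng,
)
# Estimate of <mu_n, phi> for phi(x) = x, using both species of weights.
estimate = sum(ui * wi * x for ui, wi, x in zip(u, w, xi))
```

Only the resampling weights `w` drive the selection step, while the importance weights `u` travel with the selected ancestors, which is the separation of roles described in this section.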

Remark 1.1. If the substitution $W_0^{\mathrm{imp}} \leftarrow r_0$, $W_0^{\mathrm{red}} \leftarrow W_0^0$ and $p_0 \leftarrow p_0^0$ is made, and if the substitution $W_k^{\mathrm{imp}} \leftarrow r_k$, $W_k^{\mathrm{red}} \leftarrow W_k^0$ and $P_k \leftarrow P_k^0$ is made for any $k = 1, \cdots, n$, with the notations of (6), then clearly the particle positions $(\xi_k^1, \cdots, \xi_k^N)$ and the resampling weights $(w_k^1, \cdots, w_k^N)$ depend on the reference model only, whereas the importance weights $(u_k^1, \cdots, u_k^N)$ depend on both the reference and alternate models. If in addition the derivatives in (5) are continuous or differentiable w.r.t. some parameter of the model, then the importance weights will automatically inherit the same property, as suggested in [8]. This idea has been used in Monte Carlo maximum likelihood estimation [1, 7] or in smooth particle approximation of Feynman–Kac distributions [3, 4, 6, 9]. Using a single particle system for the reference value, with different importance weights corresponding to different parameter values, makes the resulting approximation regular, but also poorly accurate for values too far from the reference. In contrast, using a different particle system for each different value would make the resulting approximation very irregular, but also uniformly accurate for all values.

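2. Representation in Path–Space

Using the decompositions (4) yields

\[
\begin{aligned}
\langle \gamma_n, \phi \rangle
&= \int_E \cdots \int_E \phi(x_n)\, \gamma_0(dx_0) \prod_{k=1}^{n} R_k(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E \phi(x_n)\, W_0^{\mathrm{imp}}(x_0)\, W_0^{\mathrm{red}}(x_0)\, p_0(dx_0) \prod_{k=1}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\, W_k^{\mathrm{red}}(x_{k-1}, x_k)\, P_k(x_{k-1}, dx_k) \\
&= \mathbb{E}\Big[ \phi(X_n) \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],
\end{aligned}
\]

where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $p_0(dx)$,

• and its transition probabilities $P_k(x, dx')$, for any $k = 1, \cdots, n$.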

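Consider the unnormalized and normalized Feynman–Kac distributions defined on the path–space $E_{0:n} = E \times \cdots \times E$ by

\[
\langle \gamma_n^{\bullet}, f_n \rangle = \mathbb{E}\Big[ f_n(X_n^{\bullet}) \prod_{k=0}^{n} g_k^{\bullet}(X_k^{\bullet}) \Big]
\quad\text{and}\quad
\langle \mu_n^{\bullet}, f_n \rangle = \frac{\langle \gamma_n^{\bullet}, f_n \rangle}{\langle \gamma_n^{\bullet}, 1 \rangle},
\]

respectively, where $\{X_k^{\bullet},\ k = 0, 1, \cdots, n\}$ is a path–space valued Markov chain defined by $X_k^{\bullet} = (X_0, \cdots, X_k) = X_{0:k}$ for any $k = 0, 1, \cdots, n$ and characterized by

• its initial probability distribution $\eta_0^{\bullet}(dx_0) = p_0(dx_0)$,

• and its transition probabilities $Q_k^{\bullet}(x_{0:k-1}, dx'_{0:k}) = \delta_{x_{0:k-1}}(dx'_{0:k-1})\, P_k(x'_{k-1}, dx'_k)$, for any $k = 1, \cdots, n$,

and where $g_k^{\bullet}(x_{0:k}) = W_k^{\mathrm{red}}(x_{k-1}, x_k)$ for any $k = 0, 1, \cdots, n$. Associated with this path–space probabilistic representation is the recurrence relation $\gamma_k^{\bullet} = \gamma_{k-1}^{\bullet} R_k^{\bullet}$, with

\[
R_k^{\bullet}(x_{0:k-1}, dx'_{0:k}) = Q_k^{\bullet}(x_{0:k-1}, dx'_{0:k})\, g_k^{\bullet}(x'_{0:k})
= \delta_{x_{0:k-1}}(dx'_{0:k-1})\, P_k(x'_{k-1}, dx'_k)\, W_k^{\mathrm{red}}(x'_{k-1}, x'_k),
\]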

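for any $k = 1, \cdots, n$. In particular for any function of the form

\[
f_n(x_{0:n}) = F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big),
\]

defined on path–space, it holds

\[
\langle \gamma_n^{\bullet}, f_n \rangle = \mathbb{E}\Big[ F\Big(X_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],
\]

and for instance, for the function

\[
T_n \phi(x_{0:n}) = \phi(x_n) \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)
\quad\text{then clearly}\quad
\langle \gamma_n^{\bullet}, T_n \phi \rangle = \langle \gamma_n, \phi \rangle, \tag{7}
\]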

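or in other words $\gamma_n = \gamma_n^{\bullet} T_n$ in terms of a transformed distribution, and for the function

\[
T_n'' \phi(x_{0:n}) = \phi(x_n)\, \Big| \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k) \Big|^2
\quad\text{then clearly}\quad
\langle \gamma_n^{\bullet}, T_n'' \phi \rangle = \langle \gamma_n'', \phi \rangle,
\]

with the unnormalized and normalized Feynman–Kac distributions defined on the set $E$ by

\[
\langle \gamma_n'', \phi \rangle = \mathbb{E}\Big[ \phi(X_n)\, \Big| \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k) \Big|^2 \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big]
\quad\text{and}\quad
\langle \mu_n'', \phi \rangle = \frac{\langle \gamma_n'', \phi \rangle}{\langle \gamma_n'', 1 \rangle},
\]

respectively.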
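Identity (7) can be checked by exact enumeration on a small finite state space. The sketch below is purely illustrative: the two-state model, the weight functions and all numerical values are assumptions chosen for the example, not taken from the paper. The left-hand side integrates $T_n \phi$ against the path–space distribution whose potential is the product of the $W_k^{\mathrm{red}}$, and the right-hand side propagates $\gamma_k = \gamma_{k-1} R_k$ on $E$.

```python
from itertools import product

# Hypothetical two-state toy model (every number here is an assumption).
E = (0, 1)
n = 3
p0 = {0: 0.6, 1: 0.4}
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P_k(x, x'), same for all k


def w0_red(x):
    return 1.0 + x


def w0_imp(x):
    return 1.5 - x


def wk_red(x, y):
    return 1.0 + 0.5 * y


def wk_imp(x, y):
    return 2.0 if x == y else 0.5


def phi(x):
    return float(x)


def path_probability(path):
    """Probability of a path under the chain (p0, P)."""
    prob = p0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob


def weight_product(path, w0, wk):
    """Product over k = 0..n of W_k(x_{k-1}, x_k) along a path."""
    out = w0(path[0])
    for a, b in zip(path, path[1:]):
        out *= wk(a, b)
    return out


# Left-hand side of (7): path-space expectation of T_n phi with potential W^red.
lhs = sum(
    path_probability(p)
    * weight_product(p, w0_red, wk_red)
    * phi(p[-1]) * weight_product(p, w0_imp, wk_imp)
    for p in product(E, repeat=n + 1)
)

# Right-hand side: gamma_k = gamma_{k-1} R_k with R_k = W_k^imp W_k^red P_k.
gamma = {x: w0_imp(x) * w0_red(x) * p0[x] for x in E}
for _ in range(n):
    gamma = {
        y: sum(gamma[x] * wk_imp(x, y) * wk_red(x, y) * P[x][y] for x in E)
        for y in E
    }
rhs = sum(gamma[x] * phi(x) for x in E)
```

Both quantities agree up to floating-point rounding, which reflects that moving the importance weight functions from the kernel into the test function does not change the integral.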

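3. Particle Approximation in Path–Space

Recall that the unnormalized distributions satisfy the recurrence relation

\[
\gamma_k^{\bullet} = \gamma_{k-1}^{\bullet} R_k^{\bullet} = g_k^{\bullet}\, (\mu_{k-1}^{\bullet} Q_k^{\bullet})\, \langle \gamma_{k-1}^{\bullet}, 1 \rangle,
\]

for any $k = 1, \cdots, n$. Introducing a weighted particle approximation of the form

\[
\mu_k^{\bullet} \approx \mu_k^{\bullet,N} = \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^{\bullet,i}}
\quad\text{with}\quad
\sum_{i=1}^{N} w_k^i = 1,
\]

where $\xi_k^{\bullet,i} = (\xi_{0,k}^i, \cdots, \xi_{k,k}^i)$ is a path–space particle with terminal position $\xi_k^i = \xi_{k,k}^i$ for any $i = 1, \cdots, N$, yields

\[
\mu_{k-1}^{\bullet,N} Q_k^{\bullet}(dx_{0:k}) = \sum_{i=1}^{N} w_{k-1}^i\, \delta_{\xi_{k-1}^{\bullet,i}}(dx_{0:k-1})\, P_k(x_{k-1}, dx_k).
\]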

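The corresponding SIR algorithm can be described by

\[
\gamma_0^{\bullet,N} = \frac{1}{N} \sum_{i=1}^{N} g_0^{\bullet}(\xi_0^{\bullet,i})\, \delta_{\xi_0^{\bullet,i}}
\quad\text{and}\quad
\gamma_k^{\bullet,N} = \Big[ \frac{1}{N} \sum_{i=1}^{N} g_k^{\bullet}(\xi_k^{\bullet,i})\, \delta_{\xi_k^{\bullet,i}} \Big]\, \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,
\]

where $(\xi_0^{\bullet,1}, \cdots, \xi_0^{\bullet,N})$ are i.i.d. random variables on the set $E$ with common probability distribution $p_0(dx_0)$, and where $g_0^{\bullet}(\xi_0^{\bullet,i}) = W_0^{\mathrm{red}}(\xi_{0,0}^i)$ for any $i = 1, \cdots, N$, and where $(\xi_k^{\bullet,1}, \cdots, \xi_k^{\bullet,N})$ are i.i.d. random variables on the path–space $E_{0:k} = E \times \cdots \times E$ with common probability distribution $\mu_{k-1}^{\bullet,N} Q_k^{\bullet}(dx_{0:k})$, or equivalently

\[
\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N)
\quad\text{and}\quad
(\xi_{0,k}^i, \cdots, \xi_{k-1,k}^i) = (\xi_{0,k-1}^{\tau_k^i}, \cdots, \xi_{k-1,k-1}^{\tau_k^i})
\quad\text{and}\quad
\xi_{k,k}^i \sim P_k(\xi_{k-1,k}^i, dx_k),
\]

independently for any $i = 1, \cdots, N$, and where $g_k^{\bullet}(\xi_k^{\bullet,i}) = W_k^{\mathrm{red}}(\xi_{k-1,k}^i, \xi_{k,k}^i) = W_k^{\mathrm{red}}(\xi_{k-1,k-1}^{\tau_k^i}, \xi_{k,k}^i)$ for any $i = 1, \cdots, N$. This results in the following approximations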

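\[
\mu_0^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^{\bullet,i}}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)}
= \sum_{i=1}^{N} w_0^i\, \delta_{\xi_0^{\bullet,i}}
\quad\text{and}\quad
\mu_k^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^{\bullet,i}}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)}
= \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^{\bullet,i}},
\]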

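for the normalized distributions, in terms of the terminal positions of the path–space particles, which defines implicitly the resampling weights as

\[
w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)}
\quad\text{and}\quad
w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},
\]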

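for any $i = 1, \cdots, N$. In particular for the function

\[
T_k \phi(x_{0:k}) = \phi(x_k) \prod_{p=0}^{k} W_p^{\mathrm{imp}}(x_{p-1}, x_p),
\]

defined on path–space and already introduced in (7), it holds

\[
T_k \phi(\xi_k^{\bullet,i}) = \phi(\xi_{k,k}^i) \prod_{p=0}^{k} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i),
\]

hence

\[
\langle \gamma_0^{\bullet,N}, T_0 \phi \rangle = \frac{1}{N} \sum_{i=1}^{N} g_0^{\bullet}(\xi_0^{\bullet,i})\, T_0 \phi(\xi_0^{\bullet,i})
= \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \phi(\xi_0^i),
\]

where

\[
v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),
\]

for any $i = 1, \cdots, N$, and

\[
\langle \gamma_k^{\bullet,N}, T_k \phi \rangle
= \Big[ \frac{1}{N} \sum_{i=1}^{N} g_k^{\bullet}(\xi_k^{\bullet,i})\, T_k \phi(\xi_k^{\bullet,i}) \Big]\, \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle
= \Big[ \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \phi(\xi_k^i) \Big]\, \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,
\]

where

\[
\begin{aligned}
v_k^i &= \prod_{p=0}^{k} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k}^i, \xi_{k,k}^i) \prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k-1}^{\tau_k^i}, \xi_{k,k}^i) \prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k-1}^{\tau_k^i}, \xi_{p,k-1}^{\tau_k^i}) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_{k-1}^{\tau_k^i},
\end{aligned}
\]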

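for any $i = 1, \cdots, N$. Notice the underlying recursive structure, in the form of a multiplicative functional of a Markov chain up to resampling. In view of the interpretation $\gamma_n = \gamma_n^{\bullet} T_n$ in terms of a transformed distribution, this results in the following approximations

\[
\gamma_0^N = \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i}
\quad\text{and}\quad
\gamma_k^N = \Big[ \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i} \Big]\, \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,
\]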

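for the unnormalized distributions, hence

\[
\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)\, v_0^j}
= \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i}
\quad\text{and}\quad
\mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, v_k^j}
= \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},
\]

for the normalized distributions, which defines implicitly the importance weights as

\[
u_k^i = \frac{v_k^i}{\displaystyle\sum_{j=1}^{N} v_k^j\, w_k^j}
\quad\text{with}\quad
v_0^i = W_0^{\mathrm{imp}}(\xi_0^i)
\quad\text{and}\quad
v_k^i = W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_{k-1}^{\tau_k^i},
\]

for any $i = 1, \cdots, N$.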

Remark 3.1. It is clear that the particle positions (defined as the terminal positions of the path–space particles) and resampling weights defined here are the same as the particle positions and resampling weights defined in Section 1, and it is easy to check by induction that the importance weights defined here are the same as the importance weights defined in Section 1.
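The induction step behind this remark can be illustrated numerically. In the sketch below, which is only an illustration on hypothetical random inputs (the helper names and the way the inputs are drawn are assumptions), the normalised weights of Section 1 and the unnormalised weights $v_k^i$ of this section are propagated over one time step and then compared. They coincide because the normalising constant of $u_{k-1}$ is common to all particles and cancels in the ratio.

```python
import random


def normalise(values):
    """Rescale nonnegative values so that they sum to one."""
    total = sum(values)
    return [x / total for x in values]


def weighted_normalise(values, w):
    """Rescale values so that sum_j w[j] * values[j] == 1."""
    mass = sum(wj * xj for wj, xj in zip(w, values))
    return [x / mass for x in values]


rng = random.Random(1)
N = 6

# Hypothetical quantities standing in for time k-1 of the algorithm.
w_prev = normalise([rng.random() + 0.1 for _ in range(N)])
v_prev = [rng.uniform(0.5, 2.0) for _ in range(N)]
u_prev = weighted_normalise(v_prev, w_prev)  # u_{k-1} = v_{k-1} / sum_j v w

# Hypothetical quantities produced at time k.
tau = [rng.randrange(N) for _ in range(N)]        # ancestor indices tau_k^i
imp = [rng.uniform(0.5, 2.0) for _ in range(N)]   # W_k^imp at the new particles
w = normalise([rng.random() + 0.1 for _ in range(N)])  # resampling weights w_k

# Section 1: propagate the normalised importance weights u.
u_section1 = weighted_normalise([u_prev[t] * g for t, g in zip(tau, imp)], w)

# Section 3: propagate the unnormalised weights v, normalise at the end.
v = [v_prev[t] * g for t, g in zip(tau, imp)]
u_section3 = weighted_normalise(v, w)

max_gap = max(abs(a - b) for a, b in zip(u_section1, u_section3))
```

In practice this suggests that an implementation may carry either `u` or `v` between time steps, with `v` avoiding one normalisation per step at the cost of possible numerical growth or decay of the products.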

For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].

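Theorem 3.2.

\[
\sqrt{N}\, \frac{\langle \gamma_n^{\bullet,N} - \gamma_n^{\bullet}, f_n \rangle}{\langle \gamma_n^{\bullet}, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n(f_n)),
\]

in distribution as $N \uparrow \infty$, with asymptotic variance

\[
V_n(f_n) = \frac{\operatorname{var}(g_0^{\bullet}\, R_{1:n}^{\bullet} f_n,\ \eta_0^{\bullet})}{\langle \eta_0^{\bullet}, g_0^{\bullet}\, R_{1:n}^{\bullet} 1 \rangle^2}
+ \sum_{k=1}^{n} \frac{\operatorname{var}(g_k^{\bullet}\, R_{k+1:n}^{\bullet} f_n,\ \eta_k^{\bullet})}{\langle \eta_k^{\bullet}, g_k^{\bullet}\, R_{k+1:n}^{\bullet} 1 \rangle^2},
\]

where $\eta_k^{\bullet} = \mu_{k-1}^{\bullet} Q_k^{\bullet}$ for any $k = 1, \cdots, n$, and where

\[
R_{k+1:n}^{\bullet} f_n(x_{0:k}) = R_{k+1}^{\bullet} \cdots R_n^{\bullet} f_n(x_{0:k})
= \mathbb{E}\Big[ f_n(X_n^{\bullet}) \prod_{p=k+1}^{n} g_p^{\bullet}(X_p^{\bullet}) \,\Big|\, X_k^{\bullet} = x_{0:k} \Big],
\]

for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^{\bullet} f_n(x_{0:n}) = f_n(x_{0:n})$ by convention.

Remark 3.3 (CLT for normalizing constants). It follows from the identity

\[
\frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle}
= \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\,
\frac{\langle \gamma_n^{\bullet,N} - \gamma_n^{\bullet}, T_n 1 \rangle}{\langle \gamma_n^{\bullet}, 1 \rangle},
\]

and from Theorem 3.2 that

\[
\sqrt{N}\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Longrightarrow \mathcal{N}\Big(0, V_n\Big( \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, T_n 1 \Big)\Big),
\]

in distribution as $N \uparrow \infty$.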

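Remark 3.4 (CLT for normalized distributions). It follows from the decomposition

\[
\langle \mu_n^N - \mu_n, \phi \rangle
= \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\,
\frac{\langle \gamma_n^{\bullet}, T_n 1 \rangle}{\langle \gamma_n^{\bullet,N}, T_n 1 \rangle}\,
\frac{\langle \gamma_n^{\bullet,N} - \gamma_n^{\bullet}, T_n(\phi - \langle \mu_n, \phi \rangle) \rangle}{\langle \gamma_n^{\bullet}, 1 \rangle},
\]

from Theorem 3.2 and from the Slutsky lemma, that

\[
\sqrt{N}\, \langle \mu_n^N - \mu_n, \phi \rangle \Longrightarrow \mathcal{N}\Big(0, V_n\Big( \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, T_n(\phi - \langle \mu_n, \phi \rangle) \Big)\Big),
\]

in distribution as $N \uparrow \infty$.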

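As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n(f_n)$ for functions of the form $f_n = T_n \phi$ defined on the path–space $E_{0:n} = E \times \cdots \times E$. Let

\[
R_{k+1:n} \phi(x) = R_{k+1} \cdots R_n \phi(x) = \mathbb{E}\Big[ \phi(X_n) \prod_{p=k+1}^{n} W_p(X_{p-1}, X_p) \,\Big|\, X_k = x \Big],
\]

for any $k = 0, 1, \cdots, n$, with $R_{n+1:n} \phi(x) = \phi(x)$ by convention. Then

\[
\frac{\operatorname{var}\Big(g_0^{\bullet}\, R_{1:n}^{\bullet} T_n\Big( \dfrac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, \phi \Big),\ \eta_0^{\bullet}\Big)}{\langle \eta_0^{\bullet}, g_0^{\bullet}\, R_{1:n}^{\bullet} 1 \rangle^2}
= \frac{\operatorname{var}(W_0\, R_{1:n} \phi,\ p_0)}{\langle p_0, W_0\, R_{1:n} 1 \rangle^2},
\]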

and

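\[
\begin{aligned}
\frac{\operatorname{var}\Big(g_k^{\bullet}\, R_{k+1:n}^{\bullet} T_n\Big( \dfrac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, \phi \Big),\ \eta_k^{\bullet}\Big)}{\langle \eta_k^{\bullet}, g_k^{\bullet}\, R_{k+1:n}^{\bullet} 1 \rangle^2}
&= \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^{\bullet}, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\operatorname{var}(W_k\, (R_{k+1:n} \phi) \circ \pi,\ \mu_{k-1}'' \otimes P_k)}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \\
&\quad + \Big[ \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^{\bullet}, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\langle \mu_{k-1}'', R_{k:n} \phi \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2}
- \frac{\langle \mu_{k-1}, R_{k:n} \phi \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \Big],
\end{aligned}
\]

for any $k = 1, \cdots, n$, where $\gamma_{k-1}''$ and $\mu_{k-1}''$ denote the unnormalized and normalized Feynman–Kac distributions associated with the squared importance weight functions $|W_k^{\mathrm{imp}}|^2$.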

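Proposition 3.5. In particular for the normalizing constant

\[
\sqrt{N}\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n),
\]

in distribution as $N \uparrow \infty$, with asymptotic variance

\[
\begin{aligned}
V_n &= \frac{\operatorname{var}(W_0\, R_{1:n} 1,\ p_0)}{\langle p_0, W_0\, R_{1:n} 1 \rangle^2}
+ \sum_{k=1}^{n} \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^{\bullet}, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\operatorname{var}(W_k\, (R_{k+1:n} 1) \circ \pi,\ \mu_{k-1}'' \otimes P_k)}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \\
&\quad + \sum_{k=1}^{n} \Big[ \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^{\bullet}, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\langle \mu_{k-1}'', R_{k:n} 1 \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} - 1 \Big].
\end{aligned}
\]

Remark 3.6. In the extreme case where only resampling weights are used, i.e. if $W_k^{\mathrm{red}} = W_k$ and $W_k^{\mathrm{imp}} \equiv 1$ for any $k = 0, 1, \cdots, n$, then the last sum cancels out, and

\[
V_n = \frac{\operatorname{var}(W_0\, R_{1:n} 1,\ p_0)}{\langle p_0, W_0\, R_{1:n} 1 \rangle^2}
+ \sum_{k=1}^{n} \frac{\operatorname{var}(W_k\, (R_{k+1:n} 1) \circ \pi,\ \mu_{k-1} \otimes P_k)}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2}.
\]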

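4. Representation in Terms of a Multiplicative Functional

Starting rather from the absolute continuity assumptions

\[
\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\, \gamma_0^0(dx)
\quad\text{and}\quad
R_k(x, dx') = W_k^{\mathrm{imp}}(x, x')\, R_k^0(x, dx'), \tag{8}
\]

which define implicitly, in view of the decompositions (4)

\[
\gamma_0^0(dx) = W_0^{\mathrm{red}}(x)\, p_0(dx)
\quad\text{and}\quad
R_k^0(x, dx') = W_k^{\mathrm{red}}(x, x')\, P_k(x, dx'), \tag{9}
\]

for any $k = 1, \cdots, n$, consider the unnormalized and normalized Feynman–Kac distributions defined on the product set $E \times [0, \infty)$ by

\[
\langle \gamma_n^e, F \rangle = \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k)
\quad\text{and}\quad
\langle \mu_n^e, F \rangle = \frac{\langle \gamma_n^e, F \rangle}{\langle \gamma_n^e, 1 \rangle},
\]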

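respectively. In particular

\[
\langle \gamma_k^e, \phi \otimes e^0 \rangle = \int_E \cdots \int_E \phi(x_k)\, \gamma_0^0(dx_0) \prod_{p=1}^{k} R_p^0(x_{p-1}, dx_p) = \langle \gamma_k^0, \phi \rangle,
\]

where $e^0(v) \equiv 1$ by definition, and using the absolute continuity assumptions (8)

\[
\begin{aligned}
\langle \gamma_k^e, \phi \otimes e \rangle
&= \int_E \cdots \int_E \phi(x_k)\, W_0^{\mathrm{imp}}(x_0) \prod_{p=1}^{k} W_p^{\mathrm{imp}}(x_{p-1}, x_p)\, \gamma_0^0(dx_0) \prod_{p=1}^{k} R_p^0(x_{p-1}, dx_p) \\
&= \int_E \cdots \int_E \phi(x_k)\, \gamma_0(dx_0) \prod_{p=1}^{k} R_p(x_{p-1}, dx_p) = \langle \gamma_k, \phi \rangle,
\end{aligned}
\]

where $e(v) \equiv v$ by definition. Associated with these two integral representations are the two recurrence relations $\gamma_k^0 = \gamma_{k-1}^0 R_k^0$ and $\gamma_k = \gamma_{k-1} R_k$ respectively, for any $k = 1, \cdots, n$. Finally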

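\[
\begin{aligned}
\langle \gamma_k^e, \phi \otimes e^2 \rangle
&= \int_E \cdots \int_E \phi(x_k)\, \Big| W_0^{\mathrm{imp}}(x_0) \prod_{p=1}^{k} W_p^{\mathrm{imp}}(x_{p-1}, x_p) \Big|^2\, \gamma_0^0(dx_0) \prod_{p=1}^{k} R_p^0(x_{p-1}, dx_p) \\
&= \int_E \cdots \int_E \phi(x_k)\, \gamma_0''(dx_0) \prod_{p=1}^{k} R_p''(x_{p-1}, dx_p) = \langle \gamma_k'', \phi \rangle,
\end{aligned}
\]

where

\[
\gamma_0''(dx) = |W_0^{\mathrm{imp}}(x)|^2\, \gamma_0^0(dx)
\quad\text{and}\quad
R_k''(x, dx') = |W_k^{\mathrm{imp}}(x, x')|^2\, R_k^0(x, dx'),
\]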

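for any $k = 1, \cdots, n$. In particular for $\phi(x) \equiv 1$, it holds

\[
\langle \gamma_k^e, 1 \rangle = \langle \gamma_k^e, 1 \otimes e^0 \rangle = \langle \gamma_k^0, 1 \rangle
\quad\text{and}\quad
\langle \gamma_k^e, 1 \otimes e \rangle = \langle \gamma_k, 1 \rangle,
\]

for the normalizing constants, and

\[
\langle \mu_k^e, \phi \otimes e^0 \rangle = \frac{\langle \gamma_k^e, \phi \otimes e^0 \rangle}{\langle \gamma_k^e, 1 \rangle}
= \frac{\langle \gamma_k^0, \phi \rangle}{\langle \gamma_k^0, 1 \rangle} = \langle \mu_k^0, \phi \rangle, \tag{10}
\]

\[
\langle \mu_k^e, \phi \otimes e \rangle = \frac{\langle \gamma_k^e, \phi \otimes e \rangle}{\langle \gamma_k^e, 1 \rangle}
= \frac{\langle \gamma_k, \phi \rangle}{\langle \gamma_k^0, 1 \rangle}
= \frac{\langle \gamma_k, 1 \rangle}{\langle \gamma_k^0, 1 \rangle}\, \langle \mu_k, \phi \rangle, \tag{11}
\]

and

\[
\langle \mu_k^e, \phi \otimes e^2 \rangle = \frac{\langle \gamma_k^e, \phi \otimes e^2 \rangle}{\langle \gamma_k^e, 1 \rangle}
= \frac{\langle \gamma_k'', \phi \rangle}{\langle \gamma_k^0, 1 \rangle}
= \frac{\langle \gamma_k'', 1 \rangle}{\langle \gamma_k^0, 1 \rangle}\, \langle \mu_k'', \phi \rangle,
\]

for the normalized Feynman–Kac distributions. In other words, the extended unnormalized Feynman–Kac distribution encodes all the different Feynman–Kac distributions, normalized or unnormalized, for the reference and alternate models, and in particular

\[
\langle \gamma_n^e, \phi^0 \otimes e^0 + \phi \otimes e \rangle = \langle \gamma_n^0, \phi^0 \rangle + \langle \gamma_n, \phi \rangle.
\]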

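It follows from the definition that

\[
\gamma_0^e(dx, dv) = \gamma_0^0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv),
\]

and introducing the extended nonnegative kernel

\[
R_k^e(x, v, dx', dv') = R_k^0(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv'),
\]

defined on the product set $E \times [0, \infty)$, for any $k = 1, \cdots, n$, it is easily seen that

\[
\begin{aligned}
\langle \gamma_n^e, F \rangle
&= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E \Big[ \int_0^{\infty} \cdots \int_0^{\infty} F(x_n, v_n)\, \delta_{W_0^{\mathrm{imp}}(x_0)}(dv_0) \prod_{k=1}^{n} \delta_{v_{k-1} W_k^{\mathrm{imp}}(x_{k-1}, x_k)}(dv_k) \Big]\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \int_0^{\infty} \cdots \int_E \int_0^{\infty} F(x_n, v_n)\, \gamma_0^e(dx_0, dv_0) \prod_{k=1}^{n} R_k^e(x_{k-1}, v_{k-1}, dx_k, dv_k),
\end{aligned}
\]

which justifies the interpretation of this nonnegative measure as an unnormalized Feynman–Kac distribution. Associated with this integral representation is the recurrence relation $\gamma_k^e = \gamma_{k-1}^e R_k^e$, for any $k = 1, \cdots, n$.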

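Furthermore, using the decompositions (9) yields

\[
\begin{aligned}
\langle \gamma_n^e, F \rangle
&= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(x_{k-1}, x_k)\, p_0(dx_0) \prod_{k=1}^{n} P_k(x_{k-1}, dx_k) \\
&= \mathbb{E}\Big[ F\Big(X_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],
\end{aligned}
\]

where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $p_0(dx)$,

• and its transition probabilities $P_k(x, dx')$, for any $k = 1, \cdots, n$.

Introducing

\[
M_0 = W_0^{\mathrm{imp}}(X_0)
\quad\text{and}\quad
M_k = W_k^{\mathrm{imp}}(X_{k-1}, X_k)\, M_{k-1},
\]

for any $k = 1, \cdots, n$, defines a multiplicative functional $\{M_k,\ k = 0, 1, \cdots, n\}$ associated with the Markov chain $\{X_k,\ k = 0, 1, \cdots, n\}$, and yields yet another representation as

\[
\langle \gamma_n^e, F \rangle = \mathbb{E}\Big[ F(X_n, M_n) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big]
\quad\text{with}\quad
M_n = \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k),
\]

where jointly $\{(X_k, M_k),\ k = 0, 1, \cdots, n\}$ form another Markov chain taking values in the product set $E \times [0, \infty)$, characterized by

• its initial probability distribution $p_0^e(dx, dv) = p_0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv)$,

• and its transition probabilities $P_k^e(x, v, dx', dv') = P_k(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv')$, for any $k = 1, \cdots, n$.

Finally, let $W_0^e(x, v) = W_0^{\mathrm{red}}(x)$ and $W_k^e(x, v, x', v') = W_k^{\mathrm{red}}(x, x')$ for any $k = 1, \cdots, n$.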
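The extended chain $\{(X_k, M_k)\}$ is straightforward to simulate, since the second component is a deterministic function of the previous pair and of the new state. The following sketch is only an illustration: the function name `simulate_extended_chain`, the callback signatures and the toy Gaussian model with its weight functions are assumptions made for the example.

```python
import math
import random


def simulate_extended_chain(n, sample_p0, sample_pk, w0_imp, wk_imp, rng):
    """Return the path [(X_0, M_0), ..., (X_n, M_n)] of the extended chain."""
    x = sample_p0(rng)
    m = w0_imp(x)                     # M_0 = W_0^imp(X_0)
    path = [(x, m)]
    for k in range(1, n + 1):
        x_new = sample_pk(k, x, rng)  # X_k ~ P_k(X_{k-1}, .)
        m = m * wk_imp(k, x, x_new)   # M_k = W_k^imp(X_{k-1}, X_k) M_{k-1}
        x = x_new
        path.append((x, m))
    return path


# Usage on an assumed toy model (all numerical choices are illustrative).
rng = random.Random(2)
path = simulate_extended_chain(
    n=4,
    sample_p0=lambda rng: rng.gauss(0.0, 1.0),
    sample_pk=lambda k, x, rng: rng.gauss(0.9 * x, 1.0),
    w0_imp=lambda x: 1.0 + 0.1 * math.tanh(x),
    wk_imp=lambda k, x, y: 1.0 + 0.1 * math.tanh(y - x),
    rng=rng,
)

# The recursively updated M_n equals the product of the importance weight
# functions along the path, as in the closed-form expression for M_n.
xs = [x for x, _ in path]
direct = (1.0 + 0.1 * math.tanh(xs[0])) * math.prod(
    1.0 + 0.1 * math.tanh(b - a) for a, b in zip(xs, xs[1:])
)
```

Carrying `m` alongside `x` is what allows the importance weights to be treated as part of the particle state in the next section.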

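5. Particle Approximation Using a Multiplicative Functional

Recall that the unnormalized distributions satisfy the recurrence relation

\[
\gamma_k^e = \gamma_{k-1}^e R_k^e = (\mu_{k-1}^e R_k^e)\, \langle \gamma_{k-1}^e, 1 \rangle,
\]

for any $k = 1, \cdots, n$. Using the decompositions (9), notice that

\[
\gamma_0^e(dx, dv) = W_0^{\mathrm{red}}(x)\, p_0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv) = W_0^e(x, v)\, p_0^e(dx, dv),
\]

and

\[
R_k^e(x, v, dx', dv') = W_k^{\mathrm{red}}(x, x')\, P_k(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv') = W_k^e(x, v, x', v')\, P_k^e(x, v, dx', dv'),
\]

and introducing a weighted particle approximation of the form

\[
\mu_k^e \approx \mu_k^{e,N} = \sum_{i=1}^{N} w_k^i\, \delta_{(\xi_k^i, v_k^i)}
\quad\text{with}\quad
\sum_{i=1}^{N} w_k^i = 1,
\]

yields

\[
\mu_{k-1}^{e,N} R_k^e(dx', dv') = \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^i, x')\
\underbrace{w_{k-1}^i\, P_k(\xi_{k-1}^i, dx')\, \delta_{v_{k-1}^i W_k^{\mathrm{imp}}(\xi_{k-1}^i, x')}(dv')}_{m_k^i(dx', dv')},
\]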

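which can be interpreted as the marginal nonnegative measure on the product set $E \times [0, \infty)$ associated with a nonnegative measure defined on the product set $\{1, \cdots, N\} \times E \times [0, \infty)$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by

\[
\gamma_0^{e,N} = \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{(\xi_0^i, v_0^i)}
\quad\text{and}\quad
\gamma_k^{e,N} = \Big[ \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{(\xi_k^i, v_k^i)} \Big]\, \langle \gamma_{k-1}^{e,N}, 1 \rangle,
\]

where $((\xi_0^1, v_0^1), \cdots, (\xi_0^N, v_0^N))$ are i.i.d. random variables taking values in the product set $E \times [0, \infty)$ and with common probability distribution $p_0^e(dx, dv)$, or equivalently

\[
\xi_0^i \sim p_0(dx)
\quad\text{and}\quad
v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),
\]

independently for any $i = 1, \cdots, N$, and where $((\tau_k^1, \xi_k^1, v_k^1), \cdots, (\tau_k^N, \xi_k^N, v_k^N))$ are i.i.d. random variables taking values in the product set $\{1, \cdots, N\} \times E \times [0, \infty)$ and with common probability distribution $(m_k^1(dx', dv'), \cdots, m_k^N(dx', dv'))$, or equivalently

\[
\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N)
\quad\text{and}\quad
\xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i}, dx')
\quad\text{and}\quad
v_k^i = v_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i),
\]

independently for any $i = 1, \cdots, N$. In view of the relation (10), this results in the following approximations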

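\[
\mu_0^{0,N} = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)}
= \sum_{i=1}^{N} w_0^i\, \delta_{\xi_0^i}
\quad\text{and}\quad
\mu_k^{0,N} = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)}
= \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^i},
\]

for the normalized distributions under the reference model, which defines implicitly the resampling weights as

\[
w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)}
\quad\text{and}\quad
w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},
\]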

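for any $i = 1, \cdots, N$. Similarly, in view of the relation (11), this results in the following approximations

\[
\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)\, v_0^j}
= \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i}
\quad\text{and}\quad
\mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, v_k^j}
= \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},
\]

for the normalized distributions under the alternate model, which defines implicitly the importance weights as

\[
u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} w_0^j\, W_0^{\mathrm{imp}}(\xi_0^j)}
\quad\text{and}\quad
u_k^i = \frac{v_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} w_k^j\, v_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},
\]

for any $i = 1, \cdots, N$.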

Remark 5.1. It is clear that the particle positions, resampling weights and importance weights defined here are the same as the particle positions, resampling weights and importance weights defined in Sections 1 and 3.

For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].

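Theorem 5.2.

\[
\sqrt{N}\, \frac{\langle \gamma_n^{e,N} - \gamma_n^e, F \rangle}{\langle \gamma_n^e, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n(F)),
\]

in distribution as $N \uparrow \infty$, with asymptotic variance

\[
V_n(F) = \frac{\operatorname{var}(W_0^e\, R_{1:n}^e F,\ p_0^e)}{\langle p_0^e, W_0^e\, R_{1:n}^e 1 \rangle^2}
+ \sum_{k=1}^{n} \frac{\operatorname{var}(W_k^e\, (R_{k+1:n}^e F) \circ \pi,\ p_k^e)}{\langle p_k^e, W_k^e\, (R_{k+1:n}^e 1) \circ \pi \rangle^2},
\]

where $p_k^e = \mu_{k-1}^e \otimes P_k^e$ for any $k = 1, \cdots, n$, and where

\[
R_{k+1:n}^e F(x, v) = R_{k+1}^e \cdots R_n^e F(x, v)
= \mathbb{E}\Big[ F(X_n, M_n) \prod_{p=k+1}^{n} W_p^{\mathrm{red}}(X_{p-1}, X_p) \,\Big|\, X_k = x,\ M_k = v \Big],
\]

for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^e F(x, v) = F(x, v)$ by convention.

Remark 5.3 (CLT for normalizing constants). It follows from the identity

\[
\lambda^0\, \frac{\langle \gamma_n^{0,N} - \gamma_n^0, 1 \rangle}{\langle \gamma_n^0, 1 \rangle}
+ \lambda\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle}
= \Big\langle \frac{\gamma_n^{e,N} - \gamma_n^e}{\langle \gamma_n^e, 1 \rangle},\ \lambda^0 + \lambda\, \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (1 \otimes e) \Big\rangle,
\]

and from Theorem 5.2 that

\[
\sqrt{N}\, \Big[ \lambda^0\, \frac{\langle \gamma_n^{0,N} - \gamma_n^0, 1 \rangle}{\langle \gamma_n^0, 1 \rangle}
+ \lambda\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Big]
\Longrightarrow \mathcal{N}\Big(0, V_n\Big( \lambda^0 + \lambda\, \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (1 \otimes e) \Big)\Big),
\]

in distribution as $N \uparrow \infty$.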

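Remark 5.4 (CLT for normalized distributions). It follows from the decomposition

\[
\begin{aligned}
&\langle \mu_n^{0,N} - \mu_n^0, \phi^0 \rangle + \langle \mu_n^N - \mu_n, \phi \rangle \\
&\quad = \Big\langle \frac{\gamma_n^{e,N} - \gamma_n^e}{\langle \gamma_n^e, 1 \rangle},\
\frac{\langle \gamma_n^e, 1 \rangle}{\langle \gamma_n^{e,N}, 1 \rangle}\, (\phi^0 - \langle \mu_n^0, \phi^0 \rangle) \otimes e^0
+ \frac{\langle \gamma_n^e, 1 \otimes e \rangle}{\langle \gamma_n^{e,N}, 1 \otimes e \rangle}\,
\frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (\phi - \langle \mu_n, \phi \rangle) \otimes e \Big\rangle,
\end{aligned}
\]

from Theorem 5.2 and from the Slutsky lemma, that

\[
\sqrt{N}\, \big[ \langle \mu_n^{0,N} - \mu_n^0, \phi^0 \rangle + \langle \mu_n^N - \mu_n, \phi \rangle \big]
\Longrightarrow \mathcal{N}\Big(0, V_n\Big( (\phi^0 - \langle \mu_n^0, \phi^0 \rangle) \otimes e^0
+ \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (\phi - \langle \mu_n, \phi \rangle) \otimes e \Big)\Big),
\]

in distribution as $N \uparrow \infty$.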

As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n(F)$ for functions of the form $F = \phi^0 \otimes e^0 + \phi \otimes e$ defined on the product set $E \times [0, \infty)$. Let

\[
R_{k+1:n}^0 \phi^0(x) = R_{k+1}^0 \cdots R_n^0 \phi^0(x)
\quad\text{and}\quad
R_{k+1:n} \phi(x) = R_{k+1} \cdots R_n \phi(x),
\]

for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^0 \phi^0(x) = \phi^0(x)$ and $R_{n+1:n} \phi(x) = \phi(x)$ by convention. Then

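\[
\begin{aligned}
\frac{\operatorname{var}\Big(W_0^e\, R_{1:n}^e\Big( \phi^0 \otimes e^0 + \dfrac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, \phi \otimes e \Big),\ p_0^e\Big)}{\langle p_0^e, W_0^e\, R_{1:n}^e 1 \rangle^2}
&= \frac{\operatorname{var}(W_0^{\mathrm{red}}\, R_{1:n}^0 \phi^0,\ p_0)}{\langle p_0, W_0^{\mathrm{red}}\, R_{1:n}^0 1 \rangle^2} \\
&\quad + 2\, \frac{\operatorname{cov}(W_0^{\mathrm{red}}\, R_{1:n}^0 \phi^0,\ W_0\, R_{1:n} \phi,\ p_0)}{\langle p_0, W_0^{\mathrm{red}}\, R_{1:n}^0 1 \rangle\, \langle p_0, W_0\, R_{1:n} 1 \rangle} \\
&\quad + \frac{\operatorname{var}(W_0\, R_{1:n} \phi,\ p_0)}{\langle p_0, W_0\, R_{1:n} 1 \rangle^2},
\end{aligned}
\]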

and

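\[
\begin{aligned}
&\frac{\operatorname{var}\Big(W_k^e\, \Big(R_{k+1:n}^e\Big( \phi^0 \otimes e^0 + \dfrac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, \phi \otimes e \Big)\Big) \circ \pi,\ \mu_{k-1}^e \otimes P_k^e\Big)}{\langle \mu_{k-1}^e \otimes P_k^e, W_k^e\, (R_{k+1:n}^e 1) \circ \pi \rangle^2} \\
&\quad = \frac{\operatorname{var}(W_k^{\mathrm{red}}\, (R_{k+1:n}^0 \phi^0) \circ \pi,\ \mu_{k-1}^0 \otimes P_k)}{\langle \mu_{k-1}^0, R_{k:n}^0 1 \rangle^2} \\
&\qquad + 2\, \frac{\operatorname{cov}(W_k^{\mathrm{red}}\, (R_{k+1:n}^0 \phi^0) \circ \pi,\ W_k\, (R_{k+1:n} \phi) \circ \pi,\ \mu_{k-1} \otimes P_k)}{\langle \mu_{k-1}^0, R_{k:n}^0 1 \rangle\, \langle \mu_{k-1}, R_{k:n} 1 \rangle} \\
&\qquad + 2\, \frac{\langle \mu_{k-1}, R_{k:n}^0 \phi^0 \rangle - \langle \mu_{k-1}^0, R_{k:n}^0 \phi^0 \rangle}{\langle \mu_{k-1}^0, R_{k:n}^0 1 \rangle}\,
\frac{\langle \mu_{k-1}, R_{k:n} \phi \rangle}{\langle \mu_{k-1}, R_{k:n} 1 \rangle} \\
&\qquad + \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^0, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\operatorname{var}(W_k\, (R_{k+1:n} \phi) \circ \pi,\ \mu_{k-1}'' \otimes P_k)}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \\
&\qquad + \Big[ \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^0, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\langle \mu_{k-1}'', R_{k:n} \phi \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2}
- \frac{\langle \mu_{k-1}, R_{k:n} \phi \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \Big],
\end{aligned}
\]

for any $k = 1, \cdots, n$.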

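Proposition 5.5. In particular for the normalizing constant

\[
\sqrt{N}\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n),
\]

in distribution as $N \uparrow \infty$, with asymptotic variance

\[
\begin{aligned}
V_n &= \frac{\operatorname{var}(W_0\, R_{1:n} 1,\ p_0)}{\langle p_0, W_0\, R_{1:n} 1 \rangle^2}
+ \sum_{k=1}^{n} \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^0, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\operatorname{var}(W_k\, (R_{k+1:n} 1) \circ \pi,\ \mu_{k-1}'' \otimes P_k)}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} \\
&\quad + \sum_{k=1}^{n} \Big[ \frac{\langle \gamma_{k-1}'', 1 \rangle\, \langle \gamma_{k-1}^0, 1 \rangle}{\langle \gamma_{k-1}, 1 \rangle^2}\,
\frac{\langle \mu_{k-1}'', R_{k:n} 1 \rangle^2}{\langle \mu_{k-1}, R_{k:n} 1 \rangle^2} - 1 \Big].
\end{aligned}
\]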

Remark 5.6. It is clear that the expression for the asymptotic variance given here is the same as the expression given in Proposition 3.5.

6. Conclusion

Even though the two different approaches used here to obtain a central limit theorem for the proposed particle approximation do actually provide the same explicit expression for the asymptotic variance, they differ in the following aspects.

In the first approach, based on a representation in path–space, the importance weight functions appear only in the test function, and it is therefore easy to analyze the joint particle approximation of unnormalized distributions (and normalizing constants and normalized distributions, as a by–product) for the reference model and for several alternate models at the same time, just by choosing the appropriate test function in the central limit theorem. In view of (7), the idea is simply to use one test function per alternate model, i.e.

\[
\frac{\langle \gamma_n^{a,N} - \gamma_n^a, \phi^a \rangle}{\langle \gamma_n^a, 1 \rangle}
= \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n^a, 1 \rangle}\,
\frac{\langle \gamma_n^{\bullet,N} - \gamma_n^{\bullet}, T_n^a \phi^a \rangle}{\langle \gamma_n^{\bullet}, 1 \rangle},
\]

hence

\[
\sqrt{N} \sum_{a \in A} \frac{\langle \gamma_n^{a,N} - \gamma_n^a, \phi^a \rangle}{\langle \gamma_n^a, 1 \rangle}
\Longrightarrow \mathcal{N}\Big(0, V_n\Big( \sum_{a \in A} \frac{\langle \gamma_n^{\bullet}, 1 \rangle}{\langle \gamma_n^a, 1 \rangle}\, T_n^a \phi^a \Big)\Big),
\]

in distribution as $N \uparrow \infty$, with a correlation structure reflected in the asymptotic covariance matrix, easily obtained by polarization. In other words, this first approach seems appropriate to analyze particle approximations in statistical models depending on some parameter, in sensitivity analysis, etc.

In the second approach, based on a representation in terms of a multiplicative functional, the importance weight functions appear in the extended Markov model, and it would be necessary to change the Markov model to analyze the joint particle approximation of unnormalized distributions for the reference model and several alternate models at the same time. On the other hand, it would be easy with this Markov interpretation to analyze particle approximation with adaptive resampling schemes, where the decision to use resampling weights only vs. importance weights only is made dependent on an empirical criterion (effective number of particles, entropy of the sample, etc.) evaluated using the current particle approximation. The idea would be simply, as a generalization of (3), to introduce a factorization depending on the current normalized or unnormalized distribution on the product set $E \times [0, \infty)$: this would result in a representation in terms of a McKean model, and the associated particle approximation could be easily analyzed [5].
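One way such an adaptive factorization might look in code is sketched below. This is an assumption-laden illustration rather than a scheme taken from the paper: the effective sample size formula $(\sum_i a_i)^2 / \sum_i a_i^2$ is one common empirical criterion, and the function names and the threshold of one half are hypothetical choices. The function splits the current values of $W_k$ into the two factors of (3), putting everything either into $W_k^{\mathrm{red}}$ (resample) or into $W_k^{\mathrm{imp}}$ (weight only).

```python
def effective_sample_size(weights):
    """Effective number of particles, (sum a_i)^2 / sum a_i^2."""
    total = sum(weights)
    total_sq = sum(a * a for a in weights)
    return total * total / total_sq


def split_weights(values, threshold=0.5):
    """Factorise W = W_imp * W_red using a hypothetical ESS rule.

    Returns (imp, red). When the effective sample size falls below
    threshold * N, the full weight goes to the resampling factor;
    otherwise it goes to the importance factor.
    """
    n = len(values)
    if effective_sample_size(values) < threshold * n:
        return [1.0] * n, list(values)   # W_imp = 1, W_red = W
    return list(values), [1.0] * n       # W_imp = W, W_red = 1


# Usage: balanced weights are kept as importance weights, whereas
# degenerate weights trigger resampling.
imp_even, red_even = split_weights([1.0, 1.0, 1.0, 1.0])
imp_skew, red_skew = split_weights([100.0, 1e-3, 1e-3, 1e-3])
```

Because the decision depends on the current particle system, the resulting weight functions depend on the current empirical distribution, which is the situation the McKean model interpretation mentioned above is meant to cover.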

References

[1] Olivier Cappé, Randal Douc, Éric Moulines, and Christian P. Robert. On the convergence of the Monte Carlo maximum likelihood method for latent variable models. Scandinavian Journal of Statistics, 29(4):615–635, December 2002.

[2] Olivier Cappé, Éric Moulines, and Tobias Rydén. Inference in Hidden Markov Models. Springer Series in Statistics. Springer–Verlag, New York, 2005.

[3] Natacha Caylus, Arnaud Guyader, François Le Gland, and Nadia Oudjane. Application du filtrage particulaire à l'inférence statistique des HMM. In Actes des 36èmes Journées de Statistique, Montpellier. SFdS, May 2004.

[4] Frédéric Cérou, François Le Gland, and Nigel J. Newton. Stochastic particle methods for linear tangent filtering equations. In José-Luis Menaldi, Edmundo Rofman, and Agnès Sulem, editors, Optimal Control and Partial Differential Equations. In honour of professor Alain Bensoussan's 60th birthday, pages 231–240. IOS Press, Amsterdam, 2001.

[5] Pierre Del Moral. Feynman–Kac Formulae. Genealogical and Interacting Particle Systems with Applications. Probability and its Applications. Springer–Verlag, New York, 2004.

[6] Arnaud Doucet and Vladislav B. Tadić. Parameter estimation in general state–space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2):409–422, June 2003.

[7] Charles J. Geyer. On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56(1):261–274, 1994.

[8] Charles J. Geyer. Estimation and optimization of functions. In Walter R. Gilks, Sylvia Richardson, and David J. Spiegelhalter, editors, Markov Chain Monte Carlo in Practice, chapter 14, pages 241–258. Chapman & Hall, London, 1996.

[9] Arnaud Guyader, François Le Gland, and Nadia Oudjane. A particle implementation of the recursive MLE for partially observed diffusions. In Proceedings of the 13th Symposium on System Identification (SYSID), Rotterdam, pages 1305–1310. IFAC / IFORS, August 2003.

[10] Jun S. Liu. Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer–Verlag, New York, 2001.

[11] Michael K. Pitt and Neil Shephard. Filtering via simulations : auxiliary particle filter. Journal of the American Statistical
