Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071912
COMBINED USE OF
IMPORTANCE WEIGHTS AND RESAMPLING WEIGHTS
IN SEQUENTIAL MONTE CARLO METHODS
François Le Gland ∗,∗∗,1
Abstract. A particle approximation of Feynman–Kac distributions is presented here, that combines SIS and SIR algorithms in the sense that only a fraction of the importance weights is used for re-sampling, and two different approaches are proposed to analyze its performance. The first approach is based on a representation in terms of path–space distributions, and could be used to analyze the joint particle approximation of distributions for a reference model and for several alternate models at the same time. The second approach, which is of independent interest and seems very promising, is based on a representation in terms of a multiplicative functional, and could be used to analyze particle approximation with adaptive resampling schemes.
(Updated version from 18 December 2007)
Introduction
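Consider the unnormalized and normalized Feynman–Kac distributions defined on the set $E$ by
$$\langle \gamma_n, \phi \rangle = \mathbb{E}\Big[ \phi(X_n) \prod_{k=0}^{n} g_k(X_k) \Big] \quad\text{and}\quad \langle \mu_n, \phi \rangle = \frac{\langle \gamma_n, \phi \rangle}{\langle \gamma_n, 1 \rangle},$$
respectively, where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by
• its initial probability distribution $\eta_0(dx)$,
• and its transition probabilities $Q_k(x, dx')$, for any $k = 1, \cdots, n$,
and where $g_k(x)$ is a bounded nonnegative function for any $k = 0, 1, \cdots, n$, and more generally
$$\langle \gamma_n, \phi \rangle = \int_E \cdots \int_E \phi(x_n)\, \gamma_0(dx_0) \prod_{k=1}^{n} R_k(x_{k-1}, dx_k),$$
which includes the previous case, with
$$\gamma_0(dx) = g_0(x)\, \eta_0(dx) \quad\text{and}\quad R_k(x, dx') = Q_k(x, dx')\, g_k(x'),$$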
∗ This work was partially supported by CNRS, under the MathSTIC project Chaînes de Markov Cachées et Filtrage Particulaire, and under the AS–STIC project Méthodes Particulaires (AS 67), and by Electricité de France R&D.
∗∗ This paper is dedicated to the memory of Natacha Caylus (1977–2006).
1 IRISA / INRIA, Campus de Beaulieu, 35042 RENNES Cédex, France
© EDP Sciences, SMAI 2007
for any $k = 1, \cdots, n$. Associated with this probabilistic (or more generally, integral) representation is the recurrence relation $\gamma_k = \gamma_{k-1} R_k$, for any $k = 1, \cdots, n$. In full generality, it is always possible to decompose the nonnegative measure
$$\gamma_0(dx) = W_0(x)\, p_0(dx), \qquad (1)$$
in terms of a nonnegative function and a normalized probability distribution, and to decompose the nonnegative kernel
$$R_k(x, dx') = W_k(x, x')\, P_k(x, dx'), \qquad (2)$$
in terms of a nonnegative function and a normalized Markov kernel, for any $k = 1, \cdots, n$.
Assuming a further factorization of the nonnegative functions
$$W_0(x) = W_0^{\mathrm{imp}}(x)\, W_0^{\mathrm{red}}(x) \quad\text{and}\quad W_k(x, x') = W_k^{\mathrm{imp}}(x, x')\, W_k^{\mathrm{red}}(x, x'), \qquad (3)$$
for any $k = 1, \cdots, n$, the following decompositions hold
$$\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\, W_0^{\mathrm{red}}(x)\, p_0(dx) \quad\text{and}\quad R_k(x, dx') = W_k^{\mathrm{imp}}(x, x')\, W_k^{\mathrm{red}}(x, x')\, P_k(x, dx'), \qquad (4)$$
for any $k = 1, \cdots, n$, and the main contribution of this paper is to exploit this decomposition to design and study particle approximations of the form
$$\mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i} \quad\text{with}\quad \sum_{i=1}^{N} w_k^i = 1 \quad\text{and}\quad \sum_{i=1}^{N} u_k^i\, w_k^i = 1,$$
which combine SIS and SIR algorithms [10, Section 3.4.4], with two species of nonnegative weights
• the weights $(w_k^1, \cdots, w_k^N)$ are used for resampling,
• the importance weights $(u_k^1, \cdots, u_k^N)$ are used for weighting.
One possible motivation is the joint particle approximation of Feynman–Kac distributions associated with reference and alternate models, under the following absolute continuity assumptions
$$\gamma_0(dx) = r_0(x)\, \gamma_0^0(dx) \quad\text{and}\quad R_k(x, dx') = r_k(x, x')\, R_k^0(x, dx'), \qquad (5)$$
for any $k = 1, \cdots, n$. In this case indeed, assuming a decomposition of the reference nonnegative measure
$$\gamma_0^0(dx) = W_0^0(x)\, p_0^0(dx),$$
in terms of a nonnegative function and a normalized probability distribution, and assuming a decomposition of the reference nonnegative kernel
$$R_k^0(x, dx') = W_k^0(x, x')\, P_k^0(x, dx'),$$
in terms of a nonnegative function and a normalized Markov kernel, for any $k = 1, \cdots, n$, the following decompositions hold
$$\gamma_0(dx) = r_0(x)\, W_0^0(x)\, p_0^0(dx) \quad\text{and}\quad R_k(x, dx') = r_k(x, x')\, W_k^0(x, x')\, P_k^0(x, dx'), \qquad (6)$$
for any $k = 1, \cdots, n$, which clearly are of the form (4), and it is possible to design particle approximations of the form
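$$\mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^{0,i}\, \delta_{\xi_k^{0,i}} \quad\text{with}\quad \sum_{i=1}^{N} w_k^{0,i} = 1 \quad\text{and}\quad \sum_{i=1}^{N} u_k^i\, w_k^{0,i} = 1$$
where
• the importance weights $(u_k^1, \cdots, u_k^N)$ depend on both the reference and alternate models.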
Clearly, the two points of view are mathematically equivalent, upon suitable substitution of the different nonnegative measures and nonnegative functions, and the choice between them usually depends on the application.
The paper is organized as follows: An original particle approximation is presented in Section 1, that combines importance weights and resampling weights. To analyze its performance, and in particular to derive a central limit theorem as the number $N$ of particles goes to infinity, with an explicit expression for the asymptotic variance, two different approaches are proposed. A first representation is introduced in Section 2, in terms of path–space distributions and a path–particle approximation is presented in Section 3, with a central limit theorem stated in Theorem 3.2 and with an explicit expression for the asymptotic variance given in Proposition 3.5. A second representation is introduced in Section 4, in terms of the multiplicative functional of a Markov chain and an extended particle approximation is presented in Section 5, where importance weights are treated as particles, with a central limit theorem stated in Theorem 5.2 and with an explicit expression for the asymptotic variance given in Proposition 5.5. It is checked that all three particle approximations actually correspond to the same algorithm, and that the two expressions obtained for the asymptotic variance are actually equal. The respective merits of the two different approaches are discussed in the Conclusion.
The following abuse of notation $W_0(x, x') = W_0(x')$, $W_0^{\mathrm{imp}}(x, x') = W_0^{\mathrm{imp}}(x')$ and $W_0^{\mathrm{red}}(x, x') = W_0^{\mathrm{red}}(x')$ will be used throughout the paper.
1. Particle Approximation with Combined Weighting and Resampling
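Recall that the unnormalized distributions satisfy the following recurrence relation
$$\gamma_k = \gamma_{k-1} R_k = (\mu_{k-1} R_k)\, \langle \gamma_{k-1}, 1 \rangle,$$
for any $k = 1, \cdots, n$. Recall also the decomposition (1), and introducing a particle approximation of the form
$$\mu_k \approx \mu_k^N = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i} \quad\text{with}\quad \sum_{i=1}^{N} w_k^i = 1 \quad\text{and}\quad \sum_{i=1}^{N} u_k^i\, w_k^i = 1,$$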
and using the decomposition (2), yields
$$\mu_{k-1}^N R_k(dx') = \sum_{i=1}^{N} \underbrace{u_{k-1}^i\, W_k(\xi_{k-1}^i, x')}_{w_k^i(x')}\ \underbrace{w_{k-1}^i\, P_k(\xi_{k-1}^i, dx')}_{m_k^i(dx')},$$
which can be interpreted as the marginal nonnegative measure on $E$ associated with a nonnegative measure on the product set $\{1, \cdots, N\} \times E$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by
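$$\gamma_0^N = \frac{1}{N} \sum_{i=1}^{N} W_0(\xi_0^i)\, \delta_{\xi_0^i} \quad\text{and}\quad \gamma_k^N = \frac{1}{N} \sum_{i=1}^{N} u_{k-1}^{\tau_k^i}\, W_k(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i}\ \langle \gamma_{k-1}^N, 1 \rangle,$$
where $(\xi_0^1, \cdots, \xi_0^N)$ are i.i.d. random variables taking values in $E$ and with common probability distribution $p_0(dx)$, and where $((\tau_k^1, \xi_k^1), \cdots, (\tau_k^N, \xi_k^N))$ are i.i.d. random variables taking values in the product set $\{1, \cdots, N\} \times E$ and with common probability distribution $(m_k^1(dx'), \cdots, m_k^N(dx'))$, or equivalently
$$\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N) \quad\text{and}\quad \xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i}, dx'),$$
independently for any $i = 1, \cdots, N$. Using the further factorization (3) results in the following approximations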
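$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{imp}}(\xi_0^i)\, W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{imp}}(\xi_0^j)\, W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i},$$
and
$$\mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} u_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} u_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)} = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},$$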
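for the normalized distributions, which defines implicitly the resampling weights
$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},$$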
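and the importance weights
$$u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} w_0^j\, W_0^{\mathrm{imp}}(\xi_0^j)} \quad\text{and}\quad u_k^i = \frac{u_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} w_k^j\, u_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},$$
for any $i = 1, \cdots, N$.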
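The recursion defined by the resampling weights and the importance weights above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `combined_weights_filter`, the helper `normalize`, and the scalar toy model (Gaussian random walk, Gaussian-shaped reduction weights, exponential importance weights) are all hypothetical and introduced here for the example. Ancestors are selected with the resampling weights only, while the importance weights are inherited from the selected ancestor and renormalized so that the products $u_k^i w_k^i$ sum to one.

```python
import math
import random


def normalize(values):
    """Scale nonnegative values so that they sum to one."""
    total = sum(values)
    return [x / total for x in values]


def combined_weights_filter(n_steps, n_particles, sample_p0, sample_pk,
                            w0_red, w0_imp, wk_red, wk_imp, rng):
    """Return positions xi, resampling weights w and importance weights u at time n."""
    # Time 0: i.i.d. draws from p_0, then both kinds of weights.
    xi = [sample_p0(rng) for _ in range(n_particles)]
    w = normalize([w0_red(x) for x in xi])
    v = [w0_imp(x) for x in xi]
    denom = sum(wj * vj for wj, vj in zip(w, v))
    u = [vi / denom for vi in v]
    for k in range(1, n_steps + 1):
        # Ancestor indices tau are drawn with the resampling weights only.
        tau = rng.choices(range(n_particles), weights=w, k=n_particles)
        new_xi = [sample_pk(k, xi[t], rng) for t in tau]
        # New resampling weights come from W^red; importance weights are
        # inherited from the ancestor and multiplied by W^imp.
        w = normalize([wk_red(k, xi[t], x) for t, x in zip(tau, new_xi)])
        v = [u[t] * wk_imp(k, xi[t], x) for t, x in zip(tau, new_xi)]
        denom = sum(wj * vj for wj, vj in zip(w, v))
        u = [vi / denom for vi in v]
        xi = new_xi
    return xi, w, u


# Hypothetical toy model, used only to make the sketch executable.
def toy_sample_p0(rng):
    return rng.gauss(0.0, 1.0)


def toy_sample_pk(k, x, rng):
    return x + rng.gauss(0.0, 1.0)


def toy_w0_red(x):
    return math.exp(-0.5 * x * x)


def toy_wk_red(k, x_prev, x):
    return math.exp(-0.5 * x * x)


def toy_w0_imp(x):
    return math.exp(0.1 * x)


def toy_wk_imp(k, x_prev, x):
    return math.exp(0.1 * (x - x_prev))
```

With the returned triple, an estimate of $\langle \mu_n, \phi \rangle$ is obtained as `sum(ui * wi * phi(x) for x, wi, ui in zip(xi, w, u))`. When the importance weight functions are identically equal to one, all the $u_k^i$ stay equal to one and the sketch reduces to a plain SIR step.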
Remark 1.1. If the substitution $W_0^{\mathrm{imp}} \leftarrow r_0$, $W_0^{\mathrm{red}} \leftarrow W_0^0$ and $p_0 \leftarrow p_0^0$ is made, and if the substitution $W_k^{\mathrm{imp}} \leftarrow r_k$, $W_k^{\mathrm{red}} \leftarrow W_k^0$ and $P_k \leftarrow P_k^0$ is made for any $k = 1, \cdots, n$, with the notations of (6), then clearly the particle positions $(\xi_k^1, \cdots, \xi_k^N)$ and the resampling weights $(w_k^1, \cdots, w_k^N)$ depend on the reference model only, whereas the importance weights $(u_k^1, \cdots, u_k^N)$ depend on both the reference and alternate models. If in addition the derivatives in (5) are continuous or differentiable w.r.t. some parameter of the model, then the importance weights will automatically inherit the same property, as suggested in [8]. This idea has been used in Monte Carlo maximum likelihood estimation [1, 7] or in smooth particle approximation of Feynman–Kac distributions [3, 4, 6, 9]. Using a single particle system for the reference value, with different importance weights corresponding to different values, makes the resulting approximation regular, but also inaccurate for values too far from the reference. By contrast, using a different particle system for each value would make the resulting approximation very irregular, but uniformly accurate for all values.
2. Representation in Path–Space
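Using the decompositions (4) yields
$$\begin{aligned}
\langle \gamma_n, \phi \rangle &= \int_E \cdots \int_E \phi(x_n)\, \gamma_0(dx_0) \prod_{k=1}^{n} R_k(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E \phi(x_n)\, W_0^{\mathrm{imp}}(x_0)\, W_0^{\mathrm{red}}(x_0)\, p_0(dx_0) \prod_{k=1}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\, W_k^{\mathrm{red}}(x_{k-1}, x_k)\, P_k(x_{k-1}, dx_k) \\
&= \mathbb{E}\Big[ \phi(X_n) \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],
\end{aligned}$$
where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by
• its initial probability distribution $p_0(dx)$,
• and its transition probabilities $P_k(x, dx')$, for any $k = 1, \cdots, n$.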
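Consider the unnormalized and normalized Feynman–Kac distributions defined on the path–space $E_{0:n} = E \times \cdots \times E$ by
$$\langle \gamma_n^\bullet, f_n \rangle = \mathbb{E}\Big[ f_n(X_n^\bullet) \prod_{k=0}^{n} g_k^\bullet(X_k^\bullet) \Big] \quad\text{and}\quad \langle \mu_n^\bullet, f_n \rangle = \frac{\langle \gamma_n^\bullet, f_n \rangle}{\langle \gamma_n^\bullet, 1 \rangle},$$
respectively, where $\{X_k^\bullet,\ k = 0, 1, \cdots, n\}$ is a path–space valued Markov chain defined by $X_k^\bullet = (X_0, \cdots, X_k) = X_{0:k}$ for any $k = 0, 1, \cdots, n$ and characterized by
• its initial probability distribution $\eta_0^\bullet(dx_0) = p_0(dx_0)$,
• and its transition probabilities $Q_k^\bullet(x_{0:k-1}, dx'_{0:k}) = \delta_{x_{0:k-1}}(dx'_{0:k-1})\, P_k(x_{k-1}, dx'_k)$, for any $k = 1, \cdots, n$,
and where $g_k^\bullet(x_{0:k}) = W_k^{\mathrm{red}}(x_{k-1}, x_k)$ for any $k = 0, 1, \cdots, n$. Associated with this path–space probabilistic representation is the recurrence relation $\gamma_k^\bullet = \gamma_{k-1}^\bullet R_k^\bullet$, with
$$R_k^\bullet(x_{0:k-1}, dx'_{0:k}) = Q_k^\bullet(x_{0:k-1}, dx'_{0:k})\, g_k^\bullet(x'_{0:k}) = \delta_{x_{0:k-1}}(dx'_{0:k-1})\, P_k(x_{k-1}, dx'_k)\, W_k^{\mathrm{red}}(x_{k-1}, x'_k),$$
for any $k = 1, \cdots, n$. In particular for any function of the form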
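$$f_n(x_{0:n}) = F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big),$$
defined on path–space, it holds
$$\langle \gamma_n^\bullet, f_n \rangle = \mathbb{E}\Big[ F\Big(X_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],$$
and for instance, for the function
$$T_n \phi(x_{0:n}) = \phi(x_n) \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k) \quad\text{then clearly}\quad \langle \gamma_n^\bullet, T_n \phi \rangle = \langle \gamma_n, \phi \rangle, \qquad (7)$$
or in other words $\gamma_n = \gamma_n^\bullet\, T_n$ in terms of a transformed distribution, and for the function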
Tnφ(x0:n) =φ0(xn)| n
k=0
with the unnormalized and normalized Feynman–Kac distributions defined on the setE by
γn, φ=E[φ(Xn)|
n
k=0
Wkimp(Xk−1, Xk)|2
n
k=0
Wkred(Xk−1, Xk) ] and µn, φ= γn, φ
γn,1 ,
respectively.
3. Particle Approximation in Path–Space
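Recall that the unnormalized distributions satisfy the recurrence relation
$$\gamma_k^\bullet = \gamma_{k-1}^\bullet R_k^\bullet = g_k^\bullet\, (\mu_{k-1}^\bullet Q_k^\bullet)\, \langle \gamma_{k-1}^\bullet, 1 \rangle,$$
for any $k = 1, \cdots, n$. Introducing a weighted particle approximation of the form
$$\mu_k^\bullet \approx \mu_k^{\bullet,N} = \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^{\bullet,i}} \quad\text{with}\quad \sum_{i=1}^{N} w_k^i = 1,$$
where $\xi_k^{\bullet,i} = (\xi_{0,k}^i, \cdots, \xi_{k,k}^i)$ is a path–space particle with terminal position $\xi_k^i = \xi_{k,k}^i$ for any $i = 1, \cdots, N$, yields
$$\mu_{k-1}^{\bullet,N} Q_k^\bullet(dx_{0:k}) = \sum_{i=1}^{N} w_{k-1}^i\, \delta_{\xi_{k-1}^{\bullet,i}}(dx_{0:k-1})\, P_k(x_{k-1}, dx_k).$$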
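The corresponding SIR algorithm can be described by
$$\gamma_0^{\bullet,N} = \frac{1}{N} \sum_{i=1}^{N} g_0^\bullet(\xi_0^{\bullet,i})\, \delta_{\xi_0^{\bullet,i}} \quad\text{and}\quad \gamma_k^{\bullet,N} = \frac{1}{N} \sum_{i=1}^{N} g_k^\bullet(\xi_k^{\bullet,i})\, \delta_{\xi_k^{\bullet,i}}\ \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,$$
where $(\xi_0^{\bullet,1}, \cdots, \xi_0^{\bullet,N})$ are i.i.d. random variables on the set $E$ with common probability distribution $p_0(dx_0)$, and where $g_0^\bullet(\xi_0^{\bullet,i}) = W_0^{\mathrm{red}}(\xi_{0,0}^i)$ for any $i = 1, \cdots, N$, and where $(\xi_k^{\bullet,1}, \cdots, \xi_k^{\bullet,N})$ are i.i.d. random variables on the path–space $E_{0:k} = E \times \cdots \times E$ with common probability distribution $\mu_{k-1}^{\bullet,N} Q_k^\bullet(dx_{0:k})$, or equivalently
$$\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N) \quad\text{and}\quad (\xi_{0,k}^i, \cdots, \xi_{k-1,k}^i) = (\xi_{0,k-1}^{\tau_k^i}, \cdots, \xi_{k-1,k-1}^{\tau_k^i}) \quad\text{and}\quad \xi_{k,k}^i \sim P_k(\xi_{k-1,k}^i, dx_k),$$
independently for any $i = 1, \cdots, N$, and where $g_k^\bullet(\xi_k^{\bullet,i}) = W_k^{\mathrm{red}}(\xi_{k-1,k}^i, \xi_{k,k}^i) = W_k^{\mathrm{red}}(\xi_{k-1,k-1}^{\tau_k^i}, \xi_{k,k}^i)$ for any $i = 1, \cdots, N$. This results in the following approximations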
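$$\mu_0^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^{\bullet,i}}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^{N} w_0^i\, \delta_{\xi_0^{\bullet,i}} \quad\text{and}\quad \mu_k^{\bullet,N} = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^{\bullet,i}}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)} = \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^{\bullet,i}},$$
for the normalized distributions, in terms of the terminal positions of the path–space particles, which defines implicitly the resampling weights as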
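$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},$$
for any $i = 1, \cdots, N$. In particular for the function
$$T_k \phi(x_{0:k}) = \phi(x_k) \prod_{p=0}^{k} W_p^{\mathrm{imp}}(x_{p-1}, x_p),$$
defined on path–space and already introduced in (7), it holds
$$T_k \phi(\xi_k^{\bullet,i}) = \phi(\xi_{k,k}^i) \prod_{p=0}^{k} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i),$$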
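hence
$$\langle \gamma_0^{\bullet,N}, T_0 \phi \rangle = \frac{1}{N} \sum_{i=1}^{N} g_0^\bullet(\xi_0^{\bullet,i})\, T_0 \phi(\xi_0^{\bullet,i}) = \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \phi(\xi_0^i),$$
where
$$v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),$$
for any $i = 1, \cdots, N$, and
$$\langle \gamma_k^{\bullet,N}, T_k \phi \rangle = \frac{1}{N} \sum_{i=1}^{N} g_k^\bullet(\xi_k^{\bullet,i})\, T_k \phi(\xi_k^{\bullet,i})\ \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle = \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \phi(\xi_k^i)\ \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,$$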
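where
$$\begin{aligned}
v_k^i &= \prod_{p=0}^{k} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k}^i, \xi_{k,k}^i) \prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k}^i, \xi_{p,k}^i) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1,k-1}^{\tau_k^i}, \xi_{k,k}^i) \prod_{p=0}^{k-1} W_p^{\mathrm{imp}}(\xi_{p-1,k-1}^{\tau_k^i}, \xi_{p,k-1}^{\tau_k^i}) \\
&= W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_{k-1}^{\tau_k^i},
\end{aligned}$$
for any $i = 1, \cdots, N$. Notice the underlying recursive structure, in the form of a multiplicative functional of a Markov chain up to resampling. In view of the interpretation $\gamma_n = \gamma_n^\bullet\, T_n$ in terms of a transformed distribution, this results in the following approximations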
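$$\gamma_0^N = \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i} \quad\text{and}\quad \gamma_k^N = \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i}\ \langle \gamma_{k-1}^{\bullet,N}, 1 \rangle,$$
for the unnormalized distributions, hence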
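$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)\, v_0^j} = \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i} \quad\text{and}\quad \mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, v_k^j} = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},$$
for the normalized distributions, which defines implicitly the importance weights as
$$u_k^i = \frac{v_k^i}{\displaystyle\sum_{j=1}^{N} v_k^j\, w_k^j} \quad\text{with}\quad v_0^i = W_0^{\mathrm{imp}}(\xi_0^i) \quad\text{and}\quad v_k^i = W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_{k-1}^{\tau_k^i},$$
for any $i = 1, \cdots, N$.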
Remark 3.1. It is clear that the particle positions (defined as the terminal positions of the path–space particles) and resampling weights defined here are the same as the particle positions and resampling weights defined in Section 1, and it is easy to check by induction that the importance weights defined here are the same as the importance weights defined in Section 1.
For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].
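Theorem 3.2.
$$\sqrt{N}\ \frac{\langle \gamma_n^{\bullet,N} - \gamma_n^\bullet, f_n \rangle}{\langle \gamma_n^\bullet, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n^\bullet(f_n)),$$
in distribution as $N \uparrow \infty$, with asymptotic variance
$$V_n^\bullet(f_n) = \frac{\operatorname{var}(g_0^\bullet\, R_{1:n}^\bullet f_n,\ \eta_0^\bullet)}{\langle \eta_0^\bullet, g_0^\bullet\, R_{1:n}^\bullet 1 \rangle^2} + \sum_{k=1}^{n} \frac{\operatorname{var}(g_k^\bullet\, R_{k+1:n}^\bullet f_n,\ \eta_k^\bullet)}{\langle \eta_k^\bullet, g_k^\bullet\, R_{k+1:n}^\bullet 1 \rangle^2},$$
where $\eta_k^\bullet = \mu_{k-1}^\bullet Q_k^\bullet$ for any $k = 1, \cdots, n$, and where
$$R_{k+1:n}^\bullet f_n(x_{0:k}) = R_{k+1}^\bullet \cdots R_n^\bullet f_n(x_{0:k}) = \mathbb{E}\Big[ f_n(X_n^\bullet) \prod_{p=k+1}^{n} g_p^\bullet(X_p^\bullet) \,\Big|\, X_k^\bullet = x_{0:k} \Big],$$
for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^\bullet f_n(x_{0:n}) = f_n(x_{0:n})$ by convention.
Remark 3.3 (CLT for normalizing constants). It follows from the identity
$$\frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} = \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n, 1 \rangle}\ \frac{\langle \gamma_n^{\bullet,N} - \gamma_n^\bullet, T_n 1 \rangle}{\langle \gamma_n^\bullet, 1 \rangle},$$
and from Theorem 3.2 that
$$\sqrt{N}\ \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Longrightarrow \mathcal{N}\Big(0, V_n^\bullet\Big( \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, T_n 1 \Big)\Big),$$
in distribution as $N \uparrow \infty$.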
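Remark 3.4 (CLT for normalized distributions). It follows from the decomposition
$$\langle \mu_n^N - \mu_n, \phi \rangle = \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n, 1 \rangle}\ \frac{\langle \gamma_n^\bullet, T_n 1 \rangle}{\langle \gamma_n^{\bullet,N}, T_n 1 \rangle}\ \frac{\langle \gamma_n^{\bullet,N} - \gamma_n^\bullet,\ T_n(\phi - \langle \mu_n, \phi \rangle) \rangle}{\langle \gamma_n^\bullet, 1 \rangle},$$
from Theorem 3.2 and from the Slutsky lemma, that
$$\sqrt{N}\ \langle \mu_n^N - \mu_n, \phi \rangle \Longrightarrow \mathcal{N}\Big(0, V_n^\bullet\Big( \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, T_n(\phi - \langle \mu_n, \phi \rangle) \Big)\Big),$$
in distribution as $N \uparrow \infty$.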
As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n^\bullet(f_n)$ for functions of the form $f_n = T_n \phi$ defined on the path–space $E_{0:n} = E \times \cdots \times E$. Let
$$R_{k+1:n} \phi(x) = R_{k+1} \cdots R_n \phi(x) = \mathbb{E}\Big[ \phi(X_n) \prod_{p=k+1}^{n} W_p(X_{p-1}, X_p) \,\Big|\, X_k = x \Big],$$
for any $k = 0, 1, \cdots, n$, with $R_{n+1:n} \phi(x) = \phi(x)$ by convention. Then
var(g0•R1:•nTn(γ •
n,1 γn,1φ), η
• 0)
η•0, g0•R•1:n12 =
var(W0R1:nφ, p0) p0, W0R1:n12 ,
and
var(gk•R•k+1:nTn(γ •
n,1 γn,1 φ), η
•
k) ηk•, g•kRk•+1:n12 =
γk−1,1 γk•−1,1
γk−1,12
var(Wk (Rk+1:nφ)◦π, µk−1⊗Pk) µk−1, Rk:n12
+ [γ
k−1,1 γ•k−1,1 γk−1,12
µk−1, Rk:nφ2 µk−1, Rk:n12 −
µk−1, Rk:nφ2 µk−1, Rk:n12 ],
for anyk= 1,· · ·, n.
Proposition 3.5. In particular for the normalizing constant
√
N γ N
n −γn,1
γn,1 =⇒N(0, Vn),
in distribution as N↑ ∞, with asymptotic variance
Vn = var(W0R1:nφ, p0)
p0, W0R1:n12
+
n
k=1
γk−1,1 γk•−1,1
γk−1,12
var(Wk (Rk+1:n1)◦π, µk−1⊗Pk) µk−1, Rk:n12
+
n
k=1
[γ
k−1,1 γk•−1,1 γk−1,12
µk−1, Rk:n12 µk−1, Rk:n12 −
1 ].
Remark 3.6. In the extreme case where only resampling weights are used, i.e. if Wkred=Wk andWkimp ≡1 for anyk= 0,1,· · ·, n, then the last sum cancels out, and
Vn =var(W0R1:nφ, p0)
p0, W0R1:n12 +
n
k=1
var(Wk (Rk+1:n1)◦π, µk−1⊗Pk) µk−1, Rk:n12 ,
4. Representation in Terms of a Multiplicative Functional
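Starting rather from the absolute continuity assumptions
$$\gamma_0(dx) = W_0^{\mathrm{imp}}(x)\, \gamma_0^0(dx) \quad\text{and}\quad R_k(x, dx') = W_k^{\mathrm{imp}}(x, x')\, R_k^0(x, dx'), \qquad (8)$$
which define implicitly, in view of the decompositions (4)
$$\gamma_0^0(dx) = W_0^{\mathrm{red}}(x)\, p_0(dx) \quad\text{and}\quad R_k^0(x, dx') = W_k^{\mathrm{red}}(x, x')\, P_k(x, dx'), \qquad (9)$$
for any $k = 1, \cdots, n$, consider the unnormalized and normalized Feynman–Kac distributions defined on the product set $E \times [0, \infty)$ by
$$\langle \gamma_n^e, F \rangle = \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \quad\text{and}\quad \langle \mu_n^e, F \rangle = \frac{\langle \gamma_n^e, F \rangle}{\langle \gamma_n^e, 1 \rangle},$$
respectively. In particular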
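$$\langle \gamma_k^e, \phi \otimes e_0 \rangle = \int_E \cdots \int_E \phi(x_k)\, \gamma_0^0(dx_0) \prod_{p=1}^{k} R_p^0(x_{p-1}, dx_p) = \langle \gamma_k^0, \phi \rangle,$$
where $e_0(v) \equiv 1$ by definition, and using the absolute continuity assumptions (8)
$$\begin{aligned}
\langle \gamma_k^e, \phi \otimes e \rangle &= \int_E \cdots \int_E \phi(x_k)\, W_0^{\mathrm{imp}}(x_0) \prod_{p=1}^{k} W_p^{\mathrm{imp}}(x_{p-1}, x_p)\ \gamma_0^0(dx_0) \prod_{p=1}^{k} R_p^0(x_{p-1}, dx_p) \\
&= \int_E \cdots \int_E \phi(x_k)\, \gamma_0(dx_0) \prod_{p=1}^{k} R_p(x_{p-1}, dx_p) = \langle \gamma_k, \phi \rangle,
\end{aligned}$$
where $e(v) \equiv v$ by definition. Associated with these two integral representations are the two recurrence relations $\gamma_k^0 = \gamma_{k-1}^0 R_k^0$ and $\gamma_k = \gamma_{k-1} R_k$ respectively, for any $k = 1, \cdots, n$. Finally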
γke, φ⊗e2 =
E· · ·
E
φ(xn)|W0imp(x0)
n
k=1
Wkimp(xk−1, xk)|2γ00(dx0)
n
k=1
R0k(xk−1, dxk)
=
E· · ·
E
φ(xn)γ0(dx0)
n
k=1
Rk(xk−1, dxk) =γk, φ,
where
γ0(dx) =|W0imp(x)|2γ00(dx) and Rk(x, dx) =|Wkimp(x, x)|2Rk0(x, dx),
for anyk= 1,· · ·, n. In particular forφ(x)≡1, it holds
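$$\langle \gamma_k^e, 1 \rangle = \langle \gamma_k^e, 1 \otimes e_0 \rangle = \langle \gamma_k^0, 1 \rangle \quad\text{and}\quad \langle \gamma_k^e, 1 \otimes e \rangle = \langle \gamma_k, 1 \rangle,$$
for the normalizing constants, and
$$\langle \mu_k^e, \phi \otimes e_0 \rangle = \frac{\langle \gamma_k^e, \phi \otimes e_0 \rangle}{\langle \gamma_k^e, 1 \rangle} = \frac{\langle \gamma_k^0, \phi \rangle}{\langle \gamma_k^0, 1 \rangle} = \langle \mu_k^0, \phi \rangle, \qquad (10)$$
and
$$\langle \mu_k^e, \phi \otimes e \rangle = \frac{\langle \gamma_k^e, \phi \otimes e \rangle}{\langle \gamma_k^e, 1 \rangle} = \frac{\langle \gamma_k, \phi \rangle}{\langle \gamma_k^0, 1 \rangle} = \frac{\langle \gamma_k, 1 \rangle}{\langle \gamma_k^0, 1 \rangle}\ \langle \mu_k, \phi \rangle, \qquad (11)$$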
and
µek, φ⊗e2= γ
e
k, φ⊗e2 γe
k,1
= γ
k, φ γ0
k,1
= γ
k,1 γ0
k,1 µk, φ,
for the normalized Feynman–Kac distributions. In other words, the extended unnormalized Feynman–Kac distribution encodes all the different Feynman–Kac distributions, normalized or unnormalized, for the reference and alternate models, and in particular
$$\langle \gamma_n^e, \phi_0 \otimes e_0 + \phi \otimes e \rangle = \langle \gamma_n^0, \phi_0 \rangle + \langle \gamma_n, \phi \rangle.$$
It follows from the definition that
$$\gamma_0^e(dx, dv) = \gamma_0^0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv),$$
and introducing the extended nonnegative kernel
$$R_k^e(x, v, dx', dv') = R_k^0(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv'),$$
defined on the product set $E \times [0, \infty)$, for any $k = 1, \cdots, n$, it is easily seen that
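$$\begin{aligned}
\langle \gamma_n^e, F \rangle &= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E \Big[ \int_0^\infty \cdots \int_0^\infty F(x_n, v_n)\, \delta_{W_0^{\mathrm{imp}}(x_0)}(dv_0) \prod_{k=1}^{n} \delta_{v_{k-1} W_k^{\mathrm{imp}}(x_{k-1}, x_k)}(dv_k) \Big]\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \int_0^\infty \cdots \int_E \int_0^\infty F(x_n, v_n)\, \gamma_0^e(dx_0, dv_0) \prod_{k=1}^{n} R_k^e(x_{k-1}, v_{k-1}, dx_k, dv_k),
\end{aligned}$$
which justifies the interpretation of this nonnegative measure as an unnormalized Feynman–Kac distribution. Associated with this integral representation is the recurrence relation $\gamma_k^e = \gamma_{k-1}^e R_k^e$, for any $k = 1, \cdots, n$.
Furthermore, using the decompositions (9) yields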
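$$\begin{aligned}
\langle \gamma_n^e, F \rangle &= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big)\, \gamma_0^0(dx_0) \prod_{k=1}^{n} R_k^0(x_{k-1}, dx_k) \\
&= \int_E \cdots \int_E F\Big(x_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(x_{k-1}, x_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(x_{k-1}, x_k)\ p_0(dx_0) \prod_{k=1}^{n} P_k(x_{k-1}, dx_k) \\
&= \mathbb{E}\Big[ F\Big(X_n, \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k)\Big) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big],
\end{aligned}$$
where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by
• its initial probability distribution $p_0(dx)$,
• and its transition probabilities $P_k(x, dx')$, for any $k = 1, \cdots, n$.
Introducing
$$M_0 = W_0^{\mathrm{imp}}(X_0) \quad\text{and}\quad M_k = W_k^{\mathrm{imp}}(X_{k-1}, X_k)\, M_{k-1},$$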
for any $k = 1, \cdots, n$, defines a multiplicative functional $\{M_k,\ k = 0, 1, \cdots, n\}$ associated with the Markov chain $\{X_k,\ k = 0, 1, \cdots, n\}$, and yields yet another representation as
$$\langle \gamma_n^e, F \rangle = \mathbb{E}\Big[ F(X_n, M_n) \prod_{k=0}^{n} W_k^{\mathrm{red}}(X_{k-1}, X_k) \Big] \quad\text{with}\quad M_n = \prod_{k=0}^{n} W_k^{\mathrm{imp}}(X_{k-1}, X_k),$$
where jointly $\{(X_k, M_k),\ k = 0, 1, \cdots, n\}$ form another Markov chain taking values in the product set $E \times [0, \infty)$, characterized by
• its initial probability distribution $p_0^e(dx, dv) = p_0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv)$,
• and its transition probabilities $P_k^e(x, v, dx', dv') = P_k(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv')$, for any $k = 1, \cdots, n$.
Finally, let $W_0^e(x, v) = W_0^{\mathrm{red}}(x)$ and $W_k^e(x, v, x', v') = W_k^{\mathrm{red}}(x, x')$ for any $k = 1, \cdots, n$.
5. Particle Approximation Using a Multiplicative Functional
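Recall that the unnormalized distributions satisfy the recurrence relation
$$\gamma_k^e = \gamma_{k-1}^e R_k^e = (\mu_{k-1}^e R_k^e)\, \langle \gamma_{k-1}^e, 1 \rangle,$$
for any $k = 1, \cdots, n$. Using the decompositions (9), notice that
$$\gamma_0^e(dx, dv) = W_0^{\mathrm{red}}(x)\, p_0(dx)\, \delta_{W_0^{\mathrm{imp}}(x)}(dv) = W_0^e(x, v)\, p_0^e(dx, dv),$$
and
$$R_k^e(x, v, dx', dv') = W_k^{\mathrm{red}}(x, x')\, P_k(x, dx')\, \delta_{v\, W_k^{\mathrm{imp}}(x, x')}(dv') = W_k^e(x, v, x', v')\, P_k^e(x, v, dx', dv'),$$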
and introducing a weighted particle approximation of the form
$$\mu_k^e \approx \mu_k^{e,N} = \sum_{i=1}^{N} w_k^i\, \delta_{(\xi_k^i, v_k^i)} \quad\text{with}\quad \sum_{i=1}^{N} w_k^i = 1,$$
yields
$$\mu_{k-1}^{e,N} R_k^e(dx', dv') = \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^i, x')\ \underbrace{w_{k-1}^i\, P_k(\xi_{k-1}^i, dx')\, \delta_{v_{k-1}^i W_k^{\mathrm{imp}}(\xi_{k-1}^i, x')}(dv')}_{m_k^i(dx', dv')},$$
which can be interpreted as the marginal nonnegative measure on the product set $E \times [0, \infty)$ associated with a nonnegative measure defined on the product set $\{1, \cdots, N\} \times E \times [0, \infty)$. Using the auxiliary variable approach [11], the resulting ASIR algorithm can be described by
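$$\gamma_0^{e,N} = \frac{1}{N} \sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{(\xi_0^i, v_0^i)} \quad\text{and}\quad \gamma_k^{e,N} = \frac{1}{N} \sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{(\xi_k^i, v_k^i)}\ \langle \gamma_{k-1}^{e,N}, 1 \rangle,$$
where $((\xi_0^1, v_0^1), \cdots, (\xi_0^N, v_0^N))$ are i.i.d. random variables taking values in the product set $E \times [0, \infty)$ and with common probability distribution $p_0^e(dx, dv)$, or equivalently
$$\xi_0^i \sim p_0(dx) \quad\text{and}\quad v_0^i = W_0^{\mathrm{imp}}(\xi_0^i),$$
independently for any $i = 1, \cdots, N$, and where $((\tau_k^1, \xi_k^1, v_k^1), \cdots, (\tau_k^N, \xi_k^N, v_k^N))$ are i.i.d. random variables taking values in the product set $\{1, \cdots, N\} \times E \times [0, \infty)$ and with common probability distribution $(m_k^1(dx', dv'), \cdots, m_k^N(dx', dv'))$, or equivalently
$$\tau_k^i \sim (w_{k-1}^1, \cdots, w_{k-1}^N) \quad\text{and}\quad \xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i}, dx') \quad\text{and}\quad v_k^i = v_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i),$$
independently for any $i = 1, \cdots, N$. In view of the relation (10), this results in the following approximations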
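$$\mu_0^{0,N} = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)} = \sum_{i=1}^{N} w_0^i\, \delta_{\xi_0^i} \quad\text{and}\quad \mu_k^{0,N} = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)} = \sum_{i=1}^{N} w_k^i\, \delta_{\xi_k^i},$$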
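for the normalized distributions under the reference model, which defines implicitly the resampling weights as
$$w_0^i = \frac{W_0^{\mathrm{red}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)} \quad\text{and}\quad w_k^i = \frac{W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},$$
for any $i = 1, \cdots, N$. Similarly, in view of the relation (11), this results in the following approximations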
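$$\mu_0^N = \frac{\displaystyle\sum_{i=1}^{N} W_0^{\mathrm{red}}(\xi_0^i)\, v_0^i\, \delta_{\xi_0^i}}{\displaystyle\sum_{j=1}^{N} W_0^{\mathrm{red}}(\xi_0^j)\, v_0^j} = \sum_{i=1}^{N} u_0^i\, w_0^i\, \delta_{\xi_0^i} \quad\text{and}\quad \mu_k^N = \frac{\displaystyle\sum_{i=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)\, v_k^i\, \delta_{\xi_k^i}}{\displaystyle\sum_{j=1}^{N} W_k^{\mathrm{red}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)\, v_k^j} = \sum_{i=1}^{N} u_k^i\, w_k^i\, \delta_{\xi_k^i},$$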
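for the normalized distributions under the alternate model, which defines implicitly the importance weights as
$$u_0^i = \frac{W_0^{\mathrm{imp}}(\xi_0^i)}{\displaystyle\sum_{j=1}^{N} w_0^j\, W_0^{\mathrm{imp}}(\xi_0^j)} \quad\text{and}\quad u_k^i = \frac{v_{k-1}^{\tau_k^i}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^i}, \xi_k^i)}{\displaystyle\sum_{j=1}^{N} w_k^j\, v_{k-1}^{\tau_k^j}\, W_k^{\mathrm{imp}}(\xi_{k-1}^{\tau_k^j}, \xi_k^j)},$$
for any $i = 1, \cdots, N$.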
Remark 5.1. It is clear that the particle positions, resampling weights and importance weights defined here are the same as the particle positions, resampling weights and importance weights defined in Sections 1 and 3.
For this algorithm, the following central limit theorem holds [5, Chapter 9], [2, Chapter 9].
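Theorem 5.2.
$$\sqrt{N}\ \frac{\langle \gamma_n^{e,N} - \gamma_n^e, F \rangle}{\langle \gamma_n^e, 1 \rangle} \Longrightarrow \mathcal{N}(0, V_n(F)),$$
in distribution as $N \uparrow \infty$, with asymptotic variance
$$V_n(F) = \frac{\operatorname{var}(W_0^e\, R_{1:n}^e F,\ p_0^e)}{\langle p_0^e, W_0^e\, R_{1:n}^e 1 \rangle^2} + \sum_{k=1}^{n} \frac{\operatorname{var}(W_k^e\, (R_{k+1:n}^e F) \circ \pi,\ p_k^e)}{\langle p_k^e, W_k^e\, (R_{k+1:n}^e 1) \circ \pi \rangle^2},$$
where $p_k^e = \mu_{k-1}^e \otimes P_k^e$ for any $k = 1, \cdots, n$, and where
$$R_{k+1:n}^e F(x, v) = R_{k+1}^e \cdots R_n^e F(x, v) = \mathbb{E}\Big[ F(X_n, M_n) \prod_{p=k+1}^{n} W_p^{\mathrm{red}}(X_{p-1}, X_p) \,\Big|\, X_k = x,\ M_k = v \Big],$$
for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^e F(x, v) = F(x, v)$ by convention.
Remark 5.3 (CLT for normalizing constants). It follows from the identity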
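$$\lambda_0\, \frac{\langle \gamma_n^{0,N} - \gamma_n^0, 1 \rangle}{\langle \gamma_n^0, 1 \rangle} + \lambda\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} = \Big\langle \frac{\gamma_n^{e,N} - \gamma_n^e}{\langle \gamma_n^e, 1 \rangle},\ \lambda_0 + \lambda\, \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (1 \otimes e) \Big\rangle,$$
and from Theorem 5.2 that
$$\sqrt{N}\ \Big[ \lambda_0\, \frac{\langle \gamma_n^{0,N} - \gamma_n^0, 1 \rangle}{\langle \gamma_n^0, 1 \rangle} + \lambda\, \frac{\langle \gamma_n^N - \gamma_n, 1 \rangle}{\langle \gamma_n, 1 \rangle} \Big] \Longrightarrow \mathcal{N}\Big(0, V_n\Big( \lambda_0 + \lambda\, \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (1 \otimes e) \Big)\Big),$$
in distribution as $N \uparrow \infty$.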
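Remark 5.4 (CLT for normalized distributions). It follows from the decomposition
$$\langle \mu_n^{0,N} - \mu_n^0, \phi_0 \rangle + \langle \mu_n^N - \mu_n, \phi \rangle = \Big\langle \frac{\gamma_n^{e,N} - \gamma_n^e}{\langle \gamma_n^e, 1 \rangle},\ \frac{\langle \gamma_n^e, 1 \rangle}{\langle \gamma_n^{e,N}, 1 \rangle}\, (\phi_0 - \langle \mu_n^0, \phi_0 \rangle) \otimes e_0 + \frac{\langle \gamma_n^e, 1 \otimes e \rangle}{\langle \gamma_n^{e,N}, 1 \otimes e \rangle}\, \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (\phi - \langle \mu_n, \phi \rangle) \otimes e \Big\rangle,$$
from Theorem 5.2 and from the Slutsky lemma, that
$$\sqrt{N}\ \big[ \langle \mu_n^{0,N} - \mu_n^0, \phi_0 \rangle + \langle \mu_n^N - \mu_n, \phi \rangle \big] \Longrightarrow \mathcal{N}\Big(0, V_n\Big( (\phi_0 - \langle \mu_n^0, \phi_0 \rangle) \otimes e_0 + \frac{\langle \gamma_n^0, 1 \rangle}{\langle \gamma_n, 1 \rangle}\, (\phi - \langle \mu_n, \phi \rangle) \otimes e \Big)\Big),$$
in distribution as $N \uparrow \infty$.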
As a consequence, it is worth finding an explicit expression, in terms of Feynman–Kac distributions defined on the set $E$, of the asymptotic variance $V_n(F)$ for functions of the form $F = \phi_0 \otimes e_0 + \phi \otimes e$ defined on the product set $E \times [0, \infty)$. Let
$$R_{k+1:n}^0 \phi_0(x) = R_{k+1}^0 \cdots R_n^0 \phi_0(x) \quad\text{and}\quad R_{k+1:n} \phi(x) = R_{k+1} \cdots R_n \phi(x),$$
for any $k = 0, 1, \cdots, n$, with $R_{n+1:n}^0 \phi_0(x) = \phi_0(x)$ and $R_{n+1:n} \phi(x) = \phi(x)$ by convention. Then
var(W0eRe1:n(φ0⊗e0+γ
0
n,1
γn,1 φ⊗e), p
e 0)
pe
0, W0eRe1:n12
=var(W
red
0 R01:nφ0, p0) p0, Wred
0 R01:n12
+cov(W
red
0 R01:nφ0, W0R1:nφ, p0) p0, Wred
0 R01:n1 p0, W0R1:n1
+var(W0R1:nφ, p0)
p0, W0R1:n12
and
var(Wke(Rke+1:n(φ0⊗e0+
γ0
n,1
γn,1 φ⊗e))◦π, µ
e
k−1⊗Pke) µe
k−1⊗Pke, Wke(Rke+1:n1)◦π2
=var(W
red
k (R0k+1:nφ0)◦π, µ0k−1⊗Pk) µ0
k−1, R0k:n12
+ 2cov(W
red
k (R0k+1:nφ0)◦π, Wk(Rk+1:nφ)◦π, µk−1⊗Pk) µ0
k−1, Rk0:n1 µk−1, Rk:n1
+ 2µk−1, R
0
k:nφ0 − µ0k−1, R0k:nφ0 µ0
k−1, R0k:n1
+γ
k−1,1 γk0−1,1 γk−1,12
var(Wk(Rk+1:nφ)◦π, µk−1⊗Pk) µk−1, Rk:n12
+ [γ
k−1,1 γk0−1,1 γk−1,12
µk−1, Rk:nφ2 µk−1, Rk:n12 −
µk−1, Rk:nφ2 µk−1, Rk:n12
],
for anyk= 1,· · ·, n.
Proposition 5.5. In particular for the normalizing constant
√
N γ N
n −γn,1
γn,1 =⇒N(0, Vn),
in distribution as N↑ ∞, with asymptotic variance
Vn = var(W0R1:nφ, p0)
p0, W0R1:n12 +
n
k=1
γk−1,1 γk0−1,1
γk−1,12
var(Wk (Rk+1:n1)◦π, µk−1⊗Pk) µk−1, Rk:n12
+
n
k=1
[γ
k−1,1 γk0−1,1 γk−1,12
µk−1, Rk:n12 µk−1, Rk:n12 −1 ].
Remark 5.6. It is clear that the expression for the asymptotic variance given here is the same as the expression given in Proposition 3.5.
6. Conclusion
Even though the two different approaches used here to obtain a central limit theorem for the proposed particle approximation do actually provide the same explicit expression for the asymptotic variance, they differ in the following aspects.
In the first approach, based on a representation in path–space, the importance weight functions appear only in the test function, and it is therefore easy to analyze the joint particle approximation of unnormalized distributions (and normalizing constants and normalized distributions, as a by–product) for the reference model and for several alternate models at the same time, just by choosing the appropriate test function in the central limit theorem. In view of (7), the idea is simply to use one test function per alternate model, i.e.
$$\frac{\langle \gamma_n^{a,N} - \gamma_n^a, \phi^a \rangle}{\langle \gamma_n^a, 1 \rangle} = \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n^a, 1 \rangle}\ \frac{\langle \gamma_n^{\bullet,N} - \gamma_n^\bullet, T_n^a \phi^a \rangle}{\langle \gamma_n^\bullet, 1 \rangle},$$
hence
$$\sqrt{N} \sum_{a \in A} \frac{\langle \gamma_n^{a,N} - \gamma_n^a, \phi^a \rangle}{\langle \gamma_n^a, 1 \rangle} \Longrightarrow \mathcal{N}\Big(0, V_n^\bullet\Big( \sum_{a \in A} \frac{\langle \gamma_n^\bullet, 1 \rangle}{\langle \gamma_n^a, 1 \rangle}\, T_n^a \phi^a \Big)\Big),$$
in distribution as $N \uparrow \infty$, with a correlation structure reflected in the asymptotic covariance matrix, easily obtained by polarization. In other words, this first approach seems appropriate to analyze particle approximations in statistical models depending on some parameter, in sensitivity analysis, etc.
In the second approach, based on a representation in terms of a multiplicative functional, the importance weight functions appear in the extended Markov model, and it would be necessary to change the Markov model to analyze the joint particle approximation of unnormalized distributions for the reference model and several alternate models at the same time. On the other hand, it would be easy with this Markov interpretation to analyze particle approximation with adaptive resampling schemes, where the decision to use resampling weights only vs. importance weights only is made dependent on an empirical criterion (effective number of particles, entropy of the sample, etc.) evaluated using the current particle approximation. The idea would be simply, as a generalization of (3), to introduce a factorization depending on the current normalized or unnormalized distribution on the product setE×[0,∞) : this would result in a representation in terms of a McKean model, and the associated particle approximation could be easily analyzed [5].
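The empirical criteria mentioned above can be evaluated directly from the current weights. The following Python sketch computes the effective number of particles and the entropy of a weighted sample, under the usual definitions $N_{\mathrm{eff}} = 1 / \sum_i (w^i)^2$ and $-\sum_i w^i \log w^i$ for normalized weights. The function names and the threshold rule in `should_resample` (resample when $N_{\mathrm{eff}}$ falls below a fraction of $N$) are assumptions chosen for illustration; the paper does not prescribe a specific rule.

```python
import math


def effective_sample_size(weights):
    """Effective number of particles, (sum w)^2 / sum w^2, for nonnegative weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def weight_entropy(weights):
    """Entropy of the normalized weights, with the convention 0 log 0 = 0."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def should_resample(weights, threshold_fraction=0.5):
    """Hypothetical decision rule: resample when the ESS drops below a fraction of N."""
    return effective_sample_size(weights) < threshold_fraction * len(weights)
```

Uniform weights give the largest possible values, $N$ for the effective number of particles and $\log N$ for the entropy, and both criteria decrease as the weights degenerate onto a few particles.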
References
[1] Olivier Cappé, Randal Douc, Éric Moulines, and Christian P. Robert. On the convergence of the Monte Carlo maximum likelihood method for latent variable models. Scandinavian Journal of Statistics, 29(4):615–635, December 2002.
[2] Olivier Cappé, Éric Moulines, and Tobias Rydén. Inference in Hidden Markov Models. Springer Series in Statistics. Springer–Verlag, New York, 2005.
[3] Natacha Caylus, Arnaud Guyader, François Le Gland, and Nadia Oudjane. Application du filtrage particulaire à l'inférence statistique des HMM. In Actes des 36èmes Journées de Statistique, Montpellier. SFdS, May 2004.
[4] Frédéric Cérou, François Le Gland, and Nigel J. Newton. Stochastic particle methods for linear tangent filtering equations. In José-Luis Menaldi, Edmundo Rofman, and Agnès Sulem, editors, Optimal Control and Partial Differential Equations. In honour of professor Alain Bensoussan's 60th birthday, pages 231–240. IOS Press, Amsterdam, 2001.
[5] Pierre Del Moral. Feynman–Kac Formulae. Genealogical and Interacting Particle Systems with Applications. Probability and its Applications. Springer–Verlag, New York, 2004.
[6] Arnaud Doucet and Vladislav B. Tadić. Parameter estimation in general state–space models using particle methods. Annals of the Institute of Statistical Mathematics, 55(2):409–422, June 2003.
[7] Charles J. Geyer. On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56(1):261–274, 1994.
[8] Charles J. Geyer. Estimation and optimization of functions. In Walter R. Gilks, Sylvia Richardson, and David J. Spiegelhalter, editors, Markov Chain Monte Carlo in Practice, chapter 14, pages 241–258. Chapman & Hall, London, 1996.
[9] Arnaud Guyader, François Le Gland, and Nadia Oudjane. A particle implementation of the recursive MLE for partially observed diffusions. In Proceedings of the 13th Symposium on System Identification (SYSID), Rotterdam, pages 1305–1310. IFAC / IFORS, August 2003.
[10] Jun S. Liu. Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer–Verlag, New York, 2001.
[11] Michael K. Pitt and Neil Shephard. Filtering via simulations: auxiliary particle filter. Journal of the American Statistical