2.2 Interpretation as Importance Sampling
2.2.3 Importance Weights
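For the moment, our aim is to approximate $\bar\gamma_t$ via IS using the proposal distribution $\bar\psi_t$. The Rao–Blackwellisation described in the next subsection then leads to the usual SMC approximation of $\gamma_t$. We must therefore ensure that the Radon–Nikodým derivative

$$
\bar w_t(\bar x_t) := \frac{\mathrm{d}\bar\gamma_t}{\mathrm{d}\bar\psi_t}(\bar x_t)
= \mathbb{1}_{\{u_{1:t}\}}\bigl(x_{1:t}^{b_{1:t}}\bigr)\,
\frac{\gamma_1(\mathrm{d}u_1)\,\Lambda_1(u_1,\{b_1\})}{\mu_{t|t}(z_{1:t},\{b_t\})\,q_1^{\mathrm{m}}(b_1,\mathrm{d}u_1)}
\prod_{s=2}^{t}
\frac{\Lambda_s\bigl((u_{1:s},z_{1:s-1},o_{s-1},b_{s-1}),\{b_s\}\bigr)\,\mathbb{1}_{\{b_{s-1}\}}\bigl(a_{s-1}^{b_s}\bigr)}{R_{s-1}^{\mathrm{m}}\bigl((z_{1:s-1},o_{s-1},b_s),\{b_{s-1}\}\bigr)}
\,\frac{\Gamma_s(u_{1:s-1},\mathrm{d}u_s)}{Q_s^{\mathrm{m}}\bigl((z_{1:s-1},o_{s-1},a_{s-1},b_s),\mathrm{d}u_s\bigr)}
\tag{2.5}
$$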
is well defined. Here, we have slightly abused notation by writing Radon–Nikodým derivatives as $\mu(\mathrm{d}x)/\nu(\mathrm{d}x) := [\mathrm{d}\mu/\mathrm{d}\nu](x)$ for any two measures $\mu$ and $\nu$, and we have defined the following quantities.

$\Gamma_s \in \mathcal{K}(\mathsf{X}_{1:s-1}, \mathsf{X}_s)$ is a kernel which extends the Step-$(s-1)$ target measure to the Step-$s$ target measure, i.e. it satisfies $\gamma_{s-1} \otimes \Gamma_s = \gamma_s$.

$R_{s-1}^{\mathrm{m}}((z_{1:s-1}, o_{s-1}, n), \cdot\,)$ denotes the 'marginal' distribution of the $n$th parent index under $R_{s-1}((z_{1:s-1}, o_{s-1}), \cdot\,)$.

$Q_s^{\mathrm{m}}((z_{1:s-1}, o_{s-1}, a_{s-1}, n), \cdot\,)$ denotes the 'marginal' distribution of the $n$th particle under $Q_s((z_{1:s-1}, o_{s-1}, a_{s-1}), \cdot\,)$. The marginal Step-1 proposal distribution $q_1^{\mathrm{m}}(b_1, \cdot\,)$ is similarly defined.
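The marginal kernel $R_{s-1}^{\mathrm{m}}$ can be made concrete with systematic resampling, a standard non-exchangeable scheme. We chose it here for illustration; the text does not single it out. The Python sketch below computes the marginal law of the $n$th parent index exactly and also draws all parent indices jointly so the two can be compared. Function names are hypothetical and indices are 0-based.

```python
import random


def systematic_marginals(weights, n_off):
    """Exact marginal parent-index law R[n][m] = P(A^n = m) under systematic resampling.

    weights: normalised parent weights W^0, ..., W^{M-1} (0-based indices).
    n_off:   number N of offspring.
    Offspring n picks parent m iff (n + U) / N lies in [cum[m], cum[m + 1]),
    with U ~ Unif[0, 1), so P(A^n = m) is the length of
    [N * cum[m] - n, N * cum[m + 1] - n) intersected with [0, 1).
    """
    cum = [0.0]
    for w in weights:
        cum.append(cum[-1] + w)
    marg = []
    for n in range(n_off):
        row = []
        for m in range(len(weights)):
            lo = max(0.0, n_off * cum[m] - n)
            hi = min(1.0, n_off * cum[m + 1] - n)
            row.append(max(0.0, hi - lo))
        marg.append(row)
    return marg


def systematic_resample(weights, n_off, rng):
    """Draw all N parent indices jointly, using a single uniform U."""
    u = rng.random()
    m, cum, ancestors = 0, weights[0], []
    for n in range(n_off):
        p = (n + u) / n_off
        while p >= cum and m < len(weights) - 1:  # guard against round-off
            m += 1
            cum += weights[m]
        ancestors.append(m)
    return ancestors


R = systematic_marginals([0.1, 0.2, 0.3, 0.4], 4)
print(R[0], R[3])  # the first and last offspring have different parent laws
```

With these weights, the first offspring can only descend from parent 0 or 1, and the last always descends from parent 3. So $R_{s-1}^{\mathrm{m}}$ really does depend on $n$. The column sums still equal $N W^m$, which is the unbiasedness property.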
2.8 Remark. The kernels $R_{s-1}^{\mathrm{m}}$ and $Q_s^{\mathrm{m}}$ induce marginal distributions only in the sense that they do not condition on the other parent indices or particles generated at Step $s$. They generally still depend on all the auxiliary variables, parent indices and particles sampled at previous steps. Indeed, this is why the 'conditional' SMC kernel does not represent a (full) conditional distribution under the distribution induced by the SMC algorithm, as pointed out in Remark 2.7 (see also Remark 1.16).

We will comment on particular choices for the kernels and measures guaranteeing the existence of the above importance weight in Section 2.3.
Distribution Over Particle Indices. The kernels $\Lambda_s$ introduced in the previous subsection need to be chosen carefully to preserve absolute continuity, especially if a non-exchangeable resampling scheme is used. A generally applicable choice is considered in Lee, Murray and Johansen (in prep.) and is also implicitly used by most SMC algorithms. It lets $\Lambda_s$ be a time-reversal kernel, under $R_{s-1}^{\mathrm{m}}$, of some stochastic kernels $\lambda_s \in \mathcal{K}_1(\mathsf{X}_{1:s}, \mathsf{K}_s)$ (which define a distribution over $B_t$), as defined in Assumption 2.9.
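2.9 Assumption. $\Lambda_1 := \lambda_1$ and, for $s > 1$,

$$
\Lambda_s\bigl((u_{1:s}, z_{1:s-1}, o_{s-1}, b_{s-1}), \{b_s\}\bigr)
= \frac{R_{s-1}^{\mathrm{m}}\bigl((z_{1:s-1}, o_{s-1}, b_s), \{b_{s-1}\}\bigr)\,\lambda_s(u_{1:s}, \{b_s\})}{\sum_{n=1}^{N_s} \lambda_s(u_{1:s}, \{n\})\,R_{s-1}^{\mathrm{m}}\bigl((z_{1:s-1}, o_{s-1}, n), \{b_{s-1}\}\bigr)}.
\tag{2.6}
$$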
It often suffices to let $\lambda_s(u_{1:s}, \cdot\,) := \mathrm{Unif}(\mathsf{K}_s)$. However, more complex kernels are sometimes needed to ensure absolute continuity in Equation 2.5. For instance, a more complex kernel $\lambda_s$ is needed in the discrete particle filter summarised in Subsection 2.3.4.
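The following Python sketch transcribes Equation (2.6) directly for a single step. It assumes that $\lambda_s$ and $R_{s-1}^{\mathrm{m}}$ are given as finite arrays with 0-based indices, and the function name is hypothetical. The example input is the matrix of marginal parent-index laws of systematic resampling with $W = (0.1, 0.2, 0.3, 0.4)$ and $N_s = 4$. The result satisfies the defining time-reversal identity $\lambda_s(\{b\})\,R_{s-1}^{\mathrm{m}}(b, \{a\}) = \nu(a)\,\Lambda_s(a, \{b\})$, where $\nu(a)$ is the denominator of (2.6).

```python
def time_reversal_kernel(lam, R):
    """Equation (2.6) for one step s.

    lam[b]  = lambda_s(u_{1:s}, {b}) for offspring index b in K_s.
    R[b][a] = R^m_{s-1}((z_{1:s-1}, o_{s-1}, b), {a}), the marginal parent law.
    Returns Lam[a][b] = Lambda_s((..., a), {b}), i.e. the law of B_s given
    B_{s-1} = a when B_s ~ lam and B_{s-1} | B_s = b ~ R[b]. Parents that are
    never selected (zero denominator) get None instead of a row.
    """
    n_off, n_par = len(R), len(R[0])
    Lam = []
    for a in range(n_par):
        denom = sum(lam[n] * R[n][a] for n in range(n_off))  # denominator of (2.6)
        if denom == 0.0:
            Lam.append(None)
        else:
            Lam.append([R[b][a] * lam[b] / denom for b in range(n_off)])
    return Lam


# Marginal parent laws of systematic resampling, W = (0.1, 0.2, 0.3, 0.4), N_s = 4.
R = [[0.4, 0.6, 0.0, 0.0],
     [0.0, 0.2, 0.8, 0.0],
     [0.0, 0.0, 0.4, 0.6],
     [0.0, 0.0, 0.0, 1.0]]
Lam = time_reversal_kernel([0.25] * 4, R)  # lambda_s = Unif(K_s)
print(Lam[3])
```

Even though the resampling scheme is not exchangeable, each row of `Lam` is a probability distribution over $\mathsf{K}_s$. Only sums over $R_{s-1}^{\mathrm{m}}$ enter the denominator.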
The main advantage of the time-reversal kernel is that the importance weight in Equation 2.5 depends on the resampling distribution only through the denominator in Equation 2.6. Hence, it is usually not necessary to require the resampling scheme to be exchangeable, even if we cannot evaluate the distribution implied by $R_{s-1}^{\mathrm{m}}$. For instance, if we use an unbiased resampling scheme and if $\lambda_s(u_{1:s}, \cdot\,) := \mathrm{Unif}(\mathsf{K}_s)$, then $R_{s-1}^{\mathrm{m}}$ drops out of the importance weights from Equation 2.5 because

$$
\frac{\Lambda_s\bigl((u_{1:s}, z_{1:s-1}, o_{s-1}, b_{s-1}), \{b_s\}\bigr)}{R_{s-1}^{\mathrm{m}}\bigl((z_{1:s-1}, o_{s-1}, b_s), \{b_{s-1}\}\bigr)}
= \begin{cases}
1\big/\bigl(N_s\,W_{s-1}^{b_{s-1}}(z_{1:s-1})\bigr), & \text{if we resample at Step } s,\\
\mathbb{1}_{\{b_{s-1}\}}(b_s), & \text{otherwise.}
\end{cases}
$$
However, note that sampling according to $\Lambda_s$ (which depends on $R_{s-1}^{\mathrm{m}}$) and $R_{s-1}^{\mathrm{c}}$ is still required when sampling from the CSMC kernel. For various common resampling schemes, $R_{s-1}^{\mathrm{m}}$ and $R_{s-1}^{\mathrm{c}}$ are derived in Lee, Murray and Johansen (in prep.). For completeness, they are also stated in Appendix A of this work.
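The drop-out of $R_{s-1}^{\mathrm{m}}$ under unbiased resampling and uniform $\lambda_s$ can be checked exactly. The sketch below uses residual resampling without a final random permutation. This is our own choice of scheme: it is unbiased, but the marginal parent law depends on the offspring position. We also assume that $W_{s-1}^{m}$ are normalised weights summing to one, so unbiasedness means that parent $m$ has $N_s W_{s-1}^m$ offspring in expectation. Under these assumptions the ratio should equal $1/(N_s W_{s-1}^{b_{s-1}})$ wherever $R_{s-1}^{\mathrm{m}} > 0$. Function names are hypothetical, and exact rational arithmetic avoids round-off.

```python
from fractions import Fraction
from math import floor


def residual_marginals(weights, n_off):
    """R[n][m] = P(A^n = m) under residual resampling *without* a final shuffle.

    The first sum_m floor(N W^m) offspring copy their parents deterministically
    (in parent order). Each remaining offspring draws its parent independently
    from the residual weights. The scheme is unbiased but non-exchangeable.
    """
    n_par = len(weights)
    counts = [floor(n_off * w) for w in weights]
    rows = []
    for m, c in enumerate(counts):
        rows += [[Fraction(int(k == m)) for k in range(n_par)] for _ in range(c)]
    n_res = n_off - sum(counts)
    if n_res > 0:
        resid = [(n_off * w - c) / n_res for w, c in zip(weights, counts)]
        rows += [list(resid) for _ in range(n_res)]
    return rows


def dropout_ratio(R, a, b):
    """Lambda_s((..., a), {b}) / R^m((..., b), {a}) with lambda_s uniform on K_s."""
    lam = Fraction(1, len(R))
    denom = sum(lam * row[a] for row in R)  # denominator of (2.6)
    Lam = R[b][a] * lam / denom             # time-reversal kernel (2.6)
    return Lam / R[b][a]


W = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)]
N = 5
R = residual_marginals(W, N)
for b, row in enumerate(R):
    for a, r in enumerate(row):
        if r > 0:  # only (parent, offspring) pairs the proposal can produce
            assert dropout_ratio(R, a, b) == 1 / (N * W[a])
```

The ratio depends on $b_{s-1}$ only through its weight, and never on the position-dependent values of $R_{s-1}^{\mathrm{m}}$ themselves.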
2.2.4 Rao–Blackwellisation
Let $\bar\gamma_t^{\mathrm{is},1} := \bar w_t(\bar X_t)\,\delta_{\bar X_t}$ be an IS approximation of the extended target measure $\bar\gamma_t$ based on a single sample $\bar X_t = (U_{1:t}, B_{1:t}, Z_{1:t}) \sim \bar\psi_t$. Of course, we are only interested in approximating the marginal $\gamma_t$ of $\bar\gamma_t$. The usual SMC approximation of this marginal measure, $\gamma_t^{\mathrm{smc},N_{1:t}}$, can be obtained by Rao–Blackwellising $\bar\gamma_t^{\mathrm{is},1}$ as described in Lee, Murray and Johansen (in prep.).
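More precisely, note that

$$
w_t^{b_{1:t}}(Z_{1:t}) := \mathbb{E}\bigl[\bar w_t(\bar X_t)\,\mathbb{1}_{\{b_{1:t}\}}(B_{1:t}) \bigm| Z_{1:t}\bigr]
$$

is non-zero only if $b_{1:t}$ coincides with a particle lineage under the SMC algorithm, i.e. if $b_{1:t} = B_{1:t|t}^n$ for some $n \in \mathsf{K}_t$. We can therefore identify $N_t$ (unnormalised) Step-$t$ particle weights, for $n \in \mathsf{K}_t$, as

$$
w_t^n(z_{1:t}) := w_t^{b_{1:t|t}^n}(z_{1:t}).
$$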
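The lineages $B_{1:t|t}^n$ used above are obtained by tracing parent indices backwards from the final particle. Here is a minimal Python sketch. It assumes the parent indices are stored per step as 0-based lists, a layout we chose for illustration, and the names are hypothetical.

```python
def lineage(ancestors, n):
    """Indices b_{1:t} = B^n_{1:t|t} of the ancestral line of final particle n.

    ancestors[s][k] is the parent index a_{s-1}^k, i.e. the index at step s-1 of
    the parent of particle k at step s (0-based steps; ancestors[0] is unused).
    """
    t = len(ancestors)
    b = [0] * t
    b[-1] = n
    for s in range(t - 1, 0, -1):
        b[s - 1] = ancestors[s][b[s]]  # B^n_{s-1|t} = a_{s-1}^{B^n_{s|t}}
    return b


def path(particles, b):
    """The particle path x_{1:t}^{b_{1:t}} = (x_1^{b_1}, ..., x_t^{b_t})."""
    return [particles[s][b[s]] for s in range(len(b))]


particles = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
ancestors = [None, [0, 0, 2], [1, 2, 2]]
lineages = [lineage(ancestors, n) for n in range(3)]
# Only these N_t index sequences (out of 3**3) can carry a non-zero weight w_t^{b_{1:t}}.
```

Each final index $n$ determines exactly one lineage. This is why the $N_t$ weights $w_t^n$ can be indexed by $n \in \mathsf{K}_t$ alone.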
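For any $A \in \mathcal{B}(\mathsf{X}_{1:t})$, a MOSIS approximation of $\gamma_t(A)$ is thus given by

$$
\gamma_t^{\mathrm{mosis},N_{1:t}}(A)
= \mathbb{E}\bigl[\bar\gamma_t^{\mathrm{is},1}(A \times \bar{\mathsf{Z}}_t) \bigm| Z_{1:t}\bigr]
= \sum_{b_{1:t} \in \mathsf{K}_{1:t}} w_t^{b_{1:t}}(Z_{1:t})\,\delta_{X_{1:t}^{b_{1:t}}}(A)
= \sum_{n=1}^{N_t} w_t^n(Z_{1:t})\,\delta_{X_{1:t}^{B_{1:t|t}^n}}(A)
= \gamma_t^{\mathrm{smc},N_{1:t}}(A).
$$

The above construction immediately implies that the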
SMC estimate of the normalising constant, $\hat{\mathcal{Z}}_t^{\mathrm{smc},N_{1:t}} := \gamma_t^{\mathrm{smc},N_{1:t}}(\mathbf{1}) = \mathbb{E}\bigl[\bar\gamma_t^{\mathrm{is},1}(\mathbf{1}) \bigm| Z_{1:t}\bigr]$, is a Rao–Blackwellised (one-sample) IS estimate and is therefore unbiased. Nonetheless, we stress again that unbiasedness alone does not ensure estimates that are useful in practice, i.e. estimates whose error can be controlled. Conditions under which this is guaranteed are summarised in Subsection 2.2.5. Finally, recall that $\bar w_t = \mathrm{d}\bar\gamma_t/\mathrm{d}\bar\psi_t$. For later reference, we state the following slight generalisation of Andrieu et al. (2010, Theorem 2), which is really just a special case of Proposition 1.13.
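2.10 Proposition. Assume that $\mu_{t|t}(z_{1:t}, \{n\}) = W_t^n(z_{1:t})$ for any $(n, z_{1:t}) \in \mathsf{K}_t \times \mathsf{Z}_{1:t}$. Then $\bar w_t(\bar x_t) = \hat{\mathcal{Z}}_t^{\mathrm{smc},N_{1:t}}$ for any $\bar x_t \in \bar{\mathsf{X}}_t$.

Proof. This follows immediately from the definition of $w_t^n(z_{1:t})$.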
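Proposition 2.10 can be checked numerically. The sketch below makes several choices of our own, none of them prescribed by the text:

- a hypothetical Gaussian random-walk model with potentials $G_s(x_s) = \exp(c\,x_s)$;
- bootstrap proposals;
- multinomial resampling before every step;
- uniform $\lambda_s$;
- $\mu_{t|t}(\{n\}) = W_t^n$.

Under multinomial resampling $R_{s-1}^{\mathrm{m}}((\cdot, b), \{a\}) = W_{s-1}^a$ for every $b$. With normalised weights, each ratio $\Lambda_s / R_{s-1}^{\mathrm{m}}$ in (2.5) is therefore $1/(N W_{s-1}^{b_{s-1}})$, and the weight of every lineage should collapse to $\hat{\mathcal{Z}}_t^{\mathrm{smc},N_{1:t}}$. In this model $\gamma_t(\mathbf{1}) = \exp(c^2 \sum_{j \le t} j^2 / 2)$, so unbiasedness can also be illustrated. All names are hypothetical.

```python
import math
import random


def bootstrap_pf(t, n, c, rng):
    """Bootstrap PF with multinomial resampling before every step s > 1.

    Hypothetical toy model: x_1 ~ N(0, 1), x_s = x_{s-1} + N(0, 1) and potentials
    G_s(x_s) = exp(c * x_s), so gamma_t(1) = exp(c^2 * sum_{j<=t} j^2 / 2).
    Returns parent indices (ancestors[0] unused) and incremental weights G[s][k].
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ancestors, G = [None], [[math.exp(c * xk) for xk in x]]
    for _ in range(1, t):
        a = rng.choices(range(n), weights=G[-1], k=n)  # multinomial resampling
        x = [x[a[k]] + rng.gauss(0.0, 1.0) for k in range(n)]
        ancestors.append(a)
        G.append([math.exp(c * xk) for xk in x])
    return ancestors, G


def z_hat(G):
    """SMC normalising-constant estimate: product over steps of the mean of G_s."""
    return math.prod(sum(g) / len(g) for g in G)


def lineage_weight(ancestors, G, n_final):
    """Weight (2.5) of the lineage ending in n_final, assuming uniform lambda_s,
    multinomial resampling at every step and mu_{t|t}({n}) = W_t^n."""
    t, n = len(G), len(G[0])
    W = [[g / sum(Gs) for g in Gs] for Gs in G]  # normalised weights
    b = [0] * t
    b[-1] = n_final
    for s in range(t - 1, 0, -1):
        b[s - 1] = ancestors[s][b[s]]            # trace the lineage backwards
    w = G[0][b[0]] / n                           # Lambda_1 = 1/N times gamma_1/q_1^m
    for s in range(1, t):
        w *= G[s][b[s]] / (n * W[s - 1][b[s - 1]])  # Lambda_s / R^m times Gamma_s / Q_s^m
    return w / W[-1][b[-1]]                      # divide by mu_{t|t}({b_t})


rng = random.Random(0)
ancestors, G = bootstrap_pf(t=3, n=200, c=0.3, rng=rng)
print(z_hat(G), [lineage_weight(ancestors, G, k) for k in (0, 1, 2)])
```

Every lineage gets the same weight because each factor $G_s^{b_s}/W_s^{b_s}$ telescopes into $\sum_n G_s^n$. This is the mechanism behind the one-line proof above.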