Importance Weights - Interpretation as Importance Sampling

2.2 Interpretation as Importance Sampling

2.2.3 Importance Weights

For the moment, our aim is to approximate $\bar{\gamma}_t$ via IS using the proposal distribution $\bar{\psi}_t$. The Rao–Blackwellisation described in the next subsection then leads to the usual SMC approximation of $\gamma_t$. We must therefore ensure that the Radon–Nikodým derivative

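$$
\bar{w}_t(\bar{x}_t) := \frac{d\bar{\gamma}_t}{d\bar{\psi}_t}(\bar{x}_t)
= 1_{\{u_{1:t}\}}\bigl(x_{1:t}^{b_{1:t}}\bigr)\,
\frac{\gamma_1(du_1)\,\xi_1(u_1, \{b_1\})}{\rho_{t|t}(z_{1:t}, \{b_t\})\,q_1^m(b_1, du_1)}
\prod_{s=2}^{t}
\frac{\xi_s\bigl((u_{1:s}, z_{1:s-1}, o_{s-1}, b_{s-1}), \{b_s\}\bigr)\,1_{\{b_{s-1}\}}\bigl(a_{s-1}^{b_s}\bigr)}{R_{s-1}^m\bigl((z_{1:s-1}, o_{s-1}, b_s), \{b_{s-1}\}\bigr)}\,
\frac{\kappa_s(u_{1:s-1}, du_s)}{Q_s^m\bigl((z_{1:s-1}, o_{s-1}, a_{s-1}, b_s), du_s\bigr)}
\tag{2.5}
$$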

is well defined. Here, we have slightly abused notation by writing Radon–Nikodým derivatives as $\mu(dx)/\nu(dx) := [d\mu/d\nu](x)$, for any two measures $\mu$ and $\nu$, and where we have defined the following quantities.

$\kappa_s \in K(X_{1:s-1}, X_s)$ is a kernel which extends the Step-$(s-1)$ target measure to the Step-$s$ target measure, i.e. it satisfies $\gamma_{s-1} \otimes \kappa_s = \gamma_s$.

$R_{s-1}^m((z_{1:s-1}, o_{s-1}, n), \cdot)$ denotes the ‘marginal’ distribution of the $n$th parent index under $R_{s-1}((z_{1:s-1}, o_{s-1}), \cdot)$.

$Q_s^m((z_{1:s-1}, o_{s-1}, a_{s-1}, n), \cdot)$ denotes the ‘marginal’ distribution of the $n$th particle under $Q_s((z_{1:s-1}, o_{s-1}, a_{s-1}), \cdot)$. The marginal Step-1 proposal distribution $q_1^m(b_1, \cdot)$ is similarly defined.

2.8 Remark. The kernels $R_{s-1}^m$ and $Q_s^m$ induce marginal distributions only in the sense that they do not condition on the other parent indices or particles generated at Step $s$. They generally still depend on all the auxiliary variables, parent indices and particles sampled at previous steps. Indeed, this is why the ‘conditional’ SMC kernel does not represent a (full) conditional distribution under the distribution induced by the SMC algorithm, as pointed out in Remark 2.7 (see also Remark 1.16).

We will comment on particular choices for the kernels and measures guaranteeing the existence of the above importance weight in Section 2.3.

Distribution Over Particle Indices. The kernels $\xi_s$ introduced in the previous subsection need to be chosen carefully to preserve absolute continuity, especially if a non-exchangeable resampling scheme is used. A generally applicable choice considered in Lee, Murray and Johansen (in prep.), which is also implicitly used by most SMC algorithms, is to let $\xi_s$ be a time-reversal kernel of some stochastic kernel $\rho_s \in K_1(X_{1:s}, K_s)$ (which defines a distribution over $B_t$) under $R_{s-1}^m$, as defined in Assumption 2.9.

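2.9 Assumption. $\xi_1 := \rho_1$ and, for $s > 1$,

$$
\xi_s\bigl((u_{1:s}, z_{1:s-1}, o_{s-1}, a_{s-1}^{b_s}), \{b_s\}\bigr)
= \frac{R_{s-1}^m\bigl((z_{1:s-1}, o_{s-1}, b_s), \{a_{s-1}^{b_s}\}\bigr)\,\rho_s(u_{1:s}, \{b_s\})}{\sum_{n=1}^{N_s} \rho_s(u_{1:s}, \{n\})\,R_{s-1}^m\bigl((z_{1:s-1}, o_{s-1}, n), \{a_{s-1}^{n}\}\bigr)}.
\tag{2.6}
$$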

It often suffices to let $\rho_s(u_{1:s}, \cdot) = \mathrm{Unif}_{K_s}$. However, more complex kernels are sometimes needed to ensure absolute continuity in Equation 2.5. For instance, a more complex kernel $\rho_s$ is needed in the discrete particle filter summarised in Subsection 2.3.4.

The main advantage of the time-reversal kernel is that the importance weight in Equation 2.5 depends on the resampling distribution only through the denominator in Equation 2.6. Hence, it is usually not necessary to require the resampling scheme to be exchangeable – even if we cannot evaluate the distribution implied by $R_{s-1}^m$. For instance, if we use an unbiased resampling scheme and if $\rho_s(u_{1:s}, \cdot) := \mathrm{Unif}_{K_s}$, then $R_{s-1}^m$ drops out in the importance weights from Equation 2.5 because

$$
\frac{\xi_s\bigl((u_{1:s}, z_{1:s-1}, o_{s-1}, b_{s-1}), \{b_s\}\bigr)}{R_{s-1}^m\bigl((z_{1:s-1}, o_{s-1}, b_s), \{b_{s-1}\}\bigr)}
=
\begin{cases}
1 / W_{s-1}^{b_{s-1}}(z_{1:s-1}), & \text{if we resample at Step } s,\\
1_{\{b_{s-1}\}}(b_s), & \text{otherwise.}
\end{cases}
$$

However, note that sampling according to $\xi_s$ (which depends on $R_{s-1}^m$) and $R_{s-1}^c$ is still required when sampling from the CSMC kernel. For various common resampling schemes, $R_{s-1}^m$ and $R_{s-1}^c$ are derived in Lee, Murray and Johansen (in prep.) and, for completeness, they are also stated in Appendix A of this work.


2.2.4 Rao–Blackwellisation

Let $\bar{\gamma}_t^{\mathrm{IS},1} := \bar{w}_t(\bar{X}_t)\,\delta_{\bar{X}_t}$ be an IS approximation of the extended target measure $\bar{\gamma}_t$ based on a single sample $\bar{X}_t = (U_{1:t}, B_{1:t}, Z_{1:t}) \sim \bar{\psi}_t$. Of course, we are only interested in approximating the marginal $\gamma_t$ of $\bar{\gamma}_t$. The usual SMC approximation of this marginal measure, $\gamma_t^{\mathrm{SMC},N_{1:t}}$, can be obtained by Rao–Blackwellising $\bar{\gamma}_t^{\mathrm{IS},1}$ as described in Lee, Murray and Johansen (in prep.).

More precisely, note that

$$
w_t^{b_{1:t}}(Z_{1:t}) := E\bigl[\bar{w}_t(\bar{X}_t)\,1_{\{b_{1:t}\}}(B_{1:t}) \bigm| Z_{1:t}\bigr]
$$

is non-zero only if $b_{1:t}$ coincides with a particle lineage under the SMC algorithm, i.e. if $b_{1:t} = B_{1:t|t}^n$, for some $n \in K_t$. We can therefore identify $N_t$ (unnormalised) Step-$t$ particle weights, for $n \in K_t$, as

$$
w_t^n(z_{1:t}) := w_t^{b_{1:t|t}^n}(z_{1:t}).
$$

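For any $A \in \mathcal{B}(X_{1:t})$, a MOSIS approximation of $\gamma_t(A)$ is thus given by

$$
\gamma_t^{\mathrm{MOSIS},N_{1:t}}(A)
= E\bigl[\bar{\gamma}_t^{\mathrm{IS},1}(A \times \bar{Z}_t) \bigm| Z_{1:t}\bigr]
= \sum_{b_{1:t} \in K_{1:t}} w_t^{b_{1:t}}(Z_{1:t})\,\delta_{X_{1:t}^{b_{1:t}}}(A)
= \sum_{n=1}^{N_t} w_t^n(Z_{1:t})\,\delta_{X_{1:t}^{B_{1:t|t}^n}}(A)
= \gamma_t^{\mathrm{SMC},N_{1:t}}(A).
$$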
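The particle lineages $B_{1:t|t}^n$ in the sum above are obtained by following parent indices backwards from the $n$th Step-$t$ particle. The following is a minimal Python sketch of this backward tracing. The function name `trace_lineages`, the 0-based indexing and the list-of-arrays layout for the parent indices are our own illustrative assumptions and do not appear in the construction above.

```python
import numpy as np

def trace_lineages(ancestors):
    """Trace the lineage of every final-step particle.

    ancestors: list of T-1 integer arrays (hypothetical layout); ancestors[s][n]
    is the 0-based index, at step s, of the parent of particle n at step s+1.
    Returns an array `lineages` of shape (T, N_T), where lineages[s, n] is the
    step-s index on the lineage ending in final particle n.
    """
    n_final = len(ancestors[-1])
    n_steps = len(ancestors) + 1
    lineages = np.empty((n_steps, n_final), dtype=int)
    lineages[-1] = np.arange(n_final)  # the lineage of particle n ends at n
    for s in range(n_steps - 2, -1, -1):
        # b_s = a_s^{b_{s+1}}: look up the parent of the next index on the lineage
        lineages[s] = np.asarray(ancestors[s])[lineages[s + 1]]
    return lineages

# Small example with three particles and three steps.
example_ancestors = [np.array([0, 0, 2]), np.array([1, 1, 0])]
example_lineages = trace_lineages(example_ancestors)
```

Given particle arrays `x[s]`, the path associated with final particle `n` is then `[x[s][example_lineages[s, n]] for s in range(3)]`.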

The above construction immediately implies that the SMC estimate of the normalising constant, $z_t^{\mathrm{SMC},N_{1:t}} = \gamma_t^{\mathrm{SMC},N_{1:t}}(1) = \bar{\gamma}_t^{\mathrm{IS},1}(1)$, is a (one-sample) IS estimate and is therefore unbiased. Nonetheless, we stress again that the unbiasedness property alone does not ensure estimates that are useful in practice, i.e. estimates whose error can be controlled. Conditions under which this is guaranteed are summarised in Subsection 2.2.5. Finally, recall that $\bar{w}_t = d\bar{\gamma}_t / d\bar{\psi}_t$. For later reference, we state the following slight generalisation of Andrieu et al. (2010, Theorem 2) (which is really just a special case of Proposition 1.13).

2.10 Proposition. Assume that $\rho_{t|t}(z_{1:t}, \{n\}) = W_t^n(z_{1:t})$, for any $(n, z_{1:t}) \in K_t \times Z_{1:t}$. Then $\bar{w}_t(\bar{x}_t) = z_t^{\mathrm{SMC},N_{1:t}}$, for any $\bar{x}_t \in \bar{X}_t$.

Proof. This follows immediately from the definition of $w_t^n(z_{1:t})$.
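The unbiasedness of the SMC estimate of the normalising constant can be checked numerically. The sketch below is only an illustration under assumptions of our own: a two-state model with made-up initial distribution `mu`, transition matrix `P` and potentials `G`, a bootstrap proposal, and multinomial resampling at every step. In this setting the estimate takes the familiar form of a product of averaged unnormalised weights, and the exact normalising constant is available through a forward recursion.

```python
import numpy as np

def exact_z(mu, P, G):
    """Exact normalising constant via the forward recursion."""
    alpha = mu * G[0]
    for g in G[1:]:
        alpha = (alpha @ P) * g
    return alpha.sum()

def smc_z(mu, P, G, n_particles, rng):
    """One bootstrap particle filter run; returns the product over steps
    of the average unnormalised weight."""
    n_states = len(mu)
    x = rng.choice(n_states, size=n_particles, p=mu)
    z_hat = 1.0
    for t, g in enumerate(G):
        if t > 0:
            # Multinomial resampling according to the normalised weights.
            parents = rng.choice(n_particles, size=n_particles, p=w / w.sum())
            x = x[parents]
            # Move each particle through P by inverting the row-wise CDF.
            cdf = np.cumsum(P[x], axis=1)
            u = rng.random(n_particles)
            x = np.minimum((u[:, None] > cdf).sum(axis=1), n_states - 1)
        w = g[x]            # unnormalised weights at this step
        z_hat *= w.mean()
    return z_hat

# Hypothetical toy model (all numbers are illustrative assumptions).
mu = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
G = np.array([[1.0, 0.2], [0.3, 1.0], [1.0, 0.5], [0.4, 1.0]])

rng = np.random.default_rng(0)
estimates = np.array([smc_z(mu, P, G, 50, rng) for _ in range(2000)])
z_true = exact_z(mu, P, G)
print(estimates.mean(), z_true)
```

Averaging many independent runs should recover `z_true` up to Monte Carlo error, whereas the spread of the individual entries of `estimates` illustrates that unbiasedness alone says nothing about the error of a single run.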

In document On extended state space constructions for monte carlo methods (Page 60-63)