1.2 Inference for State Space Models
1.2.1 SMC Algorithms
SMC algorithms provide posterior estimation using a series of predicting and updating recursions. The Sequential Importance Sampling technique can be seen as a general framework for particle filtering. Importance sampling is well studied in the classical Monte Carlo literature, where it is used as a variance reduction technique. Large variance reductions can be achieved, for instance, when calculating the Value-at-Risk of large portfolio losses [GHS00]. Importance sampling can also be applied to the efficient pricing of deep out-of-the-money options. In SMC methods, importance sampling associates an importance weight with each particle, which corrects for the particles having been drawn from a distribution other than the target.
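As a minimal illustration of importance sampling as a variance reduction technique, the following Python sketch estimates the tail probability P(Z > 4) for a standard normal Z, a toy stand-in for rare-event problems of the kind mentioned above. The shifted proposal N(4, 1), the sample size and the function names are assumptions made for this example; they are not taken from [GHS00].

```python
import math
import random


def tail_prob_importance_sampling(threshold, n_samples, rng):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by sampling from N(threshold, 1)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the shifted proposal (assumed choice)
        if x > threshold:
            # Importance weight: N(0, 1) density divided by N(threshold, 1) density at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def tail_prob_naive(threshold, n_samples, rng):
    """Plain Monte Carlo estimate of the same probability, for comparison."""
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n_samples


rng = random.Random(0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form reference value
is_estimate = tail_prob_importance_sampling(4.0, 20_000, rng)
naive_estimate = tail_prob_naive(4.0, 20_000, rng)
print(exact, is_estimate, naive_estimate)
```

With the same number of draws, the plain Monte Carlo estimate rests on at most a handful of exceedances, whereas roughly half of the proposal draws contribute to the importance sampling estimate.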
Sequential Importance Sampling (SIS) [GSS93]: Suppose that paths $x_{0:n}^{(i)} \sim \pi_\theta(X_{0:n} \mid y_{0:n})$ can be generated given a set of observations $y_{0:n}$, for $i = 1, \dots, N$. Thus, the marginal density of the hidden model given some observations can be approximated. Suppose that particle paths $(x_{0:k-1}^{(i)})_{i=1,\dots,N}$ are available at time $k-1$, weighted equally. An $N$-particle approximation of the posterior density is
$$\pi_\theta^N(x_{0:k-1} \mid y_{0:k-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta\bigl(x_{0:k-1}^{(i)}\bigr),$$
where $\delta$ is the Dirac measure. By sampling $\bar{x}_k^{(i)} \sim f_\theta(\cdot \mid x_{k-1}^{(i)})$ for $i = 1, \dots, N$, a prediction for the density at time step $k$ is
$$p_\theta^N(x_{0:k} \mid y_{0:k-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta\bigl(x_{0:k-1}^{(i)}, \bar{x}_k^{(i)}\bigr). \tag{1.2.4}$$
The target distribution at time step $k$ is
$$\pi_\theta(x_{0:k} \mid y_{0:k}) = \frac{g_\theta(y_k \mid x_k)\, p_\theta(x_{0:k} \mid y_{0:k-1})}{\int_{\mathcal{X}} g_\theta(y_k \mid x_k)\, p_\theta(x_{0:k} \mid y_{0:k-1})\, dx_k}. \tag{1.2.5}$$
The notation used throughout is $\{(x_k^{(i)}, w_k^{(i)})\}_{i=1}^{N}$, denoting the set of particle positions and corresponding weights at time step $k$. A set of particles weighted according to their likelihood gives the following approximation of $\pi_\theta(x_{0:k} \mid y_{0:k})$, for time steps $k \geq 0$. Substituting the approximation (1.2.4) for the predicted density in (1.2.5) yields
$$\bar{\pi}_\theta^N(x_{0:k} \mid y_{0:k}) = \sum_{i=1}^{N} w_k^{(i)}\, \delta\bigl(x_{0:k}^{(i)}\bigr),$$
where the weights $(w_k^{(i)})_{i=1,\dots,N}$ satisfy $w_k^{(i)} \propto g_\theta(y_k \mid \bar{x}_k^{(i)})$ and $\sum_{i=1}^{N} w_k^{(i)} = 1$. The weighted approximations $\bar{\pi}_\theta^N(x_{0:k} \mid y_{0:k})$ are then propagated through time, up to the terminal time step $n$. A feature of the SIS algorithm is that the path trajectories $(x_{0:n}^{(i)})_{i=1,\dots,N}$
are independent and identically distributed. Define
$$\hat{I}_N(\varphi) := \sum_{i=1}^{N} w_n^{(i)}\, \varphi\bigl(x_{0:n}^{(i)}\bigr)$$
as the SMC estimate of $I(\varphi)$ in (1.2.3). SIS is usually successful for small $n$; however, after several iterations most paths will have negligible weight [DdFG01, Section 1.3.2]. Eventually a single particle dominates and is effectively the only one used to approximate the expectation, which illustrates the weight degeneracy problem.
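The weight degeneracy described above can be observed numerically. The sketch below runs SIS without resampling and records the largest normalised weight at each time step. The linear Gaussian model (x_k = a x_{k-1} + noise, y_k = x_k + noise), its parameter values, the N(0, 1) initial law and all function names are assumptions chosen for illustration only; they do not come from the text.

```python
import math
import random


def normalise(log_weights):
    """Turn unnormalised log-weights into weights that sum to one."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def simulate_observations(n_steps, a, sigma_x, sigma_y, rng):
    """Simulate y_0, ..., y_n from the assumed linear Gaussian model."""
    x = rng.gauss(0.0, 1.0)
    ys = [x + rng.gauss(0.0, sigma_y)]
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, sigma_x)
        ys.append(x + rng.gauss(0.0, sigma_y))
    return ys


def sis_largest_weights(ys, n_particles, a, sigma_x, sigma_y, rng):
    """Run SIS without resampling; return the largest normalised weight per step."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    # Gaussian observation log-density, up to an additive constant.
    log_w = [-0.5 * ((ys[0] - x) / sigma_y) ** 2 for x in xs]
    largest = [max(normalise(log_w))]
    for y in ys[1:]:
        xs = [a * x + rng.gauss(0.0, sigma_x) for x in xs]  # propagate with f
        log_w = [lw - 0.5 * ((y - x) / sigma_y) ** 2 for lw, x in zip(log_w, xs)]
        largest.append(max(normalise(log_w)))
    return largest


rng = random.Random(42)
ys = simulate_observations(100, a=0.9, sigma_x=1.0, sigma_y=0.2, rng=rng)
largest = sis_largest_weights(ys, n_particles=50, a=0.9, sigma_x=1.0, sigma_y=0.2, rng=rng)
print(largest[0], largest[-1])
```

Weights are accumulated on the log scale so that very small weights do not underflow before normalisation.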
Resampling: The variance of the weights increases with the number of time steps, and for a fixed accuracy, the computational cost grows exponentially [KLW94]. To stabilise the variance of the weights, resampling methods have been proposed. Resampling consists of choosing a new set of particles based on the original set. The common idea is to increase the number of particles with higher weights, and reduce the number of particles that have low probability. At each time step $k$, $N$ particles from the current particle set could be sampled with replacement such that the number of offspring $N_k^{(i)}$ of particle $i$ satisfies
$$\mathbb{E}\bigl[N_k^{(i)} \mid x_{0:k}^{(i)}\bigr] = N w_k^{(i)}.$$
The new particle set consists of $N_k^{(i)}$ realisations of each particle $x_{0:k}^{(i)}$, with weights reset to $1/N$ for each resampled particle. Details of resampling schemes and examples of the empirical measures are presented in [Dou05, DMDJ12]. Multinomial resampling draws $N$ new particles from a multinomial distribution according to the normalised weights $(w_k^{(i)})_{i=1,\dots,N}$. Systematic resampling uses a single random uniform draw to generate the new particle set. It is often preferred due to its computational simplicity; however, the method is sensitive to the ordering of particles [Dou05]. Other methods include residual resampling and stratified resampling [BC09, Dou05]. More complicated schemes have been studied, where the number of particles follows some evolutionary process [CDML99]. Resampling at every time step can be harmful, since each resampling step adds Monte Carlo noise and reduces particle diversity, so metrics such as the effective sample size (ESS) can be used as a trigger for performing a resampling step [LC98].
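The two resampling schemes described above can be sketched in a few lines of Python. This is an illustrative version under the assumption that the weights are already normalised; the function names are hypothetical, and each function returns the indices of the selected particles rather than the particles themselves.

```python
import random


def multinomial_resample(weights, rng):
    """Draw N ancestor indices i.i.d. according to the normalised weights."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


def systematic_resample(weights, rng):
    """Select N ancestor indices using a single uniform draw."""
    n = len(weights)
    u0 = rng.random() / n  # the only random number used, uniform on [0, 1/N)
    indices = []
    i, cumulative = 0, weights[0]
    for j in range(n):
        u = u0 + j / n  # evenly spaced points on [0, 1)
        # Advance to the particle whose cumulative weight interval contains u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


rng = random.Random(1)
weights = [0.1, 0.2, 0.3, 0.4]
print(multinomial_resample(weights, rng))
print(systematic_resample(weights, rng))
```

In the systematic scheme each particle's offspring count differs from $N w_k^{(i)}$ by less than one, while the multinomial scheme only matches $N w_k^{(i)}$ in expectation.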
Definition 1.2.2. Define the ESS approximation for a set of particles with weights $(w_k^{(i)})_{i=1,\dots,N}$ as:
$$N_{\mathrm{eff}} := \frac{1}{\sum_{i=1}^{N} \bigl(w_k^{(i)}\bigr)^2} \in [1, N], \quad k \in I.$$
$N_{\mathrm{eff}}$ approximates the equivalent number of i.i.d. random samples needed for an estimate, such that its Monte Carlo variance is that of the $N$-particle weighted approximation. A threshold can be set such that when $N_{\mathrm{eff}}$ drops below it, a resampling step is performed. In
the literature, this threshold is commonly chosen as N/2 or N/3.
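A short sketch of the ESS computation in Definition 1.2.2 and of its use as a resampling trigger follows. The function names and the default threshold fraction of 1/2 are choices made for this example.

```python
def effective_sample_size(weights):
    """Return 1 / sum(w_i^2) after normalising the weights to sum to one."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


def needs_resampling(weights, threshold_fraction=0.5):
    """True when the ESS drops below threshold_fraction * N."""
    return effective_sample_size(weights) < threshold_fraction * len(weights)


uniform = [0.25, 0.25, 0.25, 0.25]   # equal weights: ESS equals N
degenerate = [1.0, 0.0, 0.0, 0.0]    # one dominant particle: ESS equals 1
skewed = [0.7, 0.1, 0.1, 0.1]
print(effective_sample_size(uniform), effective_sample_size(degenerate))
print(needs_resampling(skewed))
```

The two extreme cases recover the bounds of the interval $[1, N]$ stated in the definition.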
Intuitively, particles with high weights are more likely to be resampled, and particles with low weights will eventually cease to exist after successive resampling steps. The effect of many successive resampling steps up to time $n$ is a loss of path diversity at time $n-k$ for some lag $k > 0$, which is referred to as the path degeneracy problem. Attempts have been made to minimise this problem by careful resampling and monitoring of the ESS [LC98, Whi, CDML99]. Path degeneracy is induced by resampling: looking back along the particle paths, many paths share the same history, and ultimately all paths coalesce to a single path [DJ08], so that approximations of the early part of the path distribution rely on just one path. The trade-off in resampling can be summarised as controlling the variance of the weights whilst not dramatically reducing the diversity of particles.
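The coalescence of paths can be illustrated with a toy experiment that tracks only the genealogy of the particles. In the sketch below, resampling is performed at every step and each particle remembers which time-0 particle it descends from. The random log-normal weights stand in for likelihood weights and are purely an assumption of this example, as are the function name and the particle and step counts.

```python
import math
import random


def surviving_roots(n_particles, n_steps, rng):
    """Count distinct time-0 ancestors still represented after each resampling step."""
    roots = list(range(n_particles))  # time-0 ancestor of each current path
    counts = [len(set(roots))]
    for _ in range(n_steps):
        # Toy weights (assumed): a real filter would use the observation density.
        weights = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_particles)]
        parents = rng.choices(range(n_particles), weights=weights, k=n_particles)
        roots = [roots[j] for j in parents]  # each path inherits its parent's root
        counts.append(len(set(roots)))
    return counts


rng = random.Random(3)
counts = surviving_roots(n_particles=50, n_steps=500, rng=rng)
print(counts[0], counts[10], counts[-1])
```

The count of distinct time-0 ancestors can never increase, which is why diversity lost at early time points cannot be recovered by later resampling steps.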
In situations where the consecutive distributions are very different, interpolating distributions have been proposed to reduce the need to resample particles as often [GC00]. Such techniques are often computationally expensive as the number of intermediate distributions could be prohibitive [BLB08].
Particle Filter with Resampling: In Algorithm 1.2.1, the most general particle filter with a
resampling step is described. The ESS metric is used as the trigger to resample, according to a user-set resampling scheme. This method is based on the SIS algorithm, with the inclusion of a resampling step.
Algorithm 1.2.1 Particle Filter with Resampling (SIS/R)
Step 0: Initialise
  a) For $i = 1 \to N$, sample $x_0^{(i)} \sim \mu(\cdot)$.
  b) For $i = 1 \to N$, calculate normalised weights $w_0^{(i)} \propto g_\theta\bigl(y_0 \mid x_0^{(i)}\bigr)$.
Step 1: Main recursive step. For $k = 1 \to n$
  a) Resample step
     if $N_{\mathrm{eff}} < N/2$ then
       Resample the set $(x_{k-1}^{(i)})_{i=1,\dots,N}$ according to the weights in $\{(x_{k-1}^{(i)}, w_{k-1}^{(i)})\}_{i=1,\dots,N}$. For $i = 1 \to N$, set $w_{k-1}^{(i)} := 1/N$.
     end if
  b) Propagate particles. For $i = 1 \to N$, sample $x_k^{(i)} \sim f_\theta\bigl(X_k \mid x_{k-1}^{(i)}\bigr)$.
  c) For $i = 1 \to N$, compute normalised weights $w_k^{(i)} \propto w_{k-1}^{(i)}\, g_\theta\bigl(y_k \mid x_k^{(i)}\bigr)$.
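The steps of Algorithm 1.2.1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the linear Gaussian model, the N(0, 1) initial law standing in for $\mu$, the parameter values, the use of multinomial resampling and all function names are assumptions introduced for the example.

```python
import math
import random


def normalise(log_weights):
    """Turn unnormalised log-weights into weights that sum to one."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [wi / total for wi in w]


def simulate(n_steps, a, sigma_x, sigma_y, rng):
    """Simulate hidden states and observations from the assumed model."""
    x = rng.gauss(0.0, 1.0)
    states, observations = [x], [x + rng.gauss(0.0, sigma_y)]
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, sigma_x)
        states.append(x)
        observations.append(x + rng.gauss(0.0, sigma_y))
    return states, observations


def particle_filter(ys, n_particles, a, sigma_x, sigma_y, rng, ess_fraction=0.5):
    """Return the filtering-mean estimates and the number of resampling steps."""
    def log_g(y, x):
        return -0.5 * ((y - x) / sigma_y) ** 2  # up to an additive constant

    # Step 0: sample from the initial law and weight by g(y_0 | x_0).
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_w = [log_g(ys[0], x) for x in xs]
    w = normalise(log_w)
    means = [sum(wi * xi for wi, xi in zip(w, xs))]
    n_resamples = 0

    for y in ys[1:]:
        # Step 1a: resample when N_eff falls below the threshold.
        n_eff = 1.0 / sum(wi * wi for wi in w)
        if n_eff < ess_fraction * n_particles:
            xs = rng.choices(xs, weights=w, k=n_particles)  # multinomial scheme
            log_w = [0.0] * n_particles  # equal weights after resampling
            n_resamples += 1
        # Step 1b: propagate each particle through the transition density f.
        xs = [a * x + rng.gauss(0.0, sigma_x) for x in xs]
        # Step 1c: w_k proportional to w_{k-1} * g(y_k | x_k), on the log scale.
        log_w = [lw + log_g(y, x) for lw, x in zip(log_w, xs)]
        w = normalise(log_w)
        means.append(sum(wi * xi for wi, xi in zip(w, xs)))
    return means, n_resamples


rng = random.Random(7)
states, ys = simulate(200, a=0.9, sigma_x=0.5, sigma_y=1.0, rng=rng)
means, n_resamples = particle_filter(ys, 500, a=0.9, sigma_x=0.5, sigma_y=1.0, rng=rng)
print(n_resamples)
```

Only the current particles and weights are stored between iterations, so the memory used does not depend on the number of time steps processed.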
The families of SIS algorithms are “online”: the per-step complexity of the algorithm does not increase as the number of time steps increases, and only a fixed amount of memory is required for a fixed number of particles. This is because only a forward pass is required. Smoothing algorithms requiring a forward and a backward pass are classified as being “offline”.