With the state space model, it is natural to use particle filters for inference. However, the standard particle filter is inefficient in the HSSM because the state space is high dimensional, so a large number of particles is required to approximate the posterior distribution. A generic way to improve the efficiency of the particle filter is to combine it with the idea of Rao-Blackwellisation (Casella and Robert, 1996). The result is known as the Rao-Blackwellised particle filter (RBPF). The intuitive idea behind it is to reduce the dimension by marginalising out one sub-part of the state space. See Doucet (1998); Doucet et al. (2000a); Murphy and Russell (2001) for more examples.
In the changepoint problem, the state space partitions naturally into the changepoint $C_t$ and the parameter $\beta_t$, such that

$$\Pr(C_t, \beta_t \mid y_{1:t}) = \Pr(\beta_t \mid C_t, y_{1:t}) \, \Pr(C_t \mid y_{1:t}). \tag{3.10}$$

We can update $\Pr(C_t \mid y_{1:t})$ by particle filtering first, and then update $\Pr(\beta_t \mid C_t, y_{1:t})$ conditional on $C_t$. In particular, if $\Pr(\beta_t \mid C_t, y_{1:t})$ has an analytical form, we only need to sample from $\Pr(C_t \mid y_{1:t})$, so fewer particles are needed to reach the same accuracy.
CHAPTER 3. CHANGEPOINTS MODELS
3.4.1 Mixture Kalman filter
The simplest example of an HSSM is the conditional Gaussian linear state space model (CGLSSM), in which

$$\begin{aligned} \beta_t &= A(C_t)\,\beta_{t-1} + R(C_t)\,U_t, \\ Y_t &= H(C_t)\,\beta_t + S(C_t)\,V_t, \end{aligned} \tag{3.11}$$

where $A(C_t)$, $H(C_t)$, $R(C_t)$ and $S(C_t)$ are known matrices conditional on $C_t$, $U_t$ and $V_t$ follow the multivariate normal distribution $N(0, I)$, and $C_t$ is a Markov random variable taking values in a finite set. We consider the general case below, but a special case is the probability structure described in the previous section.

A standard method for inference about the hidden state of the CGLSSM is the mixture Kalman filter (MKF) proposed by Chen and Liu (2000). They advocated inferring the history of $C_t$ first, that is, $\Pr(C_{1:t} \mid y_{1:t})$. Conditional on $C_{1:t}$, the posterior distribution of the parameter $\beta_t$ can be calculated by the Kalman filter. As $C_{1:t}$ takes a finite number of values, it is even possible to write down the posterior distribution of $\beta_t$ as a sum over all possible values of $C_{1:t}$. The problem is that the number of terms in this sum increases exponentially with time. The idea of the particle filter is to approximate this sum by one with a fixed number of terms; each term corresponds to a particle, i.e. a realisation of $C_{1:t}$.

According to (3.10), the joint posterior of $C_{1:t}$ and $\beta_t$ can be factorised as

$$p(c_{1:t}, \beta_t \mid y_{1:t}) = p(\beta_t \mid c_{1:t}, y_{1:t}) \, p(c_{1:t} \mid y_{1:t}). \tag{3.12}$$

Because (3.11) is a Gaussian linear state space model given the changepoints $c_{1:t}$, the $p(\beta_t \mid c_{1:t}, y_{1:t})$ term follows a normal distribution $N(\mu_t(c_{1:t}), \Sigma_t(c_{1:t}))$, where $\mu_t(c_{1:t})$ and $\Sigma_t(c_{1:t})$ can be recursively calculated by the Kalman filter if $c_{1:t}$ is given:
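$$\begin{aligned}
P_t &= A(c_t^{(i)})\,\Sigma_{t-1}(c_{1:t-1}^{(i)})\,A(c_t^{(i)})^T + R(c_t^{(i)})\,R(c_t^{(i)})^T, \\
Q_t &= H(c_t^{(i)})\,P_t\,H(c_t^{(i)})^T + S(c_t^{(i)})\,S(c_t^{(i)})^T, \\
\mu_t(c_{1:t}^{(i)}) &= A(c_t^{(i)})\,\mu_{t-1}(c_{1:t-1}^{(i)}) + P_t\,H(c_t^{(i)})^T Q_t^{-1}\left(y_t - H(c_t^{(i)})\,A(c_t^{(i)})\,\mu_{t-1}(c_{1:t-1}^{(i)})\right), \\
\Sigma_t(c_{1:t}^{(i)}) &= P_t - P_t\,H(c_t^{(i)})^T Q_t^{-1} H(c_t^{(i)})\,P_t.
\end{aligned} \tag{3.13}$$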
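As a minimal sketch of the recursion (3.13), assume a scalar state and a scalar observation, so that the matrices $A$, $R$, $H$ and $S$ for the given regime reduce to numbers. The function name `kalman_step` and its argument names are illustrative and do not come from the text.

```python
def kalman_step(mu_prev, sigma_prev, y, a, r, h, s):
    """One scalar Kalman update conditional on a known regime c_t.

    a, r, h, s stand for A(c_t), R(c_t), H(c_t), S(c_t) in (3.11);
    mu_prev and sigma_prev are the posterior mean and variance at t-1.
    """
    p = a * sigma_prev * a + r * r        # P_t: predicted state variance
    q = h * p * h + s * s                 # Q_t: predicted observation variance
    mu_pred = a * mu_prev                 # predicted state mean
    gain = p * h / q                      # Kalman gain P_t H^T Q_t^{-1}
    mu = mu_pred + gain * (y - h * mu_pred)
    sigma = p - gain * h * p
    return mu, sigma


# Random-walk state observed with unit noise and no state noise in this regime.
mu, sigma = kalman_step(mu_prev=0.0, sigma_prev=1.0, y=2.0,
                        a=1.0, r=0.0, h=1.0, s=1.0)
```

In this example the prior variance and the observation variance are both 1, so the update moves the mean halfway towards the observation (mean 1.0) and halves the variance (0.5).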
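The posterior distribution $p(c_{1:t} \mid y_{1:t})$ can be approximated recursively by standard filtering:

$$p(c_{1:t} \mid y_{1:t-1}) = \sum_{i=1}^{N} p(c_t \mid c_{t-1}^{(i)}) \, w_{t-1}^{(i)}, \tag{3.14}$$

$$p(c_{1:t} \mid y_{1:t}) \propto \sum_{i=1}^{N} p(y_t \mid c_t^{(i)}) \, p(c_t^{(i)} \mid c_{t-1}^{(i)}) \, w_{t-1}^{(i)}. \tag{3.15}$$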
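Then the joint posterior is approximated as

$$\begin{aligned}
p(c_{1:t}, \beta_t \mid y_{1:t}) &\propto \sum_{i=1}^{N} p(\beta_t \mid c_{1:t}^{(i)}, y_{1:t}) \, p(y_t \mid c_t^{(i)}) \, p(c_t^{(i)} \mid c_{t-1}^{(i)}) \, w_{t-1}^{(i)} \\
&= \sum_{i=1}^{N} N\!\left(\mu_t(c_{1:t}^{(i)}), \Sigma_t(c_{1:t}^{(i)})\right) p(y_t \mid c_t^{(i)}) \, p(c_t^{(i)} \mid c_{t-1}^{(i)}) \, w_{t-1}^{(i)},
\end{aligned}$$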
where the weight $w_t^{(i)}$ of each particle is

$$w_t^{(i)} \propto p(y_t \mid c_t^{(i)}) \, p(c_t^{(i)} \mid c_{t-1}^{(i)}) \, w_{t-1}^{(i)}, \qquad \sum_{i=1}^{N} w_t^{(i)} = 1. \tag{3.16}$$
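The weight recursion (3.16) can be sketched in a few lines. The version below works on the log scale before normalising, which is a common numerical precaution and not something the text prescribes. The function name `update_weights` and the list-based interface are illustrative assumptions, and all inputs are assumed to be strictly positive.

```python
import math


def update_weights(prev_weights, lik, trans):
    """Normalised weights as in (3.16).

    prev_weights[i] is w_{t-1}^{(i)}, lik[i] is p(y_t | c_t^{(i)}) and
    trans[i] is p(c_t^{(i)} | c_{t-1}^{(i)}); all are assumed > 0.
    """
    log_w = [math.log(w) + math.log(l) + math.log(p)
             for w, l, p in zip(prev_weights, lik, trans)]
    shift = max(log_w)                      # guard against underflow
    unnorm = [math.exp(v - shift) for v in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]      # weights now sum to one


# Two particles with equal previous weight and equal transition probability:
# the second explains y_t three times better, so it gets three times the weight.
weights = update_weights([0.5, 0.5], lik=[1.0, 3.0], trans=[0.5, 0.5])
```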
Chen and Liu (2000) adapted the SIS algorithm to obtain these particles: each particle is simulated from

$$q(c_t \mid c_{1:t-1}^{(i)}, y_{1:t}) = q\!\left(c_t \mid c_{t-1}^{(i)}, \mu_{t-1}(c_{1:t-1}^{(i)}), \Sigma_{t-1}(c_{1:t-1}^{(i)})\right),$$

and its weight is updated as

$$w_t^{(i)} \propto \frac{p(y_t \mid c_t^{(i)}) \, p(c_t^{(i)} \mid c_{t-1}^{(i)})}{q\!\left(c_t^{(i)} \mid c_{t-1}^{(i)}, \mu_{t-1}(c_{1:t-1}^{(i)}), \Sigma_{t-1}(c_{1:t-1}^{(i)})\right)} \, w_{t-1}^{(i)}. \tag{3.17}$$

Finally, a standard resampling step can be used. The details of the MKF are listed below:
Algorithm 3.1 Mixture Kalman filter
At a given time $t$, suppose we have particles $\{c_{1:t-1}^{(i)}, \mu_{t-1}(c_{1:t-1}^{(i)}), \Sigma_{t-1}(c_{1:t-1}^{(i)})\}$ with weights $w_{t-1}^{(i)}$, for $i = 1, \ldots, N$:
Step 1 Generate $c_t^{(i)}$ from $q(c_t \mid c_{t-1}^{(i)}, \mu_{t-1}(c_{1:t-1}^{(i)}), \Sigma_{t-1}(c_{1:t-1}^{(i)}))$;
Step 2 Given $c_t^{(i)}$, update $\mu_t(c_{1:t}^{(i)})$ and $\Sigma_t(c_{1:t}^{(i)})$ by (3.13);
Step 3 Update the new (normalised) weight $w_t^{(i)}$ as in (3.17);
Step 4 Resample $\{\mu_t(c_{1:t}^{(i)}), \Sigma_t(c_{1:t}^{(i)})\}$ with probability proportional to the weights $w_t^{(i)}$ if the ESS defined in (2.11) is less than a threshold value.
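The four steps of Algorithm 3.1 can be sketched as follows for a scalar CGLSSM. Several choices here are assumptions made for illustration rather than taken from the text: the proposal $q$ is the transition probability $p(c_t \mid c_{t-1})$, so that (3.17) reduces to multiplying the previous weight by the predictive density of $y_t$; the ESS is taken to be $1/\sum_i (w_t^{(i)})^2$, which may differ from the definition in (2.11); resampling is multinomial; and the names `mkf_step`, `regimes`, `trans` and `ess_frac` are hypothetical.

```python
import math
import random


def kalman_step(mu, sigma, y, a, r, h, s):
    """Scalar form of (3.13); also returns the predictive density of y."""
    p = a * sigma * a + r * r
    q = h * p * h + s * s
    resid = y - h * a * mu
    gain = p * h / q
    lik = math.exp(-0.5 * resid * resid / q) / math.sqrt(2.0 * math.pi * q)
    return a * mu + gain * resid, p - gain * h * p, lik


def mkf_step(particles, weights, y, regimes, trans, rng, ess_frac=0.5):
    """One MKF step. Each particle is a tuple (c, mu, sigma)."""
    n = len(particles)
    new_particles, new_weights = [], []
    for (c_prev, mu, sigma), w in zip(particles, weights):
        # Step 1: draw c_t from the transition row (assumed proposal q).
        c = rng.choices(range(len(regimes)), weights=trans[c_prev])[0]
        # Step 2: Kalman update conditional on the sampled regime.
        mu_new, sigma_new, lik = kalman_step(mu, sigma, y, *regimes[c])
        new_particles.append((c, mu_new, sigma_new))
        # Step 3: with q equal to the transition, (3.17) gives lik * w.
        new_weights.append(lik * w)
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]
    # Step 4: resample only when the effective sample size is low.
    ess = 1.0 / sum(w * w for w in new_weights)
    if ess < ess_frac * n:
        idx = rng.choices(range(n), weights=new_weights, k=n)
        new_particles = [new_particles[i] for i in idx]
        new_weights = [1.0 / n] * n
    return new_particles, new_weights


# Regime 0: small state noise; regime 1: large state noise (a changepoint).
regimes = [(1.0, 0.1, 1.0, 0.5), (1.0, 2.0, 1.0, 0.5)]   # (a, r, h, s)
trans = [[0.95, 0.05], [0.95, 0.05]]
rng = random.Random(1)
particles = [(0, 0.0, 1.0)] * 50
weights = [1.0 / 50] * 50
for y in [0.1, -0.2, 5.0, 5.1]:
    particles, weights = mkf_step(particles, weights, y, regimes, trans, rng)
```

Each particle carries only the current regime together with the Gaussian sufficient statistics, so the continuous state $\beta_t$ is never sampled.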
An alternative to the MKF is to use the optimal resampling method of Fearnhead and Clifford (2003). At each time step, the optimal sampling version of the MKF produces $R$ descendants of each previous particle, one for each possible value of $c_t^{(i)}$, so there is no need to use a proposal density to sample $c_t$. Hence the weights are simply of the form (3.16). Consequently, resampling has to be implemented at each time step to reduce the number of particles from $RN$ to $N$.
However, the optimal sampling method outperforms the MKF in two respects: (i) it provides more accurate results in terms of both mean-square and absolute error; and (ii) it is much more efficient when the distribution of the weights is extremely skewed.
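The descendant-enumeration step can be sketched as follows for a scalar CGLSSM. Note that the reduction from $RN$ to $N$ particles below uses plain multinomial resampling as a simple stand-in; it is not the optimal resampling scheme of Fearnhead and Clifford (2003), whose details are not given in this section. The names `expand_descendants`, `reduce_particles`, `regimes` and `trans` are hypothetical.

```python
import math
import random


def kalman_step(mu, sigma, y, a, r, h, s):
    """Scalar form of (3.13); also returns the predictive density of y."""
    p = a * sigma * a + r * r
    q = h * p * h + s * s
    resid = y - h * a * mu
    gain = p * h / q
    lik = math.exp(-0.5 * resid * resid / q) / math.sqrt(2.0 * math.pi * q)
    return a * mu + gain * resid, p - gain * h * p, lik


def expand_descendants(particles, weights, y, regimes, trans):
    """Create one descendant per regime for every particle (R * N in total).

    No proposal is needed: each descendant gets the weight (3.16).
    """
    descendants, desc_weights = [], []
    for (c_prev, mu, sigma), w in zip(particles, weights):
        for c, params in enumerate(regimes):
            mu_new, sigma_new, lik = kalman_step(mu, sigma, y, *params)
            descendants.append((c, mu_new, sigma_new))
            desc_weights.append(lik * trans[c_prev][c] * w)
    total = sum(desc_weights)
    return descendants, [w / total for w in desc_weights]


def reduce_particles(descendants, desc_weights, n_keep, rng):
    """Stand-in reduction step: multinomial resampling down to n_keep."""
    idx = rng.choices(range(len(descendants)), weights=desc_weights, k=n_keep)
    return [descendants[i] for i in idx], [1.0 / n_keep] * n_keep


regimes = [(1.0, 0.1, 1.0, 0.5), (1.0, 2.0, 1.0, 0.5)]   # (a, r, h, s)
trans = [[0.95, 0.05], [0.95, 0.05]]
rng = random.Random(0)
particles = [(0, 0.0, 1.0)] * 20
weights = [1.0 / 20] * 20
descendants, desc_weights = expand_descendants(particles, weights, 0.3,
                                               regimes, trans)
particles, weights = reduce_particles(descendants, desc_weights, 20, rng)
```

With $R = 2$ regimes and $N = 20$ particles, the expansion produces 40 weighted descendants, which the reduction step brings back to 20.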