The changepoint model - Maximum likelihood parameter estimation in time series models using seq

In this work a changepoint model is defined to be comprised of two discrete-time stochastic processes which are _{(Xk, Zk)}k≥1 and {Yk}k≥1. {(Xk, Zk)}k≥1 is an unobserved time-

homogeneous Markov chain taking values in _{X × Z where X = {1, 2, . . .} × {1, . . . , R}} and_{Z ⊆ R}p_{. (While the definition of}_{X in this manner is necessary for the resulting model}

to be a changepoint model, the definition of _{Z can change depending on the application} domain.) We denote realisations of the first component of this chain by xk = (dk, mk).

The variable mk takes values in the index set {1, . . . , R} and indicates the (generative)

model the chain is in at that time while dk indicates the duration the chain has spent in

model mk. The transition law of{(Xk, Zk)}k≥1 is

X1 ∼ µ, Xk|(xk−1 = (d, m), zk−1) = ( (d + 1, m) w.p. 1_{− λ}θ,m(d) (1, m′₎ _{w.p. λ} θ,m(d)× Pθ(m, m′) , Zk|(xk = (d′, m′), xk−1, zk−1) ∼ ( fθ,m′(z|z_k−1)dz if d′ 6= 1 πθ,m′(z)dz if d′ = 1 , (4.1)

where λθ,m(d)∈ [0, 1] for all θ ∈ Θ and (d, m) ∈ X ; Pθ is an R× R row stochastic matrix;

for each θ and m, fθ,m(z|zk−1) is the density of a Markov transition kernel on Z w.r.t.

a suitable dominating measure which is denoted by dz; and for each θ and m, πθ,m is

a probability density on Z. The transition kernel of the Markov chain {(Xk, Zk)}k≥1 is

assumed to be parametrised by the finite dimensional parameter θ_{∈ Θ. Without loss of} generality, it is assumed that the probability distribution of the initial state of the chain {Xk}k≥1, denoted µ, has all its mass on {(1, 1), . . . , (1, R)}, e.g. the uniform distribution

on _{{(1, 1), . . . , (1, R)}.}

For a sequence _{ak}_k≥1 and integers i, j, let ai:j denote the set{ai, ai+1, ..., aj}, which

is empty if j < i, and ai:∞ = {ai, ai+1, ...}. The process {Yk}k≥1 is a Y-valued observed

process which satisfies the following conditional independence property: Yk

{xk, zk}_k≥1, y1:k−1, yk+1:∞

∼ gθ,mk(y|zk)dy (4.2)

where for each θ and m, gθ,m is a probability density onY with respect to the dominating

measure dy. In this work Y ⊆ Rq _{although the definition of}_{Y may be altered depending}

on the application. Equations (4.1) and (4.2), now define the law of (X1:n, Z1:n, Y1:n).

Note that {Xk}k≥1 itself is a Markov chain and we denote its transition matrix by

pθ( xk| xk−1). Secondly, it is useful to visualise a realisation of {Xk}k≥1 as a labelled con-

tiguous partition of{1, 2, . . .}, {[t1, t2), [t2, t3), . . .} and ti+1> ti, where each set [ti, ti+1) of

the partition, which we call a segment, is accompanied by mti, the model number during

and are called as the changepoints. As_{Zk}k≥1 forgets its past at times of changepoints,

within the segment [ti, ti+1), {(Zk, Yk)}_t_i_≤k<t_i+1 is a HMM with initial, state transition,

and observation densities πθ,m_ti, fθ,m_ti, and gθ,m_ti respectively. In this sense, our model

is general enough to encompass both hidden semi-Markov models ([Barbu and Limnios, 2008; Murphy, 2002] and segmented hidden semi-Markov models [Dong and He, 2007; Gales and Young, 1993]. Below, we give an example of a changepoint model, which we will use in our experiments throughout this chapter.

Example 4.1. Consider the following changepoint model presented in Fearnhead and

Vasileiou [2009], where Zk = (Zk,1, Zk,2)∈ R × R+, andY = R. The model satisfies

X1 ∼ U{1}×{1,...,R}, Xk|(xk−1 = (d, m)) = ( (d + 1, m) w.p. (1− λm) (1, m′₎ _{w.p. λ} m× P (m, m′) , Zk|(xk= (d′, m′), zk−1) ∼ ( δzk−1 if d ′ _{6= 1} N Γ−1_(ξ m, κm, α, β) if d′ = 1 , Yk|zk ∼ N (zk,1, zk,2),

where N Γ−1₍_{·) denotes the normal-inverse gamma distribution and U}

A is the uniform

distribution over the set A. In relation to (4.1) and (4.2), we have λθ,m(d) = λm,

fθ,m(z|zk−1)dz = δzk−1(dz), πθ,m = N Γ

−1_(ξ

m, κm, α, β), and gθ(y|zk) = N (y; zk,1, zk,2).

Therefore, the parameters of interest are θ = (ξ1:R, κ1:R, λ1:R, α, β, P ). In this model,

the observations in each segment are i.i.d. Gaussian random variables whose mean and variance change from segment to segment and are drawn from the normal-inverse gamma distribution.

The following important conditional independence property, which follows from (4.1) and (4.2), will be frequently used in the derivations to follow: for any k′ _{≥ k,}

pθ(yk|x1:k′, y_1:k−1) = p_θ(y_k|x_k, y_1:k−1) = p_θ(y_k|x_k, y_k−d_k_+1:k−1).

(Recall that dk is the first component of xk.) This equation may be interpreted to mean

that ykonly depends statistically on the past observations that are received since the most

recent changepoint and not on the observations before that. For the models considered in this work we assume that pθ(yk|xk, y1:k−1) can be evaluated for any xk and y1:k (whenever

the conditional law is well defined). This assumption is satisfied by some important models (e.g. Caron et al. [2011]; Fearnhead and Vasileiou [2009]; Whiteley et al. [2009]), and allows us to focus inference on X1:n and θ given Y1:n as Z1:n may be integrated out.

X → [0, ∞) as Gθ,k(xk) = R πθ,mk(zj) Qk i=j+1fθ,mk(zi|zi−1) Qk i=jgθ,mk(yi|zi)dzj:k R πθ,mk(zj) Qk−1 i=j+1fθ,mk(zi|zi−1) Qk−1 i=j gθ,mk(yi|zi)dzj:k−1 , j = max(k_−dk+1, 1).

(Gθ,k is introduced for brevity.) Note that Gθ,k(xk) is precisely pθ(yk|xk, y1:k−1) at values

of xk where the latter is well defined. We can now express the probability density of the

observed process, or likelihood, succinctly as

pθ(y1:n) = Eθ " _n Y k=1 Gθ,k(Xk) # .

In document Maximum likelihood parameter estimation in time series models using sequential Monte Carlo (Page 95-97)