
2.3 Bayesian inference for changepoint models

2.3.1 Exact online inference

Following Fearnhead and Liu [2007], we model the data through a hidden state process, $C_{1:n}$, which encodes where the changepoints of the data are located. The model is defined by specifying the distribution of the hidden state process, $p(c_{1:n})$, and the conditional distribution of the data given the state process, $p(y_{1:n} \mid c_{1:n})$.

Our interest lies in inference about this hidden state process given the observations, which involves calculating the posterior distribution of the states,

$$p(c_{1:n} \mid y_{1:n}) \propto p(y_{1:n} \mid c_{1:n})\, p(c_{1:n}). \tag{2.3.2}$$

We define the state at time $t$, $C_t$, to be the time of the most recent changepoint prior to time $t$.

We model $C_t$ as a Markov process. Conditional on $C_t = c_t$, either $C_{t+1} = c_t$, which corresponds to no changepoint at time $t$, or $C_{t+1} = t$, which corresponds to a changepoint at time $t$. We require that $p(C_{t+1} = t \mid c_t)$ depends only on $c_t$. Thus $C_t \in \{0, \ldots, t-1\}$, with $C_t = 0$ meaning that the current segment is the first segment. This Markov process is determined by a set of transition probabilities which depend only on the distance between the current time $t$ and the last changepoint.

Because this process is Markov, we can decompose $p(c_{1:n})$ into factors:

$$p(c_{1:n}) = p(C_1 = c_1) \prod_{i=1}^{n-1} p(C_{i+1} = c_{i+1} \mid C_i = c_i). \tag{2.3.3}$$

The decomposition in (2.3.3) leaves two aspects of the process to define: the transition probabilities $p(C_{i+1} = c_{i+1} \mid c_i)$ and the initial distribution $p(C_1 = c_1)$.

First consider the transition probabilities. Either $C_{t+1} = C_t$ or $C_{t+1} = t$, depending on whether a new segment starts between time $t$ and $t+1$. The probability of a new segment starting is the conditional probability that a segment has length exactly $t - C_t$, given that its length is at least $t - C_t$. The probability of the segment continuing is the conditional probability that a segment has length greater than $t - C_t$, given that its length is at least $t - C_t$.

Let $G(\cdot)$ be the distribution function of the distance between two successive changepoints. The transition probabilities $p(C_{t+1} = j \mid C_t = i)$, for $i = 1, \ldots, t-1$, can then be written as

$$p(C_{t+1} = j \mid C_t = i) = \begin{cases} \dfrac{1 - G(t-i)}{1 - G(t-i-1)} & \text{if } j = i, \\[1ex] \dfrac{G(t-i) - G(t-i-1)}{1 - G(t-i-1)} & \text{if } j = t, \\[1ex] 0 & \text{otherwise.} \end{cases}$$
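As a minimal sketch of these transition probabilities, the snippet below evaluates the "stay" and "change" cases for a given $G$. The geometric segment-length distribution and the names `geometric_cdf` and `transition_probs` are illustrative assumptions, not part of the model above; any distribution function with $G(0) = 0$ could be substituted.

```python
def geometric_cdf(p):
    """Return G(s) = 1 - (1 - p)**s, an assumed geometric segment-length CDF."""
    def G(s):
        return 0.0 if s <= 0 else 1.0 - (1.0 - p) ** s
    return G


def transition_probs(G, t, i):
    """Return (p(C_{t+1} = i | C_t = i), p(C_{t+1} = t | C_t = i))."""
    at_least = 1.0 - G(t - i - 1)                 # P(segment length >= t - i)
    stay = (1.0 - G(t - i)) / at_least            # segment continues past t
    change = (G(t - i) - G(t - i - 1)) / at_least  # segment ends at t
    return stay, change


G = geometric_cdf(0.1)
stay, change = transition_probs(G, t=10, i=4)
```

Under the geometric assumption the two probabilities do not depend on $t - i$ (here they reduce to $0.9$ and $0.1$), which reflects the memoryless property; other choices of $G$ would give probabilities that vary with the time since the last changepoint.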

This hidden process partitions the time interval into contiguous, non-overlapping segments. Using this partition we want to define the likelihood of the observations conditional on the process, $p(y_{1:n} \mid c_{1:n})$ in (2.3.2). To make the model tractable, so that we can write down a set of recursions, we assume conditional independence between segments:

$$p(y_{1:t} \mid C_t = j) = p(y_{1:j} \mid C_t = j)\, p(y_{(j+1):t} \mid C_t = j).$$

We can then define, for all $t \le s$,

$$P(t, s) = p(y_{t:s} \mid C_s = t-1). \tag{2.3.4}$$

Conditional on the hidden states $c_{1:n}$, the likelihood is

$$p(y_{1:n} \mid c_{1:n}) = \prod_{t=1}^{n} p(y_t \mid c_{1:n}, y_{1:t-1}) = \prod_{t=1}^{n} p(y_t \mid c_t, y_{(c_t+1):(t-1)}). \tag{2.3.5}$$

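The terms in (2.3.5) can be written as

$$p(y_t \mid c_t, y_{(c_t+1):(t-1)}) = \frac{P(c_t+1, t)}{P(c_t+1, t-1)},$$

where the denominator is taken to be 1 when $c_t = t-1$, since the conditioning set is then empty.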

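We now derive the main set of recursions, which enable us to calculate the exact posterior numerically. There are two separate cases. In the first, $j < t$:

$$\begin{aligned} p(C_{t+1} = j \mid y_{1:(t+1)}) &\propto p(y_{t+1} \mid y_{1:t}, C_{t+1} = j)\, p(C_{t+1} = j \mid y_{1:t}) \\ &= \frac{P(j+1, t+1)}{P(j+1, t)} \Pr(C_{t+1} = j \mid C_t = j)\, p(C_t = j \mid y_{1:t}). \end{aligned}$$

In the second, $j = t$:

$$\begin{aligned} p(C_{t+1} = t \mid y_{1:(t+1)}) &\propto p(y_{t+1} \mid y_{1:t}, C_{t+1} = t)\, p(C_{t+1} = t \mid y_{1:t}) \\ &= P(t+1, t+1) \sum_{i=0}^{t-1} \Pr(C_{t+1} = t \mid C_t = i)\, p(C_t = i \mid y_{1:t}). \end{aligned}$$

If we define

$$w_{t+1}^{(j)} = \begin{cases} \dfrac{P(j+1, t+1)}{P(j+1, t)} & \text{if } j < t, \\[1ex] P(t+1, t+1) & \text{if } j = t, \end{cases}$$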

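then we can rewrite the set of recursions above more simply as

$$p(C_{t+1} = j \mid y_{1:(t+1)}) \propto \begin{cases} w_{t+1}^{(j)}\, \dfrac{1 - G(t-j)}{1 - G(t-j-1)}\, p(C_t = j \mid y_{1:t}) & \text{if } j < t, \\[1ex] w_{t+1}^{(t)} \displaystyle\sum_{i=0}^{t-1} \dfrac{G(t-i) - G(t-i-1)}{1 - G(t-i-1)}\, p(C_t = i \mid y_{1:t}) & \text{if } j = t. \end{cases} \tag{2.3.7}$$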

Rewriting the recursions in the form shown in (2.3.7) enables us to calculate the posterior distribution of $C_{t+1}$ by propagating the posterior for $C_t$ and adding one further support point for a changepoint at time $t$.
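The recursion in (2.3.7) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the segment model (observations $N(\mu, 1)$ with a $N(0, \tau^2)$ prior on each segment mean), the geometric $G$, the point-mass initial distribution $C_1 = 0$, the use of the same transition formula for $i = 0$, and the names `seg_log_marginal`, `filter_changepoints` and `tau2` are all hypothetical choices made for the example.

```python
import math


def seg_log_marginal(y, a, b, tau2):
    """log P(a, b) for y_a..y_b (1-indexed, inclusive) under the assumed
    model y ~ N(mu, 1), mu ~ N(0, tau2), independently per segment."""
    seg = y[a - 1:b]
    m = len(seg)
    if m == 0:
        return 0.0                       # convention: P(a, a-1) = 1
    s = sum(seg)
    q = sum(v * v for v in seg)
    return (-0.5 * m * math.log(2 * math.pi)
            - 0.5 * math.log(1 + m * tau2)
            - 0.5 * (q - tau2 * s * s / (1 + m * tau2)))


def filter_changepoints(y, G, tau2=1.0):
    """Return post with post[t-1][j] = p(C_t = j | y_{1:t}) for j = 0..t-1."""
    post = [[1.0]]                       # assumed start: C_1 = 0 with probability 1
    for t in range(1, len(y)):           # step from C_t to C_{t+1}
        prev, new, change_mass = post[-1], [], 0.0
        for j in range(t):               # existing support points, j < t
            at_least = 1.0 - G(t - j - 1)
            stay = (1.0 - G(t - j)) / at_least
            change = (G(t - j) - G(t - j - 1)) / at_least
            w = math.exp(seg_log_marginal(y, j + 1, t + 1, tau2)
                         - seg_log_marginal(y, j + 1, t, tau2))
            new.append(w * stay * prev[j])
            change_mass += change * prev[j]
        # new support point j = t: a changepoint at time t
        w_t = math.exp(seg_log_marginal(y, t + 1, t + 1, tau2))
        new.append(w_t * change_mass)
        total = sum(new)
        post.append([v / total for v in new])   # normalise the filtering weights
    return post


def G(s, p=0.1):
    """Assumed geometric segment-length CDF, with G(0) = 0."""
    return 1.0 - (1.0 - p) ** s


y = [0.1, -0.2, 0.0, 0.1, 5.0, 5.2, 4.9, 5.1]
post = filter_changepoints(y, G, tau2=10.0)
```

In this toy series the mean shifts after the fourth observation, so most of the mass in `post[-1]` falls on $j = 4$. The sketch recomputes each segment likelihood from scratch and works with plain probabilities; for long series one would typically work on the log scale.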

For many simple models, such as a change in mean, the weights $w_{t+1}^{(j)}$ can be calculated efficiently, as they depend on the data only through summary statistics of the observations $y_{(j+1):t}$. These summaries can often be calculated and stored before we begin calculating the recursions, and then updated recursively. Indeed, for such models the computational cost of calculating any such $w_{t+1}^{(j)}$ is fixed and does not increase with $t - j$.
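One way such summaries might be stored is as running sums. The sketch below assumes, purely for illustration, a change-in-mean model with observations $N(\mu, 1)$ and a $N(0, \tau^2)$ prior on each segment mean; the names `make_log_P` and `tau2` are hypothetical. Under that assumption $\log P(a, b)$ depends on the segment only through its length, sum and sum of squares, each available from two array lookups.

```python
import math
from itertools import accumulate


def make_log_P(y, tau2=1.0):
    """Precompute running sums and return log_P(a, b) = log P(a, b),
    with a and b 1-indexed and inclusive."""
    cs = [0.0] + list(accumulate(y))                  # cs[k] = y_1 + ... + y_k
    cq = [0.0] + list(accumulate(v * v for v in y))   # cq[k] = y_1^2 + ... + y_k^2

    def log_P(a, b):
        m = b - a + 1
        if m <= 0:
            return 0.0                                # convention: P(a, a-1) = 1
        s = cs[b] - cs[a - 1]                         # segment sum in O(1)
        q = cq[b] - cq[a - 1]                         # segment sum of squares in O(1)
        return (-0.5 * m * math.log(2 * math.pi)
                - 0.5 * math.log(1 + m * tau2)
                - 0.5 * (q - tau2 * s * s / (1 + m * tau2)))

    return log_P


y = [0.1, -0.2, 0.0, 0.1, 5.0, 5.2, 4.9, 5.1]
log_P = make_log_P(y, tau2=10.0)

# log of the weight w_{t+1}^{(j)} for j < t, here with j = 4 and t = 7
j, t = 4, 7
log_w = log_P(j + 1, t + 1) - log_P(j + 1, t)
```

Each call to `log_P` costs the same regardless of the segment length, which is the fixed-cost property described above.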

Simulation

Given that we calculate and store the filtering distributions $p(c_t \mid y_{1:t})$ for all $t = 1, \ldots, n$, simulating from the full joint posterior is straightforward. This is done backwards in time, by first simulating the last changepoint in the data, $c_n$, and then repeating the step for each earlier changepoint until we reach the beginning of the data.

To simulate one realisation from this joint density:

1. Set $t_0 = n$ and $k = 0$.

2. Simulate $t_{k+1}$ from the filtering density $p(C_{t_k} \mid y_{1:t_k})$, and set $k = k+1$.

3. If $t_k > 0$, return to (2); otherwise output the set of simulated changepoints, $t_{k-1}, t_{k-2}, \ldots, t_1$.
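The three steps above can be sketched as follows. The function name `simulate_changepoints` and the layout of `post` (a list in which `post[t-1][j]` holds $p(C_t = j \mid y_{1:t})$) are assumptions made for this example; the tiny `post` table is made up so that the draw is deterministic.

```python
import random


def simulate_changepoints(post, rng=random):
    """Draw one set of changepoints by following steps 1 to 3 as written.

    post[t-1][j] is assumed to hold p(C_t = j | y_{1:t}) for j = 0..t-1.
    """
    t = len(post)                        # step 1: t_0 = n
    changepoints = []
    while t > 0:
        probs = post[t - 1]
        # step 2: draw the most recent changepoint before time t
        t = rng.choices(range(len(probs)), weights=probs)[0]
        if t > 0:                        # step 3: a draw of 0 means the first segment
            changepoints.append(t)
    return changepoints[::-1]            # report in increasing time order


# Hypothetical filtering distributions for n = 4 with all mass on one value.
post = [
    [1.0],
    [1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
cps = simulate_changepoints(post)
```

With this table the draw at $t = 4$ returns 2 and the draw at $t = 2$ returns 0, so the single simulated changepoint is at time 2. Repeated calls on real filtering output give independent realisations from the sampler described by the steps.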

A simple extension of this algorithm allows for efficient simulation of a large sample of realisations of sets of changepoints in a parallel manner. This is described in more detail in Fearnhead [2006].