Functional Pruning - Optimal Partitioning for Piecewise Linear Data

4.3 Optimal Partitioning for Piecewise Linear Data

4.3.2 Functional Pruning

Our algorithm relies on computing and storing cost functions for each possible changepoint vector, $\tau$. This is computationally expensive, so we utilise pruning methods to discard potential changepoint

CHAPTER 4. OPTIMAL CHANGEPOINT DETECTION FOR PIECEWISE LINEAR DATA 68

vectors which can be shown to never be optimal. This results in fewer functions to update at each time step, increasing the overall efficiency of the algorithm.

One way to prune candidate changepoint vectors from the minimisation problem is to discard those that can be shown to be dominated by other vectors for every value of $\phi$. Similar approaches are used by Rigaill (2015) and Maidstone et al. (2016) for independent segment models, where the technique is known as functional pruning.

In Theorem 3 we show that if a candidate changepoint vector $\tau$ is not optimal at time $s$ for any value of $\phi$, then the related candidate changepoint vector $(\tau, s)$ (the concatenation of $\tau$ and $s$) is not optimal for any value of $\phi$ at any time $t > s$. In this case, the vector $(\tau, s)$ can be pruned from the candidate changepoint set.

First we define the set $\mathcal{T}^*_t$ as the set of changepoint vectors that are optimal for some $\phi$ at time $t$,

\[ \mathcal{T}^*_t = \left\{ \tau \in \mathcal{T}_t : f^t(\phi) = f^t_{\tau}(\phi) \text{ for some } \phi \in (-\infty, \infty) \right\}, \tag{4.9} \]

where $\mathcal{T}_t$ is the set of all possible changepoint vectors at time $t$. If a candidate vector $\tau$ is not in this set at time $s$ then the related candidate vector $(\tau, s)$ is not in the set at time $t$. This means that at time $t$ we will need to store only the functions $f^t_{\tau}(\phi)$ corresponding to segmentations that are in $\mathcal{T}^*_t$. Empirically we show in Section 4.4.1 that, for most data sets, the size of $\mathcal{T}^*_t$ remains roughly constant as we increase $t$.

Theorem 3 If $\tau \notin \mathcal{T}^*_s$ then $(\tau, s) \notin \mathcal{T}^*_t$ for all $t > s$.

The proof of Theorem 3 works by contrapositive. We show that if $(\tau, s) \in \mathcal{T}^*_t$ then a necessary condition of this is that $\tau \in \mathcal{T}^*_s$; taking the contrapositive of this gives Theorem 3.

Proof. Assume $(\tau, s) \in \mathcal{T}^*_t$. Then, by definition (4.9), there exists $\phi$ such that

\[ f^t(\phi) = f^t_{(\tau, s)}(\phi). \]


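Now for any $\phi^*$,

\begin{align*}
f^s(\phi^*) + C(y_{s+1:t}, \phi^*, \phi) + \beta &\ge \min_{\phi', r} \left[ f^r(\phi') + C(y_{r+1:t}, \phi', \phi) + \beta \right] \\
&= f^t(\phi) \\
&= f^t_{(\tau, s)}(\phi) \\
&= \min_{\phi''} \left\{ f^s_{\tau}(\phi'') + C(y_{s+1:t}, \phi'', \phi) + \beta \right\} \tag{4.10} \\
&= f^s_{\tau}(\phi_A) + C(y_{s+1:t}, \phi_A, \phi) + \beta,
\end{align*}

where $\phi_A$ is the value of $\phi''$ which minimises (4.10). As $\phi^*$ can be chosen as any value, we can choose it as $\phi_A$. By cancelling terms we get $f^s(\phi_A) \ge f^s_{\tau}(\phi_A)$. Since (4.7) defines $f^s$ as a minimum over changepoint vectors, we also have $f^s(\phi_A) \le f^s_{\tau}(\phi_A)$, hence $f^s(\phi_A) = f^s_{\tau}(\phi_A)$ and therefore $\tau \in \mathcal{T}^*_s$. We have shown that if $(\tau, s) \in \mathcal{T}^*_t$ then $\tau \in \mathcal{T}^*_s$; by taking the contrapositive the theorem holds.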

The key to an efficient algorithm will be a way of efficiently calculating $\mathcal{T}^*_t$. We can use the above theorem to help us do this. From Theorem 3 we can define a set

\[ \hat{\mathcal{T}}_t = \left\{ (\tau, s) : s \in \{0, \ldots, t-1\},\ \tau \in \mathcal{T}^*_s \right\}, \tag{4.11} \]

and we will have that $\hat{\mathcal{T}}_t \supseteq \mathcal{T}^*_t$. So assume that we have calculated the sets $\mathcal{T}^*_s$ for $s = 0, \ldots, t-1$. We can calculate $f^t_{\tau}(\phi)$ only for $\tau \in \hat{\mathcal{T}}_t$. When calculating $f^t(\phi)$, as defined by (4.7), we can just minimise over the set of changepoint vectors in $\hat{\mathcal{T}}_t$ rather than the full set. Furthermore we can calculate which of the sets of changepoints in $\hat{\mathcal{T}}_t$ contribute to this minimum and remove those that do not contribute. The remaining sets of changepoints define $\mathcal{T}^*_t$.
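As an illustration of definition (4.11), the following minimal Python sketch builds the candidate set from previously stored optimal sets. It is not taken from the thesis: the function name `candidate_set`, the representation of a changepoint vector as a tuple, and the convention that the set at time 0 contains only the empty vector are all assumptions made for the example.

```python
def candidate_set(optimal_sets, t):
    """Build the candidate set of (4.11) at time t.

    optimal_sets is assumed to map each time s to the stored set T*_s,
    with changepoint vectors represented as tuples (illustrative choice).
    """
    # Concatenate each surviving vector tau in T*_s with its time s.
    return {tau + (s,) for s in range(t) for tau in optimal_sets[s]}


# Hypothetical stored sets for s = 0, 1, 2 (made-up values for illustration).
stored = {0: {()}, 1: {(0,)}, 2: {(0,), (0, 1)}}
candidates = candidate_set(stored, 3)
```

Only vectors that survived pruning at some earlier time are extended, which is what keeps the candidate set much smaller than the full set of changepoint vectors.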

To find out which sets of changepoints, $\tau$, contribute to the minimisation of (4.7) we store the interval (or set of intervals) of the parameter space for which it is optimal. We define this interval as follows

\[ \mathrm{Int}^t_{\tau} = \left\{ \phi : f^t_{\tau}(\phi) = \min_{\tau' \in \hat{\mathcal{T}}_t} f^t_{\tau'}(\phi) \right\}. \tag{4.12} \]

For a given $t$ the union of these intervals over $\tau$ is just the real line (as for a given $\phi$ at least one changepoint vector $\tau$ corresponds to the optimal segmentation). Using this we can derive a simple algorithm for updating these intervals. We initialise the algorithm by setting the current parameter value as $\phi_{curr} = -\infty$ and comparing the cost functions in our current set of candidates (which we initialise


where $f^t_{\tau}$ next intercepts with $f^t_{\tau_{curr}}$ (the smallest value of $\phi$ for which $f^t_{\tau}(\phi) = f^t_{\tau_{curr}}(\phi)$ and $\phi > \phi_{curr}$) and store this as $x_{\tau}$. If for a $\tau \in \mathcal{T}_{temp}$ we have $x_{\tau} = \emptyset$ (i.e. $f^t_{\tau}$ doesn't intercept with $f^t_{\tau_{curr}}$ for any $\phi > \phi_{curr}$) then we remove $\tau$ from $\mathcal{T}_{temp}$. We take the minimum of $x_{\tau}$ (the first of the intercepts) and set it as our new $\phi_{curr}$ and the corresponding changepoint vector that produces it as $\tau_{curr}$. We repeat this procedure until the set $\mathcal{T}_{temp}$ consists of only a single value $\tau_{curr}$ which is the optimal segmentation for all future $\phi > \phi_{curr}$. This method is given in full in Algorithm 5.
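The sweep over $\phi$ described above can be sketched in Python as follows. This is a hedged illustration, not a transcription of Algorithm 5: it assumes each cost function is a quadratic stored as a coefficient tuple `(a, b, c)` meaning $a\phi^2 + b\phi + c$, that the starting vector is the one with the lowest cost as $\phi \to -\infty$, and that no two functions are tangent or cross at exactly the same point. The names `crossings` and `optimal_intervals` are hypothetical.

```python
import math


def crossings(f, g):
    """Sorted real roots of f - g, for quadratics given as (a, b, c) tuples."""
    da, db, dc = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if da == 0:
        # The difference is linear (or constant): at most one crossing.
        return [] if db == 0 else [-dc / db]
    disc = db * db - 4 * da * dc
    if disc <= 0:
        # No crossing; a tangency is ignored in this sketch.
        return []
    root = math.sqrt(disc)
    return sorted([(-db - root) / (2 * da), (-db + root) / (2 * da)])


def optimal_intervals(costs):
    """Map each candidate to the list of phi-intervals on which it is minimal.

    costs maps a candidate label (e.g. a changepoint vector) to (a, b, c).
    """
    temp = set(costs)
    # As phi -> -inf the smallest a wins, then the largest b, then the smallest c.
    curr = min(temp, key=lambda k: (costs[k][0], -costs[k][1], costs[k][2]))
    phi_curr = -math.inf
    intervals = {k: [] for k in costs}
    while True:
        next_cross = {}
        for k in list(temp - {curr}):
            later = [x for x in crossings(costs[k], costs[curr]) if x > phi_curr]
            if later:
                next_cross[k] = later[0]   # first intercept beyond phi_curr
            else:
                temp.discard(k)            # never beats curr again: drop it
        if not next_cross:
            # curr stays optimal for every remaining value of phi.
            intervals[curr].append((phi_curr, math.inf))
            return intervals
        k_new = min(next_cross, key=next_cross.get)
        intervals[curr].append((phi_curr, next_cross[k_new]))
        phi_curr, curr = next_cross[k_new], k_new


# Made-up example: "A" is phi^2, "D" is 3*phi^2 - 2, "C" is phi^2 + 5.
result = optimal_intervals({"A": (1, 0, 0), "D": (3, 0, -2), "C": (1, 0, 5)})
```

In this example "C" lies above "A" everywhere, so it receives an empty list of intervals, while "A" is optimal on two separate intervals. This is why a set of intervals, rather than a single interval, may need to be stored for each candidate.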

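Having calculated $\mathrm{Int}^t_{\tau}$ for all $\tau \in \hat{\mathcal{T}}_t$ we can use these to calculate $\mathcal{T}^*_t$. We remove $\tau$ from $\hat{\mathcal{T}}_t$ if $\mathrm{Int}^t_{\tau} = \emptyset$ and after doing this for all $\tau \in \hat{\mathcal{T}}_t$ we are left with precisely those values of $\tau$ which make up $\mathcal{T}^*_t$. This is used to recursively calculate $\hat{\mathcal{T}}_{t+1}$

\[ \hat{\mathcal{T}}_{t+1} = \hat{\mathcal{T}}_t \cup \left\{ (\tau, t) : \tau \in \mathcal{T}^*_t \right\}. \tag{4.13} \]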
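The pruning step and the recursion (4.13) can be sketched together in Python. This is an illustrative sketch under stated assumptions rather than the thesis implementation: changepoint vectors are represented as tuples, the optimality region of each vector is a (possibly empty) list of intervals, and the names `prune_candidates` and `next_candidates` are hypothetical.

```python
import math


def prune_candidates(intervals):
    """Keep the vectors whose optimality region (4.12) is non-empty."""
    return {tau for tau, region in intervals.items() if region}


def next_candidates(hat_t, star_t, t):
    """Recursion (4.13): extend each surviving vector with a change at time t."""
    return set(hat_t) | {tau + (t,) for tau in star_t}


# Made-up optimality regions at time t = 3; (0, 1) is never optimal.
regions = {
    (0,): [(-math.inf, 1.0)],
    (0, 1): [],
    (0, 2): [(1.0, math.inf)],
}
star_3 = prune_candidates(regions)
hat_4 = next_candidates(regions.keys(), star_3, 3)
```

Here `(0, 1)` is pruned, so no vector of the form `(0, 1, 3)` is created, which is the saving that Theorem 3 justifies.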

The computational cost for updating the intervals $\mathrm{Int}^t_{\tau}$ for all $\tau \in \hat{\mathcal{T}}_t$ depends on the number of elements in $\hat{\mathcal{T}}_t$ and the number of intervals which the real line is partitioned into by $\mathrm{Int}^t_{\tau}$. The size of this partitioning is bounded above by $2|\mathcal{T}^*_t|(|\mathcal{T}^*_t| - 1) + 1$ (as $|\mathcal{T}^*_t|$ is the number of quadratics that partition the space and the bound is given by the number of possible intersections of these quadratics). Further the size of set $\hat{\mathcal{T}}_t$ is $t$ times the size of $\mathcal{T}^*_{t-1}$ (from the definition of $\hat{\mathcal{T}}_t$ in equation (4.11)) and so the overall computational cost for updating the intervals is bounded by $O(|\mathcal{T}^*_t|^2 \times t|\mathcal{T}^*_{t-1}|)$.

In Section 4.4.1 we will show that, in practice, $|\mathcal{T}^*_t|$ is roughly constant. This means that empirically the computational cost for updating $\mathrm{Int}^t_{\tau}$ for all $\tau \in \hat{\mathcal{T}}_t$ is $O(n)$.
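To make the last step explicit, the following short derivation spells out how a constant bound on the number of optimal vectors gives a cost that is linear in $t$. The constant $K$ is a symbol introduced here purely for illustration, and the bound on it is the empirical assumption from Section 4.4.1, not a proven result.

```latex
% Assumption (empirical): |T*_s| <= K for all s, with K not depending on t.
\begin{align*}
|\hat{\mathcal{T}}_t| &\le \sum_{s=0}^{t-1} |\mathcal{T}^*_s| \le tK
  && \text{(from (4.11))} \\
\#\{\text{intervals}\} &\le 2K(K-1) + 1 = O(K^2) \\
\text{cost at time } t &= O(K^2 \times tK) = O(K^3 t) = O(t) \le O(n)
  && \text{for } t \le n .
\end{align*}
```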

In document Efficient analysis of complex changepoint problems (Page 77-80)