4.7 Pruning Changepoint Vectors
4.7.2 Subset Pruning
We have seen how retrospective pruning can be used to remove previous changepoint vectors from future consideration. However, suppose we are at some current time-point $\tau^*$ within the algorithm. This method of pruning does not prune any of the $c_{\tau^*} \in \bar{C}_{\tau^*}$, each of which has to be considered at $\tau^*$. Pruning these vectors would reduce the number of vectors $c_{\tau^*} \in \bar{C}_{\tau^*}$ for which $h_{c_{\tau^*}}(c)$ has to be calculated for each $c \in C_{\tau^*-1}(c_{\tau^*})$. Within this section we introduce further theory which allows for the pruning of such vectors at each time-point $\tau^*$, which we refer to herein as subset pruning.
CHAPTER 4. MULTIVARIATE CHANGEPOINT DETECTION 102
Before continuing, we define some new notation in order to accommodate this theory. We use $f_j(t)$ to denote the minimum cost from time 0 up to time $t$ in variable $j$, including the $\alpha$ penalties but not the $\beta$ penalties. We exclude the $\beta$ penalties because $f_j(t)$ represents a univariate cost, whereas $\beta$ represents a multivariate penalty. Also, recall that for some changepoint vector $c \in C_n$, $M(c)$ is the number of changepoint locations occurring in any variable up to and including those in $c$. Hence, for some changepoint vector $(t_1, t_2, \ldots, t_p)$, we can decompose $F(\cdot)$ as follows:
$$F(t_1, t_2, \ldots, t_p) = \sum_{j=1}^{p} f_j(t_j) + \beta M(t_1, t_2, \ldots, t_p).$$
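The decomposition above can be illustrated with a minimal Python sketch. It assumes the univariate costs $f_j(t_j)$ and the count $M(t_1, \ldots, t_p)$ have already been computed elsewhere. The function name `decomposed_cost` and the numbers used are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def decomposed_cost(univariate_costs: Sequence[float], beta: float, num_locations: int) -> float:
    """Return sum_j f_j(t_j) + beta * M(t_1, ..., t_p).

    univariate_costs: assumed to hold f_j(t_j) for j = 1, ..., p
                      (alpha penalties included, beta penalties excluded).
    beta:             the multivariate penalty.
    num_locations:    assumed to hold M(t_1, ..., t_p).
    """
    return sum(univariate_costs) + beta * num_locations


# Illustrative values only: p = 3 variables, beta = 2, M = 4.
total = decomposed_cost([3.0, 4.5, 2.5], beta=2.0, num_locations=4)
print(total)  # 10.0 from the univariate costs plus 8.0 from the beta penalties
```

Keeping the $\beta$ term separate from the per-variable costs mirrors the reason given above for excluding it from $f_j(t)$.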
Further, for a given $J \in \{1, \ldots, p\}$, we use $\bar{C}^J_{\tau^*}$ to denote the distinct subsets of $\bar{C}_{\tau^*}$ such that $\bar{C}^J_{\tau^*}$ contains only the $c_{\tau^*} \in \bar{C}_{\tau^*}$ which have $J$ variables changing at time $\tau^*$, so that $\sum_{j=1}^{p} I(c^j_{\tau^*} = \tau^*) = J$. This can be expressed by
$$\bar{C}^J_{\tau^*} = \left\{ c_{\tau^*} \in \bar{C}_{\tau^*} : \sum_{j=1}^{p} I(c^j_{\tau^*} = \tau^*) = J \right\}. \tag{4.7.6}$$
Note that $\bar{C}^p_{\tau^*} = \{(\tau^*, \tau^*, \ldots, \tau^*)\}$. For ease of notation, we define $P$ to be the set of all variables, so that $P = \{1, \ldots, p\}$.
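As a rough illustration of (4.7.6), the following Python sketch groups a collection of candidate vectors by the number of components equal to $\tau^*$. It assumes each changepoint vector is represented as a tuple of integer time-points, one per variable. That representation and the names `count_changing` and `partition_by_count` are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Vector = Tuple[int, ...]  # assumed representation: one time-point per variable


def count_changing(c: Vector, tau_star: int) -> int:
    """Number of variables j with c^j equal to tau_star (the indicator sum in (4.7.6))."""
    return sum(1 for c_j in c if c_j == tau_star)


def partition_by_count(candidates: Iterable[Vector], tau_star: int) -> Dict[int, List[Vector]]:
    """Group candidate vectors by J, the number of variables changing at tau_star."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for c in candidates:
        groups[count_changing(c, tau_star)].append(c)
    return dict(groups)


# Illustrative candidates for p = 3 and tau_star = 10.
candidates = [(10, 10, 10), (10, 7, 10), (4, 10, 7), (10, 10, 3)]
groups = partition_by_count(candidates, tau_star=10)
# groups[3] holds only (10, 10, 10), matching the note on the J = p subset.
```

Here `groups[2]` contains `(10, 7, 10)` and `(10, 10, 3)`, and `groups[1]` contains `(4, 10, 7)`.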
The motivation behind subset pruning is the consideration of the following scenario. Suppose that we have some $p$-variate series $X$ of length $n$, time-points $w$ and $\tau^*$ such that $\tau^* < w$, and some $c_w \in \bar{C}_w$. Suppose further that we make the assumption that the minimum cost to $c_w$ from the changepoint vector $(\tau^*, \tau^*, \ldots, \tau^*)$ is lower than the minimum cost from all changepoint vectors $c_J \in \bar{C}^J_{\tau^*}$, for some $J \in P$ with [...] $(\tau^*, \tau^*, \ldots, \tau^*)$ to $c_w$ is lower than the minimum cost from all $c_i \in \bar{C}^i_{\tau^*}$, for $i < J$, to $c_w$. If such a property holds true, then this would allow for the pruning of different subsets of affected variables, depending on the number of variables they contain which are changing at $\tau^*$.
We will see in the following proposition that this characteristic does indeed hold under certain conditions. Before examining this result, it is necessary to introduce some further notation. For a given time-point $\tau^*$ and changepoint vector $c_{\tau^*}$, define $P_{\tau^*}(c_{\tau^*})$ to be the set of variable indices of $c_{\tau^*}$ such that $c^j_{\tau^*} = \tau^*$, so that $|P_{\tau^*}(c_J)| = J$ for each $c_J \in \bar{C}^J_{\tau^*}$. That is,
$$P_{\tau^*}(c_{\tau^*}) = \left\{ j \in P : c^j_{\tau^*} = \tau^* \right\}. \tag{4.7.7}$$
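The index set in (4.7.7) can be sketched in Python as follows. This is a minimal illustration that assumes a changepoint vector is stored as a tuple of time-points and that variables are numbered from 1, as in $P = \{1, \ldots, p\}$. The function name `changing_variables` is hypothetical.

```python
from typing import Set, Tuple

Vector = Tuple[int, ...]  # assumed representation: one time-point per variable


def changing_variables(c: Vector, tau_star: int) -> Set[int]:
    """Return the 1-based indices j for which c^j equals tau_star, as in (4.7.7)."""
    return {j for j, c_j in enumerate(c, start=1) if c_j == tau_star}


# Variables 1 and 3 change at tau_star = 10, so the set has size J = 2.
indices = changing_variables((10, 7, 10), tau_star=10)
print(indices)  # {1, 3}
```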
Finally, for a given $c_{\tau^*} \in \bar{C}^{J^*}_{\tau^*}$, for $J < J^*$ define the following set:
$$E^J_{\tau^*}(c_{\tau^*}) = \left\{ c \in \bar{C}^J_{\tau^*} : c^j \le c^j_{\tau^*} \ \forall\, j \in P \right\}, \tag{4.7.8}$$
so that $E^J_{\tau^*}(c_{\tau^*})$ is the set of previous time-point vectors which are ‘viable’ for being changepoint vectors prior to $c_{\tau^*}$. Proposition 4.7.2 establishes that, under certain conditions regarding the changepoint vectors with one variable changing at some time-point $\tau^*$, we can prune the changepoint vectors which have $i$ variables changing at $\tau^*$.
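The set in (4.7.8) amounts to a componentwise comparison, which the following Python sketch illustrates. It assumes the members of the $J$-variable subset are supplied as a list of tuples that has already been built elsewhere. The function name `viable_set` and the example vectors are hypothetical.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[int, ...]  # assumed representation: one time-point per variable


def viable_set(candidates_J: Iterable[Vector], c_ref: Vector) -> List[Vector]:
    """Keep the candidates c with c^j <= c_ref^j for every variable j, as in (4.7.8).

    candidates_J is assumed to contain only vectors with J variables changing
    at tau_star, where J is smaller than the corresponding count for c_ref.
    """
    return [
        c for c in candidates_J
        if all(c_j <= ref_j for c_j, ref_j in zip(c, c_ref))
    ]


# Reference vector with two variables changing at tau_star = 10.
c_ref = (10, 10, 7)
# Illustrative candidates with one variable changing at tau_star = 10.
candidates_1 = [(10, 5, 7), (3, 10, 7), (10, 5, 9), (2, 4, 10)]
viable = viable_set(candidates_1, c_ref)
# (10, 5, 9) and (2, 4, 10) are dropped: their third component exceeds 7.
```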
Proposition 4.7.2. Suppose that for some $J \in \{1, \ldots, p\}$ and each $c_J \in \bar{C}^J_{\tau^*}$, we have for every $c_{J-1} \in \left\{ E^{J-1}_{\tau^*}(c_J) : c^j_{J-1} = c^j_J \ \forall\, j \in P \setminus P_{\tau^*}(c_J) \right\}$ that
$$h_{c_w}(c_J) < h_{c_w}(c_{J-1}) \tag{4.7.9}$$
for some future vector $c_w \in \bar{C}_w$, where $w > \tau^*$.
Suppose further that we have changepoint vectors $c_{J-1,j^*_1}, c_{J-1,j^*_2}, \ldots, c_{J-1,j^*_i} \in E^{J-1}_{\tau^*}(c_J)$ such that for each $x = 1, \ldots, i$, we have $c^{j^*_x}_{J-1,j^*_x} = t_{j^*_x}$ and $c^{j^*_x}_J = \tau^*$ (with $t_{j^*_x} < \tau^*$), and $c^j_{J-1,j^*_x} = c^j_J$ for all $j \in P \setminus P_{\tau^*}(c_J)$.
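Then if it holds that $(i - 1) M(c_J) \ge \sum_{x=1}^{i} M(c_{J-1,j^*_x})$, we have
$$h_{c_w}(c_J) < h_{c_w}(c_{J-i}) \tag{4.7.10}$$
for every $c_{J-i} \in \left\{ E^{J-i}_{\tau^*}(c_J) : c^j_{J-i} = c^j_J \ \forall\, j \in P \setminus P_{\tau^*}(c_J) \right\}$, $i = 2, \ldots, J - 1$.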
Proof. See Appendix A.3 for a full proof.
Proposition 4.7.2 implies that, when its conditions hold, we do not need to calculate $h_{c_w}(c_{J-i})$ for any of these $c_{J-i}$. Hence, these $c_{J-i}$ can be ‘pruned’ from our considerations for $c_w$. When the conditions do not hold, it is not necessarily true that $h_{c_w}(c_J) < h_{c_w}(c_{J-i})$, and so we are not able to use such an inequality for pruning purposes.
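Two ingredients of Proposition 4.7.2 lend themselves to a short Python sketch: the restricted candidate set over which (4.7.10) is stated, and the counting condition on $M$. The sketch assumes vectors are stored as tuples of time-points and that the values of $M(\cdot)$ are computed elsewhere. It does not check the cost inequality (4.7.9), which would also be needed before pruning. The names `restricted_viable` and `count_condition_holds` and all numbers are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Vector = Tuple[int, ...]  # assumed representation: one time-point per variable


def restricted_viable(candidates: Iterable[Vector], c_J: Vector, tau_star: int) -> List[Vector]:
    """Candidates c with c^j <= c_J^j for all j, and c^j == c_J^j for every
    variable j that does not change at tau_star in c_J."""
    kept = []
    for c in candidates:
        viable = all(a <= b for a, b in zip(c, c_J))
        agrees = all(a == b for a, b in zip(c, c_J) if b != tau_star)
        if viable and agrees:
            kept.append(c)
    return kept


def count_condition_holds(m_cJ: int, m_reduced: Sequence[int]) -> bool:
    """Check (i - 1) * M(c_J) >= sum of M over the i reduced vectors.

    m_cJ:      assumed value of M(c_J).
    m_reduced: assumed values of M for the i vectors c_{J-1, j*_x}.
    """
    i = len(m_reduced)
    return (i - 1) * m_cJ >= sum(m_reduced)


# Illustrative check: variables 1 to 3 change at tau_star = 10 in c_J.
c_J = (10, 10, 10, 4)
lower = [(10, 6, 8, 4), (2, 10, 5, 4), (10, 3, 3, 2)]
targets = restricted_viable(lower, c_J, tau_star=10)  # last vector differs in variable 4
ok = count_condition_holds(6, [4, 4, 3])  # (3 - 1) * 6 = 12 >= 11
```

In this sketch, `targets` would only become candidates for pruning if `ok` is true and the inequality (4.7.9) has also been verified for the relevant vectors.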