Embarrassingly Parallel - Detecting abrupt changes in big data

Binary Segmentation (Scott and Knott, 1974; Vostrikova, 1981) and its variants, Circular Binary Segmentation (CBS; Olshen et al., 2004) and Wild Binary Segmentation (WBS; Fryzlewicz, 2014), fall into the category of embarrassingly parallel methods. That is, it is obvious how to split the algorithm into smaller independent subtasks.

5.2.1 Binary Segmentation

In BS we initially search the whole dataset for one changepoint, i.e., the point $\tau$ that satisfies the condition in (5.2) and also minimises the left-hand side of (5.2),

$$C(y_{1:\tau}) + C(y_{\tau+1:n}) + \beta < C(y_{1:n}), \qquad (5.2)$$

where $C(y_{s:t})$ is the cost of the data $y_s, \ldots, y_t$. The data are then split at $\tau$ and we search for a changepoint in the two new segments independently. This continues until no more changepoints are detected. Traditionally, the changepoint calculations are computed in a loop over all of the segments on one processor. The run time may be reduced by sending these calculations to multiple cores.
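The search-and-split procedure can be sketched in Python as follows. This is a minimal illustration and not the implementation timed in this chapter: it assumes a change in mean with the residual sum of squares as the cost $C$, and the names `best_split`, `binary_segmentation` and `map_fn` are hypothetical. Segments use Python's half-open indexing, so a returned value `tau` means the split is between `y[:tau]` and `y[tau:]`.

```python
import numpy as np


def best_split(task):
    """Search one segment y[s:e] for a single changepoint.

    Assumed cost: residual sum of squares around the segment mean.
    Returns (s, e, tau), with tau = None when condition (5.2) fails.
    """
    y, s, e, beta = task
    seg = y[s:e]
    n = len(seg)
    if n < 2:
        return s, e, None
    cs = np.cumsum(seg)          # running sums
    cs2 = np.cumsum(seg ** 2)    # running sums of squares
    k = np.arange(1, n)          # candidate left-segment lengths
    left = cs2[:-1] - cs[:-1] ** 2 / k
    right = (cs2[-1] - cs2[:-1]) - (cs[-1] - cs[:-1]) ** 2 / (n - k)
    whole = cs2[-1] - cs[-1] ** 2 / n
    split_cost = left + right
    i = int(np.argmin(split_cost))
    if split_cost[i] + beta < whole:     # condition (5.2)
        return s, e, s + i + 1
    return s, e, None


def binary_segmentation(y, beta, map_fn=map):
    """Binary Segmentation, one level of the recursion at a time.

    map_fn is the hook for parallelism: the built-in map runs the
    segment searches serially, while an executor's map method sends
    them to several workers.
    """
    y = np.asarray(y, dtype=float)
    changepoints = []
    active = [(0, len(y))]
    while active:
        # Note: every task carries the full array, which is wasteful
        # for process-based workers but keeps the sketch short.
        tasks = [(y, s, e, beta) for s, e in active]
        active = []
        for s, e, tau in map_fn(best_split, tasks):
            if tau is not None:
                changepoints.append(tau)
                active += [(s, tau), (tau, e)]
    return sorted(changepoints)
```

Calling `binary_segmentation(y, beta)` runs on one core. Passing, for example, the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` distributes the searches at each level, at the price of the scheduling and data-transfer overhead discussed in the text.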

In theory this should speed up the calculations. However, BS is $O(n \log n)$, so the overhead of scheduling the tasks and returning the results may mean that a parallel implementation is not worthwhile. Additionally, the speed-up will be more noticeable when there are a large number of changepoints: at later stages of the algorithm more segments must be searched for a change, so having multiple processors may be beneficial.

We explore the performance of parallelising BS in two examples: one in which the number of changes is constant with increasing data length, and one in which the number of changes increases with data length. In the first example we simulate the data from a blocks signal (Donoho and Johnstone, 1994), with $m = 11$ changepoints for all data lengths. The signal has added Gaussian noise with variance 1. We replicate this 100 times, and the average time to run BS with different numbers of cores is shown in Figure 5.1a. The main thing to note is that the time slowly increases with the number of cores. In this example there is no benefit from parallelising.

CHAPTER 5. PARALLEL CHANGEPOINT DETECTION

[Figure 5.1. Panel (a): Binary Segmentation Time on Blocks Data. Panel (b): Binary Segmentation Time on Random Data. Axes in both panels: Number of Cores, Time (s).]

Figure 5.1: Computational time taken to run Binary Segmentation in parallel over multiple cores. (a) The blocks signal with 11 changepoints for all data lengths. (b) The random data, with the number of changepoints increasing as the data length increases.

In the second example we simulate data sets in which the number of changes increases with data length: a data set of length $n$ has $m = n/100$ changes. We simulate the changepoints uniformly over time, with the constraint that they must be at least 20 time points apart. To simulate the data we generate the segment means from a Gaussian distribution with mean 0 and standard deviation 5. We replicate this 100 times, and the average time to run BS with different numbers of cores is shown in Figure 5.1b. This time there are marginal gains in speed when using 2 cores for larger data sets. However, these improvements are so small that parallelising is probably not worthwhile.
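One way such data could be generated is sketched below. Several details are assumptions rather than the set-up actually used: the function name `simulate_random_changes` is hypothetical, the noise standard deviation of 1 is borrowed from the first example, the spacing constraint is applied to the first and last segments as well, and the gap-insertion trick is only one of several ways to draw changepoints uniformly subject to a minimum spacing.

```python
import numpy as np


def simulate_random_changes(n, min_gap=20, mean_sd=5.0, noise_sd=1.0, seed=None):
    """Simulate n points with m = n // 100 changes in mean.

    Returns (y, tau), where tau holds the changepoint positions
    (the number of points before each change).
    """
    rng = np.random.default_rng(seed)
    m = n // 100

    # Draw m distinct positions from a shortened range, sort them, and
    # then push the i-th one right by (min_gap - 1) * i. Every one of
    # the m + 1 segments then has at least min_gap points.
    pool = n - min_gap - (min_gap - 1) * m
    u = np.sort(rng.choice(np.arange(1, pool + 1), size=m, replace=False))
    tau = u + (min_gap - 1) * np.arange(1, m + 1)

    # Segment means are N(0, mean_sd^2); noise_sd is an assumed value.
    bounds = np.concatenate(([0], tau, [n]))
    means = rng.normal(0.0, mean_sd, size=m + 1)
    signal = np.repeat(means, np.diff(bounds))
    y = signal + rng.normal(0.0, noise_sd, size=n)
    return y, tau
```

For instance, `simulate_random_changes(1000, seed=0)` returns a series of length 1000 with 10 changepoints, each segment being at least 20 points long.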

Variants of Binary Segmentation

Binary Segmentation lacks consistency due to its greedy nature. Fryzlewicz (2014) looks at the asymptotic properties of BS with the cumulative sum (CUSUM) test statistic (Page, 1954) and shows that, as the number of data points $n \to \infty$, BS is only asymptotically guaranteed to identify the true changepoints if the minimum segment length is $O(n^{3/4})$. Specifically, changepoints are likely to be missed if they are close to another changepoint. Fryzlewicz (2014) proposes a method, Wild Binary Segmentation (WBS), that aims to overcome the lack of consistency of BS. At each stage of BS, instead of calculating the global cost $C(y_{1:n})$, WBS randomly draws a number of sub-samples $y_{s:e}$, where $1 \le s < e \le n$, and detects a candidate changepoint within each sub-sample. The candidate with the overall minimum cost is taken as the new changepoint; the data are split there and the process is repeated, as in BS.

The number of sub-samples chosen at each stage will affect the overall computational cost of this method. To reduce the run time of WBS, the single-changepoint computations for the different sub-samples at each stage can be spread over multiple cores. Thus WBS is trivially parallel and will be more amenable to parallelisation than BS, since there are multiple calculations at every step of WBS that can be run simultaneously.
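A single stage of WBS might look like the sketch below. It is an illustration under stated assumptions, not the procedure of Fryzlewicz (2014) verbatim: the cost is the residual sum of squares for a change in mean, candidates from different sub-samples are compared by the reduction in cost that their best split achieves, and a candidate is accepted when that reduction exceeds the penalty $\beta$, mirroring (5.2). The names `draw_intervals`, `best_split_gain`, `wbs_stage` and `map_fn` are hypothetical.

```python
import numpy as np


def draw_intervals(s, e, n_intervals, rng):
    """Draw random half-open sub-samples [a, b) of [s, e) with b - a >= 2.

    Requires e - s >= 2.
    """
    starts = rng.integers(s, e - 1, size=n_intervals)   # a in s .. e-2
    ends = rng.integers(starts + 2, e + 1)              # b in a+2 .. e
    return [(int(a), int(b)) for a, b in zip(starts, ends)]


def best_split_gain(task):
    """Best single split of y[a:b]; returns (cost reduction, position)."""
    y, a, b = task
    seg = y[a:b]
    n = len(seg)
    cs = np.cumsum(seg)
    cs2 = np.cumsum(seg ** 2)
    k = np.arange(1, n)
    left = cs2[:-1] - cs[:-1] ** 2 / k
    right = (cs2[-1] - cs2[:-1]) - (cs[-1] - cs[:-1]) ** 2 / (n - k)
    whole = cs2[-1] - cs[-1] ** 2 / n
    gain = whole - (left + right)
    i = int(np.argmax(gain))
    return float(gain[i]), a + i + 1


def wbs_stage(y, intervals, beta, map_fn=map):
    """One WBS stage: search every sub-sample, keep the best candidate.

    The searches are independent, so map_fn may be an executor's map
    method to run them on several workers at once.
    """
    y = np.asarray(y, dtype=float)
    candidates = list(map_fn(best_split_gain, [(y, a, b) for a, b in intervals]))
    if not candidates:
        return None
    gain, tau = max(candidates)
    return tau if gain > beta else None
```

A full WBS run would call `wbs_stage` on the whole series with intervals from `draw_intervals`, split at the returned position, and repeat on each side, in the same way as BS.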

Another approach, Circular Binary Segmentation (CBS), was proposed by Olshen et al. (2004). This method uses an epidemic test statistic (Levin and Kline, 1985) to test for two changepoints in a segment instead of one as in standard BS. The test statistic assumes that the mean before the first change and the mean after the last change are the same. This is essentially the same as joining the endpoints of the segment to make a circle, and then testing the mean of the arc between the changepoints against the mean of the complement. To calculate the p-values in a non-normal setting they use a permutation approach to calculate reference distributions; however, this is computationally expensive, with a cost that grows quadratically with the number of changes.

Venkatraman and Olshen (2007) propose a couple of ways to speed up the computation of CBS; however, for additional speed-up, the permutation calculations at each stage of the algorithm can easily be parallelised.
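To illustrate where the parallelism lies, the sketch below computes a simplified circular statistic and a permutation p-value for one segment. It is not the CBS implementation of Olshen et al. (2004), nor does it include the speed-ups of Venkatraman and Olshen (2007): the statistic is an unnormalised two-sample comparison of the arc mean with the complement mean, maximised over all arcs by brute force, and the names `circular_stat`, `permutation_pvalue` and `map_fn` are hypothetical.

```python
import numpy as np


def circular_stat(x):
    """Largest absolute arc-versus-complement statistic over all arcs.

    For the arc x[i:j] the statistic is the difference between the arc
    mean and the complement mean, scaled by sqrt(1/len(arc) + 1/len(rest)).
    Brute force: memory and time are quadratic in len(x).
    """
    n = len(x)
    S = np.concatenate(([0.0], np.cumsum(x)))
    i, j = np.triu_indices(n + 1, k=1)       # all pairs 0 <= i < j <= n
    keep = (j - i) < n                       # the complement must be non-empty
    i, j = i[keep], j[keep]
    arc_len = j - i
    rest_len = n - arc_len
    arc_sum = S[j] - S[i]
    diff = arc_sum / arc_len - (S[n] - arc_sum) / rest_len
    z = diff / np.sqrt(1.0 / arc_len + 1.0 / rest_len)
    return float(np.max(np.abs(z)))


def _permuted_stat(task):
    """Statistic of one random permutation of x (one unit of parallel work)."""
    x, seed_seq = task
    rng = np.random.default_rng(seed_seq)
    return circular_stat(rng.permutation(x))


def permutation_pvalue(x, n_perm=200, seed=0, map_fn=map):
    """Permutation p-value for the circular statistic of x.

    Each permutation gets its own child seed, so the evaluations are
    independent and map_fn can be an executor's map method.
    """
    x = np.asarray(x, dtype=float)
    observed = circular_stat(x)
    seeds = np.random.SeedSequence(seed).spawn(n_perm)
    null = np.fromiter(map_fn(_permuted_stat, [(x, s) for s in seeds]),
                       dtype=float, count=n_perm)
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Because every permutation is evaluated independently, the loop hidden inside `map_fn` is the natural place to distribute work, which is the parallelisation opportunity referred to in the text.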

