Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors
Anonymous Author(s) Affiliation
Address email
Abstract
We propose a Bayesian model selection (BMS) boundary detection procedure using
1
non-local prior distributions for a sequence of data with multiple systematic mean
2
changes. By using the non-local priors in the Bayesian model selection framework,
3
the BMS method can effectively suppress the non-boundary spike points with large
4
instantaneous changes. Further, we speed up the algorithm by reducing the multiple
5
change points to a series of single change point detection problems. We establish
6
the consistency of the estimated number and locations of the change points under
7
various prior distributions. Extensive simulation studies are conducted to compare
8
the BMS with existing methods, and our method is illustrated with application to
9
the magnetic resonance imaging guided radiation therapy data.
10
1 Introduction
11
Traditional change point detection algorithms often handle the cases where the occurrent frequency of
12
the change points is relatively consistent across the signal. For example, the popular narrowest-over-
13
threshold (NOT) [1] algorithm is suitable under the setting where the different segments between the
14
change points have comparable lengths, while the stepwise marginal likelihood (SML) method [5]
15
works the best to identify the frequent change points. However, it is often case that the gaps between
16
the consecutive change points are dramatically different, while only ones with certain distances are of
17
interest. In this paper, we develop a computational efficient Bayesian model selection algorithm for
18
identifying the change points under this specific setting.
19
The inconsistent gaps between the change points can be observed from the signals generated by the
20
magnetic resonance imaging guided radiation therapy (MRgRT). One problem with MRgRT is that
21
when radiation traveled in the magnetic field, the dose level can be significantly enhanced near the
22
boundaries between different tissues or organs in human bodies. The Duke Mid-sized Optical-CT
23
System (DMOS) (See Figure A.1 in Appendix) was developed to identify the dose changes near the
24
region of this boundary artifact. The right panel in Figure A.1 contains one radiation dose, where we
25
can observe that the boundaries on and inside the dosimeter can be distinguished by noticeable peaks
26
of the radiation dose levels. In the experiment, multiple radiations enter the cylindrical dosimeter from
27
different directions, and a sequence of data ordered by their distances to the sources was collected.
28
Because the dosimeter is a circle and the cavity is in the middle, the radiation from different directions
29
would hit the boundary at similar distances away from their sources in the dosimeter.
30
In the MRgRT data, radiations in certain directions may experience temporary changes at non-
31
boundary locations, which may result from the abnormal status of the DMOS system rather than the
32
true dose changes. The temporary change points, appear in the data sequence as the spike points,
33
often mixed up with the ones on the boundary (the peak locations in Figure A.1) which makes the
34
boundary detection extremely challenging. Figures A.2 in Appendix shows the change points in the
35
MRgRT data identified by the popular NOT [1] and the SML [5] algorithms, respectively. It can be
36
Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.
seen that neither of the algorithms correctly identify the true boundaries. This motivate us to propose
37
a new algorithm in detecting the systematic changes when the segment length can have dramatic
38
differences.
39
As shown in the preliminary analysis of the MRgRT data, the local control of the discovery is crucial.
40
To avoid picking the spike points, we enforce minimal distances between the change points. Moreover,
41
we adopt the computational efficient local scan routine and propose a systematic two-stage procedure
42
to speed up the change point detection procedure. More specifically, the local scan method first
43
identifies the candidate points with minimal distance based on the local data, and then optimizes
44
an utility function to obtain the estimates for the locations and the total number of change points.
45
Because the change points are defined based on the mean changes in two consecutive segments, the
46
local data are sufficient to detect the systemic changes [6, 7, 8, 13, 14, 16, 18, 19].
47
To perserve the positive detection rate of the change points and reduce the false detection rate of the
48
non-change points, we utilize a Bayesian marginal likelihood function as the utility, and propose a
49
new Bayesian model selection (BMS) procedure for identifying change points. In particular, we show
50
that the selection consistency is achieved by choosing both the local [2, 3, 4, 17] and non-local priors
51
[11], whereas the convergence rate is faster under the non-local priors.
52
Our BMS procedure is cast in the model selection framework, which is faster than the dynamic
53
programming introduced under the SML framework. For example, for a single sequence in the
54
MRgRT data, BMS takes roughly 1.3 seconds and SML takes 3.1 seconds when the maximum
55
number of change points is capped at 100. The major factor that contributes to the efficiency of
56
BMS algorithm is the fact that BMS reduces the search space dramatically by selecting a small set of
57
candidate change-points. Further, once the candidate points are selected, BMS only needs to evaluate
58
two consecutive segments at a time, which allows the algorithms to be implemented in parallel.
59
2 Bayesian multiple change points detection
60
2.1 Probability model
61
Suppose there are p0true change points t1¤, . . . , ¤ tp0among n observations Yn tY1, . . . , Ynu.
62
As a convention, let t0 1 and tpp0 1q n 1. Denote λj tj 1 tjand λ minj0,...,p0λj.
63
We consider a set of Kn candidate points τ1, . . . , τKn, with τ0 1 and τKn 1 n 1, while
64
postponing the discussion on the candidate set selection to Section 2.4. Define nj τj 1 τj,
65
nI minj0,...,Kn1nj, nI ¤ λ. Let HpnIq tτj : j 1, . . . , Kn,|τj 1 τj| ¡ nIu denote
66
the set of all candidate points and T0pp0q ttj : j 1, . . . , p0u be the set of true change points.
67
The specification of the candidate points allows the BMS method to be implemented in a lower
68
dimensional space with the most influential points. It also guarantees that there are a sufficient
69
number of non-change points surrounding the true ones so that the consistency conditions are met.
70
The probability model takes the form of
71
Yl ντj l, lP rτj, τj 1q,
where the random errors lare independent with mean zero and variance σ2j. Further, we define
72
σ maxj0,...,p0σj.
73
For ease of exposition, we first consider the situation where the locations of the candidate change
74
points are given and T0pp0q HpnIq. Define ¯Yτj n1j1°τj1
lτj1Yl, which is the sample average
75
for thepj 1qth segment rτj1, τjq, j ¡ 1. Suppose the candidate point τkis not a change point,
76
then the points inrτk, τk 1q have the same mean as those in rτk1, τkq. If τk is a change point, we
77
expect a mean shift between the segmentsrτk, τk 1q and rτk1, τkq. Hence, we can formulate the
78
model and prior distribution for l¥ τ1as follows:
79
Yl ¯Yτk µk ξl, lP rτk, τk 1q, µk πpµkq, if τkis a change point,
µk 0, with probability 1, if τkis an nI-flat point,
where πpq is a prior density and ξl is a mean-zero error. Note that the observations in the first
80
segment are unchanged. Here the nI-flat point is defined as a non-change point which is at least nI 81
apart from any change points. We require the nI distance between the true change points and the flat
82
ones so that there are sufficient neighborhood samples to achieve the estimation consistency.
83
Let µk0be the true value of µk, and we assume |µk0| ¡ δ, where δ ¡ 0 is the lower bound of
84
the µk0, for the k’s with τk P T0pp0q. The prior distribution on µk is crucial for determining the
85
convergence rate of the BML procedure. We explore three types of priors: the local prior [9], the
86
non-local moment prior and the inverse moment prior in [11].
87
Local: πLpµq Np0, ω2q Moment: πMpµq µ2v{CM1{?
2π exppµ2{2q
Inverse moment: πIpµq sνq{2{Γpq{2sqµpq 1qexp pµ2{νqs( , where CM is the normalizing constant.
88
Let Mkrepresent the model that τkis the sole change point. We define the marginal likelihood with
89
the Gaussian kernel as
90
PrpYn|Mkq
Kn
¹
j1,jk τj¹11
lτj
exptpYl ¯Yτjq2u
» τk¹11
lτk
exptpYl ¯Yτk µq2uπpµqdµ.
The posterior model probability of Mk given Ynis
91
PrpMk|Ynq PrpYn|Mkq PrpMkq
°Kn
j1PrpYn|Mjq PrpMjq PrpYn|Mkq
°Kn
j1PrpYn|Mjq, (1) when Mjassumes a non-informative uniform prior, j 1, . . . , Kn. Note that it is not necessary for
92
Ynto be normally distributed to ensure the selection consistency in detecting mean changes. The
93
Gaussian kernel serves as a utility function, which tends to be large when the distances between the
94
true and the hypothetical segment means are small. Hence, as nÑ 8, PrpMk|Ynq approaches 1
95
when τkis indeed a change point and the τj’spj kq are nI-flat points.
96
2.2 Change point detection
97
We start with the simplest case that there is only one mean shift in the data, i.e., p0 1 is fixed
98
a priori. We select the candidate point τk associated with the largest PrpMk|Ynq among all the
99
candidates, which corresponds to the largest marginal likelihood PrpYn|Mkq. It can be shown that
100
PrpMk|Ynq
# 1
Kn
¸
jk
PrpYn|Mjq PrpYn|Mkq
+1 , where
101
PrpYn|Mjq
³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u PrpYn|Mkq
³ ±τk 11
lτk exptpYl ¯Yτk µq2uπpµqdµ
±τk 11
lτk exptpYl ¯Yτkq2u .
The two marginal likelihoods indicate that the selection consistency is fully determined by the
102
evidence in favor of µk πpµkq and that of µj 0 for j k. The product on the right hand side
103
converges to 0 when nI grows to8 with the sample size. The candidate points retain enough samples
104
in the neighborhood to guarantee the convergency.
105
For the case with multiple change points (p0¡ 1), we select the points associated with the p0largest
106
PrpMk|Ynq, and the selection consistency is presented as follows.
107
Theorem 1. Let M tMk, τkP T0pp0qu. If the Bayes factor satisfies
108
PrpYn|Mjq Oppanjq (2)
forτjR T0pp0q, anj opp1q, and n1I{2δ{σ Ñ 8, then
109 ¸
MkPM
PrpMk|Ynq 1 OptKnanIexppnIδ2qu. (3) Hence whennI{logpnq Ñ c, 0 c ¤ 8, nI ¤ λ, we have
¸
MkPM
PrpMk|YnqÑ 1.p
The proof of Theorem 1 is delineated in the Appendix. Clearly, the selection consistency depends
110
on the convergence rate of anI, which is determined by the choice of the prior πpq. Lemmas 2–4
111
in the Appendix show that anj n1{2j when πpq πLpµq; anj nv1{2j if πpq πMpµq and
112
anj exptnsj{ps 1qu if πpq πIpµq. Hence, the selection consistency is achieved at the fastest
113
rate using the non-local inverse moment prior.
114
2.3 Number of change points
115
When p0is unknown, let Tppq be the set containing p points from the procedure described in Section
116
2.2. We define the marginal likelihood given Tppq, that is PrtYn|T ppqu as
117
¹
τjRT ppq τj¹11
lτj
exptpYl ¯Yτjq2u ¹
τkPT ppq
» τk¹11
lτk
exptpYl ¯Yτk µq2uπpµqdµ.
We can estimate the locations and the number of change points in two steps: (i) for any given
118
p, we obtain pTppq using the procedure described in the previous section, and (ii) we estimate p0 119
by pp via maximizing PrtYn| pTppqu with respect to p. In contrast to the procedure in [5] which
120
simultaneously estimates the locations and the number of change points by maximizing the marginal
121
likelihood with respect to Tppq and p, our BMS splits the estimation procedure into a scanning step
122
as described in Section 2.2 and an optimization step. This scanning–optimization mechanism reduces
123
the computational burden substantially, because the optimization in step (ii) is merely implemented
124
in a single dimension.
125
2.4 Candidate points selection
126
Previous discussions rely upon a critical assumption that the candidate points are specified in advance,
127
which, however, is often infeasible in practice. To facilitate the implementation of the BMS, we
128
need to find a candidate set HcpnIq that is close to HpnIq. For the selection consistency of the
129
change points, we require for each tj there is a τk P HcpnIq, such that Prp|tj τk| ¤ nIq
130
1 OprmintexppnIδ2q, anIus. Define
131
Ri
³ ±i nI1
li exptpYl ¯Yi µq2uπpµqdµ
±i nI1
li exptpYl ¯Yiq2u , where ¯Yi pnI 1q1°i1
jinIYj. By the argument similar to that in Lemma 1, Rigoes to infinity
132
when i is a true change point, and RiÑ 0 in probability, when i is an nI-flat point. Hence, the value
133
of Rican distinguish a change point from a set of nI-flat points. To further eliminate the non-change
134
points that are also not nI-flat, we implement the non-maximum suppression that removes away the
135
points which do not give the largest Ri’s in their nI-neighborhood. More specifically, the screening
136
procedure of selecting candidate points is described as follows.
137
Algorithm 1 : Screening
(1) For each i inrnI, n nIs, compute Ri.
(2) If Ri maxtRj: jP pi nI, i nIsu, i is selected as a candidate point.
(3) After scanning through the data sequence, a set of Kncandidate points, denoted as HcpnIq, is obtained.
This screening algorithm is comparable to that in [15], because by the Laplace approximation, we
138
can write
139
Ri Dn
±i nI1
li exptpYl ¯Yi µq2uπpµq
±i nI1
li exptpYl ¯Yiq2u t1 opp1qu
Dnexp
# 2
i n
I1
¸
li
Yl
i¸1 jinI
Yj
µ nIµ2 +
πpµqt1 opp1qu,
where Dn is a constant of order Oppn1{2I q and µ is the maximizer of °i nI1
li pYl ¯Yi
140
µq2 logπpµq. Clearly, the magnitude of the leading term in Ri is strongly associated with
141
n1I °i nI1
li Yl°i1 jinIYj
, which is the local diagnosis function with h nI in [15].
142
Next, we show the screening procedure identifies a set HcpnIq that would lead to the change point
143
consistency.
144
Proposition 1. Assume n1I{2δ{σ Ñ 8. For each tj P T0pp0q, there is a τ P HcpnIq such that
145
Prttj P pτ nI, τ nIqu 1 OrmintexppnIδ2q, anIus.
146
In theory, i tjmaximizes Riin the nI-neighborhood of tjasymptotically. By selecting the local
147
maximal Riin the screening procedure, HcpnIq would cover the nI-neighborhood of T0pp0q as
148
nÑ 8. Also the condition n1I{2δ{σ Ñ 8 indicates that the effect size cannot be too small in order
149
to find the candidate points around the true change points. After selecting the candidate points, we
150
perform the refinement step to identify the locations and the total number of change points.
151
Algorithm 2 : Refinement Scanning
(1) Compute PrpYn|Mkq by scanning over all the candidate points in HcpnIq.
(2) For each p, we obtain a set of change points pTppq corresponding to the p largest PrpYn|Mkq, k 1, . . . , Kn.
Optimization
(3) Selectpp that maximizes PrtYn| pTppqu.
Theorem 2. Assume that nI{logpnq Ñ c, 0 c ¤ 8, n1I{2δ{σ Ñ 8, lim supnÑ8nI{λ 1{2, and equation (2) holds. LetHcpnIq be the set containing candidate points such that |τk 1τk| ¡ nI, and for eachtj there is aτk P HcpnIq, Prp|tj τk| ¤ nIq 1 OprmintexppnIδ2q, anIus, Then,
Prppp p0q 1 OprmaxtexppnIδ2q, anIus, and furthermore,
152
Pr
# sup
ptjP pTpppq
inf
tjPT0pp0q|pptj tjq{n| ¤ nI{n +
1 OtexppnIδ2qu.
and
153
Pr
# sup
tjPT0pp0q
ptjP pinfTpppq|pptj tjq{n| nI{n +
1 OpanIq.
Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The in-
154
trinsic rationale is that for any Tppq different from T0pp0q, there is at least a chosen point
155
τ P T ppq whose nI-neighborhood does not contain true change points. Then the likelihood ra-
156
tio PrtYn|T ppqu{ PrtYn|T0pp0qu goes to 0 with probability 1, because the ratio contains at least
157
one of PrpYn|Mjq and PrpYn|Mjq1 for τk P T0pp0q and τj R T0pp0q, which converges to 0 in
158
probability by Lemma 1 and (2).
159
For a given p, each multiplicand in PrtYn| pTppqu can be obtained and stored through the scanning
160
step. The optimization step essentially picks the maximal PrtYn| pTppqu among a set of known
161
quantities, which avoids intensive optimization procedures. Because the computational time for
162
PrpYn|Mkq grows at the speed of Opnq for k 1, . . . , Kn, the computational time for the refinement
163
stage grows with the sample size at the speed of OpnKnq.
164
3 Simulation
165
3.1 The sequence without spikes
166
To evaluate the performance of the proposed BMS method in the setting without spike points, we
167
generate data from two different models. Model I takes the form of
168
Yi hTJpxiq σi
where the error term i Np0, 1q, σ 0.5, and h p2.01, 2.51, 1.51, 2.01,
2.51,2.11, 1.05, 2.16, 1.56, 2.56, 2.11qT with p0 11. We set Jpxiq tp1 sgnpnxi tjqq{2, j 1, . . . , p0uT, and the xi’s are equally spaced points on [0, 1]. The true change points are
ptj{n, j 1, . . . , p0q p0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81q.
The random errors are generated from three distributions: the standard normal distribution Np0, 1q;
169
Student’s t distribution with 5 degrees of freedom tp5q, which is standardized to have a unit variance;
170
and the log-normal distribution LNp0, 1q, which is the exponential of the standard normal distribution
171
and also standardized to have a unit variance. Model II considers a heteroscedastic error term across
172
the segments. The data generating model is
173
Yi hTJptiq σi 1T¹Jptiq
j1
vj
wherepvj, j 1, . . . , 11q p1, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5q. Other model specifications remain the same as those in model I. We define the over- and under-segmentation errors as dp pGn|Gnq and dpGn| pGnq respectively,
dp pGn|Gnq sup
bPGn
inf
aP pGn|a b|, dpGn| pGnq sup
bP pGn
inf
aPGn
|a b|.
For the BMS procedure, we consider three different priors for πpq, corresponding to the local prior,
174
non-local moment prior and non-local inverse moment priors. To save the space, we only present
175
the results by using the non-local inverse moment prior. We take nI tlogpnqu1.5h, where h¥ 0.5
176
generally works well in the simulations.
177
For a comprehensive comparison with existing methods, we consider BMS under the non-local
178
inverse moment prior πIpq with q v 2 and s 6, PELT [12], WBS [7], NOT with normal or
179
heavy-tail distributions [1] and SML [5]. Table 1 summarizes the numerical results under model I
180
and model II with normal, Student’s t, and log-normal error distributions and their heteroscedastic
181
counterparts, respectively. On average, BMS performs the best in selecting the number of change
182
points and balancing both over- and under-segmentation errors. It is expected that the performances
183
of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because
184
all the three procedures heavily rely on the parametric model assumptions, and hence they are not
185
robust to model mis-specifications. In contrast, both BMS and NOT behave well under various error
186
distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors,
187
while the resulting estimatorpp tends to be larger than the true p0. On the other hand, the BMS allows
188
for slightly larger over-segmentation errors in order to maintainpp to be more concentrated around p0.
189
3.2 The sequence with spike points.
190
In addition, we illustrate the features of the BMS, NOT and SML methods on the sequences contami-
191
nated with spike points. Assume the noises are normal, we generate 500 sequences each contains
192
n 1000 points with mean changes at 0.01 and 0.01 on the 400 and 440’s observations, respec-
193
tively. We set the noise standard deviation to be 0.002. Further, we generate 10 random samples
194
uniformly in the range ofp0.07, 0.08q and p0.07, 0.08q, and add them to the original sequence at
195
random locations to form up the spike points. Note that we choose these parameters to mimic the
196
real data setting. We implement BMS, NOT, and SML on the simulated samples. In BMS, we select
197
nI 12, which is the largest integer that smaller than 0.65tlogpnqu1.5.
198
From Table 2, we can seen that the BMS is insensitive to the spike points with the smallest|pp p0|
199
on average. In the experiments (not show) NOT ignores both the change points with small signal
200
Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT and SML methods under model I and model II under different error distributions: the standard normal distribution Np0, 1q, Student’s tp5q, and log-normal LNp0, 1q with constant variances; and the corresponding distributions with heteroscedastic variances. Standard deviations are in parentheses.
Error Method pp p0 dpGn| pGnq dp pGn|Gnq
Distribution ¤ 3 2 1 0 1 2 ¥ 3
Np0, 1q BMS 0 0 1 197 2 0 0 2.41 (6.06) 1.96 (3.94)
PELT 0 1 37 162 0 0 0 0.91 (1.19) 6.32 (11.92)
WBS 0 0 0 194 6 0 0 1.22 (4.13) 0.86 (0.79)
NOT 0 0 0 192 7 1 0 1.93 (8.04) 0.75 (0.80)
SML 0 0 0 132 52 13 3 12.94 (42.98) 0.78 (0.90)
tp3q BMS 0 0 8 190 2 0 0 2.15 (5.76) 2.83 (7.01)
PELT 0 4 31 165 0 0 0 0.95 (1.03) 6.24 (12.24)
NOT 0 0 3 184 3 6 4 7.57 (27.70) 1.51 (2.57)
SML 0 0 0 42 34 44 80 40.13 (53.68) 0.88 (0.87)
LNp0, 1q BMS 0 0 12 180 6 1 2 3.69 (12.12) 3.11 (6.89)
PELT 1 2 21 135 15 23 3 12.10 (29.63) 7.22 (13.32)
NOT 0 1 4 183 7 1 4 6.06 (26.32) 1.18 (4.45)
SML 0 0 0 0 0 4 196 111.77 (52.05) 0.73 (1.33)
Heterosced BMS 0 0 13 176 8 3 0 3.69 (7.08) 3.88 (7.36)
Np0, 1q PELT 0 0 31 169 0 0 0 1.49 (1.53) 6.15 (11.44)
NOT 0 0 0 150 23 21 6 7.52 (12.60) 1.66 (1.68)
SML 0 0 0 119 55 20 6 6.75 (23.98) 1.37 (1.52)
Heterosced BMS 0 0 14 181 4 1 0 2.79 (4.87) 4.21 (8.17)
tp5q PELT 0 4 35 159 2 0 0 1.50 (1.83) 7.76 (13.48)
NOT 0 1 5 179 10 2 3 8.15 (25.01) 2.36 (4.70)
SML 0 0 0 43 22 41 94 26.74 (35.42) 1.27 (1.59)
Heterosced BMS 0 0 20 173 7 0 0 3.73 (11.72) 4.20 (7.85)
LNp0, 1q PELT 0 2 21 142 22 12 1 6.07 (16.09) 8.10 (13.93)
NOT 0 0 4 183 7 4 2 6.32 (24.67) 1.42 (3.70)
SML 0 0 0 2 1 0 197 68.05 (47.15) 0.87 (1.67)
noise ratio and the spike signals with small segment length. On the other hand, SML is sensitive to
201
the extreme value. To this end, BMS is the most suitable procedure for the MRgRT data, because on
202
the one hand it reinforces the minimal segment length to avoid the identification of the spike signal;
203
on the other hand it retains small minimal segment length to detect change points with short distance.
204
Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methods on the data sequences with spike points.
Method pp p0
¤ 1 0 1 2 ¥ 3
BMS 31 276 113 67 13
NOT 387 32 20 11 50
SML 0 0 0 0 500
4 MRgRT data
205
We illustrate the BMS method with the application to the MRgRT data which contains 2265 data
206
points ordered by the distances from the sources of the radiations dose. The R code for implementing
207
the method can be downloaded from our GitHub repository [10]. Throughout the implementation, we
208
use the non-local inverse moment prior πIpµq sνq{2{Γtq{p2squµpq 1qexp pµ2{νqs( with
209
q 2, ν 2. We set nI 13, which is the largest integer that smaller than 0.65tlogpnqu1.5.
210
We first vary s from 2 to 10. Figure 1 shows that when s is small, we identify more change
211
points than the true ones, and as s grows, the number of identified change points decreases. This
212
phenomenon is consistent with the result in Lemma 4 that the convergence rate for the non-local prior
213
is OptexppnsI{p1 sqqu. When s is small, the Bayes factor vanishes slowly and hence the algorithm
214
picks redundant change points. When s is sufficiently large, the convergence rate approaches to
215
OptexppnIqu, and hence the algorithm eliminates the flat points more effectively. In the following
216
analysis, we select s 10, with which the algorithm provides the best result in Figure 1.
217
-0.010-0.0050.0000.0050.010
Distance from the source Y -0.010-0.0050.0000.0050.010
Distance from the source Y -0.010-0.0050.0000.0050.010
Distance from the source
Y
0 12 24 35 47 59 71 83 95 123
s= 2
s= 5
s = 10
Figure 1: Change points detection when using different s 2, 5, 10
Interestingly, after we remove the spike points and kept the data within the range ofp0.01, 0.01q,
218
we implement the NOT, SML, and BMS on the truncated sequence. Figures 2 shows that the results
219
from the three methods are overlapped. The NOT and BMS methods have similar results, and both
220
outperform SML. This implies that removing the spike points improves the boundary detection
221
accuracy for all the three methods. However, the spike remover is infeasible in practice, because the
222
locations and the magnitudes of the spike are difficult to track in human bodies.
-0.010-0.0050.0000.0050.010
Distance from the source
Signal level
0 12 24 36 49 63 75 87 101
Figure 2: Change points detection after restricting the data in the range of (-0.01, 0.01). BMS: the red line solid; NOT: the blue dash line; SML: the brown two dash linear.
223
5 Conclusion
224
We propose the BMS method that consistently identifies multiple mean changes in a data sequence.
225
The BMS method removes the flat points effectively without sacrificing the detection accuracy.
226
Further, our method is particularly useful when the data sequence contains spike points, which are
227
not of interest. We apply the BMS to analyze the MRgRT data, for which the NOT, SMT and other
228
methods fail to but the BMS algorithm correctly detects the mean changes boundaries. We explore
229
the BMS performance with different tuning parameters, and the resulting patterns are consistent with
230
the theoretical properties. Moreover, we demonstrate that the BMS is robust to the error distributions
231
by evaluating the detection procedures on the sequence with different random errors.
232
References
233
[1] Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz. Narrowest-over-threshold detection of
234
multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293, 2016.
235
[2] Francesco Bertolino, Walter Racugno, and Elías Moreno. Bayesian model selection approach to
236
analysis of variance under heteroscedasticity. The Statistician, pages 503–517, 2000.
237
[3] Caterina Conigliani and Anthony O’hagan. Sensitivity of the fractional bayes factor to prior
238
distributions. Canadian Journal of Statistics, 28(2):343–352, 2000.
239
[4] Fulvio De Santis and Fulvio Spezzaferri. Consistent fractional bayes factor for nested normal
240
linear models. Journal of statistical planning and inference, 97(2):305–321, 2001.
241
[5] Chao Du, Chu-Lan Michael Kao, and SC Kou. Stepwise signal extraction via marginal
242
likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.
243
[6] Peter Eiauer and Peter Hackl. The use of MOSUMS for quality control. Technometrics,
244
20(4):431–436, 1978.
245
[7] Piotr Fryzlewicz. Wild binary segmentation for multiple change-point detection. Annals of
246
Statistics, 42(6):2243–2281, 2014.
247
[8] Joseph Glaz, Joseph I Naus, Sylvan Wallenstein, Sylvan Wallenstein, and Joseph I Naus. Scan
248
Statistics. Springer, 2001.
249
[9] Harold Jeffreys. The theory of probability. Oxford University Press, 1998.
250
[10] Fei Jiang, Guosheng Yin, and Francesca Dominici. Bayesian multiple change points detection
251
with non-local priors. R code, 2017.
252
[11] Valen E Johnson and David Rossell. On the use of non-local prior densities in Bayesian
253
hypothesis tests. Journal of the Royal Statistical Society: Series B, 72(2):143–170, 2010.
254
[12] Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of change points with a
255
linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598,
256
2012.
257
[13] Claudia Kirch and Birte Muhsal. A MOSUM procedure for the estimation of multiple random
258
change points. Preprint, 2014.
259
[14] Marc Lavielle and Carenne Ludeña. The multiple change-points problem for the spectral
260
distribution. Bernoulli, 6(5):845–869, 2000.
261
[15] Yue S Niu and Heping Zhang. The screening and ranking algorithm to detect dna copy number
262
variations. Annals of Applied Statistics, 6(3):1306, 2012.
263
[16] Philip Preuss, Ruprecht Puchstein, and Holger Dette. Detection of multiple structural breaks in
264
multivariate time series. Journal of the American Statistical Association, 110(510):654–668,
265
2015.
266
[17] Stephen G Walker. Modern bayesian asymptotics. Statistical Science, pages 111–117, 2004.
267
[18] Yi-Ching Yao. Approximating the distribution of the maximum likelihood estimate of the
268
change-point in a sequence of independent random variables. Annals of Statistics, 15(3), 1987.
269
[19] Chun Yip Yau and Zifeng Zhao. Inference for multiple change points in time series via likelihood
270
ratio scan statistics. Journal of the Royal Statistical Society: Series B, 78(4):895–916, 2015.
271
Appendix
272 273
A Additional Figures
274
Figure A.1:Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and a typical line profile through the center of the cavity (right). The radiation enter the dosimeter from the hole on the left of the cylindrical dosimeter. The cylindrical rotates 360 degree so that the radiation can enter from different direction.
-0.010-0.0050.0000.0050.010
NOT
Distance from the source Y -0.010-0.0050.0000.0050.010
SML
Distance from the source
Y
Figure A.2:The change point detection results from the SML with normal prior and the maximal number of change points to be 30 and NOT methods for the dose level changes in the DMOS system.
B Proofs
275
Let g1pµq and g2pµq denote the first and second derivatives of a generic function gpµq with respect to
276
µ respectively, and further define the utility function as
277
Ukpµq
τk¸11 lτk
pYl ¯Yτk µq2.
The following conditions are imposed for the theoretical derivations.
278
(A1) Assume µ to be in a closed set of points in R.
279
(A2) Assume πpµq to be a continuous density function with bounded first and second derivatives.
280
(A3) Assume that PrtYn|T ppqu has a unique maximizer in the neighborhood of T0pp0q.
281
Lemma 1. Assume that τkis a change point for which the mean ofYl ¯Yτksatisfies|µk0| ¡ δ for
282
δ¡ 0, n1k{2δ{σ Ñ 8, then there is a constant D ¡ 0 such that
283
nlimÑ8Pr
³ ±τk 11
lτk exptpYl ¯Yτk µq2uπpµqdµ
±τk 11
lτk exptpYl ¯Yτkq2u ¡ exppDnkδ2q
1.
Proof: By the definition of Ukpµq, we can write
284 ³ ±τk 11
lτk exptpYl ¯Yτk µq2uπpµqdµ
±τk 11
lτk exptpYl ¯Yτkq2u
³exptUkpµquπpµqdµ exptUkp0qu .
We first define Nδpµk0q tµ : |µ µk0| δu and Nδcpµk0q as its compliment, and then show
285
nlimÑ8Pr
sup
µPNδcpµk0qtUkpµq Ukpµk0qu Dnkδ2
1. (4)
Note that
286
Ukpµq Ukpµk0q
nk
#
pµ2k0 µ2q n1k pµk0 µq
τk¸11 lτk
2pYl ¯Yτkq +
nktpµ2k0 µ2q 2pµk0 µqµk0 Oppn1k{2|µk0 µ|σkqu
nktpµ µk0q2u Oppn1k{2|µk0 µ|σkq
¤ nkδ2 Oppn1k{2|µk0 µ|σkq
nkδ2{2 nkδ2{2 Oppn1k{2|µk0 µ|σkq.
As n1k{2δ{σ Ñ 8, we have nkδ2{2 Oppn1k{2|µk0 µ|σkq 0 with probability 1, and thus
287
nlimÑ8Pr
sup
µPNδcpµk0qtUkpµq Ukpµk0qu Dnkδ2
1.
When τkis a change point, let µ 0, because |µk0| ¡ δ, we have
288
nlimÑ8Pr
exptUkp0qu
exptUkpµk0qu exppDnkδ2q
1. (5)
By the Laplace approximation,
289 »
exptUkpµquπpµqdµ OprUk2prµq1{2exptUkprµquπprµqs, (6) whererµ is the maximizer of Ukpµq logtπpµqu, and Uk2prµq Oppnjq. Let pµ be the maximizer of
290
Ukpµq, and then
291
0 L1kprµq Blogπprµq{Bµ
L2kpµqprµ pµq Blogπprµq{Bµ,
where µis a point on the line segment betweenrµ and pµ. As πpµq has two bounded derivatives by
292
condition (A2), L2kpµq Oppnjq, we have rµ pµ Oppn1j q. Therefore, (6) can be written as
293 »
exptUkpµquπpµqdµ OprUk2ppµq1{2exptUkppµquπppµqs
OprUk2pµk0q1{2exptUkpµk0quπpµk0qs, (7) where the last equality holds becausepµ is the least squared estimator. This implies
294
exptUkpµk0qu
³exptUkpµquπpµqdµ Oppn1k{2q, (8) which in conjunction with (5) leads to
295
nlimÑ8Pr
³ exptUkp0qu
exptUkpµquπpµqdµ exppDnkδ2q
1.
By condition (A2) and the boundedness of Ukpµq, we have
296
nlimÑ8Pr
³exptUkpµquπpµqdµ
exptUkp0qu ¡ exppDnkδq
1, which completes the proof.
297
Lemma 2. Let πpµq πLpµq be a local prior, and assume that τjis not a change point, i.e.,µj0 0,
298
then
299
³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u Oppn1{2j q.
Proof: By the definition of Ujpµq, we can write
300
³exptUjpµquπpµqdµ exptUjp0qu
³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u . Using the same argument as that leading to (7) with µj0 0, we have
301
»
exptUjpµquπpµqdµ OprU2p0q1{2exptUjp0quπp0qs.
Since U2p0q1{2 Oppn1{2j q, and πp0q is a bounded density, we have
302
³exptUjpµquπpµqdµ
exptUjp0qu Oppn1{2j q.
Lemma 3. Let
303
πpµq πMpµq µ2v CM
πbpµq,
whereCM is a normalizing constant,πbpµq with πbp0q ¡ 0 is the base prior density with 2v finite
304
moments, and bounded first two derivatives in the neighborhood around0. Assume that τj is not a
305
change point, i.e.,µj0 0, then
306
³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u Oppnv1{2j q.
Proof: We can write
307
» τj¹11
lτj
exptpYl ¯Yτj µq2uπpµqdµ
»
exptUjpµq logπpµqudµ.
Let hpµq Ujpµq logπpµq 2vlogpµq logtπbpµqu Ujpµq, and let rµ be the maximizer of
308
hpµq, then we have
309
2v{rµ πb1prµq{πbprµq L1jprµq 0.
If we expand L1jprµq around pµ, the least squared estimator for µj0, the above equality can be rewritten
310
as
311
2v{n n1rµπb1prµq{πbprµq L2jpµqrµprµ pµq 0, where µis the point on the line segment betweenpµ and rµ. Therefore,
312
Oppn1q rµprµ pµq
prµ pµq2 pµprµ pµq
prµ pµ pµ{2q2 pµ2{4.
Along with the fact thatpµ Oppn1{2j q, we have rµ pµ Oppn1{2j q, and rµ Oppn1{2q. Next,
313
by the Laplace expansion, we have
314
»
expthpµqudu Oppt2v{rµ2 Uj2prµqu1{2expr2vlogprµq logtπbprµqu Ujprµqsq,
and also
315
n1Ujprµq n1Ujp0q n1Uj1pµqrµ
n1tUj1ppµq Uj1pµq Uj1ppµqurµ
Oppµ pµqrµ
Oppn1q, (9)
where µis on the line segment betweenrµ and 0. Thus,
316
|Ujprµq Ujp0q| Opp1q.
As a result,
317 ³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u
³exptUjpµquπpµqdµ exptUjp0qu
³expthpuqudu exptUjp0qu
Oppt2v{rµ2 Uj2prµqu1{2expr2vlogprµq logtπMprµqu Ujprµq Ujp0qsq
Oppn1{2j rµ2vq
Oppn1{2vj q,
where the last equality holds by the fact thatrµ Oppn1{2q. This proves the result.
318
Lemma 4. Let
319
πpµq πIpµq sνq{2
Γtq{p2squµpq 1qexp
#
µ2 ν
s+ . Assume thatτjis not a change point, i.e.,µj0 0, then
320
³ ±τj 11
lτj exptpYl ¯Yτj µq2uπpµqdµ
±τj 11
lτj exptpYl ¯Yτjq2u Optexppnsj{ps 1qqu.
Proof: We first write
321
» τj¹11
lτj
exptpYl ¯Yτj µq2uπpµqdµ c
»
exp Ujpµq µ2sνs pq 1qlogpµq( dµ,
for a constant c. Let hpµq Ujpµq µ2s pq 1qlogpµq, and assume rµ is the maximizer of hpµq,
322
then we have
323
Uj1prµq 2srµ2s1νs pq 1qrµ1 Uj1pµqprµ pµq 2srµ2s1νs pq 1qrµ1 0, where µis on the line segment betweenrµ and pµ. The above equality yields
324
njrµ2s 2p1 pµ{rµq 2sνs pq 1qrµ2s
Uj2pµq{nj
, (10)
which impliesrµ Oppn1j{p2s 2qq.
325
From (10), we have nrµ2s 1prµ pµq Opp1q, which leads to
326
rµ pµ Optnp4s 3q{p2s 2q
j u. (11)
Following (30) in [11] and using our notation, we obtain
327
»
expthpµqudu Op
"p4s2 2sq2s 2
rµ Uj2prµq
*1{2
|rµ|q1exptrµ2sνs Ujprµqu
.