Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Anonymous Author(s) Affiliation

Address email

Abstract

We propose a Bayesian model selection (BMS) boundary detection procedure using non-local prior distributions for a sequence of data with multiple systematic mean changes. By using non-local priors in the Bayesian model selection framework, the BMS method can effectively suppress non-boundary spike points with large instantaneous changes. Further, we speed up the algorithm by reducing the multiple change point problem to a series of single change point detection problems. We establish the consistency of the estimated number and locations of the change points under various prior distributions. Extensive simulation studies are conducted to compare BMS with existing methods, and our method is illustrated with an application to magnetic resonance imaging guided radiation therapy data.

1 Introduction

Traditional change point detection algorithms often handle the cases where the frequency of occurrence of the change points is relatively consistent across the signal. For example, the popular narrowest-over-threshold (NOT) [1] algorithm is suitable when the segments between the change points have comparable lengths, while the stepwise marginal likelihood (SML) method [5] works best for identifying frequent change points. However, it is often the case that the gaps between consecutive change points are dramatically different, while only those with certain distances are of interest. In this paper, we develop a computationally efficient Bayesian model selection algorithm for identifying the change points under this specific setting.

Inconsistent gaps between change points can be observed in the signals generated by magnetic resonance imaging guided radiation therapy (MRgRT). One problem with MRgRT is that when radiation travels in the magnetic field, the dose level can be significantly enhanced near the boundaries between different tissues or organs in the human body. The Duke Mid-sized Optical-CT System (DMOS) (see Figure A.1 in the Appendix) was developed to identify the dose changes near the region of this boundary artifact. The right panel of Figure A.1 shows one radiation dose profile, where we can observe that the boundaries on and inside the dosimeter can be distinguished by noticeable peaks in the radiation dose levels. In the experiment, multiple radiation beams enter the cylindrical dosimeter from different directions, and a sequence of data ordered by their distances to the sources was collected. Because the dosimeter is circular in cross-section and the cavity is in the middle, radiation from different directions hits the boundary at similar distances from the sources.

In the MRgRT data, radiation in certain directions may exhibit temporary changes at non-boundary locations, which may result from an abnormal status of the DMOS system rather than true dose changes. These temporary change points, which appear in the data sequence as spike points, are often mixed up with the ones on the boundary (the peak locations in Figure A.1), which makes boundary detection extremely challenging. Figure A.2 in the Appendix shows the change points in the MRgRT data identified by the popular NOT [1] and SML [5] algorithms, respectively. It can be seen that neither algorithm correctly identifies the true boundaries. This motivates us to propose a new algorithm for detecting systematic changes when the segment lengths can differ dramatically.

Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.

As shown in the preliminary analysis of the MRgRT data, local control of the discovery is crucial. To avoid picking the spike points, we enforce minimal distances between the change points. Moreover, we adopt a computationally efficient local scan routine and propose a systematic two-stage procedure to speed up change point detection. More specifically, the local scan method first identifies candidate points separated by a minimal distance based on the local data, and then optimizes a utility function to obtain estimates of the locations and the total number of change points. Because the change points are defined based on the mean changes in two consecutive segments, the local data are sufficient to detect the systematic changes [6, 7, 8, 13, 14, 16, 18, 19].

To preserve the positive detection rate of the change points and reduce the false detection rate of the non-change points, we use a Bayesian marginal likelihood function as the utility, and propose a new Bayesian model selection (BMS) procedure for identifying change points. In particular, we show that selection consistency is achieved under both local [2, 3, 4, 17] and non-local priors [11], whereas the convergence rate is faster under the non-local priors.

Our BMS procedure is cast in the model selection framework, which is faster than the dynamic programming used in the SML framework. For example, for a single sequence in the MRgRT data, BMS takes roughly 1.3 seconds and SML takes 3.1 seconds when the maximum number of change points is capped at 100. The major factor contributing to the efficiency of the BMS algorithm is that BMS reduces the search space dramatically by selecting a small set of candidate change points. Further, once the candidate points are selected, BMS only needs to evaluate two consecutive segments at a time, which allows the algorithm to be implemented in parallel.

2 Bayesian multiple change points detection

2.1 Probability model

Suppose there are $p_0$ true change points $t_1 \le \cdots \le t_{p_0}$ among $n$ observations $Y_n = \{Y_1, \ldots, Y_n\}$. As a convention, let $t_0 = 1$ and $t_{p_0+1} = n + 1$. Denote $\lambda_j = t_{j+1} - t_j$ and $\lambda = \min_{j=0,\ldots,p_0} \lambda_j$. We consider a set of $K_n$ candidate points $\tau_1, \ldots, \tau_{K_n}$, with $\tau_0 = 1$ and $\tau_{K_n+1} = n + 1$, while postponing the discussion on the candidate set selection to Section 2.4. Define $n_j = \tau_{j+1} - \tau_j$, $n_I = \min_{j=0,\ldots,K_n-1} n_j$, $n_I \le \lambda$. Let $H(n_I) = \{\tau_j : j = 1, \ldots, K_n, |\tau_{j+1} - \tau_j| > n_I\}$ denote the set of all candidate points and $T_0(p_0) = \{t_j : j = 1, \ldots, p_0\}$ be the set of true change points. The specification of the candidate points allows the BMS method to be implemented in a lower dimensional space with the most influential points. It also guarantees that there are a sufficient number of non-change points surrounding the true ones so that the consistency conditions are met.

The probability model takes the form of

\[ Y_l = \nu_{\tau_j} + \epsilon_l, \quad l \in [\tau_j, \tau_{j+1}), \]

where the random errors $\epsilon_l$ are independent with mean zero and variance $\sigma_j^2$. Further, we define $\sigma = \max_{j=0,\ldots,p_0} \sigma_j$.

For ease of exposition, we first consider the situation where the locations of the candidate change points are given and $T_0(p_0) \subset H(n_I)$. Define $\bar{Y}_{\tau_j} = n_{j-1}^{-1} \sum_{l=\tau_{j-1}}^{\tau_j - 1} Y_l$, which is the sample average for the $(j-1)$th segment $[\tau_{j-1}, \tau_j)$, $j > 1$. Suppose the candidate point $\tau_k$ is not a change point; then the points in $[\tau_k, \tau_{k+1})$ have the same mean as those in $[\tau_{k-1}, \tau_k)$. If $\tau_k$ is a change point, we expect a mean shift between the segments $[\tau_k, \tau_{k+1})$ and $[\tau_{k-1}, \tau_k)$. Hence, we can formulate the model and prior distribution for $l \ge \tau_1$ as follows:

\[ Y_l - \bar{Y}_{\tau_k} = \mu_k + \xi_l, \quad l \in [\tau_k, \tau_{k+1}), \qquad
\begin{cases}
\mu_k \sim \pi(\mu_k), & \text{if } \tau_k \text{ is a change point}, \\
\mu_k = 0 \text{ with probability 1}, & \text{if } \tau_k \text{ is an } n_I\text{-flat point},
\end{cases} \]

where $\pi(\cdot)$ is a prior density and $\xi_l$ is a mean-zero error. Note that the observations in the first segment are unchanged. Here an $n_I$-flat point is defined as a non-change point which is at least $n_I$ apart from any change point. We require the $n_I$ distance between the true change points and the flat ones so that there are sufficient neighborhood samples to achieve estimation consistency.

Let $\mu_{k0}$ be the true value of $\mu_k$, and we assume $|\mu_{k0}| > \delta$, where $\delta > 0$ is a lower bound on $|\mu_{k0}|$, for the $k$'s with $\tau_k \in T_0(p_0)$. The prior distribution on $\mu_k$ is crucial for determining the convergence rate of the BMS procedure. We explore three types of priors: the local prior [9], the non-local moment prior and the inverse moment prior in [11].

Local: $\pi_L(\mu) = N(0, \omega^2)$

Moment: $\pi_M(\mu) = \dfrac{\mu^{2v}}{C_M} \dfrac{1}{\sqrt{2\pi}} \exp(-\mu^2/2)$

Inverse moment: $\pi_I(\mu) = \dfrac{s \nu^{q/2}}{\Gamma\{q/(2s)\}} \mu^{-(q+1)} \exp\{-(\mu^2/\nu)^{-s}\}$,

where $C_M$ is the normalizing constant.
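The three prior families above can be evaluated directly, which makes the local versus non-local distinction concrete: the non-local densities vanish at $\mu = 0$, while the local one does not. The following is a minimal sketch, not the authors' R implementation. It assumes that the base density of the moment prior is the standard normal (so that $C_M = (2v-1)!!$), and it uses $|\mu|$ in the inverse moment prior so that the density is non-negative for negative $\mu$; the function names and default parameter values are illustrative.

```python
import math

def local_prior(mu, omega=1.0):
    """Local prior: N(0, omega^2) density, strictly positive at mu = 0."""
    return math.exp(-mu * mu / (2.0 * omega ** 2)) / (omega * math.sqrt(2.0 * math.pi))

def moment_prior(mu, v=1):
    """Moment prior mu^(2v) / C_M times the standard normal density.

    Assumption: C_M = (2v - 1)!!, the 2v-th moment of N(0, 1).
    """
    c_m = 1.0
    for odd in range(1, 2 * v, 2):
        c_m *= odd
    return mu ** (2 * v) / c_m * math.exp(-mu * mu / 2.0) / math.sqrt(2.0 * math.pi)

def inverse_moment_prior(mu, q=2.0, nu=2.0, s=6.0):
    """Inverse moment prior; defined as 0 at mu = 0 (its limiting value)."""
    if mu == 0.0:
        return 0.0
    const = s * nu ** (q / 2.0) / math.gamma(q / (2.0 * s))
    try:
        tail = math.exp(-((mu * mu / nu) ** (-s)))
    except OverflowError:
        # (mu^2 / nu)^(-s) is astronomically large, so the density is numerically 0.
        return 0.0
    return const * abs(mu) ** (-(q + 1.0)) * tail

# Non-local priors put no mass near the null value mu = 0.
values_at_zero = (local_prior(0.0), moment_prior(0.0), inverse_moment_prior(0.0))
```

The inverse moment prior is flat and essentially zero in a whole neighborhood of the origin, which is the property behind the faster elimination of flat points discussed later.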

Let $M_k$ represent the model that $\tau_k$ is the sole change point. We define the marginal likelihood with the Gaussian kernel as

\[ \Pr(Y_n | M_k) = \prod_{j=1, j \ne k}^{K_n} \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\} \int \prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k} - \mu)^2\} \pi(\mu) \, d\mu. \]

The posterior model probability of $M_k$ given $Y_n$ is

\[ \Pr(M_k | Y_n) = \frac{\Pr(Y_n | M_k) \Pr(M_k)}{\sum_{j=1}^{K_n} \Pr(Y_n | M_j) \Pr(M_j)} = \frac{\Pr(Y_n | M_k)}{\sum_{j=1}^{K_n} \Pr(Y_n | M_j)}, \tag{1} \]

when $M_j$ is assigned a non-informative uniform prior, $j = 1, \ldots, K_n$. Note that $Y_n$ need not be normally distributed to ensure selection consistency in detecting mean changes. The Gaussian kernel serves as a utility function, which tends to be large when the distances between the true and the hypothetical segment means are small. Hence, as $n \to \infty$, $\Pr(M_k | Y_n)$ approaches 1 when $\tau_k$ is indeed a change point and the $\tau_j$'s ($j \ne k$) are $n_I$-flat points.
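Because every $\Pr(Y_n | M_k)$ shares the same product of flat-segment kernels, the posterior probabilities in (1) depend on the data only through one integral ratio per candidate. The sketch below illustrates this under stated assumptions: the integral is approximated by a Riemann sum on a fixed grid, the prior is a standard normal local prior, and the helper names (`log_bayes_factor`, `posterior_model_probs`) are hypothetical rather than taken from the authors' code.

```python
import math

def standard_normal(mu):
    """Illustrative local prior; any prior density pi(mu) can be passed instead."""
    return math.exp(-0.5 * mu * mu) / math.sqrt(2.0 * math.pi)

def log_bayes_factor(segment, previous_mean, prior, half_width=5.0, steps=4001):
    """Log of  int prod exp{-(Y_l - Ybar - mu)^2} pi(mu) dmu / prod exp{-(Y_l - Ybar)^2}.

    The ratio of kernels simplifies to exp(2 * S * mu - n * mu^2),
    where S is the sum of the centred observations in the segment.
    """
    n = len(segment)
    s = sum(y - previous_mean for y in segment)
    h = 2.0 * half_width / (steps - 1)
    logs = []
    for g in range(steps):
        mu = -half_width + g * h
        p = prior(mu)
        if p > 0.0:
            logs.append(2.0 * s * mu - n * mu * mu + math.log(p))
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(h * sum(math.exp(v - m) for v in logs))

def posterior_model_probs(log_bfs):
    """Posterior Pr(M_k | Y_n) under a uniform model prior; common factors cancel."""
    m = max(log_bfs)
    weights = [math.exp(v - m) for v in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]

# One candidate with a unit mean shift, one flat candidate (previous mean 0).
lb_shift = log_bayes_factor([1.0] * 20, 0.0, standard_normal)
lb_flat = log_bayes_factor([0.0] * 20, 0.0, standard_normal)
probs = posterior_model_probs([lb_shift, lb_flat])
```

In this toy example the shifted candidate receives essentially all of the posterior mass, and the flat candidate's ratio is below one, in line with the discussion above.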

2.2 Change point detection

We start with the simplest case where there is only one mean shift in the data, i.e., $p_0 = 1$ is fixed a priori. We select the candidate point $\tau_k$ associated with the largest $\Pr(M_k | Y_n)$ among all the candidates, which corresponds to the largest marginal likelihood $\Pr(Y_n | M_k)$. It can be shown that

\[ \Pr(M_k | Y_n) = \left\{ 1 + \sum_{j \ne k}^{K_n} \frac{\Pr(Y_n | M_j)}{\Pr(Y_n | M_k)} \right\}^{-1}, \]

where

\[ \frac{\Pr(Y_n | M_j)}{\Pr(Y_n | M_k)} = \frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\} \pi(\mu) \, d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}} \times \left[ \frac{\int \prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k} - \mu)^2\} \pi(\mu) \, d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k})^2\}} \right]^{-1}. \]

The two marginal likelihoods indicate that the selection consistency is fully determined by the evidence in favor of $\mu_k \sim \pi(\mu_k)$ and that of $\mu_j = 0$ for $j \ne k$. The product on the right hand side converges to 0 when $n_I$ grows to $\infty$ with the sample size. The candidate points retain enough samples in the neighborhood to guarantee the convergence.

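For the case with multiple change points ($p_0 > 1$), we select the points associated with the $p_0$ largest $\Pr(M_k | Y_n)$, and the selection consistency is presented as follows.

Theorem 1. Let $\mathcal{M} = \{M_k, \tau_k \in T_0(p_0)\}$. If the Bayes factor satisfies

\[ \Pr(Y_n | M_j) = O_p(a_{n_j}) \tag{2} \]

for $\tau_j \notin T_0(p_0)$, $a_{n_j} = o_p(1)$, and $n_I^{1/2} \delta / \sigma \to \infty$, then

\[ \sum_{M_k \in \mathcal{M}} \Pr(M_k | Y_n) = 1 - O_p\{K_n a_{n_I} \exp(-n_I \delta^2)\}. \tag{3} \]

Hence when $n_I / \log(n) \to c$, $0 < c \le \infty$, $n_I \le \lambda$, we have

\[ \sum_{M_k \in \mathcal{M}} \Pr(M_k | Y_n) \xrightarrow{p} 1. \]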

The proof of Theorem 1 is given in the Appendix. Clearly, the selection consistency depends on the convergence rate of $a_{n_I}$, which is determined by the choice of the prior $\pi(\cdot)$. Lemmas 2–4 in the Appendix show that $a_{n_j} = n_j^{-1/2}$ when $\pi(\cdot) = \pi_L(\mu)$; $a_{n_j} = n_j^{-v-1/2}$ if $\pi(\cdot) = \pi_M(\mu)$; and $a_{n_j} = \exp\{-n_j^{s/(s+1)}\}$ if $\pi(\cdot) = \pi_I(\mu)$. Hence, the selection consistency is achieved at the fastest rate using the non-local inverse moment prior.

2.3 Number of change points

When $p_0$ is unknown, let $T(p)$ be the set containing $p$ points from the procedure described in Section 2.2. We define the marginal likelihood given $T(p)$, that is $\Pr\{Y_n | T(p)\}$, as

\[ \prod_{\tau_j \notin T(p)} \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\} \prod_{\tau_k \in T(p)} \int \prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k} - \mu)^2\} \pi(\mu) \, d\mu. \]

We can estimate the locations and the number of change points in two steps: (i) for any given $p$, we obtain $\hat{T}(p)$ using the procedure described in the previous section, and (ii) we estimate $p_0$ by $\hat{p}$ via maximizing $\Pr\{Y_n | \hat{T}(p)\}$ with respect to $p$. In contrast to the procedure in [5], which simultaneously estimates the locations and the number of change points by maximizing the marginal likelihood with respect to $T(p)$ and $p$, our BMS splits the estimation procedure into a scanning step as described in Section 2.2 and an optimization step. This scanning–optimization mechanism reduces the computational burden substantially, because the optimization in step (ii) is carried out in a single dimension.

2.4 Candidate points selection

Previous discussions rely upon a critical assumption that the candidate points are specified in advance, which, however, is often infeasible in practice. To facilitate the implementation of BMS, we need to find a candidate set $H_c(n_I)$ that is close to $H(n_I)$. For the selection consistency of the change points, we require that for each $t_j$ there is a $\tau_k \in H_c(n_I)$ such that $\Pr(|t_j - \tau_k| \le n_I) = 1 - O_p[\min\{\exp(-n_I \delta^2), a_{n_I}\}]$. Define

\[ R_i = \frac{\int \prod_{l=i}^{i+n_I-1} \exp\{-(Y_l - \bar{Y}_i - \mu)^2\} \pi(\mu) \, d\mu}{\prod_{l=i}^{i+n_I-1} \exp\{-(Y_l - \bar{Y}_i)^2\}}, \]

where $\bar{Y}_i = (n_I + 1)^{-1} \sum_{j=i-n_I}^{i-1} Y_j$. By an argument similar to that in Lemma 1, $R_i$ goes to infinity when $i$ is a true change point, and $R_i \to 0$ in probability when $i$ is an $n_I$-flat point. Hence, the value of $R_i$ can distinguish a change point from a set of $n_I$-flat points. To further eliminate the non-change points that are also not $n_I$-flat, we implement non-maximum suppression, which removes the points that do not give the largest $R_i$ in their $n_I$-neighborhood. More specifically, the screening procedure for selecting candidate points is described as follows.

Algorithm 1: Screening

(1) For each $i$ in $[n_I, n - n_I]$, compute $R_i$.

(2) If $R_i = \max\{R_j : j \in (i - n_I, i + n_I]\}$, $i$ is selected as a candidate point.

(3) After scanning through the data sequence, a set of $K_n$ candidate points, denoted as $H_c(n_I)$, is obtained.
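The screening steps above can be sketched as a local scan followed by non-maximum suppression. This is an illustrative sketch rather than the authors' implementation: indices are 0-based, the left-window average uses the plain divisor $n_I$, the integral in $R_i$ is approximated by a Riemann sum, and a standard normal prior stands in for $\pi(\cdot)$. All function names are hypothetical.

```python
import math
import random

def standard_normal(mu):
    """Illustrative prior; a non-local density can be passed to screen() instead."""
    return math.exp(-0.5 * mu * mu) / math.sqrt(2.0 * math.pi)

def log_r(y, i, n_i, prior, half_width=4.0, steps=801):
    """Log of R_i: grid approximation of int exp(2*S*mu - n_i*mu^2) pi(mu) dmu."""
    left_mean = sum(y[i - n_i:i]) / n_i            # assumption: plain window average
    s = sum(v - left_mean for v in y[i:i + n_i])   # centred sum over the right window
    h = 2.0 * half_width / (steps - 1)
    logs = []
    for g in range(steps):
        mu = -half_width + g * h
        p = prior(mu)
        if p > 0.0:
            logs.append(2.0 * s * mu - n_i * mu * mu + math.log(p))
    m = max(logs)
    return m + math.log(h * sum(math.exp(v - m) for v in logs))

def screen(y, n_i, prior=standard_normal):
    """Step (1): score every admissible i. Step (2): keep local maxima of R_i."""
    positions = range(n_i, len(y) - n_i + 1)
    score = {i: log_r(y, i, n_i, prior) for i in positions}
    candidates = []
    for i in positions:
        window = [score[j] for j in range(i - n_i + 1, i + n_i + 1) if j in score]
        if score[i] >= max(window):
            candidates.append(i)
    return candidates  # Step (3): the candidate set

# Toy sequence with a single unit mean shift at index 100.
rng = random.Random(0)
signal = [rng.gauss(0.0, 0.1) for _ in range(100)]
signal += [1.0 + rng.gauss(0.0, 0.1) for _ in range(100)]
candidates = screen(signal, n_i=10)
```

Working with $\log R_i$ instead of $R_i$ leaves the local maxima unchanged and avoids overflow when the mean shift is large relative to the noise.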

This screening algorithm is comparable to that in [15], because by the Laplace approximation, we can write

\begin{align*}
R_i &= D_n \frac{\prod_{l=i}^{i+n_I-1} \exp\{-(Y_l - \bar{Y}_i - \tilde{\mu})^2\} \pi(\tilde{\mu})}{\prod_{l=i}^{i+n_I-1} \exp\{-(Y_l - \bar{Y}_i)^2\}} \{1 + o_p(1)\} \\
&= D_n \exp\left\{ 2 \left( \sum_{l=i}^{i+n_I-1} Y_l - \sum_{j=i-n_I}^{i-1} Y_j \right) \tilde{\mu} - n_I \tilde{\mu}^2 \right\} \pi(\tilde{\mu}) \{1 + o_p(1)\},
\end{align*}

where $D_n$ is a constant of order $O_p(n_I^{-1/2})$ and $\tilde{\mu}$ is the maximizer of $-\sum_{l=i}^{i+n_I-1} (Y_l - \bar{Y}_i - \mu)^2 + \log \pi(\mu)$. Clearly, the magnitude of the leading term in $R_i$ is strongly associated with $n_I^{-1} \left( \sum_{l=i}^{i+n_I-1} Y_l - \sum_{j=i-n_I}^{i-1} Y_j \right)$, which is the local diagnostic function with $h = n_I$ in [15].

Next, we show that the screening procedure identifies a set $H_c(n_I)$ that leads to change point consistency.

Proposition 1. Assume $n_I^{1/2} \delta / \sigma \to \infty$. For each $t_j \in T_0(p_0)$, there is a $\tau \in H_c(n_I)$ such that $\Pr\{t_j \in (\tau - n_I, \tau + n_I)\} = 1 - O[\min\{\exp(-n_I \delta^2), a_{n_I}\}]$.

In theory, $i = t_j$ maximizes $R_i$ in the $n_I$-neighborhood of $t_j$ asymptotically. By selecting the local maximal $R_i$ in the screening procedure, $H_c(n_I)$ would cover the $n_I$-neighborhood of $T_0(p_0)$ as $n \to \infty$. Also, the condition $n_I^{1/2} \delta / \sigma \to \infty$ indicates that the effect size cannot be too small in order to find the candidate points around the true change points. After selecting the candidate points, we perform the refinement step to identify the locations and the total number of change points.

Algorithm 2: Refinement

Scanning

(1) Compute $\Pr(Y_n | M_k)$ by scanning over all the candidate points in $H_c(n_I)$.

(2) For each $p$, we obtain a set of change points $\hat{T}(p)$ corresponding to the $p$ largest $\Pr(Y_n | M_k)$, $k = 1, \ldots, K_n$.

Optimization

(3) Select $\hat{p}$ that maximizes $\Pr\{Y_n | \hat{T}(p)\}$.
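The refinement steps can be sketched in a few lines once each candidate's integral ratio (its Bayes factor against the flat model) has been stored on the log scale. By the definition of the marginal likelihood in Section 2.3, $\log \Pr\{Y_n | T(p)\}$ equals a constant shared by all sets plus the sum of the log ratios of the selected points, so ranking by $\Pr(Y_n | M_k)$ is the same as ranking by the ratios. The sketch below rests on that observation; the function name, the input format, and the choice to allow $\hat{p} = 0$ are assumptions made for illustration.

```python
def select_change_points(candidates, log_bfs):
    """Pick p-hat and T-hat(p-hat) from stored per-candidate log Bayes factors.

    candidates : candidate locations tau_1, ..., tau_{K_n}
    log_bfs    : log of the integral ratio for each candidate (same order)
    """
    # Scanning: order candidates from the largest to the smallest ratio.
    order = sorted(range(len(candidates)), key=lambda k: log_bfs[k], reverse=True)

    # Optimization: log Pr{Y | T-hat(p)} minus the shared constant is a running sum.
    best_p, best_value, running = 0, 0.0, 0.0
    for p, k in enumerate(order, start=1):
        running += log_bfs[k]
        if running > best_value:
            best_p, best_value = p, running

    chosen = sorted(candidates[k] for k in order[:best_p])
    return best_p, chosen

# Hypothetical stored values for four candidates.
p_hat, t_hat = select_change_points([50, 120, 300, 410], [12.3, -1.5, 8.0, -0.2])
```

Under these assumptions the search over $p$ is a single pass over sorted values, and the selected set contains exactly the candidates whose ratio exceeds one.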

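Theorem 2. Assume that $n_I / \log(n) \to c$, $0 < c \le \infty$, $n_I^{1/2} \delta / \sigma \to \infty$, $\limsup_{n \to \infty} n_I / \lambda < 1/2$, and equation (2) holds. Let $H_c(n_I)$ be the set containing candidate points such that $|\tau_{k+1} - \tau_k| > n_I$, and for each $t_j$ there is a $\tau_k \in H_c(n_I)$ with $\Pr(|t_j - \tau_k| \le n_I) = 1 - O_p[\min\{\exp(-n_I \delta^2), a_{n_I}\}]$. Then,

\[ \Pr(\hat{p} = p_0) = 1 - O_p[\max\{\exp(-n_I \delta^2), a_{n_I}\}], \]

and furthermore,

\[ \Pr\left\{ \sup_{\hat{t}_j \in \hat{T}(\hat{p})} \inf_{t_j \in T_0(p_0)} |(\hat{t}_j - t_j)/n| \le n_I / n \right\} = 1 - O\{\exp(-n_I \delta^2)\} \]

and

\[ \Pr\left\{ \sup_{t_j \in T_0(p_0)} \inf_{\hat{t}_j \in \hat{T}(\hat{p})} |(\hat{t}_j - t_j)/n| < n_I / n \right\} = 1 - O(a_{n_I}). \]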

Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The intrinsic rationale is that for any $T(p)$ different from $T_0(p_0)$, there is at least one chosen point $\tau \in T(p)$ whose $n_I$-neighborhood does not contain true change points. Then the likelihood ratio $\Pr\{Y_n | T(p)\} / \Pr\{Y_n | T_0(p_0)\}$ goes to 0 with probability 1, because the ratio contains at least one of $\Pr(Y_n | M_j)$ and $\Pr(Y_n | M_k)^{-1}$ for $\tau_k \in T_0(p_0)$ and $\tau_j \notin T_0(p_0)$, which converges to 0 in probability by Lemma 1 and (2).

For a given $p$, each multiplicand in $\Pr\{Y_n | \hat{T}(p)\}$ can be obtained and stored through the scanning step. The optimization step essentially picks the maximal $\Pr\{Y_n | \hat{T}(p)\}$ among a set of known quantities, which avoids intensive optimization procedures. Because the computational time for $\Pr(Y_n | M_k)$ grows at the rate $O(n)$ for $k = 1, \ldots, K_n$, the computational time for the refinement stage grows with the sample size at the rate $O(n K_n)$.

3 Simulation

3.1 The sequence without spikes

To evaluate the performance of the proposed BMS method in the setting without spike points, we generate data from two different models. Model I takes the form of

\[ Y_i = h^T J(x_i) + \sigma \epsilon_i, \]

where the error term $\epsilon_i \sim N(0, 1)$, $\sigma = 0.5$, and $h = (2.01, -2.51, 1.51, -2.01, 2.51, -2.11, 1.05, 2.16, -1.56, 2.56, -2.11)^T$ with $p_0 = 11$. We set $J(x_i) = \{(1 + \mathrm{sgn}(n x_i - t_j))/2, j = 1, \ldots, p_0\}^T$, and the $x_i$'s are equally spaced points on $[0, 1]$. The true change points are

\[ (t_j / n, j = 1, \ldots, p_0) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81). \]

The random errors are generated from three distributions: the standard normal distribution $N(0, 1)$; Student's $t$ distribution with 5 degrees of freedom, $t(5)$, which is standardized to have unit variance; and the log-normal distribution $LN(0, 1)$, which is the exponential of the standard normal distribution and is also standardized to have unit variance. Model II considers a heteroscedastic error term across the segments. The data generating model is

\[ Y_i = h^T J(t_i) + \sigma \epsilon_i \prod_{j=1}^{\mathbf{1}^T J(t_i)} v_j, \]

where $(v_j, j = 1, \ldots, 11) = (1, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5)$. Other model specifications remain the same as those in Model I. We define the over- and under-segmentation errors as $d(\hat{G}_n | G_n)$ and $d(G_n | \hat{G}_n)$, respectively,

\[ d(\hat{G}_n | G_n) = \sup_{b \in G_n} \inf_{a \in \hat{G}_n} |a - b|, \qquad d(G_n | \hat{G}_n) = \sup_{b \in \hat{G}_n} \inf_{a \in G_n} |a - b|. \]
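The two segmentation errors are directed distances between the set of true change points and the set of estimated ones, and can be computed directly from their definitions. The following is a small sketch; the function name and the example sets are illustrative and do not come from the simulations.

```python
def directed_distance(points, targets):
    """d(points | targets) = sup over b in targets of inf over a in points of |a - b|."""
    return max(min(abs(a - b) for a in points) for b in targets)

# Hypothetical example: one estimate is slightly off and one is spurious.
true_points = [100, 200, 300]
estimated_points = [102, 200, 250, 300]

d_est_given_true = directed_distance(estimated_points, true_points)  # d(G-hat | G)
d_true_given_est = directed_distance(true_points, estimated_points)  # d(G | G-hat)
```

Here $d(\hat{G}_n | G_n)$ is small because every true point has a nearby estimate, whereas $d(G_n | \hat{G}_n)$ is large because the estimate at 250 is far from every true change point.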

For the BMS procedure, we consider three different priors for $\pi(\cdot)$, corresponding to the local prior, the non-local moment prior and the non-local inverse moment prior. To save space, we only present the results using the non-local inverse moment prior. We take $n_I = \{\log(n)\}^{1.5} h$, where $h \ge 0.5$ generally works well in the simulations.

For a comprehensive comparison with existing methods, we consider BMS under the non-local inverse moment prior $\pi_I(\cdot)$ with $q = \nu = 2$ and $s = 6$, PELT [12], WBS [7], NOT with normal or heavy-tailed distributions [1] and SML [5]. Table 1 summarizes the numerical results under Model I and Model II with normal, Student's $t$, and log-normal error distributions and their heteroscedastic counterparts, respectively. On average, BMS performs the best in selecting the number of change points and balancing both over- and under-segmentation errors. As expected, the performances of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because all three procedures rely heavily on parametric model assumptions, and hence they are not robust to model misspecification. In contrast, both BMS and NOT behave well under various error distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors, while the resulting estimator $\hat{p}$ tends to be larger than the true $p_0$. On the other hand, BMS allows for slightly larger over-segmentation errors in order to keep $\hat{p}$ more concentrated around $p_0$.

3.2 The sequence with spike points

In addition, we illustrate the features of the BMS, NOT and SML methods on sequences contaminated with spike points. Assuming normal noise, we generate 500 sequences, each containing $n = 1000$ points, with mean changes of 0.01 and $-0.01$ at the 400th and 440th observations, respectively. We set the noise standard deviation to 0.002. Further, we generate 10 random samples uniformly in the ranges $(0.07, 0.08)$ and $(-0.08, -0.07)$, and add them to the original sequence at random locations to form the spike points. Note that we choose these parameters to mimic the real data setting. We implement BMS, NOT, and SML on the simulated samples. In BMS, we select $n_I = 12$, which is the largest integer smaller than $0.65\{\log(n)\}^{1.5}$.
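One way to generate a sequence of the kind described above is sketched below. Several details are assumptions made for illustration: the two mean changes are read as a jump up by 0.01 followed by a jump back down, positions are 0-based, the 10 spikes are placed at distinct random locations, and each spike takes a random sign with magnitude uniform on $(0.07, 0.08)$. The function name and seed are hypothetical.

```python
import random

def simulate_spiky_sequence(seed=0, n=1000, jump=0.01, start=400, stop=440,
                            noise_sd=0.002, n_spikes=10):
    """Return (sequence, sorted spike locations) for one simulated data set."""
    rng = random.Random(seed)

    # Piecewise-constant mean: raised by `jump` on [start, stop), zero elsewhere.
    mean = [jump if start <= i < stop else 0.0 for i in range(n)]
    y = [m + rng.gauss(0.0, noise_sd) for m in mean]

    # Add spikes at distinct random locations (assumption: random sign per spike).
    spike_locations = rng.sample(range(n), n_spikes)
    for loc in spike_locations:
        magnitude = rng.uniform(0.07, 0.08)
        sign = rng.choice((-1.0, 1.0))
        y[loc] += sign * magnitude

    return y, sorted(spike_locations)

sequence, spikes = simulate_spiky_sequence()
```

The spikes are several times larger than the systematic mean change, so a method without a minimal segment length has a strong incentive to flag them, which is the behavior the comparison in this section is designed to expose.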

Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT and SML methods under Model I and Model II with different error distributions: the standard normal distribution $N(0, 1)$, Student's $t(5)$, and log-normal $LN(0, 1)$ with constant variances; and the corresponding distributions with heteroscedastic variances. The columns under $\hat{p} - p_0$ give the number of simulations with each value. Standard deviations are in parentheses.

Error distribution   Method  $\hat{p} - p_0$                      $d(G_n|\hat{G}_n)$  $d(\hat{G}_n|G_n)$
                             ≤-3   -2   -1    0    1    2   ≥3
N(0,1)               BMS       0    0    1  197    2    0    0      2.41 (6.06)     1.96 (3.94)
                     PELT      0    1   37  162    0    0    0      0.91 (1.19)     6.32 (11.92)
                     WBS       0    0    0  194    6    0    0      1.22 (4.13)     0.86 (0.79)
                     NOT       0    0    0  192    7    1    0      1.93 (8.04)     0.75 (0.80)
                     SML       0    0    0  132   52   13    3     12.94 (42.98)    0.78 (0.90)
t(5)                 BMS       0    0    8  190    2    0    0      2.15 (5.76)     2.83 (7.01)
                     PELT      0    4   31  165    0    0    0      0.95 (1.03)     6.24 (12.24)
                     NOT       0    0    3  184    3    6    4      7.57 (27.70)    1.51 (2.57)
                     SML       0    0    0   42   34   44   80     40.13 (53.68)    0.88 (0.87)
LN(0,1)              BMS       0    0   12  180    6    1    2      3.69 (12.12)    3.11 (6.89)
                     PELT      1    2   21  135   15   23    3     12.10 (29.63)    7.22 (13.32)
                     NOT       0    1    4  183    7    1    4      6.06 (26.32)    1.18 (4.45)
                     SML       0    0    0    0    0    4  196    111.77 (52.05)    0.73 (1.33)
Heterosced N(0,1)    BMS       0    0   13  176    8    3    0      3.69 (7.08)     3.88 (7.36)
                     PELT      0    0   31  169    0    0    0      1.49 (1.53)     6.15 (11.44)
                     NOT       0    0    0  150   23   21    6      7.52 (12.60)    1.66 (1.68)
                     SML       0    0    0  119   55   20    6      6.75 (23.98)    1.37 (1.52)
Heterosced t(5)      BMS       0    0   14  181    4    1    0      2.79 (4.87)     4.21 (8.17)
                     PELT      0    4   35  159    2    0    0      1.50 (1.83)     7.76 (13.48)
                     NOT       0    1    5  179   10    2    3      8.15 (25.01)    2.36 (4.70)
                     SML       0    0    0   43   22   41   94     26.74 (35.42)    1.27 (1.59)
Heterosced LN(0,1)   BMS       0    0   20  173    7    0    0      3.73 (11.72)    4.20 (7.85)
                     PELT      0    2   21  142   22   12    1      6.07 (16.09)    8.10 (13.93)
                     NOT       0    0    4  183    7    4    2      6.32 (24.67)    1.42 (3.70)
                     SML       0    0    0    2    1    0  197     68.05 (47.15)    0.87 (1.67)

From Table 2, we can see that BMS is insensitive to the spike points, with the smallest $|\hat{p} - p_0|$ on average. In the experiments (not shown), NOT ignores both the change points with a small signal-to-noise ratio and the spike signals with a small segment length. On the other hand, SML is sensitive to extreme values. Hence, BMS is the most suitable procedure for the MRgRT data: on the one hand, it enforces a minimal segment length to avoid identifying spike signals; on the other hand, it keeps the minimal segment length small enough to detect change points that are close together.

Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methods on the data sequences with spike points. The columns under $\hat{p} - p_0$ give the number of simulations with each value.

Method  $\hat{p} - p_0$
         ≤-1     0     1     2    ≥3
BMS       31   276   113    67    13
NOT      387    32    20    11    50
SML        0     0     0     0   500

4 MRgRT data

We illustrate the BMS method with an application to the MRgRT data, which contain 2265 data points ordered by their distances from the sources of the radiation dose. The R code for implementing the method can be downloaded from our GitHub repository [10]. Throughout the implementation, we use the non-local inverse moment prior $\pi_I(\mu) = s \nu^{q/2} / \Gamma\{q/(2s)\} \, \mu^{-(q+1)} \exp\{-(\mu^2/\nu)^{-s}\}$ with $q = 2$, $\nu = 2$. We set $n_I = 13$, which is the largest integer smaller than $0.65\{\log(n)\}^{1.5}$.
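The choice of $n_I$ for the MRgRT sequence can be checked with a short calculation. The sketch below assumes the natural logarithm and reads "largest integer smaller than" as a strict inequality; the function name is illustrative.

```python
import math

def minimal_distance(n, h=0.65):
    """Largest integer strictly smaller than h * {log(n)}^1.5 (natural log assumed)."""
    value = h * math.log(n) ** 1.5
    k = math.floor(value)
    return k - 1 if k == value else k

n_i_mrgrt = minimal_distance(2265)  # 0.65 * log(2265)^1.5 is roughly 13.96
```

Under these assumptions the MRgRT sample size of 2265 gives $n_I = 13$, matching the value used in the analysis.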

We first vary $s$ from 2 to 10. Figure 1 shows that when $s$ is small, we identify more change points than the true ones, and as $s$ grows, the number of identified change points decreases. This phenomenon is consistent with the result in Lemma 4 that the convergence rate for the non-local inverse moment prior is $O_p[\exp\{-n_I^{s/(1+s)}\}]$. When $s$ is small, the Bayes factor vanishes slowly and hence the algorithm picks redundant change points. When $s$ is sufficiently large, the convergence rate approaches $O_p\{\exp(-n_I)\}$, and hence the algorithm eliminates the flat points more effectively. In the following analysis, we select $s = 10$, with which the algorithm provides the best result in Figure 1.

Figure 1: Change point detection using different $s = 2, 5, 10$. (Panels: $s = 2$, $s = 5$, $s = 10$; horizontal axis: distance from the source; vertical axis: $Y$.)

We also remove the spike points, keeping only the data within the range $(-0.01, 0.01)$, and implement NOT, SML, and BMS on the truncated sequence. Interestingly, Figure 2 shows that the results from the three methods overlap. The NOT and BMS methods give similar results, and both outperform SML. This implies that removing the spike points improves the boundary detection accuracy for all three methods. However, spike removal is infeasible in practice, because the locations and magnitudes of the spikes are difficult to track in human bodies.

Figure 2: Change point detection after restricting the data to the range (-0.01, 0.01). BMS: red solid line; NOT: blue dashed line; SML: brown two-dash line. (Vertical axis: signal level.)

5 Conclusion

We propose the BMS method, which consistently identifies multiple mean changes in a data sequence. The BMS method removes the flat points effectively without sacrificing detection accuracy. Further, our method is particularly useful when the data sequence contains spike points that are not of interest. We apply BMS to analyze the MRgRT data, for which NOT, SML and other methods fail but the BMS algorithm correctly detects the boundaries of the mean changes. We explore the BMS performance with different tuning parameters, and the resulting patterns are consistent with the theoretical properties. Moreover, we demonstrate that BMS is robust to the error distribution by evaluating the detection procedures on sequences with different random errors.

References

[1] Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz. Narrowest-over-threshold detection of multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293, 2016.

[2] Francesco Bertolino, Walter Racugno, and Elías Moreno. Bayesian model selection approach to analysis of variance under heteroscedasticity. The Statistician, pages 503–517, 2000.

[3] Caterina Conigliani and Anthony O'Hagan. Sensitivity of the fractional Bayes factor to prior distributions. Canadian Journal of Statistics, 28(2):343–352, 2000.

[4] Fulvio De Santis and Fulvio Spezzaferri. Consistent fractional Bayes factor for nested normal linear models. Journal of Statistical Planning and Inference, 97(2):305–321, 2001.

[5] Chao Du, Chu-Lan Michael Kao, and SC Kou. Stepwise signal extraction via marginal likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.

[6] Peter Bauer and Peter Hackl. The use of MOSUMS for quality control. Technometrics, 20(4):431–436, 1978.

[7] Piotr Fryzlewicz. Wild binary segmentation for multiple change-point detection. Annals of Statistics, 42(6):2243–2281, 2014.

[8] Joseph Glaz, Joseph I Naus, and Sylvan Wallenstein. Scan Statistics. Springer, 2001.

[9] Harold Jeffreys. The Theory of Probability. Oxford University Press, 1998.

[10] Fei Jiang, Guosheng Yin, and Francesca Dominici. Bayesian multiple change points detection with non-local priors. R code, 2017.

[11] Valen E Johnson and David Rossell. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B, 72(2):143–170, 2010.

[12] Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of change points with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012.

[13] Claudia Kirch and Birte Muhsal. A MOSUM procedure for the estimation of multiple random change points. Preprint, 2014.

[14] Marc Lavielle and Carenne Ludeña. The multiple change-points problem for the spectral distribution. Bernoulli, 6(5):845–869, 2000.

[15] Yue S Niu and Heping Zhang. The screening and ranking algorithm to detect DNA copy number variations. Annals of Applied Statistics, 6(3):1306, 2012.

[16] Philip Preuss, Ruprecht Puchstein, and Holger Dette. Detection of multiple structural breaks in multivariate time series. Journal of the American Statistical Association, 110(510):654–668, 2015.

[17] Stephen G Walker. Modern Bayesian asymptotics. Statistical Science, pages 111–117, 2004.

[18] Yi-Ching Yao. Approximating the distribution of the maximum likelihood estimate of the change-point in a sequence of independent random variables. Annals of Statistics, 15(3), 1987.

[19] Chun Yip Yau and Zifeng Zhao. Inference for multiple change points in time series via likelihood ratio scan statistics. Journal of the Royal Statistical Society: Series B, 78(4):895–916, 2015.

Appendix

A Additional Figures

Figure A.1: Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and a typical line profile through the center of the cavity (right). The radiation enters the dosimeter from the hole on the left of the cylindrical dosimeter. The cylinder rotates 360 degrees so that the radiation can enter from different directions.

Figure A.2: The change point detection results for the dose level changes in the DMOS system from the NOT method and from SML with a normal prior and the maximal number of change points set to 30. (Panels: NOT, SML; horizontal axis: distance from the source; vertical axis: $Y$.)

B Proofs

Let $g'(\mu)$ and $g''(\mu)$ denote the first and second derivatives of a generic function $g(\mu)$ with respect to $\mu$, respectively, and further define the utility function as

\[ U_k(\mu) = -\sum_{l=\tau_k}^{\tau_{k+1}-1} (Y_l - \bar{Y}_{\tau_k} - \mu)^2. \]

The following conditions are imposed for the theoretical derivations.

(A1) Assume $\mu$ to be in a closed set of points in $\mathbb{R}$.

(A2) Assume $\pi(\mu)$ to be a continuous density function with bounded first and second derivatives.

(A3) Assume that $\Pr\{Y_n | T(p)\}$ has a unique maximizer in the neighborhood of $T_0(p_0)$.

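Lemma 1. Assume that $\tau_k$ is a change point for which the mean of $Y_l - \bar{Y}_{\tau_k}$ satisfies $|\mu_{k0}| > \delta$ for $\delta > 0$, and $n_k^{1/2} \delta / \sigma \to \infty$. Then there is a constant $D > 0$ such that

\[ \lim_{n \to \infty} \Pr\left[ \frac{\int \prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k} - \mu)^2\} \pi(\mu) \, d\mu}{\prod_{l=\tau_k}^{\tau_{k+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_k})^2\}} > \exp(D n_k \delta^2) \right] = 1. \]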

Proof: By the definition of Ukpµq, we can write

284 ³ ±τ_k ₁1

lτk exptpYl ¯Yτ_k µq²uπpµqdµ

±τ_k ₁1

lτk exptpYl ¯Yτ_kq²u

³exptUkpµquπpµqdµ exptUkp0qu .

We first define Nδpµk0q tµ : |µ µk0| δu and Nδ^cpµk0q as its compliment, and then show

285

nlimÑ8Pr

sup

µPN_δ^cpµk0qtUkpµq Ukpµk0qu Dnkδ²

1. (4)

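Note that, for $\mu \in N_\delta^c(\mu_{k0})$,

$$\begin{aligned}
U_k(\mu) - U_k(\mu_{k0}) &= n_k \Big\{ (\mu_{k0}^2 - \mu^2) - n_k^{-1} (\mu_{k0} - \mu) \sum_{l=\tau_k}^{\tau_{k+1}-1} 2 (Y_l - \bar{Y}_{\tau_k}) \Big\} \\
&= n_k \{ (\mu_{k0}^2 - \mu^2) - 2 (\mu_{k0} - \mu) \mu_{k0} + O_p(n_k^{-1/2} |\mu_{k0} - \mu| \sigma_k) \} \\
&= n_k \{ -(\mu - \mu_{k0})^2 \} + O_p(n_k^{1/2} |\mu_{k0} - \mu| \sigma_k) \\
&\le -n_k \delta^2 + O_p(n_k^{1/2} |\mu_{k0} - \mu| \sigma_k) \\
&= -n_k \delta^2/2 - n_k \delta^2/2 + O_p(n_k^{1/2} |\mu_{k0} - \mu| \sigma_k),
\end{aligned}$$

where the inequality uses $|\mu - \mu_{k0}| \ge \delta$ on $N_\delta^c(\mu_{k0})$. As $n_k^{1/2}\delta/\sigma \to \infty$, we have $-n_k \delta^2/2 + O_p(n_k^{1/2} |\mu_{k0} - \mu| \sigma_k) < 0$ with probability tending to 1, and thus

$$\lim_{n\to\infty} \Pr\left[ \sup_{\mu \in N_\delta^c(\mu_{k0})} \{U_k(\mu) - U_k(\mu_{k0})\} < -D n_k \delta^2 \right] = 1.$$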

When $\tau_k$ is a change point, let $\mu = 0$. Because $|\mu_{k0}| > \delta$, we have

$$\lim_{n\to\infty} \Pr\left[ \frac{\exp\{U_k(0)\}}{\exp\{U_k(\mu_{k0})\}} < \exp(-D n_k \delta^2) \right] = 1. \tag{5}$$

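By the Laplace approximation,

$$\int \exp\{U_k(\mu)\}\,\pi(\mu)\,d\mu = O_p\big[ U_k''(\tilde\mu)^{-1/2} \exp\{U_k(\tilde\mu)\}\,\pi(\tilde\mu) \big], \tag{6}$$

where $\tilde\mu$ is the maximizer of $U_k(\mu) + \log\{\pi(\mu)\}$, and $U_k''(\tilde\mu) = O_p(n_k)$. Let $\hat\mu$ be the maximizer of $U_k(\mu)$, and then

$$0 = L_k'(\tilde\mu) + \partial \log \pi(\tilde\mu)/\partial\mu = L_k''(\mu^*)(\tilde\mu - \hat\mu) + \partial \log \pi(\tilde\mu)/\partial\mu,$$

where $\mu^*$ is a point on the line segment between $\tilde\mu$ and $\hat\mu$ (here $L_k$ plays the role of $U_k$). As $\pi(\mu)$ has two bounded derivatives by condition (A2) and $L_k''(\mu^*) = O_p(n_k)$, we have $\tilde\mu - \hat\mu = O_p(n_k^{-1})$. Therefore, (6) can be written as

$$\begin{aligned}
\int \exp\{U_k(\mu)\}\,\pi(\mu)\,d\mu &= O_p\big[ U_k''(\hat\mu)^{-1/2} \exp\{U_k(\hat\mu)\}\,\pi(\hat\mu) \big] \\
&= O_p\big[ U_k''(\mu_{k0})^{-1/2} \exp\{U_k(\mu_{k0})\}\,\pi(\mu_{k0}) \big],
\end{aligned} \tag{7}$$

where the last equality holds because $\hat\mu$ is the least squares estimator. This implies

$$\frac{\exp\{U_k(\mu_{k0})\}}{\int \exp\{U_k(\mu)\}\,\pi(\mu)\,d\mu} = O_p(n_k^{1/2}), \tag{8}$$

which in conjunction with (5) leads to

$$\lim_{n\to\infty} \Pr\left[ \frac{\exp\{U_k(0)\}}{\int \exp\{U_k(\mu)\}\,\pi(\mu)\,d\mu} < \exp(-D n_k \delta^2) \right] = 1.$$

By condition (A2) and the boundedness of $U_k(\mu)$, we have

$$\lim_{n\to\infty} \Pr\left[ \frac{\int \exp\{U_k(\mu)\}\,\pi(\mu)\,d\mu}{\exp\{U_k(0)\}} > \exp(D n_k \delta^2) \right] = 1, \tag{12}$$

which completes the proof.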
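As a numerical illustration of Lemma 1 (a sketch, not part of the proof), take a normal prior $N(0, t^2)$, which satisfies (A2), and replace $S = \sum_l (Y_l - \bar{Y}_{\tau_k})$ by its idealized value $n_k\mu_{k0}$. Completing the square in $U_k(\mu) - U_k(0) = 2\mu S - n_k\mu^2$ then gives the evidence ratio in closed form. The function name `log_evidence_ratio`, the prior variance, and the values of $\mu_{k0}$, $\delta$ and $D$ below are illustrative assumptions.

```python
import math


def log_evidence_ratio(n_k, mu0, prior_var=1.0):
    """log of  int exp{U_k(mu)} pi(mu) dmu / exp{U_k(0)}  for pi = N(0, prior_var).

    Uses U_k(mu) - U_k(0) = 2*mu*S - n_k*mu^2 with S set to its idealized
    value n_k*mu0 (an assumption that removes sampling noise).
    """
    s_sum = n_k * mu0
    a = n_k + 1.0 / (2.0 * prior_var)
    # Gaussian integral: ratio = exp(S^2 / a) / sqrt(2 * prior_var * a).
    return s_sum ** 2 / a - 0.5 * math.log(2.0 * prior_var * a)


# Illustrative values with |mu0| > delta; D is an assumed constant.
mu0, delta, D = 0.5, 0.4, 0.5
for n_k in (200, 2000):
    print(n_k, log_evidence_ratio(n_k, mu0), D * n_k * delta ** 2)
```

In this idealized setting the log ratio grows roughly like $n_k\mu_{k0}^2$, so it eventually exceeds $D n_k \delta^2$ whenever $D\delta^2 < \mu_{k0}^2$.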

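Lemma 2. Let $\pi(\mu) = \pi_L(\mu)$ be a local prior, and assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then

$$\frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}} = O_p(n_j^{-1/2}).$$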

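Proof: By the definition of $U_j(\mu)$, we can write

$$\frac{\int \exp\{U_j(\mu)\}\,\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = \frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}}.$$

Using the same argument as that leading to (7) with $\mu_{j0} = 0$, we have

$$\int \exp\{U_j(\mu)\}\,\pi(\mu)\,d\mu = O_p\big[ U_j''(0)^{-1/2} \exp\{U_j(0)\}\,\pi(0) \big].$$

Since $U_j''(0)^{-1/2} = O_p(n_j^{-1/2})$, and $\pi(0)$ is a bounded density, we have

$$\frac{\int \exp\{U_j(\mu)\}\,\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = O_p(n_j^{-1/2}).$$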
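The $n_j^{-1/2}$ rate of Lemma 2 can be checked numerically in one special case. The sketch below assumes a normal local prior $N(0, t^2)$ and fixes $S = \sum_l (Y_l - \bar{Y}_{\tau_j})$ at $z\, n_j^{1/2}$ for a constant $z$, which mimics the size of $S$ when there is no change. The function names, the quadrature settings and the value of $z$ are hypothetical choices.

```python
import math


def ratio_local(n_j, s_sum, prior_var=1.0):
    """Closed form of  int exp{U_j(mu) - U_j(0)} N(mu; 0, prior_var) dmu,
    using U_j(mu) - U_j(0) = 2*mu*S - n_j*mu^2."""
    a = n_j + 1.0 / (2.0 * prior_var)
    return math.exp(s_sum ** 2 / a) / math.sqrt(2.0 * prior_var * a)


def ratio_local_quadrature(n_j, s_sum, prior_var=1.0, half_width=2.0, steps=40000):
    """Midpoint-rule version of the same integral, as an independent check."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        mu = -half_width + (i + 0.5) * h
        prior = math.exp(-mu ** 2 / (2.0 * prior_var)) / math.sqrt(2.0 * math.pi * prior_var)
        total += math.exp(2.0 * mu * s_sum - n_j * mu ** 2) * prior
    return total * h


z = 1.0  # assumed standardized value, so that S = z * sqrt(n_j)
for n_j in (100, 400, 1600, 6400):
    # sqrt(n_j) * ratio should stay bounded as n_j grows.
    print(n_j, math.sqrt(n_j) * ratio_local(n_j, z * math.sqrt(n_j)))
```

Quadrupling $n_j$ roughly halves the ratio, which is the behavior an $O_p(n_j^{-1/2})$ rate predicts.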

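Lemma 3. Let

$$\pi(\mu) = \pi_M(\mu) = \frac{\mu^{2v}}{C_M}\,\pi_b(\mu),$$

where $C_M$ is a normalizing constant, and $\pi_b(\mu)$ with $\pi_b(0) > 0$ is the base prior density with $2v$ finite moments and bounded first two derivatives in the neighborhood around 0. Assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then

$$\frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}} = O_p(n_j^{-v-1/2}).$$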

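Proof: We can write

$$\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu = \int \exp\{U_j(\mu) + \log \pi(\mu)\}\,d\mu.$$

Let $h(\mu) = U_j(\mu) + \log \pi(\mu) = 2v \log(\mu) + \log\{\pi_b(\mu)\} + U_j(\mu)$, up to the additive constant $-\log C_M$, and let $\tilde\mu$ be the maximizer of $h(\mu)$. Then we have

$$2v/\tilde\mu + \pi_b'(\tilde\mu)/\pi_b(\tilde\mu) + L_j'(\tilde\mu) = 0,$$

where $L_j$ plays the role of $U_j$. If we expand $L_j'(\tilde\mu)$ around $\hat\mu$, the least squares estimator for $\mu_{j0}$, the above equality can be rewritten as

$$2v/n_j + n_j^{-1} \tilde\mu\, \pi_b'(\tilde\mu)/\pi_b(\tilde\mu) + n_j^{-1} L_j''(\mu^*)\, \tilde\mu (\tilde\mu - \hat\mu) = 0,$$

where $\mu^*$ is the point on the line segment between $\hat\mu$ and $\tilde\mu$. Therefore,

$$\begin{aligned}
O_p(n_j^{-1}) &= \tilde\mu (\tilde\mu - \hat\mu) \\
&= (\tilde\mu - \hat\mu)^2 + \hat\mu (\tilde\mu - \hat\mu) \\
&= (\tilde\mu - \hat\mu + \hat\mu/2)^2 - \hat\mu^2/4.
\end{aligned}$$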

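Along with the fact that $\hat\mu = O_p(n_j^{-1/2})$, we have $\tilde\mu - \hat\mu = O_p(n_j^{-1/2})$, and $\tilde\mu = O_p(n_j^{-1/2})$. Next, by the Laplace expansion, we have

$$\int \exp\{h(\mu)\}\,d\mu = O_p\Big( \{2v/\tilde\mu^2 - U_j''(\tilde\mu)\}^{-1/2} \exp\big[ 2v \log(\tilde\mu) + \log\{\pi_b(\tilde\mu)\} + U_j(\tilde\mu) \big] \Big), \tag{13}$$

and also

$$\begin{aligned}
n_j^{-1} U_j(\tilde\mu) - n_j^{-1} U_j(0) &= n_j^{-1} U_j'(\mu^*)\,\tilde\mu \\
&= n_j^{-1} \{ U_j'(\hat\mu) + U_j'(\mu^*) - U_j'(\hat\mu) \}\,\tilde\mu \\
&= O_p(\mu^* - \hat\mu)\,\tilde\mu \\
&= O_p(n_j^{-1}),
\end{aligned} \tag{9}$$

where $\mu^*$ is on the line segment between $\tilde\mu$ and 0. Thus,

$$|U_j(\tilde\mu) - U_j(0)| = O_p(1).$$

As a result,

$$\begin{aligned}
\frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}}
&= \frac{\int \exp\{U_j(\mu)\}\,\pi(\mu)\,d\mu}{\exp\{U_j(0)\}} = \frac{\int \exp\{h(\mu)\}\,d\mu}{\exp\{U_j(0)\}} \\
&= O_p\Big( \{2v/\tilde\mu^2 - U_j''(\tilde\mu)\}^{-1/2} \exp\big[ 2v \log(\tilde\mu) + \log\{\pi_b(\tilde\mu)\} + U_j(\tilde\mu) - U_j(0) \big] \Big) \\
&= O_p(n_j^{-1/2} \tilde\mu^{2v}) \\
&= O_p(n_j^{-1/2 - v}),
\end{aligned}$$

where the last equality holds by the fact that $\tilde\mu = O_p(n_j^{-1/2})$. This proves the result.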
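For a concrete instance of the $n_j^{-v-1/2}$ rate, the sketch below takes $v = 1$ with a normal base prior $N(0, t^2)$, so that $C_M = t^2$ and $\pi_M(\mu) = (\mu^2/t^2)\,N(\mu; 0, t^2)$. As in the local-prior case, $S = \sum_l (Y_l - \bar{Y}_{\tau_j})$ is fixed at $z\, n_j^{1/2}$ to mimic a segment without a change. The function name `ratio_moment` and the value of $z$ are hypothetical, and the closed form relies on the identity $U_j(\mu) - U_j(0) = 2\mu S - n_j\mu^2$.

```python
import math


def ratio_moment(n_j, s_sum, prior_var=1.0):
    """int exp{U_j(mu) - U_j(0)} pi_M(mu) dmu for the assumed prior
    pi_M(mu) = mu^2 / prior_var * N(mu; 0, prior_var), i.e. v = 1."""
    a = n_j + 1.0 / (2.0 * prior_var)
    # Ratio under the normal base prior alone.
    local = math.exp(s_sum ** 2 / a) / math.sqrt(2.0 * prior_var * a)
    # exp{2*mu*S - a*mu^2} is a Gaussian kernel with mean S/a and variance
    # 1/(2a), so the extra factor mu^2 contributes its second moment.
    second_moment = 1.0 / (2.0 * a) + (s_sum / a) ** 2
    return local * second_moment / prior_var


z = 1.0  # assumed standardized value, so that S = z * sqrt(n_j)
for n_j in (100, 400, 1600, 6400):
    # n_j^{3/2} * ratio should stay bounded as n_j grows.
    print(n_j, n_j ** 1.5 * ratio_moment(n_j, z * math.sqrt(n_j)))
```

Quadrupling $n_j$ divides the ratio by about eight here, compared with about two under the normal prior alone, which illustrates the extra factor $n_j^{-v}$ contributed by the moment prior.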

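Lemma 4. Let

$$\pi(\mu) = \pi_I(\mu) = \frac{s \nu^{q/2}}{\Gamma\{q/(2s)\}}\, \mu^{-(q+1)} \exp\left\{ -\left( \frac{\mu^2}{\nu} \right)^{-s} \right\}.$$

Assume that $\tau_j$ is not a change point, i.e., $\mu_{j0} = 0$. Then

$$\frac{\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu}{\prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j})^2\}} = O_p\{\exp(-n_j^{s/(s+1)})\}.$$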
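One special case of Lemma 4 can be evaluated exactly, which gives a quick sanity check of the exponential rate. Assume $s = \nu = q = 1$, so that $\pi_I(\mu) = \mu^{-2}\exp(-1/\mu^2)/\sqrt{\pi}$, and assume an idealized segment with $\sum_l (Y_l - \bar{Y}_{\tau_j}) = 0$, so that $U_j(\mu) - U_j(0) = -n_j\mu^2$. The substitution $x = 1/\mu$ and the standard integral $\int_0^\infty \exp(-ax^2 - b/x^2)\,dx = \tfrac{1}{2}\sqrt{\pi/a}\,\exp(-2\sqrt{ab})$ then give a ratio of exactly $\exp(-2 n_j^{1/2})$, in line with $n_j^{s/(s+1)} = n_j^{1/2}$. The Python sketch below compares this value with a midpoint-rule quadrature. The function name and the quadrature settings are hypothetical.

```python
import math


def ratio_inverse_moment(n_j, upper=3.0, steps=60000):
    """Midpoint-rule value of  int exp{-n_j*mu^2} pi_I(mu) dmu  for s = nu = q = 1.

    Assumes pi_I(mu) = mu^{-2} exp(-1/mu^2) / sqrt(pi) and a segment whose
    centred observations sum to zero, so U_j(mu) - U_j(0) = -n_j*mu^2.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        mu = (i + 0.5) * h
        total += math.exp(-n_j * mu ** 2 - 1.0 / mu ** 2) / mu ** 2
    # The integrand is even in mu, so double the half-line integral.
    return 2.0 * total * h / math.sqrt(math.pi)


for n_j in (25, 100, 400):
    print(n_j, ratio_inverse_moment(n_j), math.exp(-2.0 * math.sqrt(n_j)))
```

The decay is faster than any power of $n_j$, in contrast with the polynomial rates obtained under the local and moment priors.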

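Proof: We first write

$$\int \prod_{l=\tau_j}^{\tau_{j+1}-1} \exp\{-(Y_l - \bar{Y}_{\tau_j} - \mu)^2\}\,\pi(\mu)\,d\mu = c \int \exp\{ U_j(\mu) - \mu^{-2s}\nu^s - (q+1)\log(\mu) \}\,d\mu,$$

for a constant $c$. Let $h(\mu) = U_j(\mu) - \mu^{-2s}\nu^s - (q+1)\log(\mu)$, and assume $\tilde\mu$ is the maximizer of $h(\mu)$. Then we have

$$U_j'(\tilde\mu) + 2s\tilde\mu^{-2s-1}\nu^s - (q+1)\tilde\mu^{-1} = U_j''(\mu^*)(\tilde\mu - \hat\mu) + 2s\tilde\mu^{-2s-1}\nu^s - (q+1)\tilde\mu^{-1} = 0,$$

where $\mu^*$ is on the line segment between $\tilde\mu$ and $\hat\mu$. The above equality yields

$$n_j \tilde\mu^{2s+2} (1 - \hat\mu/\tilde\mu) = \frac{2s\nu^s - (q+1)\tilde\mu^{2s}}{-U_j''(\mu^*)/n_j}, \tag{10}$$

which implies $\tilde\mu = O_p(n_j^{-1/(2s+2)})$.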

From (10), we have $n_j \tilde\mu^{2s+1} (\tilde\mu - \hat\mu) = O_p(1)$, which leads to

$$\tilde\mu - \hat\mu = O_p\{ n_j^{-(4s+3)/(2s+2)} \}. \tag{11}$$

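Following (30) in [11] and using our notation, we obtain

$$\int \exp\{h(\mu)\}\,d\mu = O_p\left( \left\{ \frac{4s^2 + 2s}{\tilde\mu^{2s+2}} - U_j''(\tilde\mu) \right\}^{-1/2} |\tilde\mu|^{-(q+1)} \exp\{ -\tilde\mu^{-2s}\nu^s + U_j(\tilde\mu) \} \right).$$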