• No results found

Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

N/A
N/A
Protected

Academic year: 2021

Share "Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Anonymous Author(s) Affiliation

Address email

Abstract

We propose a Bayesian model selection (BMS) boundary detection procedure using

1

non-local prior distributions for a sequence of data with multiple systematic mean

2

changes. By using the non-local priors in the Bayesian model selection framework,

3

the BMS method can effectively suppress the non-boundary spike points with large

4

instantaneous changes. Further, we speed up the algorithm by reducing the multiple

5

change points to a series of single change point detection problems. We establish

6

the consistency of the estimated number and locations of the change points under

7

various prior distributions. Extensive simulation studies are conducted to compare

8

the BMS with existing methods, and our method is illustrated with application to

9

the magnetic resonance imaging guided radiation therapy data.

10

1 Introduction

11

Traditional change point detection algorithms often handle the cases where the occurrent frequency of

12

the change points is relatively consistent across the signal. For example, the popular narrowest-over-

13

threshold (NOT) [1] algorithm is suitable under the setting where the different segments between the

14

change points have comparable lengths, while the stepwise marginal likelihood (SML) method [5]

15

works the best to identify the frequent change points. However, it is often case that the gaps between

16

the consecutive change points are dramatically different, while only ones with certain distances are of

17

interest. In this paper, we develop a computational efficient Bayesian model selection algorithm for

18

identifying the change points under this specific setting.

19

The inconsistent gaps between the change points can be observed from the signals generated by the

20

magnetic resonance imaging guided radiation therapy (MRgRT). One problem with MRgRT is that

21

when radiation traveled in the magnetic field, the dose level can be significantly enhanced near the

22

boundaries between different tissues or organs in human bodies. The Duke Mid-sized Optical-CT

23

System (DMOS) (See Figure A.1 in Appendix) was developed to identify the dose changes near the

24

region of this boundary artifact. The right panel in Figure A.1 contains one radiation dose, where we

25

can observe that the boundaries on and inside the dosimeter can be distinguished by noticeable peaks

26

of the radiation dose levels. In the experiment, multiple radiations enter the cylindrical dosimeter from

27

different directions, and a sequence of data ordered by their distances to the sources was collected.

28

Because the dosimeter is a circle and the cavity is in the middle, the radiation from different directions

29

would hit the boundary at similar distances away from their sources in the dosimeter.

30

In the MRgRT data, radiations in certain directions may experience temporary changes at non-

31

boundary locations, which may result from the abnormal status of the DMOS system rather than the

32

true dose changes. The temporary change points, appear in the data sequence as the spike points,

33

often mixed up with the ones on the boundary (the peak locations in Figure A.1) which makes the

34

boundary detection extremely challenging. Figures A.2 in Appendix shows the change points in the

35

MRgRT data identified by the popular NOT [1] and the SML [5] algorithms, respectively. It can be

36

Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017). Do not distribute.

(2)

seen that neither of the algorithms correctly identify the true boundaries. This motivate us to propose

37

a new algorithm in detecting the systematic changes when the segment length can have dramatic

38

differences.

39

As shown in the preliminary analysis of the MRgRT data, the local control of the discovery is crucial.

40

To avoid picking the spike points, we enforce minimal distances between the change points. Moreover,

41

we adopt the computational efficient local scan routine and propose a systematic two-stage procedure

42

to speed up the change point detection procedure. More specifically, the local scan method first

43

identifies the candidate points with minimal distance based on the local data, and then optimizes

44

an utility function to obtain the estimates for the locations and the total number of change points.

45

Because the change points are defined based on the mean changes in two consecutive segments, the

46

local data are sufficient to detect the systemic changes [6, 7, 8, 13, 14, 16, 18, 19].

47

To perserve the positive detection rate of the change points and reduce the false detection rate of the

48

non-change points, we utilize a Bayesian marginal likelihood function as the utility, and propose a

49

new Bayesian model selection (BMS) procedure for identifying change points. In particular, we show

50

that the selection consistency is achieved by choosing both the local [2, 3, 4, 17] and non-local priors

51

[11], whereas the convergence rate is faster under the non-local priors.

52

Our BMS procedure is cast in the model selection framework, which is faster than the dynamic

53

programming introduced under the SML framework. For example, for a single sequence in the

54

MRgRT data, BMS takes roughly 1.3 seconds and SML takes 3.1 seconds when the maximum

55

number of change points is capped at 100. The major factor that contributes to the efficiency of

56

BMS algorithm is the fact that BMS reduces the search space dramatically by selecting a small set of

57

candidate change-points. Further, once the candidate points are selected, BMS only needs to evaluate

58

two consecutive segments at a time, which allows the algorithms to be implemented in parallel.

59

2 Bayesian multiple change points detection

60

2.1 Probability model

61

Suppose there are p0true change points t1¤, . . . , ¤ tp0among n observations Yn tY1, . . . , Ynu.

62

As a convention, let t0 1 and tpp0 1q n 1. Denote λj  tj 1 tjand λ minj0,...,p0λj.

63

We consider a set of Kn candidate points τ1, . . . , τKn, with τ0  1 and τKn 1  n 1, while

64

postponing the discussion on the candidate set selection to Section 2.4. Define nj  τj 1 τj,

65

nI  minj0,...,Kn1nj, nI ¤ λ. Let HpnIq  tτj : j  1, . . . , Kn,|τj 1 τj| ¡ nIu denote

66

the set of all candidate points and T0pp0q  ttj : j  1, . . . , p0u be the set of true change points.

67

The specification of the candidate points allows the BMS method to be implemented in a lower

68

dimensional space with the most influential points. It also guarantees that there are a sufficient

69

number of non-change points surrounding the true ones so that the consistency conditions are met.

70

The probability model takes the form of

71

Yl ντj l, lP rτj, τj 1q,

where the random errors lare independent with mean zero and variance σ2j. Further, we define

72

σ maxj0,...,p0σj.

73

For ease of exposition, we first consider the situation where the locations of the candidate change

74

points are given and T0pp0q € HpnIq. Define ¯Yτj  n1j1°τj1

lj1Yl, which is the sample average

75

for thepj  1qth segment rτj1, τjq, j ¡ 1. Suppose the candidate point τkis not a change point,

76

then the points inrτk, τk 1q have the same mean as those in rτk1, τkq. If τk is a change point, we

77

expect a mean shift between the segmentsrτk, τk 1q and rτk1, τkq. Hence, we can formulate the

78

model and prior distribution for l¥ τ1as follows:

79

Yl ¯Yτk  µk ξl, lP rτk, τk 1q, µk  πpµkq, if τkis a change point,

µk  0, with probability 1, if τkis an nI-flat point,

where πpq is a prior density and ξl is a mean-zero error. Note that the observations in the first

80

segment are unchanged. Here the nI-flat point is defined as a non-change point which is at least nI 81

apart from any change points. We require the nI distance between the true change points and the flat

82

ones so that there are sufficient neighborhood samples to achieve the estimation consistency.

83

(3)

Let µk0be the true value of µk, and we assume |µk0| ¡ δ, where δ ¡ 0 is the lower bound of

84

the µk0, for the k’s with τk P T0pp0q. The prior distribution on µk is crucial for determining the

85

convergence rate of the BML procedure. We explore three types of priors: the local prior [9], the

86

non-local moment prior and the inverse moment prior in [11].

87

Local: πLpµq  Np0, ω2q Moment: πMpµq  µ2v{CM1{?

2π exppµ2{2q

Inverse moment: πIpµq  sνq{2{Γpq{2sqµpq 1qexp pµ2{νqs( , where CM is the normalizing constant.

88

Let Mkrepresent the model that τkis the sole change point. We define the marginal likelihood with

89

the Gaussian kernel as

90

PrpYn|Mkq 

Kn

¹

j1,jk τj¹11

lj

exptpYl ¯Yτjq2u

» τk¹11

lk

exptpYl ¯Yτk µq2uπpµqdµ.

The posterior model probability of Mk given Ynis

91

PrpMk|Ynq  PrpYn|Mkq PrpMkq

°Kn

j1PrpYn|Mjq PrpMjq  PrpYn|Mkq

°Kn

j1PrpYn|Mjq, (1) when Mjassumes a non-informative uniform prior, j 1, . . . , Kn. Note that it is not necessary for

92

Ynto be normally distributed to ensure the selection consistency in detecting mean changes. The

93

Gaussian kernel serves as a utility function, which tends to be large when the distances between the

94

true and the hypothetical segment means are small. Hence, as nÑ 8, PrpMk|Ynq approaches 1

95

when τkis indeed a change point and the τj’spj  kq are nI-flat points.

96

2.2 Change point detection

97

We start with the simplest case that there is only one mean shift in the data, i.e., p0  1 is fixed

98

a priori. We select the candidate point τk associated with the largest PrpMk|Ynq among all the

99

candidates, which corresponds to the largest marginal likelihood PrpYn|Mkq. It can be shown that

100

PrpMk|Ynq 

# 1

Kn

¸

jk

PrpYn|Mjq PrpYn|Mkq

+1 , where

101

PrpYn|Mjq 

³ ±τj 11

lj exptpYl ¯Yτj µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u PrpYn|Mkq 

³ ±τk 11

lk exptpYl ¯Yτk µq2uπpµqdµ

±τk 11

lk exptpYl ¯Yτkq2u .

The two marginal likelihoods indicate that the selection consistency is fully determined by the

102

evidence in favor of µk  πpµkq and that of µj  0 for j  k. The product on the right hand side

103

converges to 0 when nI grows to8 with the sample size. The candidate points retain enough samples

104

in the neighborhood to guarantee the convergency.

105

For the case with multiple change points (p0¡ 1), we select the points associated with the p0largest

106

PrpMk|Ynq, and the selection consistency is presented as follows.

107

Theorem 1. Let M tMk, τkP T0pp0qu. If the Bayes factor satisfies

108

PrpYn|Mjq  Oppanjq (2)

forτjR T0pp0q, anj  opp1q, and n1I{2δ{σ Ñ 8, then

109 ¸

MkPM

PrpMk|Ynq  1 OptKnanIexppnIδ2qu. (3) Hence whennI{logpnq Ñ c, 0   c ¤ 8, nI ¤ λ, we have

¸

MkPM

PrpMk|YnqÑ 1.p

(4)

The proof of Theorem 1 is delineated in the Appendix. Clearly, the selection consistency depends

110

on the convergence rate of anI, which is determined by the choice of the prior πpq. Lemmas 2–4

111

in the Appendix show that anj  n1{2j when πpq  πLpµq; anj  nv1{2j if πpq  πMpµq and

112

anj  exptnsj{ps 1qu if πpq  πIpµq. Hence, the selection consistency is achieved at the fastest

113

rate using the non-local inverse moment prior.

114

2.3 Number of change points

115

When p0is unknown, let Tppq be the set containing p points from the procedure described in Section

116

2.2. We define the marginal likelihood given Tppq, that is PrtYn|T ppqu as

117

¹

τjRT ppq τj¹11

lj

exptpYl ¯Yτjq2u ¹

τkPT ppq

» τk¹11

lk

exptpYl ¯Yτk µq2uπpµqdµ.

We can estimate the locations and the number of change points in two steps: (i) for any given

118

p, we obtain pTppq using the procedure described in the previous section, and (ii) we estimate p0 119

by pp via maximizing PrtYn| pTppqu with respect to p. In contrast to the procedure in [5] which

120

simultaneously estimates the locations and the number of change points by maximizing the marginal

121

likelihood with respect to Tppq and p, our BMS splits the estimation procedure into a scanning step

122

as described in Section 2.2 and an optimization step. This scanning–optimization mechanism reduces

123

the computational burden substantially, because the optimization in step (ii) is merely implemented

124

in a single dimension.

125

2.4 Candidate points selection

126

Previous discussions rely upon a critical assumption that the candidate points are specified in advance,

127

which, however, is often infeasible in practice. To facilitate the implementation of the BMS, we

128

need to find a candidate set HcpnIq that is close to HpnIq. For the selection consistency of the

129

change points, we require for each tj there is a τk P HcpnIq, such that Prp|tj  τk| ¤ nIq 

130

1 OprmintexppnIδ2q, anIus. Define

131

Ri

³ ±i nI1

li exptpYl ¯Yi µq2uπpµqdµ

±i nI1

li exptpYl ¯Yiq2u , where ¯Yi  pnI 1q1°i1

jinIYj. By the argument similar to that in Lemma 1, Rigoes to infinity

132

when i is a true change point, and RiÑ 0 in probability, when i is an nI-flat point. Hence, the value

133

of Rican distinguish a change point from a set of nI-flat points. To further eliminate the non-change

134

points that are also not nI-flat, we implement the non-maximum suppression that removes away the

135

points which do not give the largest Ri’s in their nI-neighborhood. More specifically, the screening

136

procedure of selecting candidate points is described as follows.

137

Algorithm 1 : Screening

(1) For each i inrnI, n nIs, compute Ri.

(2) If Ri maxtRj: jP pi  nI, i nIsu, i is selected as a candidate point.

(3) After scanning through the data sequence, a set of Kncandidate points, denoted as HcpnIq, is obtained.

This screening algorithm is comparable to that in [15], because by the Laplace approximation, we

138

can write

139

Ri  Dn

±i nI1

li exptpYl ¯Yi µq2uπpµq

±i nI1

li exptpYl ¯Yiq2u t1 opp1qu

 Dnexp

# 2

i n

I1

¸

li

Yl

i¸1 jinI

Yj

µ nIµ2 +

πpµqt1 opp1qu,

(5)

where Dn is a constant of order Oppn1{2I q and µ is the maximizer of °i nI1

li pYl ¯Yi

140

µq2 logπpµq. Clearly, the magnitude of the leading term in Ri is strongly associated with

141

n1Ii nI1

li Yli1 jinIYj

, which is the local diagnosis function with h nI in [15].

142

Next, we show the screening procedure identifies a set HcpnIq that would lead to the change point

143

consistency.

144

Proposition 1. Assume n1I{2δ{σ Ñ 8. For each tj P T0pp0q, there is a τ P HcpnIq such that

145

Prttj P pτ  nI, τ nIqu  1  OrmintexppnIδ2q, anIus.

146

In theory, i tjmaximizes Riin the nI-neighborhood of tjasymptotically. By selecting the local

147

maximal Riin the screening procedure, HcpnIq would cover the nI-neighborhood of T0pp0q as

148

nÑ 8. Also the condition n1I{2δ{σ Ñ 8 indicates that the effect size cannot be too small in order

149

to find the candidate points around the true change points. After selecting the candidate points, we

150

perform the refinement step to identify the locations and the total number of change points.

151

Algorithm 2 : Refinement Scanning

(1) Compute PrpYn|Mkq by scanning over all the candidate points in HcpnIq.

(2) For each p, we obtain a set of change points pTppq corresponding to the p largest PrpYn|Mkq, k 1, . . . , Kn.

Optimization

(3) Selectpp that maximizes PrtYn| pTppqu.

Theorem 2. Assume that nI{logpnq Ñ c, 0   c ¤ 8, n1I{2δ{σ Ñ 8, lim supnÑ8nI{λ   1{2, and equation (2) holds. LetHcpnIq be the set containing candidate points such that |τk 1k| ¡ nI, and for eachtj there is aτk P HcpnIq, Prp|tj  τk| ¤ nIq  1  OprmintexppnIδ2q, anIus, Then,

Prppp  p0q  1  OprmaxtexppnIδ2q, anIus, and furthermore,

152

Pr

# sup

ptjP pTpppq

inf

tjPT0pp0q|pptj tjq{n| ¤ nI{n +

 1  OtexppnIδ2qu.

and

153

Pr

# sup

tjPT0pp0q

ptjP pinfTpppq|pptj tjq{n|   nI{n +

 1  OpanIq.

Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The in-

154

trinsic rationale is that for any Tppq different from T0pp0q, there is at least a chosen point

155

τ P T ppq whose nI-neighborhood does not contain true change points. Then the likelihood ra-

156

tio PrtYn|T ppqu{ PrtYn|T0pp0qu goes to 0 with probability 1, because the ratio contains at least

157

one of PrpYn|Mjq and PrpYn|Mjq1 for τk P T0pp0q and τj R T0pp0q, which converges to 0 in

158

probability by Lemma 1 and (2).

159

For a given p, each multiplicand in PrtYn| pTppqu can be obtained and stored through the scanning

160

step. The optimization step essentially picks the maximal PrtYn| pTppqu among a set of known

161

quantities, which avoids intensive optimization procedures. Because the computational time for

162

PrpYn|Mkq grows at the speed of Opnq for k  1, . . . , Kn, the computational time for the refinement

163

stage grows with the sample size at the speed of OpnKnq.

164

(6)

3 Simulation

165

3.1 The sequence without spikes

166

To evaluate the performance of the proposed BMS method in the setting without spike points, we

167

generate data from two different models. Model I takes the form of

168

Yi hTJpxiq σi

where the error term i Np0, 1q, σ  0.5, and h  p2.01, 2.51, 1.51, 2.01,

2.51,2.11, 1.05, 2.16, 1.56, 2.56, 2.11qT with p0  11. We set Jpxiq  tp1 sgnpnxi tjqq{2, j  1, . . . , p0uT, and the xi’s are equally spaced points on [0, 1]. The true change points are

ptj{n, j  1, . . . , p0q  p0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81q.

The random errors are generated from three distributions: the standard normal distribution Np0, 1q;

169

Student’s t distribution with 5 degrees of freedom tp5q, which is standardized to have a unit variance;

170

and the log-normal distribution LNp0, 1q, which is the exponential of the standard normal distribution

171

and also standardized to have a unit variance. Model II considers a heteroscedastic error term across

172

the segments. The data generating model is

173

Yi hTJptiq σi 1T¹Jptiq

j1

vj

wherepvj, j 1, . . . , 11q  p1, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5, 3, 2{3, 0.5q. Other model specifications remain the same as those in model I. We define the over- and under-segmentation errors as dp pGn|Gnq and dpGn| pGnq respectively,

dp pGn|Gnq  sup

bPGn

inf

aP pGn|a  b|, dpGn| pGnq  sup

bP pGn

inf

aPGn

|a  b|.

For the BMS procedure, we consider three different priors for πpq, corresponding to the local prior,

174

non-local moment prior and non-local inverse moment priors. To save the space, we only present

175

the results by using the non-local inverse moment prior. We take nI  tlogpnqu1.5h, where h¥ 0.5

176

generally works well in the simulations.

177

For a comprehensive comparison with existing methods, we consider BMS under the non-local

178

inverse moment prior πIpq with q  v  2 and s  6, PELT [12], WBS [7], NOT with normal or

179

heavy-tail distributions [1] and SML [5]. Table 1 summarizes the numerical results under model I

180

and model II with normal, Student’s t, and log-normal error distributions and their heteroscedastic

181

counterparts, respectively. On average, BMS performs the best in selecting the number of change

182

points and balancing both over- and under-segmentation errors. It is expected that the performances

183

of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because

184

all the three procedures heavily rely on the parametric model assumptions, and hence they are not

185

robust to model mis-specifications. In contrast, both BMS and NOT behave well under various error

186

distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors,

187

while the resulting estimatorpp tends to be larger than the true p0. On the other hand, the BMS allows

188

for slightly larger over-segmentation errors in order to maintainpp to be more concentrated around p0.

189

3.2 The sequence with spike points.

190

In addition, we illustrate the features of the BMS, NOT and SML methods on the sequences contami-

191

nated with spike points. Assume the noises are normal, we generate 500 sequences each contains

192

n 1000 points with mean changes at 0.01 and 0.01 on the 400 and 440’s observations, respec-

193

tively. We set the noise standard deviation to be 0.002. Further, we generate 10 random samples

194

uniformly in the range ofp0.07, 0.08q and p0.07, 0.08q, and add them to the original sequence at

195

random locations to form up the spike points. Note that we choose these parameters to mimic the

196

real data setting. We implement BMS, NOT, and SML on the simulated samples. In BMS, we select

197

nI  12, which is the largest integer that smaller than 0.65tlogpnqu1.5.

198

From Table 2, we can seen that the BMS is insensitive to the spike points with the smallest|pp p0|

199

on average. In the experiments (not show) NOT ignores both the change points with small signal

200

(7)

Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT and SML methods under model I and model II under different error distributions: the standard normal distribution Np0, 1q, Student’s tp5q, and log-normal LNp0, 1q with constant variances; and the corresponding distributions with heteroscedastic variances. Standard deviations are in parentheses.

Error Method pp p0 dpGn| pGnq dp pGn|Gnq

Distribution ¤ 3 2 1 0 1 2 ¥ 3

Np0, 1q BMS 0 0 1 197 2 0 0 2.41 (6.06) 1.96 (3.94)

PELT 0 1 37 162 0 0 0 0.91 (1.19) 6.32 (11.92)

WBS 0 0 0 194 6 0 0 1.22 (4.13) 0.86 (0.79)

NOT 0 0 0 192 7 1 0 1.93 (8.04) 0.75 (0.80)

SML 0 0 0 132 52 13 3 12.94 (42.98) 0.78 (0.90)

tp3q BMS 0 0 8 190 2 0 0 2.15 (5.76) 2.83 (7.01)

PELT 0 4 31 165 0 0 0 0.95 (1.03) 6.24 (12.24)

NOT 0 0 3 184 3 6 4 7.57 (27.70) 1.51 (2.57)

SML 0 0 0 42 34 44 80 40.13 (53.68) 0.88 (0.87)

LNp0, 1q BMS 0 0 12 180 6 1 2 3.69 (12.12) 3.11 (6.89)

PELT 1 2 21 135 15 23 3 12.10 (29.63) 7.22 (13.32)

NOT 0 1 4 183 7 1 4 6.06 (26.32) 1.18 (4.45)

SML 0 0 0 0 0 4 196 111.77 (52.05) 0.73 (1.33)

Heterosced BMS 0 0 13 176 8 3 0 3.69 (7.08) 3.88 (7.36)

Np0, 1q PELT 0 0 31 169 0 0 0 1.49 (1.53) 6.15 (11.44)

NOT 0 0 0 150 23 21 6 7.52 (12.60) 1.66 (1.68)

SML 0 0 0 119 55 20 6 6.75 (23.98) 1.37 (1.52)

Heterosced BMS 0 0 14 181 4 1 0 2.79 (4.87) 4.21 (8.17)

tp5q PELT 0 4 35 159 2 0 0 1.50 (1.83) 7.76 (13.48)

NOT 0 1 5 179 10 2 3 8.15 (25.01) 2.36 (4.70)

SML 0 0 0 43 22 41 94 26.74 (35.42) 1.27 (1.59)

Heterosced BMS 0 0 20 173 7 0 0 3.73 (11.72) 4.20 (7.85)

LNp0, 1q PELT 0 2 21 142 22 12 1 6.07 (16.09) 8.10 (13.93)

NOT 0 0 4 183 7 4 2 6.32 (24.67) 1.42 (3.70)

SML 0 0 0 2 1 0 197 68.05 (47.15) 0.87 (1.67)

noise ratio and the spike signals with small segment length. On the other hand, SML is sensitive to

201

the extreme value. To this end, BMS is the most suitable procedure for the MRgRT data, because on

202

the one hand it reinforces the minimal segment length to avoid the identification of the spike signal;

203

on the other hand it retains small minimal segment length to detect change points with short distance.

204

Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methods on the data sequences with spike points.

Method pp p0

¤ 1 0 1 2 ¥ 3

BMS 31 276 113 67 13

NOT 387 32 20 11 50

SML 0 0 0 0 500

4 MRgRT data

205

We illustrate the BMS method with the application to the MRgRT data which contains 2265 data

206

points ordered by the distances from the sources of the radiations dose. The R code for implementing

207

the method can be downloaded from our GitHub repository [10]. Throughout the implementation, we

208

use the non-local inverse moment prior πIpµq  sνq{2{Γtq{p2squµpq 1qexp pµ2{νqs( with

209

q 2, ν  2. We set nI  13, which is the largest integer that smaller than 0.65tlogpnqu1.5.

210

We first vary s from 2 to 10. Figure 1 shows that when s is small, we identify more change

211

points than the true ones, and as s grows, the number of identified change points decreases. This

212

phenomenon is consistent with the result in Lemma 4 that the convergence rate for the non-local prior

213

is OptexppnsI{p1 sqqu. When s is small, the Bayes factor vanishes slowly and hence the algorithm

214

picks redundant change points. When s is sufficiently large, the convergence rate approaches to

215

(8)

OptexppnIqu, and hence the algorithm eliminates the flat points more effectively. In the following

216

analysis, we select s 10, with which the algorithm provides the best result in Figure 1.

217

-0.010-0.0050.0000.0050.010

Distance from the source Y -0.010-0.0050.0000.0050.010

Distance from the source Y -0.010-0.0050.0000.0050.010

Distance from the source

Y

0 12 24 35 47 59 71 83 95 123

s= 2

s= 5

s = 10

Figure 1: Change points detection when using different s 2, 5, 10

Interestingly, after we remove the spike points and kept the data within the range ofp0.01, 0.01q,

218

we implement the NOT, SML, and BMS on the truncated sequence. Figures 2 shows that the results

219

from the three methods are overlapped. The NOT and BMS methods have similar results, and both

220

outperform SML. This implies that removing the spike points improves the boundary detection

221

accuracy for all the three methods. However, the spike remover is infeasible in practice, because the

222

locations and the magnitudes of the spike are difficult to track in human bodies.

-0.010-0.0050.0000.0050.010

Distance from the source

Signal level

0 12 24 36 49 63 75 87 101

Figure 2: Change points detection after restricting the data in the range of (-0.01, 0.01). BMS: the red line solid; NOT: the blue dash line; SML: the brown two dash linear.

223

5 Conclusion

224

We propose the BMS method that consistently identifies multiple mean changes in a data sequence.

225

The BMS method removes the flat points effectively without sacrificing the detection accuracy.

226

Further, our method is particularly useful when the data sequence contains spike points, which are

227

not of interest. We apply the BMS to analyze the MRgRT data, for which the NOT, SMT and other

228

methods fail to but the BMS algorithm correctly detects the mean changes boundaries. We explore

229

the BMS performance with different tuning parameters, and the resulting patterns are consistent with

230

the theoretical properties. Moreover, we demonstrate that the BMS is robust to the error distributions

231

by evaluating the detection procedures on the sequence with different random errors.

232

(9)

References

233

[1] Rafal Baranowski, Yining Chen, and Piotr Fryzlewicz. Narrowest-over-threshold detection of

234

multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293, 2016.

235

[2] Francesco Bertolino, Walter Racugno, and Elías Moreno. Bayesian model selection approach to

236

analysis of variance under heteroscedasticity. The Statistician, pages 503–517, 2000.

237

[3] Caterina Conigliani and Anthony O’hagan. Sensitivity of the fractional bayes factor to prior

238

distributions. Canadian Journal of Statistics, 28(2):343–352, 2000.

239

[4] Fulvio De Santis and Fulvio Spezzaferri. Consistent fractional bayes factor for nested normal

240

linear models. Journal of statistical planning and inference, 97(2):305–321, 2001.

241

[5] Chao Du, Chu-Lan Michael Kao, and SC Kou. Stepwise signal extraction via marginal

242

likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.

243

[6] Peter Eiauer and Peter Hackl. The use of MOSUMS for quality control. Technometrics,

244

20(4):431–436, 1978.

245

[7] Piotr Fryzlewicz. Wild binary segmentation for multiple change-point detection. Annals of

246

Statistics, 42(6):2243–2281, 2014.

247

[8] Joseph Glaz, Joseph I Naus, Sylvan Wallenstein, Sylvan Wallenstein, and Joseph I Naus. Scan

248

Statistics. Springer, 2001.

249

[9] Harold Jeffreys. The theory of probability. Oxford University Press, 1998.

250

[10] Fei Jiang, Guosheng Yin, and Francesca Dominici. Bayesian multiple change points detection

251

with non-local priors. R code, 2017.

252

[11] Valen E Johnson and David Rossell. On the use of non-local prior densities in Bayesian

253

hypothesis tests. Journal of the Royal Statistical Society: Series B, 72(2):143–170, 2010.

254

[12] Rebecca Killick, Paul Fearnhead, and IA Eckley. Optimal detection of change points with a

255

linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598,

256

2012.

257

[13] Claudia Kirch and Birte Muhsal. A MOSUM procedure for the estimation of multiple random

258

change points. Preprint, 2014.

259

[14] Marc Lavielle and Carenne Ludeña. The multiple change-points problem for the spectral

260

distribution. Bernoulli, 6(5):845–869, 2000.

261

[15] Yue S Niu and Heping Zhang. The screening and ranking algorithm to detect dna copy number

262

variations. Annals of Applied Statistics, 6(3):1306, 2012.

263

[16] Philip Preuss, Ruprecht Puchstein, and Holger Dette. Detection of multiple structural breaks in

264

multivariate time series. Journal of the American Statistical Association, 110(510):654–668,

265

2015.

266

[17] Stephen G Walker. Modern bayesian asymptotics. Statistical Science, pages 111–117, 2004.

267

[18] Yi-Ching Yao. Approximating the distribution of the maximum likelihood estimate of the

268

change-point in a sequence of independent random variables. Annals of Statistics, 15(3), 1987.

269

[19] Chun Yip Yau and Zifeng Zhao. Inference for multiple change points in time series via likelihood

270

ratio scan statistics. Journal of the Royal Statistical Society: Series B, 78(4):895–916, 2015.

271

(10)

Appendix

272 273

A Additional Figures

274

Figure A.1:Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and a typical line profile through the center of the cavity (right). The radiation enter the dosimeter from the hole on the left of the cylindrical dosimeter. The cylindrical rotates 360 degree so that the radiation can enter from different direction.

-0.010-0.0050.0000.0050.010

NOT

Distance from the source Y -0.010-0.0050.0000.0050.010

SML

Distance from the source

Y

Figure A.2:The change point detection results from the SML with normal prior and the maximal number of change points to be 30 and NOT methods for the dose level changes in the DMOS system.

B Proofs

275

Let g1pµq and g2pµq denote the first and second derivatives of a generic function gpµq with respect to

276

µ respectively, and further define the utility function as

277

Ukpµq  

τk¸11 lk

pYl ¯Yτk µq2.

The following conditions are imposed for the theoretical derivations.

278

(A1) Assume µ to be in a closed set of points in R.

279

(A2) Assume πpµq to be a continuous density function with bounded first and second derivatives.

280

(A3) Assume that PrtYn|T ppqu has a unique maximizer in the neighborhood of T0pp0q.

281

Lemma 1. Assume that τkis a change point for which the mean ofYl ¯Yτksatisfies|µk0| ¡ δ for

282

δ¡ 0, n1k{2δ{σ Ñ 8, then there is a constant D ¡ 0 such that

283

nlimÑ8Pr

³ ±τk 11

lk exptpYl ¯Yτk µq2uπpµqdµ

±τk 11

lk exptpYl ¯Yτkq2u ¡ exppDnkδ2q



 1.

(11)

Proof: By the definition of Ukpµq, we can write

284 ³ ±τk 11

lk exptpYl ¯Yτk µq2uπpµqdµ

±τk 11

lk exptpYl ¯Yτkq2u 

³exptUkpµquπpµqdµ exptUkp0qu .

We first define Nδk0q  tµ : |µ  µk0|   δu and Nδck0q as its compliment, and then show

285

nlimÑ8Pr

 sup

µPNδck0qtUkpµq  Ukk0qu   Dnkδ2



 1. (4)

Note that

286

Ukpµq  Ukk0q

 nk

#

2k0 µ2q  n1kk0 µq

τk¸11 lk

2pYl ¯Yτkq +

 nktpµ2k0 µ2q  2pµk0 µqµk0 Oppn1k{2k0 µ|σkqu

 nktpµ  µk0q2u Oppn1k{2k0 µ|σkq

¤ nkδ2 Oppn1k{2k0 µ|σkq

 nkδ2{2  nkδ2{2 Oppn1k{2k0 µ|σkq.

As n1k{2δ{σ Ñ 8, we have nkδ2{2 Oppn1k{2k0 µ|σkq   0 with probability 1, and thus

287

nlimÑ8Pr

 sup

µPNδck0qtUkpµq  Ukk0qu   Dnkδ2



 1.

When τkis a change point, let µ 0, because |µk0| ¡ δ, we have

288

nlimÑ8Pr

 exptUkp0qu

exptUkk0qu   exppDnkδ2q



 1. (5)

By the Laplace approximation,

289 »

exptUkpµquπpµqdµ  OprUk2prµq1{2exptUkprµquπprµqs, (6) whererµ is the maximizer of Ukpµq logtπpµqu, and Uk2prµq  Oppnjq. Let pµ be the maximizer of

290

Ukpµq, and then

291

0  L1kprµq Blogπprµq{Bµ

 L2kqprµ  pµq Blogπprµq{Bµ,

where µis a point on the line segment betweenrµ and pµ. As πpµq has two bounded derivatives by

292

condition (A2), L2kq  Oppnjq, we have rµ  pµ  Oppn1j q. Therefore, (6) can be written as

293 »

exptUkpµquπpµqdµ  OprUk2ppµq1{2exptUkppµquπppµqs

 OprUk2k0q1{2exptUkk0quπpµk0qs, (7) where the last equality holds becausepµ is the least squared estimator. This implies

294

exptUkk0qu

³exptUkpµquπpµqdµ  Oppn1k{2q, (8) which in conjunction with (5) leads to

295

nlimÑ8Pr

³ exptUkp0qu

exptUkpµquπpµqdµ   exppDnkδ2q



 1.

By condition (A2) and the boundedness of Ukpµq, we have

296

nlimÑ8Pr

³exptUkpµquπpµqdµ

exptUkp0qu ¡ exppDnkδq



 1, which completes the proof.

297

(12)

Lemma 2. Let πpµq  πLpµq be a local prior, and assume that τjis not a change point, i.e.,µj0 0,

298

then

299

³ ±τj 11

lj exptpYl ¯Yτj  µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u  Oppn1{2j q.

Proof: By the definition of Ujpµq, we can write

300

³exptUjpµquπpµqdµ exptUjp0qu 

³ ±τj 11

lj exptpYl ¯Yτj µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u . Using the same argument as that leading to (7) with µj0 0, we have

301

»

exptUjpµquπpµqdµ  OprU2p0q1{2exptUjp0quπp0qs.

Since U2p0q1{2 Oppn1{2j q, and πp0q is a bounded density, we have

302

³exptUjpµquπpµqdµ

exptUjp0qu  Oppn1{2j q.

Lemma 3. Let

303

πpµq  πMpµq  µ2v CM

πbpµq,

whereCM is a normalizing constant,πbpµq with πbp0q ¡ 0 is the base prior density with 2v finite

304

moments, and bounded first two derivatives in the neighborhood around0. Assume that τj is not a

305

change point, i.e.,µj0 0, then

306

³ ±τj 11

lj exptpYl ¯Yτj  µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u  Oppnv1{2j q.

Proof: We can write

307

» τj¹11

lj

exptpYl ¯Yτj µq2uπpµqdµ 

»

exptUjpµq logπpµqudµ.

Let hpµq  Ujpµq logπpµq  2vlogpµq logtπbpµqu Ujpµq, and let rµ be the maximizer of

308

hpµq, then we have

309

2v{rµ πb1prµq{πbprµq L1jprµq  0.

If we expand L1jprµq around pµ, the least squared estimator for µj0, the above equality can be rewritten

310

as

311

2v{n n1rµπb1prµq{πbprµq L2jqrµprµ  pµq  0, where µis the point on the line segment betweenpµ and rµ. Therefore,

312

Oppn1q  rµprµ  pµq

 prµ  pµq2 pµprµ  pµq

 prµ  pµ pµ{2q2 pµ2{4.

Along with the fact thatpµ  Oppn1{2j q, we have rµ  pµ  Oppn1{2j q, and rµ  Oppn1{2q. Next,

313

by the Laplace expansion, we have

314

»

expthpµqudu  Oppt2v{rµ2 Uj2prµqu1{2expr2vlogprµq logtπbprµqu Ujprµqsq,

(13)

and also

315

n1Ujprµq  n1Ujp0q  n1Uj1qrµ

 n1tUj1ppµq Uj1q  Uj1ppµqurµ

 Op pµqrµ

 Oppn1q, (9)

where µis on the line segment betweenrµ and 0. Thus,

316

|Ujprµq  Ujp0q|  Opp1q.

As a result,

317 ³ ±τj 11

lj exptpYl ¯Yτj  µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u



³exptUjpµquπpµqdµ exptUjp0qu



³expthpuqudu exptUjp0qu

 Oppt2v{rµ2 Uj2prµqu1{2expr2vlogprµq logtπMprµqu Ujprµq  Ujp0qsq

 Oppn1{2j2vq

 Oppn1{2vj q,

where the last equality holds by the fact thatrµ  Oppn1{2q. This proves the result.

318

Lemma 4. Let

319

πpµq  πIpµq  sνq{2

Γtq{p2squµpq 1qexp

#



2 ν

s+ . Assume thatτjis not a change point, i.e.,µj0 0, then

320

³ ±τj 11

lj exptpYl ¯Yτj µq2uπpµqdµ

±τj 11

lj exptpYl ¯Yτjq2u  Optexppnsj{ps 1qqu.

Proof: We first write

321

» τj¹11

lj

exptpYl ¯Yτj  µq2uπpµqdµ  c

»

exp Ujpµq  µ2sνs pq 1qlogpµq( dµ,

for a constant c. Let hpµq  Ujpµq  µ2s pq 1qlogpµq, and assume rµ is the maximizer of hpµq,

322

then we have

323

Uj1prµq 2srµ2s1νs pq 1qrµ1  Uj1qprµ  pµq 2srµ2s1νs pq 1qrµ1 0, where µis on the line segment betweenrµ and pµ. The above equality yields

324

nj2s 2p1  pµ{rµq 2sνs pq 1qrµ2s

Uj2q{nj

, (10)

which impliesrµ  Oppn1j{p2s 2qq.

325

From (10), we have nrµ2s 1prµ  pµq  Opp1q, which leads to

326

rµ  pµ  Optnp4s 3q{p2s 2q

j u. (11)

Following (30) in [11] and using our notation, we obtain

327

»

expthpµqudu  Op

"p4s2 2sq2s 2

rµ  Uj2prµq

*1{2

|rµ|q1exptrµ2sνs Ujprµqu

 .

References

Related documents

Ubercart’s Product module provides a Product content type for us on installation, and the FileField ( http://drupal .org/project/filefield ), ImageField

In the DC/AC distribution system, most of the power is still immediately converted to DC and routed via the DC bus, but some power is directly routed in a parallel AC bus

After selecting a software architecture alternative that meets performance objectives, we use the system execution model to evaluate technical architecture alternatives and

,WVKRXOGEHDGGHGWKDWVRPHPLGGOHLQFRPHFRXQWULHVOLNH&KLOHDQG0H[LFRKDYH low-tax-capacity , below 21.3, and negative tax gaps. In this case, these countries need to adjust

Presented - May 2011 North American Association of Fisheries Economists meeting in Honolulu, HI.. Presented – November 2011 Southern Economic Association meeting in Washington,

An employee may change the number of withholding exemptions and/or allowances he or she claims on Form W-4, Employee's Withholding Allowance Certificate.. It is generally advisable

Apparently, the RA diagnostic performance of anti-CCP was superior to those of RF and MMP-3, in terms of its significantly higher sensitivity and specificity... superior to

Ultimately, this work connecting the development of linguistically responsive orientations to teacher preparation tasks is particularly suited to our study because we explore