4.5 Uncompensated EOC Effects
5.1.1 Sequential Probability Ratio Test
Consider a random variable y , and consider the hypothesis H0that the probability density function of y is given by p0 yand an alternative hypothesis H1that the probability density function of y is given by p1 y. Given these hypotheses, the log likelihood ratio
z = lnp1 y
p0 y
(5.2)
is a function of y that will be negative if hypothesis H0is true and positive if hypothesis H1 is true. The Neyman-Pearson lemma [137] implies that for a given amount of information the log likelihood ratio z is the most powerful test to discriminate between hypotheses H0 and H1. If Ym =y1,y2,...,ym is a finite set of samples yi of y , and A and B are two positive
constants such that B < A, at each sampling stage m the SPRT consists in evaluating the cumulative log likelihood ratio [138]
Zm = m X i =1 zi = m X i =1 lnp1 yi p0 yi (5.3)
Hypothesis H1 is accepted if Zm ≥ lnA, and hypothesis H0 is accepted if Zm ≤ ln B. If
ln B < Zm< ln A then sample ym +1is collected and Zm +1is evaluated for the extended sample
set Ym +1 =Ym,ym +1 . The SPRT will eventually terminate if the samples yi are distributed
independently and identically, but otherwise it might not terminate [138]. If it terminates, the SPRT is optimal in the sense that it minimises the amount of samples yi required to ac-
cept either hypothesis H0or H1utilising the log likelihood ratio, which for a given amount of information, i.e. samples yi and probability density functions p0 yand p1 y, is the most powerful test to discriminate between hypotheses H0and H1[134–137, 139]. Further details can be found in appendix B.
Test Boundaries
Under the assumption that when the SPRT ends the excess of Zm over either boundary lnA
or ln B is negligible, it is shown in appendix section B.3 that the value of constants A and B is given by the Wald boundaries
A =1− β
α and B =
β
1− α (5.4)
A SPRT with Wald boundaries will have a probability equal to or lower than α of committing an error of the first kind, and a probability equal to or lower than β of committing an error of the second kind [135, 138]. An error of the first kind is the rejection of hypothesis H0 when it is true, while an error of the second kind is the rejection of hypothesis H1when it is true. If hypothesis H0represents the no change condition, then the probability α of rejecting the hypothesis H0when it is true corresponds to the probability of detecting a change when there is none, i.e. to the probability of false-calling. Conversely, the probability β of rejecting hypothesis H1when it is true corresponds to the probability of not detecting a change when there is one, i.e. to the probability of non-detection. It follows that 1− β represents the probability of detection.
Importantly, equation 5.4 reveals that decreasing the probabilities of false-calling α and of non-detection β increases the value of A and decreases the value of B, i.e. any decrease in the probabilities of committing an error widens the interval between A and B. Consequently, more samples yi are needed for the SPRT to terminate with lower probabilities of committing
an error because, intuitively, more information is required to take a decision with a lower probabilities of making a mistake. Note that A→ ∞ for α → 0, thereby obtaining a SPRT that can only accept of hypothesis H0. It follows that a SPRT will need to have a probability α > 0 of false-calling for it to be able to accept hypothesis H1, and therefore be able to discriminate between hypotheses H0and H1and be useful for change detection purposes.
Finally, the assumption of negligible excess of Zm over either boundary lnA or ln B is trivial,
since the probabilities α of false-calling and β of non-detection can be reduced as needed to minimise any excess without negatively affecting the performance of the SPRT.
Average Sample Number
Assuming that when the SPRT ends the excess of Zm over either boundary lnA or ln B is
5.1. Change Detection Algorithms
sample set Ym required for the SPRT to terminate with the acceptance of either hypothesis
H0or H1when they are true. In appendix section B.4.1 it is shown that the expected value E[m], known as the Average Sample Number (ASN), is approximated by
E[m]≈ L ln B + (1 − L)lnA
E[z ] (5.5)
where E[z ] is the expected value of the log likelihood ratio z , and L, known as the Operating Characteristic (OC), describes the probability that the SPRT will terminate with the accep- tance of hypothesis H0as a function of whether hypothesis H0or H1is actually true. Since the probability of accepting hypothesis H1when it is true, i.e. the probability of detection, is 1− β, it follows that the hypothesis of accepting H0when it is false is β . Similarly, the prob- ability of accepting hypothesis H1when it is false, i.e. the probability of false-calling, is α, it follows that the hypothesis of accepting H0when it is true is 1−α. Therefore, if hypothesis H0 is true then L = 1 − α, while if hypothesis H1is true then L = β. If the values E0[z ] and E1[z ] represent the expected values of the log likelihood ratio z assuming respectively hypothesis H0and H1are true, then if hypothesis H0is actually true it will take on average
E0[m ]≈(1− α)ln B + αlnA
E0[z ] (5.6)
samples to accept it, and if hypothesis H1is actually true it will take on average E1[m ]≈β ln B + 1 − β
lnA E1[z ]
(5.7) samples to accept it. In practice α 1% and β 1%. It follows that lnA 0 and ln B 0, and that if hypothesis H0is true then L = 1 − α ≈ 1, while if hypothesis H1is true then L =
β≈ 0. Then if hypothesis H0is actually true it will take on average E0[m ]≈
ln B E0[z ]
(5.8) samples to accept it, and if hypothesis H1is actually true it will take on average
E1[m ]≈ lnA E1[z ]
(5.9) samples to accept it. As previously discussed, decreasing the probabilities of committing an error increases the value of A and decreases the value of B. It follows that, as previously noted, the ASN will also increase as the probabilities of committing an error decrease.
Minimum Detectable Change
The Average Sample Number (ASN) is particularly useful in the case of parameterised dis- tributions. Assume the random variable y has a probability density function p y | θpa- rameterised by the parameter vector θ , and consider the hypothesis H0that the parameter
vector θ is equal to some value θ0and an alternative hypothesis H1that the parameter vec- tor θ is equal to some other value θ16= θ0. Consequently, if hypothesis H0 is true then the probability density function of y is given by p y | θ0, and if hypothesis H1is true then the probability density function of y is given by p y | θ1. It is possible to calculate the expected value E[m| θ ] of the cardinality m of the finite sample set Ymrequired for the SPRT to termi-
nate with the acceptance of either hypothesis H0or H1when the true parameter vector is θ which may or may not be equal to either θ0or θ1. In appendix section B.4.2 it is shown that the expected value E[m| θ ] is approximated by
E[m | θ ] ≈L (θ )ln B + (1 − L (θ ))lnA
E[z| θ ] (5.10)
The Operating Characteristic (OC) L (θ ) describes the probability that the SPRT will termi- nate with the acceptance of hypothesis H0as a function of the true parameter vector θ . As a special case, if hypothesis H0is true then L (θ0) = 1 − α, while if hypothesis H1is true then
L (θ1) = β , as previously discussed. If E[z | θ ] is the expected value of the log likelihood ratio
z when the true parameter vector is θ , then it will take on average E[m | θ ] samples for the SPRT to terminate, with a probability L (θ ) of accepting hypothesis H0 that the parameter vector θ is equal to some value θ0, and a probability 1−L (θ ) of accepting hypothesis H1that the parameter vector θ is equal to some value θ1.
It is often of interest to find the minimum absolute element-wise change|ν| = |θ1− θ0| be- tween parameter vectors θ1and θ0that can be detected with a given finite sample set Ym of
cardinality m . Then, given parameter vector θ0, solving E[m| θ1] = m for θ1will yield a value for parameter vector θ1such that the minimum absolute element-wise change|ν| = |θ1− θ0| detectable is minimised for a probability equal to or higher than 1− β of detection, and a probability equal to or smaller than α of false-calling. Because in practice α 1% and
β 1%, as previously noted the equation
E[m | θ1]≈ L (θ1)ln B + (1 − L (θ1))lnA E[z | θ1] (5.11) in practice simplifies to E[m | θ1]≈ lnA E[z | θ1] (5.12)
Solving E[m| θ1] = m implies that lnA
m = E[z | θ1] = ln
Ep y | θ1
Ep y | θ0 (5.13)
Decreasing the probabilities of committing an error increases the value of A. Consequently, given parameter vector θ0, for a fixed m the value z must increase and so does the minimum absolute element-wise change|ν| = |θ1− θ0| between parameter vectors θ1and θ0.
5.1. Change Detection Algorithms