The expected run-length of the optimal test is 7.45, compared to 7.73 for the test with constant thresholds, a reduction of about 3.6%. Whether this improvement is worth the increased complexity depends on the actual application. However, calculating the optimal constant thresholds is a non-trivial problem in itself, so the effort might as well be invested in solving the problem exactly.
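For reference, the stated relative reduction follows directly from the two expected run-lengths:

$$
\frac{7.73 - 7.45}{7.73} = \frac{0.28}{7.73} \approx 0.036, \quad \text{i.e., about } 3.6\,\%.
$$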
3.6 Summary
In this chapter, the optimal sequential test for stochastic processes with Markovian representations was derived. This was done by first formulating the design problem as an optimal stopping problem, based on which a cost minimizing testing policy was obtained as the solution of a nonlinear integral equation. It was then shown that the partial generalized derivatives of the optimal cost function are, up to a scaling factor, identical to the error probabilities of the cost minimizing test. This relation was used to formulate the problem of designing optimal sequential tests as a problem of solving an integral equation under constraints on the partial derivatives of the solution function. Moreover, it was shown that the latter problem can be solved by means of standard linear programming techniques without the need to calculate the partial derivatives explicitly. Numerical examples were given to illustrate this procedure.
Chapter 4
Fundamentals II
Before going into the details of the design of minimax optimal sequential hypothesis tests, some additional concepts and preliminary results need to be introduced. This is the purpose of the current chapter. First, the idea of robust statistics and the minimax principle are introduced, and the characterization of minimax optimal solutions via saddle points is discussed. This characterization will turn out to be crucial for the proof of minimax optimality of the proposed tests. Subsequently, uncertainty sets are introduced as a means of formulating hypothesis tests under incomplete knowledge of the underlying distributions. For sequential hypothesis tests, both the uncertainty sets and the true distributions additionally need to be allowed to depend on the test statistic in order to obtain strictly minimax optimal solutions. This connection between the test and the underlying random process is explained in more detail in Section 4.4. Finally, some technical aspects of convex optimization in Banach spaces are reviewed, in particular the method of Lagrange multipliers in infinite-dimensional spaces.
4.1 Statistical Robustness and the Minimax Principle
In the previous chapter, it was assumed that the distributions under both hypotheses are known exactly. This, however, is rarely the case in practice. Even if an accurate model for P0 and P1 exists, a certain degree of mismatch between model and reality is usually unavoidable. Put the other way around, specifying the hypotheses exactly requires the test designer to have access to a complete probabilistic description of all possible sources of uncertainty, which is a highly unrealistic assumption. Consider, for example, a simple energy detector that is used to establish the presence or absence of a radio signal. Even if the signal of interest is deterministic and known, the performance of the detector depends on factors such as the noise characteristics of the sensors, the propagation path between the transmitter and the detector, possible interference from other transmitters, and even the weather conditions [MLN09]. A robust statistical hypothesis test is designed to be insensitive to such random deviations from the underlying model. A more formal definition of statistical robustness and robust hypothesis tests is given in this and the upcoming sections.
Taking model mismatches into account when designing statistical tests results in distributional uncertainties, meaning that under either hypothesis the distribution of the observed random variables is only known approximately. Each hypothesis is hence represented by a set, or class, of possible distributions. Hypotheses of this kind are called composite hypotheses, in contrast to the simple hypotheses considered in Chapter 3. For a binary test, composite hypotheses are in general of the form
$$
\mathcal{H}_0 \colon P_X \in \mathcal{P}_0, \qquad \mathcal{H}_1 \colon P_X \in \mathcal{P}_1, \tag{4.1}
$$
where P0, P1 ⊂ Mµ(ΩX, FX) are referred to as the uncertainty sets. Here, Mµ(Ω, F) denotes the set of all distributions on a measurable space (Ω, F) that admit a positive density with respect to the measure µ, i.e.,

$$
\mathcal{M}_\mu(\Omega, \mathcal{F}) := \Bigl\{ P : \exists\, p > 0 : \int_E p(\omega) \,\mathrm{d}\mu(\omega) = P(E) \;\; \forall E \in \mathcal{F} \Bigr\}. \tag{4.2}
$$
The restriction that p > 0 is introduced to guarantee that all likelihood ratios are well defined. Moreover, in order to exclude cases where H0 and H1 are statistically indistinguishable, the uncertainty sets are further assumed to be disjoint, i.e., P0 ∩ P1 = ∅.
A more rigorous definition of P0 and P1 is given in Section 4.3.
Tests for composite hypotheses, with fixed and random sample sizes, have been studied extensively in the literature. An in-depth treatment can be found, for example, in [Pap91] or [LR05]. Here, the discussion is limited to a brief introduction to different approaches in composite hypothesis testing, with emphasis on the minimax approach. In the previous chapter, it was shown how to design an optimal sequential test for a pair of distributions (P0, P1). The fundamental problem in composite hypothesis testing is that, typically, no test exists that is optimal for every possible pair of distributions (P0, P1) ∈ P0 × P1. Hence, an additional criterion is necessary in order to define a meaningful objective for tests between composite hypotheses. The existing approaches can roughly be grouped into three categories: Bayesian methods, adaptive methods, and minimax methods.
The idea of Bayesian methods is to use a weighted average of the test performance as a global objective function. More precisely, it is assumed that under each hypothesis the true distribution is drawn according to some probabilistic law, or prior distribution. The hypothesis test is then designed so as to minimize the expected error probabilities, where the expectation is taken with respect to this prior distribution. As a consequence, Bayesian methods perform very well if the unknown distributions indeed occur with the assumed prior probabilities, but can perform poorly
if this is not the case. Particularly critical are scenarios that are highly unlikely to occur under the assumed prior and, hence, have a negligible influence on the test design. Such corner cases can deteriorate the performance of Bayesian tests significantly. In summary, Bayesian tests are designed to yield good performance on average, but not for every feasible pair of distributions.
The problem of blind spots in the test performance can be avoided by using adaptive methods. In contrast to Bayesian methods, they rely less on a priori assumptions and instead try to infer as much information as possible from the data itself. A typical adaptive test procedure first estimates the most likely distribution under each hypothesis and then performs an optimal test for the estimated pair [ZZM92]. In sequential testing, the estimates of the distributions are usually updated on the fly, and the stopping criterion is adjusted to the increasing accuracy of the estimates [LLY14]. In theory, adaptive tests yield close to optimal performance under all possible distribution pairs (P0, P1). However, this performance is only achieved if the estimated distributions are sufficiently close to the truth. This cannot be guaranteed if, for example, the sample size is small, the parameters that need to be estimated are high-dimensional, or the parameters fluctuate at a rate that is close to the sampling frequency. Moreover, performance guarantees for adaptive sequential tests have only been established in an asymptotic sense, i.e., for vanishingly small error probabilities α, β → 0. Obtaining strict performance bounds for non-vanishing error probabilities is still an open problem [Tar13].
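To make the plug-in idea concrete, the following is a minimal sketch of an adaptive sequential test under assumptions that are not part of the text above: i.i.d. Gaussian observations with unit variance, a simple null hypothesis N(0, 1), and an alternative N(θ, 1) with unknown θ > 0. The unknown mean is replaced by a running estimate computed only from past samples, and Wald's approximate thresholds are used. The function name `adaptive_sprt` and the parameters `theta_init` and `theta_min` are hypothetical; this illustrates the general principle and is not the procedure of [ZZM92] or [LLY14].

```python
import math
import random


def adaptive_sprt(samples, alpha=0.01, beta=0.01, theta_init=0.5, theta_min=0.1):
    """Plug-in SPRT for H0: N(0, 1) vs. H1: N(theta, 1), theta > 0 unknown.

    The alternative mean used at step n is estimated from the samples seen
    *before* step n only (a predictable plug-in). Under H0, exp(llr) therefore
    remains a mean-one martingale, so the upper threshold still controls the
    false-alarm probability (approximately alpha, as in Wald's test).
    Returns (decision, number of samples used); decision is None if the
    data run out before a threshold is crossed.
    """
    upper = math.log((1 - beta) / alpha)  # Wald's approximate thresholds
    lower = math.log(beta / (1 - alpha))
    llr, total, n = 0.0, 0.0, 0
    for x in samples:
        # Hypothetical estimator: running mean, floored at theta_min > 0.
        theta_hat = max(theta_min, total / n) if n > 0 else theta_init
        # log N(x; theta_hat, 1) - log N(x; 0, 1)
        llr += theta_hat * x - 0.5 * theta_hat ** 2
        total += x
        n += 1
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    return None, n


if __name__ == "__main__":
    random.seed(1)
    stream = (random.gauss(0.8, 1.0) for _ in range(10_000))
    print(adaptive_sprt(stream))  # (decision, number of samples used)
```

The floor `theta_min` keeps the plug-in alternative away from the null; how to choose it, and how the estimation error affects the expected run-length, is exactly the kind of question for which only asymptotic guarantees are available.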
The idea to guarantee a certain performance under all feasible circumstances leads to the minimax design principle. Its objective is to minimize the maximum (mini-max) error probabilities of a test over all pairs (P0, P1) ∈ P0 × P1. This results in a test that performs optimally in the worst case and has guaranteed performance bounds in all other cases. Similar to adaptive tests, little a priori knowledge is incorporated into the test design. The advantage of the minimax approach is that it yields tests that are predictable and robust, in the sense that their performance does not degrade beyond the worst-case level anywhere in the uncertainty set P0 × P1. The disadvantage is that the minimax approach results in highly conservative tests that are optimized for a worst-case scenario that may never actually occur, while possibly delivering only mediocre performance under typical scenarios. This problem motivates the use of Bayesian methods. . .
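The contrast between the Bayesian and the minimax objective can be illustrated with a deliberately simple fixed-sample toy problem, assumed only for this sketch: a single observation X ~ N(θ, 1), the simple null θ = 0, and a composite alternative θ ∈ {0.5, 1, 2} with hypothetical prior weights 0.05, 0.15, and 0.8. Both criteria are applied to the same family of threshold tests ("decide H1 if X > t"), with the sum of both error probabilities as the per-pair performance measure. All names in the code are illustrative.

```python
import math


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


THETAS = [0.5, 1.0, 2.0]   # hypothetical alternatives under H1
PRIOR = [0.05, 0.15, 0.80]  # hypothetical prior weights (sum to one)


def alpha(t):
    """False-alarm probability P(X > t) under H0: N(0, 1)."""
    return 1.0 - Phi(t)


def beta(theta, t):
    """Miss probability P(X <= t) under H1: N(theta, 1)."""
    return Phi(t - theta)


def bayes_risk(t):
    """Prior-weighted average of alpha + beta over the alternatives."""
    return alpha(t) + sum(w * beta(th, t) for w, th in zip(PRIOR, THETAS))


def worst_case_risk(t):
    """Maximum of alpha + beta over the alternatives (minimax objective)."""
    return max(alpha(t) + beta(th, t) for th in THETAS)


grid = [-1.0 + 0.001 * i for i in range(4001)]  # thresholds in [-1, 3]
t_bayes = min(grid, key=bayes_risk)
t_minimax = min(grid, key=worst_case_risk)

if __name__ == "__main__":
    for name, t in [("Bayes", t_bayes), ("minimax", t_minimax)]:
        print(f"{name:8s} t={t:.3f}  average={bayes_risk(t):.4f}  "
              f"worst case={worst_case_risk(t):.4f}")
```

By construction, the Bayesian threshold attains the lower prior-weighted risk and the minimax threshold the lower worst-case risk. The worst case is always the rarely occurring alternative θ = 0.5, which barely influences the Bayesian design: a small instance of the corner-case problem described above.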
Many more design principles have been suggested, and various hybrids of different approaches, such as robust Bayesian [dOPdG10] or robust adaptive methods [ASZS14], can be found in the literature. In summary, no single method or technique fits all needs. Nevertheless, the notion of robustness is closely tied to the minimax approach and its ability to guarantee a certain performance.
The field of robust statistics, and robust hypothesis testing in particular, was developed primarily by Huber in the mid-1960s [Hub64]. He was the first to derive the famous clipped likelihood ratio test [Hub65], which is minimax optimal under ε-contamination, i.e., in the presence of infrequent but grossly corrupted outliers. The corresponding uncertainty set Pε is given by
$$
\mathcal{P}_\varepsilon = \bigl\{ Q \in \mathcal{M}_\mu : Q = (1 - \varepsilon) P + \varepsilon H,\ H \in \mathcal{M}_\mu \bigr\},
$$
where P is referred to as the nominal distribution and H is an arbitrary outlier distribution. This kind of contamination is particularly critical, since a single corrupted observation can be enough to alter the outcome of a non-robust test [Hub81]. The result that a simple clipping of the test statistic yields a minimax optimal test is one of the most significant contributions to the field of robust hypothesis testing and probably the one with the highest impact on practical applications. Huber further showed that the clipped likelihood ratio test is in fact a regular likelihood ratio test, but for the so-called least favorable distributions instead of the nominal, uncontaminated ones. The idea of reducing the design of a minimax test for composite hypotheses to the design of an optimal test for carefully chosen simple hypotheses underlies most robust testing schemes and is used in this dissertation as well. It is based on the characterization of minimax optimal solutions via saddle points, which is briefly discussed in the next section.
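The clipping principle can be sketched numerically for a special case that is assumed here and not taken from the text: nominal densities P0 = N(−m, 1) and P1 = N(m, 1), the same contamination level ε under both hypotheses, and Huber's least favorable densities in their standard form, for which the ratio q1/q0 equals p1/p0 clipped to an interval [c′, c″]. In this symmetric case c″ = 1/c′, and c′ follows from the condition that the least favorable density integrates to one, which is solved below by bisection. The function names are hypothetical, and the sketch does not cover the general (asymmetric or non-Gaussian) case.

```python
import math
import random


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def huber_clipping_level(m, eps, tol=1e-12):
    """Lower clipping level c' for P0 = N(-m, 1), P1 = N(m, 1), equal eps.

    The least favorable q1 equals (1 - eps) * p1 where p1/p0 > c' and
    (1 - eps) * c' * p0 elsewhere. With p1/p0 = exp(2*m*x), the switch point
    is a = log(c') / (2*m), and q1 integrates to one iff
        (1 - eps) * [1 - Phi(a - m) + exp(2*m*a) * Phi(a + m)] = 1.
    The left-hand side is increasing in a, so bisection applies.
    """
    def mass(a):
        return (1.0 - eps) * (1.0 - Phi(a - m) + math.exp(2.0 * m * a) * Phi(a + m))

    if mass(0.0) <= 1.0:
        raise ValueError("eps too large: no nontrivial clipping level exists")
    lo, hi = -50.0, 0.0  # mass(lo) is about 1 - eps < 1, mass(hi) > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return math.exp(2.0 * m * 0.5 * (lo + hi))


def llr(x, m):
    """Log-likelihood ratio log[N(x; m, 1) / N(x; -m, 1)] = 2*m*x."""
    return 2.0 * m * x


def clipped_llr(x, m, c_low):
    """LLR clipped to [log c', log c''] with c'' = 1/c' (symmetric case)."""
    bound = -math.log(c_low)  # positive, since 0 < c' < 1
    return min(max(llr(x, m), -bound), bound)


if __name__ == "__main__":
    random.seed(0)
    m, eps = 0.5, 0.05
    c = huber_clipping_level(m, eps)
    # 20 clean samples from the nominal P0 plus one gross outlier
    data = [random.gauss(-m, 1.0) for _ in range(20)] + [50.0]
    print("c' =", c)
    print("unclipped LLR sum:", sum(llr(x, m) for x in data))
    print("clipped LLR sum:  ", sum(clipped_llr(x, m, c) for x in data))
```

Whatever the random draws, the single outlier at 50 adds exactly 50 to the unclipped statistic but at most log(1/c′) to the clipped one, which is the mechanism that makes the clipped test insensitive to infrequent gross errors.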