Getting Serious: Choosing the Regularization Parameter
5.5 Squeezing the Most Out of the Residual Vector—NCP Analysis
At this stage, recall that when the solution is oversmoothed (too few SVD components are included), the residual vector is dominated by SVD components from the exact data $b^{\mathrm{exact}}$. And when the solution is undersmoothed (too many SVD components are included), the residual vector is dominated by SVD components from the noise $e$.
We already mentioned that, ideally, in order to choose the truncation parameter $k$ for TSVD, we could monitor the Picard plot and select the regularization parameter such that precisely all coefficients $|u_i^T b|$ above the noise level $\eta$ are included in the regularized solution. When the coefficients $|u_i^T b^{\mathrm{exact}}|$ associated with the exact right-hand side decay monotonically, this corresponds to choosing the TSVD truncation parameter $k_\eta$ such that
$$ |u_{k_\eta}^T b^{\mathrm{exact}}| > \eta \geq |u_{k_\eta+1}^T b^{\mathrm{exact}}|. $$
In practice, we should choose $k$ as the index where the noisy coefficients $u_i^T b$ change from overall decaying behavior to leveling off.
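The ideal choice of $k_\eta$ can be illustrated with a minimal sketch in Python with NumPy (our choice of language, not something the text prescribes). It assumes that the magnitudes of the exact coefficients are available and decay monotonically, which is rarely the case in practice. The function name `truncation_index` and the coefficient values are hypothetical.

```python
import numpy as np

def truncation_index(coef_exact, eta):
    """Return the 1-based index k_eta with |coef[k_eta]| > eta >= |coef[k_eta+1]|.

    Assumes the magnitudes decay monotonically; returns 0 if no
    coefficient lies above the noise level eta.
    """
    mags = np.abs(np.asarray(coef_exact, dtype=float))
    # With monotone decay, the number of coefficients above eta is k_eta.
    return int(np.count_nonzero(mags > eta))

# Hypothetical exact coefficients decaying like 10^{-i}, i = 1, ..., 8
coef_exact = 10.0 ** -np.arange(1, 9)
eta = 3e-4                      # assumed noise level
k_eta = truncation_index(coef_exact, eta)
```

Here the third coefficient ($10^{-3}$) is the last one above the assumed noise level, so the sketch returns `k_eta = 3`.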
The challenge is to implement this choice without computing the SVD, or involving a visual inspection of the Picard plot, for that matter. The approach taken in [38] and [62] (based on ideas originally developed in [61]) is to view the residual vector as a time series, and consider the exact right-hand side $b^{\mathrm{exact}}$ (which we know represents a smooth function) as a “signal” which appears distinctly different from the noise vector $e$. The goal is thus to find the regularization parameter for which the residual changes behavior from being signal-like (dominated by components from $b^{\mathrm{exact}}$) to being noise-like (dominated by components from $e$).
Specifically, let us consider the TSVD method, for which the residual vector is given by $r_k = \sum_{i=k+1}^{n} (u_i^T b)\, u_i$. If $m > n$, then we deliberately neglect the component $(I - U U^T)\, b$ outside the column space of $A$ because it consists of noise only (by definition, $b^{\mathrm{exact}}$ is always in the column space of $A$). When $k$ is too small, then, in the SVD basis, the residual vector $r_k$ includes both signal components $u_i^T b \approx u_i^T b^{\mathrm{exact}}$ (for $i$ in the range $k < i < k_\eta$) and noise components $u_i^T b \approx \pm\eta$ (for $k_\eta < i \leq n$). And when $k$ is too large, then $r_k$ includes only noise components $u_i^T b \approx \pm\eta$ (for all $i$ in the range $k < i \leq n$). So our aim is to choose $k_\eta$ as the smallest $k$ for which $r_{k+1}$ appears statistically like noise.
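As an illustration of the TSVD residual formula, the following sketch (Python with NumPy, our choice of language) forms $r_k$ from the left singular vectors and compares it with $b - A x_k$ for a square matrix. The Gaussian smoothing matrix `A`, the solution `x_exact`, the noise level, and the function name `tsvd_residual` are all hypothetical and serve only as a small test case.

```python
import numpy as np

def tsvd_residual(U, b, k):
    """Residual r_k = sum_{i=k+1}^{n} (u_i^T b) u_i of the TSVD solution.

    U holds the n left singular vectors as columns, so any component
    of b outside the column space of A is deliberately dropped.
    """
    beta = U.T @ b                  # coefficients u_i^T b
    return U[:, k:] @ beta[k:]      # keep components i = k+1, ..., n

# Hypothetical test problem: Gaussian smoothing kernel on [0, 1]
rng = np.random.default_rng(0)
n = 32
t = np.linspace(0, 1, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.02) / n
x_exact = np.sin(np.pi * t)
b = A @ x_exact + 1e-4 * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
x_k = Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])   # TSVD solution with k components
r_k = tsvd_residual(U, b, k)
```

For square $A$ the two expressions for the residual agree up to rounding errors, and the residual norm cannot grow when $k$ increases because the $u_i$ are orthonormal.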
The above idea is somewhat similar to the idea underlying the discrepancy principle from Section 5.2, where we choose the regularization parameter such that the norm of the residual equals the norm of the noise $e$. But there, the single piece of information we utilize about $e$ is its standard deviation $\eta$. Here we wish to develop a more sophisticated criterion, based on the statistical question: When can the residual vector be considered as noise?
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
The discrete Fourier transform (DFT) lets us answer this question for the important case of white noise, via a technique that has been developed in both signal processing and statistics (using different nomenclature, of course). Let $\hat r_\lambda$ denote the DFT of the Tikhonov residual vector $r_\lambda$,
$$ \hat r_\lambda = \mathrm{dft}(r_\lambda) = \bigl( (\hat r_\lambda)_1, (\hat r_\lambda)_2, \ldots, (\hat r_\lambda)_m \bigr)^T \in \mathbb{C}^m. \tag{5.13} $$
The DFT is always computed by means of the computationally efficient fast Fourier transform (FFT) algorithm. The power spectrum of $r_\lambda$ is defined as the real vector
$$ p_\lambda = \bigl( |(\hat r_\lambda)_1|^2, |(\hat r_\lambda)_2|^2, \ldots, |(\hat r_\lambda)_{q+1}|^2 \bigr)^T, \qquad q = \lfloor m/2 \rfloor, \tag{5.14} $$
in which $q$ denotes the largest integer such that $q \leq m/2$. The elements of the power spectrum $p_\lambda \in \mathbb{R}^{q+1}$ represent the power in the signal at each of the basic spectral components, with the first component $(p_\lambda)_1$ representing the constant signal (the 0th frequency or “DC component”), and the last component $(p_\lambda)_{q+1}$ representing the highest frequency.
We now define the normalized cumulative periodogram (NCP) for the residual vector $r_\lambda$ as the vector $c(r_\lambda) \in \mathbb{R}^q$ whose elements$^4$ involve the cumulated sums of the power spectrum:
$$ c(r_\lambda)_i = \frac{(p_\lambda)_2 + \cdots + (p_\lambda)_{i+1}}{(p_\lambda)_2 + \cdots + (p_\lambda)_{q+1}}, \qquad i = 1, \ldots, q. \tag{5.15} $$
Note the normalization which ensures that the largest element is 1, $c(r_\lambda)_q = 1$, and that the first component of $p_\lambda$ is not included in computing the NCP. Precisely the same definition holds for the NCP $c(r_k)$ associated with the residual vector for the TSVD method, with truncation parameter $k$.
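The definitions (5.13)–(5.15) translate almost directly into code. The following is a minimal sketch in Python with NumPy (our choice of language), in which the function name `ncp` and the two test signals are hypothetical. Note that NumPy arrays are indexed from zero, so the DC component $(p_\lambda)_1$ is `p[0]`.

```python
import numpy as np

def ncp(r):
    """Normalized cumulative periodogram of a real vector r, cf. (5.15)."""
    r = np.asarray(r, dtype=float)
    q = r.size // 2                              # largest integer q <= m/2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2      # power spectrum (5.14)
    csum = np.cumsum(p[1:])                      # skip the DC component
    return csum / csum[-1]                       # normalize: last element is 1

m = 256
q = m // 2

# A unit impulse has an exactly flat power spectrum, so its NCP is i/q.
impulse = np.zeros(m)
impulse[0] = 1.0
c_flat = ncp(impulse)

# A slow cosine has all its power at the lowest nonzero frequency.
c_low = ncp(np.cos(2 * np.pi * np.arange(m) / m))
```

The sketch does not guard against a residual whose non-DC power is zero (for example, a constant vector), in which case the normalization would divide by zero.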
If the vector $r_\lambda$ consists of white noise, then, by definition, the expected power spectrum is flat, i.e., $\mathcal{E}((p_\lambda)_2) = \mathcal{E}((p_\lambda)_3) = \cdots = \mathcal{E}((p_\lambda)_{q+1})$, and hence the points on the expected NCP curve $\bigl( i, \mathcal{E}(c(r_\lambda)_i) \bigr)$ lie on a straight line from $(0, 0)$ to $(q, 1)$.
Actual noise does not have an ideal flat spectrum, but we can still expect the NCP to be close to a straight line. In statistical terms, with a 5% significance level the NCP curve must lie within the Kolmogorov–Smirnoff limits $\pm 1.35\, q^{-1/2}$ of the straight line; see the examples in the left part of Figure 5.7.
We have already seen examples of noise that is not white, and in Sections 3.6 and 4.8 we discussed LF and HF noise, i.e., noise dominated by low and high frequencies, respectively. The examples in the middle and right parts of Figure 5.7 show NCPs for realizations of such noise; notice that all the curves have segments outside the Kolmogorov–Smirnoff limits.
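The Kolmogorov–Smirnoff check can be sketched as follows in Python with NumPy (our choice of language). The function names `ncp` and `within_ks_limits` are hypothetical, and the deterministic LF and HF signals are extreme stand-ins for the LF and HF noise discussed above, chosen so that the outcome does not depend on a random realization.

```python
import numpy as np

def ncp(r):
    """NCP of a real vector r, following (5.15)."""
    r = np.asarray(r, dtype=float)
    q = r.size // 2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2
    csum = np.cumsum(p[1:])                  # DC component excluded
    return csum / csum[-1]

def within_ks_limits(r, factor=1.35):
    """True if the NCP of r stays within +-factor/sqrt(q) of the straight line."""
    c = ncp(r)
    q = c.size
    line = np.arange(1, q + 1) / q           # expected NCP for white noise
    return bool(np.max(np.abs(c - line)) <= factor / np.sqrt(q))

m = 256
j = np.arange(m)
impulse = np.zeros(m)
impulse[0] = 1.0                             # exactly flat power spectrum
lf_signal = np.cos(2 * np.pi * j / m)        # all power at the lowest frequency
hf_signal = (-1.0) ** j                      # all power at the highest frequency
white = np.random.default_rng(0).standard_normal(m)

ks_limit = 1.35 / np.sqrt(m // 2)            # about 0.12 for q = 128
results = {
    "impulse": within_ks_limits(impulse),
    "LF": within_ks_limits(lf_signal),
    "HF": within_ks_limits(hf_signal),
    "white": within_ks_limits(white),        # depends on the realization
}
```

The impulse passes and the LF and HF signals fail. A white-noise realization usually passes, but at a 5% significance level roughly one realization in twenty is expected to fall outside the limits, so no particular outcome is claimed for `white`.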
In this way, we can use the NCP to track the appearance of the residual vector as the regularization parameter changes. Figure 5.8 shows an example of the evolution of the Tikhonov NCP $c(r_\lambda)$ as the regularization parameter changes from $\lambda = 10^{-3}$ to $\lambda = 10^{-6}$ (i.e., from over- to undersmoothing). In between these values, there is
$^4$ We note the alternative expression $c(r_\lambda)_i = \|\hat r_\lambda(2\colon i+1)\|_2^2 \,/\, \|\hat r_\lambda(2\colon q+1)\|_2^2$, $i = 1, \ldots, q$, which avoids the use of $p_\lambda$.
Figure 5.7. Left to right: 10 instances of white-noise residuals, 10 instances of residuals dominated by low-frequency components, and 10 instances of residuals dominated by high-frequency components (panels titled “White noise,” “LF noise,” and “HF noise”; horizontal axes from 0 to 128, vertical axes from 0 to 1). All signals have length $m = 256$ such that $q = 128$. The dashed lines show the Kolmogorov–Smirnoff limits $\pm 1.35\, q^{-1/2} \approx \pm 0.12$ for a 5% significance level.
Figure 5.8. Plots of NCPs $c(r_\lambda)$ (5.15) for various Tikhonov regularization parameters $\lambda$ (the $\lambda$ axis runs from $10^{-6}$ to $10^{-3}$), for the test problem deriv2(128,2) with relative noise level $\|e\|_2 / \|b^{\mathrm{exact}}\|_2 = 10^{-5}$. The residual changes from being dominated by low-frequency components (SVD components of $b^{\mathrm{exact}}$) for $\lambda = 10^{-3}$ to being dominated by high-frequency components (from the noise $e$) for $\lambda = 10^{-6}$.
a $\lambda$ such that the corresponding residual vector $r_\lambda$ is white noise, as revealed in an almost linear NCP.
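To track the NCP as $\lambda$ varies, one can compute the largest deviation of $c(r_\lambda)$ from the straight line over a grid of parameters. The sketch below (Python with NumPy, our choice of language) does this for a hypothetical Gaussian smoothing problem rather than for deriv2. The matrix, the noise level, the grid, and the names `ncp` and `tikhonov_residual` are assumptions, and picking the $\lambda$ with the smallest deviation is only one simple way to summarize the curves, not a criterion stated in the text.

```python
import numpy as np

def ncp(r):
    """NCP of a real vector r, following (5.15)."""
    q = r.size // 2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2
    csum = np.cumsum(p[1:])                  # DC component excluded
    return csum / csum[-1]

def tikhonov_residual(U, s, b, lam):
    """Tikhonov residual b - A x_lambda for square A, via the SVD.

    In the SVD basis the residual has coefficients
    lam^2 / (s_i^2 + lam^2) * (u_i^T b).
    """
    beta = U.T @ b
    return U @ (lam**2 / (s**2 + lam**2) * beta)

# Hypothetical square test problem with white noise (relative level 1e-3)
rng = np.random.default_rng(1)
n = 128
t = (np.arange(n) + 0.5) / n
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01) / n
b_exact = A @ (t * (1 - t))
e = rng.standard_normal(n)
e *= 1e-3 * np.linalg.norm(b_exact) / np.linalg.norm(e)
b = b_exact + e

U, s, _ = np.linalg.svd(A)
q = n // 2
line = np.arange(1, q + 1) / q               # expected NCP for white noise
lambdas = np.logspace(0, -6, 13)             # from over- to undersmoothing
devs = [float(np.max(np.abs(ncp(tikhonov_residual(U, s, b, lam)) - line)))
        for lam in lambdas]
best = lambdas[int(np.argmin(devs))]         # most nearly linear NCP on the grid
```

One would expect the deviation to be large at both ends of the grid and smaller somewhere in between, but the actual values depend on the problem and on the noise realization.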
Clearly, as long as the noise is white, we can use the NCP as the basis for a parameter-choice method, because the NCP reveals precisely when the residual vector can be considered to be white noise. Specifically, we would increase the TSVD parameter k, or decrease the Tikhonov parameter λ, until we first encounter a situation