
Getting Serious: Choosing the Regularization Parameter

5.5 Squeezing the Most Out of the Residual Vector—NCP Analysis

At this stage, recall that when the solution is oversmoothed (too few SVD components are included), then the residual vector is dominated by SVD components from the exact data b^exact. And when the solution is undersmoothed (too many SVD components are included), then the residual vector is dominated by SVD components from the noise e.

We already mentioned that, ideally, in order to choose the truncation parameter k for TSVD, we could monitor the Picard plot and select the regularization parameter such that precisely all coefficients |u_i^T b| above the noise level η are included in the regularized solution. When the coefficients |u_i^T b^exact| associated with the exact right-hand side decay monotonically, then this corresponds to choosing the TSVD truncation parameter k_η such that

|u_{k_η}^T b^exact| > η ≥ |u_{k_η+1}^T b^exact|.

In practice, we should choose k as the index where the noisy coefficients u_i^T b change from overall decaying behavior to leveling off.
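As a small worked illustration of the condition on k_η, the following sketch counts the leading exact coefficients that exceed the noise level. The function name `truncation_index`, the coefficient values, and the noise level are made up for illustration and do not come from the text; the sketch also assumes the monotone decay stated above.

```python
import numpy as np

def truncation_index(coef_exact, eta):
    """Count the leading coefficients |u_i^T b_exact| that exceed eta.

    Assumes the magnitudes decay monotonically, so that the coefficients
    above eta form a contiguous leading block of length k_eta.
    """
    mags = np.abs(np.asarray(coef_exact, dtype=float))
    if np.any(np.diff(mags) > 0):
        raise ValueError("coefficients must decay monotonically")
    return int(np.count_nonzero(mags > eta))

# Made-up coefficients 1e-1, 1e-2, ..., 1e-8 and a made-up noise level.
coef_exact = 10.0 ** -np.arange(1, 9, dtype=float)
eta = 3e-4
k_eta = truncation_index(coef_exact, eta)

# With 1-based indices as in the text, coefficient k_eta lies above eta
# and coefficient k_eta + 1 does not.
print(k_eta, coef_exact[k_eta - 1], coef_exact[k_eta])
```

In practice b^exact is unknown, so this only makes the ideal criterion concrete; it is not a usable parameter-choice rule by itself.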

The challenge is to implement this choice without computing the SVD, or involving a visual inspection of the Picard plot, for that matter. The approach taken in [38] and [62] (based on ideas originally developed in [61]) is to view the residual vector as a time series, and consider the exact right-hand side b^exact (which we know represents a smooth function) as a “signal” which appears distinctly different from the noise vector e. The goal is thus to find the regularization parameter for which the residual changes behavior from being signal-like (dominated by components from b^exact) to being noise-like (dominated by components from e).

Specifically, let us consider the TSVD method, for which the residual vector is given by

r_k = \sum_{i=k+1}^{n} (u_i^T b) u_i.

If m > n, then we deliberately neglect the component (I − U U^T) b outside the column space of A because it consists of noise only (by definition, b^exact is always in the column space of A). When k is too small, then, in the SVD basis, the residual vector r_k includes both signal components u_i^T b ≈ u_i^T b^exact (for i in the range k < i < k_η) and noise components u_i^T b ≈ ±η (for k_η < i ≤ n). And when k is too large, then r_k includes only noise components u_i^T b ≈ ±η (for all i in the range k < i ≤ n). So our aim is to choose k_η as the smallest k for which r_{k+1} appears statistically like noise.
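The TSVD residual defined above can be formed directly from the thin SVD. The sketch below is one possible way to do this with NumPy; the helper name `tsvd_residual` and the small random test matrix are assumptions made for illustration only.

```python
import numpy as np

def tsvd_residual(A, b, k):
    """Return r_k = sum_{i=k+1}^{n} (u_i^T b) u_i.

    The thin SVD is used, so for m > n the component (I - U U^T) b
    outside the column space of A is left out, as in the text.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # U is m-by-n
    beta = U.T @ b                 # coefficients u_i^T b
    return U[:, k:] @ beta[k:]     # keep components i = k+1, ..., n

# Small made-up overdetermined example (m = 8, n = 5).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
r2 = tsvd_residual(A, b, k=2)
print(np.linalg.norm(r2))
```

Recomputing the SVD on every call keeps the sketch short; when sweeping over k one would compute U and the coefficients once and reuse them.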

The above idea is somewhat similar to the idea underlying the discrepancy principle from Section 5.2, where we choose the regularization parameter such that the norm of the residual equals the norm of the noise e. But there, the single piece of information we utilize about e is its standard deviation η. Here we wish to develop a more sophisticated criterion, based on the statistical question: When can the residual vector be considered as noise?

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

The discrete Fourier transform (DFT) lets us answer this question for the important case of white noise, via a technique that has been developed in both signal processing and statistics (using different nomenclature, of course). Let \hat{r}_λ denote the DFT of the Tikhonov residual vector r_λ,

\hat{r}_λ = dft(r_λ) = ( (\hat{r}_λ)_1, (\hat{r}_λ)_2, ..., (\hat{r}_λ)_m ) ∈ C^m. (5.13)

The DFT is always computed by means of the computationally efficient fast Fourier transform (FFT) algorithm. The power spectrum of r_λ is defined as the real vector

p_λ = ( |(\hat{r}_λ)_1|², |(\hat{r}_λ)_2|², ..., |(\hat{r}_λ)_{q+1}|² )^T, q = ⌊m/2⌋, (5.14)

in which q denotes the largest integer such that q ≤ m/2. The elements of the power spectrum p_λ ∈ R^{q+1} represent the power in the signal at each of the basic spectral components, with the first component (p_λ)_1 representing the constant signal (the 0th frequency or “DC component”), and the last component (p_λ)_{q+1} representing the highest frequency.
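To make (5.13) and (5.14) concrete, here is a minimal sketch using NumPy's FFT. The function name `power_spectrum` and the two test signals are illustrative assumptions; note that the 1-based element (p_λ)_1 of the text corresponds to index 0 in the code.

```python
import numpy as np

def power_spectrum(r):
    """Return the first q + 1 squared DFT magnitudes of r, with q = floor(m / 2)."""
    r_hat = np.fft.fft(np.asarray(r, dtype=float))  # DFT computed via the FFT
    q = len(r) // 2
    return np.abs(r_hat[: q + 1]) ** 2

# A constant signal puts all its power in the DC component (index 0),
# while an alternating signal puts it in the highest frequency (index q).
constant = np.ones(8)
alternating = np.array([1.0, -1.0] * 4)
print(power_spectrum(constant))
print(power_spectrum(alternating))
```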

We now define the normalized cumulative periodogram (NCP) for the residual vector r_λ as the vector c(r_λ) ∈ R^q whose elements⁴ involve the cumulated sums of the power spectrum:

c(r_λ)_i = \frac{(p_λ)_2 + ... + (p_λ)_{i+1}}{(p_λ)_2 + ... + (p_λ)_{q+1}}, i = 1, ..., q. (5.15)

Note the normalization, which ensures that the largest element is 1, c(r_λ)_q = 1, and that the first component of p_λ is not included in computing the NCP. Precisely the same definition holds for the NCP c(r_k) associated with the residual vector for the TSVD method, with truncation parameter k.
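The definition (5.15) translates almost line by line into code. The following is a sketch under the assumption that the residual is a real vector; the function name `ncp` and the random test vector are made up for this example.

```python
import numpy as np

def ncp(r):
    """Normalized cumulative periodogram of a real vector r, following (5.15)."""
    r = np.asarray(r, dtype=float)
    q = r.size // 2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2   # power spectrum, length q + 1
    cum = np.cumsum(p[1:])                    # skip the DC component (p)_1
    return cum / cum[-1]                      # normalize so the last element is 1

rng = np.random.default_rng(0)
r = rng.standard_normal(256)
c = ncp(r)
print(c.shape, c[-1])
```

Because the DC component is skipped, adding a constant to r leaves the NCP unchanged up to rounding.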

If the vector r_λ consists of white noise, then, by definition, the expected power spectrum is flat, i.e., E((p_λ)_2) = E((p_λ)_3) = ... = E((p_λ)_{q+1}), and hence the points (i, E(c(r_λ)_i)) on the expected NCP curve lie on a straight line from (0, 0) to (q, 1).

Actual noise does not have an ideal flat spectrum, but we can still expect the NCP to be close to a straight line. In statistical terms, with a 5% significance level the NCP curve must lie within the Kolmogorov–Smirnoff limits ±1.35 q^{−1/2} of the straight line; see the examples in the left part of Figure 5.7.

We have already seen examples of noise that is not white, and in Sections 3.6 and 4.8 we discussed LF and HF noise, i.e., noise dominated by low and high frequencies, respectively. The examples in the middle and right parts of Figure 5.7 show NCPs for realizations of such noise; notice that all the curves have segments outside the Kolmogorov–Smirnoff limits.
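The white-noise test and the LF/HF examples can be imitated numerically. In the sketch below, the helper names, the moving-average filter used to produce LF noise, and the first-difference filter used to produce HF noise are assumptions chosen for illustration; they are not necessarily how the noise in Figure 5.7 was generated.

```python
import numpy as np

def ncp(r):
    """Normalized cumulative periodogram of a real vector r, following (5.15)."""
    r = np.asarray(r, dtype=float)
    q = r.size // 2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2
    cum = np.cumsum(p[1:])          # the DC component is excluded
    return cum / cum[-1]

def max_ncp_deviation(r):
    """Largest distance between the NCP and the line from (0, 0) to (q, 1)."""
    c = ncp(r)
    q = c.size
    return float(np.max(np.abs(c - np.arange(1, q + 1) / q)))

def looks_like_white_noise(r):
    """True if the NCP stays within the limits +/- 1.35 / sqrt(q)."""
    q = len(r) // 2
    return bool(max_ncp_deviation(r) <= 1.35 / np.sqrt(q))

rng = np.random.default_rng(1)
m = 256
white = rng.standard_normal(m)
# LF noise: white noise smoothed by a 9-point moving average (length m).
lf = np.convolve(rng.standard_normal(m + 8), np.ones(9) / 9, mode="valid")
# HF noise: first differences of white noise (length m).
hf = np.diff(rng.standard_normal(m + 1))

for name, r in [("white", white), ("LF", lf), ("HF", hf)]:
    print(name, round(max_ncp_deviation(r), 3), looks_like_white_noise(r))
```

Since the limits correspond to a 5% significance level, an individual white-noise realization can occasionally fall outside them, so a single failed test should not be over-interpreted.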

In this way, we can use the NCP to track the appearance of the residual vector as the regularization parameter changes. Figure 5.8 shows an example of the evolution of the Tikhonov NCP c(r_λ) as the regularization parameter changes from λ = 10⁻³ to λ = 10⁻⁶ (i.e., from over- to undersmoothing). In between these values, there is a λ such that the corresponding residual vector r_λ is white noise, as revealed in an almost linear NCP.

⁴ We note the alternative expression c(r_λ)_i = ‖\hat{r}_λ(2: i+1)‖_2² / ‖\hat{r}_λ(2: q+1)‖_2², i = 1, ..., q, which avoids the use of p_λ.

[Figure 5.7: three panels titled "White noise", "LF noise", and "HF noise"; horizontal axes from 0 to 128, vertical axes from 0 to 1.]

Figure 5.7. Left to right: 10 instances of white-noise residuals, 10 instances of residuals dominated by low-frequency components, and 10 instances of residuals dominated by high-frequency components. All signals have length m = 256 such that q = 128. The dashed lines show the Kolmogorov–Smirnoff limits ±1.35 q^{−1/2} ≈ ±0.12 for a 5% significance level.

[Figure 5.8: NCP curves for λ = 1e−6, 1e−5, 1e−4, and 1e−3; vertical axis from 0 to 1.]

Figure 5.8. Plots of NCPs c(r_λ) (5.15) for various Tikhonov regularization parameters λ, for the test problem deriv2(128,2) with relative noise level ‖e‖_2/‖b^exact‖_2 = 10⁻⁵. The residual changes from being dominated by low-frequency components (SVD components of b^exact) for λ = 10⁻³ to being dominated by high-frequency components (from the noise e) for λ = 10⁻⁶.
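The tracking of the Tikhonov NCP as λ varies can be sketched as follows. The test problem here is a made-up Gaussian smoothing kernel with a made-up exact solution and noise level, standing in for deriv2, which is not reproduced; the helper names are likewise assumptions. The residual is formed in the SVD basis as r_λ = \sum_i λ²/(σ_i² + λ²) (u_i^T b) u_i, which holds for this square problem.

```python
import numpy as np

def ncp(r):
    """Normalized cumulative periodogram of a real vector r, following (5.15)."""
    q = r.size // 2
    p = np.abs(np.fft.fft(r)[: q + 1]) ** 2
    cum = np.cumsum(p[1:])          # the DC component is excluded
    return cum / cum[-1]

def tikhonov_residual(U, s, b, lam):
    """Residual b - A x_lam written in the SVD basis (square A assumed)."""
    beta = U.T @ b
    return U @ (lam**2 / (s**2 + lam**2) * beta)

# Hypothetical test problem: Gaussian blur of a smooth function on [0, 1].
n = 128
t = (np.arange(n) + 0.5) / n
width = 0.05
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / width) ** 2)
A /= n * width * np.sqrt(2 * np.pi)
x_exact = np.sin(np.pi * t)
b_exact = A @ x_exact

rng = np.random.default_rng(0)
e = rng.standard_normal(n)
e *= 1e-3 * np.linalg.norm(b_exact) / np.linalg.norm(e)  # relative noise 1e-3
b = b_exact + e

U, s, Vt = np.linalg.svd(A)
q = n // 2
line = np.arange(1, q + 1) / q
ks_limit = 1.35 / np.sqrt(q)

lambdas = np.logspace(1, -6, 8)   # from heavy oversmoothing toward undersmoothing
deviations = [np.max(np.abs(ncp(tikhonov_residual(U, s, b, lam)) - line))
              for lam in lambdas]
for lam, dev in zip(lambdas, deviations):
    print(f"lambda = {lam:8.1e}  max |NCP - line| = {dev:.3f}  "
          f"within limits: {dev <= ks_limit}")
```

How the deviation behaves for the smaller values of λ depends on the kernel, the noise realization, and the noise level, so the printed table should be read as an experiment rather than as a reproduction of Figure 5.8.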

Clearly, as long as the noise is white, we can use the NCP as the basis for a parameter-choice method, because the NCP reveals precisely when the residual vector can be considered to be white noise. Specifically, we would increase the TSVD parameter k, or decrease the Tikhonov parameter λ, until we first encounter a situation


In document Discrete Inverse Problem - Insight and Algorithms (Page 95-98)