
Getting Serious: Choosing the Regularization Parameter


where c(r_k) or c(r_λ) fall entirely within the Kolmogorov–Smirnoff limits. As long as the residual vector is explicitly available, such a test is computationally feasible because of the reasonable overhead, O(m log m) flops, of the FFT algorithm.

Unfortunately, the above requirement may never be achieved in practice, partly because real data may fail to have completely white noise to begin with, and partly because we actually extract some SVD components of the noise, leaving the remaining noise component in the residual vector not quite white. Therefore, we obtain a more robust NCP-based parameter-choice criterion if we choose the regularization parameter for which the residual vector most resembles white noise, in the sense that the NCP is closest to a straight line. In the implementation ncp in Regularization Tools we measure this as the 2-norm of the difference between the NCP and the vector $c_{\mathrm{white}} = (1/q, 2/q, \ldots, 1)^T$; other norms can be used as well.

In summary, the NCP parameter-choice criterion for Tikhonov regularization takes the following form:

Choose $\lambda = \lambda_{\mathrm{NCP}}$ as the minimizer of $d(\lambda) = \| c(r_\lambda) - c_{\mathrm{white}} \|_2$. (5.16)

For TSVD the NCP criterion takes the form

Choose $k = k_{\mathrm{NCP}}$ as the minimizer of $d(k) = \| c(r_k) - c_{\mathrm{white}} \|_2$. (5.17)

Here the NCPs $c(r_\lambda)$ and $c(r_k)$ are defined in (5.15), and the use of the 2-norm is not essential. The extension of this method to 2D data (such as digital images) using the 2D FFT is described in [38].
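The distance d in (5.16) and (5.17) can be illustrated with a minimal Python sketch. This is not the ncp function from Regularization Tools. It assumes the usual form of the NCP in (5.15), in which the zero-frequency term of the power spectrum is dropped and the power of the first q = ⌊m/2⌋ frequencies is accumulated and normalized. To stay dependency-free it uses a direct O(m²) DFT rather than the FFT. The names power_spectrum, ncp, and ncp_distance are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(r):
    """Return |DFT(r)|^2 computed with a direct O(m^2) DFT (fine for small m)."""
    m = len(r)
    spectrum = []
    for k in range(m):
        s = sum(r[t] * cmath.exp(-2j * math.pi * t * k / m) for t in range(m))
        spectrum.append(abs(s) ** 2)
    return spectrum


def ncp(r):
    """Normalized cumulative periodogram of r (assumed form of (5.15)).

    The zero-frequency (DC) term is excluded; frequencies 1..q are used.
    """
    q = len(r) // 2
    p = power_spectrum(r)[1:q + 1]
    total = sum(p)
    c, running = [], 0.0
    for value in p:
        running += value
        c.append(running / total)
    return c


def ncp_distance(r):
    """2-norm of the difference between the NCP of r and c_white = (1/q, ..., 1)."""
    c = ncp(r)
    q = len(c)
    return math.sqrt(sum((c[i] - (i + 1) / q) ** 2 for i in range(q)))


random.seed(0)
m = 128
white = [random.gauss(0.0, 1.0) for _ in range(m)]           # noise-only "residual"
smooth = [math.sin(2 * math.pi * i / m) for i in range(m)]   # signal-dominated "residual"

d_white = ncp_distance(white)
d_smooth = ncp_distance(smooth)
print(f"d(white) = {d_white:.3f}, d(smooth) = {d_smooth:.3f}")
```

In this sketch the white-noise vector gives a much smaller distance than the smooth vector, which is the property the criterion exploits when d is evaluated over a range of λ or k.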

While we always perform our analysis of regularization methods and regularized solutions in terms of the SVD, the discrete Fourier basis is the natural choice for the NCP analysis described above. The good news is that the NCP analysis still reveals information that ideally belongs in an SVD setting, due to the close relationship between the singular functions and the Fourier basis functions described in Section 2.5.

In Chapter 7 we introduce so-called deconvolution problems where this relationship is even stronger, and Exercise 7.4 develops a variant of the NCP criterion for such problems.

5.6 Comparison of the Methods

With so many parameter-choice methods around, we would like to provide a guide to choosing among them. If a (really) good estimate of the error norm $\|e\|_2$ is available, then the discrepancy principle is a good choice. Otherwise, we must use one of the other methods that do not need this information explicitly, but which to prefer? Unfortunately there is no general answer; each inverse problem has its own characteristics and error model, and it is somewhat unpredictable which parameter-choice method is optimal for a given problem.

To illustrate this dilemma, we study the solution of two different test problems, namely, the gravity surveying problem gravity from Section 2.1, and the one-dimensional image reconstruction problem shaw from Exercise 3.6. For each test problem, we generate 500 instances of Gaussian white noise with standard deviation η = 10⁻⁴, and another 500 instances with η = 10⁻². For every instance of the noise, we compute the Tikhonov regularization parameter λ by means of the discrepancy principle (5.6) using $\nu_{\mathrm{dp}} = 1$ and the estimate $n^{1/2}\eta$ for the error norm, the L-curve criterion (5.7), GCV (5.10), and the NCP criterion (5.16). All computations used the relevant functions from Regularization Tools.

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Figure 5.9. Histograms of the four ratios $R_{\mathrm{DP}}$, $R_{\mathrm{L}}$, $R_{\mathrm{GCV}}$, and $R_{\mathrm{NCP}}$ defined in (5.18) for the gravity test problem. The wide lighter bars represent the small noise level η = 10⁻⁴, and the narrow darker bars represent the larger noise level η = 10⁻²; there are 500 noise realizations for each level. The insert in the top left figure shows the histograms if we use the exact value of $\|e\|_2$ in the discrepancy principle.

In order to evaluate the performance of the four methods, we also need to compute the optimal regularization parameter $\lambda_{\mathrm{opt}}$ which, by definition, minimizes the error in the regularized solution:

$$\lambda_{\mathrm{opt}} = \operatorname{argmin}_{\lambda} \| x^{\mathrm{exact}} - x_\lambda \|_2 .$$

This allows us to compute the four ratios

$$R_{\mathrm{DP}} = \frac{\lambda_{\mathrm{DP}}}{\lambda_{\mathrm{opt}}}, \qquad R_{\mathrm{L}} = \frac{\lambda_{\mathrm{L}}}{\lambda_{\mathrm{opt}}}, \qquad R_{\mathrm{GCV}} = \frac{\lambda_{\mathrm{GCV}}}{\lambda_{\mathrm{opt}}}, \qquad R_{\mathrm{NCP}} = \frac{\lambda_{\mathrm{NCP}}}{\lambda_{\mathrm{opt}}}, \tag{5.18}$$

one for each parameter-choice method, and study their distributions via plots of their histograms (in log scale). The closer these ratios are to 1, the better, so a spiked histogram located at 1 is preferable.

Figure 5.10. Same legend as in Figure 5.9, except these results are for the shaw test problem.
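The definition of the optimal parameter and of a ratio such as R_DP in (5.18) can be sketched on a toy problem written directly in SVD coordinates. Everything below is an illustrative assumption: the singular values, the exact solution, the λ grid, and the grid-based version of the discrepancy principle. It does not reproduce the gravity or shaw experiments and does not call Regularization Tools.

```python
import math
import random

random.seed(1)
n, eta = 24, 1e-4
sigma = [10.0 ** (-i / 3.0) for i in range(n)]     # assumed decaying singular values
x_exact = [1.0 / (i + 1) for i in range(n)]        # SVD coefficients of the exact solution
b = [s * x + random.gauss(0.0, eta) for s, x in zip(sigma, x_exact)]  # noisy data


def tikhonov(lam):
    """Tikhonov solution in SVD coordinates: sigma_i * b_i / (sigma_i^2 + lam^2)."""
    return [s * bi / (s * s + lam * lam) for s, bi in zip(sigma, b)]


def solution_error(lam):
    """2-norm of x_exact - x_lambda."""
    return math.sqrt(sum((xe - xl) ** 2 for xe, xl in zip(x_exact, tikhonov(lam))))


def residual_norm(lam):
    """2-norm of the Tikhonov residual; it grows monotonically with lam."""
    return math.sqrt(sum((lam * lam / (s * s + lam * lam) * bi) ** 2
                         for s, bi in zip(sigma, b)))


# Logarithmic grid of candidate parameters from 1e-8 to 1.
grid = [10.0 ** (-8 + 8 * j / 400) for j in range(401)]

# Optimal parameter: minimizes the solution error (needs x_exact, so only
# available in simulations).
lam_opt = min(grid, key=solution_error)

# Discrepancy principle with safety factor 1 and the estimate sqrt(n) * eta:
# largest grid value whose residual norm does not exceed the estimate.
delta = math.sqrt(n) * eta
lam_dp = max(lam for lam in grid if residual_norm(lam) <= delta)

ratio_dp = lam_dp / lam_opt
print(f"lambda_opt = {lam_opt:.3e}, lambda_DP = {lam_dp:.3e}, R_DP = {ratio_dp:.2f}")
```

Repeating this for many noise realizations and collecting log10 of the ratios would give the data for a histogram of the kind described here.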

Consider first the results for the gravity problem in Figure 5.9. Except for the L-curve criterion, these results are independent of the noise level.

• Discrepancy principle. The peak of the histogram is located close to 1, and it is quite sharp. The optimal parameter is never overestimated by more than a factor of 3. Unfortunately the histogram has a wide tail toward small ratios, with some ratios as small as 0.001, a really severe underestimate of the optimal parameter. The insert shows histograms for the situation where we know the exact error norm $\|e\|_2$ in each instance of the noise; now the tail vanishes, but it is rare that we have such good estimates.

• L-curve criterion. For the larger noise level, this method works exceptionally well: the histogram is very peaked and there is no tail. For the smaller noise level the average parameter $\lambda_{\mathrm{L}}$ is much too small; this is caused by the very smooth solution whose SVD coefficients decay quickly, leading to the phenomenon of including too many SVD components in the regularized solution discussed in Section 5.3.

• GCV. This histogram resembles that for the discrepancy principle, except that the peak is located at about 0.5, and the tail toward small ratios is a lot thinner (i.e., there is a smaller risk of underestimation). The existence of the tail reflects the occasional failure of the GCV method already mentioned in Section 5.4.

• NCP criterion. This histogram also resembles that for the discrepancy principle, except that the tail is much thinner, especially for the larger noise level. The peak is located quite close to one.

For this problem, the NCP appears to be the method of choice, because it works well for both noise levels, giving a robust estimate $\lambda_{\mathrm{NCP}}$ of the regularization parameter that is often very close to the optimal one (with a small risk of underestimation).

If we know that the noise is always large, however, then the L-curve criterion is the winner, but it is a bad choice for small noise levels (due to the smooth solution).

Among the other two methods, whose performance is independent of the noise level, the discrepancy principle is more accurate than GCV on average, but it is also more likely to fail.

Now consider the results for the shaw test problem shown in Figure 5.10. These results show a quite different behavior of the methods than we saw before; for example, the results now depend on the noise level for all four methods.

• Discrepancy principle. This method is very likely to overestimate the regularization parameter by a factor of 3 (for η = 10⁻²) or even 10 (for η = 10⁻⁴). Still, the tail toward underestimation is also nonnegligible. Even with the exact norm of the noise (shown in the insert), the method tends to overestimate.

• L-curve. Some underestimation must be expected for this method, by up to a factor of 10 for η = 10⁻⁴, and less for η = 10⁻². There is no tail toward underestimation, so the method is quite robust.

• GCV. The mean of the histograms is located at values slightly smaller than 1, and on average the behavior of this method is good, but there is a risk of over- or underestimating the regularization parameter by up to a factor of 10. Also, severe underestimation is possible.

• NCP criterion. This method overestimates the regularization parameter considerably, by up to a factor of 10 or more for the smaller noise level, and almost as badly for the larger noise level. There is also a slight risk of severe underestimation.

For this problem, it is highly advisable not to use the discrepancy principle or the NCP criterion. GCV, on the other hand, performs very well (except for the occasional failure). The L-curve criterion never fails, but the average estimate is too small. So the choice is between robustness and high quality of the estimate.

We could show more examples of the diverse behavior of the four methods, but the above two examples should suffice to show how difficult it is to make general statements about the different parameter-choice methods. For a given inverse problem, it is always recommended to try several parameter-choice methods on artificial data!

5.7 The Story So Far

This chapter deals with the important topic of methods for automatically choosing the regularization parameter from the data, preferably without using any information about the data that may not be readily available. The only assumption used throughout the chapter is that the noise is white.

We started with an analysis of the regularized solution by splitting it into a noise component or perturbation error, and a bias component or regularization error, and we stated our goal: to find a regularization parameter that balances these two error components in the solution. This corresponds to extracting all available “signal” from the data (the right-hand side) until only noise is left in the residual vector. We then presented four different parameter-choice methods based on quite different ideas.

• The discrepancy principle is a simple method that seeks to reveal when the residual vector is noise-only. It relies on a good estimate of the error norm $\|e\|_2$, which may be difficult to obtain in practice.

• The L-curve criterion is based on an intuitive heuristic and seeks to balance the two error components via inspection (manually or automated) of the L-curve. This method fails when the solution is very smooth.

• GCV seeks to minimize the prediction error, and it is often a very robust method—with occasional failure, often leading to ridiculous undersmoothing that reveals itself.

• The NCP criterion is a statistically based method for revealing when the residual vector is noise-only, based on the power spectrum. It can mistake low-frequency noise for a signal and thus lead to undersmoothing.

A comparative study of the four methods for two different test problems illustrates that it is generally not possible to favor one method over the others, since the behavior of the methods is very problem-dependent. Hence, it is preferable to have several parameter-choice methods at our hands for solving real-world problems.

Exercises

5.1. The Discrepancy Principle

The purpose of this exercise is to illustrate the sensitivity of the discrepancy principle to variations of the estimate of the error norm. First generate the shaw test problem for n = 100, and add Gaussian noise with standard deviation η = 10⁻³ to the right-hand side.

Use the discrepancy principle to compute the Tikhonov solution. The discrepancy principle is implemented in the MATLAB function discrep from Regularization Tools. This function may occasionally have convergence problems; if this happens, then create a new noise realization (or change η slightly) and try again.

You should first use the “safety factor” $\nu_{\mathrm{dp}} = 1$. As the “right-hand side” $\|e\|_2$ in (5.6), try to use both the norm estimate $\sqrt{n}\,\eta$ and the actual 2-norm $\|e\|_2$ of the perturbation vector e (perhaps try with different perturbations). Is there a significant difference in the results?

Still with the “safety factor” $\nu_{\mathrm{dp}} = 1$, use several values of $\sqrt{n}\,\eta$ and/or $\|e\|_2$ that are slightly too large and slightly too small; this simulates a situation where only a rough estimate is available. For example, scale $\sqrt{n}\,\eta$ and/or $\|e\|_2$ up and down by 5–10%. How sensitive is the regularized solution to overestimates and underestimates of the noise level?

Finally, repeat the experiments with the “safety factor” ν_dp = 2, and comment on the new results.
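One possible way to set up the mechanics of this experiment is sketched below in Python. It is only a stand-in: the shaw problem is replaced by an assumed toy problem given in SVD coordinates, and the function discrep is replaced by a simple bisection on the monotone residual norm. The names residual_norm and discrepancy_lambda, the singular values, and the scaling factors 0.9 and 1.1 are illustrative assumptions.

```python
import math
import random

random.seed(2)
n, eta = 100, 1e-3
sigma = [10.0 ** (-i / 8.0) for i in range(n)]            # assumed singular values
b_exact = [s / (i + 1) for i, s in enumerate(sigma)]      # SVD coefficients of b_exact
e = [random.gauss(0.0, eta) for _ in range(n)]            # one noise realization
b = [be + ei for be, ei in zip(b_exact, e)]


def residual_norm(lam):
    """Tikhonov residual norm in SVD coordinates; increases monotonically with lam."""
    return math.sqrt(sum((lam * lam / (s * s + lam * lam) * bi) ** 2
                         for s, bi in zip(sigma, b)))


def discrepancy_lambda(delta, lo=1e-14, hi=1e2, steps=200):
    """Solve residual_norm(lam) = delta (safety factor 1) by bisection on a log scale."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if residual_norm(mid) < delta:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


estimate = math.sqrt(n) * eta                       # norm estimate sqrt(n) * eta
actual = math.sqrt(sum(v * v for v in e))           # actual 2-norm of e

lam_estimate = discrepancy_lambda(estimate)
lam_actual = discrepancy_lambda(actual)
lam_low = discrepancy_lambda(0.9 * estimate)        # 10% underestimate of the noise
lam_high = discrepancy_lambda(1.1 * estimate)       # 10% overestimate of the noise

for label, lam in [("sqrt(n)*eta", lam_estimate), ("||e||_2", lam_actual),
                   ("0.9 * estimate", lam_low), ("1.1 * estimate", lam_high)]:
    print(f"{label:>15}: lambda = {lam:.3e}")
```

Because the residual norm grows monotonically with λ, a larger right-hand side always yields a larger λ; the experiment is about how strongly the regularized solution reacts to that shift.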

5.2. The GCV and L-curve Methods

This exercise illustrates the use of the GCV and L-curve methods for choosing the regularization parameter, and we compare these methods experimentally. As part of this comparison we investigate how robust—or reliable—the methods are, i.e., how often they produce a regularization parameter close to the optimal one. We use the shaw test problem and the parameter-choice functions gcv and l_curve from Regularization Tools.

Plot the GCV function for, say, 5 or 10 different perturbations with the same η, and note the general behavior of the GCV function. Is the minimum always at the transition region between the flat part and the more vertical part (cf. Figure 5.6)?

Use the L-curve criterion to compute the regularization parameter for the same 5 or 10 perturbations as above. Does the regularization parameter computed by means of l_curve always correspond to a solution near the corner (cf. Figure 5.4)?

For a fixed perturbation of the right-hand side, compute the regularization error and the perturbation error for Tikhonov’s method, for a range of regularization parameters λ. Plot the norms of the two error components $\Delta x_{\mathrm{bias}}$ and $\Delta x_{\mathrm{pert}}$, and relate the behavior of this plot to the two regularization parameters found by the GCV method and the L-curve criterion.
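The two error norms can be computed with a short Python sketch when the problem is available in SVD coordinates. The sketch assumes the standard Tikhonov filter factors φ_i = σ_i²/(σ_i² + λ²), so that the regularization error has coefficients (1 − φ_i) times those of the exact solution and the perturbation error has coefficients φ_i e_i/σ_i. The toy singular values and exact solution are assumptions and stand in for the shaw problem.

```python
import math
import random

random.seed(3)
n, eta = 24, 1e-4
sigma = [10.0 ** (-i / 3.0) for i in range(n)]   # assumed singular values
x_exact = [1.0 / (i + 1) for i in range(n)]      # SVD coefficients of the exact solution
e = [random.gauss(0.0, eta) for _ in range(n)]   # SVD coefficients of the noise


def error_norms(lam):
    """Return (bias norm, perturbation norm) of the Tikhonov solution for lam."""
    bias_sq = 0.0
    pert_sq = 0.0
    for s, x, ei in zip(sigma, x_exact, e):
        phi = s * s / (s * s + lam * lam)                  # filter factor
        one_minus_phi = lam * lam / (s * s + lam * lam)    # computed directly for accuracy
        bias_sq += (one_minus_phi * x) ** 2                # regularization error
        pert_sq += (phi * ei / s) ** 2                     # perturbation error
    return math.sqrt(bias_sq), math.sqrt(pert_sq)


lambdas = [10.0 ** (-8 + j / 4.0) for j in range(33)]      # 1e-8 ... 1
rows = [(lam,) + error_norms(lam) for lam in lambdas]

# Grid point where the two error norms are closest on a log scale.
balance = min(rows, key=lambda row: abs(math.log(row[1] / row[2])))
print(f"errors balance near lambda = {balance[0]:.2e}")
```

The bias norm grows with λ while the perturbation norm shrinks, so the two curves cross; the parameters returned by GCV and the L-curve criterion can then be located relative to that crossing.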

5.3. NCP Studies

This exercise illustrates the use of the NCP for analysis of the spectral behavior of filtered signals. Choose a signal length, say, m = 128, and generate a white-noise signal vector x of length m. Then use the following recursion to generate a filtered signal y of the same length as x:

$$y_1 = x_1, \qquad y_i = x_i + c\, x_{i-1}, \quad i = 2, \ldots, m,$$

where c is a constant in the range −1 ≤ c ≤ 1 (this is a common simple filter in signal processing for generating colored noise). Use the NCP to answer the following questions: For which sign of c is y dominated by high frequencies? In which range should you choose c if the NCP for y has to stay within the Kolmogorov–Smirnoff limits?
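A starting point for this exercise, written as a Python sketch, is given below. It assumes the form of the NCP in which the zero-frequency term is dropped and q = ⌊m/2⌋ frequencies are accumulated, and it uses a direct DFT to avoid external libraries. The helper names ncp, filtered, and mean_deviation, and the summary statistic itself, are illustrative assumptions and not part of Regularization Tools.

```python
import cmath
import math
import random


def ncp(r):
    """NCP of r: cumulative power over frequencies 1..q, normalized to end at 1."""
    m = len(r)
    q = m // 2
    power = [abs(sum(r[t] * cmath.exp(-2j * math.pi * t * k / m)
                     for t in range(m))) ** 2
             for k in range(1, q + 1)]
    total = sum(power)
    cumulative, running = [], 0.0
    for p in power:
        running += p
        cumulative.append(running / total)
    return cumulative


def filtered(x, coeff):
    """Apply y_1 = x_1, y_i = x_i + coeff * x_{i-1} for i = 2, ..., m."""
    return [x[0]] + [x[i] + coeff * x[i - 1] for i in range(1, len(x))]


def mean_deviation(r):
    """Average signed gap between the NCP of r and the white-noise line i/q."""
    curve = ncp(r)
    q = len(curve)
    return sum(curve[i] - (i + 1) / q for i in range(q)) / q


random.seed(4)
m = 128
x = [random.gauss(0.0, 1.0) for _ in range(m)]   # white-noise input

for coeff in (-0.9, -0.3, 0.0, 0.3, 0.9):
    y = filtered(x, coeff)
    print(f"c = {coeff:+.1f}: mean NCP deviation = {mean_deviation(y):+.3f}")
```

A positive mean deviation indicates that the NCP lies above the straight line, i.e., the power is concentrated at low frequencies, while a negative value indicates dominance of high frequencies. Plotting the full NCP against the Kolmogorov–Smirnoff limits remains the actual test asked for in the exercise.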

5.4. Sensitivity Analysis*

The purpose of this exercise is to investigate how sensitive the regularized solutions are to perturbations once a regularization parameter has been computed by means of GCV or the L-curve criterion. The exercise also illustrates that the regularization parameter is so dependent on the noise that it is impossible to select the parameter a priori.

Generate your favorite test problem, add a particular perturbation e (η should be neither too large nor too small), and compute the regularization parameters λGCV and λL for Tikhonov regularization by means of the GCV and L-curve methods. Make sure that these parameters are reasonable; otherwise, choose another perturbation and try again.

Now create a series of problems with different realizations of the noise e (with the same η as above) added to $b^{\mathrm{exact}}$. For each noise realization compute the Tikhonov solutions using the same two regularization parameters $\lambda_{\mathrm{GCV}}$ and $\lambda_{\mathrm{L}}$ as above, and compute their errors. How sensitive are the solutions to the variations in e?



In document Discrete Inverse Problem - Insight and Algorithms (Page 98-105)