Prelude to Total Variation Regularization

Beyond the 2-Norm: The Use of Discrete Smoothing Norms


To finally arrive at the standard-form problem, we first note that due to (8.19) we can write

$$E_M = E_{\mathcal{N}(L)^{\perp_A},\,\mathcal{N}(L)} = L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}\, L.$$

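The oblique pseudoinverse $L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}$ is also called the A-weighted pseudoinverse, and it was given the short-hand notation $L^{\#}$ in the previous section. It can be computed as

$$L^{\dagger}_{\mathcal{N}(L)^{\perp_A}} = E_{\mathcal{N}(L)^{\perp_A},\,\mathcal{N}(L)}\, L^{\dagger} = \bigl(I - E_{\mathcal{N}(L),\,\mathcal{N}(L)^{\perp_A}}\bigr)\, L^{\dagger} = \bigl(I - W (A W)^{\dagger} A\bigr)\, L^{\dagger}.$$

Hence, in the Tikhonov problem we can insert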

$$A E_M = A L^{\dagger}_{\mathcal{N}(L)^{\perp_A}} L = \bar{A} L \qquad \text{with} \qquad \bar{A} = A L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}.$$

Using another relation in (8.19), we also get

$$L E_M = L L^{\dagger}_{\mathcal{N}(L)^{\perp_A}} L = P_{\mathcal{R}(L)} L = L.$$

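Thus we arrive at the standard-form problem

$$\min_{\bar{x}} \bigl\{ \|\bar{A} \bar{x} - b\|_2^2 + \lambda^2 \|\bar{x}\|_2^2 \bigr\}, \qquad \bar{A} = A L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}, \qquad \bar{x} = L x.$$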
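The projector identities used in this derivation can be checked numerically. The following is a minimal NumPy sketch, assuming that $L_1$ is the $(n-1)\times n$ first-difference matrix with rows $[-1,\ 1]$ and that W holds an orthonormal basis for its null space (the constant vectors); the random A and the helper names `first_difference` and `oblique_pinv` are illustrative only. It forms $L^{\#} = (I - W(AW)^{\dagger}A)L^{\dagger}$ and checks that $E_M = L^{\#}L$ is a projector that annihilates $\mathcal{N}(L)$, that its range is A-orthogonal to $\mathcal{N}(L)$, and that $L E_M = L$.

```python
import numpy as np

def first_difference(n):
    """Assumed form of L_1: the (n-1) x n matrix with rows [-1, 1]."""
    return np.diff(np.eye(n), axis=0)

def oblique_pinv(A, L, W):
    """A-weighted (oblique) pseudoinverse L^# = (I - W (A W)^+ A) L^+."""
    n = L.shape[1]
    return (np.eye(n) - W @ np.linalg.pinv(A @ W) @ A) @ np.linalg.pinv(L)

rng = np.random.default_rng(1)
m, n = 12, 8
A = rng.standard_normal((m, n))
L = first_difference(n)
W = np.ones((n, 1)) / np.sqrt(n)      # orthonormal basis for N(L_1): constant vectors

L_sharp = oblique_pinv(A, L, W)
E_M = L_sharp @ L                     # should be the oblique projector E_M

print(np.allclose(E_M @ E_M, E_M))            # E_M is idempotent
print(np.allclose(E_M @ W, 0.0))              # E_M annihilates N(L)
print(np.allclose((A @ W).T @ (A @ E_M), 0))  # range of E_M is A-orthogonal to N(L)
print(np.allclose(L @ E_M, L))                # L E_M = L
```

All four checks should print `True` for any A for which $AW$ has full column rank.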

The key to obtaining this formulation is to use the oblique pseudoinverse $L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}$. When L is p × n with p > n and L has full rank, then $\mathcal{N}(L) = \{0\}$, so the oblique pseudoinverse reduces to the ordinary pseudoinverse $L^{\dagger}$, and it further reduces to the inverse $L^{-1}$ when L is square and invertible.

To summarize our results in this dense section, it is crucial to use the splitting $\mathbb{R}^n = \mathcal{N}(L) + \mathcal{N}(L)^{\perp_A}$. Only with this splitting of $\mathbb{R}^n$ are we able to split the residual norms of the Tikhonov problem into two separate least squares problems, as shown in Figure 8.7. The rest is a matter of manipulating oblique projectors.
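As a sanity check of the standard-form transformation, the sketch below solves the standard-form Tikhonov problem, maps the solution back, and compares it with a direct solution of the general-form normal equations $(A^TA + \lambda^2 L^TL)x = A^Tb$. It assumes, as in the previous section, that the general-form solution is recovered as $x = L^{\#}\bar{x} + x_N$ with null-space component $x_N = W(AW)^{\dagger}b$; the random test data and function names are hypothetical, and dense pseudoinverses are used only for clarity (they would be far too expensive for large problems).

```python
import numpy as np

def tikhonov_via_standard_form(A, L, W, b, lam):
    """General-form Tikhonov solution computed through the standard-form problem."""
    n = A.shape[1]
    AW_pinv = np.linalg.pinv(A @ W)
    # Oblique (A-weighted) pseudoinverse L^# = (I - W (A W)^+ A) L^+
    L_sharp = (np.eye(n) - W @ AW_pinv @ A) @ np.linalg.pinv(L)
    A_bar = A @ L_sharp
    p = A_bar.shape[1]
    # Standard-form Tikhonov: min ||A_bar x_bar - b||^2 + lam^2 ||x_bar||^2
    x_bar = np.linalg.solve(A_bar.T @ A_bar + lam**2 * np.eye(p), A_bar.T @ b)
    # Null-space component x_N = W (A W)^+ b (assumed from the previous section)
    x_N = W @ (AW_pinv @ b)
    return L_sharp @ x_bar + x_N

def tikhonov_general_form(A, L, b, lam):
    """Reference: solve (A^T A + lam^2 L^T L) x = A^T b directly."""
    return np.linalg.solve(A.T @ A + lam**2 * (L.T @ L), A.T @ b)

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.diff(np.eye(n), axis=0)      # assumed L_1: (n-1) x n, rows [-1, 1]
W = np.ones((n, 1)) / np.sqrt(n)    # orthonormal basis for N(L_1)
lam = 0.5

x_std = tikhonov_via_standard_form(A, L, W, b, lam)
x_gen = tikhonov_general_form(A, L, b, lam)
print(np.allclose(x_std, x_gen))
```

The two solutions agree because the splitting makes the Tikhonov functional separate into a part depending only on $x_N$ and a part depending only on $\bar{x} = Lx$.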

8.6 Prelude to Total Variation Regularization

We finish this book with an important example of a smoothing norm that does not involve the 2-norm (which is explicitly or implicitly present in most of the other regularization methods that we have encountered). Once again, we start with the continuous formulation (2.2), and we define a smoothing norm which is the 1-norm of the gradient:

$$S_{\mathrm{TV}}(f) = \|f'\|_1 = \int_0^1 |f'(t)|\, dt. \tag{8.20}$$

This is known as the total variation (TV) of the function f(t). The basic idea is that if we replace $S(f)^2$ in (8.1) with $S_{\mathrm{TV}}(f)$, then we still obtain smooth solutions, but the 1-norm in $S_{\mathrm{TV}}(f)$ penalizes nonsmoothness in a quite different manner than the 2-norm.

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php


Figure 8.7. Illustration of the splitting underlying the standard-form transformation. Only with the oblique splitting $\mathbb{R}^n = \mathcal{N}(L) + \mathcal{N}(L)^{\perp_A}$ are we able to split the residual norms of the Tikhonov problem into two separate least squares problems via the orthogonal splitting $\mathbb{R}^m = \mathcal{R}(A E_M) + \mathcal{R}(A E_N)$.

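We illustrate the difference between the two smoothing norms with a simple example that involves the piecewise linear function

$$f(t) = \begin{cases} 0, & 0 \le t < \tfrac{1-h}{2}, \\[2pt] h^{-1}\bigl(t - \tfrac{1-h}{2}\bigr), & \tfrac{1-h}{2} \le t \le \tfrac{1+h}{2}, \\[2pt] 1, & \tfrac{1+h}{2} < t \le 1, \end{cases}$$

which increases from 0 to 1 in the interval [0, 1], linearly with slope 1/h in a middle part of width h; see Figure 8.8. It is easy to show that the smoothing norms associated with the 1- and 2-norms of $f'(t)$ satisfy

$$\|f'\|_1 = 1, \qquad \|f'\|_2^2 = \frac{1}{h}.$$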

We see that the total variation smoothing norm $S_{\mathrm{TV}}(f) = \|f'\|_1$ is independent of the slope of the middle part of f(t), while the smoothing norm based on the 2-norm penalizes steep gradients (when h is small). The 2-norm of $f'(t)$ will not allow any steep gradients, and therefore it produces a very smooth solution (in Exercise 4.7 we saw the same behavior for the 2-norm of f(t)). The 1-norm, on the other hand, allows some steep gradients (but not too many), and it is therefore able to produce a less smooth solution, such as a piecewise smooth solution.
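The two norms of the ramp can be approximated on a grid to see the 1/h growth directly. This sketch assumes a ramp that is 0 up to t = (1-h)/2, rises linearly with slope 1/h, and equals 1 from t = (1+h)/2 on; the grid size N and the function name are hypothetical choices. The discrete total variation is a telescoping sum of the increments, so it stays at 1 for every h, while the squared 2-norm of the difference quotients grows like 1/h.

```python
import numpy as np

def ramp_smoothing_norms(h, N=10_000):
    """Grid approximations of ||f'||_1 and ||f'||_2^2 for a centered ramp of width h."""
    t = np.linspace(0.0, 1.0, N + 1)
    dt = t[1] - t[0]
    f = np.clip((t - (1.0 - h) / 2.0) / h, 0.0, 1.0)  # 0, then slope 1/h, then 1
    df = np.diff(f)
    tv = np.sum(np.abs(df))               # sum |f(t_{j+1}) - f(t_j)|, approximates int |f'| dt
    l2sq = np.sum((df / dt) ** 2) * dt    # sum of squared difference quotients times dt
    return tv, l2sq

for h in (0.1, 0.01):
    tv, l2sq = ramp_smoothing_norms(h)
    print(f"h = {h}: ||f'||_1 ~ {tv:.4f}, ||f'||_2^2 ~ {l2sq:.2f}, 1/h = {1 / h:.2f}")
```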

Assuming that we use a quadrature discretization, the discrete version of TV regularization takes the form of the following minimization problem:

$$\min_x \bigl\{ \|A x - b\|_2^2 + \lambda^2 \|L_1 x\|_1 \bigr\}.$$

Like Tikhonov regularization, this is a convex optimization problem with a unique solution, but the second term involving the 1-norm is not differentiable everywhere with respect to x.
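The text does not prescribe a solver for this problem. One common way to handle the nondifferentiable 1-norm, swapped in here purely for illustration, is to replace $|z|$ by $\sqrt{z^2+\beta^2}$ with a small $\beta > 0$ and apply the lagged-diffusivity (iteratively reweighted) fixed-point iteration. Each step solves $(A^TA + \tfrac{\lambda^2}{2} L_1^T D_k L_1)\,x_{k+1} = A^Tb$ with $D_k = \mathrm{diag}\bigl(((L_1x_k)_i^2 + \beta^2)^{-1/2}\bigr)$; this is a majorize-minimize step, so the smoothed objective cannot increase. The blur matrix, the piecewise constant test signal, β, λ, and the iteration count are all hypothetical.

```python
import numpy as np

def smoothed_tv_objective(A, b, L, x, lam, beta):
    """||A x - b||_2^2 + lam^2 * sum_i sqrt((L x)_i^2 + beta^2)."""
    return np.sum((A @ x - b) ** 2) + lam**2 * np.sum(np.sqrt((L @ x) ** 2 + beta**2))

def tv_lagged_diffusivity(A, b, lam, beta=1e-3, n_iter=30):
    """Lagged-diffusivity iteration for the smoothed 1D TV problem."""
    n = A.shape[1]
    L = np.diff(np.eye(n), axis=0)                    # assumed L_1: rows [-1, 1]
    AtA, Atb = A.T @ A, A.T @ b
    x = np.linalg.solve(AtA + lam**2 * (L.T @ L), Atb)  # Tikhonov-type starting guess
    history = [smoothed_tv_objective(A, b, L, x, lam, beta)]
    for _ in range(n_iter):
        d = 1.0 / np.sqrt((L @ x) ** 2 + beta**2)     # diagonal of D_k, frozen ("lagged")
        K = AtA + 0.5 * lam**2 * (L.T @ (d[:, None] * L))
        x = np.linalg.solve(K, Atb)
        history.append(smoothed_tv_objective(A, b, L, x, lam, beta))
    return x, np.array(history)

# Hypothetical test problem: Gaussian blur of a piecewise constant signal
n = 100
i = np.arange(n)
A = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 3.0) ** 2)
A /= A.sum(axis=1, keepdims=True)
x_exact = np.where((i > 20) & (i < 50), 1.0, 0.0) + np.where(i > 70, 0.5, 0.0)
rng = np.random.default_rng(0)
b = A @ x_exact + 1e-3 * rng.standard_normal(n)

x_tv, hist = tv_lagged_diffusivity(A, b, lam=0.05)
print(hist[0], hist[-1])
```

Smaller β tracks the true 1-norm more closely but makes the linear systems worse conditioned; production codes use more robust schemes.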



The barcode deconvolution problem from Section 7.1 provides a good example of the use of discrete TV regularization, since we want to reconstruct a piecewise constant function (the exact barcode signal $x^{\mathrm{exact}}$). We use the same problem as in Figure 7.1 except that we use a higher noise level η = 10⁻⁴ to make the problem more difficult to solve. Figure 8.9 shows the exact and blurred signals, together with the best TSVD and total variation reconstructions, for k = 120 and λ = 0.037, respectively.

Even the best TSVD solution does not give a reliable reconstruction, while the total variation solution is quite close to the exact solution. For such problems it is a clear advantage to use total variation.
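A comparison like the one in Figure 8.9 needs the TSVD solutions $x_k = \sum_{i=1}^{k} (u_i^Tb/\sigma_i)\, v_i$ for all truncation parameters k. The minimal sketch below computes them all at once from one SVD and picks the k with the smallest error; it assumes that "best" means the smallest 2-norm error with respect to $x^{\mathrm{exact}}$, which is only available in a test problem, and it assumes all singular values are nonzero. The barcode problem itself is not reproduced here; the random test matrix is hypothetical.

```python
import numpy as np

def tsvd_solutions(A, b):
    """Column k-1 holds x_k = sum_{i<=k} (u_i^T b / sigma_i) v_i, for k = 1..n."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ b) / s                   # SVD coefficients u_i^T b / sigma_i
    return np.cumsum(Vt.T * coeffs, axis=1)  # partial sums over i give x_1, ..., x_n

def best_tsvd(A, b, x_exact):
    """Truncation parameter k with the smallest error ||x_k - x_exact||_2."""
    X = tsvd_solutions(A, b)
    errors = np.linalg.norm(X - x_exact[:, None], axis=0)
    k = int(np.argmin(errors)) + 1
    return k, X[:, k - 1], errors

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
x_exact = rng.standard_normal(20)
b = A @ x_exact
k, x_k, errors = best_tsvd(A, b, x_exact)
print(k, errors[k - 1])
```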

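In two dimensions, given a function f(t) with $t = (t_1, t_2)$, we replace the absolute value of the gradient in (8.20) with the gradient magnitude, defined as

$$|\nabla f| = \left( \left(\frac{\partial f}{\partial t_1}\right)^2 + \left(\frac{\partial f}{\partial t_2}\right)^2 \right)^{1/2}.$$

The relevant smoothing norms of f(t) are now

$$\bigl\| |\nabla f| \bigr\|_1 = \iint |\nabla f|\, dt_1\, dt_2, \qquad \bigl\| |\nabla f| \bigr\|_2^2 = \iint |\nabla f|^2\, dt_1\, dt_2.$$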

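To illustrate the difference between these two smoothing norms, consider a function f(t) with the polar representation

$$f(r, \theta) = \begin{cases} 1, & 0 \le r < R, \\[2pt] 1 - \dfrac{r - R}{h}, & R \le r \le R + h, \\[2pt] 0, & R + h < r. \end{cases}$$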



[Figure 8.9 panels, each plotted over 0 ≤ t ≤ 1: "The barcode intensity f(t) (solid) and the point spread function (dotted)"; "The measured signal g(s) in the scanner, η = 0.0001" (values up to about 20 × 10⁻³); "The result of TSVD reconstruction, k = 120"; "The result of TV reconstruction, λ = 0.037".]

Figure 8.9. Same barcode example as in Figure 7.1 except the noise level is larger. Top: The exact solution (the printed barcode) together with the point spread function. Second from top: The recorded noisy signal. Third from top: Best TSVD reconstruction for k = 120. Bottom: Best total variation reconstruction for λ = 0.037.

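This function is 1 inside the disk with radius r = R, zero outside the disk with radius r = R + h, and has a linear radial slope between 0 and 1; see Figure 8.10. In the area between these two disks the gradient magnitude is $|\nabla f| = 1/h$, and elsewhere it is zero. Using $\int_R^{R+h} r\, dr = Rh + h^2/2$, we thus obtain

$$\bigl\| |\nabla f| \bigr\|_1 = \int_0^{2\pi}\!\!\int_R^{R+h} \frac{1}{h}\, r\, dr\, d\theta = 2\pi R + \pi h,$$

$$\bigl\| |\nabla f| \bigr\|_2^2 = \int_0^{2\pi}\!\!\int_R^{R+h} \frac{1}{h^2}\, r\, dr\, d\theta = \frac{2\pi R}{h} + \pi.$$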

Similar to the one-dimensional example, we see that the total variation smoothing norm is almost independent of the size of the gradient, while the 2-norm penalizes steep gradients. In fact, as $h \to 0$, we see that $S_{\mathrm{TV}}(f)$ converges to the circumference $2\pi R$.
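The two ring values can be confirmed on a grid. This sketch samples f on $[-1, 1]^2$ (the domain, R, h, and grid size are illustrative assumptions), approximates $|\nabla f|$ with `numpy.gradient`, and multiplies the sums by the cell area. Central differences smear the kinks at r = R and r = R + h over about one grid cell, so expect agreement to within a few percent, mostly visible in the 2-norm.

```python
import numpy as np

def ring_smoothing_norms(R=0.5, h=0.1, N=801):
    """Grid approximations of || |grad f| ||_1 and || |grad f| ||_2^2 for the ring example."""
    t = np.linspace(-1.0, 1.0, N)             # assumed domain [-1, 1]^2 containing the disk
    dx = t[1] - t[0]
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    r = np.hypot(T1, T2)
    f = np.clip(1.0 - (r - R) / h, 0.0, 1.0)  # 1 for r < R, 0 for r > R + h, linear between
    g1, g2 = np.gradient(f, dx, dx)           # central differences in the interior
    grad_mag = np.hypot(g1, g2)               # |grad f|
    return np.sum(grad_mag) * dx**2, np.sum(grad_mag**2) * dx**2

R, h = 0.5, 0.1
tv, l2sq = ring_smoothing_norms(R, h)
print(tv, 2 * np.pi * R + np.pi * h)        # compare with 2*pi*R + pi*h
print(l2sq, 2 * np.pi * R / h + np.pi)      # compare with 2*pi*R/h + pi
```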

For this reason, if we want to reconstruct a 2D solution (an image) which is allowed to have some, but not too many, steep gradients, then we should use the total variation smoothing norm $S_{\mathrm{TV}}(f) = \bigl\| |\nabla f| \bigr\|_1$. In practice, this smoothing norm tends to produce piecewise constant reconstructions in one dimension, and "blocky" reconstructions (images) in 2D. This has made total variation regularization very popular in some image reconstruction problems.

Figure 8.10. A 2D function f(t) whose gradient magnitude is $|\nabla f| = 1/h$ in the ring-formed area. For this function $\bigl\| |\nabla f| \bigr\|_1 = 2\pi R + \pi h$ and $\bigl\| |\nabla f| \bigr\|_2^2 = 2\pi R/h + \pi$.

Figure 8.11. Illustration of total variation image deblurring.

We finish with an example that illustrates the use of total variation regularization for image deblurring. Figure 8.11 shows a sharp image, a noisy and blurred version, and a total variation reconstruction computed by means of the function TVdeblur in the MATLAB package mxTV [13]. The edges in the original image are well reconstructed in the deblurred image, but a close inspection shows some unavoidable blocking artifacts due to the total variation smoothing norm.

8.7 And the Story Continues . . .

This last chapter is a brief introduction to a much larger world of regularization algorithms than we have considered in the previous chapters, a world where the reader must leave behind the safe grounds of the SVD. We focused on Tikhonov regularization in the general form and showed how to incorporate requirements about additional smoothness via discrete smoothing norms. We also showed how to use smoothing preconditioners in iterative regularization methods through the introduction of standard-form transformations and oblique pseudoinverses. We also hinted at ways to implement such methods efficiently. Finally, we gave a brief introduction to total variation regularization, which uses a quite different kind of smoothing norm.

This book does not intend to tell the whole story about the regularization of discrete inverse problems. Rather, the goal is to set the stage and provide the reader with a solid understanding of important practical and computational aspects. The book therefore stops here, because the practical use and implementation of regularization methods with general smoothing norms, especially for large-scale problems, tend to be very problem dependent. But the story continues.

Exercises

8.1. Alternative First-Derivative Smoothing Norms

The matrices $L_1$, $L_1^z$, and $L_1^r$ introduced in Sections 8.1 and 8.2 are based on differences between neighboring elements of the solution vector x, representing samples of the solution at the grid points $t_j = (j - \tfrac{1}{2})/n$, $j = 1, \dots, n$. Alternatively, we can use central differences that involve elements with a distance of two, i.e., $x_{j-1}$ and $x_{j+1}$. Show that the corresponding matrices for the first derivative are given by

Also, show that the null spaces of these matrices are as follows:

• $\mathcal{N}(\hat{L}_1) = \mathrm{span}\{(1, 1, \dots, 1)^T,\ (1, -1, 1, -1, \dots)^T\}$,

• $\mathcal{N}(\hat{L}_1^z) = \{0\}$ (trivial null space) for n even, and $\mathrm{span}\{(1, 0, 1, 0, \dots)^T\}$ for n odd,

• $\mathcal{N}(\hat{L}_1^r) = \mathrm{span}\{(1, 1, \dots, 1)^T\}$.

Why is it dangerous to use $\hat{L}_1$ and $\hat{L}_1^z$ in connection with smoothing norms?
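The claimed null spaces can be checked numerically. Because the matrices themselves are not reproduced above, this sketch uses assumed, unscaled forms whose rows compute $x_{j+1} - x_{j-1}$: without boundary conditions ($(n-2)\times n$), with zero boundary values $x_0 = x_{n+1} = 0$, and with reflexive boundary values $x_0 = x_1$, $x_{n+1} = x_n$ (both $n \times n$). A scaling factor such as $1/(2\Delta t)$ would not change the null spaces; the function names are hypothetical.

```python
import numpy as np

def central_diff_matrices(n):
    """Hypothetical unscaled central-difference matrices; rows give x_{j+1} - x_{j-1}."""
    # No boundary conditions: (n-2) x n, rows [-1, 0, 1]
    L_none = np.eye(n, k=2)[: n - 2] - np.eye(n)[: n - 2]
    # Zero boundary conditions x_0 = x_{n+1} = 0: n x n
    L_zero = np.eye(n, k=1) - np.eye(n, k=-1)
    # Reflexive boundary conditions x_0 = x_1, x_{n+1} = x_n: n x n
    L_refl = L_zero.copy()
    L_refl[0, 0] = -1.0
    L_refl[-1, -1] = 1.0
    return L_none, L_zero, L_refl

def nullity(M):
    """Dimension of the null space of M."""
    return M.shape[1] - np.linalg.matrix_rank(M)

for n in (10, 11):
    L_none, L_zero, L_refl = central_diff_matrices(n)
    print(n, nullity(L_none), nullity(L_zero), nullity(L_refl))
```

For n = 10 this should report nullities 2, 0, 1, and for n = 11 it should report 2, 1, 1, in line with the three statements above.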

8.2. Periodic Derivative Matrices

Write the derivative matrices for first and second orders, for periodic boundary conditions. What are the null spaces?


8.3. General-Form Tikhonov Regularization

This exercise illustrates that standard-form regularization, corresponding to L = I, is not always a good choice in regularization problems. The test problem here is the inverse Laplace transform, which is implemented in Regularization Tools as the function i_laplace. The integral equation takes the form

$$\int_0^{\infty} \exp(-s t)\, f(t)\, dt = g(s), \qquad 0 \le s \le \infty,$$

with right-hand side g and solution f given by

$$g(s) = \frac{1}{s} - \frac{1}{s + 1/2}, \qquad f(t) = 1 - \exp(-t/2).$$
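The closed-form pair can be checked by numerical quadrature, since $\int_0^\infty e^{-st}\,dt = 1/s$ and $\int_0^\infty e^{-st}e^{-t/2}\,dt = 1/(s + 1/2)$. A small sketch using `scipy.integrate.quad` (SciPy is an assumed dependency here, not part of Regularization Tools); the sample values of s are arbitrary and strictly positive, because g(s) is unbounded as s → 0.

```python
import numpy as np
from scipy.integrate import quad

def laplace_of_f(s):
    """Numerically evaluate int_0^inf exp(-s t) (1 - exp(-t/2)) dt."""
    value, _abserr = quad(lambda t: np.exp(-s * t) * (1.0 - np.exp(-t / 2.0)), 0.0, np.inf)
    return value

def g(s):
    """Closed-form right-hand side g(s) = 1/s - 1/(s + 1/2)."""
    return 1.0 / s - 1.0 / (s + 0.5)

for s in (0.5, 1.0, 4.0):
    print(s, laplace_of_f(s), g(s))
```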

Use the call [A,b,x] = i_laplace(n,2) to produce this test problem. Also generate the first derivative smoothing matrix $L_1$ from (8.4) by means of L = get_l(n,1).

Add Gaussian white noise with standard deviation η = 10⁻⁴ to the right-hand side. Then compute ordinary Tikhonov solutions and vary the regularization parameter λ until the noise starts to dominate $x_\lambda$, and plot the solutions.

Notice that none of these solutions are able to reconstruct the exact solution very well in the rightmost part of the plot.

Finally, compute the general-form Tikhonov solutions by means of the function tikhonov. You should use the calls

[U,sm,X] = cgsvd(A,L);

Xreg = tikhonov(U,sm,X,b,lambda);

Again, vary λ until the noise starts to dominate x_L,λ. This approach should be able to yield better approximations to the exact solution. Study the right generalized singular vectors x_i and explain why.

8.4. Preconditioned CGLS

This is another example which illustrates that $L = I_n$ is not always a good choice in regularization problems. We use the second derivative test problem deriv2(n) from Regularization Tools (see Exercise 2.3), which requires an L matrix that approximates the first derivative operator.

Generate the test problem deriv2(n,2) with n = 100, add noise to the right-hand side with relative noise level rnl = 10⁻³, and study the Picard plot.

According to this plot, the problem looks innocent; nothing here reveals that the choice $L = I_n$ might be bad.

Then compute the TSVD solutions and plot them. Notice that none of them are able to reconstruct the exact solution very well in the rightmost part of the plot. This is due to the particular structure in the right singular vectors v_i; plot some of these vectors and explain which structure hurts the reconstruction.

Now generate the matrix $L_1$ in (8.4) and a basis W for its null space by means of the call [L,W] = get_l(n,1). Then use the preconditioned CGLS method, implemented in the function pcgls, to compute the regularized solution with the smoothing norm $\|L_1 \cdot\|_2$. This approach should produce better reconstructions.

Study the generalized right singular vectors $x_i$ and explain why.
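One way to see what preconditioned CGLS does is to apply plain CGLS to the transformed matrix $A L^{\#}$ and map each iterate $\bar{x}_k$ back with $x_k = L^{\#}\bar{x}_k + W(AW)^{\dagger}b$; the iteration number k then plays the role of the regularization parameter. The sketch below does exactly that with explicit dense matrices. It is our assumption that this matches the iterates of pcgls in exact arithmetic, and it is certainly not how pcgls is implemented (which avoids forming $L^{\#}$); the names `cgls` and `pcgls_sketch` and the random test data are hypothetical.

```python
import numpy as np

def cgls(M, b, n_iter):
    """Plain CGLS for min ||M y - b||_2; returns the list of iterates y_1, y_2, ..."""
    y = np.zeros(M.shape[1])
    r = b.copy()
    s = M.T @ r
    p = s.copy()
    gamma = s @ s
    iterates = []
    for _ in range(n_iter):
        q = M @ p
        qq = q @ q
        if qq == 0.0:              # search direction vanished: converged
            break
        alpha = gamma / qq
        y = y + alpha * p
        r = r - alpha * q
        s = M.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
        iterates.append(y.copy())
    return iterates

def pcgls_sketch(A, L, W, b, n_iter):
    """CGLS on the standard-form matrix A L^#, each iterate mapped back to x-space."""
    n = A.shape[1]
    AW_pinv = np.linalg.pinv(A @ W)
    L_sharp = (np.eye(n) - W @ AW_pinv @ A) @ np.linalg.pinv(L)  # oblique pseudoinverse
    x_N = W @ (AW_pinv @ b)                                     # null-space component
    return [L_sharp @ y + x_N for y in cgls(A @ L_sharp, b, n_iter)]

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.diff(np.eye(n), axis=0)       # assumed L_1: (n-1) x n, rows [-1, 1]
W = np.ones((n, 1)) / np.sqrt(n)     # basis for N(L_1)

xs = pcgls_sketch(A, L, W, b, n_iter=n - 1)
print(np.allclose(xs[-1], np.linalg.lstsq(A, b, rcond=None)[0]))
```

After n - 1 iterations (the dimension of the transformed problem) the iterate should coincide with the unregularized least squares solution, which is why early stopping is what provides the regularization.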


Appendix A

In document Discrete Inverse Problem - Insight and Algorithms (Page 183-190)