Beyond the 2-Norm: The Use of Discrete Smoothing Norms

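To finally arrive at the standard-form problem, we first note that due to (8.19) we can write

\[
E_M = E_{\mathcal{N}(L)^{\perp_A},\,\mathcal{N}(L)} = L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}\, L .
\]

The oblique pseudoinverse $L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}$ is also called the A-weighted pseudoinverse, and it was given the short-hand notation $L^{\#}$ in the previous section. It can be computed as

\[
L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}
 = E_{\mathcal{N}(L)^{\perp_A},\,\mathcal{N}(L)}\, L^{\dagger}
 = \bigl( I - E_{\mathcal{N}(L),\,\mathcal{N}(L)^{\perp_A}} \bigr) L^{\dagger}
 = \bigl( I - W (A W)^{\dagger} A \bigr) L^{\dagger} .
\]

Hence, in the Tikhonov problem we can insert

\[
A\, E_M = A\, L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}\, L = \bar{A}\, L
\quad \text{with} \quad \bar{A} = A\, L^{\dagger}_{\mathcal{N}(L)^{\perp_A}} .
\]

Using another relation in (8.19), we also get

\[
L\, E_M = L\, L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}\, L = P_{\mathcal{R}(L)}\, L = L .
\]

Thus we arrive at the standard-form problem

\[
\min \bigl\{ \| \bar{A}\, \bar{x} - b \|_2^2 + \lambda^2 \| \bar{x} \|_2^2 \bigr\},
\qquad \bar{A} = A\, L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}, \qquad \bar{x} = L\, x .
\]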

The key to obtaining this formulation is to use the oblique pseudoinverse $L^{\dagger}_{\mathcal{N}(L)^{\perp_A}}$. When $L$ is $p \times n$ with $p > n$ and $L$ has full rank, then the oblique pseudoinverse reduces to the ordinary pseudoinverse $L^{\dagger}$, and it further reduces to the inverse $L^{-1}$ when $L$ is square and invertible.
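The formula $(I - W (A W)^{\dagger} A) L^{\dagger}$ can be checked numerically on a small problem. The following is a minimal NumPy sketch, not code from Regularization Tools: the function names, the random test matrix, and the choice of an unscaled first-difference matrix for $L$ are all assumptions made for illustration. The null-space component $W (A W)^{\dagger} b$ used in the back-transformation is likewise an assumption in the sense that it is not stated in this passage; it is the least squares solution restricted to $\mathcal{N}(L)$.

```python
import numpy as np

def oblique_pseudoinverse(A, L, W):
    """Return (I - W (A W)^+ A) L^+, where the columns of W span N(L)."""
    L_pinv = np.linalg.pinv(L)
    if W.shape[1] == 0:
        # Trivial null space: the oblique pseudoinverse is the ordinary one.
        return L_pinv
    n = L.shape[1]
    return (np.eye(n) - W @ np.linalg.pinv(A @ W) @ A) @ L_pinv

def tikhonov_stacked(A, L, b, lam):
    """Solve min ||A x - b||_2^2 + lam^2 ||L x||_2^2 as one stacked least squares problem."""
    K = np.vstack([A, lam * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(K, rhs, rcond=None)[0]

# Hypothetical small test problem (random data, for illustration only).
rng = np.random.default_rng(0)
m, n, lam = 8, 6, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
L = np.diff(np.eye(n), axis=0)   # (n-1) x n first differences, unscaled
W = np.ones((n, 1))              # basis for N(L): the constant vectors

L_sharp = oblique_pseudoinverse(A, L, W)
A_bar = A @ L_sharp              # standard-form matrix

# Solve the standard-form problem, then transform back to x.
x_bar = tikhonov_stacked(A_bar, np.eye(n - 1), b, lam)
x_null = W @ (np.linalg.pinv(A @ W) @ b)
x_std = L_sharp @ x_bar + x_null

# Reference: solve the general-form problem directly.
x_direct = tikhonov_stacked(A, L, b, lam)
print("max difference:", np.max(np.abs(x_std - x_direct)))
```

Under these assumptions the two solutions should agree to rounding error, and `A @ L_sharp` should have columns orthogonal to `A @ W`, which is the property that lets the residual norm split into two independent least squares problems.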

To summarize our results in this dense section, it is crucial to use the splitting $\mathbb{R}^n = \mathcal{N}(L) + \mathcal{N}(L)^{\perp_A}$. Only with this splitting of $\mathbb{R}^n$ are we able to split the residual norms of the Tikhonov problem into two separate least squares problems, as shown in Figure 8.7. The rest is manipulations with oblique projectors, etc.

8.6 Prelude to Total Variation Regularization

We finish this book with an important example of a smoothing norm that does not involve the 2-norm (which is explicitly or implicitly present in most of the other regularization methods that we have encountered). Once again, we start with the continuous formulation (2.2), and we define a smoothing norm which is the 1-norm of the gradient:

\[
S_{\mathrm{TV}}(f) = \| f' \|_1 = \int_0^1 | f'(t) | \, dt . \tag{8.20}
\]

This is known as the total variation (TV) of the function $f(t)$. The basic idea is that if we replace $S(f)^2$ in (8.1) with $S_{\mathrm{TV}}(f)$, then we still obtain smooth solutions, but the 1-norm in $S_{\mathrm{TV}}(f)$ penalizes nonsmoothness in a quite different manner than the 2-norm.

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Figure 8.7. Illustration of the splitting underlying the standard-form transformation. Only with the oblique splitting $\mathbb{R}^n = \mathcal{N}(L) + \mathcal{N}(L)^{\perp_A}$ are we able to split the residual norms of the Tikhonov problem into two separate least squares problems via the orthogonal splitting $\mathbb{R}^m = \mathcal{R}(A E_M) + \mathcal{R}(A E_N)$.

We illustrate the difference between the two smoothing norms with a simple example that involves the piecewise linear function

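\[
f(t) =
\begin{cases}
0, & 0 \le t < \tfrac{1}{2}(1-h), \\[2pt]
\dfrac{t}{h} - \dfrac{1-h}{2h}, & \tfrac{1}{2}(1-h) \le t \le \tfrac{1}{2}(1+h), \\[2pt]
1, & \tfrac{1}{2}(1+h) < t \le 1,
\end{cases}
\]

which increases linearly from 0 to 1 in the interval [0, 1]; see Figure 8.8. It is easy to show that the smoothing norms associated with the 1- and 2-norms of $f'(t)$ satisfy

\[
\| f' \|_1 = \int_0^1 | f'(t) | \, dt = \int_{(1-h)/2}^{(1+h)/2} \frac{1}{h} \, dt = 1,
\qquad
\| f' \|_2^2 = \int_0^1 f'(t)^2 \, dt = \int_{(1-h)/2}^{(1+h)/2} \frac{1}{h^2} \, dt = \frac{1}{h} .
\]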

We see that the total variation smoothing norm $S_{\mathrm{TV}}(f) = \| f' \|_1$ is independent of the slope of the middle part of $f(t)$, while the smoothing norm based on the 2-norm penalizes steep gradients (when $h$ is small). The 2-norm of $f'(t)$ will not allow any steep gradients, and therefore it produces a very smooth solution (in Exercise 4.7 we saw the same behavior for the 2-norm of $f(t)$). The 1-norm, on the other hand, allows some steep gradients, but not too many, and it is therefore able to produce a less smooth solution, such as a piecewise smooth solution.
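The contrast between the two norms can be checked with a few lines of quadrature. The sketch below is illustrative only: it assumes a ramp of width $h$ centred at $t = 1/2$ (the placement does not affect either norm), and the function names and grid size are hypothetical choices.

```python
def ramp_derivative(t, h):
    """Derivative of the ramp: 1/h on the middle interval of width h, 0 elsewhere."""
    return 1.0 / h if abs(t - 0.5) < 0.5 * h else 0.0

def smoothing_norms(h, n=20000):
    """Midpoint-rule approximations of ||f'||_1 and ||f'||_2^2 on [0, 1]."""
    dt = 1.0 / n
    one_norm = 0.0
    two_norm_sq = 0.0
    for j in range(n):
        d = ramp_derivative((j + 0.5) * dt, h)
        one_norm += abs(d) * dt
        two_norm_sq += d * d * dt
    return one_norm, two_norm_sq

for h in (0.5, 0.1, 0.01):
    tv, s2 = smoothing_norms(h)
    print(f"h = {h}: 1-norm = {tv:.4f}, squared 2-norm = {s2:.4f}, 1/h = {1.0 / h:.4f}")
```

For each width the 1-norm should stay close to 1, while the squared 2-norm should track $1/h$ and therefore grow as the ramp steepens.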

Assuming that we use a quadrature discretization, the discrete version of TV regularization takes the form of the following minimization problem:

\[
\min_x \bigl\{ \| A\, x - b \|_2^2 + \lambda^2 \| L_1 x \|_1 \bigr\} .
\]

Like Tikhonov regularization, this is a convex optimization problem with a unique solution, but the second term involving the 1-norm is not differentiable everywhere with respect to x .
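To make the discrete objective concrete, here is a small pure-Python sketch that evaluates it for list-based data. The helper names are hypothetical, and the first-difference operator is taken unscaled (entries $-1$ and $1$), which is an assumption about the scaling of $L_1$. The example compares a step and a ramp that both rise from 0 to 1.

```python
def first_differences(x):
    """Apply an unscaled (n-1) x n first-difference matrix to the list x."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def tv_objective(A, x, b, lam):
    """Evaluate ||A x - b||_2^2 + lam^2 * ||L1 x||_1 for a list-of-rows matrix A."""
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    misfit = sum(r * r for r in residual)
    penalty = sum(abs(d) for d in first_differences(x))
    return misfit + lam ** 2 * penalty

step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # one sharp edge
ramp = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # same rise, spread out

tv_step = sum(abs(d) for d in first_differences(step))
tv_ramp = sum(abs(d) for d in first_differences(ramp))
sq_step = sum(d * d for d in first_differences(step))
sq_ramp = sum(d * d for d in first_differences(ramp))

print("1-norm penalty:        step", tv_step, " ramp", tv_ramp)
print("squared 2-norm penalty: step", sq_step, " ramp", sq_ramp)
```

The 1-norm penalty is essentially the same for both vectors, whereas the squared 2-norm penalty is several times larger for the step. This is the discrete counterpart of the behavior described for the continuous norms.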


The barcode deconvolution problem from Section 7.1 provides a good example of the use of discrete TV regularization, since we want to reconstruct a piecewise constant function (the exact barcode signal $x^{\mathrm{exact}}$). We use the same problem as in Figure 7.1 except that we use a higher noise level $\eta = 10^{-4}$ to make the problem more difficult to solve. Figure 8.9 shows the exact and blurred signals, together with the best TSVD and total variation reconstructions, for $k = 120$ and $\lambda = 0.037$, respectively.

Even the best TSVD solution does not give a reliable reconstruction, while the total variation solution is quite close to the exact solution. For such problems it is a clear advantage to use total variation.

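In two dimensions, given a function $f(t)$ with $t = (t_1, t_2)$, we replace the absolute value of the gradient in (8.20) with the gradient magnitude, defined as

\[
| \nabla f | = \left( \Bigl( \frac{\partial f}{\partial t_1} \Bigr)^2 + \Bigl( \frac{\partial f}{\partial t_2} \Bigr)^2 \right)^{1/2} .
\]

The relevant smoothing norms of $f(t)$ are now

\[
\bigl\| \, | \nabla f | \, \bigr\|_1 = \iint | \nabla f | \, dt_1 \, dt_2 ,
\qquad
\bigl\| \, | \nabla f | \, \bigr\|_2^2 = \iint | \nabla f |^2 \, dt_1 \, dt_2 .
\]

To illustrate the difference between these two smoothing norms, consider a function $f(t)$ with the polar representation

\[
f(r, \theta) =
\begin{cases}
1, & 0 \le r < R, \\[2pt]
1 + \dfrac{R}{h} - \dfrac{r}{h}, & R \le r \le R + h, \\[2pt]
0, & R + h < r .
\end{cases}
\]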

[Figure 8.9: four stacked panels, each plotted over the interval from 0 to 1. Panel titles, top to bottom: "The barcode intensity f(t) (solid) and the point spread function (dotted)"; "The measured signal g(s) in the scanner, η = 0.0001"; "The result of TSVD reconstruction, k = 120"; "The result of TV reconstruction, λ = 0.037".]

Figure 8.9. Same barcode example as in Figure 7.1 except the noise level is larger. Top: The exact solution (the printed barcode) together with the point spread function. Second from top: The recorded noisy signal. Third from top: Best TSVD reconstruction for k = 120. Bottom: Best total variation reconstruction for λ = 0.037.

This function is 1 inside the disk with radius r = R, zero outside the disk with radius r = R + h, and has a linear radial slope between 0 and 1; see Figure 8.10. In the area between these two disks the gradient magnitude is |∇f | = 1/h, and elsewhere it is zero. Thus we obtain

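\[
\bigl\| \, | \nabla f | \, \bigr\|_1 = \int_0^{2\pi} \!\! \int_R^{R+h} \frac{1}{h} \, r \, dr \, d\theta = 2 \pi R + \pi h ,
\]

\[
\bigl\| \, | \nabla f | \, \bigr\|_2^2 = \int_0^{2\pi} \!\! \int_R^{R+h} \frac{1}{h^2} \, r \, dr \, d\theta = \frac{2 \pi R}{h} + \pi .
\]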

Similar to the one-dimensional example, we see that the total variation smoothing norm is almost independent of the size of the gradient, while the 2-norm penalizes steep gradients. In fact, as $h \to 0$, we see that $S_{\mathrm{TV}}(f)$ converges to the circumference $2 \pi R$.
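The two closed-form values can be confirmed by integrating over the ring numerically. This is a minimal sketch with hypothetical names; it assumes the gradient magnitude is exactly $1/h$ for $R \le r \le R + h$ and zero elsewhere, and uses a midpoint rule in the radial direction (the angular integral contributes the factor $2\pi$).

```python
import math

def ring_norms(R, h, n=2000):
    """Midpoint rule in r for the integrals of |grad f| and |grad f|^2 over the ring."""
    dr = h / n
    tv = 0.0
    s2 = 0.0
    for j in range(n):
        r = R + (j + 0.5) * dr          # midpoint of the j-th radial cell
        tv += (1.0 / h) * r * dr        # integrand |grad f| * r
        s2 += (1.0 / h ** 2) * r * dr   # integrand |grad f|^2 * r
    return 2.0 * math.pi * tv, 2.0 * math.pi * s2

R = 1.0
for h in (0.5, 0.1, 0.001):
    tv, s2 = ring_norms(R, h)
    print(f"h = {h}: TV = {tv:.6f} (formula {2 * math.pi * R + math.pi * h:.6f}), "
          f"squared 2-norm = {s2:.6f} (formula {2 * math.pi * R / h + math.pi:.6f})")
```

As $h$ shrinks, the total variation value should approach $2\pi R$ while the squared 2-norm grows like $1/h$.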

For this reason, if we want to reconstruct a 2D solution (an image) which is allowed to have some, but not too many, steep gradients, then we should use the total variation smoothing norm $S_{\mathrm{TV}}(f) = \bigl\| \, | \nabla f | \, \bigr\|_1$. In practice, this smoothing norm tends to produce piecewise constant reconstructions in one dimension, and “blocky” reconstructions (images) in 2D. This has made total variation regularization very popular in some image reconstruction problems.

Figure 8.10. A 2D function $f(t)$ whose gradient magnitude is $| \nabla f | = 1/h$ in the ring-formed area. For this function $\bigl\| \, | \nabla f | \, \bigr\|_1 = 2 \pi R + \pi h$ and $\bigl\| \, | \nabla f | \, \bigr\|_2^2 = 2 \pi R / h + \pi$.

Figure 8.11. Illustration of total variation image deblurring.

We finish with an example that illustrates the use of total variation regularization for image deblurring. Figure 8.11 shows a sharp image, a noisy and blurred version, and a total variation reconstruction computed by means of the function TVdeblur in the MATLAB package mxTV [13]. The edges in the original image are well reconstructed in the deblurred image, but a close inspection shows some unavoidable blocking artifacts due to the total variation smoothing norm.

8.7 And the Story Continues . . .

The last chapter is a brief introduction to a much larger world of regularization algorithms than we have considered in the previous chapters, a world where the reader must leave behind the safe grounds of the SVD. We focused on Tikhonov regularization in the general form and showed how to incorporate requirements about additional smoothness via discrete smoothing norms. We also showed how to use smoothing preconditioners in iterative regularization methods through the introduction of standard-form transformations and oblique pseudoinverses. We also hinted at ways to implement such methods efficiently. Finally, we gave a brief introduction to total variation regularization, which uses a quite different kind of smoothing norm.

This book does not intend to tell the whole story about the regularization of discrete inverse problems. Rather, the goal is to set the stage and provide the reader with a solid understanding of important practical and computational aspects. The book therefore stops here, because the practical use and implementation of regularization methods with general smoothing norms, especially for large-scale problems, tend to be very problem dependent. But the story continues.

Exercises

8.1. Alternative First-Derivative Smoothing Norms

The matrices $L_1$, $L_1^z$, and $L_1^r$ introduced in Sections 8.1 and 8.2 are based on differences between neighboring elements of the solution vector $x$, representing samples of the solution at the grid points $t_j = (j - \tfrac{1}{2})/n$, $j = 1, \ldots, n$. Alternatively, we can use central differences that involve elements with a distance of two, i.e., $x_{j-1}$ and $x_{j+1}$. Show that the corresponding matrices for the first derivative are given by

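\[
\widehat{L}_1 =
\begin{pmatrix}
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1
\end{pmatrix},
\qquad
\widehat{L}_1^z =
\begin{pmatrix}
0 & 1 & & & \\
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1 \\
 & & & -1 & 0
\end{pmatrix},
\qquad
\widehat{L}_1^r =
\begin{pmatrix}
-1 & 1 & & & \\
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1 \\
 & & & -1 & 1
\end{pmatrix} .
\]

Also, show that the null spaces of these matrices are as follows: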

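• $\mathcal{N}(\widehat{L}_1) = \mathrm{span}\bigl\{ (1, 1, \ldots, 1)^T, \ (1, -1, 1, -1, \ldots)^T \bigr\}$,

• $\mathcal{N}(\widehat{L}_1^z) = \begin{cases} \text{trivial null space}, & n \text{ even}, \\ \mathrm{span}\{ (1, 0, 1, 0, \ldots)^T \}, & n \text{ odd}, \end{cases}$

• $\mathcal{N}(\widehat{L}_1^r) = \mathrm{span}\bigl\{ (1, 1, \ldots, 1)^T \bigr\}$.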

Why is it dangerous to use $\widehat{L}_1$ and $\widehat{L}_1^z$ in connection with smoothing norms?
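Before attempting the proofs, the stated null vectors can be sanity-checked numerically. The sketch below is a hypothetical helper, not part of Regularization Tools. It assumes unscaled central differences $x_{j+1} - x_{j-1}$, with zero boundary conditions meaning $x_0 = x_{n+1} = 0$ and reflexive boundary conditions meaning $x_0 = x_1$, $x_{n+1} = x_n$.

```python
def central_diff(x, boundary=None):
    """Unscaled central differences x[j+1] - x[j-1].

    boundary=None      -> interior rows only (n - 2 values)
    boundary="zero"    -> pad with x_0 = x_{n+1} = 0 (n values)
    boundary="reflect" -> pad with x_0 = x_1, x_{n+1} = x_n (n values)
    """
    if boundary is None:
        return [x[j + 1] - x[j - 1] for j in range(1, len(x) - 1)]
    if boundary == "zero":
        padded = [0.0] + list(x) + [0.0]
    elif boundary == "reflect":
        padded = [x[0]] + list(x) + [x[-1]]
    else:
        raise ValueError("unknown boundary condition")
    return [padded[j + 1] - padded[j - 1] for j in range(1, len(x) + 1)]

def is_null_vector(x, boundary=None):
    """True if every central difference of x vanishes."""
    return all(d == 0 for d in central_diff(x, boundary))

for n in (5, 6):
    ones = [1.0] * n
    alternating = [(-1.0) ** j for j in range(n)]
    every_other = [1.0 if j % 2 == 0 else 0.0 for j in range(n)]
    print("n =", n)
    print("  no boundary rows: ones", is_null_vector(ones),
          " alternating", is_null_vector(alternating))
    print("  zero boundary:    (1,0,1,0,...)", is_null_vector(every_other, "zero"))
    print("  reflexive:        ones", is_null_vector(ones, "reflect"),
          " alternating", is_null_vector(alternating, "reflect"))
```

Such a check only confirms that particular vectors lie in a null space; showing that they span it, and answering the question above, is still left to the exercise.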

8.2. Periodic Derivative Matrices

Write the derivative matrices for first and second orders, for periodic boundary conditions. What are the null spaces?

8.3. General-Form Tikhonov Regularization

This exercise illustrates that standard-form regularization, corresponding to L = I, is not always a good choice in regularization problems. The test problem here is the inverse Laplace transform, which is implemented in Regularization Tools as the function i_laplace. The integral equation takes the form

\[
\int_0^{\infty} \exp(-s\, t) \, f(t) \, dt = g(s), \qquad 0 \le s \le \infty ,
\]

with right-hand side $g$ and solution $f$ given by

\[
g(s) = \frac{1}{s} - \frac{1}{s + 1/2} , \qquad f(t) = 1 - \exp(-t/2) .
\]

Use the call [A,b,x] = i_laplace(n,2) to produce this test problem. Also generate the first derivative smoothing matrix $L_1$ from (8.4) by means of L = get_l(n,1).

Add Gaussian white noise with standard deviation $\eta = 10^{-4}$ to the right-hand side. Then compute ordinary Tikhonov solutions and vary the regularization parameter $\lambda$ until the noise starts to dominate $x_\lambda$, and plot the solutions.

Notice that none of these solutions are able to reconstruct the exact solution very well in the rightmost part of the plot.

Finally, compute the general-form Tikhonov solutions by means of the function tikhonov. You should use the calls

[U,sm,X] = cgsvd(A,L);

Xreg = tikhonov(U,sm,X,b,lambda);

Again, vary $\lambda$ until the noise starts to dominate $x_{L,\lambda}$. This approach should be able to yield better approximations to the exact solution. Study the right generalized singular vectors $x_i$ and explain why.

8.4. Preconditioned CGLS

This is another example which illustrates that $L = I_n$ is not always a good choice in regularization problems. We use the second derivative test problem deriv2(n) from Regularization Tools (see Exercise 2.3), which requires an L matrix that approximates the first derivative operator.

Generate the test problem deriv2(n,2) with n = 100, add noise to the right-hand side with relative noise level rnl = $10^{-3}$, and study the Picard plot.

According to this plot, the problem looks innocent; nothing here reveals that the choice $L = I_n$ might be bad.

Then compute the TSVD solutions and plot them. Notice that none of them are able to reconstruct the exact solution very well in the rightmost part of the plot. This is due to the particular structure in the right singular vectors $v_i$; plot some of these vectors and explain which structure hurts the reconstruction.

Now generate the matrix $L_1$ from (8.4) and a basis W for its null space by means of the call [L,W] = get_l(n,1). Then use the preconditioned CGLS method, implemented in the function pcgls, to compute the regularized solution with the smoothing norm $\| L_1 \cdot \|_2$. This approach should produce better reconstructions.

Study the generalized right singular vectors $x_i$ and explain why.

Appendix A