Bilevel parameter learning in image analysis

4.5 Examples

4.5.2 Bilevel parameter learning in image analysis

In this subsection, we consider the Itoh–Abe method for solving bilevel optimisation problems for the learning of parameters of variational imaging problems. We restrict our focus to denoising problems, although the same method could be applied to any inverse problem. We first consider one-dimensional bilevel problems with wavelet and TV denoising, and two- dimensional problems with TGV denoising. In the TGV case, we compare the randomised Itoh–Abe method to the Py-BOBYQA and LT-MADS methods. Throughout this section, we set M = n, where n = 1, 2.

Setup for variational regularisation problem

Consider an image x†∈ L2_{(Ω), for some domain Ω ⊂ R}2_{, and a noisy image}

4.5 Examples 91

To recover a clean image from the noisy one, we consider a parametrised family of regularisers,

R_ϑ : L2(Ω) → [0, ∞] : ϑ ∈ [0, ∞)no, and solve the variational regularisation problem

x_ϑ ∈ arg min x 1 2∥x − f δ_∥2_{+ R} ϑ(x). (4.8)

We list some common regularisers in image analysis. Total variation (TV) [36, 196] is given by the function Rϑ(x) := ϑ TV(x), where ϑ ∈ [0, ∞), and

TV(x) := sup Z

Ω

x(y) div φ (y) dy : φ ∈ C_c1(Ω; Rd), ∥φ ∥∞≤ 1

This is one of the most common regularisers for image denoising. See Figure 4.5 for an ex- ample of denoising with TV regularisation. We also consider its second-order generalisation, total generalised variation[29, 28], Rϑ(x) = TGV2ϑ(x), where ϑ = [ϑ1, ϑ2]

T _{∈ [0, ∞)}2_and

TGV2_ϑ(x) := sup

Ω

x(y) div2φ (y) dy : φ ∈ C_c2(Ω; Sym2(Rd)), ∥ divlφ ∥∞≤ ϑl+1, l = 0, 1

Recall that we defined the discrete variants of TV and TGV in Chapter 1. Recall that for a linear operator W on L2(Ω), the basis pursuit regulariser

R_ϑ(x) := ϑ ∥W x∥1

promotes sparsity of the image x in the dictionary of W .

As illustrated in Figure 4.5, the quality of the reconstruction is sensitive to ϑ . If ϑ is too low, the reconstruction is too noisy, while if ϑ is too high, too much detail is removed. As it is generally not possible to ascertain the optimal choice of ϑ a priori, a significant amount of time and effort is spent on parameter tuning. It is therefore of interest to improve our understanding of optimal parameter choices. One approach is to learn suitable parameters from training data. This requires a desired reconstruction x†, noisy data fδ_{, and a scoring}

function Φ : L2_{(Ω) → R that measures the error between x}†and the reconstruction x_ϑ. The bilevel optimisation problem is given by

ϑ∗∈ arg min

ϑ ∈[0,∞)n

In our case, we have strong convexity in the data fidelity term, which implies that x_ϑ is unique for each ϑ ∈ [0, ∞)n. We can therefore define a mapping

F(ϑ ) := Φ(xϑ). (4.10)

The bilevel problem (4.9) is difficult to tackle, both analytically as well as numerically. In most cases, the lower level problem (4.8) does not have a closed form formulation. Instead, a reconstruction x_ϑ is approximated numerically with an algorithm.

For the numerical experiments in this chapter, we reparametrise F(ϑ ) as F(exp(ϑ )), where the exponential operator is applied elementwise on the parameters. There are two reasons for doing so. The first reason is that this chapter is concerned with unconstrained optimisation, and this parametrisation allows us to optimise on Rn instead of [0, ∞)n. The second reason is that exp(ϑ ) has been found to be a preferable scaling for purposes of numerical optimisation. Note that in Chapter 5 we extend the Itoh–Abe optimisation framework to nonsmooth, nonconvex, constrained optimisation problems.

Wavelet denoising

We consider the wavelet denoising problem

x_ϑ = arg min x∈L2_(Ω) 1 2∥x − f δ_∥2_{+ ϑ ∥W x∥} 1,

where W is a wavelet transform. In particular, W is an orthonormal basis, which implies that the regularisation problem has the unique solution

x_ϑ = W−1S(W fδ_{, ϑ ),}

where S is the shrinkage operator given in (1.5). We first optimise ϑ for the scoring function

Φ(x) :=1 2∥x − x

†_∥2_.

We set the parameters of the Itoh–Abe method to ε = 10−4, τmin= 10−1, τmax= 10, and

4.5 Examples 93 103 ₁₀2 ₁₀1 ₁₀0 ₁₀1 ₁₀2 ₁₀3 103 104 F ( ) 0 1 2 3 9

(a) Plot with labels. (b) k = 0. ϑ = 1.50 × 102. (c) k = 1. ϑ = 1.02.

(d) k = 2. ϑ = 4.42 × 10−3. (e) k = 3. ϑ = 1.99 × 10−1. (f) k = 9. ϑ = 1.04 × 10−1.

Fig. 4.6 Wavelet denoising with L2scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising results at different iterates k.

We also optimise ϑ with respect to the scoring function Φ(x) := 1 − SSIM(x, x†), where SSIM is the structural similarity function [218]

SSIM(x, y) := (2µxµy+ c)(2σxy+C) (µ2

x+ µy2+ c)(σx2+ σy2+C)

Here µxis the mean intensity of x, σx is the unbiased estimate of the standard deviation of x,

and σxy is the correlation coefficient between x and y:

µx:= 1 m m

∑

i=1 x_i, σx:= 1 m− 1 m

∑

i=1 (x_i− µx)2 !1₂ , σxy:= 1 m− 1 m

∑

i=1 (x_i− µx)(yi− µy).

We set the parameters of the Itoh–Abe method to ε = 10−4, τmin= 10−3, τmax= 103, and

η = 10−2. See Figure 4.7 for the numerical results.

Total variation denoising

We consider the TV denoising problem

x_ϑ = arg min

x∈L2_(Ω)

1 2∥x − f

103 ₁₀2 ₁₀1 ₁₀0 ₁₀1 ₁₀2 100 2 × 101 3 × 101 4 × 101 6 × 101 F ( ) 0 2 4 5 8

(a) Plot with labels. (b) k = 0. ϑ = 10.0. (c) k = 2. ϑ = 4.71.

(d) k = 4. ϑ = 1.15. (e) k = 5. ϑ = 6.23 × 10−2. (f) k = 8. ϑ = 1.54 × 10−1.

Fig. 4.7 Wavelet denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising result at different iterates k.

with the SSIM scoring function. We solve the above denoising problem using the PDHG method [50]. We set the parameters of the Itoh–Abe method to ε = 10−4, τmin = 10−5,

τmax= 9 × 10−4, and η = 10−5. See Figure 4.8 for the numerical results.

Total generalised variation regularisation

We now consider the second-order total generalised variation (TGV) regulariser for denoising, R_ϑ₁_,ϑ₂(x) = TGV2_ϑ

1,ϑ2(x), with the scoring function

Φ(x) := 1 − SSIM(x, x†).

Like for TV denoising, we solve the denoising problem using the PDHG method. We set the parameters of the randomised Itoh–Abe (RIA) method to ε = 10−1, τmin= 10−3, τmax= 105,

and η = 10−20. See Figure 4.9 for the numerical results.

We compare these results to the results from the Py-BOBYQA and LT-MADS solvers. We set the parameters of Py-BOBYQA to rhobeg = 2, rhoend = 10−10and npt = 2(n + 1) and the parameters of LT-MADS to DIRECTION_TYPE = LT 2N. See the results for two different starting points in Figure 4.10 and 4.11. We note that the objective function is approximately stationary across a range of values, which leads to the different points of

4.5 Examples 95 104 102 100 ₁₀2 3 × 101 4 × 101 6 × 101 F ( ) 0 1 2 5

(a) Plot with labels. _{(b) k = 0. ϑ = 2.00 × 10}−1_.

Fig. 4.8 TV denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising result at different iterates k, with a zoom to show the difference.

1 03 1 02 1 01 1 00 1 01 1 02 1 03 2 1 03 1 02 1 01 1 00 1 01 1 02 1 03 1 1 8 0 6 1 0 2 9 0 .2 0 .3 0 .4 0 .5 0 .6 0 .7 (a) (b) j = 0, ϑ1= 6.70 × 10−3, ϑ2= 6.10 × 10−1. (c) j = 6, ϑ1= 5.96, ϑ2= 25.5. (d) j = 10, ϑ1= 4.09 × 10−1, ϑ2= 10.4. (e) j = 18, ϑ1= 1.43 × 10−1, ϑ2= 7.99 × 10−1. (f) j = 29, ϑ1= 8.87 × 10−2, ϑ2= 1.55.

Fig. 4.9 TGV denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the method. The rest: Image denoising result at different function evaluations j.

4.6 Conclusion and outlook 97

In document Geometric numerical integration for optimisation (Page 116-123)