4.5 Examples
4.5.2 Bilevel parameter learning in image analysis
In this subsection, we consider the Itoh–Abe method for solving bilevel optimisation problems for the learning of parameters of variational imaging problems. We restrict our focus to denoising problems, although the same method could be applied to any inverse problem. We first consider one-dimensional bilevel problems with wavelet and TV denoising, and two- dimensional problems with TGV denoising. In the TGV case, we compare the randomised Itoh–Abe method to the Py-BOBYQA and LT-MADS methods. Throughout this section, we set M = n, where n = 1, 2.
Setup for variational regularisation problem
Consider an image x†∈ L2(Ω), for some domain Ω ⊂ R2, and a noisy image
4.5 Examples 91
To recover a clean image from the noisy one, we consider a parametrised family of regularis- ers,
n
Rϑ : L2(Ω) → [0, ∞] : ϑ ∈ [0, ∞)no, and solve the variational regularisation problem
xϑ ∈ arg min x 1 2∥x − f δ∥2+ R ϑ(x). (4.8)
We list some common regularisers in image analysis. Total variation (TV) [36, 196] is given by the function Rϑ(x) := ϑ TV(x), where ϑ ∈ [0, ∞), and
TV(x) := sup Z
Ω
x(y) div φ (y) dy : φ ∈ Cc1(Ω; Rd), ∥φ ∥∞≤ 1
.
This is one of the most common regularisers for image denoising. See Figure 4.5 for an ex- ample of denoising with TV regularisation. We also consider its second-order generalisation, total generalised variation[29, 28], Rϑ(x) = TGV2ϑ(x), where ϑ = [ϑ1, ϑ2]
T ∈ [0, ∞)2and
TGV2ϑ(x) := sup
Z
Ω
x(y) div2φ (y) dy : φ ∈ Cc2(Ω; Sym2(Rd)), ∥ divlφ ∥∞≤ ϑl+1, l = 0, 1
.
Recall that we defined the discrete variants of TV and TGV in Chapter 1. Recall that for a linear operator W on L2(Ω), the basis pursuit regulariser
Rϑ(x) := ϑ ∥W x∥1
promotes sparsity of the image x in the dictionary of W .
As illustrated in Figure 4.5, the quality of the reconstruction is sensitive to ϑ . If ϑ is too low, the reconstruction is too noisy, while if ϑ is too high, too much detail is removed. As it is generally not possible to ascertain the optimal choice of ϑ a priori, a significant amount of time and effort is spent on parameter tuning. It is therefore of interest to improve our understanding of optimal parameter choices. One approach is to learn suitable parameters from training data. This requires a desired reconstruction x†, noisy data fδ, and a scoring
function Φ : L2(Ω) → R that measures the error between x†and the reconstruction xϑ. The bilevel optimisation problem is given by
ϑ∗∈ arg min
ϑ ∈[0,∞)n
In our case, we have strong convexity in the data fidelity term, which implies that xϑ is unique for each ϑ ∈ [0, ∞)n. We can therefore define a mapping
F(ϑ ) := Φ(xϑ). (4.10)
The bilevel problem (4.9) is difficult to tackle, both analytically as well as numerically. In most cases, the lower level problem (4.8) does not have a closed form formulation. Instead, a reconstruction xϑ is approximated numerically with an algorithm.
For the numerical experiments in this chapter, we reparametrise F(ϑ ) as F(exp(ϑ )), where the exponential operator is applied elementwise on the parameters. There are two reasons for doing so. The first reason is that this chapter is concerned with unconstrained optimisation, and this parametrisation allows us to optimise on Rn instead of [0, ∞)n. The second reason is that exp(ϑ ) has been found to be a preferable scaling for purposes of numer- ical optimisation. Note that in Chapter 5 we extend the Itoh–Abe optimisation framework to nonsmooth, nonconvex, constrained optimisation problems.
Wavelet denoising
We consider the wavelet denoising problem
xϑ = arg min x∈L2(Ω) 1 2∥x − f δ∥2+ ϑ ∥W x∥ 1,
where W is a wavelet transform. In particular, W is an orthonormal basis, which implies that the regularisation problem has the unique solution
xϑ = W−1S(W fδ, ϑ ),
where S is the shrinkage operator given in (1.5). We first optimise ϑ for the scoring function
Φ(x) :=1 2∥x − x
†∥2.
We set the parameters of the Itoh–Abe method to ε = 10−4, τmin= 10−1, τmax= 10, and
4.5 Examples 93 103 102 101 100 101 102 103 103 104 F ( ) 0 1 2 3 9
(a) Plot with labels. (b) k = 0. ϑ = 1.50 × 102. (c) k = 1. ϑ = 1.02.
(d) k = 2. ϑ = 4.42 × 10−3. (e) k = 3. ϑ = 1.99 × 10−1. (f) k = 9. ϑ = 1.04 × 10−1.
Fig. 4.6 Wavelet denoising with L2scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising results at different iterates k.
We also optimise ϑ with respect to the scoring function Φ(x) := 1 − SSIM(x, x†), where SSIM is the structural similarity function [218]
SSIM(x, y) := (2µxµy+ c)(2σxy+C) (µ2
x+ µy2+ c)(σx2+ σy2+C)
.
Here µxis the mean intensity of x, σx is the unbiased estimate of the standard deviation of x,
and σxy is the correlation coefficient between x and y:
µx:= 1 m m
∑
i=1 xi, σx:= 1 m− 1 m∑
i=1 (xi− µx)2 !12 , σxy:= 1 m− 1 m∑
i=1 (xi− µx)(yi− µy).We set the parameters of the Itoh–Abe method to ε = 10−4, τmin= 10−3, τmax= 103, and
η = 10−2. See Figure 4.7 for the numerical results.
Total variation denoising
We consider the TV denoising problem
xϑ = arg min
x∈L2(Ω)
1 2∥x − f
103 102 101 100 101 102 100 2 × 101 3 × 101 4 × 101 6 × 101 F ( ) 0 2 4 5 8
(a) Plot with labels. (b) k = 0. ϑ = 10.0. (c) k = 2. ϑ = 4.71.
(d) k = 4. ϑ = 1.15. (e) k = 5. ϑ = 6.23 × 10−2. (f) k = 8. ϑ = 1.54 × 10−1.
Fig. 4.7 Wavelet denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising result at different iterates k.
with the SSIM scoring function. We solve the above denoising problem using the PDHG method [50]. We set the parameters of the Itoh–Abe method to ε = 10−4, τmin = 10−5,
τmax= 9 × 10−4, and η = 10−5. See Figure 4.8 for the numerical results.
Total generalised variation regularisation
We now consider the second-order total generalised variation (TGV) regulariser for denoising, Rϑ1,ϑ2(x) = TGV2ϑ
1,ϑ2(x), with the scoring function
Φ(x) := 1 − SSIM(x, x†).
Like for TV denoising, we solve the denoising problem using the PDHG method. We set the parameters of the randomised Itoh–Abe (RIA) method to ε = 10−1, τmin= 10−3, τmax= 105,
and η = 10−20. See Figure 4.9 for the numerical results.
We compare these results to the results from the Py-BOBYQA and LT-MADS solvers. We set the parameters of Py-BOBYQA to rhobeg = 2, rhoend = 10−10and npt = 2(n + 1) and the parameters of LT-MADS to DIRECTION_TYPE = LT 2N. See the results for two different starting points in Figure 4.10 and 4.11. We note that the objective function is approximately stationary across a range of values, which leads to the different points of
4.5 Examples 95 104 102 100 102 3 × 101 4 × 101 6 × 101 F ( ) 0 1 2 5
(a) Plot with labels. (b) k = 0. ϑ = 2.00 × 10−1.
(c) k = 1. ϑ = 3.71 × 10−3. (d) k = 2. ϑ = 4.12 × 10−2. (e) k = 5. ϑ = 2.31 × 10−2.
Fig. 4.8 TV denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the Itoh–Abe method. The rest: Image denoising result at different iterates k, with a zoom to show the difference.
1 03 1 02 1 01 1 00 1 01 1 02 1 03 2 1 03 1 02 1 01 1 00 1 01 1 02 1 03 1 1 8 0 6 1 0 2 9 0 .2 0 .3 0 .4 0 .5 0 .6 0 .7 (a) (b) j = 0, ϑ1= 6.70 × 10−3, ϑ2= 6.10 × 10−1. (c) j = 6, ϑ1= 5.96, ϑ2= 25.5. (d) j = 10, ϑ1= 4.09 × 10−1, ϑ2= 10.4. (e) j = 18, ϑ1= 1.43 × 10−1, ϑ2= 7.99 × 10−1. (f) j = 29, ϑ1= 8.87 × 10−2, ϑ2= 1.55.
Fig. 4.9 TGV denoising with SSIM scoring function and the Itoh–Abe method. Top left: Plot of iterates of the method. The rest: Image denoising result at different function evaluations j.
4.6 Conclusion and outlook 97