3.9 Numerical experiments
3.9.6 Comparison of methods for solving the discrete gradient equation
We test the numerical performance of four methods for solving the discrete gradient equation (2.8), building on the fixed point theory in Section 3.4.
Fig. 3.8 Comparison of CD to DG for three values of ε: $10^{-2}$, $10^{-4}$, and $10^{-8}$ (one panel per value, relative objective against iterates). The time steps are set to $\tau_{\mathrm{DG}} = \sqrt{\tau_{\mathrm{CD}}}$, where the latter time step is set to $1/L$.
The first method, denoted F, consists of the fixed point updates (3.10) proposed in [164] (θ = 1). The second method, denoted R, is the relaxed fixed point method (3.11), where θ is optimised according to (3.12) if F is convex, and is otherwise set to 1/2. The third method, denoted F+R, also uses the updates (3.11) with θ = 1 by default, but whenever the discrepancy $\|T(y^{k+1}) - y^{k+1}\|$ is greater than $\|T(y^k) - y^k\|$, the update is repeated with θ set to half its previous value. This third option might be desirable in cases where θ = 1 is expected to give faster convergence but may also be unstable. The fourth method is the built-in solver scipy.optimize.fsolve in Python.
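As a rough illustration of the F, R and F+R variants described above, the following Python sketch applies a relaxed fixed point iteration to a generic map T. The update form y ← (1 − θ)y + θT(y), the function names, the residual-based stopping rule, and the choice to keep a halved θ for all later iterations are assumptions made for this example; they are not taken from (3.10)–(3.12).

```python
import math


def residual(T, y):
    """Euclidean norm of T(y) - y, used here as the discrepancy."""
    return math.sqrt(sum((t - c) ** 2 for t, c in zip(T(y), y)))


def relaxed_fixed_point(T, y0, theta=1.0, adaptive=True,
                        tol=1e-10, max_iter=500, min_theta=1e-8):
    """Iterate y <- (1 - theta) * y + theta * T(y) (assumed relaxed update).

    theta=1, adaptive=False : plain fixed point iteration (like F)
    theta<1, adaptive=False : relaxed iteration with fixed theta (like R)
    theta=1, adaptive=True  : halve theta whenever the discrepancy
                              increases, then repeat the update (like F+R)
    Returns the final iterate and the number of iterations used.
    """
    y = list(y0)
    res = residual(T, y)
    for k in range(max_iter):
        if res < tol:
            return y, k
        Ty = T(y)
        while True:
            y_new = [(1.0 - theta) * c + theta * t for c, t in zip(y, Ty)]
            res_new = residual(T, y_new)
            # Accept the step unless the discrepancy grew (adaptive mode only).
            if not adaptive or res_new <= res or theta < min_theta:
                break
            theta *= 0.5  # assumption: the halved theta is kept afterwards
        y, res = y_new, res_new
    return y, max_iter


# Toy map with fixed point 0.4; the plain iteration diverges since |-1.5| > 1.
T = lambda y: [-1.5 * y[0] + 1.0]
y_adaptive, n_adaptive = relaxed_fixed_point(T, [0.0])
y_plain, _ = relaxed_fixed_point(T, [0.0], adaptive=False, max_iter=50)
```

On this toy map the adaptive variant halves θ once and then contracts towards 0.4, whereas the plain iteration with θ = 1 moves away from the fixed point, which mirrors the trade-off between speed and stability discussed above.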
To test these methods, we performed 50 iterations of the discrete gradient method for different test problems, where at each iterate the discrete gradient solver would run until
$$\|r^k\|_\infty < \varepsilon, \quad \text{where } r_i^k := \frac{y_i^k - y_i^{k-1}}{y_i^{k-1}} \text{ if } y_i^{k-1} \neq 0, \text{ and } r_i^k := y_i^k \text{ otherwise},$$
for a specified tolerance ε > 0, or until k reaches a given maximum $K_{\max}$. We then compare
Fig. 3.9 Comparison of different time steps for CD vs fixed time step for DG (relative objective against iterates; DG with step 0.0250, CD with steps 0.0002, 0.0025, 0.0250 and 0.2500). For smaller time steps, the CD iterates decrease too slowly, and for larger steps, they become unstable and fail to decrease.
Fig. 3.10 Comparison of DG to simple backtracking line search (LS) in terms of coordinate evaluations.
point for a significant number of the iterations (> 10%), we consider the method inapplicable for that test problem.
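The stopping rule $\|r^k\|_\infty < \varepsilon$ used by the inner solver can be sketched in a few lines of Python. The function names below are hypothetical and chosen only for this illustration; iterates are represented as plain lists of floats.

```python
def relative_change(y_new, y_old):
    """Infinity norm of r, where r_i = (y_new_i - y_old_i) / y_old_i
    if y_old_i != 0, and r_i = y_new_i otherwise."""
    return max(
        abs((a - b) / b) if b != 0 else abs(a)
        for a, b in zip(y_new, y_old)
    )


def has_converged(y_new, y_old, eps):
    """True once the relative change drops below the tolerance eps."""
    return relative_change(y_new, y_old) < eps


# Second component of the old iterate is zero, so r_2 is just y_new_2.
example = relative_change([1.1, 0.5], [1.0, 0.0])
```

In a solver loop, `has_converged` would be checked after each update, together with the cap on the iteration count.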
We test the methods for the mean value discrete gradient applied to three of the previous test problems, for ε = $10^{-6}$ and $10^{-12}$. We have not included results for the Gonzalez discrete gradient and other tolerances, as the results were largely the same.
The results are given in Table 3.2. We see that R is superior in stability, being the only method that locates the minimiser in every case. In all cases, R or F+R were the most efficient or close to the most efficient method. However, the relative performance of the different methods varies notably for the different test problems. This suggests that optimising for θ would require it to be tuned according to the optimisation problem, e.g. by an initial line search procedure.
Table 3.2 Average CPU time (s) over 50 iterations of (2.8) with the mean value discrete gradient, for tolerances ε = $10^{-6}$ and $10^{-12}$.

ε            Test problem                  F              R        F+R            fsolve
$10^{-6}$    Linear system (3.28)          N/A (0.003)    0.006    0.002          0.190
$10^{-6}$    Logistic regression (3.29)    0.001          0.016    0.001          N/A (0.054)
$10^{-6}$    Nonconvex problem (3.30)      N/A (0.019)    0.003    N/A (0.020)    N/A (0.427)
$10^{-12}$   Linear system (3.28)          N/A (0.011)    0.012    0.005          0.206
$10^{-12}$   Logistic regression (3.29)    0.055          0.037    0.019          N/A (0.076)
$10^{-12}$   Nonconvex problem (3.30)      N/A (0.033)    0.005    N/A (0.031)    0.513
3.10 Conclusion and outlook
In this chapter, we have studied the discrete gradient method for optimisation, and provided several fundamental results on well-posedness, convergence rates and optimal time steps. We have focused on four methods, using the Gonzalez discrete gradient, the mean value discrete gradient, the Itoh–Abe discrete gradient, and a randomised version of the Itoh–Abe method. Several of the proven convergence rates match the optimal rates of classical methods such as gradient descent and stochastic coordinate descent. For the Itoh–Abe discrete gradient method, the proven rates are better than previously established rates for comparable methods, i.e. cyclic coordinate descent methods [221].
There are open problems to be addressed in future work. First, similar to acceleration for gradient descent and coordinate descent [15, 157, 159, 221], we will study acceleration of the discrete gradient method to improve the convergence rate from $O(1/k)$ to $O(1/k^2)$.
Chapter 4
Discrete gradient methods for nonsmooth, nonconvex optimisation
4.1 Introduction
This chapter is based on the preprint [184], and is joint work with Matthias J. Ehrhardt, G. R. W. Quispel, and Carola-Bibiane Schönlieb.
In the previous chapter, we studied and provided analysis for discrete gradient methods in the continuously differentiable setting. In this chapter, we switch the focus to nonsmooth, nonconvex optimisation problems.
Thus we consider the unconstrained problem
$$\min_{x \in \mathbb{R}^n} F(x), \tag{4.1}$$
where the objective function F is locally Lipschitz continuous, bounded below and coercive. The function may be nonconvex and nonsmooth, and we assume no knowledge besides point evaluations $x \mapsto F(x)$. To solve (4.1), we consider generalised Itoh–Abe type methods, namely the randomised Itoh–Abe methods studied in Chapter 3, as well as a deterministic variant. In this chapter, we therefore seek to extend discrete gradient methods from the differentiable setting to the nonsmooth setting.
Itoh–Abe methods
We recall the Itoh–Abe scalar update (3.4), defined via
$$x^{k+1} = x^k - \tau_k \alpha_k d^{k+1}, \quad \text{where } \alpha_k \neq 0 \text{ solves } \alpha_k = -\frac{F(x^k - \tau_k \alpha_k d^{k+1}) - F(x^k)}{\tau_k \alpha_k}.$$
We thus refer to $\alpha_k$ as the implicit solution to this scalar equation, and consider the following algorithm.
Algorithm 1 Generalised Itoh–Abe method
Input: starting point $x^0$, directions $(d^k)_{k \in \mathbb{N}}$, time steps $(\tau_k)_{k \in \mathbb{N}}$.
for k = 0, 1, 2, ... do
    Update $x^{k+1} = x^k - \tau_k \alpha_k d^{k+1}$ via (3.4)
end for
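A minimal Python sketch of Algorithm 1 is given below, using only point evaluations of F. Multiplying the scalar equation by $\tau_k \alpha_k$ shows that a solution satisfies $\tau_k \alpha_k^2 + F(x^k - \tau_k \alpha_k d^{k+1}) - F(x^k) = 0$, so that $F(x^{k+1}) - F(x^k) = -\tau_k \alpha_k^2$. The sketch looks for a nonzero root of this expression by probing both signs, expanding a bracket, and bisecting. This root-finding strategy, the constant time step, the fallback to α = 0 when no bracket is found, and all function and parameter names are assumptions made for illustration, not the procedure used in the thesis.

```python
def itoh_abe_step(F, x, d, tau, probe=1e-6, max_expand=60, n_bisect=100):
    """One scalar Itoh-Abe update along direction d.

    Seeks alpha != 0 with g(alpha) = tau*alpha^2 + F(x - tau*alpha*d) - F(x) = 0.
    Returns (x_new, alpha); falls back to (x, 0.0) if no root is bracketed.
    """
    Fx = F(x)

    def g(alpha):
        trial = [xi - tau * alpha * di for xi, di in zip(x, d)]
        return tau * alpha * alpha + F(trial) - Fx

    # Pick the sign of alpha for which F initially decreases (g < 0).
    for sign in (1.0, -1.0):
        lo = sign * probe
        if g(lo) < 0.0:
            break
    else:
        return list(x), 0.0

    # Expand until g changes sign; g grows like tau*alpha^2 if F is bounded below.
    for _ in range(max_expand):
        hi = 2.0 * lo
        if g(hi) >= 0.0:
            break
        lo = hi
    else:
        return list(x), 0.0

    # Bisection, keeping g(lo) < 0 <= g(hi).
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return [xi - tau * alpha * di for xi, di in zip(x, d)], alpha


def generalised_itoh_abe(F, x0, direction, tau, n_iter):
    """Algorithm 1 with a constant time step; direction(k) returns d^{k+1}."""
    x = list(x0)
    for k in range(n_iter):
        x, _ = itoh_abe_step(F, x, direction(k), tau)
    return x


# Smooth example: F(x) = x^2 / 2 from x = 1 with tau = 1 gives alpha = 2/3.
x_quad, alpha_quad = itoh_abe_step(lambda x: 0.5 * x[0] ** 2, [1.0], [1.0], 1.0)

# Nonsmooth example: F(x) = |x| from x = 1 with tau = 1 lands at the kink.
x_abs, alpha_abs = itoh_abe_step(lambda x: abs(x[0]), [1.0], [1.0], 1.0)

# Cyclic coordinate directions on a two-dimensional quadratic.
F2 = lambda x: 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2)
cyclic = lambda k: [1.0, 0.0] if k % 2 == 0 else [0.0, 1.0]
x_final = generalised_itoh_abe(F2, [1.0, 1.0], cyclic, 1.0, 20)
```

For the quadratic example, the scalar equation reduces to $1.5\alpha^2 - \alpha = 0$, whose nonzero root is $\alpha = 2/3$. A randomised variant would draw each direction at random instead of cycling through the coordinates.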