Contributions - Geometric numerical integration for optimisation

Their numerical results suggest that this method is competitive with state-of-the-art methods for nonconvex variational optimisation problems.

Going beyond gradient flows in Euclidean space, Celledoni et al. [44] extend the discrete gradient method to solve Riemannian gradient flow systems on manifolds. In this setting, they prove that the iterates of the method converge to a set of stationary points. They apply the method to solve eigenvalue problems, as well as imaging problems that can naturally be formulated on manifolds.

For some reviews of the field of geometric numerical integration, we refer the reader to [105, 115, 147].

In summary, numerical integration has in recent years shown great promise in providing new perspectives on, and frameworks for, challenges in optimisation. As we detail in the next section, we are interested in a range of optimisation problems, including those with nonsmooth energies, and in the application of discrete gradient methods from geometric numerical integration to this setting.

1.2 Contributions

In what follows, we summarise the motivations for and contributions of each chapter.

Chapter 3: The foundations of discrete gradient methods for smooth optimisation

This chapter is based on the preprint [81], which is joint work with Matthias J. Ehrhardt, Torbjørn Ringholm, and Carola-Bibiane Schönlieb. Its purpose is to provide a comprehensive analysis of discrete gradient methods for the optimisation of continuously differentiable functions. While these methods have already been applied in various contexts, such as variational regularisation problems [100, 185], linear systems [153], and the preservation of Lyapunov functions [108], several aspects of their theoretical analysis have until now been lacking.
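For orientation, the standard setting can be summarised as follows, in the common notation of the discrete gradient literature (which may differ from the notation of (2.8)). A discrete gradient of a continuously differentiable V satisfies a mean value property and is consistent with the gradient; the resulting implicit scheme dissipates V for every time step τ_k > 0, which follows by substituting the update into the mean value property. The last line gives the Itoh–Abe discrete gradient, which uses only differences of function values.

```latex
\begin{aligned}
&\langle \overline{\nabla} V(x, y),\, y - x \rangle = V(y) - V(x),
  \qquad \overline{\nabla} V(x, x) = \nabla V(x), \\
&x^{k+1} = x^k - \tau_k\, \overline{\nabla} V(x^k, x^{k+1})
  \;\Longrightarrow\;
  V(x^{k+1}) - V(x^k) = -\frac{1}{\tau_k}\, \| x^{k+1} - x^k \|^2, \\
&\big(\overline{\nabla} V(x, y)\big)_i
  = \frac{V(y_1, \dots, y_i, x_{i+1}, \dots, x_n) - V(y_1, \dots, y_{i-1}, x_i, \dots, x_n)}{y_i - x_i}
  \quad \text{(Itoh--Abe, for } y_i \neq x_i\text{)}.
\end{aligned}
```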

In this chapter, we address several of these gaps, including convergence rates of the methods, well-posedness of the discrete gradient equation (2.8), and how to solve (2.8) efficiently. In particular, in Theorem 3.4 we prove for the three main discrete gradient methods that the discrete gradient equation admits a solution for all time steps. Furthermore, we prove that the discrete gradient methods essentially inherit the convergence rates of explicit gradient descent, yielding O(1/k) rates for convex functions and linear rates for strongly convex functions. We also propose a novel scheme for solving the discrete gradient equation, which we demonstrate to be theoretically and numerically superior in certain cases. Finally, we propose and study a natural generalisation of the Itoh–Abe discrete gradient method, akin to randomised coordinate descent and random pursuit methods. The theory is supported by numerical experiments.
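The Itoh–Abe method becomes particularly transparent for a quadratic V(x) = ½xᵀAx − bᵀx with A symmetric positive definite (an example chosen here for illustration, not taken from the chapter). Updating one coordinate at a time, the discrete gradient equation reduces to the scalar equation V(z + a e_i) − V(z) = −a²/τ, which for a quadratic has a closed-form nonzero root, so no nonlinear solver is needed. The function names `energy` and `itoh_abe_sweep` and the data below are hypothetical.

```python
import numpy as np


def energy(A, b, x):
    """Quadratic objective V(x) = 0.5 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x


def itoh_abe_sweep(A, b, x, tau):
    """One cyclic sweep of the Itoh-Abe discrete gradient method for V.

    Along coordinate i, V(z + a e_i) - V(z) = a * g_i + 0.5 * a**2 * A_ii with
    g_i = (A z - b)_i. Setting this equal to -a**2 / tau gives the nonzero root
    a = -g_i / (A_ii / 2 + 1 / tau).
    """
    z = np.array(x, dtype=float)  # work on a copy
    for i in range(z.size):
        g_i = A[i] @ z - b[i]  # uses the already-updated coordinates
        z[i] -= g_i / (0.5 * A[i, i] + 1.0 / tau)
    return z


A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 0.0])
x = np.array([2.0, -3.0])
energies = [energy(A, b, x)]
for _ in range(100):
    x = itoh_abe_sweep(A, b, x, tau=1.0)
    energies.append(energy(A, b, x))
```

Rewriting the update as z_i ← z_i − ω_i g_i / A_ii gives ω_i = 2τA_ii / (τA_ii + 2), which lies in (0, 2) for every τ > 0. In this special case the sweep is therefore a relaxed Gauss–Seidel (SOR-type) sweep, consistent with the convergence range of SOR for symmetric positive definite systems. For a general V, the scalar equation has no closed form and must be solved numerically.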

Chapter 4: Discrete gradient methods for nonsmooth, nonconvex optimisation

This chapter is based on the preprint [184], which is joint work with Matthias J. Ehrhardt, G. R. W. Quispel, and Carola-Bibiane Schönlieb. In this chapter, we consider the Itoh–Abe discrete gradient method for solving nonsmooth, nonconvex optimisation problems. Since this discrete gradient is derivative-free, it provides a notion of gradient flow-type dissipation in a black-box setting where only function evaluations are available.
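To make the derivative-free aspect concrete, here is a minimal sketch of an Itoh–Abe-type update along a unit direction d. It is an illustration under our own assumptions: the function names, the objective, and the bracketing-plus-bisection root finder are hypothetical, not the chapter's solver. The step a ≠ 0 is sought from the scalar equation f(x + a d) − f(x) = −a²/τ using only function values, and directions are drawn uniformly from the unit sphere. The bisection keeps the lower end of the bracket, so every accepted step decreases f by at least a²/τ for the returned a, even though the root is only approximated.

```python
import numpy as np


def itoh_abe_direction_step(f, x, d, tau, h=1e-7, rtol=1e-10, max_iter=200):
    """Derivative-free Itoh-Abe-type update of x along the unit vector d.

    Seeks a != 0 with f(x + a d) - f(x) = -a**2 / tau, using only evaluations
    of f. Returns x unchanged if no decrease is detected along +d or -d.
    """
    fx = f(x)

    def g(a):
        return f(x + a * d) - fx + a**2 / tau

    # Choose the sign of the step so that g < 0 for a small step of size h.
    if g(h) < 0:
        s = 1.0
    elif g(-h) < 0:
        s = -1.0
    else:
        return x
    # Expand the bracket [lo, hi] until g changes sign (f assumed bounded below).
    lo, hi = h, 2.0 * h
    for _ in range(max_iter):
        if g(s * hi) >= 0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return x + s * lo * d  # no sign change found; lo still decreases f
    # Bisection, maintaining g(s * lo) < 0 so that f strictly decreases.
    for _ in range(max_iter):
        if hi - lo <= rtol * hi:
            break
        mid = 0.5 * (lo + hi)
        if g(s * mid) < 0:
            lo = mid
        else:
            hi = mid
    return x + s * lo * d


def randomised_itoh_abe(f, x0, tau, n_iter, seed=0):
    """Iterate with directions drawn uniformly from the unit sphere."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    values = [f(x)]
    for _ in range(n_iter):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        x = itoh_abe_direction_step(f, x, d, tau)
        values.append(f(x))
    return x, values


def f(x):
    """Nonsmooth test objective with its minimiser at the origin."""
    return abs(x[0]) + 2.0 * abs(x[1])


x, values = randomised_itoh_abe(f, [3.0, -2.0], tau=0.5, n_iter=1000)
```

The objective is nonsmooth along the coordinate axes, yet the iteration never evaluates a (sub)gradient, and the recorded values are nonincreasing by construction.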

We work in the Clarke subdifferential framework [54], defined in Section 2.5, for locally Lipschitz continuous functions. In this setting, we prove for randomised extensions of the Itoh–Abe discrete gradient method, as well as for deterministic variants, that the iterates converge to a limit set of Clarke stationary points. Convergence guarantees in the deterministic case are based on a property termed cyclical density. While the analysis in this chapter is developed for discrete gradient methods, it generalises immediately to other line search-based, derivative-free methods in the Clarke subdifferential setting, thus enabling optimality analysis for a wider class of derivative-free optimisation algorithms. Since many bilevel problems are nonsmooth and nonconvex, and their gradients are challenging to compute, we apply the proposed methods to such problems. Furthermore, we compare against state-of-the-art derivative-free optimisation algorithms, demonstrating the competitiveness of the proposed methods.
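For reference, the standard definitions behind this framework, for a locally Lipschitz f, are the Clarke directional derivative, the Clarke subdifferential, and Clarke stationarity (the precise statement used in the thesis is in Section 2.5). The last line is a one-dimensional example showing that the notion is weak: the local maximiser 0 of −|x| is Clarke stationary, just like the minimiser 0 of |x|.

```latex
\begin{aligned}
&f^\circ(x; v) = \limsup_{y \to x,\; t \downarrow 0} \frac{f(y + t v) - f(y)}{t}, \\
&\partial_C f(x) = \{ \xi \in \mathbb{R}^n : \langle \xi, v \rangle \le f^\circ(x; v) \ \text{for all } v \in \mathbb{R}^n \}, \\
&x^* \ \text{is Clarke stationary} \iff 0 \in \partial_C f(x^*) \iff f^\circ(x^*; v) \ge 0 \ \text{for all } v \in \mathbb{R}^n, \\
&\text{e.g. } \partial_C |\cdot|(0) = \partial_C (-|\cdot|)(0) = [-1, 1] \ni 0.
\end{aligned}
```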

Chapter 5: Discrete gradient methods for nonsmooth, nonconvex, constrained optimisation

In this chapter, we build on the analysis of the previous chapter for derivative-free optimisation of nonsmooth, nonconvex functions by extending the algorithm and its convergence analysis to constrained optimisation problems. This is motivated by the fact that parameter-optimisation problems often involve constraints on the parameters, ranging from explicit constraints to more complicated, implicitly defined ones. We study this problem in a general setting, assuming only that the constraint set is epi-Lipschitzian [188], which essentially means that it is the level set of a locally Lipschitz continuous function. The Clarke subdifferential framework is extended to define stationary points constrained to a set, and in this framework we prove that the algorithm converges to a set of stationary points.
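One standard way to state Clarke stationarity relative to a closed set C uses the Clarke tangent cone T_C and its polar, the Clarke normal cone N_C. This is shown here in common notation as an illustration; the chapter's own definition may be phrased differently. For x* ∈ C and locally Lipschitz f, the two conditions in the second line are equivalent by a separation argument: f°(x*; ·) is the support function of the compact convex set ∂_C f(x*), and T_C(x*) is a closed convex cone. Taking C = ℝⁿ gives T_C = ℝⁿ and N_C = {0}, which recovers unconstrained Clarke stationarity.

```latex
\begin{aligned}
&T_C(x) = \{ v : \forall\, x_k \to x \text{ with } x_k \in C,\ \forall\, t_k \downarrow 0,\ \exists\, v_k \to v \text{ with } x_k + t_k v_k \in C \},
  \qquad N_C(x) = T_C(x)^\circ, \\
&f^\circ(x^*; v) \ge 0 \ \text{for all } v \in T_C(x^*) \iff 0 \in \partial_C f(x^*) + N_C(x^*).
\end{aligned}
```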

Chapter 6: Bregman discrete gradient methods for sparse optimisation

This chapter is based on the article [20], published in the Journal of Mathematical Imaging and Vision, which is joint work with Martin Benning and Carola-Bibiane Schönlieb. Whereas the previous chapters apply discrete gradient methods to gradient flows, in this chapter we apply them to the inverse scale space flow [201], a dissipative differential system closely related to Bregman iterative methods. This system allows us to incorporate additional structure into the scheme, promoting sparsity or other features of the objective function and the ground truth. We study the Itoh–Abe discrete gradient method applied to this flow and prove convergence in a nonsmooth, nonconvex subdifferential framework. We implement this method for different Bregman distances and objective functions, generalising well-known methods such as Gauss–Seidel and successive over-relaxation (SOR) for sparse optimisation. In numerical experiments, we observe that for sparse ground truths, the Bregman discrete gradient methods converge significantly faster than regular SOR. Furthermore, the analysis in this chapter opens the door to applying discrete gradient methods to other, non-Euclidean gradient flows.
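As a reminder of the standard objects involved, the block lists the Bregman distance, the inverse scale space flow, and its implicit Euler discretisation, which is the Bregman iteration. These are written for a convex functional J and a convex, differentiable objective E, with p^k ∈ ∂J(u^k) for all k; the notation is assumed rather than taken from the chapter. The last line shows a discrete-gradient analogue of the implicit step. It is included to illustrate how dissipation carries over, not as a verbatim statement of the chapter's scheme. The identity follows from the mean value property of the discrete gradient, and the inequality holds because Bregman distances of a convex J are nonnegative.

```latex
\begin{aligned}
&D_J^{p}(u, v) = J(u) - J(v) - \langle p,\, u - v \rangle, \qquad p \in \partial J(v), \\
&\partial_t p(t) = -\nabla E(u(t)), \qquad p(t) \in \partial J(u(t)) \qquad \text{(inverse scale space flow)}, \\
&p^{k+1} - p^k = -\tau\, \nabla E(u^{k+1})
  \iff u^{k+1} \in \arg\min_u \big\{ \tau E(u) + D_J^{p^k}(u, u^k) \big\}, \\
&p^{k+1} - p^k = -\tau\, \overline{\nabla} E(u^k, u^{k+1})
  \;\Longrightarrow\;
  E(u^{k+1}) - E(u^k) = -\frac{1}{\tau} \Big( D_J^{p^k}(u^{k+1}, u^k) + D_J^{p^{k+1}}(u^k, u^{k+1}) \Big) \le 0.
\end{aligned}
```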

Chapter 7: Differentiation for nonsmooth bilevel optimisation

In this chapter, we focus exclusively on bilevel optimisation problems, seeking to exploit structured nonsmoothness of the corresponding variational problem to differentiate with respect to the parameters. To do so, we employ the framework of partial smoothness [130]. For a large class of bilevel problems, we establish piecewise differentiability of the solution mapping in Theorem 7.29, which allows us to characterise the Clarke subdifferential of the bilevel objective function. Furthermore, in Theorem 7.33 we prove for various forward-backward type algorithms, including accelerated variants, that the algorithmic derivatives converge to the limiting, implicit derivative.
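As an illustration of algorithmic differentiation, consider the lasso problem min_x ½‖Ax − b‖² + λ‖x‖₁, whose ℓ1 term is a standard example of a partly smooth function. This is a sketch under our own assumptions, not the chapter's code; the data and the names `soft_threshold`, `ista_with_derivative` and `implicit_derivative` are hypothetical. Forward-mode differentiation of the forward-backward (ISTA) iteration with respect to λ propagates J_k = dx_k/dλ through the gradient step and the soft-thresholding step. When the solution is unique and strict complementarity holds, the support S is eventually identified. J_k then converges to the implicit derivative, which is −(A_Sᵀ A_S)⁻¹ sign(x_S) on S and zero elsewhere.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def ista_with_derivative(A, b, lam, n_iter=500):
    """ISTA for 0.5 ||A x - b||^2 + lam ||x||_1, propagating J_k = dx_k/dlam."""
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L
    AtA, Atb = A.T @ A, A.T @ b
    x = np.zeros(A.shape[1])
    J = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - gamma * (AtA @ x - Atb)  # forward (gradient) step
        dz = J - gamma * (AtA @ J)  # its derivative with respect to lam
        t = gamma * lam  # threshold, so dt/dlam = gamma
        active = np.abs(z) > t
        # Derivative of soft-thresholding where it is differentiable (|z| != t).
        J = np.where(active, dz - gamma * np.sign(z), 0.0)
        x = soft_threshold(z, t)
    return x, J


def implicit_derivative(A, x, tol=1e-10):
    """dx/dlam from the optimality condition restricted to the support of x."""
    S = np.abs(x) > tol
    J = np.zeros_like(x)
    AS = A[:, S]
    J[S] = -np.linalg.solve(AS.T @ AS, np.sign(x[S]))
    return J


A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0])
lam = 0.5
x, J = ista_with_derivative(A, b, lam)
```

For these data the solution is x* = (0.75, 0) with support {0}. The inactive coordinate satisfies |(Aᵀ(Ax* − b))₂| = 0.25 < λ = 0.5, so strict complementarity holds and both derivatives equal (−0.5, 0). An accelerated (FISTA-type) variant could be differentiated in the same way by also propagating the derivative of the extrapolated point.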
