Numerical integration - An overview of optimisation and numerical integration

1.1 An overview of optimisation and numerical integration

1.1.3 Numerical integration

Numerical integration is the study of numerical methods for solving systems of differential equations. In recent years, this field has received increasing attention from the mathematical optimisation community, due to the idea that optimisation schemes can be understood through their relation to continuous-time differential systems and the numerical integration methods that connect them. We illustrate this with some examples.

The relevance of numerical integration to optimisation should not be surprising, considering that explicit and implicit gradient descent (1.2), (1.3), the two main building blocks of first-order optimisation methods, can be viewed as the forward and backward Euler methods [114], respectively, applied to the differential system known as the gradient flow,

$$\dot{x}(t) = -\nabla F(x(t)), \quad x(0) = x_0 \in \mathbb{R}^n, \quad t \geq 0. \tag{1.12}$$

Furthermore, their stability properties, e.g. unconditional dissipativity of (1.3) with respect to the time step τ_k, can be inferred from the properties of the Euler method.
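The correspondence between the two Euler methods and explicit and implicit gradient descent can be illustrated with a minimal sketch. It assumes a quadratic objective F(x) = ½ xᵀQx, chosen here only so that the implicit step reduces to a linear solve; the function names and the test matrix are hypothetical and do not appear in the original text.

```python
import numpy as np

def objective(Q, x):
    # Illustrative quadratic objective F(x) = 0.5 * x^T Q x (an assumption).
    return 0.5 * x @ Q @ x

def forward_euler_step(Q, x, tau):
    # Explicit gradient descent: x_new = x - tau * grad F(x).
    return x - tau * (Q @ x)

def backward_euler_step(Q, x, tau):
    # Implicit gradient descent: x_new = x - tau * grad F(x_new).
    # For the quadratic F this is the linear system (I + tau * Q) x_new = x.
    return np.linalg.solve(np.eye(len(x)) + tau * Q, x)

def run(step, Q, x0, tau, n_steps):
    # Record the objective value along the iterates of the chosen scheme.
    x = x0.copy()
    values = [objective(Q, x)]
    for _ in range(n_steps):
        x = step(Q, x, tau)
        values.append(objective(Q, x))
    return values

if __name__ == "__main__":
    Q = np.diag([1.0, 10.0])
    x0 = np.array([1.0, 1.0])
    tau = 1.0  # deliberately larger than 2/L, where L = 10 is the largest eigenvalue
    print("forward Euler :", run(forward_euler_step, Q, x0, tau, 5))
    print("backward Euler:", run(backward_euler_step, Q, x0, tau, 5))
```

With this step size the explicit iterates leave the stability region of the forward Euler method and the objective grows, while the implicit iterates decrease the objective at every step, in line with the unconditional dissipativity of (1.3).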

A prominent example of using differential equations and numerical integration to address challenges in optimisation is the study of the acceleration phenomenon. In 1983, Nesterov introduced accelerated gradient descent [157] as a method that attains the optimal convergence rate of O(1/k^2) for first-order methods on L-smooth, convex functions. With the resurgence of first-order methods in the era of big data and high-dimensional optimisation, acceleration techniques have received significant attention in the past decade for solving problems such as compressed sensing [16], training of deep and recurrent neural networks [210], and sparse linear regression [14].
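To make the acceleration scheme concrete, the following is a minimal sketch of gradient descent next to one common variant of Nesterov's accelerated gradient descent. The momentum coefficient (k − 1)/(k + 2), the function names, and the ill-conditioned quadratic test problem are assumptions made for illustration; they are not taken from [157] or from the surrounding text.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters):
    # Plain explicit gradient descent with a fixed step size.
    x = x0.copy()
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

def nesterov_agd(grad, x0, step, n_iters):
    # Accelerated gradient descent: extrapolate with a momentum term,
    # then take a gradient step from the extrapolated point y.
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iters + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev, x = x, y - step * grad(y)
    return x

if __name__ == "__main__":
    # Hypothetical test problem: F(x) = 0.5 * x^T Q x with L = 1.
    Q = np.diag([1.0, 1e-4])
    grad = lambda x: Q @ x
    F = lambda x: 0.5 * x @ Q @ x
    x0 = np.ones(2)
    print("GD :", F(gradient_descent(grad, x0, 1.0, 300)))
    print("AGD:", F(nesterov_agd(grad, x0, 1.0, 300)))
```

On this example the step size is 1/L for both methods, so any difference in the final objective value comes from the momentum term alone.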

Despite their prevalence, the underlying dynamics of acceleration schemes are not well understood, prompting several recent attempts to identify a framework in which to understand these schemes, taking perspectives from numerical integration. Su et al. [208] and Wibisono et al. [219] identify second-order ordinary differential equations (ODEs) that can be seen as continuous-time limits of the acceleration schemes. In the former case, this enables them to explain the oscillatory behaviour of acceleration schemes by interpreting the ODEs as damped systems. In the latter case, they present a family of Bregman Lagrangian functionals which generate the original and new acceleration schemes. Furthermore, they demonstrate that the choice of ODE discretisation method is crucial for whether the acceleration phenomenon is retained in the iterative scheme.
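For concreteness, the second-order ODE associated with Nesterov's scheme in [208] is commonly stated in the following form. The notation is adapted here to match (1.12) and is not quoted from [208]; the coefficient 3/t acts as a damping term that vanishes as t grows, which is consistent with the damped-system interpretation.

$$\ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla F(x(t)) = 0, \quad x(0) = x_0, \quad \dot{x}(0) = 0.$$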

Several works have contributed to this numerical analysis of acceleration methods, which bridges continuous-time and discrete-time dynamics. Wilson et al. [220] approach this from the perspective of Lyapunov theory, presenting Lyapunov functions that account for both continuous- and discrete-time dynamics. Betancourt et al. [23] present a framework of symplectic optimisation, i.e. considering perspectives of Hamiltonian dynamics and symplectic structure-preserving methods.

In a similar vein, recent papers by Maddison et al. [144] and França et al. [92] have studied conformal Hamiltonian systems, with the former focusing on how information about the objective function’s convex conjugate can be incorporated to obtain stronger convergence rates, and the latter on structure-preserving numerical methods and their relation to different iterative schemes.

Another central issue for iterative optimisation schemes is the choice of time step τ_k, which is closely tied to stability analysis of numerical methods. In this context, tools from numerical integration can be used to formulate iterative schemes that allow for the use of larger time steps and therefore faster progression towards the minimum. Eftekhari et al. [80] achieve this for strongly convex problems, by formulating explicit stabilised descent methods that use explicit Runge–Kutta methods to maximise the total length of time steps [1]. The theoretical analysis demonstrates robustness with respect to the objective function’s condition number, and in numerical examples the method is shown to outperform accelerated gradient descent.

Other numerical integration methods include implicit Runge–Kutta methods, where energy dissipation is ensured under mild time step restrictions [104]. Finally, we mention that one may consider gradient flows under non-Euclidean metrics, such as Bregman distances (see Chapter 6) and the Wasserstein metric [5, 200] (see Section 8.3.1).

Geometric numerical integration

In this thesis, we are particularly interested in geometric structure-preserving methods, which is the domain of geometric numerical integration. As described by Iserles & Quispel in ‘Why Geometric Numerical Integration?’ [115], differential equations may exhibit geometric invariants, such as conservation laws of Hamiltonian energies or Lie point symmetries, each of which implies that the solution to the differential equations is restricted to some lower-dimensional manifold. One is then interested in numerical methods which preserve these structures in some sense.

We highlight one class of methods from geometric integration, namely discrete gradient methods [97, 116, 148, 183]. These are designed for differential equations that can be written in linear-gradient form, i.e.

$$\dot{x}(t) = A(x(t)) \nabla F(x(t)), \tag{1.13}$$

where A is a matrix-valued function. By applying the chain rule, we derive

$$\frac{\mathrm{d}F(x(t))}{\mathrm{d}t} = \langle \nabla F(x(t)), A(x(t)) \nabla F(x(t)) \rangle,$$

from which one can observe that the system is conservative, i.e. F is constant along x(t), if A is skew-symmetric, i.e. A^* = −A. Similarly, the system is dissipative if A is negative-definite, i.e. −A is positive-definite (see Section 2.3). In fact, [148, Proposition 2.1 & Proposition 2.8] show that conservative and dissipative systems can in general be expressed in linear-gradient form.

Discrete gradient methods preserve the geometric structures of linear-gradient systems, e.g. energy conservation and dissipation laws, as well as Lyapunov functions. Furthermore, the methods are unconditionally stable, in the sense that these properties are preserved for all discretisation time steps τ_k > 0. This has prompted the study of discrete gradient methods applied to gradient flows for solving optimisation problems. We give some examples. Grimm et al. [100] propose using discrete gradient methods for solving variational regularisation problems in image analysis. The applications include image inpainting and denoising, and they prove that for continuously differentiable objective functions, the methods converge to a set of stationary points. Furthermore, they compare the stability properties of these methods with those of other methods, such as Euler methods. In a similar setting, Ringholm et al. [185] consider the Itoh–Abe discrete gradient method for solving image inpainting problems regularised with Euler’s elastica. These are nonconvex optimisation problems whose gradients are expensive to compute, while the Itoh–Abe discrete gradient (defined in Section 2.7) is derivative-free.
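As an illustration of unconditional dissipation, the following sketch applies the Itoh–Abe discrete gradient method to the gradient flow (1.12). It assumes the usual coordinate-wise form of the method, in which each coordinate update α solves the scalar equation α/τ = −(F(y + α e_i) − F(y))/α, and it assumes a quadratic objective F(x) = ½ xᵀQx − bᵀx with symmetric Q and non-negative diagonal, for which this equation has a closed-form solution. For a general objective the scalar equation would instead be solved with a derivative-free root-finder. The names `objective` and `itoh_abe_step` and the test data are hypothetical.

```python
import numpy as np

def objective(Q, b, x):
    # Illustrative quadratic objective F(x) = 0.5 * x^T Q x - b^T x (an assumption).
    return 0.5 * x @ Q @ x - b @ x

def itoh_abe_step(Q, b, x, tau):
    # One sweep of coordinate-wise Itoh-Abe updates for the gradient flow of F.
    # Along coordinate i, F(y + alpha * e_i) - F(y) = alpha * g_i + 0.5 * alpha^2 * Q_ii,
    # so the scalar equation alpha / tau = -(g_i + 0.5 * alpha * Q_ii) gives
    # alpha = -g_i / (1 / tau + 0.5 * Q_ii).
    y = x.copy()
    for i in range(len(x)):
        g_i = Q[i] @ y - b[i]  # slope along coordinate i at the partially updated point
        alpha = -g_i / (1.0 / tau + 0.5 * Q[i, i])
        y[i] += alpha
    return y

if __name__ == "__main__":
    Q = np.array([[2.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -1.0])
    x = np.array([3.0, -2.0])
    for tau in (0.1, 10.0, 1000.0):
        y = itoh_abe_step(Q, b, x, tau)
        decrease = objective(Q, b, y) - objective(Q, b, x)
        # The two printed numbers should agree up to rounding error.
        print(tau, decrease, -np.sum((y - x) ** 2) / tau)
```

Under these assumptions each step satisfies F(y) − F(x) = −‖y − x‖²/τ, so the objective cannot increase for any τ > 0, however large.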

In document Geometric numerical integration for optimisation (Page 34-37)