4.3 Proximal splitting algorithms
4.3.1 Proximal gradient methods
Sum of two convex functions
The following optimisation problem is considered first:
\[
\min_{x \in \mathbb{C}^N} F(x) \equiv f(x) + g(x), \tag{4.15}
\]
where it is assumed that $f : \mathbb{C}^N \to \mathbb{R}$ is a continuously differentiable ($C^1$) convex function and $g : \mathbb{C}^N \to \mathbb{R}$ is a convex, possibly nondifferentiable, function (see appendix A.3).
The proximal gradient method [108] for solving (4.15), also known as forward-backward splitting, is defined by the iterative procedure
\[
x^{k+1} \leftarrow \operatorname{prox}_{\rho g}\big(x^k - \rho \nabla f(x^k)\big), \tag{4.16}
\]
where $\rho > 0$ is the step size (constant for all $k$ or determined by line search). It is called forward-backward because each iteration applies a forward gradient step on $f$ followed by a backward (proximal) step on $g$, as the following decomposition suggests:
\[
\begin{aligned}
x^{k+1/2} &\leftarrow x^k - \rho \nabla f(x^k), \\
x^{k+1} &\leftarrow \operatorname{prox}_{\rho g}\big(x^{k+1/2}\big).
\end{aligned} \tag{4.17}
\]
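The forward and backward steps of (4.17) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variables are real-valued rather than complex, and the test problem (a weighted quadratic $f$ with an $\ell_1$ penalty as $g$), the helper names `soft_threshold` and `proximal_gradient_l1`, and the data `a`, `b` are hypothetical choices that do not come from the text.

```python
import math

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise soft-thresholding."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def proximal_gradient_l1(a, b, lam, n_iter=500):
    """Minimise 0.5 * sum(a_i * (x_i - b_i)^2) + lam * ||x||_1 (illustrative problem)."""
    lipschitz = max(a)      # the gradient of f is L-Lipschitz with L = max(a_i)
    rho = 1.0 / lipschitz   # fixed step size, assumed here to be 1/L
    x = [0.0] * len(b)
    for _ in range(n_iter):
        grad = [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]
        x_half = [xi - rho * gi for xi, gi in zip(x, grad)]  # forward step on f
        x = soft_threshold(x_half, rho * lam)                # backward step on g
    return x

a = [1.0, 2.0, 4.0]
b = [3.0, -0.5, 1.0]
x_hat = proximal_gradient_l1(a, b, lam=1.0)

# For this separable problem the minimiser has the closed form
# soft_threshold(b_i, lam / a_i), which gives a reference to compare against.
x_ref = [soft_threshold([bi], 1.0 / ai)[0] for ai, bi in zip(a, b)]
```

Because the chosen problem is separable, the iterates can be checked against the closed-form reference `x_ref`; for a general $f$ no such reference is available.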
When ∇f is Lipschitz continuous with constant L (see appendix A.3), this method has been shown to converge for a fixed step size ρ ∈ (0, 2/L), see Ref. [108]. Note that
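the proximal gradient method can be seen as a generalisation of other algorithms. When $g = \mathbb{1}_C$, the indicator function of a closed convex set $C$, the proximal step reduces to the projection onto $C$,
\[
x^{k+1} \leftarrow \operatorname{proj}_C\big(x^k - \rho \nabla f(x^k)\big), \tag{4.18}
\]
and the method is known as the projected gradient method. When $g = 0$, the method reduces to standard gradient descent,
\[
x^{k+1} \leftarrow x^k - \rho \nabla f(x^k). \tag{4.19}
\]
Finally, it reduces to the proximal point algorithm when $f = 0$,
\[
x^{k+1} \leftarrow \operatorname{prox}_{\rho g}(x^k), \tag{4.20}
\]
which is also known as proximal iteration.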
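The three special cases can be illustrated with one generic loop in which only the gradient and the proximal operator are swapped. The sketch below assumes real-valued vectors; the function names, the box $[-1, 1]^N$, the quadratic $f$ and the point `target` are hypothetical choices made for the example.

```python
def proximal_gradient(x0, grad_f, prox_g, rho, n_iter=300):
    """Generic iteration x <- prox_{rho g}(x - rho * grad_f(x))."""
    x = list(x0)
    for _ in range(n_iter):
        grad = grad_f(x)
        x = prox_g([xi - rho * gi for xi, gi in zip(x, grad)], rho)
    return x

target = [2.0, -3.0]

def grad_quadratic(x):
    """Gradient of f(x) = 0.5 * ||x - target||^2 (so L = 1)."""
    return [xi - ti for xi, ti in zip(x, target)]

def grad_zero(x):
    """Gradient of f = 0."""
    return [0.0] * len(x)

def project_box(v, rho):
    """Prox of the indicator of [-1, 1]^N, i.e. the projection onto the box."""
    return [min(max(vi, -1.0), 1.0) for vi in v]

def prox_zero(v, rho):
    """Prox of g = 0 is the identity."""
    return list(v)

def prox_half_square(v, rho):
    """Prox of rho * 0.5 * ||x||^2, which is v / (1 + rho)."""
    return [vi / (1.0 + rho) for vi in v]

# g = indicator of a box: projected gradient, as in (4.18)
x_projected = proximal_gradient([0.0, 0.0], grad_quadratic, project_box, rho=0.5)
# g = 0: plain gradient descent, as in (4.19)
x_descent = proximal_gradient([0.0, 0.0], grad_quadratic, prox_zero, rho=0.5)
# f = 0: proximal point iteration, as in (4.20)
x_prox_point = proximal_gradient([5.0, 5.0], grad_zero, prox_half_square, rho=0.5)
```

In this toy setting the projected run should settle on the box point closest to `target`, the unconstrained run on `target` itself, and the proximal point run on the minimiser of $g$ at the origin.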
Since the proximal gradient method can be slow to converge, various accelerated variants have been proposed, in particular by Nesterov [109] and by Beck and Teboulle [110]. When $f$ has a Lipschitz continuous gradient with constant $L$, these methods enjoy a fast rate of convergence on the objective function, i.e. $F(x^k) - F(x^\star)$ decreases at least as fast as $1/k^2$ with a fixed step size $\rho = 1/L$ or a suitable line search, although convergence of the sequence of iterates produced by these schemes is no longer guaranteed [100]. The fast proximal gradient method of Beck and Teboulle [110], known as the fast iterative shrinkage-thresholding algorithm (FISTA), is
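\begin{align}
x^{k+1} &\leftarrow \operatorname{prox}_{\rho g}\big(w^k - \rho \nabla f(w^k)\big), \tag{4.21a} \\
t^{k+1} &\leftarrow \tfrac{1}{2}\Big(1 + \sqrt{1 + 4 (t^k)^2}\Big), \tag{4.21b} \\
w^{k+1} &\leftarrow x^{k+1} + \frac{t^k - 1}{t^{k+1}} \big(x^{k+1} - x^k\big), \tag{4.21c}
\end{align}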
where $\rho = 1/L$. The major difference with the proximal gradient method is that the proximal gradient step is not applied at the previous point $x^k$ but at a point $w^k$ formed as a specific linear combination of the previous two points $\{x^k, x^{k-1}\}$. The specific steps (4.21b) and (4.21c) emerge from the analysis of the rate of convergence of FISTA; we refer the reader to Ref. [111] for more details. A simpler alternative identified by Vandenberghe [112] reads
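\[
\begin{aligned}
x^{k+1} &\leftarrow \operatorname{prox}_{\rho g}\big(w^k - \rho \nabla f(w^k)\big), \\
w^{k+1} &\leftarrow x^{k+1} + \frac{k - 2}{k + 1} \big(x^{k+1} - x^k\big),
\end{aligned} \tag{4.22}
\]
for $k \geq 1$.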
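The FISTA updates (4.21a)-(4.21c) can be sketched as follows. The sketch assumes real-valued vectors and the initialisation $t^1 = 1$, $w^1 = x^0$, which is a common choice but is not stated in the text. The test problem (a weighted quadratic $f$ with an $\ell_1$ penalty) and the names `fista_l1`, `objective`, `a`, `b` are hypothetical.

```python
import math

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise soft-thresholding."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def objective(x, a, b, lam):
    """F(x) = 0.5 * sum(a_i * (x_i - b_i)^2) + lam * ||x||_1."""
    smooth = sum(0.5 * ai * (xi - bi) ** 2 for ai, xi, bi in zip(a, x, b))
    return smooth + lam * sum(abs(xi) for xi in x)

def fista_l1(a, b, lam, n_iter=1000):
    rho = 1.0 / max(a)   # rho = 1/L with L = max(a_i) for this f
    x = [0.0] * len(b)
    w = list(x)          # assumed initialisation w^1 = x^0
    t = 1.0              # assumed initialisation t^1 = 1
    for _ in range(n_iter):
        grad = [ai * (wi - bi) for ai, wi, bi in zip(a, w, b)]
        x_new = soft_threshold(
            [wi - rho * gi for wi, gi in zip(w, grad)], rho * lam
        )                                                       # (4.21a)
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))      # (4.21b)
        beta = (t - 1.0) / t_new
        w = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]  # (4.21c)
        x, t = x_new, t_new
    return x

a = [1.0, 2.0, 4.0]
b = [3.0, -0.5, 1.0]
x_fista = fista_l1(a, b, lam=1.0)
gap = objective(x_fista, a, b, 1.0) - objective([2.0, 0.0, 0.75], a, b, 1.0)
```

Here `gap` measures $F(x^k) - F(x^\star)$ against the closed-form minimiser of this particular separable problem, which is the quantity the $1/k^2$ rate refers to.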
Optimisation framework: proximal splitting methods
hence the presence of shrinkage-thresholding in the technique’s name. The class of IST algorithms can be seen as an extension of the classical Donoho–Johnstone shrinkage method [113]. As described here and later on by the same authors in Ref. [111], the method can be generalised using the proximal formalism and is generally referred to as fast proximal gradient (FPG) or accelerated proximal gradient (APG) method.
Sum of multiple convex functions
We now turn to the case where the objective contains more than two convex functions. The following optimisation problem is considered,
\[
x^\star = \arg\min_{x \in \mathbb{C}^N} \Big\{ F(x) \equiv f(x) + \sum_{j=1}^{J} g_j(x) \Big\}, \tag{4.23}
\]
where it is assumed that $f : \mathbb{C}^N \to \mathbb{R}$ is a continuously differentiable convex function and the $g_j : \mathbb{C}^N \to \mathbb{R}$ are convex, possibly nondifferentiable, functions. Formulating the objective in this way enables the combination of multiple regularisation terms. In 2011, Huang et al. [114] proposed a method for minimising such a sum of convex functions, named the fast composite splitting (FCS) algorithm. In practice, only a simplified version of FCS can be used, which consists of the following iterations,
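\begin{align}
u_j^{k+1} &\leftarrow \operatorname{prox}_{J \rho g_j}\big(w^k - \rho \nabla f(w^k)\big), \tag{4.24a} \\
x^{k+1} &\leftarrow \frac{1}{J} \sum_{j=1}^{J} u_j^{k+1}, \tag{4.24b} \\
t^{k+1} &\leftarrow \tfrac{1}{2}\Big(1 + \sqrt{1 + 4 (t^k)^2}\Big), \tag{4.24c} \\
w^{k+1} &\leftarrow x^{k+1} + \frac{t^k - 1}{t^{k+1}} \big(x^{k+1} - x^k\big), \tag{4.24d}
\end{align}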
assuming $\nabla f$ is $L$-Lipschitz continuous and $\rho = 1/L$. This algorithm averages the proximal gradient steps in (4.24b) and applies an acceleration strategy borrowed from FISTA. Although the algorithm described above is very similar to FISTA, it has not been rigorously proved to have the same convergence rate.
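A minimal sketch of the simplified FCS iterations (4.24a)-(4.24d) is given below. It assumes real-valued vectors and the initialisation $t^1 = 1$, $w^1 = x^0$. The calling convention `prox(v, step)`, the helper names and the two regularisers (an $\ell_1$ term with weight 0.5 and the indicator of the box $[-1, 1]^N$) are hypothetical choices for illustration. Only a single iteration is evaluated, since no convergence claim is made here.

```python
import math

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise soft-thresholding."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def fcs(grad_f, proxes, x0, rho, n_iter):
    """Simplified FCS sketch; proxes[j](v, step) should return prox_{step * g_j}(v)."""
    J = len(proxes)
    x = list(x0)
    w = list(x0)   # assumed initialisation w^1 = x^0
    t = 1.0        # assumed initialisation t^1 = 1
    for _ in range(n_iter):
        grad = grad_f(w)
        v = [wi - rho * gi for wi, gi in zip(w, grad)]
        u = [prox(v, J * rho) for prox in proxes]                # (4.24a)
        x_new = [sum(col) / J for col in zip(*u)]                # (4.24b)
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))       # (4.24c)
        beta = (t - 1.0) / t_new
        w = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]  # (4.24d)
        x, t = x_new, t_new
    return x

b = [3.0, -0.5]

def grad_f(x):
    """Gradient of f(x) = 0.5 * ||x - b||^2 (so L = 1)."""
    return [xi - bi for xi, bi in zip(x, b)]

def prox_l1(v, step):
    """Prox of step * g_1 with g_1 = 0.5 * ||x||_1."""
    return soft_threshold(v, 0.5 * step)

def prox_box(v, step):
    """Prox of the indicator of [-1, 1]^N (independent of the step)."""
    return [min(max(vi, -1.0), 1.0) for vi in v]

x_one_step = fcs(grad_f, [prox_l1, prox_box], [0.0, 0.0], rho=1.0, n_iter=1)
```

After one iteration from the origin, the forward step lands on `b`, the two proximal operators return `[2.0, 0.0]` and `[1.0, -0.5]` (up to the sign of zero), and the average in (4.24b) gives `[1.5, -0.25]`. Passing a single proximal operator makes the loop coincide with the FISTA updates.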
More recently, Raguet et al. [115] introduced the generalised forward-backward splitting (GFBS) method, which minimises the sum of multiple convex functions. The algorithm consists of the following iterations:
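\[
\begin{aligned}
w_j^{k+1} &\leftarrow w_j^k + \operatorname{prox}_{J \rho g_j}\big(2 x^k - w_j^k - \rho \nabla f(x^k)\big) - x^k, \\
x^{k+1} &\leftarrow \frac{1}{J} \sum_{j=1}^{J} w_j^{k+1}.
\end{aligned} \tag{4.25}
\]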
When $\nabla f$ is $L$-Lipschitz, the method has been shown to converge for $\rho < 2/L$. As its name suggests, this method can be seen as a generalisation of the forward-backward algorithm: for $J = 1$, it simplifies to the forward-backward step (4.16).
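The GFBS iteration can be sketched as follows, reading (4.25) with the term $-x^k$ outside the proximal operator so that the case $J = 1$ reduces to (4.16). The sketch assumes real-valued vectors, equal weights $1/J$, and auxiliary variables $w_j$ initialised at $x^0$. The calling convention `prox(v, step)`, the helper names and the test problem are hypothetical.

```python
import math

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise soft-thresholding."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def gfbs(grad_f, proxes, x0, rho, n_iter=200):
    """GFBS sketch; proxes[j](v, step) should return prox_{step * g_j}(v)."""
    J = len(proxes)
    x = list(x0)
    w = [list(x0) for _ in range(J)]   # assumed initialisation w_j = x^0
    for _ in range(n_iter):
        grad = grad_f(x)
        for j, prox in enumerate(proxes):
            arg = [2.0 * xi - wji - rho * gi
                   for xi, wji, gi in zip(x, w[j], grad)]
            p = prox(arg, J * rho)
            w[j] = [wji + pi - xi for wji, pi, xi in zip(w[j], p, x)]
        x = [sum(col) / J for col in zip(*w)]   # average of the w_j
    return x

b = [3.0, -0.5, 1.2]

def grad_f(x):
    """Gradient of f(x) = 0.5 * ||x - b||^2 (so L = 1)."""
    return [xi - bi for xi, bi in zip(x, b)]

def prox_l1(v, step):
    """Prox of step * g_1 with g_1 = 0.5 * ||x||_1."""
    return soft_threshold(v, 0.5 * step)

def prox_box(v, step):
    """Prox of the indicator of [-1, 1]^N (independent of the step)."""
    return [min(max(vi, -1.0), 1.0) for vi in v]

# rho = 1 satisfies rho < 2/L for this f
x_gfbs = gfbs(grad_f, [prox_l1, prox_box], [0.0, 0.0, 0.0], rho=1.0)
# J = 1: the same loop reduces to the forward-backward step (4.16)
x_single = gfbs(grad_f, [prox_l1], [0.0, 0.0, 0.0], rho=0.5)
```

For this separable problem the minimiser is the soft-thresholded `b` clipped to the box, so `x_gfbs` can be compared with `[1.0, 0.0, 0.7]`, and `x_single` with the unconstrained solution `[2.5, 0.0, 0.7]`.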