2.3
Monotone Operators
We discussed in Section 2.2 how the proximal operator can help us generate a variety of iterative approximation schemes that find a minimizer of a (sum of) convex function(s). Except for the use of the proximal operator, these schemes are seemingly disparate. We will see in this section that the schemes presented above (and many more) can be put under a common framework if viewed through the lens of monotone operators. Casting the iterative schemes we have seen above in this framework simplifies signficantly the convergence analysis when it comes to devising new schemes, as will see in Chapter 4 of this work. The rough idea behind this modeling approach is that the problem of finding a minimizer of a convex program can be cast as an equivalent problem, namely that of finding a zero of a monotone operator. This problem is in turn transformed into finding a fixed point of a function related to the monotone operator. Finally, the fixed point is found by the fixed-point iteration, yielding an algorithm for the original problem.
Our discussion follows the style of [RB16], which gives a comprehensive, yet simple analysis on the topic of monotone operators. A more in-depth treatment can be found in the book [BC11]. We choose to perform the analysis on a Hilbert space setting since (i) it is more general and encompasses the Euclidean case and (ii) will be useful for the upcoming analysis in Chapter 4.
2.3.1 Basics on Monotone Operators
A relation or operator T in H is a subset of H × H, where H is a Hilbert space. We write u =
T (z) = T z to denote the set {u | (z, u) ∈ T }. The relation T has Lipschitz constant L if for all u∈ T (z) and v ∈ T (x) it holds that u − v ≤ Lz − x. If L < 1 we call T a contraction, while
if L ≤ 1, then T is called nonexpansive. A point z is a fixed point for T if z = T z, denoted as
z ∈ fix T . This is equivalent to z ∈ zer S, i.e., 0 ∈ zer Sz, where S = I − T and I is the identity
operator such that I ={(z, z) | z ∈ H}. The set of fixed points of a nonexpansive operator with full domain (dom T =H) is closed and convex. The operator is called averaged if T = (1 − θ)I + θG with θ ∈ (0, 1), where G is a nonexpansive operator. It follows that T is nonexpansive and has the same fixed points as G. Compositions of nonexpansive operators are nonexpansive, as well as compositions of averaged operators result in averaged operators.
Monotonicity is another important property of a relation. T is monotone ifz −x, T z −T x ≥ 0 for all z, x∈ H. The relation is maximal monotone if there is no monotone operator that properly contains it.
Two operators that play a crucial role in finding a zero of a relation are the resolvent of T , defined as JT = (I + T )−1, and the reflection of T defined as RT = 2JT− I. The following properties hold:
• If T is monotone, then JT and RT are nonexpansive functions.
• If T is maximal monotone, then JT and RT have full domainH.
• T shares the same fixed points with JT and RT.
In convex optimization problems, we are interested in specific operators, the most important of which is the subdifferential ∂f of a convex closed and proper function f : H → R ∪ {+∞}. If
The subdifferential is important because the solution to a convex optimization problem can be cast as finding a zero of the subdifferential, i.e., 0 ∈ ∂f(z). This can be equivalently written as z ∈ (I + ∂f)(z) ⇔ z ∈ (I + ∂f)−1(z) ⇔ z = J∂f(z). Hence the set of optimizers coincides with the fixed points of the resolvent and the reflection of the subdifferential. These operators take special forms in the framework of convex optimization. More specifically, the proximal operator introduced in (2.5) is the resolvent of the subdifferential of f evaluated at x, i.e., proxγf(x) = (I + γ∂f )−1(x) = arg min
z
f (z) + (1/(2γ))z − x2, γ > 0. Accordingly, the reflection operator is denoted asreflγf :H → H and is defined as reflγf = 2 proxγf− I.
The algorithms discussed in Section 2.2 can be derived as fixed-point iterations of functions associated to relevant monotone operators. The full derivations can be found in [RB16].
PPA Consider the function f ∈ Γ0(H) and the subdifferential operator A = ∂f. The proximal
operator is a function of A defined above as T = JγA = proxγf. The latter gives us access to the fixed-point iteration zk+1= T zk, which is the PPA for γk = γ > 0, and converges to a fixed point
z∗.
PGM Consider the problem of solving the inclusion 0∈ (A+B)(z) and assume that B is single- valued. By manipulating slightly the inclusion, one can derive the fixed-point iteration zk+1= T zk, where T = TATB and TA= JγA, TB = I− γB for γ lying inside a specified range of values. This scheme is known as the forward-backward splitting (FBS) iteration [Pas77].
Let us now consider the minimization of f (z) + g(z), f ∈ Γ0(Rn) differentiable with Lipschitz
continuous gradient Lf and g∈ Γ0(Rn). The problem can be cast as the inclusion 0∈ (A + B)(z), where A = ∂g and B =∇f. Applying the FBS, the resulting iteration is the PGM as given in (2.9) for γk= γ ∈ (0, 2/Lf), and converges to a fixed point z∗.
DRS Consider the same problem of minimizing f (z) + g(z), with f ∈ Γ0(Rn) and g ∈ Γ0(Rn).
Casting the problem as the inclusion 0∈ (A + B)(z), where A = ∂g and B = ∇f, we can introduce the operator T = 12(TATB+ I) by introducing the functions TA= RγA= reflγf and TB= RγB = reflγg. The resulting iteration is the DRS algorithm as given in equations (2.11),(2.12),(2.13) for
γk = γ > 0, and converges to a fixed point z∗.
As previously mentioned, viewing the iterative schemes via the lens of monotone operators allows for some generalizations. Along these lines, we introduce the following definitions:
Definition 7 (Cocoercivity.). The operator A is δ-cocoercive with δ > 0 if
x − y, Ax − Ay ≥ δAx − Ay2, ∀x, y ∈ H.
Definition 8 ((Quasi-)Strong monotonicity.). The operator A is μ-strongly monotone with
μ > 0 if
x − y, Ax − Ay ≥ μx − y2, ∀x, y ∈ H.
If the inequality holds only for y∈ zer A, i.e.,