Well-posedness and convergence - Geometric numerical integration for optimisation

In this section, we prove that the Bregman discrete gradient method (6.6) is well-defined. Furthermore, we prove that all accumulation points of the iterates $(x^k)_{k\in\mathbb{N}}$ defined via (6.6) are Clarke stationary points.

6.3.1 Well-posedness

Lemma 6.2. For any $\tau > 0$, $x^k \in \mathbb{R}^n$, and $p^k \in \partial J(x^k)$, there exists an update $(x^{k+1}, \tilde{p}^{k+1} = p^{k+1} + q^{k+1})$ that satisfies (6.6).

Proof. As (6.6) consists of successive scalar updates, it is sufficient to consider a scalar problem, $v : \mathbb{R} \to \mathbb{R}$, $j : \mathbb{R} \to \mathbb{R}$. For $x \in \mathbb{R}$ and $p \in \partial j(x)$ we either want $y \neq x$ such that

$$p - \tau\,\frac{v(y) - v(x)}{y - x} \in \partial j(y), \tag{6.7}$$

or $y = x$ and $p - \tau w \in \partial j(x)$, for some $w \in \partial v(x)$.

If such a $w$ exists, we are done. Otherwise, we have $\min\{v^o(x; 1), v^o(x; -1)\} < 0$ and may assume that $v^o(x; 1) < 0$. In this case, we will show that there exists $y > x$ such that (6.7) holds.

Since $p - \tau v^o(x; 1) > p$ and $p \in \partial j(x)$, we deduce that $p - \tau v^o(x; 1) > \partial j(x)$. By the outer semicontinuity of subdifferentials and the definition of Clarke directional derivatives, there is $\delta > 0$ such that

$$p - \tau\,\frac{v(y) - v(x)}{y - x} > \partial j(y) \quad \text{for all } y \in (x, x + \delta).$$

On the other hand, as $v$ is bounded below, the difference quotient $\frac{v(y) - v(x)}{y - x}$


is bounded below for all $y \in [x + \delta, +\infty)$, while by $\mu$-convexity of $j$, we have $\partial j(y) \geq \partial j(x) + \mu (y - x)$ for all $y \in [x + \delta, +\infty)$. Hence, there is $r \gg 0$ such that

$$p - \tau\,\frac{v(y) - v(x)}{y - x} < \partial j(y) \quad \text{for all } y \geq x + r.$$

By continuity of $v$, and by outer semicontinuity of subdifferentials, it follows that there exists $y \in (x + \delta, x + r)$ that solves (6.7). This concludes the proof.
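The bracketing argument in the proof can be mirrored numerically. The following is a minimal sketch, not the method of the text: it assumes the smooth choice $j(y) = \mu y^2/2$, so that $\partial j(y)$ is the single value $\mu y$, and it assumes $v^o(x; 1) < 0$. The function name `solve_scalar_update` and the parameters `delta` and `tol` are hypothetical and introduced here for illustration only.

```python
from typing import Callable


def solve_scalar_update(
    v: Callable[[float], float],
    x: float,
    p: float,
    tau: float,
    mu: float = 1.0,
    delta: float = 1e-6,
    tol: float = 1e-12,
) -> float:
    """Find y > x solving p - tau*(v(y)-v(x))/(y-x) = mu*y, cf. (6.7).

    Assumption (not from the text): j(y) = mu*y**2/2, so the
    subdifferential of j at y is the single value mu*y.
    """

    def residual(y: float) -> float:
        # Left-hand side of (6.7) minus the (single-valued) subgradient of j.
        return p - tau * (v(y) - v(x)) / (y - x) - mu * y

    # Near x the residual should be positive (first half of the proof).
    lo = x + delta
    if residual(lo) <= 0.0:
        raise ValueError("residual is not positive at x + delta")

    # Far from x the residual becomes negative (second half of the proof);
    # search for such a point by doubling r.
    r = 1.0
    while residual(x + r) >= 0.0:
        r *= 2.0
        if r > 1e12:
            raise ValueError("no sign change found")
    hi = x + r

    # Bisection on the sign change between x + delta and x + r.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative data: v(y) = |y - 1| is nonsmooth and bounded below,
# x = 0, p = mu*x = 0, tau = 0.5.
y = solve_scalar_update(lambda t: abs(t - 1.0), x=0.0, p=0.0, tau=0.5)
print(y)
```

For this example the difference quotient equals $-1$ on $(0, 1]$, so (6.7) reduces to $\tfrac12 = y$ and the bisection should return a value close to $0.5$.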

Corollary 6.3. If $F : \mathbb{R}^n \to \mathbb{R}$ is convex, then there exists a unique solution $(x^{k+1}, \tilde{p}^{k+1} = p^{k+1} + q^{k+1})$ to (6.6).

Proof. The existence of a solution to (6.6) is guaranteed by Lemma 6.2. To establish uniqueness, we argue as follows. An update $y^{k,i}$ must satisfy

$$p_i^k - \tau_{k,i}\,\frac{F(y^{k,i}) - F(y^{k,i-1})}{x_i^{k+1} - x_i^k} \in [\partial J(y^{k,i})]_i.$$

The left-hand side is non-increasing with respect to $x_i^{k+1}$, because the difference quotient of the convex function $F$ is non-decreasing, while the right-hand side is strictly increasing, due to the strong convexity of $J$. Hence there cannot be two distinct solutions for $y_i^{k,i}$ to the scalar equation. This implies uniqueness of the update.
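The monotonicity argument can be checked numerically in the scalar setting of (6.7). The sketch below assumes a convex $v$ and the quadratic $j(y) = \mu y^2/2$; both choices, and the helper names `residual` and `is_strictly_decreasing`, are illustrative assumptions rather than part of the text. If the residual is strictly decreasing in $y$, it can cross zero at most once.

```python
def residual(y, x, p, tau, v, mu=1.0):
    """Left-hand side of (6.7) minus mu*y (assumes j(y) = mu*y**2/2)."""
    return p - tau * (v(y) - v(x)) / (y - x) - mu * y


def is_strictly_decreasing(values):
    """True if every entry is strictly smaller than its predecessor."""
    return all(b < a for a, b in zip(values, values[1:]))


def v(t):
    # Illustrative convex, nonsmooth function (an assumption).
    return abs(t - 1.0) + 0.5 * t * t


x, p, tau = 0.0, 0.0, 0.5
# Grid on both sides of x, excluding y = x where the quotient is undefined.
grid = [0.01 * k for k in range(-200, 501) if k != 0]
values = [residual(y, x, p, tau, v) for y in grid]
print(is_strictly_decreasing(values))
```

The check covers candidate updates on both sides of $x$, which matches the proof: the difference quotient of a convex function is non-decreasing in $y$ on all of $\mathbb{R} \setminus \{x\}$.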

Remark 6.4. If the update is stationary, i.e. $x_i^{k+1} = x_i^k$, then the subgradient update $p_i^{k+1}$ is unique only up to the choice of subderivative $v_i \in [\partial F(y^{k,i-1})]_i$.
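A small scalar example may help to see the non-uniqueness. The functions $v$ and $j$ and the values of $x$, $p$ and $\tau$ below are chosen purely for illustration and do not come from the text; the notation follows the scalar problem in the proof of Lemma 6.2.

```latex
\[
  v(y) = |y|, \qquad j(y) = \tfrac{1}{2} y^2 + |y|, \qquad
  x = 0, \qquad p = 0 \in \partial j(0) = [-1, 1], \qquad \tau = \tfrac{1}{2}.
\]
The stationary update $y = x = 0$ requires some $w \in \partial v(0) = [-1, 1]$ with
\[
  p - \tau w = -\tfrac{1}{2} w \in \partial j(0) = [-1, 1],
\]
which holds for every $w \in [-1, 1]$. Hence each value
$-\tfrac{1}{2} w \in [-\tfrac{1}{2}, \tfrac{1}{2}]$ is an admissible updated subgradient,
while the updated point $y = 0$ itself is unique.
```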

6.3.2 Convergence theorem

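Lemma 6.5. Let $F : \mathbb{R}^n \to \mathbb{R}$, $J : \mathbb{R}^n \to \mathbb{R}$, and $\Omega$ satisfy Assumption 6.1, and let $(x^k, p^k)_{k\in\mathbb{N}}$ be iterates that solve (6.6) for time steps $(\tau_k)_{k\in\mathbb{N}} \subset [\tau_{\min}, \tau_{\max}]$. Then the following properties hold.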

(i) $F(x^{k+1}) \leq F(x^k)$.

(ii) $\lim_{k\to\infty} \|x^{k+1} - x^k\| = 0$.

(iii) If $F$ is coercive, then there exists a convergent subsequence of $(x^k, p^k)_{k\in\mathbb{N}}$.

(iv) The set of limit points $S$ is compact, connected, and has empty interior. Furthermore, $F$ is single-valued on $S$.

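As $F$ is bounded below and $(F(x^k))_{k\in\mathbb{N}}$ is non-increasing, $F(x^k) \to F^*$ for some limit $F^*$. Therefore, by (6.5),

$$F(x^0) - F^* = \sum_{k=0}^{\infty} \bigl(F(x^k) - F(x^{k+1})\bigr) \geq \sum_{k=0}^{\infty} \frac{\mu}{\tau_k}\,\|x^k - x^{k+1}\|^2 \geq \frac{\mu}{\tau_{\max}} \sum_{k=0}^{\infty} \|x^k - x^{k+1}\|^2.$$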

This implies property (ii).
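The step from the summability estimate to property (ii) can be spelled out as follows; this is the standard argument, written in the notation already used in the proof.

```latex
For every $K \in \mathbb{N}$,
\begin{align*}
  \sum_{k=0}^{K} \|x^k - x^{k+1}\|^2
    \le \frac{\tau_{\max}}{\mu} \bigl(F(x^0) - F(x^{K+1})\bigr)
    \le \frac{\tau_{\max}}{\mu} \bigl(F(x^0) - F^*\bigr) < \infty ,
\end{align*}
so the partial sums are bounded and the series converges. The terms of a
convergent series tend to zero, hence $\|x^k - x^{k+1}\| \to 0$ as $k \to \infty$.
```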

Properties (iii) and (iv) follow from (i) and (ii) and are proven respectively in Lemma 4.3 and Lemma 4.4.
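Properties (i) and (ii) can be observed in a scalar special case. The sketch below is an illustration under stated assumptions, not the general scheme (6.6): it takes $j(y) = \mu y^2/2$ with $p = \mu x$, and the smooth objective $v(y) = (y - 1)^2$, for which the scalar update (6.7) has a closed form. The names `step` and `run` and the chosen time steps are hypothetical.

```python
def v(y):
    # Illustrative smooth objective (an assumption), minimised at y = 1.
    return (y - 1.0) ** 2


def step(x, tau, mu=1.0):
    """One scalar update of (6.7) with j(y) = mu*y**2/2 and p = mu*x.

    For v(y) = (y-1)^2 the difference quotient is y + x - 2, so (6.7)
    reads mu*(y - x) = -tau*(y + x - 2), which gives the formula below.
    """
    return x - 2.0 * tau * (x - 1.0) / (mu + tau)


def run(x0, taus, mu=1.0):
    """Iterate the scalar update for the given sequence of time steps."""
    xs = [x0]
    for tau in taus:
        xs.append(step(xs[-1], tau, mu))
    return xs


taus = [0.2, 0.5, 0.8] * 10          # time steps in [tau_min, tau_max]
xs = run(5.0, taus)
decreases = [v(a) - v(b) for a, b in zip(xs, xs[1:])]
steps_sq = [(b - a) ** 2 for a, b in zip(xs, xs[1:])]
print(min(decreases) >= -1e-12, abs(xs[-1] - xs[-2]))
```

In this special case the decrease per step equals $\frac{\mu}{\tau_k}(x^{k+1} - x^k)^2$ exactly, so the chain of inequalities in the proof holds with equality in its first step.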

We now state and prove the main result of this chapter.

Theorem 6.6. Let the sequence of iterates $(x^k, p^k)_{k\in\mathbb{N}}$ solve (6.6) for time steps $(\tau_k)_{k\in\mathbb{N}} \subset [\tau_{\min}, \tau_{\max}]$. Then all accumulation points $x^* \in S$ are Clarke stationary points restricted to $\Omega$.

Proof. Let $x^* \in S$ and consider a convergent subsequence $(x^{k_j})_{j\in\mathbb{N}}$. We want to show for each basis vector $e_i$ that either $F^o(x^*; e_i) \geq 0$ or $x_i^* = u_i$, and analogously that either $F^o(x^*; -e_i) \geq 0$ or $x_i^* = l_i$. As the arguments are identical, we only consider the first case.

Suppose for contradiction that $F^o(x^*; e_i) < -\eta$ for some $\eta > 0$, and that $x_i^* < u_i$. By the definition of the Clarke directional derivative, there are $\varepsilon, \delta > 0$ such that for all $x \in B_\varepsilon(x^*)$ and $\lambda \in (0, \delta)$, we have

$$\frac{F(x + \lambda e_i) - F(x)}{\lambda} \leq -\frac{\eta}{2}. \tag{6.8}$$

Since $x^{k_j} \to x^*$ and $\|x^{k_j+1} - x^{k_j}\| \to 0$, for each $N \in \mathbb{N}$ there exists $K$ such that for all $j \geq K$, we have $x^k \in B_\varepsilon(x^*)$ and $\|x^k - x^{k+1}\| < \delta$ for $k = k_j, k_j + 1, \ldots, k_j + N$. By making $\varepsilon > 0$ sufficiently small, we have $B_\varepsilon(x_i^*) < u_i$. Furthermore, since $x_i^{k+1} \geq x_i^k$ for $k = k_j, \ldots, k_j + N - 1$, we deduce that the constraint component $q_i^k$ is zero. By (6.8), it follows that

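$$p_i^{k_j} - p_i^{k_j+N} = \sum_{k=k_j}^{k_j+N-1} \bigl(p_i^k - p_i^{k+1}\bigr) = \sum_{k=k_j}^{k_j+N-1} \tau_{k,i}\,\frac{F(y^{k,i}) - F(y^{k,i-1})}{x_i^{k+1} - x_i^k} \leq -\tau_{\min} \sum_{k=k_j}^{k_j+N-1} \frac{\eta}{2} = -N \tau_{\min} \frac{\eta}{2}. \tag{6.9}$$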

By Assumption 6.1, $\partial j_i$ is bounded on $U = B_\varepsilon(x_i^*) \cap [l_i, u_i]$. Since $p_i^{k_j}, \ldots, p_i^{k_j+N} \in \partial j_i(U)$, we can choose $N$ such that $N \tau_{\min} \frac{\eta}{2} > \max \partial j_i(U) - \min \partial j_i(U)$ and arrive at a contradiction.
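The contradiction can be made explicit as follows. The threshold for $N$ is simply one admissible choice, obtained by rearranging the inequality above.

```latex
Since $p_i^{k_j}$ and $p_i^{k_j+N}$ both lie in $\partial j_i(U)$,
\[
  p_i^{k_j} - p_i^{k_j+N} \;\ge\; \min \partial j_i(U) - \max \partial j_i(U) .
\]
Combined with (6.9), this gives
\[
  N \tau_{\min} \frac{\eta}{2} \;\le\; \max \partial j_i(U) - \min \partial j_i(U) ,
\]
which fails for any
\[
  N \;>\; \frac{2 \bigl( \max \partial j_i(U) - \min \partial j_i(U) \bigr)}{\tau_{\min}\, \eta} .
\]
```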
