In addition to the preliminaries introduced in Sec. 1.4, we also need the following in the chapter. Define the recession function of a PCC function f as
rec f (d) = lim
α→∞
f (x + αd)− f(x)
α (5.1)
for any x∈ dom f. Loosely speaking, the recession function characterizes the asymp- totic change of f as we go in direction d. In fact,
f (x + αd) = α rec f (d) + o(α)
as α → ∞ for any x ∈ dom f. The recession function rec f : Rn → R ∪ {∞} is a
positively homogeneous PCC function. If h(x) = g(−x), then rec(h∗)(d) = rec(g∗)(−d). When f and g are PCC, either f (x)+g(x) =∞ for all x ∈ Rnor rec(f +g) = rec f +rec g. If f is PCC, then σdom f∗ = rec f .
Define the proximal operator Proxf :Rn→ Rn as
Proxf(z) = arg min x∈Rn
n
f (x) + (1/2)kx − zk2o.
When f is PCC, the arg min uniquely exists, and therefore Proxf is well-defined. When C is closed and convex, ProxδC = ΠC. When f is PCC, Proxf+ Proxf∗ = I, where
I :Rn→ Rn is the identity operator.
A mapping T :Rn → Rnis nonexpansive ifkT (x)−T (y)k ≤ kx−yk for all x, y ∈ Rn. Nonexpansive mappings are, by definition, Lipschitz continuous with Lipschitz constant 1. T :Rn→ Rn is firmly-nonexpansive if
kT (x) − T (y)k2 ≤ hx − y, T (x) − T (y)i
5.2.1 Duality and primal subvalue
We call the optimization problem
minimize
x∈Rn f (x) + g(x), (P)
the primal problem. We call the optimization problem maximize
ν∈Rn −f
∗(ν)− g∗(−ν), (D)
the dual problem. Throughout this chapter, we assume f and g are PCC.
(P) is feasible if 0 ∈ dom f − dom g, strongly infeasible if 0 /∈ dom f − dom g, and weakly infeasible otherwise. (P) falls under exactly one of the three cases. (P) is infeasible if it is not feasible. (D) is feasible if 0 ∈ dom (f∗) + dom (g∗), strongly infeasible if 0 /∈ dom (f∗) + dom (g∗), and weakly infeasible otherwise.
We call p⋆ = inf{f(x)+g(x) | x ∈ Rn} the primal optimal value and d⋆ = sup{−f∗(ν)− g∗(−ν) | ν ∈ Rn} the dual optimal value. We let p⋆ =∞ if (P) is infeasible and d⋆ =−∞
if (D) is infeasible. Weak duality, which always holds, states d⋆ ≤ p⋆. We say strong duality holds between (P) and (D), if d⋆ = p⋆ ∈ [−∞, ∞]. We say total duality holds
between (P) and (D), if (P) has a solution, (D) has a solution, and strong duality holds. Define the primal subvalue of (P) as
p− = lim
ε→0+x,yinf∈Rn{f(x) + g(y) | kx − yk ≤ ε} .
The notion of primal subvalue is standard in conic programming [118, 234, 149, 147]. Here, we generalize it to general convex programs. The following theorem is well known [195], although we have not seen it stated exactly in this form.
Theorem 5.2.1. If f and g are PCC, then d⋆ = p− ≤ p⋆.
Proof. Write h(ν) = f∗(ν) + g∗(−ν), and define
p(δ) = min
x∈Rn{f(x + δ) + g(x)}.
Since f and g are proper, i.e., finite somewhere, p is proper. Since p is defined by partial minimization of a convex function, it is convex.
Then p∗(ν) =− min δ∈Rn{p(δ) − ν Tδ} =− min δ∈Rn min x∈Rn{f(x + δ) + g(x)} − ν Tδ =− min x∈Rn min δ∈Rn{f(x + δ) − ν T δ} + g(x) =− min x∈Rn min δ0∈Rn{f(δ 0)− νTδ0} + νTx + g(x) = f∗(ν)− min x∈Rn n νTx + g(x)o = f∗(ν) + g∗(−ν) = h(ν).
We can rewrite the definition of the primal subvalue as
p−= lim
ε→0kδk≤εinf p(δ) = lim infδ→0 p(δ),
where the second equality follows from the definition of lim inf. The lower semi- continuous hull of p is p∗∗ [195, Theorem 4 and 5], i.e.,
lim inf δ→0 p(δ) = p ∗∗(0). So p−= p∗∗(0) = h∗(0) = sup ν∈Rn{f ∗(ν) + g∗(−ν)} = d∗.
With Theorem 5.2.1, we can interpret strong duality as well-posedness of (P). The primal subvalue p−is the optimal value of (P) achieved with infinitesimal infeasibilities. When the infinitesimal infeasibilities provide a non-infinitestimal improvement to the function value, we can consider (P) ill-posed.
5.2.2 Douglas–Rachford operator
Douglas–Rachford splitting (DRS) applied to (P) is
xk+1/2= Proxγf(zk)
xk+1 = Proxγg(2xk+1/2− zk) (5.2)
zk+1 = zk+ xk+1− xk+1/2
with a starting point z0 ∈ Rn and a parameter γ > 0. We also express this iteration
more concisely as zk+1 = Tγ(zk) where Tγ =
1 2I +
1
2(2 Proxγg−I)(2 Proxγf−I).
Tγ :Rn→ Rn is a firmly-nonexpansive operator, and we interpret DRS as a fixed-point
iteration. Write T1 for Tγ with γ = 1.
The standard analysis of DRS assumes total duality, which, again, means (P) has a solution, (D) has a solution, and d⋆ = p⋆.
Theorem 5.2.2 (Theorem 7.1 and 8.1 of [13] and Proposition 4.8 of [80]). Total duality
holds between (P) and (D) if and only if Tγ has a fixed point for some γ > 0. If total duality holds between (P) and (D), then DRS converges in that zk → z⋆, where
x⋆ = Prox
γf(z⋆) is a solution of (P). If total duality does not hold between (P) and
(D), then DRS diverges in that kzkk → ∞.
Theorem 5.2.2 is well known, although the term “total duality” is not always used. More often, total duality is assumed by instead assuming a saddle point exists for an appropriate Lagrangian.
5.2.3 Fixed-point iterations without fixed points
Theorem 5.2.2 states the DRS iteration has no fixed points under pathologies. Ana- lyzing fixed-point iterations without fixed points is the first part of our pathological analysis.
Let T :Rn → Rn be a firmly-nonexpansive operator. Write
ran (I− T ) = {z − T (z) | z ∈ Rn}.
Note that T has a fixed point if and only if 0∈ ran (I − T ). The closure of this set,
ran (I− T ), is closed and convex [172]. We call
v = Πran (I−T )(0)
the infimal displacement vector of T . (The term was coined in [22].) If T has a fixed point, then v = 0, but v = 0 is possible even when T has no fixed point.
The following classical result by Pazy and Baillon et al. elegantly characterizes the asymptotic behavior of fixed-point iterations with respect to T .
Theorem 5.2.3 (Theorem 2 of [172] and Corollary 2.3 of [11]). If T is firmly-nonexpansive
and v is its infimal displacement vector, the iteration zk+1 = T (zk) satisfies
zk =−kv + o(k), zk+1− zk → −v.
Theorem 5.2.3 is especially powerful when we can concretely characterize v. Re- cently, Bauschke, Hare, and Moursi published the following elegant formula.
Theorem 5.2.4 ([23]). The infimal displacement vector v of T1, the DRS operator,
satisfies
v = arg minnkzk | z ∈ dom f − dom g ∩ dom f∗+ dom g∗o.
The original result in [23] is more general as it applies to the DRS operator of monotone operators. In Section5.3.1, we use Theorem5.2.4and the notion of improving directions to provide a further concrete characterization of v.