9.3 Linear Programming
9.3.3 Weak Duality
Consider the problems (PS) and (DS). Say that $x$ is feasible if $x \ge 0$ and $Ax = b$. Let $F$ be the set of feasible $x$. Say that $y$ is feasible if $A^T y \le c$.
The Weak Duality Theorem is the following:
Theorem 9.10 Let $x$ and $y$ be feasible vectors. Then
$$z = c^T x \ge b^T y = w. \tag{9.8}$$
Corollary 9.1 If $z$ is not bounded below, then there are no feasible $y$.

Corollary 9.2 If $x$ and $y$ are both feasible, and $z = w$, then both $x$ and $y$ are optimal for their respective problems.
The proofs of the theorem and its corollaries are left as exercises.
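The nonnegative quantity $c^T x - b^T y$ is called the duality gap. The complementary slackness condition says that, for optimal $x$ and $y$, we have
$$x_j\left(c_j - (A^T y)_j\right) = 0, \tag{9.9}$$
for each $j$. Since $Ax = b$ gives $c^T x - b^T y = \sum_j x_j\left(c_j - (A^T y)_j\right)$, a sum of nonnegative terms, condition (9.9) says exactly that the duality gap is zero. Primal-dual algorithms for solving linear programming problems are based on finding sequences $\{x^k\}$ and $\{y^k\}$ that drive the duality gap down to zero [209].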
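The sketch below checks the weak duality inequality (9.8) and the complementary slackness condition (9.9) on a small example. It assumes, consistent with the feasibility conditions above, that (PS) minimizes $z = c^T x$ and (DS) maximizes $w = b^T y$; the data $A$, $b$, $c$ and all helper names are hypothetical, chosen only for illustration. The function `duality_gap` returns the gap computed directly and as $\sum_j x_j (c_j - (A^T y)_j)$; integer data keep the arithmetic exact.

```python
# Weak duality and complementary slackness on a small hypothetical instance.
# Assumption: (PS) minimizes c^T x over {x >= 0, Ax = b};
#             (DS) maximizes b^T y over {A^T y <= c}.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def mat_vec(A, x):
    """Return Ax, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def mat_t_vec(A, y):
    """Return A^T y by pairing y with the columns of A."""
    return [dot(col, y) for col in zip(*A)]

def primal_feasible(A, b, x):
    return all(xj >= 0 for xj in x) and mat_vec(A, x) == b

def dual_feasible(A, c, y):
    return all(aty <= cj for aty, cj in zip(mat_t_vec(A, y), c))

def duality_gap(A, c, b, x, y):
    """Return (c^T x - b^T y, sum_j x_j (c_j - (A^T y)_j)); the two agree when Ax = b."""
    gap = dot(c, x) - dot(b, y)
    slack = [cj - aty for cj, aty in zip(c, mat_t_vec(A, y))]
    return gap, dot(x, slack)

A = [[1, 1, 0],
     [0, 1, 1]]
b = [2, 3]
c = [1, 2, 3]

x, y = [1, 1, 2], [1, 1]            # feasible, but not optimal
assert primal_feasible(A, b, x) and dual_feasible(A, c, y)
print(duality_gap(A, c, b, x, y))   # (4, 4)

x_opt, y_opt = [0, 2, 1], [-1, 3]   # each product x_j (c_j - (A^T y)_j) is zero
assert primal_feasible(A, b, x_opt) and dual_feasible(A, c, y_opt)
print(duality_gap(A, c, b, x_opt, y_opt))  # (0, 0)
```

For the first pair the gap is 4, so $z = 9 > w = 5$; for the second pair every product in (9.9) vanishes, the gap is 0, and $z = w = 7$, so both vectors are optimal.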
9.3.4 Strong Duality
The Strong Duality Theorems make a stronger statement. The following theorems are well-known examples.
Theorem 9.11 If one of the problems (PS) or (DS) has an optimal solution, then so does the other, and $z = w$ for the optimal vectors.
Theorem 9.12 (Gale's Strong Duality Theorem [141]) If both problems (PC) and (DC) have feasible solutions, then both have optimal solutions and the optimal values are equal.
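Proof: We show that there are non-negative vectors $x$ and $y$ such that $Ax \ge b$, $A^T y \le c$, and $b^T y - c^T x \ge 0$. Since weak duality gives the reverse inequality $c^T x \ge b^T y$, it will then follow that $z = c^T x = b^T y = w$, so that $x$ and $y$ are both optimal. In matrix notation, we want to find $x \ge 0$ and $y \ge 0$ such that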
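$$\begin{bmatrix} A & 0 \\ 0 & -A^T \\ -c^T & b^T \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \ge \begin{bmatrix} b \\ -c \\ 0 \end{bmatrix}. \tag{9.10}$$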
We assume that there are no $x \ge 0$ and $y \ge 0$ for which the inequalities in (9.10) hold. Then, according to Theorem 9.7, there are non-negative vectors $s$ and $t$, and a non-negative scalar $\rho$, such that
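$$\begin{bmatrix} -A^T & 0 & c \\ 0 & A & -b \end{bmatrix} \begin{bmatrix} s \\ t \\ \rho \end{bmatrix} \ge 0, \tag{9.11}$$
and
$$\begin{bmatrix} -b^T & c^T & 0 \end{bmatrix} \begin{bmatrix} s \\ t \\ \rho \end{bmatrix} < 0. \tag{9.12}$$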
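Note that $\rho$ cannot be zero, for then we would have $A^T s \le 0$ and $At \ge 0$. Taking feasible vectors $x$ and $y$, we would find that $s^T A x \le 0$, which implies that $b^T s \le 0$ (because $s \ge 0$ and $Ax \ge b$ give $b^T s \le s^T A x$), and $t^T A^T y \ge 0$, which implies that $c^T t \ge 0$ (because $t \ge 0$ and $A^T y \le c$ give $c^T t \ge t^T A^T y$). Therefore, we could not also have $c^T t - b^T s < 0$, as (9.12) requires.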
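Writing out the inequalities in (9.11), we have $A^T s \le \rho c$ and $At \ge \rho b$; since $s \ge 0$ and $t \ge 0$, it follows that
$$\rho c^T t \ge s^T A t \ge s^T(\rho b) = \rho s^T b.$$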
Using $\rho > 0$, we find that
$$c^T t \ge b^T s,$$
which contradicts (9.12). Therefore, there do exist $x \ge 0$ and $y \ge 0$ such that $Ax \ge b$, $A^T y \le c$, and $b^T y - c^T x \ge 0$.
In his book [141], Gale uses his strong duality theorem to obtain a proof of the min-max theorem in game theory (see [70]).
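As a sanity check of Theorem 9.12, the sketch below takes a small hypothetical instance with two variables. It assumes, as the inequalities in the proof indicate, that (PC) is to minimize $c^T x$ subject to $Ax \ge b$, $x \ge 0$, and (DC) is to maximize $b^T y$ subject to $A^T y \le c$, $y \ge 0$. Both optimal values are found by brute-force enumeration of vertices in exact rational arithmetic. This is neither Gale's argument nor a practical LP method: it only works in two dimensions and relies on the optimum being attained at a vertex, which holds here because both problems are feasible and the constraints $x \ge 0$, $y \ge 0$ guarantee that the feasible sets have vertices.

```python
from fractions import Fraction
from itertools import combinations

def vertices(cons):
    """Yield points of R^2 where two lines a.u = beta meet and every a.u >= beta holds."""
    for (a1, b1), (a2, b2) in combinations(cons, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel lines: no unique intersection
        # Cramer's rule for a1.u = b1, a2.u = b2.
        u = ((b1 * a2[1] - a1[1] * b2) / det,
             (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * u[0] + a[1] * u[1] >= beta for a, beta in cons):
            yield u

def min_over_vertices(obj, cons):
    """Minimum of obj.u over the feasible vertices (assumes the optimum is at a vertex)."""
    return min(obj[0] * u[0] + obj[1] * u[1] for u in vertices(cons))

F = Fraction
A = [[F(1), F(2)],
     [F(3), F(1)]]
b = [F(4), F(6)]
c = [F(2), F(3)]
nonneg = [((F(1), F(0)), F(0)), ((F(0), F(1)), F(0))]

# (PC): minimize c^T x subject to Ax >= b, x >= 0.
primal_cons = [(tuple(row), bi) for row, bi in zip(A, b)] + nonneg
# (DC): maximize b^T y subject to A^T y <= c, y >= 0, rewritten as -A^T y >= -c.
dual_cons = [(tuple(-a for a in col), -cj) for col, cj in zip(zip(*A), c)] + nonneg

z_opt = min_over_vertices(c, primal_cons)
w_opt = -min_over_vertices([-bi for bi in b], dual_cons)
print(z_opt, w_opt)  # 34/5 34/5
```

Both optimal values come out as $34/5$, attained at $x = (8/5, 6/5)$ and $y = (7/5, 1/5)$, as the theorem predicts.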
Part III
Algorithms
Chapter 10
Fixed-Point Methods
10.1 Chapter Summary
10.2 Operators
10.3 Contractions
10.3.1 Lipschitz Continuity
10.3.1.1 An Example: Bounded Derivative
10.3.1.2 Another Example: Lipschitz Gradients
10.3.2 Non-expansive Operators
10.3.3 Strict Contractions
10.3.4 Eventual Strict Contractions
10.3.5 Instability
10.4 Gradient Descent
10.4.1 Using Sequential Unconstrained Minimization
10.4.2 Proving Convergence
10.4.3 An Example: Least Squares
10.5 Two Useful Identities
10.6 Orthogonal Projection Operators
10.6.1 Properties of the Operator $P_C$
10.6.1.1 $P_C$ is Non-expansive
10.6.1.2 $P_C$ is Firmly Non-expansive
10.6.1.3 The Search for Other Properties of $P_C$
10.7 Averaged Operators
10.7.1 Gradient Operators
10.7.2 The Krasnoselskii-Mann Theorem
10.8 Affine Linear Operators
10.8.1 The Hermitian Case
10.8.2 Example: Landweber's Algorithm
10.8.3 What if $B$ is not Hermitian?
10.9 Paracontractive Operators
10.9.1 Diagonalizable Linear Operators
10.9.2 Linear and Affine Paracontractions
10.9.3 The Elsner-Koltracht-Neumann Theorem
10.10 Applications of the KM Theorem
10.10.1 The ART
10.10.2 The CQ Algorithm
10.10.3 Landweber's Algorithm
10.10.4 Projected Landweber's Algorithm
10.10.5 Successive Orthogonal Projection
10.1 Chapter Summary
In a broad sense, all iterative algorithms generate a sequence $\{x^k\}$ of vectors. The sequence may converge for any starting vector $x^0$, or may converge only if $x^0$ is sufficiently close to a solution. The limit, when it exists, may depend on $x^0$, and may, or may not, solve the original problem. Convergence to the limit may be slow, and the algorithm may need to be accelerated. The algorithm may involve measured data. The limit may be sensitive to noise in the data, and the algorithm may need to be regularized to lessen this sensitivity. The algorithm may be quite general, applying to all problems in a broad class, or it may be tailored to the problem at hand. Each step of the algorithm may be costly, with only a few steps generally needed to produce a suitable approximate answer, or each step may be easily performed, with many such steps needed. Although convergence of an algorithm is important in theory, in practice only a few iterative steps are sometimes used. In this chapter we consider several classes of operators that play important roles in applied linear algebra.
10.2 Operators
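A function $T : \mathbb{R}^J \to \mathbb{R}^J$ is often called an operator on $\mathbb{R}^J$. For most of the iterative algorithms we shall consider, the iterative step is
$$x^{k+1} = T x^k, \tag{10.1}$$
for some operator $T$. If $T$ is a continuous operator (and it usually is), and the sequence $\{T^k x^0\}$ converges to $\hat{x}$, then $T\hat{x} = \hat{x}$, that is, $\hat{x}$ is a fixed point of the operator $T$: letting $k \to \infty$ in (10.1) and using the continuity of $T$ gives $\hat{x} = \lim_k x^{k+1} = \lim_k T x^k = T\hat{x}$. We denote by $\mathrm{Fix}(T)$ the set of fixed points of $T$. The convergence of the iterative sequence $\{T^k x^0\}$ will depend on the properties of the operator $T$.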
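A minimal sketch of the iteration (10.1) in Python follows. The operator `T` is a hypothetical affine map on $\mathbb{R}^2$ chosen only for illustration (its fixed point is $(10/7, 6/7)$), and the stopping rule, which halts once successive iterates differ by at most `tol` in the max-norm, is an assumption of this sketch: a small step does not by itself prove convergence.

```python
def fixed_point_iteration(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x^{k+1} = T x^k until successive iterates differ by at most tol (max-norm)."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        x_next = T(x)
        if max(abs(a - b) for a, b in zip(x_next, x)) <= tol:
            return x_next, k
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

def T(x):
    """Hypothetical affine operator on R^2; its unique fixed point is (10/7, 6/7)."""
    return [0.5 * x[1] + 1.0, 0.25 * x[0] + 0.5]

x_hat, steps = fixed_point_iteration(T, [0.0, 0.0])
print(x_hat, steps)
```

The returned `x_hat` satisfies $T\hat{x} \approx \hat{x}$ to within the tolerance, illustrating that the limit of the iteration is a member of $\mathrm{Fix}(T)$.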
10.3 Contractions
Contraction operators are perhaps the best-known class of operators associated with iterative algorithms.
10.3.1 Lipschitz Continuity
Definition 10.1 An operator $T$ on $\mathbb{R}^J$ is Lipschitz continuous, with respect to a vector norm $\|\cdot\|$, or $L$-Lipschitz, if there is a positive constant $L$ such that
$$\|Tx - Ty\| \le L\|x - y\|, \tag{10.2}$$
for all $x$ and $y$ in $\mathbb{R}^J$.
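Definition 10.1 can be probed numerically, with one caveat: sampling random pairs only yields a lower bound on the smallest valid $L$, never a proof that (10.2) holds. The sketch below uses the 2-norm; the helper names, the Gaussian sampling scheme, and the diagonal operator $Tx = \mathrm{diag}(2, 0.5)\,x$ (which is 2-Lipschitz, with no smaller constant working) are hypothetical choices for illustration.

```python
import math
import random

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def lipschitz_lower_bound(T, dim, samples=10_000, seed=0):
    """Largest observed ratio ||Tx - Ty||_2 / ||x - y||_2 over random pairs.

    Any L satisfying (10.2) in the 2-norm is at least this large;
    sampling cannot certify that a given L is valid.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        d = norm2([xi - yi for xi, yi in zip(x, y)])
        if d == 0.0:
            continue
        diff = [a - b for a, b in zip(T(x), T(y))]
        best = max(best, norm2(diff) / d)
    return best

def T(x):
    """Hypothetical linear operator x -> diag(2, 0.5) x."""
    return [2.0 * x[0], 0.5 * x[1]]

print(lipschitz_lower_bound(T, dim=2))  # close to 2
```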
10.3.1.1 An Example: Bounded Derivative
We know from the Mean Value Theorem that, for any differentiable function $f : \mathbb{R} \to \mathbb{R}$,
$$f(x) - f(y) = f'(c)(x - y),$$
where $c$ is between $x$ and $y$. Suppose that there is a positive constant $L$ such that $|f'(c)| \le L$ for all real $c$. Then
$$|f(x) - f(y)| \le L|x - y|,$$
for all $x$ and $y$. Therefore, the function $f$ is $L$-Lipschitz. For example, the function $f(x) = \frac{1}{2}\cos(x)$ is $\frac{1}{2}$-Lipschitz, since $|f'(x)| = \frac{1}{2}|\sin(x)| \le \frac{1}{2}$.
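The example $f(x) = \frac{1}{2}\cos(x)$ can be checked directly. The sketch below samples pairs from the arbitrary interval $[-10, 10]$ to confirm that the observed ratio $|f(x) - f(y)|/|x - y|$ never exceeds $\frac{1}{2}$, and then iterates $x^{k+1} = f(x^k)$. Because $L = \frac{1}{2} < 1$, $f$ is a strict contraction, so by the Banach fixed-point theorem the iteration converges to the unique solution of $x = \frac{1}{2}\cos(x)$ (strict contractions are treated later in this chapter). The seed and sample count are illustrative choices.

```python
import math
import random

def f(x):
    return 0.5 * math.cos(x)

# Empirical check of |f(x) - f(y)| <= (1/2)|x - y|.
rng = random.Random(1)
worst_ratio = 0.0
for _ in range(10_000):
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
    if x != y:
        worst_ratio = max(worst_ratio, abs(f(x) - f(y)) / abs(x - y))

# Fixed-point iteration x^{k+1} = f(x^k); L = 1/2 < 1 makes f a strict contraction.
x = 0.0
for _ in range(100):
    x = f(x)

print(worst_ratio, x)
```

The iterate settles near $0.45018$, where $x = \frac{1}{2}\cos(x)$.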
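More generally, if $f$ is a real-valued differentiable function of $J$ real variables, that is, $f : \mathbb{R}^J \to \mathbb{R}$, and the gradient satisfies $\|\nabla f(c)\|_2 \le L$ for all $c$ in $\mathbb{R}^J$, then
$$|f(x) - f(y)| \le L\|x - y\|_2,$$
so that $f$ is $L$-Lipschitz with respect to the 2-norm. To see this, apply the Mean Value Theorem to $g(t) = f(y + t(x - y))$ on $[0, 1]$ to obtain $f(x) - f(y) = \nabla f(c) \cdot (x - y)$ for some $c$ on the segment joining $y$ and $x$, and then use the Cauchy-Schwarz inequality.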
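A quick numerical illustration of the gradient bound, under stated assumptions: the function $f(x) = \sin(x_1) + \cos(x_2)$ on $\mathbb{R}^2$ is a hypothetical example whose gradient $(\cos x_1, -\sin x_2)$ satisfies $\|\nabla f(c)\|_2^2 = \cos^2 c_1 + \sin^2 c_2 \le 2$, so $L = \sqrt{2}$. The sampling box $[-5, 5]^2$ and the seed are arbitrary; the check can only fail to find a violation, not prove the bound.

```python
import math
import random

def f(v):
    return math.sin(v[0]) + math.cos(v[1])

def grad_f(v):
    return [math.cos(v[0]), -math.sin(v[1])]

L = math.sqrt(2.0)  # ||grad f(c)||_2^2 = cos^2(c1) + sin^2(c2) <= 2 for all c

rng = random.Random(2)
max_grad_norm = 0.0
worst_ratio = 0.0
for _ in range(10_000):
    x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    y = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    max_grad_norm = max(max_grad_norm, math.hypot(*grad_f(x)))
    dist = math.hypot(x[0] - y[0], x[1] - y[1])
    if dist > 0.0:
        worst_ratio = max(worst_ratio, abs(f(x) - f(y)) / dist)

print(max_grad_norm, worst_ratio, L)
```

Both sampled quantities stay at or below $\sqrt{2}$, as the argument above predicts.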
10.3.1.2 Another Example: Lipschitz Gradients
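If $f : \mathbb{R}^J \to \mathbb{R}$ is twice differentiable and $\|\nabla^2 f(c)\|_2 \le L$ for all $c$ in $\mathbb{R}^J$, where $\|\nabla^2 f(c)\|_2$ is the matrix norm of the Hessian induced by the vector 2-norm, then $T = \nabla f$ is $L$-Lipschitz with respect to the 2-norm.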
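The simplest case is a quadratic $f(x) = \frac{1}{2}x^T Q x$ with $Q$ symmetric, for which $\nabla f(x) = Qx$ and $\nabla^2 f = Q$ everywhere, so the bound is $L = \|Q\|_2$, the largest eigenvalue of $Q$ in absolute value. The sketch below uses a hypothetical $2 \times 2$ matrix $Q$ with eigenvalues 3 and 1, computes $L$ from the closed-form eigenvalues of a symmetric $2 \times 2$ matrix, and compares it with sampled ratios $\|\nabla f(x) - \nabla f(y)\|_2 / \|x - y\|_2$. Sampling illustrates, but does not prove, the bound.

```python
import math
import random

Q = [[2.0, 1.0],
     [1.0, 2.0]]  # hypothetical symmetric Hessian of f(x) = (1/2) x^T Q x

def grad_f(x):
    """T = grad f; for this quadratic, T x = Q x."""
    return [Q[0][0] * x[0] + Q[0][1] * x[1],
            Q[1][0] * x[0] + Q[1][1] * x[1]]

# For symmetric Q, ||Q||_2 = max |eigenvalue|; eigenvalues are tr/2 +- sqrt(tr^2/4 - det).
tr = Q[0][0] + Q[1][1]
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
disc = math.sqrt(tr * tr / 4 - det)
L = max(abs(tr / 2 + disc), abs(tr / 2 - disc))

rng = random.Random(3)
worst_ratio = 0.0
for _ in range(10_000):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    dist = math.hypot(x[0] - y[0], x[1] - y[1])
    if dist == 0.0:
        continue
    gx, gy = grad_f(x), grad_f(y)
    worst_ratio = max(worst_ratio, math.hypot(gx[0] - gy[0], gx[1] - gy[1]) / dist)

print(L, worst_ratio)
```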