Weak Duality - Linear Programming - Applied and Computational Linear Algebra: A First Course -

9.3 Linear Programming

9.3.3 Weak Duality

Consider the problems (PS) and (DS). Say that $x$ is feasible if $x \geq 0$ and $Ax = b$. Let $F$ be the set of feasible $x$. Say that $y$ is feasible if $A^T y \leq c$.

The Weak Duality Theorem is the following:

Theorem 9.10 Let $x$ and $y$ be feasible vectors. Then

$$z = c^T x \geq b^T y = w. \tag{9.8}$$

Corollary 9.1 If $z$ is not bounded below, then there are no feasible $y$.

Corollary 9.2 If $x$ and $y$ are both feasible, and $z = w$, then both $x$ and $y$ are optimal.

The proofs of the theorem and its corollaries are left as exercises.

The nonnegative quantity $c^T x - b^T y$ is called the duality gap. The complementary slackness condition says that, for optimal $x$ and $y$, we have

$$x_j \left( c_j - (A^T y)_j \right) = 0, \tag{9.9}$$

for each $j$. Summing over $j$ and using $Ax = b$ gives $c^T x - b^T y = 0$, which says that the duality gap is zero. Primal-dual algorithms for solving linear programming problems are based on finding sequences $\{x^k\}$ and $\{y^k\}$ that drive the duality gap down to zero [209].
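The duality gap and the products in (9.9) are easy to compute for a small problem. The sketch below is only an illustration: the data `A`, `b`, `c` and all function names are hypothetical, and it assumes that (PS) minimizes $c^T x$ subject to $Ax = b$, $x \geq 0$, and that (DS) maximizes $b^T y$ subject to $A^T y \leq c$.

```python
# Hypothetical toy data (not from the text). Assumed problem pair:
#   (PS) minimize c^T x  subject to  A x = b, x >= 0
#   (DS) maximize b^T y  subject to  A^T y <= c
A = [[1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def primal_feasible(x, tol=1e-12):
    """x >= 0 and A x = b, up to a small tolerance."""
    nonneg = all(xj >= -tol for xj in x)
    equal = all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b))
    return nonneg and equal

def dual_feasible(y, tol=1e-12):
    """A^T y <= c, componentwise."""
    return all(r <= cj + tol for r, cj in zip(matvec(transpose(A), y), c))

def duality_gap(x, y):
    """c^T x - b^T y, nonnegative for feasible x and y."""
    return dot(c, x) - dot(b, y)

def slackness_products(x, y):
    """The products x_j * (c_j - (A^T y)_j) from (9.9)."""
    aty = matvec(transpose(A), y)
    return [xj * (cj - aj) for xj, cj, aj in zip(x, c, aty)]

x_feas, y_feas = [0.5, 0.5], [0.0]   # feasible, but not optimal
x_opt, y_opt = [1.0, 0.0], [1.0]     # feasible with z = w

print(duality_gap(x_feas, y_feas))        # positive gap
print(duality_gap(x_opt, y_opt))          # zero gap
print(slackness_products(x_opt, y_opt))   # every product vanishes
```

For the non-optimal pair the gap is strictly positive, while for the pair with $z = w$ both the gap and every product in (9.9) are zero.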

9.3.4 Strong Duality

The Strong Duality Theorems make a stronger statement. The following theorems are well-known examples.

Theorem 9.11 If one of the problems (PS) or (DS) has an optimal solution, then so does the other, and $z = w$ for the optimal vectors.

Theorem 9.12 (Gale’s Strong Duality Theorem [141]) If both problems (PC) and (DC) have feasible solutions, then both have optimal solutions and the optimal values are equal.

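Proof: We show that there are non-negative vectors $x$ and $y$ such that $Ax \geq b$, $A^T y \leq c$, and $b^T y - c^T x \geq 0$. It will then follow that $z = c^T x = b^T y = w$, so that $x$ and $y$ are both optimal. In matrix notation, we want to find $x \geq 0$ and $y \geq 0$ such that

$$\begin{bmatrix} A & 0 \\ 0 & -A^T \\ -c^T & b^T \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \geq \begin{bmatrix} b \\ -c \\ 0 \end{bmatrix}. \tag{9.10}$$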

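We assume that there are no $x \geq 0$ and $y \geq 0$ for which the inequalities in (9.10) hold. Then, according to Theorem 9.7, there are non-negative vectors $s$ and $t$, and a non-negative scalar $\rho$, such that

$$\begin{bmatrix} -A^T & 0 & c \\ 0 & A & -b \end{bmatrix} \begin{bmatrix} s \\ t \\ \rho \end{bmatrix} \geq 0, \tag{9.11}$$

and

$$\begin{bmatrix} -b^T & c^T & 0 \end{bmatrix} \begin{bmatrix} s \\ t \\ \rho \end{bmatrix} < 0. \tag{9.12}$$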

Note that $\rho$ cannot be zero, for then we would have $A^T s \leq 0$ and $At \geq 0$. Taking feasible vectors $x$ and $y$, we would find that $s^T A x \leq 0$, which implies that $b^T s \leq 0$, and $t^T A^T y \geq 0$, which implies that $c^T t \geq 0$. Therefore, we could not also have $c^T t - b^T s < 0$.

Writing out the inequalities, we have

$$\rho c^T t \geq s^T A t \geq s^T (\rho b) = \rho s^T b.$$

Using $\rho > 0$, we find that

$$c^T t \geq b^T s,$$

which contradicts (9.12). Therefore, there do exist $x \geq 0$ and $y \geq 0$ such that $Ax \geq b$, $A^T y \leq c$, and $b^T y - c^T x \geq 0$.
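To make the block system (9.10) concrete, the sketch below assembles it for a tiny example and tests two candidate pairs. It is only an illustration: the data and the names `build_system` and `satisfies` are hypothetical, and it assumes that (PC) minimizes $c^T x$ subject to $Ax \geq b$, $x \geq 0$, and that (DC) maximizes $b^T y$ subject to $A^T y \leq c$, $y \geq 0$.

```python
# Hypothetical toy data (not from the text). Assumed canonical-form pair:
#   (PC) minimize c^T x  subject to  A x >= b, x >= 0
#   (DC) maximize b^T y  subject to  A^T y <= c, y >= 0
A = [[1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0]

def build_system(A, b, c):
    """Return (M, d) so that (9.10) reads M [x; y] >= d."""
    m, n = len(A), len(A[0])
    rows = []
    for i in range(m):                          # block row [ A     0   ]
        rows.append(list(A[i]) + [0.0] * m)
    for j in range(n):                          # block row [ 0   -A^T  ]
        rows.append([0.0] * n + [-A[i][j] for i in range(m)])
    rows.append([-cj for cj in c] + list(b))    # block row [ -c^T  b^T ]
    d = list(b) + [-cj for cj in c] + [0.0]
    return rows, d

def satisfies(M, d, v, tol=1e-12):
    """Check M v >= d componentwise, up to a small tolerance."""
    return all(sum(a * vi for a, vi in zip(row, v)) >= di - tol
               for row, di in zip(M, d))

M, d = build_system(A, b, c)

x_opt, y_opt = [1.0, 0.0], [1.0]     # feasible pair with c^T x = b^T y
x_feas, y_feas = [0.5, 0.5], [0.0]   # feasible pair with c^T x > b^T y

print(satisfies(M, d, x_opt + y_opt))
print(satisfies(M, d, x_feas + y_feas))
```

The first pair satisfies every row of (9.10). The second pair is feasible for both problems but fails the last row, since $b^T y - c^T x < 0$ there.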

In his book [141], Gale uses his strong duality theorem to obtain a proof of the min-max theorem in game theory (see [70]).

Part III

Algorithms

Chapter 10 Fixed-Point Methods

10.1 Chapter Summary
10.2 Operators
10.3 Contractions
  10.3.1 Lipschitz Continuity
    10.3.1.1 An Example: Bounded Derivative
    10.3.1.2 Another Example: Lipschitz Gradients
  10.3.2 Non-expansive Operators
  10.3.3 Strict Contractions
  10.3.4 Eventual Strict Contractions
  10.3.5 Instability
10.4 Gradient Descent
  10.4.1 Using Sequential Unconstrained Minimization
  10.4.2 Proving Convergence
  10.4.3 An Example: Least Squares
10.5 Two Useful Identities
10.6 Orthogonal Projection Operators
  10.6.1 Properties of the Operator $P_C$
    10.6.1.1 $P_C$ is Non-expansive
    10.6.1.2 $P_C$ is Firmly Non-expansive
    10.6.1.3 The Search for Other Properties of $P_C$
10.7 Averaged Operators
  10.7.1 Gradient Operators
  10.7.2 The Krasnoselskii-Mann Theorem
10.8 Affine Linear Operators
  10.8.1 The Hermitian Case
  10.8.2 Example: Landweber’s Algorithm
  10.8.3 What if $B$ is not Hermitian?
10.9 Paracontractive Operators
  10.9.1 Diagonalizable Linear Operators
  10.9.2 Linear and Affine Paracontractions
  10.9.3 The Elsner-Koltracht-Neumann Theorem
10.10 Applications of the KM Theorem
  10.10.1 The ART
  10.10.2 The CQ Algorithm
  10.10.3 Landweber’s Algorithm
  10.10.4 Projected Landweber’s Algorithm
  10.10.5 Successive Orthogonal Projection

10.1 Chapter Summary

In a broad sense, all iterative algorithms generate a sequence $\{x^k\}$ of vectors. The sequence may converge for any starting vector $x^0$, or may converge only if $x^0$ is sufficiently close to a solution. The limit, when it exists, may depend on $x^0$, and may, or may not, solve the original problem. Convergence to the limit may be slow and the algorithm may need to be accelerated. The algorithm may involve measured data. The limit may be sensitive to noise in the data and the algorithm may need to be regularized to lessen this sensitivity. The algorithm may be quite general, applying to all problems in a broad class, or it may be tailored to the problem at hand. Each step of the algorithm may be costly, with only a few steps generally needed to produce a suitable approximate answer, or each step may be easily performed, with many such steps needed. Although convergence of an algorithm is important theoretically, in practice sometimes only a few iterative steps are used. In this chapter we consider several classes of operators that play important roles in applied linear algebra.

10.2 Operators

A function $T : \mathbb{R}^J \to \mathbb{R}^J$ is often called an operator on $\mathbb{R}^J$. For most of the iterative algorithms we shall consider, the iterative step is

$$x^{k+1} = T x^k, \tag{10.1}$$

for some operator $T$. If $T$ is a continuous operator (and it usually is), and the sequence $\{T^k x^0\}$ converges to $\hat{x}$, then $T\hat{x} = \hat{x}$, that is, $\hat{x}$ is a fixed point of the operator $T$. We denote by Fix($T$) the set of fixed points of $T$. The convergence of the iterative sequence $\{T^k x^0\}$ will depend on the properties of the operator $T$.
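The iteration (10.1) can be sketched in a few lines. The example below is a minimal illustration rather than an algorithm from the text: the function name `fixed_point_iteration`, the stopping rule, and the scalar operator $T(x) = \frac{1}{2}\cos(x)$ (the case $J = 1$) are all assumptions made for the example.

```python
import math

def fixed_point_iteration(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = T(x_k) until successive iterates differ by at most tol.

    Returns the last iterate and the number of steps taken.
    The stopping rule is a practical choice, not part of (10.1).
    """
    x = x0
    for k in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) <= tol:
            return x_next, k + 1
        x = x_next
    return x, max_iter

def T(x):
    # Hypothetical scalar operator chosen for illustration.
    return 0.5 * math.cos(x)

x_hat, steps = fixed_point_iteration(T, x0=0.0)
print(x_hat, steps)
print(abs(T(x_hat) - x_hat))   # residual of the fixed-point equation T(x) = x
```

The printed residual measures how nearly the returned value satisfies $T\hat{x} = \hat{x}$. For an operator that is not a contraction, the loop may simply stop at `max_iter` without having converged.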

10.3 Contractions

Contraction operators are perhaps the best known class of operators associated with iterative algorithms.

10.3.1 Lipschitz Continuity

Definition 10.1 An operator $T$ on $\mathbb{R}^J$ is Lipschitz continuous, with respect to a vector norm $\|\cdot\|$, or $L$-Lipschitz, if there is a positive constant $L$ such that

$$\|Tx - Ty\| \leq L \|x - y\|, \tag{10.2}$$

for all $x$ and $y$ in $\mathbb{R}^J$.

10.3.1.1 An Example: Bounded Derivative

We know from the Mean Value Theorem that, for any differentiable function $f : \mathbb{R} \to \mathbb{R}$,

$$f(x) - f(y) = f'(c)(x - y),$$

where $c$ is between $x$ and $y$. Suppose that there is a positive constant $L$ such that $|f'(c)| \leq L$, for all real $c$. Then

$$|f(x) - f(y)| \leq L |x - y|,$$

for all $x$ and $y$. Therefore, the function $f$ is $L$-Lipschitz. The function $f(x) = \frac{1}{2}\cos(x)$ is $\frac{1}{2}$-Lipschitz, since $|f'(x)| = \frac{1}{2}|\sin(x)| \leq \frac{1}{2}$.
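The Lipschitz bound for $f(x) = \frac{1}{2}\cos(x)$ can be checked numerically by sampling difference quotients. This is a sanity check under assumed choices (the helper name `max_difference_quotient`, the sampling interval, and the random seed are all hypothetical), not a proof.

```python
import math
import random

def f(x):
    return 0.5 * math.cos(x)

def max_difference_quotient(f, samples):
    """Largest observed |f(x) - f(y)| / |x - y| over the sampled pairs."""
    best = 0.0
    for x, y in samples:
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

rng = random.Random(0)   # fixed seed so the run is repeatable
samples = [(rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
           for _ in range(10_000)]

q = max_difference_quotient(f, samples)
print(q <= 0.5)   # no sampled quotient exceeds the Lipschitz constant 1/2
```

Sampling can only fail to find a violation. The guarantee itself comes from the bound on $|f'|$.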

More generally, if $f$ is a real-valued differentiable function of $J$ real variables, that is, $f : \mathbb{R}^J \to \mathbb{R}$, and the gradient satisfies $\|\nabla f(c)\|_2 \leq L$ for all $c$ in $\mathbb{R}^J$, then

$$|f(x) - f(y)| \leq L \|x - y\|_2,$$

so that $f$ is $L$-Lipschitz, with respect to the 2-norm.
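The step from the gradient bound to the Lipschitz inequality is not spelled out here. One standard way to fill it in, offered as a sketch, applies the one-variable Mean Value Theorem to the auxiliary function $g$ (a name introduced only for this argument) and then uses the Cauchy-Schwarz inequality:

```latex
\begin{align*}
g(t) &= f\bigl(y + t(x - y)\bigr), \qquad 0 \le t \le 1, \\
f(x) - f(y) &= g(1) - g(0) = g'(\tau) = \nabla f(c)^T (x - y),
  \qquad c = y + \tau (x - y) \text{ for some } \tau \in (0, 1), \\
|f(x) - f(y)| &\le \|\nabla f(c)\|_2 \, \|x - y\|_2 \le L \, \|x - y\|_2 .
\end{align*}
```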

10.3.1.2 Another Example: Lipschitz Gradients

If $f : \mathbb{R}^J \to \mathbb{R}$ is twice differentiable and $\|\nabla^2 f(c)\|_2 \leq L$, for all $c$, then $T = \nabla f$ is $L$-Lipschitz, with respect to the 2-norm.
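A quadratic function gives a simple instance. In the sketch below, $f(x) = \frac{1}{2}\sum_j q_j x_j^2$ has the constant diagonal Hessian $\mathrm{diag}(q_1, \ldots, q_J)$, whose 2-norm is $\max_j |q_j|$. The particular values in `Q_DIAG` and the helper names are hypothetical, chosen only for illustration.

```python
import math
import random

Q_DIAG = [1.0, 3.0]                  # hypothetical Hessian diag(1, 3)
L = max(abs(q) for q in Q_DIAG)      # 2-norm of a diagonal matrix

def grad_f(x):
    """Gradient of f(x) = 0.5 * sum(q_j * x_j**2), the operator T in the text."""
    return [q * xj for q, xj in zip(Q_DIAG, x)]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def lipschitz_holds(x, y, tol=1e-12):
    """Check ||grad f(x) - grad f(y)||_2 <= L * ||x - y||_2 up to rounding."""
    lhs = norm2([g - h for g, h in zip(grad_f(x), grad_f(y))])
    rhs = L * norm2([xj - yj for xj, yj in zip(x, y)])
    return lhs <= rhs + tol

rng = random.Random(0)   # fixed seed so the run is repeatable
pairs = [([rng.uniform(-5.0, 5.0) for _ in Q_DIAG],
          [rng.uniform(-5.0, 5.0) for _ in Q_DIAG]) for _ in range(1000)]

print(all(lipschitz_holds(x, y) for x, y in pairs))
```

The bound is attained along the coordinate direction with the largest $|q_j|$, for example for $x = (0, 1)$ and $y = (0, 0)$, so the constant $L$ cannot be reduced for this function.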
