Lecture 17


Chapter 11 Optimization with equality constraints

11.1 First order necessary conditions for constrained local maximum

11.1.1 Single equality constraint

Consider a set $A \subset \mathbb{R}^n$ and two functions $f : A \to \mathbb{R}$ and $g : A \to \mathbb{R}$. The set $C = \{x \in A : g(x) = 0\}$ is referred to as the constraint set. The following is a typical optimization problem:

$$\max f(x) \quad \text{subject to} \quad x \in C. \tag{11.1}$$

Definition 11.1 (Local maximum). A point $x^* \in C$ is a point of local maximum of $f$ subject to the constraint $g(x) = 0$ if there exists an open ball $B_\epsilon(x^*)$ around $x^*$ such that $f(x^*) \ge f(x)$ for all $x \in B_\epsilon(x^*) \cap C$.

Definition 11.2 (Global maximum). A point $x^* \in C$ is a point of global maximum of $f$ subject to the constraint $g(x) = 0$ if it solves problem (11.1).

Local and global minimum can be defined in an analogous manner.

11.1.2 Lagrange Method

Theorem 11.1 (Lagrange: single equality constraint). Let $A \subset \mathbb{R}^n$ be open, and let $f : A \to \mathbb{R}$, $g : A \to \mathbb{R}$ be $C^1$ functions on $A$. Suppose $x^*$ is a point of local maximum or local minimum of $f$ subject to $g(x) = 0$. Further suppose $\nabla g(x^*) \neq 0$. Then there is $\lambda^* \in \mathbb{R}$ such that

$$\nabla f(x^*) = \lambda^* \nabla g(x^*). \tag{11.2}$$

The $n$ conditions in (11.2) and the constraint condition $g(x) = 0$ together are referred to as the first order conditions for a constrained local maximum or local minimum.
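As an illustration of Theorem 11.1, the following sketch checks condition (11.2) numerically. The problem is a hypothetical one that does not appear in these notes: maximize $f(x_1, x_2) = x_1 + x_2$ subject to $g(x_1, x_2) = x_1^2 + x_2^2 - 2 = 0$. The candidate point $(1, 1)$ and the multiplier $\lambda^* = 1/2$ are assumptions of the example, worked out by hand.

```python
import math

# Hypothetical example (not from the notes):
# maximize f(x1, x2) = x1 + x2 subject to g(x1, x2) = x1^2 + x2^2 - 2 = 0.
def f(x1, x2):
    return x1 + x2

def grad_f(x1, x2):
    return (1.0, 1.0)

def grad_g(x1, x2):
    return (2.0 * x1, 2.0 * x2)

x_star = (1.0, 1.0)   # candidate constrained maximum (assumed, found by hand)
lam_star = 0.5        # candidate multiplier (assumed, found by hand)

# Constraint qualification: grad g(x*) must be nonzero.
gg = grad_g(*x_star)
cq_holds = any(abs(c) > 1e-12 for c in gg)

# Condition (11.2): grad f(x*) = lambda* grad g(x*), checked componentwise.
gf = grad_f(*x_star)
foc_residual = max(abs(a - lam_star * b) for a, b in zip(gf, gg))

# Sanity check: f at nearby points of the circle does not exceed f(x*).
theta_star = math.atan2(x_star[1], x_star[0])
r = math.sqrt(2.0)
nearby = [f(r * math.cos(theta_star + d), r * math.sin(theta_star + d))
          for d in (-0.1, -0.01, 0.01, 0.1)]
is_local_max = all(v <= f(*x_star) + 1e-12 for v in nearby)

print(cq_holds, foc_residual, is_local_max)
```

The last check only samples a few feasible points near $x^*$; it is a plausibility test, not a proof that $x^*$ is a local maximum.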

There is a convenient method to express the conclusion of Theorem 11.1. Consider the function $L : A \times \mathbb{R} \to \mathbb{R}$ given by

$$L(x, \lambda) = f(x) - \lambda g(x). \tag{11.3}$$

The function $L$ is referred to as the Lagrangian and $\lambda$ is referred to as the Lagrangian multiplier. Now consider the problem of finding an unconstrained local maximum or minimum of $L$. It has the first order conditions

$$D_i L(x, \lambda) = 0, \quad i = 1, \ldots, n+1,$$

which give $D_i f(x) = \lambda D_i g(x)$ for $i = 1, \ldots, n$ and, from the $(n+1)$-th condition (the derivative with respect to $\lambda$), $g(x) = 0$. Note that the first $n$ conditions are exactly the same as in the statement of Theorem 11.1. This method of expressing the conditions of a constrained optimization problem is known as the Lagrangian multiplier method.
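The Lagrangian multiplier method can be sketched numerically as follows. The example is hypothetical and not taken from these notes: $f(x_1, x_2) = x_1 x_2$ and $g(x_1, x_2) = x_1 + x_2 - 2$, with the candidate $(x_1^*, x_2^*, \lambda^*) = (1, 1, 1)$ solved by hand. The helper `numerical_gradient` is an illustrative name; it approximates all $n + 1$ partial derivatives of $L$ by central differences.

```python
# Hypothetical example (not from the notes):
# f(x1, x2) = x1 * x2, g(x1, x2) = x1 + x2 - 2.
def lagrangian(z):
    x1, x2, lam = z
    return x1 * x2 - lam * (x1 + x2 - 2.0)

def numerical_gradient(func, z, h=1e-6):
    """Central-difference approximation of all partial derivatives D_i func(z)."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += h
        down = list(z)
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return grad

z_star = [1.0, 1.0, 1.0]  # (x1*, x2*, lambda*), assumed, solved by hand

# The first n = 2 entries correspond to D_i f = lambda D_i g,
# the last entry corresponds to the constraint g(x) = 0.
grad_L = numerical_gradient(lagrangian, z_star)
print(grad_L)
```

All three entries should be zero up to rounding error, which is the statement that $(x^*, \lambda^*)$ satisfies the first order conditions of the unconstrained problem for $L$.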

Remark. The condition $\nabla g(x^*) \neq 0$ is known as the constraint qualification. Theorem 11.1 may not be applicable without the constraint qualification. For instance, consider $f(x_1, x_2) = x_1 + x_2$ and $g(x_1, x_2) = x_1^2 + x_2^2$ for all $(x_1, x_2) \in \mathbb{R}^2$. Check that the conclusion of Theorem 11.1 does not hold with respect to the problem $\max_{x_1, x_2} f(x_1, x_2)$ subject to $g(x_1, x_2) = 0$ in this case. This is because the constraint set is a singleton, containing the point $(0, 0)$, at which the constraint qualification is not satisfied.
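The check suggested in the remark can be carried out as in the sketch below, which uses the functions $f$ and $g$ of the remark. The list of trial multipliers is an arbitrary assumption, included only to illustrate that no value of $\lambda$ can work.

```python
# Example from the remark: f(x1, x2) = x1 + x2, g(x1, x2) = x1^2 + x2^2.
def grad_f(x1, x2):
    return (1.0, 1.0)

def grad_g(x1, x2):
    return (2.0 * x1, 2.0 * x2)

x_star = (0.0, 0.0)  # the only point with g(x) = 0
gf = grad_f(*x_star)
gg = grad_g(*x_star)

# Constraint qualification fails: grad g(x*) is the zero vector.
cq_holds = any(c != 0.0 for c in gg)

# lambda * grad g(x*) is (0, 0) for every lambda, while grad f(x*) = (1, 1),
# so grad f(x*) = lambda * grad g(x*) has no solution. The trial values
# below are arbitrary and only illustrate this.
trial_lambdas = [-10.0, -1.0, 0.0, 1.0, 10.0]
solvable = any(
    all(abs(a - lam * b) < 1e-12 for a, b in zip(gf, gg))
    for lam in trial_lambdas
)
print(cq_holds, solvable)
```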

11.1.3 Multiple equality constraints

Suppose there are $m$ equality constraints given by $g_j(x) = 0$, $j = 1, \ldots, m$. Then the constraint set is

$$C = \{x \in A : g_j(x) = 0, \ j = 1, \ldots, m\}.$$

The constraint qualification involves the Jacobian derivative of the constraint functions:

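$$Dg(x^*) = \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_1}{\partial x_n}(x^*) \\ \vdots & \ddots & \vdots \\ \frac{\partial g_m}{\partial x_1}(x^*) & \cdots & \frac{\partial g_m}{\partial x_n}(x^*) \end{bmatrix}.$$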

The natural generalization of the constraint qualification to the case of multiple constraints is that the rank of $Dg(x^*)$ must be equal to $m$. This condition is referred to as the non-degenerate constraint qualification. It implies that the constraint set has a well-defined $(n - m)$-dimensional tangent plane everywhere.
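The rank condition can be checked numerically, as in the following sketch. The constraints are hypothetical and not taken from these notes: a unit sphere $g_1(x) = x_1^2 + x_2^2 + x_3^2 - 1$ and a horizontal plane $g_2(x) = x_3 - t$ in $\mathbb{R}^3$. The helper `matrix_rank` is an illustrative name; it uses plain Gaussian elimination, which is adequate for small matrices.

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

# Hypothetical constraints on R^3 (not from the notes):
# g1(x) = x1^2 + x2^2 + x3^2 - 1 (unit sphere), g2(x) = x3 - t (horizontal plane).
# The Jacobian Dg(x) does not depend on the level t.
def jacobian(x):
    x1, x2, x3 = x
    return [[2.0 * x1, 2.0 * x2, 2.0 * x3],
            [0.0, 0.0, 1.0]]

# t = 0: the plane cuts the sphere, rows of Dg are independent at (1, 0, 0).
rank_regular = matrix_rank(jacobian((1.0, 0.0, 0.0)))
# t = 1: the plane touches the sphere at (0, 0, 1), rows of Dg are parallel.
rank_degenerate = matrix_rank(jacobian((0.0, 0.0, 1.0)))
print(rank_regular, rank_degenerate)
```

In the second case the rank is less than $m = 2$, so the non-degenerate constraint qualification fails at that point.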

We will use the following definition.

Definition 11.3 (Critical point). A point $x$ is called a critical point of $g = (g_1, \ldots, g_m)$ if the rank of $Dg(x)$ is less than $m$.

Theorem 11.2 (Lagrange: multiple equality constraints). Let $A \subset \mathbb{R}^n$ be open, and let $f : A \to \mathbb{R}$, $g_j : A \to \mathbb{R}$, $j = 1, \ldots, m$, be $C^1$ functions on $A$. Suppose $x^*$ is a point of local maximum or local minimum of $f$ subject to $g_j(x) = 0$, $j = 1, \ldots, m$. Further suppose the rank of $Dg(x^*)$ is $m$. Then there exist $(\lambda_1^*, \ldots, \lambda_m^*) \in \mathbb{R}^m$ such that $(x^*, \lambda^*)$ is a critical point of the Lagrangian

$$L(x, \lambda) = f(x) - \lambda_1 g_1(x) - \cdots - \lambda_m g_m(x), \tag{11.4}$$

i.e.,

$$\frac{\partial L}{\partial x_i}(x^*, \lambda^*) = 0, \quad i = 1, \ldots, n; \qquad \frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = 0, \quad j = 1, \ldots, m.$$

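Proof. We first claim that the $(m+1) \times n$ Jacobian matrix

$$\begin{bmatrix} \frac{\partial f}{\partial x_1}(x^*) & \cdots & \frac{\partial f}{\partial x_n}(x^*) \\ \frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_1}{\partial x_n}(x^*) \\ \vdots & \ddots & \vdots \\ \frac{\partial g_m}{\partial x_1}(x^*) & \cdots & \frac{\partial g_m}{\partial x_n}(x^*) \end{bmatrix}$$

does not have maximal rank. Let $f(x^*) = c$. We know that $x^*$ is a solution of

$$\begin{aligned} f(x) &= c \\ g_1(x) &= 0 \\ &\ \vdots \\ g_m(x) &= 0. \end{aligned}$$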

Suppose the Jacobian matrix above has full rank. Then by the Implicit Function Theorem 8.9, we can find a solution $x^{**}$ near $x^*$ to the system

$$\begin{aligned} f(x) &= c + \epsilon \\ g_1(x) &= 0 \\ &\ \vdots \\ g_m(x) &= 0 \end{aligned}$$

where $\epsilon$ is a small positive number. Then $f(x^{**}) > f(x^*)$ and $g_j(x^{**}) = 0$ for $j = 1, \ldots, m$, contradicting our assumption that $x^*$ locally maximizes $f$ subject to the constraints $g_j(x) = 0$, $j = 1, \ldots, m$. (If $x^*$ is a local minimum, the same argument applies with $c - \epsilon$ in place of $c + \epsilon$.) Consequently, the $(m+1) \times n$ matrix does not have maximal rank. This implies that the $m + 1$ rows of this matrix are linearly dependent, i.e., there exist scalars $\alpha_0, \alpha_1, \ldots, \alpha_m$, not all zero, such that

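$$\alpha_0 \begin{bmatrix} \frac{\partial f}{\partial x_1}(x^*) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x^*) \end{bmatrix} + \alpha_1 \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(x^*) \\ \vdots \\ \frac{\partial g_1}{\partial x_n}(x^*) \end{bmatrix} + \cdots + \alpha_m \begin{bmatrix} \frac{\partial g_m}{\partial x_1}(x^*) \\ \vdots \\ \frac{\partial g_m}{\partial x_n}(x^*) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{11.5}$$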

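Now we claim that $\alpha_0 \neq 0$: otherwise, there exist scalars $\alpha_1, \ldots, \alpha_m$, not all zero, such that

$$\alpha_1 \begin{bmatrix} \frac{\partial g_1}{\partial x_1}(x^*) \\ \vdots \\ \frac{\partial g_1}{\partial x_n}(x^*) \end{bmatrix} + \cdots + \alpha_m \begin{bmatrix} \frac{\partial g_m}{\partial x_1}(x^*) \\ \vdots \\ \frac{\partial g_m}{\partial x_n}(x^*) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix},$$

i.e., the $m$ rows of $Dg(x^*)$ are not independent, or $Dg(x^*)$ does not have full rank, which is a contradiction.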

Finally, divide (11.5) through by $\alpha_0$ and write $-\alpha_i / \alpha_0 = \lambda_i^*$, $i = 1, \ldots, m$, to obtain

$$\nabla f(x^*) - \lambda_1^* \nabla g_1(x^*) - \cdots - \lambda_m^* \nabla g_m(x^*) = 0.$$

Hence the claim.
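The conclusion of Theorem 11.2 can be checked on a small case with $n = 3$ and $m = 2$. The example below is hypothetical and not taken from these notes: $f(x) = x_1 + x_2 + x_3$, $g_1(x) = x_1^2 + x_2^2 - 2$, $g_2(x) = x_3 - 1$. The candidate point $x^* = (1, 1, 1)$ and multipliers $\lambda^* = (1/2, 1)$ are assumptions, found by hand.

```python
# Hypothetical example (not from the notes), n = 3 and m = 2:
# f(x) = x1 + x2 + x3, g1(x) = x1^2 + x2^2 - 2, g2(x) = x3 - 1.
def constraints(x):
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 - 2.0, x3 - 1.0]

def grad_f(x):
    return [1.0, 1.0, 1.0]

def jacobian_g(x):
    x1, x2, x3 = x
    return [[2.0 * x1, 2.0 * x2, 0.0],   # gradient of g1
            [0.0, 0.0, 1.0]]             # gradient of g2

x_star = [1.0, 1.0, 1.0]   # candidate point (assumed, found by hand)
lam_star = [0.5, 1.0]      # candidate multipliers (assumed, found by hand)

# dL/dx_i = df/dx_i - sum_j lambda_j * dg_j/dx_i, for i = 1, ..., n.
Dg = jacobian_g(x_star)
dL_dx = [grad_f(x_star)[i] - sum(lam_star[j] * Dg[j][i] for j in range(2))
         for i in range(3)]

# dL/dlambda_j = -g_j(x), for j = 1, ..., m.
dL_dlam = [-g for g in constraints(x_star)]

print(dL_dx, dL_dlam)
```

Both lists vanish, so $(x^*, \lambda^*)$ is a critical point of the Lagrangian (11.4) for this example. The two rows of `Dg` are linearly independent at $x^*$, so the rank condition of the theorem holds there.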

11.2 Second order necessary conditions

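Theorem 11.3. Let $A \subset \mathbb{R}^n$ be open, and let $f : A \to \mathbb{R}$, $g : A \to \mathbb{R}$ be $C^2$ functions on $A$. Suppose $x^*$ is a point of local maximum of $f$ subject to $g(x) = 0$. Further suppose $\nabla g(x^*) \neq 0$. Then there is $\lambda^* \in \mathbb{R}$ such that

$$\nabla f(x^*) = \lambda^* \nabla g(x^*) \tag{11.6}$$

and

$$y' H_L(x^*, \lambda^*) y \le 0 \quad \text{for all } y : y \cdot \nabla g(x^*) = 0, \tag{11.7}$$

where $L(x, \lambda^*) = f(x) - \lambda^* g(x)$ and $H_L(x^*, \lambda^*)$ is the Hessian matrix of $L(x, \lambda^*)$ with respect to $x$ evaluated at $(x^*, \lambda^*)$.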

The second order necessary condition for a local minimum would require

$$y' H_L(x^*, \lambda^*) y \ge 0 \quad \text{for all } y : y \cdot \nabla g(x^*) = 0.$$

When there are $m$ equality constraints, the condition $\nabla g(x^*) \neq 0$ is replaced by the condition that $Dg(x^*)$ has full rank $m$.

11.3 Sufficient conditions for constrained local maximum

Theorem 11.4. Let $A \subset \mathbb{R}^n$ be open, and let $f : A \to \mathbb{R}$, $g : A \to \mathbb{R}$ be $C^2$ functions on $A$. Suppose $(x^*, \lambda^*) \in C \times \mathbb{R}$ and

$$\nabla f(x^*) = \lambda^* \nabla g(x^*) \tag{11.8}$$

and

$$y' H_L(x^*, \lambda^*) y < 0 \quad \text{for all } y \neq 0 : y \cdot \nabla g(x^*) = 0, \tag{11.9}$$

where $L(x, \lambda^*) = f(x) - \lambda^* g(x)$ and $H_L(x^*, \lambda^*)$ is the Hessian matrix of $L(x, \lambda^*)$ with respect to $x$ evaluated at $(x^*, \lambda^*)$. Then $x^*$ is a point of local maximum of $f$ subject to $g(x) = 0$.

The second order sufficient condition for a constrained local minimum is

$$y' H_L(x^*, \lambda^*) y > 0 \quad \text{for all } y \neq 0 : y \cdot \nabla g(x^*) = 0.$$
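Condition (11.9) only restricts the quadratic form on directions tangent to the constraint set. The sketch below illustrates this for a hypothetical example that is not part of these notes: $f(x_1, x_2) = x_1 x_2$, $g(x_1, x_2) = x_1 + x_2 - 2$, with candidate $x^* = (1, 1)$ and $\lambda^* = 1$ found by hand. The helper `quadratic_form` is an illustrative name.

```python
# Hypothetical example (not from the notes):
# f(x1, x2) = x1 * x2, g(x1, x2) = x1 + x2 - 2, candidate x* = (1, 1), lambda* = 1.
# Hessian in x of L(x, lambda*) = x1*x2 - lambda*(x1 + x2 - 2):
hessian_L = [[0.0, 1.0],
             [1.0, 0.0]]
grad_g = [1.0, 1.0]           # gradient of g at x*

def quadratic_form(H, y):
    """Return y' H y."""
    return sum(y[i] * H[i][j] * y[j] for i in range(len(y)) for j in range(len(y)))

# For n = 2 the set {y : y . grad g = 0} is a line spanned by (-g_2, g_1).
tangent = [-grad_g[1], grad_g[0]]
q_tangent = quadratic_form(hessian_L, tangent)

# The Hessian is not negative definite on all of R^2: along y = (1, 1),
# which is not a tangent direction, the quadratic form is positive.
q_unrestricted = quadratic_form(hessian_L, [1.0, 1.0])

print(q_tangent, q_unrestricted)
```

Here the quadratic form is negative along the tangent direction even though the Hessian is indefinite on $\mathbb{R}^2$, so (11.9) holds and $x^*$ is a constrained local maximum in this example.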

There is a convenient way of checking the second order condition 11.9 as given in the following result.

Theorem 11.5. Let $A$ be an $n \times n$ symmetric matrix and $b$ be an $n$-vector with $b_1 \neq 0$. Consider the $(n+1) \times (n+1)$ matrix

$$S = \begin{bmatrix} 0 & b' \\ b & A \end{bmatrix}.$$

If $|S|$ has the same sign as $(-1)^n$ and the last $n - 1$ leading principal minors of $S$ alternate in sign, then $y' A y < 0$ for all $y \neq 0$ such that $y \cdot b = 0$. If $|S|$ and the last $n - 1$ leading principal minors of $S$ are all negative, then $y' A y > 0$ for all $y \neq 0$ such that $y \cdot b = 0$.

Consequently, we have to check the signs of the leading principal minors of

$$\begin{bmatrix} 0 & \nabla g(x^*)' \\ \nabla g(x^*) & H_L(x^*, \lambda^*) \end{bmatrix}.$$

Here we provide a restricted proof of this result when $n = 2$: given two $C^2$ functions $f$ and $g$ on $\mathbb{R}^2$, consider the problem of maximizing $f$ on the constraint set $C_g = \{(x, y) \in \mathbb{R}^2 : g(x, y) = 0\}$.

We form the Lagrangian

$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y).$$

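Suppose $(x^*, y^*, \lambda^*)$ satisfies $\frac{\partial L}{\partial x} = 0$, $\frac{\partial L}{\partial y} = 0$, $\frac{\partial L}{\partial \lambda} = 0$, and

$$\begin{vmatrix} 0 & \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial^2 L}{\partial x^2} & \frac{\partial^2 L}{\partial x \partial y} \\ \frac{\partial g}{\partial y} & \frac{\partial^2 L}{\partial x \partial y} & \frac{\partial^2 L}{\partial y^2} \end{vmatrix} > 0 \quad \text{at } (x^*, y^*, \lambda^*).$$

We will show that $(x^*, y^*)$ locally maximizes $f$ on $C_g$.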

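By the second condition above, either $\frac{\partial g}{\partial x} \neq 0$ or $\frac{\partial g}{\partial y} \neq 0$ at $(x^*, y^*)$, since otherwise the determinant would be zero. Without loss of generality, let $\frac{\partial g}{\partial y} \neq 0$. Then by the Implicit Function Theorem 8.7, $C_g$ can be written as the graph of a $C^1$ function $y = \phi(x)$ around $(x^*, y^*)$:

$$g(x, \phi(x)) = 0 \quad \text{for all } x \text{ near } x^*. \tag{11.10}$$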

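Differentiating this expression with respect to $x$, we get

$$\frac{\partial g}{\partial x}(x, \phi(x)) + \frac{\partial g}{\partial y}(x, \phi(x)) \, \phi'(x) = 0, \tag{11.11}$$

or,

$$\phi'(x) = -\frac{\frac{\partial g}{\partial x}(x, \phi(x))}{\frac{\partial g}{\partial y}(x, \phi(x))}. \tag{11.12}$$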

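Let $F(x) = f(x, \phi(x))$ be $f$ evaluated on $C_g$. Note that it is a function of one unconstrained variable. Consequently, if $F'(x^*) = 0$ and $F''(x^*) < 0$, then $x^*$ will be a local maximum of $F$ and $(x^*, y^*) = (x^*, \phi(x^*))$ will be a local constrained maximum of $f$. Now, adding $-\lambda^*$ times (11.11) to

$$F'(x) = \frac{\partial f}{\partial x}(x, \phi(x)) + \frac{\partial f}{\partial y}(x, \phi(x)) \, \phi'(x), \tag{11.13}$$

and evaluating at $x^*$, we obtain

$$\begin{aligned} F'(x^*) &= \left( \frac{\partial f}{\partial x}(x^*, y^*) - \lambda^* \frac{\partial g}{\partial x}(x^*, y^*) \right) + \left( \frac{\partial f}{\partial y}(x^*, y^*) - \lambda^* \frac{\partial g}{\partial y}(x^*, y^*) \right) \phi'(x^*) \\ &= \frac{\partial L}{\partial x}(x^*, y^*, \lambda^*) + \frac{\partial L}{\partial y}(x^*, y^*, \lambda^*) \, \phi'(x^*). \end{aligned} \tag{11.14}$$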

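By the hypothesis of this result, $F'(x^*) = 0$.

Since (11.11) holds for all $x$ near $x^*$, the expression in (11.14) is valid with $x$ in place of $x^*$. Differentiating it again at $x^*$, setting $y^* = \phi(x^*)$ and using $\frac{\partial L}{\partial y}(x^*, y^*, \lambda^*) = 0$ to drop the term in $\phi''(x^*)$, we get (all derivatives evaluated at $(x^*, y^*, \lambda^*)$)

$$\begin{aligned} F''(x^*) &= \frac{\partial^2 L}{\partial x^2} + 2 \frac{\partial^2 L}{\partial x \partial y} \phi'(x^*) + \frac{\partial^2 L}{\partial y^2} \phi'(x^*)^2 \\ &= \frac{\partial^2 L}{\partial x^2} + 2 \frac{\partial^2 L}{\partial x \partial y} \left( -\frac{\partial g / \partial x}{\partial g / \partial y} \right) + \frac{\partial^2 L}{\partial y^2} \left( -\frac{\partial g / \partial x}{\partial g / \partial y} \right)^2 \\ &= \frac{1}{\left( \frac{\partial g}{\partial y} \right)^2} \left[ \frac{\partial^2 L}{\partial x^2} \left( \frac{\partial g}{\partial y} \right)^2 - 2 \frac{\partial^2 L}{\partial x \partial y} \frac{\partial g}{\partial x} \frac{\partial g}{\partial y} + \frac{\partial^2 L}{\partial y^2} \left( \frac{\partial g}{\partial x} \right)^2 \right], \end{aligned}$$

which is negative by the hypothesis of this result: the bracketed expression is the negative of the bordered determinant above, which is assumed to be positive. Hence $F(x) = f(x, \phi(x))$ has a local maximum at $x^*$, and therefore $f$ restricted to $C_g$ has a local maximum at $(x^*, y^*)$.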
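The closed-form expression for $F''(x^*)$ derived in the proof can be compared with a direct numerical second derivative. The sketch below does this for a hypothetical example that is not part of these notes: $f(x, y) = x + y$, $g(x, y) = x^2 + y^2 - 2$, with candidate $(x^*, y^*, \lambda^*) = (1, 1, 1/2)$ found by hand. Here the implicit function is available explicitly as $\phi(x) = \sqrt{2 - x^2}$.

```python
import math

# Hypothetical example (not from the notes):
# f(x, y) = x + y, g(x, y) = x^2 + y^2 - 2, candidate (x*, y*) = (1, 1), lambda* = 1/2.
lam = 0.5
x_star, y_star = 1.0, 1.0

def phi(x):
    """Explicit solution y = phi(x) of g(x, y) = 0 near (1, 1)."""
    return math.sqrt(2.0 - x * x)

def F(x):
    """f evaluated along the constraint set."""
    return x + phi(x)

# Second derivative of F at x* by a central difference.
h = 1e-4
F2_numeric = (F(x_star + h) - 2.0 * F(x_star) + F(x_star - h)) / (h * h)

# Ingredients of the closed-form expression, evaluated at (x*, y*, lambda*).
g_x, g_y = 2.0 * x_star, 2.0 * y_star
L_xx, L_xy, L_yy = -2.0 * lam, 0.0, -2.0 * lam
bracket = L_xx * g_y ** 2 - 2.0 * L_xy * g_x * g_y + L_yy * g_x ** 2
F2_formula = bracket / g_y ** 2

# Bordered determinant, expanded along its first row.
bordered_det = -g_x * (g_x * L_yy - L_xy * g_y) + g_y * (g_x * L_xy - L_xx * g_y)

print(F2_numeric, F2_formula, bordered_det)
```

The two values of $F''(x^*)$ agree up to discretization error, and the bordered determinant equals the negative of the bracketed expression, which is the sign relation used at the end of the proof.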

This result is generalized to the case of m equality constraints below.

Theorem 11.6. Let $A \subset \mathbb{R}^n$ be open, and let $f : A \to \mathbb{R}$, $g_j : A \to \mathbb{R}$, $j = 1, \ldots, m$, be $C^2$ functions on $A$. Suppose $(x^*, \lambda^*) \in C \times \mathbb{R}^m$ and

$$\frac{\partial L}{\partial x_i}(x^*, \lambda^*) = 0, \quad i = 1, \ldots, n; \tag{11.15}$$

$$\frac{\partial L}{\partial \lambda_j}(x^*, \lambda^*) = 0, \quad j = 1, \ldots, m; \tag{11.16}$$

and

$$y' H_L(x^*, \lambda^*) y < 0 \quad \text{for all } y \neq 0 : Dg(x^*) \cdot y = 0, \tag{11.17}$$

where $L(x, \lambda^*) = f(x) - \lambda_1^* g_1(x) - \cdots - \lambda_m^* g_m(x)$ and $H_L(x^*, \lambda^*)$ is the Hessian matrix of $L(x, \lambda^*)$ with respect to $x$ evaluated at $(x^*, \lambda^*)$. Then $x^*$ is a point of local maximum of $f$ subject to $g_j(x) = 0$, $j = 1, \ldots, m$.

The second order sufficient condition for a local minimum is $y' H_L(x^*, \lambda^*) y > 0$ for all $y \neq 0 : Dg(x^*) \cdot y = 0$.

Theorem 11.7. Consider the quadratic form $Q(x) = x'Ax$ restricted to a constraint set given by $m$ linear equations $Bx = 0$. Construct the $(n+m) \times (n+m)$ matrix

$$S = \begin{bmatrix} 0 & B \\ B' & A \end{bmatrix}.$$

If $|S|$ has the same sign as $(-1)^n$ and the last $n - m$ leading principal minors of $S$ alternate in sign, then $Q$ is negative definite on the constraint set $Bx = 0$. If $|S|$ and the last $n - m$ leading principal minors of $S$ have the same sign as $(-1)^m$, then $Q$ is positive definite on the constraint set $Bx = 0$.

Consequently, we have to check the signs of the leading principal minors of

$$\begin{bmatrix} 0 & Dg(x^*) \\ Dg(x^*)' & H_L(x^*, \lambda^*) \end{bmatrix}.$$
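The sign test of Theorem 11.7 can be sketched in code as follows. The function names `det`, `bordered` and `classify` are illustrative, and the data are hypothetical and not taken from these notes: $n = 3$, $m = 1$, with the single constraint $x_1 = 0$. The determinant is computed by cofactor expansion, which is only sensible for small matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def bordered(B, A):
    """Build S = [[0, B], [B', A]] for an m x n matrix B and an n x n matrix A."""
    m, n = len(B), len(A)
    top = [[0.0] * m + list(B[i]) for i in range(m)]
    bottom = [[B[i][k] for i in range(m)] + list(A[k]) for k in range(n)]
    return top + bottom

def classify(B, A):
    """Apply the sign test of Theorem 11.7 to Q(x) = x'Ax on Bx = 0."""
    m, n = len(B), len(A)
    S = bordered(B, A)
    # Last n - m leading principal minors: orders 2m + 1, ..., m + n.
    minors = [det([row[:k] for row in S[:k]]) for k in range(2 * m + 1, m + n + 1)]
    if any(d == 0 for d in minors):
        return "inconclusive"
    signs = [1 if d > 0 else -1 for d in minors]
    alternate = all(signs[i] != signs[i + 1] for i in range(len(signs) - 1))
    if signs[-1] == (-1) ** n and alternate:
        return "negative definite"
    if all(s == (-1) ** m for s in signs):
        return "positive definite"
    return "inconclusive"

# Hypothetical data (not from the notes): n = 3, m = 1, constraint x1 = 0.
B = [[1.0, 0.0, 0.0]]
A_neg = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]  # Q = -x2^2 - x3^2 on x1 = 0
A_pos = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # Q = x2^2 + x3^2 on x1 = 0
result_neg = classify(B, A_neg)
result_pos = classify(B, A_pos)
print(result_neg, result_pos)
```

Neither matrix is definite on all of $\mathbb{R}^3$, yet the bordered minors detect the definiteness of $Q$ on the constraint set. To apply the test to Theorem 11.6, one would take $B = Dg(x^*)$ and $A = H_L(x^*, \lambda^*)$.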

11.4 Sufficient conditions for constrained global maximum

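Theorem 11.8. Let $A \subset \mathbb{R}^n$ be an open convex set, and let $f : A \to \mathbb{R}$, $g_j : A \to \mathbb{R}$, $j = 1, \ldots, m$, be $C^1$ functions on $A$. Suppose $(x^*, \lambda^*) \in C \times \mathbb{R}^m$ and $\nabla f(x^*) - \lambda_1^* \nabla g_1(x^*) - \cdots - \lambda_m^* \nabla g_m(x^*) = 0$. If $L(x, \lambda^*) = f(x) - \lambda_1^* g_1(x) - \cdots - \lambda_m^* g_m(x)$ is concave (resp. convex) in $x$ on $A$, then $x^*$ is a point of global maximum (resp. minimum) of $f$ subject to $g_j(x) = 0$, $j = 1, \ldots, m$.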
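The following sketch illustrates how Theorem 11.8 might be applied. The problem is hypothetical and not taken from these notes: maximize $f(x_1, x_2) = -(x_1^2 + x_2^2)$ subject to $g(x_1, x_2) = x_1 + x_2 - 2 = 0$. The candidate $x^* = (1, 1)$ and multiplier $\lambda^* = -2$ are assumptions, found by hand. The grid check at the end is only a spot check on finitely many feasible points.

```python
# Hypothetical example (not from the notes):
# maximize f(x1, x2) = -(x1^2 + x2^2) subject to g(x1, x2) = x1 + x2 - 2 = 0.
def f(x1, x2):
    return -(x1 ** 2 + x2 ** 2)

x_star = (1.0, 1.0)   # candidate point (assumed, found by hand)
lam_star = -2.0       # candidate multiplier (assumed, found by hand)

# First order condition: grad f(x*) - lambda* grad g(x*) = 0.
grad_f = (-2.0 * x_star[0], -2.0 * x_star[1])
grad_g = (1.0, 1.0)
foc = [a - lam_star * b for a, b in zip(grad_f, grad_g)]

# L(x, lambda*) = -(x1^2 + x2^2) + 2 (x1 + x2 - 2) has the constant Hessian
# diag(-2, -2), which is negative definite, so L is concave in x on R^2.
hessian_L = [[-2.0, 0.0], [0.0, -2.0]]
is_concave = hessian_L[0][0] < 0 and (
    hessian_L[0][0] * hessian_L[1][1] - hessian_L[0][1] * hessian_L[1][0] > 0)

# Spot check on a grid of feasible points (t, 2 - t): none beats f(x*).
grid = [i / 10.0 for i in range(-100, 101)]
no_better_point = all(f(t, 2.0 - t) <= f(*x_star) + 1e-12 for t in grid)

print(foc, is_concave, no_better_point)
```

Since the first order condition holds and $L(x, \lambda^*)$ is concave in $x$, the theorem gives that $x^*$ is a global constrained maximum in this example; the grid check is consistent with that conclusion.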

11.5 Solving optimization problems

The results above suggest two ways of solving an optimization problem:

• use the conditions laid down in Theorem 11.8;