Homework 1


Ana Huaman

March 9, 2011

1. In class it was claimed that given a function $h : \mathbb{R}^m \to \mathbb{R}$, its derivative $\left(\frac{\partial h}{\partial u}\right)^T$ points in the direction in which $h$ grows the most. Show that this is indeed the case.

Solution

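By definition, the directional derivative $\nabla_v h$ represents the instantaneous rate of change of the function $h$ when moving in the direction of $v$. If $h$ is differentiable at $u$, we have $\nabla_v h = \nabla h \cdot v$, that is:

$$\lim_{\alpha \to 0} \frac{h(u + \alpha v) - h(u)}{\alpha} = \left(\frac{\partial h}{\partial u}\right)^T \cdot v$$

By the geometric form of the scalar product:

$$\lim_{\alpha \to 0} \frac{h(u + \alpha v) - h(u)}{\alpha} = \left\| \left(\frac{\partial h}{\partial u}\right)^T \right\| \, \|v\| \cos\theta \qquad (1)$$

where $\theta$ is the angle between $\left(\frac{\partial h}{\partial u}\right)^T$ and $v$.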

To find the maximum rate of change of $h$ when moving away from $u$ (that is, the direction $v$ that gives the largest $h(u + \alpha v) - h(u)$ for small $\alpha > 0$), we need the right side of Eq. 1 to be maximal. For a fixed length $\|v\|$, the only factor that depends on the direction is $\cos\theta$, whose maximum value is 1, attained when $\theta = 0$.

$\theta = 0$ means that the angle between $\left(\frac{\partial h}{\partial u}\right)^T$ and $v$ is 0, or in other words, that $\left(\frac{\partial h}{\partial u}\right)^T$ points in the same direction as $v$ (they are parallel). Since the growth rate is measured along $v$, and the maximizing $v$ is parallel to $\left(\frac{\partial h}{\partial u}\right)^T$, it follows that $\left(\frac{\partial h}{\partial u}\right)^T$ points in the direction of maximum growth.
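The argument can be checked numerically. The sketch below is only an illustration: the test function `h`, the point `u0`, and the helper names are assumptions chosen here, not part of the homework. It scans unit directions $v$, estimates the directional derivative by finite differences, and compares the best direction found with the normalized gradient.

```python
import math

def h(u):
    # Hypothetical smooth test function h: R^2 -> R (not from the homework).
    x, y = u
    return x**2 + 3.0 * x * y + math.sin(y)

def grad_h(u):
    # Analytic gradient (dh/du)^T of the test function above.
    x, y = u
    return (2.0 * x + 3.0 * y, 3.0 * x + math.cos(y))

def directional_derivative(f, u, v, alpha=1e-6):
    # Central-difference estimate of lim (f(u + alpha v) - f(u)) / alpha.
    forward = tuple(ui + alpha * vi for ui, vi in zip(u, v))
    backward = tuple(ui - alpha * vi for ui, vi in zip(u, v))
    return (f(forward) - f(backward)) / (2.0 * alpha)

def steepest_unit_direction(f, u, samples=3600):
    # Scan unit vectors v = (cos t, sin t) and keep the one with the largest rate.
    best_v, best_rate = None, -math.inf
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        v = (math.cos(t), math.sin(t))
        rate = directional_derivative(f, u, v)
        if rate > best_rate:
            best_v, best_rate = v, rate
    return best_v, best_rate

u0 = (1.0, 0.5)
g = grad_h(u0)
g_norm = math.hypot(g[0], g[1])
v_best, rate_best = steepest_unit_direction(h, u0)

print("normalized gradient   :", (g[0] / g_norm, g[1] / g_norm))
print("best sampled direction:", v_best)
print("best rate, gradient norm:", rate_best, g_norm)
```

Up to the angular resolution of the scan, the best sampled direction should coincide with the normalized gradient, and the best rate with the gradient norm, which is the case $\cos\theta = 1$ of Eq. 1.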


2. Solve the following minimization problem:

$$\min_{u \in \mathbb{R}^m} \left( \frac{1}{2} u^T Q u - b^T u \right)$$

subject to the constraint that

$$Au = c$$

where $Q = Q^T \succ 0$. Moreover, $A$ is a $p \times m$ matrix ($p \le m$), with $\mathrm{rank}(A) = p$.

Solution

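We know that the following equation must be satisfied at a minimum:

$$\frac{\partial L}{\partial u} + \lambda^T \frac{\partial h}{\partial u} = 0$$

First, we calculate $\frac{\partial L}{\partial u}$ for

$$L = \frac{1}{2} u^T Q u - b^T u$$

using the limit of the difference quotient along an arbitrary direction $v$:

$$\frac{\partial L}{\partial u} v = \lim_{\epsilon \to 0} \frac{L(u + \epsilon v) - L(u)}{\epsilon}$$

We find $L(u + \epsilon v) - L(u)$ first:

$$L(u + \epsilon v) - L(u) = \frac{1}{2}(u + \epsilon v)^T Q (u + \epsilon v) - b^T (u + \epsilon v) - \frac{1}{2} u^T Q u + b^T u$$

$$L(u + \epsilon v) - L(u) = \frac{1}{2} u^T Q u + \frac{1}{2} \epsilon u^T Q v + \frac{1}{2} \epsilon v^T Q u + \frac{1}{2} \epsilon^2 v^T Q v - b^T u - \epsilon b^T v - \frac{1}{2} u^T Q u + b^T u$$

$$L(u + \epsilon v) - L(u) = \frac{1}{2} \epsilon u^T Q v + \frac{1}{2} \epsilon v^T Q u + \frac{1}{2} \epsilon^2 v^T Q v - \epsilon b^T v$$

We drop the third term of the right side, because it has a factor $\epsilon^2$, which goes to zero faster than $\epsilon$. Since $v^T Q u$ is a scalar, $v^T Q u = u^T Q^T v$, and grouping properly:

$$L(u + \epsilon v) - L(u) = \left( \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T \right) \epsilon v$$

Replacing in the original equation:

$$\frac{\partial L}{\partial u} v = \lim_{\epsilon \to 0} \frac{L(u + \epsilon v) - L(u)}{\epsilon} = \left( \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T \right) v$$

Since this holds for every direction $v$:

$$\frac{\partial L}{\partial u} = \frac{1}{2} u^T Q + \frac{1}{2} u^T Q^T - b^T$$

As stated in the initial conditions, $Q = Q^T$, so we finally have:

$$\frac{\partial L}{\partial u} = \frac{1}{2} u^T Q + \frac{1}{2} u^T Q - b^T = u^T Q - b^T \qquad (2)$$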


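Second, to find $\frac{\partial h}{\partial u}$, we analyze $h$:

$$h(u) = Au - c = 0$$

This is a set of $p$ scalar constraints, so we use a vector of multipliers $\lambda \in \mathbb{R}^p$ (collapsing the constraints into a single scalar equation would not be equivalent to $Au = c$). We have:

$$\frac{\partial h}{\partial u} = A \qquad (3)$$

Third, using Eqs. 2 and 3 in the stationarity condition of the Lagrangian:

$$\frac{\partial L}{\partial u} + \lambda^T \frac{\partial h}{\partial u} = 0$$

$$u^T Q - b^T + \lambda^T A = 0$$

We find the value of $u^*(\lambda)$ from above (recall that $Q = Q^T$ is invertible because it is positive definite):

$$u^* = Q^{-1}(b - A^T \lambda^*) \qquad (4)$$

Now we plug this $u^*$ into $h$, to find $\lambda^*$:

$$A u^* = c$$

Replacing Eq. 4 into the equation shown:

$$A Q^{-1}(b - A^T \lambda^*) = c$$

Since $\mathrm{rank}(A) = p$ and $Q \succ 0$, the $p \times p$ matrix $A Q^{-1} A^T$ is invertible. Operating, we find $\lambda^*$:

$$\lambda^* = (A Q^{-1} A^T)^{-1}(A Q^{-1} b - c) \qquad (5)$$

So, our minimizer $u^*$ is $u^* = Q^{-1}(b - A^T \lambda^*)$, where $\lambda^*$ is the value obtained in Eq. 5.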

Note: To make sure that $u^*$ is a minimizer, we can analyze $\frac{\partial^2 L}{\partial u^2}$, which is $Q$, and $Q$ is positive definite by the premise of the problem. Hence $L$ is strictly convex and, since the constraint is linear, the value $u^*$ is indeed a minimum.
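A small numerical instance can help confirm a solution of this kind. The sketch below assumes a vector multiplier $\lambda \in \mathbb{R}^p$, for which stationarity reads $Qu - b + A^T\lambda = 0$, and solves $u = Q^{-1}(b - A^T\lambda)$ with $\lambda = (AQ^{-1}A^T)^{-1}(AQ^{-1}b - c)$. The matrices `Q`, `A` and the vectors `b`, `c` are made-up example data, and the tiny Gaussian-elimination helper stands in for a proper linear algebra library.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose_matvec(M, v):
    # Computes M^T v without forming the transpose.
    return [sum(M[j][i] * v[j] for j in range(len(M))) for i in range(len(M[0]))]

def equality_constrained_qp(Q, b, A, c):
    """Minimize 0.5 u^T Q u - b^T u subject to A u = c (Q symmetric positive definite)."""
    q_inv_b = solve(Q, b)
    # Column j of Q^{-1} A^T is Q^{-1} applied to row j of A.
    q_inv_at = [solve(Q, row) for row in A]
    # S = A Q^{-1} A^T, a p x p matrix.
    S = [[sum(a * w for a, w in zip(row, col)) for col in q_inv_at] for row in A]
    lam = solve(S, [x - y for x, y in zip(matvec(A, q_inv_b), c)])
    u = solve(Q, [bi - ti for bi, ti in zip(b, transpose_matvec(A, lam))])
    return u, lam

# Made-up example data: m = 3 unknowns, p = 2 constraints.
Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
b = [1.0, 2.0, 3.0]
A = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
c = [1.0, 0.0]

u_star, lam_star = equality_constrained_qp(Q, b, A, c)
constraint_residual = [x - y for x, y in zip(matvec(A, u_star), c)]
stationarity_residual = [
    qu - bi + ti
    for qu, bi, ti in zip(matvec(Q, u_star), b, transpose_matvec(A, lam_star))
]
print("u* =", u_star)
print("A u* - c =", constraint_residual)
print("Q u* - b + A^T lambda* =", stationarity_residual)
```

Both residuals should vanish up to rounding, which confirms that the computed point is feasible and stationary.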

3. • Let $F(M)$ be the matrix function

$$F(M) = M^T M M^T$$

where $M$ is an $n \times m$ matrix. What is the directional derivative of $F$?

Solution

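The directional derivative is defined by:

$$\partial f(x; y) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon y) - f(x)}{\epsilon}$$

For the function $F$ in this problem, with a direction $N$ (an $n \times m$ matrix) and a step $\delta$, we first find $F(M + \delta N)$:

$$F(M + \delta N) = (M + \delta N)^T (M + \delta N)(M + \delta N)^T$$

$$F(M + \delta N) = M^T M M^T + \delta M^T N M^T + \delta N^T M M^T + \delta^2 N^T N M^T + \delta M^T M N^T + \delta^2 M^T N N^T + \delta^2 N^T M N^T + \delta^3 N^T N N^T$$

Next we find $F(M + \delta N) - F(M)$. We drop the terms with a factor $\delta^k$, $k \ge 2$, since they vanish in the limit after dividing by $\delta$:

$$F(M + \delta N) - F(M) = \delta M^T N M^T + \delta N^T M M^T + \delta M^T M N^T + O(\delta^2)$$

$$\lim_{\delta \to 0} \frac{F(M + \delta N) - F(M)}{\delta} = M^T N M^T + N^T M M^T + M^T M N^T$$

$$\partial F(M; N) = M^T N M^T + N^T M M^T + M^T M N^T \qquad (6)$$

The directional derivative of $F$ is given by Eq. 6.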
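The formula $\partial F(M; N) = M^T N M^T + N^T M M^T + M^T M N^T$ can be compared against a finite-difference quotient. The sketch below is an illustration under assumed data: the sizes `n`, `m`, the random matrices, and the helper names are chosen here and are not part of the problem.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(s, A):
    return [[s * a for a in row] for row in A]

def F(M):
    # F(M) = M^T M M^T
    return matmul(matmul(transpose(M), M), transpose(M))

def dF(M, N):
    # Closed-form directional derivative of F at M in the direction N.
    Mt, Nt = transpose(M), transpose(N)
    term1 = matmul(matmul(Mt, N), Mt)
    term2 = matmul(matmul(Nt, M), Mt)
    term3 = matmul(matmul(Mt, M), Nt)
    return add(add(term1, term2), term3)

def finite_difference(M, N, delta=1e-5):
    # Central difference (F(M + delta N) - F(M - delta N)) / (2 delta).
    plus = F(add(M, scale(delta, N)))
    minus = F(add(M, scale(-delta, N)))
    return scale(1.0 / (2.0 * delta), add(plus, scale(-1.0, minus)))

random.seed(0)
n, m = 3, 2  # M and N are n x m
M = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]
N = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]

exact = dF(M, N)
approx = finite_difference(M, N)
max_gap = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print("largest entrywise difference:", max_gap)
```

The result is an $m \times n$ matrix, the same shape as $F(M)$, and the largest entrywise difference should be at the level of the finite-difference error.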

• Let $f : \mathbb{R}^m \to \mathbb{R}$ be a continuously differentiable function. Show or disprove that the directional derivative is given by

$$\partial f(x; y) = \frac{\partial f(x)}{\partial x} y$$

Solution

The directional derivative is defined by:

$$\partial f(x; y) = \lim_{h \to 0} \frac{f(x + hy) - f(x)}{h} \qquad (7)$$

As $f$ is $C^1$, we use its first-order Taylor expansion (as $h \to 0$):

$$f(x + hy) = f(x) + \frac{\partial f(x)}{\partial x} hy + o(h)$$

Subtracting $f(x)$ and dividing by $h$:

$$\frac{f(x + hy) - f(x)}{h} = \frac{\partial f(x)}{\partial x} y + \frac{o(h)}{h} \qquad (8)$$

The remainder $o(h)/h$ vanishes as $h \to 0$, so taking the limit of Eq. 8 and using the definition in Eq. 7:

$$\partial f(x; y) = \frac{\partial f(x)}{\partial x} y$$

So the claim of this problem is true, provided the initial condition is met (the function is continuously differentiable).
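As a quick check of this identity on a concrete case, consider the illustrative function $f(x) = x_1^2 x_2$ at $x = (1, 2)$ with direction $y = (3, -1)$. These values are chosen here only as an example and are not part of the problem. Both sides can be computed by hand:

```latex
\begin{align*}
\frac{\partial f(x)}{\partial x} &= \begin{pmatrix} 2 x_1 x_2 & x_1^2 \end{pmatrix}
  = \begin{pmatrix} 4 & 1 \end{pmatrix},
  \qquad
  \frac{\partial f(x)}{\partial x}\, y = 4 \cdot 3 + 1 \cdot (-1) = 11, \\
f(x + hy) &= (1 + 3h)^2 (2 - h) = 2 + 11h + 12h^2 - 9h^3, \\
\frac{f(x + hy) - f(x)}{h} &= 11 + 12h - 9h^2 \;\longrightarrow\; 11
  \quad \text{as } h \to 0 .
\end{align*}
```

The limit of the difference quotient and the product of the derivative with $y$ agree, as the general argument predicts.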

4. Consider the volume maximization problem: Construct a box (closed on all sides) of maximal volume with sides x,y,z, given an upper bound c on the area of the boundary of the box


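Defining formally our problem of maximizing the volume of a box with sides $x, y, z$:

$$\max_{x,y,z} V(x, y, z) = \max_{x,y,z} (xyz)$$

We can restate the maximization of $V$ as the minimization of its negative, $L(x, y, z) = -V(x, y, z)$, that is:

$$\max_{x,y,z} V(x, y, z) = -\min_{x,y,z} L(x, y, z) = -\min_{x,y,z} (-xyz)$$

such that the area of the boundary is at most $c$:

$$g(x, y, z) : 2xy + 2xz + 2yz \le c \;\to\; 2xy + 2xz + 2yz - c \le 0$$

Applying the Kuhn-Tucker conditions to find a possible maximum of $V$ (a minimum of $L$) for this situation, with $u = (x, y, z)$:

$$\frac{\partial L}{\partial u} + \mu \frac{\partial g}{\partial u} = 0 \qquad (9)$$

(where $\mu \ge 0$). We now calculate each term in the KT condition:

$$\frac{\partial L}{\partial u} = (-yz, \; -xz, \; -xy)$$

$$\frac{\partial g}{\partial u} = (2(y + z), \; 2(x + z), \; 2(x + y))$$

Substituting both into Eq. 9, we get the following equations:

$$-yz + 2\mu y + 2\mu z = 0$$
$$-xz + 2\mu x + 2\mu z = 0$$
$$-xy + 2\mu x + 2\mu y = 0$$

Subtracting the first two equations gives $(x - y)(z - 2\mu) = 0$, and similarly for the other pairs. The alternative $z = 2\mu$ forces $\mu = 0$ and hence zero volume, so for a box with positive sides we can find $x, y, z$ as a function of $\mu$:

$$x = y = z = 4\mu$$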
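The symmetric solution of the three stationarity equations is easy to verify by direct substitution. The sketch below is only a sanity check: the multiplier value `mu = 0.375` and the helper name `stationarity_residuals` are arbitrary choices made here for illustration.

```python
def stationarity_residuals(x, y, z, mu):
    """Left-hand sides of the three stationarity equations for the box problem."""
    return (
        -y * z + 2.0 * mu * y + 2.0 * mu * z,
        -x * z + 2.0 * mu * x + 2.0 * mu * z,
        -x * y + 2.0 * mu * x + 2.0 * mu * y,
    )

mu = 0.375          # arbitrary positive multiplier, for illustration only
side = 4.0 * mu     # candidate solution: all sides equal to 4 * mu

cube_residuals = stationarity_residuals(side, side, side, mu)
skewed_residuals = stationarity_residuals(1.0, 2.0, 3.0, mu)

print("residuals at the cube    :", cube_residuals)
print("residuals at a skewed box:", skewed_residuals)
```

The residuals vanish for the cube with side $4\mu$, while a box with unequal sides (here $1 \times 2 \times 3$) leaves them nonzero for the same multiplier.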

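Applying this in the inequality condition $g$:

$$g(x, y, z) : 2xy + 2xz + 2yz \le c$$
$$96\mu^2 \le c$$
$$\mu \le \frac{\sqrt{c}}{4\sqrt{6}}$$

We take $\mu^* = \frac{\sqrt{c}}{4\sqrt{6}}$, the value at which the constraint is active, so for this we would have a possible minimum at:

$$x^* = y^* = z^* = \frac{\sqrt{c}}{\sqrt{6}}$$

Checking the last condition (complementary slackness, $\mu g = 0$):

$$\mu \left( \frac{2c}{6} + \frac{2c}{6} + \frac{2c}{6} - c \right) = 0$$
$$\mu (c - c) = 0$$

we see that it is true. Hence, this is a candidate minimum for $L$, which is the negative of the volume. Hence the maximum for the original problem is:

$$x^* = y^* = z^* = \frac{\sqrt{c}}{\sqrt{6}}$$

5. Given an unconstrained optimization problem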

$$\min_u L(u)$$

the "normal" FONC for optimality is

$$\frac{\partial L}{\partial u} = 0$$

But what if we only have directional derivatives instead of "normal" derivatives? What is the FONC in this case?

Solution

In some problems, it happens that we do not have "normal" derivatives. This is due to the fact that not all directions of movement are feasible, meaning that by going in such a direction we would leave the region where our problem is defined. An example of this is given by the points located on the boundary. Any direction that points outside the region is infeasible, whereas any direction pointing inside is feasible.
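The notion of a feasible direction can be illustrated with a simple region. In the sketch below the region is assumed to be the unit box $[0, 1]^2$, and the small-step test is a heuristic chosen here for illustration: a direction is accepted when a tiny step along it stays inside the box.

```python
def is_feasible_direction(x, d, lower=0.0, upper=1.0, alpha=1e-6):
    """Heuristic test: does a small step from x along d stay inside [lower, upper]^n?"""
    return all(lower <= xi + alpha * di <= upper for xi, di in zip(x, d))

corner = (0.0, 0.0)      # a boundary point of the box
interior = (0.5, 0.5)    # an interior point of the box
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

feasible_at_corner = [d for d in directions if is_feasible_direction(corner, d)]
feasible_at_interior = [d for d in directions if is_feasible_direction(interior, d)]

print("feasible at the corner  :", feasible_at_corner)
print("feasible in the interior:", feasible_at_interior)
```

At the interior point every direction passes the test, while at the corner only the directions pointing into the box do.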

For problems where we have only directional derivatives, we have to evaluate them as:

$$\frac{\partial f}{\partial d}(x) = \lim_{\alpha \to 0} \frac{f(x + \alpha d) - f(x)}{\alpha}$$

where $d$ is a feasible direction and $\alpha > 0$.

Back to the problem:

We plan to calculate the Taylor expansion of

$$L(u^* + \alpha d)$$

where we consider that $u^*$ is a minimum with respect to the cost $L$. To make things simpler, we define the function:

$$u(\alpha) = u^* + \alpha d$$

Note that $u(0) = u^*$. Let us define another function:

$$\psi(\alpha) = L(u(\alpha))$$

Note that $\psi(0) = L(u(0)) = L(u^*)$, the minimum cost we are considering.
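To make $\psi$ concrete, the sketch below builds it for an assumed example: the cost $L(u) = (u_1 + 1)^2 + u_2^2$ restricted to the half-plane $u_1 \ge 0$, whose minimizer is $u^* = (0, 0)$. The cost, the region, and the helper names are illustrative assumptions. The code compares the one-sided slope of $\psi$ at $\alpha = 0$ with $d^T \nabla L(u^*)$, which is what the chain rule gives.

```python
def L(u):
    # Hypothetical cost; over the half-plane u1 >= 0 its minimizer is u* = (0, 0).
    return (u[0] + 1.0) ** 2 + u[1] ** 2

def grad_L(u):
    return (2.0 * (u[0] + 1.0), 2.0 * u[1])

u_star = (0.0, 0.0)

def psi(alpha, d):
    # psi(alpha) = L(u* + alpha d)
    return L(tuple(ui + alpha * di for ui, di in zip(u_star, d)))

def one_sided_slope(d, alpha=1e-7):
    # Forward difference (psi(alpha) - psi(0)) / alpha with alpha > 0.
    return (psi(alpha, d) - psi(0.0, d)) / alpha

def slope_from_gradient(d):
    # Chain rule: d psi / d alpha at 0 equals d^T grad L(u*).
    return sum(di * gi for di, gi in zip(d, grad_L(u_star)))

feasible_directions = [(1.0, 0.0), (0.5, -2.0), (0.0, 1.0)]  # all have d1 >= 0
slopes = [(one_sided_slope(d), slope_from_gradient(d)) for d in feasible_directions]
for d, (numeric, analytic) in zip(feasible_directions, slopes):
    print(d, numeric, analytic)
```

In this example the gradient at $u^*$ is $(2, 0)$, not zero, yet the slope of $\psi$ is nonnegative along every direction that stays in the region.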


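Applying the Taylor expansion to $L(u^* + \alpha d) = \psi(\alpha)$:

$$\psi(\alpha) = \psi(0) + \frac{\partial \psi(0)}{\partial \alpha} \alpha + o(\alpha)$$

For small $\alpha$ the remainder $o(\alpha)$ is negligible compared with the linear term, and by the chain rule $\frac{\partial \psi(0)}{\partial \alpha} = d^T \nabla L(u(0))$:

$$\psi(\alpha) \approx \psi(0) + d^T \nabla L(u(0)) \alpha$$
$$\psi(\alpha) \approx \psi(0) + d^T \nabla L(u^*) \alpha$$

Putting everything back in terms of $L$ and $u^*$:

$$L(u^* + \alpha d) \approx L(u^*) + d^T \nabla L(u^*) \alpha \qquad (10)$$

Now, if $L(u^*)$ is the minimum cost, then $L(u^* + \alpha d)$ cannot be smaller than $L(u^*)$:

$$L(u^* + \alpha d) \ge L(u^*)$$

Replacing the left side with Eq. 10:

$$L(u^*) + d^T \nabla L(u^*) \alpha \ge L(u^*)$$
$$d^T \nabla L(u^*) \alpha \ge 0$$

We are considering that $\alpha > 0$ (from the definition of directional derivatives), so we can safely divide by it:

$$d^T \nabla L(u^*) \ge 0 \qquad (11)$$

This is a condition for $L(u^*)$ to be a minimum, considering directional derivatives. So, the FONC for this case is given by Eq. 11, namely:

$$d^T \nabla L(u^*) \ge 0 \quad \text{for every feasible direction } d$$

6. Let $L \in C^1$ be convex, i.e.,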

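$$L(\alpha u_1 + (1 - \alpha) u_2) \le \alpha L(u_1) + (1 - \alpha) L(u_2), \quad \forall u_1, u_2 \in \mathbb{R}^m, \; \alpha \in [0, 1]$$

Assume that

$$\frac{\partial L}{\partial u}(u^*) = 0$$

Show that $u^*$ is a global minimum of $L$.

Solution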

From the definition of a convex function:

$$L(\alpha u_1 + (1 - \alpha) u_2) \le \alpha L(u_1) + (1 - \alpha) L(u_2), \quad \forall u_1, u_2 \in \mathbb{R}^m, \; \alpha \in [0, 1]$$

We reorder the inequality:

$$L(\alpha u_1 + (1 - \alpha) u_2) \le \alpha (L(u_1) - L(u_2)) + L(u_2)$$

We subtract $L(u_2)$ from both sides and then divide by $\alpha$, taking $\alpha \in (0, 1]$:

$$L(u_2 + \alpha (u_1 - u_2)) - L(u_2) \le \alpha (L(u_1) - L(u_2))$$

$$\frac{L(u_2 + \alpha (u_1 - u_2)) - L(u_2)}{\alpha} \le L(u_1) - L(u_2)$$

The left side is the difference quotient of $L$ at $u_2$ along the direction $u_1 - u_2$. If we let $\alpha \to 0$, it becomes the directional derivative of $L$ in that direction, which for $L \in C^1$ equals the derivative applied to $u_1 - u_2$:

$$\frac{\partial L(u_2)}{\partial u} (u_1 - u_2) \le L(u_1) - L(u_2) \qquad (12)$$

Now, from the initial assumption, we have a $u^*$ such that:

$$\frac{\partial L}{\partial u}(u^*) = 0$$

Let $u_2 = u^*$, then replacing it in Eq. 12:

$$\frac{\partial L(u^*)}{\partial u} (u_1 - u^*) \le L(u_1) - L(u^*)$$

$$0 \cdot (u_1 - u^*) \le L(u_1) - L(u^*)$$

$$0 \le L(u_1) - L(u^*)$$

$$L(u^*) \le L(u_1)$$

Since $u_1$ is arbitrary, we rename it $u$ and finally have:

$$L(u^*) \le L(u) \qquad (13)$$
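Both the first-order inequality of Eq. 12 and the conclusion of Eq. 13 can be spot-checked on a sample convex function. The sketch below assumes the illustrative cost $L(u) = e^{u_1} - u_1 + u_2^2$, which is convex and $C^1$ with a zero derivative at $u^* = (0, 0)$. The function, the sampling box, and the tolerance are choices made here and are not part of the problem.

```python
import math
import random

def L(u):
    # Hypothetical convex C^1 cost with zero derivative at u* = (0, 0).
    return math.exp(u[0]) - u[0] + u[1] ** 2

def grad_L(u):
    return (math.exp(u[0]) - 1.0, 2.0 * u[1])

def first_order_gap(u1, u2):
    # L(u1) - L(u2) - dL/du(u2) (u1 - u2); Eq. 12 says this is nonnegative.
    g = grad_L(u2)
    return L(u1) - L(u2) - sum(gi * (a - b) for gi, a, b in zip(g, u1, u2))

u_star = (0.0, 0.0)
tol = 1e-9  # slack for floating-point rounding

random.seed(1)
points = [(random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)) for _ in range(1000)]

inequality_12_holds = all(
    first_order_gap(u1, u2) >= -tol for u1, u2 in zip(points[::2], points[1::2])
)
global_min_holds = all(L(u_star) <= L(u) + tol for u in points)

print("derivative at u*:", grad_L(u_star))
print("Eq. 12 holds on all sampled pairs:", inequality_12_holds)
print("L(u*) <= L(u) on all samples:", global_min_holds)
```

A random sample cannot prove the result, but a violation of either check would signal an error in the argument or in the chosen function.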

From the expression obtained in Eq. 13, we easily see that $L(u)$ is always greater than or equal to $L(u^*)$; hence, $u^*$ is a global minimum over the