Math 346 Lecture #6 6.6 Taylor’s Theorem

Throughout let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be Banach spaces over the same field $\mathbb{F}$, and let $U$ be an open set in $X$.

6.6.1 Higher-Order Derivatives

Higher-order derivatives are defined inductively as explained below after some technical results.

Definition 6.6.1. Set $B^1(X, Y) = B(X, Y)$; for $k \in \mathbb{N}$ with $k \ge 2$, the Banach spaces $B^k(X, Y)$ are defined inductively by
$$B^k(X, Y) = B(X, B^{k-1}(X, Y)).$$

Note. That $B^k(X, Y)$ is a Banach space follows from Theorem 5.7.1: $X$ is a Banach space and $B(X, Y)$ is a Banach space, so $B(X, B(X, Y))$ is a Banach space in the induced norm, and so on.

Note. An element of $B^2(X, Y)$ is a linear transformation $L : X \to B(X, Y)$ whose induced norm
$$\|L\|_{X, B(X,Y)} = \sup\left\{ \frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} : h_1 \in X \setminus \{0\} \right\}$$
is finite, where for each $h_1 \in X$ the linear transformation $L(h_1) : X \to Y$ is bounded, i.e.,
$$\|L(h_1)\|_{X,Y} = \sup\left\{ \frac{\|L(h_1)h_2\|_Y}{\|h_2\|_X} : h_2 \in X \setminus \{0\} \right\} < \infty.$$

We combine $\|L(h_1)\|_{X,Y} \le \|L\|_{X, B(X,Y)} \|h_1\|_X$ and $\|L(h_1)h_2\|_Y \le \|L(h_1)\|_{X,Y} \|h_2\|_X$ to get
$$\|L(h_1)h_2\|_Y \le \|L\|_{X, B(X,Y)} \|h_1\|_X \|h_2\|_X,$$
or, when both $\|h_1\|_X$ and $\|h_2\|_X$ are nonzero,
$$\|L\|_{X, B(X,Y)} \ge \frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X}.$$

The upper bound $\|L\|_{X, B(X,Y)}$ on these ratios is the supremum because for $\epsilon > 0$ there exists a nonzero $h_1 \in X$ such that
$$\frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} > \|L\|_{X, B(X,Y)} - \frac{\epsilon}{2},$$
and there exists a nonzero $h_2 \in X$ such that
$$\frac{\|L(h_1)h_2\|_Y}{\|h_2\|_X} > \|L(h_1)\|_{X,Y} - \frac{\epsilon \|h_1\|_X}{2},$$
so that
$$\frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X} > \frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} - \frac{\epsilon}{2} > \|L\|_{X, B(X,Y)} - \epsilon.$$

Thus
$$\|L\|_{X, B(X,Y)} = \sup\left\{ \frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X} : h_1, h_2 \in X \setminus \{0\} \right\}.$$
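To make this norm concrete, here is a minimal numerical sketch (not from the text), assuming $X = \mathbb{R}^n$ and $Y = \mathbb{R}$, so that an element $L \in B^2(\mathbb{R}^n, \mathbb{R})$ can be represented by a matrix $A$ via $L(h_1, h_2) = h_1^T A h_2$. For such an $L$ the supremum above is the largest singular value of $A$, and every sampled ratio stays below it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))   # represents L in B^2(R^n, R): L(h1, h2) = h1^T A h2

def L(h1, h2):
    return h1 @ A @ h2

# For this bilinear L the supremum of |L(h1, h2)| / (||h1|| ||h2||) is the
# largest singular value of A (the spectral norm), so every ratio is below it.
L_norm = np.linalg.norm(A, 2)

for _ in range(1000):
    h1, h2 = rng.standard_normal(n), rng.standard_normal(n)
    ratio = abs(L(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2))
    assert ratio <= L_norm + 1e-12

print("all sampled ratios bounded by ||L|| =", L_norm)
```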

Note. Another interpretation of the elements $L$ of $B^2(X, Y)$ is as continuous multilinear transformations $L : X \times X \to Y$.

A transformation $L : X \times X \to Y$ is multilinear if for each fixed $h_2$ the map $h_1 \mapsto L(h_1, h_2)$ from $X$ to $Y$ is linear, and for each fixed $h_1$ the map $h_2 \mapsto L(h_1, h_2)$ from $X$ to $Y$ is linear.

Previously, for $L \in B^2(X, Y) = B(X, B(X, Y))$ we wrote $L(h_1)h_2$; but since this $L$ is linear in $h_1$ and linear in $h_2$, it is multilinear, and we write $L(h_1, h_2)$ instead.

A multilinear transformation $L : X \times X \to Y$ is continuous if its norm
$$\|L\| = \sup\left\{ \frac{\|L(h_1, h_2)\|_Y}{\|h_1\|_X \|h_2\|_X} : h_1, h_2 \in X \setminus \{0\} \right\}$$
is finite. [This is precisely the norm on $L$ we obtained previously.]

Note. For Banach spaces $X_1, \dots, X_n$ and $Y$, the Banach space $B(X_1, \dots, X_n; Y)$ consists of the multilinear transformations $L : X_1 \times \cdots \times X_n \to Y$ whose norms
$$\|L\| = \sup\left\{ \frac{\|L(h_1, \dots, h_n)\|_Y}{\|h_1\|_{X_1} \cdots \|h_n\|_{X_n}} : h_i \in X_i \setminus \{0\} \right\}$$
are finite. The norm $\|L\|$ has the property that
$$\|L(h_1, \dots, h_n)\|_Y \le \|L\| \, \|h_1\|_{X_1} \cdots \|h_n\|_{X_n}$$
for all $h_i \in X_i$. If $X_1 = \cdots = X_n = X$ we write $B^n(X, Y)$ instead of $B(X, \dots, X; Y)$.

Definition 6.6.2. Let $f : U \to Y$ be differentiable on $U$.

We say $f$ is twice differentiable on $U$ if $Df : U \to B(X, Y)$ is differentiable on $U$, and we write $D^2 f = D(Df)$ for the second derivative.

For each $x \in U$, the second derivative $D^2 f(x)$, if it exists, belongs to $B^2(X, Y)$, so that $D^2 f(x)$ is a continuous multilinear transformation that acts on a pair of vectors $(h_1, h_2) \in X \times X$ to produce a vector in $Y$.

Proceeding inductively for $k \ge 2$, if the map $D^{k-1} f : U \to B^{k-1}(X, Y)$ is differentiable on $U$, then we say that $f$ is $k$-times differentiable on $U$ and denote the $k$th derivative by $D^k f = D(D^{k-1} f)$.

For each $x \in U$, the $k$th derivative $D^k f(x)$, if it exists, belongs to $B^k(X, Y)$, so that $D^k f(x)$ is a continuous multilinear transformation that acts on $k$ vectors $(h_1, \dots, h_k) \in X \times \cdots \times X$ to produce a vector in $Y$.

If the $k$th derivative $D^k f$ is continuous on $U$, then we say that $f$ is $k$-times continuously differentiable on $U$.

We denote the set of $k$-times continuously differentiable functions on $U$ by $C^k(U, Y)$.

This is a vector space of functions.


A function $f : U \to Y$ is called smooth if $f \in C^k(U, Y)$ for all $k \in \mathbb{N}$.

We denote the vector space of smooth functions from $U$ to $Y$ by $C^\infty(U, Y)$.

Example (slight variation of 6.3.3). For $U$ open in $\mathbb{R}^n$, suppose $f : U \to \mathbb{R}$ is differentiable on $U$, i.e., $Df(x) \in B(\mathbb{R}^n, \mathbb{R})$ exists at each $x \in U$.

The Banach space $B(\mathbb{R}^n, \mathbb{R})$ is the dual space of $\mathbb{R}^n$, which by the Riesz Representation Theorem is isomorphic to $\mathbb{R}^n$, i.e., for each $L \in B(\mathbb{R}^n, \mathbb{R})$ there exists a unique vector $u \in \mathbb{R}^n$ such that $L(v) = \langle u, v \rangle = u^T v$.

By writing the vector $u$ as the row vector $u^T$, we represent $Df(x)$ as a row vector, which by Theorem 6.2.11, in the standard basis of $\mathbb{R}^n$, is
$$Df(x) = \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix},$$
where $D_i f(x) = Df(x)e_i \in \mathbb{R}$, $i = 1, \dots, n$, are the partial derivatives.

Now suppose that $f$ is twice differentiable on $U$, i.e., $D^2 f(x) \in B^2(\mathbb{R}^n, \mathbb{R})$ exists at each $x \in U$.

Since $B^2(\mathbb{R}^n, \mathbb{R}) = B(\mathbb{R}^n, B(\mathbb{R}^n, \mathbb{R})) = B(\mathbb{R}^n, (\mathbb{R}^n)^*)$, the directional derivative of $Df(x)$ in the direction $u \in \mathbb{R}^n$, i.e., $D^2 f(x)(u) \in (\mathbb{R}^n)^*$, is a row vector.

We can still apply Theorem 6.2.11, but in transposed form, i.e., $D^2 f(x)(u) = u^T H^T$ where $H$ is the "Hessian" of $f$ at $x$,
$$H = D\left( \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix}^T \right) = \begin{bmatrix} D_1 D_1 f(x) & \cdots & D_n D_1 f(x) \\ \vdots & \ddots & \vdots \\ D_1 D_n f(x) & \cdots & D_n D_n f(x) \end{bmatrix},$$
so that
$$D^2 f(x)(u)(v) = u^T H^T v \in \mathbb{R} \quad \text{for all } u, v \in \mathbb{R}^n.$$

From $D^2 f(x)(u)(v) = u^T H^T v = v^T H u$ (the transpose of a scalar is itself) we see that $D^2 f(x)$ does indeed act multilinearly on a pair of vectors $(u, v) \in \mathbb{R}^n \times \mathbb{R}^n$ to produce a scalar, i.e., $D^2 f(x)(u, v) = u^T H^T v \in \mathbb{R}$.
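The identity $D^2 f(x)(u, v) = u^T H^T v$ can be checked numerically. Here is a rough sketch (the function $f(x) = x_1^2 x_2 + \sin x_3$ and the sample points are illustrative assumptions, not from the text): the Hessian is approximated by central finite differences and compared with a direct second difference of $f$ along the directions $u$ and $v$.

```python
import numpy as np

def f(x):
    # illustrative smooth function R^3 -> R (an assumption for this sketch)
    return x[0]**2 * x[1] + np.sin(x[2])

def hessian_fd(f, x, eps=1e-4):
    # Hessian of f at x by central finite differences
    n = len(x)
    I = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + eps*I[i] + eps*I[j]) - f(x + eps*I[i] - eps*I[j])
                       - f(x - eps*I[i] + eps*I[j]) + f(x - eps*I[i] - eps*I[j])) / (4*eps**2)
    return H

x = np.array([0.7, -0.3, 1.2])
u = np.array([1.0, 2.0, -1.0])
v = np.array([0.5, 0.0, 3.0])
H = hessian_fd(f, x)

# D^2 f(x)(u, v) as an iterated directional derivative, again by central differences
eps = 1e-4
bilinear = (f(x + eps*u + eps*v) - f(x + eps*u - eps*v)
            - f(x - eps*u + eps*v) + f(x - eps*u - eps*v)) / (4*eps**2)

print(bilinear, u @ H.T @ v)   # the two values agree up to discretization error
```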

Definition 6.6.4. Let $(X_i, \|\cdot\|_{X_i})$, $i = 1, 2, \dots, n$, be a finite collection of Banach spaces.

Fix an open set $U \subset X_1 \times X_2 \times \cdots \times X_n$, and an ordered list of $k$ integers $i_1, i_2, \dots, i_k$ with $i_j \in \{1, \dots, n\}$ (not necessarily distinct).

The $k$th-order partial derivative of $f \in C^k(U, Y)$ corresponding to $i_1, \dots, i_k$ is the function
$$D_{i_1} D_{i_2} \cdots D_{i_k} f \in C(U, B(X_{i_1}, X_{i_2}, \dots, X_{i_k}; Y)).$$

[Recall from Definition 6.3.12 that the $i$th partial derivative of a function $\alpha : X_1 \times \cdots \times X_n \to Y$ is the derivative of the function $\beta : X_i \to Y$ defined by $\beta(z) = \alpha(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_n)$, i.e., the function obtained from $\alpha$ by fixing all of its inputs except the $i$th variable.]


When $X_i = \mathbb{F}$ for all $i = 1, 2, \dots, n$ and $Y = \mathbb{F}$, we often write $D_{i_1} D_{i_2} \cdots D_{i_k} f$ as the more familiar partial derivative
$$\frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}}.$$

The Hessian of $f : U \to \mathbb{R}$, $U$ open in $\mathbb{R}^n$, is nothing more than the matrix of second-order partial derivatives:
$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}.$$

You might remember that this square matrix of second-order partial derivatives is usually symmetric. This is true when the second derivative is continuous.

Proposition 6.6.5. If $f \in C^2(U, Y)$ with $Y$ finite dimensional, then for all $x \in U$ and all $(u, v) \in X \times X$, there holds
$$D^2 f(x)(u, v) = D^2 f(x)(v, u).$$

When $U$ is an open subset of $X = X_1 \times X_2 \times \cdots \times X_n$ for Banach spaces $X_i$, and $f \in C^2(U, Y)$ for finite dimensional $Y$, then for all $x \in U$ and all $i, j \in \{1, 2, \dots, n\}$, there holds
$$D_i D_j f(x) = D_j D_i f(x).$$

When $U$ is an open subset of $X = \mathbb{F}^n = \mathbb{F} \times \mathbb{F} \times \cdots \times \mathbb{F}$, $Y = \mathbb{F}^m$, and $f = (f_1, f_2, \dots, f_m) \in C^2(U, Y)$, then for all $x \in U$, all $i, j \in \{1, 2, \dots, n\}$, and all $k \in \{1, 2, \dots, m\}$ there holds
$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial^2 f_k}{\partial x_j \partial x_i}.$$

Proof. The hypothesis of finite dimensionality of $Y$ implies that we can assume WLOG that $Y = \mathbb{F}^m$ and that $f = (f_1, \dots, f_m)$.

Since $\mathbb{C}^m$ and $\mathbb{R}^{2m}$ are isomorphic as Banach spaces (with the standard norms), we can assume WLOG that $\mathbb{F} = \mathbb{R}$.

Then, as each $f_k : U \to \mathbb{R}$, it suffices to show the result for $Y = \mathbb{R}$.

For a fixed $x \in U$ and $u, v \in X$ there exist $t, s > 0$, by the openness of $U$, such that $x + \xi u + \eta v \in U$ for all $\xi, \eta \in [0, \max\{s, t\}]$.

For $\xi \in [0, t]$ define
$$g_\xi(x) = f(x + \xi u) - f(x),$$
and for $\eta \in [0, s]$ define
$$S_{\eta,t}(x) = g_t(x + \eta v) - g_t(x) = f(x + tu + \eta v) - f(x + \eta v) - f(x + tu) + f(x).$$


Recognize that $S_{0,t}(x) = 0$ and that, with $x$ and $t$ fixed, $D S_{\eta,t}(x) = D g_t(x + \eta v)v$ (the derivative being taken with respect to $\eta$).

The function $\eta \mapsto S_{\eta,t}(x)$ is continuous on $[0, s]$ and differentiable on $(0, s)$, so by the Mean Value Theorem there exists $\sigma_{s,t} \in (0, s)$ such that
$$S_{s,t}(x) = S_{s,t}(x) - S_{0,t}(x) = D g_t(x + \sigma_{s,t} v)(v)(s - 0) = D g_t(x + \sigma_{s,t} v)(sv).$$

Since $g_t(x + \eta v) = f(x + tu + \eta v) - f(x + \eta v)$ we have
$$D g_t(x + \sigma_{s,t} v)(sv) = D f(x + tu + \sigma_{s,t} v)(sv) - D f(x + \sigma_{s,t} v)(sv).$$

The function
$$\xi \mapsto D f(x + \xi u + \sigma_{s,t} v)(sv) - D f(x + \sigma_{s,t} v)(sv)$$
is zero when $\xi = 0$, is continuous on $[0, t]$, and is differentiable on $(0, t)$, so by the Mean Value Theorem there exists $\tau_{s,t} \in (0, t)$ such that
$$D f(x + tu + \sigma_{s,t} v)(sv) - D f(x + \sigma_{s,t} v)(sv) = D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(sv)(tu).$$

Thus
$$S_{s,t}(x) = D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(sv, tu).$$

Switching the roles of $tu$ and $sv$ in the above argument gives the existence of $\tau'_{s,t}$ and $\sigma'_{s,t}$ such that
$$S_{s,t}(x) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(tu, sv).$$

[We needed that $x + \xi u + \eta v \in U$ for all $\xi, \eta \in [0, \max\{s, t\}]$ here.] Equating the two expressions for the same quantity $S_{s,t}(x)$ gives
$$D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(sv, tu) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(tu, sv).$$

Since $D^2 f(x)$ is multilinear, we can pull the scalars $s$ and $t$ out of the inputs $sv$ and $tu$ on both sides; they cancel, giving
$$D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(v, u) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(u, v).$$

As the scalars $s, t \to 0$, the quantities $\sigma_{s,t}, \sigma'_{s,t}, \tau_{s,t}, \tau'_{s,t}$ all go to zero.

The assumed continuity of $D^2 f$ on $U$ implies, as $s, t \to 0$, that $D^2 f(x)(v, u) = D^2 f(x)(u, v)$.

In the case that $X = X_1 \times \cdots \times X_n$, we take $u_i \in X_i$ and $v_j \in X_j$ and form the vectors
$$u = (0, \dots, u_i, \dots, 0), \quad v = (0, \dots, v_j, \dots, 0),$$
where $u_i$ is in the $i$th slot and $v_j$ is in the $j$th slot, to get
$$D^2 f(x)(u, v) = D^2 f(x)(v, u),$$
which implies that $D_i D_j f(x) = D_j D_i f(x)$.


Finally, $D_i D_j f(x) = D_j D_i f(x)$ implies the equality
$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial^2 f_k}{\partial x_j \partial x_i}$$
for all $k = 1, \dots, m$. $\square$

Remark 6.6.6. For $U \subset \mathbb{R}^n$ and $f \in C^2(U, \mathbb{R})$, Proposition 6.6.5 guarantees that the Hessian of $f$ is a symmetric matrix.
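A quick symbolic sketch of this symmetry (the particular function $f$ below is an illustrative assumption, not from the text):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x*y) + sp.sin(x + y**2)   # illustrative C^2 function (an assumption)

# Mixed second-order partials agree, as Proposition 6.6.5 guarantees.
print(sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)))   # prints 0

# Equivalently, the Hessian matrix is symmetric.
H = sp.hessian(f, [x, y])
print((H - H.T).applyfunc(sp.simplify))                   # zero matrix
```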

6.6.2 Higher-Order Directional Derivatives

For $U$ open in $\mathbb{F}^n$ and $f \in C^k(U, \mathbb{F}^m)$, the derivative $D^k f(x)$ is an element of $B^k(\mathbb{F}^n, \mathbb{F}^m)$ for each $x \in U$.

This means that $D^k f(x)$ acts on $k$ vectors from $\mathbb{F}^n$, giving a vector in $\mathbb{F}^m$. When all of the inputs of $D^k f(x)$ are the same vector $v$, we call the output
$$D_v^k f(x) = D^k f(x)(v, \dots, v)$$
the $k$th directional derivative of $f$ at $x$ in the direction $v$.

We express the directional derivatives of $f$ in terms of standard coordinates for $\mathbb{F}^n$ and $\mathbb{F}^m$.

For $k = 1$ and
$$v = \sum_{i=1}^n v_i e_i \in \mathbb{F}^n,$$
the vector
$$Df(x)v = \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix} v = \sum_{j=1}^n D_j f(x) v_j \in \mathbb{F}^m$$
is the first-order directional derivative $D_v f(x)$ of $f$ at $x$ in the direction $v$.

The second-order directional derivative of $f$ at $x$ in the direction $v$ is
$$D_v^2 f(x) = D_v\left( \sum_{j=1}^n D_j f(x) v_j \right) = \sum_{i=1}^n \sum_{j=1}^n D_i D_j f(x) v_i v_j = v^T H(x) v,$$
where $H(x)$ is the Hessian of $f$ at $x$.
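For a $C^2$ function, $D_v^2 f(x)$ can also be computed as the second derivative of the one-variable function $t \mapsto f(x + tv)$ at $t = 0$. Here is a short symbolic sketch checking that this agrees with $v^T H(x) v$; the specific function is an illustrative assumption.

```python
import sympy as sp

x1, x2, t, v1, v2 = sp.symbols('x1 x2 t v1 v2')
f = x1**3 * x2 + sp.cos(x2)            # illustrative function (an assumption)

# v^T H(x) v using the Hessian of f
H = sp.hessian(f, [x1, x2])
v = sp.Matrix([v1, v2])
quad = (v.T * H * v)[0]

# D_v^2 f(x) as the second derivative of t -> f(x + t v) at t = 0
g = f.subs({x1: x1 + t*v1, x2: x2 + t*v2})
second = sp.diff(g, t, 2).subs(t, 0)

print(sp.simplify(quad - second))      # prints 0
```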

Iterating $k$ times gives the $k$th-order directional derivative of $f$ at $x$ in the direction $v$:
$$D_v^k f(x) = \sum_{i_1, \dots, i_k = 1}^{n} D_{i_1} \cdots D_{i_k} f(x) \, v_{i_1} \cdots v_{i_k}.$$

Often we write $D^k f(x)v^{(k)}$ for $D_v^k f(x)$, where by $v^{(k)}$ we mean the $k$-component Cartesian product $(v, \dots, v)$.

Proposition 6.6.5 and its application to higher-order derivatives show that many of the terms in the $k$th-order directional derivative are repeated.


Combining these repeated terms gives
$$D_v^k f(x) = \sum_{j_1 + \cdots + j_n = k} \frac{k!}{j_1! \cdots j_n!} \, D_1^{j_1} \cdots D_n^{j_n} f(x) \, v_1^{j_1} \cdots v_n^{j_n},$$
where the quantities $j_1, \dots, j_n$ are nonnegative integers summing to the integer $k$.
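Here is a short symbolic sketch of this multinomial formula for $n = 2$ and $k = 3$, comparing it with $D_v^k f(x)$ computed as the $k$th derivative of $t \mapsto f(x + tv)$ at $t = 0$; the function $f$ below is an illustrative assumption.

```python
import sympy as sp
from math import factorial

x1, x2, t, v1, v2 = sp.symbols('x1 x2 t v1 v2')
f = sp.exp(x1) * sp.sin(x2)            # illustrative smooth function (an assumption)
k = 3

# D_v^k f(x) as the k-th derivative of t -> f(x + t v) at t = 0
g = f.subs({x1: x1 + t*v1, x2: x2 + t*v2})
lhs = sp.diff(g, t, k).subs(t, 0)

# Multinomial combination of k-th order partial derivatives
rhs = 0
for j1 in range(k + 1):
    j2 = k - j1
    d = f
    if j1:
        d = sp.diff(d, x1, j1)
    if j2:
        d = sp.diff(d, x2, j2)
    rhs += factorial(k) // (factorial(j1) * factorial(j2)) * d * v1**j1 * v2**j2

print(sp.simplify(lhs - rhs))          # prints 0
```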

6.6.3 Taylor’s Theorem

We first recall Taylor's Theorem and Lagrange's Remainder Theorem for functions $f : U \to \mathbb{R}$ on an open interval $U \subset \mathbb{R}$.

Theorem 6.6.8. For an open interval $U$ of $\mathbb{R}$, if $f : U \to \mathbb{R}$ is $(k+1)$-times differentiable on $U$, then for all $a \in U$ and $h \in \mathbb{R}$ satisfying $a + h \in U$ there exists $c$ between $a$ and $a + h$ ($h$ could be negative) such that
$$f(a + h) = f(a) + f'(a)h + \frac{f''(a)}{2}h^2 + \cdots + \frac{f^{(k)}(a)}{k!}h^k + \frac{f^{(k+1)}(c)}{(k+1)!}h^{k+1},$$
where $f^{(k)}$ is the $k$th derivative of $f$.

We extend this to the Banach space setting, where we make use of the higher-order directional derivatives $D^k f(x)v^{(k)}$.

Theorem 6.6.9. If $f \in C^k(U, Y)$, then for all $x \in U$ and $h \in X$ such that the line segment $\ell(x, x+h) = \{(1-t)x + th : t \in [0, 1]\}$ lies in $U$, there holds
$$f(x + h) = f(x) + Df(x)h + \frac{D^2 f(x)h^{(2)}}{2!} + \cdots + \frac{D^{k-1} f(x)h^{(k-1)}}{(k-1)!} + R_k,$$
where the remainder $R_k$ is given by the integral form
$$R_k(x, h) = \int_0^1 \frac{(1-t)^{k-1}}{(k-1)!} D^k f(x + th)h^{(k)} \, dt.$$

The proof is by induction (see the text).
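A one-variable numerical sketch of the identity in Theorem 6.6.9 (the choices $f = \sin$, $x = 0.3$, $h = 0.4$, $k = 3$, and the quadrature rule are assumptions made for this illustration): the degree-$(k-1)$ Taylor polynomial plus the integral remainder reproduces $f(x + h)$.

```python
import numpy as np
from math import factorial

# Derivatives of sin cycle: sin, cos, -sin, -cos, ...
derivs = [np.sin, np.cos, lambda s: -np.sin(s), lambda s: -np.cos(s)]

x, h, k = 0.3, 0.4, 3                      # illustrative choices (assumptions)

# Taylor polynomial through the (k-1)-st term
poly = sum(derivs[j](x) * h**j / factorial(j) for j in range(k))

# Integral form of the remainder, evaluated with a midpoint quadrature rule on [0, 1]
N = 200000
t = (np.arange(N) + 0.5) / N
integrand = (1 - t)**(k - 1) / factorial(k - 1) * derivs[k](x + t*h) * h**k
R_k = integrand.mean()

print(np.sin(x + h), poly + R_k)           # the two numbers agree to quadrature accuracy
```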

Remark 6.6.10. The $(k-1)$st Taylor polynomial approximation of $f \in C^k(U, Y)$ is given in Theorem 6.6.9 by ignoring the remainder term $R_k$.

If $f \in C^\infty(U, Y)$ and the remainder $R_k$ goes to zero as $k \to \infty$, then the Taylor series for $f$ converges to $f$, and we say that $f$ is analytic.

Corollary 6.6.14. If $\|D^k f(x + th)\| < M$ for all $t \in [0, 1]$, then
$$\|R_k\|_Y \le \frac{M}{k!} \|h\|_X^k.$$

The proof of this is HW (Exercise 6.33; hint: use a property of the norm for bounded multilinear transformations).

Example. Find the first-order Taylor polynomial approximation for
$$f(x, y) = \sin(x + y)$$
at the origin.

We compute several quantities:
$$f(0, 0) = \sin(0 + 0) = 0, \quad D_1 f(0, 0) = \cos(0 + 0) = 1, \quad D_2 f(0, 0) = \cos(0 + 0) = 1.$$

Thus
$$f(h_1, h_2) \approx f(0, 0) + \begin{bmatrix} D_1 f(0, 0) & D_2 f(0, 0) \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} = h_1 + h_2.$$

This is the tangent plane of the graph of $z = f(x, y) = \sin(x + y)$ at the point $(0, 0)$.
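A quick numerical sanity check of this example (the sample points below are an assumption): the first-order approximation error shrinks rapidly near the origin, in fact like $\|h\|^3$, since the Hessian of $\sin(x + y)$ vanishes at the origin.

```python
import numpy as np

# Error of the first-order Taylor polynomial h1 + h2 for sin(h1 + h2) near the origin.
for scale in (1e-1, 1e-2, 1e-3):
    h1, h2 = 0.3 * scale, -0.7 * scale     # illustrative sample points (assumption)
    err = abs(np.sin(h1 + h2) - (h1 + h2))
    print(scale, err)                      # error drops by ~1000x per step (cubic in scale)
```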
