## Math 346 Lecture #6: 6.6 Taylor's Theorem

### Throughout let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be Banach spaces over the same field $\mathbb{F}$, and let $U$ be an open set in $X$.

## 6.6.1 Higher-Order Derivatives

### Higher-order derivatives are defined inductively as explained below after some technical results.

## Definition 6.6.1. For $B^1(X,Y) = B(X,Y)$ and $k \in \mathbb{N}$ with $k \ge 2$, the Banach spaces $B^k(X,Y)$ are defined inductively by

$$B^k(X,Y) = B\big(X, B^{k-1}(X,Y)\big).$$

## Note. That $B^k(X,Y)$ is a Banach space follows by Theorem 5.7.1: $X$ is a Banach space and $B(X,Y)$ is a Banach space, so $B(X, B(X,Y))$ is a Banach space in the induced norm, and so on inductively.

## Note. An element of $B^2(X,Y)$ is a linear transformation $L : X \to B(X,Y)$ whose induced norm

$$\|L\|_{X, B(X,Y)} = \sup\left\{ \frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} : h_1 \in X \setminus \{0\} \right\}$$

### is finite, where for each $h_1 \in X$ the linear transformation $L(h_1) : X \to Y$ is bounded, i.e.,

$$\|L(h_1)\|_{X,Y} = \sup\left\{ \frac{\|L(h_1)h_2\|_Y}{\|h_2\|_X} : h_2 \in X \setminus \{0\} \right\} < \infty.$$

### We combine $\|L(h_1)\|_{X,Y} \le \|L\|_{X,B(X,Y)} \|h_1\|_X$ and $\|L(h_1)h_2\|_Y \le \|L(h_1)\|_{X,Y} \|h_2\|_X$ to get

$$\|L(h_1)h_2\|_Y \le \|L\|_{X,B(X,Y)} \|h_1\|_X \|h_2\|_X,$$

### or, when both $\|h_1\|_X$ and $\|h_2\|_X$ are nonzero, that

$$\|L\|_{X,B(X,Y)} \ge \frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X}.$$

### The upper bound $\|L\|_{X,B(X,Y)}$ on the ratios is the supremum because for $\epsilon > 0$ there exists a nonzero $h_1 \in X$ such that

$$\frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} > \|L\|_{X,B(X,Y)} - \frac{\epsilon}{2},$$

### and there exists a nonzero $h_2 \in X$ such that

$$\frac{\|L(h_1)h_2\|_Y}{\|h_2\|_X} > \|L(h_1)\|_{X,Y} - \frac{\|h_1\|_X\, \epsilon}{2},$$

### so that

$$\frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X} > \frac{\|L(h_1)\|_{X,Y}}{\|h_1\|_X} - \frac{\epsilon}{2} > \|L\|_{X,B(X,Y)} - \epsilon.$$

### Thus

$$\|L\|_{X,B(X,Y)} = \sup\left\{ \frac{\|L(h_1)h_2\|_Y}{\|h_1\|_X \|h_2\|_X} : h_1, h_2 \in X \setminus \{0\} \right\}.$$
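### This supremum can be probed numerically. As an illustrative sketch (the matrix $A$ and the dimension $n = 2$ are hypothetical choices, not from the text), take the bilinear map $L(h_1, h_2) = h_1^T A h_2$ on $\mathbb{R}^2$; since $\sup_{\|h_1\|=1} h_1^T w = \|w\|$, the norm above reduces to $\sup_{\|h_2\|=1} \|A h_2\|$, the largest singular value of $A$. The snippet compares that exact value with a brute-force search over unit vectors.

```python
import math

# Hypothetical 2x2 matrix for the bilinear map L(h1, h2) = h1^T A h2 on R^2.
A = [[3.0, 1.0],
     [1.0, 2.0]]

def exact_norm(A):
    """Largest singular value of the 2x2 matrix A, via eigenvalues of A^T A."""
    M = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]                          # M = A^T A (symmetric)
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam_max = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
    return math.sqrt(lam_max)

def sampled_norm(A, steps=10000):
    """Approximate sup_{||h2|| = 1} ||A h2|| by sampling unit vectors h2."""
    best = 0.0
    for p in range(steps):
        a = 2.0 * math.pi * p / steps
        h2 = (math.cos(a), math.sin(a))
        w = (A[0][0] * h2[0] + A[0][1] * h2[1],
             A[1][0] * h2[0] + A[1][1] * h2[1])
        best = max(best, math.hypot(w[0], w[1]))
    return best
```

### For this particular $A$ both values come out to $(5+\sqrt{5})/2 \approx 3.618$; since this $A$ happens to be symmetric, the norm is also its largest eigenvalue.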

## Note. Another interpretation of the elements $L$ of $B^2(X,Y)$ is as continuous multilinear transformations $L : X \times X \to Y$.

### A transformation $L : X \times X \to Y$ is multilinear if for each fixed $h_2$ the map $h_1 \mapsto L(h_1, h_2)$ from $X$ to $Y$ is linear, and for each fixed $h_1$ the map $h_2 \mapsto L(h_1, h_2)$ from $X$ to $Y$ is linear.

### Previously, for $L \in B^2(X,Y) = B(X, B(X,Y))$ we wrote $L(h_1)h_2$, but since this $L$ is linear in $h_1$ and linear in $h_2$, it is multilinear, and we write $L(h_1, h_2)$ instead.

### A multilinear transformation $L : X \times X \to Y$ is continuous if its norm

$$\|L\| = \sup\left\{ \frac{\|L(h_1, h_2)\|_Y}{\|h_1\|_X \|h_2\|_X} : h_1, h_2 \in X \setminus \{0\} \right\}$$

### is finite. [This is precisely the norm on $L$ we obtained previously.]

## Note. For Banach spaces $X_1, \dots, X_n$ and $Y$, the Banach space $B(X_1, \dots, X_n; Y)$ consists of the multilinear transformations $L : X_1 \times \cdots \times X_n \to Y$ whose norms

$$\|L\| = \sup\left\{ \frac{\|L(h_1, \dots, h_n)\|_Y}{\|h_1\|_{X_1} \cdots \|h_n\|_{X_n}} : h_i \in X_i \setminus \{0\} \right\}$$

### are finite. The norm $\|L\|$ has the property that

$$\|L(h_1, \dots, h_n)\|_Y \le \|L\| \, \|h_1\|_{X_1} \cdots \|h_n\|_{X_n}$$

### for all $h_i \in X_i$. If $X_1 = \cdots = X_n = X$, we write $B^n(X, Y)$ instead of $B(X, \dots, X; Y)$.

## Definition 6.6.2. Let $f : U \to Y$ be differentiable on $U$.

### We say $f$ is twice differentiable on $U$ if $Df : U \to B(X,Y)$ is differentiable on $U$, and we write $D^2 f = D(Df)$ for the second derivative.

### For each $x \in U$, the second derivative $D^2 f(x)$, if it exists, belongs to $B^2(X,Y)$, so that $D^2 f(x)$ is a continuous multilinear transformation that acts on a pair of vectors $(h_1, h_2) \in X \times X$ to produce a vector in $Y$.

### Proceeding inductively for $k \ge 2$, if the map $D^{k-1} f : U \to B^{k-1}(X,Y)$ is differentiable on $U$, then we say that $f$ is $k$-times differentiable on $U$ and denote the $k$th derivative by $D^k f = D(D^{k-1} f)$.

### For each $x \in U$, the $k$th derivative $D^k f(x)$, if it exists, belongs to $B^k(X,Y)$, so that $D^k f(x)$ is a continuous multilinear transformation that acts on $k$ vectors $(h_1, \dots, h_k) \in X \times \cdots \times X$ to produce a vector in $Y$.

### If the $k$th derivative $D^k f$ is continuous on $U$, then we say that $f$ is $k$-times continuously differentiable on $U$.

### We denote the set of $k$-times continuously differentiable functions on $U$ by $C^k(U, Y)$; this is a vector space of functions.

### A function $f : U \to Y$ is called smooth if $f \in C^k(U, Y)$ for all $k \in \mathbb{N}$.

### We denote the vector space of smooth functions from $U$ to $Y$ by $C^\infty(U, Y)$.

## Example (slight variation of 6.3.3). For $U$ open in $\mathbb{R}^n$, suppose $f : U \to \mathbb{R}$ is differentiable on $U$, i.e., $Df(x) \in B(\mathbb{R}^n, \mathbb{R})$ exists at each $x \in U$.

### The Banach space $B(\mathbb{R}^n, \mathbb{R})$ is the dual space of $\mathbb{R}^n$, which by the Riesz Representation Theorem is isomorphic to $\mathbb{R}^n$, i.e., for each $L \in B(\mathbb{R}^n, \mathbb{R})$ there exists a unique vector $u \in \mathbb{R}^n$ such that $L(v) = \langle u, v \rangle = u^T v$.

### By writing the vector $u$ as the row vector $u^T$, we represent $Df(x)$ as a row vector, which by Theorem 6.2.11, in the standard basis of $\mathbb{R}^n$, is

$$Df(x) = \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix},$$

### where $D_i f(x) = Df(x)e_i \in \mathbb{R}$, $i = 1, \dots, n$, are the partial derivatives.

### Now suppose that $f$ is twice differentiable on $U$, i.e., $D^2 f(x) \in B^2(\mathbb{R}^n, \mathbb{R})$ exists at each $x \in U$.

### Since $B^2(\mathbb{R}^n, \mathbb{R}) = B(\mathbb{R}^n, B(\mathbb{R}^n, \mathbb{R})) = B(\mathbb{R}^n, (\mathbb{R}^n)^*)$, the directional derivative of $Df(x)$ in the direction $u \in \mathbb{R}^n$, i.e., $D^2 f(x)(u) \in (\mathbb{R}^n)^*$, is a row vector.

### We can still apply Theorem 6.2.11, but in transposed form, i.e., $D^2 f(x)(u) = u^T H^T$ where $H$ is the "Hessian" of $f$ at $x$,

$$H = D \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix}^T = \begin{bmatrix} D_1 D_1 f(x) & \cdots & D_n D_1 f(x) \\ \vdots & \ddots & \vdots \\ D_1 D_n f(x) & \cdots & D_n D_n f(x) \end{bmatrix},$$

### so that

$$D^2 f(x)(u)(v) = u^T H^T v \in \mathbb{R} \quad \text{for all } u, v \in \mathbb{R}^n.$$

### From $D^2 f(x)(u)(v) = u^T H^T v = v^T H u$ (the transpose of a scalar is itself) we see that $D^2 f(x)$ does indeed act multilinearly on a pair of vectors $(u, v) \in \mathbb{R}^n \times \mathbb{R}^n$ to produce a scalar, i.e., $D^2 f(x)(u, v) = u^T H^T v \in \mathbb{R}$.
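### As a numerical illustration (the function $f$, the point, and the pair $(u, v)$ below are hypothetical choices, not from the text), the following sketch approximates the Hessian of a function on $\mathbb{R}^2$ with central finite differences and evaluates the bilinear action $D^2 f(x)(u, v) = u^T H^T v$; the transposed indexing `H[j][i]` mirrors the $u^T H^T v$ convention above.

```python
# Sketch: approximate the Hessian H of f(x, y) = x^2*y + 3*x*y^2 at (1, 2) by
# central differences; the analytic Hessian there is [[4, 14], [14, 6]].

def f(x, y):
    return x ** 2 * y + 3.0 * x * y ** 2

def hessian(f, x, y, h=1e-4):
    """2x2 matrix of second partials of f at (x, y), via a 4-point stencil."""
    steps = [(h, 0.0), (0.0, h)]
    def d2(i, j):
        (a, b), (c, d) = steps[i], steps[j]
        return (f(x + a + c, y + b + d) - f(x + a - c, y + b - d)
                - f(x - a + c, y - b + d) + f(x - a - c, y - b - d)) / (4.0 * h * h)
    return [[d2(i, j) for j in range(2)] for i in range(2)]

def bilinear(H, u, v):
    """u^T H^T v = sum over i, j of u_i * H[j][i] * v_j, the action of D^2 f(x)."""
    return sum(u[i] * H[j][i] * v[j] for i in range(2) for j in range(2))

H = hessian(f, 1.0, 2.0)
u, v = (1.0, -1.0), (2.0, 0.5)
```

### With the analytic Hessian $H = \begin{bmatrix} 4 & 14 \\ 14 & 6 \end{bmatrix}$ one gets $u^T H^T v = -16$, which the finite-difference value reproduces to several digits; note also that $u^T H^T v = v^T H u$, as in the scalar-transpose argument above.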

## Definition 6.6.4. Let $(X_i, \|\cdot\|_{X_i})$, $i = 1, 2, \dots, n$, be a finite collection of Banach spaces.

### Fix an open set $U \subset X_1 \times X_2 \times \cdots \times X_n$, and an ordered list of $k$ integers $i_1, i_2, \dots, i_k$ where $i_j \in \{1, \dots, n\}$ (not necessarily distinct).

### The $k$th-order partial derivative of $f \in C^k(U, Y)$ corresponding to $i_1, \dots, i_k$ is the function $D_{i_1} D_{i_2} \cdots D_{i_k} f \in C(U, B(X_{i_1}, X_{i_2}, \dots, X_{i_k}; Y))$.

### [Recall from Definition 6.3.12 that the $i$th partial derivative of a function $\alpha : X_1 \times \cdots \times X_n \to Y$ is the derivative of the function $\beta : X_i \to Y$ defined by $\beta(z) = \alpha(x_1, \dots, x_{i-1}, z, x_{i+1}, \dots, x_n)$, i.e., the function obtained from $\alpha$ by fixing all of its inputs except the $i$th variable.]

### When $X_i = \mathbb{F}$ for all $i = 1, 2, \dots, n$ and $Y = \mathbb{F}$, we often write $D_{i_1} D_{i_2} \cdots D_{i_k} f$ as the more familiar partial derivative

$$\frac{\partial^k f}{\partial x_{i_1} \partial x_{i_2} \cdots \partial x_{i_k}}.$$

### The Hessian of $f : U \to \mathbb{R}$, $U$ open in $\mathbb{R}^n$, is nothing more than the matrix of second-order partial derivatives:

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}.$$

### You might remember that this square matrix of second-order partial derivatives is usually symmetric. This is true when the second derivative is continuous.

## Proposition 6.6.5. If $f \in C^2(U, Y)$ with $Y$ finite dimensional, then for all $x \in U$ and for all $(u, v) \in X \times X$, there holds

$$D^2 f(x)(u, v) = D^2 f(x)(v, u).$$

### When $U$ is an open subset of $X = X_1 \times X_2 \times \cdots \times X_n$ for Banach spaces $X_i$, and $f \in C^2(U, Y)$ for finite dimensional $Y$, then for all $x \in U$ and for all $i, j \in \{1, 2, \dots, n\}$, there holds

$$D_i D_j f(x) = D_j D_i f(x).$$

### When $U$ is an open subset of $X = \mathbb{F}^n = \mathbb{F} \times \mathbb{F} \times \cdots \times \mathbb{F}$, $Y = \mathbb{F}^m$, and $f = (f_1, f_2, \dots, f_m) \in C^2(U, Y)$, then for all $x \in U$, all $i, j \in \{1, 2, \dots, n\}$, and all $k \in \{1, 2, \dots, m\}$, there holds

$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial^2 f_k}{\partial x_j \partial x_i}.$$

### Proof. The hypothesis of finite dimensionality of $Y$ implies that we can assume WLOG that $Y = \mathbb{F}^m$ and that $f = (f_1, \dots, f_m)$.

### With $\mathbb{C}^m$ and $\mathbb{R}^{2m}$ isomorphic as Banach spaces (with the standard norms), we can assume WLOG that $\mathbb{F} = \mathbb{R}$.

### Then, as $f_k : U \to \mathbb{R}$, it suffices to show the result for $Y = \mathbb{R}$.

### For fixed $x \in U$ and $u, v \in X$, there exist $s, t > 0$ by the openness of $U$ such that $x + \xi u + \eta v \in U$ for all $\xi, \eta \in [0, \max\{s, t\}]$.

### Define $g_\xi(x) = f(x + \xi u) - f(x)$ for $\xi \in [0, t]$, and define $S_{\eta,t}(x)$ for $\eta \in [0, s]$ by

$$S_{\eta,t}(x) = g_t(x + \eta v) - g_t(x) = f(x + tu + \eta v) - f(x + \eta v) - f(x + tu) + f(x).$$

### Recognize that $S_{0,t}(x) = 0$ and that, with $x$ and $t$ fixed, $D S_{\eta,t}(x) = D g_t(x + \eta v) v$.

### The function $\eta \mapsto S_{\eta,t}(x)$ is continuous on $[0, s]$ and differentiable on $(0, s)$, so by the Mean Value Theorem there exists $\sigma_{s,t} \in (0, s)$ such that

$$S_{s,t}(x) = S_{s,t}(x) - S_{0,t}(x) = D g_t(x + \sigma_{s,t} v)(v)(s - 0) = D g_t(x + \sigma_{s,t} v)(s v).$$

### Since $g_t(x + \eta v) = f(x + tu + \eta v) - f(x + \eta v)$, we have

$$D g_t(x + \sigma_{s,t} v)(s v) = Df(x + tu + \sigma_{s,t} v)(s v) - Df(x + \sigma_{s,t} v)(s v).$$

### The function

$$\xi \mapsto Df(x + \xi u + \sigma_{s,t} v)(s v) - Df(x + \sigma_{s,t} v)(s v)$$

### is zero when $\xi = 0$, is continuous on $[0, t]$, and is differentiable on $(0, t)$, so by the Mean Value Theorem there exists $\tau_{s,t} \in (0, t)$ such that

$$Df(x + tu + \sigma_{s,t} v)(s v) - Df(x + \sigma_{s,t} v)(s v) = D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(s v)(t u).$$

### Thus

$$S_{s,t}(x) = D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(s v, t u).$$

### Switching the roles of $tu$ and $sv$ in the above argument gives the existence of $\tau'_{s,t}$ and $\sigma'_{s,t}$ such that

$$S_{s,t}(x) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(t u, s v).$$

### [We needed that $x + \xi u + \eta v \in U$ for all $\xi, \eta \in [0, \max\{s, t\}]$ here.] Equating the two expressions for the same quantity $S_{s,t}(x)$ gives

$$D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(s v, t u) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(t u, s v).$$

### Since $D^2 f(x)$ is multilinear, we can pull the scalars $s$ and $t$ out of the inputs $s v$ and $t u$ on both sides; they cancel, giving

$$D^2 f(x + \tau_{s,t} u + \sigma_{s,t} v)(v, u) = D^2 f(x + \sigma'_{s,t} v + \tau'_{s,t} u)(u, v).$$

### As the scalars $s, t \to 0$, the quantities $\sigma_{s,t}, \sigma'_{s,t}, \tau_{s,t}, \tau'_{s,t}$ all go to zero.

### The assumed continuity of $D^2 f$ on $U$ then implies, as $s, t \to 0$, that $D^2 f(x)(v, u) = D^2 f(x)(u, v)$.

### In the case that $X = X_1 \times \cdots \times X_n$, we take $u_i \in X_i$ and $v_j \in X_j$ and form the vectors

$$u = (0, \dots, u_i, \dots, 0), \quad v = (0, \dots, v_j, \dots, 0),$$

### where $u_i$ is in the $i$th slot and $v_j$ is in the $j$th slot, to get

$$D^2 f(x)(u, v) = D^2 f(x)(v, u),$$

### which implies that $D_i D_j f(x) = D_j D_i f(x)$.

### Finally, $D_i D_j f(x) = D_j D_i f(x)$ implies the equality

$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial^2 f_k}{\partial x_j \partial x_i}$$

### for all $k = 1, \dots, m$.

## Remark 6.6.6. For $U \subset \mathbb{R}^n$ and $f \in C^2(U, \mathbb{R})$, Proposition 6.6.5 guarantees that the Hessian of $f$ is a symmetric matrix.
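### The equality of mixed partials is easy to probe numerically. In the sketch below (the non-polynomial function and the evaluation point are hypothetical choices, not from the text), the two analytic first partials of $f(x, y) = \sin(xy) + x e^y$ are entered by hand, each is then differentiated in the other variable by a central difference, and the two orders of differentiation agree to within discretization error.

```python
import math

# Sketch: compare D_2 D_1 f with D_1 D_2 f for f(x, y) = sin(x*y) + x*exp(y),
# starting from the hand-computed first partials so the two mixed partials are
# genuinely obtained by differentiating in different orders.

def fx(x, y):
    """D_1 f, computed by hand: y*cos(x*y) + e^y."""
    return y * math.cos(x * y) + math.exp(y)

def fy(x, y):
    """D_2 f, computed by hand: x*cos(x*y) + x*e^y."""
    return x * math.cos(x * y) + x * math.exp(y)

def mixed_xy(x, y, h=1e-5):
    """D_2 D_1 f: central difference of fx in the y-variable."""
    return (fx(x, y + h) - fx(x, y - h)) / (2.0 * h)

def mixed_yx(x, y, h=1e-5):
    """D_1 D_2 f: central difference of fy in the x-variable."""
    return (fy(x + h, y) - fy(x - h, y)) / (2.0 * h)
```

### At $(x, y) = (0.5, 1.2)$ both orders reproduce the analytic value $\cos(0.6) - 0.6\sin(0.6) + e^{1.2} \approx 3.80667$.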

## 6.6.2 Higher-Order Directional Derivatives

### For $U$ open in $\mathbb{F}^n$ and $f \in C^k(U, \mathbb{F}^m)$, the derivative $D^k f(x)$ is an element of $B^k(\mathbb{F}^n, \mathbb{F}^m)$ for each $x \in U$.

### This means that $D^k f(x)$ acts on $k$ vectors from $\mathbb{F}^n$, giving a vector in $\mathbb{F}^m$. When all of the inputs of $D^k f(x)$ are the same vector $v$, we call the output

$$D_v^k f(x) = D^k f(x)(v, \dots, v)$$

### the $k$th directional derivative of $f$ at $x$ in the direction $v$.

### We express the directional derivatives of $f$ in terms of standard coordinates for $\mathbb{F}^n$ and $\mathbb{F}^m$.

### For $k = 1$ and $v = \sum_{i=1}^n v_i e_i \in \mathbb{F}^n$, the vector

$$Df(x)v = \begin{bmatrix} D_1 f(x) & \cdots & D_n f(x) \end{bmatrix} v = \sum_{j=1}^n D_j f(x) v_j \in \mathbb{F}^m$$

### is the first-order directional derivative $D_v f(x)$ of $f$ at $x$ in the direction $v$.

### The second-order directional derivative of $f$ at $x$ in the direction $v$ is

$$D_v^2 f(x) = D_v\left( \sum_{j=1}^n D_j f(x) v_j \right) = \sum_{i=1}^n \sum_{j=1}^n D_i D_j f(x) v_i v_j = v^T H(x) v,$$

### where $H(x)$ is the Hessian of $f$ at $x$.
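### The identity $D_v^2 f(x) = v^T H(x) v$ can be checked against a one-dimensional second difference of $g(t) = f(x + t v)$, since $D_v^2 f(x) = g''(0)$. A minimal sketch (the function, point, and direction are hypothetical choices, not from the text):

```python
# Sketch: for f(x, y) = x^2*y + y^3 the Hessian at (1, 2) is [[4, 2], [2, 12]],
# so with v = (1, 1) we expect D_v^2 f = v^T H v = 4 + 2 + 2 + 12 = 20.

def f(x, y):
    return x ** 2 * y + y ** 3

def second_directional(f, x, y, v, h=1e-4):
    """g''(0) for g(t) = f((x, y) + t*v), by a central second difference."""
    fp = f(x + h * v[0], y + h * v[1])
    fm = f(x - h * v[0], y - h * v[1])
    return (fp - 2.0 * f(x, y) + fm) / (h * h)

H = [[4.0, 2.0], [2.0, 12.0]]   # analytic Hessian of f at (1, 2), by hand
v = (1.0, 1.0)
quad = sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))
```

### Here the quadratic form `quad` and the second difference both come out to $20$, matching the double-sum formula above.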

### Iterating $k$ times gives the $k$th-order directional derivative of $f$ at $x$ in the direction $v$:

$$D_v^k f(x) = \sum_{i_1, \dots, i_k = 1}^{n} D_{i_1} \cdots D_{i_k} f(x)\, v_{i_1} \cdots v_{i_k}.$$

### Often we write $D^k f(x) v^{(k)}$ for $D_v^k f(x)$, where by $v^{(k)}$ we mean the $k$-component Cartesian product $(v, \dots, v)$.

### Proposition 6.6.5 and its application to higher-order derivatives shows that many of the terms in the $k$th-order directional derivative are repeated. Combining these repeated terms gives

$$D_v^k f(x) = \sum_{j_1 + \cdots + j_n = k} \frac{k!}{j_1! \cdots j_n!}\, D_1^{j_1} \cdots D_n^{j_n} f(x)\, v_1^{j_1} \cdots v_n^{j_n}.$$
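### The collapse of repeated terms can be verified directly in a small case. In the sketch below (the cubic $f(x, y) = x^3 y^2$, the point $(1, 2)$, and the direction $v$ are hypothetical choices, not from the text), the third-order partials are entered by hand, and the full sum over all $2^3$ index tuples is compared with the multinomial-weighted sum over exponent pairs $(j_1, j_2)$ with $j_1 + j_2 = 3$.

```python
import itertools
import math

# Third-order partials of f(x, y) = x^3 * y^2 at (x, y) = (1, 2), computed by
# hand and keyed by (j1, j2) = (# of x-derivatives, # of y-derivatives).
# Since mixed partials commute (Proposition 6.6.5), only these counts matter.
P3 = {(3, 0): 6.0 * 2 ** 2,    # D1^3 f     = 6 y^2   -> 24
      (2, 1): 12.0 * 1 * 2,    # D1^2 D2 f  = 12 x y  -> 24
      (1, 2): 6.0 * 1 ** 2,    # D1 D2^2 f  = 6 x^2   -> 6
      (0, 3): 0.0}             # D2^3 f     = 0

v = (1.0, 3.0)

# Full sum over all index tuples (i1, i2, i3) in {1, 2}^3.
full = 0.0
for tup in itertools.product((1, 2), repeat=3):
    j1 = tup.count(1)                          # how many derivatives in x
    full += P3[(j1, 3 - j1)] * v[0] ** j1 * v[1] ** (3 - j1)

# Collapsed sum with multinomial coefficients k!/(j1! j2!), here k = 3.
collapsed = sum(math.factorial(3) // (math.factorial(j1) * math.factorial(3 - j1))
                * P3[(j1, 3 - j1)] * v[0] ** j1 * v[1] ** (3 - j1)
                for j1 in range(4))
```

### Both sums come out to $402$, which also matches $g'''(0)$ for $g(t) = (1 + t)^3 (2 + 3t)^2$, whose $t^3$-coefficient is $67$ and $6 \cdot 67 = 402$.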