The Derivative
4.3 One-Variable Revisionism; the Derivative Redefined
for any d > 0 there exists ε_d > 0 such that |ψ(k)| ≤ d|k| if |k| ≤ ε_d. Now let e > 0 be given. Define d = e/c and ρ_e = min{δ, ε_d/c}. Suppose that |h| ≤ ρ_e. Then
|ϕ(h)| ≤ c|h| ≤ ε_d since |h| ≤ δ and |h| ≤ ε_d/c,
and so
|ψ(ϕ(h))| ≤ d|ϕ(h)| ≤ cd|h| since |ϕ(h)| ≤ ε_d and |ϕ(h)| ≤ c|h|.
That is,
|ψ(ϕ(h))| ≤ e|h| since cd = e.
This shows that ψ ◦ ϕ is o(h), since for any e > 0 there exists ρ_e > 0 such that
|(ψ ◦ ϕ)(h)| ≤ e|h| if |h| ≤ ρ_e.)
The other rules are proved similarly (exercise 4.2.5). □
Exercises
4.2.1. By analogy to Definition 4.2.1, give the appropriate definition of an O(1)-mapping. What is the geometric interpretation of the definition? Need an O(1)-mapping take 0 to 0?
4.2.2. Let e be a nonnegative real number. Consider the function ϕ_e : R^n → R, ϕ_e(x) = |x|^e.
(a) Suppose that e > 0. Let c > 0 be given. If |h| ≤ c^(1/e) then what do we know about |ϕ_e(h)| in comparison to c? What does this tell us about ϕ_e?
(b) Prove that ϕ_1 is O(h).
(c) Suppose that e > 1. Combine parts (a) and (b) with the product property for Landau functions (Proposition 4.2.6) to show that ϕ_e is o(h).
(d) Explain how parts (a), (b), and (c) have proved Proposition 4.2.2.
4.2.3. Complete the proof of Proposition 4.2.4.
4.2.4. Establish the componentwise nature of the O(h) condition, and establish the componentwise nature of the o(h) condition.
4.2.5. Complete the proof of Proposition 4.2.7.
4.3 One-Variable Revisionism; the Derivative Redefined
The one-variable derivative as recalled at the beginning of the chapter,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h},$$
is a construction. To rethink the derivative, we should characterize it instead.
To think clearly about what it means for the graph of a function to have tangent slope t at a point (a, f (a)), we should work in local coordinates and normalize to the case of a horizontal tangent. That is, given a function f of x-values near some point a, and given a candidate tangent-slope t at (a, f (a)), define a related function g of h-values near 0,
g(h) = f(a + h) − f(a) − th.
Thus g takes 0 to 0, and the graph of g near the origin is like the graph of f near (a, f (a)) but with the line of slope t subtracted. To reiterate, the idea that f has tangent slope t at (a, f (a)) has been normalized to the tidier idea that g has slope 0 at the origin. Here the idea is:
To say that the graph of g is horizontal at the origin is to say that for any positive real number c, however small, the region between the lines of slope ±c contains the graph of g close enough to the origin.
That is:
The intuitive condition for the graph of g to be horizontal at the origin is precisely that g is o(h). The horizontal nature of the graph of g at the origin connotes that the graph of f has tangent slope t at (a, f (a)).
The symbolic connection between this characterization of the derivative and the constructive definition is immediate. As always, the definition of f having derivative f′(a) at a is
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = f'(a),$$
which is to say,
$$\lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a)h}{h} = 0,$$
and indeed this is precisely the o(h) condition on g. Figure 4.2 illustrates the idea that when h is small, not only is the vertical distance f(a + h) − f(a) − f′(a)h from the tangent line to the curve small as well, but it is small even relative to the horizontal distance h.
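The o(h) condition on g can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the sample function f(x) = x^3 at a = 2 (so that f′(a) = 12), both chosen here only for illustration, and the helper name `remainder_ratio` is hypothetical. It computes |g(h)|/|h| for shrinking h. If g is o(h), these ratios should tend to 0, whereas a wrong candidate slope should leave a ratio that does not shrink.

```python
def f(x):
    # Sample function, chosen only for illustration (an assumption).
    return x ** 3


def remainder_ratio(f, a, slope, h):
    """Return |g(h)| / |h| where g(h) = f(a + h) - f(a) - slope * h."""
    g = f(a + h) - f(a) - slope * h
    return abs(g) / abs(h)


a = 2.0
true_slope = 12.0   # f'(2) = 3 * 2**2
wrong_slope = 11.0  # a deliberately incorrect candidate tangent slope

steps = [10.0 ** (-k) for k in range(1, 6)]
good_ratios = [remainder_ratio(f, a, true_slope, h) for h in steps]
bad_ratios = [remainder_ratio(f, a, wrong_slope, h) for h in steps]

for h, good, bad in zip(steps, good_ratios, bad_ratios):
    print(f"h = {h:.0e}   slope 12: {good:.3e}   slope 11: {bad:.3e}")
```

With the correct slope the ratio equals 6h + h^2 up to rounding, so it shrinks with h. With the incorrect slope the ratio stays near 1, so that g is O(h) but not o(h).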
We need to scale these ideas up to many dimensions. Instead of viewing the one-variable derivative as the scalar f′(a), think of it as the corresponding linear mapping T_a : R → R, multiplication by f′(a). That is, think of it as the mapping
T_a(h) = f′(a)h for all h ∈ R.
Figure 4.3 incorporates this idea. The figure is similar to figure 4.2, but it shows the close approximation in the local coordinate system centered at the point of tangency, and in the local coordinate system the tangent line is indeed the graph of the linear mapping T_a. The shaded axis-portions in the figure are h horizontally and g(h) = f(a + h) − f(a) − f′(a)h vertically, and the fact that the vertical portion is so much smaller illustrates that g(h) is o(h).

[Figure 4.2. Vertical distance from tangent line to curve. Axes: x and f(x); marked inputs a and a + h; marked heights f(a), f(a) + f′(a)h, and f(a + h).]

[Figure 4.3. Vertical distance in local coordinates. Marked quantities: h, T_a(h), and f(a + h) − f(a).]
We are nearly ready to rewrite the derivative definition pan-dimensionally.
The small remaining preliminary matter is to take into account the local nature of the characterizing condition: it depends on the behavior of f only on an ε-ball about a, but on the other hand, it does require an entire ε-ball.
Thus the following definition is appropriate for our purposes.
Definition 4.3.1 (Interior Point). Let A be a subset of R^n, and let a be a point of A. Then a is an interior point of A if some ε-ball about a is a subset of A. That is, a is an interior point of A if B(a, ε) ⊂ A for some ε > 0.
Now we can define the derivative in a way that encompasses many variables and is suitably local.
Definition 4.3.2 (Derivative). Let A be a subset of R^n, let f : A → R^m be a mapping, and let a be an interior point of A. Then f is differentiable at a if there exists a linear mapping T_a : R^n → R^m satisfying the condition
f(a + h) − f(a) − T_a(h) is o(h). (4.1)
This T_a is called the derivative of f at a, written Df_a or (Df)_a. When f is differentiable at a, the matrix of the linear mapping Df_a is written f′(a) and is called the Jacobian matrix of f at a.
Here are two points to note about Definition 4.3.2:
• Again, any assertion that a mapping is differentiable at a point has the connotation that the point is an interior point of the mapping’s domain. That is, if f is differentiable at a then B(a, ε) ⊂ A for some ε > 0. In the special case n = 1 we are disallowing the derivative at an endpoint of the domain.
• The domain of the linear mapping T_a is unrestricted even if f itself is defined only locally about a. Indeed, the definition of linearity requires that the linear mapping have all of R^n as its domain. Any linear mapping is so uniform that in any case its behavior on all of R^n is determined by its behavior on any ε-ball about 0_n (exercise 4.3.1). In geometric terms, the graph of T_a, the tangent object approximating the graph of f at (a, f(a)), extends without bound, even if the graph of f itself is restricted to points near (a, f(a)). But the approximation of the graph by the tangent object needs to be close only near the point of tangency.
Returning to the idea of the derivative as a linear mapping, when n = 2 and m = 1 a function f : A → R is differentiable at an interior point (a, b) of A if for small scalar values h and k, f(a + h, b + k) − f(a, b) is well approximated by a linear function
T(h, k) = αh + βk
where α and β are scalars. Since the equation z = f(a, b) + αh + βk describes a plane in (x, y, z)-space (where h = x − a and k = y − b), f is differentiable at (a, b) if its graph has a well-fitting tangent plane through (a, b, f(a, b)). (See figure 4.4.) Here the derivative of f at (a, b) is the linear mapping taking (h, k) to αh + βk, and the Jacobian matrix of f at (a, b) is therefore [α, β].
The tangent plane in the figure is not the graph of the derivative Df_(a,b), but rather a translation of the graph. Another way to say this is that the (h, k, Df_(a,b)(h, k))-coordinate system has its origin at the point (a, b, f(a, b)) in the figure.
[Figure 4.4. Graph and tangent plane. Labels: f(x, y), T(h, k), the point (a, b), the local coordinates h and k, and the axes x and y.]

When n = 1 and m = 3, a mapping f : A → R^3 is differentiable at an interior point a of A if f(a + h) − f(a) is closely approximated for small real h by a linear mapping
$$T(h) = \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} h$$
for some scalars α, β, and γ. As h varies through R, f(a) + T(h) traverses the line ℓ = ℓ(f(a), (α, β, γ)) in R^3 that is tangent at f(a) to the output curve of f. (See figure 4.5.) Here
$$Df_a(h) = \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} h$$
and the corresponding Jacobian matrix is
$$\begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix}.$$
Note that the figure does not show the domain of f, so it may help to think of f as a time-dependent traversal of the curve rather than as the curve itself. The figure does not have room for the (h, Df_a(h))-coordinate system (which is 4-dimensional), but the Df_a(h)-coordinate system has its origin at the point f(a).
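To make the case n = 1, m = 3 concrete, here is a minimal numerical sketch that is not part of the original text. It assumes a sample curve, the helix f(t) = (cos t, sin t, t), chosen only for illustration, and it assumes that the column (α, β, γ) may be found componentwise from one-variable derivatives. The helper names `jacobian_column` and `curve_remainder_ratio` are hypothetical. The code measures |f(a + h) − f(a) − T(h)| / |h|, which should tend to 0 if T is the derivative.

```python
import math


def f(t):
    # Hypothetical sample curve (a helix), chosen only for illustration.
    return (math.cos(t), math.sin(t), t)


def jacobian_column(a):
    # Componentwise one-variable derivatives at a: (alpha, beta, gamma).
    return (-math.sin(a), math.cos(a), 1.0)


def curve_remainder_ratio(a, h):
    """Return |f(a + h) - f(a) - T(h)| / |h| with T(h) = h * (alpha, beta, gamma)."""
    column = jacobian_column(a)
    fa, fah = f(a), f(a + h)
    remainder = [fah[i] - fa[i] - column[i] * h for i in range(3)]
    return math.sqrt(sum(r * r for r in remainder)) / abs(h)


a = 1.0
curve_ratios = [curve_remainder_ratio(a, 10.0 ** (-k)) for k in range(1, 6)]
for k, ratio in enumerate(curve_ratios, start=1):
    print(f"h = 1e-{k}   ratio = {ratio:.3e}")
```

The printed ratios shrink roughly in proportion to h, which is the behavior the o(h) condition describes for this sample curve.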
[Figure 4.5. Tangent to a parametrized curve. Labels: the point f(a) and the tangent line ℓ.]

For an example, let A = B((0, 0), 1) be the unit disk in R^2, and consider the function
f : A → R, f(x, y) = x^2 − y^2.
We show that for any point (a, b) ∈ A, f is differentiable at (a, b) and its derivative is the linear mapping
T_(a,b) : R^2 → R, T_(a,b)(h, k) = 2ah − 2bk.
To verify this, we need to check Definition 4.3.2. The point that is written in the definition intrinsically as a (where a is a vector) is written here in coordinates as (a, b) (where a and b are scalars), and similarly the vector h in the definition is written (h, k) here, because the definition is intrinsic whereas here we are going to compute. To check the definition, first note that every point (a, b) of A is an interior point; the fact that every point of A is interior doesn’t deserve a detailed proof right now, only a quick comment. Second, confirm the derivative’s characterizing property (4.1) by calculating that
f(a + h, b + k) − f(a, b) − T_(a,b)(h, k)
= (a + h)^2 − (b + k)^2 − a^2 + b^2 − 2ah + 2bk
= h^2 − k^2.
We saw immediately after the product property for Landau functions (Proposition 4.2.6) that h^2 − k^2 is o(h, k). This is the desired result. Also, the calculation tacitly shows how the derivative was found for us to verify: the difference f(a + h, b + k) − f(a, b) is 2ah − 2bk + h^2 − k^2, which as a function of h and k has a linear part 2ah − 2bk and a quadratic part h^2 − k^2 that is much smaller when h and k are small. The linear approximation of the difference is the derivative.
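The claim that the remainder is o(h, k) can also be observed numerically. The following is a minimal sketch under stated assumptions, not part of the original text: the sample point (a, b) = (0.3, −0.4) and the direction (h, k) = (s, 2s) are arbitrary choices, and the helper name `plane_remainder_ratio` is hypothetical. The code computes |f(a + h, b + k) − f(a, b) − T_(a,b)(h, k)| / |(h, k)| for shrinking s.

```python
import math


def f(x, y):
    return x ** 2 - y ** 2


def T(a, b, h, k):
    # The candidate derivative from the text: T_(a,b)(h, k) = 2ah - 2bk.
    return 2 * a * h - 2 * b * k


def plane_remainder_ratio(a, b, h, k):
    """Return |f(a + h, b + k) - f(a, b) - T_(a,b)(h, k)| / |(h, k)|."""
    remainder = f(a + h, b + k) - f(a, b) - T(a, b, h, k)
    return abs(remainder) / math.hypot(h, k)


a, b = 0.3, -0.4  # an arbitrary sample point of the unit disk (an assumption)
scales = [10.0 ** (-j) for j in range(1, 6)]
plane_ratios = [plane_remainder_ratio(a, b, s, 2 * s) for s in scales]
for s, ratio in zip(scales, plane_ratios):
    print(f"(h, k) = ({s:.0e}, {2 * s:.0e})   ratio = {ratio:.3e}")
```

Along this direction the remainder is h^2 − k^2 = −3s^2 and |(h, k)| = s√5, so the ratio is 3s/√5 up to rounding and shrinks linearly with s. A check along one direction only illustrates the o(h, k) condition; it does not prove it.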
Before continuing, we need to settle a grammatical issue. Definition 4.3.2 refers to any linear mapping that satisfies condition (4.1) as the derivative of f at a. Fortunately, the derivative, if it exists, is unique, justifying the definite article. The uniqueness is geometrically plausible: if two straight objects (e.g., lines or planes) approximate the graph of f well near (a, f (a)), then they should also approximate each other well enough that straightness forces them to coincide. The quantitative argument amounts to recalling that the only linear o(h)-mapping is zero.
Proposition 4.3.3 (Uniqueness of the Derivative). Let f : A → R^m (where A ⊂ R^n) be differentiable at a. Then there is only one linear mapping satisfying the definition of Df_a.
Proof. Suppose that the linear mappings T_a, T̃_a : R^n → R^m are both derivatives of f at a. Then the two mappings
f(a + h) − f(a) − T_a(h) and f(a + h) − f(a) − T̃_a(h)
are both o(h). By the vector space properties of o(h), so is their difference (T̃_a − T_a)(h). Since the linear mappings from R^n to R^m form a vector space as well, the difference T̃_a − T_a is linear. But the only o(h) linear mapping is the zero mapping, so T̃_a = T_a as desired. □
Finally, another result is immediate in our setup.
Proposition 4.3.4. If f is differentiable at a then f is continuous at a.
Proof. Compute, using the differentiability of f at a and the fact that linear mappings are O(h), then the containment o(h) ⊂ O(h) and the closure of O(h) under addition, and finally the containment O(h) ⊂ o(1), that
f(a + h) − f(a) = f(a + h) − f(a) − T_a(h) + T_a(h) = o(h) + O(h) = O(h) = o(1).
Since the o(1) condition describes continuity, the argument is complete. □
We will study the derivative via two routes. On the one hand, the linear mapping Df_a : R^n → R^m is specified by the mn scalar entries of its matrix f′(a), and so calculating the derivative is tantamount to determining these scalars by using coordinates. On the other hand, developing conceptual theorems without getting lost in coefficients and indices requires the intrinsic idea of the derivative as a well-approximating linear mapping.
Exercises
4.3.1. Let T : R^n → R^m be a linear mapping. Show that for any ε > 0, the behavior of T on B(0_n, ε) determines the behavior of T everywhere.
4.3.2. Give a geometric interpretation of the derivative when n = m = 2.
Give a geometric interpretation of the derivative when n = 1 and m = 2.
4.3.3. Let f : A → R^m (where A ⊂ R^n) have component functions f_1, …, f_m, and let a be an interior point of A. Let T : R^n → R^m be a linear mapping with component functions T_1, …, T_m. Using the componentwise nature of the o(h) condition, established in section 4.2, show the componentwise nature of differentiability: f is differentiable at a with derivative T if and only if each component f_i is differentiable at a with derivative T_i.
4.3.4. Let f(x, y) = (x^2 − y^2, 2xy). Show that Df_(a,b)(h, k) = (2ah − 2bk, 2bh + 2ak) for all (a, b) ∈ R^2. (By the previous problem, you may work componentwise.)
4.3.5. Let g(x, y) = xe^y. Show that Dg_(a,b)(h, k) = he^b + kae^b for all (a, b) ∈ R^2. (Note that because e^0 = 1 and because the derivative of the exponential function at 0 is 1, the one-variable characterizing property says that e^k − 1 = k + o(k).)
4.3.6. Show that if f : R^n → R^m satisfies |f(x)| ≤ |x|^2 for all x ∈ R^n then f is differentiable at 0_n.
4.3.7. Show that the function f(x, y) = √|xy| for all (x, y) ∈ R^2 is not differentiable at (0, 0). (First see what Df_(0,0)(h, 0) and Df_(0,0)(0, k) need to be.)