The Derivative
4.7 Extreme Values
Establish the same relations as a moment ago,
XY − Y X = H, HX− XH = 2X, HY − Y H = −2Y.
The three matrices generate a small instance of a Lie algebra, and this exercise shows that the space of smooth functions on Rncan be made a representation of the Lie algebra. Further show, partly by citing the work at the end of the section, that the action of any orthogonal matrix A on smooth functions commutes with the representation,
X(f◦ TA) = (Xf )◦ TA, Y (f◦ TA) = (Y f )◦ TA, H(f◦ TA) = (Hf )◦ TA.
4.7 Extreme Values
In one variable calculus the derivative is used to find maximum and minimum values (extrema) of differentiable functions. Recall the following useful facts.
• (Extreme Value Theorem.) If f : [α, β] −→ R is continuous then it assumes a maximum and a minimum on the interval [α, β].
• (Critical Point Theorem.) Suppose that f : [α, β] −→ R is differentiable on (α, β) and that f assumes a maximum or minimum at an interior point a of [α, β]. Then f′(a) = 0.
• (Second Derivative Test.) Suppose that f : [α, β] −→ R is C2on (α, β) and that f′(a) = 0 at an interior point a of [α, β]. If f′′(a) > 0 then f (a) is a local minimum of f , and if f′′(a) < 0 then f (a) is a local maximum.
Geometrically the idea is that just as the affine function A(a + h) = f (a) + f′(a)h
specifies the tangent line to the graph of f at (a, f (a)), the quadratic function P (a + h) = f (a) + f′(a)h +1
2f′′(a)h2
determines the best fitting parabola. When f′(a) = 0 the tangent line is horizontal and the sign of f′′(a) specifies whether the parabola opens upward or downward. When f′(a) = 0 and f′′(a) = 0, the parabola degenerates to the horizontal tangent line, and the second derivative provides no information.
(See figure 4.8.)
This section generalizes these facts to functions f of n variables. The Ex-treme Value Theorem has already generalized as Theorem 2.4.15: a continuous function f on a compact subset of Rn takes maximum and minimum values.
The Critical Point Theorem also generalizes easily to say that each extreme value of the function f : A−→ R that occurs at a point where f is differen-tiable occurs at critical point of f , meaning a point a where Dfa is the zero function.
Figure 4.8. Approximating parabolas
Theorem 4.7.1 (Multivariable Critical Point Theorem). Suppose that the function f : A−→ R (where A ⊂ Rn) takes an extreme value at the point a of A, and suppose that f is differentiable at a. Then all partial derivatives of f at a are zero.
Proof. For each j ∈ {1, · · · , n}, the value f(a) is an extreme value for the one-variable function ϕ from definition 4.5.1 of the partial derivative Djf (a).
By the one-variable Critical Point Theorem, ϕ′(aj) = 0. That is, Djf (a) = 0.
⊓
⊔ The generalization of the second derivative test is more elaborate. From now on, all functions are assumed to be of type C2 on the interiors of their domains, meaning that all their second order partial derivatives exist and are continuous.
Definition 4.7.2 (Second Derivative Matrix). Let f : A −→ R (where A ⊂ Rn) be a function and let a be an interior point of A. The second derivative matrix of f at a is the n-by-n matrix whose (i, j)th entry is the second order partial derivative Dijf (a). Thus
f′′(a) =
D11f (a)· · · D1nf (a) ... . .. ... Dn1f (a)· · · Dnnf (a)
.
By the equality of mixed partial derivatives, the second derivative matrix is symmetric, i.e., f′′(a)T= f′′(a). Beware of confusing the second derivative matrix and the Jacobian matrix: the second derivative matrix is a square matrix defined only for scalar valued functions and its entries are second order partial derivatives, while for scalar valued functions the Jacobian matrix is the row vector of first partial derivatives.
Irksomely, f′′ is not (f′)′. The problem is that the seemingly-reasonable map f′ : A −→ Rn taking each point a ∈ A to f′(a) does not fit correctly into our conventions: each f′(a) is specifically a row vector, but we view the elements of Rneither as ordered lists with no shape at all or as column vectors.
Thus (f′)′ does not even exist. The correction is that the map f′T: A−→ Rn taking each point a∈ A to f′(a)Tdoes fit our scheme, and indeed
4.7 Extreme Values 179
(f′T)′(a) = f′′(a) for interior points a of A.
The pettiness of this issue makes clear that eventually we should loosen our conventions and freely transpose vectors and matrices as called for by context, as indeed many software packages do. But since the conventions can be helpful for a student who is seeing the material for the first time, we retain them for now and grudgingly accept the transpose here.
As an example, if Any n-by-n matrix M determines a quadratic function
QM : Rn −→ R, QM(h) = hTM h.
Here h is viewed as a column vector. If M has entries mijand h = (h1,· · · , hn) then the rules of matrix multiplication show that
QM(h) =
The function QM is homogeneous of degree 2, meaning that each of its terms has degree 2 in the entries of h and therefore QM(th) = t2QM(h) for all t∈ R and h∈ Rn.
When M is the second derivative matrix of a function f at a point a, the corresponding quadratic function is denoted Qfa rather than Qf′′(a). Just as f (a) + Dfa(h) gives the best affine approximation of f (a + h) for small h, f (a) + Dfa(h) +12Qfa(h) gives the best quadratic approximation.
In the example f (x, y) = sin2x + x2y + y2, the second derivative matrix at a point (a, b) defines the quadratic function
Qf(a,b)(h, k) = and so the best quadratic approximation of f near, for instance, the point (π/2, 1) is
f (π/2 + h, 1 + k)≈ f(π/2, 1) + Df(π/2,1)(h, k) + 1
2Qf(π/2,1)(h, k)
= π2/4 + 2 + πh + (π2/4 + 2)k + πhk + k2.
Suppose that f : A−→ R (where A ⊂ R2) has a critical point at (a, b), i.e., f′(a, b) = [0 0]. Working in local coordinates, we will approximate f by a quadratic function on R2 having a critical point at (0, 0). The graphs of nine such quadratic functions are shown in figure 4.9. If the best quadratic approximation of f at (a, b) is a bowl then f should have a minimum at (a, b).
Similarly for an inverted bowl and a maximum. If the best quadratic ap-proximation is a saddle then there should be points (x, y) near (a, b) where f (x, y) > f (a, b) and points (x′, y′) near (a, b) where f (x′, y′) < f (a, b). In this case (a, b) is called for obvious reasons a saddle point of f .
Figure 4.9. Two bowls, two saddles, four halfpipes, and a plane
Returning to the example f (x, y) = sin2x + x2y + y2, note that (0, 0) is a critical point of f since f′(0, 0) = [0 0]. The second derivative matrix f′′(0, 0) is [2 00 2], and so the quadratic function 12Qf(0,0)is given by
1
2Qf(0,0)(h, k) = 1 2
h k 2 0 0 2
h k
= h2+ k2.
Thus the graph of f looks like a bowl near (0, 0) and f (0, 0) should be a local minimum.
4.7 Extreme Values 181 This discussion is not yet rigorous. Justifying the ideas and proving the appropriate theorems will occupy the rest of this section. The first task is to study quadratic approximation ofC2-functions.
Proposition 4.7.3 (Special Case of Taylor’s Theorem). Let I be an open interval in R containing [0, 1]. Let ϕ : I−→ R be a C2-function. Then
ϕ(1) = ϕ(0) + ϕ′(0) +1
2ϕ′′(c) for some c∈ [0, 1].
The proposition follows from the general Taylor’s Theorem in section 1.3 since the first degree Taylor polynomial of ϕ at 0 is T1(t) = ϕ(0) + ϕ′(0)t, so that in particular, T1(1) = ϕ(0) + ϕ′(0).
Theorem 4.7.4 (Quadratic Taylor Approximation). Let f : A −→ R (where A ⊂ Rn) be a C2-function on the interior points of A. Let a be an interior point of A. Then for all small enough h∈ Rn,
f (a + h) = f (a) + Dfa(h) +1
2Qfa+ch(h) for some c∈ [0, 1], or, in matrices, viewing h as a column vector,
f (a + h) = f (a) + f′(a)h + 1
2hTf′′(a + ch)h for some c∈ [0, 1].
Proof. Let I = (−ε, 1 + ε) be a small superinterval of [0, 1] in R. Define γ : I−→ A, γ(t) = a + th.
Thus γ(0) = a, γ(1) = a + h, and γ′(t) = h for all t∈ I. Further define ϕ = f◦ γ : I −→ R.
That is, ϕ(t) = f (a + th) is the restriction of f to the line segment from a to a + h. By the Chain Rule and the fact that γ′= h,
ϕ′(t) = (f◦ γ)′(t) = f′(γ(t))h = Dfa+th(h).
The previous display rephrases as ϕ′(t) =hf′T(γ(t)), hi, so the Chain Rule, the irksome formula (f′T)′= f′′, and the symmetry of f′′give
ϕ′′(t) =hf′′(γ(t))h, hi = hTf′′(a + th)h = Qfa+th(h).
Since f (a + h) = ϕ(1), the special case of Taylor’s Theorem says that for some c∈ [0, 1],
f (a + h) = ϕ(0) + ϕ′(0) +1
2ϕ′′(c) = f (a) + Dfa(h) + 1
2Qfa+ch(h),
giving the result. ⊓⊔
Thus, to study f near a critical point a∈ Rnwhere Dfais zero, we need to look at the sign of Qfa+ch(h) for small vectors h. The next order of business is therefore to discuss the values taken by a homogeneous quadratic function.
Definition 4.7.5 (Positive Definite, Negative Definite, Indefinite Ma-trix). The symmetric square n-by-n matrix M is called
• positive definite if QM(h) > 0 for every nonzero h∈ Rn,
• negative definite if QM(h) < 0 for every nonzero h∈ Rn,
• indefinite if QM(h) is positive for some h and negative for others.
The identity matrix I is positive definite since hTIh = |h|2 for all h.
The matrix1 0
0−1
is indefinite. The general question of whether a symmetric n-by-n matrix is positive definite leads to an excursion into linear algebra too lengthy for this course. (See exercise 4.7.10 for the result without proof.) However, in the special case of n = 2, basic methods give the answer. Recall that the quadratic polynomial αh2+2βh+δ takes positive and negative values if and only if it has distinct real roots, i.e., αδ− β2< 0.
Proposition 4.7.6 (Two-by-two Definiteness Test). Consider a matrix M =
α β β δ
∈ M2(R). Then
(1) M is positive definite if and only if α > 0 and αδ− β2> 0.
(2) M is negative definite if and only if α < 0 and αδ− β2> 0.
(3) M is indefinite if and only if αδ− β2< 0.
Proof. Since QM(t(h, k)) = t2QM(h, k) for all real t, scaling the input vector (h, k) by nonzero real numbers doesn’t affect the sign of the output. The second entry k can therefore be scaled to 0 or 1, and if k = 0 then the first entry h can be scaled to 1. Therefore, to show (1), reason that
M is positive definite ⇐⇒ QM(1, 0) > 0 and QM(h, 1) > 0 for all h∈ R
⇐⇒ α > 0 and αh2+ 2βh + δ > 0 for all h∈ R
⇐⇒ α > 0 and αδ − β2> 0.
(2) is similar. As for (3),
M is indefinite ⇐⇒ αh2+ 2βh + δ takes positive and negative values
⇐⇒ αδ − β2< 0.
⊓
⊔ The proposition provides no information if αδ− β2 = 0. Geometrically, the proposition gives conditions on M to determine that the graph of QM is a bowl, an inverted bowl, or a saddle. The condition αδ− β2= 0 indicates a degenerate graph: a halfpipe, an inverted halfpipe, or a plane.
4.7 Extreme Values 183 The positive definite, negative definite, or indefinite character of a matrix is preserved if the matrix entries vary by small enough amounts. Again we restrict our discussion to the 2-by-2 case. Here the result is plausible geomet-rically, since it says that if the matrix M (a, b) defines a function whose graph is (for example) a bowl, then matrices close to M (a, b) should define functions with similar graphs, which thus should still be bowl-shaped. The same persis-tence holds for a saddle, but a halfpipe can deform immediately into either a bowl or a saddle, and so can a plane.
Proposition 4.7.7 (Persistence of Definiteness). Let A be a subset of R2, and let the matrix-valued mapping
M : A−→ M2(R), M (x, y) =
α(x, y) β(x, y) β(x, y) δ(x, y)
be continuous. Let (a, b) be an interior point of A. Suppose that the matrix M (a, b) is positive definite. Then for all (x, y) in some ε-ball about (a, b), the matrix M (x, y) is also positive definite. Similar statements hold for negative definite and indefinite matrices.
Proof. By the Persistence of Inequality principle (Proposition 2.3.10), the criteria α > 0 and αδ− β2> 0 remain valid if x and y vary by a small enough amount. The other statements follow similarly. ⊓⊔ When a function f has continuous second order partial derivatives, the entries of the second derivative matrix f′′(a) vary continuously with a. The upshot of the last proposition is therefore that we may replace the nebulous notion of Qfa+ch for some c with the explicit function Qfa.
Proposition 4.7.8 (Two-variable Max/min Test). Let f : A −→ R (where A ⊂ R2) be C2 on its interior points. Let (a, b) be an interior point of A, and suppose that f′(a, b) = [0 0]. Let f′′(a, b) =h
α β β δ
i. Then
(1) If α > 0 and αδ− β2> 0 then f (a, b) is a local minimum.
(2) If α < 0 and αδ− β2> 0 then f (a, b) is a local maximum.
(3) If αδ− β2< 0 then f (a, b) is a saddle point.
Proof. This follows from Theorem 4.7.4, Proposition 4.7.6 and
Proposi-tion 4.7.7. ⊓⊔
Again, the test gives no information if αδ− β2= 0.
Returning once again to the example f (x, y) = sin2x + x2y + y2 with its critical point (0, 0) and second derivative matrix f′′(0, 0) = [2 00 2], the max/min test shows that f has a local minimum at (0, 0).
Another example is to find the extrema of the function f (x, y) = xy(x + y− 3)
on the triangle
T ={(x, y) ∈ R2: x≥ 0, y ≥ 0, x + y ≤ 3}.
To solve this, first note that T is compact. Therefore f is guaranteed to take a maximum and a minimum value on T . These are assumed either at interior points of T or along the edge. Examining the signs of x, y and x + y− 3 shows that f is zero at all points on the edge of T and negative on the interior of T . Thus f assumes its maximum value—zero—along the boundary of T and must assume its minimum somewhere inside. (See figure 4.10.) To find the extrema of f inside T , we first find the critical points. The partial derivatives of f (now viewed as a function only on the interior of T ) are
fx(x, y) = y(2x + y− 3) fy(x, y) = x(x + 2y− 3),
and since x and y are nonzero on the interior of G, these are both zero only at the unique solution (x, y) = (1, 1) of the simultaneous equations 2x + y = 3, x + 2y = 3. Therefore f (1, 1) =−1 must be the minimum value of f. A quick calculation shows that f′′(1, 1) = [2 11 2], and the max/min test confirms the minimum at (1, 1).
Figure 4.10.Zero on the boundary, negative on the interior
Another example is to find the extreme values of the function f : R2−→ R, f(x, y) = 1
2x2+ xy− 2x −1 2y2.
Since R2is not compact, there is no guarantee that f has any extrema. In fact, for large x, f (x, 0) gets arbitrarily large, and for large y, f (0, y) gets arbitrarily small (where small means negative, not epsilontic). So f has no global extrema.
Nonetheless there may be local ones. Every point of R2is interior, so it suffices to examine the critical points of f . The partial derivatives are
4.7 Extreme Values 185
fx(x, y) = x + y− 2 fy(x, y) = x− y,
and the only point where both of them vanish is (x, y) = (1, 1). The second derivative matrix is f′′(1, 1) =1 1
1−1
, so the critical point (1, 1) is a saddle point. The function f has no extrema, local or global.
Exercises
4.7.1. Compute the best quadratic approximation of f (x, y) = excos y at the point (0, 0), f (h, k)≈ f(0, 0) + Df(0,0)(h, k) +12Qf(0,0)(h, k).
4.7.2. Compute the best quadratic approximation of f (x, y) = ex+2y at the point (0, 0).
4.7.3. Explain, making whatever reasonable assumptions seem to be helpful, of why the n-dimensional conceptual analogue of figure 4.9 should have 3n pictures. How does this relate to figure 4.8?
4.7.4. Find the extreme values taken by f (x, y) = xy(4x2+ y2− 16) on the quarter ellipse
E ={(x, y) ∈ R2: x≥ 0, y ≥ 0, 4x2+ y2≤ 16}.
4.7.5. Find the local extrema of the function f (x, y) = x2+xy−4x+32y2−7y on R2.
4.7.6. Determine the nature of f (x, y) = 13x3+13y3+ (x−32)2− (y + 4)2 at each of its critical points. Are there global extrema?
4.7.7. Find the critical points. Are they maxima, minima, or saddle points?
(The max/min test will not always help.)
f (x, y) = x2y + xy2 g(x, y) = ex+y h(x, y) = x5y + xy5+ xy.
4.7.8. Discuss local and global extrema of f (x, y) = x−11 −y−11 on the open ball B((0, 0); 1) in R2.
4.7.9. The graph of the function m(x, y) = 6xy2− 2x3 − 3y4 is called a monkey saddle. Find the three critical points of m and classify each as a maximum, minimum or saddle. (The max/min test will work on two. Study m(x, 0) and m(0, y) to classify the third.) Explain the name monkey saddle—
computer graphing software may help.
4.7.10. Linear algebra readily addresses the question of whether an n-by-n matrix is positive definite, negative definite, or indefinite.
Definition 4.7.9 (Characteristic Polynomial). Let M be an n-by-n ma-trix. Its characteristic polynomial is
pM(λ) = det(M− λI).
The characteristic polynomial of M is a polynomial of degree n in the scalar variable λ.
While the roots of a polynomial with real coefficients are in general com-plex, the roots of the characteristic polynomial of a symmetric matrix in Mn(R) are guaranteed to be real. The characterization we want is
Theorem 4.7.10 (Description of Definite/Indefinite Matrices). Let M be a symmetric matrix in Mn(R). Then
(1) M is positive definite if and only if all the roots of pM(λ) are positive.
(2) M is negative definite if and only if all the roots of pM(λ) are negative.
(3) M is indefinite if and only if pM(λ) has positive roots and negative roots.
With this result one can extend the methods in the section to functions of more than two variables.
(a) Let M be the symmetric matrixhα β
β δ
i∈ M2(R). Show that
pM(λ) = λ2− (α + δ)λ + (αδ − β2).
(b) Show that Theorem 4.7.10 is equivalent to Proposition 4.7.6 when n = 2.
(c) Classify the 3-by-3 matrices
1−1 0
−1 2 0 0 0 1
0 1 0 1 0 1 0 1 0
.
A generalization of Proposition 4.7.7 also holds, since the roots of a poly-nomial vary continuously with the polypoly-nomial’s coefficients. The generalized proposition leads to
Proposition 4.7.11 (General Max/min Test). Let f : A −→ R (where A ⊂ Rn) be C2 on its interior points. Let a be an interior point of A, and suppose that f′(a) = 0n. Let the second derivative matrix f′′(a) have charac-teristic polynomial p(λ).
(1) If all roots of p(λ) are positive then f (a) is a local minimum.
(2) If all roots of p(λ) are negative then f (a) is a local maximum.
(3) If p(λ) has positive and negative roots then f (a) is a saddle point.