For elementary problems in algebra, it is always vital to know whether to add or to multiply. For example, the total change due to two separate causes clearly ought usually to be the sum of the separate changes. On the other hand, one must multiply to get a composite rate of change: For example, given the exchange rate from pounds to dollars and the rate from dollars to francs, the product will be the exchange rate from pounds to francs.
These two simple observations of the practical meaning of addition and multiplication appear formally in the chain rule for differentiation in the calculus. This we have already used: If $z = g(y)$ and $y = h(x)$ are two functions with continuous derivatives, then in the relevant range $z = g(h(x))$ is a function of $x$ and has derivative $z'(x) = g'(y)\,h'(x)$, or
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}. \tag{1}$$
9. Partial Derivatives
The proof requires a little care with the limits entering in the definition of the derivatives involved; the limits involve the formulas for the corresponding finite increments such as $\Delta x = x - x_1$, for then
$$\frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y}\,\frac{\Delta y}{\Delta x}.$$
Thus (1) expresses the underlying reason why one should multiply rates.
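The finite-increment identity behind (1) can be watched numerically. The following is only a minimal sketch: the functions `g` and `h`, the base point `x1` and the step sizes are illustrative choices, not taken from the text, and the step must be such that the increment in $y$ is not zero.

```python
import math

def g(y):
    return math.sin(y)        # z = g(y), an illustrative choice

def h(x):
    return x * x + 1.0        # y = h(x), an illustrative choice

def increment_ratios(x1, dx):
    """Finite-increment ratios (dz/dx, dz/dy, dy/dx) for a step dx from x1."""
    x = x1 + dx
    delta_y = h(x) - h(x1)            # must be nonzero for delta_z / delta_y
    delta_z = g(h(x)) - g(h(x1))
    return delta_z / dx, delta_z / delta_y, delta_y / dx

x1 = 0.7
exact = math.cos(h(x1)) * 2.0 * x1    # g'(y) h'(x), as in formula (1)

for dx in (1e-1, 1e-3, 1e-5):
    dz_dx, dz_dy, dy_dx = increment_ratios(x1, dx)
    # The product of the two ratios equals the composite ratio for every dx;
    # both approach the exact derivative as dx shrinks.
    print(f"dx={dx:g}  composite={dz_dx:.8f}  product={dz_dy * dy_dx:.8f}  exact={exact:.8f}")
```

For every step the product of the two ratios agrees with the composite ratio up to rounding; only the passage to the limit needs the care mentioned above.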
The corresponding chain rule is more striking for functions of several variables, such as a quantity $z$ given as a function $z = f(x,y)$ for all points $(x,y)$ in some open set $U$ of the cartesian $(x,y)$-plane. There is no problem in finding what derivatives might mean here; if one holds $y$ fixed, the quantity $z$ remains just a function of $x$; its derivative, when it exists, is called the partial derivative with respect to $x$. Thus at a point $(x,y)$ in $U$ this derivative, for $h \neq 0$, is
$$\frac{\partial z}{\partial x} = f'_x(x,y) = \lim_{h \to 0} \frac{f(x+h,\,y) - f(x,y)}{h}. \tag{2}$$
Holding $x$ fixed, there is a similar partial derivative $f'_y(x,y)$. This is the more explicit notation, specifying both variables and indicating which is "held fixed" and which is variable. The notation $\partial z/\partial y$ for the same partial derivative is incomplete unless an indication is added to show which other variable is at hand and is held fixed. (The observation is important, say in thermodynamics, where several alternative pairs of independent variables may be to hand.)
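Both points can be illustrated with difference quotients as in (2). This is a sketch under assumptions: the function `f`, the substitution $u = x + y$ and the helper names are hypothetical choices made for illustration only.

```python
def f(x, y):
    return x * x * y + y ** 3          # illustrative z = f(x, y)

def partial_first(func, a, b, h=1e-6):
    """Difference quotient in the first argument, second argument held fixed."""
    return (func(a + h, b) - func(a, b)) / h

def partial_second(func, a, b, h=1e-6):
    """Difference quotient in the second argument, first argument held fixed."""
    return (func(a, b + h) - func(a, b)) / h

# Partial derivatives at the point x = 1, y = 2.
fx = partial_first(f, 1.0, 2.0)        # exact value 2xy = 4
fy = partial_second(f, 1.0, 2.0)       # exact value x^2 + 3y^2 = 13

# The same quantity z in the variables (u, y), where u = x + y.
def F(u, y):
    return f(u - y, y)

# The same point is u = 3, y = 2; now u (not x) is held fixed.
Fy = partial_second(F, 3.0, 2.0)       # exact value -4 + 13 = 9

print(fx, fy, Fy)
```

Here "the derivative of $z$ with respect to $y$" is about 13 with $x$ held fixed but about 9 with $u$ held fixed, which is why the bare notation $\partial z/\partial y$ needs the other variable to be named.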
For $z = f(x,y)$ these two partial derivatives give the rate of change of $z$ when the point $(x,y)$ in the plane moves horizontally (in the $x$ direction) or vertically (in the $y$ direction). As such, they do not give complete information about all the possible rates of change of $z$. One would at least also want the rate at which $z$ changes when the point $(x,y)$ moves off in another direction, say in a direction making an angle $\theta$ with the positive $x$ axis. A linear change in this direction is then given, in terms of a parameter $t$, by $x(t) = x + (\cos\theta)\,t$, $y(t) = y + (\sin\theta)\,t$. The intuitive principle that the total change is the sum of the separate changes in $x$ and in $y$ then suggests that the derivative of $z$ with respect to the parameter $t$ in the direction $\theta$ should be the linear combination
$$\frac{dz}{dt} = \left[\frac{\partial z}{\partial x}\right]\cos\theta + \left[\frac{\partial z}{\partial y}\right]\sin\theta, \tag{3}$$
usually called the directional derivative. When both the partial derivatives of $z$ are continuous, this formula holds; it is a special case of the following chain rule. If the functions $x = g(t)$ and $y = h(t)$ have continuous derivatives, giving values $x, y$ in the set $U \subset \mathbb{R}^2$ where the function $z = f(x,y)$ also has two continuous first partial derivatives, then $z = f(g(t), h(t))$ is a function of (suitable values of) $t$ with the continuous derivative
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt}. \tag{4}$$
This clearly includes the motivating case (3).
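Formulas (3) and (4) can be compared with direct numerical differentiation along a path. The sketch below assumes a particular `f`, a particular path `g`, `h` and particular sample values; all of these are illustrative choices, not from the text.

```python
import math

def f(x, y):
    return x * x * y + math.sin(y)     # illustrative z = f(x, y)

def f_x(x, y):
    return 2.0 * x * y                 # partial derivative in x

def f_y(x, y):
    return x * x + math.cos(y)         # partial derivative in y

def g(t):
    return math.cos(t)                 # x = g(t)

def g_prime(t):
    return -math.sin(t)

def h(t):
    return t * t                       # y = h(t)

def h_prime(t):
    return 2.0 * t

def chain_rule(t):
    """Right-hand side of (4): sum of products of rates."""
    x, y = g(t), h(t)
    return f_x(x, y) * g_prime(t) + f_y(x, y) * h_prime(t)

def numeric_dz_dt(t, dt=1e-6):
    """Central difference of z = f(g(t), h(t))."""
    return (f(g(t + dt), h(t + dt)) - f(g(t - dt), h(t - dt))) / (2.0 * dt)

def directional(x, y, theta):
    """Right-hand side of (3)."""
    return f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)

def numeric_directional(x, y, theta, dt=1e-6):
    """Central difference along the straight line in direction theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (f(x + c * dt, y + s * dt) - f(x - c * dt, y - s * dt)) / (2.0 * dt)

t0 = 0.8
theta = math.pi / 6
rate_formula, rate_numeric = chain_rule(t0), numeric_dz_dt(t0)
dir_formula = directional(1.0, 2.0, theta)
dir_numeric = numeric_directional(1.0, 2.0, theta)
print(rate_formula, rate_numeric)
print(dir_formula, dir_numeric)
```

In each printed pair the formula and the difference quotient should agree to many decimal places; the straight-line path is just the case $g(t) = x + (\cos\theta)t$, $h(t) = y + (\sin\theta)t$.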
A similar formula applies when $z$ is given as a function of more than two variables or when these variables $x$ and $y$ depend not on one but on several parameters.
This chain rule (4) has several different aspects.
First, think of $dx = (dx/dt)\,dt$ as an infinitesimal change in $x$, caused by the (equally) infinitesimal change $dt$ in $t$. Then, multiplying (4) by $dt$ and cancelling gives
$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy. \tag{5}$$
This expression is called the total differential of $z$; for given values of $x$ and $y$ it gives the total change in $z$ due to infinitesimal changes $dx$ and $dy$; we will soon give a less "infinitesimal" interpretation.
Starting from a point $(x_0, y_0)$ with finite changes $x - x_0$ and $y - y_0$, the formula (5) suggests a linear approximation $z - z_0$ to the change in $z$:
$$z - z_0 = \left[\frac{\partial z}{\partial x}\right]_0 (x - x_0) + \left[\frac{\partial z}{\partial y}\right]_0 (y - y_0). \tag{6}$$
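How good the linear approximation (6) is can be seen by shrinking the finite changes. The following sketch uses an illustrative polynomial `f` and base point, both assumptions made only for this example, and records the error of (6) for three step sizes.

```python
def f(x, y):
    return x * x * y + y ** 3          # illustrative z = f(x, y)

def linear_approx(x0, y0, x, y):
    """z0 plus the right-hand side of (6), with partials taken at (x0, y0)."""
    fx0 = 2.0 * x0 * y0                # partial derivative in x at (x0, y0)
    fy0 = x0 * x0 + 3.0 * y0 * y0      # partial derivative in y at (x0, y0)
    return f(x0, y0) + fx0 * (x - x0) + fy0 * (y - y0)

x0, y0 = 1.0, 2.0
errors = []
for step in (1e-1, 1e-2, 1e-3):
    x, y = x0 + step, y0 + step
    errors.append(abs(f(x, y) - linear_approx(x0, y0, x, y)))

# Ratios of successive errors when the step shrinks by a factor of 10.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors)
print(ratios)
```

Each tenfold reduction of the step reduces the error by a factor close to one hundred, so the error behaves like a second-order term.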
This suggests (and correctly) that there is a Taylor's formula and also a Taylor series, each valid for functions $z$ of two variables $x$ and $y$ which are sufficiently "smooth". Here and later we will use the term smooth for functions with enough continuous derivatives, in cases where we do not wish to specify in detail how many such derivatives are in fact needed.
In the chain rule (4), $dz/dt$ can be regarded as an "inner product" of two "vectors", as follows
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt} = \left[\frac{\partial z}{\partial x},\, \frac{\partial z}{\partial y}\right] \cdot \left[\frac{dx}{dt},\, \frac{dy}{dt}\right]. \tag{7}$$
The first factor on the right is called the gradient of $z = f(x,y)$; it is defined at each point $(x_0, y_0)$ of the plane, and is written
$$(\nabla f)_0 = \left[\frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y}\right]_{x = x_0,\; y = y_0}. \tag{8}$$
This vector "points" in the direction of the maximum rate of increase of the function $f$, and has that rate as its length. The function $f$ determines one such vector at each point of the plane. At each point, all such vectors for all $f$ form a two-dimensional vector space, called the cotangent space attached to the plane at that point.
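The claim about direction and length of the gradient can be checked by scanning the directional derivative (3) over many angles. This is a rough sketch: the function behind `grad_f`, the base point and the grid of 3600 angles are all illustrative assumptions.

```python
import math

def grad_f(x, y):
    """Gradient of the illustrative function f(x, y) = x^2 y + y^3."""
    return (2.0 * x * y, x * x + 3.0 * y * y)

def directional(x, y, theta):
    """Directional derivative (3): gradient dotted with (cos theta, sin theta)."""
    gx, gy = grad_f(x, y)
    return gx * math.cos(theta) + gy * math.sin(theta)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)

# Scan a grid of directions and keep the one with the largest rate of increase.
n = 3600
thetas = [2.0 * math.pi * k / n for k in range(n)]
best_theta = max(thetas, key=lambda th: directional(x0, y0, th))
best_rate = directional(x0, y0, best_theta)

grad_angle = math.atan2(gy, gx)        # direction in which the gradient points
grad_length = math.hypot(gx, gy)       # length of the gradient

print(best_theta, grad_angle)
print(best_rate, grad_length)
```

Up to the resolution of the grid, the best direction found by the scan is the direction of the gradient, and the best rate is the length of the gradient.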
The second vector in the product (7) depends on the functions $x = g(t)$, $y = h(t)$. They describe a continuous path passing through the point $(x_0, y_0)$; such a path is also called a parametrized curve, consisting of the points $(g(t), h(t))$ each labelled with the corresponding value of the parameter $t$. Such a curve is the trajectory of a moving point. At time $t_0$, where $x = x_0$ and $y = y_0$, the velocity of this moving point is the second factor of (7),
$$\left[\frac{dx}{dt},\, \frac{dy}{dt}\right]_{t = t_0} = (g'(t_0),\, h'(t_0)). \tag{9}$$
It is called the tangent vector to the path at the point. All the tangent vectors to trajectories through the point $(x_0, y_0)$ form a two-dimensional vector space, called the tangent space $T_0$ to the plane at this point.
Now the chain rule, in the form
(7), gives $dz/dt$ as the product of the gradient vector of $z$ by the tangent vector to the path. If one considers both vectors to lie in the same two-dimensional space, this product is just the inner product as described in §IV.9. However, it is preferable to consider the tangent and cotangent spaces as conceptually distinct. The "product" in (7) is then a real-valued function of two vectors, one from each space. This function is linear in each vector when the other is held constant, so is said to be bilinear; in Chapter VII we will indicate why it makes the cotangent space "dual" to the tangent space.
The cotangent space may be constructed formally as follows: Take all smooth functions
$f(x,y)$, each defined in some neighborhood of $(x_0, y_0)$; they form an (infinite dimensional) vector space under addition of values and multiplication by real constants. Call two such smooth functions $z = f(x,y)$ and $w = k(x,y)$ cotangent (or equivalent) at the point $(x_0, y_0)$ when they have the same first partial derivatives there:
$$\left[\frac{\partial f}{\partial x}\right]_0 = \left[\frac{\partial k}{\partial x}\right]_0, \qquad \left[\frac{\partial f}{\partial y}\right]_0 = \left[\frac{\partial k}{\partial y}\right]_0.$$
The equivalence classes of these functions under this relation then form the desired two-dimensional space, called the cotangent space at the point $(x_0, y_0)$.
This construction works not just for the plane, but also for curved surfaces such as the sphere, when the coordinates $(x,y)$ in the plane are replaced by suitable coordinates, such as latitude and longitude for the sphere.
On the other hand, every smooth function $f$ has a gradient $\nabla f$ as in (8). In particular, the coordinates $x$ and $y$ are smooth functions, with gradients $\nabla x = (1,0)$ and $\nabla y = (0,1)$. Hence every gradient can be expressed at each point as a linear combination of these two gradients, in the form
$$\nabla f = \frac{\partial f}{\partial x}\,\nabla x + \frac{\partial f}{\partial y}\,\nabla y.$$
Except for notation, this is just the definition (5) of the total differential. Thus the differential, born as an infinitesimal, may be defined to be the gradient $\nabla f$, a vector in the cotangent space. This is why the cotangent space differs from the tangent space.
The tangent vector at $t_0$ to the path
$(g(t), h(t))$ also determines the usual tangent line to that path, with the parametric equations
$$x - x_0 = g'(t_0)(t - t_0), \qquad y - y_0 = h'(t_0)(t - t_0), \tag{11}$$
where $x_0 = g(t_0)$ and $y_0 = h(t_0)$.
The chain rule (4) also has a three-dimensional interpretation. The function $z = f(x,y)$ represents a height $z$ above (or below) the point $(x,y)$ in the plane, and so may be pictured by a smooth surface $S$ at these heights above some portion of the plane. The tangent plane $\pi$ to this surface at a point $p = (x_0, y_0, z_0 = f(x_0, y_0))$ is by definition the plane (if there is one) containing all the tangent lines at $p$ to all the smooth curves on $S$ passing through $p$. Such a smooth trajectory is given by $x = g(t)$, $y = h(t)$ and $z = f(g(t), h(t))$; its tangent line at $p$ is given by the parametric equations (11) plus the corresponding equation for $z$:
$$z - z_0 = \left[\frac{dz}{dt}\right]_{t = t_0}(t - t_0). \tag{11'}$$
But by the chain rule (4), evaluated at $t = t_0$, the three equations (11) and (11') together satisfy the linear equation (6) for the approximate change $z - z_0$ in $z$. This linear equation (6) represents a plane in 3-space; since it is satisfied by (11) and (11'), it must be the tangent plane to the surface $S$ according to our definition.
In this way, the chain rule combines ideas from geometry (tangent planes), from mechanics (velocity vectors), from calculus (linear approximation), and from algebra (dual spaces), with the results appropriately added or multiplied. It gives meaning to the "total differential".
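The three-dimensional statement can be tested numerically: take several curves on the surface through the same point, form their tangent lines, and see whether points of those lines satisfy the plane equation (6). The sketch below assumes an illustrative `f` and three hypothetical paths through $(1, 2)$ at $t_0 = 0$; the rate of $z$ along each path is estimated by a difference quotient rather than by formula (4), so that the check is not circular.

```python
import math

def f(x, y):
    return x * x * y + y ** 3          # illustrative surface z = f(x, y)

def f_x(x, y):
    return 2.0 * x * y                 # partial derivative in x

def f_y(x, y):
    return x * x + 3.0 * y * y         # partial derivative in y

x0, y0, t0 = 1.0, 2.0, 0.0
z0 = f(x0, y0)

# Each entry: (g, h, g'(t0), h'(t0)) for a path with g(t0) = x0, h(t0) = y0.
paths = [
    (lambda t: 1.0 + t, lambda t: 2.0 + t * t, 1.0, 0.0),
    (lambda t: math.cos(t), lambda t: 2.0 + math.sin(t), 0.0, 1.0),
    (lambda t: 1.0 + 3.0 * t, lambda t: 2.0 - t + t * t, 3.0, -1.0),
]

def plane_residual(x, y, z):
    """Left side minus right side of the plane equation (6)."""
    return (z - z0) - (f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0))

residuals = []
dt = 1e-6
for g, h, gp0, hp0 in paths:
    # Rate of change of z along the curve at t0, by central difference.
    dz_dt0 = (f(g(t0 + dt), h(t0 + dt)) - f(g(t0 - dt), h(t0 - dt))) / (2.0 * dt)
    s = 0.5                            # parameter offset t - t0 along the tangent line
    point = (x0 + gp0 * s, y0 + hp0 * s, z0 + dz_dt0 * s)   # equations (11), (11')
    residuals.append(abs(plane_residual(*point)))

print(residuals)
```

All three residuals come out at the level of rounding error, so the three tangent lines lie in the one plane given by (6).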
Some of these ideas are more vivid in pictures. Thus the gradients of a function f(x,y) defined in the whole (x,y)-plane give a vector at each