Partial Derivatives

For elementary problems in algebra, it is always vital to know whether to add or to multiply. For example, the total change due to two separate causes clearly ought usually to be the sum of the separate changes. On the other hand, one must multiply to get a composite rate of change: For example, given the exchange rate from pounds to dollars and the rate from dollars to francs, the product will be the exchange rate from pounds to francs.

These two simple observations of the practical meaning of addition and multiplication appear formally in the chain rule for differentiation in the calculus. This we have already used: If $z = g(y)$ and $y = h(x)$ are two functions with continuous derivatives, then in the relevant range $z = g(h(x))$ is a function of $x$ and has derivative

$$z'(x) = g'(y)\,h'(x), \qquad \frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}. \tag{1}$$


The proof requires a little care with the limits entering in the definition of the derivatives involved; the limits involve the formulas for the corresponding finite increments, such as $\Delta x = x - x_1$, for then

$$\frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y}\,\frac{\Delta y}{\Delta x}.$$

Thus (1) expresses the underlying reason why one should multiply rates.
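The identity for finite increments can be checked numerically. The following is only a sketch: the particular functions `g` and `h`, the point `x = 1.0` and the increment `dx` are assumptions chosen for illustration and do not come from the text.

```python
import math


def g(y):
    """Hypothetical outer function z = g(y), chosen only for illustration."""
    return math.sin(y)


def h(x):
    """Hypothetical inner function y = h(x), chosen only for illustration."""
    return x * x + 1.0


def increment_ratios(x, dx=1e-6):
    """Return (dz/dx, (dz/dy) * (dy/dx)), both formed from finite increments."""
    y = h(x)
    dy = h(x + dx) - y          # increment in y caused by the increment dx
    dz = g(y + dy) - g(y)       # increment in z caused by the increment dy
    if dy == 0.0:
        # This is the case that needs "a little care": dz/dy is undefined.
        raise ZeroDivisionError("dy vanished; choose another x or dx")
    return dz / dx, (dz / dy) * (dy / dx)


def exact_derivative(x):
    """g'(y) h'(x) for the particular g and h assumed above."""
    return math.cos(h(x)) * 2.0 * x


if __name__ == "__main__":
    direct, product = increment_ratios(1.0)
    print(direct, product, exact_derivative(1.0))
```

The two increment ratios agree up to rounding, and both approach $g'(y)\,h'(x)$ as `dx` shrinks. The guard on `dy == 0.0` marks the one place where the finite-increment argument cannot be applied directly.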

The corresponding chain rule is more striking for functions of several variables, such as a quantity $z$ given as a function $z = f(x,y)$ for all points $(x,y)$ in some open set $U$ of the cartesian $(x,y)$-plane. There is no problem in finding what derivatives might mean here; if one holds $y$ fixed, the quantity $z$ remains just a function of $x$; its derivative, when it exists, is called the partial derivative with respect to $x$. Thus at a point $(x,y)$ in $U$ this derivative, for $h \neq 0$, is

$$\frac{\partial z}{\partial x} = f'_x(x,y) = \lim_{h \to 0} \frac{f(x+h,y) - f(x,y)}{h}. \tag{2}$$
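The limit in (2) can be approximated by its difference quotient for a small nonzero $h$. The sketch below assumes a hypothetical polynomial `f` and a step `h = 1e-6`; neither is taken from the text.

```python
def f(x, y):
    """Hypothetical example function z = f(x, y)."""
    return x * x * y + y ** 3


def partial_x(func, x, y, h=1e-6):
    """Difference quotient in x with y held fixed, as in formula (2)."""
    return (func(x + h, y) - func(x, y)) / h


def partial_y(func, x, y, h=1e-6):
    """Difference quotient in y with x held fixed."""
    return (func(x, y + h) - func(x, y)) / h


if __name__ == "__main__":
    # For this f the exact partial derivatives are 2xy and x^2 + 3y^2.
    print(partial_x(f, 1.0, 2.0), partial_y(f, 1.0, 2.0))
```

For this choice of `f` the exact values at $(1,2)$ are $2xy = 4$ and $x^2 + 3y^2 = 13$; the quotients differ from them by an amount of the order of `h`.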

Holding $x$ fixed, there is a similar partial derivative $f'_y(x,y)$. This is the more explicit notation, specifying both variables and indicating which is "held fixed" and which is variable. The notation $\partial z / \partial y$ for the same partial derivative is incomplete unless an indication is added to show which other variable is at hand and is held fixed. (The observation is important, say in thermodynamics, where several alternative pairs of independent variables may be to hand.)
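A small numerical sketch may make the ambiguity concrete. It assumes a hypothetical quantity $z = xy$ and a hypothetical alternative variable $u = x + y$; the same $z$ then has two different "derivatives with respect to $x$", depending on whether $y$ or $u$ is held fixed.

```python
def z_from_xy(x, y):
    """The quantity z written in the independent variables (x, y)."""
    return x * y


def z_from_xu(x, u):
    """The same quantity in the variables (x, u), where u = x + y."""
    return x * (u - x)


def slope_in_first_argument(func, a, b, h=1e-6):
    """Central difference quotient in the first argument, second held fixed."""
    return (func(a + h, b) - func(a - h, b)) / (2.0 * h)


if __name__ == "__main__":
    x, y = 1.0, 2.0
    u = x + y
    # Same point and same z, but different variables held fixed.
    print(slope_in_first_argument(z_from_xy, x, y))  # y held fixed
    print(slope_in_first_argument(z_from_xu, x, u))  # u held fixed
```

With $y$ fixed the rate is $y$, while with $u$ fixed it is $u - 2x$, so the bare symbol $\partial z / \partial x$ does not determine the value.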

For $z = f(x,y)$ these two partial derivatives give the rate of change of $z$ when the point $(x,y)$ in the plane moves horizontally (in the $x$ direction) or vertically (in the $y$ direction). As such, they do not give complete information about all the possible rates of change of $z$. One would at least also want the rate at which $z$ changes when the point $(x,y)$ moves off in another direction, say in a direction making an angle $\theta$ with the positive $x$ axis. A linear change in this direction is then given, in terms of a parameter $t$, by

$$x(t) = x + (\cos\theta)\,t, \qquad y(t) = y + (\sin\theta)\,t.$$

The intuitive principle that the total change is the sum of the separate changes in $x$ and in $y$ then suggests that the derivative of $z$ with respect to the parameter $t$ in the direction $\theta$ should be the linear combination

$$\frac{dz}{dt} = \left[\frac{\partial z}{\partial x}\right]\cos\theta + \left[\frac{\partial z}{\partial y}\right]\sin\theta, \tag{3}$$

usually called the directional derivative. When both the partial derivatives of $z$ are continuous, this formula holds; it is a special case of the following chain rule. If the functions $x = g(t)$ and $y = h(t)$ have continuous derivatives, giving values $x, y$ in the set $U \subset \mathbb{R}^2$ where the function $z = f(x,y)$ also has two continuous first partial derivatives, then $z = f(g(t),h(t))$ is a function of (suitable values of) $t$ with the continuous derivative

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt}. \tag{4}$$

This clearly includes the motivating case (3).
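Formula (4) can be compared with a direct difference quotient of the composite function, and (3) recovered from it by taking a straight-line path. This is a sketch under stated assumptions: the function `f`, its hand-computed partial derivatives, and the sample paths are hypothetical choices, not taken from the text.

```python
import math


def f(x, y):
    """Hypothetical example function z = f(x, y)."""
    return x * x * y + y ** 3


def f_x(x, y):
    """Partial derivative of the f above with respect to x."""
    return 2.0 * x * y


def f_y(x, y):
    """Partial derivative of the f above with respect to y."""
    return x * x + 3.0 * y * y


def rate_by_chain_rule(g, dg, h, dh, t):
    """dz/dt from formula (4) along the path x = g(t), y = h(t)."""
    x, y = g(t), h(t)
    return f_x(x, y) * dg(t) + f_y(x, y) * dh(t)


def rate_by_difference(g, h, t, dt=1e-6):
    """Central difference quotient of the composite z = f(g(t), h(t))."""
    return (f(g(t + dt), h(t + dt)) - f(g(t - dt), h(t - dt))) / (2.0 * dt)


def directional_derivative(x0, y0, theta):
    """Formula (3) at the point (x0, y0) in the direction theta."""
    return f_x(x0, y0) * math.cos(theta) + f_y(x0, y0) * math.sin(theta)


if __name__ == "__main__":
    g, dg = math.cos, lambda t: -math.sin(t)
    h, dh = lambda t: t * t, lambda t: 2.0 * t
    print(rate_by_chain_rule(g, dg, h, dh, 0.7), rate_by_difference(g, h, 0.7))
```

Choosing the path $x = x_0 + (\cos\theta)t$, $y = y_0 + (\sin\theta)t$ makes `rate_by_chain_rule` at `t = 0` coincide with `directional_derivative`, which is the sense in which (4) includes (3).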

A similar formula applies when $z$ is given as a function of more than two variables or when these variables $x$ and $y$ depend not on one but on several parameters.

This chain rule (4) has several different aspects.

First, think of $dx = (dx/dt)\,dt$ as an infinitesimal change in $x$, caused by the (equally) infinitesimal change $dt$ in $t$. Then, multiplying (4) by $dt$ and cancelling gives

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy. \tag{5}$$

This expression is called the total differential of $z$; for given values of $x$ and $y$ it gives the total change in $z$ due to infinitesimal changes $dx$ and $dy$; we will soon give a less "infinitesimal" interpretation.

Starting from a point $x_0, y_0$ with finite changes $x - x_0$ and $y - y_0$, the formula (5) suggests a linear approximation $z - z_0$ to the change in $z$:

$$(z - z_0) = \left[\frac{\partial z}{\partial x}\right]_0 (x - x_0) + \left[\frac{\partial z}{\partial y}\right]_0 (y - y_0). \tag{6}$$
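How good the linear approximation (6) is can be seen by comparing it with the true value for shrinking steps. The sketch assumes a hypothetical polynomial `f` with hand-computed partial derivatives and the base point $(1,2)$; these are illustrative choices only.

```python
def f(x, y):
    """Hypothetical example function z = f(x, y)."""
    return x * x * y + y ** 3


def f_x(x, y):
    """Partial derivative of the f above with respect to x."""
    return 2.0 * x * y


def f_y(x, y):
    """Partial derivative of the f above with respect to y."""
    return x * x + 3.0 * y * y


def linear_approximation(x0, y0, x, y):
    """Value of z predicted by formula (6) around the point (x0, y0)."""
    z0 = f(x0, y0)
    return z0 + f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0)


def approximation_error(x0, y0, step):
    """Gap between f and its linear approximation after moving by step in x and y."""
    x, y = x0 + step, y0 + step
    return abs(f(x, y) - linear_approximation(x0, y0, x, y))


if __name__ == "__main__":
    for step in (1e-1, 1e-2, 1e-3):
        print(step, approximation_error(1.0, 2.0, step))
```

For this smooth `f`, halving the step cuts the error roughly by four, which is the second-order behaviour one expects from the remainder in a Taylor formula.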

This suggests (and correctly) that there is a Taylor's formula and also a Taylor series, each valid for functions $z$ of two variables $x$ and $y$ which are sufficiently "smooth". Here and later we will use the term smooth for functions with enough continuous derivatives, in cases where we do not wish to specify in detail how many such derivatives are in fact needed.

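In the chain rule (4), $dz/dt$ can be regarded as an "inner product" of two "vectors", as follows:

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt} = \left[\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right] \cdot \left[\frac{dx}{dt}, \frac{dy}{dt}\right]. \tag{7}$$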

The first factor on the right is called the gradient of $z = f(x,y)$; it is defined at each point $(x_0,y_0)$ of the plane, and is written

$$(\nabla f)_0 = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]_{x = x_0,\; y = y_0}. \tag{8}$$


This vector "points" in the direction of the maximum rate of increase of the function $f$, and has that rate as its length. The function $f$ determines one such vector at each point of the plane. At each point, all such vectors for all $f$ form a two-dimensional vector space, called the cotangent space attached to the plane at that point.
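The claim about the direction and length of the gradient can be tested by scanning all directions $\theta$ and recording where the directional derivative (3) is largest. This is a brute-force sketch, not an efficient method; the function `f`, its partial derivatives and the number of scan steps are assumptions made for illustration.

```python
import math


def f_x(x, y):
    """Partial derivative in x of the hypothetical f(x, y) = x^2 y + y^3."""
    return 2.0 * x * y


def f_y(x, y):
    """Partial derivative in y of the same hypothetical f."""
    return x * x + 3.0 * y * y


def gradient(x0, y0):
    """The gradient vector of formula (8) at the point (x0, y0)."""
    return f_x(x0, y0), f_y(x0, y0)


def steepest_direction_by_scan(x0, y0, steps=3600):
    """Scan angles theta and return (theta, rate) maximizing formula (3)."""
    gx, gy = gradient(x0, y0)
    best_theta, best_rate = 0.0, -math.inf
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        rate = gx * math.cos(theta) + gy * math.sin(theta)
        if rate > best_rate:
            best_theta, best_rate = theta, rate
    return best_theta, best_rate


if __name__ == "__main__":
    theta, rate = steepest_direction_by_scan(1.0, 2.0)
    gx, gy = gradient(1.0, 2.0)
    print(theta, math.atan2(gy, gx))   # scanned angle vs. angle of the gradient
    print(rate, math.hypot(gx, gy))    # scanned rate vs. length of the gradient
```

Up to the resolution of the scan, the best angle is the angle of the gradient and the best rate is its length.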

The second vector in the product (7) depends on the functions $x = g(t)$, $y = h(t)$. They describe a continuous path passing through the point $x_0, y_0$; such a path is also called a parametrized curve, consisting of the points $(g(t),h(t))$ each labelled with the corresponding value of the parameter $t$. Such a curve is the trajectory of a moving point. At time $t_0$, where $x = x_0$ and $y = y_0$, the velocity of this moving point is the second factor of (7),

$$\left[\frac{dx}{dt}, \frac{dy}{dt}\right]_{t = t_0} = (g'(t_0), h'(t_0)). \tag{9}$$

It is called the tangent vector to the path at the point. All the tangent vectors to trajectories through the point $(x_0,y_0)$ form a two-dimensional vector space, called the tangent space $T_0$ to the plane at this point.

Now the chain rule, in the form (7), gives $dz/dt$ as the product of the gradient vector of $z$ by the tangent vector to the path. If one considers both vectors to lie in the same two-dimensional space, this product is just the inner product as described in §IV.9. However, it is preferable to consider the tangent and cotangent spaces as conceptually distinct. The "product" in (7) is then a real-valued function of two vectors, one from each space. This function is linear in each vector when the other is held constant, so is said to be bilinear; in Chapter VII we will indicate why it makes the cotangent space "dual" to the tangent space.

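The cotangent space may be constructed formally as follows: Take all smooth functions $f(x,y)$, each defined in some neighborhood of $(x_0,y_0)$; they form an (infinite dimensional) vector space under addition of values and multiplication by real constants. Call two such smooth functions $z = f(x,y)$ and $w = k(x,y)$ cotangent (or equivalent) at the point $(x_0,y_0)$ when they have the same first partial derivatives there:

$$\left[\frac{\partial f}{\partial x}\right]_0 = \left[\frac{\partial k}{\partial x}\right]_0, \qquad \left[\frac{\partial f}{\partial y}\right]_0 = \left[\frac{\partial k}{\partial y}\right]_0. \tag{10}$$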

The equivalence classes of these functions under this relation then form the desired two-dimensional space, called the cotangent space at the point $(x_0,y_0)$. This construction works not just for the plane, but also for curved surfaces such as the sphere, when the coordinates $(x,y)$ in the plane are replaced by suitable coordinates, such as latitude and longitude for the sphere.

On the other hand, every smooth function $f$ has a gradient $\nabla f$ as in (8). In particular, the coordinates $x$ and $y$ are smooth functions, with gradients $\nabla x = (1,0)$ and $\nabla y = (0,1)$. Hence every gradient can be expressed at each point as a linear combination of these two gradients, in the form

$$\nabla f = \frac{\partial f}{\partial x}\,\nabla x + \frac{\partial f}{\partial y}\,\nabla y.$$

Except for notation, this is just the definition (5) of the total differential. Thus the differential, born as an infinitesimal, may be defined to be the gradient $\nabla f$, a vector in the cotangent space. This is why the cotangent space differs from the tangent space.

The tangent vector at $t_0$ to the path $g(t), h(t)$ also determines the usual tangent line to that path, with the parametric equations

$$x - x_0 = g'(t_0)(t - t_0), \qquad y - y_0 = h'(t_0)(t - t_0), \tag{11}$$

where $x_0 = g(t_0)$ and $y_0 = h(t_0)$.

The chain rule (4) also has a three-dimensional interpretation. The function $z = f(x,y)$ represents a height $z$ above (or below) the point $(x,y)$ in the plane, and so may be pictured by a smooth surface $S$ at these heights above some portion of the plane. The tangent plane $\pi$ to this surface at a point $p = (x_0, y_0, z_0 = f(x_0,y_0))$ is by definition the plane (if there is one) containing all the tangent lines at $p$ to all the smooth curves on $S$ passing through $p$. Such a smooth trajectory is given by $x = g(t)$, $y = h(t)$ and $z = f(g(t),h(t))$; its tangent line at $p$ is given by the parametric equations (11) plus the corresponding equation for $z$:

$$z - z_0 = \left[\frac{dz}{dt}\right]_{t = t_0} (t - t_0). \tag{11'}$$

But, by the chain rule (4) evaluated at $t_0$, these three equations (11) and (11') together satisfy the linear equation (6) for the approximate change $z - z_0$ in $z$. This linear equation (6) represents a plane in 3-space; since it is satisfied by (11) and (11'), it must be the tangent plane to the surface $S$ according to our definition.
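The argument can be checked numerically: points on the tangent line to a curve on $S$ should satisfy the plane equation (6) exactly. The sketch below assumes a hypothetical surface `f`, its hand-computed partial derivatives, and a circular sample path; all of these are illustrative choices.

```python
import math


def f(x, y):
    """Hypothetical surface z = f(x, y)."""
    return x * x * y + y ** 3


def f_x(x, y):
    """Partial derivative of the f above with respect to x."""
    return 2.0 * x * y


def f_y(x, y):
    """Partial derivative of the f above with respect to y."""
    return x * x + 3.0 * y * y


def tangent_line_point(g, dg, h, dh, t0, t):
    """Point on the tangent line at parameter t, from equations (11) and (11')."""
    x0, y0 = g(t0), h(t0)
    z0 = f(x0, y0)
    # Slope in z along the path, from the chain rule (4) evaluated at t0.
    dz_dt = f_x(x0, y0) * dg(t0) + f_y(x0, y0) * dh(t0)
    return (x0 + dg(t0) * (t - t0),
            y0 + dh(t0) * (t - t0),
            z0 + dz_dt * (t - t0))


def plane_residual(x0, y0, point):
    """Left side minus right side of the plane equation (6) at the given point."""
    x, y, z = point
    z0 = f(x0, y0)
    return (z - z0) - (f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0))


if __name__ == "__main__":
    g, dg = math.cos, lambda t: -math.sin(t)
    h, dh = math.sin, math.cos
    t0 = 0.5
    for t in (0.0, 0.3, 1.5):
        point = tangent_line_point(g, dg, h, dh, t0, t)
        print(t, plane_residual(g(t0), h(t0), point))
```

The residual vanishes up to rounding for every `t`, and the same holds for any other smooth path through the same point, since only $g'(t_0)$ and $h'(t_0)$ enter.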

In this way, the chain rule combines ideas from geometry (tangent planes), from mechanics (velocity vectors), from calculus (linear approximation), and from algebra (dual spaces), with the results appropriately added or multiplied. It gives meaning to the "total differential".

Some of these ideas are more vivid in pictures. Thus the gradients of a function f(x,y) defined in the whole (x,y)-plane give a vector at each

In document Mathematics Form and Function (Page 179-184)
