
MT210 Notebook 4 Fall 2013/2014 prepared by Professor Jenny Baglivo


4 MT210 Notebook 4

This notebook is concerned with further matrix concepts and their applications. In particular, we will study eigenvalues, eigenvectors, orthogonality and least squares. The notes correspond to material in Chapters 5 and 6 of the Lay textbook.

4.1 Eigenvalues and Eigenvectors

4.1.1 Definitions; Graphical Illustrations

Let A be a square matrix of order n, and let λ (“lambda”) be a scalar.

1. λ is said to be an eigenvalue of A if Ax = λx for some nonzero vector x.

2. If x ≠ O satisfies Ax = λx, then x is an eigenvector of A with eigenvalue λ.

Note that if x is an eigenvector, then so is cx for each nonzero c ∈ R, since

A(cx) = c(Ax) = c(λx) = λ(cx).

Thus, the nonzero elements of Span(x) are all eigenvectors of A.

Eigenvalues are “exceptional values” and eigenvectors are “exceptional vectors.” The prefix “eigen” comes from the German language meaning “owned by” or “peculiar to.”

Example 1. Let $A = \begin{bmatrix} 0.5 & 0 \\ 0 & 1.5 \end{bmatrix}$. Then

1. λ1 = 0.5 is an eigenvalue of A, with corresponding eigenvector e1.

2. λ2 = 1.5 is an eigenvalue of A, with corresponding eigenvector e2.

Consider the transformation with rule T(x) = Ax. Then

• T(x) contracts points in the x-direction; in particular, T(e1) = 0.5e1.

(4)

Example 2. Let $A = \begin{bmatrix} 0.6 & 0.1 \\ 0.4 & 0.9 \end{bmatrix}$. Then

1. λ1 = 0.5 is an eigenvalue of A, with corresponding eigenvector $v_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$, and

2. λ2 = 1 is an eigenvalue of A, with corresponding eigenvector $v_2 = \begin{bmatrix} 1/\sqrt{17} \\ 4/\sqrt{17} \end{bmatrix}$.

Consider the transformation with rule T(x) = Ax. Then

• T(x) contracts points in the v1-direction; in particular, T(v1) = 0.5v1.

• T(x) leaves points in the v2-direction fixed; in particular, T(v2) = v2.

• T(x) maps the unit circle to the ellipse shown in the plot.

The complete analysis of Example 2 will be carried out in the next section.
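The two eigenpairs claimed in Example 2 can be checked numerically straight from the definition Ax = λx. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `is_eigenpair` and the tolerance are illustrative choices, not part of the notes.

```python
import math

def mat_vec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector (a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x, tol=1e-12):
    """Return True if x is nonzero and A x equals lam * x up to tol."""
    if all(abs(xi) <= tol for xi in x):
        return False  # the zero vector is never an eigenvector
    Ax = mat_vec(A, x)
    return all(abs(axi - lam * xi) <= tol for axi, xi in zip(Ax, x))

A = [[0.6, 0.1],
     [0.4, 0.9]]
v1 = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
v2 = [1 / math.sqrt(17), 4 / math.sqrt(17)]

print(is_eigenpair(A, 0.5, v1))  # v1 paired with eigenvalue 0.5
print(is_eigenpair(A, 1.0, v2))  # v2 paired with eigenvalue 1
print(is_eigenpair(A, 1.0, v1))  # v1 paired with the wrong eigenvalue
```

The first two checks should succeed and the third should fail, since v1 is scaled by 0.5 rather than left fixed.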

Note that a square matrix of order n with real entries (aij ∈ R for all i, j) may not have eigenvalues λ ∈ R and eigenvectors x ∈ R^n.

For example, 2×2 matrices A corresponding to rotations around the origin,

$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, where θ ≠ mπ for every integer m,

leave no line through the origin in 2-space fixed.
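The claim about rotations can be confirmed with the characteristic equation det(A − λI) = 0, which is developed in Section 4.1.2. The short computation below is a supplementary sketch, not part of the original notes.

```latex
\det(A-\lambda I)
  = \begin{vmatrix} \cos\theta-\lambda & -\sin\theta \\ \sin\theta & \cos\theta-\lambda \end{vmatrix}
  = (\cos\theta-\lambda)^2+\sin^2\theta
  = \lambda^2-2\cos\theta\,\lambda+1,
\qquad
\lambda=\frac{2\cos\theta\pm\sqrt{4\cos^2\theta-4}}{2}=\cos\theta\pm i\,\sin\theta .
```

The discriminant equals −4 sin²θ, so the roots are real only when sin θ = 0, that is, when θ is an integer multiple of π.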

4.1.2 Eigenspaces, Characteristic Polynomials, Characteristic Equations

Let A be a square matrix of order n, and let λ be a scalar.

(5)

2. Characteristic Polynomial: The expression det(A − λI) is an nth degree polynomial in the variable λ, and is called the characteristic polynomial of A.

3. Characteristic Equation: The equation det(A − λI) = 0 is called the characteristic equation of A. To find the eigenvalues of A we solve the characteristic equation for λ.

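Example 2, continued. Let $A = \begin{bmatrix} 0.6 & 0.1 \\ 0.4 & 0.9 \end{bmatrix}$, as above.

(1) Since

$A-\lambda I = \begin{bmatrix} 0.6 & 0.1 \\ 0.4 & 0.9 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0.6-\lambda & 0.1 \\ 0.4 & 0.9-\lambda \end{bmatrix}$,

the characteristic polynomial is

$\det(A-\lambda I) = (0.6-\lambda)(0.9-\lambda) - (0.1)(0.4) = 0.54 - 1.5\lambda + \lambda^2 - 0.04 = \lambda^2 - 1.5\lambda + 0.5$.

Since

$\det(A-\lambda I) = (\lambda-0.5)(\lambda-1)$,

the solutions to the characteristic equation are the eigenvalues 0.5 and 1 listed earlier.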

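(2) Let λ = 0.5. Since $A-\lambda I = \begin{bmatrix} 0.1 & 0.1 \\ 0.4 & 0.4 \end{bmatrix}$ and

$\left[\begin{array}{c|c} A-\lambda I & O \end{array}\right] = \left[\begin{array}{cc|c} 0.1 & 0.1 & 0 \\ 0.4 & 0.4 & 0 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right] \;\Rightarrow\; x = \begin{bmatrix} -x_2 \\ x_2 \end{bmatrix} = x_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, where x2 is free,

we know that

$\text{Eigenspace}(0.5) = \text{Null}(A-0.5I) = \text{Span}\left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}$.

Note that $v_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} \in \text{Eigenspace}(0.5)$.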

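(3) Let λ = 1. Since $A-\lambda I = \begin{bmatrix} -0.4 & 0.1 \\ 0.4 & -0.1 \end{bmatrix}$ and

$\left[\begin{array}{c|c} A-\lambda I & O \end{array}\right] = \left[\begin{array}{cc|c} -0.4 & 0.1 & 0 \\ 0.4 & -0.1 & 0 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & -1/4 & 0 \\ 0 & 0 & 0 \end{array}\right] \;\Rightarrow\; x = \begin{bmatrix} x_2/4 \\ x_2 \end{bmatrix} = x_2 \begin{bmatrix} 1/4 \\ 1 \end{bmatrix}$, where x2 is free,

we know that

$\text{Eigenspace}(1) = \text{Null}(A-I) = \text{Span}\left\{ \begin{bmatrix} 1/4 \\ 1 \end{bmatrix} \right\} = \text{Span}\left\{ \begin{bmatrix} 1 \\ 4 \end{bmatrix} \right\}$.

Note that $v_2 = \begin{bmatrix} 1/\sqrt{17} \\ 4/\sqrt{17} \end{bmatrix} = \frac{1}{\sqrt{17}} \begin{bmatrix} 1 \\ 4 \end{bmatrix} \in \text{Eigenspace}(1)$.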

(6)

Problem 1. Let $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$.

Determine the eigenvalues of A, and write each eigenspace as the span of a set of vectors.

(7)

Problem 2. Let $A = \begin{bmatrix} 1.0 & -0.5 \\ -0.5 & 1.0 \end{bmatrix}$.

Determine the eigenvalues of A, and write each eigenspace as the span of a set of vectors.

(8)

Problem 3. Let $A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0.8 & 0 \\ 0.25 & 0 & 0 \end{bmatrix}$.

(9)

4.1.3 Eigenanalysis and Powers; Eigenvector Bases; Special Cases

Conducting an eigenanalysis (that is, finding eigenvalues and eigenvectors) can be challenging. The following is an initial list of useful theorems for eigenanalysis:

1. Powers: If λ is an eigenvalue of A with eigenvector x and k is a positive integer, then x is an eigenvector for A^k with corresponding eigenvalue λ^k.

2. Bases and Powers: Let A be a square matrix of order n. Suppose that vi is an eigenvector for A with corresponding eigenvalue λi, for i = 1, 2, . . . , n, and the set {v1, v2, . . . , vn} is a basis for R^n. Then, for every vector x and positive integer k, A^k x can be computed quickly using the unique representation of x in the eigenvector basis {v1, v2, . . . , vn}.

Specifically, if x = c1v1 + · · · + cnvn for unique constants ci, then

A^k x = A^k(c1v1 + · · · + cnvn) = c1(A^k v1) + · · · + cn(A^k vn) = c1(λ1)^k v1 + · · · + cn(λn)^k vn.

3. Diagonal Matrices: Let A be a diagonal matrix of order n. Then ei is an eigenvector for A with eigenvalue aii, for i = 1, 2, . . . , n.

Thus, the standard basis for R^n is an eigenvector basis for the diagonal matrix A, and the eigenvalues are the diagonal elements of A.

4. Distinct Eigenvalues: Let A be a square matrix of order n. If A has n distinct eigenvalues, then A has an eigenvector basis. To construct an eigenvector basis, choose one nonzero vector from each eigenspace.

5. Triangular Matrices: Let A be a triangular matrix of order n. Then

det(A − λI) = (a11 − λ)(a22 − λ) · · · (ann − λ)

(10)

General application: projections over time. A general application of eigenanalysis is to the analysis of projections over time. In this type of application,

1. x0 represents information at time 0,

2. x1 = Ax0 represents information at time 1,

3. x2 = Ax1 = A^2 x0 represents information at time 2,

and so forth. If A has an eigenvector basis, then information at time k is

xk = A^k x0 = c1(λ1)^k v1 + · · · + cn(λn)^k vn, where x0 = c1v1 + · · · + cnvn.

We will see an important application of this methodology in Section 4.1.7 (page 17).

As a simple illustration, consider $A = \begin{bmatrix} 0.6 & 0.1 \\ 0.4 & 0.9 \end{bmatrix}$ once again. Let

$v_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1/4 \\ 1 \end{bmatrix}$, λ1 = 0.5, λ2 = 1, and x0 = c1v1 + c2v2.

Now,

xk = A^k x0 = c1(0.5)^k v1 + c2(1)^k v2 → c2v2 as k → ∞.

Thus, information at time k is approximately equal to the v2-component of information at time 0 when k is large.
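The limit above can be observed numerically by comparing repeated multiplication by A with the eigenvector-basis formula. In this sketch the coefficients c1 = 2 and c2 = 4 are hypothetical values chosen only for illustration, as are the helper names.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_by_iteration(A, x0, k):
    """Compute A^k x0 by applying A to the vector k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(A, x)
    return x

def power_by_eigenbasis(coeffs, eigenvalues, eigenvectors, k):
    """Compute c1*lam1^k*v1 + ... + cn*lamn^k*vn."""
    n = len(eigenvectors[0])
    result = [0.0] * n
    for c, lam, v in zip(coeffs, eigenvalues, eigenvectors):
        for i in range(n):
            result[i] += c * lam ** k * v[i]
    return result

A = [[0.6, 0.1],
     [0.4, 0.9]]
v1, v2 = [-1.0, 1.0], [0.25, 1.0]
c1, c2 = 2.0, 4.0                                  # hypothetical coefficients
x0 = [c1 * a + c2 * b for a, b in zip(v1, v2)]     # x0 = c1*v1 + c2*v2

k = 20
direct = power_by_iteration(A, x0, k)
via_eigen = power_by_eigenbasis([c1, c2], [0.5, 1.0], [v1, v2], k)
limit = [c2 * b for b in v2]                       # c2*v2

print(direct, via_eigen, limit)
```

The two computations should agree up to rounding, and both should be close to c2v2 because (0.5)^20 is below 10^-6.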

Problem 4. Use the definitions of eigenvalue and eigenvector, and properties of matrices, to prove the following special case of the first theorem listed on the previous page:

“Let A be a square matrix of order n, and let x be an eigenvector of A with eigenvalue λ. Demonstrate that x is an eigenvector of A^3 with eigenvalue λ^3.”

(11)

Problem 5. Let A be a square matrix of order n, assume that A has n distinct eigenvalues, and let vi ∈ Eigenspace(λi) be a nonzero vector, for each i.

(12)

Problem 6. The following triangular matrices each have eigenvalues 3, 3, 2:

(a) $A = \begin{bmatrix} 3 & 0 & 0 \\ 2 & 3 & 0 \\ 2 & 1 & 2 \end{bmatrix}$; (b) $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ -1 & 1 & 2 \end{bmatrix}$.

(13)

4.1.4 Fundamental Theorem of Algebra, Complex Numbers and Eigenvalues

Let A be a square matrix of order n. To find the eigenvalues of A we need to solve the characteristic equation, which requires that we factor the characteristic polynomial, det(A − λI). By the fundamental theorem of algebra, the characteristic polynomial can always be factored into n linear terms if we allow both real and complex numbers:

det(A − λI) = (λ1 − λ)(λ2 − λ) · · · (λn − λ), where each λi ∈ C.

The eigenvalues are λ1, λ2, . . . , λn. In general, not all λi's are distinct.

For example, let $A = \begin{bmatrix} 0 & 6 & 0 \\ -6 & 0 & 0 \\ 0 & 5 & 1 \end{bmatrix}$. Since

$|A-\lambda I| = \begin{vmatrix} -\lambda & 6 & 0 \\ -6 & -\lambda & 0 \\ 0 & 5 & 1-\lambda \end{vmatrix} = (1-\lambda)\begin{vmatrix} -\lambda & 6 \\ -6 & -\lambda \end{vmatrix} = (1-\lambda)(\lambda^2+36) = (1-\lambda)(6i-\lambda)(-6i-\lambda) = 0$

implies λ = 1, 6i, −6i, the eigenvalues of A are 1, 6i and −6i. Further,

$\text{Eigenspace}(1) = \text{Null}(A-I) = \text{Span}\left\{ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$,

$\text{Eigenspace}(6i) = \text{Null}(A-6iI) = \text{Span}\left\{ \begin{bmatrix} 6+i \\ -1+6i \\ 5 \end{bmatrix} \right\}$,

$\text{Eigenspace}(-6i) = \text{Null}(A+6iI) = \text{Span}\left\{ \begin{bmatrix} 6-i \\ -1-6i \\ 5 \end{bmatrix} \right\}$.

Note that matrices with complex eigenvalues and complex eigenvectors are common in applied mathematics. Examples include population projection matrices (see Section 4.1.7, page 17).
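Complex eigenpairs satisfy the same defining equation Ax = λx, so the three eigenpairs in the example above can be checked with Python's built-in complex numbers. This is a minimal sketch; the names `mat_vec` and `residual` are illustrative.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector; entries may be complex."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def residual(A, lam, v):
    """Largest entry of |A v - lam v|; zero exactly when (lam, v) is an eigenpair."""
    Av = mat_vec(A, v)
    return max(abs(a - lam * vi) for a, vi in zip(Av, v))

A = [[0, 6, 0],
     [-6, 0, 0],
     [0, 5, 1]]

pairs = [
    (1, [0, 0, 1]),
    (6j, [6 + 1j, -1 + 6j, 5]),
    (-6j, [6 - 1j, -1 - 6j, 5]),
]

for lam, v in pairs:
    print(lam, residual(A, lam, v))
```

Each residual should be zero up to rounding. The eigenvectors for 6i and −6i are complex conjugates of each other, which is typical for a real matrix with a conjugate pair of eigenvalues.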

4.1.5 Algebraic and Geometric Multiplicity; More About Eigenvector Bases

Let A be a square matrix of order n and let λ0 be an eigenvalue of A. Then

1. Algebraic Multiplicity: The algebraic multiplicity of λ0 is the number of times (λ0 − λ) appears as a factor of the characteristic polynomial.

2. Geometric Multiplicity: The geometric multiplicity of λ0 is the dimension of Eigenspace(λ0).

Note that, by the fundamental theorem of algebra, the sum of the algebraic multiplicities of the eigenvalues of A must be n.

(14)

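Problem 6, continued. Fill in the table below with information for the triangular matrices from the problem on page 12:

Matrix | Geometric multiplicity of λ = 2 | Algebraic multiplicity of λ = 2 | Geometric multiplicity of λ = 3 | Algebraic multiplicity of λ = 3

(a) $A = \begin{bmatrix} 3 & 0 & 0 \\ 2 & 3 & 0 \\ 2 & 1 & 2 \end{bmatrix}$ | ____ | ____ | ____ | ____

(b) $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ -1 & 1 & 2 \end{bmatrix}$ | ____ | ____ | ____ | ____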

More on finding eigenvector bases. Here are two additional theorems that are useful for doing eigenanalyses:

1. Algebraic and Geometric Multiplicity: Let A be a square matrix of order n and let λ0 be an eigenvalue of A. Then the algebraic and geometric multiplicities of λ0 must satisfy the following inequalities:

1 ≤ Geometric Multiplicity of λ0 ≤ Algebraic Multiplicity of λ0.

(If the geometric multiplicity is strictly less than the algebraic multiplicity, then there is a “deficiency” of eigenvectors and we won’t be able to find an eigenvector basis.)

2. Pooling Eigenspace Bases: If A has p distinct eigenvalues (λi for i = 1, 2, . . . , p) and Bi is a basis for the eigenspace of λi for each i, then the set

B1 ∪ B2 ∪ · · · ∪ Bp

is a linearly independent set. (If you pool the bases, you get a linearly independent set.)

Problem 6, continued. Does either of the matrices in Problem 6 have an eigenvector basis for

(15)

4.1.6 Similar Matrices, Diagonalizable Matrices

Similar matrices: Let A and B be square matrices of order n. Then A and B are said to be similar if there exists an invertible matrix P satisfying

A = PBP^−1 (and B = P^−1 AP).

If A and B are similar matrices, then

1. they have the same determinant, det(A) = det(B),

2. they have the same eigenvalues, and

3. their kth powers satisfy A^k = PB^k P^−1, for each positive integer k.

The factorization A = PBP^−1 is useful when B is easier to work with than A.
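Properties 1 and 3 follow directly from the factorization. The short derivation below is a supplementary sketch using only det(XY) = det(X)det(Y) and P^−1 P = I.

```latex
\det(A) = \det(P)\,\det(B)\,\det(P^{-1}) = \det(P)\,\det(B)\,\frac{1}{\det(P)} = \det(B),
\qquad
A^2 = (PBP^{-1})(PBP^{-1}) = PB\,(P^{-1}P)\,BP^{-1} = PB^2P^{-1},
\qquad
A^k = \underbrace{(PBP^{-1})\cdots(PBP^{-1})}_{k\ \text{factors}} = PB^kP^{-1}.
```

In the product for A^k every interior pair P^−1 P cancels, leaving k copies of B between P and P^−1.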

Diagonalizable matrices: Let A be a square matrix of order n. Then A is said to be diagonalizable if it is similar to a diagonal matrix. That is, if

A = PDP^−1, where D is diagonal and P is invertible.

The following theorem tells us exactly when A is diagonalizable.

Theorem (Diagonalization). Let A be a square matrix of order n. Then

A is diagonalizable if and only if A has n linearly independent eigenvectors.

In fact, A = PDP^−1 iff the columns of P are n linearly independent eigenvectors and the diagonal entries of D are the corresponding eigenvalues.

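For example,

1. If $A = \begin{bmatrix} 0.6 & 0.1 \\ 0.4 & 0.9 \end{bmatrix}$, then A = PDP^−1 where $P = \begin{bmatrix} -1 & 1 \\ 1 & 4 \end{bmatrix}$ and $D = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix}$.

2. If $A = \begin{bmatrix} 0 & 6 & 0 \\ -6 & 0 & 0 \\ 0 & 5 & 1 \end{bmatrix}$, then A = PDP^−1 where $P = \begin{bmatrix} 0 & 6+i & 6-i \\ 0 & -1+6i & -1-6i \\ 1 & 5 & 5 \end{bmatrix}$ and $D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 6i & 0 \\ 0 & 0 & -6i \end{bmatrix}$.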
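The first factorization can be verified exactly with rational arithmetic, and the same matrices give A^k = PD^k P^−1. The sketch below uses the standard `fractions` module; the helper names and the choice k = 5 are illustrative assumptions.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(6, 10), F(1, 10)], [F(4, 10), F(9, 10)]]
P = [[F(-1), F(1)], [F(1), F(4)]]      # columns are eigenvectors of A
D = [[F(1, 2), F(0)], [F(0), F(1)]]    # matching eigenvalues on the diagonal

P_inv = inverse_2x2(P)
print(mat_mul(mat_mul(P, D), P_inv) == A)

# Powers: multiply A by itself k times, then compare with P D^k P^-1.
k = 5
A_pow = A
for _ in range(k - 1):
    A_pow = mat_mul(A_pow, A)
D_k = [[D[0][0] ** k, F(0)], [F(0), D[1][1] ** k]]
via_diag = mat_mul(mat_mul(P, D_k), P_inv)
print(A_pow == via_diag)
```

Because the arithmetic is exact, both comparisons should hold with equality rather than only approximately.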

(16)

Problem 3, continued. The square matrix $A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0.8 & 0 \\ 0.25 & 0 & 0 \end{bmatrix}$ is diagonalizable.

Use the work you did on page 8 to find matrices P and D so that A = PDP^−1.

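Diagonalization and Transformations. Suppose that A = PDP^−1, where

$P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$ and $D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$,

and let B = {v1, v2, . . . , vn} be the basis whose elements are the columns of P.

Then the “action” of A is to

(↓): change from the standard basis of R^n to basis B;

(→): operate as a diagonal matrix in basis B, and

(↑): change back to the standard basis for interpretation.

$\begin{array}{ccc} x = \sum_{i=1}^{n} c_i v_i & \xrightarrow{\;A\;} & Ax = \sum_{i=1}^{n} \lambda_i c_i v_i \\ \downarrow & & \uparrow \\ [x]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} & \xrightarrow{\;D\;} & [Ax]_B = \begin{bmatrix} \lambda_1 c_1 \\ \lambda_2 c_2 \\ \vdots \\ \lambda_n c_n \end{bmatrix} \end{array}$

Similarly, the “action” of A^k is to

$x = \sum_{i=1}^{n} c_i v_i \;\xrightarrow{\;A^k\;}\; A^k x = \sum_{i=1}^{n} \lambda_i^k c_i v_i$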

(17)

4.1.7 Applications: Population Projections and Stochastic Matrices

This section contains two applications of eigenanalysis.

I. Population projections, and the northern spotted owl (Source: Lay text, p.265).

Researchers used demographic data for the northern spotted owl to develop a stage-matrix model using 3 life stages (juvenile, subadult and adult). Their goal was to track the population growth/decline of the owl in a particular old growth forest in the Pacific northwest.

If ji is the number of juveniles, si is the number of subadults and ai is the number of adults in the population at time i, then

$x_i = \begin{bmatrix} j_i \\ s_i \\ a_i \end{bmatrix}$ is the population vector at time i,

and the total population at time i is the sum of the components (ji + si + ai). The matrix that allows you to project one year is

$A = \begin{bmatrix} 0 & 0 & 0.33 \\ 0.18 & 0 & 0 \\ 0 & 0.71 & 0.94 \end{bmatrix}$.

Given xi, the population vector at time (i + 1) is

$x_{i+1} = A x_i = \begin{bmatrix} 0 & 0 & 0.33 \\ 0.18 & 0 & 0 \\ 0 & 0.71 & 0.94 \end{bmatrix} \begin{bmatrix} j_i \\ s_i \\ a_i \end{bmatrix} = \begin{bmatrix} 0.33a_i \\ 0.18j_i \\ 0.71s_i + 0.94a_i \end{bmatrix} = \begin{bmatrix} j_{i+1} \\ s_{i+1} \\ a_{i+1} \end{bmatrix}$.

Matrices used in population problems are generally diagonalizable, with both real and complex eigenvalues. For this problem, we can write A = PDP^−1, where

$D = \begin{bmatrix} 0.984 & 0 & 0 \\ 0 & -0.022+0.206i & 0 \\ 0 & 0 & -0.022-0.206i \end{bmatrix}$ and $P = \begin{bmatrix} 0.318 & 0.682 & 0.682 \\ 0.058 & -0.062-0.59i & -0.062+0.59i \\ 0.946 & -0.045+0.426i & -0.045-0.426i \end{bmatrix}$.

The diagonal elements of D are the eigenvalues of A. Since

(0.984)^k → 0, (−0.022 + 0.206i)^k → 0 and (−0.022 − 0.206i)^k → 0 as k → ∞,

we know that the population will eventually crash given any initial population vector.

(18)

The author tells us that, if the (2,1)-entry of the A matrix were 0.60 (the proportion that would be appropriate for this species in a different location), then the population would grow. The new matrix (the one with the new (2,1)-entry) would be diagonalizable with

$D = \begin{bmatrix} 1.064 & 0 & 0 \\ 0 & -0.062+0.358i & 0 \\ 0 & 0 & -0.062-0.358i \end{bmatrix}$ and $P = \begin{bmatrix} 0.292 & -0.077+0.443i & -0.077-0.443i \\ 0.165 & 0.743 & 0.743 \\ 0.942 & -0.467-0.167i & -0.467+0.167i \end{bmatrix}$.

Since (1.064)^k → ∞, (−0.062 + 0.358i)^k → 0 and (−0.062 − 0.358i)^k → 0 as k → ∞, the statement the author makes is correct.

I simulated population growth over 30 years starting with an initial population of 1500 individuals (j0 + s0 + a0 = 1500) and using both the true matrix and the matrix with the (2,1)-entry changed. The results are summarized in the plot below.

1. Left Plot: Using the true matrix, the total population declined over time.

2. Right Plot: Using the altered matrix, the total population increased over time with approximate geometric growth. Geometric growth “kicks in” when k is large enough so that kth powers of the last two eigenvalues are close to zero:

(−0.062 + 0.358i)^k ≈ 0 and (−0.062 − 0.358i)^k ≈ 0.
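The long-run behavior can also be seen without computing eigenvalues: once the contributions of the complex eigenvalues have died out, the total population is multiplied each year by roughly the dominant eigenvalue. The sketch below estimates that factor for both matrices. The equal split of the 1500 individuals across the three stages is an assumption made here for illustration; the notes give only the total.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def long_run_growth_rate(A, x0, steps=200):
    """Estimate total(x_{k+1}) / total(x_k) after many projection steps."""
    x = list(x0)
    for _ in range(steps):
        x = mat_vec(A, x)
    return sum(mat_vec(A, x)) / sum(x)

A_true = [[0.0, 0.0, 0.33],
          [0.18, 0.0, 0.0],
          [0.0, 0.71, 0.94]]
A_altered = [[0.0, 0.0, 0.33],
             [0.60, 0.0, 0.0],     # (2,1)-entry changed from 0.18 to 0.60
             [0.0, 0.71, 0.94]]

x0 = [500.0, 500.0, 500.0]          # hypothetical split of the 1500 individuals

rate_true = long_run_growth_rate(A_true, x0)
rate_altered = long_run_growth_rate(A_altered, x0)
print(round(rate_true, 3), round(rate_altered, 3))
```

A factor below 1 corresponds to eventual decline and a factor above 1 to geometric growth, in line with the eigenvalues 0.984 and 1.064 reported above.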

II. Stochastic matrices, moving cars, and searching the internet. A probability vector is one whose entries are nonnegative real numbers with sum 1. A stochastic matrix is a square matrix whose columns are probability vectors.

(19)

Example. A car rental agency has three rental locations (1, 2, 3). A customer may rent a car from any of the three locations and return the car to any of the three locations. From past experience, management observes that:

1. Location 1: Cars rented from location 1 are returned to locations 1, 2, 3 with probabilities 0.5, 0.3 and 0.2, respectively;

2. Location 2: Cars rented from location 2 are returned to locations 1, 2, 3 with probabilities 0.2, 0.8 and 0, respectively; and

3. Location 3: Cars rented from location 3 are returned to locations 1, 2, 3 with probabilities 0.3, 0.3 and 0.4, respectively.

Suppose that we would like to determine the probabilities that a car initially rented from a given location (either 1, 2 or 3) will be returned to locations 1, 2, 3 after k rental periods. Let A be the matrix whose columns are the probabilities listed above, let ai, bi and ci be the probabilities that the car is at locations 1, 2, 3 after i rental periods, and let xi be the vector whose components are the probabilities ai, bi and ci:

$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.8 & 0.3 \\ 0.2 & 0 & 0.4 \end{bmatrix}$, $x_i = \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix}$.

The matrix A can be used to project one rental period. That is, xi+1 = Axi for each i. The starting location vectors (x0) are

$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ for location 1, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ for location 2, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ for location 3.

We can write A = PDP^−1, where

$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{bmatrix}$, $P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.1 & -0.1 \\ 0.6 & -0.3 & 0 \\ 0.1 & 0.2 & 0.1 \end{bmatrix}$,

and the first column of P has nonnegative terms with sum 1. Note that 1^k → 1, (0.5)^k → 0 and (0.2)^k → 0 as k → ∞. If x0 corresponds to the location 2 vector, for example, then

x0 = v1 − (4/3)v2 + (5/3)v3

and xk = A^k x0 ≈ v1 for large k. In fact, xk ≈ v1 after only 10 time periods:

k     0    1     2     3      4      5      6      7      8      9      10    11
ak    0    0.2   0.26  0.282  0.291  0.296  0.298  0.299  0.299  0.3    0.3   0.3
bk    1    0.8   0.7   0.65   0.625  0.613  0.606  0.603  0.602  0.601  0.6   0.6

(20)

Similarly, if x0 corresponds to the location 1 vector, then

x0 = v1 + 2v2 − 5v3

and xk = A^k x0 ≈ v1 for large k, and if x0 corresponds to the location 3 vector, then

x0 = v1 + 2v2 + 5v3

and xk = A^k x0 ≈ v1 for large k. Thus, if k is large, the probabilities that a car initially at any one of the three locations will be returned to locations 1, 2, 3 after k rental periods are (approximately) 0.3, 0.6 and 0.1.
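The convergence to (0.3, 0.6, 0.1) from every starting location can be reproduced by iterating xi+1 = Axi directly. This is a minimal sketch; the function name `distribution_after` and the choice of 20 rental periods are illustrative.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def distribution_after(A, start, k):
    """Location probabilities after k rental periods, starting from `start`."""
    x = list(start)
    for _ in range(k):
        x = mat_vec(A, x)
    return x

# Column j holds the return probabilities for cars rented at location j.
A = [[0.5, 0.2, 0.3],
     [0.3, 0.8, 0.3],
     [0.2, 0.0, 0.4]]

for start in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    print(start, [round(p, 3) for p in distribution_after(A, start, 20)])
```

All three starting vectors should lead to nearly the same probabilities, because the components along v2 and v3 shrink like (0.5)^k and (0.2)^k.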

Surfing the web. Now, imagine yourself surfing the web starting from some initial location and randomly following hyperlinks. Assuming an appropriate A matrix can be created and analyzed as above, the probability that you will be at a given location after a sufficient number of steps can be determined.

The designers of Google use the eventual probabilities to determine the order in which the results of a search are reported; specifically, webpages with higher probabilities are listed before those with lower probabilities. Their A matrix uses the hyperlink structure of the web and some proprietary information.

4.2 Orthogonality and Orthogonal Projections

4.2.1 Inner Product, Length, Distance

Let v and w be vectors in R^k. Then

1. Inner product: The inner product (dot product) of v and w is the number

$v \cdot w = v^T w = \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_k \end{bmatrix} = v_1 w_1 + v_2 w_2 + \cdots + v_k w_k$.

2. Length: The length of v is the number

(21)

For example, if $v = \begin{bmatrix} 8 \\ -1 \\ 0 \\ 2 \end{bmatrix}$ and $w = \begin{bmatrix} -2 \\ -1 \\ 4 \\ 3 \end{bmatrix}$, then

(1) v·w =

(2) The length of v is

(3) The unit vector in the direction of v is

(4) The distance between v and w is

4.2.2 Properties of Inner Product

Let u, v, w ∈ R^k and c ∈ R. Then

1. Commutative: v·w = w·v.

2. Scalars: (cv)·w = v·(cw) = c(v·w).

3. Distributive: (u + v)·w = (u·w) + (v·w) and w·(u + v) = (w·u) + (w·v).

4. Nonnegative: v·v ≥ 0. Further, v·v = 0 iff v = O.

Problem 1. Let v1, v2 and w be vectors in R^k, and suppose that v1·w = 0 and v2·w = 0. Use the properties of inner product to demonstrate that v·w = 0 for every v ∈ Span{v1, v2}.

(22)

4.2.3 Orthogonal, Orthogonal Sets, Orthogonal Complement

The concept of orthogonality is important in applications.

1. Orthogonal: Let v and w be vectors in R^k. Then v and w are said to be orthogonal if their inner product is zero: v·w = 0.

2. Orthogonal Set: Let v1, v2, . . . , vp be vectors in R^k. Then {v1, v2, . . . , vp} is said to be an orthogonal set if

vi ≠ O for all i, and vi·vj = 0 when i ≠ j.

3. Orthogonal Complement: Let V be a subspace of R^k. The orthogonal complement of V, denoted by V⊥ (“V-perp”), is the collection of all vectors orthogonal to V:

V⊥ = {w : w is orthogonal to each v ∈ V}.

Problem 2. Let $v_1 = \begin{bmatrix} a \\ -2 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$, $v_3 = \begin{bmatrix} 5 \\ 2 \\ b \end{bmatrix}$.

(23)

Properties of orthogonal complements. Let V be a subspace of R^k and let V⊥ be its orthogonal complement. Then

1. Subspace: The orthogonal complement of V is a subspace of R^k. Further, V ∩ V⊥ = {O} since O is the only vector satisfying x·x = 0.

2. Orthogonal Complement of V⊥: The orthogonal complement of V⊥ is V: V⊥⊥ = V.

3. Spanning Sets: Suppose that V = Span{v1, v2, . . . , vp}. Then

w ∈ V⊥ if and only if w·vi = 0 for i = 1, 2, . . . , p.

4. Pooling Bases: If B1 is a basis for V and B2 is a basis for V⊥, then the union of the bases, B1 ∪ B2, is a basis for R^k.

To illustrate orthogonal complements, let $V = \text{Span}\{v\} = \text{Span}\left\{ \begin{bmatrix} 4 \\ 1 \end{bmatrix} \right\}$ and $w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$. Since

$0 = w \cdot v = 4w_1 + w_2 \;\Rightarrow\; w = w_1 \begin{bmatrix} 1 \\ -4 \end{bmatrix}$, where w1 is free,

we know that

$V^{\perp} = \text{Span}\left\{ \begin{bmatrix} 1 \\ -4 \end{bmatrix} \right\}$.

(24)

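Problem 3. In each case, write V⊥ as a span.

(a) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \right\}$; (b) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}$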

(25)

4.2.4 Fundamental Theorem of Linear Algebra

Let A be an m×n matrix and let A^T be its transpose. The following theorem, known as the fundamental theorem of linear algebra, gives important relationships among the four subspaces related to A and its transpose.

Fundamental Theorem of Linear Algebra. Let A be an m×n matrix. Then

1. Null(A) and Row(A) = Col(A^T) are orthogonal complements in R^n.

2. Null(A^T) and Row(A^T) = Col(A) are orthogonal complements in R^m.

Further, if rank(A) = r, then

1. dim(Col(A)) = dim(Col(A^T)) = r,

2. dim(Null(A)) = n − r, and

3. dim(Null(A^T)) = m − r.

Proof of first statement: It is instructive to demonstrate the first statement in the theorem.

Let $A^T = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \end{bmatrix}$ and $A = \begin{bmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{bmatrix}$. Then

$Ax = O \;\Leftrightarrow\; \begin{bmatrix} \alpha_1^T \\ \alpha_2^T \\ \vdots \\ \alpha_m^T \end{bmatrix} x = \begin{bmatrix} \alpha_1 \cdot x \\ \alpha_2 \cdot x \\ \vdots \\ \alpha_m \cdot x \end{bmatrix} = O$.

(26)

Problem 4. Find bases for Null(A), Col(A), Null(A^T) and Col(A^T), where

$A = \begin{bmatrix} 1 & 2 & 1 & -2 \\ -3 & -6 & -4 & 3 \\ 2 & 4 & 1 & -7 \end{bmatrix}$.

(27)

4.2.5 Orthogonal Spanning Sets

The following theorem tells us that a set of mutually orthogonal nonzero vectors is linearly independent. Further, the coordinates of a vector w with respect to a basis of mutually orthogonal nonzero vectors can be found quickly using dot products:

Theorem (Orthogonal Spanning Sets). Let {v1, v2, . . . , vp} be an orthogonal set of vectors in R^k and let V = Span{v1, v2, . . . , vp}. Then

1. {v1, v2, . . . , vp} is a basis for V.

2. If w ∈ V, then w = c1v1 + · · · + cpvp where ci = (w·vi)/(vi·vi) for each i.
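The coordinate formula in the theorem translates directly into code. The sketch below uses exact fractions; the function name `orthogonal_coordinates` and the example vectors are hypothetical, chosen here for illustration rather than taken from the notes.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_coordinates(w, basis):
    """Coordinates c_i = (w . v_i) / (v_i . v_i) of w in an orthogonal basis.

    Assumes w lies in the span of `basis`; raises if the basis is not orthogonal.
    """
    for i, u in enumerate(basis):
        for v in basis[i + 1:]:
            if dot(u, v) != 0:
                raise ValueError("basis vectors are not mutually orthogonal")
    return [Fraction(dot(w, v)) / Fraction(dot(v, v)) for v in basis]

# Hypothetical orthogonal basis of R^3 and a vector to expand in it.
basis = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
w = [3, 5, 4]

coords = orthogonal_coordinates(w, basis)
rebuilt = [sum(c * v[i] for c, v in zip(coords, basis)) for i in range(len(w))]
print(coords, rebuilt)
```

No linear system has to be solved: each coordinate comes from two dot products, which is the practical payoff of working with an orthogonal basis.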

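Problem 5. In each case, use dot products to find the coordinates of the vector w with respect to the given orthogonal basis.

(a) $V = \text{Span}\{v_1, v_2\} = \text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} \right\}$, and $w = \begin{bmatrix} 14 \\ 1 \\ 11 \end{bmatrix}$.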

(28)

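(b) $V = \text{Span}\{v_1, v_2, v_3\} = \text{Span}\left\{ \begin{bmatrix} 0 \\ 1 \\ -4 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \\ -4 \end{bmatrix} \right\}$, and $w = \begin{bmatrix} 7 \\ 13 \\ 0 \\ 4 \end{bmatrix}$.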

4.2.6 Angles, Inner Products, and Orthogonal Projections

Angle between v and w. Let v and w be vectors in R^2 or R^3 represented as directed line segments beginning at the origin. The angle between v and w, θ, is the smaller of the two angles at the origin determined by v and w. The angle θ lies in the interval [0, π].

(29)

Orthogonal projection. For vectors in R^2 or R^3, the (orthogonal) projection of w onto v, denoted by proj_v(w), is the vector highlighted in each diagram below for angles θ ≠ π/2:

1. Left Plot (0 ≤ θ < π/2): The projection of w onto v is the vector that points in the direction of v and whose length is ‖w‖cos(θ).

2. Right Plot (π/2 < θ ≤ π): The projection of w onto v is the vector that points in the direction opposite to v and whose length is ‖w‖cos(π − θ).

When θ = π/2, the projection of w onto v is the zero vector.

Geometry, trigonometry and the relationship v·w = ‖v‖ ‖w‖cos(θ) can be used to demonstrate that the projection can be computed as follows:

$\text{proj}_v(w) = \left( \dfrac{w \cdot v}{v \cdot v} \right) v$.

That is, the projection is the scalar multiple cv, where c is the ratio (w·v)/(v·v).

For example, let $v = \begin{bmatrix} 6 \\ 2 \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$. Then

(1) proj_v(w) =

(2) w − proj_v(w) =

(30)

Orthogonal projection in k-space. Let v and w be vectors in R^k, and let V = Span{v}. Then the (orthogonal) projection of w onto v (equivalently, the projection of w onto the subspace spanned by v) is defined as follows:

$\text{proj}_v(w) = \text{proj}_V(w) = \left( \dfrac{w \cdot v}{v \cdot v} \right) v$.

The projection, ŵ = proj_v(w) = proj_V(w), is an element of the vector space V and satisfies the following properties:

1. Minimum Distance: ŵ is the unique vector in V closest to w.

2. Orthogonality: ŵ is the unique vector in V for which w − ŵ is orthogonal to V.

Thus, we can find the distance between w and V by computing ‖w − ŵ‖.
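The projection formula and the distance ‖w − ŵ‖ can be sketched in a few lines of Python. The function names and the vectors v = (3, 4), w = (5, 0) are hypothetical, chosen so that the arithmetic comes out cleanly.

```python
import math
from fractions import Fraction

def dot(u, v):
    """Inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_vector(w, v):
    """proj_v(w) = ((w . v) / (v . v)) v, for nonzero v."""
    vv = dot(v, v)
    if vv == 0:
        raise ValueError("cannot project onto the zero vector")
    c = Fraction(dot(w, v)) / Fraction(vv)
    return [c * vi for vi in v]

def distance_to_line(w, v):
    """Distance from w to Span{v}, computed as the length of w - proj_v(w)."""
    w_hat = project_onto_vector(w, v)
    diff = [wi - hi for wi, hi in zip(w, w_hat)]
    return math.sqrt(dot(diff, diff))

v = [3, 4]   # hypothetical spanning vector
w = [5, 0]   # hypothetical vector to project

w_hat = project_onto_vector(w, v)
residual = [wi - hi for wi, hi in zip(w, w_hat)]
print(w_hat, residual, dot(residual, v), distance_to_line(w, v))
```

The printed dot product of the residual with v should be zero, which is the orthogonality property stated above.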

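Problem 6. In each case, find the distance between w and V = Span{v}.

(a) $w = \begin{bmatrix} 0 \\ 8 \\ 4 \end{bmatrix}$ and $V = \text{Span}\left\{ \begin{bmatrix} -1 \\ 4 \\ 1 \end{bmatrix} \right\}$; (b) $w = \begin{bmatrix} 1 \\ 1 \\ 7 \end{bmatrix}$ and $V = \text{Span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \right\}$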

(31)

Orthogonal projection onto V. Let V = Span{v1, v2, . . . , vp} be the span of an orthogonal set of vectors (a set of mutually orthogonal nonzero vectors) in R^k and let w ∈ R^k. Then the (orthogonal) projection of w onto V is defined as follows:

ŵ = proj_V(w) = c1v1 + c2v2 + · · · + cpvp, where ci = (w·vi)/(vi·vi) for each i.

This definition generalizes the case for projection onto the span of a single vector in R^k, and requires that the spanning set is an orthogonal set. Note also that

1. If Vi = Span{vi} for each i, then ŵ is the sum of projections in each coordinate direction:

ŵ = proj_V(w) = proj_V1(w) + proj_V2(w) + · · · + proj_Vp(w).

2. If w ∈ V, then ŵ = w.

Properties of orthogonal projections are stated in the following theorem, and illustrated to the right.

In the plot,

• the horizontal “axis” represents vector space V and the vertical “axis” represents the orthogonal complement, V⊥; and

• w is decomposed into the part of w in V and the part in V⊥.

Theorem (Orthogonal Projections). Let V be a subspace of R^k, w be any vector in R^k, and ŵ be the projection of w onto V. Then

1. Orthogonal Decomposition: The difference (w − ŵ) is a vector in V⊥, and the sum

w = ŵ + (w − ŵ)

is the unique representation of w as the sum of a vector in V and a vector in V⊥. (Thus, we have an orthogonal decomposition of w into the part of the vector in V and the part of the vector in V⊥.)

(32)

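Problem 7. In each case, find ŵ and (w − ŵ). Note that each V has been written as the span of an orthogonal set.

(a) $V = \text{Span}\left\{ \begin{bmatrix} -1 \\ 3 \end{bmatrix} \right\}$, $w = \begin{bmatrix} 5 \\ -1 \end{bmatrix}$

(b) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} \right\}$, $w = \begin{bmatrix} 7 \\ 2 \\ 0 \end{bmatrix}$

(c) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}$, $w = \begin{bmatrix} 8 \\ 6 \\ 5 \\ 7 \\ -3 \end{bmatrix}$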

(33)

4.2.7 Gram-Schmidt Orthogonalization Process

Let V ⊆ R^k be a p-dimensional subspace and let {x1, x2, . . . , xp} be a basis for V. The Gram-Schmidt orthogonalization process allows us to construct an orthogonal basis {v1, v2, . . . , vp} for V starting with {x1, x2, . . . , xp}. The method is as follows:

1. Let v1 = x1.

2. Let v2 = x2 − proj_V1(x2), where V1 = Span{v1} = Span{x1}.

3. Let v3 = x3 − proj_V2(x3), where V2 = Span{v1, v2} = Span{x1, x2}.

4. Let v4 = x4 − proj_V3(x4), where V3 = Span{v1, v2, v3} = Span{x1, x2, x3}.

And so forth. The final set, {v1, v2, . . . , vp}, is an orthogonal basis for V.
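The steps above can be sketched as a short loop in which each new vector has its projections onto the previously built vectors subtracted. The implementation below uses exact fractions; the function name and the three input vectors are hypothetical, chosen here for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return an orthogonal basis for Span(basis).

    Step k computes v_k = x_k - proj_{V_(k-1)}(x_k), where the projection onto
    V_(k-1) is the sum of the projections onto v_1, ..., v_(k-1).
    """
    orthogonal = []
    for x in basis:
        v = [Fraction(xi) for xi in x]
        for u in orthogonal:
            c = Fraction(dot(x, u)) / Fraction(dot(u, u))
            v = [vi - c * ui for vi, ui in zip(v, u)]
        if all(vi == 0 for vi in v):
            raise ValueError("input vectors are linearly dependent")
        orthogonal.append(v)
    return orthogonal

# Hypothetical basis of R^3 used only to exercise the procedure.
xs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
vs = gram_schmidt(xs)
for v in vs:
    print([str(entry) for entry in v])
```

Since any nonzero multiple of an orthogonal basis vector can replace it, results are often rescaled to clear fractions when working by hand.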

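Problem 8. In each case, find an orthogonal basis for V.

(a) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \right\}$

(b) $V = \text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ 5 \\ 2 \end{bmatrix} \right\}$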

(34)

4.3 Least Squares Analysis

4.3.1 Best Approximate Solutions; Normal Equations

Let A be an m×n coefficient matrix and assume that Ax = b is inconsistent. We propose to find approximate solutions to the system as follows:

(1) Find the projection of b onto Col(A), b̂, and

(2) Report solutions to the consistent system Ax = b̂.

Observation 1: Since b̂ is as close to b as possible, each approximate solution x satisfies

‖b − b̂‖ = ‖b − Ax‖ is as small as possible.

The difference vector is

$b - Ax = \begin{bmatrix} b_1 - (a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n) \\ b_2 - (a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n) \\ \vdots \\ b_m - (a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n) \end{bmatrix}$

and the square of the length of the difference vector is

$\sum_{i=1}^{m} \left( b_i - (a_{i,1}x_1 + a_{i,2}x_2 + \cdots + a_{i,n}x_n) \right)^2$.

Each approximate solution x will minimize the above sum of squared differences. For this reason the approximate solutions are called least squares solutions.

Observation 2: Since the difference vector (b − b̂) = (b − Ax) ∈ (Col(A))⊥ and the orthogonal complement of the column space of A is the null space of the transpose of A,

(35)

Further,

A^T(b − Ax) = O ⇔ A^T b − A^T Ax = O ⇔ A^T Ax = A^T b.

Thus, least squares solutions can be found by solving the consistent system on the right (called the normal equation of the system). By using the normal equation, we do not need to find the projection of b on the column space of A.

The following theorem gives the properties of this process:

The Least Squares Theorem. Under the conditions above,

1. x is a least squares solution to Ax = b iff x is a solution to A^T Ax = A^T b.

2. A^T A is invertible iff the columns of A are linearly independent. Thus, there is a unique least squares solution iff the columns of A are linearly independent.

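For example, consider the inconsistent system Ax = b where $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$. Then

$\left[\begin{array}{c|c} A^TA & A^Tb \end{array}\right] = \left[\begin{array}{cc|c} 3 & 3 & 6 \\ 3 & 5 & 0 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 2 & -6 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -3 \end{array}\right] \;\Rightarrow\; x = \begin{bmatrix} 5 \\ -3 \end{bmatrix}$

is the unique least squares solution to the inconsistent system above.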
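The normal-equation computation in this example can be reproduced with exact rational arithmetic. The sketch below forms A^T A and A^T b and solves the system by Gauss-Jordan elimination; the function names are illustrative, and the solver assumes the columns of A are linearly independent so that A^T A is invertible.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]   # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def least_squares(A, b):
    """Solve the normal equation A^T A x = A^T b."""
    At = transpose(A)
    AtA = mat_mul(At, A)
    Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]
    return solve(AtA, Atb)

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
print(least_squares(A, b))
```

For larger or ill-conditioned problems, numerical libraries usually avoid forming A^T A explicitly, but for small exact examples like this one the normal equation is the most direct route.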

Problem 1. In each case, find the least squares solution(s) to Ax=b.

(a) $A = \begin{bmatrix} 1 & 4 \\ 1 & 2 \\ 2 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} -2 \\ 6 \\ 1 \end{bmatrix}$.

(36)

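(b) $A = \begin{bmatrix} 1 & -2 \\ 2 & -4 \\ -1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 3 \\ -4 \\ 15 \end{bmatrix}$.

(c) $A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}$ and $b = \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix}$.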

(37)

4.3.2 Application: Least Squares Analyses of Data

The methodology from the last section can be applied to finding curves of “best fit” (as minimizing a sum of squared differences).

As a simple illustration, consider the four data pairs (1,1), (4,2), (8,4) and (11,5). These points lie close to a straight line with equation ŷ = a + bx, as illustrated in the left plot below.

The intercept and slope of the line can be found by the method of least squares. Specifically, we start with a 4-by-2 system of linear equations, convert the system to a matrix equation Ax = b, and find the least squares solution(s) by solving the normal equation A^T Ax = A^T b.

$\begin{aligned} a + b(1) &= 1 \\ a + b(4) &= 2 \\ a + b(8) &= 4 \\ a + b(11) &= 5 \end{aligned} \;\Rightarrow\; \begin{bmatrix} 1 & 1 \\ 1 & 4 \\ 1 & 8 \\ 1 & 11 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 4 \\ 5 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} 4 & 24 \\ 24 & 202 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 12 \\ 96 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 15/29 \\ 12/29 \end{bmatrix}$

Thus, the least squares regression line is

ŷ = 15/29 + (12/29)x ≈ 0.52 + 0.41x,

as shown on the left above.

Let ŷi = 15/29 + (12/29)xi and ei = yi − ŷi for i = 1, 2, 3, 4:

i     1       2        3       4
ŷi    0.931   2.172    3.828   5.069
ei    0.069   −0.172   0.172   −0.069

A plot of (ŷi, ei) pairs is shown on the right above.
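For a straight-line fit the normal equation is always 2×2, so the intercept and slope have a closed form in terms of sums over the data. The sketch below applies that closed form to the four data pairs above; the function name `fit_line` is an illustrative choice.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y ~ a + b*x.

    Solves the 2x2 normal equation [[n, Sx], [Sx, Sxx]] [a, b]^T = [Sy, Sxy]^T
    by Cramer's rule.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of A^T A
    if det == 0:
        raise ValueError("all x values are equal; the slope is not determined")
    a = Fraction(sxx * sy - sx * sxy) / Fraction(det)
    b = Fraction(n * sxy - sx * sy) / Fraction(det)
    return a, b

xs = [1, 4, 8, 11]
ys = [1, 2, 4, 5]

a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

print(a, b)
print([round(float(f), 3) for f in fitted])
print([round(float(e), 3) for e in residuals])
```

The residuals of a least squares line with an intercept always sum to zero, which gives a quick sanity check on the computation.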

(38)

Example: Olympic winning times (Source: Hand et al, 1994). Consider the following 20 data pairs, where xi is the time in years since 1900 and yi is the Olympic winning time in seconds for men in the final round of the 100 meter event.

i     1      2      3      4      5      6      7      8      9      10
xi    0      4      8      12     20     24     28     32     36     48
yi    10.8   11.0   10.8   10.8   10.8   10.6   10.8   10.3   10.3   10.3

i     11     12     13     14     15     16     17     18     19     20
xi    52     56     60     64     68     72     76     80     84     88
yi    10.4   10.5   10.2   10.0   9.95   10.14  10.06  10.25  9.99   9.92

The data cover all Olympic events held between 1900 and 1988. (Note that Olympic games were not held in 1916, 1940, and 1944.)

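The twenty data pairs lie approximately on a straight line with equation ŷ = a + bx, whose intercept and slope can be estimated by the method of least squares.

$\begin{aligned} a + b(0) &= 10.80 \\ a + b(4) &= 11.00 \\ &\;\;\vdots \\ a + b(88) &= 9.92 \end{aligned} \;\Rightarrow\; \begin{bmatrix} 1 & 0 \\ 1 & 4 \\ \vdots & \vdots \\ 1 & 88 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 10.80 \\ 11.00 \\ \vdots \\ 9.92 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} 20 & 912 \\ 912 & 56928 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 207.91 \\ 9311.76 \end{bmatrix}$,

which implies that $\begin{bmatrix} a \\ b \end{bmatrix} \approx \begin{bmatrix} 10.898 \\ -0.011 \end{bmatrix}$. Thus, the least squares regression line is

ŷ = 10.898 − 0.011x,

as illustrated on the left above. Note that the origin of the plot is not (0,0). The results suggest that the winning times have decreased at the rate of about 0.011 seconds per year.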

(39)

Example: Brain-body study (Source: Allison & Cicchetti, 1976). As part of a study on sleep in mammals, researchers collected information on the average body weight (in kilograms) and average brain weight (in grams) for 43 different species. Let

xi = ln (Average Body Weighti) and yi = ln (Average Brain Weighti)

for i = 1,2, . . . ,43. The (xi, yi) pairs lie approximately on a line with equation yb= a+bx,

whose intercept and slope can be estimated by the method of least squares.

Starting with the normal equation ATAx=ATb:

43 30.8655 30.8655 416.725 a b = 116.26 392.109 ⇒ a b ≈ 2.142 0.782 .

Thus, the least squares regression line is by= 2.142 + 0.782x.

As with the earlier examples, the left plot above is a plot of $(x_i, y_i)$ pairs superimposed on the least squares regression line and the right plot is a plot of $(\hat{y}_i, e_i)$ pairs, for $i = 1, 2, \ldots, 43$.

It is instructive to examine the estimated relationship between average brain and body weights on their original scales. The graph of this relationship is shown to the right.

The formula for this curve is (please complete)

.

Note that Man’s brain weight is much larger than expected given the modest body weight. The Asian Elephant has an enormous body weight and a correspondingly large brain weight.


Example: Timber yield study (Source: Hand et al, 1994). As part of a study designed to estimate the volume of a tree (and therefore its yield) given its diameter and height, data were collected on the volume (in cubic feet), diameter at 54 inches above the ground (in inches), and height (in feet) of 31 black cherry trees in the Allegheny National Forest. Let

\[
x_{1,i} = \ln(\text{Diameter}_i), \quad x_{2,i} = \ln(\text{Height}_i) \quad \text{and} \quad y_i = \ln(\text{Volume}_i)
\]

for $i = 1, \ldots, 31$. The $(x_{1,i}, x_{2,i}, y_i)$ triples lie approximately on a plane with equation $\hat{y} = a + bx_1 + cx_2$, whose coefficients can be estimated using the method of least squares.

\[
\begin{bmatrix} 31 & 79.278 & 134.144 \\ 79.278 & 204.376 & 343.37 \\ 134.144 & 343.37 & 580.694 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
=
\begin{bmatrix} 101.455 \\ 263.056 \\ 439.896 \end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix} a \\ b \\ c \end{bmatrix} \approx \begin{bmatrix} -6.632 \\ 1.983 \\ 1.117 \end{bmatrix}.
\]

Thus, the least squares regression equation is $\hat{y} = -6.632 + 1.983x_1 + 1.117x_2$.

The regression equation is plotted on the left above, along with the $(x_{1,i}, x_{2,i}, y_i)$ triples. Triples lying under the surface appear slightly lighter in color. The right plot is a plot of $(\hat{y}_i, e_i)$ pairs, for $i = 1, 2, \ldots, 31$.
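The three-variable normal equation can be checked the same way as the two-variable ones. The sketch below solves the system from its printed entries and also substitutes the stated coefficients back into the system. Because the printed entries are rounded to three decimals, the computed coefficients can differ from the stated values in the second or third decimal place; the names `stated` and `residual_stated` are illustrative.

```python
import numpy as np

# Entries of A^T A and A^T b as printed in the notes (n = 31 trees).
M = np.array([[31.0, 79.278, 134.144],
              [79.278, 204.376, 343.37],
              [134.144, 343.37, 580.694]])
rhs = np.array([101.455, 263.056, 439.896])

# Solve the 3x3 normal equation for (a, b, c).
coef = np.linalg.solve(M, rhs)
print(np.round(coef, 3))

# Substitute the coefficients stated in the notes back into the system;
# small entries here mean the stated values nearly satisfy the equations.
stated = np.array([-6.632, 1.983, 1.117])
residual_stated = M @ stated - rhs
print(residual_stated)
```

Comparing `coef` with `stated` gives a feel for how sensitive the solution is to rounding in the entries of $A^TA$.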

It is instructive to examine the estimated relationship among diameter, height and volume in their original scales. The graph of this relationship is shown to the right.


Example: Body fat study (Source: Johnson, 1996). As part of a study to determine if the percentage of body fat can be predicted accurately using only a scale and measuring tape, data were collected on 100 men. Let $x_{1,i}$, $x_{2,i}$, $x_{3,i}$ and $x_{4,i}$ be the abdomen, wrist, hip and neck circumferences (in centimeters) of the $i$th individual, and let $y_i$ be the man’s percent body fat measured using an accurate underwater technique. Here is some summary information:

                     Average Value    Minimum Value    Maximum Value
  Abdomen ($x_1$)    93.365           74.6             148.1
  Wrist ($x_2$)      18.246           16.1             21.4
  Hip ($x_3$)        100.163          85.3             147.7
  Neck ($x_4$)       38.174           31.1             51.2
  Body Fat ($y$)     19.444           0.7              47.5

Consider fitting a linear function of the form $\hat{y} = a + bx_1 + cx_2 + dx_3 + ex_4$ using the method of least squares to estimate the coefficients ($a$ through $e$).

\[
\begin{bmatrix}
100.0 & 9336.5 & 1824.6 & 10016.3 & 3817.4 \\
9336.5 & 885594.7 & 171109.9 & 943738.8 & 359012.2 \\
1824.6 & 171109.9 & 33388.4 & 183261.6 & 69843.1 \\
10016.3 & 943738.8 & 183261.6 & 1009635.7 & 384080.0 \\
3817.4 & 359012.2 & 69843.1 & 384080.0 & 146454.6
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix}
=
\begin{bmatrix} 1944.4 \\ 190340.0 \\ 35815.9 \\ 199664.6 \\ 75630.6 \end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix}
\approx
\begin{bmatrix} 9.88 \\ 1.04 \\ -1.89 \\ -0.36 \\ -0.45 \end{bmatrix}
\]

Thus, the least squares regression formula is $\hat{y} = 9.88 + 1.04x_1 - 1.89x_2 - 0.36x_3 - 0.45x_4$. The $(\hat{y}_i, e_i)$ pairs are shown on the left below. Individuals with

1. the largest $\hat{y}_i$ and smallest $e_i$, and measurements (Abdomen, Wrist, Hip, Neck, Body Fat) = (148.1, 21.4, 147.7, 51.2, 35.2), and

2. the largest $e_i$, and measurements (Abdomen, Wrist, Hip, Neck, Body Fat) = (93.9, 17.3, 100.1, 39.1, 32.9),

have been highlighted in the plot. Any comments?
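As a starting point for the discussion, the predicted values and residuals of the two highlighted individuals can be computed from the regression formula. This is a rough sketch that uses the coefficients as rounded to two decimals in the notes, so the results are approximate; the array names are illustrative.

```python
import numpy as np

# Rounded coefficients (a, b, c, d, e) from the regression formula above.
coef = np.array([9.88, 1.04, -1.89, -0.36, -0.45])

# (abdomen, wrist, hip, neck) in cm for the two highlighted individuals.
measurements = np.array([[148.1, 21.4, 147.7, 51.2],
                         [93.9, 17.3, 100.1, 39.1]])
# Measured percent body fat for the same two individuals.
body_fat = np.array([35.2, 32.9])

# Prepend a column of ones so the intercept is included in the product.
X = np.column_stack([np.ones(len(measurements)), measurements])

y_hat = X @ coef        # predicted percent body fat
e = body_fat - y_hat    # residuals e_i = y_i - y_hat_i

print(np.round(y_hat, 3))
print(np.round(e, 3))
```

With these rounded coefficients the two predictions come out near 47.2 and 21.2 percent, giving residuals of about −12.0 and 11.7.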


4.3.3 Footnote: Eigenvalues, Eigenvectors and Least Squares Analysis

Let $M = A^TA$ be the matrix used in the normal equation for finding least squares solutions to inconsistent systems. $M$ is a symmetric matrix, satisfying the following properties:

1. $M$ is a diagonalizable matrix,

2. the eigenvalues of $M$ are nonnegative real numbers,

3. $M$ has an eigenvector basis that is an orthogonal set, and

4. $M$ is invertible if and only if all eigenvalues are positive.

If M is invertible, then the least squares solution is unique. Further, as long as the smallest eigenvalue is not “too close to zero,” then the computer will have no trouble finding the unique solution accurately.
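These properties can be illustrated on small matrices. The sketch below uses two made-up design matrices (assumptions for illustration, not data from the notes): `A` has independent columns, while the second column of `B` is a multiple of its first. It relies on NumPy's `eigh`, which is intended for symmetric matrices and returns eigenvalues in increasing order with orthonormal eigenvectors as columns.

```python
import numpy as np

# Hypothetical design matrix with linearly independent columns.
A = np.array([[1.0, 1.0],
              [1.0, 4.0],
              [1.0, 8.0],
              [1.0, 11.0]])
M = A.T @ A

# Eigenvalues (ascending) and orthonormal eigenvectors of symmetric M.
eigenvalues, P = np.linalg.eigh(M)
print(eigenvalues)                                      # both positive
print(np.allclose(P.T @ P, np.eye(2)))                  # orthogonal basis
print(np.allclose(P @ np.diag(eigenvalues) @ P.T, M))   # M = P D P^T

# Hypothetical rank-deficient design matrix: column 2 = 2 * column 1.
B = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [1.0, 2.0]])
eig_B = np.linalg.eigvalsh(B.T @ B)
print(eig_B)  # smallest eigenvalue is zero up to rounding: B^T B is singular
```

Since the columns of `P` are orthonormal, $P^{-1} = P^T$, so the factorization checked here is a special case of $M = PDP^{-1}$.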

Body fat study example, continued. Consider again the body fat study from the last section. $M = A^TA$ can be written as $M = PDP^{-1}$, where

\[
D \approx
\begin{bmatrix}
2072726.621 & 0 & 0 & 0 & 0 \\
0 & 2059.74 & 0 & 0 & 0 \\
0 & 0 & 330.145 & 0 & 0 \\
0 & 0 & 0 & 56.593 & 0 \\
0 & 0 & 0 & 0 & 0.198
\end{bmatrix}
\quad \text{and} \quad
P \approx
\begin{bmatrix}
-0.007 & -0.017 & 0.015 & -0.031 & 0.999 \\
-0.653 & 0.753 & 0.063 & -0.052 & 0.005 \\
-0.127 & -0.201 & 0.332 & -0.912 & -0.037 \\
-0.698 & -0.564 & -0.437 & 0.063 & -0.006 \\
-0.265 & -0.273 & 0.834 & 0.400 & -0.007
\end{bmatrix}.
\]

The eigenvalues of $M$ are written in decreasing order along the diagonal of $D$; all eigenvalues are “comfortably” greater than zero, implying that the computer had no trouble finding accurate least squares estimates of the coefficients of the prediction formula.
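The eigenvalues can be recomputed from the entries of $M$ printed in the previous section. This is only a rough check: those entries are rounded to one decimal, so the larger eigenvalues should be reproduced closely, while the smallest one (about 0.198 in the notes) is comparable in size to the rounding error and may come out noticeably different.

```python
import numpy as np

# Entries of M = A^T A for the body fat study, as printed (rounded) earlier.
M = np.array([
    [100.0, 9336.5, 1824.6, 10016.3, 3817.4],
    [9336.5, 885594.7, 171109.9, 943738.8, 359012.2],
    [1824.6, 171109.9, 33388.4, 183261.6, 69843.1],
    [10016.3, 943738.8, 183261.6, 1009635.7, 384080.0],
    [3817.4, 359012.2, 69843.1, 384080.0, 146454.6],
])

# eigvalsh returns eigenvalues of a symmetric matrix in increasing order;
# reverse them to match the decreasing order used in D.
eigenvalues = np.linalg.eigvalsh(M)[::-1]
print(eigenvalues)

# The eigenvalues of any square matrix sum to its trace.
print(eigenvalues.sum(), np.trace(M))
```

The ratio of the largest to the smallest eigenvalue is one common way to quantify how close $M$ is to singular.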

Finally, to improve the accuracy of least squares estimates in situations where eigenvalues may be close to zero (and M may be “close to singular”), practitioners use a singular value decomposition of M before trying to find the estimates.
