• No results found

Linear Algebra Methods for Data Mining

N/A
N/A
Protected

Academic year: 2021

Share "Linear Algebra Methods for Data Mining"

Copied!
37
0
0

Loading.... (view fulltext now)

Full text

(1)

Linear Algebra Methods

for Data Mining

Saara Hyv¨onen, [email protected]

Spring 2007

(2)

QR decomposition

• Any matrix A ∈ Rm×n, m ≥ n, can be transformed to upper triangular

form by an orthogonal matrix:

A = Q

R 0

(3)

How? By using the Givens rotation:

Let x be a vector. The parameters c and s, c2 + s2 = 1, can be chosen

so that multiplication of x by G =      1 0 0 0 0 c 0 s 0 0 1 0 0 −s 0 c     

(4)

θ

x

(5)

”Skinny” QR decomposition

Partition Q = (Q1 Q2), where Q1Rm×n: A = (Q1 Q2) R1 0 = Q1R1.

(6)

A = 1 1 1

1 2 4

1 3 9

1 4 16

[Q,R]=qr(A) %the QR decomposition

Q = -0.5000 0.6708 0.5000 0.2236 -0.5000 0.2236 -0.5000 -0.6708 -0.5000 -0.2236 -0.5000 0.6708 -0.5000 -0.6708 0.5000 -0.2236 R = -2.0000 -5.0000 -15.0000 0 -2.2361 -11.1803 0 0 2.0000 0 0 0

(7)

[Q,R]=qr(A,0) %the skinny version of QR Q = -0.5000 0.6708 0.5000 -0.5000 0.2236 -0.5000 -0.5000 -0.2236 -0.5000 -0.5000 -0.6708 0.5000 R = -2.0000 -5.0000 -15.0000 0 -2.2361 -11.1803 0 0 2.0000

(8)

Uniqueness?

Suppose A ∈ Rm×n has full column rank (i.e. the column vectors are

linearly independent). The ”skinny” QR factorization

A = Q1R1

is unique where Q1 has orthonormal columns and R1 is upper triangular

(9)

What is QR good for?

• It will come up when we discuss the computation of some matrix

decompositions

• It can also be used for solving the least squares problem.

(10)

Least squares for linear regression

• Consider problems of the following form: given a number of

measure-ments X = [x1...xn] and an outcome y, build a linear model

ˆ y = b0 + n X j=1 xjbj,

which uses the measurements X to predict the outcome y.

• X = clinical measures of patients, y = level of cancer specific antigen.

• X = atmospheric measurements of each day, y = occurence of

(11)

Least squares for linear regression

• The model yˆ = b0 + Pnj=1 xjbj can be written in the form

ˆ

y = XTb, b = (b1 ... bn b0)T.

To do this, one must append X = [x1 ... xn xn+1], where xn+1 is a

vector of ones.

• Fitting a linear model to data is usually done using the method of least

squares.

(12)

Example

We have measurement data:

x 1 2 3 4 5

(13)

0 1 2 3 4 5 6 6 8 10 12 14 16 18 20 22

(14)

Example (continued)

x 1 2 3 4 5

y 7.97 10.2 14.2 16.0 21.2

We wish to find α and β such that αx + β = y. Thus

α + β = 7.97 2α + β = 10.2 3α + β = 14.2 4α + β = 16.0 5α + β = 21.2

(15)

Example (continued)

In matrix form:        1 1 2 1 3 1 4 1 5 1        α β =        7.97 10.2 14.2 16.0 21.2       

Overdetermined! (More equations than unknowns.) Solve using the least squares method.

(16)

Example

• X = atmospheric measurements of each day: temperature, UV-A:

X = t1 t2 ... tn r1 r2 ... rn = temperatures UV-A measurements

• y = which month each day belongs to. Choices are:

August (y = 0) or September (y = 1).

• Looking for b such that yˆ = XTb and yˆ ≈ y.

(17)

−1.5 −1 −0.5 0 0.5 1 1.5 −1.5 −1 −0.5 0 0.5 1 1.5

August days (b) and September days (r)

UV

(18)

The least squares problem

• Let A ∈ Rm×n, m ≥ n. The system

Ax = b

is called overdetermined: more equations than unknown. Usually such

(19)

Example

(20)

What to do?

• make the residual vector

r = b − Ax

as small as possible. But how?

• Make r orthogonal to the columns of A:

rT a1 a2 ... an = rTA = 0.

• Now write r = b − Ax to get the normal equations

(21)

Example

A= 1 1 b= 7.97 2 1 10.2 3 1 14.2 4 1 16.0 5 1 21.2

C=A’*A %Normal equations

C= 55 15

15 5

x=C\(A’*b)

x= 3.2260

(22)

• If the column vectors of A are linearly independent, then the normal equations

ATAx = ATb

are non-singular and have a unique solution

• BUT: the normal equations have two significant drawbacks:

– forming ATA leads to loss of information.

– the condition number of ATA is the square of that of A:

(23)

Example

A = 1 1 2 1 3 1 4 1 5 1 cond(A) = 8.3657 cond(A’*A) = 69.9857

(24)

Worse example

A = 101 1 102 1 103 1 104 1 105 1 cond(A) = 7.5038e+03 cond(A’*A) = 5.6307e+07

(25)

Solving least squares problem using QR

krk2 = kb − Axk2 = kb − Q R 0 xk2 = kQ(QTb − R 0 x)k2 = kQTb − R 0 xk2

Partition Q = (Q1 Q2), where Q1Rm×n, and denote

QTb = b1 b := QT1 b QTb .

(26)

Then krk2 =k b1 b2 − R 0 x)k2 = kb1 − Rxk2 + kb2k2.

Minimize krk by making the first term equal to zero: i.e. solve

(27)

LS by QR

Theorem. Let A ∈ Rm×n have full column rank and a thin

QR-decomposition A = Q1R. Then the least squares problem

min

x kAx − bk2

has the unique solution

(28)

Example

A= 1 1 b= 7.97 2 1 10.2 3 1 14.2 4 1 16.0 5 1 21.2 [Q1,R]=qr(A,0) %skinny QR Q1 = -0.1348 -0.7628 -0.2697 -0.4767 -0.4045 -0.1907 -0.5394 0.0953 -0.6742 0.3814

(29)

R = -7.4162 -2.0226

0 -0.9535

x=R\(Q1’*b)

x = 3.2260

(30)

Back to this example:

• X = atmospheric measurements of each day: temperature, UV-A:

X = t1 t2 ... tn r1 r2 ... rn = temperatures UV-A measurements

• y = which month each day belongs to. Choices are:

August (y = 0) or September (y = 1).

• Looking for b such that yˆ = XTb and yˆ ≈ y.

(31)

−1.5 −1 −0.5 0 0.5 1 1.5 −1.5 −1 −0.5 0 0.5 1 1.5

August days (b) and September days (r)

UV

(32)

Example

%X (transpose of) data matrix, 1st col: temperature, 2nd col: UV-A %y month vector, y(j)=0 if August and 1 if September

[length(find(y==0)) length(find(y==1))] = 116 99

%number of August and September days

size(X) = 215 2 %size of data matrix

Xap=[X ones(215,1)]; [Q1,R]=qr(Xap,0);

b=R\(Q1’*y) %this is the same as b=inv(R)*(Q1’*y)

b= -0.4355

(33)

yhat=Xap*b; %least squares solution

[min(yhat) max(yhat)] = -0.3506 1.2436

alpha=0.5; %day classified as September if x’y>alpha

length(find(1.0*(yhat>alpha)-y)) = 34 %misclassifications

x=-1:0.01:1;db=(alpha-b(3)-b(1)*x)/b(2);%decision boundary x’b=alpha %attn: in reality, choose alpha with more care!

scatter(Xnorm(:,1),Xnorm(:,2),20,y,’filled’) hold;plot(x,db,’k’)

(34)

−1.5 −1 −0.5 0 0.5 1 1.5 −1.5 −1 −0.5 0 0.5 1 1.5

August days (b) and September days (r)

UV

(35)

Updating the solution of the LS problem

Assume we have reduced the matrix and the right hand side

(A b) → QT(A b) = R QT1 b 0 QT2 b .

From this the solution of the LS problem is readily available.

Assume we have not saved Q.

We then get a new observation (a b), a ∈ Rn, b ∈ R.

(36)

Updating the solution of the LS problem

No. Instead, write the reduction on the previous slide as

A b aT b → QT 0 0 1 A b aT b =    R QT1 b 0 QT2 b aT b   .

(37)

References

[1] Lars Eld´en: Matrix Methods in Data Mining and Pattern Recognition,

SIAM 2007.

[2] G. H. Golub and C. F. Van Loan. Matrix Computations. 3rd ed. Johns Hopkins Press, Baltimore, MD., 1996.

[3] T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Data mining, Inference and Prediction, Springer Verlag, New York, 2001.

References

Related documents

Mimecast, the leading supplier of cloud-based email archiving, continuity and security for Microsoft Exchange and Office 365, summarizes its findings from a survey of more than

Various methods of Natural Language Processing (NLP) are involved in Text Analysis like Lexical Analysis, Pattern Recognition, Information Retrieval, Data Mining, Parsing,

6.7.2 The association between being bullied and perceived health status and school related factors: ...259 6.7.3 The association between being bullied and other risk behaviours:

Rather God told Hosea to go and marry an adulterous woman to us as an example of His love for the nation of Israel even though they had been unfaithful to Him just

Fatum, Rasmus and Barry Scholnick (2008): ―Monetary Policy News and Exchange Rate Responses: Do Only Surprises Matter?, Journal of Banking and Finance, 32, 1076-1086. Rogers;

The Australian Quarantine and Inspection Service (AQIS) proposes that the Australian health certificate ("El71 - Official Certificate with respect to Meat, bIeat Products and

This (simplified) way, thought leadership also appears below the line. In order to become a thought leader, INV needs to have a novel standpoint. This point- of-view can be

We study the computational complexity of the following decision problems related to the L -cyclomatic number of directed hypergraphs: computing the L -cyclomatic number of a