Exploration and Analysis of a Method for Estimating the Rank of a Matrix

ORLOWSKI, NICHOLAS RICHARD. Exploration and Analysis of a Method for Estimating the Rank of a Matrix. (Under direction of Dr. Robert Funderlic.)

The singular value decomposition (SVD) provides important information about a matrix and its rank, including its singular values and singular vectors. Because of noise in a matrix and the limitations of binary representation, the calculated SVD of a matrix is necessarily an estimate of the true SVD of that matrix. Our method approximates the true singular values of a matrix by gathering samples of the calculated singular values of the matrix. With these approximations, we can use hypothesis testing to make quantitative statements

ESTIMATING THE RANK OF A MATRIX

by

NICHOLAS ORLOWSKI

A thesis submitted to the Graduate Faculty of

North Carolina State University

in partial fulfillment of

the requirements for the Degree of

Master of Science

COMPUTER SCIENCE

Raleigh, North Carolina

2007

APPROVED BY:

Dr. David Thuente Dr. Donald Bitzer

Dr. Robert Funderlic

I was born in Hickory, North Carolina on June 20, 1982. While growing up, I had an insatiable appetite for science. At an early age, I also developed a proficiency with computers, such as the Apple II and the original 1984 Macintosh. I spent my most formative years attending high school at Phillips Academy, a boarding school in Andover, Massachusetts. There, I learned the importance of imaginative thinking and an appreciation for all disciplines, be they science and math, or art and poetry. With this new mindset, I came to North Carolina State University with an eager mind, and was fortunate enough to discover the undergraduate research program. This program fed my curious mind to the brim and has made my undergraduate and graduate careers complete. What is presented in this paper is a culmination of what

I would like to express my deep appreciation for those who have made my graduate and undergraduate careers a success. I would like to thank my family for providing this wonderful educational opportunity. I would like to thank my committee chairman and adviser, Dr. Funderlic, for his tireless efforts in ensuring my success at NCSU. I truly cherish the time I spent with him, and appreciate both his lessons in numerical analysis and the life experiences he was willing to share.

I would like to thank Dr. Chu, who taught me to expect more from myself. Dr. Chu graciously guided me through the factor analysis and statistics needed to develop this thesis and supplied the foundation for my understanding of the SVD. It was a pleasure working with and learning from him.

Dr. Bitzer deserves recognition for imparting on me his teaching and learning styles. His ability to simplify problems to their core helped me learn to see solutions which may not be immediately apparent. It has been an honor working with him.

I would also like to thank Dr. Thuente for serving on my committee, as well as providing guidance for me in my graduate career. There are innumerable others: friends, teachers and teammates who have had significant influences on me, and I

List of Figures

1 Introduction
2 Singular Value Decomposition
2.1 The Meaning of Zero
2.2 Example SVD Application
2.3 Filtering Noise with the SVD
2.4 Goal of the Method
3 Statistical Rank Determination
3.1 Gathering Samples
3.2 Hypothesis Testing
3.3 Foreseen Contingencies
4 Empirical Support
4.1 Does the Method Work Under Ideal Circumstances?
4.2 How Does Noise in the Matrix Affect the Rank Estimate?
4.3 How Does the Manual Perturbation Term Affect the Rank Estimate?
5 Computational Considerations
5.1 Complexity
6 Conclusions and Future Work

1 A full rank $250 \times 250$ square image $A$ with values from 0 to 1
2 The image of the first six factors of $A$ ($u_i \varsigma_i v_i^T$), $i = 1, \ldots, 6$
3 The first 6 factors of $A$ combined
4 The combination of the first 15 factors of $A$
5 The combination of the first 25 factors of $A$
6 $A$ and its approximation $A'$
7 Reconstructions of $A$ and $A'$ using their first 35 factors
8 The method is accurate for a matrix without added noise.
9 The method correctly determines the rank of a matrix with little static noise.
10 Large variance in the static noise matrix, $E$, generates Type I errors, over-determining the matrix rank.
11 Large variance in the manual perturbation matrix, $F$, causes Type II errors, and an under-estimate in rank.

1 Introduction

The Singular Value Decomposition (SVD) of a matrix is invaluable in a wide range of disciplines: image analysis and restoration [12, 7], structural vibration analysis [?], investigation of weather patterns [13], document retrieval (LSI) [10], clustering [8], and econometrics [4]. This is only the beginning of a long list of potential applications of the SVD. Many of these applications use the number of non-zero singular values of a matrix to determine its rank. However, the rank of a matrix is not always well-defined, as there are several factors which may affect the calculation of the SVD in an unwanted or unexpected fashion:

• Unwanted sampling noise in the matrix
• Limitations of numerical precision
  – Finite binary representation of the matrix
  – Unavoidable round-off error in computer algorithms.

Let $A$, $A'$ and $E$ be square $m \times m$ matrices. Let $A$ represent a theoretical matrix, whose elements are represented with infinite precision. This matrix is unknown to us, but we can approximate it on a binary computer by $A'$. Let $E$ represent the accumulation of sampling error in $A$ and rounding error when representing $A$ on a binary computer. We will assume that these rounding errors are distributed Normally with mean zero. We can represent our approximation of $A$ by

$$A' = A + E. \quad (1)$$

representation of a matrix $A$ imprecise, we must assume that the SVD of $A$ is imprecise as well. Therefore, determining the rank derived from the singular values of that matrix becomes a subjective task. There are many methods and metrics that provide approximate measures of rank, but whichever technique we use, knowledge of the problem is required in order to interpret these measures correctly (are we finding the rank of an image matrix or of a correlation matrix?). Different applications require different interpretations when deriving rank from singular values.

Some of these applications use the relative distances between singular values of a matrix as a guideline for rank [9, 16], while others perform fast rank-reduction to approximate the dimension of the column space of the matrix [5, 6, 14]. Some methods are as simple as specifying a singular value cut-off point to determine rank [15, 2] (Matlab uses machine zero, for instance).

The method introduced in this paper uses statistical analysis and hypothesis testing in an attempt to measure the relative volatility [17] of the calculated singular values of a matrix. This method is based on a similar method for least squares coefficients by Hastie et al. [11]. Large singular values will have lower relative volatility, while small singular values will have higher relative volatility. If the volatility of the singular values is known, the rank of the matrix can be determined with respect to a desired level of certainty. For instance, we will be able to say that we are 99% confident that a particular singular value is zero. With this information, we can subjectively decide the number of non-zero singular values and thus, the rank of the matrix.

the rank of $A$ within a certain probability in the face of noise-producing factors?

In Section 2 we will examine the singular value decomposition and illustrate some of its uses. This section will include a discussion of the meaning of "zero" on a finite-precision binary computer. Section 3 describes the method, how we apply hypothesis testing and the consequences that this testing has. Section 4 presents evidence of the validity and accuracy of the method and illustrates its pitfalls regarding statistical errors. Section 5 shows how we can optimize the method, circumventing the need to calculate more than one full singular value decomposition. Finally, in Section 6 we

2 Singular Value Decomposition

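Consider the square matrices $A \in \mathbb{R}^{m \times m}$, $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times m}$ and $V^T \in \mathbb{R}^{m \times m}$. The singular value decomposition of $A$ is given by

$$A = U \Sigma V^T = \sum_{i=1}^{m} \varsigma_i u_i v_i^T.$$

The columns in the matrix $U$, $u_i$, are the left singular vectors of $A$ while the column vectors in $V$, $v_i$, are the right singular vectors of $A$. Both $U$ and $V$ are orthogonal matrices and create orthogonal bases for $A$. The matrix $\Sigma$ is a diagonal matrix whose main diagonal consists of the singular values $\varsigma_i$, in descending order, of $A$,

$$\Sigma = \begin{bmatrix} \varsigma_1 & 0 & \cdots & 0 & 0 \\ 0 & \varsigma_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \varsigma_{m-1} & 0 \\ 0 & 0 & \cdots & 0 & \varsigma_m \end{bmatrix}.$$

Let the vector consisting of the main diagonal of $\Sigma$ (vector of singular values) be denoted by $\varsigma$ so as not to confuse it with the typical notation for sample variance, $\sigma^2$.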
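As a small illustration of the decomposition above, the following sketch computes the SVD of a random square matrix and rebuilds the matrix from its rank-one terms $\varsigma_i u_i v_i^T$. It assumes NumPy, which the thesis itself does not mention; the matrix size and random seed are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6                                   # arbitrary size chosen for illustration
A = rng.standard_normal((m, m))

# numpy returns U, the singular values in descending order, and V^T
U, s, Vt = np.linalg.svd(A)

# Rebuild A as the sum of the rank-one terms varsigma_i * u_i * v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(m))

# Distance between A and the rebuilt matrix (small, but limited by rounding)
reconstruction_error = np.linalg.norm(A - A_sum)
```

The array `s` plays the role of the vector $\varsigma$; the reconstruction error is small but generally not exactly zero, which anticipates the discussion of "zero" that follows.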

The singular values are the loadings of the column spaces of $U$ and $V$ on the matrix $A$. Thus, if one of the singular values is zero, the matrix is rank-deficient, as the column or row space of $A$ is not fully spanned. The rank of a matrix can be obtained by counting the number of non-zero singular values. However, counting the

2.1 The Meaning of Zero

When counting non-zero components, a problem arises: What is meant by zero? On a machine with limited numerical precision, the answer is not simple. It is easy to determine if a calculated singular value is zero by inspection, but how sure are we that the actual singular value is 0? (The probability of a calculated value being exactly 0 is very small.) Recall from Eq. 1 that we can represent the distance between the representation of a matrix $A'$ and its true value as

$$A' = A + E.$$

By estimating $E$, we can obtain a good idea of how close a "zero" singular value in $A'$ is to a truly zero singular value in $A$. Overton [17] provides an introduction to these numerical pitfalls. The following experiments will provide support that a combination of linear algebra and statistical hypothesis testing can provide a way to quantify the distribution of $E$, and how sure we are that a given singular value of $A$ is zero. With this information, we can also determine a probable rank of the matrix $A$.
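To make the difficulty concrete, here is a minimal sketch, assuming NumPy and a hypothetical $5 \times 5$ test matrix built as a product of $5 \times 2$ and $2 \times 5$ factors (so its exact rank is 2). In exact arithmetic the last three singular values would be 0; in floating point they typically come out as tiny non-zero numbers, so inspection alone does not settle the question.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical test matrix: a product of 5x2 and 2x5 factors has rank 2
P = rng.standard_normal((5, 2))
Q = rng.standard_normal((2, 5))
A = P @ Q

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# In exact arithmetic s[2:] would be 0; computed values are merely tiny
trailing = s[2:]

# numpy's built-in rank uses a fixed cutoff on the singular values
cutoff_rank = np.linalg.matrix_rank(A)
```

`np.linalg.matrix_rank` is an example of the cut-off approach mentioned in the introduction; it is shown only for comparison and is not the method proposed here.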

2.2 Example SVD Application

The singular value decomposition of a matrix can be a hard concept to grasp, so we will show an example of its use in image processing as a primer for understanding the SVD.

In Fig. 1, we have a black and white picture of a famous numerical analyst, Alston Householder [1]. This image is a $250 \times 250$ matrix, $A$, of intensities between black

Figure 1: A full rank $250 \times 250$ square image $A$ with values from 0 to 1

For our purposes, this is the exact representation of $A$. Here, $A' = A$ and $E = 0$. Let $A = U \Sigma V^T$ and let $u_i$ be the $i$th column vector in $U$ and $v_i$ be the $i$th column vector in $V$. In the interest of brevity, let us call the matrix $u_i \varsigma_i v_i^T$ the $i$th factor of $A'$. The SVD of this image will be

$$\begin{bmatrix} u_1 & u_2 & \cdots & u_{249} & u_{250} \end{bmatrix} \begin{bmatrix} \varsigma_1 & 0 & \cdots & 0 & 0 \\ 0 & \varsigma_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \varsigma_{249} & 0 \\ 0 & 0 & \cdots & 0 & \varsigma_{250} \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_{249}^T \\ v_{250}^T \end{bmatrix}.$$

Because we have assumed this image has full rank, we know that each $\varsigma_i$ is positive (greater than zero). Notice that even though $A' = A$, the calculated decomposition of $A'$ will not generally be the exact decomposition of $A'$

equal a true result is negligible. Potential rounding error is injected into a problem any time there is a calculation. In this paper, we are considering only those numerical errors which occur in the representation of $A$ and the calculation of the SVD of $A'$. We will assume that the effects of further rounding errors are negligible. We will next show exactly what it means to decompose a matrix. First, let us answer a question about the meaning of a factor matrix, $u_i \varsigma_i v_i^T$. If the image is the summation of the matrices

$$u_1 \varsigma_1 v_1^T + u_2 \varsigma_2 v_2^T + \cdots + u_{249} \varsigma_{249} v_{249}^T + u_{250} \varsigma_{250} v_{250}^T,$$

what will one of these matrices look like? Because each of these matrices is only a rank 1 matrix, none of these matrices alone can be expected to look like the original image, which is of significantly higher rank. Fig. 2 shows the images of the first 6 factors:

In Fig. 2, none of these images alone represents the original image to the point where we can assess some obvious similarity. The reason for this is that each of these matrices is of rank 1 and cannot be expected to accurately represent a rank 250 matrix. Notice, however, that the image of the first factor favors the average horizontal intensity of the original image, while the second factor (orthogonal to the first) favors its average vertical intensity. If a rank 1 matrix is insufficient to represent the full image, could we get a better representation with a rank 6 matrix, created by

Figure 2: The image of the first six factors of $A$ ($u_i \varsigma_i v_i^T$), $i = 1, \ldots, 6$ (panels: Factor 1 through Factor 6)

Figure 3: The first 6 factors of $A$ combined (panels: Original Image, First 6 Factors)

In Fig. 3, the combination of these factors looks "more like" the image of $A$. Notice that we are still making subjective judgements about how much each factor contributes to the "likeness" of $A$. Suppose we combine the first 15 and the first 25

Figure 4: The combination of the first 15 factors of $A$ (panels: Original Image, First 15 Factors)

Figure 5: The combination of the first 25 factors of $A$ (panels: Original Image, First 25 Factors)

We can judge that these images quickly approach visual similarity to the original image. The question arises: How many factors do we need to add in order to obtain a "good" representation of the original image? This question sparks our motivation for finding the rank of $A$. If we have a good estimate of the rank of a matrix, we can decide at which point factors stop contributing meaningfully to the approximation of $A$. If we decide a factor is unimportant, we can disregard it, saving storage space and calculation time. In many applications, these "less-important" factors are considered noise, and in this way can be filtered from the matrix.
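The effect of combining the first $k$ factors can be sketched numerically. The example below assumes NumPy and uses a random $250 \times 250$ matrix with entries in $[0, 1)$ as a stand-in for the image, since the Householder picture is not available here; the helper name `rank_k_approximation` is hypothetical. It records the relative Frobenius-norm error of the approximation for several values of $k$.

```python
import numpy as np

def rank_k_approximation(U, s, Vt, k):
    """Sum of the first k factors u_i * varsigma_i * v_i^T."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
A = rng.random((250, 250))              # stand-in for an image with values in [0, 1)
U, s, Vt = np.linalg.svd(A)

# Relative Frobenius error of the approximation for a few values of k
errors = {
    k: np.linalg.norm(A - rank_k_approximation(U, s, Vt, k)) / np.linalg.norm(A)
    for k in (6, 15, 25, 250)
}
```

The error shrinks as factors are added and is limited only by rounding once all 250 are used. A random matrix has far less structure than a photograph, so its error falls more slowly than the figures above suggest for the image.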

2.3 Filtering Noise with the SVD

If we consider the factors corresponding to the largest singular values to contain the most information about an image, we can make the opposite statement about the smallest factors. If there is noise in the matrix, the smaller singular values will contain less information about $A'$ and more information about $E$, the error inherent in the matrix. Keeping in mind that the singular values are ordered, if we can determine at which point these factors begin to describe $E$, we can attempt to remove them

Let us compare the effects of reconstructing an image without noise along with that same image with noise. Fig. 6 shows our original image, $A$, and that same image with artificial noise added, $A'$.

Figure 6: $A$ and its approximation $A'$ (panels: Original Image, Image with Noise Added)

Both images in Fig. 6, $A$ and $A'$, are of full rank and each contains all of its factors. Keep in mind that we are only given $A'$ and $A$ is unknown to us. Suppose we estimate that the first 35 factors are sufficient to approximate $A$ and that the rest of the factors are descriptions of noise. We can construct an image of the first 35 factors of $A$ and compare it to the first 35 factors of $A'$ as in Fig. 7.

The images in Fig. 7 are more similar to each other than those in Fig. 6; our rank 35 approximation of $A'$ is "less noisy" than our full rank approximation of $A'$.

Figure 7: Reconstructions of $A$ and $A'$ using their first 35 factors (panels: First 35 Factors From Noiseless Image, First 35 Factors From Noisy Image)

7 is noticeably different from $A$ in Fig. 6. We must be careful not to sacrifice too much information about $A$ in our attempts to eliminate information about $E$. This issue further illustrates the importance of having a reliable estimate of the rank of a matrix.
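The filtering idea can be tried on synthetic data where the true rank is known. The sketch below assumes NumPy and a hypothetical low-rank matrix $A$ (size 100, rank 10) with Gaussian noise $E$ of standard deviation 0.1; these values are illustrative and do not come from the thesis. It keeps only the first $r$ factors of $A' = A + E$ and compares the distance to $A$ before and after.

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 100, 10                               # hypothetical size and true rank
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
E = 0.1 * rng.standard_normal((m, m))        # noise with standard deviation 0.1
A_noisy = A + E                              # plays the role of A'

U, s, Vt = np.linalg.svd(A_noisy)
A_filtered = (U[:, :r] * s[:r]) @ Vt[:r, :]  # keep only the first r factors

err_noisy = np.linalg.norm(A_noisy - A)      # equals the Frobenius norm of E
err_filtered = np.linalg.norm(A_filtered - A)
```

Here the truncation is given the correct rank; with an unknown rank, discarding too many factors would remove part of $A$ as well, which is the trade-off described above.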

2.4 Goal of the Method

As stated above, the importance of each factor is subjective to the viewer. We know that this is an image, and can visually assess that a factor with a small singular value does not contribute much to an image. However, it is impractical to view the matrices associated with most problems as an image. These matrices are often too large or visually unrecognizable.

When comparing two matrices, we can easily define an objective measure to describe their similarity. However, we cannot define an objective mechanism for determining if two matrices are "alike enough" to be considered the same. As with the

their rank.

Many methods use singular value cutoffs to determine which factors are important, but these methods are still subject to user judgement (Why is this cutoff good? Do we know something about the matrix?). We do not claim that the method presented in this paper provides objective knowledge of matrix rank. Instead, it allows the user to specify a threshold of certainty that a particular singular value is zero using statistical analysis of the singular values of a matrix.

The next section outlines our method and shows how we use the SVD to obtain measurements of the rank of a matrix.

3 Statistical Rank Determination

For the experiments in this paper, we will assume that we have the calculated singular value decomposition of $A'$ using a numerically stable method. More specifically, assume we have the singular values associated with the matrix $A'$ in a vector, $\varsigma$, and the left and right singular vectors in the matrices $U$ and $V$, respectively. Keep in mind that the SVD of $A'$ has incurred additional rounding error from its calculation.

As stated above, we wish to determine the rank of $A$. In order to do this we must find the number of non-zero singular values. However, we know that there is uncertainty in the likelihood that the calculated singular values equal the actual singular values.

Because of these inherent inaccuracies, the best we can do is quantify our certainty that a given singular value is zero. In this way, we can count the number of

the rank of $A$. We will use the Student's t-test to obtain our estimate. Given $A'$, the approximation of $A$, the basic method is as follows:

1. Calculate the SVD of $A'$.
2. Gather samples of the singular values of $A'$ by projecting Gaussian noise with known variance onto the vector spaces spanned by $U$ and $V^T$.
3. Calculate the sample mean and sample variance from the perturbed singular value samples.
4. Use those statistics to accept or reject the hypothesis that a singular value might be non-zero using the Student's t-test.

3.1 Gathering Samples

In order to use any statistical hypothesis testing method, a sample must be gathered from which to calculate the sample mean and variance. For this method, we are gathering samples of $\varsigma$, the vector of singular values of $A$.

Recall that $A$ is a matrix where $A \in \mathbb{R}^{m \times m}$ with rank $r$. We have the relation

$$A = U \Sigma V^T = \sum_{i=1}^{m} \varsigma_i u_i v_i^T,$$

where the singular values of $A$ are

$$\varsigma = [\varsigma_1, \varsigma_2, \ldots, \varsigma_{m-1}, \varsigma_m]^T.$$

Next, let us introduce a normally distributed random artificial perturbation matrix, $F \in \mathbb{R}^{m \times m}$,

$$F \sim N(0, \sigma^2). \quad (2)$$

This $F$ matrix will serve as a perturbation tool, which we will use to test the orthogonality of the column space of $A$. By perturbing the matrix $A'$ $n$ times, we can gather a sample of $n$ perturbed matrices,

$$A^{[j]} := A' + F^{[j]}, \quad j = 1, \ldots, n. \quad (3)$$

By minimizing each

$$\varsigma^{[j]} := \arg\min_{\varsigma \in \mathbb{R}^m} \Big\| A^{[j]} - \sum_{i=1}^{m} \varsigma_i u_i v_i^T \Big\|, \quad j = 1, \ldots, n, \quad (4)$$

we obtain $n$ vectors, each of which is a sample of the singular values of $A'$, $\varsigma^{[1]}, \varsigma^{[2]}, \ldots, \varsigma^{[n]}$.

A discussion of a more efficient way to perform these samples comes later. It suffices to say that we do not compute the SVD for each sample. Each of these samples will be distributed with variance $\sigma^2$ and mean $\varsigma$, such that

$$\varsigma^{[j]} \sim N(\varsigma, \sigma^2) \;\Rightarrow\; \varsigma_i^{[j]} \sim N(\varsigma_i, \sigma^2).$$
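The sampling step of Equations 3 and 4 can be sketched as follows. This assumes NumPy and, since the text does not name the norm in Equation 4, assumes the Frobenius norm; under that assumption the matrices $u_i v_i^T$ are orthonormal and the minimizing coefficient for index $i$ is simply $u_i^T A^{[j]} v_i$. The function name `sample_singular_values` is hypothetical.

```python
import numpy as np

def sample_singular_values(A_prime, n, sigma, rng):
    """Return (s, samples); row j of samples is the minimiser of Eq. 4 for A^[j].

    Assumes the norm in Eq. 4 is the Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(A_prime)
    m = A_prime.shape[0]
    samples = np.empty((n, m))
    for j in range(n):
        F = sigma * rng.standard_normal((m, m))   # F^[j] ~ N(0, sigma^2), Eq. 2
        A_j = A_prime + F                         # Eq. 3
        # Frobenius-norm minimiser: coefficient i is u_i^T A^[j] v_i
        samples[j] = np.diag(U.T @ A_j @ Vt.T)
    return s, samples

rng = np.random.default_rng(4)
A_prime = rng.standard_normal((20, 20))           # arbitrary test matrix
s, samples = sample_singular_values(A_prime, n=30, sigma=1.0, rng=rng)
```

Each column of `samples` holds $n$ draws for one singular value, scattered around the calculated value in `s`. Note that only one SVD is computed, consistent with the remark above.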

3.2 Hypothesis Testing

Now that we have samples of each of our singular values, $\varsigma_i^{[j]}$, $i = 1, \ldots, m$; $j = 1, \ldots, n$, we can calculate the sample mean, $\bar{\varsigma}_i$, and sample variance, $s_i^2$

value to be complete, we must choose a value with which to compare the sample mean. Let us call this value $\mu = 0$. The formula for calculating the t-value gives us

$$t_i = \frac{\bar{\varsigma}_i - \mu}{s_i / \sqrt{n - 1}}, \quad i = 1, \ldots, m.$$

With this t-value, we can make quantitative statements about our certainty that a particular singular value is 0.

We are projecting random matrices with known mean and variance onto the vector spaces spanned by $U$ and $V^T$. By perturbing the column and row spaces of $A$, we are similarly perturbing the singular values of $A$. If $A$ is perfectly orthogonal, the singular values obtained from Equation 4 will change very little. If we encounter a singular value that, when perturbed, varies by a large amount, it is probable that that singular value is zero. This points out a deficiency in rank. We wish to use hypothesis testing [3] in order to determine with a certain probability which singular values are zero, and thus, an estimate of the rank of $A$. Let the null and alternative hypotheses be

$$H_0 : \varsigma_i = 0$$

and

$$H_1 : \varsigma_i \neq 0.$$

We will choose a confidence level of 99%, so that a true $H_0$ is wrongly rejected at most 1% of the time, and we will take $n = 30$ samples. By referring to the corresponding t-distribution table value, we see that in order to reject $H_0$

$$|t_i| \geq 2.75.$$

If the t-value meets this requirement, we can reject $H_0$, labeling $\varsigma_i$ as non-zero. If this requirement is not met, we cannot reject $H_0$ and must allow for the possibility that $\varsigma_i = 0$. Examples in the next section demonstrate an application of this method to a real matrix.
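The test statistic and the rejection rule can be written compactly. The sketch below assumes NumPy and assumes that $s_i$ is the sample standard deviation computed with divisor $n$, which is what makes the $\sqrt{n - 1}$ in the formula the usual standard error; the thesis does not state this explicitly. The function names and the tiny data set are hypothetical.

```python
import numpy as np

def t_values(samples, mu=0.0):
    """t_i = (mean_i - mu) / (s_i / sqrt(n - 1)), one value per column."""
    n = samples.shape[0]
    mean = samples.mean(axis=0)
    s = samples.std(axis=0)            # divisor n (assumed), pairs with sqrt(n - 1)
    return (mean - mu) / (s / np.sqrt(n - 1))

def estimate_rank(samples, threshold=2.75):
    """Count the singular values for which H0 (varsigma_i = 0) is rejected."""
    return int(np.sum(np.abs(t_values(samples)) >= threshold))

# Tiny hand-made example: column 0 clusters near 10, column 1 near 0.
# (2.75 is the table value for n = 30; it is reused here only to show the mechanics.)
samples = np.array([[10.1,  0.3],
                    [ 9.9, -0.2],
                    [10.0,  0.1],
                    [10.2, -0.3],
                    [ 9.8,  0.1]])
t = t_values(samples)
rank = estimate_rank(samples)
```

In this toy data the first column produces a very large t-value and is labelled non-zero, while the second column cannot be distinguished from zero.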

3.3 Foreseen Contingencies

As with any statistical hypothesis method, there will be uncertainties. The classical Type I and Type II errors are unavoidable. Under ideal circumstances, the matrix $A$ would be devoid of static noise, and its corresponding singular values would be precisely represented in binary format. This would allow for simple inspection to determine its rank. However, because we do not know the true singular values, we can never know if this is the case. We must assume two sources of error:

• The noise matrix $E$, which incorporates the static sampling error with the error from numerical representation. The errors contained in $E$ are inherent in the representation of $A$ and are unavoidable.
• The variance in $F$, the artificial perturbation matrix, may be inappropriately specified.

In a Type I error, $H_0$ would be incorrectly rejected, meaning that a singular value which is truly 0 would be determined not to be. These errors will over-determine the estimate of the rank. In a Type II error, $H_0$ would be accepted erroneously,

is no cure for this phenomenon. We must be aware of the problem and temper our conclusions accordingly.

4 Empirical Support

We will now evaluate the above method for its accuracy and determine which factors affect the volatility of the method with respect to Type I and Type II errors. First, we will create pre-formed matrices with a known rank and noise. These toy examples will serve to evaluate the accuracy and sensitivity of the method. Keep in mind, the decompositions of these matrices will not be exact due to inevitable round-off error in their calculation. Important questions we will address in this section are:

1. Does the method work under ideal circumstances?
2. How does the variance of $E$ affect the estimate of rank?
3. How does the variance of $F$ affect the estimate of rank?

For all of the following experiments:

• The acceptance threshold for the t-test will be 99%, or $|t_i| \geq 2.75$.
• The singular values for each matrix will be sampled 30 times.
• symbols will represent singular values which we are 99% confident are zero.
• symbols will represent those singular values which we are 99% sure are not

4.1 Does the Method Work Under Ideal Circumstances?

For a proof of concept, we constructed a standard matrix from which to base our conclusions. Let the matrix $A = PQ$, where $P \in \mathbb{R}^{50 \times 17}$ and $Q \in \mathbb{R}^{17 \times 50}$ are random, full-rank matrices with Gaussian distributions. Such a construction will guarantee that $A$ is a square, $50 \times 50$ matrix which has rank 17. Again, the SVD of this matrix is limited by machine precision.

Because we wish to prove the concept under ideal conditions, we will assume that $A' = A$ and will not add an error term at this time ($E = 0$). For this experiment, $F \sim N(0, 1)$. The sign of success will be that the method rejects $H_0$ 17 times. This happens when the t-value is above the 99% threshold, or 2.75.
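A setup along these lines can be sketched in a few lines. The code below assumes NumPy, standard normal entries for $P$ and $Q$ (the thesis says only "Gaussian"), and the sampling shortcut of adding the diagonal of $U^T F V$ to the calculated singular values; the function name `statistical_rank` and the seeds are hypothetical. It is not the author's original experiment code.

```python
import numpy as np

def statistical_rank(A_prime, n=30, sigma=1.0, threshold=2.75, seed=0):
    """Number of singular values whose |t-value| reaches the threshold."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(A_prime)
    m = A_prime.shape[0]
    samples = np.empty((n, m))
    for j in range(n):
        F = sigma * rng.standard_normal((m, m))      # F^[j] ~ N(0, sigma^2)
        samples[j] = s + np.diag(U.T @ F @ Vt.T)     # sampled singular values
    t = samples.mean(axis=0) / (samples.std(axis=0) / np.sqrt(n - 1))
    return int(np.sum(np.abs(t) >= threshold))

rng = np.random.default_rng(5)
P = rng.standard_normal((50, 17))
Q = rng.standard_normal((17, 50))
A = P @ Q                                            # 50 x 50, rank 17
rank_estimate = statistical_rank(A)                  # E = 0, F ~ N(0, 1)
```

Because each of the 33 zero singular values is tested separately at the 1% level, an occasional extra rejection is to be expected: if the tests are treated as independent, the chance of at least one is roughly $1 - 0.99^{33} \approx 0.28$. A run that reports 18 instead of 17 is therefore not necessarily a sign of a bug.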

Figure 8: The method is accurate for a matrix without added noise. (Plot title: Specified Rank: 17 | Determined Rank: 17 | Static Noise: 0. Legend: Threshold, t-values, Singular Values, 17th Singular Value.)

As shown in Fig. 8, the method correctly identifies the rank of the matrix under an ideal situation, where the distance between $A$ and $A'$

examine the effects of introducing error into $A$.

4.2 How Does Noise in the Matrix Affect the Rank Estimate?

Using the same $50 \times 50$ matrix, we will add a static error term to make the matrix closer to full rank. The objective of this test is to see if the method can detect the intended rank of 17 behind the noise. Let $E \sim N(0, 0.01)$ and $F \sim N(0, 1)$. We will add $E$ to $A$ in order to simulate a greater distance between $A$ and $A'$,

$$A' = A + E$$

and create $n$ sample matrices

$$A^{[j]} = A' + F^{[j]}, \quad j = 1, \ldots, n.$$

If we are successful in discovering the underlying rank of $A'$, the results should look like those in Fig. 8; there will be 17 rejected singular values ($\varsigma_i \neq 0$) which will

Figure 9: The method correctly determines the rank of a matrix with little static noise. (Plot title: Specified Rank: 17 | Determined Rank: 17 | Static Noise: 0.01. Legend: Threshold, t-values, Singular Values, 17th Singular Value.)

In Fig. 9, we have successfully determined the "true" rank of the matrix. However, we must acknowledge that the variance of $E$ is small. We will next consider the effects of adding noise with larger variance. The next figure shows the effects of adding noise

Figure 10: Large variance in the static noise matrix, $E$, generates Type I errors, over-determining the matrix rank. (Plot title: Specified Rank: 17 | Determined Rank: 22 | Static Noise: 0.1. Legend: Threshold, t-values, Singular Values, 17th Singular Value.)

The experiment in Fig. 10 shows that the rank is significantly over-determined. Each singular value which is incorrectly determined to be non-zero constitutes a Type I statistical error; $H_0$ was rejected when it should not have been. The method over-determines the rank when the variance of $E$ is large due to the fact that $A'$ is no longer close to rank 17. It has incurred sufficient noise to make it indistinguishable from a higher-ranked matrix. This problem is not unique to this method. No rank-detection method can purport to be able to recover the original rank from a sufficiently noisy matrix.

In the previous example, we showed that Type I errors can occur due to noise in the data matrix. The following example will illustrate the sources and meaning of

4.3 How Does the Manual Perturbation Term Affect the Rank Estimate?

Type II errors occur when the variance of the perturbation matrix, $F$, is too high. The t-value that this method calculates describes the relative variance of the perturbed singular values to the actual ones. Under ideal circumstances, only those singular values that are small relative to the variance of $F$ will be perturbed a significant amount. However, if the variance of $F$ is large, there is the potential to perturb larger singular values enough to indicate that these larger singular values are close to zero by increasing the magnitude of the denominator of the t-value. In this example, let $E \sim N(0, 0.01)$ and $F \sim N(0, 6)$.

Figure 11: Large variance in the manual perturbation matrix, $F$, causes Type II errors, and an under-estimate in rank. (Plot title: Specified Rank: 17 | Determined Rank: 13 | Static Noise: 0.1. Legend: Threshold, t-values, Singular Values, 17th Singular Value.)

In Fig. 11, the variance of the perturbation matrix, $F$, is 6, and the determined rank found in $A'$

These Type II errors illustrate the effect that the variance of $F$ has on this method. $F$ should be chosen carefully, using knowledge or hypotheses about $A$, in order to obtain meaningful results.
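A rough back-of-the-envelope calculation suggests how strongly the choice of $\sigma$ drives the outcome. It assumes the sampling model of Section 3.1 holds closely, so that the sample mean is near the calculated singular value $\varsigma_i'$ of $A'$ and the sample standard deviation is near $\sigma$; both are approximations rather than results stated in the thesis. With $\mu = 0$ and $n = 30$,

$$t_i = \frac{\bar{\varsigma}_i}{s_i / \sqrt{n - 1}} \approx \frac{\varsigma_i' \sqrt{n - 1}}{\sigma}, \qquad |t_i| \geq 2.75 \;\Longleftrightarrow\; \varsigma_i' \gtrsim \frac{2.75}{\sqrt{29}}\,\sigma \approx 0.51\,\sigma.$$

Under these assumptions the test behaves much like a cutoff proportional to $\sigma$: with variance 1 a singular value needs to exceed roughly 0.51 to be labelled non-zero, while with variance 6 ($\sigma \approx 2.45$) it needs to exceed roughly 1.25. Raising the variance of $F$ therefore pushes more of the smaller non-zero singular values below the threshold.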

5 Computational Considerations

Calculating the SVD of the matrix $A^{[j]}$, where $A^{[j]} = A' + F^{[j]}$, $n$ times in order to obtain a sample of $\varsigma$ is extremely expensive. However, we will show that the singular value decomposition of $A$ can be calculated once and used repeatedly to obtain samples of $\varsigma$. Let $\varsigma'$ represent the vector of the singular values of $A'$ and let $F \sim N(0, \sigma^2)$ be our manual perturbation matrix as in Equation 2.

In theory, our samples are gathered by perturbing $A'$ $n$ times, or

$$A^{[j]} = A' + F^{[j]}, \quad j = 1, \ldots, n.$$

Recall that the singular value decomposition of $A'$ is

$$A' = U \Sigma V^T.$$

We can now describe the perturbed $A'$ matrix as

$$A^{[j]} = U \Sigma V^T + F^{[j]}, \quad j = 1, \ldots, n.$$

Notice that by adding the perturbation matrix, $F^{[j]}$, we are generating a matrix close to $A'$. We would like a way to gather samples of the singular values of $A^{[j]}$ without

each sample, then finding the singular values of $A$ becomes no longer a task of decomposing a matrix, but of calculating

$$U^T A^{[j]} V = \Sigma^{[j]} = \Sigma + U^T F^{[j]} V,$$

or in our notation:

$$\varsigma^{[j]} = \varsigma + \mathrm{diag}(U^T F^{[j]} V).$$

If $A'$ is the closest representation of $A$ we have, then $U$, $\Sigma$ and $V$ will also be the closest representation of the SVD of $A$ we have. Therefore, we will keep $U$ and $V$ constant. If we retain $U$ and $V$, gathering samples of singular values can be done without calculating multiple decompositions. Notice, also, that each $\varsigma_i^{[j]}$ is independent of all other $\varsigma_k^{[j]}$, $i \neq k$. This means that it is possible to test only one or a few singular values without having to multiply the entire matrix. This property allows us to perform further optimization for large matrices, such as using a bisection method to find the rank of $A$. The next section illustrates the specifics of the complexity involved with sampling singular values.
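The shortcut $\varsigma^{[j]} = \varsigma + \mathrm{diag}(U^T F^{[j]} V)$ and the per-index variant can be sketched as below. The example assumes NumPy; the sizes, seed, and the index `i = 7` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
m, sigma = 40, 1.0                        # illustrative size and perturbation scale
A_prime = rng.standard_normal((m, m))
U, s, Vt = np.linalg.svd(A_prime)         # the only decomposition computed
V = Vt.T

F = sigma * rng.standard_normal((m, m))   # one perturbation matrix F^[j]

# All m sampled values at once: form F V, then take only the diagonal of U^T (F V)
sample_all = s + np.einsum('ki,ki->i', U, F @ V)

# A single singular value i needs only u_i^T F v_i, an O(m^2) computation
i = 7
sample_i = s[i] + U[:, i] @ F @ V[:, i]

# Reference: the full triple product, computed here for comparison only
reference = s + np.diag(U.T @ F @ V)
```

The per-index form is what makes it possible to test a handful of singular values without multiplying full matrices, which is the property the bisection idea relies on.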

5.1 Complexity

The runtime of this method includes the time to calculate the singular value decomposition of $A$ and the time it takes to compute an $m \times m$ matrix product $n$ times, where $n$ is the sample size, or

$$O\big(m^3 + n\,O(m^3)\big) \in O(m^3)$$

of the SVD, yet provides a quantified estimate of the rank of a matrix (in order to provide a sound sample, $n$ does not need to be very large, and usually $m \gg n$).

What follows is an experiment with a large, $250 \times 250$ matrix. The method correctly recognizes the rank of the matrix

Figure 12: The method provides encouraging results for large matrices (Plot title: Specified Rank: 117 | Determined Rank: 117 | Static Noise: 0.01. Legend: Threshold, t-values, Singular Values, 117th Singular Value.)

It is evident from Fig. 12 that the method still holds promise with large matrices. Like the SVD, this method of taking samples of singular values becomes very expensive quickly with regard to $m$. However, the benefit to this price is a more accurate

6 Conclusions and Future Work

It is certain that determining the rank of a matrix, no matter what method is used, is a hard problem. Many methods which use singular values to calculate rank have a flat cutoff, where the rank is determined by where the ordered list of singular values goes under a certain cutoff. With the method described in this paper, we have shown how to obtain a probabilistic estimate of rank, based on the volatility of the singular values of the matrix.

The results thus far are exciting, and there are numerous possible optimizations to be included in future research. The relationship between the manual perturbation and the static noise in the matrix needs to be studied. The effects of numerical errors within the perturbations themselves should also be studied. Comparisons with large, real-world data sets would also be useful, and, because the variance of each singular value is independent of the others once we have $U \Sigma V^T$, some form of bisection might be used to reduce the number of matrix multiplications needed to find the rank of a matrix.
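One possible shape for the bisection idea is sketched below. It is speculative: the thesis only suggests that "some form of bisection might be used". The sketch assumes NumPy, and assumes that the test outcome is monotone in the index (all non-zero singular values precede all zero ones, as the descending order suggests), which an unlucky Type I or Type II error can violate. The names `is_nonzero` and `bisect_rank` are hypothetical.

```python
import numpy as np

def is_nonzero(i, s, U, V, n, sigma, threshold, rng):
    """t-test for singular value i only, using n samples of s_i + u_i^T F v_i."""
    m = U.shape[0]
    samples = np.array([
        s[i] + U[:, i] @ (sigma * rng.standard_normal((m, m))) @ V[:, i]
        for _ in range(n)
    ])
    t = samples.mean() / (samples.std() / np.sqrt(n - 1))
    return abs(t) >= threshold

def bisect_rank(A_prime, n=30, sigma=1.0, threshold=2.75, seed=0):
    """Binary search for the first index whose singular value tests as zero."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(A_prime)
    lo, hi = 0, len(s)        # values before lo tested non-zero; values from hi on taken as zero
    while lo < hi:
        mid = (lo + hi) // 2
        if is_nonzero(mid, s, U, Vt.T, n, sigma, threshold, rng):
            lo = mid + 1      # rank is at least mid + 1
        else:
            hi = mid          # rank is at most mid
    return lo
```

With this scheme only about $\log_2 m$ singular values are tested instead of all $m$, at the cost of relying on the monotonicity assumption.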

References

[1] http://www-history.mcs.st-andrews.ac.uk/pictdisplay/householder.html.

[2] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. J.


McKen-trialand Applied Mathematics, Philadelphia, PA, USA, 1999.

[3] J. K. Backhouse, Statistics: An Introduction to Tests of Significance, Longman's, Green and Co. Ltd., 1967.

[4] J. Bai and S. Ng, Determining the number of factors in approximate factor models, Econometrica, 70 (2002), pp. 191–221.

[5] M. T. Chu and R. E. Funderlic, The centroid decomposition: Relationships between discrete variational decompositions and SVD's, SIAM Journal on Matrix Analysis and Applications, 23 (2002), pp. 1025–1044.

[6] A. Cline, C. Moler, G. Stewart, and J. H. Wilkinson, An estimate for the condition number of a matrix, SIAM Journal on Numerical Analysis, 16 (1979), pp. 368–375.

[7] E. Drinea, P. Drineas, and P. Huggins, A randomized singular value decomposition algorithm for image processing applications.

[8] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay, Clustering large graphs via the singular value decomposition, Machine Learning, 56 (2004), pp. 9–33.

[9] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation techniques for singular value problems, SIAM Journal on Numerical Analysis, 32 (1995), pp. 1972–1988.

[10] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A.

pp. 165–480.

[11] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.

[12] J. Kamm and J. G. Nagy, Kronecker product and SVD approximations in image restoration, Linear Algebra and its Applications, 284 (1998).

[13] J. Kamm and J. G. Nagy, Conditional maximum covariance analysis and its application to the tropical Indian Ocean SST and surface wind stress anomalies, Journal of Climate, 16 (2003).

[14] F. Kleibergen and R. Paap, Generalized reduced rank tests using the singular value decomposition, Journal of Econometrics, Forthcoming.

[15] V. C. Klema and A. J. Laub, The singular value decomposition: Its computation and some applications, IEEE Transactions on Automatic Control, AC-25 (1980), pp. 164–176.

[16] K. Konstantinides and K. Yao, Statistical analysis of effective singular values in matrix rank determination, IEEE Transactions on Acoustics, Speech and Signal Processing, 36 (1988), pp. 757–763.

[17] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic,