ORLOWSKI, NICHOLAS RICHARD. Exploration and Analysis of a Method
for Estimating the Rank of a Matrix. (Under direction of Dr. Robert
Funderlic.)
The singular value decomposition (SVD) provides important information
about a matrix and its rank, including its singular values and singular vectors.
Because of noise in a matrix and the limitations of binary representation, the
calculated SVD of a matrix is necessarily an estimate of the true SVD of that
matrix. Our method approximates the true singular values of a matrix by
gathering samples of the calculated singular values of the matrix. With these
approximations, we can use hypothesis testing to make quantitative statements
ESTIMATING THE RANK OF A MATRIX
by
NICHOLAS ORLOWSKI
A thesis submitted to the Graduate Faculty of
North Carolina State University
in partial fulfillment of
the requirements for the Degree of
Master of Science
COMPUTER SCIENCE
Raleigh, North Carolina
2007
APPROVED BY:
Dr. David Thuente Dr. Donald Bitzer
Dr. Robert Funderlic
I was born in Hickory, North Carolina on June 20, 1982. While growing up, I
had an insatiable appetite for science. At an early age, I also developed a proficiency
with computers, such as the Apple II and the original 1984 Macintosh. I spent my
most formative years attending high school at Phillips Academy, a boarding school
in Andover, Massachusetts. There, I learned the importance of imaginative thinking
and an appreciation for all disciplines, be they science and math, or art and poetry.
With this new mindset, I came to North Carolina State University with an eager
mind, and was fortunate enough to discover the undergraduate research program.
This program fed my curious mind to the brim and has made my undergraduate and
graduate careers complete. What is presented in this paper is a culmination of what
I would like to express my deep appreciation for those who have made my graduate
and undergraduate careers a success. I would like to thank my family for providing
this wonderful educational opportunity. I would like to thank my committee chairman
and adviser, Dr. Funderlic, for his tireless efforts in ensuring my success at NCSU. I
truly cherish the time I spent with him, and appreciate both his lessons in numerical
analysis and the life experiences he was willing to share.
I would like to thank Dr. Chu, who taught me to expect more from myself. Dr.
Chu graciously guided me through the factor analysis and statistics needed to develop
this thesis and supplied the foundation for my understanding of the SVD. It was a
pleasure working with and learning from him.
Dr. Bitzer deserves recognition for imparting to me his teaching and learning
styles. His ability to simplify problems to their core helped me learn to see solutions
which may not be immediately apparent. It has been an honor working with him.
I would also like to thank Dr. Thuente for serving on my committee, as well as
providing guidance for me in my graduate career. There are innumerable others:
friends, teachers and teammates who have had significant influences on me, and I
List of Figures
1 Introduction
2 Singular Value Decomposition
2.1 The Meaning of Zero
2.2 Example SVD Application
2.3 Filtering Noise with the SVD
2.4 Goal of the Method
3 Statistical Rank Determination
3.1 Gathering Samples
3.2 Hypothesis Testing
3.3 Foreseen Contingencies
4 Empirical Support
4.1 Does the Method Work Under Ideal Circumstances?
4.2 How Does Noise in the Matrix Affect the Rank Estimate?
4.3 How Does the Manual Perturbation Term Affect the Rank Estimate?
5 Computational Considerations
5.1 Complexity
6 Conclusions and Future Work
1 A full-rank 250 × 250 square image $A$ with values from 0 to 1
2 The image of the first six factors of $A$ ($u_i \sigma_i v_i^T$), $i = 1, \ldots, 6$
3 The first 6 factors of $A$ combined
4 The combination of the first 15 factors of $A$
5 The combination of the first 25 factors of $A$
6 $A$ and its approximation $A'$
7 Reconstructions of $A$ and $A'$ using their first 35 factors
8 The method is accurate for a matrix without added noise.
9 The method correctly determines the rank of a matrix with little static noise.
10 Large variance in the static noise matrix, $E$, generates Type I errors, over-determining the matrix rank.
11 Large variance in the manual perturbation matrix, $F$, causes Type II errors, and an under-estimate in rank.
1 Introduction
The Singular Value Decomposition (SVD) of a matrix is invaluable in a wide range
of disciplines: image analysis and restoration [12, 7], structural vibration analysis [?],
investigation of weather patterns [13], document retrieval (LSI) [10], clustering [8],
and econometrics [4]. This is only the beginning of a long list of potential applications
of the SVD. Many of these applications use the number of non-zero singular values
of a matrix to determine its rank. However, the rank of a matrix is not always
well-defined, as there are several factors which may affect the calculation of the SVD in
an unwanted or unexpected fashion:
• Unwanted sampling noise in the matrix
• Limitations of numerical precision
  – Finite binary representation of the matrix
  – Unavoidable round-off error in computer algorithms.
Let $A$, $A'$ and $E$ be square $m \times m$ matrices. Let $A$ represent a theoretical matrix,
whose elements are represented with infinite precision. This matrix is unknown to
us, but we can approximate it on a binary computer by $A'$. Let $E$ represent the
accumulation of sampling error in $A$ and rounding error when representing $A$ on a
binary computer. We will assume that these rounding errors are distributed Normally
with mean zero. We can represent our approximation of $A$ by
$$A' = A + E. \qquad (1)$$
representation of a matrix $A$ imprecise, we must assume that the SVD of $A$ is imprecise
as well. Therefore, determining the rank derived from the singular values of that
matrix becomes a subjective task. There are many methods and metrics that provide
approximate measures of rank, but whichever technique we use, knowledge of the
problem is required in order to interpret these measures correctly (are we finding the
rank of an image matrix or of a correlation matrix?). Different applications require
different interpretations when deriving rank from singular values.
Some of these applications use the relative distances between singular values of
a matrix as a guideline for rank [9, 16], while others perform fast rank-reduction
to approximate the dimension of the column space of the matrix [5, 6, 14]. Some
methods are as simple as specifying a singular value cut-off point to determine rank
[15, 2] (Matlab bases its default cut-off on machine epsilon, for instance).
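The cut-off approach can be sketched in a few lines of Python with NumPy. This is only an illustration: the helper name `cutoff_rank` is hypothetical, and the default tolerance (largest singular value times the matrix dimension times machine epsilon) is an assumed choice of the same form that NumPy documents for `matrix_rank`, not a value prescribed in this paper.

```python
import numpy as np

def cutoff_rank(a, tol=None):
    """Count singular values above a cut-off (hypothetical helper)."""
    s = np.linalg.svd(a, compute_uv=False)  # singular values, descending
    if tol is None:
        # Assumed default: sigma_max * max(m, n) * machine epsilon.
        tol = s[0] * max(a.shape) * np.finfo(a.dtype).eps
    return int(np.sum(s > tol))

rng = np.random.default_rng(0)
# A 6 x 6 matrix built from 6 x 2 and 2 x 6 factors has rank at most 2.
a = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))
print(cutoff_rank(a))            # rank under the default cut-off
print(cutoff_rank(a, tol=1e3))   # a very large cut-off discards every value
```

The answer depends entirely on the chosen `tol`, which is the subjectivity the statistical method below tries to make explicit.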
The method introduced in this paper uses statistical analysis and hypothesis
testing in an attempt to measure the relative volatility [17] of the calculated singular
values of a matrix. This method is based on a similar method for least squares
coefficients by Hastie et al. [11]. Large singular values will have lower relative volatility,
while small singular values will have higher relative volatility. If the volatility of the
singular values is known, the rank of the matrix can be determined with respect to
a desired level of certainty. For instance, we will be able to say that we are 99%
confident that a particular singular value is zero. With this information, we can
subjectively decide the number of non-zero singular values and thus, the rank of the
matrix.
the rank of A within a certain probability in the face of noise-producing
factors?
In Section 2 we will examine the singular value decomposition and illustrate some
of its uses. This section will include a discussion of the meaning of "zero" on a
finite-precision binary computer. Section 3 describes the method, how we apply hypothesis
testing and the consequences that this testing has. Section 4 presents evidence of the
validity and accuracy of the method and illustrates its pitfalls regarding statistical
errors. Section 5 shows how we can optimize the method, circumventing the need to
calculate more than one full singular value decomposition. Finally, in Section 6 we
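2 Singular Value Decomposition
Consider the square matrices $A \in \mathbb{R}^{m \times m}$, $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times m}$ and $V^T \in \mathbb{R}^{m \times m}$.
The singular value decomposition of $A$ is given by
$$A = U \Sigma V^T = \sum_{i=1}^{m} \sigma_i u_i v_i^T.$$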
The columns of the matrix $U$, $u_i$, are the left singular vectors of $A$, while the column
vectors of $V$, $v_i$, are the right singular vectors of $A$. Both $U$ and $V$ are orthogonal
matrices and create orthogonal bases for $A$. The matrix $\Sigma$ is a diagonal matrix whose
main diagonal consists of the singular values $\sigma_i$ of $A$, in descending order,
$$\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 \\
0 & \sigma_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \sigma_{m-1} & 0 \\
0 & 0 & \cdots & 0 & \sigma_m
\end{bmatrix}.$$
Let the vector consisting of the main diagonal of $\Sigma$ (the vector of singular values) be
denoted by $\boldsymbol{\sigma}$, so as not to confuse it with the typical notation for sample variance, $\sigma^2$.
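The decomposition above can be checked numerically. The following Python/NumPy sketch, on an arbitrary random matrix chosen only for illustration, rebuilds $A$ from its rank-1 factors $\sigma_i u_i v_i^T$; `np.linalg.svd` returns $U$, the vector of singular values, and $V^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
a = rng.standard_normal((m, m))

# np.linalg.svd returns U, the vector of singular values, and V^T.
u, sigma, vt = np.linalg.svd(a)

# Rebuild A as the sum of the m rank-1 factors sigma_i * u_i * v_i^T.
factors = [sigma[i] * np.outer(u[:, i], vt[i, :]) for i in range(m)]
a_rebuilt = sum(factors)

print(np.allclose(a, a_rebuilt))         # agreement up to round-off
print(np.all(np.diff(sigma) <= 0))       # singular values are in descending order
print(np.allclose(u.T @ u, np.eye(m)))   # U is orthogonal
```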
The singular values are the loadings of the column spaces of $U$ and $V$ on the
matrix $A$. Thus, if one of the singular values is zero, the matrix is rank-deficient, as
the column or row space of $A$ is not fully spanned. The rank of a matrix can be
obtained by counting the number of non-zero singular values. However, counting the
2.1 The Meaning of Zero
When counting non-zero components, a problem arises: What is meant by zero? On
a machine with limited numerical precision, the answer is not simple. It is easy to
determine if a calculated singular value is zero by inspection, but how sure are we that
the actual singular value is 0? (The probability of a calculated value being exactly
0 is very small.) Recall from Eq. 1 that we can represent the distance between the
representation of a matrix, $A'$, and its true value as
$$A' = A + E.$$
By estimating $E$, we can obtain a good idea of how close a "zero" singular value
in $A'$ is to a truly zero singular value in $A$. Overton [17] provides an introduction
to these numerical pitfalls. The following experiments will provide support that a
combination of linear algebra and statistical hypothesis testing can provide a way to
quantify the distribution of $E$, and how sure we are that a given singular value of $A$
is zero. With this information, we can also determine a probable rank of the matrix
$A$.
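To make the representation error $E$ concrete, the sketch below is a Python/NumPy illustration under a stated assumption: double precision stands in for the infinitely precise $A$, and single precision plays the role of the finite representation $A'$. Neither choice comes from the thesis; they only make the gap visible.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0.1, 1.0, size=(4, 4))      # stands in for the "true" matrix A

# Storing A in single precision gives a nearby matrix A' = A + E.
a_prime = a.astype(np.float32).astype(np.float64)
e = a_prime - a

eps32 = np.finfo(np.float32).eps
print(np.abs(e).max())                       # small, but generally not zero
print(np.abs(e / a).max() <= eps32)          # relative error is bounded by float32 epsilon

# The singular values of A' differ slightly from those of A as well.
s = np.linalg.svd(a, compute_uv=False)
s_prime = np.linalg.svd(a_prime, compute_uv=False)
print(np.abs(s_prime - s).max())
```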
2.2 Example SVD Application
The singular value decomposition of a matrix can be a hard concept to grasp, so we
will show an example of its use in image processing as a primer for understanding the
SVD.
In Fig. 1, we have a black and white picture of a famous numerical analyst, Alston
Householder [1]. This image is a 250 × 250 matrix, $A$, of intensities between black
Figure 1: A full-rank 250 × 250 square image $A$ with values from 0 to 1
For our purposes, this is the exact representation of $A$. Here, $A' = A$ and $E = 0$.
Let $A = U \Sigma V^T$ and let $u_i$ be the $i$th column vector in $U$ and $v_i$ be the $i$th column
vector in $V$. In the interest of brevity, let us call the matrix $u_i \sigma_i v_i^T$ the $i$th factor of $A'$.
The SVD of this image will be
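$$\begin{bmatrix} u_1 & u_2 & \cdots & u_{249} & u_{250} \end{bmatrix}
\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 \\
0 & \sigma_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \sigma_{249} & 0 \\
0 & 0 & \cdots & 0 & \sigma_{250}
\end{bmatrix}
\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_{249}^T \\ v_{250}^T \end{bmatrix}.$$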
Because we have assumed this image has full rank, we know that each $\sigma_i$ is positive
(greater than zero). Notice that even though $A' = A$, the calculated decomposition
of $A'$ will not generally be the exact decomposition of $A'$
equal a true result is negligible. Potential rounding error is injected into a problem
any time there is a calculation. In this paper, we are considering only those numerical
errors which occur in the representation of $A$ and the calculation of the SVD of $A'$. We
will assume that the effects of further rounding errors are negligible. We will next
show exactly what it means to decompose a matrix. First, let us answer a question
about the meaning of a factor matrix, $u_i \sigma_i v_i^T$. If the image is the summation of the
matrices
$$u_1 \sigma_1 v_1^T + u_2 \sigma_2 v_2^T + \cdots + u_{249} \sigma_{249} v_{249}^T + u_{250} \sigma_{250} v_{250}^T,$$
what will one of these matrices look like? Because each of these matrices is only a
rank 1 matrix, none of these matrices alone can be expected to look like the original
image, which is of significantly higher rank. Fig. 2 shows the images of the first 6
factors:
In Fig. 2, none of these images alone represents the original image to the point
where we can assess some obvious similarity. The reason for this is that each of
these matrices is of rank 1 and cannot be expected to accurately represent a rank
250 matrix. Notice, however, that the image of the first factor favors the average
horizontal intensity of the original image, while the second factor (orthogonal to the
first) favors its average vertical intensity. If a rank 1 matrix is insufficient to represent
the full image, could we get a better representation with a rank 6 matrix, created by
Figure 2: The image of the first six factors of $A$ ($u_i \sigma_i v_i^T$), $i = 1, \ldots, 6$ (panels: Factor 1 through Factor 6)
Figure 3: The first 6 factors of $A$ combined (panels: Original Image, First 6 Factors)
In Fig. 3, the combination of these factors looks "more like" the image of $A$.
Notice that we are still making subjective judgements about how much each factor
contributes to the "likeness" of $A$. Suppose we combine the first 15 and the first 25
Figure 4: The combination of the first 15 factors of $A$ (panels: Original Image, First 15 Factors)
Figure 5: The combination of the first 25 factors of $A$ (panels: Original Image, First 25 Factors)
We can judge that these images quickly approach visual similarity to the original
image. The question arises: How many factors do we need to add in order to obtain a
"good" representation of the original image? This question sparks our motivation for
finding the rank of $A$. If we have a good estimate of the rank of a matrix, we can
decide at which point factors stop contributing meaningfully to the approximation of
$A$. If we decide a factor is unimportant, we can disregard it, saving storage space and
calculation time. In many applications, these "less-important" factors are considered
noise, and in this way can be filtered from the matrix.
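Combining the first $k$ factors can be sketched as follows in Python/NumPy. The Householder image is not available here, so a random 250 × 250 matrix with values from 0 to 1 stands in for it; the helper name `rank_k_approximation` is hypothetical.

```python
import numpy as np

def rank_k_approximation(a, k):
    """Sum of the first k factors sigma_i * u_i * v_i^T (hypothetical helper)."""
    u, sigma, vt = np.linalg.svd(a)
    # Scaling the first k columns of U by sigma and multiplying by the
    # first k rows of V^T is the same as summing the k rank-1 factors.
    return (u[:, :k] * sigma[:k]) @ vt[:k, :]

rng = np.random.default_rng(3)
a = rng.uniform(0.0, 1.0, size=(250, 250))   # stand-in for the 250 x 250 image

errors = []
for k in (1, 6, 15, 25):
    err = np.linalg.norm(a - rank_k_approximation(a, k)) / np.linalg.norm(a)
    errors.append(err)
    print(k, err)   # the relative error shrinks as more factors are added
```

A random matrix has far less structure than a photograph, so its error falls much more slowly than the visual improvement seen in Figs. 3 to 5; only the monotone decrease carries over.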
2.3 Filtering Noise with the SVD
If we consider the factors corresponding to the largest singular values to contain
the most information about an image, we can make the opposite statement about the
smallest factors. If there is noise in the matrix, the smaller singular values will contain
less information about $A'$ and more information about $E$, the error inherent in the
matrix. Keeping in mind that the singular values are ordered, if we can determine
at which point these factors begin to describe $E$, we can attempt to remove them
Let us compare the effects of reconstructing an image without noise along with
that same image with noise. Fig. 6 shows our original image, $A$, and that same
image with artificial noise added, $A'$.
Figure 6: $A$ and its approximation $A'$ (panels: Original Image, Image with Noise Added)
Both images in Fig. 6, $A$ and $A'$, are of full rank and each contains all of its factors.
Keep in mind that we are only given $A'$ and $A$ is unknown to us. Suppose we estimate
that the first 35 factors are sufficient to approximate $A$ and that the rest of the factors
are descriptions of noise. We can construct an image of the first 35 factors of $A$ and
compare it to the first 35 factors of $A'$ as in Fig. 7.
The images in Fig. 7 are more similar to each other than those in Fig. 6; our
rank 35 approximation of $A'$ is "less noisy" than our full rank approximation of $A'$.
Figure 7: Reconstructions of $A$ and $A'$ using their first 35 factors (panels: First 35 Factors From Noiseless Image, First 35 Factors From Noisy Image)
7 is noticeably different from $A$ in Fig. 6. We must be careful not to sacrifice too
much information about $A$ in our attempts to eliminate information about $E$. This
issue further illustrates the importance of having a reliable estimate of the rank of a
matrix.
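The trade-off described in this section can be illustrated on synthetic data, where, unlike in practice, the noiseless $A$ is known. In the Python/NumPy sketch below, the rank-5 matrix, the noise level and the helper `truncate` are assumptions chosen for illustration and are not the image experiment of Figs. 6 and 7.

```python
import numpy as np

def truncate(a, k):
    """Keep only the first k factors of a (hypothetical helper)."""
    u, sigma, vt = np.linalg.svd(a)
    return (u[:, :k] * sigma[:k]) @ vt[:k, :]

rng = np.random.default_rng(4)
m, r = 100, 5
a = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))  # "true" A, rank 5
e = 0.1 * rng.standard_normal((m, m))                          # noise E
a_prime = a + e                                                # what we observe

err_full = np.linalg.norm(a_prime - a)                 # keep every factor of A'
err_trunc = np.linalg.norm(truncate(a_prime, r) - a)   # keep the first 5 factors
err_lossy = np.linalg.norm(truncate(a_prime, 2) - a)   # keep too few factors
print(err_full, err_trunc, err_lossy)
```

In this setting, keeping as many factors as the true rank removes much of the noise, while keeping too few discards information about $A$ itself.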
2.4 Goal of the Method
As stated above, the importance of each factor is subjective to the viewer. We know
that this is an image, and can visually assess that a factor with a small singular
value does not contribute much to an image. However, it is impractical to view the
matrices associated with most problems as an image. These matrices are often too
large or visually unrecognizable.
When comparing two matrices, we can easily define an objective measure to
describe their similarity. However, we cannot define an objective mechanism for
determining if two matrices are "alike enough" to be considered the same. As with the
deter-their rank.
Many methods use singular value cutoffs to determine which factors are important,
but these methods are still subject to user judgement (Why is this cutoff good? Do
we know something about the matrix?). We do not claim that the method presented
in this paper provides objective knowledge of matrix rank. Instead, it allows the
user to specify a threshold of certainty that a particular singular value is zero using
statistical analysis of the singular values of a matrix.
The next section outlines our method and shows how we use the SVD to obtain
measurements of the rank of a matrix.
3 Statistical Rank Determination
For the experiments in this paper, we will assume that we have the calculated singular
value decomposition of $A'$ using a numerically stable method. More specifically,
assume we have the singular values associated with the matrix $A'$ in a vector, $\boldsymbol{\sigma}$,
and the left and right singular vectors in the matrices $U$ and $V$, respectively. Keep in
mind that the SVD of $A'$ has incurred additional rounding error from its calculation.
As stated above, we wish to determine the rank of $A$. In order to do this we
must find the number of non-zero singular values. However, we know that there
is uncertainty in the likelihood that the calculated singular values equal the actual
singular values.
Because of these inherent inaccuracies, the best we can do is quantify our
certainty that a given singular value is zero. In this way, we can count the number of
the rank of $A$. We will use the Student's t-test to obtain our estimate. Given $A'$, the
approximation of $A$, the basic method is as follows:
1. Calculate the SVD of $A'$.
2. Gather samples of the singular values of $A'$ by projecting Gaussian noise with
known variance onto the vector spaces spanned by $U$ and $V^T$.
3. Calculate the sample mean and sample variance from the perturbed singular
value samples.
4. Use those statistics and the Student's t-test to decide whether to reject the
hypothesis that a singular value is zero.
3.1 Gathering Samples
In order to use any statistical hypothesis testing method, a sample must be gathered
from which to calculate the sample mean and variance. For this method, we are
gathering samples of $\boldsymbol{\sigma}$, the vector of singular values of $A$.
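Recall that $A$ is a matrix where $A \in \mathbb{R}^{m \times m}$ with rank $r$. We have the relation
$$A = U \Sigma V^T = \sum_{i=1}^{m} \sigma_i u_i v_i^T,$$
where the singular values of $A$ are
$$\boldsymbol{\sigma} = [\sigma_1, \sigma_2, \ldots, \sigma_{m-1}, \sigma_m]^T.$$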
Next, let us introduce a normally distributed random artificial perturbation matrix,
$F \in \mathbb{R}^{m \times m}$,
$$F \sim N(0, \sigma^2). \qquad (2)$$
This $F$ matrix will serve as a perturbation tool, which we will use to test the
orthogonality of the column space of $A$. By perturbing the matrix $A'$ $n$ times, we can gather
a sample of $n$ perturbed matrices,
$$A^{[j]} := A' + F^{[j]}, \quad j = 1, \ldots, n. \qquad (3)$$
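By solving each minimization problem
$$\boldsymbol{\sigma}^{[j]} := \arg\min_{\boldsymbol{\sigma} \in \mathbb{R}^m} \left\| A^{[j]} - \sum_{i=1}^{m} \sigma_i u_i v_i^T \right\|, \quad j = 1, \ldots, n, \qquad (4)$$
we obtain $n$ vectors, each of which is a sample of the singular values of $A'$: $\boldsymbol{\sigma}^{[1]}, \boldsymbol{\sigma}^{[2]}, \ldots, \boldsymbol{\sigma}^{[n]}$.
A discussion of a more efficient way to gather these samples comes later. It suffices
to say that we do not compute the SVD for each sample. Each of these samples will
be distributed with variance $\sigma^2$ and mean $\boldsymbol{\sigma}$, such that
$$\boldsymbol{\sigma}^{[j]} \sim N(\boldsymbol{\sigma}, \sigma^2) \;\Rightarrow\; \sigma^{[j]}_i \sim N(\sigma_i, \sigma^2).$$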
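The sampling step can be sketched in Python/NumPy as follows. The sketch assumes that the norm in Eq. 4 is the Frobenius norm; under that assumption, and because the matrices $u_i v_i^T$ are orthonormal, the minimizer has the closed form $\sigma^{[j]}_i = u_i^T A^{[j]} v_i$. The function name and signature are hypothetical and are not the author's code.

```python
import numpy as np

def sample_singular_values(a_prime, n, variance, rng):
    """Draw n perturbed samples of the singular values of a_prime.

    Assumes the norm in Eq. 4 is the Frobenius norm, so that the minimizer
    is sigma_i = u_i^T A^[j] v_i (hypothetical helper).
    """
    u, _, vt = np.linalg.svd(a_prime)
    m = a_prime.shape[0]
    samples = np.empty((n, m))
    for j in range(n):
        f = rng.normal(0.0, np.sqrt(variance), size=(m, m))  # F^[j], entries N(0, variance)
        a_j = a_prime + f                                     # Eq. 3
        # diag(U^T A^[j] V): entry i is u_i^T A^[j] v_i.
        samples[j] = np.einsum("ki,kl,il->i", u, a_j, vt)
    return samples

rng = np.random.default_rng(5)
a_prime = rng.standard_normal((8, 8))
samples = sample_singular_values(a_prime, n=30, variance=1.0, rng=rng)
print(samples.shape)   # one row per sample, one column per singular value
```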
3.2 Hypothesis Testing
Now that we have samples of each of our singular values, $\sigma^{[j]}_i$, $i = 1, \ldots, m$;
$j = 1, \ldots, n$, we can calculate the sample mean, $\bar{\sigma}_i$, and sample variance, $s_i^2$
value to be complete, we must choose a value with which to compare the sample
mean. Let us call this value $\mu = 0$. The formula for calculating the t-value gives us
$$t_i = \frac{\bar{\sigma}_i}{s_i / \sqrt{n - 1}}, \quad i = 1, \ldots, m.$$
With this t-value, we can make quantitative statements about our certainty that a
particular singular value is 0.
We are projecting random matrices with known mean and variance onto the vector
spaces spanned by $U$ and $V^T$. By perturbing the column and row spaces of $A$, we
are similarly perturbing the singular values of $A$. If $A$ is perfectly orthogonal, the
singular values obtained from Equation 4 will change very little. If we encounter a
singular value that, when perturbed, varies by a large amount, it is probable that
that singular value is zero. This points out a deficiency in rank. We wish to use
hypothesis testing [3] in order to determine with a certain probability which singular
values are zero, and thus, an estimate of the rank of $A$. Let the null and alternative
hypotheses be
$$H_0 : \sigma_i = 0$$
and
$$H_1 : \sigma_i \neq 0.$$
We will choose a confidence level of 99%, so that we will reject $H_0$ correctly 99% of the
time, and we will take $n = 30$ samples. By referring to the corresponding t-distribution
table value, we see that in order to reject $H_0$ we require
$$|t_i| \geq 2.75.$$
If the t-value meets this requirement, we can reject $H_0$, labeling $\sigma_i$ as non-zero. If this
requirement is not met, we cannot reject $H_0$ and must allow for the possibility that
$\sigma_i = 0$. Examples in the next section demonstrate an application of this method to
a real matrix.
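The test can be sketched in Python/NumPy as below. Two points are our reading rather than statements from the text: $s_i$ is taken as the standard deviation with divisor $n$ (so that the $\sqrt{n-1}$ in the formula gives the usual one-sample t statistic), and the rank estimate is the count of rejected hypotheses. The synthetic samples are illustrative only.

```python
import numpy as np

def t_values(samples):
    """t-value of each singular value against the hypothesized mean 0.

    samples has shape (n, m). s_i is the standard deviation with divisor n
    (ddof=0), our reading of the sqrt(n - 1) in the formula.
    """
    n = samples.shape[0]
    mean = samples.mean(axis=0)
    s = samples.std(axis=0, ddof=0)
    return mean / (s / np.sqrt(n - 1))

def estimate_rank(samples, threshold=2.75):
    """Count the singular values for which H0: sigma_i = 0 is rejected."""
    return int(np.sum(np.abs(t_values(samples)) >= threshold))

rng = np.random.default_rng(6)
true_sigma = np.array([20.0, 10.0, 5.0, 0.0, 0.0])
# Synthetic samples: each of the n = 30 rows is true_sigma plus N(0, 1) noise.
samples = true_sigma + rng.standard_normal((30, 5))
print(np.abs(t_values(samples)))
print(estimate_rank(samples))
```

The threshold 2.75 is the value quoted above for 99% confidence and 30 samples; other sample sizes or confidence levels would need a different table value.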
3.3 Foreseen Contingencies
As with any statistical hypothesis method, there will be uncertainties. The classical
Type I and Type II errors are unavoidable. Under ideal circumstances, the matrix
$A$ would be devoid of static noise, and its corresponding singular values would be
precisely represented in binary format. This would allow for simple inspection to
determine its rank. However, because we do not know the true singular values, we
can never know if this is the case. We must assume two sources of error:
• The noise matrix $E$, which incorporates the static sampling error with the error
from numerical representation. The errors contained in $E$ are inherent in the
representation of $A$ and are unavoidable.
• The variance in $F$, the artificial perturbation matrix, may be inappropriately
specified.
In a Type I error, $H_0$ would be incorrectly rejected, meaning that a singular value
which is truly 0 would be determined not to be. These errors will over-determine
the estimate of the rank. In a Type II error, $H_0$ would be accepted erroneously,
is no cure for this phenomenon. We must be aware of the problem and temper our
conclusions accordingly.
4 Empirical Support
We will now evaluate the above method for its accuracy and determine which factors
affect the volatility of the method with respect to Type I and Type II errors. First,
we will create pre-formed matrices with a known rank and noise. These toy examples
will serve to evaluate the accuracy and sensitivity of the method. Keep in mind, the
decompositions of these matrices will not be exact due to inevitable round-off error
in their calculation. Important questions we will address in this section are:
1. Does the method work under ideal circumstances?
2. How does the variance of $E$ affect the estimate of rank?
3. How does the variance of $F$ affect the estimate of rank?
For all of the following experiments:
• The acceptance threshold for the t-test will be 99%, or $|t_i| \geq 2.75$.
• The singular values for each matrix will be sampled 30 times.
• symbols will represent singular values which we are 99% confident are zero.
• symbols will represent those singular values which we are 99% sure are not
4.1 Does the Method Work Under Ideal Circumstances?
For a proof of concept, we constructed a standard matrix on which to base our
conclusions. Let the matrix $A = PQ$, where $P \in \mathbb{R}^{50 \times 17}$ and $Q \in \mathbb{R}^{17 \times 50}$ are random,
full-rank matrices with Gaussian distributions. Such a construction will guarantee
that $A$ is a square, $50 \times 50$ matrix which has rank 17. Again, the SVD of this matrix
is limited by machine precision.
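A test matrix of this kind can be built in a few lines of Python/NumPy. The seed and variable names are arbitrary; `np.linalg.matrix_rank` is used only as an independent cut-off-based check and is not part of the method.

```python
import numpy as np

rng = np.random.default_rng(7)
p = rng.standard_normal((50, 17))
q = rng.standard_normal((17, 50))
a = p @ q                                   # 50 x 50, rank at most 17

s = np.linalg.svd(a, compute_uv=False)
print(np.linalg.matrix_rank(a))             # rank by NumPy's default cut-off
print(s[16], s[17])                         # 17th value is large, 18th is near machine zero
```

The 18th and later computed singular values are not exactly zero, which is the effect of machine precision mentioned above.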
Because we wish to prove the concept under ideal conditions, we will assume that
$A' = A$ and will not add an error term at this time ($E = 0$). For this experiment,
$F \sim N(0, 1)$. The sign of success will be that the method rejects $H_0$ 17 times. This
happens when the t-value is above the 99% threshold, or 2.75.
Figure 8: The method is accurate for a matrix without added noise.
[Plot title: Specified Rank: 17 | Determined Rank: 17 | Static Noise: 0. Legend: Threshold, t-values, Singular Values, 17th Singular Value. Logarithmic vertical axis.]
As shown in Fig. 8, the method correctly identifies the rank of the matrix under
an ideal situation, where the distance between $A$ and $A'$
examine the effects of introducing error into $A$.
4.2 How Does Noise in the Matrix Affect the Rank Estimate?
Using the same $50 \times 50$ matrix, we will add a static error term to make the matrix
closer to full rank. The objective of this test is to see if the method can detect the
intended rank of 17 behind the noise. Let $E \sim N(0, 0.01)$ and $F \sim N(0, 1)$. We will
add $E$ to $A$ in order to simulate a greater distance between $A$ and $A'$,
$$A' = A + E,$$
and create $n$ sample matrices
$$A^{[j]} = A' + F^{[j]}, \quad j = 1, \ldots, n.$$
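How far the static noise can move the singular values is bounded by a standard result, Weyl's inequality, which is not stated in this paper: no singular value of $A' = A + E$ differs from the corresponding singular value of $A$ by more than the 2-norm of $E$. The Python/NumPy sketch below checks this on a matrix built like the one in this experiment; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
a = rng.standard_normal((50, 17)) @ rng.standard_normal((17, 50))  # rank 17
e = rng.normal(0.0, np.sqrt(0.01), size=(50, 50))                  # E ~ N(0, 0.01)
a_prime = a + e

s = np.linalg.svd(a, compute_uv=False)
s_prime = np.linalg.svd(a_prime, compute_uv=False)

# Weyl's inequality: no singular value moves by more than the 2-norm of E.
shift = np.abs(s_prime - s).max()
print(shift, np.linalg.norm(e, 2))

# The 33 trailing singular values of A' are no longer near machine zero.
print(s_prime[17:].min(), s_prime[17:].max())
```

The trailing singular values of $A'$ now sit at the scale of the noise, which is why a larger variance in $E$ makes them harder to separate from the seventeen values carried by $A$.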
If we are successful in discovering the underlying rank of $A'$, the results should
look like those in Fig. 8; there will be 17 rejected singular values ($\sigma_i \neq 0$) which will
Figure 9: The method correctly determines the rank of a matrix with little static noise.
[Plot title: Specified Rank: 17 | Determined Rank: 17 | Static Noise: 0.01. Legend: Threshold, t-values, Singular Values, 17th Singular Value. Logarithmic vertical axis.]
In Fig. 9, we have successfully determined the "true" rank of the matrix. However,
we must acknowledge that the variance of $E$ is small. We will next consider the effects
of adding noise with larger variance. The next figure shows the effects of adding noise
Figure 10: Large variance in the static noise matrix, $E$, generates Type I errors, over-determining the matrix rank.
[Plot title: Specified Rank: 17 | Determined Rank: 22 | Static Noise: 0.1. Legend: Threshold, t-values, Singular Values, 17th Singular Value. Logarithmic vertical axis.]
The experiment in Fig. 10 shows that the rank is significantly over-determined.
Each singular value which is incorrectly determined to be non-zero constitutes a Type
I statistical error; $H_0$ was rejected when it should not have been. The method
over-determines the rank when the variance of $E$ is large due to the fact that $A'$ is no longer
close to rank 17. It has incurred sufficient noise to make it indistinguishable from a
higher-ranked matrix. This problem is not unique to this method. No rank-detection
method can purport to be able to recover the original rank from a sufficiently noisy
matrix.
In the previous example, we showed that Type I errors can occur due to noise
in the data matrix. The following example will illustrate the sources and meaning of
4.3 How Does the Manual Perturbation Term Affect the Rank Estimate?
Type II errors occur when the variance of the perturbation matrix, $F$, is too high. The
t-value that this method calculates describes the relative variance of the perturbed
singular values to the actual ones. Under ideal circumstances, only those singular
values that are small relative to the variance of $F$ will be perturbed a significant
amount. However, if the variance of $F$ is large, there is the potential to perturb
larger singular values enough to indicate that these larger singular values are close to
zero by increasing the magnitude of the denominator of the t-value. In this example,
let $E \sim N(0, 0.01)$ and $F \sim N(0, 6)$.
Figure 11: Large variance in the manual perturbation matrix, $F$, causes Type II errors, and an under-estimate in rank.
[Plot title: Specified Rank: 17 | Determined Rank: 13 | Static Noise: 0.1. Legend: Threshold, t-values, Singular Values, 17th Singular Value. Logarithmic vertical axis.]
In Fig. 11, the variance of the perturbation matrix, $F$, is 6, and the determined rank
found in $A'$
These Type II errors illustrate the effect that the variance of $F$ has on this method.
$F$ should be chosen carefully, using knowledge or hypotheses about $A$, in order to
obtain meaningful results.
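The effect of the variance of $F$ on a single t-value can be isolated in a short Python/NumPy sketch. The singular value 1.0 is an assumed example, and the same unit-variance draws are reused for both variances so that only the scale of the perturbation changes; the perturbation of one singular value is modeled directly as a normal draw with the variance of $F$.

```python
import numpy as np

def t_value(samples):
    """t-value against mean 0, using the sqrt(n - 1) form with ddof=0."""
    n = samples.size
    return samples.mean() / (samples.std(ddof=0) / np.sqrt(n - 1))

rng = np.random.default_rng(9)
n = 30
sigma_i = 1.0                 # a small but non-zero singular value (assumed)
z = rng.standard_normal(n)    # the same unit-variance draws for both cases

results = {}
for variance in (1.0, 6.0):
    samples = sigma_i + np.sqrt(variance) * z   # sigma_i plus scaled perturbation
    results[variance] = t_value(samples)
    print(variance, results[variance])          # the t-value falls as the variance grows
```

When the t-value of a non-zero singular value drops below 2.75, $H_0$ is not rejected and the rank is under-estimated.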
5 Computational Considerations
Calculating the SVD of the matrix $A^{[j]}$, where $A^{[j]} = A' + F^{[j]}$, $n$ times in order
to obtain a sample of $\boldsymbol{\sigma}$ is extremely expensive. However, we will show that the
singular value decomposition of $A$ can be calculated once and used repeatedly to
obtain samples of $\boldsymbol{\sigma}$. Let $\boldsymbol{\sigma}'$ represent the vector of the singular values of $A'$ and let
$F \sim N(0, \sigma^2)$ be our manual perturbation matrix as in Equation 2.
In theory, our samples are gathered by perturbing $A'$ $n$ times, or
$$A^{[j]} = A' + F^{[j]}, \quad j = 1, \ldots, n.$$
Recall that the singular value decomposition of $A'$ is $A' = U \Sigma V^T$.
We can now describe the perturbed $A'$ matrix as
$$A^{[j]} = U \Sigma V^T + F^{[j]}, \quad j = 1, \ldots, n.$$
Notice that by adding the perturbation matrix, $F^{[j]}$, we are generating a matrix close
to $A'$. We would like a way to gather samples of the singular values of $A^{[j]}$ without
each sample, then finding the singular values of $A$ becomes no longer a task of
decomposing a matrix, but of calculating
$$U^T A^{[j]} V = \Sigma^{[j]} = \Sigma + U^T F^{[j]} V,$$
or in our notation:
$$\boldsymbol{\sigma}^{[j]} = \boldsymbol{\sigma} + \mathrm{diag}(U^T F^{[j]} V).$$
If $A'$ is the closest representation of $A$ we have, then $U$, $\Sigma$ and $V$ will also be
the closest representation of the SVD of $A$ we have. Therefore, we will keep $U$
and $V$ constant. If we retain $U$ and $V$, gathering samples of singular values can
be done without calculating multiple decompositions. Notice, also, that each $\sigma^{[j]}_i$
is independent of all other $\sigma^{[j]}_k$, $i \neq k$. This means that it is possible to test only
one or a few singular values without having to multiply the entire matrix. This
property allows us to perform further optimization for large matrices, such as using
a bisection method to find the rank of $A$. The next section illustrates the specifics of
the complexity involved with sampling singular values.
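The identity above, and the fact that a single singular value can be sampled on its own, can be checked with a short Python/NumPy sketch. The matrix size, seed and index are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)
m = 40
a_prime = rng.standard_normal((m, m))
u, sigma, vt = np.linalg.svd(a_prime)        # computed once
f = rng.standard_normal((m, m))              # one perturbation F^[j]

# Full sample: sigma^[j] = sigma + diag(U^T F^[j] V).
sample = sigma + np.diag(u.T @ f @ vt.T)

# The same vector obtained from the perturbed matrix directly.
direct = np.diag(u.T @ (a_prime + f) @ vt.T)
print(np.allclose(sample, direct))

# A single entry needs only u_i and v_i: O(m^2) work instead of a full product.
i = 3
single = sigma[i] + u[:, i] @ f @ vt[i, :]
print(np.isclose(single, sample[i]))
```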
5.1 Complexity
The runtime of this method includes the time to calculate the singular value
decomposition of $A$ and the time it takes to compute an $m \times m$ matrix product $n$ times,
where $n$ is the sample size, or
$$O(m^3) + n \, O(m^3) \in O(m^3)$$
of the SVD, yet provides a quantified estimate of the rank of a matrix (In order to
provide a sound sample, $n$ does not need to be very large, and usually $m \gg n$).
What follows is an experiment with a large, $250 \times 250$ matrix. The method
correctly recognizes the rank of the matrix
Figure 12: The method provides encouraging results for large matrices
[Plot title: Specified Rank: 117 | Determined Rank: 117 | Static Noise: 0.01. Legend: Threshold, t-values, Singular Values, 117th Singular Value. Logarithmic vertical axis.]
It is evident from Fig. 12 that the method still holds promise with large matrices.
Like the SVD, this method of taking samples of singular values becomes very
expensive quickly with regard to $m$. However, the benefit to this price is a more accurate
6 Conclusions and Future Work
It is certain that determining the rank of a matrix, no matter what method is used,
is a hard problem. Many methods which use singular values to calculate rank have a
flat cutoff, where the rank is determined by where the ordered list of singular values
goes under a certain cutoff. With the method described in this paper, we have shown
how to obtain a probabilistic estimate of rank, based on the volatility of the singular
values of the matrix.
The results thus far are exciting, and there are numerous possible
optimizations to be included in future research. The relationship between the manual
perturbation and the static noise in the matrix needs to be studied. The
effects of numerical errors within the perturbations themselves should also be
studied. Comparisons with large, real-world data sets would also be useful, and, because
the variance of each singular value is independent of the others once we have $U \Sigma V^T$,
some form of bisection might be used to reduce the number of matrix multiplications
needed to find the rank of a matrix.
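One possible form of the bisection idea is sketched below in Python/NumPy. It is speculative: it assumes that the test outcome is monotone in the index (all non-zero singular values come first), which the text does not establish, and the helper names, defaults and seed are hypothetical.

```python
import numpy as np

def is_nonzero(i, u, sigma, vt, n, variance, rng, threshold):
    """t-test for singular value i only, using u_i and v_i (hypothetical helper)."""
    m = u.shape[0]
    draws = np.empty(n)
    for j in range(n):
        f = rng.normal(0.0, np.sqrt(variance), size=(m, m))  # F^[j]
        draws[j] = sigma[i] + u[:, i] @ f @ vt[i, :]
    t = draws.mean() / (draws.std(ddof=0) / np.sqrt(n - 1))
    return abs(t) >= threshold

def bisect_rank(a_prime, n=30, variance=1.0, threshold=2.75, seed=0):
    """Bisection over the ordered singular values.

    Assumes the test outcome is monotone in i (non-zero values first),
    which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    u, sigma, vt = np.linalg.svd(a_prime)
    lo, hi = 0, len(sigma)   # indices below lo passed the test, indices from hi on failed
    while lo < hi:
        mid = (lo + hi) // 2
        if is_nonzero(mid, u, sigma, vt, n, variance, rng, threshold):
            lo = mid + 1
        else:
            hi = mid
    return lo

rng = np.random.default_rng(11)
a = rng.standard_normal((50, 17)) @ rng.standard_normal((17, 50))
print(bisect_rank(a))   # probes about log2(50) singular values rather than all 50
```

Each probe is still subject to Type I and Type II errors, so a single wrong test can send the search to the wrong half; how to guard against this is left open here.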
References
[1] http://www-history.mcs.st-andrews.ac.uk/pictdisplay/householder.html.
[2] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. J.
McKen-trial and Applied Mathematics, Philadelphia, PA, USA, 1999.
[3] J. K. Backhouse, Statistics: An Introduction to Tests of Significance,
Longman's, Green and Co. Ltd., 1967.
[4] J. Bai and S. Ng, Determining the number of factors in approximate factor
models, Econometrica, 70 (2002), pp. 191–221.
[5] M. T. Chu and R. E. Funderlic, The centroid decomposition: Relationships
between discrete variational decompositions and SVD's, SIAM Journal on Matrix
Analysis and Applications, 23 (2002), pp. 1025–1044.
[6] A. Cline, C. Moler, G. Stewart, and J. H. Wilkinson, An estimate
for the condition number of a matrix, SIAM Journal on Numerical Analysis, 16
(1979), pp. 368–375.
[7] E. Drinea, P. Drineas, and P. Huggins, A randomized singular value
decomposition algorithm for image processing applications.
[8] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay, Clustering
large graphs via the singular value decomposition, Machine Learning, 56 (2004),
pp. 9–33.
[9] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation techniques for singular
value problems, SIAM Journal on Numerical Analysis, 32 (1995), pp. 1972–1988.
[10] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. A.
pp. 165–480.
[11] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning, Springer, 2001.
[12] J. Kamm and J. G. Nagy, Kronecker product and SVD approximations in image
restoration, Linear Algebra and its Applications, 284 (1998).
[13] J. Kamm and J. G. Nagy, Conditional maximum covariance analysis and its
application to the tropical Indian Ocean SST and surface wind stress anomalies,
Journal of Climate, 16 (2003).
[14] F. Kleibergen and R. Paap, Generalized reduced rank tests using the singular
value decomposition, Journal of Econometrics, Forthcoming.
[15] V. C. Klema and A. J. Laub, The singular value decomposition: Its
computation and some applications, IEEE Transactions on Automatic Control, AC-25
(1980), pp. 164–176.
[16] K. Konstantinides and K. Yao, Statistical analysis of effective singular
values in matrix rank determination, IEEE Transactions on Acoustics, Speech and
Signal Processing, 36 (1988), pp. 757–763.
[17] M. L. Overton, Numerical Computing with IEEE Floating Point Arithmetic,