**TR-CS-98-13**

**Finding Near Rank Deficiency in Matrix Products**

**Michael Stewart**

**December 1998**

### Joint Computer Science Technical Report Series

Department of Computer Science, Faculty of Engineering and Information Technology

Computer Sciences Laboratory

This technical report series is published jointly by the Department of Computer Science, Faculty of Engineering and Information Technology, and the Computer Sciences Laboratory, Research School of Information Sciences and Engineering, The Australian National University.

Please direct correspondence regarding this series to:

Technical Reports
Department of Computer Science
Faculty of Engineering and Information Technology
The Australian National University
Canberra ACT 0200
Australia

or send email to:

Technical.Reports@cs.anu.edu.au

A list of technical reports, including some abstracts and copies of some full reports, may be found at:

http://cs.anu.edu.au/techreports/

**Recent reports in this series:**

*TR-CS-98-12* Vadim Olshevsky and Michael Stewart. *Stable factorization of Hankel and Hankel-like matrices.* December 1998.

*TR-CS-98-11* Michael Stewart. *An error analysis of a unitary Hessenberg QR algorithm.* December 1998.

*TR-CS-98-10* Peter Strazdins. *Optimal load balancing techniques for block-cyclic decompositions for matrix factorization.* September 1998.

*TR-CS-98-09* Jim Grundy, Martin Schwenke, and Trevor Vickers (editors). *International Refinement Workshop & Formal Methods Pacific '98 — Work-in-progress papers of IRW/FMP'98, 29 September – 2 October 1998, Canberra, Australia.* September 1998.

*TR-CS-98-08* Jim Grundy and Malcolm Newey (editors). *Theorem Proving in Higher Order Logics: Emerging Trends — Proceedings of the 11th International Conference, TPHOLs'98, Canberra, Australia, September – October 1998, Supplementary Proceedings.* September 1998.

**Finding Near Rank Deficiency in Matrix Products**

**Michael Stewart**

**December 1, 1998**

**Abstract.** This paper gives a theorem characterizing approximately minimal norm rank one perturbations $E$ and $F$ that make the product $(A+E)(B+F)^T$ rank deficient. The theorem is stated in terms of the smallest singular value of a particular matrix chosen from a parameterized family of matrices by solving a nonlinear equation. Consequently, it is analogous to the special case of the Eckart-Young theorem describing the minimal perturbation that induces an order one rank deficiency. While the theorem does not naturally extend to higher order rank deficiencies, it can be used to compute a complete orthogonal product decomposition to give improved practical reliability in revealing the numerical rank of $AB^T$.

1 Introduction

We assume that the $m_a \times n$ and $m_b \times n$ matrices $A$ and $B$ with $n > m_a, m_b$ come from a model of the form

$$A = \hat{A} + E, \qquad B = \hat{B} + F \tag{1}$$

where $\hat{A}$, $\hat{B}$ or $\hat{A}\hat{B}^T$ exhibit some degree of rank deficiency and $E$ and $F$ are perturbations corrupting the exact data in $\hat{A}$ and $\hat{B}$. Without any loss of generality we assume that $m_a \le m_b$. Thus $A$, $B$, and $AB^T$ will generically have full rank even if rank deficiency of the underlying model implies that one or more of these perturbed matrices will be ill-conditioned.

Our primary goal is to recover the rank of $\hat{A}\hat{B}^T$ given $A$ and $B$ and to find estimates of the corresponding range and null spaces. To give full generality, we might also be concerned with simultaneously finding an estimate of the ranks of $\hat{A}$ and $\hat{B}$.

Before moving on to the central and most difficult problem of estimating the product rank, we describe a complete orthogonal product decomposition to jointly reveal the ranks of $\hat{A}$, $\hat{B}$ and $\hat{A}\hat{B}^T$ when the unperturbed matrices are available. The decomposition takes the form

$$U^T \hat{A} Q = \begin{bmatrix} A_{11} & 0 & 0 & 0 & 0 \\ A_{21} & A_{22} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad V^T \hat{B} Q = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & B_{23} & 0 & 0 \\ 0 & B_{32} & B_{33} & B_{34} & 0 \end{bmatrix} \tag{2}$$

where $U$, $V$ and $Q$ are square and orthogonal, the columns of the matrices are partitioned in the same way, and $A_{11}$, $A_{22}$, $B_{32}$ and $B_{23}$ are square and have full rank. If

$$r_a = \operatorname{rank}(\hat{A}), \qquad r_b = \operatorname{rank}(\hat{B}), \qquad r_p = \operatorname{rank}(\hat{A}\hat{B}^T)$$

then $A_{11}$ is $(r_a - r_p) \times (r_a - r_p)$, $A_{22}$ and $B_{32}$ are $r_p \times r_p$, and $B_{23}$ is $(r_b - r_p) \times (r_b - r_p)$.

One possible algorithm for computing this decomposition starts with an orthogonal rank revealing factorization of $\hat{A}$ to get

$$\begin{bmatrix} U^{(1)} & 0 \\ 0 & V^{(1)} \end{bmatrix}^T \begin{bmatrix} \hat{A} \\ \hat{B} \end{bmatrix} Q^{(1)} = \begin{bmatrix} A_1^{(1)} & 0 \\ 0 & 0 \\ B_1^{(1)} & B_2^{(1)} \end{bmatrix} \tag{3}$$

where $A_1^{(1)}$ is $r_a \times r_a$ and has full rank. Since

$$\operatorname{rank}\left(\hat{A}\hat{B}^T\right) = \operatorname{rank}\left(A_1^{(1)} B_1^{(1)T}\right) = r_p,$$

a rank revealing factorization of $B_1^{(1)}$ gives

$$\begin{bmatrix} I & 0 \\ 0 & V^{(2)} \end{bmatrix}^T \begin{bmatrix} A_1^{(1)} & 0 \\ 0 & 0 \\ B_1^{(1)} & B_2^{(1)} \end{bmatrix} Q^{(2)} = \begin{bmatrix} A_1^{(2)} & A_2^{(2)} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B_{13}^{(2)} \\ 0 & B_{22}^{(2)} & B_{23}^{(2)} \end{bmatrix}$$

where $B_{22}^{(2)}$ is $r_p \times r_p$ and has full rank. Clearly $\operatorname{rank}(B_{13}^{(2)}) = r_b - r_p$. A rank revealing factorization of $B_{13}^{(2)}$ gives

$$\begin{bmatrix} I & 0 \\ 0 & V^{(3)} \end{bmatrix}^T \begin{bmatrix} A_1^{(2)} & A_2^{(2)} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B_{13}^{(2)} \\ 0 & B_{22}^{(2)} & B_{23}^{(2)} \end{bmatrix} Q^{(3)} = \begin{bmatrix} A_1^{(3)} & A_2^{(3)} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & B_{23}^{(3)} & 0 \\ 0 & B_{32}^{(3)} & B_{33}^{(3)} & B_{34}^{(3)} \end{bmatrix}$$

where $B_{23}^{(3)}$ is $(r_b - r_p) \times (r_b - r_p)$ and $B_{32}^{(3)} = B_{22}^{(2)}$. A further transformation $U^{(3)}$ can be used to give $\hat{A}$ the desired block triangular structure. To get (2) the $r_p \times (n - r_a - r_b + r_p)$ matrix $B_{34}^{(3)}$ can be compressed into $r_p$ nonzero columns using a further transformation $Q^{(4)}$. Clearly the singular values of the product $A_{22} B_{32}^T$ are the nonzero singular values of $\hat{A}\hat{B}^T$. Related decompositions may be found in [3, 1].
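As an illustration of the rank decisions above, the quantities $r_a$, $r_b$ and $r_p$ can be recovered from unperturbed data with a few SVDs. The following numpy sketch is our own; the tolerance convention is an assumption, not part of the decomposition itself:

```python
import numpy as np

def product_ranks(A, B, tol=1e-10):
    """Estimate r_a = rank(A), r_b = rank(B), r_p = rank(A B^T) using the
    first step of (3): a rank revealing factorization of A followed by an
    SVD of B_1^(1) = B Q_1, where Q_1 spans the row space of A."""
    _, sa, Vat = np.linalg.svd(A)
    r_a = int(np.sum(sa > tol * max(sa[0], 1.0)))
    Q1 = Vat[:r_a].T                 # leading right singular vectors of A
    B1 = B @ Q1                      # B_1^(1) in the notation of (3)
    s1 = np.linalg.svd(B1, compute_uv=False)
    r_p = int(np.sum(s1 > tol * max(s1[0], 1.0)))
    sb = np.linalg.svd(B, compute_uv=False)
    r_b = int(np.sum(sb > tol * max(sb[0], 1.0)))
    return r_a, r_b, r_p
```

Since $A_1^{(1)}$ has full rank, $\operatorname{rank}(A_1^{(1)}B_1^{(1)T}) = \operatorname{rank}(B_1^{(1)})$, which is why an SVD of $B_1^{(1)}$ alone suffices here.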

The above discussion shows that $r_p$ can be found from the SVD of $B_1^{(1)}$. It can also be found directly from the singular values of $\hat{A}\hat{B}^T$. However, without access to the unperturbed $\hat{A}$ and $\hat{B}$, neither of these approaches is entirely satisfactory. For the latter this can be seen from the two examples $A = B = \sqrt{\epsilon}$ and $A = 1$, $B = \epsilon$. The product singular value, $\epsilon$, is the same, but in the second example $A$ and $B$ come from a model of the form (1) with $E, F = O(\epsilon)$ and $\operatorname{rank}(\hat{A}\hat{B}^T) = 0$; in the first example they do not. Looking for small singular values in $B_1^{(1)}$ can also fail. The problem is that errors in $B_1^{(1)}$ can be magnified by the ill-conditioning of $A$. Consider

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & \epsilon & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{4}$$

where $0 < \epsilon < \delta < 1$. The element $\epsilon$ represents a perturbation to

$$\hat{A} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & 0 & 0 \end{bmatrix}.$$

The matrix $\hat{B}$ is unperturbed. We suppose that $\delta$ is significantly smaller than 1 but that it is large enough that $A$ can be considered to have full rank. We assume that $\epsilon$ is small enough that it is of the same order as the tolerance used in rank decisions.

We consider the algorithm outlined in the derivation of (2) applied to the perturbed matrices $A$ and $B$. The algorithm starts with an orthogonal transformation determined by the LQ factorization of $A$:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & \epsilon & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & \frac{-\epsilon}{\sqrt{\delta^2+\epsilon^2}} & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{\delta^2+\epsilon^2} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & 0 \end{bmatrix}.$$

This is the factorization step represented in (3). Having computed a trivial rank revealing factorization of $A$, we proceed by determining the rank of $B_1^{(1)}$. If $\epsilon = 10^{-16}$ and $\delta = 10^{-8}$ then

$$B_1^{(1)} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-8} \end{bmatrix}.$$

Without the perturbation, $B_1^{(1)}$ will be exactly rank deficient. However, if $\epsilon$ is not zero and $\delta$ is at all small, we get a very hard rank decision. Since we have assumed that $\delta$ is greater than the tolerance, we would conclude for these chosen values of $\epsilon$ and $\delta$ that $B_1^{(1)}$ has full rank and that $r_p = 2$. The end result is a misleadingly partitioned decomposition that fails to reveal near rank deficiency in $AB^T$.
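This failure mode can be reproduced numerically. The following check is our own illustration using the matrices of (4) with the stated values of $\epsilon$ and $\delta$:

```python
import numpy as np

eps, delta = 1e-16, 1e-8
A = np.array([[1., 0, 0, 0],
              [0, delta, eps, 0]])
B = np.array([[1., 0, 0, 0],
              [0, 0, 1, 0]])

# Smallest singular value of the product is ~eps: A B^T is nearly rank one.
s_prod = np.linalg.svd(A @ B.T, compute_uv=False)

# B_1^(1) = B Q_1, with Q_1 the leading right singular vectors of A.
_, _, Vat = np.linalg.svd(A)
B1 = B @ Vat[:2].T
s_B1 = np.linalg.svd(B1, compute_uv=False)
# s_B1[-1] is ~1e-8 = delta: a hard rank decision suggesting r_p = 2.
```

The smallest singular value of $B_1^{(1)}$ sits eight orders of magnitude above the product's smallest singular value, which is what drives the wrong decision $r_p = 2$.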

As an alternative to looking at the product singular values or at the singular values of $B_1^{(1)}$, we propose a generalization of the Eckart-Young theorem to find nearly minimal perturbations $E$ and $F$ that make $(A-E)(B-F)^T$ rank deficient. The ordinary Eckart-Young theorem may be stated as follows. Let $A = U \begin{bmatrix} \Sigma & 0 \end{bmatrix} V^T$ be the SVD of $A$ and let

$$\hat{A} = U \begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix} V^T,$$

where $\hat{\Sigma}$ is obtained from $\Sigma$ by setting the singular values after the first $p$ to zero; then the Eckart-Young theorem states that

$$\|A - \hat{A}\|_{2,F} = \min_{\operatorname{rank}(\tilde{A}) \le p} \|A - \tilde{A}\|_{2,F}. \tag{5}$$

Thus small singular values indicate that $A$ is close to some rank deficient $\hat{A}$. This is the basis of the standard approach to rank estimation, [4]; the effectiveness of most alternate approaches is typically measured in terms of their ability to reveal the presence of small singular values. In generalizing (5) to recover the rank of $\hat{A}\hat{B}^T$ from $A$ and $B$ we will estimate

$$d^2(A, B) = \min_{\operatorname{rank}(\hat{A}\hat{B}^T) \le (m_a - 1)} \|A - \hat{A}\|_F^2 + \|B - \hat{B}\|_F^2. \tag{6}$$

Our estimate of (6) depends on a perturbation expansion and will not be exact; the estimated $d^2(A, B)$ can be larger than the true value by a quantity that is $O(d^3(A, B))$.
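The Eckart-Young statement (5) is easy to confirm numerically. This small check, our own and using random data, truncates an SVD and compares the 2-norm error against the first discarded singular value:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

p = 2
A_p = (U[:, :p] * s[:p]) @ Vt[:p]   # best rank-p approximation of A

# For the optimal rank-p approximation, ||A - A_p||_2 = sigma_{p+1}.
err = np.linalg.norm(A - A_p, 2)
```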

In §2 we state and prove the main theorem. We outline an algorithm that attempts to compute a completely rank revealing product decomposition in §4. Given prescribed ranks $r_a$, $r_b$ and $r_p$, the goal is to compute a decomposition that reveals nearly minimal perturbations to find nearby $\hat{A}$ and $\hat{B}$ with the prescribed product SVD structure. The performance of the algorithm as judged by this standard depends on the accuracy of the perturbation expansion given in §2 and on the accuracy of a deflation step that is used to generalize the decomposition to the case of higher order rank deficiency. Both pose difficulties which we will illustrate with examples. Nevertheless, in practice the overall method seems to be more robust than other methods for estimating the rank of a matrix product.

2 The Eckart-Young Generalization

We assume that $\operatorname{rank}(AB^T) = m_a$ and that $A$ and $B$ come from (1) for some rank deficient $\hat{A}\hat{B}^T$. We will develop an estimate of $d^2(A, B)$ that can be computed using just $A$ and $B$. The estimate is based on an expansion in terms of the perturbations $E$ and $F$. To provide a compact notation for neglecting higher order terms, we will set

$$\epsilon = \max(\|E\|_F, \|F\|_F)$$

and freely ignore $O(\epsilon^2)$ terms in expressions for quantities that are $O(1)$ or $O(\epsilon)$. In some cases we will neglect terms of $O(\epsilon^3)$ in expansions of terms that are $O(\epsilon^2)$. The occasional need for a higher order expansion arises from a theorem from [5] in which second order terms are kept to retain accuracy in a perturbation expansion for a singular value that is very small or exactly zero. All vector norms will be 2-norms. For a matrix $X$ for which $XX^T$ is nonsingular we define the projection

$$P_X = X^T \left( X X^T \right)^{-1} X.$$
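The projector $P_X$ (and its complement $P_X^\perp = I - P_X$) can be formed explicitly; the following quick sanity check of the definition is our own, using random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 6))           # full row rank with probability 1
P_X = X.T @ np.linalg.solve(X @ X.T, X)   # projection onto the row space of X
P_perp = np.eye(6) - P_X                  # the complementary projector
```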

**Theorem 1** Given

$$A = \hat{A} + E, \qquad B = \hat{B} + F$$

we assume that $AB^T$ has full rank and that $\hat{A}\hat{B}^T$ has rank $m_a - 1$. For some $u$ satisfying $\|u\| = 1$ let

$$y = \frac{A^T u}{\|A^T u\|}, \qquad g = \left( P_A^\perp B^T B P_A^\perp + \|A^T u\|^2 I_n \right)^{-1} P_A^\perp B^T B A^T u$$

and

$$f = B\left( y - \frac{1}{\|A^T u\|} g \right) = \|A^T u\| \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T u.$$

Then

$$u^T \left( A - u g^T \right)\left( B - f y^T \right)^T = 0. \tag{7}$$

If $u$ satisfies

$$\lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T \right) = u^T A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T u \tag{8}$$

where $\lambda_{m_a}(\cdot)$ is the smallest eigenvalue (singular value) of the $m_a \times m_a$ matrix, and $\|u\| = 1$ and $\hat{A}$ has full rank, then for sufficiently small perturbations $E$ and $F$

$$\|u g^T\|_{2,F}^2 + \|y f^T\|_{2,F}^2 = \lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T \right) \tag{9}$$

$$\le \left( \|E\|_F + \|F\|_F \right)^2 + O(\epsilon^3). \tag{10}$$

A solution to (8) with $\|u\| = 1$ always exists.

The theorem shows that $A - u g^T$ and $B - f y^T$ are a pair with a rank deficient product that is nearly as close as possible to $A$ and $B$ in the sense of (6).

The conditions that make (10) valid are hard to state in a more precise form. This and the fact that the phrase "sufficiently small" must be interpreted in terms of the unknown matrices $\hat{A}$ and $\hat{B}$ pose significant problems in evaluating the quality of the expansion. Nevertheless, experiments suggest that (10) is often a significant improvement over other methods for finding rank deficiency in a matrix product.

The theorem makes five claims: the existence of a solution to (8), the equivalence of the two relations for $f$, the fact that $u$ is a null vector of the perturbed matrix pair in (7), the equivalence of the eigenvalue $\lambda_{m_a}$ with the sums of the norms of the perturbations in (9), and the upper bound (10). The distance estimate (10) is the most difficult to verify. The presence of the term $\|A^T u\|^2 I_{m_b}$, which depends on $u$ itself, is the source of the nonlinear condition (8) of the theorem.

To prove (9) we will show that the equality

$$\|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2 = u^T A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T u$$

holds for any $u$ with $\|u\| = 1$. Since (8) holds by assumption this implies (9). To complete this proof we note that as it is defined in the theorem $g$ is the solution to the least squares problem

$$\min_g \left\| \begin{bmatrix} \frac{1}{\|A^T u\|} B P_A^\perp \\ I_n \end{bmatrix} g - \begin{bmatrix} \frac{1}{\|A^T u\|} B A^T u \\ 0 \end{bmatrix} \right\|^2.$$

From the presence of the projection $P_A^\perp$ in the first $m_b$ rows of the least squares problem and the penalty on the norm of $g$ in the last $n$ rows, it follows that $P_A^\perp g = g$. Since $\|u\| = \|y\| = 1$ and since $P_A^\perp g = g$, the residual of the least squares problem is

$$\left\| \frac{1}{\|A^T u\|} B P_A^\perp g - \frac{1}{\|A^T u\|} B A^T u \right\|^2 + \|g\|^2 = \left\| \frac{1}{\|A^T u\|} B g - B y \right\|^2 + \|g\|^2 = \|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2.$$

To make the connection with the right hand side of (9), we can get an alternate formula for this residual using the orthogonality property of least squares solutions. In particular

$$\|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2 = \begin{bmatrix} \frac{1}{\|A^T u\|} u^T A B^T & 0 \end{bmatrix} \left( \begin{bmatrix} \frac{1}{\|A^T u\|} B A^T u \\ 0 \end{bmatrix} - \begin{bmatrix} \frac{1}{\|A^T u\|} B P_A^\perp \\ I_n \end{bmatrix} g \right)$$

$$= u^T A B^T \left( \frac{1}{\|A^T u\|^2} I_{m_b} - \frac{B P_A^\perp}{\|A^T u\|^2} \left( \frac{1}{\|A^T u\|^2} P_A^\perp B^T B P_A^\perp + I_n \right)^{-1} \frac{P_A^\perp B^T}{\|A^T u\|^2} \right) B A^T u$$

$$= u^T A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T u.$$

The final equality follows from the Sherman-Morrison-Woodbury matrix inversion formula

$$\left( X + Y Z^T \right)^{-1} = X^{-1} - X^{-1} Y \left( I + Z^T X^{-1} Y \right)^{-1} Z^T X^{-1}.$$

Together with (8) this establishes (9).
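The Sherman-Morrison-Woodbury identity used in the last step is easy to verify numerically. This standalone check with random, well-conditioned data is our own addition:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.diag(rng.uniform(1.0, 2.0, size=4))   # well-conditioned diagonal X
Y = rng.standard_normal((4, 2))
Z = rng.standard_normal((4, 2))

Xi = np.linalg.inv(X)
lhs = np.linalg.inv(X + Y @ Z.T)
rhs = Xi - Xi @ Y @ np.linalg.inv(np.eye(2) + Z.T @ Xi @ Y) @ Z.T @ Xi
```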

Verification that the two equations for $f$ are equivalent is by use of $P_A^\perp g = g$ (which lets $P_A^\perp$ be harmlessly inserted into the first formula for $f$), substitution of the expressions for $y$ and $g$, and another application of the Sherman-Morrison-Woodbury formula.

To verify (7), note that $P_A^\perp g = g$ and $y \in \mathcal{R}(A^T)$ imply $y^T g = 0$, so that

$$u^T \left( A - u g^T \right)\left( B - f y^T \right)^T = u^T A B^T - g^T B^T - u^T A y \left( y^T - \frac{1}{\|A^T u\|} g^T \right) B^T = u^T A \left( I - y y^T \right) B^T - g^T B^T + \frac{u^T A y}{\|A^T u\|} g^T B^T = 0.$$

The cancellations happen because $u^T A y = \|A^T u\|$ and

$$u^T A \left( I - y y^T \right) = \|A^T u\| y^T \left( I - y y^T \right) = 0.$$

To show that a solution to (8) exists, define

$$\lambda_{m_a}(\gamma) = \lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \gamma^2 I_{m_b} \right)^{-1} B A^T \right)$$

for $\gamma > 0$. Let $u_{m_a}(\gamma)$ be an eigenvector associated with $\lambda_{m_a}(\gamma)$. We need to show that there is a $\gamma_0$ such that

$$\|A^T u_{m_a}(\gamma_0)\| = \gamma_0 \tag{12}$$

for some choice of the singular vector $u_{m_a}(\gamma_0)$. If $u_{m_a}(\gamma)$ were unique (up to sign change) for each $\gamma$ and continuous as a function of $\gamma$, then the proof would be very simple. Since $A$ by assumption has full rank

$$f(\gamma) = \|A^T u_{m_a}(\gamma)\| - \gamma > 0$$

for sufficiently small $\gamma$. For sufficiently large $\gamma$, $f(\gamma) < 0$. Hence continuity would guarantee a solution for which $f(\gamma) = 0$. Unfortunately, a rigorous proof is complicated by possible discontinuities in $u_{m_a}(\gamma)$ that can occur when the eigenvalue $\lambda_{m_a}(\gamma)$ is repeated. The proof of the following result uses only basic ideas from analysis together with the well known fact that a particular eigenvalue of a family of matrices varying continuously with $\gamma > 0$ is also continuous in $\gamma$.

**Theorem 2** If $A$ has full rank then (8) is satisfied for some $u$ normalized so that $\|u\| = 1$.

**Proof:** As we have noted, we need to show that (12) holds for some $\gamma_0 > 0$. Since $A$ has full rank

$$f(\gamma) = \min_{u_{m_a}} \|A^T u_{m_a}(\gamma)\| - \gamma > 0$$

for all sufficiently small $\gamma > 0$. We also have $f(\gamma) < 0$ for all sufficiently large $\gamma > 0$. The minimum that defines $f(\gamma)$ is taken over all possible choices of $u_{m_a}(\gamma)$ (i.e. all norm one vectors in the subspace spanned by the eigenvectors associated with $\lambda_{m_a}(\gamma)$). Let $\gamma_0 > 0$ be defined by

$$\gamma_0 = \sup\{\gamma \mid f(\gamma) \ge 0\}.$$

From this definition it follows that either $f(\gamma_0) \ge 0$ or in any interval $[\gamma_0 - \epsilon, \gamma_0]$ there must be an infinite number of points for which $f(\gamma) \ge 0$. In either case there exist sequences $u_k$ and $\gamma_k$ such that $\|A^T u_k\| \ge \gamma_k$, where $u_k$ is an eigenvector associated with $\lambda_{m_a}(\gamma_k)$, $\gamma_k \le \gamma_0$ for all $k$ and $\gamma_k \to \gamma_0$. The continuity of eigenvalues implies that the bounded sequence $u_k$ has some subsequence converging to $\bar{u}$ such that $\|A^T \bar{u}\| \ge \gamma_0$, where $\bar{u}$ is an eigenvector associated with $\lambda_{m_a}(\gamma_0)$. Similarly, the definition of $\gamma_0$ implies that $f(\gamma) < 0$ for $\gamma > \gamma_0$. From this the same argument used to construct $\bar{u}$ implies that there is an eigenvector $\underline{u}$ such that

$$\|A^T \underline{u}\| \le \gamma_0 \le \|A^T \bar{u}\|.$$

It follows that there are scalars $c$ and $s$ such that $c^2 + s^2 = 1$ and $u = c\bar{u} + s\underline{u}$ is also an eigenvector associated with $\lambda_{m_a}(\gamma_0)$ and $\|A^T u\| = \gamma_0$.

At this point we have verified all of Theorem 1 except for (10). To finish the proof, we start with the following perturbation expansion for singular values from [5].

**Theorem 3** For a general $m_a \times m_b$ matrix $\hat{D}$ with SVD

$$U^T \hat{D} V = \begin{bmatrix} U_1 & u_2 \end{bmatrix}^T \hat{D} \begin{bmatrix} V_1 & v_2 & V_3 \end{bmatrix} = \begin{bmatrix} \hat{\Sigma} & 0 & 0 \\ 0 & \hat{\sigma}_{m_a} & 0 \end{bmatrix}$$

with $U^T U = I_{m_a}$, $V^T V = I_{m_b}$ and invertible $\hat{\Sigma} - \hat{\sigma}_{m_a} I_{m_a - 1}$, let

$$U^T H V = \begin{bmatrix} G_{11} & g_{12} & G_{13} \\ g_{21}^T & \gamma_{22} & g_{23}^T \end{bmatrix}$$

and $h = \hat{\sigma}_{m_a} g_{12} + \hat{\Sigma} g_{21}$. Then $D = \hat{D} + H$ has a singular value

$$\sigma_{m_a}^2 = \left( \hat{\sigma}_{m_a} + \gamma_{22} \right)^2 + \|g_{21}\|^2 + \|g_{23}\|^2 + h^T \left( \hat{\sigma}_{m_a}^2 I_{m_a - 1} - \hat{\Sigma}^2 \right)^{-1} h + O(\|H\|^3).$$

By considering second order terms this expansion can accurately characterize the effect of a perturbation on a zero singular value $\hat{\sigma}_{m_a} = 0$ so long as $\hat{\Sigma}$ has full rank. In particular, if $\hat{\sigma}_{m_a} = 0$ then

$$\sigma_{m_a}^2 = \gamma_{22}^2 + \|g_{23}\|^2. \tag{13}$$

By assumption $\hat{A}\hat{B}^T$ has rank $m_a - 1$. If we define

$$\hat{C} = \hat{A}\hat{B}^T \left( \hat{B} P_{\hat{A}}^\perp \hat{B}^T + \|A^T u\|^2 I_{m_b} \right)^{-1} \hat{B}\hat{A}^T$$

for some particular choice of $u$ satisfying (8) and also define

$$\hat{D} = \hat{A}\hat{B}^T \left( \hat{B} P_{\hat{A}}^\perp \hat{B}^T + \|A^T u\|^2 I_{m_b} \right)^{-1/2}$$

then both $\hat{C}$ and $\hat{D}$ have rank $m_a - 1$. Consequently $\hat{\Sigma}$ as defined in terms of $\hat{D}$ in Theorem 3 has full rank. This shows that the theorem can be applied to estimate the effect of the perturbations $E$ and $F$ on $\hat{\sigma}_{m_a} = \sigma_{m_a}(\hat{D})$.

If we similarly define

$$D = A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1/2}$$

then we will derive an expansion of the form

$$D = \hat{D} + H + O(\epsilon^2) \tag{14}$$

where $H$ is a perturbation defined in terms of $E$ and $F$. Note that in defining $D$ and $\hat{D}$ we have used the quantity $\|A^T u\|$ with the perturbed $A$ in both cases. Using the expression we will derive for $H$ and Theorem 3 we will show that if $u$ satisfies (8) then

$$\lambda_{m_a}(C) = \sigma_{m_a}^2(D) \le \left( \|E\|_F + \|F\|_F \right)^2 + O(\epsilon^3)$$

where $C$ is defined in terms of $A$ and $B$ in a manner analogous to the definition of $\hat{C}$. This will complete the proof of (10).

The matrix square root as we have defined it is nonunique. However, since $\sigma_{m_a}(D)$ does not depend on the choice of square root, we are free to choose particular square roots $D$ and $\hat{D}$ to make $H$ suitably small, subject only to the constraints $D D^T = C$ and $\hat{D}\hat{D}^T = \hat{C}$. We start by choosing an arbitrary factorization of the form

$$\hat{S}\hat{S}^T = \left( \hat{B} P_{\hat{A}}^\perp \hat{B}^T + \|A^T u\|^2 I_{m_b} \right)^{-1}.$$

This is guaranteed to exist since the inverse matrix is symmetric and positive definite. We let

$$\hat{D} = \hat{A}\hat{B}^T \hat{S}.$$

For sufficiently small $E$ and $F$ perturbing $\hat{A}$ and $\hat{B}$ we can choose a square root $S$ such that

$$S = \hat{S} + H_S + O(\epsilon^2)$$

where $\|H_S\| = O(\epsilon)$. It turns out that the precise magnitude of $\|H_S\|$ is of no consequence to the analysis; all that matters is that terms of the form $O(\epsilon H_S)$ are $O(\epsilon^2)$ and are consequently negligible.

We can expand $D$ in terms of $E$, $F$ and $H_S$ as follows:

$$D = \left( \hat{A} + E \right)\left( \hat{B} + F \right)^T \left( \hat{S} + H_S \right) = \hat{A}\hat{B}^T\hat{S} + E\hat{B}^T\hat{S} + \hat{A}F^T\hat{S} + \hat{A}\hat{B}^T H_S + O(\epsilon^2)$$

$$= \hat{A}\hat{B}^T\hat{S} + E P_{\hat{A}} \hat{B}^T\hat{S} + E P_{\hat{A}}^\perp \hat{B}^T\hat{S} + \hat{A}F^T\hat{S} + \hat{A}\hat{B}^T H_S + O(\epsilon^2) = \hat{A}\hat{B}^T\hat{S} + H.$$

By the assumption that $\operatorname{rank}(\hat{A}\hat{B}^T) = m_a - 1$, the matrix $\hat{D} = \hat{A}\hat{B}^T\hat{S}$ has rank $m_a - 1$ and a unique left null vector $u_2$ that depends only on the left null space of $\hat{A}\hat{B}^T$ and not on $\hat{S}$ or the value of $\|A^T u\|$.

We consider Theorem 3 applied to the smallest singular value $\sigma_{m_a}(\hat{D}) = 0$ with the perturbation $H$. Using (13) we get

$$\sigma_{m_a}^2(D) = \left\| u_2^T H \begin{bmatrix} v_2 & V_3 \end{bmatrix} \right\|^2 + O(\epsilon^3)$$

where $u_2$ is the left null vector of $\hat{D}$. Note that

$$P_{\hat{A}} \hat{B}^T \hat{S} \begin{bmatrix} v_2 & V_3 \end{bmatrix} = \hat{A}^T \left( \hat{A}\hat{A}^T \right)^{-1} \hat{D} \begin{bmatrix} v_2 & V_3 \end{bmatrix} = 0$$

so that many of the terms in the expansion for $H$ share either common left or common right null vectors with $\hat{D}$. Consequently

$$\sigma_{m_a}^2(D) = \left\| u_2^T \left( \hat{A}F^T\hat{S} + E P_{\hat{A}}^\perp \hat{B}^T\hat{S} \right) \begin{bmatrix} v_2 & V_3 \end{bmatrix} \right\|^2 + O(\epsilon^3) \le \left( \|\hat{A}^T u_2\| \|\hat{S}\|_2 \|F\|_F + \|P_{\hat{A}}^\perp \hat{B}^T\hat{S}\|_2 \|E\|_F \right)^2 + O(\epsilon^3).$$

For $E$ and $F$ that are sufficiently small, the assumption that $\operatorname{rank}(\hat{A}\hat{B}^T\hat{S}) = \operatorname{rank}(\hat{A}\hat{B}^T) = m_a - 1$ guarantees that $u$ is a perturbed version of the left null vector $u_2$. Since $u = u_2 + O(\epsilon)$ we have

$$\|\hat{A}^T u_2\| = \|A^T u\| + O(\epsilon).$$

The qualification that $E$ and $F$ must be sufficiently small is standard in any first order perturbation expansion of a singular vector associated with an isolated singular value.

The fact that

$$\|\hat{S}\|_2 \le \frac{1}{\|A^T u\|} + O(\epsilon)$$

guarantees that

$$\|\hat{A}^T u_2\| \|\hat{S}\|_2 \le 1 + O(\epsilon).$$

Similarly

$$\|P_{\hat{A}}^\perp \hat{B}^T \hat{S}\|_2 \le 1 + O(\epsilon).$$

Thus

$$\sigma_{m_a}^2(D) \le \left( \|E\|_F + \|F\|_F \right)^2 + O(\epsilon^3) \le 2\left( \|E\|_F^2 + \|F\|_F^2 \right) + O(\epsilon^3). \tag{15}$$

Since $\sigma_{m_a}^2(D) = \lambda_{m_a}(C)$, at this point we have proven all of the claims of Theorem 1.

We can offer further justification for the assumption that $\hat{A}\hat{B}^T$ has rank $m_a - 1$. Instead of letting $E$ and $F$ be arbitrary perturbations, we define

$$(E, F) = \operatorname*{argmin}_{\{(E,F) \,\mid\, \operatorname{rank}((A+E)(B+F)^T) < m_a\}} \|E\|_F^2 + \|F\|_F^2 \tag{16}$$

so that

$$d^2(A, B) = \|E\|_F^2 + \|F\|_F^2.$$

With $\hat{A} = A + E$ and $\hat{B} = B + F$, the matrix pair $\hat{A}$ and $\hat{B}$ is a closest pair to $A$ and $B$ for which $\hat{A}\hat{B}^T$ is rank deficient.

**Theorem 4** For $AB^T$ with rank $m_a$, the minimizing perturbations $(E, F)$ of (16) give $\operatorname{rank}\left( (A+E)(B+F)^T \right) = m_a - 1$.

**Proof:** Any degree of rank deficiency in $\hat{A}\hat{B}^T$ implies that there exists a vector $u$ such that $u^T \hat{A}\hat{B}^T = 0$ and $\|u\| = 1$. This in turn implies that orthogonal transformations $U$ and $Q$ can be constructed to give

$$U^T \hat{A} Q = \begin{bmatrix} \hat{a}_{11} & 0 \\ \hat{a}_{21} & \hat{A}_{22} \end{bmatrix}, \qquad \hat{B} Q = \begin{bmatrix} 0 & \hat{B}_2 \end{bmatrix} \tag{17}$$

where $U e_1 = u$ and $\hat{a}_{11}$ is a scalar. Since perturbing the nonzero elements of (17) only increases the Frobenius norm of the perturbations, it follows from the assumed minimality of the perturbations that

$$U^T A Q = \begin{bmatrix} \hat{a}_{11} & e^T \\ \hat{a}_{21} & \hat{A}_{22} \end{bmatrix}, \qquad B Q = \begin{bmatrix} f & \hat{B}_2 \end{bmatrix}$$

and

$$E = -U \begin{bmatrix} 0 & e^T \\ 0 & 0 \end{bmatrix} Q^T, \qquad F = -\begin{bmatrix} f & 0 \end{bmatrix} Q^T. \tag{18}$$

These perturbations induce a higher order rank deficiency in $\hat{A}\hat{B}^T$ only if

$$\operatorname{rank}\left( \hat{A}_{22}\hat{B}_2^T \right) < m_a - 1.$$

However, if that is the case and $e \ne 0$, then the strictly smaller pair of perturbations

$$E = 0, \qquad F = -\begin{bmatrix} f & 0 \end{bmatrix} Q^T$$

gives $\operatorname{rank}(\hat{A}\hat{B}^T) < m_a$. This contradicts the assumed minimality of $E$ and $F$. On the other hand, if $e = 0$ then

$$\hat{A}\hat{B}^T = A B^T - A Q \begin{bmatrix} f & 0 \end{bmatrix}^T$$

so that the perturbation to $AB^T$ is rank one and cannot cause the rank of $\hat{A}\hat{B}^T$ to be less than $m_a - 1$.

It follows from this theorem that the closest pair to $A$ and $B$ with a rank deficient product has $\operatorname{rank}(\hat{A}\hat{B}^T) = m_a - 1$. This pair can serve as the $\hat{A}$ and $\hat{B}$ mentioned in Theorem 1, giving

$$\|u g^T\|_{2,F}^2 + \|y f^T\|_{2,F}^2 \le d^2(A, B) + O(\epsilon^3).$$

While the observation that $\hat{A}\hat{B}^T$ defined in this way always has rank $m_a - 1$ is comforting, it doesn't really give obvious benefit in evaluating the quality of the expansion (10).

The following example shows that the expansion can fail dramatically if $\hat{A}$ does not have full rank.

**Example 2** Consider the matrix pair

$$A = \begin{bmatrix} \epsilon & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \end{bmatrix} \tag{19}$$

for $0 < \epsilon \ll 1$. While we do not rigorously prove the fact, it is easy to show that the matrix pair that is closest to $A$ and $B$ in the sense of (6) is $\hat{A} = \begin{bmatrix} 0 & 0 \end{bmatrix}$, $\hat{B} = B$: perturbing $B$ as well results in at best a negligible $O(\epsilon^2)$ change to $\hat{A}\hat{B}^T$. That Theorem 1 fails follows from the fact that for $m_a = 1$, we have $u = 1$ and

$$\lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \|A^T u\|^2 I_{m_b} \right)^{-1} B A^T \right) = \epsilon \left( 0 + \epsilon^2 I_{m_b} \right)^{-1} \epsilon = 1.$$

The method of looking at the smallest singular value of $B_1^{(1)}$, $\sigma_1(B_1^{(1)}) = 1$, fails in exactly the same way on (19). However, in less contrived problems the expansion in Theorem 1 often retains its accuracy even when $A$ is very ill-conditioned and the smallest singular value of $B_1^{(1)}$ is orders of magnitude larger than $d(A, B)$. It is only in the higher order terms that ill-conditioning in $A$ can have an effect on (10). In contrast, the effect of errors on the smallest singular value of $B_1^{(1)}$ is often directly magnified by a factor proportional to the condition number of $A$.

3 Finding the Null Vector

Theorem 1 characterizes nearly minimal perturbations in terms of a possibly nonunique solution to the nonlinear equation (8). In contrast to a direct optimization formulation in which we might have to worry about local minima, the perturbation analysis of the last section shows that any solution to (8) will give suitably small perturbations so long as the conditions on the validity of the expansion are not too close to being violated. Subject to these conditions, purely local information obtained by solving (8) provides an estimate of a globally minimal perturbation giving rank deficiency.

However, solving (8) is not always a trivial task. The singular vector associated with the singular value

$$\lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \gamma^2 I_{m_b} \right)^{-1} B A^T \right)$$

is not in general continuous as a function of the parameter $\gamma$ at points where the singular value has multiplicity greater than one. Because of this potential discontinuity, a provably convergent iteration for solving (8) seems to be a remote possibility. Nevertheless, the left singular vector associated with $\sigma_{m_a}(AB^T)$ is often a good approximation to $u$. Making use of this starting vector, we propose the following iteration without any guarantees of convergence.

**Algorithm 1** Take $u_0$ to be the left singular vector of $AB^T$ associated with $\sigma_{m_a}(AB^T)$. Let $\gamma_0 = \|A^T u_0\|$ and let $k = 1$.

1. Let $u_k$ be the eigenvector associated with $\lambda_{m_a}\!\left( A B^T \left( B P_A^\perp B^T + \gamma_{k-1}^2 I_{m_b} \right)^{-1} B A^T \right)$.
2. Let $\gamma_k = \|A^T u_k\|$. If $\gamma_k$ has converged, stop; otherwise increment $k$ and go to step 1.

There is no guarantee that this iteration will work. The following example illustrates the potential problems that can arise with repeated singular values.

**Example 3** Let

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 3 \end{bmatrix}.$$

It is easy to verify that the application of Algorithm 1 to this problem gives

$$(\gamma_0, \gamma_1, \gamma_2, \gamma_3, \dots) = (1, 2, 1, 2, \dots).$$

The algorithm does not converge, but a solution to (8) occurs for $\gamma = \sqrt{5/3}$, for which

$$A B^T \left( B P_A^\perp B^T + \gamma^2 I_{m_b} \right)^{-1} B A^T = \begin{bmatrix} 3/8 & 0 \\ 0 & 3/8 \end{bmatrix}$$

has a repeated singular value, and we can choose

$$u = \begin{bmatrix} \sqrt{7}/3 \\ \sqrt{2}/3 \end{bmatrix}$$

to satisfy $\|A^T u\|^2 = 5/3 = \gamma^2$.

The example is somewhat contrived. Algorithm 1 seems to converge in most cases.
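The fixed point in Example 3 can be checked numerically. The sketch below is our own illustration; it forms the matrix appearing in (8) and confirms that $\gamma = \sqrt{5/3}$ produces the repeated eigenvalue $3/8$ and that the stated $u$ satisfies $\|A^T u\| = \gamma$:

```python
import numpy as np

def eq8_matrix(A, B, gamma):
    """A B^T (B P_A^perp B^T + gamma^2 I)^{-1} B A^T, the matrix in (8)."""
    P_A = A.T @ np.linalg.solve(A @ A.T, A)
    P_perp = np.eye(A.shape[1]) - P_A
    K = B @ P_perp @ B.T + gamma**2 * np.eye(B.shape[0])
    return A @ B.T @ np.linalg.solve(K, B @ A.T)

A = np.array([[1., 0, 0, 0],
              [0, 2., 0, 0]])
B = np.array([[1., 0, 1, 0],
              [0, 1., 0, 3]])

gamma = np.sqrt(5.0 / 3.0)
M = eq8_matrix(A, B, gamma)        # equals (3/8) I: a repeated eigenvalue
u = np.array([np.sqrt(7.0) / 3.0, np.sqrt(2.0) / 3.0])
```

Because every eigenvector of $(3/8)I$ is admissible, the particular $u$ above can be selected to close the fixed point equation, even though the alternating iteration never finds it.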

4 Higher Order Rank Deficiency

If $A$ and $B$ come from a model of the form (1) with $\operatorname{rank}(\hat{A}\hat{B}^T) = r < m_a - 1$ we might wish to find perturbations $E$ and $F$ that satisfy

$$(E, F) = \operatorname*{argmin}_{\{(E,F) \,\mid\, \operatorname{rank}((A+E)(B+F)^T) \le r\}} \|E\|_F^2 + \|F\|_F^2. \tag{20}$$

The most obvious methods for attempting to decide if $A$ and $B$ are consistent with a model of the form (1) with a rank $r$ matrix product are natural generalizations of methods for the case $r = m_a - 1$. As discussed in §1 we can attempt to determine the number of non-negligible singular values of either $AB^T$ or $B_1^{(1)}$, where $B_1^{(1)}$ is defined as in (3). The potential problems with these rank decisions are the same as before.

Theorem 1 does not admit an obvious generalization to $r < m_a - 1$. As an alternative we will assume that perturbations that further reduce the rank $m_a - 1$ product to rank $r$ can be chosen to be orthogonal to the perturbations that originally reduced the product to rank $m_a - 1$. As we will see, this assumption can fail, and there is then no guarantee that the resulting algorithm reveals the rank of $AB^T$ in the sense of finding nearly minimal $E$ and $F$. Nevertheless, we will get an algorithm that typically finds smaller perturbations than can be found by looking for small singular values in $B_1^{(1)}$.

The distance measure (6) is invariant under orthogonal transformations. Thus we are free to apply transformations of the form $U^T A Q$ and $V^T B Q$ where $U$, $V$ and $Q$ are orthogonal. We start with perturbations $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$ and a vector $u$ such that

$$u^T \left( A + \tilde{E}^{(1)} \right)\left( B + \tilde{F}^{(1)} \right)^T = 0$$

with $\|u\| = 1$ and $\|\tilde{E}^{(1)}\|_F^2 + \|\tilde{F}^{(1)}\|_F^2$ not much larger than $d^2(A, B)$. Our goal is to find further perturbations $E^{(2)}$ and $F^{(2)}$ so that

$$\operatorname{rank}\left( \left( A + \tilde{E}^{(1)} + E^{(2)} \right)\left( B + \tilde{F}^{(1)} + F^{(2)} \right)^T \right) = r \tag{21}$$

with nearly minimal

$$\left\| \tilde{E}^{(1)} + E^{(2)} \right\|_F^2 + \left\| \tilde{F}^{(1)} + F^{(2)} \right\|_F^2.$$

Before describing our orthogonality assumption and attempting to construct $E^{(2)}$ and $F^{(2)}$, we consider the computation of $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$. Theorem 1 gives an explicit formula for nearly minimal perturbations. However, we can do slightly better. Instead of using Theorem 1 directly, we will use $u$ solving (8) to construct alternate perturbations that can be slightly smaller than those of the theorem. These refined rank reducing perturbations also have the advantage of more directly illuminating the algorithmic significance of the assumption of orthogonality between the perturbations.

Suppose that $u$, $g$, $y$ and $f$ are defined as in the theorem. If

$$E^{(1)} = -u g^T, \qquad F^{(1)} = -f y^T \tag{22}$$

are the rank reducing perturbations then the theorem states that

$$u^T \left( A + E^{(1)} \right)\left( B + F^{(1)} \right)^T = 0.$$

Choose an orthogonal $Q$ such that

$$u^T \left( A + E^{(1)} \right) Q = u^T \left( A + E^{(1)} \right) \begin{bmatrix} q_1 & Q_2 \end{bmatrix} = \begin{bmatrix} \alpha & 0^T \end{bmatrix}$$

for some scalar $\alpha$, and an orthogonal $U$ so that $U^T u = e_1$, where $e_1$ is the first standard basis vector. Then

$$U^T A Q = \begin{bmatrix} \tilde{a}_{11} & \tilde{g}^T \\ \tilde{a}_{21} & \tilde{A}_{22} \end{bmatrix} \qquad \text{and} \qquad B Q = \begin{bmatrix} \tilde{f} & \tilde{B}_2 \end{bmatrix}$$

where $\tilde{a}_{11}$ is a scalar and $\tilde{f}$ is a column vector. By the construction of $Q$,

$$q_1^T \left( B + F^{(1)} \right)^T = 0$$

so that

$$\|\tilde{f}\|_2 = \left\| F^{(1)} q_1 \right\|_2 \le \left\| f y^T \right\|_F.$$

Thus if

$$\tilde{E}^{(1)} = -U \begin{bmatrix} 0 & \tilde{g}^T \\ 0 & 0 \end{bmatrix} Q^T, \qquad \tilde{F}^{(1)} = -\begin{bmatrix} \tilde{f} & 0 \end{bmatrix} Q^T \tag{23}$$

then

$$u^T \left( A + \tilde{E}^{(1)} \right)\left( B + \tilde{F}^{(1)} \right)^T = 0$$

and

$$\left\| \tilde{E}^{(1)} \right\|_F^2 + \left\| \tilde{F}^{(1)} \right\|_F^2 \le \left\| E^{(1)} \right\|_F^2 + \left\| F^{(1)} \right\|_F^2.$$

We will use the perturbations $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$ to induce rank deficiency instead of $E^{(1)}$ and $F^{(1)}$.

The following lemma shows that if the conditions that make the expansion from Theorem 1 valid are satisfied then both these perturbations are non-zero.

**Lemma 1** Let $u$, $g$, $f$ and $y$ be defined as in Theorem 1, $E^{(1)}$ and $F^{(1)}$ defined by (22), and $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$ defined by (23). The assumption

$$u^T A B^T \ne 0 \tag{24}$$

implies that $\tilde{E}^{(1)} \ne 0$ and $\tilde{F}^{(1)} \ne 0$.

**Proof:** We extend our assumption by noting that it immediately implies $u^T A \ne 0$ and also

$$u^T \left( A + E^{(1)} \right) \ne 0.$$

The latter follows because Theorem 1 implies $P_A^\perp g = g$, so that $\mathcal{R}(A^T) \perp \mathcal{R}(E^{(1)T})$ and $u^T(A + E^{(1)}) = 0$ only if $u^T A = 0$.

To prove the lemma, it is sufficient to prove that $\tilde{f} \ne 0$ and $\tilde{g} \ne 0$. We have $\tilde{f} = B q_1$ and

$$B q_1 = \frac{1}{\left\| \left( A + E^{(1)} \right)^T u \right\|} \left( B A^T u - B g \right) = \frac{\|A^T u\|}{\left\| \left( A + E^{(1)} \right)^T u \right\|} f.$$

By assumption $A^T u \ne 0$. The fact that $f \ne 0$ follows from the second expression for $f$ in Theorem 1 and from the assumption $u^T A B^T \ne 0$. So we must have $\tilde{f} \ne 0$.

The orthogonality of $Q$ implies that $\tilde{g} = Q_2^T g = 0$ only if $g$ is proportional to $q_1$, that is, only if

$$g = c \left( A + E^{(1)} \right)^T u = c \left( A^T u - g \right) \qquad \text{or} \qquad (1 + c) g = c A^T u$$

for some scalar $c$. Since $g \perp \mathcal{R}(A^T)$ and $u^T A \ne 0$, this is impossible unless $g = 0$. The fact that $g \ne 0$ follows from the definition of $g$ and $u^T A B^T \ne 0$. So we must have $\tilde{g} \ne 0$.

Given the definition of the perturbations $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$ we make the following assumption to allow recursive application of Theorem 1.

**Assumption 1** Given $u$ determined as in Theorem 1 satisfying $u^T A B^T \ne 0$, we assume that perturbations $E^{(2)}$ and $F^{(2)}$ can be chosen so that in addition to (21) we have

$$E^{(2)T} \tilde{E}^{(1)} = 0, \qquad F^{(2)} \tilde{F}^{(1)T} = 0 \tag{25}$$

with

$$\left\| \tilde{E}^{(1)} + E^{(2)} \right\|_F^2 + \left\| \tilde{F}^{(1)} + F^{(2)} \right\|_F^2 = \left\| \tilde{E}^{(1)} \right\|_F^2 + \left\| E^{(2)} \right\|_F^2 + \left\| \tilde{F}^{(1)} \right\|_F^2 + \left\| F^{(2)} \right\|_F^2 \le \min_{\operatorname{rank}(\hat{A}\hat{B}^T) \le r} \|A - \hat{A}\|_F^2 + \|B - \hat{B}\|_F^2. \tag{26}$$

Lemma 1 means that $u^T A B^T \ne 0$ implies that the constraints (25) are nontrivial and are equivalent to

$$E^{(2)T} u = 0, \qquad F^{(2)} q_1 = 0. \tag{27}$$

To see how this assumption corresponds to the reduction of the order of the problem, we note that with $U$ and $Q$ constructed as before the orthogonality relations imply that

$$E^{(2)} = U \begin{bmatrix} 0 & 0 \\ e_{21} & E_{22} \end{bmatrix} Q^T, \qquad F^{(2)} = \begin{bmatrix} 0 & F_2 \end{bmatrix} Q^T.$$

Thus we have the problem of finding minimal perturbations such that

$$r = \operatorname{rank}\left( \left( A + \tilde{E}^{(1)} + E^{(2)} \right)\left( B + \tilde{F}^{(1)} + F^{(2)} \right)^T \right) = \operatorname{rank}\left( \begin{bmatrix} \tilde{a}_{11} & 0 \\ \tilde{a}_{21} + e_{21} & \tilde{A}_{22} + E_{22} \end{bmatrix} \begin{bmatrix} 0^T \\ \left( \tilde{B}_2 + F_2 \right)^T \end{bmatrix} \right) = \operatorname{rank}\left( \left( \tilde{A}_{22} + E_{22} \right)\left( \tilde{B}_2 + F_2 \right)^T \right).$$

The orthogonality condition on the perturbations to $B$ implies that $e_{21}$ has no effect on the rank. If it is nonzero it can only increase the norm of the perturbation. Consequently we can assume that $e_{21} = 0$ and consider the subproblem

$$\min_{\operatorname{rank}\left( \left( \tilde{A}_{22} + E_{22} \right)\left( \tilde{B}_2 + F_2 \right)^T \right) = r} \|E_{22}\|_F^2 + \|F_2\|_F^2.$$

This is the recursive step: given the perturbations $\tilde{E}^{(1)}$ and $\tilde{F}^{(1)}$ that introduce a first order rank deficiency, we transform $A$ and $B$ by $U$ and $Q$ and continue recursively by reducing the rank of $\tilde{A}_{22}\tilde{B}_2^T$.

Unfortunately, as we will shortly illustrate with a small example, the orthogonality and minimality conditions, (25) and (26), are not always consistent. Since it seems to be algorithmically necessary for the recursive application of Theorem 1, we will strictly enforce (25) and hope for the best in simultaneously trying to satisfy (26). The inconsistency of the conditions will degrade the ability of the algorithm to find the appropriate degree of rank deficiency in $AB^T$.

The orthogonality relations are analogous to those that apply to rank reducing perturbations given by the Eckart-Young theorem and the SVD: if $E^{(1)}$ is a minimal perturbation such that $A + E^{(1)}$ is rank deficient, the construction of rank reducing perturbations from the SVD shows that it is possible to choose $E^{(2)}$ with $E^{(2)T} E^{(1)} = 0$ and $E^{(1)} E^{(2)T} = 0$ such that $E^{(1)} + E^{(2)}$ is a minimal perturbation reducing the rank of $A$ to $r$. Our assumption (25) represents a naive attempt to extend this orthogonality property to perturbations of a matrix product.

The following example highlights the problem with this approach and with the underlying assumption that (25) is consistent with (26). For simplicity we consider an example for which rank deficiency is already present and $u^T A B^T = 0$. Since this implies that $\tilde{E}^{(1)} = 0$ and $\tilde{F}^{(1)} = 0$, we use the alternate form of the orthogonality constraints given in (27).

**Example 4** Consider the matrices

$$A = \begin{bmatrix} A_1 & 0 \end{bmatrix} = \begin{bmatrix} \delta & 0 & 0 & 0 & 0 & 0 \\ 1 & \delta & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 & B_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & \delta & 0 & 0 \\ 0 & \epsilon/\delta & 0 & 0 & \delta & 0 \\ 0 & 0 & \epsilon & 0 & 0 & \delta \end{bmatrix}$$

where $0 < \epsilon \ll \delta \ll 1$.

The product is already rank deficient and it is easy to verify that the vector

$$u^T = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

is a solution to (8). Since no perturbation is required to give rank deficiency we can choose

$$q_1^T = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

If

$$E_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -\epsilon/\delta & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad F_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -\epsilon \end{bmatrix}$$

then

$$\begin{bmatrix} A_1 & E_2 \end{bmatrix} \begin{bmatrix} B_1^T + F_1^T \\ B_2^T \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ \delta & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

so that the matrix pair is within $O(\epsilon/\delta)$ of a pair $\hat{A}$ and $\hat{B}$ for which $\operatorname{rank}(\hat{A}\hat{B}^T) = 1$. Our proposed algorithm tries to introduce rank deficiency into $\tilde{A}_{22}\tilde{B}_2^T$ for the pair

$$\tilde{A}_{22} = \begin{bmatrix} A_1 & 0 \end{bmatrix} = \begin{bmatrix} \delta & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \qquad \tilde{B}_2 = \begin{bmatrix} B_1 & B_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & \delta & 0 & 0 \\ \epsilon/\delta & 0 & 0 & \delta & 0 \\ 0 & \epsilon & 0 & 0 & \delta \end{bmatrix}$$

where, with a harmless reuse of notation, $A_1$ is now $2 \times 2$ and $B_1$ is $3 \times 2$.

We seek $O(\epsilon)$ perturbations $E_{22}$ and $F_2$ so that

$$\operatorname{rank}\left( \left( \tilde{A}_{22} + E_{22} \right)\left( \tilde{B}_2 + F_2 \right)^T \right) = 1.$$

Let

$$E_{22} = \begin{bmatrix} E_1 & E_2 \end{bmatrix}, \qquad F_2 = \begin{bmatrix} F_1 & F_2 \end{bmatrix}$$

be partitioned conformally with $\tilde{A}_{22}$ and $\tilde{B}_2$. We assume that

$$\left( \tilde{A}_{22} + E_{22} \right)\left( \tilde{B}_2 + F_2 \right)^T = \begin{bmatrix} A_1 + E_1 & E_2 \end{bmatrix} \begin{bmatrix} B_1^T + F_1^T \\ B_2^T + F_2^T \end{bmatrix} = A_1 B_1^T + A_1 F_1^T + E_1 B_1^T + E_2 B_2^T + O(\epsilon^2)$$

has rank 1. Since $B_2 = \delta I$ the Eckart-Young theorem implies that the only way this can happen is if

$$A_1 B_1^T + A_1 F_1^T + E_1 B_1^T = \begin{bmatrix} \delta & \epsilon & 0 \\ 1 & 0 & \epsilon \end{bmatrix} + \begin{bmatrix} \delta & 0 \\ 0 & 1 \end{bmatrix} F_1^T + E_1 \begin{bmatrix} 1 & \epsilon/\delta & 0 \\ 1 & 0 & \epsilon \end{bmatrix} + O(\epsilon^2)$$

has a singular value of $O(\delta\epsilon)$. If $\|E_1\| = O(\epsilon)$ and $\|F_1\| = O(\epsilon)$ then the first two columns of this matrix have the form

$$X = \begin{bmatrix} \delta + O(\epsilon) & \epsilon + O(\delta\epsilon) \\ 1 + O(\epsilon) & O(\epsilon) \end{bmatrix}.$$

Without the $O(\epsilon)$ terms this matrix has rank 1. It has a left singular vector associated with the smallest singular value of the form

$$u^T = \frac{1}{\sqrt{1 + \delta^2}} \begin{bmatrix} 1 & -\delta \end{bmatrix} + O(\epsilon)$$

so that

$$\sigma_2^2(X) = \left| u^T \begin{bmatrix} O(\epsilon) & \epsilon + O(\delta\epsilon) \\ O(\epsilon) & O(\epsilon) \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right|^2 + O(\epsilon^3) = \left( \epsilon + O(\delta\epsilon) \right)^2.$$

Ignoring the third column can only decrease $\sigma_2$, so

$$\sigma_2\left( A_1 B_1^T + A_1 F_1^T + E_1 B_1^T \right) \ge \sigma_2(X) \gg \delta\epsilon.$$

Thus if $\|E_1\|$ and $\|F_1\|$ are not much larger than $\epsilon$ then we must have $\|E_2\| \gg \epsilon$. It can also be experimentally verified that Theorem 1 applied to $\tilde{A}_{22}$ and $\tilde{B}_2$ results in rank reducing perturbations that are much larger than $\epsilon$.

5 Conclusions

We have proposed a new method for detecting near rank deficiency in the product $AB^T$. To first order its performance is provably insensitive to ill-conditioning of the matrices. It is also insensitive to the presence of nearly intersecting subspaces associated with moderately small singular values, a situation that can degrade the accuracy of subspaces estimated using the product SVD.

We have also proposed a natural method for attempting to find higher order rank deficiency based on generalizing the orthogonality properties of the SVD and the Eckart-Young rank reducing perturbations. Unfortunately, the method was shown to be inadequate in the product case. Finding an algorithm for reliably revealing the rank of a product of perturbed matrices remains an open problem.

References

[1] B. De Moor and P. Van Dooren. Generalizations of the singular value and QR decompositions. SIAM Journal on Matrix Analysis and Applications, 13:993–1014, 1992.

[2] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[3] C. C. Paige. Some aspects of generalized QR factorizations. In M. G. Cox and S. J. Hammarling, editors, Reliable Numerical Computation, pages 71–91, Oxford, 1990. Clarendon Press.

[4] G. W. Stewart. Rank degeneracy. SIAM Journal on Scientific and Statistical Computing, 5:403–413, 1984.