Finding Near Rank Deficiency in Matrix Products



TR-CS-98-13

Finding Near Rank Deficiency in Matrix Products

Michael Stewart

December 1998

Joint Computer Science Technical Report Series

Department of Computer Science

Faculty of Engineering and Information Technology

Computer Sciences Laboratory


This technical report series is published jointly by the Department of Computer Science, Faculty of Engineering and Information Technology, and the Computer Sciences Laboratory, Research School of Information Sciences and Engineering, The Australian National University.



December 1, 1998

This paper gives a theorem characterizing approximately minimal norm rank one perturbations $E$ and $F$ that make the product $(A+E)(B+F)^T$ rank deficient. The theorem is stated in terms of the smallest singular value of a particular matrix chosen from a parameterized family of matrices by solving a nonlinear equation. Consequently, it is analogous to the special case of the Eckhart-Young theorem describing the minimal perturbation that induces an order one rank deficiency. While the theorem does not naturally extend to higher order rank deficiencies, it can be used to compute a complete orthogonal product decomposition to give improved practical reliability in revealing the numerical rank of $AB^T$.

1 Introduction

We assume that the $m_a \times n$ and $m_b \times n$ matrices $A$ and $B$ with $n > m_a, m_b$ come from a model of the form

\[ A = \hat A + E, \qquad B = \hat B + F \tag{1} \]

where $\hat A$, $\hat B$ or $\hat A \hat B^T$ exhibit some degree of rank deficiency and $E$ and $F$ are perturbations corrupting the exact data in $\hat A$ and $\hat B$. Without any loss of generality we assume that $m_a \le m_b$. Thus $A$, $B$, and $AB^T$ will generically have full rank even if rank deficiency of the underlying model implies that one or more of these perturbed matrices will be ill-conditioned.

Our primary goal is to recover the rank of $\hat A \hat B^T$ given $A$ and $B$ and to find estimates of the corresponding range and null spaces. To give full generality, we might also be concerned with simultaneously finding an estimate of the ranks of $\hat A$ and $\hat B$.

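Before moving on to the central and most difficult problem of estimating the product rank, we describe a complete orthogonal product decomposition to jointly reveal the ranks of $\hat A$, $\hat B$ and $\hat A \hat B^T$ when the unperturbed matrices are available. The decomposition takes the form

\[
U^T \hat A Q = \begin{bmatrix} A_{11} & 0 & 0 & 0 & 0 \\ A_{21} & A_{22} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad
V^T \hat B Q = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & B_{23} & 0 & 0 \\ 0 & B_{32} & B_{33} & B_{34} & 0 \end{bmatrix} \tag{2}
\]

where $U$, $V$ and $Q$ are square and orthogonal, the columns of the matrices are partitioned in the same way and $A_{11}$, $A_{22}$, $B_{32}$ and $B_{23}$ are square and have full rank. If

\[ r_a = \operatorname{rank}(\hat A), \qquad r_b = \operatorname{rank}(\hat B), \qquad r_p = \operatorname{rank}(\hat A \hat B^T) \]

then $A_{11}$ is $(r_a - r_p) \times (r_a - r_p)$, $A_{22}$ and $B_{32}$ are $r_p \times r_p$, $B_{23}$ is $(r_b - r_p) \times (r_b - r_p)$ and $B_{34}$ is $r_p \times r_p$.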

One possible algorithm for computing this decomposition starts with an orthogonal rank revealing factorization of $\hat A$ to get

\[
\begin{bmatrix} U^{(1)} & 0 \\ 0 & V^{(1)} \end{bmatrix}^T \begin{bmatrix} \hat A \\ \hat B \end{bmatrix} Q^{(1)} = \begin{bmatrix} A^{(1)}_1 & 0 \\ 0 & 0 \\ B^{(1)}_1 & B^{(1)}_2 \end{bmatrix} \tag{3}
\]

where $A^{(1)}_1$ is $r_a \times r_a$ and has full rank. Since

\[ \operatorname{rank}\left(\hat A \hat B^T\right) = \operatorname{rank}\left(A^{(1)}_1 \left(B^{(1)}_1\right)^T\right) = r_p \]

a rank revealing factorization of $B^{(1)}_1$ gives

\[
\begin{bmatrix} I & 0 \\ 0 & V^{(2)} \end{bmatrix}^T \begin{bmatrix} A^{(1)}_1 & 0 \\ 0 & 0 \\ B^{(1)}_1 & B^{(1)}_2 \end{bmatrix} Q^{(2)} = \begin{bmatrix} A^{(2)}_1 & A^{(2)}_2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B^{(2)}_{13} \\ 0 & B^{(2)}_{22} & B^{(2)}_{23} \end{bmatrix}
\]

where $B^{(2)}_{22}$ is $r_p \times r_p$ and has full rank. Clearly $\operatorname{rank}(B^{(2)}_{13}) = r_b - r_p$. A rank revealing factorization of $B^{(2)}_{13}$ gives

\[
\begin{bmatrix} I & 0 \\ 0 & V^{(3)} \end{bmatrix}^T \begin{bmatrix} A^{(2)}_1 & A^{(2)}_2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B^{(2)}_{13} \\ 0 & B^{(2)}_{22} & B^{(2)}_{23} \end{bmatrix} Q^{(3)} = \begin{bmatrix} A^{(3)}_1 & A^{(3)}_2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & B^{(3)}_{23} & 0 \\ 0 & B^{(3)}_{32} & B^{(3)}_{33} & B^{(3)}_{34} \end{bmatrix}
\]

where $B^{(3)}_{23}$ is $(r_b - r_p) \times (r_b - r_p)$ and $B^{(3)}_{32} = B^{(2)}_{22}$. A further transformation $U^{(3)}$ can be used to give $\hat A$ the desired block triangular structure. To get (2) the $r_p \times (n - r_a - r_b + r_p)$ matrix $B^{(3)}_{34}$ can be compressed into $r_p$ nonzero columns using a further transformation $Q^{(4)}$. Clearly the singular values of the product $A_{22} B_{32}^T$ are the nonzero singular values of $\hat A \hat B^T$. Related decompositions may be found in [3, 1].
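The first two steps of this construction can be sketched numerically. The sketch below is only an illustration under stated assumptions: it uses an SVD as the rank revealing factorization (the text does not fix one), the function name `product_rank_via_compression` and the tolerance `tol` are hypothetical, and the test matrices are built so that $\operatorname{rank}(\hat A) = 2$, $\operatorname{rank}(\hat B) = 3$ and $\operatorname{rank}(\hat A \hat B^T) = 1$.

```python
import numpy as np

def product_rank_via_compression(A_hat, B_hat, tol=1e-10):
    """Return (r_a, r_p) by compressing A_hat and inspecting B_1^{(1)} as in (3).

    Assumption: an SVD plays the role of the rank revealing factorization.
    """
    # A_hat = U diag(s) Vt, so A_hat @ Vt.T has only r_a nonzero columns.
    _, s, Vt = np.linalg.svd(A_hat, full_matrices=True)
    r_a = int(np.sum(s > tol * s[0]))
    Q = Vt.T
    # B1 is the block of B_hat Q that multiplies the nonzero block of A_hat Q.
    B1 = (B_hat @ Q)[:, :r_a]
    sb = np.linalg.svd(B1, compute_uv=False)
    r_p = int(np.sum(sb > tol * sb[0]))
    return r_a, r_p

rng = np.random.default_rng(0)
n = 8
Qn, _ = np.linalg.qr(rng.standard_normal((n, n)))      # orthonormal basis q_1..q_n
A_hat = rng.standard_normal((3, 2)) @ Qn[:, :2].T       # row space span(q_1, q_2)
B_hat = rng.standard_normal((4, 3)) @ Qn[:, 1:4].T      # row space span(q_2, q_3, q_4)

r_a, r_p = product_rank_via_compression(A_hat, B_hat)
print("r_a =", r_a, " r_p =", r_p)
```

With exact (unperturbed) data the rank of the compressed block agrees with the rank of the product, which is the observation the next paragraph builds on.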

The above discussion shows that $r_p$ can be found from the SVD of $B^{(1)}_1$. It can also be found directly from the singular values of $\hat A \hat B^T$. However, without access to the unperturbed $\hat A$ and $\hat B$ neither of these approaches is entirely satisfactory. For the latter this can be seen from the two examples $A = B = \sqrt{\epsilon}$ and $A = 1$, $B = \epsilon$. The product singular value, $\epsilon$, is the same, but in the second example $A$ and $B$ come from a model of the form (1) with $E, F = O(\epsilon)$ and $\operatorname{rank}(\hat A \hat B^T) = 0$; in the first example they do not. Looking for small singular values in $B^{(1)}_1$ can also fail. The problem is that $B^{(1)}_1$ [...]

\[
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & \epsilon & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{4}
\]

where $0 < \epsilon < \delta < 1$. The element $\epsilon$ represents a perturbation to

\[ \hat A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & 0 & 0 \end{bmatrix}. \]

The matrix $\hat B$ is unperturbed. We suppose that $\delta$ is significantly smaller than 1 but that it is large enough that $A$ can be considered to have full rank. We assume that $\epsilon$ is small enough that it is of the same order as the tolerance used in rank decisions.

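We consider the algorithm outlined in the derivation of (2) applied to the perturbed matrices $A$ and $B$. The algorithm starts with an orthogonal transformation determined by the LQ factorization of $A$

\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \delta & \epsilon & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & \frac{-\epsilon}{\sqrt{\delta^2+\epsilon^2}} & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{\delta^2+\epsilon^2} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} & \frac{\delta}{\sqrt{\delta^2+\epsilon^2}} & 0 \end{bmatrix}.
\]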

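This is the factorization step represented in (3). Having computed a trivial rank revealing factorization of $A$, we proceed by determining the rank of $B^{(1)}_1$. If $\epsilon = 10^{-16}$ and $\delta = 10^{-8}$ then

\[ B^{(1)}_1 = \begin{bmatrix} 1 & 0 \\ 0 & \frac{\epsilon}{\sqrt{\delta^2+\epsilon^2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-8} \end{bmatrix}. \]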

Without the perturbation, $B^{(1)}_1$ will be exactly rank deficient. However if $\epsilon$ is not zero and $\delta$ is at all small, we get a very hard rank decision. Since we have assumed that $\delta$ is greater than the tolerance, we would conclude for these chosen values of $\epsilon$ and $\delta$ that $B^{(1)}_1$ has full rank and that $r_p = 2$. The end result is a misleadingly partitioned decomposition that fails to reveal near rank deficiency in $AB^T$.
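The numbers in this example are easy to reproduce. The following sketch, which assumes NumPy and obtains the LQ factorization of $A$ from a full QR factorization of $A^T$ (one of several possible ways to compute it), compares the smallest singular value of $B^{(1)}_1$ with the smallest singular value of the product $AB^T$ for $\epsilon = 10^{-16}$ and $\delta = 10^{-8}$.

```python
import numpy as np

eps, delta = 1e-16, 1e-8
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, delta, eps, 0.0]])
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# LQ factorization of A from a full QR factorization of A^T: A @ Q = [L 0].
Q, _ = np.linalg.qr(A.T, mode="complete")
B1 = (B @ Q)[:, :2]                       # the block B_1^{(1)} of (3)

sv_B1 = np.linalg.svd(B1, compute_uv=False)
sv_prod = np.linalg.svd(A @ B.T, compute_uv=False)
print("smallest singular value of B1  :", sv_B1[-1])    # about eps/delta
print("smallest singular value of AB^T:", sv_prod[-1])  # about eps
```

The block $B^{(1)}_1$ looks safely full rank at a tolerance near $10^{-16}$, while the product itself is within roughly that tolerance of a rank deficient matrix.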

As an alternative to looking at the product singular values or at the singular values of $B^{(1)}_1$, we propose a generalization of the Eckhart-Young theorem to find nearly minimal perturbations $E$ and $F$ that make $(A - E)(B - F)^T$ rank deficient. The ordinary Eckhart-Young theorem may be stated as follows. Let [...]

\[ \hat A = U \begin{bmatrix} \hat\Sigma & 0 \end{bmatrix} V^T \]

then the Eckhart-Young theorem states that

\[ \|A - \hat A\|_{2,F} = \min_{\operatorname{rank}(\tilde A) \le p} \|A - \tilde A\|_{2,F}. \tag{5} \]

Thus small singular values indicate that $A$ is close to some rank deficient $\hat A$. This is the basis of the standard approach to rank estimation, [4]; the effectiveness of most alternate approaches is typically measured in terms of their ability to reveal the presence of small singular values. In generalizing (5) to recover the rank of $\hat A \hat B^T$ from $A$ and $B$ we will estimate

\[ d^2(A,B) = \min_{\operatorname{rank}(\hat A \hat B^T) \le (m_a - 1)} \|A - \hat A\|_F^2 + \|B - \hat B\|_F^2. \tag{6} \]

Our estimate of (6) depends on a perturbation expansion and will not be exact; the estimated $d^2(A,B)$ can be larger than the true value by a quantity that is $O(d^3(A,B))$.
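For a single matrix, the Eckhart-Young statement (5) can be checked directly. The sketch below is a small illustration, assuming $\hat A$ is taken to be the rank $p$ truncated SVD of a random $A$ (the standard construction); the sizes and the seed are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 7))
p = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = (U[:, :p] * s[:p]) @ Vt[:p, :]        # rank p truncated SVD of A

dist_2 = np.linalg.norm(A - A_hat, 2)         # equals sigma_{p+1}
dist_F = np.linalg.norm(A - A_hat, "fro")     # equals sqrt(sum of sigma_j^2, j > p)
print(dist_2, s[p])
print(dist_F, np.sqrt(np.sum(s[p:] ** 2)))
```

The quantity $d^2(A,B)$ in (6) asks for the analogous distance when the rank constraint is placed on the product rather than on a single factor, which is why no single SVD delivers it.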

In §2 we state and prove the main theorem. We outline an algorithm that attempts to compute a completely rank revealing product decomposition in §4. Given prescribed ranks $r_a$, $r_b$ and $r_p$, the goal is to compute a decomposition that reveals nearly minimal perturbations to find nearby $\hat A$ and $\hat B$ with the prescribed product SVD structure. The performance of the algorithm as judged by this standard depends on the accuracy of the perturbation expansion given in §2 and on the accuracy of a deflation step that is used to generalize the decomposition to the case of higher order rank deficiency. Both pose difficulties which we will illustrate with examples. Nevertheless, in practice the overall method seems to be more robust than other methods for estimating the rank of a matrix product.

2 The Eckhart-Young Generalization

We assume that $\operatorname{rank}(AB^T) = m_a$ and that $A$ and $B$ come from (1) for some rank deficient $\hat A \hat B^T$. We will develop an estimate of $d^2(A,B)$ that can be computed using just $A$ and $B$. The estimate is based on an expansion in terms of the perturbations $E$ and $F$. To provide a compact notation for neglecting higher order terms, we will set

\[ \epsilon = \max(\|E\|_F, \|F\|_F) \]

and freely ignore $O(\epsilon^2)$ terms in expressions for quantities that are $O(1)$ or $O(\epsilon)$. In some cases we will neglect terms of $O(\epsilon^3)$ in expansions of terms that are $O(\epsilon^2)$. The occasional need for a higher order expansion arises from a theorem from [5] in which second order terms are kept to retain accuracy in a perturbation expansion for a singular value that is very small or exactly zero. All vector norms will be 2-norms. For a matrix $X$ for which $XX^T$ is nonsingular we define the projection

\[ P_X = X^T \left(XX^T\right)^{-1} X. \]

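[...]

Theorem 1 [...] $A = \hat A + E$, $B = \hat B + F$ we assume that $AB^T$ has full rank and that $\hat A \hat B^T$ has rank $m_a - 1$. For some $u$ satisfying $\|u\| = 1$ let

\[ y = \frac{A^T u}{\|A^T u\|}, \qquad g = \left(P_A^\perp B^T B P_A^\perp + \|A^T u\|^2 I_n\right)^{-1} P_A^\perp B^T B A^T u \]

and

\[ f = B\left(y - \frac{1}{\|A^T u\|}\, g\right) = \|A^T u\| \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} B A^T u. \]

Then

\[ u^T (A - u g^T)(B - f y^T)^T = 0. \tag{7} \]

If $u$ satisfies

\[ \sigma_{m_a}\left(AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T\right) = u^T AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T u \tag{8} \]

where $\sigma_{m_a}(\cdot)$ is the smallest eigenvalue (singular value) of the $m_a \times m_a$ matrix and $\|u\| = 1$ and $\hat A$ has full rank then for sufficiently small perturbations $E$ and $F$

\begin{align}
\|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2 &= \sigma_{m_a}\left(AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T\right) \tag{9} \\
&\le (\|E\|_F + \|F\|_F)^2 + O(\epsilon^3). \tag{10}
\end{align}

A solution to (8) with $\|u\| = 1$ always exists.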

The theorem shows that $A - u g^T$ and $B - f y^T$ are a pair with a rank deficient product that is nearly as close as possible to $A$ and $B$ in the sense of (6).
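The algebraic claims of the theorem that hold for any unit vector $u$ can be checked numerically. The sketch below is an illustration under stated assumptions: it reads the definitions as $g = (P_A^\perp B^T B P_A^\perp + \|A^Tu\|^2 I_n)^{-1} P_A^\perp B^T B A^T u$ and $f = B(y - g/\|A^Tu\|)$ with $P_A^\perp = I - P_A$, uses random test matrices, and introduces the names `alpha`, `P_perp` and `S` purely for the example. It does not solve (8); it only verifies the two formulas for $f$, the null vector relation (7), and the equality of the perturbation norms with the quadratic form on the right of (8).

```python
import numpy as np

rng = np.random.default_rng(2)
m_a, m_b, n = 3, 4, 6
A = rng.standard_normal((m_a, n))
B = rng.standard_normal((m_b, n))
u = rng.standard_normal(m_a)
u /= np.linalg.norm(u)                        # any unit vector u

alpha = np.linalg.norm(A.T @ u)               # alpha = ||A^T u||
y = A.T @ u / alpha
# Orthogonal projector onto the null space of A (P_A^perp = I - P_A).
P_perp = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)

g = np.linalg.solve(P_perp @ B.T @ B @ P_perp + alpha**2 * np.eye(n),
                    P_perp @ B.T @ B @ A.T @ u)
S = np.linalg.inv(B @ P_perp @ B.T + alpha**2 * np.eye(m_b))
f_first = B @ (y - g / alpha)                 # first expression for f
f_second = alpha * S @ B @ A.T @ u            # second expression for f

# Left-hand side of (7): should vanish.
null_residual = u @ (A - np.outer(u, g)) @ (B - np.outer(f_first, y)).T
# ||u g^T||_F^2 + ||f y^T||_F^2 reduces to ||g||^2 + ||f||^2 for unit u and y.
norm_sum = g @ g + f_first @ f_first
quad_form = u @ A @ B.T @ S @ B @ A.T @ u

print(np.linalg.norm(null_residual), norm_sum, quad_form)
```

Only when $u$ additionally satisfies (8) does the quadratic form coincide with the smallest eigenvalue in (9).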

The conditions that make (10) valid are hard to state in a more precise form. This and the fact that the phrase "sufficiently small" must be interpreted in terms of the unknown matrices $\hat A$ and $\hat B$ pose significant problems in evaluating the quality of the expansion. Nevertheless, experiments suggest that (10) is often a significant improvement over other methods for finding rank deficiency in a matrix product.

The theorem makes five claims: the existence of a solution to (8), the equivalence of the two relations for $f$, the fact that $u$ is a null vector of the perturbed matrix pair in (7), the equivalence of the eigenvalue $\sigma_{m_a}$ with the sums of the norms of the perturbations in (9) and the upper bound (10). The distance estimate (10) is the most difficult to verify. The presence of the term $\|A^T u\|^2 I_{m_b}$ [...] of the theorem.

To prove (9) we will show that the equality

\[ \|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2 = u^T AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T u \]

holds for any $u$ with $\|u\| = 1$. Since (8) holds by assumption this implies (9). To complete this proof we note that as it is defined in the theorem $g$ is the solution to the least squares problem

\[ \min_g \left\| \begin{bmatrix} \frac{1}{\|A^T u\|} B P_A^\perp \\ I_n \end{bmatrix} g - \begin{bmatrix} \frac{1}{\|A^T u\|} B A^T u \\ 0 \end{bmatrix} \right\|^2. \]

From the presence of the projection $P_A^\perp$ in the first $m_b$ rows of the least squares problem and the penalty on the norm of $g$ in the last $n$ rows, it follows that $P_A^\perp g = g$. Since $\|u\| = \|y\| = 1$ and since $P_A^\perp g = g$, the residual of the least squares problem is

\[ \left\| \frac{1}{\|A^T u\|} B P_A^\perp g - \frac{1}{\|A^T u\|} B A^T u \right\|^2 + \|g\|^2 = \left\| \frac{1}{\|A^T u\|} B g - B y \right\|^2 + \|g\|^2 = \|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2. \]

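To make the connection with the right hand side of (9), we can get an alternate formula for this residual using the orthogonality property of least squares solutions. In particular

\begin{align*}
\|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2
&= \begin{bmatrix} \frac{1}{\|A^T u\|} u^T AB^T & 0 \end{bmatrix} \left( \begin{bmatrix} \frac{1}{\|A^T u\|} B A^T u \\ 0 \end{bmatrix} - \begin{bmatrix} \frac{1}{\|A^T u\|} B P_A^\perp \\ I_n \end{bmatrix} g \right) \\
&= u^T AB^T \left( \frac{1}{\|A^T u\|^2} I_{m_b} - \frac{B P_A^\perp}{\|A^T u\|^2} \left( \frac{1}{\|A^T u\|^2} P_A^\perp B^T B P_A^\perp + I_n \right)^{-1} \frac{P_A^\perp B^T}{\|A^T u\|^2} \right) B A^T u \\
&= u^T AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T u.
\end{align*}

The final equality follows from the Sherman-Morrison-Woodbury matrix inversion formula

\[ \left(X + YZ^T\right)^{-1} = X^{-1} - X^{-1} Y \left(I + Z^T X^{-1} Y\right)^{-1} Z^T X^{-1}. \]

Together with (8) this establishes (9).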
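The use of the Sherman-Morrison-Woodbury formula in the last step corresponds to the choice $X = \|A^Tu\|^2 I_{m_b}$ and $Y = Z = B P_A^\perp$. A small numerical check of that specific instance is sketched below; the random matrices and the value `alpha = 0.7`, which stands in for $\|A^T u\|$, are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m_a, m_b, n = 2, 3, 6
A = rng.standard_normal((m_a, n))
B = rng.standard_normal((m_b, n))
alpha = 0.7                                   # stands in for ||A^T u||

P_perp = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
Y = B @ P_perp                                # Y = Z = B P_A^perp, X = alpha^2 I

# Middle expression of the derivation (before applying the formula).
middle = (np.eye(m_b) / alpha**2
          - Y @ np.linalg.inv(Y.T @ Y / alpha**2 + np.eye(n)) @ Y.T / alpha**4)
# Final expression: (B P_A^perp B^T + alpha^2 I)^{-1}, using P_perp @ P_perp = P_perp.
final = np.linalg.inv(Y @ Y.T + alpha**2 * np.eye(m_b))

print(np.max(np.abs(middle - final)))
```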

Verification that the two equations for $f$ are equivalent is by use of $P_A^\perp g = g$, substitution of the expressions for $y$ and $g$ and another application of the Sherman-Morrison-Woodbury formula. By harmlessly inserting $P_A^\perp$ into the first formula for $f$ we get [...]

[...] $P_A^\perp g = g$ and $y \in \mathcal{R}(A^T)$, $y^T g = 0$ and

\begin{align*}
u^T (A - u g^T)(B - f y^T)^T
&= u^T AB^T - g^T B^T - u^T A y \left( y^T - \frac{1}{\|A^T u\|} g^T \right) B^T \\
&= u^T A (I - y y^T) B^T - g^T B^T + \frac{u^T A y}{\|A^T u\|} g^T B^T = 0.
\end{align*}

The cancellations happen because $u^T A y = \|A^T u\|$ and

\[ u^T A (I - y y^T) = \|A^T u\|\, y^T P^\perp_{y^T} = 0. \]

To show that a solution to (8) exists define

\[ \sigma_{m_a}(\alpha) = \sigma_{m_a}\left(AB^T \left(B P_A^\perp B^T + \alpha^2 I_{m_b}\right)^{-1} BA^T\right) \]

for $\alpha > 0$. Let $u_{m_a}(\alpha)$ be an eigenvector associated with $\sigma_{m_a}(\alpha)$. We need to show that there is an $\alpha_0$ such that

\[ \|A^T u_{m_a}(\alpha_0)\| = \alpha_0 \tag{12} \]

for some choice of the singular vector $u_{m_a}(\alpha_0)$. If $u_{m_a}(\alpha)$ were unique (up to sign change) for each $\alpha$ and continuous as a function of $\alpha$, then the proof would be very simple. Since $A$ by assumption has full rank

\[ f(\alpha) = \|A^T u_{m_a}(\alpha)\| - \alpha > 0 \]

for sufficiently small $\alpha$. For sufficiently large $\alpha$, $f(\alpha) < 0$. Hence continuity would guarantee a solution for which $f(\alpha) = 0$. Unfortunately a rigorous proof is complicated by possible discontinuities in $u_{m_a}(\alpha)$ that can occur when the eigenvalue $\sigma_{m_a}(\alpha)$ is repeated. The proof of the following result uses only basic ideas from analysis together with the well known fact that a particular eigenvalue of a family of matrices varying continuously with $\alpha > 0$ is also continuous in $\alpha$.

Theorem 2 If $A$ has full rank then (8) is satisfied for some $u$ normalized so that $\|u\| = 1$.

Proof: As we have noted, we need to show that (12) holds for some $\alpha_0 > 0$. Since $A$ has full rank

\[ f(\alpha) = \min_{u_{m_a}} \|A^T u_{m_a}(\alpha)\| - \alpha > 0 \]

for all sufficiently small $\alpha > 0$. We also have $f(\alpha) < 0$ for all sufficiently large $\alpha > 0$. The minimum that defines $f(\alpha)$ is taken over all possible choices of $u_{m_a}(\alpha)$ (i.e. all norm one vectors in the subspace spanned by the eigenvectors associated with $\sigma_{m_a}(\alpha)$). Let $\alpha_0 > 0$ be defined by

\[ \alpha_0 = \sup\{\alpha \mid f(\alpha) \ge 0\}. \]

From this definition it follows that either $f(\alpha_0) \ge 0$ or in any interval $[\alpha_0 - \eta, \alpha_0]$ there must be an infinite number of points for which [...]

[...] exist sequences $u_k$ and $\alpha_k$ such that $\|A^T u_k\| \ge \alpha_k$ where $u_k$ is an eigenvector associated with $\sigma_{m_a}(\alpha_k)$, $\alpha_k \le \alpha_0$ for all $k$ and $\alpha_k \to \alpha_0$. The continuity of eigenvalues implies that the bounded sequence $u_k$ has some subsequence converging to $u_-$ such that $\|A^T u_-\| \ge \alpha_0$ where $u_-$ is an eigenvector associated with $\sigma_{m_a}(\alpha_0)$. Similarly the definition of $\alpha_0$ implies that $f(\alpha) < 0$ for $\alpha > \alpha_0$. From this the same argument used to construct $u_-$ implies that there is an eigenvector $u_+$ such that

\[ \|A^T u_+\| \le \alpha_0 \le \|A^T u_-\|. \]

It follows that there are scalars $c$ and $s$ such that $c^2 + s^2 = 1$ and $u = c u_- + s u_+$ is also an eigenvector associated with $\sigma_{m_a}(\alpha_0)$ and $\|A^T u\| = \alpha_0$.

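At this point we have verified all of Theorem 1 except for (10). To finish the proof, we start with the following perturbation expansion for singular values from [5].

Theorem 3 For a general $m_a \times m_b$ matrix $\hat D$ with SVD

\[ U^T \hat D V = \begin{bmatrix} U_1^T \\ u_2^T \end{bmatrix} \hat D \begin{bmatrix} V_1 & v_2 & V_3 \end{bmatrix} = \begin{bmatrix} \hat\Sigma & 0 & 0 \\ 0 & \hat\sigma_{m_a} & 0 \end{bmatrix} \]

with $U^T U = I_{m_a}$, $V^T V = I_{m_b}$ and invertible $\hat\Sigma - \hat\sigma_{m_a} I_{m_a - 1}$ let

\[ U^T H V = \begin{bmatrix} G_{11} & g_{12} & G_{13} \\ g_{21}^T & \gamma_{22} & g_{23}^T \end{bmatrix} \]

and $h = \hat\sigma_{m_a} g_{12} + \hat\Sigma g_{21}$. Then $D = \hat D + H$ has a singular value

\[ \sigma_{m_a}^2 = (\hat\sigma_{m_a} + \gamma_{22})^2 + \|g_{21}\|^2 + \|g_{23}\|^2 + h^T \left(\hat\sigma_{m_a}^2 I_{m_a - 1} - \hat\Sigma^2\right)^{-1} h + O(\|E\|^3). \]

By considering second order terms this expansion can accurately characterize the effect of a perturbation on a zero singular value $\hat\sigma_{m_a} = 0$ so long as $\hat\Sigma$ has full rank. In particular, if $\hat\sigma_{m_a} = 0$ then

\[ \sigma_{m_a}^2 = \gamma_{22}^2 + \|g_{23}\|^2. \tag{13} \]

By assumption $\hat A \hat B^T$ has rank $m_a - 1$. If we define

\[ \hat C = \hat A \hat B^T \left(\hat B P_{\hat A}^\perp \hat B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} \hat B \hat A^T \]

for some particular choice of $u$ satisfying (8) and also define

\[ \hat D = \hat A \hat B^T \left(\hat B P_{\hat A}^\perp \hat B^T + \|A^T u\|^2 I_{m_b}\right)^{-1/2} \]

then both $\hat C$ and $\hat D$ have rank $m_a - 1$. Consequently $\hat\Sigma$ as defined in terms of $\hat D$ in Theorem 3 has full rank. This shows that the theorem can be applied to estimate the effect of the perturbations $E$ and $F$ on $\hat\sigma_{m_a} = \sigma_{m_a}(\hat D)$.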
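The special case (13) is simple to test numerically. The sketch below is an illustration under assumptions chosen for the example: $\hat D$ is a random $3 \times 5$ matrix with singular values 2, 1 and 0, the perturbation $H$ has entries of size about $10^{-5}$, and the scalar in the last row of $U^T H V$ is called `gamma22` (the name is ours). The leading term $\gamma_{22}^2 + \|g_{23}\|^2$ is compared with the squared smallest singular value of $\hat D + H$.

```python
import numpy as np

rng = np.random.default_rng(4)
m_a, m_b = 3, 5
U, _ = np.linalg.qr(rng.standard_normal((m_a, m_a)))
V, _ = np.linalg.qr(rng.standard_normal((m_b, m_b)))

Sigma = np.zeros((m_a, m_b))
Sigma[0, 0], Sigma[1, 1] = 2.0, 1.0           # third singular value is exactly zero
D_hat = U @ Sigma @ V.T

H = 1e-5 * rng.standard_normal((m_a, m_b))    # small perturbation
G = U.T @ H @ V
gamma22 = G[m_a - 1, m_a - 1]                 # scalar block of the last row
g23 = G[m_a - 1, m_a:]                        # trailing block of the last row

predicted = gamma22**2 + g23 @ g23            # right-hand side of (13)
actual = np.linalg.svd(D_hat + H, compute_uv=False)[-1] ** 2
print(predicted, actual)
```

A first order expansion alone would predict only $\gamma_{22}^2$, so the $\|g_{23}\|^2$ term is what makes the estimate usable for a singular value that starts at zero.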

[...]

\[ D = AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1/2} \]

then we will derive an expansion of the form

\[ D = \hat D + H + O(\epsilon^2) \tag{14} \]

where $H$ is a perturbation defined in terms of $E$ and $F$. Note that in defining $D$ and $\hat D$ we have used the quantity $\|A^T u\|$ with the perturbed $A$ in both cases. Using the expression we will derive for $H$ and Theorem 3 we will show that if $u$ satisfies (8) then

\[ \sigma_{m_a}(C) = \sigma_{m_a}^2(D) \le (\|E\|_F + \|F\|_F)^2 + O(\epsilon^3) \]

where $C$ is defined in terms of $A$ and $B$ in a manner analogous to the definition of $\hat C$. This will complete the proof of (10).

The matrix square root as we have defined it is nonunique. However since $\sigma_{m_a}(D)$ does not depend on the choice of square root, we are free to choose particular square roots $D$ and $\hat D$ to make $H$ suitably small subject only to the constraints $DD^T = C$ and $\hat D \hat D^T = \hat C$. We start by choosing an arbitrary factorization of the form

\[ \hat S \hat S^T = \left(\hat B P_{\hat A}^\perp \hat B^T + \|A^T u\|^2 I_{m_b}\right)^{-1}. \]

This is guaranteed to exist since the inverse matrix is symmetric and positive definite. We let

\[ \hat D = \hat A \hat B^T \hat S. \]

For sufficiently small $E$ and $F$ perturbing $\hat A$ and $\hat B$ we can choose a square root $S$ such that

\[ S = \hat S + H_S + O(\epsilon^2) \]

where $\|H_S\| = O(\epsilon)$. It turns out that the precise magnitude of $\|H_S\|$ is of no consequence to the analysis; all that matters is that terms of the form $O(\epsilon H_S)$ are $O(\epsilon^2)$ and are consequently negligible.

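We can expand $D$ in terms of $E$, $F$ and $H_S$ as follows

\begin{align*}
D &= \left(\hat A + E\right)\left(\hat B + F\right)^T \left(\hat S + H_S\right) \\
&= \hat A \hat B^T \hat S + E \hat B^T \hat S + \hat A F^T \hat S + \hat A \hat B^T H_S + O(\epsilon^2) \\
&= \hat A \hat B^T \hat S + E P_{\hat A} \hat B^T \hat S + E P_{\hat A}^\perp \hat B^T \hat S + \hat A F^T \hat S + \hat A \hat B^T H_S + O(\epsilon^2) \\
&= \hat A \hat B^T \hat S + H + O(\epsilon^2).
\end{align*}

By the assumption that $\operatorname{rank}(\hat A \hat B^T) = m_a - 1$, the matrix $\hat D = \hat A \hat B^T \hat S$ has rank $m_a - 1$ and a unique left null vector $u_2$ that depends only on the left null space of $\hat A \hat B^T$ and not on $\hat S$ or the value of $\|A^T u\|$.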

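We consider Theorem 3 applied to the smallest singular value $\sigma_{m_a}(\hat D) = 0$ with the perturbation $H$. Using (13) we get [...]

where $u_2$ is the left null vector of $\hat D$. Note that

\[ P_{\hat A} \hat B^T \hat S \begin{bmatrix} v_2 & V_3 \end{bmatrix} = \hat A^T \left(\hat A \hat A^T\right)^{-1} \hat D \begin{bmatrix} v_2 & V_3 \end{bmatrix} = 0 \]

so that many of the terms in the expansion for $H$ share either common left or common right null vectors with $\hat D$. Consequently

\begin{align*}
\sigma_{m_a}^2(D) &= \left\| u_2^T \left(\hat A F^T \hat S + E P_{\hat A}^\perp \hat B^T \hat S\right) \begin{bmatrix} v_2 & V_3 \end{bmatrix} \right\|^2 + O(\epsilon^3) \\
&\le \left( \|\hat A^T u_2\| \|\hat S\|_2 \|F\|_2 + \|P_{\hat A}^\perp \hat B^T \hat S\|_2 \|E\|_2 \right)^2 + O(\epsilon^3).
\end{align*}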

For $E$ and $F$ that are sufficiently small the assumption that $\operatorname{rank}(\hat A \hat B^T \hat S) = \operatorname{rank}(\hat A \hat B^T) = m_a - 1$ guarantees that $u$ is a perturbed version of the left null vector $u_2$. Since $u = u_2 + O(\epsilon)$ we have

\[ \|A^T u_2\| = \|A^T u\| + O(\epsilon). \]

The qualification that $E$ and $F$ must be sufficiently small is standard in any first order perturbation expansion of a singular vector associated with an isolated singular value.

The fact that $\|\hat S\|_2 \le 1/\|A^T u\| + O(\epsilon)$ guarantees that

\[ \|A^T u_2\| \|\hat S\|_2 \le 1 + O(\epsilon). \]

Similarly

\[ \|P_{\hat A}^\perp \hat B^T \hat S\|_2 \le 1 + O(\epsilon). \]

Thus

\[ \sigma_{m_a}^2(D) \le (\|E\|_F + \|F\|_F)^2 + O(\epsilon^3) \le 2\left(\|E\|_F^2 + \|F\|_F^2\right) + O(\epsilon^3). \tag{15} \]

Since $\sigma_{m_a}^2(D) = \sigma_{m_a}(C)$, at this point we have proven all the claims of Theorem 1.

We can offer further justification for the assumption that $\hat A \hat B^T$ has rank $m_a - 1$. Instead of letting $E$ and $F$ be arbitrary perturbations, we define

\[ (E, F) = \operatorname*{argmin}_{\{(E,F) \,\mid\, \operatorname{rank}((A+E)(B+F)^T) < m_a\}} \|E\|_F^2 + \|F\|_F^2. \tag{16} \]

Thus

\[ d^2(A,B) = \|E\|_F^2 + \|F\|_F^2. \]

With $\hat A$ and $\hat B$ defined by (1), the matrix pair $\hat A$ and $\hat B$ is a closest pair to $A$ and $B$ for which $\hat A \hat B^T$ is rank deficient.

Theorem 4 For $AB^T$ with rank $m_a$ [...]

Proof: Any degree of rank deficiency in $\hat A \hat B^T$ implies that there exists a vector $u$ such that $u^T \hat A \hat B^T = 0$ and $\|u\| = 1$. This in turn implies that orthogonal transformations $U$ and $Q$ can be constructed to give

\[ U^T \hat A Q = \begin{bmatrix} \hat a_{11} & 0 \\ \hat a_{21} & \hat A_{22} \end{bmatrix}, \qquad \hat B Q = \begin{bmatrix} 0 & \hat B_2 \end{bmatrix} \tag{17} \]

where $U e_1 = u$ and $\hat a_{11}$ is a scalar. Since perturbing the nonzero elements of (17) only increases the Frobenius norm of the perturbations it follows from the assumed minimality of the perturbations that

\[ U^T A Q = \begin{bmatrix} \hat a_{11} & e^T \\ \hat a_{21} & \hat A_{22} \end{bmatrix}, \qquad B Q = \begin{bmatrix} f & \hat B_2 \end{bmatrix} \]

and

\[ E = U \begin{bmatrix} 0 & e^T \\ 0 & 0 \end{bmatrix} Q^T, \qquad F = \begin{bmatrix} f & 0 \end{bmatrix} Q^T. \tag{18} \]

These perturbations induce a higher order rank deficiency in $\hat A \hat B^T$ only if

\[ \operatorname{rank}(\hat A_{22} \hat B_2^T) < m_a - 1. \]

However if that is the case and $e \ne 0$, then a strictly smaller pair of perturbations

\[ E = 0, \qquad F = \begin{bmatrix} f & 0 \end{bmatrix} Q^T \]

gives $\operatorname{rank}(\hat A \hat B^T) < m_a$. This contradicts the assumed minimality of $E$ and $F$. On the other hand, if $e = 0$ then

\[ \hat A \hat B^T = AB^T - A Q \begin{bmatrix} f & 0 \end{bmatrix}^T \]

so that the perturbation to $AB^T$ is rank one and cannot cause the rank of $\hat A \hat B^T$ to be less than $m_a - 1$.

It follows from this theorem that the closest pair to $A$ and $B$ with a rank deficient product has $\operatorname{rank}(\hat A \hat B^T) = m_a - 1$. This pair can serve as the $\hat A$ and $\hat B$ mentioned in Theorem 1 giving

\[ \|u g^T\|_{2,F}^2 + \|f y^T\|_{2,F}^2 \le d^2(A,B) + O(\epsilon^3). \]

While the observation that $\hat A \hat B^T$ defined in this way always has rank $m_a - 1$ is comforting, it doesn't really give obvious benefit in evaluating the quality of the expansion (10).

The following example shows that the expansion can fail dramatically if $\hat A$ does not have full rank.

Example 2 Consider the matrix pair

\[ A = \begin{bmatrix} \epsilon & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \end{bmatrix} \tag{19} \]

for $0 < \epsilon \ll 1$. While we do not rigorously prove the fact, it is easy to show that the matrix pair that is closest to $A$ and $B$ in the sense of (6) is [...]

[...] results in a negligible $O(\epsilon^2)$ change to $\hat A \hat B^T$. That Theorem 1 fails follows from the fact that for $m_a = 1$, we have $u = 1$ and

\[ \sigma_{m_a}\left(AB^T \left(B P_A^\perp B^T + \|A^T u\|^2 I_{m_b}\right)^{-1} BA^T\right) = \epsilon \left(0 + \epsilon^2 I_{m_b}\right)^{-1} \epsilon = 1. \]

The method of looking at the smallest singular value of $B^{(1)}_1$, $\sigma_1(B^{(1)}_1) = 1$, fails in exactly the same way on (19). However in less contrived problems, the expansion in Theorem 1 often retains its accuracy even when $A$ is very ill-conditioned and the smallest singular value of $B^{(1)}_1$ is orders of magnitude larger than $d(A,B)$. It is only in the higher order terms that ill-conditioning in $A$ can have an effect on (10). In contrast, the effect of errors on the smallest singular value of $B^{(1)}_1$ is often directly magnified by a factor proportional to the condition number of $A$.
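The failure in Example 2 can be reproduced in a few lines. The sketch below assumes NumPy and picks $\epsilon = 2^{-20}$, a power of two chosen only so that the floating point arithmetic in this tiny example is exact; the variable names are ours. It evaluates the matrix in (8) for $u = 1$ and compares it with the obvious upper bound on $d^2(A,B)$ obtained by replacing $A$ with zero.

```python
import numpy as np

eps = 2.0 ** -20                               # about 9.5e-7
A = np.array([[eps, 0.0]])
B = np.array([[1.0, 0.0]])
u = np.array([1.0])                            # m_a = 1, so u = 1

alpha2 = np.linalg.norm(A.T @ u) ** 2          # ||A^T u||^2 = eps^2
P_perp = np.eye(2) - A.T @ np.linalg.solve(A @ A.T, A)
C = A @ B.T @ np.linalg.inv(B @ P_perp @ B.T + alpha2 * np.eye(1)) @ B @ A.T

estimate = C[0, 0]                             # value predicted by (9)
# Setting A_hat = 0 gives a rank deficient product at squared distance eps^2,
# so d^2(A, B) can be no larger than eps^2.
upper_bound_d2 = eps ** 2
print(estimate, upper_bound_d2)
```

The estimate is 1 regardless of how small $\epsilon$ is, while the true squared distance is at most $\epsilon^2$.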

3 Finding the Null Vector

Theorem 1 characterizes nearly minimal perturbations in terms of a possibly nonunique solution to the nonlinear equation (8). In contrast to a direct optimization formulation in which we might have to worry about local minima, the perturbation analysis of the last section shows that any solution to (8) will give suitably small perturbations so long as the conditions on the validity of the expansion are not too close to being violated. Subject to these conditions, purely local information obtained by solving (8) provides an estimate of a globally minimum perturbation giving rank deficiency.

However solving (8) is not always a trivial task. The singular vector associated with the singular value

\[ \sigma_{m_a}\left(AB^T \left(B P_A^\perp B^T + \alpha^2 I_{m_b}\right)^{-1} BA^T\right) \]

is not in general continuous as a function of the parameter $\alpha$ at points where the singular value has multiplicity greater than one. Because of this potential discontinuity, a provably convergent iteration for solving (8) seems to be a remote possibility. Nevertheless, the left singular vector associated with $\sigma_{m_a}(AB^T)$ is often a good approximation to $u$. Making use of this starting vector, we propose the following iteration without any guarantees of convergence.

Algorithm 1 Take $u_0$ to be the left singular vector of $AB^T$ associated with $\sigma_{m_a}(AB^T)$. Let $\alpha_0 = \|A^T u_0\|$ and let $k = 1$.

1. Let $u_k$ be the eigenvector defined by [...]

[...] work. The following example illustrates the potential problems that can arise with repeated singular values.

Example 3 Let

\[ A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 3 \end{bmatrix}. \]

It is easy to verify that the application of Algorithm 1 to this problem gives

\[ (\alpha_1, \alpha_2, \alpha_3, \alpha_4, \ldots) = (1, 2, 1, 2, \ldots). \]

The algorithm does not converge, but a solution to (8) occurs for $\alpha = \sqrt{5/3}$ for which

\[ AB^T \left(B P_A^\perp B^T + \alpha^2 I_{m_b}\right)^{-1} BA^T = \begin{bmatrix} 3/8 & 0 \\ 0 & 3/8 \end{bmatrix} \]

has a repeated singular value and we can choose

\[ u = \begin{bmatrix} \sqrt{7}/3 & \sqrt{2}/3 \end{bmatrix}^T \]

to satisfy $\|A^T u\|^2 = 5/3 = \alpha^2$.

The example is somewhat contrived. Algorithm 1 seems to converge in most cases.
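The oscillation in Example 3 can be reproduced with a short fixed point iteration. Because the statement of the algorithm is incomplete here, the update rule below is an assumption inferred from (8) and from the example: at each step $u_k$ is taken to be an eigenvector for the smallest eigenvalue of $AB^T(BP_A^\perp B^T + \alpha_{k-1}^2 I)^{-1}BA^T$ and $\alpha_k = \|A^T u_k\|$. The function names `C_of` and `fixed_point_iteration` are hypothetical, and the returned sequence is indexed from the starting value $\alpha_0$.

```python
import numpy as np

def C_of(A, B, alpha):
    """Matrix A B^T (B P_A^perp B^T + alpha^2 I)^{-1} B A^T, symmetrized."""
    n, m_b = A.shape[1], B.shape[0]
    P_perp = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
    M = B @ P_perp @ B.T + alpha**2 * np.eye(m_b)
    C = A @ B.T @ np.linalg.solve(M, B @ A.T)
    return 0.5 * (C + C.T)

def fixed_point_iteration(A, B, steps):
    """Assumed form of the iteration: alpha_k = ||A^T u_k|| with u_k from C(alpha_{k-1})."""
    U, _, _ = np.linalg.svd(A @ B.T)
    u = U[:, -1]                               # left singular vector for sigma_{m_a}(A B^T)
    alphas = [np.linalg.norm(A.T @ u)]
    for _ in range(steps):
        _, vecs = np.linalg.eigh(C_of(A, B, alphas[-1]))
        u = vecs[:, 0]                         # eigenvector for the smallest eigenvalue
        alphas.append(np.linalg.norm(A.T @ u))
    return np.array(alphas), u

A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0]])
B = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 3.0]])

alphas, _ = fixed_point_iteration(A, B, steps=6)
print(alphas)                                  # alternates between 1 and 2

# The solution quoted in the example: alpha^2 = 5/3 with a repeated eigenvalue 3/8.
C_star = C_of(A, B, np.sqrt(5.0 / 3.0))
u_star = np.array([np.sqrt(7.0) / 3.0, np.sqrt(2.0) / 3.0])
print(C_star, np.linalg.norm(A.T @ u_star) ** 2)
```

A practical implementation would need a stopping test and some safeguard against this kind of cycling, for instance a bisection on the sign of $\|A^Tu\| - \alpha$; that safeguard is our suggestion and is not part of the algorithm as described.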

4 Higher Order Rank Deficiency

If $A$ and $B$ come from a model of the form (1) with $\operatorname{rank}(\hat A \hat B^T) = r < m_a - 1$ we might wish to find perturbations $E$ and $F$ that satisfy

\[ (E, F) = \operatorname*{argmin}_{\{(E,F) \,\mid\, \operatorname{rank}((A+E)(B+F)^T) \le r\}} \|E\|_F^2 + \|F\|_F^2. \tag{20} \]

The most obvious methods for attempting to decide if $A$ and $B$ are consistent with a model of the form (1) with a rank $r$ matrix product are natural generalizations of methods for the case $r = m_a - 1$. As discussed in §1 we can attempt to determine the number of non-negligible singular values of either $AB^T$ or $B^{(1)}_1$ where $B^{(1)}_1$ is defined as in (3). The potential problems with these rank decisions are the same as before.

Theorem 1 does not admit an obvious generalization to $r < m_a - 1$. As an alternative we will assume that perturbations that further reduce the rank $m_a - 1$ product to rank $r$ can be chosen to be orthogonal to the perturbations that originally reduced the product to rank $m_a$ [...]

[...] reveals the rank of $AB^T$ in the sense of finding nearly minimal $E$ and $F$. Nevertheless, we will get an algorithm that typically finds smaller perturbations than can be found by looking for small singular values in $B^{(1)}_1$.

The distance measure (6) is invariant under orthogonal transformations. Thus we are free to apply transformations of the form $U^T A Q$ and $V^T B Q$ where $U$, $V$ and $Q$ are orthogonal. We start with perturbations $\tilde E^{(1)}$ and $\tilde F^{(1)}$ and a vector $u$ such that

\[ u^T \left(A + \tilde E^{(1)}\right)\left(B + \tilde F^{(1)}\right)^T = 0 \]

with $\|u\| = 1$ and $\|\tilde E^{(1)}\|_F^2 + \|\tilde F^{(1)}\|_F^2$ not much larger than $d^2(A,B)$. Our goal is to find further perturbations $E^{(2)}$ and $F^{(2)}$ so that

\[ \operatorname{rank}\left( \left(A + \tilde E^{(1)} + E^{(2)}\right)\left(B + \tilde F^{(1)} + F^{(2)}\right)^T \right) = r \tag{21} \]

with nearly minimal $\|\tilde E^{(1)} + E^{(2)}\|_F^2 + \|\tilde F^{(1)} + F^{(2)}\|_F^2$.

Before describing our orthogonality assumption and attempting to construct $E^{(2)}$ and $F^{(2)}$ we consider the computation of $\tilde E^{(1)}$ and $\tilde F^{(1)}$. Theorem 1 gives an explicit formula for nearly minimal perturbations. However we can do slightly better. Instead of using Theorem 1 directly, we will use $u$ solving (8) to construct alternate perturbations that can be slightly smaller than those of the theorem. These refined rank reducing perturbations also have the advantage of more directly illuminating the algorithmic significance of the assumption of orthogonality between the perturbations.

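Suppose that $u$, $g$, $y$ and $f$ are defined as in the theorem. If

\[ E^{(1)} = -u g^T, \qquad F^{(1)} = -f y^T \tag{22} \]

are the rank reducing perturbations then the theorem states that

\[ u^T \left(A + E^{(1)}\right)\left(B + F^{(1)}\right)^T = 0. \]

Choose an orthogonal $Q$ such that

\[ u^T \left(A + E^{(1)}\right) Q = u^T \left(A + E^{(1)}\right) \begin{bmatrix} q_1 & Q_2 \end{bmatrix} = \begin{bmatrix} \rho & 0^T \end{bmatrix} \]

for some scalar $\rho$ and an orthogonal $U$ so that $U^T u = e_1$ where $e_1$ is the first standard basis vector. Then

\[ U^T A Q = \begin{bmatrix} \tilde a_{11} & \tilde g^T \\ \tilde a_{21} & \tilde A_{22} \end{bmatrix} \qquad \text{and} \qquad B Q = \begin{bmatrix} \tilde f & \tilde B_2 \end{bmatrix} \]

where $\tilde a_{11}$ is a scalar, $\tilde f$ is a column vector. By the construction of $Q$ [...]

\[ q_1^T \left(B + F^{(1)}\right)^T = 0 \]

so that

\[ \|\tilde f\|_2 = \left\| F^{(1)} q_1 \right\|_2 \le \|f y^T\|_F. \]

Thus if

\[ \tilde E^{(1)} = U \begin{bmatrix} 0 & -\tilde g^T \\ 0 & 0 \end{bmatrix} Q^T, \qquad \tilde F^{(1)} = \begin{bmatrix} -\tilde f & 0 \end{bmatrix} Q^T \tag{23} \]

then

\[ u^T \left(A + \tilde E^{(1)}\right)\left(B + \tilde F^{(1)}\right)^T = 0 \]

and

\[ \left\|\tilde E^{(1)}\right\|_F^2 + \left\|\tilde F^{(1)}\right\|_F^2 \le \left\|E^{(1)}\right\|_F^2 + \left\|F^{(1)}\right\|_F^2. \]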

We will use the perturbations $\tilde E^{(1)}$ and $\tilde F^{(1)}$ to induce rank deficiency instead of $E^{(1)}$ and $F^{(1)}$.

The following lemma shows that if the conditions that make the expansion from Theorem 1 valid are satisfied then both these perturbations are non-zero.

Lemma 1 Let $u$, $g$, $f$ and $y$ be defined as in Theorem 1, $E^{(1)}$ and $F^{(1)}$ defined by (22) and $\tilde E^{(1)}$ and $\tilde F^{(1)}$ defined by (23). The assumption

\[ u^T AB^T \ne 0 \tag{24} \]

implies that $\tilde E^{(1)} \ne 0$ and $\tilde F^{(1)} \ne 0$.

Proof: We extend our assumption by noting that it immediately implies $u^T A \ne 0$ and also

\[ u^T \left(A + E^{(1)}\right) \ne 0. \]

The latter follows because Theorem 1 implies $P_A^\perp g = g$ so that $\mathcal{R}(A^T) \perp \mathcal{R}(E^{(1)T})$ and $u^T (A + E^{(1)}) = 0$ only if $u^T A = 0$.

To prove the lemma, it is sufficient to prove that $\tilde f \ne 0$ and $\tilde g \ne 0$. We have $\tilde f = B q_1$ and

\[ B q_1 = \frac{1}{\|(A + E^{(1)})^T u\|} \left( B A^T u - B g \right) = \frac{\|A^T u\|}{\|(A + E^{(1)})^T u\|}\, f. \]

By assumption $A^T u \ne 0$. The fact that $f \ne 0$ follows from the second expression for $f$ in Theorem 1 and from the assumption $u$ [...]

The orthogonality of $Q$ implies that $\tilde g = -Q_2^T E^{(1)T} u = 0$ only if $E^{(1)T} u = \beta q_1$ for some scalar $\beta$. This is equivalent to

\[ E^{(1)T} u = \frac{\beta}{\|(A + E^{(1)})^T u\|} \left(A + E^{(1)}\right)^T u \]

or

\[ \left( 1 - \frac{\beta}{\|(A + E^{(1)})^T u\|} \right) E^{(1)T} u = \frac{\beta}{\|(A + E^{(1)})^T u\|} A^T u. \]

Since $E^{(1)T} u = -g$, $g \perp \mathcal{R}(A^T)$ and $u^T A \ne 0$ this is impossible unless $g = 0$. The fact that $g \ne 0$ follows from the definition of $g$ and $u^T AB^T \ne 0$. So we must have $\tilde g \ne 0$.

Given the definition of the perturbations $\tilde E^{(1)}$ and $\tilde F^{(1)}$ we make the following assumption to allow recursive application of Theorem 1.

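Assumption 1 Given $u$ determined as in Theorem 1 satisfying $u^T AB^T \ne 0$ we assume that perturbations $E^{(2)}$ and $F^{(2)}$ can be chosen so that in addition to (21) we have

\[ E^{(2)T} \tilde E^{(1)} = 0, \qquad F^{(2)} \tilde F^{(1)T} = 0 \tag{25} \]

with

\begin{align}
\left\|\tilde E^{(1)} + E^{(2)}\right\|_F^2 + \left\|\tilde F^{(1)} + F^{(2)}\right\|_F^2
&= \left\|\tilde E^{(1)}\right\|_F^2 + \left\|E^{(2)}\right\|_F^2 + \left\|\tilde F^{(1)}\right\|_F^2 + \left\|F^{(2)}\right\|_F^2 \nonumber \\
&\approx \min_{\operatorname{rank}(\hat A \hat B^T) \le r} \|A - \hat A\|_F^2 + \|B - \hat B\|_F^2. \tag{26}
\end{align}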

Lemma 1 means that $u^T AB^T \ne 0$ implies that the constraints (25) are nontrivial and are equivalent to

\[ E^{(2)T} u = 0, \qquad F^{(2)} q_1 = 0. \tag{27} \]

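To see how this assumption corresponds to the reduction of the order of the problem we note that with $U$ and $Q$ constructed as before the orthogonality relations imply that

\[ E^{(2)} = U \begin{bmatrix} 0 & 0 \\ e_{21} & E_{22} \end{bmatrix} Q^T, \qquad F^{(2)} = \begin{bmatrix} 0 & F_2 \end{bmatrix} Q^T. \]

Thus we have the problem of finding minimal perturbations such that

\begin{align*}
r &= \operatorname{rank}\left( \left(A + \tilde E^{(1)} + E^{(2)}\right)\left(B + \tilde F^{(1)} + F^{(2)}\right)^T \right) \\
&= \operatorname{rank}\left( \begin{bmatrix} \tilde a_{11} & 0 \\ \tilde a_{21} + e_{21} & \tilde A_{22} + E_{22} \end{bmatrix} \begin{bmatrix} 0 \\ \tilde B_2^T + F_2^T \end{bmatrix} \right) \\
&= \operatorname{rank}\left( (\tilde A_{22} + E_{22})(\tilde B_2 + F_2)^T \right).
\end{align*}

The orthogonality condition on the perturbations to $B$ implies that $e_{21}$ has no effect on the rank. If it is nonzero it can only increase the norm of the perturbation. Consequently we can assume that [...]

\[ [\ldots] \min_{\operatorname{rank}((\tilde A_{22} + E_{22})(\tilde B_2 + F_2)^T) = r} \|E_{22}\|_F^2 + \|F_2\|_F^2. \]

This is the recursive step: given the perturbations $\tilde E^{(1)}$ and $\tilde F^{(1)}$ that introduce a first order rank deficiency, we transform $A$ and $B$ by $U$ and $Q$ and continue recursively by reducing the rank of $\tilde A_{22} \tilde B_2^T$.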

Unfortunately, as we will shortly illustrate with a small example, the orthogonality and minimality conditions, (25) and (26), are not always consistent. Since it seems to be algorithmically necessary for the recursive application of Theorem 1, we will strictly enforce (25) and hope for the best in simultaneously trying to satisfy (26). The inconsistency of the conditions will degrade the ability of the algorithm to find the appropriate degree of rank deficiency in $AB^T$.

The orthogonality relations are analogous to those that apply to rank reducing perturbations given by the Eckhart-Young theorem and the SVD: if $E^{(1)}$ is a minimal perturbation such that $A + E^{(1)}$ is rank deficient, the construction of rank reducing perturbations from the SVD shows that it is possible to choose $E^{(2)}$ with $E^{(2)T} E^{(1)} = 0$ and $E^{(1)} E^{(2)T} = 0$ such that $E^{(1)} + E^{(2)}$ is a minimal perturbation reducing the rank of $A$ to $r$. Our assumption (25) represents a naive attempt to extend this orthogonality property to perturbations of a matrix product.
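For a single matrix the orthogonality property is visible directly in the SVD. The sketch below is an illustration assuming a random $4 \times 6$ matrix $A$: the two rank one perturbations that remove the smallest and then the next smallest singular value are mutually orthogonal in both senses, and their squared Frobenius norms add.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank one perturbations removing sigma_4 and then sigma_3.
E1 = -s[3] * np.outer(U[:, 3], Vt[3])
E2 = -s[2] * np.outer(U[:, 2], Vt[2])

rank_after_E1 = np.linalg.matrix_rank(A + E1, tol=1e-10)
rank_after_E2 = np.linalg.matrix_rank(A + E1 + E2, tol=1e-10)
total = np.linalg.norm(E1 + E2, "fro") ** 2    # equals sigma_3^2 + sigma_4^2

print(rank_after_E1, rank_after_E2)
print(np.linalg.norm(E2.T @ E1), np.linalg.norm(E1 @ E2.T))
```

For the product $AB^T$ there is no comparable decomposition that guarantees both orthogonality and minimality, which is the difficulty the following example exposes.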

The following example highlights the problem with this approach and with the underlying assumption that (25) is consistent with (26). For simplicity we consider an example for which rank deficiency is already present and $u^T AB^T = 0$. Since this implies that $\tilde E^{(1)} = 0$ and $\tilde F^{(1)} = 0$, we use the alternate form of the orthogonality constraints given in (27).

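Example 4 Consider the matrices

\[
A = \begin{bmatrix} A_1 & 0 \end{bmatrix} = \begin{bmatrix} \delta & 0 & 0 & 0 & 0 & 0 \\ 1 & \delta & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} B_1 & B_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & \delta & 0 & 0 \\ 0 & \epsilon/\delta & \epsilon & 0 & \delta & 0 \\ 0 & 0 & 0 & 0 & 0 & \delta \end{bmatrix}
\]

where $0 < \epsilon \ll \delta \ll 1$.

The product is already rank deficient and it is easy to verify that the vector

\[ u^T = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \]

is a solution to (8). Since no perturbation is required to give rank deficiency we can choose

\[ q_1^T = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]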

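[...]

\[
E_2 = \begin{bmatrix} 0 & \epsilon & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
F_1 = \begin{bmatrix} 0 & 0 & 0 \\ -\epsilon & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

then

\[
\begin{bmatrix} A_1 & E_2 \end{bmatrix} \begin{bmatrix} B_1^T + F_1^T \\ B_2^T \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ \delta & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\]

so that the matrix pair is within $O(\epsilon)$ of a pair $\hat A$ and $\hat B$ for which $\operatorname{rank}(\hat A \hat B^T) = 1$. Our proposed algorithm tries to introduce rank deficiency into $\tilde A_{22} \tilde B_2^T$ for the pair

\[
\tilde A_{22} = \begin{bmatrix} \bar A_1 & 0 \end{bmatrix} = \begin{bmatrix} \delta & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}, \qquad
\tilde B_2 = \begin{bmatrix} \bar B_1 & \bar B_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & \delta & 0 & 0 \\ \epsilon/\delta & \epsilon & 0 & \delta & 0 \\ 0 & 0 & 0 & 0 & \delta \end{bmatrix}.
\]

We seek $O(\epsilon)$ perturbations $E_{22}$ and $F_2$ so that

\[ \operatorname{rank}\left( \left(\tilde A_{22} + E_{22}\right)\left(\tilde B_2 + F_2\right)^T \right) = 1. \]

Let

\[ E_{22} = \begin{bmatrix} \bar E_1 & \bar E_2 \end{bmatrix}, \qquad F_2 = \begin{bmatrix} \bar F_1 & \bar F_2 \end{bmatrix}. \]

We assume that

\begin{align*}
\left(\tilde A_{22} + E_{22}\right)\left(\tilde B_2 + F_2\right)^T
&= \begin{bmatrix} \bar A_1 + \bar E_1 & \bar E_2 \end{bmatrix} \begin{bmatrix} \bar B_1^T + \bar F_1^T \\ \bar B_2^T + \bar F_2^T \end{bmatrix} \\
&= \bar A_1 \bar B_1^T + \bar A_1 \bar F_1^T + \bar E_1 \bar B_1^T + \bar E_2 \bar B_2^T + O(\epsilon^2)
\end{align*}

has rank 1. Since $\bar B_2 = \delta I$ the Eckhart-Young theorem implies that the only way this can happen is if

\[
\bar A_1 \bar B_1^T + \bar A_1 \bar F_1^T + \bar E_1 \bar B_1^T = \begin{bmatrix} \delta & \epsilon & 0 \\ 1 & \epsilon & 0 \end{bmatrix} + \begin{bmatrix} \delta & 0 \\ 0 & 1 \end{bmatrix} \bar F_1^T + \bar E_1 \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} + O(\epsilon^2)
\]

has a singular value of $O(\epsilon\delta)$. If $\|\bar E_1\| = O(\epsilon)$ and $\|\bar F_1\| = O(\epsilon)$ then the first two columns of this matrix have the form

\[ X = \begin{bmatrix} \delta + O(\epsilon) & \epsilon + O(\epsilon\delta) \\ 1 + O(\epsilon) & \epsilon + O(\epsilon) \end{bmatrix}. \]

Without the $O(\epsilon)$ terms this matrix has rank 1. It has a left singular vector associated with the smallest singular value of the form [...]

\[ \sigma_2^2(X) = \left\| u^T \begin{bmatrix} O(\epsilon) & \epsilon + O(\epsilon\delta) \\ O(\epsilon) & \epsilon + O(\epsilon) \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\|^2 + O(\epsilon^3) = (\epsilon + O(\epsilon\delta))^2. \]

Ignoring the third column can only decrease $\sigma_2$ so

\[ \sigma_2\left(\bar A_1 \bar B_1^T + \bar A_1 \bar F_1^T + \bar E_1 \bar B_1^T\right) \ge \sigma_2(X) = \epsilon + O(\epsilon\delta) \gg \epsilon\delta. \]

Thus if $\|\bar E_1\|$ and $\|\bar F_1\|$ are not much larger than $\epsilon$ then we must have $\|\bar E_2\| \gg \epsilon$. It can also be experimentally verified that Theorem 1 applied to $\tilde A_{22}$ and $\tilde B_2$ results in rank reducing perturbations that are much larger than $\epsilon$.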

5 Conclusions

We have proposed a new method for detecting near rank deficiency in the product $AB^T$. To first order its performance is provably insensitive to ill-conditioning of the matrices. It is also insensitive to the presence of nearly intersecting subspaces associated with moderately small singular values, a situation that can degrade the accuracy of subspaces estimated using the product SVD.

We have also proposed a natural method for attempting to find higher order rank deficiency based on generalizing the orthogonality properties of SVD and the Eckhart-Young rank reducing perturbations. Unfortunately, the method was shown to be inadequate in the product case. Finding an algorithm for reliably revealing the rank of a product of perturbed matrices remains an open problem.

References

[1] B. De Moor and P. Van Dooren. Generalizations of the singular value and QR decompositions. SIAM Journal on Matrix Analysis and Applications, 13:993-1014, 1992.

[2] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[3] C. C. Paige. Some aspects of generalized QR factorizations. In M. G. Cox and S. J. Hammarling, editors, Reliable Numerical Computation, pages 71-91, Oxford, 1990. Clarendon Press.

[4] G. W. Stewart. Rank degeneracy. SIAM Journal on Scientific and Statistical Computing, 5:403-413, 1984.
