On Spectral Clustering: Analysis and an algorithm

Andrew Y. Ng
CS Division
U.C. Berkeley
ang@cs.berkeley.edu

Michael I. Jordan
CS Div. & Dept. of Stat.
U.C. Berkeley
jordan@cs.berkeley.edu

Yair Weiss
School of CS & Engr.
The Hebrew Univ.
yweiss@cs.huji.ac.il

Abstract

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the data), there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.

1 Introduction

The task of finding good clusters has been the focus of considerable research in machine learning and pattern recognition. For clustering points in $\mathbb{R}^n$ (a main application focus of this paper), one standard approach is based on generative models, in which algorithms such as EM are used to learn a mixture density. These approaches suffer from several drawbacks. First, to use parametric density estimators, harsh simplifying assumptions usually need to be made (e.g., that the density of each cluster is Gaussian). Second, the log likelihood can have many local minima and therefore multiple restarts are required to find a good solution using iterative algorithms. Algorithms such as K-means have similar problems.

A promising alternative that has recently emerged in a number of fields is to use spectral methods for clustering. Here, one uses the top eigenvectors of a matrix derived from the distance between points. Such algorithms have been successfully used in many applications including computer vision and VLSI design [5, 1]. But despite their empirical successes, different authors still disagree on exactly which eigenvectors to use and how to derive clusters from them (see [11] for a review). Also, the analysis of these algorithms, which we briefly review below, has tended to focus on simplified algorithms that only use one eigenvector at a time.


the eigenvector is seen as solving a relaxation of an NP-hard discrete graph partitioning problem [3], and it can be shown that cuts based on the second eigenvector give a guaranteed approximation to the optimal cut [9, 3]. This analysis can be extended to clustering by building a weighted graph in which nodes correspond to datapoints and edges are related to the distance between the points. Since the majority of analyses in spectral graph partitioning appear to deal with partitioning the graph into exactly two parts, these methods are then typically applied recursively to find k clusters (e.g. [9]). Experimentally it has been observed that using more eigenvectors and directly computing a k-way partitioning is better (e.g. [5, 1]).

Here, we build upon the recent work of Weiss [11] and Meila and Shi [6], who analyzed algorithms that use k eigenvectors simultaneously in simple settings. We propose a particular manner to use the k eigenvectors simultaneously, and give conditions under which the algorithm can be expected to do well.

2 Algorithm

Given a set of points $S = \{s_1, \ldots, s_n\}$ in $\mathbb{R}^l$ that we want to cluster into k subsets:

1. Form the affinity matrix $A \in \mathbb{R}^{n \times n}$ defined by $A_{ij} = \exp(-\|s_i - s_j\|^2 / 2\sigma^2)$ if $i \neq j$, and $A_{ii} = 0$.

2. Define D to be the diagonal matrix whose (i, i)-element is the sum of A's i-th row, and construct the matrix $L = D^{-1/2} A D^{-1/2}$. (See footnote 1.)

3. Find $x_1, x_2, \ldots, x_k$, the k largest eigenvectors of L (chosen to be orthogonal to each other in the case of repeated eigenvalues), and form the matrix $X = [x_1 \; x_2 \; \ldots \; x_k] \in \mathbb{R}^{n \times k}$ by stacking the eigenvectors in columns.

4. Form the matrix Y from X by renormalizing each of X's rows to have unit length (i.e. $Y_{ij} = X_{ij} / (\sum_j X_{ij}^2)^{1/2}$).

5. Treating each row of Y as a point in $\mathbb{R}^k$, cluster them into k clusters via K-means or any other algorithm (that attempts to minimize distortion).

6. Finally, assign the original point $s_i$ to cluster j if and only if row i of the matrix Y was assigned to cluster j.
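The six steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' Matlab code: the function names `spectral_embedding`, `kmeans` and `spectral_clustering` are hypothetical, the K-means routine uses a simple deterministic farthest-point start chosen here for reproducibility, and every row sum of A is assumed to be strictly positive.

```python
import numpy as np

def spectral_embedding(S, k, sigma2):
    """Steps 1-4: map the n points (rows of S) to unit-length rows of Y."""
    S = np.asarray(S, dtype=float)
    # Step 1: Gaussian affinities with a zero diagonal.
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dists / (2.0 * sigma2))
    np.fill_diagonal(A, 0.0)
    # Step 2: L = D^{-1/2} A D^{-1/2}; assumes every row sum of A is positive.
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    # Step 3: eigh returns eigenvalues in ascending order, so the last k
    # columns are orthonormal eigenvectors for the k largest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, -k:]
    # Step 4: renormalize each row of X to unit length.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def kmeans(Y, k, n_iter=100):
    """Step 5: Lloyd's iterations from a deterministic farthest-point start
    (an illustrative choice, not the paper's initialization)."""
    centers = [Y[0]]
    for _ in range(1, k):
        gaps = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(gaps)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        updated = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def spectral_clustering(S, k, sigma2):
    """Step 6: point s_i receives the cluster assigned to row i of Y."""
    return kmeans(spectral_embedding(S, k, sigma2), k)
```

Calling `spectral_clustering(S, k, sigma2)` on an n-by-l array returns one integer label per point. The dense n-by-n distance computation keeps the sketch short but limits it to modest n.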

Here, the scaling parameter $\sigma^2$ controls how rapidly the affinity $A_{ij}$ falls off with the distance between $s_i$ and $s_j$, and we will later describe a method for choosing it automatically. We also note that this is only one of a large family of possible algorithms, and later discuss some related methods (e.g., [6]).

At first sight, this algorithm seems to make little sense. Since we run K-means in step 5, why not just apply K-means directly to the data? Figure 1e shows an example. The natural clusters in $\mathbb{R}^2$ do not correspond to convex regions, and K-means run directly finds the unsatisfactory clustering in Figure 1i. But once we map the points to $\mathbb{R}^k$ (Y's rows), they form tight clusters (Figure 1h) from which our method obtains the good clustering shown in Figure 1e. We note that the clusters in Figure 1h lie at $90^\circ$ to each other relative to the origin (cf. [8]).

Footnote 1. Readers familiar with spectral graph theory [3] may be more familiar with the Laplacian $I - L$. But as replacing L with $I - L$ would complicate our later discussion, and only changes the eigenvalues (from $\lambda_i$ to $1 - \lambda_i$) and not the eigenvectors, we instead use L.
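The claim that passing from L to the Laplacian $I - L$ changes only the eigenvalues can be checked in one line. The symbols x and $\lambda$ below stand for an arbitrary eigenpair of L and are introduced only for this check.

```latex
L x = \lambda x
\quad\Longrightarrow\quad
(I - L)\,x \;=\; x - \lambda x \;=\; (1 - \lambda)\,x .
```

In particular, the eigenvectors for the k largest eigenvalues of L are the eigenvectors for the k smallest eigenvalues of $I - L$.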


3.1 Informal discussion: The "ideal" case

To understand the algorithm, it is instructive to consider its behavior in the "ideal" case in which all points in different clusters are infinitely far apart. For the sake of discussion, suppose that k = 3, and that the three clusters of sizes $n_1$, $n_2$ and $n_3$ are $S_1$, $S_2$, and $S_3$ ($S = S_1 \cup S_2 \cup S_3$, $n = n_1 + n_2 + n_3$). To simplify our exposition, also assume that the points in $S = \{s_1, \ldots, s_n\}$ are ordered according to which cluster they are in, so that the first $n_1$ points are in cluster $S_1$, the next $n_2$ in $S_2$, etc. We will also use "$j \in S_i$" as a shorthand for $s_j \in S_i$. Moving the clusters "infinitely" far apart corresponds to zeroing all the elements $A_{ij}$ corresponding to points $s_i$ and $s_j$ in different clusters. More precisely, define $\hat{A}_{ij} = 0$ if $s_i$ and $s_j$ are in different clusters, and $\hat{A}_{ij} = A_{ij}$ otherwise. Also let $\hat{L}$, $\hat{D}$, $\hat{X}$ and $\hat{Y}$ be defined as in the previous algorithm, but starting with $\hat{A}$ instead of A. Note that $\hat{A}$ and $\hat{L}$ are therefore block-diagonal:

$$\hat{A} = \begin{bmatrix} A^{(11)} & 0 & 0 \\ 0 & A^{(22)} & 0 \\ 0 & 0 & A^{(33)} \end{bmatrix}, \qquad \hat{L} = \begin{bmatrix} \hat{L}^{(11)} & 0 & 0 \\ 0 & \hat{L}^{(22)} & 0 \\ 0 & 0 & \hat{L}^{(33)} \end{bmatrix} \tag{1}$$

where we have adopted the convention of using parenthesized superscripts to index into subblocks of vectors/matrices, and $\hat{L}^{(ii)} = (\hat{D}^{(ii)})^{-1/2} A^{(ii)} (\hat{D}^{(ii)})^{-1/2}$. Here, $\hat{A}^{(ii)} = A^{(ii)} \in \mathbb{R}^{n_i \times n_i}$ is the matrix of "intra-cluster" affinities for cluster i. For future use, also define $\hat{d}^{(i)} \in \mathbb{R}^{n_i}$ to be the vector containing $\hat{D}^{(ii)}$'s diagonal elements, and $\hat{d} \in \mathbb{R}^n$ to contain $\hat{D}$'s diagonal elements.

To construct $\hat{X}$, we find $\hat{L}$'s first k = 3 eigenvectors. Since $\hat{L}$ is block diagonal, its eigenvalues and eigenvectors are the union of the eigenvalues and eigenvectors of its blocks (the latter padded appropriately with zeros). It is straightforward to show that $\hat{L}^{(ii)}$ has a strictly positive principal eigenvector $x_1^{(i)} \in \mathbb{R}^{n_i}$ with eigenvalue 1. Also, since $A^{(ii)}_{jk} > 0$ ($j \neq k$), the next eigenvalue is strictly less than 1. (See, e.g., [3].) Thus, stacking $\hat{L}$'s eigenvectors in columns to obtain $\hat{X}$, we have:

$$\hat{X} = \begin{bmatrix} x_1^{(1)} & \vec{0} & \vec{0} \\ \vec{0} & x_1^{(2)} & \vec{0} \\ \vec{0} & \vec{0} & x_1^{(3)} \end{bmatrix} \in \mathbb{R}^{n \times 3}. \tag{2}$$
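The "straightforward to show" step for the eigenvalue 1 can be made explicit with a short calculation. Here $\vec{1}$ denotes the all-ones vector in $\mathbb{R}^{n_i}$ (notation introduced for this check), and we use that $\hat{D}^{(ii)} \vec{1} = \hat{d}^{(i)}$ and $A^{(ii)} \vec{1} = \hat{d}^{(i)}$ by the definition of the row sums.

```latex
\hat{L}^{(ii)} \,(\hat{D}^{(ii)})^{1/2}\, \vec{1}
  \;=\; (\hat{D}^{(ii)})^{-1/2} A^{(ii)} \,\vec{1}
  \;=\; (\hat{D}^{(ii)})^{-1/2}\, \hat{d}^{(i)}
  \;=\; (\hat{D}^{(ii)})^{1/2}\, \vec{1} .
```

So the vector with entries $\sqrt{\hat{d}^{(i)}_j}$ is an eigenvector of $\hat{L}^{(ii)}$ with eigenvalue 1, and it is strictly positive because every $\hat{d}^{(i)}_j > 0$. That 1 is the largest eigenvalue is the part that relies on the spectral graph theory results cited as [3].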

Actually, a subtlety needs to be addressed here. Since 1 is a repeated eigenvalue in $\hat{L}$, we could just as easily have picked any other 3 orthogonal vectors spanning the same subspace as $\hat{X}$'s columns, and defined them to be our first 3 eigenvectors. That is, $\hat{X}$ could have been replaced by $\hat{X}R$ for any orthogonal matrix $R \in \mathbb{R}^{3 \times 3}$ ($R^T R = R R^T = I$). Note that this immediately suggests that one use considerable caution in attempting to interpret the individual eigenvectors of L, as the choice of $\hat{X}$'s columns is arbitrary up to a rotation, and can easily change due to small perturbations to A or even differences in the implementation of the eigensolvers. Instead, what we can reasonably hope to guarantee about the algorithm will be arrived at not by considering the (unstable) individual columns of $\hat{X}$, but instead the subspace spanned by the columns of $\hat{X}$, which can be considerably more stable.

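Next, when we renormalize each of $\hat{X}$'s rows to have unit length, we obtain:

$$\hat{Y} = \begin{bmatrix} \hat{Y}^{(1)} \\ \hat{Y}^{(2)} \\ \hat{Y}^{(3)} \end{bmatrix} = \begin{bmatrix} \vec{1} & \vec{0} & \vec{0} \\ \vec{0} & \vec{1} & \vec{0} \\ \vec{0} & \vec{0} & \vec{1} \end{bmatrix} R \tag{3}$$

where we have used $\hat{Y}^{(i)} \in \mathbb{R}^{n_i \times k}$ to denote the i-th subblock of $\hat{Y}$. Letting $\hat{y}^{(i)}_j$ denote the j-th row of $\hat{Y}^{(i)}$, we therefore see that $\hat{y}^{(i)}_j$ is the i-th row of the orthogonal matrix R. This gives us the following proposition.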

Proposition 1 Let $\hat{A}$'s off-diagonal blocks $\hat{A}^{(ij)}$, $i \neq j$, be zero. Also assume that each cluster $S_i$ is connected (see footnote 2). Then there exist k orthogonal vectors $r_1, \ldots, r_k$ ($r_i^T r_j = 1$ if i = j, 0 otherwise) so that $\hat{Y}$'s rows satisfy

$$\hat{y}^{(i)}_j = r_i \tag{4}$$

for all $i = 1, \ldots, k$, $j = 1, \ldots, n_i$.

In other words, there are k mutually orthogonal points on the surface of the unit k-sphere around which $\hat{Y}$'s rows will cluster. Moreover, these clusters correspond exactly to the true clustering of the original data.
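Proposition 1 is easy to check numerically. The sketch below is an illustration only: the helper names `ideal_embedding` and `random_block`, the cluster sizes and the random positive affinities are all assumptions made for the example. It builds a block-diagonal $\hat{A}$, computes $\hat{Y}$, and inspects the Gram matrix $\hat{Y}\hat{Y}^T$, which should be 1 for pairs of rows in the same cluster and 0 otherwise, whatever rotation R the eigensolver happens to return.

```python
import numpy as np

def ideal_embedding(blocks):
    """Build block-diagonal A-hat from intra-cluster blocks and return Y-hat."""
    n = sum(b.shape[0] for b in blocks)
    k = len(blocks)
    A_hat = np.zeros((n, n))
    start = 0
    for b in blocks:
        m = b.shape[0]
        A_hat[start:start + m, start:start + m] = b
        start += m
    d = A_hat.sum(axis=1)
    L_hat = A_hat / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L_hat)        # ascending eigenvalues
    X_hat = vecs[:, -k:]                   # basis of the eigenvalue-1 subspace
    return X_hat / np.linalg.norm(X_hat, axis=1, keepdims=True)

def random_block(m, rng):
    """Symmetric block with strictly positive off-diagonal affinities."""
    B = rng.uniform(0.1, 1.0, (m, m))
    B = (B + B.T) / 2
    np.fill_diagonal(B, 0.0)
    return B

rng = np.random.default_rng(0)
sizes = [4, 5, 6]
Y_hat = ideal_embedding([random_block(m, rng) for m in sizes])
G = Y_hat @ Y_hat.T   # entries are inner products between rows of Y-hat
```

Working with the Gram matrix rather than with the rows themselves sidesteps the arbitrary rotation R discussed above.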

3.2 The general case

In the general case, A's off-diagonal blocks are non-zero, but we still hope to recover guarantees similar to Proposition 1. Viewing $E = A - \hat{A}$ as a perturbation to the "ideal" $\hat{A}$ that results in $A = \hat{A} + E$, we ask: When can we expect the resulting rows of Y to cluster similarly to the rows of $\hat{Y}$? Specifically, when will the eigenvectors of L, which we now view as a perturbed version of $\hat{L}$, be "close" to those of $\hat{L}$?

Matrix perturbation theory [10] indicates that the stability of the eigenvectors of a matrix is determined by the eigengap. More precisely, the subspace spanned by $\hat{L}$'s first 3 eigenvectors will be stable to small changes to $\hat{L}$ if and only if the eigengap $\delta = |\lambda_3 - \lambda_4|$, the difference between the 3rd and 4th eigenvalues of $\hat{L}$, is large. As discussed previously, the eigenvalues of $\hat{L}$ are the union of the eigenvalues of $\hat{L}^{(11)}$, $\hat{L}^{(22)}$, and $\hat{L}^{(33)}$, and $\lambda_3 = 1$. Letting $\lambda^{(i)}_j$ be the j-th largest eigenvalue of $\hat{L}^{(ii)}$, we therefore see that $\lambda_4 = \max_i \lambda^{(i)}_2$. Hence, the assumption that $|\lambda_3 - \lambda_4|$ be large is exactly the assumption that $\max_i \lambda^{(i)}_2$ be bounded away from 1.

Assumption A1. There exists $\delta > 0$ so that, for all $i = 1, \ldots, k$, $\lambda^{(i)}_2 \le 1 - \delta$.

Note that $\lambda^{(i)}_2$ depends only on $\hat{L}^{(ii)}$, which in turn depends only on $A^{(ii)} = \hat{A}^{(ii)}$, the matrix of intra-cluster similarities for cluster $S_i$. The assumption on $\lambda^{(i)}_2$ has a very natural interpretation in the context of clustering. Informally, it captures the idea that if we want an algorithm to find the clusters $S_1$, $S_2$ and $S_3$, then we require that each of these sets $S_i$ really look like a "tight" cluster. Consider an example in which $S_1 = S_{1.1} \cup S_{1.2}$, where $S_{1.1}$ and $S_{1.2}$ are themselves two well separated clusters. Then $S = S_{1.1} \cup S_{1.2} \cup S_2 \cup S_3$ looks like (at least) four clusters, and it would be unreasonable to expect an algorithm to correctly guess what partition of the four clusters into three subsets we had in mind.
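To make Assumption A1 concrete, the following sketch computes $\lambda^{(i)}_2$ for a single cluster's affinity block and the largest $\delta$ the assumption allows. The function names and the two toy blocks are hypothetical and chosen for illustration: a "tight" cluster of four mutually similar points, and a cluster that is really two pairs joined by weak affinities of 0.01, in the spirit of the $S_{1.1} \cup S_{1.2}$ example.

```python
import numpy as np

def second_eigenvalue(A_block):
    """lambda_2 of D^{-1/2} A^(ii) D^{-1/2} for one cluster's affinity block."""
    d = A_block.sum(axis=1)
    L_block = A_block / np.sqrt(np.outer(d, d))
    eigvals = np.linalg.eigvalsh(L_block)   # ascending order
    return eigvals[-2]

def largest_delta(blocks):
    """Largest delta for which Assumption A1 holds: 1 - max_i lambda_2^(i)."""
    return 1.0 - max(second_eigenvalue(b) for b in blocks)

# A tight cluster: all four points equally similar to each other.
tight = np.ones((4, 4)) - np.eye(4)

# A loose "cluster": two pairs joined only by weak cross affinities.
e = 0.01
loose = np.array([[0, 1, e, e],
                  [1, 0, e, e],
                  [e, e, 0, 1],
                  [e, e, 1, 0]], dtype=float)
```

For the tight block $\lambda_2 = -1/3$, far from 1, while for the loose block $\lambda_2 = (1 - 2e)/(1 + 2e) \approx 0.96$, so including it as a single cluster forces $\delta$ to be small.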

This connection between the eigengap and the cohesiveness of the individual clusters can be formalized in a number of ways.

Assumption A1.1. Define the Cheeger constant [3] of the cluster $S_i$ to be

$$h(S_i) = \min_{I} \frac{\sum_{j \in I,\, k \notin I} A^{(ii)}_{jk}}{\min\left\{ \sum_{j \in I} \hat{d}^{(i)}_j,\; \sum_{k \notin I} \hat{d}^{(i)}_k \right\}}, \tag{5}$$

where the outer minimum is over all index subsets $I \subseteq \{1, \ldots, n_i\}$. Assume that there exists $\delta > 0$ so that $(h(S_i))^2 / 2 \ge \delta$ for all i.

Footnote 2. This condition is satisfied by $\hat{A}^{(ii)}_{jk} > 0$ ($j \neq k$), which is true in our case.


Assumption A1. Recall that $\hat{d}^{(i)}_j = \sum_k A^{(ii)}_{jk}$ characterizes how "well connected" or how "similar" point j is to the other points in the same cluster. The term in the $\min_I \{\cdot\}$ characterizes how well $(I, \bar{I})$ partitions $S_i$ into two subsets, and the minimum over I picks out the best such partition. Specifically, if there is a partition of $S_i$'s points so that the weight of the edges across the partition is small, and so that each of the partitions has moderately large "volume" (sum of $\hat{d}^{(i)}_j$'s), then the Cheeger constant will be small. Thus, the assumption that the Cheeger constants $h(S_i)$ be large is exactly that the clusters $S_i$ be hard to split into two subsets.

We can also relate the eigengap to the mixing time of a random walk (as in [6]) defined on the points of a cluster, in which the chance of transitioning from point i to j is proportional to $A_{ij}$, so that we tend to jump to nearby points. Assumption A1 is equivalent to assuming that, for such a walk defined on the points of any one of the clusters $S_i$, the corresponding transition matrix has second eigenvalue at most $1 - \delta$. The mixing time of a random walk is governed by the second eigenvalue; thus, this assumption is exactly that the walks mix rapidly. Intuitively, this will be true for tight (or at least fairly "well connected") clusters, and untrue if a cluster consists of two well-separated sets of points so that the random walk takes a long time to transition from one half of the cluster to the other. Assumption A1 can also be related to the existence of multiple paths between any two points in the same cluster.
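The equivalence with the random walk rests on the fact that the transition matrix $P = D^{-1} A$ and the matrix $L = D^{-1/2} A D^{-1/2}$ are similar, since $P = D^{-1/2} L D^{1/2}$, and therefore share eigenvalues. A small numerical sketch of this, with a hypothetical function name and an arbitrary random block used purely for illustration:

```python
import numpy as np

def walk_and_L_eigenvalues(A_block):
    """Sorted eigenvalues of P = D^{-1} A and of L = D^{-1/2} A D^{-1/2}."""
    d = A_block.sum(axis=1)
    P = A_block / d[:, None]                      # rows of P sum to 1
    L_block = A_block / np.sqrt(np.outer(d, d))
    # P is not symmetric, so use the general solver; its eigenvalues are
    # real in exact arithmetic because P is similar to the symmetric L.
    ev_P = np.sort(np.linalg.eigvals(P).real)
    ev_L = np.linalg.eigvalsh(L_block)            # ascending order
    return ev_P, ev_L
```

Both spectra have largest eigenvalue 1, and the second largest is the quantity $\lambda^{(i)}_2$ bounded in Assumption A1.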

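Assumption A2. There is some fixed $\epsilon_1 > 0$, so that for every $i_1, i_2 \in \{1, \ldots, k\}$, $i_1 \neq i_2$, we have that

$$\sum_{j \in S_{i_1}} \sum_{k \in S_{i_2}} \frac{A_{jk}^2}{\hat{d}_j \hat{d}_k} \le \epsilon_1. \tag{6}$$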

To gain intuition about this, consider the case of two "dense" clusters $i_1$ and $i_2$ of size $\Omega(n)$ each. Since $\hat{d}_j$ measures how "connected" point j is to other points in the same cluster, it will be $\hat{d}_j = \Omega(n)$ in this case, so the sum, which is over $O(n^2)$ terms, is in turn divided by $\hat{d}_j \hat{d}_k = \Omega(n^2)$. Thus, as long as the individual $A_{jk}$'s are small, the sum will also be small, and the assumption will hold with small $\epsilon_1$.

Whereas $\hat{d}_j$ measures how connected $s_j \in S_i$ is to the rest of $S_i$, $\sum_{k : k \notin S_i} A_{jk}$ measures how connected $s_j$ is to points in other clusters. The next assumption is that all points must be more connected to points in the same cluster than to points in other clusters; specifically, that the ratio between these two quantities be small.

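Assumption A3. For some fixed $\epsilon_2 > 0$, for every $i = 1, \ldots, k$, $j \in S_i$, we have:

$$\frac{\sum_{k : k \notin S_i} A_{jk}}{\hat{d}_j} \le \epsilon_2 \left( \sum_{k, l \in S_i} \frac{A_{kl}^2}{\hat{d}_k \hat{d}_l} \right)^{-1/2} \tag{7}$$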

For intuition about this assumption, again consider the case of densely connected clusters (as we did previously). Here, the quantity in parentheses on the right hand side is O(1), so this becomes equivalent to demanding that the following ratio be small: $(\sum_{k : k \notin S_i} A_{jk}) / \hat{d}_j = (\sum_{k : k \notin S_i} A_{jk}) / (\sum_{k : k \in S_i} A_{jk}) = O(\epsilon_2)$.

Assumption A4. There is some constant $C > 0$ so that for every $i = 1, \ldots, k$, $j = 1, \ldots, n_i$, we have $\hat{d}^{(i)}_j \ge (\sum_{k=1}^{n_i} \hat{d}^{(i)}_k) / (C n_i)$.

This last assumption is a fairly benign one that no points in a cluster be "too much less" connected than other points in the same cluster.
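Given an affinity matrix and a candidate partition, the tightest constants in Assumptions A2, A3 and A4 can be computed directly from inequalities (6), (7) and the A4 condition. The following is a sketch under assumptions rather than part of the paper's analysis: the function name `assumption_constants` is hypothetical, at least two clusters are assumed, and every point is assumed to have positive intra-cluster degree $\hat{d}_j$ (so no singleton clusters).

```python
import numpy as np

def assumption_constants(A, labels):
    """Smallest eps1, eps2 and C satisfying (6), (7) and A4 for a partition."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    d_hat = (A * same).sum(axis=1)          # intra-cluster degrees d-hat_j
    W = A ** 2 / np.outer(d_hat, d_hat)     # A_jk^2 / (d-hat_j d-hat_k)
    clusters = np.unique(labels)
    # A2: largest cross-cluster sum over ordered pairs of distinct clusters.
    eps1 = max(W[np.ix_(labels == a, labels == b)].sum()
               for a in clusters for b in clusters if a != b)
    eps2 = 0.0
    C = 0.0
    for c in clusters:
        idx = labels == c
        # A3: (outside affinity / d-hat_j) * sqrt(sum of W inside the cluster).
        inner = np.sqrt(W[np.ix_(idx, idx)].sum())
        out_ratio = (A[idx][:, ~idx].sum(axis=1) / d_hat[idx]).max()
        eps2 = max(eps2, out_ratio * inner)
        # A4: d-hat_j >= mean(d-hat) / C for every j in the cluster.
        C = max(C, d_hat[idx].mean() / d_hat[idx].min())
    return eps1, eps2, C
```

For two pairs of points with intra-pair affinity 1 and cross affinity 0.1, this yields $\epsilon_1 = 0.04$, $\epsilon_2 = 0.2\sqrt{2}$ and $C = 1$.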

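Theorem 2 Let assumptions A1, A2, A3 and A4 hold. Set $\epsilon = \sqrt{k(k-1)\epsilon_1 + k\epsilon_2^2}$. If $\delta > (2 + \sqrt{2})\epsilon$, then there exist k orthogonal vectors $r_1, \ldots, r_k$ ($r_i^T r_j = 1$ if i = j, 0 otherwise) so that Y's rows satisfy

$$\frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left\| y^{(i)}_j - r_i \right\|^2 \le 4C \left(4 + 2\sqrt{k}\right)^2 \frac{\epsilon^2}{(\delta - \sqrt{2}\,\epsilon)^2}. \tag{8}$$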

Thus, the rows of Y will form tight clusters around k well-separated points (at $90^\circ$ from each other) on the surface of the k-sphere according to their "true" cluster $S_i$.
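To get a feel for the size of the bound in Theorem 2, the right-hand side of (8) can be evaluated for given constants. This is a plain transcription of the formula into a hypothetical helper, `theorem2_bound`, which returns `None` when the precondition $\delta > (2 + \sqrt{2})\epsilon$ fails and the theorem therefore says nothing.

```python
from math import sqrt

def theorem2_bound(k, eps1, eps2, delta, C):
    """Right-hand side of (8), or None if the theorem's precondition fails."""
    eps = sqrt(k * (k - 1) * eps1 + k * eps2 ** 2)
    if delta <= (2 + sqrt(2)) * eps:
        return None
    return 4 * C * (4 + 2 * sqrt(k)) ** 2 * eps ** 2 / (delta - sqrt(2) * eps) ** 2
```

Because the bound scales with $\epsilon^2 / \delta^2$, it is informative only when the cross-cluster terms $\epsilon_1$, $\epsilon_2$ are small relative to the eigengap.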

4 Experiments

To test our algorithm, we applied it to seven clustering problems. Note that whereas $\sigma^2$ was previously described as a human-specified parameter, the analysis also suggests a particularly simple way of choosing it automatically: For the right $\sigma^2$, Theorem 2 predicts that the rows of Y will form k "tight" clusters on the surface of the k-sphere. Thus, we simply search over $\sigma^2$, and pick the value that, after clustering Y's rows, gives the tightest (smallest distortion) clusters. K-means in Step 5 of the algorithm was also inexpensively initialized using the prior knowledge that the clusters are about $90^\circ$ apart (see footnote 3). The results of our algorithm are shown in Figure 1a-g. Giving the algorithm only the coordinates of the points and k, the different clusters found are shown in the Figure via the different symbols (and colors, where available). The results are surprisingly good: Even for clusters that do not form convex regions or that are not cleanly separated (such as in Figure 1g), the algorithm reliably finds clusterings consistent with what a human would have chosen.
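The K-means initialization mentioned above, which exploits the roughly $90^\circ$ separation between clusters, might be sketched as follows. This is one possible reading of the paper's brief description, not the authors' code: the function name `orthogonal_init` is hypothetical, and "closest to being $90^\circ$ from the worst-case centroid" is interpreted here as minimizing the largest absolute cosine to any centroid already picked.

```python
import numpy as np

def orthogonal_init(Y, k, rng=None):
    """Pick k rows of Y as initial centroids, each as close as possible to
    90 degrees from the centroids chosen so far (rows of Y have unit length)."""
    rng = np.random.default_rng() if rng is None else rng
    chosen = [int(rng.integers(len(Y)))]     # first centroid: a random row
    for _ in range(1, k):
        # |cosine| between every row and every centroid picked so far.
        cos = np.abs(Y @ Y[chosen].T)
        # Worst case over the picked centroids; the row with the smallest
        # worst-case |cosine| is the one closest to 90 degrees from all of them.
        worst = cos.max(axis=1)
        chosen.append(int(np.argmin(worst)))
    return Y[chosen]
```

In the ideal case of Proposition 1, where rows in different clusters are exactly orthogonal, this picks one row from each cluster regardless of the random first choice.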

We note that there are other, related algorithms that can give good results on a subset of these problems, but we are aware of no equally simple algorithm that can give results comparable to these. For example, we noted earlier how K-means easily fails when clusters do not correspond to convex regions (Figure 1i). Another alternative may be a simple "connected components" algorithm that, for a threshold $\tau$, draws an edge between points $s_i$ and $s_j$ whenever $\|s_i - s_j\|_2 \le \tau$, and takes the resulting connected components to be the clusters. Here, $\tau$ is a parameter that can (say) be optimized to obtain the desired number of clusters k. The result of this algorithm on the threecircles-joined dataset with k = 3 is shown in Figure 1j. One of the "clusters" it found consists of a singleton point at (1.5, 2). It is clear that this method is very non-robust.
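For comparison, the "connected components" baseline described above takes only a few lines. The sketch below uses a union-find structure, which is an implementation choice made here; the function name and the toy points are hypothetical. The example also shows the fragility noted in the text: a single stray point farther than the threshold from everything else becomes its own cluster.

```python
from math import dist

def connected_components_clustering(points, tau):
    """Label points by connected component of the graph that links
    s_i and s_j whenever their Euclidean distance is at most tau."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= tau:
                parent[find(i)] = find(j)    # merge the two components

    roots = {}
    # Number components in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

points = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.4, 5), (10, 10)]
labels = connected_components_clustering(points, 0.6)
```

Here the isolated point at (10, 10) forms a singleton cluster, and a slightly smaller threshold would split the first group as well.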

We also compare our method to the algorithm of Meila and Shi [6] (see Figure 1k). Their method is similar to ours, except for the seemingly cosmetic difference that they normalize A's rows to sum to 1 and use its eigenvectors instead of L's, and do not renormalize the rows of X to unit length. A refinement of our analysis suggests that this method might be susceptible to bad clusterings when the degree to which different clusters are connected ($\sum_j \hat{d}^{(i)}_j$) varies substantially across clusters.

Footnote 3. Briefly, we let the first cluster centroid be a randomly chosen row of Y, and then repeatedly choose as the next centroid the row of Y that is closest to being $90^\circ$ from all the centroids (formally, from the worst-case centroid) already picked. The resulting K-means was run only once (no restarts) to give the results presented. K-means with the more conventional random initialization and a small number of restarts also gave identical results.


[Figure 1 panels: (a) nips, 8 clusters; (b) lineandballs, 3 clusters; (c) fourclouds, 2 clusters; (d) squiggles, 4 clusters; (e) twocircles, 2 clusters; (f) threecircles-joined, 2 clusters; (g) threecircles-joined, 3 clusters; (h) Rows of Y (jittered, randomly subsampled) for twocircles; (i) two circles, 2 clusters (K-means); (j) threecircles-joined, 3 clusters (connected components); (k) lineandballs, 3 clusters (Meila and Shi algorithm); (l) nips, 8 clusters (Kannan et al. algorithm).]

Figure 1: Clustering examples, with clusters indicated by different symbols (and colors, where available). (a-g) Results from our algorithm, where the only parameter varied across runs was k. (h) Rows of Y (jittered, subsampled) for twocircles dataset. (i) K-means. (j) A "connected components" algorithm. (k) Meila and Shi algorithm. (l) Kannan et al. algorithm.


There are some intriguing similarities between spectral clustering methods and Kernel PCA, which has been empirically observed to perform clustering [7, 2]. The main difference between the first steps of our algorithm and Kernel PCA with a Gaussian kernel is the normalization of A (to form L) and X. These normalizations do improve the performance of the algorithm, but it is also straightforward to extend our analysis to prove conditions under which Kernel PCA will indeed give clustering.

While different in detail, Kannan et al. [4] give an analysis of spectral clustering that also makes use of matrix perturbation theory, for the case of an affinity matrix with row sums equal to one. They also present a clustering algorithm based on k singular vectors, one that differs from ours in that it identifies clusters with individual singular vectors. In our experiments, that algorithm very frequently gave poor results (e.g., Figure 1l).

Acknowledgments

We thank Marina Meila for helpful conversations about this work. We also thank Alice Zheng for helpful comments. A. Ng is supported by a Microsoft Research fellowship. This work was also supported by a grant from Intel Corporation, NSF grant IIS-9988642, and ONR MURI N00014-00-1-0637.

References

[1] C. Alpert, A. Kahng, and S. Yao. Spectral partitioning: The more eigenvectors, the better. Discrete Applied Math, 90:3-26, 1999.

[2] N. Cristianini, J. Shawe-Taylor, and J. Kandola. Spectral kernel methods for clustering. In Neural Information Processing Systems 14, 2002.

[3] F. Chung. Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.

[4] R. Kannan, S. Vempala, and A. Vetta. On clusterings - good, bad and spectral. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000.

[5] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for image segmentation. In Perceptual Organization for Artificial Vision Systems. Kluwer, 2000.

[6] M. Meila and J. Shi. Learning segmentation by random walks. In Neural Information Processing Systems 13, 2001.

[7] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998.

[8] G. Scott and H. Longuet-Higgins. Feature grouping by relocalisation of eigenvectors of the proximity matrix. In Proc. British Machine Vision Conference, 1990.

[9] D. Spielman and S. Teng. Spectral partitioning works: Planar graphs and finite element meshes. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science, 1996.

[10] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, 1990.

[11] Y. Weiss. Segmentation using eigenvectors: A unifying view. In International Conference on Computer Vision, 1999.