Analysis and an algorithm
Andrew Y.Ng
CS Division
U.C.Berkeley
ang s.berkeley.edu
Mi hael I. Jordan
CSDiv. &Dept. of Stat.
U.C. Berkeley
jordan s.berkeley.edu
Yair Weiss
S hoolofCS &Engr.
TheHebrewUniv.
yweiss s.huji.a .il
Abstra t
Despitemanyempiri alsu essesof spe tral lustering methods|
algorithms that luster points using eigenve tors of matri es de-
rived from the data|there are several unresolved issues. First,
there are a wide variety of algorithms that use the eigenve tors
in slightlydierent ways. Se ond,manyof these algorithms have
noproofthat theywill a tually omputea reasonable lustering.
In this paper, we present a simple spe tral lustering algorithm
that anbeimplementedusing afew linesofMatlab. Usingtools
from matrix perturbation theory, we analyze the algorithm, and
give onditions under whi h it an be expe ted to do well. We
also show surprisingly good experimental results on anumber of
hallenging lustering problems.
1 Introdu tion
The task of nding good lusters has been the fo us of onsiderable resear h in
ma hinelearningandpatternre ognition. For lusteringpointsinR n
|amainap-
pli ationfo us ofthis paper|one standardapproa his basedongenerativemod-
els, in whi h algorithms su h asEM are used to learna mixture density. These
approa hessuerfrom severaldrawba ks. First,touseparametri densityestima-
tors,harshsimplifyingassumptionsusuallyneedtobemade(e.g.,thatthedensity
ofea h lusterisGaussian). Se ond,theloglikelihood anhavemanylo alminima
and thereforemultiple restartsare requiredtond agood solutionusing iterative
algorithms. Algorithmssu hasK-meanshavesimilar problems.
A promising alternative that hasre ently emergedin anumberof elds is to use
spe tral methods for lustering. Here, one uses the top eigenve tors of a matrix
derivedfrom thedistan ebetweenpoints. Su h algorithmshavebeensu essfully
used in manyappli ations in luding omputervision andVLSI design [5,1℄. But
despite their empiri al su esses, dierent authors still disagree on exa tlywhi h
eigenve torsto use and how to derive lusters from them (see [11℄ for a review).
Also,theanalysisofthesealgorithms,whi hwebrie yreviewbelow,hastendedto
fo usonsimpliedalgorithmsthat onlyuseoneeigenve torat atime.
theeigenve torisseenasasolvingarelaxationofanNP-harddis retegraphparti-
tioningproblem[3℄,andit an beshownthat utsbasedonthese ondeigenve tor
give a guaranteed approximation to the optimal ut [9, 3℄. This analysis an be
extendedto lustering bybuildingaweightedgraphinwhi hnodes orrespondto
datapointsandedgesarerelatedtothedistan ebetweenthepoints. Sin ethema-
jorityofanalysesinspe tralgraphpartitioningappeartodealwithpartitioningthe
graphinto exa tlytwoparts,these methodsare thentypi allyapplied re ursively
to nd k lusters (e.g. [9℄). Experimentallyit hasbeenobserved that using more
eigenve torsanddire tly omputingak waypartitioningisbetter(e.g.[5,1℄).
Here, we build upon the re ent work of Weiss [11℄ and Meila and Shi [6℄, who
analyzedalgorithmsthat usekeigenve torssimultaneouslyin simplesettings. We
propose a parti ular manner to use the k eigenve tors simultaneously, and give
onditionsunder whi h thealgorithm anbeexpe ted todowell.
2 Algorithm
GivenasetofpointsS =fs
1
;:::;s
n gin R
l
thatwewantto lusterintoksubsets:
1. Formthe aÆnity matrixA 2 R nn
dened by A
ij
= exp( jjs
i s
j jj
2
=2
2
) if
i6=j,andA
ii
=0.
2. Dene D to be the diagonal matrixwhose (i;i)-element is the sumof A's i-th
row,and onstru tthematrixL=D 1=2
AD 1=2
. 1
3. Find x
1
;x
2
;:::;x
k
, the k largest eigenve tors of L ( hosen to be orthogonal
to ea h other in the ase of repeated eigenvalues), and form the matrixX =
[x
1 x
2 :::x
k
℄2R nk
bysta kingtheeigenve torsin olumns.
4. FormthematrixY fromXbyrenormalizingea hofX'srowstohaveunitlength
(i.e. Y
ij
=X
ij
=(
P
j X
2
ij )
1=2
).
5. Treatingea hrowofY asapointinR k
, lusterthemintok lustersviaK-means
oranyotheralgorithm(thatattemptstominimizedistortion).
6. Finally,assigntheoriginalpoints
i
to lusterjifandonlyifrowiofthematrix
Y wasassignedto lusterj.
Here, thes aling parameter 2
ontrols how rapidly theaÆnityA
ij
falls o with
the distan e betweens
i and s
j
, and wewill laterdes ribe amethod for hoosing
it automati ally. We also note that this is only one of a large family of possible
algorithms,andlaterdis usssomerelatedmethods(e.g.,[6℄).
At rst sight, this algorithm seems to make little sense. Sin e we run K-means
in step 5, whynot just apply K-meansdire tly to thedata? Figure 1e showsan
example. Thenatural lustersin R 2
do not orrespondto onvexregions, andK-
meansrundire tlyndstheunsatisfa tory lusteringinFigure1i. Buton ewemap
the pointsto R k
(Y's rows), theyform tight lusters (Figure 1h) from whi h our
method obtainsthegood lusteringshownin Figure1e. Wenote thatthe lusters
inFigure 1hlieat90 Æ
to ea hotherrelativeto theorigin( f.[8℄).
1
Readersfamiliarwithspe tralgraphtheory[3 ℄maybemorefamiliarwiththeLapla-
ianI L. Butasrepla ingLwithI Lwould ompli ateourlaterdis ussion,andonly
hangestheeigenvalues(fromito1 i)andnottheeigenve tors,weinsteaduseL.
3.1 Informaldis ussion: The \ideal" ase
Tounderstandthealgorithm,itisinstru tiveto onsideritsbehaviorinthe\ideal"
aseinwhi h allpointsindierent lustersareinnitelyfarapart. Forthesakeof
dis ussion, suppose that k=3,and that thethree lusters of sizes n
1 ,n
2 andn
3
areS
1 ,S
2 ,andS
3 (S=S
1 [S
2 [S
3 ,n=n
1 +n
2 +n
3
). Tosimplifyourexposition,
also assume that the points in S = fs
1
;:::;s
n
g are ordered a ordingto whi h
luster theyare in,sothat therstn
1
pointsare in luster S
1
,thenextn
2 in S
2 ,
et . We will also use \j 2 S
i
" as a shorthand for s
j 2 S
i
. Moving the lusters
\innitely"farapart orrespondsto zeroingall theelementsA
ij
orrespondingto
pointss
i ands
j
indierent lusters. Morepre isely,dene
^
A
ij
=0ifx
i andx
j are
in dierent lusters, and
^
A
ij
=A
ij
otherwise. Alsolet
^
L,
^
D,
^
X and
^
Y bedened
asin thepreviousalgorithm,butstartingwith
^
AinsteadofA. Notethat
^
Aand
^
L
arethereforeblo k-diagonal:
^
A= 2
4 A
(11)
0 0
0 A
(22)
0
0 0 A
(33) 3
5
;
^
L= 2
4
^
L (11)
0 0
0
^
L (22)
0
0 0
^
L (33)
3
5
(1)
wherewehaveadoptedthe onventionofusingparenthesizedsupers riptstoindex
into subblo ks of ve tors/matri es, and
^
L (ii)
= (
^
D (ii)
) 1=2
A (ii)
(
^
D (ii)
) 1=2
. Here,
^
A (ii)
=A (ii)
2R n
i
n
i
isthematrixof\intra- luster"aÆnitiesfor lusteri. Forfu-
tureuse,alsodene
^
d (i)
2R ni
tobetheve tor ontaining
^
D (ii)
'sdiagonalelements,
and
^
d2R n
to ontain
^
D'sdiagonalelements.
To onstru t
^
X,wend
^
L'srst k=3eigenve tors. Sin e
^
Lisblo kdiagonal,its
eigenvaluesandeigenve torsaretheunionoftheeigenvaluesandeigenve torsofits
blo ks(the latterpadded appropriately withzeros). It isstraightforwardto show
that
^
L (ii)
has a stri tly positive prin ipal eigenve tor x (i)
1 2 R
ni
with eigenvalue
1. Also, sin eA (ii)
jk
> 0(j 6=k),the next eigenvalue is stri tly lessthan 1. (See,
e.g.,[3℄). Thus,sta king
^
L'seigenve torsin olumnsto obtain
^
X,wehave:
^
X = 2
6
4 x
(1)
1
~
0
~
0
~
0 x
(2)
1
~
0
~
0
~
0 x
(3)
1 3
7
5 2R
n3
: (2)
A tually, asubtletyneeds to be addressedhere. Sin e 1is a repeated eigenvalue
in
^
L, we ouldjust aseasily havepi ked anyother 3orthogonalve torsspanning
thesamesubspa eas
^
X's olumns,anddenedthemtobeourrst3eigenve tors.
That is,
^
X ouldhavebeenrepla ed by
^
XR for anyorthogonal matrixR2R 33
(R T
R=R R T
=I). Notethatthisimmediatelysuggeststhatoneuse onsiderable
aution in attempting to interpret the individual eigenve torsof L, asthe hoi e
of
^
X's olumns is arbitrary upto arotation, and an easily hange due to small
perturbations to A oreven dieren es in the implementation of the eigensolvers.
Instead, what we an reasonably hope to guarantee about the algorithm will be
arrivedat notby onsidering the(unstable) individual olumns of
^
X,but instead
thesubspa e spannedbythe olumnsof
^
X,whi h anbe onsiderablymorestable.
Next,whenwerenormalizeea hof
^
X'srowstohaveunitlength,weobtain:
^
Y = 2
4
^
Y (1)
^
Y (2)
^
Y (3)
3
5
= 2
4
~
1
~
0
~
0
~
0
~
1
~
0
~
0
~
0
~
1 3
5
R (3)
where wehave used
^
Y (i)
2 R n
i
k
to denote the i-th subblo k of
^
Y. Letting y^ (i)
denotethej-throwof
^
Y (i)
,wethereforeseethaty^
j
isthei-throwoftheorthogonal
matrixR . Thisgivesusthefollowingproposition.
Proposition 1 Let
^
A's o-diagonal blo ks
^
A (ij)
, i 6= j, be zero. Also assume
that ea h luster S
i
is onne ted.
2
Then thereexistkorthogonal ve torsr
1
;:::;r
k
(r T
i r
j
=1if i=j,0otherwise)sothat
^
Y'srowssatisfy
^ y (i)
j
=r
i
(4)
for alli=1;:::;k; j =1;:::;n
i .
Inother words, there are k mutuallyorthogonal pointson thesurfa eof theunit
k-sphere aroundwhi h
^
Y'srowswill luster. Moreover,these lusters orrespond
exa tlyto thetrue lusteringoftheoriginaldata.
3.2 The general ase
Inthegeneral ase,A'so-diagonalblo ksarenon-zero,butwestillhopetore over
guaranteessimilarto Proposition 1. ViewingE =A
^
A asaperturbation tothe
\ideal"
^
AthatresultsinA=
^
A+E,weask: When anweexpe ttheresultingrows
ofY to luster similarlyto therowsof
^
Y? Spe i ally, whenwilltheeigenve tors
ofL,whi hwenowviewasaperturbedversionof
^
L,be\ lose"tothoseof
^
L?
Matrixperturbationtheory[10℄indi atesthatthestabilityoftheeigenve torsofa
matrixisdeterminedbytheeigengap. Morepre isely,thesubspa espannedby
^
L's
rst3eigenve torswillbestableto small hanges to
^
Lifand onlyiftheeigengap
Æ=j
3
4
j, thedieren ebetweenthe3rdand4theigenvaluesof
^
L,is large. As
dis ussed previously, the eigenvaluesof
^
L is theunion ofthe eigenvaluesof
^
L (11)
,
^
L (22)
,and
^
L (33)
,and
3
=1. Letting (i)
j
bethej-thlargesteigenvalueof
^
L (ii)
,we
thereforeseethat
4
=max
i
(i)
2
. Hen e,theassumptionthatj
3
4
jbelargeis
exa tlytheassumptionthat max
i
(i)
2
beboundedawayfrom1.
AssumptionA1. There existsÆ>0so that,foralli=1;:::;k, (i)
2
1 Æ.
Note that (i)
2
depends onlyon
^
L (ii)
,whi hin turn depends onlyonA (ii)
=
^
A (ii)
,
thematrixofintra- lustersimilaritiesfor luster S
i
. Theassumptionon (i)
2 hasa
verynaturalinterpretation inthe ontextof lustering. Informally,it apturesthe
ideathatifwewantanalgorithmtondthe lustersS
1
;S
2 andS
3
,thenwerequire
that ea h of these sets S
i
really look likea \tight" luster. Consider an example
in whi h S
1
= S
1:1 [S
1:2
, where S
1:1 and S
1:2
are themselvestwowell separated
lusters. Then S =S
1:1 [S
1:2 [S
2 [S
3
lookslike (at least)four lusters, and it
would beunreasonabletoexpe t analgorithm to orre tlyguesswhat partitionof
thefour lustersintothree subsetswehadin mind.
This onne tionbetweentheeigengapandthe ohesivenessoftheindividual lusters
anbeformalizedin anumberofways.
AssumptionA1.1. DenetheCheeger onstant [3℄ofthe lusterS
i to be
h(S
i
)=min
I P
j2I;k 62I A
(ii)
jk
minf P
j2I
^
d (i)
j
; P
k 62I
^
d (i)
k g
: (5)
where theouter minimumis overallindex subsetsI f1;:::;n
i
g. Assume that
thereexists Æ>0sothat (h(S
i ))
2
=2Æ foralli.
2
This onditionissatisedby
^
A (ii)
>0(j6=k),whi histrueinour ase.
Assumption A1. Re all that
^
d (i)
j
= P
k A
(ii)
jk
hara terizes how \well onne ted"
orhow\similar" point j is to the other points in the same luster. The term in
the min
I
fg hara terizes how well(I;I) partitions S
i
into two subsets, and the
minimumoverIpi ksoutthebestsu hpartition. Spe i ally,ifthereisapartition
of S
i
'spointssothat theweightof theedgesa rossthe partitionis small, and so
thatea hofthepartitions hasmoderatelylarge\volume"(sumof
^
d (i)
j
's),thenthe
Cheeger onstantwill besmall. Thus,the assumptionthat theCheeger onstants
h(S
i
)belargeisexa tlythatthe lustersS
i
behardto splitintotwosubsets.
We an also relatethe eigengap to the mixing time of arandom walk(as in [6℄)
denedonthepointsofa luster,inwhi hthe han eoftransitioningfrompointi
toj isproportionalto A
ij
, sothatwetendtojump to nearby-points. Assumption
A1 is equivalent to assuming that, for su h a walk dened on the points of any
oneofthe lustersS
i
,the orrespondingtransitionmatrixhasse ondeigenvalueat
most1 Æ. Themixingtimeofarandomwalkisgovernedbythese ondeigenvalue;
thus,thisassumptionisexa tlythatthewalksmixrapidly. Intuitively,thiswillbe
truefor tight(or at leastfairly \well onne ted") lusters,and untrueif a luster
onsistsof twowell-separatedsetsof pointssothat therandom walktakesalong
timetotransitionfromonehalfofthe lustertotheother. AssumptionA1 analso
berelated to theexisten e of multiple pathsbetweenany twopoints in the same
luster.
AssumptionA2. Thereissomexed
1
>0,sothatforeveryi
1
;i
2
2f1;:::;kg,
i
1 6=i
2
,wehavethat
P
j2Si
1 P
k 2Si
2 A
2
jk
^
d
j
^
d
k
1
: (6)
Togainintuitionaboutthis, onsider the aseoftwo\dense" lusters i
1 andi
2 of
size (n) ea h. Sin e
^
d
j
measures how \ onne ted" point j is to other points in
thesame luster,itwill be
^
d
j
=(n)in this ase,sothesum,whi hisoverO(n 2
)
terms, is in turn divided by
^
d
j
^
d
k
=(n 2
). Thus, aslongasthe individual A
jk 's
aresmall,thesumwillalsobesmall,andtheassumptionwillhold withsmall
1 .
Whereas
^
d
j
measures how onne ted s
j 2 S
i
is to the rest of S
i ,
P
k :k 62S
i A
jk
measures how onne teds
j
is to pointsin other lusters. Thenextassumption is
thatallpointsmustbemore onne tedtopointsinthesame lusterthantopoints
inother lusters;spe i ally,that theratiobetweenthesetwoquantitiesbesmall.
AssumptionA3. Forsomexed
2
>0,foreveryi=1;:::;k,j2S
i
,wehave:
P
k :k 62S
i A
jk
^
dj
2
P
k ;l2Si A
2
k l
^
dk
^
dl
1=2
(7)
Forintuition aboutthis assumption, again onsider the aseof densely onne ted
lusters(aswedidpreviously). Here,thequantityinparenthesesontherighthand
side isO(1), so thisbe omes equivalent todemanding that thefollowingratio be
small: ( P
k :k 62Si A
jk )=
^
d
j
=( P
k :k 62Si A
jk )=(
P
k :k 2Si A
jk
)=O(
2 ).
Assumption A4. There is some onstant C >0 sothat for everyi =1;:::;k,
j=1;:::;n
i
,wehave
^
d (i)
j
( P
ni
k =1
^
d (i)
k )=(Cn
i ).
Thislastassumptionisafairlybenignonethatnopointsina lusterbe\toomu h
less" onne tedthanotherpointsinthesame luster.
Theorem2 LetassumptionsA1,A2,A3andA4hold. Set= p
k(k 1)
1 +k
2
.
IfÆ>(2+ 2),thenthereexistk orthogonalve torsr
1
;::: ;r
k (r
T
i r
j
=1ifi=j,
0otherwise)sothat Y'srowssatisfy
1
n k
X
i=1 n
i
X
j=1 jjy
(i)
j r
i jj
2
2
4C
4+2 p
k
2
2
(Æ p
2) 2
: (8)
Thus,therowsofY willform tight lustersaroundkwell-separatedpoints(at90 Æ
fromea hother)onthesurfa eofthek-spherea ordingtotheir\true" lusterS
i .
4 Experiments
Totestouralgorithm,weappliedittoseven lusteringproblems. Notethatwhereas
2
waspreviouslydes ribedasahuman-spe iedparameter,theanalysisalsosug-
gests a parti ularly simple way of hoosing it automati ally: For the right 2
,
Theorem 2predi ts thatthe rowsofY willform k \tight" lustersonthe surfa e
of the k-sphere. Thus, we simply sear h over 2
, and pi k the value that, after
lustering Y's rows, gives the tightest (smallest distortion) lusters. K-means in
Step5ofthealgorithmwasalsoinexpensivelyinitializedusingthepriorknowledge
that the lusters areabout90 Æ
apart.
3
Theresultsof ouralgorithmare shown in
Figure 1a-g. Giving the algorithm only the oordinates of the points and k, the
dierent lustersfoundareshownin theFigureviathedierentsymbols(and ol-
ors,whereavailable). Theresultsaresurprisingly good: Even for lustersthat do
notform onvexregions orthat are not leanly separated (su h asin Figure 1g),
thealgorithm reliablynds lusterings onsistentwith what ahumanwouldhave
hosen.
We note that there are other, related algorithms that an givegood results ona
subset of these problems, but we are aware of no equally simple algorithm that
angiveresults omparabletothese. Forexample,wenotedearlierhowK-means
easilyfailswhen lustersdonot orrespondto onvexregions(Figure1i). Another
alternativemaybeasimple\ onne ted omponents"algorithmthat,forathreshold
,drawsanedgebetweenpointss
i ands
j
wheneverjjs
i s
j jj
2
,andtakesthe
resulting onne ted omponentstobethe lusters. Here, isaparameterthat an
(say)be optimizedto obtain thedesired numberof lusters k. Theresult of this
algorithmonthethree ir les-joineddatasetwithk=3isshowninFigure 1j.
One of the\ lusters" it found onsists of a singleton point at (1:5;2). It is lear
thatthis methodisverynon-robust.
Wealso ompareourmethodtothealgorithmofMeilaandShi[6℄(seeFigure1k).
Theirmethod is similarto ours, ex eptfor theseemingly osmeti dieren e that
theynormalizeA'srowstosumto1anduseitseigenve torsinsteadofL's, anddo
notrenormalizetherowsofX tounit length. Arenementofouranalysissuggests
thatthismethodmightbesus eptibletobad lusteringswhenthedegreetowhi h
dierent lustersare onne ted( P
j
^
d (i)
j
)variessubstantiallya ross lusters.
3
Brie y, we let the rst luster entroid be a randomly hosen row of Y, and then
repeatedly hoose as the next entroid the row of Y that is losest to being 90 Æ
from
all the entroids(formally, from the worst- ase entroid)already pi ked. The resulting
K-meanswasrunonlyon e(norestarts)togivetheresultspresented. K-meanswiththe
more onventionalrandominitializationandasmallnumberofrestartsalsogaveidenti al
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
nips, 8 clusters
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
lineandballs, 3 clusters
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
fourclouds, 2 clusters
(a) (b) ( )
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
squiggles, 4 clusters
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
twocircles, 2 clusters
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
threecircles−joined, 2 clusters
(d) (e) (f)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
threecircles−joined, 3 clusters
−0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
−0.6
−0.4
−0.2 0 0.2 0.4 0.6 0.8 1
Rows of Y (jittered, randomly subsampled) for twocircles
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
two circles, 2 clusters (K−means)
(g) (h) (i)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
threecircles−joined, 3 clusters (connected components)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
lineandballs, 3 clusters (Meila and Shi algorithm)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
nips, 8 clusters (Kannan et al. algorithm)
(j) (k) (l)
Figure1: Clustering examples,with lusters indi atedbydierent symbols(and olors,
whereavailable). (a-g)Resultsfromouralgorithm,wheretheonlyparametervarieda ross
runswask. (h)Rows ofY (jittered, subsampled)for two ir les dataset. (i)K-means.
(j)A\ onne ted omponents"algorithm. (k)MeilaandShialgorithm. (l)Kannanetal.
Therearesomeintriguingsimilaritiesbetweenspe tral lusteringmethodsandKer-
nelPCA,whi hhasbeenempiri allyobservedtoperform lustering[7,2℄. Themain
dieren ebetweentherststepsofouralgorithmandKernelPCAwithaGaussian
kernelis thenormalizationof A(to form L)and X. These normalizationsdoim-
provetheperforman eofthealgorithm,butitisalsostraightforwardtoextendour
analysisto prove onditionsunder whi hKernel PCAwillindeedgive lustering.
While dierent in detail, Kannan et al.[4℄ givean analysis of spe tral lustering
thatalsomakesuseofmatrixperturbationtheory,forthe aseofanaÆnitymatrix
with row sums equal to one. They also present a lustering algorithm based on
k singular ve tors, one that diers from ours in that it identies lusters with
individual singular ve tors. In our experiments, that algorithm very frequently
gavepoor results(e.g.,Figure 1l).
A knowledgments
WethankMarina Meilafor helpful onversationsaboutthis work. Wealsothank
Ali e Zheng for helpful omments. A. Ng is supported by a Mi rosoft Resear h
fellowship. Thisworkwasalso supportedbyagrantfrom IntelCorporation,NSF
grantIIS-9988642,andONRMURI N00014-00-1-0637.
Referen es
[1℄ C.Alpert,A.Kahng,andS.Yao. Spe tralpartitioning: Themoreeigenve tors,the
better. Dis reteAppliedMath,90:3{26,1999.
[2℄ N.Christianini, J.Shawe-Taylor,and J.Kandola. Spe tralkernelmethodsfor lus-
tering. InNeuralInformationPro essingSystems14,2002.
[3℄ F.Chung. Spe tralGraphTheory. Number92inCBMSRegionalConferen eSeries
inMathemati s.Ameri anMathemati alSo iety,1997.
[4℄ R. Kannan, S. Vempala, and A. Vetta. On lusterings|good, bad and spe tral.
InPro eedings ofthe41st Annual SymposiumonFoundationsof Computer S ien e,
2000.
[5℄ J.Malik,S.Belongie,T.Leung,andJ.Shi. Contour andtextureanalysisfor image
segmentation.InPer eptualOrganizationforArti ialVisionSystems.Kluwer,2000.
[6℄ M.MeilaandJ.Shi.Learningsegmentationbyrandomwalks.InNeuralInformation
Pro essingSystems13,2001.
[7℄ B.S holkopf, A.Smola,andK.-RMuller. Nonlinear omponentanalysisasakernel
eigenvalueproblem. NeuralComputation,10:1299{1319, 1998.
[8℄ G.S ottandH.Longuet-Higgins. Featuregroupingbyrelo alisationofeigenve tors
oftheproximitymatrix. InPro .British Ma hineVisionConferen e,1990.
[9℄ D. Spielman and S. Teng. Spe tral partitioning works: Planar graphs and nite
element meshes. InPro eedings of the 37th Annual Symposium on Foundations of
ComputerS ien e,1996.
[10℄ G.W.StewartandJ.-G.Sun. MatrixPerturbationTheory. A ademi Press, 1990.
[11℄ Y.Weiss. Segmentationusingeigenve tors: A unifyingview. InInternationalCon-
feren e onComputer Vision,1999.