Yi Sun, Peter Tino, and Ian Nabney
Neural Computing Research Group, Aston University,
Aston Triangle, Birmingham B4 7ET
United Kingdom
{suny,tinop,nabneyit}@aston.ac.uk
http://www.ncrg.aston.ac.uk/
Abstract. We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and at the same time deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show that our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
1 Introduction
Data visualisation, which plays a key role in developing good models for large quantities of data, is an important aid in dimension reduction, gives information about local deviations in performance and provides a useful check for objective quantitative measures. However, in many applications the input data is incomplete. Therefore it is important to know how to use the available data and how to reconstruct the missing values. For example, in the pharmaceutical field, scientists use computer modelling to examine and analyse the molecular structure of compounds and high throughput screening to assess their interaction with biological targets. Many compounds are not screened against a complete set of targets, yet we do not want to exclude all such compounds from data analysis since that risks missing potential drugs.
The hierarchical generative topographic mapping (GTM) model is an interactive data visualisation technique, which enables the user to construct arbitrarily detailed projection plots. The basic building block is the GTM [1]. The problem considered here is to train the GTM model with incomplete data and reconstruct the missing values. This way the data, including the missing components, can be shown in a visualisation plot that is as "faithful" as possible. For hierarchical GTM, the incomplete data can be displayed at all levels of the hierarchy of visualisation plots.
set by using an EM algorithm. For visualisation purposes, the missing data is filled in by computing the posterior mean. In [2], the GTM was trained only with complete data, and an additional condition was added to reconstruct the missing data. In contrast, our algorithm is more generic.
Since our algorithm is based on Gaussian mixture models (GMM) and the EM algorithm, in section 2 we briefly introduce the EM algorithm for GMMs. The GTM with incomplete data algorithm is detailed in section 3. Section 4 gives a basic introduction to hierarchical GTM. We illustrate the algorithm in section 5 with a toy data set and a high dimensional data set from flow diagnostics of an oil pipeline. Section 6 discusses the results.
2 The EM Algorithm for Gaussian Mixture Models
The EM algorithm is especially relevant since it is a general method for parameter estimation in mixture models that is based on the idea of filling in missing data. This section briefly introduces the algorithm for finding the maximum likelihood parameters of a Gaussian mixture model [3].
We consider a mixture density
$$P(t_n) = \sum_{j=1}^{K} P(t_n \mid j, \theta_j)\, P(j), \tag{1}$$
which is generating the (i.i.d.) data $T = \{t_n\}_{n=1}^{N}$. In this case each component of the mixture is denoted by $j$ and parametrised by $\theta_j$, and $P(j)$ is the prior probability for the mixture component $j$. Then the log likelihood of the parameters given the data set is
$$L(\theta) = \sum_{n=1}^{N} \log \sum_{j=1}^{K} P(t_n \mid j, \theta_j)\, P(j). \tag{2}$$
The binary indicator variables $z_{nj}$ are introduced to specify which component of the mixture generated the data point: $z_{nj} = 1$ if and only if $t_n$ is generated by component $j$, otherwise $z_{nj} = 0$. Then equation (2) can be re-written as the complete data log likelihood function:
$$L_c(\theta) = \sum_{n=1}^{N} \sum_{j=1}^{K} z_{nj} \log \left[ P(t_n \mid z_{nj}; \theta)\, P(z_{nj}; \theta) \right]. \tag{3}$$
Since $z_{nj}$ is not known, the expectation $E[z_{nj} \mid t_n, \theta_j]$ of $z_{nj}$ given the current parameter values $\theta_j$ is computed. This is the probability that the Gaussian $j$ generated the data point $t_n$ and is denoted by $r_{nj}$. This is the E-step of the EM algorithm:
$$r_{nj} = \frac{|\Sigma_j|^{-1/2} \exp\{-\frac{1}{2}(t_n - \mu_j)^T \Sigma_j^{-1} (t_n - \mu_j)\}\, P(j)}{\sum_{k=1}^{K} |\Sigma_k|^{-1/2} \exp\{-\frac{1}{2}(t_n - \mu_k)^T \Sigma_k^{-1} (t_n - \mu_k)\}\, P(k)}. \tag{4}$$
The means $\mu_j$ and covariance matrices $\Sigma_j$ of the $j$th component Gaussian are updated in the M-step using the data set weighted by the $r_{nj}$:
$$\mu_j^{t+1} = \frac{\sum_{n=1}^{N} r_{nj}\, t_n}{\sum_{n=1}^{N} r_{nj}} \tag{5}$$
$$\Sigma_j^{t+1} = \frac{\sum_{n=1}^{N} r_{nj}\, (t_n - \mu_j^{t+1})(t_n - \mu_j^{t+1})^T}{\sum_{n=1}^{N} r_{nj}} \tag{6}$$
The equations above are for full covariance matrices, but there are similar equations for other covariance structures.
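
One EM iteration for a full-covariance Gaussian mixture, as in equations (4)-(6), might be sketched in NumPy as follows. The function names are illustrative only, and the update of the priors $P(j)$ is the standard GMM rule, which is an assumption here since the text does not state it.

```python
import numpy as np

def e_step(T, means, covs, priors):
    """Responsibilities r[n, j] of equation (4) for data T of shape (N, D)."""
    N, K = T.shape[0], means.shape[0]
    log_r = np.empty((N, K))
    for j in range(K):
        diff = T - means[j]
        # Mahalanobis distance via a linear solve rather than an explicit inverse.
        maha = np.einsum("nd,nd->n", diff, np.linalg.solve(covs[j], diff.T).T)
        _, logdet = np.linalg.slogdet(covs[j])
        log_r[:, j] = -0.5 * logdet - 0.5 * maha + np.log(priors[j])
    # Normalise in the log domain for numerical stability.
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def m_step(T, r):
    """Updates of equations (5) and (6), plus an assumed update of the priors."""
    Nj = r.sum(axis=0)                                  # effective counts
    means = (r.T @ T) / Nj[:, None]                     # equation (5)
    K, D = means.shape
    covs = np.empty((K, D, D))
    for j in range(K):
        diff = T - means[j]
        covs[j] = (r[:, j, None] * diff).T @ diff / Nj[j]   # equation (6)
    priors = Nj / T.shape[0]                            # assumption: standard rule
    return means, covs, priors
```

Alternating `e_step` and `m_step` until the log likelihood (2) stops improving gives the usual EM loop. The factor $(2\pi)^{-D/2}$ is omitted because it cancels in the ratio of equation (4).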
3 Generative Topographic Mapping and Incomplete Data
3.1 The Generative Topographic Mapping
The generative topographic mapping (GTM) [1] is a nonlinear latent variable model that uses latent (or hidden) variables to model a probability distribution in the data space. It is a constrained mixture of Gaussians whose parameters are optimised using the expectation-maximisation (EM) algorithm.
For the GTM, $t$ denotes the data in a $D$-dimensional Euclidean space and $x$ denotes the latent variables in an $L$-dimensional latent space. Considering a non-linear transformation from the latent space to the data space using a radial basis function network (see e.g. [4]), the latent data is mapped to data space by a radial basis function $y = W\phi(x)$ with weights $W$ and a basis function matrix $\Phi$. The goal of the latent variable model is to find a representation for the distribution $p(t)$ in terms of a number $K$ of latent points $x_j$ ($j = 1, 2, \ldots, K$) and corresponding Gaussian distributions centred on $y(x_j, W)$ [1]. The data density is defined by
$$P(t \mid W, \beta) = \frac{1}{K} \sum_{j=1}^{K} P(t \mid x_j, W, \beta) \tag{7}$$
and
$$P(t \mid x_j, W, \beta) = \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left\{-\frac{\beta}{2} \| y(x_j, W) - t \|^2\right\} \tag{8}$$
where $W$ and the inverse variance $\beta$ can be fitted by maximum likelihood with the EM algorithm.
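
As an illustration of equations (7) and (8), the following NumPy sketch evaluates the GTM log density for a set of data points. The Gaussian form of the basis functions, their common width and the function names are assumptions made for the example; the text only specifies a radial basis function network.

```python
import numpy as np

def rbf_basis(X, centres, width):
    """Gaussian basis activations phi(x_j) for latent points X; shape (K, M).

    The Gaussian shape and the shared width are assumptions of this sketch.
    """
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def gtm_log_density(T, Y, beta):
    """log P(t_n | W, beta) of equations (7)-(8).

    T is (N, D) data, Y is (K, D) with rows y(x_j, W) = W phi(x_j).
    """
    K, D = Y.shape
    d2 = ((T[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)       # (N, K)
    log_comp = 0.5 * D * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * d2
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    log_sum = m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))
    return log_sum.ravel() - np.log(K)                            # 1/K prior
```

With a weight matrix `W` of shape (D, M), the centres in data space are obtained as `Y = rbf_basis(X, centres, width) @ W.T`, and the negative log likelihood per data point is `-gtm_log_density(T, Y, beta).mean()`.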
The latent space representation of the point $t_n$, i.e. the projection of $t_n$, is taken to be the mean $\sum_{j=1}^{K} r_{nj} x_j$ of the posterior distribution on the latent space.
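
The projection just described can be sketched as follows, assuming the GTM centres `Y` (rows $y(x_j, W)$), the latent grid `X` and the inverse variance `beta` are already available; the function names are hypothetical.

```python
import numpy as np

def gtm_responsibilities(T, Y, beta):
    """r[n, j] for a GTM: equal priors 1/K and isotropic noise with precision beta."""
    d2 = ((T[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    log_r = -0.5 * beta * d2
    log_r -= log_r.max(axis=1, keepdims=True)    # stabilise the exponentials
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def posterior_mean_projection(T, Y, X, beta):
    """Latent coordinates sum_j r[n, j] x_j for every data point t_n."""
    return gtm_responsibilities(T, Y, beta) @ X
```

A data point lying halfway between two neighbouring centres is therefore plotted halfway between the corresponding latent grid points.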
3.2 Incorporating missing values into the EM algorithm for the GTM model
To handle missing values in the data set, we write data points $t_n$ as $(t_n^o, t_n^m)$, where the superscripts $m$ and $o$ denote subvectors and submatrices of the parameters matching the missing and observed components of the data. The EM algorithm treats both the indicator variables $z_{nj}$ and the missing inputs $t_n^m$ as hidden variables. For the GTM, as the covariance matrix is constrained to be isotropic, $\Sigma_j = \beta^{-1} I$, the covariance of missing and observed values $\Sigma_j^{mo}$ is equal to 0. The expected value in the E-step is taken with respect to both sets of hidden variables. If we knew the values of the indicator variables $z_{nj}$, we would write the negative log likelihood function as
$$L(W, \beta) = \sum_{n=1}^{N} \sum_{j=1}^{K} z_{nj} \left\{ \frac{D}{2} \ln(2\pi) - \frac{D}{2} \ln \beta + \frac{\beta}{2} \left[ \| t_n^o - y_j^o \|^2 + \| t_n^m - y_j^m \|^2 \right] \right\} \tag{9}$$
After taking the expectation, the sufficient statistics for the parameters include three unknown terms, $z_{nj}$, $z_{nj} t_n^m$ and $z_{nj} t_n^m (t_n^m)^T$. So we must calculate the expectations for these three terms. Following [5], we introduce:
$$\hat{t}_{nj}^m = E(t_n^m \mid z_{nj} = 1, t_n^o, \theta_j) = (y_j^m)^{old} \tag{10}$$
which is the least-squares regression between $t_n^m$ and $t_n^o$ predicted by Gaussian $j$, and `old' denotes the value computed in the last M-step.
The expectation of $z_{nj}$ is $E[z_{nj} \mid t_n^o, \theta_j] = r_{nj}$ (equation (4)) measured only on the observed dimensions of $t_n$. For the GTM, we calculate:
$$E[z_{nj} t_n^m \mid t_n^o, \theta_j] = E[z_{nj} \mid t_n^o, \theta_j]\, E[t_n^m \mid z_{nj} = 1, t_n^o, \theta_j] = r_{nj}\, \hat{t}_{nj}^m = r_{nj}\, (y_j^m)^{old} \tag{11}$$
In the M-step, the missing values are expressed using the posterior means:
$$E[t_n^m \mid t_n^o, \theta] = \sum_{j=1}^{K} r_{nj}\, E[t_n^m \mid z_{nj} = 1, t_n^o, \theta_j] \tag{12}$$
and the weights are then updated to $W^{new}$ in the usual way for GTM [1]. The variance is updated by:
$$\beta^{-1} = \frac{1}{ND} \sum_{n=1}^{N} \sum_{j=1}^{K} \left\{ r_{nj} \| t_n^o - y_j^o \|^2 + E[z_{nj} \| t_n^m - y_j^m \|^2] \right\} \tag{13}$$
where
$$E[z_{nj} \| t_n^m - y_j^m \|^2] = E[\| t_n^m - y_j^m \|^2 \mid z_{nj} = 1] = (\beta^{-1})^{old} + (\hat{t}_{nj}^m)^T (\hat{t}_{nj}^m) - 2 (\hat{t}_{nj}^m)^T y_j^m + (y_j^m)^T y_j^m \tag{14}$$
and $y_j^m = (W^{new} \phi(x_j))^m$.
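
The E-step on the observed dimensions and the posterior-mean filling of equations (10)-(12) can be sketched as below. Encoding missing entries as NaN, and the function names, are assumptions of this example. The sketch covers only the responsibilities and the imputation, not the updates of $W$ and $\beta$.

```python
import numpy as np

def responsibilities_observed(T, Y, beta):
    """r[n, j] measured only on the observed coordinates of t_n (NaN = missing)."""
    obs = ~np.isnan(T)                                       # (N, D) mask
    # Differences on missing coordinates are replaced by zero so they do not count.
    diff = np.where(obs[:, None, :], T[:, None, :] - Y[None, :, :], 0.0)
    d2 = (diff ** 2).sum(axis=2)                             # (N, K)
    log_r = -0.5 * beta * d2
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def impute_posterior_mean(T, Y, r):
    """Fill missing entries with sum_j r[n, j] y_j^m, following (10) and (12)."""
    expected = r @ Y                                         # (N, D)
    return np.where(np.isnan(T), expected, T)
```

Because the noise model is isotropic, the normalising constant on the observed dimensions is the same for every component $j$ of a given data point and cancels in the responsibilities, which is why it does not appear in the code.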
4 Hierarchical GTM
4.1 An introduction to hierarchical GTM
For a complex data set, a single two-dimensional visualisation plot may not be sufficient since it is difficult to capture all of the interesting aspects in the data set. Therefore a hierarchical visualisation system is desirable.
Given a training data set $T = \{t_1, t_2, \ldots, t_N\}$, the probability assigned to this set by a hierarchy of GTMs organised in a hierarchical tree $\mathcal{T}$ is calculated by considering the hierarchical GTM $\mathcal{T}$ as a mixture of GTMs [6], with mixture components being the leaves $\mathcal{M}$. The parameters of the hierarchy (weights $W$, inverse variance $\beta$ and parent-conditional mixture coefficients) are fitted by maximum likelihood using the EM algorithm. Mixture coefficients for the mixture components $\mathcal{M}$ are calculated recursively by multiplying parent-conditional mixture coefficients down the path from the root to $\mathcal{M}$.
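
The product of parent-conditional mixture coefficients along a path can be illustrated with a small sketch. The dictionary representation of the tree and the node names are hypothetical; the root is given a conditional coefficient of 1.

```python
def mixture_coefficient(node, parent, cond_coef):
    """Multiply parent-conditional coefficients along the path root -> node.

    parent[node] is the parent of `node` (None for the root) and
    cond_coef[node] is its parent-conditional mixture coefficient.
    """
    coef = 1.0
    while node is not None:
        coef *= cond_coef[node]
        node = parent[node]
    return coef

# Hypothetical three-level hierarchy: the root has children A and B, A has A1 and A2.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
cond_coef = {"root": 1.0, "A": 0.6, "B": 0.4, "A1": 0.25, "A2": 0.75}
leaf_coefs = {m: mixture_coefficient(m, parent, cond_coef) for m in ("A1", "A2", "B")}
```

If the conditional coefficients of the children of every node sum to one, the coefficients of the leaves also sum to one, so the leaves form a valid mixture.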
Given a data point $t_n$ and a submodel $\mathcal{M}$ in the hierarchy $\mathcal{T}$, we have three types of hidden variables: 1) responsibility of Parent($\mathcal{M}$), the parent of $\mathcal{M}$, for generating $t_n$; 2) parent-conditional responsibility for $t_n$, given that Parent($\mathcal{M}$) generated $t_n$; and 3) responsibility of latent space centres $x_j$ of $\mathcal{M}$ for generating $t_n$.
To avoid numerical problems arising from multiplication of small probabilities and to speed up the training process, the GTMs on deeper levels are trained only on data points for which the parent model has responsibility greater than some pre-set threshold. In our experiments the threshold was set to $10^{-3}$.
4.2 Parameter initialisation
Having trained GTMs down to level $\ell$ of the hierarchical tree $\mathcal{T}$, we choose a parent model $\mathcal{N}$ at level $\ell$ and, based on its visualisation plot, we select "regions of interest" for child GTMs $\mathcal{M}$ at level $\ell + 1$. More precisely, the visualisation plot of the parent GTM $\mathcal{N}$ shows low-dimensional representations in the latent space of data space points from the training set.
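The regions of interest are selected as follows: The user first selects points $c_i$, $i = 1, 2, \ldots, A$, in the latent space that correspond to "centres" of the subregions the user is interested in. The points $c_i$ are then transformed via the map $y_{\mathcal{N}}$ defined by the parent GTM $\mathcal{N}$ to the data space
$$y_{\mathcal{N}}(c_i) = W_{\mathcal{N}}\, \phi_{\mathcal{N}}(c_i) \tag{15}$$
The regions of interest are given by the Voronoi compartments [7] in the data space corresponding to the points $y_{\mathcal{N}}(c_i)$, $i = 1, 2, \ldots, A$:
$$V_i = \left\{ t \in \mathbb{R}^D \mid d(t, y_{\mathcal{N}}(c_i)) = \min_j d(t, y_{\mathcal{N}}(c_j)) \right\}, \tag{16}$$
where $d(\cdot, \cdot)$ is the Euclidean distance in the data space $\mathbb{R}^D$. All points in $V_i$ are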
We initialise the parameters $W_{\mathcal{M}}$ of child GTMs $\mathcal{M}$, so that each GTM initially approximates principal component analysis (PCA) of the corresponding Voronoi compartment. For GTM $\mathcal{M}$ corresponding to a compartment $V_i$, we first evaluate the covariance matrix of training points in $V_i$ and obtain the first $L$ principal eigenvectors. Next, we determine $W_{\mathcal{M}}$ by minimising the error
$$E = \frac{1}{2} \sum_{j=1}^{K_{\mathcal{M}}} \| W_{\mathcal{M}}\, \phi_{\mathcal{M}}(x_j^{\mathcal{M}}) - U x_j^{\mathcal{M}} \|^2, \tag{17}$$
where the columns of $U$ are the first $L$ principal eigenvectors of the data covariance matrix (see [1]).
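
The region selection of equation (16) and the PCA-based initialisation of equation (17) can be sketched in NumPy as follows. The function names are hypothetical, ties between compartments are broken by the lowest index, and the error (17) is minimised exactly as printed, by linear least squares, without any additional mean offset.

```python
import numpy as np

def voronoi_assign(T, centres_data):
    """Index i of the Voronoi compartment V_i containing each row of T (eq. (16)).

    centres_data holds the images y_N(c_i) of the selected latent points, (A, D).
    """
    d2 = ((T[:, None, :] - centres_data[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def pca_init_weights(T_region, X, Phi):
    """Weights W_M minimising equation (17).

    T_region: (n, D) training points in one compartment,
    X: (K, L) latent points of the child GTM, Phi: (K, M) basis activations.
    """
    L = X.shape[1]
    cov = np.cov(T_region, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
    U = eigvec[:, ::-1][:, :L]                     # first L principal eigenvectors
    target = X @ U.T                               # rows are U x_j, shape (K, D)
    W_T, *_ = np.linalg.lstsq(Phi, target, rcond=None)   # Phi W^T ~ target
    return W_T.T                                   # shape (D, M)
```

In use, `voronoi_assign` would be applied to the training set to split it among the child GTMs, and `pca_init_weights` would then be called once per compartment.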
Following [1], the noise variance $\beta_{\mathcal{M}}^{-1}$ is initialised to be the larger of the $(L+1)$th eigenvalue from PCA, which represents the variance of the data away from the PCA plane, or the square of half of the grid spacing of the PCA-projected latent data points in data space.
5 Experiment
In our experiments, GTM models were trained in two ways: (A1) the algorithm defined in section 3.2 and (A2) standard EM applied to a data set with the missing values replaced by the unconditional mean.
5.1 The toy data
200 training data points were generated randomly in the interval $[0, 2\pi]$ as $t_1$. The variable $t_2$ was then computed by the function $t_2 = t_1 + 1.25 \sin(2 t_1)$. Spherical Gaussian noise with standard deviation 0.1 was added to the $t_2$ coordinates. Then we deleted 30% of the values in $t_2$ randomly. Figure 1 shows the result using A1 and A2. After training, the negative log likelihood is 1.62 and 2.66 per data point respectively.
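
A data set of this kind, together with the unconditional-mean baseline A2, could be generated as in the sketch below. The upper limit $2\pi$ of the sampling interval is an assumption based on the axis range of Figure 1, missing values are encoded as NaN, and the function names and random seed are illustrative.

```python
import numpy as np

def make_toy_data(n=200, t1_max=2.0 * np.pi, missing_frac=0.3, noise_std=0.1, seed=0):
    """Toy curve t2 = t1 + 1.25 sin(2 t1) with noise and deleted t2 values.

    t1_max = 2*pi is an assumption of this sketch, not a value taken from the text.
    """
    rng = np.random.default_rng(seed)
    t1 = rng.uniform(0.0, t1_max, n)
    t2 = t1 + 1.25 * np.sin(2.0 * t1) + rng.normal(0.0, noise_std, n)
    T = np.column_stack([t1, t2])
    n_missing = int(round(missing_frac * n))
    missing = rng.choice(n, size=n_missing, replace=False)
    T_missing = T.copy()
    T_missing[missing, 1] = np.nan               # delete a fraction of the t2 values
    return T, T_missing

def fill_unconditional_mean(T_missing):
    """Baseline A2: replace each missing entry by the mean of its observed column."""
    col_mean = np.nanmean(T_missing, axis=0)
    return np.where(np.isnan(T_missing), col_mean, T_missing)
```

The output of `fill_unconditional_mean` can be passed to a standard GTM trainer for A2, whereas A1 works directly on the array containing NaN entries.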
5.2 Oil data
This example arises from the problem of determining the fraction of oil in a multi-phase pipeline carrying a mixture of oil, water and gas. The data set consists of 1000 12-dimensional points. Points in the data set are classified into three different multi-phase flow configurations: homogeneous, annular and laminar [8].
Figure 2 shows the visualisation results. A hierarchy of GTMs up to level 3 was trained on the data set. For every level, $15 \times 15 = 225$ latent data points were selected in the 2-dimensional latent space and the number of Gaussian basis functions is $4 \times 4 = 16$. The final visualisation plot for the complete (uncorrupted)
[Figure 1: two panels, (a) Using the EM based algorithm and (b) Using unconditional mean method, each titled "After 15 iterations of training." with both axes running from 0 to 7.]
Fig. 1. The toy problem: the complete data points are plotted as circles while the centres of the Gaussian mixture are plotted as plus signs. The centres are joined by a line according to their ordering in the (one-dimensional) latent space ($K = 60$). The stars represent the missing values. The discs surrounding each plus sign represent two standard deviations' width of the noise model.
We randomly deleted 30% of values in the data set. The maximum number of corrupted coordinates per data point is 6. Again we compare the negative log likelihood of A1 and A2. Here we just measured the values of negative log likelihood for the top level GTM, since the likelihood for lower level models depends on where the "regions of interest" are selected. For the incomplete data set, after 10 training cycles, using the EM algorithm, the negative log likelihood is $-3.39$ per data point, while using unconditional mean filling in the missing data, the negative log likelihood is $-1.31$. Using our EM based algorithm for dealing with missing values can indeed be beneficial, as can be seen by comparing the top level (root) visualisation plots and the visualisation plots on the second level of the hierarchy. These second-level plots show better separation of classes and match better to the models trained on the complete data set.
6 Conclusions
In this paper, we have shown how incomplete data can be included in the hierarchical GTM training. The algorithm for dealing with missing values based on the EM algorithm and Gaussian mixture models is a viable approach for data
References
1. C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: The Generative Topographic Mapping. Neural Computation, 10(1):215-235, 1998.
2. M. Á. Carreira-Perpiñán. Reconstruction of Sequential Data with Probabilistic Models and Continuity Constraints. In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, Advances in Neural Information Processing Systems, volume 12. The MIT Press, 2000.
3. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc. B, 39:1-38, 1977.
4. C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, New York, N.Y., 1995.
5. Z. Ghahramani and M. I. Jordan. Learning from incomplete data. Technical report, AI Laboratory, MIT, 1994.
6. P. Tino and I. Nabney. Constructing localized non-linear projection manifolds in a principled way: hierarchical Generative Topographic Mapping. Technical report, 2000.
7. F. Aurenhammer. Voronoi diagrams - survey of a fundamental geometric data structure. ACM Computing Surveys, 3:345-405, 1991.
8. C. M. Bishop and G. D. James. Analysis of Multi-phase Flows Using Dual-energy Gamma Densitometry and Neural Networks. Nuclear Instruments and Methods in
[Figure 2: three panels (a), (b) and (c), each containing plots labelled 1, 2 and 3 and a legend with the classes Homogeneous, Annular and Laminar.]
Fig. 2. Data visualisation for oil data by using hierarchical GTM. Plot (a) shows the result of training on the complete data set. Plot (b) shows the result of using the EM algorithm learning from incomplete data, while plot (c) shows the same data set