Identifying significant edges in graphical models of molecular networks.


Contents lists available at SciVerse ScienceDirect

Artificial Intelligence in Medicine

journal homepage: www.elsevier.com/locate/aiim

Marco Scutari a,∗, Radhakrishnan Nagarajan b

a Genetics Institute, University College London, Darwin Building, Gower Street, WC1E 6BT London, United Kingdom

b Division of Biomedical Informatics, Department of Biostatistics, College of Public Health, University of Kentucky, 725 Rose Street, Multidisciplinary Science Bldg, 230F, Lexington, KY 40536-0082, USA

Article info

Article history: Received 6 December 2011; Received in revised form 14 December 2012; Accepted 16 December 2012

Keywords: Graphical models; Bayesian networks; Model averaging; L1 norm; Molecular networks

Abstract

Objective: Modelling the associations from high-throughput experimental molecular data has provided unprecedented insights into biological pathways and signalling mechanisms. Graphical models and networks have especially proven to be useful abstractions in this regard. Ad hoc thresholds are often used in conjunction with structure learning algorithms to determine significant associations. The present study overcomes this limitation by proposing a statistically motivated approach for identifying significant associations in a network.

Methods and materials: A new method that identifies significant associations in graphical models by estimating the threshold minimising the $L_1$ norm between the cumulative distribution function (CDF) of the observed edge confidences and those of its asymptotic counterpart is proposed. The effectiveness of the proposed method is demonstrated on popular synthetic data sets as well as publicly available experimental molecular data corresponding to gene and protein expression profiles.

Results: The improved performance of the proposed approach is demonstrated across the synthetic data sets using sensitivity, specificity and accuracy as performance metrics. The results are also demonstrated across varying sample sizes and three different structure learning algorithms with widely varying assumptions. In all cases, the proposed approach has specificity and accuracy close to 1, while sensitivity increases linearly in the logarithm of the sample size. The estimated threshold systematically outperforms common ad hoc ones in terms of sensitivity while maintaining comparable levels of specificity and accuracy. Networks from experimental data sets are reconstructed accurately with respect to the results from the original papers.

Conclusion: Current studies use structure learning algorithms in conjunction with ad hoc thresholds for identifying significant associations in graphical abstractions of biological pathways and signalling mechanisms. Such an ad hoc choice can have pronounced effect on attributing biological significance to the associations in the resulting network and possible downstream analysis. The statistically motivated approach presented in this study has been shown to outperform ad hoc thresholds and is expected to alleviate spurious conclusions of significant associations in such graphical abstractions.

© 2012 Elsevier B.V.

1. Introduction and background

Graphical models [1,2] are a class of statistical models which combine the rigour of a probabilistic approach with the intuitive representation of relationships given by graphs. They are composed of a set $X = \{X_1, X_2, \ldots, X_N\}$ of random variables describing the quantities of interest and a graph $G = (V, E)$ in which each node or vertex $v \in V$ is associated with one of the random variables in $X$ (they are usually referred to interchangeably). The edges $e \in E$ are used to express the dependence relationships among the variables in $X$. The set of these relationships is often referred to as the dependence structure of the graph. Different classes of graphs express these relationships with different semantics, which have in common the principle that graphical separation of two vertices implies the conditional independence of the corresponding random variables [2]. The two examples most commonly found in the literature are Markov networks [3,4], which use undirected graphs, and Bayesian networks (BNs) [5,6], which use directed acyclic graphs.

In principle, there are many possible choices for the joint distribution of $X$, depending on the nature of the data. However, the literature has focused mostly on two cases: the discrete case [3,7], in which both $X$ and the $X_i$ are multinomial random variables, and the continuous case [3,8], in which $X$ is multivariate normal and the $X_i$ are univariate normal random variables. In the former, the parameters of interest are the conditional probabilities associated with each variable, usually represented as conditional probability tables; in the latter, the parameters of interest are the partial correlation coefficients between each variable and its neighbours (i.e. the adjacent nodes in $G$).

∗ Corresponding author. E-mail addresses: [email protected] (M. Scutari), [email protected] (R. Nagarajan).

0933-3657 © 2012 Elsevier B.V. Open access under CC BY license. http://dx.doi.org/10.1016/j.artmed.2012.12.006

The estimation of the structure of the graph $G$ is called structure learning [1,4], and involves determining the graph structure that encodes the conditional independencies present in the data. Ideally it should coincide with the dependence structure of $X$, or it should at least identify a distribution as close as possible to the correct one in the probability space. Several algorithms have been presented in the literature for this problem, thanks to the application of many results from probability, information and optimisation theory. Despite differences in theoretical backgrounds and terminology, they can all be grouped into only three classes: constraint-based algorithms, that are based on conditional independence tests; score-based algorithms, that are based on goodness-of-fit scores; and hybrid algorithms, that combine the previous two approaches. For some examples see Bromberg et al. [9], Castelo and Roverato [10], Friedman et al. [11], Larrañaga et al. [12] and Tsamardinos et al. [13].

On the other hand, the development of techniques for assessing the statistical robustness of network structures learned from data (e.g. the presence of artefacts arising from noisy data) has been limited. Structure learning algorithms are commonly studied by measuring differences from the true (known) structure of a small number of reference data sets [14,15]. The usefulness of such an approach in investigating networks learned from real-world data sets is limited, since the true structure of their probability distribution is unknown.

A more systematic approach to model assessment, and in particular to the problem of identifying statistically significant features in a network, has been developed by Friedman et al. [16] using bootstrap resampling [17] and model averaging [18]. It can be summarised as follows:

1. For $b = 1, 2, \ldots, m$:
   (a) sample a new data set $X_b$ from the original data $X$ using either parametric or nonparametric bootstrap;
   (b) learn the structure of the graphical model $G_b = (V, E_b)$ from $X_b$.
2. Estimate the probability that each possible edge $e_i$, $i = 1, \ldots, k$ is present in the true network structure $G_0 = (V, E_0)$ as

$$\hat{P}(e_i) = \frac{1}{m} \sum_{b=1}^{m} \mathbf{1}_{\{e_i \in E_b\}}, \tag{1}$$

where $\mathbf{1}_{\{e_i \in E_b\}}$ is the indicator function of the event $\{e_i \in E_b\}$ (i.e. it is equal to 1 if $e_i \in E_b$ and 0 otherwise).
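The resampling scheme and Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_learner` is a hypothetical stand-in that links two variables when their absolute sample correlation exceeds a cut-off. It is not one of the structure learning algorithms discussed in this paper, and any function mapping a data set to a set of undirected edges could be passed in its place.

```python
import itertools
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def toy_learner(rows, names, cutoff=0.5):
    """Hypothetical stand-in learner (an assumption, not from the paper):
    returns the undirected edges whose absolute correlation exceeds cutoff."""
    cols = {name: [r[j] for r in rows] for j, name in enumerate(names)}
    return {
        frozenset((u, v))
        for u, v in itertools.combinations(names, 2)
        if abs(pearson(cols[u], cols[v])) > cutoff
    }


def edge_confidence(rows, names, learner, m=200, seed=0):
    """Eq. (1): fraction of the m bootstrap networks containing each edge."""
    rng = random.Random(seed)
    counts = {frozenset(p): 0 for p in itertools.combinations(names, 2)}
    for _ in range(m):
        # Step 1(a): nonparametric bootstrap, resampling rows with replacement.
        boot = [rng.choice(rows) for _ in rows]
        # Step 1(b): learn a structure G_b from the resampled data.
        for edge in learner(boot, names):
            counts[edge] += 1
    # Step 2: empirical probability of each possible edge.
    return {edge: c / m for edge, c in counts.items()}
```

Because each bootstrap replicate is processed independently, the loop over `range(m)` is the part that could be distributed across workers.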

The empirical probabilities $\hat{P}(e_i)$ are known as edge intensities or arc strengths, and can be interpreted as the degree of confidence that $e_i$ is present in the network structure $G_0$ describing the true dependence structure of $X$.¹ However, they are difficult to evaluate, because the probability distribution of the networks $G_b$ in the space of the network structures is unknown. As a result, the value of the confidence threshold (i.e. the minimum degree of confidence for an edge to be significant and therefore accepted as an edge of $G_0$) is an unknown function of both the data and the structure learning algorithm. This is a serious limitation in the identification of significant edges and has led to the use of ad hoc, pre-defined thresholds in spite of the impact on model assessment evidenced by several studies [16,19]. An exception is Nagarajan et al. [20], whose approach will be discussed below.

¹ The probabilities $\hat{P}(e_i)$ are in fact an estimator of the expected value of the {0, 1} random vector describing the presence of each possible edge in $G_0$. As such, they do not sum to one and are dependent on one another in a nontrivial way.

Apart from this limitation, Friedman's approach is very general and can be used in a wide range of settings. First of all, it can be applied to any kind of graphical model with only minor adjustments (for example, accounting for the direction of the edges in BNs, see Section 4). No distributional assumption on the data is required in addition to the ones needed by the structure learning algorithm. No assumption is made on the latter, either, so any score-based, constraint-based or hybrid algorithm can be used. Furthermore, parallel computing can easily be used to offset the additional computational complexity introduced by model averaging, because bootstrap is embarrassingly parallel.

In this paper, we propose a statistically motivated estimator for the confidence threshold minimising the $L_1$ norm between the cumulative distribution function (CDF) of the observed confidence levels and the CDF of the confidence levels of the unknown network $G_0$. Subsequently, we demonstrate the effectiveness of the proposed approach by re-investigating two experimental data sets from Nagarajan et al. [20] and Sachs et al. [21].

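2. Selecting significant edges

Consider the empirical probabilities $\hat{P}(e_i)$ defined in Eq. (1), and denote them with $\hat{p} = \{\hat{p}_i, i = 1, \ldots, k\}$. For a graph with $N$ nodes, $k = N(N-1)/2$. Furthermore, consider the order statistic

$$\hat{p}_{(\cdot)} = (\hat{p}_{(1)}, \hat{p}_{(2)}, \ldots, \hat{p}_{(k)}) \quad \text{with} \quad \hat{p}_{(1)} \leq \hat{p}_{(2)} \leq \cdots \leq \hat{p}_{(k)} \tag{2}$$

derived from $\hat{p}$. It is intuitively clear that the first elements of $\hat{p}_{(\cdot)}$ are more likely to be associated with non-significant edges, and that the last elements of $\hat{p}_{(\cdot)}$ are more likely to be associated with significant edges. The ideal configuration $\tilde{p}_{(\cdot)}$ of $\hat{p}_{(\cdot)}$ would be

$$\tilde{p}_{(i)} = \begin{cases} 1 & \text{if } e_{(i)} \in E_0 \\ 0 & \text{otherwise} \end{cases}, \tag{3}$$

that is the set of probabilities that characterises any edge as either significant or non-significant without any uncertainty. In other words,

$$\tilde{p}_{(\cdot)} = \{0, \ldots, 0, 1, \ldots, 1\}. \tag{4}$$

Such a configuration arises from the limit case in which all the networks $G_b$ have exactly the same structure. This may happen in practice with a consistent structure learning algorithm when the sample size is large [22,23].

A useful characterisation of $\hat{p}_{(\cdot)}$ and $\tilde{p}_{(\cdot)}$ can be obtained through the empirical CDFs of the respective elements,

$$F_{\hat{p}_{(\cdot)}}(x) = \frac{1}{k} \sum_{i=1}^{k} \mathbf{1}_{\{\hat{p}_{(i)} < x\}} \tag{5}$$

and

$$F_{\tilde{p}_{(\cdot)}}(x) = \begin{cases} 0 & \text{if } x \in (-\infty, 0) \\ t & \text{if } x \in [0, 1) \\ 1 & \text{if } x \in [1, +\infty) \end{cases}. \tag{6}$$

In particular, $t$ corresponds to the fraction of elements of $\tilde{p}_{(\cdot)}$ equal to zero and is a measure of the fraction of non-significant edges. At the same time, $t$ provides a threshold for separating the elements of $\tilde{p}_{(\cdot)}$, namely

$$e_{(i)} \in E_0 \iff \tilde{p}_{(i)} > F^{-1}_{\tilde{p}_{(\cdot)}}(t) \tag{7}$$

where $F^{-1}_{\tilde{p}_{(\cdot)}}(t) = \inf_{x \in \mathbb{R}} \{F_{\tilde{p}_{(\cdot)}}(x) \geq t\}$ is the quantile function [24].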

More importantly, estimating $t$ from data provides a statistically motivated threshold for separating significant edges from non-significant ones. In practice, this amounts to approximating the ideal, asymptotic empirical CDF $F_{\tilde{p}_{(\cdot)}}$ with its finite sample estimate $F_{\hat{p}_{(\cdot)}}$. Such an approximation can be computed in many different ways, depending on the norm used to measure the distance between $F_{\hat{p}_{(\cdot)}}$ and $F_{\tilde{p}_{(\cdot)}}$ as a function of $t$. Common choices are the $L_p$ family of norms [25], which includes the Euclidean norm, and Csiszar's f-divergences [26], which include Kullback–Leibler divergence.

Fig. 1. The empirical CDF $F_{\hat{p}_{(\cdot)}}$ (left), the CDF $F_{\tilde{p}_{(\cdot)}}$ (centre) and the $L_1$ norm between the two (right), shaded in grey.

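The $L_1$ norm

$$L_1(t; \hat{p}_{(\cdot)}) = \int \left| F_{\hat{p}_{(\cdot)}}(x) - F_{\tilde{p}_{(\cdot)}}(x; t) \right| dx \tag{8}$$

appears to be particularly suited to this problem; an example is shown in Fig. 1. First of all, note that $F_{\hat{p}_{(\cdot)}}$ is piecewise constant, changing value only at the points $\hat{p}_{(i)}$; this follows from the definition of empirical CDF. Therefore, for the problem at hand Eq. (8) simplifies to

$$L_1(t; \hat{p}_{(\cdot)}) = \sum_{x_i \in \{\{0\} \cup \hat{p}_{(\cdot)} \cup \{1\}\}} \left| F_{\hat{p}_{(\cdot)}}(x_i) - t \right| (x_{i+1} - x_i), \tag{9}$$

which can be computed in linear time from $\hat{p}_{(\cdot)}$. Its minimisation is also straightforward using linear programming [27]. Furthermore, compared to the more common $L_2$ norm

$$L_2(t; \hat{p}_{(\cdot)}) = \int \left[ F_{\hat{p}_{(\cdot)}}(x) - F_{\tilde{p}_{(\cdot)}}(x; t) \right]^2 dx \tag{10}$$

or the $L_\infty$ norm

$$L_\infty(t; \hat{p}_{(\cdot)}) = \max_{x \in [0,1]} \left\{ \left| F_{\hat{p}_{(\cdot)}}(x) - F_{\tilde{p}_{(\cdot)}}(x; t) \right| \right\}, \tag{11}$$

the $L_1$ norm does not place as much weight on large deviations compared to small ones, making it robust against a wide variety of configurations of $\hat{p}_{(\cdot)}$.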

Then the identification of significant edges can be thought of either as a least absolute deviations estimation or an $L_1$ approximation of the form

$$\hat{t} = \operatorname*{argmin}_{t \in [0,1]} L_1(t; \hat{p}_{(\cdot)}) \tag{12}$$

followed by the application of the following rule:

$$e_{(i)} \in E_0 \iff \hat{p}_{(i)} > F^{-1}_{\hat{p}_{(\cdot)}}(\hat{t}). \tag{13}$$

Note that, even though edges are individually identified as significant or non-significant, they are not identified independently of each other because $\hat{t}$ is a function of the whole $\hat{p}_{(\cdot)}$.

A simple example is illustrated below.

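Example 1. Consider a graphical model based on an undirected graph $G$ with node set $V = \{A, B, C, D\}$. The set of possible edges of $G$ contains 6 elements: $(A, B)$, $(A, C)$, $(A, D)$, $(B, C)$, $(B, D)$ and $(C, D)$. Suppose that we have estimated the following confidence values:

$$\hat{p}_{AB} = 0.2242, \quad \hat{p}_{AC} = 0.0460, \quad \hat{p}_{AD} = 0.8935, \quad \hat{p}_{BC} = 0.3921, \quad \hat{p}_{BD} = 0.7689, \quad \hat{p}_{CD} = 0.9439. \tag{14}$$

Then $\hat{p}_{(\cdot)} = \{0.0460, 0.2242, 0.3921, 0.7689, 0.8935, 0.9439\}$ and

$$F_{\hat{p}_{(\cdot)}}(x) = \begin{cases} 0 & \text{if } x \in (-\infty, 0.0460) \\ \frac{1}{6} & \text{if } x \in [0.0460, 0.2242) \\ \frac{2}{6} & \text{if } x \in [0.2242, 0.3921) \\ \frac{3}{6} & \text{if } x \in [0.3921, 0.7689) \\ \frac{4}{6} & \text{if } x \in [0.7689, 0.8935) \\ \frac{5}{6} & \text{if } x \in [0.8935, 0.9439) \\ 1 & \text{if } x \in [0.9439, +\infty) \end{cases}. \tag{15}$$

The $L_1$ norm takes the form

$$\begin{aligned} L_1(t; \hat{p}_{(\cdot)}) = {} & |0 - t| (0.0460 - 0) + \left| \tfrac{1}{6} - t \right| (0.2242 - 0.0460) \\ & + \left| \tfrac{2}{6} - t \right| (0.3921 - 0.2242) + \left| \tfrac{3}{6} - t \right| (0.7689 - 0.3921) \\ & + \left| \tfrac{4}{6} - t \right| (0.8935 - 0.7689) + \left| \tfrac{5}{6} - t \right| (0.9439 - 0.8935) \\ & + |1 - t| (1 - 0.9439) \end{aligned} \tag{16}$$

and is minimised for $\hat{t} = 0.4999816$. Therefore, an edge is deemed significant if its confidence is strictly greater than $F^{-1}_{\hat{p}_{(\cdot)}}(0.4999816) = 0.3921$, or, equivalently, if it has confidence of at least 0.7689; only $(A, D)$, $(B, D)$ and $(C, D)$ satisfy this condition (Fig. 2).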
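Example 1 can be reproduced with a short Python sketch of Eqs. (9), (12) and (13). Two choices here are assumptions made for illustration rather than the procedure used in the paper: the empirical CDF is evaluated with the right-continuous convention of Eq. (15), and the minimisation is done by scanning the levels $j/k$ instead of linear programming, which is sufficient because Eq. (9) is convex and piecewise linear in $t$ with breakpoints only at those levels. The function names are hypothetical.

```python
from bisect import bisect_right


def ecdf_level(p_sorted, x):
    """Fraction of confidence values <= x (step convention of Eq. (15))."""
    return bisect_right(p_sorted, x) / len(p_sorted)


def l1_norm(t, p_hat):
    """Eq. (9): L1 distance between the empirical CDF and the ideal CDF."""
    p = sorted(p_hat)
    knots = [0.0] + p + [1.0]
    return sum(
        abs(ecdf_level(p, lo) - t) * (hi - lo)
        for lo, hi in zip(knots, knots[1:])
    )


def estimate_threshold(p_hat):
    """Eq. (12), by scanning the candidate levels j/k (assumption: no LP)."""
    k = len(p_hat)
    levels = [j / k for j in range(k + 1)]
    return min(levels, key=lambda t: l1_norm(t, p_hat))


def quantile(t, p_hat):
    """Quantile function: smallest x whose empirical CDF level is >= t."""
    p = sorted(p_hat)
    if t <= 0:
        return float("-inf")
    for x in p:
        if ecdf_level(p, x) >= t:
            return x
    return p[-1]


def significant_edges(conf):
    """Eq. (13): edges whose confidence is strictly above the quantile."""
    values = list(conf.values())
    t_hat = estimate_threshold(values)
    cut = quantile(t_hat, values)
    return t_hat, cut, {e for e, p in conf.items() if p > cut}


example = {"AB": 0.2242, "AC": 0.0460, "AD": 0.8935,
           "BC": 0.3921, "BD": 0.7689, "CD": 0.9439}
t_hat, cut, edges = significant_edges(example)
print(t_hat, cut, sorted(edges))  # 0.5 0.3921 ['AD', 'BD', 'CD']
```

The scan returns exactly 0.5, which agrees with the numerically optimised value 0.4999816 reported above up to optimiser tolerance, and yields the same three significant edges.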

Fig. 2. The CDFs $F_{\hat{p}_{(\cdot)}}$ and $F_{\tilde{p}_{(\cdot)}}(\hat{t})$, respectively in black and grey (left), and the $L_1(t; \hat{p}_{(\cdot)})$ norm (right) from Example 1.

3. Simulation results

We tested the proposed approach on synthetic data sets using three established performance measures: sensitivity, specificity and accuracy. Sensitivity is given by the proportion of edges of the true network structure that have been correctly identified as significant. Specificity is given by the proportion of the edges missing from the true network structure that have been correctly identified as non-significant. Accuracy is given by the proportion of edges correctly identified as either significant or non-significant over the set of all possible edges. To that end, we generated 400 data sets of varying sizes (100, 200, 500, 1000, 2000, 5000, 10,000 and 20,000) from three discrete BNs commonly used as benchmarks:

• the ALARM network [28], a network designed to provide an alarm message system for intensive care unit patient monitoring. Its true structure is composed of 37 nodes and 46 edges (of 666 possible edges), and its probability distribution has 509 parameters;
• the HAILFINDER network [29], a network designed to forecast severe summer hail in northeastern Colorado. Its true structure is composed of 56 nodes and 66 edges (of 1540 possible edges), and its probability distribution has 2656 parameters;
• the INSURANCE network [30], a network designed to evaluate car insurance risks. Its true structure is composed of 27 nodes and 52 edges (of 351 possible edges), and its probability distribution has 984 parameters.

Three different structure learning algorithms were considered:

• the Incremental Association Markov Blanket (IAMB) constraint-based algorithm [31]. IAMB was used to learn the Markov blanket of each node as a preliminary step to reduce the number of its candidate parents and children; a network structure satisfying these constraints is then identified as in the Grow–Shrink algorithm [32]. Conditional independence tests were performed using a shrinkage mutual information test [33] with α = 0.05. Such a test, unlike the more common asymptotic χ² mutual information test, is valid and has been shown to work reliably even on small samples. An α = 0.01 was also considered; however, the results were not significantly different from α = 0.05 and will not be discussed separately in this paper;
• the Hill Climbing (HC) score-based algorithm with the Bayesian Dirichlet equivalent uniform (BDeu) score function, the posterior distribution of the network structure arising from a uniform prior distribution [7]. The equivalent sample size was set to 10. This is the same approach detailed in Friedman et al. [16], although they considered only 100 (instead of 500) bootstrap samples for each scenario;
• the Max–Min Hill Climbing (MMHC) hybrid algorithm [13], which combines the Max–Min Parents and Children (MMPC) and HC. The conditional independence test used in MMPC and the score functions used in HC are the ones illustrated in the previous points.

The performance measures were estimated for each combination of network, sample size and structure learning algorithm as follows:

1. a sample of the appropriate size was generated from either the ALARM, the HAILFINDER or the INSURANCE network;
2. we estimated the confidence values $\hat{p}$ for all possible edges from 200 and 500 nonparametric bootstrap samples. Since results are very similar, they will be discussed together;
3. we estimated the confidence threshold $\hat{t}$, and identified significant and non-significant edges in the network. Note that the direction of the edges present in the network structure is effectively ignored, because the proposed approach focuses only on those edges' presence. Significant edges were then used to build an averaged network structure;
4. we computed sensitivity, specificity and accuracy comparing the averaged network structure to the true one, which is known from the literature.

These steps were repeated 50 times in order to estimate both the performance measures and their variability.
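Step 4 can be made concrete with a small Python sketch of the three performance measures as defined at the beginning of this section. The function name `edge_metrics` and the toy edge sets in the usage lines are hypothetical; the sketch assumes undirected edges and a true network with at least one edge present and at least one edge missing, so that no denominator is zero.

```python
from itertools import combinations


def edge_metrics(nodes, true_edges, selected_edges):
    """Sensitivity, specificity and accuracy over the set of possible edges."""
    possible = {frozenset(pair) for pair in combinations(nodes, 2)}
    true = {frozenset(e) for e in true_edges}
    selected = {frozenset(e) for e in selected_edges}
    tp = len(true & selected)              # true edges found significant
    fn = len(true - selected)              # true edges missed
    fp = len(selected - true)              # spurious edges found significant
    tn = len(possible - true - selected)   # missing edges correctly rejected
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(possible),
    }


# Toy usage: 4 nodes, 3 true edges, 3 selected edges (2 of them correct).
metrics = edge_metrics(
    "ABCD",
    true_edges=[("A", "B"), ("B", "C"), ("C", "D")],
    selected_edges=[("A", "B"), ("B", "C"), ("A", "D")],
)
```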

All the simulations and the thresholds estimation were performed with the bnlearn package [34,35] for R [36], which implements several methods for structure learning, parameter estimation and inference on BNs (including the approach proposed in Section 2).

The average values of sensitivity, specificity, accuracy and $\hat{t}$ for the networks across various sample sizes ($n$) are shown in Figs. 3 (IAMB), 4 (HC) and 5 (MMHC). Since the number of parameters is non-constant across the networks, a normalised ratio of the size of the generated sample to the number of parameters of the network (i.e. $n/p$) is used as a reference instead of the raw sample size (i.e. $n$). Intuitively, a sample of size $n = 1000$ may be large enough to estimate reliably a small network with few parameters, say $p = 100$, but it may be too small for a larger network with $p = 10{,}000$. On a related note, denser networks (i.e. networks with a large number of edges compared to the number of nodes) usually have a higher number of parameters than sparser ones (i.e. networks with few edges).

Several interesting trends emerge from the estimated quantities. As expected, sensitivity increases as the sample size grows. This provides an empirical verification that the combination of HC and BDe is indeed consistent, as proved by Chickering [23]. No analogous result exists for IAMB or MMHC, although intuitively their sensitivity should improve as well with the sample size due to the consistency of the conditional independence tests used by those algorithms. Moreover, even when $n/p$ is extremely low a substantial proportion of the network structure can be correctly identified. When $n/p$ is at least 0.2 (i.e. 1 observation every 5 parameters), HC successfully recovers from about 50% (for ALARM and INSURANCE) to 75% (for HAILFINDER) of the true network structure. In contrast, IAMB and MMHC successfully recover from about 45% to 50% of HAILFINDER, but only about 26% to 40% of ALARM and 19% to 30% of INSURANCE. This difference can be attributed to the sparsity-inducing effect of shrinkage tests [37], which increase specificity at the cost of sensitivity. For values of $n/p$ greater than 1 (i.e. more observations than parameters) the increase in sensitivity slows down for all combinations of networks and algorithms, reaching a plateau.

Fig. 3. Average sensitivity, specificity and accuracy of IAMB for the ALARM, HAILFINDER and INSURANCE networks over $n/p$. Bars represent 95% confidence intervals, and the dotted vertical line is $n = p$.

Overall, sensitivity seems to have a hyperbolic behaviour, growing very rapidly for $n/p \ll 1$ and then converging asymptotically to 1 for $n/p > 1$. Thus we expect it to increase linearly on a $\log(n/p)$ scale. The slower convergence rate observed for the INSURANCE network compared to the other two networks is likely to be a consequence of its high edge density (1.92 edges per node) relative to ALARM (1.24) and HAILFINDER (1.17). Slower convergence may also be an outcome of inherent limitations of structure learning algorithms in the case of dense networks [1,38].

Furthermore, both specificity and accuracy are close to 1 for all the networks and the sample sizes considered in the analysis, even at very low $n/p$ ratios. Such high values are a result of the low number of true edges in ALARM, HAILFINDER and INSURANCE compared to the respective numbers of possible edges. This is true in particular for the ALARM and HAILFINDER networks. The lower values observed for the INSURANCE network can be attributed again to the inherent limitations of structure learning algorithms in modelling dense networks. The sparsity-inducing effect of shrinkage tests is again evident for both IAMB and MMHC; both specificity and accuracy actually decrease slightly as $n/p$ grows and the influence of shrinkage decreases.
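The effect of sparsity on specificity and accuracy can be checked with simple arithmetic on the node and edge counts listed earlier in this section. The sketch below is an illustration rather than part of the original analysis: it computes the number of possible edges $k = N(N-1)/2$ and the accuracy that an empty graph (one that selects no edge at all, a hypothetical baseline) would achieve against each benchmark network.

```python
# Node, edge and parameter counts as listed for the three benchmark networks.
networks = {
    "ALARM": {"nodes": 37, "edges": 46, "params": 509},
    "HAILFINDER": {"nodes": 56, "edges": 66, "params": 2656},
    "INSURANCE": {"nodes": 27, "edges": 52, "params": 984},
}


def summarise(net):
    """Possible edges and the accuracy of a baseline that selects no edge."""
    possible = net["nodes"] * (net["nodes"] - 1) // 2
    return {
        "possible_edges": possible,
        "true_edge_fraction": net["edges"] / possible,
        # An empty graph gets every missing edge right and every true edge wrong.
        "empty_graph_accuracy": (possible - net["edges"]) / possible,
    }


summary = {name: summarise(net) for name, net in networks.items()}
for name, row in summary.items():
    print(name, row["possible_edges"], round(row["empty_graph_accuracy"], 3))
# ALARM 666 0.931
# HAILFINDER 1540 0.957
# INSURANCE 351 0.852
```

Even this trivial baseline scores above 0.93 on ALARM and HAILFINDER, which is why sensitivity is the more informative measure for sparse networks; the denser INSURANCE network leaves more room for accuracy to vary.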

It is also important to note that, as shown in Fig. 6, the average value of the confidence threshold $\hat{t}$ does not exhibit any apparent trend as a function of $n/p$. In addition, its variability does not appear to decrease as $n/p$ grows. This suggests that the optimal $\hat{t}$ depends strongly on the specific sample used in the estimation of the confidence values $\hat{p}$, even for relatively large samples. However, the specificity, sensitivity and accuracy estimates appear to be very stable (all confidence intervals shown in Figs. 3–5 are very small).

Fig. 4. Average sensitivity, specificity and accuracy of HC for the ALARM, HAILFINDER and INSURANCE networks over $n/p$. Bars represent 95% confidence intervals, and the dotted vertical line is $n = p$.

From Fig. 6, it is also apparent that the threshold estimate $\hat{t}$ can be significantly lower than 1 even for high values of $n/p$. This behaviour is observed consistently across the three networks (ALARM, HAILFINDER, INSURANCE). These results are in sharp contrast with ad hoc thresholds commonly found in the literature, which are usually large (e.g. 0.8 in [16]). A large threshold can certainly be useful in excluding noisy edges, which may result from artefacts at the measurement and dynamical levels and from finite sample-size effects. However, while a large ad hoc threshold can certainly minimise false positives, it is also expected to accentuate false negatives. Such a conservative choice can have a profound impact on the network topology, resulting in artificially sparse networks. The threshold estimator introduced in Section 2 achieves a good trade-off between incorrectly identifying noisy edges as significant and disregarding significant ones. As an example, the difference in sensitivity, specificity and accuracy between the estimated threshold $\hat{t}$ and several large, ad hoc ones ($t$ = 0.70, 0.80, 0.90, 0.95) for HC is shown in Fig. 7 (the corresponding plots for IAMB and MMHC are similar, and are omitted for brevity). The threshold $\hat{t}$ systematically outperforms the ad hoc thresholds in terms of sensitivity, in particular for low values of $n/p$. The difference progressively vanishes as $n/p$ grows. All thresholds have comparable levels of specificity and accuracy.
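How strongly the selected network depends on the cut-off can be seen on the six confidence values of Example 1. The sketch below is only an illustration on those toy values, not a reproduction of the simulation results in Fig. 7; it assumes that an ad hoc threshold is applied with the same strict inequality as in Eq. (13), and it takes the data-driven cut-off 0.3921 from Example 1 as given.

```python
# Confidence values from Example 1 (Eq. (14)).
confidence = {"AB": 0.2242, "AC": 0.0460, "AD": 0.8935,
              "BC": 0.3921, "BD": 0.7689, "CD": 0.9439}

# Cut-off obtained in Example 1 from the estimated threshold.
estimated_cut = 0.3921


def select(conf, cut):
    """Edges whose confidence is strictly greater than the cut-off."""
    return sorted(edge for edge, p in conf.items() if p > cut)


for cut in (estimated_cut, 0.70, 0.80, 0.90, 0.95):
    print(cut, select(confidence, cut))
# 0.3921 ['AD', 'BD', 'CD']
# 0.7 ['AD', 'BD', 'CD']
# 0.8 ['AD', 'CD']
# 0.9 ['CD']
# 0.95 []
```

Raising the ad hoc threshold from 0.70 to 0.95 progressively empties the network, which is the mechanism behind the loss of sensitivity discussed above.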

On a related note, false negatives across ad hoc thresholds may also be attributed to the fact that edges are considered as separate, independent entities as far as the choice of the threshold is concerned – i.e. a 0.99 threshold is expected to identify as significant about 1 in 100 edges in the network. However, in a biological setting the structure of the network is an abstraction for the underlying functional mechanisms; as an example, consider the signalling pathways in a transcriptional network. In such a context, edges are clearly not independent, but appear in concert along signalling pathways. This interdependence is accounted for in the proposed approach (that is based on the full set $\hat{p}$ of estimated confidence values), but it is not commonly considered in choosing ad hoc thresholds. For instance, edges appearing with individual confidence values far below the [0.80, 1] range may not necessarily be identified as significant by an ad hoc threshold. However, the proposed approach recognises their interplay and correctly identifies them as significant. This aspect, along with the strong dependence between the optimal $\hat{t}$ and the actual sample the network is learned from, may discourage the use of an a priori or ad hoc confidence threshold in favour of more statistically motivated alternatives.

Fig. 5. Average sensitivity, specificity and accuracy of MMHC for the ALARM, HAILFINDER and INSURANCE networks over $n/p$. Bars represent 95% confidence intervals, and the dotted vertical line is $n = p$.

4. Applications to molecular expression profiles

In order to demonstrate the effectiveness of the proposed approach on experimental data sets, we will examine two gene expression data sets from Nagarajan et al. [20] and Sachs et al. [21]. All the analyses will be performed again with the bnlearn package. Following Imoto et al. [39], we will consider the edges of the BNs disregarding their direction when determining their significance. Edges identified as significant will then be oriented according to the direction observed with the highest frequency in the bootstrapped networks $G_b$. While simplistic, this combined approach allows the proposed estimator to handle the edges whose direction cannot be determined by the structure learning algorithm possibly due to score equivalent structures [40].
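The orientation step can be sketched as a majority vote over the bootstrapped networks. The function below is a minimal illustration, not the implementation used in the analyses: `orient_edges` is a hypothetical name, arcs are represented as `(from, to)` tuples, and ties are broken in favour of the order in which the edge is given, which is an arbitrary assumption.

```python
from collections import Counter


def orient_edges(significant, bootstrap_arcs):
    """Orient each significant edge along its most frequent direction.

    significant    -- iterable of (u, v) pairs, direction not meaningful
    bootstrap_arcs -- one collection of directed (from, to) arcs per
                      bootstrapped network G_b
    """
    counts = Counter(arc for arcs in bootstrap_arcs for arc in arcs)
    oriented = []
    for u, v in significant:
        forward, backward = counts[(u, v)], counts[(v, u)]
        # Assumption: ties keep the direction in which the edge was listed.
        oriented.append((u, v) if forward >= backward else (v, u))
    return oriented


# Toy usage with three bootstrapped networks.
arcs = [{("A", "B")}, {("A", "B")}, {("B", "A"), ("C", "B")}]
result = orient_edges([("A", "B"), ("B", "C")], arcs)
```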

Fig. 6. Average estimated significance threshold ($\hat{t}$) for the ALARM, HAILFINDER and INSURANCE networks over $n/p$, for IAMB, HC and MMHC. Bars represent 95% confidence intervals.

4.1. Differentiation potential of aged myogenic progenitors

In a recent study [20] the interplay between crucial myogenic (Myogenin, Myf-5, Myo-D1), adipogenic (C/EBPα, DDIT3, FoxC2, PPARγ), and Wnt-related genes (Lrp5, Wnt5a) orchestrating aged myogenic progenitor differentiation was investigated by Nagarajan et al. using clonal gene expression profiles in conjunction with BN structure learning techniques. The objective was to investigate possible functional relationships between these diverse differentiation programs reflected by the edges in the resulting networks. The clonal expression profiles were generated from RNA isolated across 34 clones of myogenic progenitors obtained across 24-month-old mice and real-time RT-PCR was used to quantify the gene expression. Such an approach implicitly accommodates inherent uncertainty in gene expression profiles and justified the choice of probabilistic models.

In the same study, the authors proposed a non-parametric resampling approach to identify significant functional relationships. Starting from Friedman's definition of confidence levels (Eq. (1)), they computed the noise floor distribution $\hat{f} = \{\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_k\}$ of the edges by randomly permuting the expression of each gene and performing BN structure learning on the resulting data sets. An edge $e_i$ was deemed significant if $\hat{p}_i > \max_{\hat{f}_l \in \hat{f}} \hat{f}_l$. In addition to revealing several functional relationships documented in literature, the study also revealed new relationships that were immune to the choice of the structure learning techniques. These results were established across clonal expression data normalised using three different housekeeping genes and networks learned with three different structure learning algorithms.

The approach presented in [20] has two important limitations. First, the computational cost of generating the noise floor distribution may discourage its application to large data sets. In fact, the generation of the required permutations of the data and the subsequent structure learning (in addition to the bootstrap resampling and the subsequent learning required for the estimation of $\hat{p}$) essentially doubles the computational complexity of Friedman's approach. Second, a large sample size may result in an extremely low value of $\max(\hat{f})$, and therefore in a large number of false positives.

Fig. 7. Difference in sensitivity, specificity and accuracy between the estimated threshold $\hat{t}$ and several ad hoc ones ($t$ = 0.70, 0.80, 0.90, 0.95) for HC over $n/p$.
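For comparison, the noise floor rule from [20] as summarised above can be sketched as follows. This is a loose illustration under assumptions, not the code used in [20]: `confidence_fn` is a hypothetical callable standing for the whole pipeline that maps a data set to a dictionary of edge confidences (for example, a bootstrap estimator of Eq. (1)), and a single permuted data set is used to obtain the noise floor.

```python
import random


def permute_columns(rows, rng):
    """Shuffle each variable (column) independently, breaking dependencies."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def noise_floor_selection(rows, confidence_fn, seed=0):
    """Keep the edges whose confidence exceeds the maximum noise floor value."""
    rng = random.Random(seed)
    p_hat = confidence_fn(rows)                        # observed confidences
    f_hat = confidence_fn(permute_columns(rows, rng))  # noise floor
    floor = max(f_hat.values())
    return {edge for edge, p in p_hat.items() if p > floor}
```

The second call to `confidence_fn` is what roughly doubles the computational cost relative to Friedman's approach, as noted above.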

In the present study, we re-investigate the myogenic progenitor clonal expression data normalised using housekeeping gene GAPDH with the approach outlined in Section 2 and the IAMB algorithm. It is important to note that this strategy was also used in the original study [20], hence its choice. The order statistic $\hat{p}_{(\cdot)}$ was computed from 500 bootstrap samples. The empirical CDF $F_{\hat{p}_{(\cdot)}}$, the estimated threshold and the network with the significant edges are shown in Fig. 8.

All edges identified as significant in the earlier study [20] across the various structure learning techniques and normalisation techniques were also identified by the proposed approach (see Fig. 3D in [20]). In contrast to Fig. 8, the original study using IAMB and normalisation with respect to GAPDH alone detected a considerable number of additional edges (see Fig. 3A in [20]). Thus it is quite possible that the approach proposed in this paper reduces the number of false positives and spurious functional relationships between the genes. Furthermore, the application of the proposed approach in conjunction with the algorithm from Imoto et al. [39] reveals directionality of the edges, in contrast to the undirected network reported by Nagarajan et al. [20].

4.2. Protein signalling in flow cytometry data

In a landmark study, Sachs et al. [21] used BNs for identifying causal influences in cellular signalling networks from simultaneous measurement of multiple phosphorylated proteins and phospholipids across single cells. The authors used a battery of perturbations in addition to the unperturbed data to arrive at the final network representation. A greedy search score-based algorithm that accommodates variations in the joint probability distribution across the unperturbed and perturbed data sets was used to identify the edges [41]. More importantly, significant edges were selected using an arbitrary significance threshold of 0.85 (see Fig. 3, [21]). A detailed comparison between the learned network and functional relationships documented in the literature was presented in the same study.

Fig. 8. The empirical CDF $F_{\hat{p}_{(\cdot)}}$ for the myogenic progenitors data from Nagarajan et al. [20] (on the left), and the network structure resulting from the selection of the significant edges (on the right). The vertical dashed line in the plot of $F_{\hat{p}_{(\cdot)}}$ represents the threshold $F^{-1}_{\tilde{p}_{(\cdot)}}(\hat{t})$.

Fig. 9. The empirical CDF of $\hat{p}_{(\cdot)}$ for the flow cytometry data from Sachs et al. [21] (on the left), and the network structure resulting from the selection of the significant edges (on the right). The vertical dashed line in the plot of $F_{\hat{p}_{(\cdot)}}$ represents the threshold $F^{-1}_{\tilde{p}_{(\cdot)}}(\hat{t})$.

We investigated the performance of the proposed approach in identifying significant functional relationships from the same experimental data. However, we limit ourselves to the data recorded without applying any molecular intervention, which amount to 854 observations for 11 variables. We compare and contrast our results to those obtained using an arbitrary threshold of 0.85. The combination of perturbed and non-perturbed observations studied in Sachs et al. [21] cannot be analysed with our approach, because each subset of the data follows a different probability distribution and therefore there is no single “true” network $G_0$. Analysis of the unperturbed data using the approach presented in Section 2 reveals the edges reported in the original study. The resulting network is shown in Fig. 9 along with $F_{\hat{p}_{(\cdot)}}$ and the estimated threshold. From the plot of $F_{\hat{p}_{(\cdot)}}$ we can clearly see that significant and non-significant edges present widely different levels of confidence, to the point that any threshold between 0.4 and 0.9 results in the same network structure. This, along with the value of the estimated threshold ($\hat{p}_{(i)} \geq 0.93$), shows that the noisiness of the data relative to the sample size is low. In other words, the sample is big enough for the structure learning algorithm to reliably select the significant edges. The edges identified by the proposed method were the same as those identified by [21] using general stimulatory cues excluding the data with interventions (see Fig. 4A in [21], Supplementary information). In contrast to [21], using the approach of Imoto et al. [39] in conjunction with the proposed thresholding method we were able to identify the directions of the edges in the network. The directions correlated with the functional relationships documented in the literature (Table 3, [21], Supplementary information) as well as with the directions of the edges in the network learned from both perturbed and unperturbed data (Fig. 3, [21]).
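The observation that any threshold between 0.4 and 0.9 yields the same network can be illustrated with a small Python sketch. The edge labels and confidence values below are hypothetical and are not taken from the flow cytometry data; they only mimic a situation in which significant and non-significant edges are well separated. The helper `select_edges` and the use of a non-strict inequality are likewise assumptions made for the illustration.

```python
from typing import Dict, FrozenSet, Tuple

Edge = Tuple[str, str]


def select_edges(confidence: Dict[Edge, float], threshold: float) -> FrozenSet[Edge]:
    """Keep the edges whose confidence is at least `threshold`."""
    return frozenset(e for e, p in confidence.items() if p >= threshold)


# Hypothetical edge confidences: three clearly supported edges and
# three weakly supported ones, with a wide gap in between.
toy_confidence: Dict[Edge, float] = {
    ("A", "B"): 0.99,
    ("B", "C"): 0.96,
    ("C", "D"): 0.94,
    ("A", "C"): 0.31,
    ("B", "D"): 0.12,
    ("A", "D"): 0.03,
}

# Every threshold falling inside the gap selects the same set of edges.
networks = {t: select_edges(toy_confidence, t) for t in (0.4, 0.6, 0.85, 0.9)}
same_structure = len(set(networks.values())) == 1
print(same_structure, sorted(networks[0.85]))
```

When the confidence values are separated this cleanly, the choice of threshold is immaterial; the choice matters only when confidences spread across the interval, which is the case an estimated threshold is meant to handle.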

5. Conclusions

Graphical models and network abstractions have enjoyed considerable attention across the biological and medical communities. Such abstractions are especially useful in deciphering the interactions between the entities of interest from high-throughput observational data. Classical techniques for identifying significant edges in the resulting graph rely on ad hoc thresholding of the edge confidence estimated across multiple independent realisations of networks learned from the given data. Large ad hoc threshold values are particularly common, and are chosen in an effort to minimise noisy edges in the resulting network. While useful in minimising false positives, such a choice can accentuate false negatives with a pronounced effect on the network topology. The present study overcomes this caveat by proposing a more straightforward and statistically motivated approach for identifying significant edges in a graphical model. The proposed estimator minimises the $L_1$ norm between the CDF of the observed confidence levels and the CDF of their asymptotic, ideal configuration. The effectiveness of the proposed approach is demonstrated on three synthetic datasets [28–30] and on gene expression datasets across two different studies [20,21]. However, the approach is defined in a more general setting and can be applied to many classes of graphical models learned from any kind of data.
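The idea behind the estimator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the asymptotic CDF is a step function equal to $t$ on $[0, 1)$, that is, a fraction $t$ of the edges has confidence 0 and the rest has confidence 1; that form is not spelled out in this section. The function name `l1_threshold` and the confidence values are hypothetical.

```python
from typing import List, Tuple


def l1_threshold(confidences: List[float]) -> Tuple[float, float, List[float]]:
    """Return (t_hat, threshold, significant) for edge confidences in [0, 1].

    t_hat minimises the L1 distance between the empirical CDF of the
    confidences and an assumed step CDF that equals t on [0, 1).
    """
    p = sorted(confidences)
    k = len(p)
    if k == 0:
        raise ValueError("need at least one edge confidence")

    # The empirical CDF equals i/k on [p_(i), p_(i+1)),
    # with the conventions p_(0) = 0 and p_(k+1) = 1.
    knots = [0.0] + p + [1.0]
    widths = [knots[i + 1] - knots[i] for i in range(k + 1)]
    levels = [i / k for i in range(k + 1)]

    def l1(t: float) -> float:
        # Integral over [0, 1] of |empirical CDF - t|.
        return sum(w * abs(level - t) for w, level in zip(widths, levels))

    # l1 is piecewise linear in t with kinks at the levels i/k,
    # so a minimum is attained at one of them.
    i_hat = min(range(k + 1), key=lambda i: l1(levels[i]))
    t_hat = levels[i_hat]

    # Empirical quantile at t_hat; 0.0 when no edge is discarded.
    threshold = p[i_hat - 1] if i_hat > 0 else 0.0
    # The k - i_hat edges ranked above the threshold are kept.
    significant = p[i_hat:]
    return t_hat, threshold, significant


# Hypothetical confidences: four weak edges and three strong ones.
toy = [0.02, 0.05, 0.10, 0.15, 0.95, 0.98, 1.00]
t_hat, threshold, significant = l1_threshold(toy)
print(t_hat, threshold, significant)
```

Selecting edges by rank rather than by comparing against the threshold value sidesteps the question of whether the inequality should be strict, which only matters when several edges share the same confidence.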

Acknowledgements

This work was supported by the UK Technology Strategy Board (TSB) and Biotechnology & Biological Sciences Research Council (BBSRC), grant TS/I002170/1 (Marco Scutari) and the National Library of Medicine, grant R03LM008853 (Radhakrishnan Nagarajan). Marco Scutari would also like to thank Adriana Brogini for proofreading the paper and providing useful suggestions.

References

[1] Koller D, Friedman N. Probabilistic graphical models: principles and techniques. Cambridge, MA, USA: MIT Press; 2009.

[2] Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo, CA, USA: Morgan Kaufmann; 1988.

[3] Whittaker J. Graphical models in applied multivariate statistics. Hoboken, NJ, USA: Wiley; 1990.

[4] Edwards DI. Introduction to graphical modelling. 2nd ed. New York, USA: Springer; 2000.

[5] Neapolitan RE. Learning Bayesian networks. Englewood Cliffs, NJ, USA: Prentice Hall; 2003.

[6] Korb K, Nicholson A. Bayesian artificial intelligence. 2nd ed. London, UK: Chapman & Hall; 2010.

[7] Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning 1995;20(3):197–243, available as Microsoft Technical Report MSR-TR-94-09.

[8] Geiger D, Heckerman D. Learning Gaussian networks, Tech. rep., Microsoft Research, Redmond, Washington, Microsoft Technical Report MSR-TR-94-10; 1994.

[9] Bromberg F, Margaritis D, Honavar V. Efficient Markov network structure discovery using independence tests. Journal of Artificial Intelligence Research 2009;35:449–85.

[10] Castelo R, Roverato A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n. Journal of Machine Learning Research 2006;7:2621–50.

[11] Friedman N, Pe’er D, Nachman I. Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm. In: Laskey KB, Prade H, editors. Proceedings of 15th conference on uncertainty in artificial intelligence (UAI). Morgan Kaufmann; 1999. p. 206–21.

[12] Larrañaga P, Sierra B, Gallego MJ, Michelena MJ, Picaza JM. Learning Bayesian networks by genetic algorithms: a case study in the prediction of survival in malignant skin melanoma. In: Keravnou ET, Garbay C, Baud RH, Wyatt JC, editors. Proceedings of the 6th conference on artificial intelligence in medicine in Europe (AIME). Springer; 1997. p. 261–72.

[13] Tsamardinos I, Brown LE, Aliferis CF. The Max–Min Hill-Climbing Bayesian network structure learning algorithm. Machine Learning 2006;65(1):31–78.

[14] Elidan G. Bayesian network repository; 2001 http://www.cs.huji.ac.il/site/labs/compbio/Repository

[15] Murphy P, Aha D. UCI machine learning repository; 1995 http://archive.ics.uci.edu/ml

[16] Friedman N, Goldszmidt M, Wyner A. Data analysis with Bayesian networks: a bootstrap approach. In: Laskey KB, Prade H, editors. Proceedings of the 15th annual conference on uncertainty in artificial intelligence (UAI). Morgan Kaufmann; 1999. p. 206–15.

[17] Efron B, Tibshirani R. An introduction to the bootstrap. London, UK: Chapman & Hall; 1993.

Cambridge University Press; 2008.

[19] Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 2003;19:2271–82.

[20] Nagarajan R, Datta S, Scutari M, Beggs ML, Nolen GT, Peterson CA. Functional relationships between genes associated with differentiation potential of aged myogenic progenitors. Frontiers in Physiology 2010;1(21):1–8.

[21] Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science 2005;308(5721):523–9.

[22] Lauritzen SL. Graphical models. Oxford, UK: Oxford University Press; 1996.

[23] Chickering DM. Optimal structure identification with greedy search. Journal of Machine Learning Research 2002;3:507–54.

[24] DeGroot MH, Schervish MJ. Probability and statistics. 4th ed. Reading, MA, USA: Addison-Wesley; 2011.

[25] Kolmogorov AN, Fomin SV. Elements of the theory of functions and functional analysis. Rochester, New York, USA: Graylock Press; 1957.

[26] Csiszár I, Shields P. Information theory and statistics: a tutorial. Boston, MA, USA: Now Publishers Inc.; 2004.

[27] Nocedal J, Wright SJ. Numerical optimization. New York, USA: Springer-Verlag; 1999.

[28] Beinlich IA, Suermondt HJ, Chavez RM, Cooper GF. The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In: Hunter J, Cookson J, Wyatt J, editors. Proceedings of the 2nd European conference on artificial intelligence in medicine (AIME). Springer-Verlag; 1989. p. 247–56.

[29] Abramson B, Brown J, Edwards W, Murphy A, Winkler RL. Hailfinder: a Bayesian system for forecasting severe weather. International Journal of Forecasting 1996;12(1):57–71.

[30] Binder J, Koller D, Russell S, Kanazawa K. Adaptive probabilistic networks with hidden variables. Machine Learning 1997;29(2-3):213–44.

[31] Tsamardinos I, Aliferis CF, Statnikov A. Algorithms for large scale Markov blanket discovery. In: Russell I, Haller SM, editors. Proceedings of the 16th international Florida artificial intelligence research society conference. AAAI Press; 2003. p. 376–81.

[32] Margaritis D. Learning Bayesian network model structure from data. Ph.D. thesis. School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, Available as Technical Report CMU-CS-03-153; May 2003.

[33] Hausser J, Strimmer K. Entropy inference and the James–Stein estimator, with application to nonlinear gene association networks. Statistical Applications in Genetics and Molecular Biology 2009;10:1469–84.

[34] Scutari M. bnlearn: Bayesian network structure learning, R package version 2.7; 2011 http://www.bnlearn.com/

[35] Scutari M. Learning Bayesian networks with the bnlearn R package. Journal of Statistical Software 2010;35(3):1–22.

[36] R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2011 http://www.R-project.org

[37] Scutari M, Brogini A. Bayesian network structure learning with permutation tests. Communications in Statistics – Theory and Methods 2012;41(16–17):3233–43. Special issue “Statistics for complex problems: permutation testing methods and related topics”. Proceedings of the conference “statistics for complex problems: the multivariate permutation approach and related topics”, Padova, June 14–15, 2010.

[38] Dash D, Druzdzel MJ. A hybrid anytime algorithm for the construction of causal models from sparse data. In: Laskey KB, Prade H, editors. Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI). Morgan Kaufmann; 1999. p. 142–9.

[39] Imoto S, Kim SY, Shimodaira H, Aburatani S, Tashiro K, Kuhara S, et al. Bootstrap analysis of gene networks based on Bayesian networks and nonparametric regression. Genome Informatics 2002;13:369–70.

[40] Chickering DM. A transformational characterization of equivalent Bayesian network structures. In: Besnard P, Hanks S, editors. Proceedings of the 11th conference on uncertainty in artificial intelligence (UAI). Morgan Kaufmann; 1995. p. 87–98.

[41] Cooper GF, Yoo C. Causal discovery from a mixture of experimental and observational data. In: Laskey KB, Prade H, editors. Proceedings of 15th conference on uncertainty in artificial intelligence (UAI). Morgan Kaufmann; 1999.
