Identifying significant edges in graphical models of molecular networks.

(1)

ContentslistsavailableatSciVerseScienceDirect

Artiﬁcial

Intelligence

in

Medicine

j o ur na l h o mep a g e :w w w . e l s e v i e r . c o m / l o c a t e / a i i m

Identifying

signiﬁcant

edges

in

graphical

models

of

molecular

networks

Marco

Scutari

a,∗

,

Radhakrishnan

Nagarajan

b

a_Genetics_Institute,_University_College_London,_Darwin_Building,_Gower_Street,_WC1E_6BT_London,_United_Kingdom

b_Division_of_Biomedical_Informatics,_Department_of_{Biostatistics,}_College_of_Public_Health,_University_of_Kentucky,₇₂₅_Rose_Street,_{Multidisciplinary}_Science_Bldg,_230F,_Lexington,_KY

40536-0082,USA

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received6December2011

Receivedinrevisedform

14December2012 Accepted16December2012 Keywords: Graphicalmodels Bayesiannetworks Modelaveraging L1norm Molecularnetworks

a

b

s

t

r

a

c

t

Objective:Modellingtheassociationsfromhigh-throughputexperimentalmoleculardatahasprovided

unprecedentedinsightsintobiologicalpathwaysandsignallingmechanisms.Graphicalmodelsand networkshaveespeciallyproventobeusefulabstractionsinthisregard.Adhocthresholdsareoften usedinconjunctionwithstructurelearningalgorithmstodeterminesigniﬁcantassociations.Thepresent studyovercomesthislimitationbyproposingastatisticallymotivatedapproachforidentifyingsigniﬁcant associationsinanetwork.

Methodsandmaterials:Anewmethodthatidentiﬁessigniﬁcantassociationsingraphicalmodelsby

estimatingthethresholdminimisingtheL1normbetweenthecumulativedistributionfunction(CDF)

oftheobservededgeconﬁdencesandthoseofitsasymptoticcounterpartisproposed.Theeffectiveness oftheproposedmethodisdemonstratedonpopularsyntheticdatasetsaswellaspubliclyavailable experimentalmoleculardatacorrespondingtogeneandproteinexpressionproﬁles.

Results:Theimprovedperformanceoftheproposedapproachisdemonstratedacrossthesyntheticdata

setsusingsensitivity,specificityandaccuracyasperformancemetrics.Theresultsarealso demon-stratedacrossvaryingsamplesizesandthreedifferentstructurelearningalgorithmswithwidelyvarying assumptions.Inallcases,theproposedapproachhasspecificityandaccuracycloseto1,whilesensitivity increaseslinearlyinthelogarithmofthesamplesize.Theestimatedthresholdsystematically outper-formscommonadhoconesintermsofsensitivitywhilemaintainingcomparablelevelsofspecificityand accuracy.Networksfromexperimentaldatasetsarereconstructedaccuratelywithrespecttotheresults fromtheoriginalpapers.

Conclusion:Currentstudiesusestructurelearningalgorithmsinconjunctionwithadhocthresholds

foridentifyingsignificantassociationsingraphicalabstractionsofbiologicalpathwaysandsignalling mechanisms.Suchanadhocchoicecanhavepronouncedeffectonattributingbiologicalsignificanceto theassociationsintheresultingnetworkandpossibledownstreamanalysis.Thestatisticallymotivated approachpresentedinthisstudyhasbeenshowntooutperformadhocthresholdsandisexpectedto alleviatespuriousconclusionsofsignificantassociationsinsuchgraphicalabstractions.

1. Introductionandbackground

Graphicalmodels[1,2]areaclassofstatisticalmodelswhich combinetherigourofaprobabilisticapproachwiththeintuitive representationofrelationshipsgivenbygraphs.Theyarecomposed byasetX={X1,X2,...,XN}ofrandomvariablesdescribingthe quan-titiesofinterestandagraphG=(V,E)inwhicheachnodeorvertex

v

∈Vis associatedwithoneoftherandomvariables inX(they areusuallyreferredtointerchangeably).Theedgese∈Eareused toexpressthedependencerelationshipsamongthevariablesinX. Thesetoftheserelationshipsisoftenreferredtoasthedependence

∗Correspondingauthor.

E-mailaddresses:[email protected](M.Scutari),[email protected]

(R.Nagarajan).

structureofthegraph. Differentclassesof graphsexpressthese

relationshipswithdifferentsemantics,whichhaveincommonthe principlethatgraphicalseparationoftwoverticesimpliesthe con-ditionalindependenceofthecorrespondingrandomvariables[2].

ThetwoexamplesmostcommonlyfoundinliteratureareMarkov

networks[3,4],whichuseundirectedgraphs,andBayesiannetworks

(BNs)[5,6],whichusedirectedacyclicgraphs.

Inprinciple,therearemanypossiblechoicesforthejoint

dis-tribution of X, dependingon the natureof the data. However,

literaturehasfocusedmostlyontwocases:thediscretecase[3,7], inwhichbothXandtheXiaremultinomialrandomvariables,and

thecontinuouscase[3,8],inwhichXismultivariatenormaland

theXiareunivariatenormalrandomvariables.Intheformer,the parametersofinterestaretheconditionalprobabilitiesassociated witheachvariable,usuallyrepresentedas conditional probabil-itytables;inthelatter,theparametersofinterestarethepartial

http://dx.doi.org/10.1016/j.artmed.2012.12.006

Open access under CC BY license.

(2)

correlationcoefﬁcientsbetweeneachvariableand itsneighbours (i.e.theadjacentnodesinG).

TheestimationofthestructureofthegraphGiscalledstructure

learning[1,4],andinvolvesdeterminingthegraphstructurethat

encodestheconditionalindependenciespresentinthedata.Ideally itshouldcoincidewiththedependencestructureofX,oritshould atleastidentifyadistributionascloseaspossibletothecorrectone intheprobabilityspace.Severalalgorithmshavebeenpresentedin theliteratureforthisproblem,thankstotheapplicationofmany results from probability, information and optimisation theory. Despitedifferencesintheoreticalbackgroundsand terminology, theycanallbegroupedintoonlythreeclasses:constraint-based

algorithms,thatarebasedonconditionalindependencetests;

score-based algorithms,that are based on goodness-of-ﬁt scores; and

hybridalgorithms,thatcombinetheprevioustwoapproaches.For

someexamplesseeBrombergetal.[9],CasteloandRoverato[10], Friedmanetal.[11],Larra ˜nagaetal.[12]andTsamardinosetal.

[13].

Ontheotherhand,thedevelopmentoftechniquesforassessing thestatisticalrobustnessofnetworkstructureslearnedfromdata (e.g.thepresenceofartefactsarisingfromnoisydata)hasbeen limited.Structurelearningalgorithmsarecommonlystudied

mea-suring differences from the true (known) structure of a small

numberofreferencedatasets[14,15].Theusefulnessofsuchan approachininvestigatingnetworkslearnedfromreal-worlddata setsislimited,sincethetruestructureoftheirprobability distribu-tionisunknown.

Amoresystematicapproachtomodelassessment,andin partic-ulartotheproblemofidentifyingstatisticallysigniﬁcantfeatures inanetwork, hasbeendevelopedbyFriedmanet al.[16]using bootstrap resampling [17] and model averaging [18]. It can be

summarisedasfollows:

1.Forb=1,2,...,m:

(a)sampleanewdatasetX∗_bfromtheoriginaldataXusingeither parametricornonparametricbootstrap;

(b)learnthestructureofthegraphicalmodelGb=(V,Eb)from

X∗_b.

2.Estimatetheprobabilitythateachpossibleedgeei,i=1,...,kis presentinthetruenetworkstructureG0=(V,E0)as

ˆ P(ei)= 1 m m

b=1 1{e_i∈E_b}, (1)

where1{e_i∈E_b}istheindicatorfunctionoftheevent{ei∈Eb}(i.e. itisequalto1ifei∈Eband0otherwise).

Theempiricalprobabilities ˆP(ei)areknownasedgeintensitiesor

arcstrengths,andcanbeinterpretedasthedegreeofconﬁdence

thateiispresentinthenetworkstructureG0 describingthetrue dependencestructureofX.1_However,_they_are_difﬁcult_to_evaluate,

becausetheprobabilitydistributionofthenetworksGbinthespace

ofthenetworkstructuresisunknown.Asaresult,thevalueofthe confidencethreshold(i.e.theminimumdegreeofconfidenceforan edgetobesignificantandthereforeacceptedasanedgeofG0)isan unknownfunctionofboththedataandthestructurelearning algo-rithm.Thisisaseriouslimitationintheidentificationofsignificant edgesandhasledtotheuseofadhoc,pre-definedthresholdsin spiteoftheimpactonmodelassessmentevidencedbyseveral stud-ies[16,19].AnexceptionisNagarajanetal.[20],whoseapproach willbediscussedbelow.

1 _The_{probabilities ˆ}_P₍_e

i)areinfactanestimatoroftheexpectedvalueofthe{0,

1}randomvectordescribingthepresenceofeachpossibleedgeinG0.Assuch,they

donotsumtooneandaredependentononeanotherinanontrivialway.

Apartfromthislimitation,Friedman’sapproachisverygeneral andcanbeusedinawiderangeofsettings.Firstofall,itcanbe appliedtoanykindofgraphicalmodelwithonlyminor adjust-ments(forexample,accountingforthedirectionoftheedgesin BNs,seeSection4).Nodistributionalassumptiononthedatais requiredinadditiontotheonesneededby thestructure learn-ingalgorithm.Noassumptionismadeonthelatter,either,soany score-based, constraint-based or hybridalgorithm can beused. Furthermore,parallelcomputingcaneasilybeusedtooffsetthe additionalcomputationalcomplexityintroducedbymodel averag-ing,becausebootstrapisembarrassinglyparallel.

Inthis paper,wepropose astatisticallymotivatedestimator fortheconfidencethresholdminimisingtheL1normbetweenthe cumulativedistributionfunction(CDF)oftheobservedconfidence levelsandtheCDFoftheconfidencelevelsoftheunknown net-workG0.Subsequently,wedemonstratetheeffectivenessofthe proposedapproachbyre-investigatingtwoexperimentaldatasets fromNagarajanetal.[20]andSachsetal.[21].

2. Selectingsigniﬁcantedges

Considertheempiricalprobabilities ˆP(ei)deﬁnedinEq.(1),and denotethemwith ˆp={pi,ˆ i=1,...,k}.ForagraphwithNnodes, k=N(N−1)/2.Furthermore,considertheorderstatistic

ˆ

p(·)=(ˆp(1),pˆ(2),...,pˆ(k)) with pˆ(1)pˆ(2)···pˆ(k) (2) derivedfrom ˆp.Itisintuitivelyclearthatthefirstelementsof ˆp(·) aremorelikelytobeassociatedwithnon-significantedges,and thatthelastelementsof ˆp(·)aremorelikelytobeassociatedwith significantedges.Theidealconfiguration ˜p(·)of ˆp(·)wouldbe ˜ p(i)=

1 if e(i)∈E0 0 otherwise , (3) thatisthesetofprobabilitiesthatcharacterisesanyedgeaseither signiﬁcantor non-signiﬁcant withoutany uncertainty. In other words,

˜

p(·)={0,...,0,1,...,1}. (4) Sucha conﬁgurationarisesfromthelimitcase inwhich allthe networksGbhaveexactlythesamestructure.Thismayhappenin

practicewithaconsistentstructurelearningalgorithmwhenthe samplesizeislarge[22,23].

Ausefulcharacterisationof ˆp(·)andp(˜·)canbeobtainedthrough theempiricalCDFsoftherespectiveelements,

Fpˆ(·)(x)= 1 k k

i=1 1{pˆ(i)<x} (5) and Fp˜(·)(x)=

⎧

⎨

⎩

0 if x∈(−∞,0) t if x∈[0,1) 1 if x∈[1,+∞) . (6)

Inparticular,tcorrespondstothefractionofelementsof ˜p(·)equal tozeroandisameasureofthefractionofnon-signiﬁcantedges.At thesametime,tprovidesathresholdforseparatingtheelements of ˜p(·),namely

e(i)∈E0⇔p˜(i)>Fp˜−(1·)(t) (7)

whereFp˜−(1·)(t)=inf_x_∈_R{Fp˜(·)(x)t}isthequantilefunction[24].

Moreimportantly,estimatingt fromdataprovidesa statisti-callymotivated threshold for separatingsigniﬁcant edgesfrom non-signiﬁcantones.Inpractice,thisamountstoapproximating

(3)

Fig.1. TheempiricalCDFFpˆ(·)(left),theCDFFp˜(·)(centre)andtheL1normbetweenthetwo(right),shadedingrey. theideal, asymptoticempirical CDF Fp˜(·) withits ﬁnite sample

estimateFpˆ(·).Suchanapproximationcanbecomputedinmany

differentways,dependingonthenormusedtomeasurethe dis-tancebetweenFpˆ(·)andFp˜(·)asafunctionoft.Commonchoicesare

theLpfamilyofnorms[25],whichincludestheEuclideannorm, andCsiszar’sf-divergences[26],whichincludeKullback–Leibler divergence.

TheL1norm L1(t; ˆp(·))=

|Fpˆ(·)(x)−Fp˜(·)(x;t)|dx (8)

appearstobeparticularlysuitedtothisproblem;anexampleis showninFig.1.Firstofall,notethatFpˆ(·) ispiecewiseconstant,

changingvalueonlyatthepoints ˆp(i);thisdescendsfromthe deﬁ-nitionofempiricalCDF.Therefore,fortheproblemathandEq.(8)

simpliﬁesto

L1(t; ˆp(_·))=

x_i∈{{0}∪pˆ(·)∪{1}}

|Fpˆ(·)(xi)−t|(xi+1−xi), (9)

whichcanbecomputedinlineartimefrom ˆp(·).Itsminimisationis alsostraightforwardusinglinearprogramming[27].Furthermore,

comparedtothemorecommonL2norm

L2(t; ˆp(·))=

[Fpˆ(·)(x)−Fp˜(·)(x;t)] 2 dx (10) ortheL_∞norm L∞(t; ˆp(·))= max x∈[0,1]{|Fpˆ(·)(x)−Fp˜(·)(x;t)|}, (11) theL1 normdoesnotplaceasmuchweightonlargedeviations comparedtosmallones,makingitrobustagainstawidevarietyof conﬁgurationsof ˆp(·).

Thentheidentiﬁcationofsigniﬁcantedgescanbethoughtof eitherasaleastabsolutedeviationsestimationoranL1approximation oftheform

ˆ

t=argmin

t∈[0,1] L1

(t; ˆp(·)) (12)

followedbytheapplicationofthefollowingrule:

e(i)∈E0⇔pˆ(i)>F_p_ˆ−1

(·)(ˆt). (13)

Notethat,eventhoughedgesareindividuallyidentifiedas signif-icantornon-significant,theyarenotidentifiedindependentlyof eachotherbecause ˆtisafunctionofthewhole ˆp(·).

Asimpleexampleisillustratedbelow.

Example1. Consideragraphicalmodelbasedonanundirected graph_GwithnodesetV=_{A,B,C,D_}.Thesetofpossibleedgesof Gcontains6elements:(A,B),(A,C),(A,D),(B,C),(B,D)and(C,D). Supposethatwehaveestimatedthefollowingconﬁdencevalues:

ˆ

pAB=0.2242, pACˆ =0.0460, pADˆ =0.8935, pBCˆ =0.3921,

ˆ pBD=0.7689, pCDˆ =0.9439. (14) Then ˆ_p(_·₎₌_{0.0460,0.2242,0.3921,0.7689,0.8935,0.9439_}and Fpˆ(·)(x)=

⎧

⎪

⎨

⎪

⎩

0 if x∈(−∞,0.0460) 1 6 if x∈[0.0460,0.2242) 2 6 if x∈[0.2242,0.3921) 3 6 if x∈[0.3921,0.7689) 4 6 if x∈[0.7689,0.8935) 5 6 if x∈[0.8935,0.9439) 1 if x∈[0.9439,+∞) . (15)

TheL1normtakestheform L1(t; ˆp(·))=|0−t|(0.0460−0)+

1 6−t

₍₀_.₂₂₄₂₋₀_.₀₄₆₀₎ +

2 6−t

₍₀_.₃₉₂₁₋₀_.₂₂₄₂₎ +

3 6−t

₍₀_.₇₆₈₉₋₀_.₃₉₂₁₎ +

4₆−t

(0.8935−0.7689) +

5₆−t

(0.9439−0.8935) +|1−t|(1−0.9439) (16)

and is minimised for ˆt=0.4999816. Therefore, an edge is

deemed signiﬁcant if its conﬁdence is strictly greater than

F_p−_ˆ₍1

·)(0.4999816)=0.3921,or,equivalently,ifithasconﬁdenceof

atleast0.7689;only(A,D),(B,D)and(C,D)satisfythiscondition (Fig.2).

3. Simulationresults

Wetestedtheproposedapproachonsyntheticdatasetsusing threeestablishedperformancemeasures:sensitivity,speciﬁcityand

accuracy.Sensitivityisgivenbytheproportionofedgesofthetrue

networkstructurethathavebeencorrectlyidentiﬁedassigniﬁcant.

Speciﬁcityisgivenbytheproportionoftheedgesmissingfromthe

truenetworkstructurethathavebeencorrectlyidentifiedas non-significant.Accuracyisgivenbytheproportionofedgescorrectly identifiedaseithersignificantornon-significantoverthesetofall

(4)

Fig.2.TheCDFsFpˆ(·)andFp˜(·)(ˆt),respectivelyinblackandgrey(left),andtheL1(t; ˆp(·))norm(right)fromExample1. possibleedges.Tothatend,wegenerated400datasetsofvarying

sizes(100,200,500,1000,2000,5000,10,000and20,000)from

threediscreteBNscommonlyusedasbenchmarks:

•theALARMnetwork[28],anetworkdesignedtoprovideanalarm messagesystemforintensivecareunitpatientmonitoring.Its

truestructure iscomposedby37nodesand 46edges(of666

possibleedges),anditsprobabilitydistributionhas509 parame-ters;

•theHAILFINDER network[29],anetwork designedtoforecast severesummerhailinnortheasternColorado.Itstruestructure iscomposedby56nodesand66edges(of1540possibleedges), anditsprobabilitydistributionhas2656parameters;

•theINSURANCEnetwork[30],anetworkdesignedtoevaluatecar insurancerisks.Itstruestructureiscomposedby27nodesand 52edges(of351possibleedges),anditsprobabilitydistribution has984parameters.

Threedifferentstructurelearningalgorithmswereconsidered: •theIncrementalAssociationMarkovBlanket(IAMB)

constraint-basedalgorithm[31].IAMBwasusedtolearntheMarkovblanket ofeachnodeasapreliminarysteptoreducethenumberofits candidateparentsandchildren;anetworkstructuresatisfying theseconstraintsisthenidentiﬁedasintheGrow–Shrink algo-rithm[32].Conditionalindependencetestswereperformedusing ashrinkagemutualinformationtest[33]with˛=0.05.Suchatest, unlikethemorecommonasymptotic2_mutual_information_test, isvalidandhasbeenshowntoworkreliablyevenonsmall sam-ples.An˛=0.01wasalsoconsidered;however,theresultswere notsigniﬁcantlydifferentfrom˛=0.05andwillnotbediscussed separatelyinthispaper;

•theHillClimbing(HC)score-basedalgorithmwiththeBayesian Dirichletequivalentuniform(BDeu)scorefunction,theposterior distributionofthenetworkstructurearisingfromauniformprior distribution[7].Theequivalentsamplesizewassetto10.Thisis thesameapproachdetailedinFriedmanetal.[16],althoughthey consideredonly100(insteadof500)bootstrapsamplesforeach scenario;

•theMax–MinHillClimbing(MMHC)hybridalgorithm[13],which

combinestheMax–MinParentsandChildren(MMPC)andHC.

TheconditionalindependencetestusedinMMPCandthescore

functionsusedin HCare theones illustratedin theprevious points.

Theperformancemeasureswereestimatedforeachcombinationof network,samplesizeandstructurelearningalgorithmasfollows:

1.asampleoftheappropriatesizewasgeneratedfromeitherthe

ALARM,theHAILFINDERortheINSURANCEnetwork;

2.weestimatedtheconﬁdencevalues ˆpforallpossibleedgesfrom 200and500nonparametricbootstrapsamples.Sinceresultsare verysimilar,theywillbediscussedtogether;

3.we estimated theconﬁdence threshold ˆt, and identiﬁed

sig-niﬁcantand non-signiﬁcant edgesin the network. Note that

thedirectionoftheedgespresentinthenetworkstructureis effectivelyignored,becausetheproposedapproachfocusesonly thoseedges’presence.Signiﬁcantedgeswerethenusedtobuild anaveragednetworkstructure;

4.wecomputedsensitivity,speciﬁcityandaccuracycomparingthe averagednetworkstructuretothetrueone,whichisknownfrom theliterature.

Thesestepswererepeated50timesinordertoestimateboththe performancemeasuresandtheirvariability.

Allthesimulationsand thethresholdsestimation were

per-formed with the bnlearn package [34,35] for R [36], which

implementsseveralmethodsforstructurelearning,parameter esti-mationandinferenceonBNs(includingtheapproachproposedin Section2).

Theaveragevaluesofsensitivity,speciﬁcity,accuracyand ˆtfor thenetworksacrossvarioussamplesizes(n)areshowninFigs.3

(IAMB),4(HC)and5(MMHC).Sincethenumberofparametersis non-constantacrossthenetworks,anormalisedratioofthesizeof thegeneratedsampletothenumberofparametersofthenetwork (i.e.n/p)isusedasareferenceinsteadoftherawsamplesize(i.e.n). Intuitively,asampleofsizeofn=1000maybelargeenoughto esti-matereliablyasmallnetworkwithfewparameters,sayp=100,but itmaybetoosmallforalargernetworkwithp=10,000.Onarelated note,densernetworks(i.e.networkswithalargenumberofedges

comparedtothenumberofnodes)usuallyhaveahighernumber

ofparametersthansparserones(i.e.networkswithfewedges). Severalinterestingtrendsemergefromtheestimated quanti-ties.Asexpected,sensitivityincreasesasthesamplesizegrows. ThisprovidesanempiricalveriﬁcationthatthecombinationofHC andBDeisindeedconsistent,asprovedbyChickering[23].No anal-ogousresultexistsforIAMBorMMHC,althoughintuitivelytheir sensitivityshouldimproveaswellwiththesamplesizeduetothe consistencyoftheconditionalindependencetestsusedbythose algorithms.Moreover,evenwhenn/pisextremelylowa substan-tialproportionofthenetworkstructurecanbecorrectlyidentiﬁed. Whenn/pisatleast0.2(i.e.1observationevery5parameters),HC

successfullyrecoversfromabout50%(forALARMandINSURANCE)

to75%(forHAILFINDER)ofthetruenetworkstructure.Incontrast,

IAMBandMMHCsuccessfullyrecoverfromabout45%to50%of

HAILFINDER,butonlyabout26%to40%ofALARMand19%to30%

(5)

n

p

Accur

ac

y

Specificity

Sensitivity

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(g)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(h)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(i)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(d)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(e)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(f)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(a)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(b)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(c)

Fig.3.Averagesensitivity,speciﬁcityandaccuracyofIAMBfortheALARM,HAILFINDERandINSURANCEnetworksovern/p.Barsrepresent95%conﬁdenceintervals,and

thedottedverticallineisn=p.

thesparsity-inducingeffectofshrinkagetests[37],whichincrease speciﬁcityatthecostofsensitivity.Forvaluesofn/pgreaterthan1 (i.e.moreobservationsthanparameters)theincreasein sensitiv-ityslowsdownforallcombinationsofnetworksandalgorithms, reachingaplateau.

Overall, sensitivity seems to have an hyperbolic behaviour,

growingveryrapidlyforn/p1andthenconverging asymptoti-callyto1forn/p>1.Thusweexpectittoincreaselinearlyona log(n/p)scale.Theslowerconvergencerateobservedforthe INSUR-ANCEnetworkcomparedtotheothertwonetworksislikelytobea consequenceofitshighedgedensity(1.92edgespernode)relative

toALARM(1.24)andHAILFINDER(1.17).Slowerconvergencemay

alsobeanoutcomeofinherentlimitationsofstructurelearning algorithmsinthecaseofdensenetworks[1,38].

Furthermore,bothspeciﬁcityandaccuracyarecloseto1forall thenetworksandthesamplesizesconsideredintheanalysis,even

atverylown/pratios.Suchhighvaluesarearesultofthelow

num-beroftrueedgesinALARM,HAILFINDERandINSURANCEcompared

totherespectivenumbersofpossibleedges.Thisistruein

partic-ularfortheALARMandHAILFINDERnetworks.Thelowervalues

observedfortheINSURANCEnetworkcanbeattributedagaintothe inherentlimitationsofstructurelearningalgorithmsinmodelling densenetworks.Thesparsity-inducingeffectofshrinkagetestsis againevidentforbothIAMBandMMHC;bothspeciﬁcityand accu-racyactuallydecreaseslightlyasn/pgrowsandtheinﬂuenceof shrinkagedecreases.

Itisalsoimportanttonotethat,asshowninFig.6,theaverage valueoftheconﬁdencethreshold ˆt doesnot exhibitany appar-enttrendasafunctionofn/p.Inaddition,itsvariabilitydoesnot appeartodecreaseasn/pgrows.Thissuggeststhattheoptimal ˆt

dependsstronglyonthespeciﬁcsampleusedintheestimationof theconﬁdencevalues ˆp,evenforrelativelylargesamples.However,

(6)

n

p

Accur

ac

y

Specificity

Sensitivity

ALARM

HAILFINDE

R

INSURANCE

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(g)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(h)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(i)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(d)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(e)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(f)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(a)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(b)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(c)

Fig.4. Averagesensitivity,speciﬁcityandaccuracyofHCfortheALARM,HAILFINDERandINSURANCEnetworksovern/p.Barsrepresent95%conﬁdenceintervals,andthe

dottedverticallineisn=p.

speciﬁcity,sensitivityandaccuracyestimatesappearontheother handtobeverystable(allconﬁdenceintervalsshowninFigs.3–5

areverysmall).

From Fig.6, it is alsoapparent that the threshold estimate ˆ

t canbesignificantly lowerthan 1even forhighvalues of n/p. Thisbehaviourisobservedconsistentlyacrossthethreenetworks (ALARM,HAILFINDER,INSURANCE).Theseresultsareinsharp con-trastwithadhocthresholdscommonly foundintheliterature, whichareusuallylarge[e.g.0.8in16].Alargethresholdcan cer-tainlybeusefulinexcludingnoisyedges,whichmayresultfrom artefactsatthemeasurementanddynamicallevelsandfromfinite sample-sizeeffects.However,whilealargeadhocthresholdcan certainlyminimisefalsepositives,itisalsoexpectedtoaccentuate falsenegatives.Sucha conservativechoicecanhavea profound impactonthe network topology, resulting in artificiallysparse networks.ThethresholdestimatorintroducedinSection2achieves

a goodtrade-offbetweenincorrectlyidentifyingnoisyedgesas significantanddisregardingsignificantones.Asanexample,the differenceinsensitivity,specificityandaccuracybetweenthe esti-matedthreshold ˆtandseverallarge,adhocones(t=0.70,0.80,0.90, 0.95)forHCisshowninFig.7(thecorrespondingplotsforIAMB andMMHCaresimilar,andareomittedforbrevity).The thresh-old ˆtsystematicallyoutperformstheadhocthresholdsintermsof sensitivity,inparticularforlowvaluesofn/p.Thedifference pro-gressivelyvanishesasn/pgrows.Allthresholdshavecomparable levelsofspecificityandaccuracy.

Onarelatednote,falsenegativesacrossadhocthresholdsmay alsobeattributedtothefactthatedgesareconsideredasseparate, independententitiesasfarasthechoiceofthethresholdis con-cerned–i.e.a0.99thresholdisexpectedtoidentifyassigniﬁcant about1in100edgesinthenetwork.However,inabiologicalsetting thestructureofthenetworkisanabstractionfortheunderlying

(7)

n

p

Accur

ac

y

Specificity

Sensitivity

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(g)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(h)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(i)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(d)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(e)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(f)

0.2 0. 4 0. 6 0. 8 1.0 0 10 20 30 40

(a)

0.2 0. 4 0. 6 0. 8 1.0 0 2 4 6

(b)

0.2 0. 4 0. 6 0. 8 1.0 0 5 10 15 20

(c)

Fig.5. Averagesensitivity,speciﬁcityandaccuracyofMMHCfortheALARM,HAILFINDERandINSURANCEnetworksovern/p.Barsrepresent95%conﬁdenceintervals,and

thedottedverticallineisn=p.

functional mechanisms; as an example, consider the signalling

pathwaysin atranscriptionalnetwork.Insuchacontext,edges areclearlynotindependent,butappearinconcertalongsignalling pathways.Thisinterdependenceisaccountedforintheproposed approach(thatisbasedonthefullsetˆpofestimatedconﬁdence

values),but it is not commonly consideredin choosing ad hoc

thresholds.Forinstance,edgesappearing withindividual confi-dencevaluesfarbelowthe[0.80,1]rangemaynotnecessarilybe identifiedassignificantbyanadhocthreshold.However,the pro-posedapproachrecognisestheirinterplayandcorrectlyidentifies themassignificant.Thisaspect,alongwiththestrongdependence betweentheoptimal ˆtandtheactualsamplethenetworkislearned from,maydiscouragetheuseofanapriorioradhocconfidence thresholdinfavourofmorestatisticallymotivatedalternatives.

4. Applicationstomolecularexpressionproﬁles

In order to demonstrate the effectiveness of the proposed

approachonexperimentaldatasets, wewillexaminetwogene

expressiondatasetsfromNagarajanetal.[20]andSachsetal.[21]. Alltheanalyseswillbeperformedagainwiththebnlearnpackage. FollowingImotoetal.[39],wewillconsidertheedgesoftheBNs disregardingtheirdirectionwhendeterminingtheirsignificance. Edgesidentifiedassignificantwillthenbeorientedaccordingtothe directionobservedwiththehighestfrequencyinthebootstrapped networksGb.Whilesimplistic,thiscombinedapproachallowsthe

proposedestimatortohandletheedgeswhosedirectioncannot

bedeterminedbythestructurelearningalgorithmpossiblydueto scoreequivalentstructures[40].

(8)

n

p

t

^

ALARM

HAILFINDER

INSURANCE

MMHC

HC

IAMB

0.2 0.4 0.6 0.8 0 10 20 30 40

(g)

0 2 4 6

(h)

0 5 10 15 20

(i)

0.2 0.4 0.6 0.8 0 10 20 30 40

(d)

0 2 4 6

(e)

0 5 10 15 20

(f)

0.2 0.4 0.6 0.8 0 10 20 30 40

(a)

0 2 4 6

(b)

0 5 10 15 20

(c)

Fig.6.Averageestimatedsigniﬁcancethreshold(ˆt)fortheALARM,HAILFINDERandINSURANCEnetworksovern/p.Barsrepresent95%conﬁdenceintervals.

4.1. Differentiationpotentialofagedmyogenicprogenitors

Inarecentstudy[20]theinterplaybetweencrucialmyogenic (Myogenin,Myf-5,Myo-D1), adipogenic(C/EBP˛,DDIT3,FoxC2, PPAR),andWnt-relatedgenes(Lrp5,Wnt5a)orchestratingaged myogenicprogenitordifferentiationwasinvestigatedby Nagara-janetal.usingclonalgeneexpressionprofilesinconjunctionwith BNstructurelearningtechniques.Theobjectivewastoinvestigate possiblefunctionalrelationshipsbetweenthesediverse differen-tiationprogramsreflectedbytheedgesintheresultingnetworks. TheclonalexpressionprofilesweregeneratedfromRNAisolated

across 34 clones of myogenic progenitors obtained across

24-month-oldmiceandreal-timeRT-PCRwasusedtoquantifythe

geneexpression.Suchanapproachimplicitlyaccommodates inher-entuncertaintyingeneexpressionproﬁlesandjustiﬁedthechoice ofprobabilisticmodels.

In the same study, the authors proposed a non-parametric

resampling approach to identify signiﬁcant functional

relationships. Starting from Friedman’s deﬁnition of

conﬁ-dencelevels(Eq.(1)),theycomputedthenoise ﬂoordistribution ˆ

f={_fˆ₁_,_fˆ₂_,_._._._,_fkˆ_}_of_the_edges_by_randomly_permuting_the

expres-sionofeach gene andperformingBNstructure learning onthe

resultingdatasets.Anedgeeiwasdeemedsigniﬁcantif ˆpi>max

{_fˆ l∈ˆf}

ˆ

fl.

In addition to revealing several functional relationships docu-mentedinliterature,thestudyalsorevealednewrelationshipsthat wereimmunetothechoiceofthestructurelearningtechniques. Theseresultswereestablishedacrossclonalexpressiondata

nor-malisedusingthreedifferenthousekeepinggenesandnetworks

learnedwiththreedifferentstructurelearningalgorithms. Theapproachpresentedin[20]hastwoimportantlimitations. First,thecomputationalcostofgeneratingthenoiseﬂoor distri-butionmaydiscourageitsapplicationtolargedatasets.Infact, thegenerationoftherequiredpermutationsofthedataandthe subsequentstructurelearning(inadditiontothebootstrap resam-plingandthesubsequentlearningrequiredfortheestimationof

(9)

diff

erence

ALARM

0.0 0.1 0.2 0.3 0 10 20 30 40

(a)

0 10 20 30 40

(b)

0 10 20 30 40

(c)

t = 0.70

t = 0.80

t = 0.90

t = 0.95

diff

erence

HAILFINDER

0.00 0.05 0.10 0.15 0.20 0.25 0 2 4 6

(d)

0 2 4 6

(e)

0 2 4 6

(f)

t = 0.70

t = 0.80

t = 0.90

t = 0.95

n

p

diff

erence

INSURANCE

0.00 0.05 0.10 0.15 0.20 0 5 10 15 20

(g)

0 5 10 15 20

(h)

0 5 10 15 20

(i)

t = 0.70

t = 0.80

t = 0.90

t = 0.95

Fig.7. Differenceinsensitivity,speciﬁcityandaccuracybetweentheestimatedthreshold ˆtandseveraladhocones(t=0.70,0.80,0.90,0.95)forHCovern/p.

ˆ

p)essentiallydoublesthecomputationalcomplexityofFriedman’s approach.Second,alargesamplesizemayresultinanextremely lowvalueofmax(ˆf),andthereforeinalargenumberoffalse posi-tives.

Inthepresentstudy,were-investigatethemyogenic progen-itorclonalexpressiondatanormalisedusinghousekeepinggene GAPDHwiththeapproachoutlinedinSection2andtheIAMB algo-rithm.Itisimportanttonotethatthisstrategywasalsousedin theoriginalstudy[20],henceitschoice.Theorderstatistic ˆp(·)was computedfrom500bootstrapsamples.TheempiricalCDFFpˆ(·),the

estimatedthresholdandthenetworkwiththesigniﬁcantedgesare showninFig.8.

Alledgesidentifiedassignificantintheearlierstudy[20]across thevariousstructurelearningtechniquesandnormalisation tech-niqueswerealsoidentifiedbytheproposedapproach(seeFig.3D in[20]).IncontrasttoFig.8,theoriginalstudyusingIAMBand nor-malisationwithrespecttoGAPDHalonedetectedaconsiderable

numberofadditionaledges(seeFig.3Ain[20]).Thusitisquite possiblethattheapproachproposedinthispaperreducesthe num-beroffalsepositivesandspuriousfunctionalrelationshipsbetween thegenes.Furthermore,theapplicationoftheproposedapproach inconjunctionwiththealgorithmfromImotoetal.[39]reveals directionalityoftheedges,incontrasttotheundirectednetwork reportedbyNagarajanetal.[20].

4.2. Proteinsignallinginﬂowcytometrydata

Inalandmarkstudy,Sachsetal.[21]usedBNsforidentifying causal inﬂuences in cellular signalling networks from

simulta-neous measurement of multiple phosphorylated proteins and

phospholipidsacrosssinglecells.Theauthorsusedabatteryof per-turbationsinadditiontotheunperturbeddatatoarriveattheﬁnal

network representation. Agreedy searchscore-based algorithm

(10)

Fig.8.TheempiricalCDFFpˆ(·)forthemyogenicprogenitorsdatafromNagarajanetal.[20](ontheleft),andthenetworkstructureresultingfromtheselectionofthe

signiﬁcantedges(ontheright).TheverticaldashedlineintheplotofFpˆ(·)representsthethresholdF

−1 ˜ p(·)(ˆt).

Fig.9.TheempiricalCDFof ˆp(·)fortheﬂowcytometrydatafromSachsetal.[21](ontheleft),andthenetworkstructureresultingfromtheselectionofthesigniﬁcantedges

(ontheright).TheverticaldashedlineintheplotofFˆp(·)representsthethresholdF

−1 ˜ p(·)(ˆt). accommodatesforvariationsinthejointprobabilitydistribution acrosstheunperturbedandperturbeddatasetswasusedtoidentify theedges[41].Moreimportantly,signiﬁcantedgeswereselected usinganarbitrarysigniﬁcancethresholdof0.85(seeFig.3,[21]).A detailedcomparisonbetweenthelearnednetworkandfunctional relationshipsdocumentedintheliteraturewaspresentedinthe

samestudy.

We investigated theperformance of theproposed approach

inidentifyingsigniﬁcantfunctionalrelationshipsfromthesame

experimental data. However, we limit ourselves to the data

recorded without applying any molecular intervention, which

amountto854observationsfor11variables.Wecompareand con-trastourresultstothose obtainedusing anarbitrarythreshold of0.85.Thecombinationofperturbedandnon-perturbed obser-vationsstudiedinSachsetal.[21]cannotbeanalysedwithour approach,becauseeachsubsetofthedatafollowsadifferent prob-abilitydistributionandthereforethereisnosingle“true”network G0.Analysisoftheunperturbeddatausingtheapproachpresented inSection2revealstheedgesreportedintheoriginalstudy.The resultingnetworkisshowninFig.9alongwithFpˆ(·)andthe

esti-matedthreshold.Fromtheplot ofFpˆ(·) we canclearly seethat

significantandnon-significantedgespresentwidelydifferent lev-elsofconfidence,tothepointthatanythresholdbetween0.4and 0.9resultsinthesamenetworkstructure.This,alongwiththevalue oftheestimatedthreshold(ˆp(i)0.93),showsthatthenoisinessof thedatarelativetothesamplesizeislow.Inotherwords,the sam-pleisbigenoughforthestructurelearningalgorithmtoreliably selectthesignificantedges.Theedgesidentifiedbytheproposed

methodwerethesameasthoseidentiﬁedby[21]usinggeneral

stimulatorycuesexcludingthedatawithinterventions(seeFig. 4Ain[21],Supplementaryinformation).Incontrastto[21],using Imotoetal.[39]approachinconjunctionwiththeproposed thresh-oldingmethodwewereabletoidentifythedirectionsoftheedges inthenetwork.Thedirectionscorrelatedwiththefunctional rela-tionshipsdocumentedinliterature(Table3,[21],Supplementary information)aswellaswiththedirectionsoftheedgesinthe net-worklearnedfrombothperturbedandunperturbeddata(Fig.3,

[21]).

5. Conclusions

Graphicalmodelsandnetworkabstractionshaveenjoyed

con-siderableattentionacrossthebiologicalandmedicalcommunities. Suchabstractionsareespecially usefulin decipheringthe

inter-actions between the entities of interest from high-throughput

observationaldata.Classicaltechniquesforidentifyingsigniﬁcant edgesintheresultinggraphrely onadhocthresholdingofthe edgeconﬁdenceestimatedfromacrossmultipleindependent

real-isationsof networkslearned fromthegiven data.Largead hoc

thresholdvaluesareparticularlycommon,andarechoseninan efforttominimise noisyedgesin theresulting network. While usefulinminimisingfalsepositives,suchachoicecanaccentuate falsenegativeswithpronouncedeffectonthenetworktopology.

The present study overcomesthis caveatby proposing a more

straightforwardandstatisticallymotivatedapproachfor identify-ingsigniﬁcantedgesinagraphicalmodel.Theproposedestimator

(11)

levelsand theCDFoftheirasymptotic,idealconﬁguration.The effectivenessoftheproposedapproachisdemonstratedonthree syntheticdatasets[28–30]andongeneexpressiondatasetsacross twodifferentstudies[20,21].However,theapproachisdeﬁnedina moregeneralsettingandcanbeappliedtomanyclassesofgraphical modelslearnedfromanykindofdata.

Acknowledgements

ThisworkwassupportedbytheUKTechnologyStrategyBoard

(TSB) and Biotechnology & Biological Sciences Research Coun-cil(BBSRC),grantTS/I002170/1(MarcoScutari)andtheNational

LibraryofMedicine,grantR03LM008853(Radhakrishnan

Nagara-jan).MarcoScutariwouldalsoliketothankAdrianaBroginifor proofreadingthepaperandprovidingusefulsuggestions.

References

[1]KollerD,FriedmanN.Probabilisticgraphicalmodels:principlesandtechniques.

Cambridge,MA,USA:MITPress;2009.

[2]PearlJ.Probabilisticreasoninginintelligentsystems:networksofplausible

inference.SanMateo,CA,USA:MorganKaufmann;1988.

[3]WhittakerJ.Graphicalmodelsinappliedmultivariatestatistics.Hoboken,NJ,

USA:Wiley;1990.

[4]EdwardsDI.Introductiontographicalmodelling.2nded.NewYork,USA:

Springer;2000.

[5]NeapolitanRE.LearningBayesiannetworks.EnglewoodCliffs,NJ,USA:Prentice

Hall;2003.

[6]KorbK,NicholsonA.Bayesianartiﬁcialintelligence.2nded. London,UK:

Chapman&Hall;2010.

[7]Heckerman D, Geiger D, Chickering DM. Learning Bayesian networks:

the combination of knowledge and statistical data. Machine Learning

1995;20(3):197–243,availableasMicrosoftTechnicalReportMSR-TR-94-09.

[8]GeigerD,HeckermanD.LearningGaussiannetworks,Tech.rep.,Microsoft

Research,Redmond,Washington,MicrosoftTechnicalReportMSR-TR-94-10;

1994.

[9]BrombergF,MargaritisD,HonavarV.EfﬁcientMarkovnetworkstructure

dis-coveryusingindependencetests.JournalofArtiﬁcialIntelligenceResearch

2009;35:449–85.

[10]CasteloR,RoveratoA.ArobustprocedureforGaussiangraphicalmodelsearch

from microarraydatawith plarger thann.Journalof MachineLearning

Research2006;7:2621–50.

[11]FriedmanN,Pe’erD,NachmanI.LearningBayesiannetworkstructurefrom

massivedatasets:the“sparsecandidate”algorithm.In:LaskeyKB,PradeH,

editors.Proceedingsof15thconferenceonuncertaintyinartiﬁcialintelligence

(UAI).MorganKaufmann;1999.p.206–21.

[12]Larra ˜nagaP,SierraB,GallegoMJ,MichelenaMJ,PicazaJM.LearningBayesian

networksbygeneticalgorithms:acasestudyinthepredictionofsurvivalin

malignantskinmelanoma.In:KeravnouET,GarbayC,BaudRH,WyattJC,

edit-ors.Proceedingsofthe6thconferenceonartiﬁcialintelligenceinmedicinein

Europe(AIME).Springer;1997.p.261–72.

[13]TsamardinosI,BrownLE,AliferisCF.TheMax–MinHill-ClimbingBayesian

networkstructurelearningalgorithm.MachineLearning2006;65(1):31–78.

[14]Elidan G.Bayesian networkrepository;2001 http://www.cs.huji.ac.il/site/

labs/compbio/Repository

[15]MurphyP,AhaD.UCImachinelearningrepository;1995http://archive.ics.

uci.edu/ml

[16]FriedmanN,GoldszmidtM,WynerA.DataanalysiswithBayesiannetworks:

abootstrapapproach.In:LaskeyKB,PradeH,editors.Proceedingsofthe15th

annualconferenceonuncertaintyinartiﬁcialintelligence(UAI).Morgan

Kauf-mann;1999.p.206–15.

[17]EfronB,TibshiraniR.Anintroductiontothebootstrap.London,UK:Chapman

&Hall;1993.

CambridgeUniversityPress;2008.

[19]HusmeierD.Sensitivityandspeciﬁcityofinferringgeneticregulatory

inter-actions from microarray experiments with dynamic Bayesian networks.

Bioinformatics2003;19:2271–82.

[20]NagarajanR,DattaS,ScutariM,BeggsML,NolenGT,PetersonCA.Functional

relationshipsbetweengenesassociatedwithdifferentiationpotentialofaged

myogenicprogenitors.FrontiersinPhysiology2010;1(21):1–8.

[21]SachsK, Perez O, Pe’er D, Lauffenburger DA, NolanGP. Causal

protein-signalingnetworks derivedfrommultiparametersingle-cell data.Science

2005;308(5721):523–9.

[22]LauritzenSL.Graphicalmodels.Oxford,UK:OxfordUniversityPress;1996.

[23]ChickeringDM.Optimalstructureidentiﬁcationwithgreedysearch.Journalof

MachineLearningResearch2002;3:507–54.

[24]DeGrootMH,SchervishMJ.Probabilityandstatistics.4thed.Reading,MA,USA:

Addison-Wesley;2011.

[25]KolmogorovAN,FominSV.Elementsofthetheoryoffunctionsandfunctional

analysis.Rochester,NewYork,USA:GraylockPress;1957.

[26]CsiszárI,ShieldsP.Informationtheoryandstatistics:atutorial.Boston,MA,

USA:NowPublishersInc.;2004.

[27]NocedalJ,WrightSJ.Numericaloptimization.NewYork,USA:Springer-Verlag;

1999.

[28]BeinlichIA,SuermondtHJ,ChavezRM,CooperGF.TheALARMmonitoring

system:acasestudywithtwoprobabilisticinferencetechniquesforbelief

networks.In:HunterJ,CooksonJ,WyattJ,editors.Proceedingsofthe2nd

Euro-peanconferenceonartiﬁcialintelligenceinmedicine(AIME).Springer-Verlag;

1989.p.247–56.

[29]AbramsonB,BrownJ,EdwardsW,MurphyA,WinklerRL.Hailﬁnder:aBayesian

systemforforecastingsevereweather.InternationalJournalofForecasting

1996;12(1):57–71.

[30]BinderJ,KollerD,RussellS,KanazawaK.Adaptiveprobabilisticnetworkswith

hiddenvariables.MachineLearning1997;29(2-3):213–44.

[31]TsamardinosI,AliferisCF,StatnikovA.AlgorithmsforlargescaleMarkov

blanketdiscovery.In:RussellI,HallerSM,editors.Proceedingsofthe16th

internationalFloridaartiﬁcialintelligenceresearchsocietyconference.AAAI

Press;2003.p.376–81.

[32]MargaritisD.LearningBayesiannetworkmodelstructurefromdata.Ph.D.

the-sis.SchoolofComputerScience,Carnegie-MellonUniversity,Pittsburgh,PA,

AvailableasTechnicalReportCMU-CS-03-153;May2003.

[33]HausserJ,StrimmerK.EntropyinferenceandtheJames–Steinestimator,with

applicationtononlineargeneassociationnetworks.StatisticalApplicationsin

GeneticsandMolecularBiology2009;10:1469–84.

[34]ScutariM.bnlearn:Bayesiannetworkstructurelearning,Rpackageversion2.7;

2011http://www.bnlearn.com/

[35]ScutariM.LearningBayesiannetworkswiththebnlearnRpackage.Journalof

StatisticalSoftware2010;35(3):1–22.

[36]RDevelopmentCoreTeam.R:alanguageandenvironmentforstatistical

computing.Vienna, Austria:RFoundationforStatisticalComputing;2011

http://www.R-project.org

[37]Scutari M, Brogini A. Bayesian network structure learning with

per-mutation tests. Communications in Statistics – Theory and Methods

2012;41(16–17):3233–43.Specialissue“Statisticsforcomplexproblems:

per-mutationtestingmethodsandrelatedtopics”.Proceedingsoftheconference

“statisticsforcomplexproblems:themultivariatepermutationapproachand

relatedtopics”,Padova,June14–15,2010.

[38]DashD,DruzdzelMJ.Ahybridanytimealgorithmfortheconstructionofcausal

modelsfromsparsedata.In:LaskeyKB,PradeH,editors.Proceedingsofthe15th

conferenceonuncertaintyinartiﬁcialintelligence(UAI).MorganKaufmann;

1999.p.142–9.

[39]ImotoS,KimSY,ShimodairaH,AburataniS,TashiroK,KuharaS,etal.Bootstrap

analysisofgenenetworksbasedonBayesiannetworksandnonparametric

regression.GenomeInformatics2002;13:369–70.

[40]ChickeringDM.AtransformationalcharacterizationofequivalentBayesian

networkstructures.In:BesnardP,HanksS,editors.Proceedingsofthe11th

conferenceonuncertaintyinartiﬁcialintelligence(UAI).MorganKaufmann;

1995.p.87–98.

[41]CooperGF, Yoo C.Causaldiscoveryfroma mixtureofexperimentaland

observationaldata.In:LaskeyKB,PradeH,editors.Proceedingsof15th

confer-enceonuncertaintyinartiﬁcialintelligence(UAI).MorganKaufmann;1999.

SciVerse

w w w . e l s e v i e r . c o m / l o c a t e / a i i m

CC BY license.

http://www.cs.huji.ac.il/site/labs/compbio/Repository

http://archive.ics.uci.edu/ml

http://www.bnlearn.com/

http://www.R-project.org