Chain event graph MAP model selection


University of Warwick institutional repository: http://go.warwick.ac.uk/wrap


Author(s): PA Thwaites, G Freeman and JQ Smith

Article Title: Chain Event Graph MAP model selection

Year of publication: 2009

Link to published article: http://www2.warwick.ac.uk/fac/sci/statistics/crism/research/2009/paper09-07


Peter A. Thwaites, Guy Freeman and Jim Q. Smith
Department of Statistics
University of Warwick
Coventry, UK, CV4 7AL

Abstract

When looking for general structure from a finite discrete data set it is quite common to search over the class of Bayesian Networks (BNs). The class of Chain Event Graph (CEG) models is however much more expressive and is particularly suited to depicting hypotheses about how situations might unfold. The CEG retains many of the desirable qualities of the BN. In particular it admits conjugate learning on its conditional probability parameters using product Dirichlet priors. The Bayes Factors associated with different CEG models can therefore be calculated in an explicit closed form, which means that search for the maximum a posteriori (MAP) model in this class can be enacted by evaluating the score function of successive models and optimizing. As with BNs, by choosing an appropriate prior over the model space, the conjugacy property ensures that this score function is linear in the different components of the CEG model. Local search algorithms can therefore be devised which unveil the rich class of candidate explanatory models, and allow us to select the most appropriate. In this paper we concentrate on this discovery process and upon the scoring of models within this class.

1 INTRODUCTION

The Chain Event Graph (CEG), introduced in Smith & Anderson (2008), Thwaites, Smith & Cowell (2008) and Smith, Riccomagno & Thwaites (2009), is a graphical model specifically designed to embody the conditional independence structure of problems whose state spaces are highly asymmetric and do not admit a natural product structure. There are many scenarios in medicine, biology and education where such asymmetries arise naturally (for examples see Smith & Anderson (2008)), and where the main features of the model class cannot be fully captured by a single BN or even a context-specific BN. A key property of the CEG framework is that these graphical models are qualitative in their topologies: they encode sets of conditional independence statements about how things might happen, without prespecifying the probabilities associated with these events. Each CEG model can therefore be identified with a unique explanation of how situations might unfold.

For a detailed formal description and motivation for using a CEG model and an outline of some of its implicit conditional independence structure see Smith & Anderson (2008). In that paper it was shown that the CEG is a more expressive graphical model than the BN in that any asymmetries are represented explicitly in the topology of the CEG, and in that CEGs can be used to express a much richer set of conditional independence statements not simultaneously expressible through a single BN. It was also shown that the class of BNs is contained within that of CEGs. This is a property which we exploit later, since with appropriate prior settings, it follows that BN model selection procedures can be nested within those for CEGs.

The CEG is an event-based (rather than variable-based) graphical model, and is a function of an event tree. Any problem on a finite discrete data set can be modelled using an event tree, but event trees are particularly suited to problems with asymmetric state spaces. Unfortunately, it is almost impossible to read the conditional independence properties of a model from an event tree representation, as only trivial independencies are expressed within its topology. The CEG elegantly solves this problem, encoding a rich class of conditional independence statements through its edge and vertex structure.


di-tree's non-leaf vertices or situations (Shafer (1996)). A probability tree can then be specified by a transition matrix on $V(T)$, where absorbing states correspond to leaf-vertices. Transition probabilities are zero except for transitions to a situation's children (see Table 1).

Table 1: Part of the transition matrix for Example 1

         v_1   v_2   v_3   v_4   v_5   v_6   ...   v_∞^1   v_∞^2   ...
v_0      θ_1   θ_2   θ_3   0     0     0     ...   0       0       ...
v_1      0     0     0     θ_5   0     0     ...   θ_4     0       ...
v_2      0     0     0     0     θ_4   θ_5   ...   0       0       ...
...      ...   ...   ...   ...   ...   ...   ...   ...     ...     ...

Let $T(v)$ be the subtree rooted in the situation $v$ which contains all vertices after $v$ in $T$. We say that $v_1$ and $v_2$ are in the same position if:

- the trees $T(v_1)$ and $T(v_2)$ are topologically identical,

- there is a map between $T(v_1)$ and $T(v_2)$ such that the edges in $T(v_2)$ are labelled, under this map, by the same probabilities as the corresponding edges in $T(v_1)$.

The set $W(T)$ of positions $w$ partitions $S(T)$. The transporter CEG (Thwaites, Smith & Cowell 2008) is a directed graph with vertices $W(T) \cup \{w_\infty\}$, with an edge $e$ from $w_1$ to $w_2 \neq w_\infty$ for each situation $v_2 \in w_2$ which is a child of a fixed representative $v_1 \in w_1$ for some $v_1 \in S(T)$, and an edge from $w_1$ to $w_\infty$ for each leaf-node $v \in V(T)$ which is a child of some fixed representative $v_1 \in w_1$ for some $v_1 \in S(T)$.

For the position $w$ in our transporter CEG, we define the floret $F(w)$ to be $w$ together with the set of outgoing edges from $w$. We say that $w_1$ and $w_2$ are in the same stage if:

- the florets $F(w_1)$ and $F(w_2)$ are topologically identical,

- there is a map between $F(w_1)$ and $F(w_2)$ such that the edges in $F(w_2)$ are labelled, under this map, by the same probabilities as the corresponding edges in $F(w_1)$.

The CEG $C(T)$ is then a mixed graph with vertex set $W(C)$ equal to the vertex set of the transporter CEG, directed edge set $E_d(C)$ equal to the edge set of the transporter CEG, and undirected edge set $E_u(C)$ consisting of edges which connect the component positions

construction process is illustrated in Example 1, and an example CEG in Figure 2.

Example 1

Consider the tree in Figure 1 which has 11 atoms (root-to-leaf paths). Symmetries in the tree allow us to store the distribution in 5 conditional tables which contain 11 (6 free) probabilities. The transporter CEG is produced by combining the vertices $\{v_4, v_5, v_7\}$ into one position $w_4$, the vertices $\{v_6, v_8\}$ into one position $w_5$, and all leaf-nodes into a single sink-node $w_\infty$. The CEG $C$ (Figure 2) has an undirected edge connecting the positions $w_1$ and $w_2$ as these lie in the same stage: their florets are topologically identical, and the edges of these florets carry the same probabilities.

Figure 1: Tree for Example 1 (situations $v_0, \dots, v_8$ and leaf nodes including $v_\infty^1$ and $v_\infty^2$; edges labelled $\theta_1, \dots, \theta_{11}$)

Figure 2: CEG for Example 1 (positions $w_0, \dots, w_5$ and sink-node $w_\infty$; edges labelled $\theta_1, \dots, \theta_{11}$)


floret. Fast propagation algorithms for a simple CEG were developed in Thwaites, Smith & Cowell (2008). These exploited the graph's embedded conditional independencies to factorize its mass function over local masses on florets. In this paper we demonstrate how this factorization of the joint mass function over a given event space can also be used as a framework for searching over a space of promising candidate CEGs to discover models that provide good qualitative explanations of the underlying data generating process of a given data set. Because these search methods are similar to well known algorithms used for searching BNs we are able to use similar arguments for setting up hyperparameters over priors so that the priors over the model space decompose as collections of local beliefs.

As the CEG can express a richer class of conditional independence structures than the BN, CEG model selection allows for the automatic identification of more subtle features of the data generating process than it would be possible to express (and therefore to evaluate) through the class of BNs. Simple examples of the types of structure that might exist and could be discovered are given below.

Section 2 introduces the techniques for learning CEGs and compares these with those for learning BNs. Section 3 consists of an example illustrating the advantages of searching over the extended candidate set available when learning CEGs, and section 4 contains further discussion of the theory.

2 LEARNING CEGs

The reason the CEG shares the conjugacy properties of the BN is that with complete random sampling the likelihood separates into products of terms which are only a function of parameters associated with one component of the model. In the BN each term is associated with a variable and its parents; in the case of the CEG, the model component is the floret. Furthermore, the term in the likelihood corresponding to a particular floret is proportional to one obtained from multinomial sampling on the set of units arriving at the root of the floret.

From our CEG definition, if $w_1, w_2 \in u$ for some $u$, then the corresponding edges in the florets $F(w_1)$ and $F(w_2)$ carry the same probabilities. So, for each member $u$ of the set of stages prescribed by the model under consideration for our CEG, we can label the edges leaving $u$ by their probabilities under this model. We can then let $x_{un}$ be the total number of sample units passing through an edge labelled $\theta_{un}$; and the likelihood $L(\theta)$ for our CEG model is given by

$$L(\theta) = \prod_u \prod_n \theta_{un}^{x_{un}}$$

For BNs, the assumptions of local and global independence, and the use of Dirichlet priors, ensure conjugacy. The analogue for CEGs is to give the vectors of probabilities associated with the stages independent Dirichlet distributions. Then the structure of the likelihood $L(\theta)$ results in prior and posterior distributions for the CEG model which are products of Dirichlet densities. The result of this conjugacy is that the marginal likelihood of each CEG is the product of the marginal likelihoods of its component florets. Explicitly, the marginal likelihood of a CEG $C$ is

$$\prod_u \left[ \frac{\Gamma\left(\sum_n \alpha_{un}\right)}{\Gamma\left(\sum_n (\alpha_{un} + x_{un})\right)} \prod_n \frac{\Gamma(\alpha_{un} + x_{un})}{\Gamma(\alpha_{un})} \right]$$

where, as above:

- $u$ indexes the stages of $C$

- $n$ indexes the outgoing edges of each stage

- $\alpha_{un}$ are the exponents of our Dirichlet priors

- $x_{un}$ are the data counts
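The product formula above is most conveniently evaluated on the log scale. The following is a minimal sketch, not the authors' implementation: the function names and the representation of a model as a list of `(alpha, counts)` pairs, one per stage, are assumptions made for illustration.

```python
from math import lgamma, log


def stage_log_score(alpha, counts):
    """Log marginal likelihood contribution of a single stage.

    alpha  -- Dirichlet exponents, one per outgoing edge of the stage
    counts -- data counts, one per outgoing edge of the stage
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    posterior = [a + x for a, x in zip(alpha, counts)]
    # log Gamma(sum alpha) - log Gamma(sum (alpha + x))
    score = lgamma(sum(alpha)) - lgamma(sum(posterior))
    # sum over edges of log Gamma(alpha + x) - log Gamma(alpha)
    score += sum(lgamma(p) - lgamma(a) for p, a in zip(posterior, alpha))
    return score


def ceg_log_score(stages):
    """Log marginal likelihood of a CEG given as a list of (alpha, counts) pairs."""
    return sum(stage_log_score(alpha, counts) for alpha, counts in stages)


# Hypothetical toy model: one stage with two edges, a Dirichlet(1, 1) prior,
# and a single observed unit on the first edge. The marginal likelihood is 1/2.
toy_model = [((1.0, 1.0), (1, 0))]
print(ceg_log_score(toy_model), log(0.5))
```

Because the score is a sum over stages, changing one stage of a model only changes the corresponding term.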

As we are actually interested in $p(\text{model} \mid \text{data})$, and this is proportional to $p(\text{data} \mid \text{model})\, p(\text{model})$, we need to set both parameter priors and prior probabilities for the possible models.


distribution on the leaves is necessarily a priori Dirichlet (see Freeman & Smith (2009)). Modularity conditions then result in floret distributions being Dirichlet and mutually independent.

Exactly analogously with BNs, parameter modularity in CEGs implies that whenever CEG models share some aspect of their topology, we assign this aspect the same prior distribution in each model. When such priors reflect our beliefs in a given context, this can reduce our problem dramatically to one of simply expressing prior beliefs about the possible floret distributions (i.e. the local differences in model structure). As each CEG model is essentially a partition of the vertices in the underlying tree into sets of stages, this requirement ensures that when two partitions differ only in whether or not some subset of vertices belong to the same stage, the prior expressions for the models differ only in the term relating to this stage. The separation of the likelihood means that this local difference property is retained in the posterior distribution.

Now, our candidate set is much richer than the corresponding candidate BN set, and will probably contain models we have not previously considered in our analysis. Again, evoking modularity, if we have no information to suggest otherwise, we follow standard BN practice and let $p(\text{model})$ be constant for all models in the class of CEGs. We now use the logarithm of the marginal likelihood of a CEG model as its score, and maximise this score over our set of candidate models to find the MAP model.

Our expression has the nice property that the difference in score between two models which are identical except for a particular subset of florets is a function of the subscores only of the probability tables on the florets where they differ. Various fast deterministic and stochastic algorithms can therefore be derived to search over the model space, even when this is large; see Freeman & Smith (2009) for examples of these in the particular case where the underlying event tree is fixed. This property is of course shared by the class of BNs.

We set the priors of the hyperparameters so that they correspond to counts of dummy units through the graph. This can be done by setting a Dirichlet distribution on the root-to-sink paths, and for simplicity we choose a uniform distribution for this. It is then easy to check (see Freeman & Smith (2009)) that in the special case where the CEG is expressible as a BN, the CEG score above is equal to the standard score for a BN using the usual prior settings as recommended in, for example, Cooper & Herskovits (1992) and Heckerman, Geiger & Chickering (1995). As a comparison

a multivariate likelihood, the marginal likelihood on a BN is expressible as

$$\prod_{i \in V} \left[ \prod_m \frac{\Gamma\left(\sum_n \alpha_{imn}\right)}{\Gamma\left(\sum_n (\alpha_{imn} + x_{imn})\right)} \prod_n \frac{\Gamma(\alpha_{imn} + x_{imn})}{\Gamma(\alpha_{imn})} \right]$$

where

- $i$ indexes the set of variables of the BN

- $n$ indexes the levels of the variable $X_i$

- $m$ indexes vectors of levels of the parental variables of $X_i$

The importance of this result is that were we first to search the space of BNs for the MAP model, then we could seamlessly refine this model using the CEG search score described above. Such embellishments will allow us to search over models containing context-specific information or Noisy AND/OR gates. Furthermore any model we find will have an associated interpretation which can be stated in common language, and can be discussed and critiqued by our client/expert for its phenomenological plausibility.

For the CEG in Figure 2, we put a uniform prior over the 11 root-to-leaf paths, which in turn allows us to assign our stage priors as follows: we assign a Di(3, 4, 4) prior to the stage identified by $w_0$, a Di(3, 4) prior to the stage $u_1 = \{w_1, w_2\}$, a Di(2, 2) prior to each of the stages identified by $w_3$ and $w_5$, and a Di(3, 3) prior to the stage identified by $w_4$. We would then have a marginal likelihood of

$$
\begin{aligned}
&\frac{\Gamma(11)}{\Gamma(11+N)} \, \frac{\Gamma(3+x_{01})\,\Gamma(4+x_{02})\,\Gamma(4+x_{03})}{\Gamma(3)\,\Gamma(4)\,\Gamma(4)} \\
&\times \frac{\Gamma(7)}{\Gamma(7+x_{01}+x_{02})} \, \frac{\Gamma(3+x_{14}+x_{24})\,\Gamma(4+x_{15}+x_{25})}{\Gamma(3)\,\Gamma(4)} \\
&\times \frac{\Gamma(4)}{\Gamma(4+x_{03})} \, \frac{\Gamma(2+x_{36})\,\Gamma(2+x_{37})}{\Gamma(2)\,\Gamma(2)} \\
&\times \frac{\Gamma(6)}{\Gamma(6+x_{15}+x_{24}+x_{36})} \, \frac{\Gamma(3+x_{48})\,\Gamma(3+x_{49})}{\Gamma(3)\,\Gamma(3)} \\
&\times \frac{\Gamma(4)}{\Gamma(4+x_{25}+x_{37})} \, \frac{\Gamma(2+x_{5,10})\,\Gamma(2+x_{5,11})}{\Gamma(2)\,\Gamma(2)}
\end{aligned}
$$

where, with a slight abuse of notation, we let for example $x_{24}$ be the data value associated with the edge leaving $w_2$ labelled $\theta_4$; and where $N$ is the sample size $= \sum_{n=1}^{3} x_{0n}$.


through a BN, we find that some variables have no outcomes given particular vectors of values of ancestral variables. We cannot simply set probabilities to zero in this instance as a Dirichlet distribution is then no longer appropriate and so the usual model selection procedures fail. Furthermore, this is one type of scenario which cannot be modelled adequately using the standard classes of context-specific BNs. By comparison, since such models exist within the class of CEG models, they can of course be revealed (and if appropriate, selected) by CEG-based conjugate search algorithms.

3 A SIMPLE SIMULATED MODEL

In this section we consider a simple example which demonstrates the versatility of our method. Our client is analysing a medical data set relating to an inherited condition. A random sample of 100 people (51 female, 49 male) has been taken from a population who have had recent ancestors with the condition. For each individual in the sample a record has been kept of whether or not they displayed a particular symptom in their teens, and whether or not they then developed the condition in middle age. The data is given in Table 2, where $A = 0, 1$ corresponds to female, male; $B = 1$ corresponds to the individual displaying the symptom; and $C = 1$ corresponds to the individual developing the condition. Our client does not know whether displaying the symptom is independent of gender, but having looked at the data, believes that it is not.

Table 2: Data for example (N = 100)

              A = 0             A = 1
           B = 0   B = 1     B = 0   B = 1
C = 0       33       6        10      12
C = 1        6       6         9      18

Using his medical knowledge, our client has decided that the model lies in a candidate class of six, but is unwilling to express any preference for a particular model within this set.

In each of these six models $B$ is not independent of $A$. The further conditional independence structure of the models is given by (i) $C \perp\!\!\!\perp (A, B)$, (ii) $C \perp\!\!\!\perp A \mid B$, (iii) $C \perp\!\!\!\perp B \mid A$, (iv) $C \perp\!\!\!\perp B \mid (A = 1)$ (there is one distribution for developing the condition given that gender is male), (v) $C \perp\!\!\!\perp A \mid (B = 1)$ (there is one distribution for developing the condition given that symptom was

distribution for developing the condition given that an individual is male OR displayed the symptom, and one distribution for developing the condition given that an individual is female AND did not display the symptom: a Noisy OR gate).

The models are depicted in Figure 3. Only the first three of these models can be represented as BNs, with the fourth and fifth as context-specific BNs of the type described in, for example, Boutilier et al (1996) or Poole & Zhang (2003). The sixth would need us to create new variables in order for us to represent it as a BN; another example would be $C \perp\!\!\!\perp (A, B) \mid A \oplus B$, which has a CEG similar to that of (ii), but with the edges leaving $w_2$ swapped so that $B = 1 \mid A = 1$ is the edge from $w_2$ to $w_3$, and $B = 0 \mid A = 1$ is the edge from $w_2$ to $w_4$.

We can read, for example, CEG (ii) as follows:

- $w_1$ and $w_2$ are not in the same stage, so $A \not\perp\!\!\!\perp B$,

- edges labelled $B = 0$ converge at $w_3$, so $C \perp\!\!\!\perp A \mid (B = 0)$. Similarly, edges labelled $B = 1$ converge at $w_4$, so $C \perp\!\!\!\perp A \mid (B = 1)$, and combining these we get $C \perp\!\!\!\perp A \mid B$.

In CEG (v) by contrast:

- edges labelled $B = 1$ converge at a single position, so $C \perp\!\!\!\perp A \mid (B = 1)$, but edges labelled $B = 0$ do not, so we do not have $C \perp\!\!\!\perp A \mid (B = 0)$.

The CEG portrays the context-specific conditional independence properties of the model in its topology; the context-specific BN does not.

Note that our client's candidate set is a restriction of the set of possible models: he has for instance dismissed models which encode statements such as $C \perp\!\!\!\perp B \mid (A = 0)$ or $C \perp\!\!\!\perp A \mid (B = 0)$ and all models where $A \perp\!\!\!\perp B$. In fact there are 15 possible models in the full candidate set if we require $A$ to be a parent of $B$ and $B$ to be a temporal predecessor of $C$, and 30 if we relax the parental condition, but require that $A$ is a temporal predecessor of $B$, which is in turn a temporal predecessor of $C$. Note that there are only 4 possible BNs where $A$ is a parent of $B$ and $B$ is a temporal predecessor of $C$, and 8 possible BNs where $A$ is a temporal predecessor of $B$, which is in turn a temporal predecessor of $C$. By using CEGs we can quickly have a clear idea of the full range of candidate models, and also our learning method works for all models in this range, including models such as $C \perp\!\!\!\perp (A, B) \mid \mathrm{MAX}(A, B)$ or $C \perp\!\!\!\perp (A, B) \mid A \oplus B$.
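The counts of 15 and 30 candidate models can be checked by enumeration. With the order $A$, $B$, $C$ fixed, a model is determined by how the four situations preceding $C$ (one per value of $(A, B)$) are grouped into stages, together with whether the two situations preceding $B$ share a stage. The sketch below is illustrative only; the function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own
        yield [[first]] + partition


# The four situations at which C is decided, one per (A, B) combination.
situations = [(a, b) for a in (0, 1) for b in (0, 1)]
c_stagings = list(set_partitions(situations))

# A a parent of B: the two situations preceding B are in different stages.
n_with_a_parent_of_b = len(c_stagings)
# Temporal order only: those two situations may or may not share a stage.
n_temporal_order_only = 2 * len(c_stagings)


def induced_staging(parent_axes):
    """Staging of C implied by a BN in which C has the given parents (0 = A, 1 = B)."""
    blocks = {}
    for s in situations:
        blocks.setdefault(tuple(s[i] for i in parent_axes), []).append(s)
    return frozenset(frozenset(block) for block in blocks.values())


# Parent sets for C: none, {A}, {B}, {A, B}.
bn_stagings = {induced_staging(axes) for axes in [(), (0,), (1,), (0, 1)]}

print(n_with_a_parent_of_b, n_temporal_order_only, len(bn_stagings))
```

The 15 stagings are the set partitions of four elements; only 4 of them correspond to a choice of parent set for $C$ in a BN.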

Figure 3: CEGs in our candidate set (panels (i) to (vi); edges labelled by $A = 0$, $A = 1$, $B = b \mid A = a$ and $C = c \mid \cdot$)

As our client has no particular preference for any of the six models, it makes sense to let $p(\text{model})$ be a constant value for all models in the candidate set, as suggested in section 2. This allows us to use $p(\text{data} \mid \text{model})$ as a measure for $p(\text{model} \mid \text{data})$, and we can then let the score of a model be its log marginal likelihood.

The three models expressible as BNs could of course be scored using the expression for BNs given above, and this would give us the same answer as our method using CEGs. But note that the BN-expressions for these models are more complex and less transparent than our CEG-expressions. We could perhaps use a learning method specially adapted for context-specific BNs to score the fourth and fifth models (see for example Feelders & van der Gaag (2005)), but it is not evident how we would score the sixth model (consistently with the scoring of the other models) using a BN-based approach.

The score for model (i) decomposes into four components associated with the florets at $w_0$, $w_1$, $w_2$ and $w_3$. The components associated with the florets at $w_0$, $w_1$ and $w_2$

components associated with the florets at $\{w_i\}_{i \geq 3}$. Scoring our 6 models we obtain -202.79, -199.37, -199.15, -197.58, -197.53 and -196.45. We can see that model (i) is the least appropriate, indicating that $C \not\perp\!\!\!\perp (A, B)$ and that there must be some sort of dependency of $C$ on $A$ and/or $B$. Models (iv), (v) and (vi) score better than models (ii) and (iii), indicating that this dependency is at best context-specific, and that the most appropriate model is not going to be expressible as a BN. In fact the best model in the candidate set is the Noisy OR gate, a model which could not be selected by a standard BN-based learning algorithm.
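The six scores can be recomputed from Table 2. The sketch below assumes a uniform prior of one dummy unit on each of the 8 root-to-leaf paths, so that the root stage receives Di(4, 4), each of the two stages for $B$ receives Di(2, 2), and a stage for $C$ that pools $k$ situations receives Di($k$, $k$). The encoding of each model as a grouping of $(A, B)$ situations, and all names used, are illustrative assumptions.

```python
from math import lgamma

# Table 2: (A, B) -> (count with C = 0, count with C = 1)
DATA = {(0, 0): (33, 6), (0, 1): (6, 6), (1, 0): (10, 9), (1, 1): (12, 18)}


def stage_log_score(alpha, counts):
    """Log marginal likelihood term of one stage (Dirichlet-multinomial)."""
    posterior = [a + x for a, x in zip(alpha, counts)]
    score = lgamma(sum(alpha)) - lgamma(sum(posterior))
    return score + sum(lgamma(p) - lgamma(a) for p, a in zip(posterior, alpha))


def model_log_score(c_stages):
    """Score of a model in which B depends on A and C is staged as in `c_stages`."""
    # Root floret: counts of A = 0 and A = 1.
    n_a = [sum(sum(DATA[(a, b)]) for b in (0, 1)) for a in (0, 1)]
    score = stage_log_score((4, 4), n_a)
    # One floret for B after each value of A (these are in different stages).
    for a in (0, 1):
        score += stage_log_score((2, 2), [sum(DATA[(a, b)]) for b in (0, 1)])
    # One term per stage of C; a stage pooling k situations has prior Di(k, k).
    for stage in c_stages:
        k = len(stage)
        counts = [sum(DATA[s][c] for s in stage) for c in (0, 1)]
        score += stage_log_score((k, k), counts)
    return score


# Each model is a grouping of the (A, B) situations into stages for C.
MODELS = {
    "i": [[(0, 0), (0, 1), (1, 0), (1, 1)]],
    "ii": [[(0, 0), (1, 0)], [(0, 1), (1, 1)]],
    "iii": [[(0, 0), (0, 1)], [(1, 0), (1, 1)]],
    "iv": [[(0, 0)], [(0, 1)], [(1, 0), (1, 1)]],
    "v": [[(0, 0)], [(1, 0)], [(0, 1), (1, 1)]],
    "vi": [[(0, 0)], [(0, 1), (1, 0), (1, 1)]],
}

SCORES = {name: model_log_score(stages) for name, stages in MODELS.items()}
for name, score in SCORES.items():
    print(f"model ({name}): {score:.2f}")
```

Under these prior assumptions the computed values agree with the six scores reported above to two decimal places, and the first three terms are shared by all six models.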

Looking at the CEGs in Figure 3, we can see that models (iv) and (vi) can be arrived at by making one alteration to model (iii), and that models (v) and (vi) can be arrived at by making one alteration to model (ii). It is easy to see how efficient algorithms could be created to search over the model space in this example.

Returning to the premise of our example, we share these results with our client, who then wants us to check whether a Noisy OR gate with $A \perp\!\!\!\perp B$ might score better than CEG model (vi). This model is depicted in Figure 4. The additional information in this CEG can be read as follows:

- there is an undirected edge connecting $w_1$ and $w_2$, so these two positions are in the same stage. Now positions in the same stage have their edges labelled identically, so the edges leaving $w_1$ and $w_2$ have labels that do not depend on the value of $A$. Consequently $A \perp\!\!\!\perp B$.

The score for this new model is -202.09, indicating that this model is not as good as model (vi). This is unsurprising given that the data in Table 2 suggests strongly that $A$ is not independent of $B$.
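Because the score is a sum over stages, the new model differs from model (vi) only in the terms for the florets at $w_1$ and $w_2$, so its score can be obtained from a local difference. The sketch below assumes the same uniform path prior as before, giving Di(2, 2) for each of the two separate stages and Di(4, 4) for the merged stage; the variable names are illustrative.

```python
from math import lgamma


def stage_log_score(alpha, counts):
    """Log marginal likelihood term of one stage (Dirichlet-multinomial)."""
    posterior = [a + x for a, x in zip(alpha, counts)]
    score = lgamma(sum(alpha)) - lgamma(sum(posterior))
    return score + sum(lgamma(p) - lgamma(a) for p, a in zip(posterior, alpha))


# Counts of (B = 0, B = 1) from Table 2, for females (A = 0) and males (A = 1).
b_given_female = (33 + 6, 6 + 6)      # (39, 12)
b_given_male = (10 + 9, 12 + 18)      # (19, 30)

# w1 and w2 in different stages, as in model (vi).
separate = stage_log_score((2, 2), b_given_female) + stage_log_score((2, 2), b_given_male)

# w1 and w2 merged into one stage, which encodes A independent of B.
pooled = tuple(f + m for f, m in zip(b_given_female, b_given_male))
merged = stage_log_score((4, 4), pooled)

delta = merged - separate
score_vi = -196.45                    # score of model (vi) as reported in the text
print(f"change in score: {delta:.2f}, new model: {score_vi + delta:.2f}")
```

The change is about -5.65, which accounts for the drop from -196.45 to roughly -202.09 without rescoring any other floret.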

Figure 4: CEG for new $A \perp\!\!\!\perp B$ model (positions $w_0, \dots, w_4$ and sink-node $w_\infty$; edges labelled $A = 0$, $A = 1$, $B = 0$, $B = 1$ and $C = c \mid A, B$)

Clearly, searching over the class of CEGs is directly analogous to searching over the class of BNs, but the class of CEG models is much more expressive. This richness has an associated disadvantage: the class of all BNs is already difficult to search in large problems, and various methods have been developed to restrict the search to subsets of the class (see for example van Gerven & Lucas (2004), where the class of BNs that have edge-configurations consistent with a given spanning tree is searched). The number of possible CEGs available for even a small number of vertices is extremely large. Therefore, in even moderately sized problems it is usually efficacious to first restrict the model class to something smaller.

Because each model in this class is qualitatively expressed in any given context, this task is much easier than it might first appear. Thus, for example, in the educational examples considered in Freeman and Smith (2009), the context demands that the underlying event tree is consistent with the order students study courses, and that certain vertices could never reasonably be combined into the same stage. These sorts of contextually defined constraints can readily be incorporated into customized search algorithms, and the efficiency of the search procedure improved. Thus, although more effort is needed to set up customized search spaces for CEGs than for BNs, we have found that the subsequent direct interpretability of any MAP model more than compensates for this effort.

It is also not unusual for more quantitative information to be available, such as one type of stage combination being proportionately more probable than another. This can allow one to usefully further refine and improve the search, although then the framework the CEG provides is no longer totally qualitative.


Of course, just as with BNs, the conjugacy does not necessarily continue to hold when sampling is not complete. In this case approximate or numerical search algorithms need to be employed with consequent loss of accuracy or speed in scoring and comparing models. However in this case the methods for estimating BNs with missing values (see for example Riggelsen (2004)) can usually be extended so that they also apply to CEGs. We will report on our findings on this topic in a later paper.

Lastly, it might be argued that context-specific BNs can be used to portray any set of conditional independence properties of a model, and that it would be a better use of resources developing improved learning methods for these graphs. In fact, as noted in section 2, there are significant sets of scenarios which cannot easily be modelled with context-specific BNs, which can none-the-less be modelled with CEGs. More importantly perhaps, an analyst modelling with BNs and their variants may not be aware just how many different models are available as possible explanations of the underlying data generating process of their data set. This is not a problem encountered by the analyst modelling with CEGs.

Acknowledgements

This research has been partly funded by the UK Engineering and Physical Sciences Research Council as part of the project Chain Event Graphs: Semantics and Inference (grant no. EP/F036752/1).

References

[1] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-specific independence in Bayesian Networks. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, pages 115-123, Portland, Oregon, 1996.

[2] G. F. Cooper and E. Herskovits. A Bayesian method for the induction of Probabilistic Networks from data. Machine Learning, 9(4):309-347, 1992.

[3] A. Feelders and L. van der Gaag. Learning Bayesian Network parameters with prior knowledge about context-specific qualitative influences. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, Arlington, Virginia, 2005.

[4] G. Freeman and J. Q. Smith. Bayesian model selection of Chain Event Graphs. Research Report, CRiSM, 2009.

[5] D. Geiger and D. Heckerman. A characterization of the Dirichlet distribution through Global and Local independence. Annals of Statistics, 25:1344-1369, 1997.

[6] D. Heckerman. A tutorial on Learning with Bayesian Networks. In M. I. Jordan, editor, Learning in Graphical Models, pages 301-354. MIT Press, 1998.

[7] D. Heckerman, D. Geiger, and D. Chickering. Learning Bayesian Networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.

[8] D. Poole and N. L. Zhang. Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18:263-313, 2003.

[9] C. Riggelsen. Learning Bayesian Network parameters from incomplete data using importance sampling. In Proceedings of the 2nd European Workshop on Probabilistic Graphical Models, pages 169-176, Leiden, 2004.

[10] G. Shafer. The Art of Causal Conjecture. MIT Press, 1996.

[11] T. Silander, P. Kontkanen, and P. Myllymaki. On the sensitivity of the MAP Bayesian Network structure to the equivalent sample size parameter. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, Vancouver, 2007.

[12] J. Q. Smith and P. E. Anderson. Conditional independence and Chain Event Graphs. Artificial Intelligence, 172:42-68, 2008.

[13] J. Q. Smith, E. M. Riccomagno, and P. A. Thwaites. Causal analysis with Chain Event Graphs. Submitted to Artificial Intelligence, 2009.

[14] P. A. Thwaites, J. Q. Smith, and R. G. Cowell. Propagation using Chain Event Graphs. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, Helsinki, 2008.