SCEG structure learning by dynamic programming

Chapter 4 Standard Bayesian CEG Model Selection

4.3 Exhaustive CEG Model Search

4.3.2 SCEG structure learning by dynamic programming

Take a process described by a set of $N$ random variables $\mathbf{Z} = \{Z_1, \ldots, Z_N\}$. To search over the $\mathbf{Z}$-SCEG model space I will present the dynamic programming algorithm developed by Cowell and Smith (2014). Recall that a $\mathbf{Z}$-SCEG model is characterised by the following properties:

1. Any variable order $I = (i_1, \ldots, i_N)$ is compatible with the representation of the process using an event tree $T(I)$.

2. For any variable order $I$, situations are at the same distance $d$ from the root node in $T(I)$ if and only if they correspond to the same variable $Z_{i_{d+1}}$, $i_{d+1} \in I$, or they are all leaf nodes.

3. A stage only merges situations associated with the same variable.

stage.

Let $H(I) = \{H_1(I), \ldots, H_N(I)\}$ be the hyper-stage structure when a variable order $I$ is adopted. The definition of a SCEG guarantees that the hyper-stage structure $H(I)$ is a partition of the set of situations in $T(I)$ such that each set $H_j(I)$, $j = 1, \ldots, N$, gathers all the situations in $T(I)$ associated with the variable $Z_{i_j}$, and only those situations.
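As an illustration of this partition, the following sketch groups the situations of an event tree by depth. The function name `hyper_stages`, the encoding of a situation as the tuple of values observed so far, and the binary levels in the usage lines are assumptions made for illustration only; they are not taken from the Train Booking data.

```python
from itertools import product


def hyper_stages(order, levels):
    """Group the situations of the event tree T(I) by the variable they unfold.

    order  -- tuple of variable names giving the variable order I
    levels -- dict mapping each variable name to its list of possible values
    Returns a list H with H[j] the situations associated with order[j]; each
    situation is encoded as the tuple of values taken by the preceding variables.
    """
    H = []
    for depth in range(len(order)):
        preceding = [levels[v] for v in order[:depth]]
        # product() of no iterables yields one empty tuple: the root situation.
        H.append(list(product(*preceding)))
    return H


# Hypothetical binary levels; the real categories of each variable may differ.
levels = {"C": ["c0", "c1"], "V": ["v0", "v1"], "A": ["a0", "a1"]}
H = hyper_stages(("C", "V", "A"), levels)
print([len(h) for h in H])  # number of situations in H_1, H_2, H_3
```

Because situations at different depths are tuples of different lengths, the sets in `H` are disjoint by construction, which mirrors the partition property stated above.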

The additive modularity of the log posterior probability of a SCEG then guarantees that removing the last variable $Z_{i^*_N}$, $i^*_N \in I^*$, from the variable set does not change the actual best variable order $I^*$ for the remaining $N-1$ variables. Explicitly, if $I^* = (i^*_1, \ldots, i^*_N)$ is the best variable order for a CEG model to represent a process described by the variable set $\mathbf{Z}$, then $I^*_{N-1} = (i^*_1, \ldots, i^*_{N-1})$, $I^*_{N-1} \subset I^*$, is the best variable order for a CEG model to express the subprocess corresponding to the variable set $\mathbf{Z} \setminus \{Z_{i^*_N}\}$.

Example 7 (Train Booking with three demographic variables). Return to the example of the train booking described in Section 3.1. Suppose that the decision maker wants to understand the interactions between only the demographic variables Country ($C$), Visit ($V$) and Age ($A$). So, the set $\mathbf{Z} = \{A, C, V\}$ spans six $\mathbf{Z}$-compatible event trees given by the six possible permutations of these variables.

From equation 4.10 we have that

$$Q(C) = \sum_{n=1}^{3} Q_{U_n}(C), \quad \text{for all } C \in \mathcal{C}, \tag{4.12}$$

where $U_1$, $U_2$ and $U_3$ are, respectively, the stage structures associated with variables $A$, $C$ and $V$. The score $Q_{U_i}(C)$, $i = 1, 2, 3$, depends on the variable order. However, each highest scoring stage structure $U_n$, $n = 1, 2, 3$, can be found independently of the others given a known variable order. In particular, if the variable order $\mathbf{Z}(I^*) = (C, V, A)$ provides the MAP SCEG $C^*$, then the best variable order for the set $\mathbf{Z}^2 = \{C, V\}$ necessarily has to be $\mathbf{Z}^2(I^*) = (C, V)$.
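The property used in this example can be checked by brute force on a toy problem. The sketch below enumerates the six variable orders of $\{A, C, V\}$ and scores each one additively. The weights and the `local_score` function are hypothetical stand-ins for the real SCEG local scores (which depend on the data); the only feature they share with them is that the local score of a variable depends on the unordered set of variables preceding it.

```python
from itertools import permutations

# Hypothetical weights used only to build a toy additive score.
WEIGHT = {"A": 1.0, "C": 3.0, "V": 2.0}


def local_score(var, predecessors):
    """Toy local score of `var` given the unordered set of preceding variables."""
    return -WEIGHT[var] * len(predecessors)


def total_score(order):
    """Additive score of a variable order: the sum of its local scores."""
    return sum(local_score(v, frozenset(order[:j])) for j, v in enumerate(order))


def best_order(variables):
    """Exhaustively score every permutation and return the best one."""
    return max(permutations(variables), key=total_score)


orders = list(permutations(("A", "C", "V")))  # the six compatible orders
best3 = best_order(("A", "C", "V"))
# Drop the sink of the best order and search again over the remaining two.
best2 = best_order(tuple(v for v in ("A", "C", "V") if v != best3[-1]))
print(len(orders), best3, best2)
```

With these toy scores the best order on two variables is the prefix of the best order on three, as the argument above requires. Exhaustive enumeration is only feasible here because there are six orders.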

Thus the problem of finding the best variable order for $N-1$ variables constitutes a subproblem of discovering the best variable order for $N$ variables. Therefore, for every subset $\mathbf{Z}^k = \{Z_{i_1}, \ldots, Z_{i_k}\} \subseteq \mathbf{Z}$, $k = 1, \ldots, N$, we need to find the best sink variable $Z_i \in \mathbf{Z}^k$ given that we have already found the best variable order for every subset $\mathbf{Z}^{k-1} \subset \mathbf{Z}^k$. Embedding this recursive structure into a dynamic programming algorithm enables us to search the whole SCEG model space efficiently.
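This recursive structure can be written as a recurrence. In the sketch below, $q(\cdot)$ and $\mathrm{sink}(\cdot)$ are notation introduced here for the best score attainable on a subset and for its best sink variable, $\mathrm{BLS}$ is the best-local-score function that appears in Algorithm 7, and the base case $q(\emptyset) = 0$ is an assumption about how the empty set is handled.

```latex
q(\emptyset) = 0, \qquad
q(\mathbf{Z}^k) = \max_{Z_i \in \mathbf{Z}^k}
  \Big\{ \mathrm{BLS}\big(Z_i,\, \mathbf{Z}^k \setminus \{Z_i\}\big)
       + q\big(\mathbf{Z}^k \setminus \{Z_i\}\big) \Big\},
\qquad
\mathrm{sink}(\mathbf{Z}^k) = \operatorname*{arg\,max}_{Z_i \in \mathbf{Z}^k}
  \Big\{ \mathrm{BLS}\big(Z_i,\, \mathbf{Z}^k \setminus \{Z_i\}\big)
       + q\big(\mathbf{Z}^k \setminus \{Z_i\}\big) \Big\}.
```

Each $q(\mathbf{Z}^k)$ depends only on the values of $q$ on subsets with one variable fewer, which is why processing subsets in order of increasing size suffices.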

The general algorithm for learning SCEGs is given in Algorithm 6. I will further explain each of its three steps below.

Algorithm 6: Find the best scoring SCEG when no variable order is specified

Input: A complete data set $D$ on a set of $N$ finite discrete variables $\mathbf{Z}$ and a parameter $\bar{\alpha}$.

Output: The best scoring SCEG found.

1. Discover the best sink variable for all $2^N - 1$ non-empty subsets of $\mathbf{Z}$.

2. Find the best variable order $I^* = (i_1, \ldots, i_N)$.

3. Recover the highest scoring SCEG using $I^*$.

Step 1: Discover the best sink variable

Algorithm 7 is the most computationally intensive step of the general dynamic programming algorithm for $\mathbf{Z}$-SCEG model search. It begins by initialising two arrays of size $2^N$, scores and sinks, where each element corresponds to a subset of $\mathbf{Z}$. It then proceeds to determine the best sink variable of each non-empty subset of $\mathbf{Z}$ by examining them in order of increasing size, starting with singleton subsets.

For every variable $Z_n$ in a set $\mathbf{Z}^{k+1}$ it is necessary to calculate the local score of the best staged tree spanned by the set of variables $\mathbf{Z}^k \cup \{Z_n\}$ such that $Z_n$ is the sink variable and $\mathbf{Z}^k = \mathbf{Z}^{k+1} \setminus \{Z_n\}$. To do this, the algorithm requires a local auxiliary variable scoreL and a function BLS($Z_n$, $\mathbf{Z}^k$). The best score associated with $\mathbf{Z}^k$ has already been computed and stored in scores, since the algorithm looks at subsets in order of increasing size. So the function BLS($Z_n$, $\mathbf{Z}^k$) only needs to return the best local score associated with the sink variable $Z_n$. Observe that this does not require the best variable order of $\mathbf{Z}^k$.
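One common way to realise two arrays of size $2^N$ indexed by subsets is to encode each subset as a bitmask. The sketch below is only one possible encoding, not necessarily the one used by Cowell and Smith (2014); the helper names `subsets_by_size` and `members` are hypothetical.

```python
def subsets_by_size(n):
    """Return all subsets of {0, ..., n-1} as bitmasks, smallest subsets first."""
    return sorted(range(1 << n), key=lambda mask: (bin(mask).count("1"), mask))


def members(mask):
    """Decode a bitmask into the list of variable indices it contains."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


n = 3
scores = [0.0] * (1 << n)  # one slot per subset, including the empty set
sinks = [-1] * (1 << n)    # -1 marks "no sink found yet"
for mask in subsets_by_size(n):
    print(mask, members(mask))
```

Visiting the masks in this order guarantees that every subset obtained by deleting one variable has been processed before the subset itself.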

Algorithm 7: Find the best sink variables for every non-empty subset of $\mathbf{Z}$.

Input: A complete data set $D$ on a set of $N$ finite discrete variables $\mathbf{Z}$ and a parameter $\bar{\alpha}$.

Output: A set-indexed array sinks that for each subset $\mathbf{Z}^k \subseteq \mathbf{Z}$ returns the sink variable for the highest scoring CEG spanned by $\mathbf{Z}^k$.

    for k in 1 → N do
        for Z^k ⊆ Z such that |Z^k| = k do
            scores[Z^k] ← 0
            sinks[Z^k] ← −1
            for Z_i ∈ Z^k do
                Z^{k−1} ← Z^k \ {Z_i}
                scoreL ← BLS(Z_i, Z^{k−1}) + scores[Z^{k−1}]
                if sinks[Z^k] = −1 or scoreL > scores[Z^k] then
                    scores[Z^k] ← scoreL
                    sinks[Z^k] ← Z_i
    return sinks
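A minimal Python sketch of Algorithm 7 follows. It assumes that the best-local-score function is supplied as a callable `bls(sink, predecessors)`; the function `toy_bls` and its weights are hypothetical and merely stand in for the data-dependent SCEG local score. Subsets are represented as frozensets used as dictionary keys, and the score of the empty set is taken to be 0.

```python
from itertools import combinations


def find_best_sinks(variables, bls):
    """Return (scores, sinks), both keyed by frozenset subsets of `variables`."""
    variables = tuple(variables)
    scores = {frozenset(): 0.0}  # assumed base case: the empty set scores 0
    sinks = {}
    for k in range(1, len(variables) + 1):          # subsets by increasing size
        for subset in map(frozenset, combinations(variables, k)):
            for z in variables:                     # candidate sink variables
                if z not in subset:
                    continue
                rest = subset - {z}
                score_l = bls(z, rest) + scores[rest]
                if subset not in sinks or score_l > scores[subset]:
                    scores[subset] = score_l
                    sinks[subset] = z
    return scores, sinks


def toy_bls(sink, predecessors):
    """Hypothetical local score; a real BLS would search over stage structures."""
    weight = {"A": 1.0, "C": 3.0, "V": 2.0}
    return -weight[sink] * len(predecessors)


scores, sinks = find_best_sinks(("A", "C", "V"), toy_bls)
print(sinks[frozenset("ACV")], scores[frozenset("ACV")])
```

The loop calls `bls` once for every pair of a subset and one of its members, so the number of calls grows as $N \cdot 2^{N-1}$; the cost of each call is where the staged-tree search would sit.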

Example 7 (Train Booking with three demographic variables - cont.). Return to Example 7. Now suppose that the decision maker asks his analyst to model that problem. The analyst has then decided to present the MAP SCEG $C^*$ to the decision maker. For this purpose, he uses the dynamic programming Algorithm 6. In the first step the algorithm needs to find a triple $(\mathbf{Z}^k, Z^*, q)$ for every set $\mathbf{Z}^k \subseteq \mathbf{Z}$, where $Z^*$ is the best sink variable for $\mathbf{Z}^k$ and $q$ is the highest score associated with $\mathbf{Z}^k$. The algorithm starts from singleton subsets and so it obtains the following triples: $(\{A\}, A, q_1)$, $(\{V\}, V, q_2)$ and $(\{C\}, C, q_3)$.

Next the algorithm examines the sets of size two. For instance, take the set $\mathbf{Z}^2 = \{A, V\}$. To find the best sink variable for this subset it is necessary to compare the highest scoring staged tree associated with the variable order $\mathbf{Z}^2(I_1) = (A, V)$ with the one associated with the variable order $\mathbf{Z}^2(I_2) = (V, A)$. Note that for the variable order $\mathbf{Z}^2(I_1)$ we need to compute only the best score $q_a$ associated with $V$, since the score of $A$ has already been computed and is equal to $q_1$. Analogously, for the variable order $\mathbf{Z}^2(I_2)$ we have to find only the best score $q_b$ for $A$. Now assume that $q_1 + q_a < q_2 + q_b$. Then, with regard to the set $\mathbf{Z}^2$, the best sink variable is $A$ and its highest score is $q_4 = q_2 + q_b$. Doing similar computations for the other two subsets of size two, the algorithm provides the following triples: $(\{A, V\}, A, q_4)$, $(\{A, C\}, C, q_5)$ and $(\{V, C\}, V, q_6)$.

To finalise step 1 the algorithm needs to search for the best sink variable in the set $\mathbf{Z}$. Of course, there are three possible candidates: $A$, $V$ and $C$. For example, take the variable $A$. In this case, we have to find the score of the MAP staged tree $T(\mathbf{Z})$ when $A$ is the sink variable. This corresponds only to computing the score $q_c$ of $A$ and then adding it to the score $q_6$ that was previously calculated. This is because the best variable order of a staged tree does not change if the sink variable is eliminated. Repeating the same procedure for the other two candidate variables we obtain the score $q_d$ for $V$ and $q_e$ for $C$. Now assume that $q_6 + q_c > q_5 + q_d > q_4 + q_e$. It then follows that the algorithm stores the triple $(\{A, V, C\}, A, q_7)$, where $q_7 = q_6 + q_c$.
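The bookkeeping of this example can be replayed with numbers. All values in `LOCAL` below are hypothetical log-scores, chosen only so that the inequalities assumed in the example hold; they are not results from the Train Booking data, and the function name `step_one` is introduced here for illustration.

```python
from itertools import combinations

S = frozenset  # shorthand for building set-valued keys

# Hypothetical best local log-scores, keyed by (sink, set of preceding variables).
LOCAL = {
    ("A", S()): -10.0,      # q1
    ("V", S()): -12.0,      # q2
    ("C", S()): -8.0,       # q3
    ("V", S("A")): -9.0,    # qa
    ("A", S("V")): -6.0,    # qb
    ("C", S("A")): -5.0,
    ("A", S("C")): -9.0,
    ("V", S("C")): -7.0,
    ("C", S("V")): -6.0,
    ("A", S("CV")): -4.0,   # qc
    ("V", S("AC")): -6.0,   # qd
    ("C", S("AV")): -5.0,   # qe
}


def step_one(variables, local):
    """Return {subset: (best sink, best score)} for every non-empty subset."""
    scores = {S(): 0.0}
    triples = {}
    for k in range(1, len(variables) + 1):
        for subset in map(S, combinations(variables, k)):
            candidates = {z: local[(z, subset - {z})] + scores[subset - {z}]
                          for z in subset}
            sink = max(candidates, key=candidates.get)
            scores[subset] = candidates[sink]
            triples[subset] = (sink, candidates[sink])
    return triples


triples = step_one("AVC", LOCAL)
for subset, (sink, score) in triples.items():
    print(sorted(subset), sink, score)
```

With these numbers $q_4 = -18$, $q_5 = q_6 = -15$ and $q_7 = -19$, and the stored sinks agree with the triples listed in the example.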

Step 2: Find the best order of the best sinks

Algorithm 8 now finds the best order of the best sink variables, starting with the complete set $\mathbf{Z}$. Assume that at iteration $k$, $k = N, \ldots, 1$, we have to determine the best sink variable for a set of variables $\mathbf{Z}^k_{\mathrm{left}} \subseteq \mathbf{Z}$. For this purpose, the algorithm first recovers the best sink variable $Z_{i_k}$ of $\mathbf{Z}^k_{\mathrm{left}}$ from the indexed array sinks. Next it removes $Z_{i_k}$ from $\mathbf{Z}^k_{\mathrm{left}}$ and then begins iteration $k-1$. The variable $Z_{i_k}$ is stored in the $k$th element of an $N$-dimensional integer-indexed array order of variables.

By carrying on these iterations in decreasing order, it follows that by the end of the algorithm the array order contains the variable order for the highest scoring SCEG. It is also straightforward to see that the root variable $Z_{i_1}$ corresponds to the variable order[1] whilst the terminal variable $Z_{i_N}$ is stored in order[$N$]. The computational complexity of this step is linear in $N$.

Algorithm 8: Find the best variable order

Input: The set-indexed array sinks.

Output: An integer-indexed array of the variable ordering for the highest scoring CEG.

    Z^k_left ← Z
    for i ← N to 1 do
        order[i] ← sinks[Z^k_left]
        Z^k_left ← Z^k_left \ {order[i]}
    return order
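Algorithm 8 translates almost line for line into Python. The sketch below assumes that sinks is available as a dictionary keyed by frozensets, which is an implementation choice made here; the entries in the usage lines are the best sinks from the triples of Example 7.

```python
def find_best_order(variables, sinks):
    """Peel off the best sink of the remaining variables, from last to first."""
    left = frozenset(variables)
    order = [None] * len(left)
    for i in range(len(order) - 1, -1, -1):  # positions N, ..., 1 (0-based here)
        order[i] = sinks[left]
        left = left - {order[i]}
    return tuple(order)


# Only the subsets visited by the loop are needed for this illustration.
sinks = {
    frozenset("ACV"): "A",
    frozenset("CV"): "V",
    frozenset("C"): "C",
}
print(find_best_order("ACV", sinks))
```

The loop performs exactly one lookup per variable, which is the linear cost noted above.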

Example 7 (Train Booking with three demographic variables - cont.). In the second step the algorithm needs to find the best variable order. Using the triples calculated in the previous step, this task is now very simple. Starting with the full set $\mathbf{Z}$, it follows that the best sink variable is $A$. Next the algorithm identifies the best sink variable for the set $\mathbf{Z}^2_{\mathrm{left}} = \mathbf{Z} \setminus \{A\}$. The result is $V$ since $\mathbf{Z}^2_{\mathrm{left}} = \{C, V\}$. Finally the updated set $\mathbf{Z}^1_{\mathrm{left}} = \mathbf{Z}^2_{\mathrm{left}} \setminus \{V\} = \{C\}$ is a singleton set and so this step terminates. The best variable order is then given by $\mathbf{Z}(I^*) = (C, V, A)$.

Step 3: Recover the highest scoring SCEG

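Having found the best variable order $I^*$, we first have to define its corresponding event tree $T(\mathbf{Z}(I^*))$ and hyper-stage structure $H$. Next it is necessary only to run Algorithm 5 on this event tree and hyper-stage structure to obtain the best SCEG (Algorithm 9).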

Algorithm 9: Recover the highest scoring SCEG

Input: A complete data set $D$ on a set of $N$ finite discrete variables $\mathbf{Z}$, a parameter $\bar{\alpha}$ and the best variable order $I^* = (i_1, \ldots, i_N)$.

Output: The best scoring SCEG found.

1. Define the event tree $T(\mathbf{Z}(I^*))$.

2. Define the hyper-stage structure $H$ for $T(\mathbf{Z}(I^*))$, where each set $H_n$ corresponds to the variable $Z_{i_n}$.

3. Obtain the best SCEG using Algorithm 5 with inputs $D$, $T(\mathbf{Z}(I^*))$, $H$ and $\bar{\alpha}$.

This dynamic programming algorithm for CEG model search (Cowell and Smith, 2014) closely resembles the dynamic programming algorithm for BN learning (Silander and Myllymäki, 2006). The main difference is that in the dynamic programming algorithm for BN model selection there is a pre-processing step where all local scores are pre-computed. Therefore the MAP BN can be recovered quite directly and at little extra cost. This is because we do not need to run an algorithm given the best BN variable order to find the best parent configuration for each variable: the parent set associated with each variable is actually stored in memory by the algorithm. By contrast, the dynamic programming algorithm for CEG learning calculates the local scores only as required and caches them rather than recalculating them.

Despite the additional computational time required by the third step to recover the best CEG, the three-step approach adopted for CEG model search is justified because of its much reduced memory cost and its computational simplicity. The SCEG model space is far larger than the BN model space, so there are many more partitions in it whose scores would need to be computed and stored. In the BN framework a local score for a variable $Z_k$ is calculated based on an unordered set of its parents. This implies that, given a variable $Z_k$ and a subset of variables $\mathbf{Z}^k$, storing the best set of parents for $Z_k$ in $\mathbf{Z}^k$ together with the best local score is computationally cheap.

The same observation does not hold for CEGs because, whilst the variable order of $\mathbf{Z}^k$ does not change the score $Q_{U_k}$ associated with $Z_k$, different variable orders permute the leaf nodes of the event tree spanned by $\mathbf{Z}^k \cup \{Z_k\}$, where $Z_k$ is the last variable. Therefore, a fast recovery of the MAP SCEG would require us to store the best stage configuration for every pair $(Z_k, \mathbf{Z}^k(I))$, where $\mathbf{Z}^k(I)$ is a possible ordered sequence of $\mathbf{Z}^k$. This would add a complexity of factorial order to the algorithm, and so it can often exceed the cost of running Algorithm 5 to recover the MAP SCEG.
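To give a rough sense of the gap, the sketch below counts how many entries would have to be stored under each scheme, assuming one stored entry per pair. The function names are hypothetical and the count ignores the size of each stored configuration, so it is an order-of-magnitude illustration only.

```python
from math import perm


def unordered_pairs(n):
    """Pairs (variable, unordered subset of the other n - 1 variables)."""
    return n * 2 ** (n - 1)


def ordered_pairs(n):
    """Pairs (variable, ordered sequence drawn from the other n - 1 variables)."""
    return n * sum(perm(n - 1, k) for k in range(n))


for n in (3, 5, 10):
    print(n, unordered_pairs(n), ordered_pairs(n))
```

The first count grows exponentially in the number of variables, whereas the second is dominated by the $(N-1)!$ orderings of the largest predecessor set.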

4.4 Challenges and Technical Advances for CEG Model Se-

In document The dynamic chain event graph (Page 102-109)