Chapter 4 Standard Bayesian CEG Model Selection
4.3 Exhaustive CEG Model Search
4.3.2 SCEG structure learning by dynamic programming
Take a process described by a set of N random variables Z = {Z_1, ..., Z_N}. To
search over the Z-SCEG model space I will present the dynamic programming
algorithm developed by Cowell and Smith (2014). Recall that a Z-SCEG model
is characterised by the following properties:
1. Any variable order I = (i_1, ..., i_N) is compatible with the representation of
the process using an event tree T(I).
2. For any variable order I, situations are at the same distance d from the root node in T(I) if and only if they correspond to the same variable
Z_{i_{d+1}}, i_{d+1} ∈ I, or they are all leaf nodes.
3. A stage only merges situations associated with the same variable.
Let H(I) = {H_1(I), ..., H_N(I)} be the hyper-stage structure when a variable
order I is adopted. The definition of a SCEG guarantees that the hyper-stage structure H(I) is a partition of the set of situations in T(I) such that each set
H_j(I), j = 1, ..., N, gathers all the situations in T(I) associated with the variable Z_{i_j},
and only those.
The additive modularity of the log posterior probability of a SCEG then guarantees that removing the last variable Z_{i*_N}, i*_N ∈ I*, from the variable set does not change
the best variable order for the remaining N − 1 variables. Explicitly, if
I* = (i*_1, ..., i*_N) is the best variable order for a CEG model to represent a process
described by the variable set Z, then I*_{N−1} = (i*_1, ..., i*_{N−1}), I*_{N−1} ⊂ I*, is the
best variable order for a CEG model to express the subprocess corresponding to the variable set Z \ {Z_{i*_N}}.
Example 7 (Train Booking with three demographic variables). Return to the example of the train booking described in Section 3.1. Suppose that the decision maker wants to understand the interactions between only the demographic variables Country (C), Visit (V) and Age (A). So, the set Z = {A, C, V} spans six Z-compatible event trees given by the six possible permutations of
these variables.
From equation 4.10 we have that

Q(C) = Σ_{n=1}^{3} Q_{U_n}(C), for every candidate model C, (4.12)

where U_1, U_2 and U_3 are, respectively, the stage structures associated with variables A, C and V. The score Q_{U_i}(C), i = 1, 2, 3, depends on the variable order. However, given a known variable order, each highest scoring stage structure U_n, n = 1, 2, 3, can be found independently of the others. In particular, if the variable order Z(I*) = (C, V, A) provides the MAP SCEG C*, then the
best variable order for the set Z^2 = {C, V} is necessarily Z^2(I*) = (C, V).
In general, the problem of finding the best variable order for N − 1 variables constitutes a subproblem of discovering the best variable order for N variables. Therefore, for every subset
Z^k = {Z_{i_1}, ..., Z_{i_k}} ⊆ Z, k = 1, ..., N, we need to find the best sink variable Z_i ∈ Z^k given that we have already found the best variable order for every subset
Z^{k−1} ⊂ Z^k. Embedding this recursive structure into a dynamic programming algorithm enables us to search the whole SCEG model space efficiently.
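The recursive structure just described can be written as a single recurrence. The display below is a sketch rather than a formula taken from Cowell and Smith (2014): the symbol q* for the best score of a subset is notation assumed here, bold Z^k denotes a subset of Z with k variables, and BLS is the best-local-score function used in Algorithm 7.

```latex
\[
  q^{*}(\emptyset) = 0, \qquad
  q^{*}(\mathbf{Z}^{k}) = \max_{Z_i \in \mathbf{Z}^{k}}
  \Bigl\{ \mathrm{BLS}\bigl(Z_i,\ \mathbf{Z}^{k} \setminus \{Z_i\}\bigr)
        + q^{*}\bigl(\mathbf{Z}^{k} \setminus \{Z_i\}\bigr) \Bigr\},
  \qquad k = 1, \ldots, N .
\]
```

The maximising Z_i is the best sink variable of Z^k. Because q* of a subset depends only on strictly smaller subsets, evaluating the subsets in order of increasing size guarantees that every term on the right-hand side is already available.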
The general algorithm for learning SCEGs is given in Algorithm 6. I will further explain each of its three steps below.
Algorithm 6: Find the best scoring SCEG when no variable order is specified
Input: A complete data set D on a set of N finite discrete variables Z
and a parameter ᾱ.
Output: The best scoring SCEG found.
1 Discover the best sink variable for all 2^N − 1 non-empty subsets of Z.
2 Find the best variable order I* = (i_1, ..., i_N).
3 Recover the highest scoring SCEG using I*.
Step 1: Discover the best sink variable
Algorithm 7 is the most computationally intensive step of the general dynamic programming
algorithm for Z-SCEG model search. It begins by initialising two
2^N-size arrays, scores and sinks, where each element corresponds to a subset of
Z. It then proceeds to determine the best sink variable of each non-empty subset
of Z by examining them in order of increasing size, starting with the singleton subsets.
For every variable Z_n in a set Z^{k+1} it is necessary to calculate the local score of
the best staged tree spanned by the set of variables Z^k ∪ {Z_n} such that Z_n is
the sink variable and Z^k = Z^{k+1} \ {Z_n}. To do this, the algorithm requires a
local auxiliary variable scoreL and a function BLS(Z_n, Z^k). The best score
associated with Z^k has already been computed and stored in scores since the
algorithm looks at subsets ordered by increasing size. So the function BLS(Z_n, Z^k)
only needs to return the best local score associated
with the sink variable Z_n. Observe that this does not require the best variable
order of Z^k.
Algorithm 7: Find the best sink variables for every non-empty subset of Z.
Input: A complete data set D on a set of N finite discrete variables Z
and a parameter ᾱ.
Output: A set-indexed array sinks that for each non-empty subset Z^k ⊆ Z returns
the sink variable for the highest scoring CEG spanned by Z^k.
scores[∅] ← 0
for k in 1 → N do
    for Z^k ⊆ Z such that |Z^k| = k do
        scores[Z^k] ← 0
        sinks[Z^k] ← −1
        for Z_i ∈ Z^k do
            Z^{k−1} ← Z^k \ {Z_i}
            scoreL ← BLS(Z_i, Z^{k−1}) + scores[Z^{k−1}]
            if sinks[Z^k] = −1 or scoreL > scores[Z^k] then
                scores[Z^k] ← scoreL
                sinks[Z^k] ← Z_i
return sinks
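A minimal Python sketch of Algorithm 7 follows. It is an illustration under stated assumptions, not the implementation of Cowell and Smith (2014): subsets are represented as frozensets, the function name find_best_sinks is invented here, and the local scoring function is passed in as a callable bls(sink, preceding) because computing a real best local score requires a stage-structure search that is outside this sketch. The toy_bls function at the end is purely hypothetical and only exercises the recursion.

```python
from itertools import combinations


def find_best_sinks(variables, bls):
    """Return (sinks, scores), both indexed by frozenset subsets of `variables`.

    `bls(sink, preceding)` is an assumed callable giving the best local score
    of `sink` when the variables in the frozenset `preceding` come before it.
    """
    variables = list(variables)
    scores = {frozenset(): 0.0}   # the empty subset contributes nothing
    sinks = {}
    # Subsets are visited by increasing size, so scores[rest] always exists.
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            subset = frozenset(combo)
            best_sink, best_score = None, float("-inf")
            for candidate in combo:
                rest = subset - {candidate}
                score_l = bls(candidate, rest) + scores[rest]
                if best_sink is None or score_l > best_score:
                    best_sink, best_score = candidate, score_l
            sinks[subset] = best_sink
            scores[subset] = best_score
    return sinks, scores


# Hypothetical local score, used only to exercise the recursion: each variable
# has an invented weight that is multiplied by the number of predecessors.
WEIGHTS = {"A": 3.0, "V": 2.0, "C": 1.0}


def toy_bls(sink, preceding):
    return WEIGHTS[sink] * len(preceding)


sinks, scores = find_best_sinks(["A", "C", "V"], toy_bls)
print(sinks[frozenset("ACV")], scores[frozenset("ACV")])
```

Indexing by frozenset keeps the sketch close to the set-indexed arrays of the pseudocode; a bitmask encoding of subsets would be a natural alternative when N is larger.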
Example 7 (Train Booking with three demographic variables - cont.). Return to Example 7. Now suppose that the decision maker asks his analyst to model that problem. The analyst decides to present the MAP SCEG C* to the
decision maker. For this purpose, he uses the dynamic programming Algorithm 6. In the first step the algorithm needs to find a triple (Z^k, Z*, q) for every non-empty set Z^k ⊆ Z, where Z* is the best sink variable for Z^k and q is the highest score
associated with Z^k. The algorithm starts from the singleton subsets and so it obtains
the following triples: ({A}, A, q_1), ({V}, V, q_2) and ({C}, C, q_3).
Next the algorithm examines the sets of size two. For instance, take the set
Z^2 = {A, V}. To find the best sink variable for this subset it is necessary to compare the highest scoring staged trees associated with the variable orders
Z^2(I_1) = (A, V) and Z^2(I_2) = (V, A).
Note that for the variable order Z^2(I_1) we need to compute only the best score q_a
associated with V since the score of A has already been computed and is equal to q_1. Analogously, for the variable order Z^2(I_2) we have to find only the best score q_b for A. Now assume that q_1 + q_a < q_2 + q_b. Then, with regard to
the set Z^2, the best sink variable is A and its highest score is q_4 = q_2 + q_b. Doing
similar computations for the other two subsets of size two, the algorithm provides the following triples: ({A, V}, A, q_4), ({A, C}, C, q_5) and ({V, C}, V, q_6).
To finalise step 1 the algorithm needs to search for the best sink variable in the set Z. There are three possible candidates: A, V and C. For example,
take the variable A. In this case, we have to find the score of the MAP staged tree T(Z) when A is the sink variable. This requires only computing the
score q_c of A and then adding it to the score q_6 that was previously calculated. This is because the best variable order of a staged tree does not change if the sink variable is eliminated. Repeating the same procedure for the other two candidate variables we obtain the score q_d for V and q_e for C. Now assume that
q_6 + q_c > q_5 + q_d > q_4 + q_e. It then follows that the algorithm stores the triple
({A, V, C}, A, q_7), where q_7 = q_6 + q_c.
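The bookkeeping of this example can be traced numerically. In the sketch below every number is hypothetical: the text names the local scores only symbolically, so the values in the BLS table were invented so that the assumed inequalities hold (q_1 + q_a < q_2 + q_b, and q_6 + q_c > q_5 + q_d > q_4 + q_e). The function name step_one is likewise an assumption made for illustration.

```python
from itertools import combinations

# Hypothetical best local scores BLS(sink, preceding set). None of these
# numbers comes from the train booking data; they are invented for illustration.
BLS = {
    ("A", frozenset()): -10.0,       # q_1
    ("V", frozenset()): -8.0,        # q_2
    ("C", frozenset()): -9.0,        # q_3
    ("V", frozenset("A")): -12.0,    # q_a
    ("A", frozenset("V")): -11.0,    # q_b
    ("C", frozenset("A")): -10.0,
    ("A", frozenset("C")): -13.0,
    ("V", frozenset("C")): -9.0,
    ("C", frozenset("V")): -12.0,
    ("A", frozenset("VC")): -14.0,   # q_c
    ("V", frozenset("AC")): -13.0,   # q_d
    ("C", frozenset("AV")): -15.0,   # q_e
}


def step_one(variables):
    """Return {subset: (best sink, best score)} for every non-empty subset."""
    best = {frozenset(): (None, 0.0)}
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            subset = frozenset(combo)
            candidates = []
            for sink in combo:
                rest = subset - {sink}
                # local score of the sink plus the stored score of the rest
                candidates.append((BLS[(sink, rest)] + best[rest][1], sink))
            score, sink = max(candidates)
            best[subset] = (sink, score)
    del best[frozenset()]
    return best


triples = step_one(["A", "V", "C"])
for subset, (sink, score) in triples.items():
    print(sorted(subset), sink, score)
```

With these invented values the seven triples have the same sink variables as in the example: A, V and C for the singletons, A for {A, V}, C for {A, C}, V for {V, C}, and A for the full set.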
Step 2: Find the best order of the best sinks
Algorithm 8 now finds the best order of the best sink variables, starting with the complete set Z. Assume that at iteration k, k = N, ..., 1, we have to determine
the best sink variable for a set of variables Z^k_left ⊆ Z. For this purpose, the
algorithm first recovers the best sink variable Z_{i_k} of Z^k_left from the set-indexed array
sinks. Next it removes Z_{i_k} from Z^k_left and then begins iteration k − 1. The
variable Z_{i_k} is stored in the k-th element of an N-dimensional integer-indexed array
order of variables.
By carrying out these iterations in decreasing order, the array order contains, by the end of the algorithm, the variable order for the highest scoring SCEG. It is also straightforward to see that the root variable Z_{i_1}
corresponds to the variable order[1] whilst the terminal variable Z_{i_N} is stored
in order[N]. The computational complexity of this step is linear in N.
Algorithm 8: Find the best variable order
Input: The set-indexed array sinks.
Output: An integer-indexed array of the variable ordering for the highest scoring CEG.
Z^N_left ← Z
for i ← N to 1 do
    order[i] ← sinks[Z^i_left]
    Z^{i−1}_left ← Z^i_left \ {order[i]}
return order
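The backward pass of Algorithm 8 is short enough to sketch directly. The Python below is an illustration only: the function name find_best_order and the frozenset-keyed dictionary are assumptions, and the sinks entries are copied from the triples of Example 7 rather than computed from data. Python lists are indexed from 0, so order[0] holds the root variable and order[N − 1] the sink variable.

```python
def find_best_order(variables, sinks):
    """Read the best variable order off a set-indexed table of best sinks.

    `sinks` maps a frozenset of variables to its best sink variable; only the
    N subsets met while peeling off sinks are ever looked up.
    """
    left = frozenset(variables)
    n = len(left)
    order = [None] * n
    for i in range(n - 1, -1, -1):   # fill positions from the last to the first
        order[i] = sinks[left]
        left = left - {order[i]}
    return order


# Best sinks taken from the triples of Example 7 (only the entries needed here).
sinks = {
    frozenset("ACV"): "A",
    frozenset("CV"): "V",
    frozenset("C"): "C",
}
order = find_best_order(["A", "C", "V"], sinks)
print(order)
```

Each iteration performs one table lookup and one set difference, which is where the linear number of iterations in this step comes from.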
Example 7 (Train Booking with three demographic variables - cont.). In the second step the algorithm needs to find the best variable order. Using the triples calculated in the previous step, this task is now very simple. Starting with the full set Z it
follows that the best sink variable is A. Next the algorithm identifies the best sink variable for the set Z^2_left = Z \ {A} = {C, V}, which is V.
Finally, the updated set Z^1_left = Z^2_left \ {V} = {C} is a singleton set and so this
step terminates. The best variable order is then given by Z(I*) = (C, V, A).
Step 3: Recover the highest scoring SCEG
Having found the best variable order I*, we first have to define its corresponding
event tree T(Z(I*)) and hyper-stage structure H. Next it is necessary only to
run Algorithm 5 with these inputs to obtain the best SCEG, as summarised in Algorithm 9.
Algorithm 9: Recover the highest scoring SCEG
Input: A complete data set D on a set of N finite discrete variables Z, a
parameter ᾱ and the best variable order I* = (i_1, ..., i_N).
Output: The best scoring SCEG found.
1 Define the event tree T(Z(I*)).
2 Define the hyper-stage structure H for T(Z(I*)), where each set H_n
corresponds to the variable Z_{i_n}.
3 Obtain the best SCEG using Algorithm 5 with inputs D, T(Z(I*)), H
and ᾱ.
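The first two steps of Algorithm 9 can be sketched as follows. The code is a hedged illustration: the function name hyper_stage_structure, the representation of a situation as a tuple of (variable, value) pairs along its root-to-node path, and the level labels c1, v1, a1 and so on are all assumptions made here, not the levels of the train booking data. Algorithm 5 is not reproduced; the output of this sketch is what would be passed to it together with D and ᾱ.

```python
from itertools import product


def hyper_stage_structure(order, levels):
    """Group the situations of the event tree T(Z(I)) by the variable they carry.

    `order` is the variable order and `levels[v]` lists the values of variable v.
    The n-th hyper-stage collects every path through the first n - 1 variables.
    """
    hyper_stages = []
    for n, variable in enumerate(order):
        preceding = order[:n]
        situations = [
            tuple(zip(preceding, values))
            for values in product(*(levels[v] for v in preceding))
        ]
        hyper_stages.append((variable, situations))
    return hyper_stages


# Hypothetical levels, chosen only to make the example concrete.
levels = {"C": ["c1", "c2"], "V": ["v1", "v2"], "A": ["a1", "a2", "a3"]}
H = hyper_stage_structure(["C", "V", "A"], levels)
for variable, situations in H:
    print(variable, len(situations))
```

The first hyper-stage contains only the root situation, represented by the empty path, and each later hyper-stage contains one situation per combination of values of the preceding variables, so the sets partition the situations of the tree by depth.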
This dynamic programming algorithm for CEG model search (Cowell and Smith, 2014) closely resembles the dynamic programming algorithm for BN learning (Silander and Myllymaki, 2006). The main difference is that in the dynamic programming algorithm for BN model selection there is a pre-processing step where all local scores are pre-computed. Therefore the MAP BN can be recovered quite directly and at little extra cost. This is because we do not need to run an algorithm given the best BN variable order to find the best parent configuration for each variable: the parent set associated with each variable is actually stored in memory by the algorithm. The dynamic programming algorithm for CEG learning instead calculates the local scores as required and caches them. Despite the additional computational time required by the third step to recover the best CEG, the three-step approach adopted for CEG model search is justified because of its much reduced memory cost and also its computational simplicity.
The SCEG model space is far larger than the BN model space, so there are many more partitions in it whose scores would need to be computed and stored. In the BN framework a local score for a variable Z_k is calculated based on an unordered
set of its parents. This implies that, given a variable Z_k and a subset of variables Z^k,
storing the best set of parents for Z_k in Z^k together with the best local score is
computationally cheap. The same observation does not hold for CEGs because, whilst the variable order of Z^k does not change the score Q_{U_k} associated with
Z_k, different variable orders permute the leaf nodes of the event tree spanned by Z^k ∪ {Z_k}, where Z_k
is the last variable. Therefore, a fast recovery of the MAP SCEG would require us to store the best stage configuration for every pair consisting of a variable Z_k and a
possible ordered sequence of Z^k. This would add a complexity of factorial order
to the algorithm and so can often exceed the cost of running Algorithm 5 to recover the MAP SCEG.
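The difference in storage can be made concrete by counting table entries. The sketch below is a simple counting argument under stated assumptions, not a measurement of either algorithm: it counts one entry per pair (variable, unordered set of other variables) for the BN case and one entry per pair (variable, ordered sequence of other variables) for the CEG case. The function names are invented for illustration.

```python
from math import perm


def bn_cache_entries(n):
    """Pairs (variable, unordered subset of the other n - 1 variables)."""
    return n * 2 ** (n - 1)


def ceg_cache_entries(n):
    """Pairs (variable, ordered sequence drawn from the other n - 1 variables)."""
    return n * sum(perm(n - 1, k) for k in range(n))


for n in (3, 5, 10):
    print(n, bn_cache_entries(n), ceg_cache_entries(n))
```

The second count contains the term (N − 1)! for the sequences that use all the other variables, which is the factorial growth referred to above, whereas the first count grows only exponentially in N.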