
Chapter 4 Standard Bayesian CEG Model Selection

4.4 Challenges and Technical Advances for CEG Model Selection

Learning a BN corresponds to learning a restricted set of partitions, which prevents us from exploring the context-specific conditional independences and possible asymmetries in the development of a process. In contrast, the CEG model space is structurally more flexible and is therefore more expressive in its graphical representation of conditional independences. However, these advantages come at a computational cost for CEG model selection, because the CEG model space over a probability space with even a moderate number of atoms is enormous and dwarfs the corresponding BN model space by orders of magnitude.

Thus consider a $\boldsymbol{Z}$-SCEG model space $\mathbb{C}$ spanned by a set of $N \geq 2$ discrete random variables $\boldsymbol{Z} = \{Z_1, Z_2, \ldots, Z_N\}$, where each random variable $Z_n$ has a finite number $L_n$ of categories. Let $M_n(I)$ be the number of situations associated with the $n$th variable $X_n = Z_{i_n}$ in a $\boldsymbol{Z}$-compatible event tree $T(\boldsymbol{X}(I))$, where $I = (i_1, \ldots, i_N)$ is a permutation of the variable indices. Since in $T(\boldsymbol{X}(I))$ every situation at distance $k$ from the root situation $s_0$ has $L_{i_{k+1}}$ children, it follows that

$$M_n(I) = \begin{cases} 1, & n = 1,\\ \prod_{j=1}^{n-1} L_{i_j}, & n = 2, \ldots, N. \end{cases}$$

The total number of partitions of these situations is then given by the $M_n(I)$th Bell number

$$B_{M_n(I)} = \sum_{i=0}^{M_n(I)-1} \binom{M_n(I)-1}{i} B_i, \qquad B_0 = 1$$

(Spivey, 2008). Recall that the Bell numbers $B_i$ grow very fast with $i$; for instance, $B_1 = 1$, $B_2 = 2$, $B_4 = 15$, $B_8 = 4{,}140$ and $B_{16} \approx 10^{10}$. Now remember that each partition constitutes a different stage structure $U_n$, and so each combination of stage structures under a given variable order $I$ corresponds to a distinct SCEG model. The cardinality of the SCEG model space is therefore

$$|\mathbb{C}| = \sum_{I \in \mathcal{I}} \prod_{n=1}^{N} B_{M_n(I)}, \tag{4.13}$$

where $\mathcal{I}$ is the set of all possible permutations $I$.
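To make Equation (4.13) concrete, the following Python sketch computes Bell numbers with the recurrence above and sums $\prod_n B_{M_n(I)}$ over all variable orders. The function names are illustrative only, and enumerating every permutation is practical only for small $N$.

```python
from itertools import permutations
from math import comb, prod


def bell(m):
    """Bell number B_m from B_m = sum_{i=0}^{m-1} C(m-1, i) B_i, with B_0 = 1."""
    b = [1]  # b[i] holds B_i
    for n in range(1, m + 1):
        b.append(sum(comb(n - 1, i) * b[i] for i in range(n)))
    return b[m]


def situations_per_level(order, levels):
    """M_n(I) for n = 1, ..., N: the product of L_{i_j} over the preceding variables."""
    counts, m = [], 1
    for var in order:
        counts.append(m)
        m *= levels[var]
    return counts


def sceg_space_size(levels):
    """|C| in Equation (4.13): sum over all orders I of prod_n B_{M_n(I)}."""
    return sum(
        prod(bell(m) for m in situations_per_level(order, levels))
        for order in permutations(range(len(levels)))
    )


print([bell(i) for i in (1, 2, 4, 8, 16)])  # [1, 2, 15, 4140, 10480142147]
print(sceg_space_size([2, 2, 2, 2]))        # 2980800
```

For four binary variables every order gives $M_n(I) = 1, 2, 4, 8$, so each of the $4! = 24$ orders contributes $1 \cdot 2 \cdot 15 \cdot 4{,}140$ models.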

This implies that the size of this space grows super-exponentially, at the rate of the Bell numbers, and depends not only on the number of variables but also on the number of categories of each variable. Searching over the SCEG model space is therefore computationally far more challenging than searching over the corresponding BN model space: there are far more partitions to explore.

For instance, consider a process defined by a set of four binary random variables whose order is known. Learning a BN model requires us to calculate only 15 ($\sum_{i=1}^{4} 2^{i-1}$) local scores, whilst learning a SCEG model requires the computation of 4,158 ($\sum_{i=1}^{4} B_{2^{i-1}} = 1 + 2 + 15 + 4{,}140$) local scores. Note that even in this simple example learning a SCEG model demands the calculation of about 277 times as many local scores as learning a BN model, and so it requires much more computational time and memory. Of course, the computation of CEG local scores can be abbreviated by using the fact that these 4,158 scores are composed of only 279 distinct sub-partition scores. Even in this case, we must compute about 19 times as many scores to learn a CEG model as to learn a BN model. On the other hand, this approach requires more memory, since the sub-partition scores must be stored. However, this extra memory cost is more than justified by the saving in computational time. For empirical studies of the computational time required to learn a CEG model, see Silander and Leong (2013). Further discussion of the computational cost associated with learning a CEG model can be found in Cowell and Smith (2014).
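The counts in this example, and the idea of reusing sub-partition scores, can be checked with a short Python sketch. It assumes that a partition's score is the sum of the scores of its blocks; `block_score` is a hypothetical placeholder for a real cluster score (for instance a Dirichlet-multinomial marginal likelihood), so only the counting and caching logic is meaningful here.

```python
from math import comb


def bell(m):
    """Bell number B_m via B_m = sum_{i=0}^{m-1} C(m-1, i) B_i, with B_0 = 1."""
    b = [1]
    for n in range(1, m + 1):
        b.append(sum(comb(n - 1, i) * b[i] for i in range(n)))
    return b[m]


def local_score_counts(num_binary_vars):
    """Local scores needed for a known order of binary variables: (BN, SCEG)."""
    levels = range(1, num_binary_vars + 1)
    bn = sum(2 ** (i - 1) for i in levels)          # parent sets of each variable
    sceg = sum(bell(2 ** (i - 1)) for i in levels)  # stage partitions of each level
    return bn, sceg


def set_partitions(items):
    """Yield each partition of the list `items` exactly once, as a list of tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [(first,)] + part                     # `first` in a new block
        for j in range(len(part)):                  # or joined to block j
            yield part[:j] + [(first,) + part[j]] + part[j + 1:]


def best_partition(m, block_score):
    """Exhaustively score all B_m partitions of m situations.

    Each block (sub-partition) score is computed once and cached.
    Returns (best score, number of partitions, number of distinct block scores)."""
    cache = {}
    best, n_partitions = float("-inf"), 0
    for part in set_partitions(list(range(m))):
        score = 0.0
        for block in part:
            key = frozenset(block)
            if key not in cache:
                cache[key] = block_score(key)
            score += cache[key]
        best = max(best, score)
        n_partitions += 1
    return best, n_partitions, len(cache)


print(local_score_counts(4))                          # (15, 4158)
print(best_partition(8, lambda b: -len(b) ** 2))      # (-8.0, 4140, 255)
```

At the last level of the tree the 4,140 partitions of 8 situations are all scored from 255 cached block scores, one for each non-empty subset of the situations.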

Therefore, the dynamic programming search method quickly becomes infeasible as the number of random variables in $\boldsymbol{Z}$ increases to even a moderate size.

In this case, heuristic search strategies such as agglomerative hierarchical clustering (AHC) are needed to scale the search up to larger SCEG model spaces (Silander and Leong, 2013; Cowell and Smith, 2014). A promising fast approximation is to embed the heuristic within the dynamic programming algorithm (Silander and Leong, 2013). Exploring this alternative, Silander and Leong (2013) were able to search over model spaces defined by up to 18 random variables in less than 10 minutes. These authors showed empirically that the AHC approach performed better than K-means clustering when each is used in conjunction with the dynamic programming model search. However, the AHC algorithm is much slower.
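As an illustration of the heuristic idea, here is a minimal Python sketch of agglomerative hierarchical clustering of the situations at one level into stages. It assumes the usual conjugate Dirichlet-multinomial score for a stage, with counts and Dirichlet hyperparameters added when situations are merged; the function names are hypothetical, and this greedy variant is not necessarily the one used by Silander and Leong (2013).

```python
from math import lgamma


def log_dm(counts, alpha):
    """Log marginal likelihood of one stage under a Dirichlet(alpha) prior."""
    return (lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
            + sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, counts)))


def add(u, v):
    return [x + y for x, y in zip(u, v)]


def ahc_stages(counts, alphas):
    """Greedy bottom-up merging of situations into stages.

    counts[s] / alphas[s]: edge counts and Dirichlet prior of situation s.
    Start from singleton stages; repeatedly merge the pair of stages whose
    merger most increases the total log score; stop when no merger helps."""
    stages = [([s], list(c), list(a)) for s, (c, a) in enumerate(zip(counts, alphas))]
    while len(stages) > 1:
        best_gain, best_pair = 0.0, None
        for i in range(len(stages)):
            for j in range(i + 1, len(stages)):
                _, ci, ai = stages[i]
                _, cj, aj = stages[j]
                gain = (log_dm(add(ci, cj), add(ai, aj))
                        - log_dm(ci, ai) - log_dm(cj, aj))
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no merger increases the score
        i, j = best_pair
        (si, ci, ai), (sj, cj, aj) = stages[i], stages[j]
        merged = (si + sj, add(ci, cj), add(ai, aj))
        stages = [st for k, st in enumerate(stages) if k not in (i, j)] + [merged]
    total = sum(log_dm(c, a) for _, c, a in stages)
    return [sorted(s) for s, _, _ in stages], total


counts = [[50, 50], [48, 52], [90, 10], [88, 12]]
alphas = [[1.0, 1.0]] * 4
stages, score = ahc_stages(counts, alphas)
print(stages)  # [[0, 1], [2, 3]]
```

Each pass compares all pairs of current stages, so this sketch costs $O(M^3)$ score evaluations for $M$ situations, far fewer than the $B_M$ partitions of an exhaustive search.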

To implement these approximations, I note that it is only necessary to rewrite the function BLS$(Z_i, \boldsymbol{Z}_k)$ used in Algorithm 7. Instead of evaluating the scores of all possible stage structures, this function now finds the best stage partition $U_n$ associated with the variable $X_n$ in a set $\boldsymbol{Z}_k$ using the adopted heuristic algorithm.
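The point that only BLS changes can be illustrated with a Python sketch in which the partition search is a parameter. Everything here is an assumption for illustration: the signature `bls(sink, preds, data, levels, alpha_bar, search)`, the representation of the data set $D$ as tuples of category indices, and a prior that spreads $\bar{\alpha}$ uniformly over the situations and edges of the level. The exhaustive search shown could be swapped for any heuristic with the same `(counts, alphas) -> (score, partition)` signature.

```python
from itertools import product
from math import lgamma


def log_dm(counts, alpha):
    """Log marginal likelihood of one stage under a Dirichlet(alpha) prior."""
    return (lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
            + sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, counts)))


def set_partitions(items):
    """Yield each partition of the list `items` exactly once, as a list of tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [(first,)] + part
        for j in range(len(part)):
            yield part[:j] + [(first,) + part[j]] + part[j + 1:]


def exhaustive_search(counts, alphas):
    """Score every stage partition of the situations; return (score, partition)."""
    edges = range(len(counts[0]))
    best = None
    for part in set_partitions(list(range(len(counts)))):
        score = sum(log_dm([sum(counts[s][e] for s in block) for e in edges],
                           [sum(alphas[s][e] for s in block) for e in edges])
                    for block in part)
        if best is None or score > best[0]:
            best = (score, part)
    return best


def bls(sink, preds, data, levels, alpha_bar, search=exhaustive_search):
    """Hypothetical BLS(Z_i, Z_k): best local score of `sink` when the variables
    in `preds` precede it. Each situation is one configuration of `preds`."""
    preds = sorted(preds)
    situations = list(product(*(range(levels[v]) for v in preds)))
    index = {s: k for k, s in enumerate(situations)}
    counts = [[0] * levels[sink] for _ in situations]
    for row in data:
        counts[index[tuple(row[v] for v in preds)]][row[sink]] += 1
    # Assumption: alpha_bar is shared equally by all situations and edges.
    a = alpha_bar / (len(situations) * levels[sink])
    alphas = [[a] * levels[sink] for _ in situations]
    return search(counts, alphas)


data = [(0, 0)] * 40 + [(0, 1)] * 2 + [(1, 1)] * 40 + [(1, 0)] * 2
score, stages = bls(1, {0}, data, [2, 2], alpha_bar=4.0)
print(stages)  # [(0,), (1,)]: Z_1 behaves differently in the two situations
```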

During the modelling process, the identification of a partial order for the variables in $\boldsymbol{Z}$ based on domain information may enable modellers to reduce the computational complexity of these full search methods. In particular, the definition of a block order, as I propose in Definition 25, provides us with a well-ordered partition of $\boldsymbol{Z}$. This enables us to greatly reduce the space of allowed models over which the search needs to be carried out. Thus, to find the highest scoring SCEG it suffices to search the CEG model subspace consisting of the models obtained by permuting the variables within each block $B_b$, $b = 1, \ldots, B$.

Definition 25 (Variable Block Order). Take a set $\boldsymbol{Z} = \{Z_1, \ldots, Z_N\}$ of $N$ discrete random variables. A block order of $\boldsymbol{Z}$ is a partition $\mathcal{B} = (B_1, \ldots, B_B)$ such that a $\boldsymbol{Z}$-SCEG $C(T(I))$, where $I = (i_1, \ldots, i_N)$, has non-zero prior probability if and only if, for any pair of variables $Z_{i_n} \in B_{b_1}$ and $Z_{i_{n+1}} \in B_{b_2}$, $n = 1, \ldots, N-1$, we have $b_1 \leq b_2$.
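A small Python sketch of Definition 25, with variables represented by hypothetical string labels: it tests the consecutive-pair condition and enumerates the orders that the condition allows. The number of allowed orders is $\prod_b |B_b|!$ rather than $N!$, which is where the reduction of the search space comes from.

```python
from itertools import permutations, product


def consistent_with_block_order(order, blocks):
    """Definition 25: consecutive variables never move to an earlier block."""
    block_of = {v: b for b, block in enumerate(blocks) for v in block}
    return all(block_of[u] <= block_of[v] for u, v in zip(order, order[1:]))


def allowed_orders(blocks):
    """Orders with non-zero prior probability: permute within blocks, keep block sequence."""
    for perms in product(*(permutations(block) for block in blocks)):
        yield tuple(v for perm in perms for v in perm)


blocks = [("Z1",), ("Z2", "Z3"), ("Z4",)]
print(list(allowed_orders(blocks)))
# [('Z1', 'Z2', 'Z3', 'Z4'), ('Z1', 'Z3', 'Z2', 'Z4')]
print(consistent_with_block_order(("Z2", "Z1", "Z3", "Z4"), blocks))  # False
```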

Algorithm 10, which I developed, implements this idea by adding to Algorithm 7 an outer loop over the blocks. Note that the function BLS and the other steps of the algorithm do not change. Also observe that the concept of block order and its algorithmic implementation are new developments of this thesis.

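Algorithm 10: Find the best sink variables for every non-empty subset of $\boldsymbol{Z}$ consistent with a block ordering $\mathcal{B}$.

Input: A complete data set $D$ on a set of $N$ finite discrete variables $\boldsymbol{Z}$, a block ordering $\mathcal{B} = (B_1, \ldots, B_B)$ and a parameter $\bar{\alpha}$.
Output: A set-indexed array sinks that, for each subset $\boldsymbol{Z}_l \subset \boldsymbol{Z}$ consistent with the block ordering, returns the sink variable of the highest scoring SCEG spanned by $\boldsymbol{Z}_l$.

1  $l \leftarrow 0$
2  for $b$ in $1 \to B$ do
3      for $k$ in $1 \to |B_b|$ do
4          $l \leftarrow l + 1$
5          for $B_b^k \subset B_b$ such that $|B_b^k| = k$ do
6              $\boldsymbol{Z}_l = \bigcup_{j=0}^{b-1} B_j \cup B_b^k$, where $B_0 = \emptyset$
7              scores[$\boldsymbol{Z}_l$] $\leftarrow 0$
8              sinks[$\boldsymbol{Z}_l$] $\leftarrow -1$
9              for $Z_i \in B_b^k$ do
10                 $\boldsymbol{Z}_l(-1) \leftarrow \boldsymbol{Z}_l \setminus \{Z_i\}$
11                 scoreL $\leftarrow$ BLS($Z_i$, $\boldsymbol{Z}_l(-1)$) + scores[$\boldsymbol{Z}_l(-1)$]
12                 if sinks[$\boldsymbol{Z}_l$] $= -1$ or scoreL $>$ scores[$\boldsymbol{Z}_l$] then
13                     scores[$\boldsymbol{Z}_l$] $\leftarrow$ scoreL
14                     sinks[$\boldsymbol{Z}_l$] $\leftarrow Z_i$
15 return sinks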
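A minimal Python sketch of Algorithm 10, assuming BLS is supplied as a function `bls(z_i, preds)` (a hypothetical signature). Sets are keyed by `frozenset`, so the counter $l$ of line 4 is not needed, and `None` plays the role of $-1$. The helper `best_order` is an illustrative addition that reads the highest scoring order off the sinks array by repeatedly removing the sink of the remaining set.

```python
from itertools import combinations


def find_best_sinks(blocks, bls):
    """Sketch of Algorithm 10; returns (sinks, scores) keyed by frozensets."""
    scores = {frozenset(): 0.0}   # score of the empty set
    sinks = {}
    prefix = frozenset()          # union of B_0, ..., B_{b-1}, with B_0 empty
    for block in blocks:                                  # line 2
        for k in range(1, len(block) + 1):                # line 3
            for subset in combinations(block, k):         # line 5
                z_l = prefix | frozenset(subset)          # line 6
                scores[z_l], sinks[z_l] = 0.0, None       # lines 7-8
                for z_i in subset:                        # line 9
                    rest = z_l - {z_i}                    # line 10
                    score_l = bls(z_i, rest) + scores[rest]          # line 11
                    if sinks[z_l] is None or score_l > scores[z_l]:  # line 12
                        scores[z_l], sinks[z_l] = score_l, z_i       # lines 13-14
        prefix |= frozenset(block)
    return sinks, scores


def best_order(blocks, sinks):
    """Recover the highest scoring variable order by peeling off sinks."""
    remaining = frozenset(v for block in blocks for v in block)
    order = []
    while remaining:
        sink = sinks[remaining]
        order.append(sink)
        remaining = remaining - {sink}
    return order[::-1]


target = ["Z1", "Z2", "Z3", "Z4"]


def toy_bls(z, preds):
    # Hypothetical score: 1 when preds are exactly z's predecessors in `target`.
    return 1.0 if preds == frozenset(target[:target.index(z)]) else 0.0


blocks = [("Z1",), ("Z2", "Z3"), ("Z4",)]
sinks, scores = find_best_sinks(blocks, toy_bls)
print(best_order(blocks, sinks))  # ['Z1', 'Z2', 'Z3', 'Z4']
print(len(sinks))                 # 5 subsets instead of 2**4 - 1 = 15
```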

Parallel computation is a good option for speeding up exhaustive model searches. I now briefly propose some original ways to implement it using the algorithms discussed previously. The key observation here is that the local scores $Q_{U_n}$ associated with a variable $Z_{i_n}$ at level $\ell_{n-1}$ in the event tree can be computed independently of the local scores of variables at other levels. The speed-up can be substantial, especially for the last levels of large event trees. When a variable order is known, the loop over the sequence of variables $\boldsymbol{Z}(I)$ (line 5 of Algorithm 5) can be parallelised directly. In the case of a full search without a variable order, parallel programming can easily be implemented over the intra-level loop that finds the best sink variables. This corresponds to parallelising the computation of the inner loop over the set of variables $\boldsymbol{Z}_n$ (line 5 of Algorithm 7). If we have a block order, parallel computation can then be introduced over the blocks (line 2 of Algorithm 10) and inside the blocks (line 7 of Algorithm 10).
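For the known-order case, a Python sketch of the level-wise parallelism might look as follows. Each level is described by assumed per-situation edge counts and Dirichlet hyperparameters, and the per-level job is a small exhaustive stage search. `ThreadPoolExecutor` keeps the sketch simple; for CPU-bound pure-Python scoring, `ProcessPoolExecutor` from the same module offers the same `map` interface and avoids the global interpreter lock, provided the worker function is defined at module level so it can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor
from math import lgamma


def log_dm(counts, alpha):
    """Log marginal likelihood of one stage under a Dirichlet(alpha) prior."""
    return (lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
            + sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, counts)))


def set_partitions(items):
    """Yield each partition of the list `items` exactly once, as a list of tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [(first,)] + part
        for j in range(len(part)):
            yield part[:j] + [(first,) + part[j]] + part[j + 1:]


def best_level_score(level):
    """Exhaustive best stage-partition score of one level, given (counts, alphas)."""
    counts, alphas = level
    edges = range(len(counts[0]))
    return max(
        sum(
            log_dm([sum(counts[s][e] for s in block) for e in edges],
                   [sum(alphas[s][e] for s in block) for e in edges])
            for block in part
        )
        for part in set_partitions(list(range(len(counts))))
    )


def score_levels_in_parallel(levels, max_workers=4):
    """With a known variable order the levels are independent, so their best
    local scores can be computed concurrently; the total score is their sum."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(best_level_score, levels))


levels = [
    ([[30, 10]], [[1.0, 1.0]]),                       # root situation
    ([[25, 5], [2, 8]], [[0.5, 0.5], [0.5, 0.5]]),    # two situations
]
per_level = score_levels_in_parallel(levels)
print(sum(per_level))
```

`Executor.map` returns results in the order of its inputs, so the per-level scores line up with the levels of the tree.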

In document The dynamic chain event graph (Page 109-113)