Stage 4: Deployment
5.5 Stage 2 Introduction
University
of Cape
Town
26
thesis, which is presented in detail in Chapter Eight. Various BN learning studies that could be extended or adapted to learn DBN models are discussed in the next section.
Sampled Dataset
Index   Sales Level   Promotion   Staff Allocation   Product Packaging
1       low           no          4                  poor
2       low           yes         2                  good
3       medium        yes         5                  poor
4       high          no          8                  poor
5       low           yes         1                  good
6       high          no          3                  good
...     ...           ...         ...                ...
n       ...           ...         ...                ...

[A Bayesian Network Structure: diagram over the nodes Sales Level, Promotion, Staff Allocation and Product Packaging]

Figure 2.11: Learning Research Discovering Network Structures from Environments using Algorithms
Larranaga et al. [23] used a GA in which each chromosome represents an ordering of the variables, and the GA searches for a near-optimal ordering. By evolving a population of orderings, they obtained several candidate networks when learning BNs from complete data. They used the genetic operators of rank selection, to choose parents for reproduction, and crossover and mutation, to generate new individuals. The K2 algorithm was used to evaluate the quality of the Bayesian network structure derived from each variable ordering. K2 is a greedy heuristic algorithm that processes the nodes (variables) in the given order and, for each node, repeatedly adds the parent, chosen from among the node's predecessors, that most increases the network score, stopping when no further addition improves the score. More information on heuristic algorithms is presented in Chapter Three.
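As a hedged illustration of the K2 procedure described above, the Python sketch below scores candidate parent sets with the Cooper-Herskovits (K2) metric and adds parents greedily from each node's predecessors in the ordering. The function names, the `max_parents` bound and the list-of-dictionaries data format are assumptions made for illustration; this is not Larranaga et al.'s implementation. The usage lines reuse the six categorical rows of Figure 2.11 (Staff Allocation is left out because it is numeric), with an arbitrary ordering.

```python
import math
from collections import Counter

def k2_local_log_score(data, node, parents):
    """Log Cooper-Herskovits (K2) score of `node` given `parents`.

    data: list of dicts mapping variable name -> discrete value.
    r is taken as the number of distinct values of `node` observed in `data`.
    """
    r = len({row[node] for row in data})
    # N_ij: rows per parent configuration; N_ijk: rows per (configuration, node value)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    score = 0.0
    for count in n_ij.values():
        score += math.lgamma(r) - math.lgamma(count + r)  # log[(r-1)! / (N_ij+r-1)!]
    for count in n_ijk.values():
        score += math.lgamma(count + 1)  # log N_ijk!
    return score

def k2(data, ordering, max_parents):
    """Greedy K2 search: parents of each node are drawn only from its predecessors."""
    structure = {}
    for i, node in enumerate(ordering):
        parents = []
        best = k2_local_log_score(data, node, parents)
        candidates = list(ordering[:i])
        while len(parents) < max_parents:
            scored = [(k2_local_log_score(data, node, parents + [z]), z)
                      for z in candidates if z not in parents]
            if not scored:
                break
            new_score, z = max(scored)
            if new_score <= best:
                break  # no single added parent improves the score: stop
            best = new_score
            parents.append(z)
        structure[node] = parents
    return structure

# Usage on the categorical columns of the sampled dataset in Figure 2.11.
data = [
    {"Sales Level": "low", "Promotion": "no", "Product Packaging": "poor"},
    {"Sales Level": "low", "Promotion": "yes", "Product Packaging": "good"},
    {"Sales Level": "medium", "Promotion": "yes", "Product Packaging": "poor"},
    {"Sales Level": "high", "Promotion": "no", "Product Packaging": "poor"},
    {"Sales Level": "low", "Promotion": "yes", "Product Packaging": "good"},
    {"Sales Level": "high", "Promotion": "no", "Product Packaging": "good"},
]
print(k2(data, ["Promotion", "Product Packaging", "Sales Level"], max_parents=2))
```

Because a node can only take parents that precede it, running `k2` with a different ordering can return a different network, which is exactly the sensitivity a GA over orderings tries to exploit.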
Larranaga et al. showed in their experiments that a GA can discover good networks from data.
However, it was not clear how their results improved on the approaches they reviewed in the literature survey of their work. As future research, they intended to use genetic algorithms alone, starting from the best variable ordering obtained.
William et al. [39] identified sensitivity to variable ordering as a shortcoming of most greedy score-based algorithms such as K2. A variable ordering is a sequence of the network variables; in K2, each node may only take parents from the variables that precede it in this sequence. When an algorithm is sensitive to the ordering of the network variables A, B, C and D, the optimal network that emerges from the ordering {B, C, D, A} will probably differ from the one that emerges from {D, A, C, B}. They developed a genetic algorithm to mitigate this problem by searching the permutation space of the variables using a fitness function. Specifically, William et al. used order crossover (OX) to generate a population of orderings over six variable nodes. For instance, OX produces the offspring:
O1 = {6, 2, 5, 3, 1, 4} and O2 = {5, 3, 6, 2, 4, 1}
from the parents:
P1 = {3, 4, 6, 2, 1, 5} and P2 = {4, 1, 5, 3, 2, 6}
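The order crossover in William et al.'s example can be reproduced with a short Python sketch. The cut points (around the third and fourth positions, slice `[2:4]` in 0-based indexing) and the choice of which parent donates the copied segment are inferred from the numbers rather than stated in the source, and the function name is hypothetical. Each child keeps a segment from one parent and fills its remaining positions with the other parent's genes in their cyclic order, so every child is still a valid permutation of the six variables.

```python
def order_crossover(segment_parent, fill_parent, a, b):
    """Order crossover (OX) on permutations.

    The child copies segment_parent[a:b] unchanged; the remaining slots,
    visited cyclically from position b, are filled with the genes of
    fill_parent read cyclically from position b, skipping genes already copied.
    """
    n = len(segment_parent)
    child = [None] * n
    child[a:b] = segment_parent[a:b]
    kept = set(segment_parent[a:b])
    donors = [fill_parent[(b + i) % n] for i in range(n)
              if fill_parent[(b + i) % n] not in kept]
    slots = [(b + i) % n for i in range(n) if child[(b + i) % n] is None]
    for pos, gene in zip(slots, donors):
        child[pos] = gene
    return child

P1 = [3, 4, 6, 2, 1, 5]
P2 = [4, 1, 5, 3, 2, 6]
# Assumed cut points: after the 2nd and after the 4th position.
O1 = order_crossover(P2, P1, 2, 4)
O2 = order_crossover(P1, P2, 2, 4)
print(O1, O2)  # [6, 2, 5, 3, 1, 4] [5, 3, 6, 2, 4, 1]
```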
They also implemented mutation as swapping the positions of two numbers in an ordering, and they refer to P1, P2, O1 and O2 as individuals. They claimed to have reduced some of the limitations of K2 and therefore suggested that, in future research, the GA alone could potentially be used to learn networks.
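Swap mutation on an ordering can be sketched in a few lines of Python. The function name and the seeded `random.Random` generator are assumptions for illustration; the point is that exchanging two positions always yields another valid permutation, so no repair step is needed.

```python
import random

def swap_mutation(ordering, rng):
    """Return a copy of `ordering` with two randomly chosen positions swapped."""
    child = list(ordering)
    i, j = rng.sample(range(len(child)), 2)  # two distinct positions
    child[i], child[j] = child[j], child[i]
    return child

rng = random.Random(42)  # seeded only to make the example repeatable
parent = [6, 2, 5, 3, 1, 4]
mutant = swap_mutation(parent, rng)
print(parent, "->", mutant)
```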
It is interesting to note that a similar approach is used by Myers et al. [25], who also defined representations of GA elements and operators such as genes and crossover. They developed a variant of a genetic algorithm, described below, and illustrated structure learning with five variable nodes, where each node is represented as a gene and a network structure is represented as a chromosome. Myers et al.
used uniform crossover, and implemented mutation as the add, delete and reverse (edge) operators that are commonly used in greedy score-based structure search. They illustrated crossover on two parents as follows, where each gene lists a node followed by its parents:
Parents (P1 and P2):
P1 = [{A} {B A} {C A} {D B C} {E C}], P2 = [{A B} {B} {C B} {D B} {E C D}]
Offspring (O1 and O2):
O1 = [{A} {B A} {C B} {D B C} {E C D}], O2 = [{A B} {B} {C A} {D B} {E C}]
These operators modify the chromosome structure, but they can produce cyclic structures, which are not valid Bayesian networks. Improving these genetic operators is therefore a direction for future research.
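The risk of cycles can be made concrete with a small acyclicity check. The sketch below uses Kahn's topological-sort algorithm on the same (node, *parents) tuple encoding, which is an assumption for illustration, together with a hypothetical `reverse_edge` mutation. Reversing B -> D in O1 closes the cycle B -> C -> D -> B, and combining A's gene from P2 with B's gene from P1 gives A and B each other as parents.

```python
def is_acyclic(chromosome):
    """Kahn's algorithm: True if the parent-set chromosome encodes a DAG."""
    parents = {gene[0]: set(gene[1:]) for gene in chromosome}
    in_degree = {node: len(ps) for node, ps in parents.items()}
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    ready = [node for node, d in in_degree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    # Nodes never reaching in-degree 0 lie on, or downstream of, a cycle.
    return visited == len(parents)

def reverse_edge(chromosome, parent, child):
    """Mutation: turn parent -> child into child -> parent (assumes the edge exists)."""
    mutated = []
    for gene in chromosome:
        node, ps = gene[0], list(gene[1:])
        if node == child:
            ps.remove(parent)
        elif node == parent:
            ps.append(child)
        mutated.append((node, *ps))
    return mutated

O1 = [("A",), ("B", "A"), ("C", "B"), ("D", "B", "C"), ("E", "C", "D")]
print(is_acyclic(O1))                          # True
print(is_acyclic(reverse_edge(O1, "B", "D")))  # False: B -> C -> D -> B
mixed = [("A", "B"), ("B", "A"), ("C", "A"), ("D", "B", "C"), ("E", "C")]
print(is_acyclic(mixed))                       # False: A <-> B
```

A GA using these operators would typically either reject such offspring or repair them, for example by deleting an edge on the detected cycle.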
Friedman et al. [24] proposed an algorithm that integrates a Markov Chain Monte Carlo (MCMC) procedure with variable orderings, using approximate posterior probabilities to score and find the network structures. This approach samples from the posterior distribution over orders of the variables rather than over network structures. Their approach rests on two main ideas. First, they derived an expression for the probability of the data given a known order of the network variables, where an order is defined as a total order on the index set of the variables [21]. Second, they used Monte Carlo sampling to approximate the posterior probability over sets of orders.
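The first idea can be written compactly. Assuming a structure prior that factorises over families and a bound k on the number of parents per node (both assumptions of this sketch, following the usual order-based formulation), the probability of the data given an order ≺ is a product, over the variables, of sums of family scores over the parent sets that the order allows:

```latex
P(D \mid \prec) \;\propto\; \prod_{i=1}^{n} \sum_{U \in \mathcal{U}_{i,\prec}} \operatorname{score}(X_i, U \mid D),
\qquad
\mathcal{U}_{i,\prec} = \{\, U \subseteq \{X_j : X_j \prec X_i\} : |U| \le k \,\}

P(\prec \mid D) \;\propto\; P(D \mid \prec)\, P(\prec)
```

Because each sum ranges only over subsets of a variable's predecessors, every term can be evaluated independently, which is what makes scoring an entire order tractable.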
Friedman et al. estimated the probabilities of edge features from given orders of the network variables. They performed a random walk over orders, sampled network structures from the visited orders, and used them to estimate the probability that particular structural features are present or absent. They repeated this process to return a representative final network containing most of the features, and claimed that their approach is reliable for learning Bayesian networks from data.
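Friedman et al.'s random walk over orders can be sketched as follows. This is a hedged Python illustration, not their implementation: the Metropolis-Hastings move that swaps two variables, the `local_score` callback (any log family score, such as K2 or BDe), the `max_parents` bound, the toy score and all function names are assumptions. Rather than sampling structures from each visited order, the sketch computes the probability of an edge given the order exactly, by comparing the weight of the child's parent sets that contain the edge with the weight of all parent sets the order allows; averaging over sampled orders then estimates the edge's posterior.

```python
import itertools
import math
import random

def parent_sets(preds, max_parents):
    """All subsets of `preds` with at most `max_parents` elements."""
    for size in range(min(max_parents, len(preds)) + 1):
        yield from itertools.combinations(preds, size)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def order_log_score(order, local_score, max_parents):
    """log P(D | order) up to a constant: for each node, log-sum-exp of its
    family scores over parent sets drawn from its predecessors."""
    return sum(logsumexp([local_score(x, u) for u in parent_sets(order[:i], max_parents)])
               for i, x in enumerate(order))

def edge_probability(order, local_score, max_parents, parent, child):
    """P(parent -> child | D, order): share of the child's family weight
    that comes from parent sets containing `parent`."""
    i = order.index(child)
    scores = {u: local_score(child, u) for u in parent_sets(order[:i], max_parents)}
    with_edge = [s for u, s in scores.items() if parent in u]
    if not with_edge:
        return 0.0  # parent does not precede child in this order
    return math.exp(logsumexp(with_edge) - logsumexp(list(scores.values())))

def order_mcmc(variables, local_score, max_parents, steps, rng):
    """Metropolis-Hastings random walk over orders with a symmetric swap proposal."""
    order = list(variables)
    current = order_log_score(order, local_score, max_parents)
    samples = []
    for _ in range(steps):
        proposal = list(order)
        i, j = rng.sample(range(len(order)), 2)
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new = order_log_score(proposal, local_score, max_parents)
        if rng.random() < math.exp(min(0.0, new - current)):  # accept w.p. min(1, ratio)
            order, current = proposal, new
        samples.append(list(order))
    return samples

def toy_score(node, parents):
    """Hypothetical log family score that rewards the edge A -> B."""
    return 2.0 if (node == "B" and "A" in parents) else 0.0

rng = random.Random(0)
samples = order_mcmc(["A", "B", "C"], toy_score, 2, 500, rng)
p_ab = sum(edge_probability(o, toy_score, 2, "A", "B") for o in samples) / len(samples)
print(round(p_ab, 3))
```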
Koivisto et al. [21] developed two versions of exact algorithms for structure discovery in Bayesian networks, supported by several mathematical theorems and propositions. They noted that learning optimal Bayesian networks from data is computationally hard, even though a vast number of candidate networks can quickly be generated from data. This motivated the researchers, because deriving optimal networks from data had not been studied extensively.
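To make the scale of the problem concrete: the number of candidate structures is finite, but it grows super-exponentially with the number of variables. The Python sketch below counts labelled DAGs with Robinson's recurrence; it is offered as background on the size of the search space, not as part of Koivisto et al.'s method.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 8):
    print(n, count_dags(n))
```

Already at five variables there are 29,281 possible DAGs, which is why exhaustive search is only practical for very small networks.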
One version of their algorithm computes exact posterior probabilities to find a network (a DAG) quickly from data. The second version focuses on finding optimal network structures from data. From their investigations, they made three observations: (i) all possible local dependency structures must be considered to derive an optimal network; (ii) for sufficiently small datasets with few network variables, exact algorithms can afford to exhaust the solution space to produce an optimal network structure; and (iii) using an exact algorithm on large datasets to find optimal networks takes almost super-exponential time. Koivisto et al. strongly urged that exact Bayesian learning algorithms be improved to reduce their computational complexity. These observations strongly suggest to researchers in this field that advanced frameworks, rather than complete reliance on a single algorithm, will be required for the learning process. Building on these contributions, Koivisto et al. modified the exact algorithm so that it learns larger, multiply connected networks better than Monte Carlo methods do, using a strategy that places suitable a priori restrictions on the network structures during learning.
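Observation (ii) can be illustrated with a small exact search. The sketch below uses dynamic programming over subsets of variables, a standard exact technique related in spirit to Koivisto et al.'s work; note that it maximises a decomposable score to find one optimal network, whereas their algorithm sums over structures to compute exact posterior probabilities, so this is a related technique, not their method. The recurrence uses the fact that every DAG has a sink whose parents can be chosen from the remaining variables. `local_score` and the toy score are hypothetical, and the loop over all 2^n subsets shows why exact search is only feasible for a small number of variables.

```python
import itertools

def best_parent_score(local_score, node, candidates, max_parents):
    """Best log local score of `node` over parent sets drawn from `candidates`."""
    return max(local_score(node, u)
               for size in range(min(max_parents, len(candidates)) + 1)
               for u in itertools.combinations(sorted(candidates), size))

def exact_best_network_score(variables, local_score, max_parents):
    """Exact maximum of a decomposable score over all DAGs, via dynamic
    programming over subsets: best[S] = max over sinks x in S of
    best[S - {x}] + best local score of x with parents from S - {x}."""
    best = {frozenset(): 0.0}
    for size in range(1, len(variables) + 1):
        for subset in itertools.combinations(variables, size):
            s = frozenset(subset)
            best[s] = max(best[s - {x}] + best_parent_score(local_score, x, s - {x}, max_parents)
                          for x in s)
    return best[frozenset(variables)]

def toy_score(node, parents):
    """Hypothetical score: rewards A -> C and B -> A, penalises each parent."""
    bonus = 0.0
    if node == "C" and "A" in parents:
        bonus += 1.5
    if node == "A" and "B" in parents:
        bonus += 1.0
    return bonus - 0.4 * len(parents)

print(exact_best_network_score(["A", "B", "C"], toy_score, max_parents=2))
```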
This is why researchers and practitioners stress that learning Bayesian networks from massive datasets is a computationally intensive problem that requires ongoing research [20] [23]. The computational demands of such datasets mean that learning the two constituents of a Bayesian network (the network structure and the conditional probability tables (CPTs)) may stall after consuming excessive processor time. Some methods proposed to address this problem are presented in the next section.