and linkage tree. We then describe three MBEAs: GA, EDA, and the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA [12]). We will demonstrate how different EA variants can be created by mixing and matching an EA with different types of linkage model.
Since EAs are population-based optimization algorithms, the population size is one of the most essential parameters, if not the most essential. If the population size is too small, there might not be enough information (e.g., good partial solutions) in the population to synthesize the optimal solution. Also, small populations generally have low diversity, and due to the selection pressure, EA solvers will then prematurely converge to suboptimal solutions. On the other hand, if the population size is too large, EA solvers might overly explore the search space, which leads to inefficient performance. Consequently, the allowed computing time budget might be used up before any good solutions are found. Efficiency is of great importance, especially in real-world applications like DNEP, where the evaluations of candidate solutions are (computationally) expensive. Proper population size setting is, therefore, a necessary condition for EAs to achieve good performance. However, the optimal population size depends on the size of the specific problem instance at hand, on the problem structure, and also on the particular components of the EA solver being employed. For example, in [12], experimental results showed that the minimally required population size for a solver to reliably solve a problem instance varies across problem sizes, problem structures, and working mechanisms of different EA variants.
Therefore, in practice, it is nearly impossible to determine the optimal population size before actually solving the problem. Although there exist several guidelines for parameter settings for each EA, their effectiveness is not guaranteed when carried from laboratory benchmarks to industrial optimization tasks, or from one real-world problem to another. Practitioners often have to run their chosen EA solver multiple times with different population sizes before (accidentally) getting a value that yields acceptable results. In this thesis, we will consider a population-sizing-free scheme that was originally proposed for the parameter-less GA [13]. This scheme has been shown to be an effective method to eliminate the requirement of tuning the population size for different population-based EAs. We also present how to use the scheme as a framework for running experiments and providing a fair comparison of the performance of different EAs in solving real-world problems, where neither the optimal solutions nor the optimal population sizes are known.
The remainder of this chapter is organized as follows. Section 2.2 describes three linkage models and the type of dependency structure that each can capture. Section 2.3 outlines three EA solvers and how they can be customized with different linkage models to create multiple EA variants. Section 2.4 presents the population-sizing-free scheme that can be used to make all EA variants proposed in this chapter parameter-less. Finally, Section 2.5 concludes the chapter.
2.2. Problem structures and linkage models
A key part that characterizes an evolutionary algorithm (EA) is its variation operators (VOs), i.e., how new solutions are generated. VOs can be model-based, meaning that the way variation is performed is governed by a (learnable) model.
Models of particular interest are linkage models, which are used to encode the linkage information of the optimization problem instance at hand. A linkage model often contains information about groups of inter-dependent decision variables, also known as linkage sets. Variables in the same linkage set are dependent on each other in the sense that, when considered together, they have a significant contribution to the quality of a solution [12]. These variables should thus be jointly considered when performing variation. Variation operators can, for instance, make use of the information in linkage models to juxtapose partial solutions from existing solutions to generate new solutions. A linkage model that correctly matches the problem dependency structure is crucial for variation operators to efficiently mix and preserve good partial solutions (i.e., building blocks) in the population and thereby create high-quality solutions. Problem-specific knowledge (PSK), if available, can be used to directly construct the linkage model of the problem. If such valuable PSK is not available, or is difficult to transcribe into linkage information, linkage information can be inferred from the working population of EAs by linkage learning (LL) procedures [12], in which problem variables having some degree of dependency are identified. In the following, we give a general definition of a linkage model, called the Family of Subsets (FOS) model, and subsequently describe three variants that can be used in practice.
2.2.1. Family of subsets linkage model
Let L = {1, 2, ..., l} be the set of all l decision variables of the problem instance at hand. We use the Family of Subsets (FOS [12]) concept to encode different linkage models. A FOS, denoted F, consists of subsets of the set L, i.e., F = {F1, F2, ..., F|F|} where Fi ⊆ L, i ∈ {1, 2, ..., |F|}. Thus, F ⊆ P(L), i.e., F is a subset of the power set of L. Each subset Fi is a linkage set containing decision variables that exhibit some degree of dependency and should thus be jointly treated when performing variation. A FOS F is said to be complete if every problem variable is contained in at least one linkage set, i.e., ∀i ∈ L, ∃j ∈ {1, 2, ..., |F|} : i ∈ Fj. The completeness property ensures that all problem variables are considered when performing variation following linkage sets in F. We here consider three complete FOS models: univariate, marginal product, and linkage tree.
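The completeness property can be illustrated with a minimal Python sketch. The representation of a FOS as a list of frozensets over the variables 1, ..., l and the function name `is_complete` are assumptions made for illustration only; they are not taken from [12].

```python
from typing import FrozenSet, List

# Assumed representation: a FOS is a list of linkage sets over variables 1..l.
FOS = List[FrozenSet[int]]


def is_complete(fos: FOS, l: int) -> bool:
    """Return True if every variable 1..l occurs in at least one linkage set."""
    covered = set().union(*fos)  # union of all linkage sets (empty if fos is empty)
    return covered >= set(range(1, l + 1))


# A univariate FOS over four variables is complete.
univariate = [frozenset({i}) for i in range(1, 5)]

# This FOS never mentions variable 4, so it is not complete for l = 4.
incomplete = [frozenset({1, 2}), frozenset({2, 3})]

print(is_complete(univariate, 4))
print(is_complete(incomplete, 4))
```

Note that completeness does not forbid overlapping linkage sets; it only requires that no variable is left out.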
2.2.2. Univariate model
The univariate factorization (UF) model contains only singleton sets. It therefore expresses the assumption that all problem variables are independent of each other, i.e., Fi = {i}, i ∈ L = {1, 2, ..., l}. Because there is only one possible configuration, the univariate model does not require any linkage learning.
2.2.3. Marginal product model
The marginal product (MP) model is a partitioning of the set L of all problem variables, i.e., $\bigcup_{F_i \in \mathcal{F}} F_i = L$ and $\forall F_i, F_j \in \mathcal{F},\ F_i \neq F_j : F_i \cap F_j = \emptyset$. Variables in the same linkage set have some degree of dependency while variables in different linkage sets are considered to be independent [12]. The well-known Extended Compact Genetic Algorithm (ECGA [14]) employs the MP as its linkage model.
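As a small illustration of the partitioning requirement, the following Python sketch checks whether a given FOS is a valid MP configuration. The function name `is_marginal_product` and the list-of-sets representation are hypothetical choices for this example.

```python
def is_marginal_product(fos, l):
    """Return True if the linkage sets in `fos` partition the variables 1..l."""
    seen = set()
    for linkage_set in fos:
        # Empty sets and sets overlapping an earlier one violate the partition.
        if not linkage_set or seen & linkage_set:
            return False
        seen |= linkage_set
    # The union of all linkage sets must be exactly L.
    return seen == set(range(1, l + 1))


print(is_marginal_product([{1, 3}, {2}, {4}], 4))     # disjoint and covering
print(is_marginal_product([{1, 2}, {2, 3}, {4}], 4))  # variable 2 occurs twice
```

The univariate model passes this check as well, since a set of singletons is the finest possible partition of L.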
The MP model is often learned from a population P of n individuals using a greedy algorithm that optimizes a model scoring metric as follows. First, the FOS F assumes the univariate structure and is then scored by the chosen metric.
Next, all possible merges of two linkage sets Fi and Fj are tried out and the merge that improves the scoring metric the most is chosen. The merged linkage set is added to F, replacing the two constituent linkage sets, i.e., F ← (F \ {Fi, Fj}) ∪ {Fi ∪ Fj}. This procedure continues until no merge can improve the scoring metric anymore. The scoring metric employed by ECGA is named the Combined Complexity Criterion (CCC), which is the sum of the Compressed Population Complexity and the Model Complexity [14]. CCC needs to be minimized.
The Compressed Population Complexity (CPC) is the cost of representing the whole population of n individuals with FOS model F and is calculated as
\[
\mathrm{CPC} = n \sum_{i=1}^{|\mathcal{F}|} H(X_{F_i}) \tag{2.1}
\]
where XFi is a set of random variables, in which each random variable Xk corresponds with a problem variable xk in the linkage set Fi. The entropy of a random variable measures the uncertainty associated with that random variable or, quantitatively speaking, the number of bits required to describe that random variable [15]. H(XFi) is the joint entropy of the marginal distribution of XFi. We have
\[
H(X_{F_i}) = - \sum_{x \in \Omega(X_{F_i})} P(X_{F_i} = x) \log_2 P(X_{F_i} = x) \tag{2.2}
\]
where Ω(XFi) is the sample space for random variables XFi, i.e., all possible string values of the problem variables in Fi when being jointly considered. The probability P(XFi = x) can be computed by counting the frequency of x in the population P. Because the joint entropy of a collection of random variables is less than or equal to the sum of their individual entropies [15], minimizing CPC favors FOS models that contain large linkage sets. In contrast, minimizing the Model Complexity (MC) favors simpler FOS models that contain smaller linkage sets [16]. The MC is the cost of representing or storing FOS model F and is calculated as
\[
\mathrm{MC} = \log_2(n + 1) \sum_{i=1}^{|\mathcal{F}|} \left( |\Omega(X_{F_i})| - 1 \right) \tag{2.3}
\]
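To make Equations (2.1) to (2.3) concrete, the following Python sketch computes the empirical joint entropy, the CPC, the MC, and their sum (the CCC) for a toy population. Several details are assumptions made for illustration: individuals are tuples indexed from 0, every variable takes `alphabet_size` values (so that |Ω(XFi)| = alphabet_size^|Fi|), and all function names are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(population, linkage_set):
    """Empirical joint entropy H(X_Fi) in bits, following Eq. (2.2)."""
    n = len(population)
    # Count how often each joint value of the variables in Fi occurs.
    counts = Counter(tuple(ind[k] for k in sorted(linkage_set)) for ind in population)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cpc(population, fos):
    """Compressed Population Complexity, Eq. (2.1)."""
    return len(population) * sum(joint_entropy(population, f) for f in fos)


def mc(population, fos, alphabet_size=2):
    """Model Complexity, Eq. (2.3), assuming |Omega(X_Fi)| = alphabet_size ** |Fi|."""
    n = len(population)
    return math.log2(n + 1) * sum(alphabet_size ** len(f) - 1 for f in fos)


def ccc(population, fos, alphabet_size=2):
    """Combined Complexity Criterion: CPC + MC (to be minimized)."""
    return cpc(population, fos) + mc(population, fos, alphabet_size)


# Toy population of n = 4 binary strings in which both variables always agree.
population = [(0, 0), (0, 0), (1, 1), (1, 1)]
univariate = [frozenset({0}), frozenset({1})]
joint = [frozenset({0, 1})]

print(ccc(population, univariate), ccc(population, joint))
```

In this toy population the two variables are perfectly correlated, so grouping them halves the CPC (from 8 to 4 bits) while the MC grows only from 2·log2(5) to 3·log2(5); the joint model therefore obtains the lower CCC.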
The greedy model building algorithm, which has a worst-case runtime of O(nl³), needs to minimize the CCC metric. Instead of computing this scoring metric for the whole FOS F at every merging trial, the CCC metric improvement (i.e., the decrease in the CCC metric) can be computed more efficiently by considering only the problem variables in the two involved linkage sets Fi and Fj [12] as
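\[
\begin{aligned}
\mathrm{CCC}(F_i, F_j) ={} & n \left[ H(X_{F_i}) + H(X_{F_j}) - H(X_{F_i \cup F_j}) \right] \\
& + \log_2(n + 1) \left[ \left( |\Omega(X_{F_i})| - 1 \right) + \left( |\Omega(X_{F_j})| - 1 \right) - \left( |\Omega(X_{F_i \cup F_j})| - 1 \right) \right]
\end{aligned}
\tag{2.4}
\]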
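The greedy merging procedure driven by Equation (2.4) can be sketched in Python as follows. This is a straightforward, unoptimized illustration rather than the implementation used in ECGA [14]: entropies are recomputed at every trial, variables are indexed from 0, each variable is assumed to take `alphabet_size` values, and the names `ccc_decrease` and `learn_marginal_product` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def joint_entropy(population, linkage_set):
    """Empirical joint entropy in bits of the variables in `linkage_set`."""
    n = len(population)
    counts = Counter(tuple(ind[k] for k in sorted(linkage_set)) for ind in population)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ccc_decrease(population, fi, fj, alphabet_size=2):
    """Decrease in CCC obtained by merging Fi and Fj, following Eq. (2.4)."""
    n = len(population)
    entropy_term = (joint_entropy(population, fi) + joint_entropy(population, fj)
                    - joint_entropy(population, fi | fj))

    def params(f):
        # |Omega(X_F)| - 1, assuming alphabet_size values per variable.
        return alphabet_size ** len(f) - 1

    model_term = params(fi) + params(fj) - params(fi | fj)
    return n * entropy_term + math.log2(n + 1) * model_term


def learn_marginal_product(population, alphabet_size=2):
    """Greedily merge linkage sets while some merge still decreases the CCC."""
    l = len(population[0])
    fos = [frozenset({i}) for i in range(l)]  # start from the univariate model
    while len(fos) > 1:
        best_gain, best_pair = 0.0, None
        for fi, fj in combinations(fos, 2):
            gain = ccc_decrease(population, fi, fj, alphabet_size)
            if gain > best_gain:
                best_gain, best_pair = gain, (fi, fj)
        if best_pair is None:  # no merge improves the metric anymore
            break
        fi, fj = best_pair
        # F <- (F \ {Fi, Fj}) U {Fi U Fj}
        fos = [f for f in fos if f not in (fi, fj)] + [fi | fj]
    return fos


# Variables 0 and 1 always agree; variable 2 varies independently of them.
population = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
model = learn_marginal_product(population)
print(model)
```

For this toy population, merging {0} and {1} removes one bit of entropy per individual, which outweighs the extra model cost, whereas any merge involving variable 2 only increases the model cost. The sketch therefore stops with the linkage sets {0, 1} and {2}.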
2.2.4. Linkage tree model
Because each configuration of the MP model is a partitioning of the set L, every problem variable can exist in only one linkage set. Therefore, any two variables are either dependent (if they are in the same linkage set) or independent (if they are in different linkage sets). In the linkage tree (LT) model, a problem variable can exist in multiple linkage sets. The LT model is thus more expressive than the MP model in the sense that any two variables can be treated as dependent in some linkage sets and as independent in others. While the MP model has a flat structure, the LT model arranges its linkage sets in a hierarchical tree structure. The lowest level (i.e., leaf nodes) contains univariate linkage sets of each variable separately, i.e., Fi = {i}, i ∈ L = {1, 2, ..., l}. Higher levels contain multivariate linkage sets, in which each linkage set Fi is formed by merging two mutually exclusive linkage sets Fj and Fk at lower levels, i.e., Fj ∩ Fk = ∅, |Fj| < |Fi|, |Fk| < |Fi| and Fj ∪ Fk = Fi. The highest level (i.e., the root node) contains the set L itself. Thus, an LT can encode different levels of dependency, from the totally independent state in the leaf nodes to the all-dependent state in the root node [17]. An LT over a set of problem variables {1, 2, 3, 4, 5, 6, 7, 8} can be, e.g., {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {1, 3}, {2, 5}, {4, 6}, {1, 3, 7}, {4, 6, 8}, {2, 4, 5, 6, 8}, {1, 2, 3, 4, 5, 6, 7, 8}}, which is visualized as a binary tree with the root node in Figure 2.1. An LT model configuration for a set L of l problem variables is thus a FOS F of exactly 2l − 1 linkage sets.
Figure 2.1: An example linkage tree with the root node for a problem of eight decision variables.
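The structural properties of an LT stated above can be checked mechanically. The following Python sketch verifies them for the example FOS over eight variables; the function name `is_linkage_tree` is hypothetical, and the check only covers the properties listed in the text (2l − 1 linkage sets, all singletons present, the root present, and every multivariate set being the union of two disjoint smaller sets in the FOS).

```python
def is_linkage_tree(fos, l):
    """Check the LT properties described in the text for variables 1..l."""
    sets = {frozenset(s) for s in fos}
    all_vars = frozenset(range(1, l + 1))
    # An LT over l variables has exactly 2l - 1 linkage sets, including the root.
    if len(sets) != 2 * l - 1 or all_vars not in sets:
        return False
    # Every variable must appear as a leaf (singleton linkage set).
    if any(frozenset({i}) not in sets for i in all_vars):
        return False
    # Every multivariate set must be the union of two disjoint proper subsets in the FOS.
    for s in sets:
        if len(s) == 1:
            continue
        children = [c for c in sets if c < s]
        if not any(a | b == s and not (a & b) for a in children for b in children):
            return False
    return True


example = [{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8},
           {1, 3}, {2, 5}, {4, 6}, {1, 3, 7}, {4, 6, 8},
           {2, 4, 5, 6, 8}, {1, 2, 3, 4, 5, 6, 7, 8}]

print(is_linkage_tree(example, 8))
```

For the example, the root {1, ..., 8} is the union of {1, 3, 7} and {2, 4, 5, 6, 8}, and the latter is in turn the union of {2, 5} and {4, 6, 8}, so the check succeeds with 15 = 2·8 − 1 linkage sets.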
The LT model can be learned from the population of n individuals by a hierarchical clustering procedure named the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). The FOS F is built in a bottom-up manner. First, F is initialized with l univariate linkage sets Fi = {i}, i ∈ L = {1, 2, ..., l}. Then, UPGMA iteratively merges the two linkage sets Fi and Fj that are the most similar. The newly created linkage set Fi ∪ Fj is added to F. The two constituent linkage sets Fi and Fj are still kept in the LT F but they are not considered for merging anymore. The merging operations continue until the root node (i.e., the set L itself) is created. To calculate the similarity between two linkage sets Fi and Fj, we take the average of the mutual information (MI) over all pairs of problem variables (X, Y) where X ∈ Fi and Y ∈ Fj [17] as follows