6.4 An Adaptive and Dynamic Multi-Objective Genetic Algorithm for QoS-aware Service
6.4.6 Tuning Possibilities
The efficiency of a GA (for a given problem size, encoding, and set of parameters) depends on the implementation of the genetic operators and on the resources on which the algorithm is executed. To speed up existing GAs, several authors have studied parallelization methods and proposed taxonomies of parallel GA models and implementations (e.g., [2, 21, 67]). Nowostawski and Poli [67] point out three main motivations for parallelizing a GA:
• Some problems require a very large population that is highly memory-consuming and thus makes it impossible to efficiently run a sequential genetic algorithm on a single machine.
• Sequential genetic algorithms can get trapped in certain regions of the search space, and thus in local optima. Parallel algorithms, however, can search different regions in parallel and thus reduce the risk of returning local optima.
• For complex problems, fitness evaluation can be very time-consuming. As the fitness has to be computed once for each individual, it is a natural candidate for parallel processing.
There exist several approaches for parallelizing a GA. The three main classes are briefly discussed in the following:
1. Distributed fitness evaluation (also called master-slave model) is a functional decomposition model [21], i.e., independent tasks are run concurrently on multiple processors. The evaluation of the fitness for an individual only requires knowledge about the individual, and no global information on the level of the entire population. Therefore, fitness evaluation is a natural candidate for parallelization using a master-slave model: a master process distributes the evaluation of individuals to n slave processes and waits for them to return the computed fitness values. In a synchronous implementation, the master waits for all slaves to return before continuing with the next steps of the algorithm. If an asynchronous model is chosen, the master can proceed after a fraction of the slaves have returned, without waiting for the slowest processors. This, however, changes the behavior of the overall algorithm, which can then no longer be safely compared with its sequential equivalent. It has to be pointed out that the handling of the master and slave processes introduces a communication overhead. Therefore, such a model is only profitable for problem settings with very complex fitness evaluation functions, in which the gain from parallel processing by far outweighs the introduced communication cost.

Figure 6.5: Overview of the Optimization Process

2. Island models constitute a data decomposition approach [21], i.e., the algorithm is concurrently
evolved on different subsets of data. Typically, the population is divided into several subpopulations (also called islands), each containing a subset of the individuals. The algorithm is run on each island, and each individual only competes with the other individuals belonging to the same subpopulation [67]. In order to converge towards a global solution, individuals have to be exchanged between subpopulations from time to time. For this process, referred to as migration [67], several parameters have to be introduced: the migration rate, the migration interval, the migration scheme (defining which individuals are migrated: the best, the worst, random ones, etc.), and the topology (defining whether individuals can migrate to any other subpopulation or only to neighboring ones). Depending on the number of subpopulations and individuals per subpopulation, coarse-grained (few subpopulations with many individuals) and fine-grained (many subpopulations with few individuals) models are distinguished.
3. Hybrid methods combine ideas from the above-mentioned and other parallelization models. For example, subpopulations can be overlapping. In this case, no migration is defined between subpopulations, but individuals in overlapping regions can evolve in all of them. The spatial structure then defines the interaction between individuals [67]. If there are many of those overlapping subpopulations, each containing very few individuals, the algorithm is said to be massively parallel [67]. The spatial structure in which the individuals are placed is commonly a two-dimensional grid, but can also be a hypercube, for example. The ideal case is to assign one individual to each available processing element. This technique can be used on massively parallel computers for very large problem settings requiring complex computations. Furthermore, subpopulations could be dynamic, self-adapting, etc., leading to new hybrid parallelization methods.
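The first two classes can be illustrated with a minimal Python sketch. It is not the AD-MOGA implementation: the single-objective toy fitness (number of 1-bits), all function names, and all parameter values are assumptions made for illustration. Migration uses a ring topology in which the best individuals of each island replace the worst individuals of its neighbor, and the migration rate is expressed as a number of individuals. A thread pool is used only to keep the sketch self-contained; for CPU-bound fitness functions in CPython, a process pool would be the more usual choice. The islands are evolved one after another here, whereas a parallel implementation would assign each island to its own process.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    # Hypothetical toy fitness (OneMax): the number of 1-bits.
    return sum(individual)


def evaluate_master_slave(population, workers=4):
    """Synchronous master-slave evaluation: the master blocks until
    every worker has returned; results keep the population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


def next_generation(island, rng, mutation_rate):
    """One generation on one island: individuals compete only with
    members of the same island (binary tournament, bit-flip mutation)."""
    children = [max(island, key=fitness)]  # keep the island's best
    while len(children) < len(island):
        a, b = rng.sample(island, 2)
        parent = a if fitness(a) >= fitness(b) else b
        children.append(
            [1 - g if rng.random() < mutation_rate else g for g in parent]
        )
    return children


def migrate(islands, rate):
    """Ring topology: the `rate` best individuals of island i-1
    replace the `rate` worst individuals of island i."""
    emigrants = [sorted(isl, key=fitness, reverse=True)[:rate]
                 for isl in islands]
    for i, isl in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        isl.sort(key=fitness)  # worst individuals first
        isl[:rate] = [ind[:] for ind in incoming]


def island_ga(n_islands=4, island_size=10, genome_len=20, generations=40,
              migration_interval=5, migration_rate=2, mutation_rate=0.05,
              seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(island_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [next_generation(isl, rng, mutation_rate)
                   for isl in islands]
        if gen % migration_interval == 0:
            migrate(islands, migration_rate)
    return islands
```

In this sketch, choosing few large islands corresponds to a coarse-grained model and many small islands to a fine-grained one; the migration scheme and topology could be swapped by changing only `migrate`.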
All of the above-mentioned methods come with costs, be they related to the communication between processes or to the need for new parameters that have to be determined properly. Therefore, the gain from parallelization in general, and from a specific method in particular, has to be evaluated per algorithm and problem setting. Additionally, the choice depends on the available hardware and computer architectures, because each method is best applied to a specific architecture. For example, coarse-grained island models map well to a multiple-instruction, multiple-data stream (MIMD) model [2, 67]. Regarding the computational gain obtained with parallelization, Alba and Tomassini [2] propose different definitions of speedup. They point out that a parallel GA has to be compared with its exact sequential counterpart under identical conditions, which is not straightforward due to the stochastic nature of a GA. For example, the number of iterations in the sequential and the parallel case cannot simply be fixed, because this can lead to solutions of different quality. Rather, the stopping criterion of the parallel algorithm has to be such that the quality of the solutions is identical to that obtained with the sequential algorithm. Most parallelization methods will likely not lead to a linear speedup function, but rather to a super-linear speedup followed by a negative speedup for a number of processes higher than a certain threshold. This threshold in turn has to be determined per algorithm, platform, and problem setting.
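As a minimal sketch of how such a comparison could be evaluated, the following Python snippet computes the speedup as the ratio of mean run times over repeated runs, where every run is assumed to have been stopped at the same target solution quality. The use of mean times, the function names, and the reading of "negative speedup" as a speedup below 1 are assumptions made for illustration, not definitions taken from [2].

```python
from statistics import mean


def speedup(seq_times, par_times):
    """Ratio of mean sequential to mean parallel run time. All runs are
    assumed to stop once the same target solution quality is reached."""
    return mean(seq_times) / mean(par_times)


def classify(s, m, tol=1e-9):
    """Classify a speedup s obtained with m processes."""
    if s > m + tol:
        return "super-linear"
    if abs(s - m) <= tol:
        return "linear"
    if s < 1:
        # Assumption: "negative speedup" means the parallel version
        # is slower than the sequential one.
        return "slowdown"
    return "sub-linear"
```

For example, mean run times of 12 s sequentially and 3 s on four processes give a speedup of 4.0, which `classify(4.0, 4)` labels as linear.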
Although a first and very basic approach for parallelizing a GA for the QoS-aware service selection problem has been studied by the author and P.P. Beran in [13], it is argued that the rather small problem settings encountered in realistic scenarios and the relatively inexpensive fitness functions do not make parallelization a very promising direction. However, in the context of scalability studies, in which AD-MOGA is applied to very large (simulated) problem spaces, parallelization can be a future research direction.