2.6 Evolutionary algorithms
2.6.1 Algorithm adjuncts and concerns
Evolutionary algorithms, such as GAs, can be improved by a variety of further tactics, not yet discussed, whose purpose is either to explore the search space more effectively or to exploit good solutions in order to find better ones, or to find solutions in regions not yet populated, as discussed previously under diversity in performance.
4 A class being a complex data type, in the sense of object-oriented analysis and design.
Selection is the term used to describe which solutions are chosen either to derive new ones from (‘breed’) or to persist into the next generation, depending on the life-cycle scheme used. Goldberg (1989) described the Roulette Wheel approach, in which the probability of a solution being stochastically selected (to breed with another) is proportional to its fitness. This was found to lead to genetic drift due to the bias of the procedure. The Tournament variation (Goldberg and Deb, 1991) chooses two (or more) solutions at random, compares their fitness and chooses the fittest for breeding. This has been shown to be less biased and well suited to parallel implementation. Gen and Cheng (2000) provide references to other possible methods of selection: the µ & λ methods are deterministic ones that select the best of the parents and offspring; truncation and block selection are also deterministic, ranking in order of fitness and choosing the best; steady-state and generational selection take a similar approach, choosing a subset of the population to replace, where the subset might be the whole population or the worst n, replaced with n offspring.
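The Roulette Wheel and Tournament schemes can be sketched in a few lines of Python. This is a minimal illustration, not the formulation of the cited authors: it assumes non-negative fitness values that are to be maximised, and the function names and the `rng` parameter are hypothetical.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one solution with probability proportional to its fitness.

    Assumes every fitness is non-negative and at least one is positive.
    """
    total = sum(fitnesses)
    spin = rng.uniform(0.0, total)      # where the wheel stops
    running = 0.0
    for solution, fitness in zip(population, fitnesses):
        running += fitness
        if spin < running:
            return solution
    return population[-1]               # guard against floating-point round-off

def tournament_select(population, fitnesses, size=2, rng=random):
    """Draw `size` distinct solutions at random and return the fittest."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

Tournament selection only compares fitness values within each small group, so it needs no population-wide sum, which is one reason it lends itself to parallel implementation.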
Elitism, first proposed by De Jong (1975), acts as an adjunct to selection, in that it archives the best solutions (or some subset of them) and allows (some of) them to be re-inserted into the population for comparison or breeding. This prevents these best solutions from being lost through the normal action of genetic operators or through random loss due to non-selection.
A variety of archiving schemes are used in different algorithms, having the common characteristic of maintaining some subset of best solutions, and differing in how solutions enter and exit the archive: exit might be by time (or generation number) expiry, or by replacement by better solutions; entry might be governed by quality criteria or by archive size. For example, Knowles and Corne (2000), in their PAES algorithm, use a size-limited archive, which an offspring enters when it is better than its parent or when it is better than a solution in the archive, in which case it replaces that solution.
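A size-limited archive of this general kind might be sketched as follows. This is a simplified illustration and not the PAES archive itself: it assumes minimisation, represents each solution only by its objective vector, and simply rejects a non-dominated candidate when the archive is full. The names `dominates` and `update_archive` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def update_archive(archive, candidate, max_size):
    """Return the archive after offering `candidate` for entry."""
    # Entry is refused if an archived solution is at least as good.
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return list(archive)
    # The candidate replaces every archived solution it is better than.
    kept = [m for m in archive if not dominates(candidate, m)]
    if len(kept) < max_size:
        kept.append(candidate)
    return kept
```

A fuller scheme would need a rule for deciding which member to discard when a non-dominated candidate meets a full archive; that choice is left out of this sketch.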
An alternative, one might say opposite, approach to elitism and archiving is that of restart (Fukunaga, 1998), in which one run is halted and a new one begun with a (probably) different starting population. In this case it is the multiplicity of runs which leads to the best solutions found, as more initial conditions are experienced, and it is known that with stochastic algorithms some starting populations lead to better results than others. The ‘new’ runs can of course be carried out within the same program execution, by discarding the final population (having written it out to storage or archived it internally) and generating another starting population.
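The restart idea can be illustrated with a short sketch in which a simple stochastic hill climber stands in for a full evolutionary run. The hill climber, the function names and the parameter values are assumptions made for illustration; they are not taken from Fukunaga (1998).

```python
import random

def hill_climb(objective, start, steps, rng):
    """One stochastic run: accept a random perturbation only if it improves."""
    best_x, best_f = start, objective(start)
    for _ in range(steps):
        x = best_x + rng.gauss(0.0, 0.1)
        f = objective(x)
        if f < best_f:                  # minimisation
            best_x, best_f = x, f
    return best_x, best_f

def with_restarts(objective, n_runs, steps, bounds, seed=0):
    """Repeat the run from fresh random starts, keeping every final result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        start = rng.uniform(*bounds)    # a new starting point for each run
        results.append(hill_climb(objective, start, steps, rng))
    best = min(results, key=lambda r: r[1])
    return best, results
```

Keeping every run's final result in `results` plays the role of writing the final population out to storage before the next start is generated.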
The time complexity of an optimisation problem can be exacerbated by objective functions which are themselves computationally expensive, as is the case, for example, in computational fluid dynamics (CFD), in which a (very) large number of equations need to be solved numerically per function evaluation. Under such circumstances, an objective function might take many minutes, or even hours or days, to calculate. Population-based algorithms, such as some evolutionary ones, suffer from this because of the large number of function evaluations (population size × number of generations) that need to be performed.
Surrogate models (Forrester et al., 2008) are one way to circumvent this practical problem. Once created, these models, which might also be known as metamodels, act as alternative sources of objective function values that are much cheaper computationally, perhaps by orders of magnitude. They are built through curve-fitting or response surface modelling, depending upon the number of decision variables. As well as being cheaper to evaluate, surrogate models also need a useful degree of accuracy.
The models are created from data in a bottom-up manner, by sampling well-considered data points (chosen through a design of experiments) using the expensive computational route. The model is assessed on various quality points (such as noise, constraints, robustness), with data being replaced or added as necessary for the validity of the final model. It is assumed that the (engineering) function behaves in a smooth, continuous way, such that non-sampled data points can be calculated with the requisite accuracy.
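As a minimal sketch of the curve-fitting route, the following fits a polynomial to a handful of sampled points and then uses it in place of the original function. It assumes NumPy is available; `expensive_objective` is a hypothetical stand-in for a costly simulation, and the evenly spaced samples and the polynomial degree are illustrative choices rather than a recommended design of experiments.

```python
import numpy as np

def expensive_objective(x):
    """Hypothetical stand-in for a costly simulation of one decision variable."""
    return np.sin(3.0 * x) + 0.5 * x ** 2

# Sample the expensive function at a small set of chosen points.
samples = np.linspace(-1.0, 1.0, 9)
responses = expensive_objective(samples)

# Fit a degree-5 polynomial to the sampled responses (least squares).
coeffs = np.polyfit(samples, responses, deg=5)

def surrogate(x):
    """Cheap replacement for expensive_objective inside the sampled range."""
    return np.polyval(coeffs, x)

# Check accuracy at points that were not used for fitting.
check_points = np.linspace(-0.9, 0.9, 7)
max_error = float(np.max(np.abs(surrogate(check_points)
                                - expensive_objective(check_points))))
```

The value of `max_error` gives a rough indication of whether the surrogate is accurate enough; if it is not, more sample points or a different model form would be needed.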
An alternative is to use the Kriging method as the surrogate, as created by Krige and as discussed by Forrester et al. (2008). In this approach, it is assumed that the observed responses (the objective functions) derive from a stochastic process even though they may arise from deterministic codes. A set of random vectors is generated from the sample data and a set of correlation values is created between them using the Kriging basis function, from which a correlation matrix is created for all the sample and observed response data, along with a covariance matrix. The covariances then govern a Gaussian process which models interpolated values for data points not in the sampled set. This method has been shown to be the best (linear) unbiased predictor of interpolated values.
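A one-variable Kriging predictor might be sketched as below. This is a simplified illustration assuming NumPy, a Gaussian basis function with a fixed width parameter `theta`, and a constant mean; in practice `theta` would be tuned (for example by maximum likelihood), which is omitted here. The function names, the `nugget` term and the dictionary layout are hypothetical.

```python
import numpy as np

def gaussian_correlation(a, b, theta):
    """Basis function: exp(-theta * (a_i - b_j)^2) for every pair of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-theta * diff ** 2)

def fit_kriging(x, y, theta=10.0, nugget=1e-10):
    """Build the correlation matrix and the weights used for prediction."""
    n = len(x)
    R = gaussian_correlation(x, x, theta) + nugget * np.eye(n)
    ones = np.ones(n)
    # Estimate of the constant mean of the assumed stochastic process.
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, y - mu * ones)
    return {"x": x, "theta": theta, "mu": mu, "weights": weights}

def predict(model, x_new):
    """Predicted response at new points: mu + r^T R^{-1} (y - mu)."""
    x_new = np.atleast_1d(x_new).astype(float)
    r = gaussian_correlation(x_new, model["x"], model["theta"])
    return model["mu"] + r @ model["weights"]
```

With a negligible `nugget`, the predictor reproduces the sampled responses at the sample points and interpolates smoothly between them.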
Evolutionary algorithms have been applied successfully, to various extents, to a variety of multi-objective optimisation problems, where ‘multi’ has tended to mean between 2 and around 4 objective functions (OFs). Problems having more than this number of OFs are now commonly referred to as ‘many’ to highlight this boundary, above which success has been seen to diminish, whether in the ability to converge, in the diversity of solutions provided, or through the longer run times required (Khare et al., 2003).
A study by Hughes (2005) found, using the hypervolume quality metric, that the Pareto ranking method tends to inhibit algorithm performance compared with methods that do not use Pareto ranking. That study also found that, while it is better to generate an entire Pareto set in one run where possible, splitting a many-objective optimisation problem (MaOOP), especially in the higher dimensions, into a collection of single-objective problems works better (for convergence) than the equivalent Pareto ranking algorithms, such as NSGA-II.
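For two objectives, the hypervolume metric reduces to the area dominated by a solution set and bounded by a reference point. The sketch below assumes minimisation of both objectives and a reference point chosen by the user; it is an illustration of the metric only, not the procedure used by Hughes (2005), and the function name is hypothetical.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (minimisation)."""
    # Points that do not improve on the reference contribute nothing.
    inside = [p for p in points if p[0] < reference[0] and p[1] < reference[1]]
    area = 0.0
    best_f2 = reference[1]
    # Sweep in order of the first objective, adding each new horizontal strip.
    for f1, f2 in sorted(inside):
        if f2 < best_f2:                # dominated points add no area
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

For example, the front (1, 3), (2, 2), (3, 1) with reference point (4, 4) covers an area of 6, and adding the dominated point (3, 3) leaves that value unchanged.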
Knowles and Corne (2007) found that for MaOOPs with ten or more OFs, a purely random search could show better convergence than an MOEA. This seems to be largely because the more OFs there are, the more likely it is that a solution is non-dominated in at least one dimension, thus reducing the Pareto ranking hierarchy. The study found that the mode rank shifted away from 1, tending to lead to middle ranks being given a higher fitness than warranted and to selection pressure being reduced. They also found that MOEAs could be efficient for MaOOPs with 10 or more OFs if part of the Pareto front could be discarded or ignored through preference direction. They noted that investigation of degeneracy, in which several distinct decision vector points map to the same objective vector, might be fruitful, where less uniform degeneracy has a greater effect on the possible outperformance of random search.
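The loss of ranking hierarchy with increasing numbers of objectives can be illustrated by counting how many points in a random sample are non-dominated. The sketch assumes minimisation and objective values drawn independently and uniformly at random, which is an artificial setting chosen only for illustration; the function names are hypothetical.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def nondominated_fraction(n_points, n_objectives, seed=0):
    """Fraction of uniformly random objective vectors that nothing dominates."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_objectives)]
              for _ in range(n_points)]
    count = sum(
        1 for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    )
    return count / n_points
```

Comparing, say, `nondominated_fraction(200, 2)` with `nondominated_fraction(200, 10)` shows the fraction rising sharply with the number of objectives, so that rank alone distinguishes between very few solutions.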
With the above in mind, approaches based on supplanting Pareto dominance with alternative techniques have been variously proposed, as outlined by Purshouse and Fleming (2007), who also use standard ranking and diversity techniques to illustrate the concepts of dominance resistance and active diversity promotion. The work showed that techniques for promoting diversity, as well as being very influential on the outcome, could in some cases be deleterious. As noted above, in higher dimensions it is more likely that solutions become non-dominated, leading to selection based solely upon the secondary diversity criterion. They noted that even in an enumeration of the search space the proportion of non-dominated solutions could be very high, and that standard operators struggle to find newly dominant solutions. As Knowles and Corne (2007) found that preference direction could be a way to enable MOEAs to work in higher dimensions, they highlight the potential merit of preferability operators and suggest investigation of non-Pareto dominance comparators, such as using performance indicators, e.g. hypervolume, as criteria.
The Ishibuchi et al. (2008) review gives an alternative dominance ranking that eliminates more solutions, thus increasing selection pressure, and a ranking scheme which takes account of the number of objectives in which one solution may be better than another. Decreasing the dimensionality of objective space by bundling objectives together may be possible, where a translation between domains is feasible.
Saxena and Deb (2007) showed that MOEAs could work well in higher dimensions using an approach based on principal components analysis (PCA) with non-linear dimensionality reduction. This has been shown to be effective for problems with up to 50 objectives. They suggest that the parameter k for their “MVU-PCA-NSGA-II” proposal should take the value k = ⌈√M⌉, not only for the problems they tested but for use in general (where M is the number of objectives).
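As a small worked illustration, the suggested value of k, read here as the square root of M rounded up to the nearest integer, can be computed exactly with integer arithmetic. The function name is hypothetical.

```python
import math

def suggested_k(n_objectives):
    """Square root of the number of objectives M, rounded up (M >= 1)."""
    if n_objectives < 1:
        raise ValueError("number of objectives must be at least 1")
    # For M >= 1, ceil(sqrt(M)) equals isqrt(M - 1) + 1, avoiding float error.
    return math.isqrt(n_objectives - 1) + 1
```

For M = 10 objectives this gives k = 4, and for M = 50 it gives k = 8.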