Continuous optimisation algorithms address many problems in a variety of areas such as image compression [109, 110], improvement of manufacturing processes [5], structural design [72, 9], scheduling problems (notably, Job-shop scheduling problems) [143], cryptanalysis [208], object recognition and clustering [259], economics (notably the Load Dispatch problem) [302, 79, 2], antenna design [34, 14], spring-mass systems [53] and more. These problem domains do not necessarily use the same set of algorithms. There are many to choose from, and some are more suitable than others for certain tasks.
The field of numerical optimisation in general is vast, and includes a wealth of gradient-based methods, stochastic algorithms such as evolutionary optimisers, and many hybridised methods. In general, the problem of optimising a function with real-valued parameters is defined by Nocedal and Wright [199] as:
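$$
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad
\begin{cases}
c_i(x) = 0, & i \in \mathcal{E} \\
c_i(x) > 0, & i \in \mathcal{I}
\end{cases}
\tag{3.1}
$$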
More specifically, this is the minimisation of a function with an input vector $x$ in $\mathbb{R}^n$, subject to a set of constraints on each variable. $\mathcal{E}$ and $\mathcal{I}$ are simply integer sets containing the indices of constraint functions. Of course, in the case where $\mathcal{E} = \emptyset$ and $\mathcal{I} = \emptyset$, the problem is unconstrained, which, while a special case, is a considerably different problem to solve. The methods to solve constrained and unconstrained optimisation problems are quite varied [292].
Focus in this chapter is given to global optimisation (which is contained within numerical optimisation [292]), which deals with the task of finding a global optimum for a certain known or unknown function $f(x)$ across the entire set of $\mathbb{R}^n$. The search for a global optimum is made more difficult by the presence of local minima and local maxima. These are simply the best solutions for a specific subset of the input parameters (solutions which are the best for a local region) but not the whole set of input parameters. Algorithms tend to converge on these in the same manner as they would on a global optimum.
Specific interest here is given to stochastic optimisation using Evolutionary Algorithms (EAs). EAs are attractive due to their inherently parallel nature, which can be exploited effectively in the same style as agent-based models. The broader class of stochastic optimisers is sometimes known as metaheuristics [168], in recognition of the few assumptions these algorithms make about the target problem. As demonstrated in Section 3.5, this is also attractive because gradient information may be entirely missing.
The problem of finding the real-valued vector $x$ which minimises $f(x)$ has been successfully addressed by various derivative-free optimisers such as Genetic Algorithms [107], Simulated Annealing [145], Particle Swarm Optimisation [142, 262], Firefly Algorithm [299], and other bio-inspired algorithms [298], as well as first-order optimisers such as Gradient Descent and its many variants [199]. It is important, however, not to disregard gradient-based (first-order) methods due to their use of derivatives. The value in using a derivative-free optimiser is the ability to treat a function as a “black-box”, which is particularly useful when gradient information is unavailable. For context, a brief introduction to first-order optimisation is provided here.
3.1.1 First-order Optimisation
Algorithms such as Gradient Descent, Quasi-Newton and other variations termed gradient-based methods [199] are known as first-order optimisation algorithms due to their use of first-order derivatives. Assuming that gradient information is available, these algorithms typically follow a deterministic process to iteratively improve upon a solution vector using gradient information.
Gradient descent involves moving from an initial (often random) location on a curve towards an optimum by successive steps along the instantaneous gradient, each scaled by a step size $0 < \alpha < 1$ [168]. A simple example is an unknown function $f(x)$ with a known derivative function $f'(x)$. Assuming that there are no bounds on the values of $x$ (i.e. an unconstrained problem), $x$ can be initialised to an arbitrary location (such as $x = 0$), and then $\alpha f'(x)$ can be iteratively subtracted from it (for minimisation) until $f'(x) = 0$. At this point, the global optimum will have been found, assuming the function contains no other local minima or stationary points (that is, it is unimodal).
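A minimal Python sketch of this iteration is given below for the minimisation case, where the scaled derivative is subtracted at each step. The function name `gradient_descent`, the tolerance `tol`, the iteration cap `max_iter` and the quadratic example are assumptions made for illustration only.

```python
def gradient_descent(df, x0=0.0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Minimise a one-dimensional function using only its derivative df."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            # f'(x) is numerically zero: a stationary point has been reached.
            break
        # Step against the gradient, scaled by the step size 0 < alpha < 1.
        x -= alpha * g
    return x


# Hypothetical example: f(x) = (x - 3)^2, whose derivative is 2 (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0))
```

On a unimodal function such as this one, the stationary point is the global minimum. On a multimodal function the same loop would stop at whichever local optimum lies downhill of `x0`.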
In stochastic optimisation, some randomisation is employed, either by perturbing a single solution in space and accepting the perturbed solution with a certain probability (such as in Simulated Annealing [145]), or by maintaining a population of individuals where collective cooperation tends to accept better solutions with some inertia (perhaps better described as “scepticism”).
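The single-solution case can be sketched as follows, assuming the usual Metropolis acceptance rule and a geometric cooling schedule. Neither is specified in the text above, and the parameter names and default values are hypothetical.

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.99,
                        iters=5000, seed=0):
    """Minimise a one-dimensional function f by perturbing a single solution."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(iters):
        cand = x + rng.gauss(0.0, step)   # random perturbation of the solution
        fc = f(cand)
        # Assumed Metropolis rule: improvements are always accepted; worse
        # candidates are accepted with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling                      # assumed geometric cooling schedule
    return best_x, best_fx


# Hypothetical example: f(x) = (x - 2)^2, started far from the minimum.
sa_x, sa_fx = simulated_annealing(lambda x: (x - 2.0) ** 2, x0=10.0)
```

Early on, the high temperature lets the search accept worse candidates and so escape local optima. As the temperature falls, the rule approaches a purely greedy acceptance.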
3.1.2 Stochastic Derivative-free Optimisation
Nocedal and Wright note that should gradient information not be available, it is often adequate to obtain an estimate using finite differencing [199]. They do concede that this is not always appropriate when there is the potential for noise. Generally, algorithms which do not rely on derivatives make very few assumptions about the problem at hand. These algorithms treat the optimisation function as a “black box” [168]. One or more candidate solution vectors are improved by using the corresponding value of the optimisation function as a measure of “fitness”. It may also be computationally expensive to compute this fitness, which further demands that the optimiser make as few iterations as possible to minimise the number of evaluations performed. The number of evaluations is sometimes used as an additional measure of the effectiveness of an optimiser [299].
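As an illustration of finite differencing, the sketch below estimates a gradient with central differences. Central differences are one common choice and are assumed here; the function name and the step `h` are hypothetical.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        # Central difference: (f(x + h e_i) - f(x - h e_i)) / (2 h).
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


# Hypothetical example: f(v) = v0^2 + 3 v1, whose gradient at (1, 2) is (2, 3).
fd_grad = finite_difference_gradient(lambda v: v[0] ** 2 + 3.0 * v[1], [1.0, 2.0])
```

Each estimate costs two function evaluations per dimension, and any noise in `f` is amplified by the division by `2 h`, which is why the approach becomes unreliable for noisy functions.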
Derivative-free optimisers are categorised into those which are based on trajectory methods, and population-based methods [168]. Trajectory-based methods involve the use of a single candidate solution which is improved over time. Examples of these include the Hill-Climbing technique, which is a very simple technique conceptually similar to Gradient Descent. More sophisticated examples are Nelder and Mead’s Simplex method [196] and Kirkpatrick’s Simulated Annealing [145] algorithm. These methods evolve a certain candidate solution over time, the former being a generalised simplex, and the latter a point in $n$-dimensional space when applied to continuous optimisation problems.
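A minimal hill-climbing sketch for the continuous case is shown below. It assumes a uniform random perturbation and keeps only strict improvements, which is one of several possible variants; the names and defaults are hypothetical.

```python
import random


def hill_climb(f, x0, step=0.1, iters=2000, seed=0):
    """Minimise f by repeatedly perturbing a single candidate solution."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        # Perturb every coordinate by a uniform amount in [-step, step].
        cand = [xi + rng.uniform(-step, step) for xi in x]
        fc = f(cand)
        if fc < fx:
            # Accept only strict improvements; worse candidates are discarded.
            x, fx = cand, fc
    return x, fx


# Hypothetical example: the two-dimensional sphere function.
def sphere(v):
    return sum(vi * vi for vi in v)


hc_x, hc_fx = hill_climb(sphere, [1.0, -1.0])
```

Like gradient descent, this only ever moves downhill, so it shares the same tendency to stop at the nearest local optimum.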
Elegant nature-inspired algorithms such as the Firefly Algorithm (FA) [299] and the Particle Swarm Optimiser (PSO) [142] make use of firefly flashing behaviour and bird flocking behaviour respectively. While not strictly constrained to their natural counterparts, they still show their source of inspiration in their formulation. The FA, for instance, uses a light decay function to degrade the perceived fitness of other fireflies based on the distance between them. The PSO departs somewhat from its natural source of inspiration, but still maintains either a global or local “flock leader”, depending on the variant (gbest or lbest). These algorithms are unified by the term Evolutionary Algorithms, which accentuates their inspiration in some form from simplified natural phenomena [21] rather than specific usage of evolutionary and genetic phenomena.
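The light decay idea can be sketched as below, assuming the commonly used exponential form beta0 * exp(-gamma * r^2) for attractiveness at distance r. This form, the parameter names `beta0`, `gamma` and `alpha`, and the helper `move_towards` are assumptions for illustration and are not taken from the text above.

```python
import math
import random


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Perceived attractiveness of a firefly at distance r (assumed form)."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(xi, xj, alpha=0.0, beta0=1.0, gamma=1.0, rng=random):
    """Move firefly xi towards a brighter firefly xj.

    The pull weakens with distance; alpha scales an optional random step.
    """
    r = math.dist(xi, xj)
    beta = attractiveness(r, beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


# Hypothetical example: with alpha = 0 the move is deterministic.
new_position = move_towards([0.0, 0.0], [1.0, 0.0])
```

A larger `gamma` makes distant fireflies almost invisible to one another, so the population tends to split into local groups, while a small `gamma` makes every firefly respond to the brightest individuals regardless of distance.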
There are several problems which plague the field of evolutionary algorithms, which are given with regard to the fitness landscape of the optimisation function by Weise and colleagues as [293]:
1. Deceptiveness - Areas where gradient information is misleading.
2. Neutrality - A zero-gradient in ranges of the optimisation function.
3. Epistasis - Interdependency among parameters with respect to the objective function value.
4. Premature Convergence - A stagnation of the search algorithm before the global optimum is found.
5. Ruggedness - High variation in gradient information causing a “rugged” fitness landscape.
6. Noise - The presence of inaccuracies or stochasticity.
A function is deceptive should it contain local optima. If it contains regions of zero or near-zero gradient, it is said to contain neutrality. Epistasis occurs when a function’s variables are dependent on each other in some fashion. Premature convergence occurs when an optimiser’s search stagnates around a local optimum, rather than a global optimum. A function is rugged if there is great variation in gradient data, and finally, a function may also be noisy if it is subject to Gaussian noise, or simply inaccuracies brought on by factors such as stochasticity.
These problems pertain to the function’s landscape. Weise et al. also discuss additional problems in the algorithms themselves, including overfitting and over-simplification [293]. Given all these difficulties, a number of algorithms have surfaced which perform differently in the presence of each of these. For instance, it has been suggested that a particle swarm optimiser adapted for use in combinatorial search spaces may be generally less effective than Genetic Programming [280].
Using the “No Free Lunch” theorems of Wolpert and Macready [294], it can be shown that the number of specialised EAs will increase [293], due to the impossibility of one unifying EA outperforming every other EA on every problem. Restricting the domain of an EA allows one to tailor an algorithm to specifically overcome the issues listed above to some extent. This argument was used by van den Bergh, who made the assumption that an algorithm can be designed to successfully outperform others in a specific subset of problems [285].
Being able to compare algorithms by how well they solve optimisation problems in the presence of specific issues is very important. Their stochastic nature also means that no two executions are precisely the same. It is therefore sometimes difficult to objectively compare these algorithms, but fortunately, there is a variety of test functions that have been designed for this purpose. As will be explained later, there are other means of comparison. The focus of this chapter is on the use of evolutionary algorithms, due to their ability to optimise functions which have no obvious or accessible analytical forms. Such functions may simply be the measurement of some quantity within a simulation of a fully constructed agent-based model.
3.1.3 Calibrating Agent-based Models
As mentioned in Section 1.6 (I, p. 11), calibrating the parameters of an agent-based model can be seen as an optimisation problem. A notable example of this was investigated by Calvez and Hutzler, who used a Genetic Algorithm to optimise various aspects of the ant foraging model [28]. In that paper, the authors bring forth three key issues specifically relevant to this practice:
1. The choice of fitness function.
2. Random variation in fitness due to model stochasticity.