• No results found

4 Optimization methodology

4.2 Particle Swarm Optimization (PSO)

Particle Swarm Optimization is one of several methods for global optimization from the field of Computational Intelligence (CI). It was developed by Kennedy and Eberhart (1995) and is a population-based evolutionary computation algorithm for problem solving which simulates social behaviour in swarms. The advantages of PSO are its global search capability and the relatively small population of particles in comparison to other methods, for example Genetic Algorithms (GA) (Ujjin and Bentley 2003), which makes PSO a very fast optimization algorithm. Nevertheless, there are lots of other methods for global optimization, which can be separated into two basic groups of methods: deterministic and probabilistic methods, whereas the probabilistic methods can be subdivided into Monte Carlo algorithms and Evolutionary Computation (EC) algorithms (Figure 4-1).

4.2.1 Short survey of global optimization methods

Deterministic methods such as the simplex algorithm (Nelder 1965) and the quasi-newton method (Shanno 1970) are effective at finding local solutions in the search space, but cannot guarantee that the solution is the global optimum solution if a search space is multi-modal. Users need to be aware of this and decide whether a given local solution is sufficient or not. In practice if the search space is multi- dimensional and highly non-linear, a priori knowledge is necessary to select a good starting point near an optimum for deterministic methods to yield good results. Otherwise, they are likely to get stuck in local optima that are far away from the true optimum and consequently deliver poor optimization results. The advantage of deterministic methods is that they require a lot less computation time than probabilistic methods and are able to quickly give sufficient optimization results if appropriately initialized.

Figure 4-1: Overview of optimization methods derived from (Weise 2009)

In contrast, probabilistic methods possess the ability to globally search for an optimum, but they are not deterministic, which means that no guarantee can be given that the solution obtained is a global optimum. In general, probabilistic methods use a selection process, often random, in order to define the input samples within an allowed input space. Most of these methods follow the following pattern (Fishman 2008):

 Definition of the input space.

 Selection of a group of input samples in the input space at random or based on previous results.

 Evaluation of the input samples using a fitness function.  Combination of the fitness results into a set of interim results.

 Repetition of the previous steps until a pre-defined end criterion is reached.

Although, these basic steps are similar for all algorithms, they differ significantly in terms of the selection of input or update samples and in the way fitness results are used to converge to an optimum. A detailed description of most of the probabilistic algorithms can be found in Schwefel (1995) and Weise (2009).

For the optimization of the substrate feed at ABPs, PSO was chosen for global optimization due to the following reasons: (1) The ADM1 model is high dimensional and highly nonlinear, which requires the use of a probabilistic optimization method; (2) A study on the feasibility of global optimization methods conducted by Ebel (2009) shows that ES, GA and PSO are well-suited for this kind of optimization problems; (3) Optimization should require as few fitness function evaluations as possible because of the computationally intensive ADM1 model. For this case, PSO has proven to be faster in terms of convergence and more robust than GA (Wolf et. al 2008).

4.2.2 Introduction to PSO

PSO belongs to the field of Swarm Intelligence and works on the basis that a group of individuals, called particles, which communicate with each other as they move around, is able to find a global optimum in large, complex and highly nonlinear search spaces. In determining how to move, particles exchange information with their neighbours, thereby influencing their behaviour and eventually the movement of the whole swarm. This process allows a swarm to move towards the most interesting site in a search space, as information about interesting sites is slowly propagated to the whole swarm. Thus, in PSO, the behaviour of each particle in a swarm is governed by two basic principles: particle communication and particle movement.

Particle communication is controlled by the parameter KN, defined as the number of neighbouring

particles a particle is exchanging information with. To guarantee sufficient particle communication KN

has to be carefully selected. If it is too small the propagation of important information to all particles might take too long and if too large, particles might get stuck in a local optimum. The probability

) (t

Pr for a particle to be reached at least once after the t

th

run is described by formula 4.18 (Clerc 2006), where N is the number of particles and KN is the number of neighbours for information

exchange. ( ) 1 ( ) 1 1 t N K r P t N        (4.18)

As the probability increases quickly with t, even with a small number of neighbours, KN, information

propagation throughout the whole swarm can be rapid.

The movement of a PSO particle in a search space is defined in terms of its position vector x(t) and three parameter vectors:

Velocity (v): The speed at which the particle moves through the search space.

Personal best position (p): The best position a particle has currently found.

Global best position (g): The best position found by informants of a particle. Using these parameters a particle’s position and velocity are updated at the tth

1 1 2 2 ( 1) ( ) ( ) ( ( ) ( )) ( ( ) ( )) ( 1) ( ) ( ) t t t c t t c t t t t t             v v p x g x x x v (4.19)

The function ( ) t and the constants c and 1 c are parameters that determine the importance of the 2 three different vectors ( )v t , ( )ptx( )t and ( )gtx( )t , where ( )v t is the actual velocity of a particle,

( )t  ( )t

p x is the distance between the actual particle position x and the personal best position pand ( )t  ( )t

g x is the distance between x and g. ( ) t is a time varying inertia weight as introduced by Yuhui and Eberhart (1998) and represents the confidence of a particle in its direction of movement. Constants c and 1 c , so-called acceleration constants, represent the confidence of a particle in its 2

personal best position and its best reported global position, respectively. Furthermore, the influence of

p and g is manipulated by 1 and 2, which are random numbers between 0 and 1 and cause a

random oscillation of the particles around p and g. Using these mechanisms a swarm of particles moves through a search space looking for an optimal solution to a defined optimization problem. In a similar fashion to GAs, at each iteration all particles are evaluated using a fitness function and this information is used to update the current position, personal best position and global best position of each particle as can be seen in the following pseudo code.

swarm = createSwarm (nParticles);

while (End criteria) // e.g. Max Number of Iterations or expected fitness swarm = evaluate (swarm);

globalBest = getGlobalBest(swarm); for i = 1 to numberParticles localBest = getLocalBest(swarm(i)); swarm(i).velocity = updateVelocity(localBest,globalBest); swarm(i).position = updatePosition(swarm(i).velocity); end; end; return globalBest;

4.2.3 Literature Review on PSO

As previously mentioned PSO was developed by Kennedy and Eberhart (1995), who developed the algorithm based on the simulations of Stone and Reynolds (1987) and Heppner and Grenander (1990). The main aspect of that original PSO was that particles of a swarm were able to remember their personal best position and global best position in the search space, which is often referred to as autobiographical memory and publicized public knowledge. Kennedy and Eberhart first tested PSO on the well-known Schaffer f6 function, which was used as a benchmark for GA by Davis (1991). Three years later, Yuhui and Eberhart (1998) introduced an extension of the original PSO by implementing inertia weights to counter too fast convergence and improve exploration qualities of the swarm.

Without inertia weights, the original PSO behaves similar to a local search if initial particle positions are badly conditioned. It was found that values of 0.8<<1.2 deliver good performance results, which was confirmed later on by Carlisle and Dozer (2000) and Trelea (2003). Eberhart and Shi (2000) further investigated the use of inertia weights and came to the conclusion that a linearly decreasing  from 0.9 to 0.4 over the whole number of generations delivers good optimization results. A first performance comparison of PSO and GA was made by Angeline (1998). It was shown that PSO finds solutions near the optimum faster than GA, although exploitation is weaker. Due to the problems of too fast conversion of the original PSO, Clerc and Kennedy (2002) performed a detailed analysis of the inner workings of PSO and introduced constriction coefficients to control the dynamical characteristics of the swarm, namely exploration and exploitation. Three different ways of implementing constriction were presented and tested on test problems.

In 2002 and the following years several adapted PSO algorithms were developed. An overview of these methods can be found in Banks et. al (2007) and Poli et. al (2007). The most relevant are briefly mentioned in this paragraph. The Guaranteed Conversion PSO (GCPSO) was developed by Van Den Bergh and Engelbrecht (2002) and uses an adapted velocity update equation, which causes the best particle to perform a random search within a pre-set radius around g, which is defined by a scaling factor. The disadvantage of GCPSO is that some a priori knowledge about the search space is required to properly choose the scaling factor, whereas an advantage lies in the reduced swarm sizes, which significantly reduces the number of true function evaluations. Riget and Vesterstrøm (2002) found out that swarm stagnation is closely related to the diversity of the particles in the swarm. To assure a minimum of diversity, particles attract each other until a minimum diversity level is reached and then go into repulsion mode until they reach maximum diversity again. This PSO is called ARPSO (Attractive and Repulsive PSO). Another method to assure diversity and to prevent quick convergence was introduced by Silva et. al (2002) who implemented predator activity for the particle swarm by adding an additional particle, which is attracted to the best particle and drives off the rest of the swarm (PSPPO-Particle Swarm Predator Prey Optimization). The result of that approach is that particles are forced to explore other areas of the search space, which makes this method well-suited for highly nonlinear multimodal optimization problems. To allow particles to move faster through the search space, Kennedy (2003) developed a Gaussian PSO where the velocity vector is replaced by a random number around the mean of p and g with a Gaussian distribution. This allows particles to “teleport” to locations in the search space, which should allow for a quick exploration of the search space. On the contrary, Veeramachaneni et. al (2003) try to slow down convergence to increase exploration of the search space by introducing another velocity component which represents the influence of the fittest nearest neighbor particle. The method is known as Fitness-Distance-Ratio PSO (FDR-PSO). Another manipulation of the original PSO’s velocity vector is presented by Parsopoulos (2004). The constricted PSO is adapted, introducing two velocity vectors one depending on p and the other depending on g.

These two velocities are then combined to a unified velocity update, which should increase the explorative behavior of the swarm. One last interesting adaptation of the original PSO was developed by Kaewkamnerdpong and Bentley (2005) based on the principle that a particle’s perception of its surrounding particles is not optimal but sometimes incorrect or blurred. Thus, particles are no longer allowed to communicate directly but can only observe particles in their pre-set perceptive range. Within this range, information from particles that are closest is considered to be more accurate.

The configuration of PSO is also the topic of several publications, as the global search capability of the algorithm is significantly affected by swarm size, inertia weight and maximum particle speed. Therefore, Carlisle and Dozer (2001) provide guidelines for achieving a working, although not optimal PSO parameter set. Furthermore, Parsopoulos and Vrahatis (2002) developed a method to find optimal initial particle positions using the Nelder Mead Simplex algorithm (NMS) (Nelder 1965). Results show that apart from the additional computational effort for NMS, the improved initial positions may speed up the global search significantly. Last but not least, Zhang et. al (2005) further investigated the guideline from Carlisle and Dozier and tried different parameter combinations on problems of high dimensionality. The results clearly state that greater swarm sizes are necessary to tackle high dimensional optimization problems.

All in all, it can be said that lots of different PSO algorithms exist that mostly address the well-known problems of premature convergence or the tradeoff between true function evaluations and full exploration of the search space. Nevertheless, the basic equations of the original PSO are still used in most of these PSO implementations (O'Reilly et. al 2005) and remain mostly unchanged. The high number of publications (18,40518) investigating PSO algorithms or using PSO for practical application shows that PSO is one of the most used Swarm Intelligence Optimization methods of our time.