3. Neural Networks
4.2 Optimization Techniques
Global optimization algorithms imitating principles of nature have been very useful in recent times and have been applied in various domains. Such phenomena can be found in annealing processes, central nervous systems and biological evolution, which in turn have lead to the field of Evolutionary Computation (EC) [19].
Evolutionary computation includes; Genetic algorithms, evolutionary programming, evolution strategies, classifier systems, genetic programming and numerous other problem solving approaches that are based on biological observations. These observations date back to Charles Darwin’s ‘theory of evolution’, hence the term Evolutionary Algorithms [20].
This term ‘evolutionary algorithm’ refers to evolutionary processes used by computer-based problem solving systems. Evolutionary algorithms maintain a population of variables or structures, which evolve according to rules of selection, reproduction, recombination and mutation. Each individual in the population receives a measure of its fitness in its environment which is equivalent to its function value. Individuals with high fitness are chosen more readily for reproduction. Exploration across the domain is done by recombination and mutation by perturbing particular variables or individuals. Although simplistic from a biologist's viewpoint, these algorithms are sufficiently complex to provide robust and powerful adaptive search mechanisms [19].
50
4.2.1 Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is an algorithm proposed by James Kennedy and R. C.
Eberhart in 1995 [21], motivated by social behavior of organisms such as bird flocking and fish schooling. PSO as an optimization tool provides a population-based search procedure in which individuals called particles change their position with time. This method is related to evolutionary programming and was discovered through simulation of the simplified social model, namely, the synchronized flocking of birds or the movement of a school of fish. In a PSO system, particles fly around in a multidimensional search space. During flight, each particle adjusts its position according to its own experience, and according to the experience of a neighboring particle, making use of the best position encountered by itself and its neighbor. Thus, a PSO system combines local search methods with global search methods, attempting to balance exploration and exploitation [22].
Thus, PSO is a technique used to globally optimize non-linear functions. Since, a school of fish profits from previous experiences and discoveries of each member during the search for food [23]. As with most global optimization techniques particle swarm initializes a population of points, called particles, and uses them to calculate their function values either by selecting them randomly or at equal spacing across the space. At each iteration, this method influences the update of each particle according to the following:
• Each particle’s best function value,
• The function value of its fittest neighbor,
• An element of randomness
51 This strategy is illustrated in Figure 4.1.
Figure 4-1 Pi = particle’s best position, Pg = neighbor’s best position, r = random direction, Pn = next move
Figure 4.1 shows how this method updates the particles, influenced by the above directions.
Algorithm:
1. Create random particle population and assign to them random positions, 2. Evaluate their function values at their current positions,
3. Choose a set of neighbors for each particle,
4. Update the particle population using the new direction, called the velocity, 5. Repeat until a stopping criterion is met.
The MLP and RBF network architectures were optimized using PSO, to find ‘Optimal Architecture’ MLP and RBF networks. The pseudo code for this technique is in Appendix A.
p
gp
ir x
Pn
52
4.2.2 Metropolis-Hastings Algorithm (MHA)
In mathematics and physics, the Metropolis-Hastings algorithm is an algorithm to generate a sequence of samples from the joint distribution of two or more variables. The purpose of such a sequence is to approximate the joint distribution. This algorithm is an example of a Markov chain Monte Carlo algorithm. The Metropolis-Hastings algorithm can draw samples from any probability distribution P(x), requiring only that the density can be calculated at x [24].
The key to the Metropolis-Hastings algorithm is to create a sampling strategy by which the probability of being in state a and moving to state b is the same as from b to a, subject to a few regularity conditions. This series of draws is accomplished by proposal and acceptance/rejection of candidate values x∗ [24].
Algorithm:
for j=1:N
draw x∗ from ( | )q x x∗ j , draw u U: (0,1)
if ( | ) ( )
( | ) ( )
j
j j
q x x f x u q x x f x
∗ ∗
< ∗
then xj+1= x∗ else xj+1= xj
53 The proposal of a candidate value x∗is conducted through a proposal function ( | )q x x∗ j the
form of which is quite arbitrary. In order to get a lot of x values, the values of x∗ should not be rejected too often. The arbitrariness of the proposal function ( | )q x x∗ and the lack of theory guiding our choice leave lots of room for experimentation [25]. The two functions that suggest themselves are the uniform and normal distributions [24]. The arbitrariness of the proposal function is supported in the acceptance/rejection step, which corrects for unlikely steps from x to xj ∗by accepting them only with the ratio of moving from one to the other.
4.2.3 Genetic Algorithm (GA)
The genetic algorithm is a model of machine learning, which derives its behavior from a metaphor of the processes of evolution in nature. This is done by the creation within a machine of a population of individuals represented by chromosomes. The individuals in the population then go through a process of evolution. Some variables or individuals are better or fitter than others. Those that are better are more likely to survive and regenerate their genetic material. Reproduction allows the creation of genetically radically different offspring that are still of the same general species. At the molecular level what occurs is that a pair of chromosomes bump into one another, exchange chunks of genetic information and drift apart. This is the recombination operation, which genetic algorithms generally refer to as crossover, because of the way that genetic material crosses over from one chromosome to another. Crossover takes place when the selection of who gets to mate is
54 made a function of the fitness of the individual. The genetic algorithm uses stochastic processes, but the result is distinctly non-random [19].
Firstly, an initial population of parent individuals (feasible solutions) is randomly created.
Each individual is represented by a chromosome, a string of characteristic genes. Secondly, all the individuals are ranked with a fitness function appropriate to the problem at hand.
The fittest of these, pass directly to the following generation; a process of ‘elitism’.
Thirdly, a breeding population is formed by selecting top-ranking individuals from those that remain. This is the natural selection step. Lastly, these selected individuals undergo certain transformation via genetic operators to reduce children for the next generation.
Operators include recombination by crossover and mutation (a randomly generated gene that is somehow altered). Mutation ensures a genetic diversity in the population [19]. This process is repeated for a certain number of generations, or until some stopping criterion is met.
Algorithm:
1. generate initial population 2. generate offspring:
• selection: probability of being accepted according to fitness
• crossover: parents are paired to generate offspring
• mutation: each string has a very small chance of being mutated
• selection/rejection: selection or rejection of the new generation according to some criterion
3. Repeat step 2 unit stopping criterion is met.
55
4.2.4 Genetic Algorithm with Metropolis-Hastings Algorithm
In the above algorithm the selection/rejection step of the new generated population is not specified, and many methods can be used to do this step. Metropolis-Hastings Algorithm has been adopted in this text to make the selection or rejection of the new generation in the genetic algorithm.
The Genetic Algorithm Metropolis-Hastings Algorithm (GAMHA) method was then used to manipulate the architectures in the individual MLP and RBF networks so as to optimize the committee network. The pseudo code for this technique is in Appendix A.
4.3 Conclusion
The importance of neural network architecture is emphasized again in this chapter. Two optimization strategies are introduced. These strategies will be applied to neural networks in Chapter 5 to find ‘optimal architecture’ networks. Particle swarm optimization is a recently developed algorithm for global optimization; its algorithm is derived from the synchronized flocking of birds during the hunt for food. Genetic algorithm is then introduced. This algorithm is based on the theory of evolution or specifically ‘survival of the fittest’. The Metropolis-Hastings Algorithm is used as the selection step in this modified genetic algorithm.
56