4 EVOLVING NEURAL NETWORKS

4.3 SANE IMPLEMENTATION

SANE evolves both a neuron and a network blueprint population. Each neuron in the hidden layer specifies a set of weights and connections to the input layer and output layer. Each genotype in the network population specifies a grouping of neurons to include in the network. The neuron evolution thus searches for effective partial solutions (i.e., neurons), whilst the blueprint evolution searches for effective combinations of these partial solutions (i.e., networks) (Moriarty & Miikkulainen, 1998).

Each neuron represents a hidden neuron in a 3-layer, partially connected feed-forward network. Each neuron genotype consists of gene pairs, and each pair encodes one connection-weight combination, so a genotype always contains an even number of genes. The first gene in a pair encodes a neuron connection and the second gene encodes the weight for that particular connection. Each connection gene is an integer value ranging from zero to the total number of input and output nodes less one. Decoding a connection gene assigns a connection to either an input or an output node. Should the integer value in the connection field be less than the total number of input nodes, the connection is made to the corresponding input node number. Otherwise, the connection is made to the corresponding output node number (i.e., the gene value less the number of input nodes). Connections are thus probabilistically assigned to either input or output nodes in proportion to the ratio of input to output nodes. Each weight gene is a floating point value drawn from a Gaussian distribution with mean 0 and a standard deviation of typically 2. Initially, the connection and weight genes of each neuron in the population are assigned randomly (Moriarty & Miikkulainen, 1998).
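The following minimal Python sketch illustrates the neuron genotype described above. It assumes a problem with four inputs and two outputs and six gene pairs per neuron. These sizes are hypothetical, as are the names `Neuron`, `random_neuron` and `decode`. The genotype is stored as a flat list in which even positions hold connection genes and odd positions hold weight genes. The text does not specify how duplicate connection genes are handled, so this sketch simply sums their weights.

```python
import random
from dataclasses import dataclass, field

# Hypothetical problem and genotype sizes (not taken from the text).
N_INPUTS = 4
N_OUTPUTS = 2
N_PAIRS = 6              # connection-weight gene pairs per neuron
WEIGHT_INIT_STD = 2.0    # "typically 2" in the text


@dataclass
class Neuron:
    # genes[2k] is a connection gene, genes[2k + 1] is its weight gene
    genes: list = field(default_factory=list)


def random_neuron(rng: random.Random) -> Neuron:
    """Randomly initialise one neuron genotype."""
    genes = []
    for _ in range(N_PAIRS):
        genes.append(rng.randrange(N_INPUTS + N_OUTPUTS))  # 0 .. inputs + outputs - 1
        genes.append(rng.gauss(0.0, WEIGHT_INIT_STD))      # Gaussian weight, mean 0
    return Neuron(genes)


def decode(neuron: Neuron):
    """Map gene pairs to {input_node: weight} and {output_node: weight}."""
    input_weights, output_weights = {}, {}
    for k in range(0, len(neuron.genes), 2):
        label, weight = neuron.genes[k], neuron.genes[k + 1]
        if label < N_INPUTS:
            target, node = input_weights, label
        else:
            target, node = output_weights, label - N_INPUTS
        # Duplicate connections are summed here (an assumption; not specified).
        target[node] = target.get(node, 0.0) + weight
    return input_weights, output_weights
```

For example, a genotype `[0, 0.5, 5, -1.0]` decodes to two connections. The first runs from input node 0 with weight 0.5. The second runs to output node 1 (label 5 less the four input nodes) with weight -1.0.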

Each genotype in the network blueprint population comprises a set of neuron pointers (i.e., address pointers) to neuron structures. The number of neurons in each network is fixed and is chosen according to the complexity of the problem. Initially, the neuron address pointers are assigned randomly to neuron structures (Moriarty & Miikkulainen, 1998).
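The blueprint population can be sketched in the same way. In this illustration, Python list indices stand in for the address pointers. The population sizes and network size are hypothetical values, not taken from the text. Because a blueprint stores indices, overwriting the neuron stored at an index changes what every blueprint pointing to that index refers to. This mirrors overwriting a neuron structure in memory. The sketch also permits a blueprint to point to the same neuron twice, since the text does not say whether this is prevented.

```python
import random

# Hypothetical sizes; the text only states that the network size is fixed per problem.
NEURON_POP_SIZE = 800
BLUEPRINT_POP_SIZE = 200
NEURONS_PER_NETWORK = 8


def random_blueprint(rng):
    """A blueprint is a list of "pointers" (indices into the neuron population)."""
    return [rng.randrange(NEURON_POP_SIZE) for _ in range(NEURONS_PER_NETWORK)]


def init_blueprints(rng):
    """Initial blueprint population with randomly assigned neuron pointers."""
    return [random_blueprint(rng) for _ in range(BLUEPRINT_POP_SIZE)]


def network_neurons(blueprint, neuron_population):
    """Resolve a blueprint's pointers to the neuron genotypes it groups into a network."""
    return [neuron_population[i] for i in blueprint]
```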

SANE’s generational evolutionary algorithm operates in two main phases: evaluation and recombination. The macro flow chart of the SANE algorithm is illustrated in Figure 4-7 (Moriarty & Miikkulainen, 1998).

[Figure 4-7: Macro flow chart of the SANE algorithm. Evaluation stage: evaluate blueprint network population fitness from interaction with the environment; rank the blueprint network population by fitness; determine neuron fitness from participation in the best 5 networks; rank the neuron population by fitness. Recombination for the neuron population: apply one-point crossover to the top 20% of the neuron population; apply mutation operations to the non-breeding neurons. Recombination for the blueprint network population: apply one-point crossover to the top 20% of the network population; mutate the non-breeding network population. The loop repeats until the termination criteria are satisfied.]
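The control flow of Figure 4-7 can be summarised as a loop skeleton. This is a sketch under stated assumptions. The evaluation and recombination steps are passed in as hypothetical callables (`evaluate`, `recombine_neurons`, `recombine_blueprints`). Fitness is taken to be higher-is-better. The termination criteria are assumed to be a generation budget or a target fitness.

```python
from typing import Callable, List, Tuple


def run_sane(
    evaluate: Callable[[], Tuple[List[float], List[float]]],
    recombine_neurons: Callable[[List[int]], None],
    recombine_blueprints: Callable[[List[int]], None],
    max_generations: int,
    target_fitness: float,
) -> float:
    """Skeleton of the SANE generational loop (higher fitness is better)."""
    best = float("-inf")
    for _ in range(max_generations):
        # Evaluation stage: blueprint fitness, then best-five neuron fitness.
        blueprint_fitness, neuron_fitness = evaluate()
        bp_ranked = sorted(range(len(blueprint_fitness)),
                           key=blueprint_fitness.__getitem__, reverse=True)
        n_ranked = sorted(range(len(neuron_fitness)),
                          key=neuron_fitness.__getitem__, reverse=True)
        best = max(best, blueprint_fitness[bp_ranked[0]])
        # Recombination stages: neuron population first, then blueprints.
        recombine_neurons(n_ranked)
        recombine_blueprints(bp_ranked)
        # Termination criteria (assumed): target fitness or generation budget.
        if best >= target_fitness:
            break
    return best
```

Passing the stages in as functions keeps the skeleton independent of how networks are simulated or bred.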

4.3.1 Evaluation stage

The evaluation phase determines the fitness of each neuron and network in the populations. Network blueprints are evaluated based on reinforcement from direct interaction with either a dynamic simulation or a real-world environment (section 3.2). Figure 4-8 illustrates the evaluation process for a single network genotype. The plant state is typically initialised to a condition in the region around the desired set points. The dynamic state equations are solved, using the fourth-order Runge-Kutta method, for a single sample period. The plant enters a new state, s_t, and the state variables and other inputs determine the neurocontroller's control action, a_t. The error from the desired state for the current sample period, r_t, is calculated and stored for the fitness calculation. The control action is then applied when solving the state equations for the next sample period. The current plant state is also checked to determine whether a premature failure criterion has been triggered. Should premature failure occur, a specified maximum error is assigned to the remaining time steps (i.e., sample periods) of the trial. Otherwise, this sequence is repeated until the evaluation trial terminates. Finally, a fitness value is assigned to the blueprint network according to the criterion specified by the objective function.
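A toy plant can illustrate the evaluation of a single network. The sketch below assumes a hypothetical first-order plant, dx/dt = (-x + u)/2, with a single set point and a controller given as any Python callable. The failure threshold and maximum error values are also illustrative. The state equation is integrated with the classical fourth-order Runge-Kutta method, with the control action held constant over each sample period. The cost accumulates a time-weighted absolute error, where lower is better. The control applied before the first action is assumed to be zero.

```python
def rk4_step(f, x, u, dt):
    """One classical fourth-order Runge-Kutta step of dx/dt = f(x, u), u held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def plant(x, u):
    # Hypothetical first-order plant with a 2 s time constant (illustration only).
    return (-x + u) / 2.0


def evaluate_controller(controller, x0=0.5, set_point=1.0, dt=0.1, n_steps=100,
                        fail_error=5.0, max_error=5.0):
    """Return an ITAE-style cost for one evaluation trial; lower is better."""
    x, u, cost = x0, 0.0, 0.0            # u = 0 before the first control action (assumption)
    for k in range(n_steps):
        x = rk4_step(plant, x, u, dt)    # solve the state equation for one sample period
        u = controller(x, set_point)     # new state -> control action for the next period
        t = (k + 1) * dt
        error = abs(set_point - x)
        cost += t * error * dt           # time-weighted absolute error of this period
        if error > fail_error:           # premature failure criterion (hypothetical)
            for j in range(k + 1, n_steps):
                cost += (j + 1) * dt * max_error * dt   # max error for remaining periods
            break
    return cost
```

`evaluate_controller` returns a cost, so a fitness for ranking would still need to be derived from it, for example as its negative or reciprocal. The text leaves this choice to the objective function.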

Each neuron genotype contained within a blueprint network is assigned a fitness based on the summed fitness of the best five networks that the neuron participated in. Utilising only the best five networks prevents an average or "aging" neuron with several pointers from dominating more effective neuron discoveries that have few pointers (Moriarty & Miikkulainen, 1998).
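The best-five rule can be sketched as follows. The sketch assumes non-negative fitness values where higher is better, so an error-based cost such as ITAE would first need to be converted. It also gives a fitness of zero to neurons that no network points to, which the text does not specify. The function name `neuron_fitness` and its parameters are hypothetical.

```python
import heapq
from collections import defaultdict


def neuron_fitness(blueprints, blueprint_fitness, n_neurons, best_k=5):
    """Sum, for each neuron, the fitness of the best `best_k` networks it appears in.

    Assumes non-negative fitness where higher is better; neurons that no
    blueprint points to receive 0.0 (an assumption).
    """
    scores = defaultdict(list)
    for blueprint, fitness in zip(blueprints, blueprint_fitness):
        for idx in set(blueprint):         # count each network once per neuron (assumption)
            scores[idx].append(fitness)
    return [float(sum(heapq.nlargest(best_k, scores[i]))) for i in range(n_neurons)]
```

A neuron pointed to by many mediocre networks is then scored on its five best networks only, rather than on the number of pointers.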

[Figure 4-8: Evaluation for simulated dynamic systems. Set the plant state from which to start the individual's evaluation; solve the state equations (Runge-Kutta) for a single sample period; input the current plant state to the individual for calculation of the control action; calculate the error from the set point (current sample period) for the fitness calculation. Decisions: premature failure criteria met? (yes: add penalty for premature failure, end); last sample instance for evaluation? (no: return to the Runge-Kutta step). Determine the fitness function by integrating the ITAE equation.]
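The ITAE (integral of time-weighted absolute error) criterion named in Figure 4-8 weights each error by the time at which it occurs, so errors that persist late in a trial are penalised more heavily. The discrete form below rests on three assumptions: N sample periods of length Δt, a rectangle-rule approximation, and a premature failure at sample m, after which the specified maximum error e_max is used. Under these assumptions, the fitness integral can be written as:

$$
\mathrm{ITAE} = \int_{0}^{T} t\,\lvert e(t)\rvert\,dt \;\approx\; \sum_{k=1}^{m} t_k\,\lvert e_k\rvert\,\Delta t \;+\; \sum_{k=m+1}^{N} t_k\,e_{\max}\,\Delta t, \qquad t_k = k\,\Delta t,\quad T = N\,\Delta t
$$

When no premature failure occurs, m = N and the second (penalty) sum is empty.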

4.3.2 Recombination for the neuron population

After evaluation, the neuron population and the network blueprint population are ranked according to their assigned fitness. For each neuron in the top 20 [%] of the neuron population (i.e., the elite population), a mate is selected randomly from the neurons that have a higher fitness than that particular neuron. Thus, the neuron ranked 3rd may only reproduce with the neurons ranked 1st and 2nd. Two offspring neurons are created with a one-point crossover operator. One of the two offspring is randomly selected to enter the population. The other offspring is replaced by a copy of one of the parent neurons, chosen at random, and this copy is inserted into the population instead. Copying one of the parent neurons as the second offspring reduces the effect of adverse neuron mutation on the blueprint population. The two offspring replace the lowest-ranked (least effective) neurons in the population. Consequently, twice as many neurons as there are elite neurons are replaced after each generation. No mutation operator is applied to the elite (breeding) neurons (Moriarty & Miikkulainen, 1998).
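The neuron breeding scheme can be sketched as below, with each neuron stored as a flat list of genes. The function names and the `elite_frac` parameter are illustrative. Two details are assumptions. First, the text does not say which neuron the top-ranked neuron mates with (it has no higher-ranked neuron), so this sketch pairs it with a random other elite neuron. Second, the sketch requires at least two elite neurons and enough low-ranked neurons that replacements never overwrite the elite.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two gene lists at a random cut point; return both children."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def recombine_neurons(population, ranked, rng, elite_frac=0.2):
    """Breed the elite and overwrite the worst 2 * n_elite neurons in place.

    population: list of gene lists; ranked: population indices, best first.
    Returns the indices that were overwritten.
    """
    n_elite = int(elite_frac * len(population))
    assert n_elite >= 2 and 3 * n_elite <= len(population)
    worst_first = ranked[::-1]
    replaced = []
    for r in range(n_elite):
        parent = population[ranked[r]]
        # Mate is a random higher-ranked neuron; rank 0 has none, so (assumption)
        # it is paired with a random other elite neuron instead.
        mate_rank = rng.randrange(r) if r > 0 else rng.randrange(1, n_elite)
        mate = population[ranked[mate_rank]]
        child = rng.choice(one_point_crossover(parent, mate, rng))  # one offspring enters
        parent_copy = list(rng.choice((parent, mate)))  # second slot: copy of a parent
        for new_neuron in (child, parent_copy):
            slot = worst_first[len(replaced)]
            population[slot] = new_neuron
            replaced.append(slot)
    return replaced
```

With a 20 [%] elite, this replaces the bottom 40 [%] of the population each generation.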

Only the non-elite (non-breeding) portion of the population undergoes neuron mutation. For connection genes, there is a 2 [%] probability that a connection is randomly reassigned to either an input or an output node. For weight genes, there is a 4 [%] probability of a random Gaussian weight adjustment and a 0.1 [%] probability of a weight sign inversion. The Gaussian adjustment modifies the original weight with a standard deviation of 1.0 (Moriarty & Miikkulainen, 1998).
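A sketch of these mutation operators follows. It assumes a flat gene layout, with connection genes at even positions and weight genes at odd positions. The function name is hypothetical, and the default arguments reproduce the probabilities quoted above.

```python
import random


def mutate_neuron(genes, n_nodes, rng, p_conn=0.02, p_weight=0.04,
                  p_sign=0.001, sigma=1.0):
    """Mutate one non-breeding neuron's gene list in place and return it.

    n_nodes is the total number of input and output nodes.
    """
    for k in range(0, len(genes), 2):
        if rng.random() < p_conn:
            genes[k] = rng.randrange(n_nodes)     # reassign to any input or output node
        if rng.random() < p_weight:
            genes[k + 1] += rng.gauss(0.0, sigma)  # Gaussian weight adjustment
        if rng.random() < p_sign:
            genes[k + 1] = -genes[k + 1]          # weight sign inversion
    return genes
```

In a full loop, this would be called only for neurons ranked below the elite, e.g. `for i in ranked[n_elite:]: mutate_neuron(population[i], n_nodes, rng)`, where `ranked`, `n_elite` and `population` are illustrative names.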

This aggressive, elitist breeding strategy is not normally used when evolving neurocontrollers, as it generally leads to premature convergence of the population. Because SANE maintains pressure against convergence, it performs well with this aggressive strategy (Moriarty & Miikkulainen, 1998).

4.3.3 Recombination for the network population

Crossover in the blueprint population results in the exchange of address pointers to the neuron population. Should a parent point to a specific neuron, one of its children will consequently also point to that particular neuron (Moriarty & Miikkulainen, 1998). To avoid convergence in the blueprint population, a twofold mutation strategy is incorporated. First, there is a 0.2 [%] probability that a pointer is reassigned to a random neuron in the neuron population. This promotes the use of neurons other than those in the elite neuron population. Second, a neuron that does not participate in any blueprint network cannot be evaluated. The second mutation operator takes advantage of this knowledge by reassigning pointers to breeding neurons so that they point to offspring neurons, with a 50 [%] probability. Neither mutation operator is applied to the breeding (elite) networks, which preserves the neuron pointers in the top blueprints (Moriarty & Miikkulainen, 1998).
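The blueprint crossover and the two mutation operators can be sketched as follows, with blueprints stored as lists of neuron indices. Several details are assumptions rather than statements from the text:

- Blueprint mates are chosen in the same way as for neurons.
- Both crossover children are kept and replace the lowest-ranked blueprints.
- The `offspring_of` mapping is a hypothetical record of which offspring each breeding neuron produced this generation, and it defines the offspring pointers.
- The random reassignment is tried before the offspring reassignment.

```python
import random


def recombine_blueprints(blueprints, ranked, offspring_of, n_neurons, rng,
                         elite_frac=0.2, p_random=0.002, p_offspring=0.5):
    """Crossover of elite blueprints plus the two pointer-mutation operators.

    blueprints: lists of neuron indices ("pointers"); ranked: blueprint indices,
    best first; offspring_of: hypothetical map from a breeding neuron index to
    the indices of its new offspring neurons.
    """
    n_elite = int(elite_frac * len(blueprints))
    assert n_elite >= 2 and 3 * n_elite <= len(blueprints)
    worst_first = ranked[::-1]
    slot = 0
    for r in range(n_elite):
        parent = blueprints[ranked[r]]
        mate_rank = rng.randrange(r) if r > 0 else rng.randrange(1, n_elite)
        mate = blueprints[ranked[mate_rank]]
        cut = rng.randrange(1, len(parent))
        # One-point crossover exchanges neuron pointers between the parents.
        for child in (parent[:cut] + mate[cut:], mate[:cut] + parent[cut:]):
            blueprints[worst_first[slot]] = child
            slot += 1
    elite = set(ranked[:n_elite])
    for i, bp in enumerate(blueprints):
        if i in elite:
            continue  # breeding (elite) networks are never mutated
        for j, ptr in enumerate(bp):
            if rng.random() < p_random:
                bp[j] = rng.randrange(n_neurons)       # pointer to a random neuron
            elif ptr in offspring_of and rng.random() < p_offspring:
                bp[j] = rng.choice(offspring_of[ptr])  # breeding neuron -> its offspring
    return blueprints
```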

Because pointers are occasionally reassigned to offspring neurons, new neuron structures are evaluated. Because neuron pointers are also reassigned to exact copies of parent neurons, some resilience against adverse mutation at the neuron level is incorporated. If pointers were not reassigned to neuron copies, many blueprint networks would point to the same neuron, and any mutation of that neuron would affect every network that points to it. The copy strategy limits such adverse effects to only a few blueprint networks. This is similar to schema promotion in standard evolutionary algorithms: as evolution progresses, highly fit schemata (neurons) become more prevalent in the population, and a mutation of one copy of a schema should not affect the other copies (Moriarty & Miikkulainen, 1998).

In document A neurocontrol paradigm for intelligent process control using evolutionary reinforcement learning (Page 99-104)