Adaptive Behaviour and Neural Network Knowledge

Chapter 5: Design and Evaluation of an Evolution Model

5.3 Evaluation of Our Evolution Model

5.3.2 Adaptive Behaviour and Neural Network Knowledge

Our evolution model produces agents with innate, fixed knowledge that allows them to behave effectively in their environment. This is achieved through the processes of selection and variation, in the context of the population’s experience of the environment. As we have shown, this was effective in producing agents that adapt to environments with stable characteristics. However, once the level of instability increases beyond a certain point, the model becomes ineffective, because it can adapt only up to a limited rate of change.

In order to demonstrate this further, consider firstly an effective solution in a stable environment.

Figure 5-12 is a trace of an agent effectively navigating its environment in order to locate three food objects. The neural network knowledge required to produce such behaviour is shown in Figures 5-13 and 5-14.

Figure 5-12 A trace of a successfully evolved agent in a stable environment. [Figure labels: Food (three objects), Poison, Agent starting position, Direction of movement.]

Figure 5-13 A successfully evolved neural network responsible for mapping visual input regarding green objects (food in the stable environment) to an output action.

[Network diagram: input neurons 1 to 9 (input: 360° vision) connected to output neurons 1 to 3 (output: actions; rotate left, move forward, rotate right).]

Visual range of input neurons 1 to 9, in order: 200°-240°, 240°-280°, 280°-320°, 320°-360°, 0°-40°, 40°-80°, 80°-120°, 120°-160°, 160°-200°.

Figure 5-14 A successfully evolved neural network responsible for mapping visual input regarding red objects (poison in the stable environment) to an output action.

[Network diagram: input neurons 1 to 9 (input: 360° vision) connected to output neurons 1 to 3 (output: actions; rotate left, move forward, rotate right).]

Figures 5-13 and 5-14 represent the portions of a successfully evolved agent’s neural network responsible for mapping visual input regarding green and red objects (food and poison, respectively, in the stable environment) to an output action. Thus, 9 input neurons and 27 weights (9 inputs, each connected to 3 outputs) are devoted to each object colour, with the three output neurons at the bottom of each figure being shared. The chosen action, then, is influenced by the position of both green and red objects.
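The way the two sets of 27 weights feed the shared output neurons can be sketched as follows. This is a minimal illustration only: the weight values are hypothetical (the evolved values appear only in the figures), the function name choose_action is ours, and choosing the output with the largest summed input is an assumption about how an action is selected.

```python
ACTIONS = ("rotate left", "move forward", "rotate right")


def choose_action(green_input, red_input, green_weights, red_weights):
    """Sum both colour channels into the three shared outputs, then pick one.

    green_input / red_input: 9 activations, one per visual region.
    green_weights / red_weights: 9 rows of 3 weights (27 weights per colour).
    Picking the largest total is an assumption made for this sketch.
    """
    totals = [0.0, 0.0, 0.0]
    for inputs, weights in ((green_input, green_weights), (red_input, red_weights)):
        for i, activation in enumerate(inputs):
            for j in range(3):
                totals[j] += activation * weights[i][j]
    best = max(range(3), key=lambda j: totals[j])
    return ACTIONS[best], totals


def one_hot(region):
    """Activate only the input neuron for the given region (1 to 9)."""
    return [1.0 if i == region - 1 else 0.0 for i in range(9)]


# Hypothetical weights: only input neuron 4 (directly ahead) is non-zero.
green_weights = [[0.0, 0.0, 0.0] for _ in range(9)]
red_weights = [[0.0, 0.0, 0.0] for _ in range(9)]
green_weights[3] = [-0.5, 1.0, -0.5]  # food ahead excites "move forward"
red_weights[3] = [0.0, -2.0, 1.5]     # poison ahead inhibits "move forward"

nothing = [0.0] * 9
food_only, _ = choose_action(one_hot(4), nothing, green_weights, red_weights)
both, both_totals = choose_action(one_hot(4), one_hot(4), green_weights, red_weights)
print(food_only)  # move forward
print(both)       # rotate right
```

With these illustrative weights, poison in the same region as food overrides the approach response, which is one way the position of both object types can influence the chosen action.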

The colour and thickness of the connections represent their sign and strength. Red is negative (inhibitory) and green is positive (excitatory). The strength of each connection is also given numerically above each input neuron, with the order of the values corresponding to the index of the output connection (i.e., the first value above each input neuron is the strength of the connection to the first output neuron). The visual range that each input neuron is responsible for is indicated above these values.
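The assignment of visual ranges to input neurons can be expressed compactly. The sketch below assumes the ranges given in the figures (neuron 1 covers 200°-240°, continuing in 40° steps to neuron 9 at 160°-200°); the function name region_for_bearing and the treatment of each range as closed at its lower bound and open at its upper bound are our assumptions.

```python
def region_for_bearing(bearing_degrees):
    """Return the input neuron (1 to 9) whose visual range contains a bearing.

    Neuron 1 starts at 200 degrees and each neuron covers 40 degrees, so the
    bearing is shifted by 200, wrapped into [0, 360) and divided into 40-degree
    bins. Boundary handling (lower bound inclusive) is an assumption.
    """
    return int(((bearing_degrees - 200.0) % 360.0) // 40.0) + 1


for bearing in (350, 10, 260):
    print(bearing, region_for_bearing(bearing))
```

Bearings of 350° and 10° fall into regions 4 and 5, the two regions directly ahead of the agent, and 260° falls into region 2.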

Using this information we can extract the agent’s knowledge responsible for its adaptive behaviour. The following figures were produced from this information and represent the agent’s actions that are most likely to occur, given the position of the objects relative to the agent.

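Figure 5-15 The influence of the position of green objects (food in the stable environment) on the agent’s behaviour.

[Diagram: visual regions 1 to 9 around the agent, each labelled with its most likely action. Labels as extracted, not in region order: Move Forward, Rotate Right, Rotate Right, Rotate Left, Rotate Left, Rotate Left, Rotate Left, Rotate Left, Rotate Right.]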

Figure 5-15 indicates how the agent comes to orientate itself towards green (food) objects. Note that regions 4 and 5 are directly ahead of the agent, between 320° and 40°. The diagram indicates that our agent implements a strategy of bringing the food object into region 4 before moving forward and, thereby, successfully locating it. That is, if the food object is to the right of region 4, the agent rotates right; if it is to the left of region 4, the agent rotates left.
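The strategy described here can be traced step by step. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: the per-region assignment in FOOD_POLICY (region 4 moves forward, regions 5 to 7 rotate right, the remaining regions rotate left) is one assignment consistent with the description and with the number of labels in Figure 5-15, and each rotation is assumed to shift the object by exactly one region.

```python
# Assumed region-to-action table for green (food) objects; see the lead-in.
FOOD_POLICY = {
    4: "move forward",
    5: "rotate right", 6: "rotate right", 7: "rotate right",
    8: "rotate left", 9: "rotate left", 1: "rotate left",
    2: "rotate left", 3: "rotate left",
}


def rotations_until_forward(start_region, policy=FOOD_POLICY, max_steps=9):
    """List the rotations performed before the policy says 'move forward'.

    Rotating right moves the object one region lower (for example 5 to 4);
    rotating left moves it one region higher (for example 3 to 4), wrapping
    between regions 9 and 1. The one-region step size is an assumption.
    """
    region = start_region
    actions = []
    for _ in range(max_steps):
        action = policy[region]
        if action == "move forward":
            break
        actions.append(action)
        shift = -1 if action == "rotate right" else 1
        region = (region - 1 + shift) % 9 + 1
    return actions


for start in (4, 7, 1):
    print(start, rotations_until_forward(start))
```

Under these assumptions every starting region leads the food object into region 4, after which the agent moves forward.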

The following figure (Figure 5-16) represents the agent’s reactions to red (poison) objects. What is apparent from this diagram is that the agent demonstrates an aversion to poison directly ahead of it. If the poison occurs in region 4, the agent rotates to its right, bringing the poison towards region 3. If the poison occurs in region 5, the agent rotates to its left, bringing it to region 6. What is also apparent from the mappings is that the agent demonstrates a preference for poison being in region 2, that is, to its left between 240° and 280°. This can be deduced because the forward movement and leftward rotation mappings of regions 6, 7, 8, 9 and 1 bring the object to this position. Once it is in region 2, the move forward mapping helps to keep it there.

Figure 5-16 The influence of the position of red objects (poison in the stable environment) on the agent’s behaviour.

[Diagram: visual regions 1 to 9 around the agent, each labelled with its most likely action. Labels as extracted, not in region order: Rotate Left, Move Forward, Move Forward, Move Forward, Rotate Left, Rotate Left, Move Forward/Rotate Right, Rotate Left, Rotate Right.]

The above description of the agent’s behaviour helps us to understand why it is successful in the stable environment, but why is the evolution model not successful in producing agents that are effective under increasing levels of instability?

The evolution model is limited in its ability to adapt to such situations for the following reasons. The problem is difficult because it requires two solutions with opposite attributes. One solution entails producing behaviour to locate green objects while avoiding red objects. The other requires that the agents locate red objects while avoiding green objects. These two solutions conflict, and only one of them should be present in the agent at any time if it is to adapt effectively to the current characteristics of the environment. Thus, when the characteristics of the environment change after a fitness consequence reversal, the previously implemented solution must be discarded for its opposite. This is a time-consuming process for the evolution model, for two reasons. First, the model relies on limited feedback, at a population or generation level, to direct changes. Second, the changes are slow to implement because the two opposite solutions utilize the same portions (memory) of an agent’s neural network. Previous solutions must therefore be unlearnt, rather than set aside, while the new, appropriate solution is relearnt. In addition, since feedback only occurs at the end of an individual’s lifetime, the incorrect solution is applied throughout the agent’s lifetime without any resistance (i.e., without lifetime learning to counter it) until a sufficient number of generations have elapsed for it to be unlearnt. As a result, negative fitness scores are recorded until the previous, now incorrect, solution has been rendered ineffective.

This process is repeated each time a fitness consequence reversal occurs, which makes such changes difficult for evolution models to adapt to and assimilate.
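The effect of repeated reversals on a purely generational learner can be illustrated with a toy simulation. This is not the evolution model evaluated in this chapter: the genome is reduced to a single weight, fitness is simply that weight multiplied by a sign that flips at each reversal, and the population size, mutation strength and reversal period are hypothetical values chosen for illustration. Because both "solutions" occupy the same weight, the old one has to be overwritten before the new one can emerge.

```python
import random


def evolve_with_reversals(generations=60, reversal_every=20,
                          pop_size=30, sigma=0.1, seed=0):
    """Return the mean fitness per generation of a toy evolving population.

    Each individual is one weight in [-1, 1]. Its fitness is sign * weight,
    where sign flips every `reversal_every` generations (a stand-in for a
    fitness consequence reversal). Feedback is applied only between
    generations: the fitter half is kept as parents and mutated copies form
    the next generation.
    """
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    sign = 1.0
    mean_fitness = []
    for generation in range(generations):
        if generation > 0 and generation % reversal_every == 0:
            sign = -sign  # what was rewarded is now penalised
        scores = [sign * weight for weight in population]
        mean_fitness.append(sum(scores) / pop_size)
        ranked = sorted(population, key=lambda weight: sign * weight, reverse=True)
        parents = ranked[: pop_size // 2]
        population = [
            max(-1.0, min(1.0, rng.choice(parents) + rng.gauss(0.0, sigma)))
            for _ in range(pop_size)
        ]
    return mean_fitness


history = evolve_with_reversals()
for generation in (19, 20, 39):
    print(generation, round(history[generation], 2))
```

In this sketch the mean fitness is positive just before the first reversal, becomes negative in the generation immediately after it, and only recovers as successive generations overwrite the old weight values.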

In document Design, evaluation and comparison of evolution and reinforcement learning models (Page 69-73)