
2.2 GP on Small Robotic Platforms

2.2.3 Automatic Induction of Machine Code Genetic Programming (AIM-GP)

Automatic Induction of Machine Code Genetic Programming (AIM-GP) [96,97] (formerly the Compiling Genetic Programming System, or CGPS [90]) directly evolves the machine code instructions that are executed by the processor. This differs from most GP implementations, which typically operate on higher-level virtual machine instructions. The obvious advantages of this approach are the exceptionally compact representation of the linear GP (LGP) programs and their fast execution, since the CPU executes the instructions directly rather than a virtual machine interpreting them. The machine code is cleverly encapsulated within a standard C function as a cast string. The simplicity of the LGP representation³ allows the GP engine to be implemented with a small (32 kB) memory footprint.
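To make the representation concrete, the C sketch below shows a linear GP program as a sequence of register instructions of the form r[dst] = r[a] op r[b]. The program maps 8 sensor readings to 2 motor outputs. It is an illustration under stated assumptions, not AIM-GP itself. AIM-GP emits each instruction as native machine code and calls it directly, whereas this sketch interprets a hypothetical 4-byte encoding so that it runs on any host. The operator set, the register layout (r0–r7 sensors, r8/r9 motor outputs) and the names `instr_t` and `run_lgp` are all assumptions.

```c
#include <stdint.h>

#define N_SENSORS 8
#define N_REGS    10  /* assumed layout: r0-r7 = sensors, r8/r9 = left/right motor */

/* Hypothetical operator set; AIM-GP would use native CPU instructions instead. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_AND, N_OPS };

/* One LGP "line": r[dst] = r[a] OP r[b], packed into 4 bytes. */
typedef struct {
    uint8_t op, dst, a, b;
} instr_t;

/* Interpret a linear program: load the sensors, execute each line in order,
 * then read motor speeds (possibly negative, i.e. reverse) from r8 and r9. */
void run_lgp(const instr_t *prog, int len,
             const int32_t sensors[N_SENSORS], int32_t motors[2])
{
    int32_t r[N_REGS] = {0};
    for (int i = 0; i < N_SENSORS; i++)
        r[i] = sensors[i];

    for (int i = 0; i < len; i++) {
        /* Reduce fields modulo their ranges so any randomly mutated line is valid. */
        int dst = prog[i].dst % N_REGS;
        uint32_t x = (uint32_t)r[prog[i].a % N_REGS];
        uint32_t y = (uint32_t)r[prog[i].b % N_REGS];
        uint32_t v;
        switch (prog[i].op % N_OPS) {
        case OP_ADD: v = x + y; break;  /* unsigned arithmetic wraps instead of overflowing */
        case OP_SUB: v = x - y; break;
        case OP_MUL: v = x * y; break;
        default:     v = x & y; break;  /* OP_AND */
        }
        r[dst] = (int32_t)v;
    }
    motors[0] = r[8];
    motors[1] = r[9];
}
```

Each instruction occupies only 4 bytes here, so a 10-instruction program fits in 40 bytes. That gives a feel for how a whole population of such programs can fit in very little memory.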

AIM-GP was first implemented on a Sun SPARC workstation and benchmarked on a classification task (determining whether a presented Swedish word was a noun), where it was shown to outperform neural network (NN) approaches [91]. The approach was then applied to the more ambitious task of evolving desirable behaviours for miniature, computationally constrained robots in real time and within the real-world environment (i.e. in situ).

Khepera robots (see [18] for a detailed description of the more recent Khepera platform) were used for numerous experiments. This small (6 cm wide × 5 cm high) cylindrical mobile robot carries 8 range sensors and 2 wheels, each driven by its own motor. The robots were placed in various irregular environments (typically 90 cm × 70 cm) containing walls, dead ends and obstacles. The aim was to evolve logic that enables the robots to travel autonomously, fast and straight, while avoiding collisions with walls, obstacles and other robots [92,93]. Subsequent experiments targeted more complex tasks, such as seeking or following objects and locations (defined by darkness) [10,95,96,98], which implicitly required the lower-level functionality of obstacle avoidance.

Various objective functions were supplied, and the fitness was calculated and fed back after a short 400 ms delay. The robots (whether simulated, tethered to a PC or fully autonomous) maintained a population of 50 programs. Each program encodes a functional mapping from sensor values to output motor speeds (including reverse speeds). A few programs (typically 4) are selected from the population and evaluated 400 ms after their logic is executed. They are then subjected to a tournament in which the best 2 performing programs breed, with their offspring replacing the 2 worst performing programs in the pool. Interestingly, it was observed that the sequential nature of evaluation could motivate programs to leave the robot in a worse state than before they ran, in order to sabotage the programs evaluated after them. However, it was not clear how much this competitive behaviour detracted from the collective system performance. As is common where GP is employed, careful construction of the fitness functions was needed to elicit desirable behaviours. For example, negative fitness was attributed to the sensor values (larger values imply closer objects and hence a higher likelihood of collision). This had to be balanced by positive fitness for moving straight and fast. Without the correct balance of positive and negative feedback, the robot would simply, and uninterestingly, not move at all. Nonetheless, the approach did evolve complex behaviours, such as backing up after collisions and, ultimately, mostly avoiding collisions altogether. Convergence to a population in which collisions were infrequent was reported to take about 200–300 generations (equivalent).

³ In this LGP implementation, each line of code is defined by the assignment of output which is the result
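The selection scheme and fitness balance described above can be sketched in C as a steady-state tournament. This is a minimal illustration that rests on several assumptions the source does not state:

- the four programs are drawn uniformly at random;
- "pool" is read as the four-program tournament;
- programs have a fixed length;
- breeding uses one-point crossover plus a single point mutation.

The fitness function is likewise hypothetical. It simply encodes the balance described above, rewarding fast, straight motion and penalising high proximity readings, with illustrative weights `alpha` and `beta`. All names (`fitness`, `tournament_step`, `program_t`) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POP_SIZE  50  /* population size reported for the Khepera experiments */
#define TOUR_SIZE 4   /* programs drawn per tournament */
#define PROG_LEN  16  /* fixed program length: a simplifying assumption */
#define N_OPS     4   /* size of a hypothetical operator set */

typedef struct { uint8_t op, dst, a, b; } instr_t;
typedef struct { instr_t code[PROG_LEN]; } program_t;

static double dabs(double x) { return x < 0 ? -x : x; }

/* Hypothetical fitness: reward fast, straight motion and penalise proximity
 * readings (larger = closer). alpha and beta are illustrative weights. */
double fitness(const int32_t motors[2], const int32_t sensors[8],
               double alpha, double beta)
{
    double l = motors[0], r = motors[1], prox = 0.0;
    for (int i = 0; i < 8; i++)
        prox += sensors[i];
    /* |l| + |r| rewards speed; subtracting |l - r| penalises turning. */
    return alpha * (dabs(l) + dabs(r) - dabs(l - r)) - beta * prox;
}

/* Draw TOUR_SIZE distinct population indices uniformly at random. */
static void draw_distinct(int idx[TOUR_SIZE])
{
    for (int k = 0; k < TOUR_SIZE; k++) {
        int cand, dup;
        do {
            cand = rand() % POP_SIZE;
            dup = 0;
            for (int j = 0; j < k; j++)
                if (idx[j] == cand) dup = 1;
        } while (dup);
        idx[k] = cand;
    }
}

/* One steady-state tournament. fit[] holds each program's most recently
 * measured fitness. On return, idx[] is sorted best-first; the two best
 * programs have bred and their children overwrite the two worst. */
void tournament_step(program_t pop[POP_SIZE], const double fit[POP_SIZE],
                     int idx[TOUR_SIZE])
{
    draw_distinct(idx);
    for (int i = 1; i < TOUR_SIZE; i++) {          /* insertion sort, descending */
        int key = idx[i], j = i - 1;
        while (j >= 0 && fit[idx[j]] < fit[key]) { idx[j + 1] = idx[j]; j--; }
        idx[j + 1] = key;
    }
    program_t c1 = pop[idx[0]], c2 = pop[idx[1]];
    int cut = 1 + rand() % (PROG_LEN - 1);         /* one-point crossover */
    memcpy(&c1.code[cut], &pop[idx[1]].code[cut], (PROG_LEN - cut) * sizeof(instr_t));
    memcpy(&c2.code[cut], &pop[idx[0]].code[cut], (PROG_LEN - cut) * sizeof(instr_t));
    c1.code[rand() % PROG_LEN].op = (uint8_t)(rand() % N_OPS);  /* point mutation */
    c2.code[rand() % PROG_LEN].op = (uint8_t)(rand() % N_OPS);
    pop[idx[2]] = c1;                              /* children replace the losers */
    pop[idx[3]] = c2;
}
```

In use, `tournament_step` would be called repeatedly, with each entry of `fit` refreshed as its program is evaluated on the robot.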

A later enhancement [94–96,98] of the system incorporated an "event memory" that recorded the fitness achieved by a particular action taken under given sensor inputs. A second, simultaneous GP process was then employed to learn a fitness prediction function that estimates the fitness of an action given the current sensor values. This process uses the current event memory as its training data set. The difference between predicted and actual fitness is treated as an error to be minimised by symbolic regression. Similar approaches by others [109], which coevolve NNs as fitness predictors, have also been shown to significantly reduce the time needed to evolve a good solution, because fewer candidate solutions must be evaluated in the real world.
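The event memory and the predictor's training signal can be sketched as follows. This is a hedged illustration, not the published implementation. The capacity, the first-in-first-out overwrite policy, the use of summed absolute error, and all names (`event_t`, `record_event`, `predictor_error`, `example_predictor`) are assumptions. A plain function pointer stands in for the predictor, which in the described system is itself evolved by the second GP process.

```c
#include <stdint.h>

#define N_SENSORS 8
#define MEM_SIZE  256  /* hypothetical event-memory capacity */

/* One remembered event: what the robot sensed, what it did, how well it scored. */
typedef struct {
    int32_t sensors[N_SENSORS];
    int32_t motors[2];
    double  fitness;
} event_t;

typedef struct {
    event_t ev[MEM_SIZE];
    int count;  /* number of valid entries */
    int next;   /* next slot to write (the oldest entry once full) */
} event_memory_t;

/* A candidate fitness predictor. In the described system this would itself
 * be an evolved program; a function pointer stands in for it here. */
typedef double (*predictor_fn)(const int32_t sensors[N_SENSORS],
                               const int32_t motors[2]);

void memory_init(event_memory_t *m) { m->count = 0; m->next = 0; }

/* Store an event, overwriting the oldest one when the memory is full. */
void record_event(event_memory_t *m, const int32_t sensors[N_SENSORS],
                  const int32_t motors[2], double fitness)
{
    event_t *e = &m->ev[m->next];
    for (int i = 0; i < N_SENSORS; i++)
        e->sensors[i] = sensors[i];
    e->motors[0] = motors[0];
    e->motors[1] = motors[1];
    e->fitness = fitness;
    m->next = (m->next + 1) % MEM_SIZE;
    if (m->count < MEM_SIZE)
        m->count++;
}

/* Symbolic-regression error of a predictor over the event memory: the sum of
 * |predicted - actual| fitness. The second GP process would minimise this. */
double predictor_error(predictor_fn p, const event_memory_t *m)
{
    double err = 0.0;
    for (int i = 0; i < m->count; i++) {
        double d = p(m->ev[i].sensors, m->ev[i].motors) - m->ev[i].fitness;
        err += d < 0 ? -d : d;
    }
    return err;
}

/* Hand-written stand-in for an evolved predictor: fitness = summed motor speeds. */
double example_predictor(const int32_t sensors[N_SENSORS], const int32_t motors[2])
{
    (void)sensors;  /* this toy predictor ignores the sensors */
    return (double)motors[0] + (double)motors[1];
}
```

A second GP loop would rank candidate predictors by `predictor_error` over the current memory. The best predictor can then score a program's proposed action immediately, rather than waiting for the robot's real-world response.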

Incorporating the fitness predictor allowed the 300–400 ms delay used to provide fitness feedback for each program to be removed. Although this idle delay was removed, the fitness of a given program is still assigned 500 ms after the program is executed. Because the rate of program evaluation increased 2000-fold, the size of the event memory, and consequently of the program population, needed to be increased significantly (to 10 000). On the autonomous robot, however, the limited RAM (256 kB) permits only a small population, causing less robust behaviour and more frequent stagnation in local optima. In the tethered experiments, the fitness predictor was shown to dramatically reduce the time required to achieve good system performance (to about 1.5 minutes, down from 40–60 minutes originally).
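The figures above permit a rough consistency check. For illustration only, take 1 kB = 1024 bytes and ignore the event memory and everything else in RAM. If all 256 kB could hold a 10 000-member population, the per-program budget would be as follows, where fixed 4-byte instructions are also an assumption. The reported tethered speed-up follows from the quoted times.

```latex
\[
\frac{256 \times 1024\ \text{B}}{10\,000\ \text{programs}} \approx 26.2\ \text{B per program},
\qquad
\left\lfloor \frac{26.2\ \text{B}}{4\ \text{B}} \right\rfloor = 6\ \text{instructions}
\]
\[
\frac{40\ \text{min}}{1.5\ \text{min}} \approx 27,
\qquad
\frac{60\ \text{min}}{1.5\ \text{min}} = 40
\]
```

An upper bound of about six instructions per program, before the event memory takes any space, helps explain why a population of this size was not feasible on the robot itself. The tethered result corresponds to a roughly 27- to 40-fold reduction in wall-clock time.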

An interesting discussion of the need for a "childhood" period was presented. It was based on the observation that the system needed exploratory motivation early in evolution to achieve good performance later. While some exploration occurs intrinsically due to the random initial population, the authors additionally introduced stochastic behaviour (noise) early on to assist exploration. It was observed that exposing the system to more diverse situations in the early period generally led to better behaviours later, provided there was a bias towards retaining earlier experiences in the "events table". It is possible that, provided the events table adequately represents the significant scenarios (i.e. the problem space), it does not matter when the experiences are acquired.
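The two mechanisms mentioned above, early exploratory noise and a bias towards keeping early experiences, could be sketched as below. Everything here is an assumption for illustration: the length of the childhood period, the linear decay of the noise and the quadratic replacement bias are not taken from the source. The names `noise_amplitude`, `perturb_motors` and `pick_slot_to_replace` are hypothetical.

```c
#include <stdlib.h>

#define CHILDHOOD_STEPS 1000  /* hypothetical length of the "childhood" period */
#define NOISE_MAX       200   /* hypothetical peak noise added to motor commands */

/* Noise amplitude decays linearly from NOISE_MAX to zero over the childhood. */
int noise_amplitude(long step)
{
    if (step >= CHILDHOOD_STEPS)
        return 0;
    return (int)(NOISE_MAX * (CHILDHOOD_STEPS - step) / CHILDHOOD_STEPS);
}

/* Add uniform integer noise in [-amp, amp] to each motor command. */
void perturb_motors(int motors[2], long step)
{
    int amp = noise_amplitude(step);
    if (amp == 0)
        return;
    for (int i = 0; i < 2; i++)
        motors[i] += rand() % (2 * amp + 1) - amp;
}

/* Choose which slot of a full events table to overwrite. Assuming the table
 * was filled from slot 0 upwards, early experiences occupy low slots. Squaring
 * a uniform variate skews the choice towards high slots, so early entries tend
 * to survive while newer experiences churn through the upper part. */
int pick_slot_to_replace(int table_size)
{
    double u = (double)rand() / ((double)RAND_MAX + 1.0);  /* u in [0, 1) */
    return table_size - 1 - (int)(u * u * table_size);
}
```

With a table of 100 slots, a slot in the upper half is chosen whenever u is below about 0.707, that is, roughly 71% of the time.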

Further investigation is needed to understand the dynamics of population diversity at different stages of evolution and how this affects performance. Of particular interest is how it affects the plasticity of the system when adapting to new environments or changed objectives. There is also no explicit analysis of when an "acceptable" performance is reached; however, the system clearly demonstrates "desirable" behaviour within a relatively short period. The approach does not leverage any parallel mechanisms of evolution (i.e. no island-model sharing of programs or events), although it would likely benefit from them.

This research demonstrated the ability to learn desirable robot behaviours online and in situ using simple LGP programs generated by the AIM-GP approach combined with a coevolved fitness predictor. It is one of the few examples where GP has been deployed on a computationally constrained platform and performed online evolution in non-simulated environments. As such, the AIM-GP approach provides a rich stimulus for the research in this thesis.

In document In situ Distributed Genetic Programming: An Online Learning Framework for Resource Constrained Networked Devices (Page 52-55)