2.2 GP on Small Robotic Platforms
2.2.2 Genetic Evolution of PDL Processes
PDL (Process Description Language) [124] is a tool for representing behaviours as process networks. A process network generates behaviours by representing sub-cognitive actions, such as forward motor speed and button presses, as quantitative entities. These quantities are influenced by logic (often in feedback loops) that acts to drive them over a number of iterations. PDL has been implemented in LISP for simulation experiments, and in C for robots [123–125], to generate dynamic emergent behaviours (typically on "PC-like" computers). While PDL itself is not an EA approach, further explanation of how PDL is used to generate behaviours is provided to aid understanding of how EA has been applied.
Program 2: PDL code for a process to achieve a desired forward speed behaviour.
/* Nudge go_forward up by 1 per cycle until it reaches the default speed of 10. */
void up_to_default_forward_tend(void) {
    if (value(go_forward) < 10) add_value(go_forward, 1);
}
Program 3: PDL code for a reversing “reflex response” to front collisions.
/* If any front bumper reports contact, subtract 80 from go_forward. */
void front_collision(void) {
    if ((value(bumper0) > 0) || (value(bumper1) > 0) || (value(bumper11) > 0) ||
        (value(bumper2) > 0) || (value(bumper10) > 0))
        add_value(go_forward, -80);
}
Program 2 and Program 3 are examples of behaviours written using PDL in the C language. Program 2 shows that the desired speed (10) is not simply set with go_forward = 10. Rather, the motor speed quantity (go_forward) is incremented by one each iteration until the desired speed is reached. An interesting outcome of this representation is that different behaviours can influence the same action quantities, resulting in complex emergent behaviours with smoother transitions between action states.
Figure 2.2 shows an example of multiple processes feeding into a shared controlled quantity. Their combined influence gives rise to an emergent function, which in turn reinforces change (positive or negative) at the lower layer and ultimately drives the emergent behaviour.
[Figure 2.2 diagram: Positive Change (Self Enforcing) and Negative Change processes act on a Controlled Quantity, with Stabilisation, giving rise to Emergent Functionality.]
Figure 2.2: An example PDL process configuration. Recreated from Fig. 3.11 in [124]. The
original figure description reads "Emergent functionality pattern observed in process networks. There are two opposing forces, possibly stabilised. The processes impact a controlled quantity that indirectly gives rise to emergent functionality. The positive process feeds on itself."
An example of this occurs with the inclusion of Program 3. It affects the same motor speed quantity by decrementing it sharply when a front bumper sensor detects a collision, i.e. add_value(go_forward,-80). Program 2 pushes the system to a limit of forward speed 10, so when Program 3 subtracts 80, the system suddenly reverses at speed -70. In the absence of further collisions detected by the front sensors, Program 2 then adds 1 per iteration. The system therefore decelerates but still travels in reverse for 70 iterations, before ramping forward speed back up to 10 over a further 10 iterations. Representing this same smooth behaviour with sequential logic would be significantly more complicated.
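The arithmetic of this interaction can be checked with a small simulation. The sketch below is not the PDL library. It tracks only go_forward, assumes the front bumper is pressed for exactly one cycle, and uses hypothetical helper names (next_speed, collision_trace). As in PDL, both process effects within a cycle are computed from the same starting value and then summed.

```c
#include <assert.h>
#include <stdbool.h>

/* One cycle acting on go_forward only (hypothetical helper, not PDL's API).
 * Both process effects are computed from the value at the start of the
 * cycle, summed, and applied together. */
int next_speed(int go_forward, bool front_bumper_pressed) {
    int delta = 0;
    if (go_forward < 10) delta += 1;        /* Program 2: ramp towards 10 */
    if (front_bumper_pressed) delta -= 80;  /* Program 3: collision reflex */
    return go_forward + delta;
}

/* Start at cruising speed 10, collide once, then run collision-free
 * cycles until the speed is back to 10. */
void collision_trace(int *after_hit, int *reverse_cycles, int *recovery_cycles) {
    int speed = next_speed(10, true);       /* 10 + 0 - 80 = -70 */
    *after_hit = speed;
    *reverse_cycles = 0;
    *recovery_cycles = 0;
    while (speed < 10) {
        if (speed < 0) (*reverse_cycles)++; /* this cycle starts in reverse */
        speed = next_speed(speed, false);
        (*recovery_cycles)++;
        assert(*recovery_cycles <= 1000);   /* guard against a runaway loop */
    }
}
```

Calling collision_trace reports a speed of -70 immediately after the hit, 70 cycles spent at negative speed, and 80 cycles in total until the speed returns to 10.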
To achieve the additive, parallel effect of process influences, the PDL architecture first locks the most recent sensor readings. It then executes all the processes once, summing their effects on all action quantities, before updating the action quantities with the summed effects. After some predefined period has elapsed, new sensor readings are read in and the process is repeated. This delay sets the rate at which the sense-action loop is executed. Because the resulting behaviours are, by definition, a function of time, this rate significantly influences them.
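The lock, run, sum and apply cycle can be sketched as follows. This is a minimal illustration under stated assumptions, not the PDL implementation from [124]. The quantity indices, the single bumper sensor and the pdl_cycle signature are hypothetical, and the delay between cycles is left to the caller. value() and add_value() mirror the calls used in Programs 2 and 3. Here, however, add_value() only accumulates into a delta buffer, so every process sees the same snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quantity indices; the real PDL library has its own. */
enum { GO_FORWARD, BUMPER0, NUM_QUANTITIES };

static int quantities[NUM_QUANTITIES]; /* snapshot read by every process   */
static int deltas[NUM_QUANTITIES];     /* effects summed during this cycle */

/* Processes read the locked snapshot... */
static int value(int q) { return quantities[q]; }
/* ...and only accumulate changes; nothing is applied until all have run. */
static void add_value(int q, int d) { deltas[q] += d; }

typedef void (*process_fn)(void);

static void up_to_default_forward(void) {
    if (value(GO_FORWARD) < 10) add_value(GO_FORWARD, 1);
}

static void front_collision(void) {
    if (value(BUMPER0) > 0) add_value(GO_FORWARD, -80);
}

/* One sense-act cycle: lock the sensor reading, run every process once
 * against the same snapshot, then apply the summed effects together.
 * The caller is responsible for the delay between cycles. */
void pdl_cycle(const process_fn *procs, size_t n, int bumper_reading) {
    assert(procs != NULL || n == 0);
    quantities[BUMPER0] = bumper_reading;  /* lock the latest sensor value */
    memset(deltas, 0, sizeof deltas);
    for (size_t i = 0; i < n; i++) procs[i]();
    for (int q = 0; q < NUM_QUANTITIES; q++) quantities[q] += deltas[q];
}
```

Because the deltas are applied only after all processes have run, the order of the entries in procs does not matter, and listing the same process twice doubles its effect on each cycle.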
The PDL architecture is extended to evolving behaviours on robots with online evolution in [123]. This is achieved by introducing a process selection mechanism, called Selectron, together with a mutation operator. Selectron employs a population of PDL processes, initially constructed with multiple instances of each process. In PDL, process influences are typically additive, so multiple instances or “clones” of a process effectively reinforce the single process behaviour. The Selectron mechanism probabilistically clones or deletes processes based on their contribution (positive or negative) to the average “satisfaction” of the robot’s behaviour over some defined period. Note that if the average “satisfaction” does not change over the period, the probability of keeping the process is randomly increased or decreased to help overcome “deadlock” situations. The “satisfaction” quantity is also implemented as a PDL process and is periodically updated (every 10 cycles, which corresponds to about 0.5 to 1 seconds).
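A Selectron-style update step might look like the sketch below. The description above does not specify how each process's contribution is measured or how large the probability adjustments are. The per-type contribution input, the fixed STEP size, and the rule of cloning or deleting one instance per surviving type per period are therefore all assumptions. rand() stands in for whatever random source [123] used.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TYPES 3   /* hypothetical number of process types */
#define STEP 0.1      /* hypothetical size of each probability adjustment */

typedef struct {
    int    copies[NUM_TYPES];  /* clones of each process type in the population */
    double keep_p[NUM_TYPES];  /* probability of cloning rather than deleting    */
} selectron_t;

static double clamp01(double x) { return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x); }

/* Uniform sample in [0, 1). */
static double uniform01(void) { return rand() / ((double)RAND_MAX + 1.0); }

/* Run once per evaluation period (e.g. every 10 PDL cycles).
 * contribution[i]: assumed signed estimate of how process type i affected
 *                  average satisfaction over the period.
 * d_avg:           change in average satisfaction over the period. */
void selectron_update(selectron_t *s, const double contribution[NUM_TYPES], double d_avg) {
    assert(s != NULL && contribution != NULL);
    for (int i = 0; i < NUM_TYPES; i++) {
        if (s->copies[i] == 0) continue;  /* extinct: only mutation could restore it */
        double delta;
        if (d_avg == 0.0)                 /* no change: random nudge to escape deadlock */
            delta = (rand() % 2) ? STEP : -STEP;
        else
            delta = contribution[i] > 0.0 ? STEP : (contribution[i] < 0.0 ? -STEP : 0.0);
        s->keep_p[i] = clamp01(s->keep_p[i] + delta);
        /* Clone with probability keep_p, otherwise delete one instance. */
        if (uniform01() < s->keep_p[i]) s->copies[i]++;
        else                            s->copies[i]--;
    }
}
```

In this sketch a type whose copies reach zero is never revisited, so it can only reappear through a separate mutation step, which is not modelled here.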
The “satisfaction” quantity is similar to other “motivation” quantities (also PDL processes), which are used to implicitly guide the robot to perform useful behaviours. A good example of this is tying the robot’s propensity to travel to a charging station to the remaining battery power through an inverse-square function. Thus, the lower the battery, the more the robot is “motivated” to seek charging.
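As a worked illustration of the inverse-square relationship, a charging motivation m for a remaining battery level b can be written as m = k / b^2. The sketch assumes b is normalised to [0, 1] and picks k = 1 and a cap of 10000 purely for illustration; neither value is given in the text. The cap keeps a flat battery from causing a division by zero.

```c
#include <assert.h>

#define CHARGE_GAIN 1.0         /* hypothetical gain k */
#define MAX_MOTIVATION 10000.0  /* hypothetical cap for an empty battery */

/* Motivation to seek the charger as an inverse-square function of the
 * remaining battery level (assumed normalised so 1.0 means full). */
double charge_motivation(double battery) {
    assert(battery >= 0.0);
    if (battery <= 0.0) return MAX_MOTIVATION;
    double m = CHARGE_GAIN / (battery * battery);
    return m > MAX_MOTIVATION ? MAX_MOTIVATION : m;
}
```

With these assumptions, halving the battery from 1.0 to 0.5 raises the motivation from 1 to 4, and a quarter charge gives 16.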
This online learning mechanism was implemented on a small wheeled robot using a pocket PC and a Lego Technic™ body, and demonstrated rapid success in generating primitive behaviours. This success comes from the population composition evolving to contain only processes that contribute positively to “satisfaction”. However, owing to stochastic effects, in about 20 percent of cases subpopulations of good strategies die out early in the evolution. When this occurs, the system typically relies on mutation to regenerate the useful lost processes. In some instances, however, the evolutionary process instead reaches a stable solution by balancing subpopulations of competing (different) processes, rather than converging on a single stable process.
While the evolution of logic in [123] is not strictly GP (it is closer to the evolution of human-seeded rule structures via occasional mutation), a more recent attempt to use GP on the PDL architecture has been made by [113]. In their experiment, they attempt to evolve a controller for the “Back Up a Tractor-Trailer Truck” problem. Additional genetic operators are introduced to provide the syntactic richness that GP offers. However, like the first attempt to solve this problem with GP [69], it is performed in a centralised, offline manner using simulation, without consideration of evolving on an embedded platform. As such, [113] falls outside the scope of work directly relevant to this thesis, though it does provide insight into how [123] could be extended to use GP.
Nonetheless, the PDL architecture has been shown to be an elegant mechanism for producing dynamic emergent behaviours and has been demonstrated on PC-class platforms. Furthermore, an online learning approach was demonstrated by employing EA to bias the representation of PDL processes within a population in order to achieve desirable, though primitive, behaviours. However, similar to GPN, the logic is distributed as parallel processing elements internal to the agent, with no logic shared between robots during evolution. As such, PDL is not considered a distributed evolutionary approach.