Discussion - In situ Distributed Genetic Programming: An Online Learning Framework for Resource Constrained Networked Devices

3.6.1 Challenges of In Situ Evolution

Evolving logic post deployment (in situ) brings both opportunities and disadvantages. In this section we discuss some of the lessons learnt from the challenges faced when evolving logic in situ across a WSAN.

Evaluating logic in situ avoids reliance on synthetic (simulated) models of the environment, and there is no better representation of the environment than the environment itself. Not only does this prevent evolved logic from being made brittle by exploiting artefacts of a simulation, it also means that the mote does not need to perform any simulation at all, which is a great benefit for resource-constrained devices.

A significant disadvantage of this approach, however, is that there is only a single environment, shared by all programs across all motes. It is impossible to reset the environment to exactly the same initial conditions before each program is evaluated. Hence each program may leave a “footprint” that helps or hinders other programs, which creates a credit assignment problem that can slow or even prevent the system from converging to a specific fitness. Even more important is whether the environment could be changed in a way that permanently prevents the objective from being fulfilled. Unlike offline evolution, one cannot go back in time and try another optimisation trajectory if the current one has failed.

In our solutions we were able to ensure that the system cannot enter a state from which it is impossible to evolve an acceptable solution. Since the actuators (LEDs and radio) cannot permanently alter the environment (including the mote itself), an acceptable solution to these problems always exists, and the GA can therefore always, in principle, find one. In more general applications it will be important to design node hardware and software architectures so that IDGP programs are executed and evaluated in isolation from core functionality, ensuring that evolved programs cannot hog critical resources or cause undesirable actuations.

This raises the question of which problems are suitable targets for IDGP, which remains a topic for further research. In some cases it may be possible to impose constraints during evolution to keep solutions within safe behaviour. However, constraints inherently reduce the solution space that can be searched, so better solutions may exist that can never be evaluated.
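To make the isolation requirement discussed in this section more concrete, the following Python sketch runs an evolved program inside a minimal interpreter with a fixed instruction budget and an actuator whitelist. The opcodes (`SET`, `JMP`, `NOP`), the actuator names and the budget of 100 steps are hypothetical and are not the IDGP instruction set; the sketch only illustrates how a non-terminating program can be cut off instead of hogging the CPU, and how actuations outside a permitted set can be dropped.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist: the only actuators an evolved program may drive.
ALLOWED_ACTUATORS = {"led0", "led1", "led2", "radio"}


@dataclass
class SandboxResult:
    steps: int = 0                                   # instructions executed
    actuations: list = field(default_factory=list)   # permitted actuations, in order
    aborted: bool = False                            # True if the step budget ran out


def run_sandboxed(program, max_steps=100):
    """Interpret a list of (opcode, arg) pairs under a fixed instruction budget.

    Hypothetical opcodes: ("SET", actuator), ("JMP", target index), ("NOP", None).
    """
    result = SandboxResult()
    pc = 0
    while pc < len(program):
        if result.steps >= max_steps:
            result.aborted = True   # cut off a program that would hog the CPU
            break
        op, arg = program[pc]
        result.steps += 1
        if op == "SET":
            if arg in ALLOWED_ACTUATORS:
                result.actuations.append(arg)
            # actuations outside the whitelist are dropped, never executed
            pc += 1
        elif op == "JMP":
            # an out-of-range jump target simply halts the program
            pc = arg if isinstance(arg, int) and 0 <= arg < len(program) else len(program)
        else:
            pc += 1
    return result


# Usage: a non-terminating loop is aborted; a forbidden actuation is ignored.
looping = run_sandboxed([("SET", "led0"), ("JMP", 0)])
unsafe = run_sandboxed([("SET", "led1"), ("SET", "flash_erase"), ("NOP", None)])
```

On a real node the same limits would have to be enforced by the runtime that hosts the evolved programs, so that core functionality is never starved.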

3.6.2 The Performing-Learning Paradox

Logic learnt offline can deliver high performance immediately upon deployment; however, it may not perform well over the longer term if the environment changes unexpectedly. Online learning, on the other hand, allows the system to adapt to unexpected changes, so in theory it should be able to outperform offline-developed logic under unanticipated environmental conditions. Unfortunately, learning to reach acceptable performance takes time, and there is no guarantee that an acceptable solution will be found within an acceptable timeframe. Furthermore, while the system is learning it is, by definition, not performing optimally.

Paradoxically, to achieve better performance, logic that is likely to perform worse than the current best known solution must be executed in order to have a chance of discovering better-performing solutions. An external observer would see this as variation in performance over time. Implicitly, observed performance depends on the fitness measured within the observer’s critiquing period. Ideally, then, the best-performing current logic would be executed within this critiquing period, and learning (evaluating and executing other programs) would occur outside it. If, however, performance is measured continuously, the average observed performance will match the pool fitness; in that case it is the average (pool) performance that should be optimised.
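The following sketch illustrates this scheduling idea under simple assumptions of our own: time is divided into slots, the observer scores a fixed window of slots, the elite program runs inside that window, and outside it the pool is evaluated round-robin. The pool fitness values are hypothetical. Scoring only the window yields the elite's fitness, whereas scoring every slot without a protected window yields the mean pool fitness.

```python
def schedule(n_programs, elite, n_slots, critique_window):
    """Return the index of the program executed in each time slot.

    Inside the observer's critiquing window the elite (best known) program
    runs; outside it, programs are evaluated round-robin (an assumption of
    this sketch) so that learning continues.
    """
    return [elite if t in critique_window else t % n_programs
            for t in range(n_slots)]


def observed_performance(pool_fitness, plan, slots):
    """Mean fitness of the programs executed in the given slots."""
    slots = list(slots)
    return sum(pool_fitness[plan[t]] for t in slots) / len(slots)


pool = [0.2, 0.5, 0.9, 0.4]          # hypothetical fitness of each pooled program
elite = pool.index(max(pool))        # best known program
window = range(0, 100)               # slots scored by the observer

plan = schedule(len(pool), elite, 1000, window)
in_window = observed_performance(pool, plan, window)      # only the elite is scored
overall = observed_performance(pool, plan, range(1000))   # includes learning slots

# With no protected window and continuous scoring, observed performance
# equals the mean pool fitness (each program gets an equal share of slots).
plan_continuous = schedule(len(pool), elite, 1000, range(0))
continuous = observed_performance(pool, plan_continuous, range(1000))
```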

For many problems satisficing (finding a solution that is good enough rather than optimal) is acceptable; within this class, however, there are problems for which additional learning is still beneficial, although prioritising the required performance may slow the rate of learning. The aim is then to meet the need first and to maximise learning with the remaining resources (slack). There is little disadvantage in using otherwise unused resources to maximise learning.
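One way to read the idea of meeting the need and then maximising learning with the slack is as a budget on exploratory executions. Assuming that observed performance is the slot-weighted average of the elite's fitness and the expected fitness of exploratory programs (for example, the pool mean), the largest fraction x of slots that can be spent on learning satisfies (1 - x)·f_elite + x·f_explore ≥ required. The function names and example values below are hypothetical.

```python
import math


def max_exploration_fraction(f_elite, f_explore, required):
    """Largest fraction x of execution slots that can be given to learning.

    Assumes observed performance is (1 - x) * f_elite + x * f_explore, where
    f_explore is the expected fitness of an exploratory (non-elite) program.
    """
    if f_elite < required:
        return 0.0   # the need cannot be met even by the elite: no slack
    if f_explore >= required:
        return 1.0   # exploration alone already meets the need
    return (f_elite - required) / (f_elite - f_explore)


def learning_slots(n_slots, f_elite, f_explore, required):
    """Whole number of slots per period that can be spent evaluating other programs."""
    x = max_exploration_fraction(f_elite, f_explore, required)
    return math.floor(n_slots * x + 1e-9)   # tolerate floating-point round-off
```

For example, with an elite fitness of 0.9, exploratory programs expected to score 0.5 and a requirement of 0.8, a quarter of the slots (25 out of 100) can be devoted to learning.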

3.6.3 Challenges of Distributed Evolution

There are many challenges in evolving logic simultaneously on devices across a network. Perhaps the most important is setting the fitness objective appropriately and supplying fitness information to the motes. Section 3.5 highlights this with the unexpected evolution of the “denial of service” strategy. The global (external) fitness feedback had to be changed to a combination of strong positive reinforcement with small but frequent self-provided negative reinforcement. We recommend this approach in systems where the framework and the evolving code share the same communications medium. Unfortunately, identifying whether the problem representation is the main reason a system becomes stuck in a local minimum is usually non-trivial.
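A minimal sketch of this combined reinforcement scheme follows. It rests on our own interpretation rather than on details given here: the negative reinforcement is a small penalty generated locally on every time step, so a program cannot escape it by disrupting the shared radio channel that carries the global feedback, while each item of global positive feedback earns a much larger reward. The reward and penalty magnitudes are hypothetical.

```python
def episode_fitness(n_ticks, feedback_ticks, reward=10.0, penalty=0.1):
    """Accumulate fitness over one evaluation episode of n_ticks time steps.

    feedback_ticks: set of ticks at which positive global feedback arrived.
    The per-tick penalty is generated on the mote itself, so (under the
    assumption of this sketch) it cannot be suppressed by blocking the radio.
    """
    fitness = 0.0
    for t in range(n_ticks):
        fitness -= penalty            # small but frequent self-provided penalty
        if t in feedback_ticks:
            fitness += reward         # strong positive reinforcement
    return fitness


jammer = episode_fitness(100, set())              # channel blocked: no feedback arrives
performer = episode_fitness(100, {10, 50, 90})    # three global rewards received
```

Under these assumptions, a program that blocks the channel receives no positive feedback and its fitness only decreases, whereas a program that earns occasional global rewards ends the episode well ahead.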

In systems where motes maximise a purely local fitness, evolution is reasonably straightforward, since it essentially consists of many motes evolving independently of each other. Motes can still benefit from sharing optional information (such as programs), and will often converge on the same solution for a given objective. With high selective pressure acting on local environmental differences, the logic will speciate and motes will essentially revert to individual evolution, as demonstrated in Section 3.4.2. This should not be confused with niching, a popular mechanism for deliberately dividing a search space across nodes, which requires some level of coordination (whether implicit or explicit) [17].

However, when the fitness objective requires cooperation with, or feedback from, other motes, information may be lost or become misleading. For example, in Section 3.5 motes evaluate their programs asynchronously, which can reward “leech programs” that rely on the preceding program having performed well. With strong elitism in the population, such programs can easily dominate the pool. Randomising the order in which programs are executed quickly penalises leech programs, but only after significant damage has been done to the pool fitness. Some form of fitness memory may be useful to reduce this effect.
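A specific fitness-memory mechanism is not given here; one simple possibility, assumed for this sketch, is an exponential moving average of each program's observed fitness, starting from a neutral prior. The smoothing factor `alpha`, the prior and the fitness sequences below are hypothetical. With memory, the leech program's single high score only partially raises its remembered fitness, and after one poor evaluation it falls below the steadily performing program.

```python
class FitnessMemory:
    """Exponential moving average of a program's observed fitness."""

    def __init__(self, alpha=0.3, prior=0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.value = prior      # remembered fitness before any evaluation

    def update(self, observed):
        self.value = self.alpha * observed + (1 - self.alpha) * self.value
        return self.value


leech, honest = FitnessMemory(), FitnessMemory()
# The leech scores highly once (straight after a good program), then poorly
# once the execution order is randomised; the honest program scores steadily.
leech_history = [leech.update(f) for f in (1.0, 0.1, 0.1)]
honest_history = [honest.update(f) for f in (0.6, 0.6, 0.6)]
```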

The random order of programs (synchronised or not) causes a “macro-crossover” of programs, in which various combinations of programs from different subpopulations are executed simultaneously. For example, Mote A may be executing its elite program while Mote B is executing a random program. For distributed evolution, the probability of simultaneously executing programs that perform well together needs to be sufficiently high; otherwise, genetic material of good fitness could be “forgotten”, since a good program executing alongside a poor program is likely to result collectively in poor performance. Biasing the population composition and/or the evaluation schedule will therefore be important for evolving a good solution, and further investigation of this topic is recommended.
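The need for a sufficiently high probability can be quantified under a simplifying assumption: if each of n motes independently executes a good program (e.g. its elite) in a given slot with probability p, then all n do so simultaneously with probability p^n. The pool size of 10, the four motes and the bias of 0.5 below are hypothetical; the Monte Carlo function only checks the closed form.

```python
import random


def p_all_good(n_motes, p_good):
    """Probability that every mote runs a good program in the same slot,
    assuming each mote chooses independently with probability p_good."""
    return p_good ** n_motes


def simulate(n_motes, p_good, trials, seed=1):
    """Monte Carlo estimate of p_all_good under the same independence assumption."""
    rng = random.Random(seed)
    hits = sum(all(rng.random() < p_good for _ in range(n_motes))
               for _ in range(trials))
    return hits / trials


# Uniform scheduling: one elite in a pool of 10 gives p_good = 0.1 per mote.
uniform = p_all_good(4, 1 / 10)
# Biased schedule: each mote runs its elite in half of its slots.
biased = p_all_good(4, 0.5)
estimate = simulate(4, 0.5, 20000)
```

With uniform selection from a pool of 10, the chance that four motes all run their elites together is only 10^-4, whereas biasing each mote to run its elite half of the time raises it to 1/16.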

3.6.4 IDGP Configuration

The configuration of IDGP framework parameters affects evolution performance (the speed of finding good solutions and the quality of the final solution). Design considerations, such as using homogeneous or heterogeneous nodes, are important to how the system will achieve its purpose. The size and composition of the population directly affect both evolution speed and performance, so we require configurations that maximise performance within the size constraints imposed by the memory limitations of the device. For some objective functions, every acceptable solution may be longer than the maximum program length LMax, in which case we must either increase LMax by reducing the number of programs Npop, increase the physical memory size, or adopt a human-coded solution, which can afford to be up to Npop times the size of an IDGP program.
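The memory trade-off can be written as Npop × LMax ≤ M, where M is the program memory available to the population. The sketch below assumes, hypothetically, that M is measured in instruction slots and that each program may carry a fixed per-program overhead; the value M = 256 is purely illustrative.

```python
def max_program_length(memory_slots, n_pop, overhead=0):
    """LMax that fits when Npop programs share memory_slots instruction slots.

    overhead is a hypothetical per-program cost (e.g. storing its fitness),
    expressed in instruction slots.
    """
    return memory_slots // n_pop - overhead


def max_population(memory_slots, required_length, overhead=0):
    """Largest Npop that still allows programs of required_length instructions."""
    return memory_slots // (required_length + overhead)


M = 256                                    # hypothetical program memory, in instruction slots
l_max_16 = max_program_length(M, 16)       # 16 programs of up to 16 instructions
l_max_8 = max_program_length(M, 8)         # halving Npop doubles LMax to 32
n_pop_for_40 = max_population(M, 40)       # a 40-instruction solution leaves room for 6 programs
human_coded = M                            # a hand-written program may use all Npop * LMax slots
```

Halving Npop doubles LMax, and a single human-coded program can occupy the whole of M, which is the Npop-fold size advantage noted above.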

A richer syntax allows more complex programs to be generated and ultimately richer evolved behaviours. Higher-level instructions also allow logic to be represented more compactly. Ideally, the IDGP program syntax should therefore be as rich as possible in order to maximise the efficiency and performance of solution programs. More compact programs allow more programs (i.e. a larger population) to reside within the memory constraints of the platform, which in turn allows greater genetic diversity to be maintained, assisting learning. Determining the optimal instruction set and program size for a particular objective remains an open issue for future investigation.

In this section we used a small set of three instructions to demonstrate IDGP on simple, well-defined objectives. Subsequent experiments using a larger set of 13 instructions to solve the “PacketForwarder-RfmToLeds” objective demonstrated similar evolutionary trajectories.

The No Free Lunch (NFL) theorem states that no search algorithm outperforms any other when performance is averaged over all possible problems. IDGP is well suited to complex WSAN problems with no well-known solution, or where adaptivity of logic is likely to be necessary. IDGP is not a solution to every WSAN problem, and we recommend that the designer assess whether an alternative approach, such as human-crafted logic, could produce acceptable logic more efficiently.

In document In situ Distributed Genetic Programming: An Online Learning Framework for Resource Constrained Networked Devices (Page 118-123)