Compact Program Representation - In situ Distributed Genetic Programming: An Online Learning Fr

3.2 Design

3.2.2 Compact Program Representation

This section provides considerations and recommendations for achieving a compact program representation. In essence, it is recommended that a virtual machine (Section 3.2.2.1) be employed to abstract the physical hardware and to allow the execution of high-level instructions in programs represented using prefix notation (Section 3.2.2.2). Additionally, program metadata is discussed in Section 3.2.2.3, and a novel program identifier is presented as a useful means for fast, coarse approximation of diversity between programs.

3.2.2.1 Virtualisation of Hardware

As discussed in Section 3.2.1, achieving a compact program representation is important. One method for achieving this is the evolution of direct machine code, as discussed in Section 2.2.3, since this is a one-to-one mapping with the instructions of the device architecture. However, the low-level representation has drawbacks for managing program execution. Significant care must be taken to ensure that the processor cannot execute code that does not yield, that accesses memory it should not, or that executes instructions that could halt or crash the device. Additionally, there may be other (framework) services running on the microcontroller that should not be interfered with by the evolved logic. Furthermore, in networks of devices with different processors (which are becoming more common due to the convergence of communications standards), direct machine code cannot be usefully shared with other devices. For these reasons it is recommended to “virtualise” the hardware so that programs are sandboxed, can be safely terminated, and cannot access restricted operations or memory.

A process virtual machine (VM) offers a software mechanism to easily abstract the sensing, computing and actuation capabilities of any platform. It can also provide a compact representation of complex logic by employing high-level instructions. Each VM instruction, which is usually represented by only a few bytes, can potentially map to many native instructions. This means the number of bytes required to represent complex high-level programs is likely to be far smaller than for the equivalent machine code representation and, furthermore, such programs could run on different architectures. Additionally, a small number of high-level instructions is typically more easily understood than many low-level instructions. There are, however, two disadvantages to employing VMs. Firstly, a VM usually requires significant resources (RAM, ROM and CPU utilisation). Secondly, using high-level instructions potentially reduces the novelty of the solutions that will be achieved, since these instructions are likely to be used as building blocks for more complex behaviour rather than novel solutions emerging from a tabula rasa.

However, since we aim to achieve complex behaviour on networks of devices (which may have different microcontrollers), we recommend a VM approach. A light-weight VM can largely be implemented with an “execute” function that takes the program, iterates over its instructions and executes the function calls that they map to.
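As a minimal sketch of such an “execute” function, the following Python fragment dispatches each instruction of a prefix-ordered program through a table. The opcode values, the OPCODES table and the one-byte-per-instruction encoding are illustrative assumptions and are not the instruction set used in this work.

```python
import operator

# Hypothetical instruction set: opcode -> (number of arguments, handler).
# These values are assumptions chosen for illustration only.
OPCODES = {
    0x01: (2, operator.add),
    0x02: (2, operator.sub),
    0x03: (2, operator.mul),
    0x10: (0, lambda: 1),   # terminal: constant 1
    0x11: (0, lambda: 2),   # terminal: constant 2
}


def execute(program, pc=0):
    """Evaluate the prefix expression that starts at program[pc].

    Returns a (value, next_pc) pair, where next_pc is the index of the
    first instruction after the evaluated expression.
    """
    arity, handler = OPCODES[program[pc]]
    pc += 1
    args = []
    for _ in range(arity):
        value, pc = execute(program, pc)
        args.append(value)
    return handler(*args), pc


# add(1, mul(2, 2)) written in prefix order
example_program = bytes([0x01, 0x10, 0x03, 0x11, 0x11])
example_value, example_end = execute(example_program)
```

Because every instruction passes through the dispatch table, the VM decides which operations a program may reach. That is one way to obtain the sandboxing described above.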

3.2.2.2 Prefix Notation

Assuming a VM approach is adopted, the program representation needs to be decided. How the programs are likely to be used and manipulated should be taken into account.

As with any modern architecture, the native architecture size (bit-width) needs to be set. This dictates the size of each instruction, so ideally it should be small. Typical choices are 8-bit, 16-bit and 32-bit architectures and, more recently on PC-class computers, 64-bit architectures. 8-bit architectures are becoming less common; even for microcontrollers they are often too restrictive, since it is more difficult to access memory locations beyond the range of the native bit-width. Nonetheless, a VM is not restricted in this way, since it can emulate wider architectures. The choice of bit-width should reflect the maximum complexity expected for the problem, but in general one should choose the smallest that will meet the needs of the problem at hand.

Keith and Martin [63] suggest a modular representation: a genome interpreter that uses a prefix ordering scheme, general data support, a 2-byte node representation and a jump-table mechanism. This is a clean and modular approach, though not necessarily the most efficient. We also recommend the prefix notation scheme (over a tree-based style using node pointers), since it keeps the program size small, which in turn affects the number of programs that can be instantiated as well as the number of packets (or bandwidth) required to transmit a program. A further advantage of prefix notation is that it can be used to guarantee syntactically correct programs. This is possible because the instruction set is known, as are the number and types of arguments for each instruction.

To avoid the complexity of memory management of variables, we recommend fixing the number of variables available by reserving instruction numbers (or “op codes”) for the variables to be made available. With a case statement implementation, the size of each instruction and its number of parameters can be stored in a lookup table so that the program tree can be traversed quickly without the need to evaluate any instructions. This is particularly useful for conditional branching, generating a program listing or determining the size of the code (or parts thereof).
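The lookup-table traversal can be sketched as follows. The ARITY table and its opcode values are hypothetical, and each instruction is assumed to occupy one byte with no optional data fields. The function finds the end of a subtree without evaluating anything, which is what a conditional branch needs in order to skip the arm that is not taken.

```python
# Hypothetical opcodes with their argument counts (assumptions for illustration).
ARITY = {
    0x01: 2,  # ADD
    0x02: 2,  # SUB
    0x03: 1,  # NEG
    0x04: 3,  # IF (condition, then-branch, else-branch)
    0x10: 0,  # VAR0, a reserved opcode acting as a variable
    0x11: 0,  # VAR1
}


def subtree_end(program, start=0):
    """Return the index one past the subtree rooted at program[start].

    Only the arity table is consulted; no instruction is executed.
    Raises ValueError if the program ends before the subtree is complete,
    which also serves as a basic syntactic check.
    """
    pending = 1  # number of subtrees still to be consumed
    pc = start
    while pending:
        if pc >= len(program):
            raise ValueError("truncated program")
        pending += ARITY[program[pc]] - 1
        pc += 1
    return pc


# IF(VAR0, ADD(VAR0, VAR1), NEG(VAR1)) in prefix order
listing = [0x04, 0x10, 0x01, 0x10, 0x11, 0x03, 0x11]
then_branch_end = subtree_end(listing, 2)
```

The size of any part of the code is then simply `subtree_end(program, start) - start`.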

The crossover operation for standard prefix notation is fairly complicated compared to LGP [148]. With prefix notation (or node representation), a branch needs to be identified that can be cut and another branch inserted in its place. However, this requires type-checking before the branches are swapped, and after the swap the maximum tree depth may have been exceeded.

With LGP, on the other hand, if the programs have the same number of lines, one simply cuts both at the same randomly selected line and swaps the two portions to yield two offspring of exactly the same size as the parents. Thus, LGP can avoid the computationally expensive machinery typical of tree-based representations [103], since the crossover point lies between any two lines and connecting the fragments is always valid.
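A minimal sketch of this one-point crossover is given below. Each program is modelled as a Python list of lines, and the function name and the `rng` parameter are assumptions made for illustration.

```python
import random


def lgp_crossover(parent_a, parent_b, rng=random):
    """Cut two equal-length line sequences at the same line and swap the tails."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length of at least 2 lines")
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the program
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


lgp_children = lgp_crossover(["a0", "a1", "a2", "a3"],
                             ["b0", "b1", "b2", "b3"],
                             rng=random.Random(1))
```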

Since we nevertheless wish to employ a tree-based representation, due to the benefit of keeping complex nested functionality, we propose a light-weight representation that is a hybrid of the two schemes. This linear-tree-based-hybrid representation is effectively the same representation most programmers are accustomed to: the program is a sequence of lines, each of which is a traditional tree-based program. An example of programs represented in this form is shown in Figure 3.1, as is the result of a crossover operation in which different lines of the parent programs were used as the crossover points. Different crossover points are used because the number of instructions within each tree may differ. Ideally, therefore, the crossover point in each program should lie before the line at which the cumulative number of instructions up to that line (summing all the trees up to that line), added to the portion taken from the other program, would exceed the maximum program size. Put simply, choose the crossover points (line numbers) such that the offspring do not exceed the maximum program size.
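One simple way to honour this size constraint is to sample cut points and reject offspring that are too large. The sketch below assumes that a program is a list of lines, that each line is a list of one-byte instructions in prefix order, and that both parents already fit within the maximum size. The retry count and the fallback to unchanged copies are assumptions of this sketch, not part of the proposed scheme.

```python
import random


def program_size(lines):
    """Total number of instructions over all lines (trees) of a program."""
    return sum(len(line) for line in lines)


def hybrid_crossover(parent_a, parent_b, max_size, rng=random, attempts=32):
    """Swap tails at independently chosen line boundaries in each parent.

    Cuts are only made between lines, so every tree stays intact and no
    type-checking of branches is needed. Offspring larger than max_size
    are rejected and new cut points are tried.
    """
    for _ in range(attempts):
        cut_a = rng.randint(0, len(parent_a))
        cut_b = rng.randint(0, len(parent_b))
        child_1 = parent_a[:cut_a] + parent_b[cut_b:]
        child_2 = parent_b[:cut_b] + parent_a[cut_a:]
        if (child_1 and child_2
                and program_size(child_1) <= max_size
                and program_size(child_2) <= max_size):
            return child_1, child_2
    # No valid pair of cut points found: return copies of the parents.
    return list(parent_a), list(parent_b)


mum = [[0x01, 0x10, 0x11], [0x03, 0x10]]
dad = [[0x04, 0x10, 0x10, 0x11], [0x10], [0x02, 0x11, 0x10]]
kids = hybrid_crossover(mum, dad, max_size=10, rng=random.Random(0))
```

The total number of instructions is conserved across the two offspring, so a child can only exceed the limit when its sibling is correspondingly small.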

The instruction set will vary based on the platform’s capabilities and on the capability needed to address the system objective. The choice of bit architecture limits the number of instructions that can be enumerated; however, this is typically not an issue. A set of low-level instructions will likely take longer to evolve useful or complex behaviours, but it is likely to evolve more novel and possibly more efficient behaviours because it is not seeded with high-level functionality. As stated before, however, more low-level instructions are required to represent complex behaviours, which in turn requires a larger program representation. For this reason, we recommend biasing the instruction set towards instructions representing complex functionality; including some low-level mathematical and logical instructions can nonetheless often be enough to generate novel solutions.

Name            Bytes      Purpose
Program ID      4          Unique program identifier which embeds information on the distribution of functions within the program
Mutation Rate   2          Specifies the mutation rate during program generation
Program Bytes   1          Specifies the length of the program in bytes
Program         1 - LMax   The instructions ordered in prefix notation format

Table 3.1: IDGP Program metadata structure.

3.2.2.3 Program Metadata

In addition to the program instruction code (VM byte code), it may be desirable to store additional information (program metadata) with the program. This is particularly useful for keeping contextual information with a program when it is sent externally.

We propose the fields in Table 3.1 as a minimal set of metadata. It is recommended that the metadata block be placed before the byte code, so that when the data is received the metadata can be parsed quickly to ascertain how large the program is and, potentially, whether the program will be kept, without even assessing the byte code. The proposed metadata block includes a program identifier (ID) which is preferably unique to every unique program. That is, if two programs are identical they will have the same ID, and one might wish to reject such a program since it offers no genetic diversity to the local population. Extending this further, we propose that the ID be generated from the frequency of instructions within the program, using a histogram representation. For example, if we take the instruction set available to the yellow program fragment in Figure 3.1 and treat the fragment as a program, it has the instruction frequencies shown in Table 3.2. Note that no structural information is conveyed; however, this could make a good extension to the ID.

The difference between two IDs (the sum of the absolute differences for each instruction count) can be used as a crude metric of program diversity. This implementation uses 4 bytes for the ID, which is calculated in a single pass over the program. The ID is therefore useful, compact and easily computed, and can also be used for diversity calculations or simply for ascertaining the distribution of terminals and functions.
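The text does not specify how the histogram is packed into 4 bytes, so the sketch below assumes one possible layout: eight instruction bins, each with a 4-bit saturating counter. The bin assignment (opcode modulo 8) is also an assumption made purely for illustration.

```python
NUM_BINS = 8      # assumed number of histogram bins
BITS_PER_BIN = 4  # assumed counter width; 8 bins * 4 bits = 32 bits
BIN_MAX = (1 << BITS_PER_BIN) - 1


def program_id(program):
    """Build a 32-bit ID from instruction frequencies in a single pass."""
    counts = [0] * NUM_BINS
    for opcode in program:
        bin_index = opcode % NUM_BINS              # hypothetical bin assignment
        counts[bin_index] = min(counts[bin_index] + 1, BIN_MAX)  # saturate
    ident = 0
    for bin_index, count in enumerate(counts):
        ident |= count << (bin_index * BITS_PER_BIN)
    return ident


def id_distance(id_a, id_b):
    """Sum of absolute differences between the per-bin counts of two IDs."""
    total = 0
    for bin_index in range(NUM_BINS):
        shift = bin_index * BITS_PER_BIN
        total += abs(((id_a >> shift) & BIN_MAX) - ((id_b >> shift) & BIN_MAX))
    return total


id_x = program_id([0, 0, 1])
id_y = program_id([1, 1, 2])
```

Under this layout, two programs with the same instruction counts in a different order receive the same ID, which reflects the remark above that no structural information is conveyed.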

Figure 3.1: An example of 2 programs using the linear-tree-based-hybrid representation (top) and how crossover and mutation can be applied to generate offspring.

An epigenetic metadata field is reserved for the mutation rate of programs when they act as parents. This epigenetic information can enable faster learning and faster rediscovery of good solutions [130] when unexpected events in the environment cause a dramatic change to the fitness landscape; however, this feature is not utilised in these experiments. The final metadata field stores the number of bytes in the program (which typically differs from the number of instructions due to optional data fields) and is used by the framework for transmitting programs as multiple packets. In total, only 7 bytes (LMeta) are used for metadata; however, this information is extremely useful to receiving nodes.
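The 7-byte metadata block of Table 3.1 can be sketched with Python's `struct` module. The field widths follow the table. The little-endian byte order, the unsigned integer encoding of the mutation rate and the function names are assumptions of this sketch.

```python
import struct

# 4-byte program ID, 2-byte mutation rate, 1-byte program length.
# "<" selects little-endian with no padding (byte order is an assumption).
META = struct.Struct("<IHB")


def pack_program(program_id, mutation_rate, bytecode):
    """Prefix the byte code with its metadata block."""
    if not 1 <= len(bytecode) <= 255:
        raise ValueError("program length must fit in one byte and be non-zero")
    return META.pack(program_id, mutation_rate, len(bytecode)) + bytes(bytecode)


def unpack_program(blob):
    """Parse the metadata first, then slice out exactly the stated byte count."""
    program_id, mutation_rate, length = META.unpack_from(blob, 0)
    bytecode = blob[META.size:META.size + length]
    if len(bytecode) != length:
        raise ValueError("byte code shorter than the length field states")
    return program_id, mutation_rate, bytecode


packet = pack_program(0x00000120, 500, [0x01, 0x10, 0x11])
```

Because the block comes first, a receiving node can read the ID and length from the first 7 bytes and decide whether to keep the program before the remaining packets arrive.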
