
2.3 Semantic Genetic Programming

2.3.1 Previous attempts at semantic GP

Traditional Genetic Programming searches the space of functions/programs by using search operators that manipulate their syntactic representation, regardless of their actual semantics/behaviour. For instance, subtree swap crossover is used to recombine functions represented as parse trees, regardless of whether the trees represent Boolean expressions, mathematical functions, or computer programs. Although this guarantees that offspring are always syntactically well-formed, there is no reason to believe that such a blind syntactic search can work well for different problems and across domains.

The recent literature contains a number of approaches that use semantics to guide the search, in an attempt to improve on GP with purely syntactic operators. Many individuals may encode the same function (i.e., they may have the same semantics). It is possible to enforce semantic diversity throughout evolution by creating semantically unique individuals in the initial population [8, 25], and by discarding offspring of crossover and mutation when they semantically coincide with their parents [9, 7].

The semantics of a program can be directly and uniquely represented by enumerating the input-output pairs making up the computed function, or equivalently, by the vector of all output values of the program for a certain fixed order of all possible input values. Quang Uy et al. [88] have proposed a probabilistic measure of semantic distance between individuals based on how their outputs differ for the same set of inputs sampled at random. This distance is then used to bias the search operators semantically: mutation rejects offspring that are not sufficiently semantically similar to the parent; crossover chooses only semantically similar subtrees to swap between parents.
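The sampling idea can be illustrated with a minimal sketch. It is not the measure of [88]: the mean absolute difference, the function names, and the acceptance threshold `max_distance` are all assumptions made for illustration, and programs are stood in for by plain Python callables.

```python
import random
from typing import Callable, Sequence

Program = Callable[[float], float]  # stand-in for an evolved program


def sampled_semantic_distance(p: Program, q: Program,
                              inputs: Sequence[float]) -> float:
    """Mean absolute output difference of p and q on the sampled inputs.

    Assumed distance for illustration; the source does not give a formula.
    """
    return sum(abs(p(x) - q(x)) for x in inputs) / len(inputs)


def accept_mutation(parent: Program, child: Program,
                    inputs: Sequence[float], max_distance: float) -> bool:
    """Reject offspring that are semantically too far from the parent.

    max_distance is a hypothetical threshold parameter.
    """
    return sampled_semantic_distance(parent, child, inputs) <= max_distance


rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(100)]

parent = lambda x: x * x + x
same_semantics = lambda x: x * (x + 1)   # different syntax, same function
shifted = lambda x: x * x + x + 0.5      # same shape, offset by 0.5

d_same = sampled_semantic_distance(parent, same_semantics, inputs)
d_shifted = sampled_semantic_distance(parent, shifted, inputs)
```

Two syntactically different programs that compute the same function are at (numerically) zero distance, which is what allows an operator to detect semantic duplicates without comparing syntax.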

Geometric crossover and geometric mutation [62, 57] are formal and representation-independent search operators that can, in principle, be specified for any search space and representation, once a notion of distance between individuals is provided. Simply stated, the offspring of geometric crossover are in the segment between parents, and the offspring of geometric mutation are in a ball around the parent, w.r.t. the considered distance. Many crossover and mutation operators across representations are geometric operators (w.r.t. some distance). Krawiec et al. [32, 35] have used a notion of semantic distance to propose a crossover operator for GP trees that is approximately a geometric crossover in the semantic space (i.e., a geometric semantic crossover). The operator was implemented approximately by using the traditional subtree swap crossover, generating a large number of offspring, and accepting only those offspring that were sufficiently “semantically intermediate” with respect to the parents. An analogous approach can be used to implement a geometric semantic mutation, with offspring lying in a small ball around the parent in the semantic space. Krawiec et al. [33] propose a semantic crossover-like operator that, given a pair of parents, finds a semantically intermediate procedure from a previously prepared library, and in [34] they show how this operator leads to a significant increase in search performance when compared to standard subtree swap crossover.
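The acceptance test behind such a trial-and-error operator can be sketched as follows. This is an illustration under assumptions, not the procedure of [32, 35]: semantics are taken to be Boolean output vectors under Hamming distance, the candidate offspring are given directly as output vectors rather than produced by subtree swap, and `slack` is a hypothetical tolerance for "sufficiently intermediate".

```python
from typing import List, Sequence, Tuple

Semantics = Tuple[int, ...]  # output vector of a program


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the two output vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def in_segment(child: Semantics, p1: Semantics, p2: Semantics,
               slack: int = 0) -> bool:
    """True if child lies in the metric segment between p1 and p2.

    A point is in the segment when d(p1, child) + d(child, p2) = d(p1, p2);
    slack > 0 relaxes this to "approximately intermediate".
    """
    return hamming(p1, child) + hamming(child, p2) <= hamming(p1, p2) + slack


p1: Semantics = (0, 0, 1, 1)
p2: Semantics = (0, 1, 0, 1)

# Hypothetical semantics of offspring produced by a syntactic crossover.
candidates: List[Semantics] = [
    (0, 0, 0, 1),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 0),
]

accepted = [c for c in candidates if in_segment(c, p1, p2)]
# accepted == [(0, 0, 0, 1), (0, 1, 1, 1)]
```

Every rejected candidate costs a full evaluation of the offspring on all inputs, so the filter does a lot of work for each offspring it keeps.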

Whereas the semantically aware methods above are promising, as they have been shown to be better than traditional GP on a number of benchmark problems, their implementations are very wasteful, as they rely heavily on trial and error: search operators act on the syntax of the parents to produce offspring, which are accepted only if some semantic criterion is satisfied. More importantly from a theoretical perspective, these implementations do not provide insights into how syntactic and semantic searches relate to each other.

Geometric Semantic Genetic Programming (GSGP), introduced by Moraglio et al. [59, 58], is a form of genetic programming that uses geometric semantic crossover and geometric semantic mutation to search directly the semantic space of functions/programs. This is possible because, seen from a geometric viewpoint, the genotype-phenotype mapping of GP becomes surprisingly easy, and allows us to derive explicit algorithmic characterizations of geometric semantic operators for different domains following a simple formal recipe, which was used to derive specific forms of GSGP for a number of classic GP domains (i.e., Boolean functions, arithmetic functions and classifiers).
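As a concrete illustration for the Boolean domain, the sketch below uses a crossover of the form (T1 AND TR) OR (NOT TR AND T2), where TR is a random Boolean function acting as a mask. This form is not stated in this section and is given here as an assumption about how such an operator can look. Programs are modelled as Python callables, and the mask is drawn as a random truth table rather than a random program, which is a simplification.

```python
import itertools
import random
from typing import Callable, Tuple

Inputs = Tuple[bool, ...]
BoolFn = Callable[[Inputs], bool]


def semantic_crossover(t1: BoolFn, t2: BoolFn, mask: BoolFn) -> BoolFn:
    """Offspring (t1 AND mask) OR (NOT mask AND t2), built without search."""
    return lambda x: (t1(x) and mask(x)) or ((not mask(x)) and t2(x))


def random_boolean_function(n: int, rng: random.Random) -> BoolFn:
    """Random mask drawn as a truth table (simplifying assumption)."""
    table = {x: rng.random() < 0.5
             for x in itertools.product((False, True), repeat=n)}
    return lambda x: table[x]


def output_vector(f: BoolFn, n: int) -> Tuple[bool, ...]:
    """Semantics of f: its outputs over all 2**n inputs in a fixed order."""
    return tuple(f(x) for x in itertools.product((False, True), repeat=n))


def hamming(a: Tuple[bool, ...], b: Tuple[bool, ...]) -> int:
    return sum(x != y for x, y in zip(a, b))


n = 3
rng = random.Random(1)
t1: BoolFn = lambda x: x[0] and x[1]
t2: BoolFn = lambda x: x[1] or x[2]
child = semantic_crossover(t1, t2, random_boolean_function(n, rng))

v1, v2, vc = (output_vector(f, n) for f in (t1, t2, child))
# Each output of the child is copied from one parent, so the child's
# output vector lies in the Hamming segment between the parents'.
on_segment = hamming(v1, vc) + hamming(vc, v2) == hamming(v1, v2)
```

Unlike the trial-and-error filters, the offspring is semantically intermediate by construction, so no candidate is ever generated and discarded.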

The fitness landscape seen by the geometric semantic operators is always a cone by construction, as the fitness of an individual is its semantic distance to the optimum (i.e., its fitness is the distance between its output vector and the output vector of the target function). This has the consequence that GP search on functions with geometric semantic operators is formally equivalent to a GA search on the corresponding output vectors with standard crossover and mutation operators. For example, for Boolean functions, geometric semantic GP search is equivalent to GA search on binary strings on the OneMax landscape, for any Boolean problem.
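For the Boolean case the reduction to OneMax can be written out explicitly. The notation is assumed for illustration: $t$ is the target function, $T$ a candidate program, $N = 2^n$ the number of input combinations of an $n$-input problem, $O(\cdot) \in \{0,1\}^N$ the output vector, and $\mathrm{HD}$ the Hamming distance.

```latex
% Fitness (to be minimised): semantic distance to the target.
\[
  f(T) \;=\; \mathrm{HD}\bigl(O(T),\, O(t)\bigr)
       \;=\; \sum_{i=1}^{N} \bigl[\, O(T)_i \neq O(t)_i \,\bigr]
\]
% Re-encode each output bit as "agrees with the target".
\[
  z_i \;=\; 1 - \bigl( O(T)_i \oplus O(t)_i \bigr), \qquad i = 1, \dots, N
\]
% The number of ones in z is the OneMax value of z.
\[
  \mathrm{OneMax}(z) \;=\; \sum_{i=1}^{N} z_i \;=\; N - f(T)
\]
```

Minimising $f$ is therefore the same as maximising OneMax on $z$. Because the re-encoding XORs every output vector with the same fixed vector, it preserves Hamming distances between individuals, so the choice of target function does not change the landscape seen by the operators.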

The equivalence between GSGP and GA is very attractive from a theoretical point of view, as it opens the way to a rigorous theoretical analysis of the optimisation time of GSGP by simply reusing known runtime results for GAs on OneMax-like problems. The resulting analysis is general, as it applies to all problems of a certain domain (e.g., all Boolean functions are seen as OneMax by GSGP).

In the rest of the chapter we will formally define geometric semantic genetic programming and we will propose a formal recipe to design GSGP operators for different domains.