Genetic Algorithms
3.6 Schema Theorem
Recombining two identical strings with each other can, for example, lead to deletion of genes.
The crossover of different strings may amount to an insertion of new genes into an individual.
Since the reproduction operations can change the length of a genotype (hence the name “variable-length”), variable-length strings need to be constructed of elements of the same type. There is no longer a constant relation between locus and type.
Fig. 3.6: Search operators for variable-length strings (additional to those from Section 3.4.2 and Section 3.4.3). Fig. 3.6.a: Insertion of random genes. Fig. 3.6.b: Deletion of genes.
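The two operators of Fig. 3.6 can be sketched as follows. This is a minimal illustration, not the book's implementation: it assumes genotypes are Python lists, that inserted genes are drawn uniformly from the alphabet, and that insertion and deletion each affect one contiguous block. The function names and the `count` parameter are hypothetical.

```python
import random

def insert_genes(genotype, alphabet, count=1, rng=random):
    """Return a copy of `genotype` with `count` random genes inserted.

    Assumption: the new genes form one contiguous block at a random position.
    """
    pos = rng.randint(0, len(genotype))  # either end is a valid insertion point
    new_genes = [rng.choice(alphabet) for _ in range(count)]
    return genotype[:pos] + new_genes + genotype[pos:]

def delete_genes(genotype, count=1, rng=random):
    """Return a copy of `genotype` with up to `count` consecutive genes removed."""
    count = min(count, len(genotype))  # never delete more genes than exist
    pos = rng.randint(0, len(genotype) - count)
    return genotype[:pos] + genotype[pos + count:]
```

Both functions return new lists and leave the parent untouched, so the change in length is easy to observe by comparing `len()` before and after.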
3.5.3 Crossover
For variable-length string chromosomes, the same crossover operations are available as for fixed-length strings except that the strings are no longer necessarily split at the same loci.
The length of the new strings resulting from such a cut and splice operation may differ from the length of the parents, as sketched in Figure 3.7. A special case of this class of recombination is the homologous crossover where only genes at the same loci are exchanged.
This method is discussed thoroughly in Section 4.6.7 on page 178.
Fig. 3.7: Crossover of variable-length string chromosomes. Fig. 3.7.a: Single-Point Crossover. Fig. 3.7.b: Two-Point Crossover. Fig. 3.7.c: Multi-Point Crossover.
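A single-point version of the cut and splice operation, together with its homologous special case, might look like the sketch below. It assumes list genotypes and uniformly chosen cut points; the function names are hypothetical, and the homologous variant is one simple reading of "only genes at the same loci are exchanged".

```python
import random

def cut_and_splice(parent_a, parent_b, rng=random):
    """Single-point crossover with an independent cut point in each parent."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2

def homologous_crossover(parent_a, parent_b, rng=random):
    """Single-point crossover that cuts both parents at the same locus."""
    shared = min(len(parent_a), len(parent_b))
    cut = rng.randint(0, shared)  # the cut must be a valid locus in both parents
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2
```

With `cut_and_splice`, the two children together always carry as many genes as the two parents, but each child alone may be shorter or longer than either parent. This also holds when both parents are identical, which is how recombination can act like a deletion or an insertion.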
3.6 Schema Theorem
“Now if you were to alter masks every time fame circus approaches Do you really think your maker wouldn’t notice?”
— Aesop Rock [881]
The Schema Theorem is a special instance of forma analysis (discussed in Section 1.5.1 on page 62) for genetic algorithms. As a matter of fact, it is older than its generalization and was first stated by Holland back in 1975 [136, 334, 138]. Here we will first introduce the basic concepts of schemata, masks, and wildcards before going into detail about the Schema Theorem itself, its criticism, and the related Building Block Hypothesis.
3.6.1 Schemata and Masks
Assume that the genotypes $g$ in the search space $G$ of genetic algorithms are strings of a fixed length $l$ over an alphabet¹¹ $\Sigma$, i.e., $G = \Sigma^l$. Normally, $\Sigma$ is the binary alphabet $\Sigma = \{\text{true}, \text{false}\} = \{0, 1\}$. From forma analysis, we know that properties can be defined on the genotypic or the phenotypic space. For fixed-length string genomes, we can consider the values at certain loci as properties of a genotype. There are two basic ways of defining such properties: masks and don’t care symbols.
Definition 3.3 (Mask). For a fixed-length string genome $G = \Sigma^l$, we define the set of all genotypic masks $M_l$ as the power set¹² of the valid loci, $M_l = \mathcal{P}(\{1, \ldots, l\})$ [335]. Every mask $m_i \in M_l$ defines a property $\phi_i$ and an equivalence relation:

$$g \sim_{\phi_i} h \Leftrightarrow g[j] = h[j] \quad \forall j \in m_i \qquad (3.4)$$

The order $\text{order}(m_i)$ of the mask $m_i$ is the number of loci defined by it:

$$\text{order}(m_i) = |m_i| \qquad (3.5)$$

The defined length $\delta(m_i)$ of a mask $m_i$ is the maximum distance between two indices in the mask:

$$\delta(m_i) = \max\{|j - k| : j, k \in m_i\} \qquad (3.6)$$

A mask contains the indices of all elements in a string that are interesting in terms of the property it defines. Assume we have bit strings of the length $l = 3$ as genotypes ($G = \mathbb{B}^3$).
Besides the empty mask, which the power set also contains, the set of valid masks $M_3$ then comprises $\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}$, and $\{1, 2, 3\}$. The mask $m_1 = \{1, 2\}$, for example, specifies that the values at the loci 1 and 2 of a genotype denote the value of a property $\phi_1$ and that the value of the bit at position 3 is irrelevant. Therefore, it defines four formae: $A_{\phi_1=(0,0)} = \{(0, 0, 0), (0, 0, 1)\}$, $A_{\phi_1=(0,1)} = \{(0, 1, 0), (0, 1, 1)\}$, $A_{\phi_1=(1,0)} = \{(1, 0, 0), (1, 0, 1)\}$, and $A_{\phi_1=(1,1)} = \{(1, 1, 0), (1, 1, 1)\}$.
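The mask definitions can be checked on the three-bit example with a short script. This is only an illustrative sketch: masks are modelled as sets of 1-based loci, the function names are hypothetical, and returning 0 as the defined length of the empty mask is an assumption, since Equation 3.6 does not cover that case.

```python
from itertools import chain, combinations, product

def all_masks(l):
    """Power set of the loci {1, ..., l}, including the empty mask."""
    loci = range(1, l + 1)
    subsets = chain.from_iterable(combinations(loci, k) for k in range(l + 1))
    return [frozenset(s) for s in subsets]

def order(mask):
    """Number of loci defined by the mask (Equation 3.5)."""
    return len(mask)

def defined_length(mask):
    """Maximum distance between two loci in the mask (Equation 3.6).

    Assumption: the empty mask is given the defined length 0.
    """
    return max(mask) - min(mask) if mask else 0

def formae(mask, l, alphabet=(0, 1)):
    """Group all genotypes of length l by their values at the masked loci."""
    groups = {}
    for g in product(alphabet, repeat=l):
        key = tuple(g[j - 1] for j in sorted(mask))  # loci are 1-based
        groups.setdefault(key, []).append(g)
    return groups
```

Calling `formae({1, 2}, 3)` yields four groups of two genotypes each, one for every combination of values at the loci 1 and 2.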
Definition 3.4 (Schema). A forma defined on a string genome concerning the values of the characters at specified loci is called a schema [136, 556].
3.6.2 Wildcards
The second method of specifying such schemata is to use don’t care symbols (wildcards) to create “blueprints” $H$ of their member individuals. To this end, we place the don’t care symbol $*$ at all irrelevant positions and the characterizing values of the property at the others.

$$H[j] = \begin{cases} g[j] & \text{if } j \in m_i \\ * & \text{otherwise} \end{cases} \quad \forall j \in 1..l \qquad (3.7)$$

$$H[j] \in \Sigma \cup \{*\} \quad \forall j \in 1..l \qquad (3.8)$$

We can now restate the aforementioned schemata as $A_{\phi_1=(0,0)} \equiv H_1 = (0, 0, *)$, $A_{\phi_1=(0,1)} \equiv H_2 = (0, 1, *)$, $A_{\phi_1=(1,0)} \equiv H_3 = (1, 0, *)$, and $A_{\phi_1=(1,1)} \equiv H_4 = (1, 1, *)$. These schemata mark hyperplanes in the search space $G$, as illustrated in Figure 3.8 for the three bit genome. Schemata correspond to masks, and thus definitions like the defined length and the order can easily be carried over into their context.
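The relation between a mask and its blueprints can be sketched in a few lines. The code assumes that schemata are tuples over the alphabet plus a `"*"` marker and that loci are counted from 1; all function names are hypothetical.

```python
WILDCARD = "*"

def blueprint(genotype, mask):
    """Build the blueprint H of Equation 3.7 from a genotype and a mask."""
    return tuple(g if j in mask else WILDCARD
                 for j, g in enumerate(genotype, start=1))

def matches(schema, genotype):
    """True if the genotype is an instance of the schema."""
    return (len(schema) == len(genotype)
            and all(h == WILDCARD or h == g for h, g in zip(schema, genotype)))

def schema_order(schema):
    """Number of fixed (non-wildcard) positions."""
    return sum(h != WILDCARD for h in schema)

def schema_defined_length(schema):
    """Distance between the outermost fixed positions (0 if there are none)."""
    fixed = [j for j, h in enumerate(schema, start=1) if h != WILDCARD]
    return fixed[-1] - fixed[0] if fixed else 0
```

For instance, `blueprint((1, 0, 1), {1, 2})` gives `(1, 0, "*")`, which corresponds to $H_3$ above, and both `(1, 0, 0)` and `(1, 0, 1)` match it.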
¹¹ Alphabets and related concepts are defined in Section 30.3 on page 561.
¹² The power set is described in Definition 27.9 on page 458.
Fig. 3.8: An example for schemata in a three bit genome. The figure shows the eight genotypes (0, 0, 0) to (1, 1, 1) over the axes g0, g1, and g2, together with the schemata H1 = (0, 0, ∗), H2 = (0, 1, ∗), H3 = (1, 0, ∗), H4 = (1, 1, ∗), and H5 = (1, ∗, ∗).
3.6.3 Holland’s Schema Theorem
The Schema Theorem¹³ was stated by Holland [136] for genetic algorithms which use fitness-proportionate selection (see Section 2.4.3 on page 104), where fitness is subject to maximization [334, 138].
$$\text{countOccurences}(H, Pop)_{t+1} \geq \text{countOccurences}(H, Pop)_t \cdot \frac{v(H)_t}{v_t} \cdot (1 - p) \qquad (3.10)$$

where
• $\text{countOccurences}(H, Pop)_t$ is the number of instances of a given schema defined by the blueprint $H$ in the population $Pop$ of generation $t$,
• $v(H)_t$ is the average fitness of the members of this schema (observed in time step $t$),
• $v_t$ is the average fitness of the population in time step $t$, and
• $p$ is the probability that an instance of the schema will be “destroyed” by a reproduction operation, i.e., the probability that the offspring of an instance of the schema is not an instance of the schema.
From this formula it can be deduced that genetic algorithms will generate an exponentially rising number of samples for short, above-average fit schemata. This is because such schemata multiply by a certain factor in each generation and only few of their instances are destroyed by the reproduction operations. In the special case of single-point crossover (crossover rate $cr$) and single-bit mutation (mutation rate $mr$) in a binary genome of the fixed length $l$ ($G = \mathbb{B}^l$), the destruction probability $p$ is given in Equation 3.11.

$$p = cr \cdot \frac{\delta(H)}{l - 1} + mr \cdot \frac{\text{order}(H)}{l} \qquad (3.11)$$
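Equations 3.10 and 3.11 can be evaluated directly for a concrete schema. The sketch below only transcribes the two formulas; the function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def destruction_probability(defined_length, order, l, cr, mr):
    """Destruction probability p of Equation 3.11.

    defined_length and order describe the schema H, l is the genome length,
    cr is the crossover rate and mr the mutation rate.
    """
    return cr * defined_length / (l - 1) + mr * order / l

def schema_lower_bound(count_t, schema_fitness, population_fitness, p):
    """Right-hand side of Equation 3.10: lower bound on the instance count in t+1."""
    return count_t * (schema_fitness / population_fitness) * (1.0 - p)
```

As an example, take a schema with defined length 1 and order 2 in a genome of length 8, with cr = 0.7 and mr = 0.08. The destruction probability is then about 0.12. If 20 individuals currently match the schema and its observed average fitness is 1.25 times the population average, the bound for the next generation is about 22 instances. A longer schema with the same fitness would get a larger p and hence a smaller bound.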
3.6.4 Criticism of the Schema Theorem
The deduction that good schemata will spread exponentially is only a very optimistic assumption and not generally true. If a highly fit schema has many offspring with good fitness, this will also improve the overall fitness of the population. Hence, the probabilities in Equation 3.10 will shift over time. Generally, the Schema Theorem represents a lower bound that will only hold for one generation [755]. Trying to derive predictions for more than one or two generations using the Schema Theorem as is will lead to deceptive or wrong results [525, 882].
¹³ http://en.wikipedia.org/wiki/Holland%27s_Schema_Theorem [accessed 2007-07-29]
Furthermore, the population of a genetic algorithm only represents a sample of limited size of the search space $G$. This limits the reproduction of the schemata but also makes statements about probabilities in general more complicated. We only have samples of the schemata $H$ and cannot be sure whether $v(H)_t$ really represents the average fitness of all the members of the schema, which is why we annotate it with $t$ instead of writing $v(H)$.
Thus, even reproduction operators which preserve the instances of the schema may lead to a decrease of $v(H)_{t+\ldots}$ over time. It is also possible that parts of the population have already converged and other members of a schema will not be explored anymore, so we do not get further information about its real utility.
Additionally, we cannot know whether it is really good if one specific schema spreads fast, even if it is very fit. Remember that we have already discussed the exploration versus exploitation topic and the importance of diversity in Section 1.4.2 on page 41.
Another issue is that we implicitly assume that most schemata are compatible and can be combined, i.e., that there is low interaction between different genes. This is also not generally valid: epistatic effects, for instance, can lead to schema incompatibilities. The expressiveness of masks and blueprints is also limited, and it can be argued that there are properties which we cannot specify with them. Take, for example, the set $D_3$ of numbers divisible by three, $D_3 = \{3, 6, 9, 12, \ldots\}$. Representing them as binary strings will lead to $D_3 = \{0011, 0110, 1001, 1100, \ldots\}$ if we have a bit-string genome of the length 4. Obviously, we cannot capture these genotypes in a schema using the discussed approach. They may, however, be gathered in a forma. The Schema Theorem, however, cannot hold for such a forma since the probability $p$ of destruction may be different from instance to instance.
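The claim about $D_3$ can be verified by brute force for the four-bit genome. The sketch below enumerates all $3^4 = 81$ blueprints over {0, 1, ∗} and checks whether any of them has exactly the multiples of three as instances. The function names are hypothetical, and the set is taken as in the text, i.e., without the number 0.

```python
from itertools import product

def instances(schema):
    """All bit strings matched by a schema given as a string over '0', '1', '*'."""
    options = [("0", "1") if h == "*" else (h,) for h in schema]
    return {"".join(bits) for bits in product(*options)}

def schema_for(target, l):
    """Return a schema whose instance set equals `target`, or None if none exists."""
    for schema in product("01*", repeat=l):
        if instances(schema) == target:
            return "".join(schema)
    return None

# 4-bit encodings of 3, 6, 9, 12, and 15
multiples_of_three = {format(n, "04b") for n in range(3, 16, 3)}
```

Here `schema_for(multiples_of_three, 4)` finds no matching schema. A counting argument gives the same result without any search: a schema with k wildcards has exactly $2^k$ instances, so a set of five genotypes can never be a schema.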
3.6.5 The Building Block Hypothesis
According to Harik [265], the substructure of a genotype which allows it to match a schema is called a building block. The Building Block Hypothesis (BBH) proposed by Goldberg [197] and Holland [136] is based on two assumptions:
1. When a genetic algorithm solves a problem, there exist some low-order schemata of short defined length with above-average fitness (the so-called building blocks).
2. These schemata are combined step by step by the genetic algorithm in order to form larger and better strings. By using the building blocks instead of testing any possible binary configuration, genetic algorithms efficiently decrease the complexity of the problem [197].
Although it seems as if the Building Block Hypothesis is supported by the Schema Theorem, this cannot be verified easily. Experiments that were originally intended to prove this theory often did not work out as planned [883] (see also the criticisms of the Schema Theorem in the previous section). In general, there exists much criticism of the Building Block Hypothesis and, although it is a very nice model, it cannot yet be considered sufficiently proven.