The algorithms presented so far work exclusively with conjunctive match-tree forms of schema, as defined in section 4.6. Non-conjunctive forms of schema prove relatively difficult to work with for a number of reasons:
• Any set of schemata of a conjunctive form has a single meet schema, which is the most specific schema of the form that is more general than each member of the set. Sets of schemata of a non-conjunctive form may have more than one most specific schema that is more general than each member of the set.
Each conjunctive form of schema has a representation in which each schema is a set of schema components and, using this representation, the meet schema for a given set of schemata is the intersection of the members of the set. By contrast, this is seldom the case for non-conjunctive forms of schema. There seems to be no similarly efficient way to find the meet schema or schemata of two or more schemata for non-conjunctive forms.
• Where there is an efficient algorithm to find the string representation of a schema of a conjunctive form, no efficient algorithm was found to perform this task in general over non-conjunctive forms.
• Where there is an efficient algorithm to count the number of schemata represented by a given maximal schema of a conjunctive form, no efficient algorithm was found to perform this task in general over non-conjunctive forms.
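The intersection property described in the first point can be illustrated with a minimal sketch. It assumes, purely for illustration, that each schema is held as a Python frozenset of hashable schema components; the component strings and the function name meet_schema are hypothetical placeholders and not the component encoding used in this thesis.

```python
from functools import reduce

def meet_schema(schemata):
    """Meet of conjunctive-form schemata, each given as a set of components.

    A schema with fewer components is more general, so the most specific
    schema that is more general than every member is their intersection.
    """
    schemata = list(schemata)
    if not schemata:
        raise ValueError("meet of an empty set of schemata is undefined here")
    return reduce(frozenset.intersection, schemata)

# Hypothetical component labels, for illustration only.
s1 = frozenset({"root=+", "child0=x", "child1=*"})
s2 = frozenset({"root=+", "child0=x", "child1=y"})
s3 = frozenset({"root=+", "child0=x"})

meet = meet_schema([s1, s2, s3])
print(sorted(meet))  # ['child0=x', 'root=+']
```

No comparably simple operation is available for non-conjunctive forms, which is the difficulty the points above describe.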
Significant effort went into constructing a method to analyze non-conjunctive forms of schema, resulting in a system able to find maximal schemata from any match-tree form. But the method proved too inefficient and complex to present here.
Amongst the classes of non-conjunctive forms of schema there is a relatively easy special case: the de-rooted conjunctive forms defined in the previous chapter. Finding the maximal schemata of a de-rooted conjunctive form of schema f on population P0 may be treated as similar to finding the sets of maximal schemata of the matching conjunctive form for f run on the subtrees of P0. The GetAnnotatedDAG-de-rooted algorithm, given in pseudocode 5.13, performs this process and generalizes the GetAnnotatedDAG-rooted algorithm to de-rooted conjunctive forms of schema. GetAnnotatedDAG-de-rooted is given a de-rooted conjunctive form of schema as fnr and a set of programs as P0. It finds the representative pairs with respect to the population P0 for de-rooted conjunctive forms of schema.
The function operates as follows:
• It uses GetAnnotatedDAG-rooted to find the maximal pairs Rr for
the relevant rooted-conjunctive form of schema fr with respect to
the subtrees Q0 of the programs in P0.
Each of the desired maximal schemata with respect to form fnr and
set of programs P0 may be found in Rr.
• Each maximal pair for the conjunctive form will have a schema sr
and a set of subtrees Qr of programs in P0. For each such sr the
algorithm finds Pr, the set of programs that match sr, as the
subset of P0 whose programs have some subtree in Qr.
Pr will be returned as a representative program subset.

Pseudocode 5.13: GetAnnotatedDAG-de-rooted

FUNCTION GetAnnotatedDAG-de-rooted(fnr, P0)
    // fnr is a de-rooted match-tree form of schema
    // P0 is a set of programs
    LET fr be the rooted conjunctive form for fnr
    n = descnode at root node of root branch of fnr
    Q0 = subtrees of programs in P0 at the depth specified by n
    Rr = GetAnnotatedDAG-rooted(fr, Q0)
    M = Map: schema → set of programs
    FOR EACH ⟨sr, Qr⟩ ∈ Rr
        Pr = the subset of P0 with each program having a subtree in Qr
        ADD MAPPING sr → Pr to M
    R = AddMeets-DAG(M, selected variant of AddMeets)
    FOR EACH ⟨S, P⟩ ∈ R from small P to large P
        Pcfull = SubsetCount(P)
        Pcrep = Pcfull − Σ P′crep over all ⟨s′, P′⟩ ∈ R with P′ ⊂ P
    FOR EACH ⟨S, P⟩ ∈ R from large P to small P
        Screp = Σ screp over all s ∈ S for which ∄ ⟨S′, P′⟩ ∈ R with P′ ⊂ P and s ∈ S′
        REMOVE each s from S where ∃ s′ ∈ S, s′ more specific than s
    RETURN R
END

• GetAnnotatedDAG-de-rooted then uses the function AddMeets-DAG described in subsection 5.5 both to group the schemata from the maximal pairs in Rr into those with the same Pr and to construct a meet-DAG of pairs, each with a set of schemata and a program subset. The meet nodes added may be seen to represent sets of schemata but no single schema on its own.
Though the call to AddMeets-DAG is expensive, it is essential since the later steps in this algorithm require the graph R to be a meet semi-lattice.
• The function finds the program subset counts in the same way as GetAnnotatedDAG-rooted.
• The function finds the counts of schemata represented by a set of schemata S as the sum of the counts of schemata represented by the individual schemata in S.
• Members of S are then removed if they are more general than some other member of S. Such members are not maximal, since a more specific schema occurs in the same programs.
• In a final step, the most specific schemata from pairs with supersets of P as their program subsets are added to S. This ensures that the set S has all of the most specific schemata occurring in all its programs.
Whether to include this step depends on what the desired schemata are. If they are the sets of schemata themselves, for example “sets of subtrees”, the step should be included. If they are the individual members of those sets, for example “subtrees”, it should be omitted.
• Thus this function provides a way to find the representative pairs with respect to a population of programs and a de-rooted conjunctive form of schema.
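Two of the steps above can be sketched in code: replacing each set of subtrees Qr by the program subset Pr (grouping schemata that share a Pr), and the representative program subset count. This is only a sketch under stated assumptions. Programs are modelled as nested tuples (label, child, ...), the schema strings are hypothetical labels, the input rooted_pairs stands in for the output of GetAnnotatedDAG-rooted, the call to AddMeets-DAG is skipped, and SubsetCount is assumed here to count non-empty subsets, which the text does not confirm. The function names are illustrative, not the thesis's own.

```python
from collections import defaultdict

def subtrees(tree):
    """Yield a tree and every descendant; a tree is a tuple (label, *children)."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def lift_to_programs(rooted_pairs, programs):
    """Replace each subtree set Qr by the program subset Pr, grouping by Pr.

    rooted_pairs stands in for Rr, the output of GetAnnotatedDAG-rooted,
    which is not implemented in this sketch.
    """
    owned = {p: set(subtrees(p)) for p in programs}
    groups = defaultdict(set)
    for schema, q_r in rooted_pairs:
        p_r = frozenset(p for p in programs if owned[p] & set(q_r))
        groups[p_r].add(schema)
    return dict(groups)

def representative_counts(program_subsets, subset_count=lambda p: 2 ** len(p) - 1):
    """Pcrep = Pcfull minus the Pcrep of every strictly smaller subset present.

    The default subset_count (number of non-empty subsets) is an assumption.
    """
    rep = {}
    for p in sorted(program_subsets, key=len):  # small P to large P
        rep[p] = subset_count(p) - sum(rep[q] for q in rep if q < p)
    return rep

# Toy programs as nested tuples; schema strings are hypothetical labels.
prog_a = ("+", ("x",), ("*", ("x",), ("y",)))
prog_b = ("-", ("*", ("x",), ("y",)), ("x",))
prog_c = ("+", ("x",), ("x",))
programs = [prog_a, prog_b, prog_c]

mul_xy = ("*", ("x",), ("y",))
rooted_pairs = [("(* x y)", {mul_xy}), ("(* x #)", {mul_xy}), ("x", {("x",)})]

groups = lift_to_programs(rooted_pairs, programs)
counts = representative_counts(groups.keys())
```

In this toy input the two resulting program subsets are already closed under intersection, so skipping AddMeets-DAG does not affect the counts. In general the subtraction is only meaningful once the meet nodes have been added.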
5.10 Chapter summary
This chapter has provided several algorithms involved in the new method presented by this thesis. Each adds some component to the overall system of enumerating the DAG of representative pairs given a population and form of schema. This DAG contains all the representative program subsets, representative sets of schemata, maximal program subsets and maximal schemata.
The algorithms fall into several broad classes:
• Overall algorithms, presented in sections 5.2 and 5.9, which invoke and coordinate the lower-level algorithms. One algorithm works with conjunctive forms of schema and another works with de-rooted conjunctive forms of schema. The representation of forms of schema is as match-tree forms.
• Algorithms preparing a base mapping for which the meet semi-lattice closure is the set of maximal program subsets or maximal schemata.
• Algorithms finding the meet semi-lattice closure of a given base mapping, which are called the AddMeets variants.
• An algorithm finding the edges of an anti-transitive DAG based on the subset relation, given this DAG’s nodes as sets.
• An algorithm finding the schema components of a given form, occurring in a given population.
• An algorithm finding the tree representation of a schema, given its schema components.
• Algorithms counting subsets and subschemata.
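Two of the classes listed above, the meet semi-lattice closure and the edges of the anti-transitive DAG, can be illustrated with a naive sketch. It is not one of the AddMeets variants from this chapter: it simply intersects sets until nothing new appears, and then keeps only those subset edges with no node strictly in between. The function names meet_closure and cover_edges, and the integer members of the sets, are hypothetical.

```python
def meet_closure(base_sets):
    """Close a family of sets under pairwise intersection (naive fixpoint).

    Empty intersections are kept if they arise; whether they belong in the
    result is an assumption of this sketch.
    """
    closed = {frozenset(s) for s in base_sets}
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                m = a & b
                if m not in closed:
                    new.add(m)
        closed |= new
        frontier = new
    return closed

def cover_edges(nodes):
    """Edges (small, large) where small is a proper subset of large and no
    other node lies strictly between them."""
    nodes = list(nodes)
    edges = set()
    for a in nodes:
        for b in nodes:
            if a < b and not any(a < c < b for c in nodes):
                edges.add((a, b))
    return edges

base = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
closed = meet_closure(base)
edges = cover_edges(closed)
print(len(closed), len(edges))  # 6 6
```

Both functions are quadratic or worse in the number of sets, so they serve only to make the definitions concrete rather than to suggest how the chapter's algorithms achieve their efficiency.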
Together, these algorithms produce a DAG of representative pairs, which may be used by the algorithms given previously in chapter 4. Of these, the core algorithms are the AddMeets variants. Each variant does the same job and produces the same maximal pairs, but the different variants are expected to have different complexity in practice. Part of the next chapter, the first to show the new method in use, will compare the many options for this part of the new method. The following two chapters will present the results of numerous experiments using the new method.
Chapter 7 will demonstrate the efficacy of the new method using sample analyses. Beforehand, the following chapter will characterize how the new method works in practice, over many values for several parameters such as the population size, program size and AddMeets variant used.
Chapter 6
Characterizing experiments
6.1 Chapter introduction
Previous chapters have presented a powerful new method for analysis of GP schemata. Chapter 3 defined the new match-tree form of schema language and later chapters defined a method to analyze match-tree schemata shared between the programs of any given population.
This chapter aims to characterize the new method by assessing the time requirements, the space requirements and the size of the output set, over a range of values for parameters such as population size, program size and generation number.