
Chapter introduction

This thesis gives a method to analyze the match-tree schemata in sets of GP programs. The previous chapter defined the concepts of maximal schema, maximal program subset, representative program subset, and representative sets of subtrees. These structures enable the compression of large numbers of schemata and program subsets in a population down to the ones required for typical analysis. This chapter presents algorithms to find these maximal and representative structures from a set of programs and a form of schema expressed as a match-tree form.

The combined system made up of the algorithms of this chapter takes a form of schema and a population of programs as input and produces an annotated DAG of maximal pairs as output. A flow diagram of this system is given in figure 5.1. The diagram shows that there are several steps in making the DAG, and that several classes of objects are passed as input to and produced as output from each of these steps. The objects include:

• f : a form of schema specified as a match-tree form. This main system accepts only conjunctive match-tree forms. Section 5.9 gives an algorithm which generalizes this system to include de-rooted conjunctive forms of schema.

Figure 5.1: A flow diagram of the functions (bold text) and classes of objects (text in ellipses) which are used to produce the annotated DAG of maximal pairs.

• P0: a population to be analyzed. Together with the form of schema, this specifies the input to the system.

• C0: a set of schema components. C0 is returned by GetSchemaComponents, which is described in section 5.4. The ability to describe any schema as a set of schema components is key to the new method. C0 has all the schema components required to describe the maximal schemata for form f and population P0.

• M : either a mapping from schema components to sets of programs or a mapping from programs to sets of schema components. M serves as a base for one of the AddMeets variants, which use it to construct a meet-semi-lattice which has the maximal pairs as nodes.

• A set of maximal pairs, each with a maximal program subset and a maximal schema. It is constructed by one of the variants of GetMaximalDAG of section 5.3, and has all maximal pairs with respect to f and P0.

• R: an anti-transitive DAG of maximal pairs. R is constructed by either GetMaximalDAG while creating the set of maximal pairs or GetEdges of section 5.6. While R has no transitive edges, it has a path going from each maximal pair r to each maximal pair more general than r.

• A schema tree for each maximal schema. GetSchemaTree of section 5.7 constructs the tree from each maximal schema’s set of schema components.

• A count structure for each maximal schema s counting the schemata of each order that are generalizations of s. SubschemaCountBySize of section 5.8 constructs this count structure from the maximal schema’s set of schema components.

• A count structure for each maximal schema s counting the schemata of each order that are represented by s.

• A count structure for each maximal program subset counting its subsets of each cardinality. It is constructed by SubsetCountBySize of section 5.8.

• A count structure for each maximal program subset counting the subsets of each cardinality that it represents.

• The final annotated DAG of maximal pairs. Each maximal pair is made up of a maximal schema and a maximal program subset. In addition, each maximal schema s is annotated with:

– s_tree, which is the representation of s as a tree.

– s_cfull, which is a histogram of the number of subschemata of each order.

– s_crep, which is a histogram of the number of represented subschemata of each order.

Each maximal program subset P is annotated with:

– P_cfull, which is a histogram of the number of subsets of each order.

– P_crep, which is a histogram of the number of represented subsets of each order.
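The anti-transitive property of R described in the list above can be illustrated with a small sketch. This is not GetMaximalDAG or GetEdges; it is a brute-force illustration under two assumptions: each maximal schema is modelled as a frozenset of schema component names, and a schema counts as more general when its component set is a proper subset. The names cover_edges and reachable, and the component labels c1 to c3, are hypothetical.

```python
def cover_edges(schemata):
    """Return edges (r, g) where g is an immediate generalization of r.

    Assumption: a schema is a frozenset of component names, and g is more
    general than r when g is a proper subset of r.
    """
    nodes = set(schemata)
    edges = set()
    for r in nodes:
        more_general = [g for g in nodes if g < r]
        for g in more_general:
            # Drop (r, g) if some other node lies strictly between g and r,
            # because the edge would then be transitive.
            if not any(g < h < r for h in more_general):
                edges.add((r, g))
    return edges


def reachable(edges, start):
    """Return every node reachable from start by following edges."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


top = frozenset()
a = frozenset({"c1"})
b = frozenset({"c2"})
ab = frozenset({"c1", "c2"})
abc = frozenset({"c1", "c2", "c3"})

edges = cover_edges([top, a, b, ab, abc])
```

In this toy input the edge from abc directly to top is absent, yet top remains reachable from abc through ab and then a or b, which is the property the list states for R. The quadratic scan is only for illustration.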

5.1.1 Schema components

Schema components in this thesis are rooted paths of schema nodes. Because match-tree schemata are trees of schema nodes, any schema can be constructed as the union of a set of schema components.
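A minimal sketch of this decomposition follows, under simplifying assumptions: a tree is a (label, children) pair, and a rooted path is a tuple of (child position, label) steps from the root. The schema nodes of this thesis carry more attributes than a label, so the representation and the names rooted_paths and union_to_tree are hypothetical.

```python
def rooted_paths(tree, position=0, prefix=()):
    """Return the set of rooted paths of a tree, one path per node."""
    label, children = tree
    path = prefix + ((position, label),)
    paths = {path}
    for index, child in enumerate(children):
        paths |= rooted_paths(child, index, path)
    return paths


def union_to_tree(paths):
    """Rebuild a tree as the union of a set of rooted paths."""
    # Merge the paths into a trie keyed by (position, label) steps.
    trie = {}
    for path in paths:
        node = trie
        for step in path:
            node = node.setdefault(step, {})

    def build(step, subtrie):
        _, label = step
        # Sorting the steps orders the children by position.
        return (label, [build(s, subtrie[s]) for s in sorted(subtrie)])

    # Assumes all paths share a single root step.
    [(root_step, root_subtrie)] = trie.items()
    return build(root_step, root_subtrie)


tree = ("+", [("x", []), ("*", [("x", []), ("y", [])])])
paths = rooted_paths(tree)

# Paths that end at leaves already imply every shorter rooted path.
leaf_paths = {p for p in paths
              if not any(q[:len(p)] == p and len(q) > len(p) for q in paths)}
```

Here union_to_tree(paths) returns the original tree, and so does union_to_tree(leaf_paths), since each shorter rooted path is a prefix of some leaf path.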

• s.fdisjunct is the schema form disjunct used to make the schema node. Each schema node is associated with a single disjunct, and all children of the schema node are associated with a disjunct in the form’s node pattern pointed to by s.fdisjunct.cindex.

• s.fn is a node-match function. In particular, s.fn = s.fdisjunct.fn.

• s.fc is a child-match function. In particular, s.fc = s.fdisjunct.fc.

• s.Psub is a set of program subtrees from programs in P0 matching s. The label of the root node of each program subtree in s.Psub must match s. That is, for each program subtree p in s.Psub, if p.v is the label of the root of p, then s.fn(s.v, p.v) holds. The children of the root node of each program subtree in s.Psub must match s under some mapping from children of the schema node to descendants of the program node. That is, for each program subtree p in s.Psub, there is some set of pairs M such that s.fc(s, p, M) holds.

• s.P is the set of programs from P0 containing the subtrees in s.Psub. The set s.P contains all programs from the population P0 which match the schema component which ends in s. The set s.P was used in the previous subsection to identify a mapping between programs and schema components.

A useful duality emerges from the use of schema components: a program subset is a set of programs, and a schema is a set of schema components. Also, the programs that match a schema are found as the programs that match each of the schema’s schema components, and the schema components that occur in a program subset are found as the schema components that occur in each of the program subset’s programs.
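The two directions of this duality can be sketched as a pair of set intersections over a mapping between programs and schema components. The sketch assumes programs and components are plain string identifiers. The function names, the toy data, and the convention that an empty schema matches the whole population are illustrative assumptions, not taken from the algorithms of this chapter.

```python
def invert(mapping):
    """Turn a program-to-components mapping into components-to-programs."""
    inverted = {}
    for key, values in mapping.items():
        for value in values:
            inverted.setdefault(value, set()).add(key)
    return inverted


def programs_matching(schema, component_to_programs, population):
    """Programs that match every schema component of the schema."""
    matched = set(population)  # assumed convention: empty schema matches all
    for component in schema:
        matched &= component_to_programs.get(component, set())
    return matched


def components_in(subset, program_to_components, all_components):
    """Schema components that occur in every program of the subset."""
    shared = set(all_components)
    for program in subset:
        shared &= program_to_components.get(program, set())
    return shared


# Hypothetical toy data: three programs and the components each contains.
program_to_components = {
    "p1": {"c1", "c2", "c3"},
    "p2": {"c1", "c2"},
    "p3": {"c1", "c4"},
}
component_to_programs = invert(program_to_components)
population = set(program_to_components)
all_components = set(component_to_programs)

matched = programs_matching({"c2"}, component_to_programs, population)
shared = components_in(matched, program_to_components, all_components)
```

In this toy data the schema {c2} matches p1 and p2, and the components shared by those two programs are c1 and c2. Mapping {c1, c2} back to programs gives p1 and p2 again, so the pair is stable under both directions of the duality.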

While there are typically many schemata, there are typically significantly fewer schema components. For conjunctive forms of schema with simple node-match functions, section 5.4 presents an efficient algorithm for finding the schema components. The algorithm takes a form of schema and a population of programs.