pair for each object in the source system and adding the pair to the match. The algorithm will create a one-to-one mapping if possible (when the target has at least as many objects as the source); otherwise it will create a many-to-one mapping by pairing the additional source objects with already matched target objects.
After finding an object mapping, the matcher creates a state mapping. This is achieved by finding states that are identical when the object mapping is used to substitute variables in the source system. Because the matcher only creates fully or semi-constrained mappings, the substitution for the source objects is unambiguous. If the mapping were allowed to be unconstrained (many-to-many), the substitution would be much more problematic.¹
Finally, the result is returned as a tuple containing the object and state match.
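The state-mapping step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a state is a set of assertions, that each assertion is a tuple of a predicate name followed by object names, and that the object mapping is a dictionary from source objects to target objects. The names `substitute` and `match_states` are hypothetical and do not come from the matcher itself.

```python
from typing import Dict, FrozenSet, List, Tuple

Assertion = Tuple[str, ...]      # assumed form: (predicate, arg1, arg2, ...)
State = FrozenSet[Assertion]     # assumed form: a state is a set of assertions


def substitute(state: State, obj_map: Dict[str, str]) -> State:
    """Rewrite every argument of every assertion through the object mapping."""
    return frozenset(
        (a[0],) + tuple(obj_map.get(arg, arg) for arg in a[1:])
        for a in state
    )


def match_states(source: List[State], target: List[State],
                 obj_map: Dict[str, str]) -> List[Tuple[int, int]]:
    """Pair source and target states that are identical after substitution."""
    pairs = []
    for i, s in enumerate(source):
        image = substitute(s, obj_map)
        for j, t in enumerate(target):
            if image == t:
                pairs.append((i, j))
    return pairs


# Toy example: two source states and two target states.
source = [frozenset({("on", "a", "b")}), frozenset({("clear", "a")})]
target = [frozenset({("clear", "x")}), frozenset({("on", "x", "y")})]
obj_match = {"a": "x", "b": "y"}

state_match = match_states(source, target, obj_match)
result = (obj_match, state_match)   # the tuple of object and state match
```

Because a dictionary gives each source object exactly one image, the substitution is unambiguous for one-to-one and many-to-one mappings alike, which is the property the text relies on.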
6.2 Generalising
This section describes a basic algorithm for learning generalised Q-Systems. The purpose of the learning system is to transform a series of observed example concrete systems into a set of generalised systems that are applicable to analogous situations. This enables the agent to use its experience of the world to predict and plan for the behaviour of objects in new and unfamiliar situations.
A good learning algorithm will produce models that are not over-generalised. Over-generalised models apply to many situations but do not correctly predict behaviour. Similarly, a good learner will not under-generalise by missing opportunities to merge examples describing similar systems.
¹ For example, how should a substitution be resolved if a source object is paired with
The core of the learning algorithm works by matching systems that have similar contexts and replacing them with a new system. The context of the new system is the intersection of the two contexts, that is, the assertions that are common to both systems. The behaviour of the new system, however, is the union of the behaviour of both systems: the learner assumes that behaviour observed in one example and not the other was missed, and could have been present in both examples given enough observations. To convert a system, the learner applies a combination of operations: replacing concrete objects with abstract variables, dropping assertions from the context, adding states to the behaviour, and adding transitions to the behaviour. This is described in section 6.2.1.
Before the learner can generalise the systems it must select which systems are to be generalised. The algorithm ‘LearnSystems’ determines the systems that should be generalised and constructs a match between them for use in model refinement. It is described in section 6.2.2.
6.2.1 Model Refinement Algorithm
The purpose of the model refinement algorithm is to create a generalised system from an existing model system and a newly observed example system (a history). The algorithm takes three inputs: the model, the history, and a matching from one to the other. The matching is assumed to be the best available matching between the two systems. Pseudo-code for the model refiner is shown in Algorithm 3.
The algorithm starts by using the object match to find a substitution for each input system that maps objects to the new general system. This is a form of ‘anti-unification’ in which paired objects are replaced by new variables (unless both objects are the same constant, in which case they are not replaced by variables). The resulting ‘unifier’ contains two mappings of objects, one for each input system. Next, the new system’s context is calculated by finding assertions that are common to both the example and model contexts, given the substitution.

Algorithm 3 RefineModel( model, history, match )
    unifier ← antiunify( match.objMatch )
    context ← union( model.context/unifier, history.context/unifier )
    behaviour ← ∅
    for (s_model, s_history) in match.stateMatch do
        add union( s_model/unifier, s_history/unifier ) to behaviour
    for each unmatched state s do
        add s/unifier to behaviour
    return new system( context, behaviour )
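The anti-unification step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used here: the object match is assumed to be a one-to-one list of (model object, history object) pairs, objects are plain strings, and the variable naming scheme `?x0`, `?x1`, ... is invented for the example. The many-to-one case would need an additional policy that is not covered by this sketch.

```python
from typing import Dict, List, Tuple


def antiunify(obj_match: List[Tuple[str, str]]
              ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build one substitution per input system from a one-to-one object match.

    Paired objects are replaced by a shared fresh variable, unless both
    objects are the same constant, in which case the constant is kept.
    """
    model_sub: Dict[str, str] = {}
    history_sub: Dict[str, str] = {}
    counter = 0
    for m_obj, h_obj in obj_match:
        if m_obj == h_obj:
            term = m_obj                 # same constant on both sides: keep it
        else:
            term = f"?x{counter}"        # hypothetical fresh-variable naming
            counter += 1
        model_sub[m_obj] = term
        history_sub[h_obj] = term
    return model_sub, history_sub


model_sub, history_sub = antiunify([("block1", "block7"), ("table", "table")])
```

In this example `block1` and `block7` are both mapped to the variable `?x0`, while `table` is left as a constant in both substitutions.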
The behaviour of the new system is built by iterating through the transitions in both the model and example systems. If a state in one system is matched to a state in the other, then the two states are generalised in a similar way to the context generalisation. If a state is not matched, then it is simply added to the new system. All transitions in both systems are added to the new system.
Finally, the new system with the generated context and behaviour is returned.
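The refinement step as a whole might look like the sketch below. Several things here are assumptions made for illustration: the `System` record, the representation of states as named sets of assertions, the fresh state names `s0`, `s1`, ..., and a one-to-one state match. The sketch follows the prose in keeping only the assertions common to both contexts (and to both matched states), whereas the pseudo-code writes `union` for these steps; the substitutions are taken as inputs so that the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

Assertion = Tuple[str, ...]            # assumed form: (predicate, arg1, ...)
Assertions = FrozenSet[Assertion]


@dataclass
class System:                          # hypothetical record, not the thesis' type
    context: Assertions
    states: Dict[str, Assertions]      # state name -> assertions
    transitions: Set[Tuple[str, str]]  # (from state, to state)


def substitute(assertions: Assertions, sub: Dict[str, str]) -> Assertions:
    return frozenset((a[0],) + tuple(sub.get(x, x) for x in a[1:])
                     for a in assertions)


def refine_model(model: System, history: System,
                 state_match: List[Tuple[str, str]],
                 model_sub: Dict[str, str],
                 history_sub: Dict[str, str]) -> System:
    # Context: assertions common to both systems after substitution.
    context = (substitute(model.context, model_sub)
               & substitute(history.context, history_sub))
    states: Dict[str, Assertions] = {}
    m_ids: Dict[str, str] = {}
    h_ids: Dict[str, str] = {}
    # Matched states are generalised in the same way as the context.
    for m_id, h_id in state_match:
        new_id = f"s{len(states)}"
        states[new_id] = (substitute(model.states[m_id], model_sub)
                          & substitute(history.states[h_id], history_sub))
        m_ids[m_id] = h_ids[h_id] = new_id
    # Unmatched states are added as they are, after substitution.
    for system, sub, ids in ((model, model_sub, m_ids),
                             (history, history_sub, h_ids)):
        for s_id, assertions in system.states.items():
            if s_id not in ids:
                new_id = f"s{len(states)}"
                states[new_id] = substitute(assertions, sub)
                ids[s_id] = new_id
    # All transitions of both systems are kept (union of behaviour).
    transitions = ({(m_ids[a], m_ids[b]) for a, b in model.transitions}
                   | {(h_ids[a], h_ids[b]) for a, b in history.transitions})
    return System(context, states, transitions)


model = System(frozenset({("block", "b1"), ("red", "b1")}),
               {"m0": frozenset({("on", "b1", "table")}),
                "m1": frozenset({("held", "b1")})},
               {("m0", "m1")})
history = System(frozenset({("block", "b7"), ("blue", "b7")}),
                 {"h0": frozenset({("on", "b7", "table")}),
                  "h1": frozenset({("held", "b7")}),
                  "h2": frozenset({("dropped", "b7")})},
                 {("h0", "h1"), ("h1", "h2")})
general = refine_model(model, history, [("m0", "h0"), ("m1", "h1")],
                       {"b1": "?x0"}, {"b7": "?x0"})
```

In the toy example the colour assertions are dropped from the context because they differ between the two systems, while the unmatched `dropped` state and its transition are carried over from the history.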
6.2.2 Model Selection Algorithm
The purpose of the model selection algorithm is to process an endless series of example systems (‘histories’) and maintain a knowledge-base in the form of a set of generalised systems. The algorithm uses the RefineModel program described in the previous section. Pseudo-code for the learner is shown in Algorithm 4.
The learner takes two parameters: a set of weights for the similarity scores (these can be used to change the relative importance of transition, state and context similarities), and a threshold for use in deciding whether or not two systems are similar enough to be generalised (the threshold can be used to make the learner a more or less ‘eager’ generaliser).
Algorithm 4 LearnSystems( weights, threshold )
    systems ← null
    loop
        history ← get next history()
        // Match and score existing systems to new system...
        matchings ← apply MatchSystems to each system in systems
        scores ← ∅
        for s, m in systems, matchings do
            score ← (transition score( s, history, m ) × weights.transition) +
                    (state score( s, history, m ) × weights.states) +
                    (context score( s, history, m ) × weights.context)
            add (score, s) to scores
        // Refine best match and add to knowledge base...
        if scores.max > threshold then
            newSystem ← RefineModel( scores.best, history )
            remove scores.best from systems
            add newSystem to systems
        else
            add history to systems
The learner begins with an empty set of systems. It then enters an infinite loop and waits for new histories. For each new history the learner finds a matching and a similarity score (based on the matching) for each of the systems in its knowledge base.
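One iteration of the learner's loop can be sketched as below. The matcher, the three similarity scores and the refiner are passed in as functions, since their definitions are given elsewhere; `learn_step`, `Weights` and the toy stand-ins (`toy_match`, `toy_scores`, `toy_refine`, `jaccard`) are hypothetical names introduced for the example. The sketch passes the matching to the refiner, following the three-input description in section 6.2.1, and replaces the best system in place rather than removing it and adding the new one.

```python
from typing import Any, Callable, List, NamedTuple, Tuple


class Weights(NamedTuple):
    transition: float
    states: float
    context: float


def learn_step(systems: List[Any], history: Any, weights: Weights,
               threshold: float,
               match_systems: Callable[[Any, Any], Any],
               score_parts: Callable[[Any, Any, Any], Tuple[float, float, float]],
               refine_model: Callable[[Any, Any, Any], Any]) -> List[Any]:
    """Process one history: refine the best-scoring system or store the history."""
    best = None                          # (score, index, matching)
    for i, s in enumerate(systems):
        m = match_systems(s, history)
        t, st, c = score_parts(s, history, m)
        score = (t * weights.transition + st * weights.states
                 + c * weights.context)
        if best is None or score > best[0]:
            best = (score, i, m)
    if best is not None and best[0] > threshold:
        _, i, m = best
        systems[i] = refine_model(systems[i], history, m)
    else:
        systems.append(history)
    return systems


# Toy stand-ins: a "system" is just a set of letters here.
def jaccard(a: frozenset, b: frozenset) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def toy_match(s, h):
    return None                          # no real matching in the toy example

def toy_scores(s, h, m):
    j = jaccard(s, h)
    return (j, j, j)                     # same value for all three scores

def toy_refine(s, h, m):
    return s & h


kb: List[Any] = []
w = Weights(transition=1.0, states=1.0, context=1.0)
for h in [frozenset("abc"), frozenset("abd"), frozenset("xyz")]:
    learn_step(kb, h, w, 1.2, toy_match, toy_scores, toy_refine)
```

With these toy inputs the first history is stored, the second is merged with it because its weighted score exceeds the threshold, and the third is stored as a separate system. Raising the threshold makes the learner less eager to merge.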