Generalising - Representing Qualitative Action Models for Learning in Complex Virtual Worlds

pair for each object in the source system and adding the pair to the match. The algorithm creates a one-to-one mapping if possible (when the target has at least as many objects as the source); otherwise it creates a many-to-one mapping by pairing the additional source objects with already matched target objects.

After finding an object mapping, the matcher creates a state mapping. This is achieved by finding states that are identical when the object mapping is used to substitute variables in the source system. Because the matcher only creates fully constrained or semi-constrained mappings, the substitution for the source objects is unambiguous. If the mapping were allowed to be unconstrained (many-to-many), the substitution would be much more problematic.¹

Finally, the result is returned as a tuple containing the object and state match.
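The state-mapping step described above can be sketched as follows. This is only an illustration under stated assumptions: the encoding of a state as a frozenset of assertion tuples, and the names `substitute` and `match_states`, are hypothetical and do not come from the matcher itself.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: an assertion is (predicate, arg1, arg2, ...),
# and a state is a frozenset of assertions.
Assertion = Tuple[str, ...]
State = FrozenSet[Assertion]


def substitute(state: State, obj_map: Dict[str, str]) -> State:
    """Rename every object in a state; names missing from the map are kept."""
    return frozenset(
        (pred, *(obj_map.get(arg, arg) for arg in args))
        for pred, *args in state
    )


def match_states(
    source_states: List[State],
    target_states: List[State],
    obj_map: Dict[str, str],
) -> List[Tuple[State, State]]:
    """Pair each source state with the identical target state, if one exists."""
    targets = set(target_states)
    pairs = []
    for state in source_states:
        image = substitute(state, obj_map)  # source state seen through the mapping
        if image in targets:
            pairs.append((state, image))
    return pairs


# Because obj_map is a function from source objects to target objects
# (one-to-one or many-to-one), each source state has exactly one image.
source = [frozenset({("on", "a", "b")}), frozenset({("holding", "a")})]
target = [frozenset({("on", "x", "y")})]
pairs = match_states(source, target, {"a": "x", "b": "y"})
```

A many-to-many mapping could not be stored in a plain dictionary like `obj_map`, which is one way of seeing why the unconstrained case makes substitution ambiguous.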

6.2 Generalising

This section describes a basic algorithm for learning generalised Q-Systems. The purpose of the learning system is to transform a series of observed example concrete systems into a set of generalised systems that are applicable to analogous situations. This enables the agent to use its experience of the world to predict and plan for the behaviour of objects in new and unfamiliar situations.

A good learning algorithm will produce models that are not over-generalised. Over-generalised models will apply to many situations but will not correctly predict behaviour. Similarly, a good learner will not under-generalise by missing opportunities to merge examples that describe similar systems.

¹ For example, how should a substitution be resolved if a source object is paired with

The core of the learning algorithm works by matching systems that have similar contexts and replacing them with a new system. The context of the new system is the intersection of the two contexts, that is, the assertions that are common to both systems. The behaviour of the new system, however, is the union of the behaviour of both systems (the learner assumes that behaviour observed in one example and not the other was missed and could have been present in both examples given enough observations). To convert a system, the learner applies a combination of operations: replacing concrete objects with abstract variables, dropping assertions from the context, adding states to the behaviour, and adding transitions to the behaviour. This is described in section 6.2.1.

Before the learner can generalise the systems, it must select which systems are to be generalised. The algorithm ‘LearnSystems’ determines the systems that should be generalised and constructs a match between them for use in model refinement. It is described in section 6.2.2.

6.2.1 Model Refinement Algorithm

The purpose of the model refinement algorithm is to create a generalised system from an existing model system and a newly observed example system (a history). The algorithm takes three inputs: the model, the history, and a matching from one to the other. The matching is assumed to be the best available matching between the two systems. Pseudo-code for the model refiner is shown in Algorithm 3.

The algorithm starts by using the object match to find a substitution for each input system that maps objects to the new general system. This is a form of ‘anti-unification’ in which paired objects are replaced by new variables (unless both objects are the same constant, in which case they are not replaced by variables). The resulting ‘unifier’ contains two mappings of objects, one for each input system. Next, the new system’s context is calculated by finding assertions that are common to both the example and model contexts, given the substitution.

Algorithm 3 RefineModel( model, history, match )
    unifier ← antiunify( match.objMatch )
    context ← intersection( model.context/unifier, history.context/unifier )
    behaviour ← ∅
    for (s_model, s_history) in match.stateMatch do
        add union( s_model/unifier, s_history/unifier ) to behaviour
    for each unmatched state s do
        add s/unifier to behaviour
    return new system( context, behaviour )

The behaviour of the new system is built by iterating through the transitions in both the model and example systems. If a state in one system is matched to a state in the other, then the two states are generalised in a similar way to the context generalisation. If a state is not matched, then it is simply added to the new system. All transitions in both systems are added to the new system.

Finally, the new system with the generated context and behaviour is returned.
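The refinement steps can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `System` container, the tuple encoding of assertions, the `?v0`-style variable names, and the restriction to a one-to-one object match are not taken from the thesis. Matched states are generalised by keeping their common assertions, which is one reading of "in a similar way to the context generalisation".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

Assertion = Tuple[str, ...]           # hypothetical: (predicate, arg1, ...)
State = FrozenSet[Assertion]


@dataclass
class System:                         # hypothetical container for a Q-System
    context: FrozenSet[Assertion]
    states: Set[State]
    transitions: Set[Tuple[State, State]]


def antiunify(obj_match: List[Tuple[str, str]]):
    """Build one substitution per input system (assumes a one-to-one match)."""
    model_sub: Dict[str, str] = {}
    history_sub: Dict[str, str] = {}
    for i, (m, h) in enumerate(obj_match):
        name = m if m == h else f"?v{i}"   # identical constants are kept
        model_sub[m] = name
        history_sub[h] = name
    return model_sub, history_sub


def rename(assertions, sub):
    """Apply a substitution to every argument of every assertion."""
    return frozenset((p, *(sub.get(a, a) for a in args)) for p, *args in assertions)


def refine_model(model, history, obj_match, state_match):
    m_sub, h_sub = antiunify(obj_match)
    # Context: assertions common to both systems under the substitution.
    context = rename(model.context, m_sub) & rename(history.context, h_sub)
    # Matched states collapse to one generalised state (common assertions).
    merged = {}
    for sm, sh in state_match:
        merged["m", sm] = merged["h", sh] = rename(sm, m_sub) & rename(sh, h_sub)

    def lift(state, tag, sub):
        # Unmatched states are simply renamed and carried over.
        return merged.get((tag, state), rename(state, sub))

    sides = ((model, "m", m_sub), (history, "h", h_sub))
    states = {lift(s, tag, sub) for system, tag, sub in sides for s in system.states}
    # Behaviour: all transitions from both systems are kept.
    transitions = {
        (lift(a, tag, sub), lift(b, tag, sub))
        for system, tag, sub in sides
        for a, b in system.transitions
    }
    return System(context, states, transitions)


closed1, open1 = frozenset({("closed", "d1")}), frozenset({("open", "d1")})
closed2, open2 = frozenset({("closed", "d2")}), frozenset({("open", "d2")})
model = System(frozenset({("door", "d1"), ("red", "d1")}), {closed1, open1}, {(closed1, open1)})
history = System(frozenset({("door", "d2"), ("blue", "d2")}), {closed2, open2}, {(open2, closed2)})
general = refine_model(model, history, [("d1", "d2")], [(closed1, closed2), (open1, open2)])
```

In this toy case the colour assertions are dropped from the context, the door object becomes a variable, and the generalised system contains both the opening and the closing transition even though each example showed only one of them.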

6.2.2 Model Selection Algorithm

The purpose of the model selection algorithm is to process an endless series of example systems (‘histories’) and maintain a knowledge-base in the form of a set of generalised systems. The algorithm uses the RefineModel program described in the previous section. Pseudo-code for the learner is shown in Algorithm 4.

The algorithm takes two parameters: a set of weights for the similarity scores (these can be used to change the relative importance of transition, state and context similarities), and a threshold for deciding whether or not two systems are similar enough to be generalised (the threshold can be used to make the learner a more or less ‘eager’ generaliser).

Algorithm 4 LearnSystems( weights, threshold )
    systems ← ∅
    loop
        history ← get_next_history()
        // Match and score existing systems to new system...
        matchings ← apply MatchSystems to each system in systems
        scores ← ∅
        for s, m in systems, matchings do
            score ← (transition_score( s, history, m ) × weights.transition) +
                    (state_score( s, history, m ) × weights.states) +
                    (context_score( s, history, m ) × weights.context)
            add (score, s, m) to scores
        // Refine best match and add to knowledge base...
        if scores.max > threshold then
            (s, m) ← scores.best
            newSystem ← RefineModel( s, history, m )
            remove s from systems
            add newSystem to systems
        else
            add history to systems

The learner begins with an empty set of systems. It then enters an infinite loop and waits for new histories. For each new history the learner finds a matching and a similarity score (based on the matching) for each of the systems in its knowledge base.
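The selection loop can be sketched in the same spirit. This sketch processes a finite list of histories rather than looping forever, and takes the matcher, the three scoring functions and the refiner as parameters. All of the names below (`learn_systems`, `Weights`, `scorers`, the toy `jaccard` score) are hypothetical stand-ins, and the toy "systems" are plain sets of symbols rather than Q-Systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Weights:                       # relative importance of each similarity
    transition: float
    states: float
    context: float


def learn_systems(
    histories: Iterable,
    weights: Weights,
    threshold: float,
    match: Callable,                 # stand-in for MatchSystems
    scorers: Dict[str, Callable],    # keys: "transition", "states", "context"
    refine: Callable,                # stand-in for RefineModel
) -> List:
    systems: List = []               # the knowledge base starts empty
    for history in histories:
        best = None                  # (score, index, matching) of the best system
        for i, s in enumerate(systems):
            m = match(s, history)
            score = (
                scorers["transition"](s, history, m) * weights.transition
                + scorers["states"](s, history, m) * weights.states
                + scorers["context"](s, history, m) * weights.context
            )
            if best is None or score > best[0]:
                best = (score, i, m)
        if best is not None and best[0] > threshold:
            _, i, m = best
            # Replace the best-matching system with its generalisation.
            systems[i] = refine(systems[i], history, m)
        else:
            # Nothing is similar enough: store the history as a new system.
            systems.append(history)
    return systems


# Toy stand-ins: a "system" is a frozenset of symbols.
def jaccard(s, h, m):
    return len(m) / len(s | h)


result = learn_systems(
    histories=[frozenset("abc"), frozenset("abd"), frozenset("xyz")],
    weights=Weights(transition=0.5, states=0.25, context=0.25),
    threshold=0.4,
    match=lambda s, h: s & h,
    scorers={"transition": jaccard, "states": jaccard, "context": jaccard},
    refine=lambda s, h, m: s & h,
)
```

With these toy inputs the second history scores 0.5 against the first system and is merged with it, while the third scores 0 and is stored unchanged. Raising the threshold above 0.5 would keep all three separate, which corresponds to a less eager generaliser.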

