5.3 History Generating Algorithm
5.3.1 History Generation Heuristics
The algorithm described in the previous section provides an outline of the history generator. As noted there, several key steps in the process can be implemented in different ways, and these options have not yet been described in detail. One example is the test for when an active history should be stopped and removed from the set of active histories. Each of these functions plays an important role in determining the types of histories that are generated, and the ability of the generalizer to find good models depends on how they are implemented. This section describes the heuristics of the generator and the different options for implementing them.
Temporal Restriction: Starting New Histories
This function is called for every snapshot and returns true if a new history should be started. It implements part of a history’s temporal restriction. The following methods are used to determine if a new history should be started:
• Every Snapshot: a simple implementation is to start a new history on every new snapshot. This has the advantage that no useful transitions are missed. However, it can be computationally intensive, especially if histories are quite long (in which case there will be many histories active simultaneously).
• Always Exactly One: an alternative is to ensure that there is always a single active history by only starting a new one when the previous one finishes. Like the ‘Every Snapshot’ approach, this ensures that no potentially useful transitions are missed. However, it precludes simultaneous active histories (which may have differing contextual restrictions, i.e. be focused on different sets of objects).
• Wait for Event: this method starts a new history whenever something ‘interesting’ is observed. An interesting event may be the agent performing an action or it might be a change in the environment involving multiple objects. This method has the advantage of only starting histories when there is something worthwhile to learn. Potentially this makes the job of the generalizer easier since it has fewer histories to learn from. However, it has the disadvantage that useful transitions can be missed. This trade-off is determined by the precise definition of an interesting event.
• Teacher Initiated: a signal from the teacher can be used to start a new history. In this case the agent is relying on the teacher to ensure that no interesting behaviour is missed. The advantage of this method is that histories should be more useful training examples. The disadvantage of this approach is that the agent loses some autonomy.
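The four start tests above can be sketched as interchangeable predicates. This is only an illustrative sketch: the `Snapshot` fields, the function names, and the reading of ‘interesting’ as "the agent acted or at least two objects changed" are assumptions made for the example, not definitions taken from the generator itself.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical snapshot summary; all field names are assumptions."""
    agent_acted: bool = False     # the agent performed an action
    changed_objects: int = 0      # objects that changed since the last snapshot
    teacher_signal: bool = False  # the teacher requested a new history


def start_every_snapshot(snapshot, active_count):
    # Every Snapshot: always start a new history.
    return True


def start_always_exactly_one(snapshot, active_count):
    # Always Exactly One: start only when no history is currently active.
    return active_count == 0


def start_wait_for_event(snapshot, active_count):
    # Wait for Event: one possible (assumed) definition of 'interesting'.
    return snapshot.agent_acted or snapshot.changed_objects >= 2


def start_teacher_initiated(snapshot, active_count):
    # Teacher Initiated: start only on an explicit teacher signal.
    return snapshot.teacher_signal
```

Because every predicate takes the same arguments, the generator loop can be given any one of them without further changes; the choice of predicate is then the only place where the trade-off between coverage and cost is decided.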
Temporal Restriction: Stopping Histories
This function is called for every active history at every snapshot to determine if the history should be terminated. It implements part of the history’s temporal restriction. The following methods can be used to determine if an active history should be terminated:
• Fixed Time Limit: this method stops the history after a fixed duration. It is a trivial method of preventing histories from becoming too long. The length of time for which the history is active can be measured in either snapshots or action executions. For example, terminate the history if the agent has executed 10 actions since the history was initiated.
• Limit Size of History: stop the history once it has reached a certain size. The ‘size’ of a history can be measured in the following ways: the number of transitions, the number of states, the number of cycles, or the number of objects. This method has the advantage that the structure of histories can be made to be similar to the desired structure of generalised models (assuming there is a desirable structure for generalised models). Note that restricting the history size to a single transition results in a dataset that is similar to those used in many existing action learning systems (see section 2.2).
• No New Interesting Observations: stop the history if no new ‘interesting’ observations have occurred for a fixed duration. Interesting observations are those that add something new to the history. These include: transitions to new states, new transitions between previously observed states, and the addition of new objects to the context of the history. Potentially this method can make learning more efficient by automatically stopping histories when an observed system has been ‘explored’ to the extent that observing new behaviour is unlikely. This is dependent on the particular system being observed and the precise definition of what is an interesting observation. The duration for which no new interesting observations are observed can be measured in elapsed actions or snapshots.
• Teacher Terminated: a signal from the teacher can be used to terminate a history. This has the advantage that the teacher can steer learning towards desired goal systems by limiting histories to relevant episodes of agent interactions. The disadvantage of this approach is that the agent loses some autonomy.
Another possible heuristic, ‘context failure’, uses a partially learned model when deciding to stop a history. In this case the history is stopped when the context of the learned model is no longer applicable to (satisfied by) the current world state. This would enable histories to discover new states not included in the model and to stop automatically when this is no longer possible. The context failure heuristic is not implemented in this experiment because the focus is on stand-alone history generation; however, it would make a good candidate for an integrated learner and history generator.
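As a rough illustration, the four implemented stopping tests can be written as predicates over per-history counters. The `ActiveHistory` fields, the function names, and all default thresholds other than the 10-action example are assumptions chosen for this sketch; size is measured here in transitions only, and the context failure heuristic is omitted because it needs a learned model.

```python
from dataclasses import dataclass


@dataclass
class ActiveHistory:
    """Hypothetical bookkeeping for one active history (assumed fields)."""
    actions_since_start: int = 0
    transitions: int = 0
    snapshots_since_new_observation: int = 0
    teacher_stop_signal: bool = False


def stop_fixed_time_limit(history, max_actions=10):
    # Fixed Time Limit, measured in executed actions.
    return history.actions_since_start >= max_actions


def stop_size_limit(history, max_transitions=5):
    # Limit Size of History, measured in transitions.
    return history.transitions >= max_transitions


def stop_no_new_observations(history, patience=20):
    # No New Interesting Observations, measured in snapshots.
    return history.snapshots_since_new_observation >= patience


def stop_teacher_terminated(history):
    # Teacher Terminated: stop on an explicit teacher signal.
    return history.teacher_stop_signal


def should_stop(history, tests):
    # A history is stopped as soon as any configured test fires.
    return any(test(history) for test in tests)
```

Combining tests with `any` is one design choice among several; it lets a cheap bound such as the fixed time limit act as a safety net behind a more selective test.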
Contextual Restriction: Adding New Objects
The contextual restriction of a history determines which objects are considered a part of the system under observation. A history only contains information about objects in its context and disregards the rest. The context is constructed by checking each snapshot to see if any new objects should be added. The following methods can be used to determine the result of the ‘find new objects’ function and hence which objects should be added to the context for a given history:
• All Objects: assume all observable objects are relevant to the system being observed. This approach is only feasible in trivial worlds with only a few observable objects. It is useful for generating complete histories from a micro-world which can be used for comparison with the contextually restricted histories generated from complex worlds.
• Any Objects That Change: if an object changes then it gets added to the current context. This approach ensures that anything directly affected by the observed system will definitely be included in the history. It has the disadvantage that objects affected by events exogenous to the system will also be included. Objects are considered to have changed if a property changes or a relationship with another object changes.
• Any Relevant Objects That Change: if an object changes and it is in some way relevant to the current context then it gets added to the current context. This approach addresses the issue of avoiding adding objects to the context which have been affected by exogenous events. It does so by using a relevancy test to determine if a changing object is likely to be part of the system under observation. The relevancy test is implemented in various ways, for example, objects are considered relevant if they are in close proximity to an object already in the system context. Stricter tests include requiring that the object is touching or connected to a context object. A disadvantage of this approach is that a poorly chosen relevancy test will exclude important objects from a system history.
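The three ‘find new objects’ variants can be sketched as follows. The representation is an assumption made for the example: a snapshot is a dictionary from object name to a state dictionary that includes a `"pos"` coordinate, relationships are taken to be stored in that same state dictionary, and proximity with a hypothetical `max_distance` threshold stands in for the relevancy test.

```python
import math


def changed_objects(prev, curr):
    # An object counts as changed if its recorded state differs between
    # snapshots; an object that is absent from `prev` also counts.
    return {name for name, state in curr.items() if prev.get(name) != state}


def find_all_objects(prev, curr, context):
    # All Objects: every observable object not yet in the context.
    return set(curr) - context


def find_any_changed(prev, curr, context):
    # Any Objects That Change.
    return changed_objects(prev, curr) - context


def find_relevant_changed(prev, curr, context, max_distance=1.0):
    # Any Relevant Objects That Change, using proximity as the relevancy test.
    def near_context(name):
        return any(
            math.dist(curr[name]["pos"], curr[member]["pos"]) <= max_distance
            for member in context
            if member in curr
        )

    candidates = changed_objects(prev, curr) - context
    return {name for name in candidates if near_context(name)}
```

For instance, if a lamp far from the context switches on in the same snapshot in which a nearby lid is lifted, `find_any_changed` returns both objects while `find_relevant_changed` returns only the lid, which is the exogenous-event filtering described above.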
Updating Existing Transitions
The history generator constructs histories incrementally by adding new transitions as they are observed. When a new transition contains new objects to be added to the history’s context then the history’s existing transitions must be updated with the state of the new objects. This ensures that histories are complete descriptions of observed systems over some time period. The following methods can be used to update existing transitions with new context objects:
• Use Buffer: buffer sufficient snapshots such that the state of objects in previous transitions can be found by inspecting the buffer. This method ensures that the previous state of objects is correctly recorded. The disadvantage of this approach is that it is only feasible if certain unrealistic restrictions are made on the vision system and simulation.
• Assume Not Changed: assume that objects not previously included in the context have not changed since the history began. This approach is implemented by copying the object state from the most recent snapshot to the existing transitions. A problem with this approach is that the assumption that new objects have not changed may not be true. This depends on the rules used to add new objects to the history. If they have indeed changed then the history will contain false information on the state of some objects. This problem is avoided when using the ‘Any Objects That Change’ method for finding new context objects.
• Leave Unknown: leave values for objects in previous transitions as unknown. This approach has the advantage that it avoids making any false assumptions about object states. It has the disadvantage that the history will contain transitions with ambiguous states.
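The difference between the three update methods can be illustrated with a small sketch. For simplicity it assumes a history is a list of recorded states (one dictionary per retained snapshot, mapping object name to state) rather than explicit transitions, that the buffer is aligned one-to-one with those states, and that `None` serves as the unknown marker; all names are hypothetical.

```python
UNKNOWN = None  # assumed marker for an unrecorded object state


def backfill_from_buffer(states, buffer, name):
    # Use Buffer: recover the object's true past state from buffered
    # snapshots (assumes buffer[i] is the snapshot behind states[i]).
    for state, snapshot in zip(states, buffer):
        state[name] = snapshot[name]


def backfill_assume_unchanged(states, latest_snapshot, name):
    # Assume Not Changed: copy the most recent state into every earlier state.
    for state in states:
        state[name] = latest_snapshot[name]


def backfill_unknown(states, name):
    # Leave Unknown: mark the object's earlier states as unknown.
    for state in states:
        state[name] = UNKNOWN
```

If a lamp was off in the first buffered snapshot and is on in the latest one, the buffer method records "off" for the earlier state, the unchanged assumption records "on" (the false information mentioned above), and the unknown method records `None`.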