System Constants - The Application of Classical Conditioning to the Machine Learning of a Commo

Over the whole system, including each of the various models, there are 11 constants that determine the behaviour of the system, over and above the four input parameters of:

• The scenario.

• The input tracker noise percentage.

• The input video duration.

• The significance model to use.

Most of the constants are specific to each model, but there are some system-wide constants. This section looks at how the values for each constant were arrived at.

Each constant affects the utility of other constants. For instance, the reinforcement learning rate, k₁, affects the number of rules that can be subject to the extinction process controlled by the non-reinforcement learning rate, k₂. This means that the only way to guarantee that an optimal combination of settings has been chosen is to try every combination. This is not feasible, though: trying 8 possible values for each of the 11 constants in the context of each model gives 2.73 × 10⁸ combinations. Even if the system ran at its maximum observed frame rate of approximately 3000 frames per second³, and each combination were tried using only a one-minute video with no noise as input, it would take over five years of constant processing to try every combination. Therefore, in the interests of feasibility, each setting had to be dealt with as if it independently affected the results. Further, it was not feasible to perform a full analysis of the results as described in the previous section; the only measures used were those directly output by the system: the processing time and the number of rules in the output.

While each model has its own separate constants, later models are based upon earlier models, so many of the model parameters have equivalents in other models. To make it more feasible to determine a value for each constant, each constant was reviewed only once, in the model that first uses that constant.
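The feasibility estimate quoted in this section can be checked with a few lines of arithmetic. The sketch below takes the combination count and the 3000 frames-per-second processing rate from the text; the video frame rate of 30 frames per second is an assumption, since the source does not state it, and the function name is illustrative only.

```python
# Rough check of the exhaustive-search estimate quoted in the text.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(combinations, frames_per_video, processing_fps):
    """Years of continuous processing needed to run every combination once."""
    total_seconds = combinations * frames_per_video / processing_fps
    return total_seconds / SECONDS_PER_YEAR


COMBINATIONS = 2.73e8               # figure quoted in the text
PROCESSING_FPS = 3000               # maximum observed processing rate, from the text
VIDEO_FPS = 30                      # ASSUMPTION: not stated in the source
FRAMES_PER_VIDEO = 60 * VIDEO_FPS   # one-minute input video

years = exhaustive_search_years(COMBINATIONS, FRAMES_PER_VIDEO, PROCESSING_FPS)

# One-at-a-time alternative: each of the 11 constants swept over about 8 settings.
one_at_a_time_runs = 11 * 8

print(f"exhaustive: about {years:.1f} years; one-at-a-time: {one_at_a_time_runs} runs")
```

Under the assumed 30 frames per second, the exhaustive search comes to a little over five years, in line with the text, whereas treating each constant independently needs fewer than a hundred runs per sweep.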

For each constant, the two measures were collated and plotted over approximately eight possible settings. The final setting was chosen based on what appeared to be the best balance between the number of rules produced and the time taken to produce them. Appendix D lists the values that were used for each constant.
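The one-constant-at-a-time procedure described above might be sketched as follows. This is a minimal illustration, not the thesis's implementation: `run_system`, `toy_system`, the default values and the candidate settings are all hypothetical stand-ins. The sketch only collects the two measures (processing time and rule count); the final choice is left to inspection of the plotted results, as in the text.

```python
import time


def sweep_constant(run_system, defaults, name, candidates):
    """Vary one constant while holding every other constant at its default.

    run_system is a hypothetical callable: it takes a dict of constants and
    returns the number of rules in the system's output.
    """
    results = []
    for value in candidates:
        settings = dict(defaults, **{name: value})  # copy; defaults untouched
        start = time.perf_counter()
        n_rules = run_system(settings)
        elapsed = time.perf_counter() - start
        results.append({"value": value, "seconds": elapsed, "rules": n_rules})
    return results


def toy_system(settings):
    # Stand-in for the real system, used only to make the sketch runnable.
    return int(100 * settings["k1"])


defaults = {"k1": 0.5, "k2": 0.5}            # hypothetical default values
candidates = [0.1 * i for i in range(1, 9)]  # eight candidate settings
table = sweep_constant(toy_system, defaults, "k1", candidates)

for row in table:
    print(row)
```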

³ This is an extreme frame rate. A more typical frame rate would be approximately 100 frames per second; the lowest extreme is approximately one frame per second.

5.6 Chapter Conclusion

This chapter has described how the system presented in chapter four was evaluated. The system was evaluated within the context of three different learning scenarios: a ball being thrown into the air, two objects rotating around a common axis, and four balls colliding with one another. These three scenarios were generated as simulations based upon the physics equations relevant to each scenario, plus a method of adding noise to the simulation. The output of the simulations was input into the system. The output of the system was compared in two ways. Firstly, the output was compared against a proxy ground truth that was created for each scenario based upon a process comprising human and deterministic decisions. Secondly, where the input of the system included a level of noise, the output was compared against the output that was produced for the same video without the noise. The comparisons were based on several widely used methods. However, the same widely used methods were not feasible for determining the large number of constants within the system, and so a weaker form of analysis was used to determine what value should be used for each constant.

The results of the comparisons produced in this system are presented in the next chapter, chapter six. In the final chapter, chapter seven, those results are then used in the context of the hypotheses presented in chapter one to form a set of conclusions about the ideas presented in this thesis.

Chapter 6

Results

This chapter presents the results of the evaluation that was discussed in chapter five. Along with the results, the chapter analyses the salient points of the results. This analysis also allows the discussion to present an interpretation of which phenomena used in the system and its accompanying models worked well and which worked less well.

During the evaluation of the system, a flaw in its design was discovered. A description of the flaw and the work-around introduced to reduce its impact is the topic of the first section of this chapter. This is presented first because it influences the remainder of the results. The second and third sections look at the primary results: the system’s ability to produce a model that matches the description of what the system was supposed to learn in each scenario. This is done quantitatively in section two through a comparison between the output of the system and a proxy ground truth. The third section reviews the output of the system qualitatively.

The fourth and fifth sections look at the secondary results. The fourth section looks at how the system and its models respond to different levels of noise in the input data. The fifth section considers the computational performance of each model. The chapter then ends with a wider discussion of the results.

Note that, to avoid breaking the text into very small chunks and spoiling its flow, each section that presents results as charts places those charts at the end of the section, after the accompanying discussion.

6.1 The System Design Flaw

While the system was processing input data, it was found that, due to the interaction between the event type hierarchy and an unforeseen aspect of the extension for multi-frame events, there was a combinatorial explosion in the number of event types at each level. The combinatorial explosion meant that the system was unable to finish processing any but the shortest video durations. This was rectified by capping the event type hierarchy at two levels of composite event types, creating a hierarchy of three levels when atomic event types are included.

The combinatorial explosion is caused by a combination of the way the event type hierarchy is represented and the situation in which three or more multi-frame events always occur together in such a way that they always overlap one another and all the overlaps fit within the size of the window. This can cause three more event types to be created at the next level up, which will lead to three event types at the level above that, and so on forever. This concept is shown in figure 6.1, which depicts a configuration of event instances that, for the purposes of this discussion, occurs many times, giving rise to the event types shown. In the case of the example in figure 6.1, a third level would consist of the composite event type tuples ((T₁, T₂), (T₁, T₃)), ((T₁, T₂), (T₂, T₃)) and ((T₁, T₃), (T₂, T₃)). When the results were being processed, not just three-way overlaps were observed but also higher-order overlaps; for example, in the colliding scenario, some twenty-way overlaps were observed.
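The self-perpetuating pairing described above can be illustrated with a short enumeration. The sketch makes a simplifying assumption that is not stated in the source: every pair of event types at a level is treated as overlapping, so each unordered pair yields one composite type at the next level. The names `next_level` and `level_sizes` are hypothetical.

```python
from itertools import combinations


def next_level(event_types):
    """All unordered pairs of the given event types.

    ASSUMPTION: every type at this level overlaps every other type.
    """
    return list(combinations(event_types, 2))


level1 = ["T1", "T2", "T3"]
level2 = next_level(level1)   # three pairs of atomic types
level3 = next_level(level2)   # three pairs of pairs, as listed in the text


def level_sizes(n_types, n_levels):
    """Number of event types per level under the all-pairs assumption."""
    sizes = [n_types]
    for _ in range(n_levels - 1):
        n = sizes[-1]
        sizes.append(n * (n - 1) // 2)
    return sizes


print(level3)
print(level_sizes(3, 5))    # a three-way overlap never shrinks
print(level_sizes(20, 3))   # idealised twenty-way overlap
```

A three-way overlap yields three types at every level, so the hierarchy never terminates. Under the same idealised assumption, a twenty-way overlap grows very quickly from one level to the next.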

In theory, the infinite chain of levels is amortised by the fact that each new level has to wait for the events of the level below to reach the significance threshold. However, the issue is exacerbated by a further factor. Three-way overlapping event types cause a combinatorial explosion when other event types are associated in serial with them. When a further event type is associated serially with the three overlapping event types, six event types are created at the next level. Each event type that gets associated with the overlapping events effectively becomes a multiplier for the number of event types created, increasing the overall count for each higher level. This is demonstrated in figure 6.2. When this multiplier effect is combined with the infinite hierarchy and the larger numbers of atomic events used in the learning scenarios, it does not take very many frames or levels before processing one frame becomes too slow to feasibly process every scenario at each video duration and noise level. This led to the enforcement of a limit on the type hierarchy of two levels of composite event types.
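The count of six event types for a three-way overlap plus one serially associated type can be reproduced with a small enumeration. This is a sketch under stated assumptions: the labels A to D follow figure 6.2, each serial type is paired with each overlapping type, and serial types are not paired with one another. The function name is hypothetical.

```python
from itertools import combinations, product


def second_level_types(overlapping, serial):
    """Second-level event types for mutually overlapping types plus
    serially associated types (ASSUMPTION: serial types do not pair
    with each other)."""
    overlap_pairs = list(combinations(overlapping, 2))
    serial_pairs = list(product(overlapping, serial))
    return overlap_pairs + serial_pairs


one_serial = second_level_types(["A", "B", "C"], ["D"])
two_serial = second_level_types(["A", "B", "C"], ["D", "E"])

print(one_serial)
print(len(one_serial), len(two_serial))
```

With one serial type the result contains the six pairs labelled in figure 6.2. Under these assumptions, each additional serial type adds three more second-level types.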

When developing the system, it was assumed that there was a finite number of levels that could be created in the event type hierarchy. The reasoning was based on the fact that one-on-one serial associations and associations between two-way overlaps would both produce only a single event type at the next level up. This would mean that every level would have fewer event types than the level below it. Eventually each pairing, and pairing of pairs, and so on would combine into a single high-level pairing. The reduction in the number of event types at each level would thus cause the number of levels to be finite.
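The counting behind this assumption, and the point at which it breaks down, can be written out. As a simplification that is not stated in the source, suppose that a group of n ≥ 1 mutually overlapping event types yields one composite event type per unordered pair:

```latex
\[
  \binom{n}{2} = \frac{n(n-1)}{2},
  \qquad
  \binom{n}{2} < n \iff n < 3,
  \qquad
  \binom{3}{2} = 3,
  \qquad
  \binom{n}{2} > n \iff n > 3 .
\]
```

A pair (n = 2) therefore reduces to a single type, which matches the original reasoning, whereas a three-way overlap reproduces itself at every level and anything larger grows.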

[Figure 6.1 diagram labels: Level 1; Level 2, containing the pairs (A, B), (A, C) and (B, C).]

Figure 6.1: Two levels of multi-frame events. The window is marked by the two black vertical lines: solid for the current frame, dashed for the last frame.

[Figure 6.2 diagram labels: Level 1; Level 2, containing the pairs (A, B), (A, C), (B, C), (A, D), (B, D) and (C, D).]

Figure 6.2: A serial association with a three-way overlap. The existence of the serial association increases the number of event types that are created at the second level of event types. The window is marked by the two black vertical lines: solid for the current frame, dashed for the last frame.

In document The Application of Classical Conditioning to the Machine Learning of a Commonsense Knowledge of Visual Events (Page 183-188)