The Models - Module 4 – Significance - The Application of Classical Conditioning to the Machine

4.5 Module 4 – Significance

4.5.3 The Models

The purpose of a model is to calculate the significance measure, V, for each event type, based on each piece of evidence presented to that model. The models described below vary in complexity and in the fidelity with which they imitate the phenomena of classical conditioning. The chapter introduction noted that the system presented in this chapter focuses on only a subset of the phenomena that were described and analysed in chapter two. While some of those phenomena can be seen in all aspects of the system, it is the models, and how they control a composite event type’s significance measure, that draw most from the concepts of classical conditioning.

In the discussion of existing models of classical conditioning in chapter three, one categorisation applied to the models was whether a model was trial-level or real-time. In trial-level models, the computation is carried out after an event instance has terminated. In real-time models, the computation happens at every time-frame, and can cope with those frames being arbitrarily small.

The system presented in this chapter is arguably both. The system as a whole deals with real-time data, segmented into arbitrarily small frames. However, at the level of the models used in the significance module, the system operates as a trial-level model.

There are ten models in total. The names for these models are:

1. Fixed Increment
2. Symmetrical Fixed Increment
3. Count Only
4. Absolute Acquire-Extinguish
5. Iterative Acquire-Extinguish
6. Temporal
7. Reacquiring
8. Blocking
9. Inhibition
10. Pre-Exposure

The first three models presented are, in some sense, the “control” models: they approach the problem of calculating a significance measure without drawing from the concepts of classical conditioning, and two of them were developed in a naïve manner. The latter seven models do draw from the concepts of classical conditioning, to varying degrees. Each of the last six conditioning models is built upon the previous model, amending it to add further phenomena.

None of the models provide any form of theoretical explanation as to how any phenomenon of classical conditioning arises within biological systems.

This is deliberate and this ethos of only replicating phenomena as a black box is one of the founding hypotheses of this project, expressed in chapter one as hypothesis A. The consequence is that some of the phenomena have been developed in a highly biologically implausible manner; for example, by testing for the conditions in which the phenomenon has been observed to occur and then applying the effect of the phenomenon, rather than building a model where all phenomena are just cases of a single unified mechanism.

4.5.3.1 The Fixed Increment Model

The Fixed Increment model is the most naïve of all the models used by this thesis. As its name suggests, each time it receives a piece of positive evidence, the model returns a significance measure that has changed by a constant f⁺. This is expressed in equation 4.10.

$$V_{n+1} = V_n + f^{+} \tag{4.10}$$

The constant f⁺ is a parameter defined prior to the system beginning to process any frames of input data and is set to be in the range 0 ≤ f⁺≤ 1. As described earlier, the significance measure is limited to the range 0 < V ≤ 1.

This means that ultimately, the Fixed Increment model considers an event instance pairing to be strongly associated after a fixed number of presentations, with the number of presentations being τ_upper / f⁺, where τ_upper is the upper significance threshold.

The model does not take into account any negative evidence. When the model is provided with some negative evidence via its respective input function, it always returns a significance measure value change of zero. Having a model that ignores the negative evidence it is given allows the utility of the negative evidence to be seen by comparison.
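The behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the function names, the starting value of zero and the parameter values f⁺ = 0.125 and τ_upper = 0.75 are all assumptions chosen so that the arithmetic is exact.

```python
def clamp(v, lo=0.0, hi=1.0):
    """Keep a significance value inside the permitted range."""
    return max(lo, min(hi, v))


def fixed_increment(v, evidence_is_positive, f_pos=0.125):
    """Apply equation 4.10 for positive evidence; ignore negative evidence."""
    if evidence_is_positive:
        return clamp(v + f_pos)
    return v  # negative evidence: change of zero


def presentations_to_threshold(tau_upper, f_pos, v=0.0):
    """Count the positive presentations needed for V to reach tau_upper.

    Assumes f_pos > 0 and a hypothetical starting value v.
    """
    count = 0
    while v < tau_upper:
        v = fixed_increment(v, True, f_pos)
        count += 1
    return count


print(presentations_to_threshold(0.75, 0.125))  # 0.75 / 0.125 presentations
```

With these assumed values the loop agrees with the ratio τ_upper / f⁺ given in the text.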

4.5.3.2 The Symmetrical Fixed Increment Model

The Symmetrical Fixed Increment model is an enhancement of the Fixed Increment model that does take into account any negative evidence that it is provided with. As with the non-symmetrical model, when provided with positive evidence this model returns a significance measure value that has changed by a constant value f⁺, which is set to be in the range 0 ≤ f⁺ ≤ 1. When the model is provided with negative evidence, the model again returns a significance measure value that has changed by a constant value, this time by a second constant f⁻, which is set to be in the range −1 ≤ f⁻ ≤ 0. While the model is built with two separate constants, in practice f⁻ is set to be equal to −f⁺. Setting the two constants to the same magnitude with opposite signs gives equal weighting to positive and negative evidence, which allows the Symmetrical Fixed Increment model to be compared more directly with the asymmetric version. As with the Fixed Increment model, the significance measure is limited to the range 0 ≤ V ≤ 1.
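A minimal Python sketch of the symmetrical update follows. The function name and the default value of f⁺ are assumptions made for illustration; only the rule itself (add f⁺ for positive evidence, add f⁻ = −f⁺ for negative evidence, then limit the result to the permitted range) comes from the description above.

```python
def symmetrical_fixed_increment(v, evidence_is_positive, f_pos=0.125, f_neg=None):
    """Change V by f_pos for positive evidence and by f_neg for negative evidence."""
    if f_neg is None:
        f_neg = -f_pos  # in practice the two constants share a magnitude
    delta = f_pos if evidence_is_positive else f_neg
    # Limit the result to the range 0 <= V <= 1.
    return max(0.0, min(1.0, v + delta))


v = 0.5
v = symmetrical_fixed_increment(v, True)   # one piece of positive evidence
v = symmetrical_fixed_increment(v, False)  # one piece of negative evidence
print(v)  # the two pieces of evidence cancel
```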

4.5.3.3 The Count Only Model

The third model is based on frequentist probabilities and is the final model of the ten models in this section that is not based on classical conditioning. For each event type, including atomic event types, a count of the number of times that a particular event type has been observed is stored as supplemental data.

These counts are then used to calculate a new significance measure.

When positive evidence of a composite event type is presented, the corresponding composite event type counter is incremented. If the component event types of that composite event instance are atomic event types, those event instances are counted too, but only if those particular atomic event types have not yet been observed within that frame.

When negative evidence of a composite event type is provided, only the observed atomic event types that had not yet been observed as part of any other evidence are used, and those atomic event type counts are incremented along with the other atomic event types.

Atomic event instances are counted by the model maintaining its own count for each observed atomic event type. Before the model increments an atomic event type count, the model searches for the event type in a list of atomic event types that have been incremented during that frame. Only if an atomic event type is not on the list is the count incremented. The list of seen atomic event instances is reset after all evidence produced in that frame has been processed, using the frameTick function discussed in section 4.6.5.
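The counting rules above can be sketched as a small Python class. The class and method names are hypothetical, and `frame_tick` merely stands in for the frameTick function mentioned in the text; how evidence is actually delivered to the model is an assumption of this sketch.

```python
from collections import Counter


class CountOnlyCounts:
    """Per-type counts with at most one atomic increment per type per frame."""

    def __init__(self):
        self.composite = Counter()     # counts of composite event types
        self.atomic = Counter()        # counts of atomic event types
        self._seen_this_frame = set()  # atomic types already counted this frame

    def _count_atomic(self, atomic_types):
        for event_type in atomic_types:
            if event_type not in self._seen_this_frame:
                self._seen_this_frame.add(event_type)
                self.atomic[event_type] += 1

    def positive_evidence(self, composite_type, atomic_components):
        self.composite[composite_type] += 1
        self._count_atomic(atomic_components)

    def negative_evidence(self, observed_atomic_types):
        # Only the atomic types not yet counted in this frame are incremented.
        self._count_atomic(observed_atomic_types)

    def frame_tick(self):
        """Reset the per-frame list once all evidence for the frame is processed."""
        self._seen_this_frame.clear()


counts = CountOnlyCounts()
counts.positive_evidence(("A", "B"), ["A", "B"])
counts.positive_evidence(("A", "C"), ["A", "C"])  # "A" is not counted twice
counts.negative_evidence(["A", "D"])              # only "D" is new this frame
counts.frame_tick()
print(counts.atomic["A"], counts.atomic["D"])
```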

Every time a composite event type count is updated, the significance measure is recalculated for that composite event type and any composite event types that it is a component part of. When an atomic event type is updated, only the composite event types that the event type is a part of are updated.

The significance measure for the composite event type T₁,₂, composed of event types T₁ and T₂, is calculated according to equation 4.11. The derivation of the function is provided in appendix B.

$$V = \frac{2\,|T_{1,2}|}{|T_1| + |T_2|} \tag{4.11}$$

Where V is the significance value, |T_1,2| is the count for the composite event type and |T₁| and |T₂| are the independent counts for the corresponding component event types. As |T₁| and |T₂| are independent from |T_1,2|, they also include in their count those occurrences that appeared outside any pairing with each other.
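As a worked illustration of equation 4.11, the following Python sketch computes the measure from three counts. The function name, the example counts and the choice to return zero when nothing has been observed are assumptions, not details taken from the thesis.

```python
def count_only_significance(pair_count, first_count, second_count):
    """Equation 4.11: V = 2 * |T_1,2| / (|T_1| + |T_2|)."""
    total = first_count + second_count
    if total == 0:
        return 0.0  # assumption: no observations yet means no significance
    return 2 * pair_count / total


# T_1 seen 4 times, T_2 seen 6 times, and the pairing T_1,2 seen 3 times.
print(count_only_significance(3, 4, 6))
# Both components only ever seen together: the measure reaches its maximum.
print(count_only_significance(5, 5, 5))
```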

4.5.3.4 The Absolute Acquire-Extinguish Model

The Absolute Acquire-Extinguish model maintains as supplemental data for each composite event type a count of how many instances of positive evidence have been presented and how many instances of negative evidence have been presented. These two counts are both used to calculate the significance measure, which is recalculated with the presentation of each piece of evidence.

There are two components to the significance measure formula, one for positive evidence and one for negative evidence.

The positive component of the significance measure formula (V⁺) is known as the logistic function and is the most common sigmoid function, as shown in equation 4.12.

$$V^{+} = \frac{1}{1 + e^{-k_1 \varepsilon^{+}}} \tag{4.12}$$

Where ε⁺ is the count of all the observed positive evidence and k₁ is a learning rate constant that determines how fast a composite event type moves along the curve. The logistic function was chosen to replicate the s-shape of the classical conditioning acquisition curve. This is not the function with which models of classical conditioning typically produce the acquisition curve. Typically, models of classical conditioning such as the Rescorla-Wagner (1972) model add a percentage of the difference between a maximum significance value and the current significance value, leading to diminishing gains in the significance measure and an asymptote at the maximum significance value. The common method does not have a slow initial start like the logistic curve, which, as discussed in chapter two, can be useful in minimising the effect of coincidental event instance pairings. The logistic curve is, however, used widely within artificial neural networks.

The negative component of the significance measure formula (V⁻), as expressed in equation 4.13, is a simple linear decay.

$$V^{-} = -k_2 \varepsilon^{-} \tag{4.13}$$

Where ε⁻ is the count of all the observed negative evidence and k₂ is a learning rate constant that determines how fast a composite event type moves along the curve. Again, this differs from the typical equation used in models of classical conditioning, in that most apply an inverted version of the acquisition function, decaying by a large amount for high significance values and by a small amount for small significance values. During the review of literature regarding classical conditioning, no experimental evidence could be found to suggest that extinction follows a sigmoidal decay, as is assumed by most models, such as the Rescorla-Wagner model. The data provided by Pavlov (1927, pp. 52–53) suggest that extinction follows a roughly linear decay. This observation is reflected in the choice for the negative component of the significance measure formula.

These two component parts of the formula are combined together through addition, as shown in equation 4.14.

$$V = V^{+} + V^{-} = \frac{1}{1 + e^{-k_1 \varepsilon^{+}}} - k_2 \varepsilon^{-} \tag{4.14}$$

The range of values the significance measure can take is actively enforced by checking the value and, if it is out of range, replacing it with the appropriate limit of the range. The range needs to be actively enforced because of the linear subtraction of negative evidence. If there is enough negative evidence, the significance measure would become negative, forcing it out of the range the system expects the significance measure to be in.
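Equation 4.14 and the range enforcement can be sketched in Python as below. The function name and the values chosen for the learning rate constants k₁ and k₂ are illustrative assumptions; the thesis does not state them in this section.

```python
import math


def absolute_acquire_extinguish(pos_count, neg_count, k1=1.0, k2=0.1):
    """Equation 4.14: logistic acquisition minus linear extinction, clamped to [0, 1]."""
    v_pos = 1.0 / (1.0 + math.exp(-k1 * pos_count))  # equation 4.12
    v_neg = -k2 * neg_count                          # equation 4.13
    v = v_pos + v_neg
    # Actively enforce the range: enough negative evidence would push V below zero.
    return max(0.0, min(1.0, v))


print(absolute_acquire_extinguish(3, 0))   # acquisition only
print(absolute_acquire_extinguish(3, 2))   # partially extinguished
print(absolute_acquire_extinguish(0, 10))  # clamped at the lower limit
```

Because the value depends only on the two absolute counts, the sketch recomputes V from scratch on every call, mirroring the need to store both counts as supplemental data.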

4.5.3.5 The Iterative Acquire-Extinguish Model

The Iterative Acquire-Extinguish model covers the same phenomena as the Absolute Acquire-Extinguish model. The difference is that the significance measure formula has been changed in order to make it iterative, in the sense that the significance measure modification can be calculated at each frame based only on its value at the previous frame and the input of the current frame. By making the formula iterative, the model does not need to maintain absolute counts of the observed positive and negative evidence, meaning the model does not need to store any supplemental data about a given event type.

Unlike the Absolute Acquire-Extinguish model, where both counts of evidence are included in the one equation, the iterative model has two separate equations, one that is applied when a piece of positive evidence is encountered and one that is applied when a piece of negative evidence is encountered. Both equations calculate the change required for the significance measure value (∆V) given the increase in the number of pieces of evidence observed. For completeness, the relationship between the significance measure before the supply of a piece of evidence (V_n) and after its supply (V_n+1) is stated in equation 4.15.

$$V_{n+1} = V_n + \Delta V \tag{4.15}$$

The change in significance value given a piece of positive evidence is given by equation 4.16. In that equation, k₁ is the same learning rate constant for positive evidence as in the Absolute Acquire-Extinguish model; ∆x is the amount by which one reinforcement instance is counted (this is normally equal to one and is only included for completeness); and all other symbols are as previously defined.

$$\Delta V = \frac{(1 - V_n)\, e^{k_1 \Delta x} + V_n - 1}{e^{k_1 \Delta x} + \frac{1}{V_n} - 1} \tag{4.16}$$

The change in significance value given a piece of negative evidence is given by equation 4.17. In that equation k₂ is the same learning rate constant for negative evidence as the Absolute Acquire-Extinguish model and all other symbols are the same as they were previously defined.

∆V = −k₂∆x (4.17)

The derivations of both significance measure change formulae are provided in appendix B.

The Iterative Acquire-Extinguish model was originally believed to produce the same results as the Absolute Acquire-Extinguish model. In testing, however, the results were found to be significantly different. This was found to be due to the fact that in the Absolute model, once an association has been extinguished, it cannot be reacquired. In the iterative model, however, associations can be reacquired. This result is further discussed in chapter six.
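The two iterative updates (equations 4.16 and 4.17) can be sketched in Python as follows. The function names and constant values are assumptions. The sketch also checks numerically that, starting from a point on the logistic curve, one positive update lands on the next point of the same curve, which is the property the iterative form is meant to preserve. Note that the positive update divides by V_n, so this sketch assumes V_n is strictly greater than zero.

```python
import math


def logistic(x, k1):
    """Absolute positive component (equation 4.12) for a positive evidence count x."""
    return 1.0 / (1.0 + math.exp(-k1 * x))


def delta_v_positive(v_n, k1=1.0, dx=1.0):
    """Equation 4.16: change in V for one piece of positive evidence (needs v_n > 0)."""
    growth = math.exp(k1 * dx)
    return ((1.0 - v_n) * growth + v_n - 1.0) / (growth + 1.0 / v_n - 1.0)


def delta_v_negative(k2=0.1, dx=1.0):
    """Equation 4.17: change in V for one piece of negative evidence."""
    return -k2 * dx


v = logistic(2, k1=1.0)              # a point on the acquisition curve
v_next = v + delta_v_positive(v)     # equation 4.15 with positive evidence
print(v_next, logistic(3, k1=1.0))   # the two values agree up to rounding
```

Only the current value of V is carried between updates, so no counts of positive or negative evidence need to be stored.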

4.5.3.6 The Temporal Model

The first expansion to the number of phenomena modelled is to include the effect of the inter-stimulus interval. This brings in the event instance timing information that is provided when positive evidence is provided to the model. This model is built upon the Iterative Acquire-Extinguish model. As there is no timing data for negative evidence, the equation for the change in significance value following negative evidence is unchanged. Due to the complexity this model adds, the discussion of how the model works is found in appendix B. What is presented here is a description of the changes that the model introduces to the Iterative Acquire-Extinguish model.

The event instance timing data influences the change in significance measure for positive evidence through equations 4.19, 4.20 and 4.21, which produce a single value, ψ, lying in the range 0 ≤ ψ ≤ 1. This value is multiplied with the output of the previous version of the positive evidence formula. This gives the revised version of the positive evidence formula, which is shown in equation 4.18.

$$\Delta V = \psi\, \frac{(1 - V_n)\, e^{k_1 \Delta x} + V_n - 1}{e^{k_1 \Delta x} + \frac{1}{V_n} - 1} \tag{4.18}$$

The calculation of ψ itself is based upon two intermediate values, φ and χ, which are derived from the timing data of the event instances and which influence the size and shape of the inter-stimulus interval curve. These intermediate values are shown in equations 4.19 and 4.20. In those equations, t_S,1 is the start time of the first event instance; t_E,1 denotes the end time of the first event instance; t_S,2 is the start time of the second event instance; t_E,2 denotes the end time of the second event instance; and W represents the size of the moving window.

φ = 1 …

The intermediate values are then fed into the function calculating the timing coefficient ψ, as shown in equation 4.21. The way that all three equations 4.19, 4.20 and 4.21 are arrived at is explained in appendix B.

ψ = 2 (2 − φ) e …

4.5.3.7 The Reacquiring Model

The Reacquiring model adds to the Temporal model the effects of the reacquisition phenomenon. The reacquisition phenomenon has two effects on the acquisition curve each time a reacquisition phase occurs. The first effect is that the rate at which the subject regains any lost association strength is faster. The second effect is that the point at which the acquisition curve begins to level off is higher. Both of these effects can be achieved if the asymptote of the acquisition curve changes proportionally to a count of the observed positive evidence.

The reason that varying the asymptote of the positive evidence curve proportionally to the positive evidence count will achieve the desired effects is because it changes the maximum height that can be achieved for a set number of positive pieces of evidence. This causes the first effect because it takes fewer positive pieces of evidence to achieve the same effect. The second effect is caused because the asymptote the function is approaching is higher.

This asymptote needs to change in proportion to the count of the observed positive evidence. The reason for this is that there needs to be some state which, once an event type has been extinguished, retains the fact that the significance value has previously been higher than it is at present. There is little point creating an iterative function based on this count, as the value stored for the iterative version would only be influenced by that absolute count, and so would purely be an indirect measure of the absolute count itself.

Because of this need for the count of positive evidence for each composite event type, the Reacquiring model and models based upon it maintain as supplemental data a count of all the positive evidence a particular composite event type has received. The Reacquiring model does not maintain a count of the negative evidence nor does it maintain a count of any observed atomic event types.

The asymptote is varied by multiplying the absolute version of the positive evidence equation by the current desired asymptote value. The absolute function with the asymptote multiplier is then put through the same derivation of the iterative function that was used in the Iterative Acquire-Extinguish model. The resultant function is shown in equation 4.22, where the variable a is the reacquisition asymptote value and all other variables are as previously defined.

$$\Delta V = \psi\, \frac{(a - V_n)\, e^{k_1 \Delta x} + V_n - a}{e^{k_1 \Delta x} + \frac{a}{V_n} - 1} \tag{4.22}$$

In order to maintain a significance value that lies between zero and one, the asymptote value itself must always lie between zero and one. This constraint can be achieved if the function to calculate the current asymptote value is itself asymptotic to one. The equation that is used is shown in equation 4.23.

In equation 4.23, ε⁺ denotes the count of the positive evidence; k₃ is a positive constant, k₃ ∈ R⁺, where the smaller the constant, the faster the value approaches one; and k₄ is a constant in the range 0 ≤ k₄ ≤ 1 which is the initial value of the asymptote, and therefore the minimum value that the asymptote is allowed to take.

$$a = k_4 + (1 - k_4)\, \frac{\varepsilon^{+}}{\varepsilon^{+} + k_3} \tag{4.23}$$

The negative evidence equation is unchanged from the Iterative Acquire-Extinguish model and the derivation of both equations 4.22 and 4.23 is described in appendix B.
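Equations 4.22 and 4.23 can be sketched in Python as below. The function names and the values of k₁, k₃ and k₄ are assumptions made for illustration, and the timing coefficient ψ is simply passed in as a number because its calculation belongs to the Temporal model. As with equation 4.16, the update divides by V_n, so the sketch assumes V_n is strictly positive.

```python
import math


def reacquisition_asymptote(pos_count, k3=5.0, k4=0.5):
    """Equation 4.23: starts at k4 and approaches one as positive evidence accumulates."""
    return k4 + (1.0 - k4) * pos_count / (pos_count + k3)


def delta_v_reacquiring(v_n, a, psi=1.0, k1=1.0, dx=1.0):
    """Equation 4.22: positive-evidence update towards asymptote a (needs v_n > 0)."""
    growth = math.exp(k1 * dx)
    return psi * ((a - v_n) * growth + v_n - a) / (growth + a / v_n - 1.0)


for pos_count in (0, 5, 50):
    print(pos_count, reacquisition_asymptote(pos_count))

# A higher asymptote produces a larger gain from the same starting value.
print(delta_v_reacquiring(0.2, a=0.5), delta_v_reacquiring(0.2, a=0.9))
```

With a = 1 and ψ = 1 the update reduces to the Iterative Acquire-Extinguish form, which gives a quick consistency check on the reconstruction.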

4.5.3.8 The Blocking Model

The Blocking model is an extension of the Reacquiring model to add the effect of the blocking phenomenon. In this model, the blocking concept is extended from there being two predictor event types and one predicted event type to there being arbitrarily many of both. The basic inspiration for the implementation of this model is taken from the idea of subtracting from a maximum association strength, as done in the Rescorla-Wagner (1972) model.

For each piece of positive evidence, the model searches through the list of all positive evidence that has been or will be passed to the model during the current frame. The model searches to find the corresponding event type that has the largest significance value V_max that shares the same second component event type. The difference between the largest active significance value and the significance value of the evidence being processed is then used to define a maximum limit on the amount of change that the significance value of the
