Computational Models of Associative Learning

From Sample to Estimate

Chapter 4 Sample from Memory

4.2 Computational Models of Associative Learning

A computational model of associative learning should provide a formal framework for animal learning phenomena, both mechanistically and normatively. The mechanistic questions concern when and how the associative strength of a stimulus should change. The normative question concerns why the associative strength should change in the manner the mechanism describes.

4.2.1 Rescorla-Wagner Model

As mentioned above, the leading theory of animal learning was proposed by Rescorla & Wagner (1972). In their model, unlike earlier models, learning occurs not simply because a CS is paired with a US, but because such a pairing is unanticipated on the basis of the current associative strength, which effectively functions as a prediction of US occurrence. This idea, to which we will return in the subsequent chapter, is formalised as an error-correction learning rule, whereby the associative strength between CS and US changes whenever what was expected differs from what actually happened:

$$\delta_t = R_t - V_t \qquad (4.1)$$

where $\delta_t$ is the difference between the animal's expectation and reality (also known as the prediction error) on trial $t$; $R_t$ is the actual US (often a reward such as food or liquid, or an aversive outcome such as an electric shock); and $V_t$ is the associative strength on the same trial.

To handle multiple stimuli (e.g., both a tone and a light) within one trial, the Rescorla-Wagner learning rule computes the overall associative strength for the trial as the sum of the associative strengths of all stimuli present on that trial:

$$V_t = \sum_s V_t(s)\, \mathbb{I}_{A_t}(s) \qquad (4.2)$$

where $V_t(s)$ is the individual associative strength of stimulus $s$ on trial $t$; $A_t$ is the set of stimuli present on that trial; and $\mathbb{I}_{A_t}(s)$ is an indicator function that takes the value 1 when stimulus $s$ is in the set $A_t$ (i.e., stimulus $s$ was present on trial $t$) and 0 otherwise.[9]

[9] The indicator function can be viewed as a special case of the eligibility trace in

Whenever the prediction error $\delta_t$ is non-zero, the Rescorla-Wagner learning rule prescribes that associative strengths be updated in proportion to the prediction error:

$$V_{t+1}(s) = V_t(s) + \alpha\, \delta_t\, \mathbb{I}_{A_t}(s) \qquad (4.3)$$

where $\alpha \in [0, 1]$ is the learning rate that controls the size of the update.

With this simple learning rule, the model accommodates a number of learning phenomena in classical conditioning, including acquisition, blocking, and conditioned inhibition (see Pearce & Bouton, 2001, for a review).
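To make Eqs. 4.1 to 4.3 concrete, the following is a minimal Python sketch, assuming a single scalar US per trial and a fixed learning rate. The function name `rescorla_wagner`, the stimulus labels "tone" and "light", and the values α = 0.1 and R_t = 1 are hypothetical choices for illustration. The example reproduces blocking: once the tone alone predicts the US, almost no prediction error remains for the light to absorb when the two are later presented together.

```python
def rescorla_wagner(trials, alpha=0.1):
    """Apply Eqs. 4.1-4.3 to a sequence of trials.

    Each trial is a pair (present, reward): `present` is the set A_t of
    stimuli on that trial and `reward` is the US value R_t.
    Returns a dict mapping each stimulus s to its final strength V(s).
    """
    V = {}  # associative strengths V_t(s); unseen stimuli default to 0
    for present, reward in trials:
        # Eq. 4.2: the indicator restricts the sum to stimuli present on this trial.
        v_total = sum(V.get(s, 0.0) for s in present)
        # Eq. 4.1: prediction error for the trial.
        delta = reward - v_total
        # Eq. 4.3: every present stimulus is updated with the same error term.
        for s in present:
            V[s] = V.get(s, 0.0) + alpha * delta
    return V


# Blocking: pretrain the tone alone, then train the tone+light compound.
phase1 = [({"tone"}, 1.0)] * 100
phase2 = [({"tone", "light"}, 1.0)] * 100
blocked = rescorla_wagner(phase1 + phase2)
control = rescorla_wagner(phase2)  # compound training without pretraining

print(round(blocked["light"], 3), round(control["light"], 3))  # prints 0.0 0.5
```

Because both stimuli share one error term, the control group splits the available associative strength roughly equally between tone and light, whereas the pretrained tone leaves the light with almost nothing.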

4.2.2 The Random Replay Model

We consider a simple extension to the Rescorla-Wagner model where animals can store past trials of conditioning and actively re-use these trial memories to assist learning (Ludvig et al., under review). As illustrated in Figure 8, there are two parallel streams of information processing in the random replay model. First, there is the usual process of associative learning as formalised in the original Rescorla-Wagner model. The animals encounter a CS-US pairing on a trial and update the associative strength of the CS in the same manner as in the classic Rescorla-Wagner model. Second, past trials (i.e., CS, US, and the timing of the trial) are also remembered in a trial memory. The animal can thus draw samples from this trial memory and replay these sampled trials like normal trials. That is, during the replay process, new prediction errors are computed based on the current associative strength and the content of the trial sampled from memory.

Clearly, how the trial memory stores past trials and how trials are sampled from it determine the model's behaviour. Here, as a proof of concept for the replay idea, we adopt the simplest memory-storage mechanism that allows a degree of forgetting and recency effects.[10]

[10] Analytical solutions of optimal memory size and optimal replay mechanism are only

The storage of trials works like a leaky bucket: (a) the memory has a limited capacity (i.e., only a fixed number of trials are remembered); (b) once the number of trials exceeds the memory capacity, a randomly chosen trial is dropped; and (c) the most recent trial is always remembered. The first two rules ensure that past trials are forgotten, and the last two shape that forgetting so that older trials are more likely to have been forgotten than newer ones (i.e., recency), because an older trial has been exposed to more random drops. In addition, we assume the simplest trial-retrieval mechanism: random sampling from the memory bucket with replacement.

This replay process provides samples of past trials that contain the same kind of information as a normal trial: the CS, the US, and their relative timing.
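The storage and retrieval rules can be sketched as a small Python class. This is a hypothetical implementation, not the one used by Ludvig et al.: it assumes that, when the bucket is full, the trial to drop is chosen uniformly from the trials already stored before the newest one is added, which is one way to satisfy rule (c). The class name `LeakyBucketMemory`, its methods, and the use of integers as stand-ins for full trial records (CS, US, timing) are illustrative choices.

```python
import random


class LeakyBucketMemory:
    """Hypothetical leaky-bucket trial memory following rules (a)-(c)."""

    def __init__(self, capacity, seed=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # rule (a): fixed number of stored trials
        self.trials = []
        self.rng = random.Random(seed)

    def store(self, trial):
        if len(self.trials) >= self.capacity:
            # Rule (b): forget a uniformly chosen trial already in memory...
            self.trials.pop(self.rng.randrange(len(self.trials)))
        # ...so that, by rule (c), the newest trial is always kept.
        self.trials.append(trial)

    def sample(self, k):
        """Retrieve k trials uniformly at random, with replacement."""
        if not self.trials:
            return []
        return self.rng.choices(self.trials, k=k)


# Integers 0..999 stand in for trial records; the bucket holds 20 of them.
memory = LeakyBucketMemory(capacity=20, seed=1)
for t in range(1000):
    memory.store(t)

print(len(memory.trials), 999 in memory.trials)  # prints 20 True
print(sum(memory.trials) / len(memory.trials))   # mean index of surviving trials
print(memory.sample(5))                          # five replay candidates
```

Under this assumption each stored trial survives every new arrival with probability 19/20, so survival decays geometrically with age and the surviving trials cluster near the most recent ones, which is the recency effect described above.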

The animal, then, can compute prediction errors for these samples using the Rescorla-Wagner rule:

$$\delta^{\mathrm{replay}}_t = R^{\mathrm{replay}}_t - V_t \qquad (4.4)$$

The difference from the standard Rescorla-Wagner learning rule is that $R^{\mathrm{replay}}_t$ is the remembered US of the replayed trial; $V_t$, however, is still the present associative strength. As in the Rescorla-Wagner model, the associative strength is updated whenever the prediction error from a replayed trial, $\delta^{\mathrm{replay}}_t$, is non-zero:

$$V_{t+1}(s) = V_t(s) + \alpha^{\mathrm{replay}}\, \delta^{\mathrm{replay}}_t\, \mathbb{I}_{A^{\mathrm{replay}}_t}(s), \qquad \alpha^{\mathrm{replay}} < \alpha \qquad (4.5)$$

where a different and smaller learning rate $\alpha^{\mathrm{replay}}$ is used to update the current associative strength, and $\mathbb{I}_{A^{\mathrm{replay}}_t}(s)$ is the indicator function now placed on the replayed trial: it takes the value 1 when stimulus $s$ (e.g., the CS) was presented on the replayed trial, and 0 otherwise.

[Figure 8 image: trial 1, trial t-2, trial t-1, trial t along a time axis; learning; associative strength; random replay]

Figure 8. Schematic of the random replay model. The standard error-correction learning rule is depicted by solid arrows. In addition, the model assumes a memory of past trials, which are randomly sampled and replayed (dashed arrows). The replayed trials are treated like any other trial and are used to update associative strength through the standard error-correction learning rule.
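Putting the pieces together, the sketch below runs the standard update (Eqs. 4.1 to 4.3) on each real trial, stores the trial in a leaky-bucket memory, and then replays randomly sampled trials with the smaller rate α^replay (Eqs. 4.4 and 4.5). The text does not specify how many trials are replayed per real trial or whether replay happens before or after storage, so `n_replay`, `capacity`, `seed`, this ordering, and the default rates are all hypothetical. The prediction for a replayed trial is assumed to be the sum of the current strengths of the stimuli on that trial, by analogy with Eq. 4.2.

```python
import random


def replay_model(trials, alpha=0.1, alpha_replay=0.02, capacity=20,
                 n_replay=5, seed=0):
    """Hypothetical sketch of the random replay model (Eqs. 4.1-4.5).

    Each trial is (present, reward), with `present` the set of stimuli
    on the trial and `reward` the US value.
    """
    if not alpha_replay < alpha:
        raise ValueError("the model assumes alpha_replay < alpha")
    rng = random.Random(seed)
    V = {}       # current associative strengths V_t(s)
    memory = []  # leaky-bucket trial memory

    def update(present, reward, rate):
        # Prediction error from the *current* strengths (Eq. 4.1 or 4.4),
        # then update only the stimuli on that trial (Eq. 4.3 or 4.5).
        delta = reward - sum(V.get(s, 0.0) for s in present)
        for s in present:
            V[s] = V.get(s, 0.0) + rate * delta

    for present, reward in trials:
        update(present, reward, alpha)          # ordinary trial
        if len(memory) >= capacity:             # forget a random stored trial
            memory.pop(rng.randrange(len(memory)))
        memory.append((present, reward))        # newest trial is always kept
        for replayed in rng.choices(memory, k=n_replay):  # with replacement
            update(*replayed, alpha_replay)     # replayed trial
    return V


acquisition = [({"tone"}, 1.0)] * 10
with_replay = replay_model(acquisition)
without_replay = replay_model(acquisition, alpha_replay=0.0)  # plain Rescorla-Wagner
print(round(without_replay["tone"], 3), round(with_replay["tone"], 3))  # prints 0.651 0.873
```

Setting `alpha_replay=0.0` disables the replay updates and recovers the plain Rescorla-Wagner model, so the comparison isolates the contribution of replay: during acquisition every stored trial is reinforced, and replaying them acts as extra, smaller learning steps that push V(tone) closer to the US value.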

4.3 Towards a Unifying Account of Classical

In document The sampling brain (Page 87-91)