Learning Patterns in Noise: Environmental Statistics Explain the Sequential Effect

(1)

Learning Patterns in Noise: Environmental Statistics Explain the Sequential Effect

Friederike Schüür ([email protected]), Brian Tam ([email protected]), Laurence T. Maloney ([email protected])

Department of Psychology, 6 Washington Place New York, NY 10003 USA

Abstract

Subjects display sensitivity to local patterns in stimulus history (sequential effects) in a variety of decision-making, perceptual, and motor tasks. Sequential effects are typically cast as a prime example of human irrationality. We propose a Bayesian model that explains sequential effects as the natural consequence of one incorrect assumption: instead of assuming a stable world, subjects assume change. We test and confirm one specific prediction of our model in a 2-alternative forced-choice reaction time task. We manipulated participants’ beliefs about the comparative likelihoods of binary events and then showed that these beliefs determine biases in sequential effects. We conclude that the origin of sequential effects is belief in a changing world.

Keywords: sequential effects, forced choice reaction time

task, generative model, decision-making, Bayes

Sequential Effects in Decision Making,

Perception, and Motor Control

In Monte Carlo in the summer of 1913, many lost a fortune in a game of roulette. Gamblers bet millions of francs against black when it came up 26 times in a row. They committed what is known as the Gambler’s fallacy in believing that the probability of red increased after many occurrences of black. The Gambler’s fallacy is just one example of how subjects use local patterns in series of stochastic events to try to predict what is going to happen next. Sensitivity to local patterns or “sequential effects” are a pervasive phenomenon in decision-making (Ayton & Fischer, 2004; Barron & Leider, 2010; Gilovich, Vallone, & Tversky, 1985; Kahneman & Tversky, 1972; Roney & Trick, 2009), perception (Howarth & Bulmer, 1956; Maloney, Martello, Sahm, & Spillmann, 2005; Speeth & Mathews, 1961), and motor behavior (Cho et al., 2002; Soetens, Boer, & Hueting, 1985). Previous studies have found sequential effects in, for example, 2-alternative forced-choice reaction times tasks (Remington, 1969) even after more than 4000 trials (Soetens et al., 1985). If intervals between trials exceed 500ms, then sequential effects are not due to automatic facilitation but reflect subjective expectancy (Soetens et al., 1985; Soetens, 1998).

Given statistical independence of stochastic events, however, predictions based on local patterns are of no predictive value and often costly. Why, then, do we find sequential effects in a wide variety of tasks and in human and non-human subjects? Do we fall prey to superstitions (Skinner, 1948) and end up having to pay the price? Instead of casting sequential effects as a prime example of human failure and irrationality, we develop a rational, Bayesian account of sequential effects. We propose that sequential

effects are driven by mechanisms critical for adapting to a changing (i.e. dynamic) world (cf. Wilder, Jones, & Mozer, 2009; Yu & Cohen, 2008). We test one specific prediction of our model with a novel 2-alternative forced-choice reaction time task, aimed at first manipulating participants’ beliefs about the comparative likelihoods of binary events.

Predicting the Future by Using a Generative Model

If we could predict the future, then we would not have to agonize about which job offer to accept or where to go to college. We would make better decisions because we would know which alternative led to the preferred outcome. Similarly, foreknowledge about upcoming sensory stimuli or imperative events improves sensory perception and motor behavior (Hyman, 1953; Klemmer, 1957), respectively.

If we understood the processes that generated past, present, and future events (i.e. their generative model), then we could predict the future. For deterministic processes, we would know which course of action we would have to take to obtain the desired outcome with certainty. For probabilistic processes, we could assign a probability to each action that it leads to the desired outcome and then pick the one action with the highest probability. Take the outcome of a coin toss, for example. The coin might come up heads with probability _p and tails with probability 1− p . In fact, _p fully specifies the generative model of a coin toss where binary outcomes are drawn from a Bernoulli distribution with rate parameter _p. If one knows _p and p ≠ 0.5 then one should bet on the outcome with p > 0.5. Further, if events drawn from a Bernoulli distribution are stochastically independent, then once _p is known, the outcome of a previous coin toss should not affect the subject’s subsequent betting behavior. In other words, there should be no sequential effects.

Similarly, the outcome of 2-forced choice reaction time tasks, for example, whether a stimulus is going to show up to the left or right of fixation, is a Bernoulli process with probability _p for left and _{1− p} for right. Instead of location, we may also track whether a stimulus is likely to show up at the same (left-left / right-right), or alternate locations (left-right / right-left) (cf. Cho et al., 2002; Soetens et al., 1985; Wilder et al., 2009; Yu & Cohen, 2008). If the outcomes left / right are sampled from a Bernoulli distribution, then whether we observe a repetition or alternation is again a stochastic process with a given probability of repetition γ (and alternation 1−γ). Therefore, once γ is known, there should be no sequential effects. The presence of sequential effects in reaction times,

(2)

however, shows that participants do not follow the optimal policy. But the systematic dependency on local stimulus history also suggests that do not simply fail to learn γ . Instead, they must follow some (incorrect) strategy in estimating γ that underlies sequential effects. We propose that participants make just one mistake: instead of stable worlds with fixed γ they assume dynamic worlds with changing _γ_t.

Stable versus Dynamic Worlds In stable worlds, γ does not change over time but in dynamic worlds, γ may have different values at different times _γ_t. In Figure 1, we illustrate three worlds: one stable world (black line) and two dynamic worlds in which _γ_t_{jumps from one value to the} next at random, Poisson-distributed time-points with constant rate δ. Each new value of _γ_t is drawn from a distribution that can be biased towards higher probabilities (Figure 1b) or lower probabilities (orange; Figure 1c). We propose that sequential effects reflect participants’ efforts to estimate γ over time.

“Whack-a-Mole (elMo)” Task

Subjects completed a 2-forced choice reaction time task based on the arcade game “Whack-A-Mole”. Sesame Street’s Elmo appeared either to the left or right of fixation. Subjects were instructed to press a button with their right or left index finger depending on Elmo’s location as soon as it appeared (Figure 2). Subjects completed three sessions. In Session 1 and 3, Elmo was equally likely to pop up at the same or alternate locations relative to the immediately preceding trial. In other words, the repetition probability was _{γ =}0.5 and did not change over time. In Session 2, however, _γ_t was resampled from a Beta-distribution with a slight bias towards repetitions Beta(12,6) or alternations Beta(6,12). Subjects were randomly assigned to the repetition-bias (N=12) or alternation-bias group (N = 13). Resampling occurred randomly at constant rate _{δ = 0.18}. Each change in _γ_t was signaled explicitly to the participants.

By manipulating the repetition probability _γ_t, we ensured that across trials, Elmo was equally likely to appear to the left or right of fixation. As such, observed effects cannot be explained by stronger preparation or an advantage (for example, due to handedness) of either of the left or right hand across sessions. Observed effects of changing _γ_t can be explained only by subjects’ tracking repetitions and alternations (i.e. 2nd_{order characteristics of the sequence)}

whilst responding to stimuli that appear on the left or right to fixation (1st_{order characteristics).}

To minimize effects of Elmo’s timing on RTs, fixation changed color from white (250ms), to red (250ms), to blue prior to Elmo’s appearance. After the final color change, Elmo popped up after a variable interval sampled from a truncated exponential distribution (mean = 500ms, max. 2s) to ensure constant probability of appearance over time. Also, intervals between trials exceeded 500ms, which precludes any explanation of sequential effects as automatic facilitation (Soetens et al., 1985).

Reaction Times as a Measure of Implicit Expectation

Reaction times (RTs) increase monotonically with the amount of information conveyed by an event that requires subjects to respond (Hyman, 1953). Conversely, subjects respond quickly to events they expect to occur, because predictable events convey little information. For example, subjects respond quickly to frequent compared to infrequent stimuli (Hyman, 1953) and if stimuli occur at expected compared to unexpected times (Klemmer, 1957). RTs are thus informative about participants’ estimated probabilities of stimulus-occurrence. In the current experiment, we measured and analyzed RTs as a function of stimulus history to explore how local patterns in stimulus history affected participants’ estimates of stimulus probabilities.

Learning the Generative Model

Observed events are informative about the probability of repetition γ . In Figure 3a, we show simulation results, assuming a stable world, of trial-by-trial, iterative updating of γ as a function of observed events. The blue line shows Figure 1: a Stable versus b dynamic environments.

(3)

estimates of γ when subjects had no estimate of γ prior to the experiment (blue). In other words, they considered each possible value of γ to be equally likely (i.e. uniform prior). The green line shows estimates of γ for a non-uniform prior with _{γ >}0.5 (i.e. repetition bias). With or without bias, γˆt quickly converges to the true value of 0.5.

In Figure 3b, we show stimulation results of trial-by-trial, iterative updating of γ if participants incorrectly assumed a dynamic environment either with no bias (blue) or with a repetition bias (green). For a uniform prior, _γˆ_t fluctuate over time around the true value of 0.5. For repetition-bias, γˆt are relatively less variable but biased towards higher

values than 0.5.

This variability is caused by discounting the evidence of information conveyed by past events. If subjects assume change, then events in the distant past are less likely to be informative about the current probability _γ_t than events in the recent past. Consequently, participants discount past events. This results in higher variability in estimates, because discounting effectively reduces the number of events participants use to derive estimates (for details of the model, see Model).

Hypotheses In Figure 4 we show simulation results of _γˆ_t as a function of local stimulus history assuming a stable world (Figure 4a) and assuming a dynamic world (Figure 4b). If subjects learned that γ was stable (and _{γ =}0.5), then there should be no sequential effects (Figure 4a). By contrast, if participants assumed a dynamic world, then probability estimates reflect sensitivity to local patterns in stimulus history (Figure 4b).

Stimulations results also reveal that sequential effects should reflect subjects’ repetition or alternation biases. If subjects believe that in a changing world and in _γ_t biased

towards repetition, then sequential effects reflect this bias (Figure 4bd, green line), similarly for alternations (orange line). If participants have a uniform prior, then sequential effects are symmetric (blue line). If participants assume a stable world, then any belief in bias has little effect on sequential effects (Figure 4ac).

We were particularly interested in the effects of repetition versus alternation training during Session 2 on sequential effects in Session 3. During Session 2, we resampled the probability of repetition from a biased Beta-distribution and explicitly signaled each change in probability to the participant to induce the belief in a dynamic world with a bias either towards repetition or alternation. If sequential effects indeed originate in the (incorrect) belief in a dynamic environment, then we would expect to find sequential effects in Session 3. The bias in sequential effects towards either repetition or alternation should correspond to the repetition or alternation training that participants received during Session 2. First, such bias would demonstrate that participants can their environmental statistics. Second and more importantly, such bias would reveal the true nature of sequential effects as phenomenon that originates in the belief that environments are dynamic and change over time. Sequential effects in Session 1, either with or without bias, would then inform us about the beliefs participants held when they walked into our laboratory. It is important to take note that there is no naïve observer (Henrich, Heine, & Norenzayan, 2010). Every subject enters an experiment with his or her own ideas and biases about what is going on based on personal history.

Results

Figure 5 shows participants normalized RTs as a function of local stimulus-history for Session 1 (Figure 5a) and Session 3 (Figure 5b). How quickly participants responded to a repetition or alternation (final event) was determined by Figure 3: Iterative updating assuming a a stable b

dynamic environment.

Figure 4: Hypotheses if subject assume a a stable b dynamic environment.

(4)

local stimulus-history (final event * stimulus-history: F(1,24) = 92.40, p <0.001). If subjects experienced a repetition, we found that RTs increased with fewer occurrences of repetitions in recent history (positive slope of linear fit: mean = 0.088, SE = 0.013) while we found a decrease in RTs with more alternations in recent history when the final event was an alternation (mean = -0.084, SE = 0.011; t(24) = 8.52, p < 0.001). In essence, occurrence of a repetition or alternation in local stimulus-history leads to a decrease in RTs for repetition and alternation, respectively. RTs show clear sequential effects.

The bias in sequential effects towards repetition or alternation changed from Session 1 to Session 3. Importantly, this change depends on the training participants received during Session 2 (session * final event * training: F(1,24) = 6.39, p = 0.012) . For both groups, we found a bias towards alternations prior to training (mean = -0.132; SE = 0.022) compared to repetitions (mean = 0.124, SE = 0.024; final event: F(1,24) = 24.26, p < 0.001). After training, bias was different for each group (final event *

training group: F(1,24) = 8.89, p = 0.005). If repetition

trained participants experienced an alternation, then it took them longer to respond (mean = 0.076, SE = 0.063), compared to alternation trained participants (mean = -0.092, SE = 0.045; t(24) = -2.23, p = 0.036; Figure 5d). And conversely, repetition trained participants responded faster when they experienced a repetition (mean = -0.011, SE = 0.057) compared to alternation trained participants (mean = 0.118, SE = 0.043; t(24) = 1.86, p = 0.076; Figure 5d). Experiencing a dynamic environment with a bias either towards repetitions or alternations determined the bias in sequential effects in a subsequent stable environment.

Discussion

Results show that participants can learn the statistics of their environment. Training participants in a dynamic environment, in which they experienced changing with

a bias toward either alternation or repetition, determines the bias in sequential effects in a subsequent session with constant and no bias. These specific effects of training suggest that sequential effects are not due to human failure or irrationality. Instead, sequential effects are driven by one mistake: instead of assuming a stable environment, participants assume a dynamic environment. Upon this assumption, trial-by-trial iterative updating of the rate parameter of the generative model leads to sequential effects. Any biases in sequential effects reflect participants’ assumptions about the likely values of the rate parameter (rather than automatic facilitation) (Wilder et al., 2009).

The tabula rasa hypothesis in psychology The (incorrect)

belief in a dynamic rather than stable environment may be cast as human failure and irrationality. We argue, however, that we cannot and should not judge the beliefs that participants hold when they enter the laboratory. Participants may live in a world in which they experience change. For example, they may perform much better on their exams in the morning compared to late afternoon. In fact, if changing environments are more common than stable environments, then participants are more justified to believe in change than stability. Sequential effects may reflect that participants are finely tuned to the statistic of their environment, rather than superstitious beliefs. Indeed, our results show that participants can quickly learn the statistics of their environment.

We call the experimenter’s belief that participants enter a laboratory without prior knowledge the tabula rasa hypothesis. The tabula rasa hypothesis is most likely incorrect (see Sun & Perona, 1998 for an example from vision research). To determine “rational priors”, one would have to study the statistics of natural environments (cf. Simoncelli & Olshausen, 2001). To determine the effects of priors on behavior, one can manipulate them during training and then measure effects on subsequent behavior (Adams, Graf, & Ernst, 2004). Training effects then reveal which aspects of behavior are determined by prior beliefs. Given the training effect in Session 3, we can thus conclude that the alternation bias in Session 1 is also driven by prior beliefs: when participants entered our laboratory, they thought alternations were more likely compared to repetitions. We can only speculate why it is that participants believed in alternation. Perhaps they regarded Elmo as an opponent trying to trick them by varying rather than repeating its location? We would have to study behavior in competitive games in participants’ natural environments to determine whether such bias is rational (cf. Tversky & Gilovich, 1989).

Uniform versus non-uniform priors It may seem better to

enter a novel situation (e.g. an experiment) with non-informative, uniform priors rather than beliefs in repetitions or alternations. In Figure 6a, we show that in a stable environment, assuming a non-uniform prior leads to a

γ

t

γ

(5)

reduction in summed squared error when participants assumed a dynamic environment with the same, constant rate of change δ. Figure 6b to 6d show the priors (i.e. all combinations of the two parameters a and b as inputs to the Beta-distribution that formalizes the prior) that led to reduced error for three different change rates (area inside the black lines). Of course, assuming a prior with a mean of 0.5 improves performance. However, priors that differ substantially from the true 0.5 still lead to an improvement in participants’ estimates. In essence, priors reduce the variability in participants’ estimates of _γˆ_t but may induce (constant) bias (if _{≠ 0.5}). There is a trade-off between bias and variability (cf. Jazayeri & Shadlen, 2010). A constant bias is sometimes better than the variability caused by a uniform prior.

Model

Iterative Updating assuming a Stable World

Updated estimate of γ are based on the set of binary observations observed so far

{

x1,..., xt

}

, the number of

observed repetitions r_t (and alternations t− rt). In addition,

estimates may be influenced by participants’ beliefs about γ prior to the experiment. Bayes’ Rule tells us how to compute this estimate:

p(γ | xt)∝ p(xt|γ ) p(γ ) = γ

rt+a(1−γ )t−rt+b (1)

The prior p(γ ) is a Beta distribution with p(γ ) = Β(a +1,b +1) and the predicted probability of seeing a repetition on the next trial is the mean of the posterior

distribution p(γ | xt), a Beta distribution with

Β(a+ rt+1,b +1− rt+1) (see Figure 3a).

Iterative Updating assuming a Dynamic World

When participants assume a dynamic world, the outcome of an event is weighted by how long ago it occurred. The weight is determined by w_u= exp−λu_.

u stands for the

number of trials in the past relative to the current trial _t. λ is a constant that determines how quickly information conveyed by past events is discounted. r_t=

∑

w_t_−uR_t_−u with

R for repetition (and the equivalent for alternations at).

Participants thus estimate γ based on current evidence alone, which corresponds to the mean of the Beta-distribution Beta(r_t,a_t). In addition, participants have prior expectation about γ. The prior p(γ ) is a Beta distribution with p(γ ) = Β(a +1,b +1) and participants’ prior estimate is its mean. Participants combine these two estimate of γ in a weighted sum:

wcpc(γ | xt)+ (1− wc) p(γ ) (2)

w_cis determined by participants’ confidence in their estimate based on current evidence and their prior, which is inversely proportional to the variance of these two Beta-distributions. This results in a single unique estimate γˆ_t on trial _t (see Figure 3b).

Acknowledgments

FS and LTM were funded by NIH #.

References

Adams, W. J., Graf, E. W., & Ernst, M. O. (2004). Experience can change the “light-from-above” prior.

Nature Neuroscience, 7(10), 1057–1058.

Ayton, P., & Fischer, I. (2004). The hot hand fallacy and the gambler’s fallacy: two faces of subjective randomness?

Memory & cognition, 32(8), 1369–1378.

Barron, G., & Leider, S. (2010). The role of experience in the Gambler’s Fallacy. Journal of Behavioral Decision

Making, 23(1), 117–129.

Cho, R. Y., Nystrom, L. E., Brown, E. T., Jones, A. D., Braver, T. S., Holmes, P. J., & Cohen, J. D. (2002). Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task.

Cognitive, Affective, & Behavioral Neuroscience, 2(4),

283–299.

Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3), 295–314. Figure 6: Effects of non-uniform prior for ab and

(6)

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain

Sciences, 33(2-3), 61–83.

Howarth, C. I., & Bulmer, M. G. (1956). Non-random sequences in visual threshold experiments. Quarterly

Journal of Experimental Psychology, 8(4), 163–171.

Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, 45(3), 188–196.

Jazayeri, M., & Shadlen, M. N. (2010). Temporal context calibrates interval timing. Nature neuroscience, 13(8), 1020–1026.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive

Psychology, 3(3), 430–454.

Klemmer, E. T. (1957). Simple reaction time as a function of time uncertainty. Journal of Experimental Psychology,

54(3), 195–200.

Maloney, L. T., Martello, M. F. D., Sahm, C., & Spillmann, L. (2005). Past trials influence perception of ambiguous motion quartets through pattern completion. Proceedings

of the National Academy of Sciences of the United States of America, 102(8), 3164–3169.

Remington, R. J. (1969). Analysis of sequential effects on choice reaction times. Journal of Experimental

Psychology, 82(2), 250–257.

Roney, C. J. R., & Trick, L. M. (2009). Sympathetic magic and perceptions of randomness: The hot hand versus the gambler’s fallacy. Thinking & Reasoning, 15(2), 197– 210.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual review of

neuroscience, 24(1), 1193–1216.

Skinner, B. F. (1948). “Superstition” in the pigeon. Journal

of Experimental Psychology, 38(2), 168–172.

Soetens, E. (1998). Localizing sequential effects in serial choice reaction time with the information reduction procedure. Journal of Experimental Psychology: Human

Perception and Performance, 24(2), 547–568.

Soetens, E., C, L., & E, J. (1985). Expectancy or automatic facilitation? Separating sequential effects in two-choice reaction time. Journal of Experimental Psychology:

Human Perception and Performance, 11(5), 598–616.

Speeth, S. D., & Mathews, M. V. (1961). Sequential effects in the signal detection situation. Journal of the Acoustical

Society of America, 33, 1046–1054.

Sun, J., & Perona, P. (1998). Where is the sun? Nature

Neuroscience, 1(3), 183–184.

Tversky, A., & Gilovich, T. (1989). The “Hot Hand”: Statistical Reality or Cognitive Illusion? CHANCE, 2(4), 31–34.

Wilder, M., Jones, M., & Mozer, M. (2009). Sequential effects reflect parallel learning of multiple environmental regularities. Advances in neural information processing

systems, 22, 2053–2061.

Yu, A. J., & Cohen, J. D. (2009). Sequential effects: Superstition or rational behavior? Advances in neural