In two experiments, we investigated behavioral (Experiments 1 & 2) and physiological (Experiment 2) responses to options that deviated from a previously learned risk–reward environment and were thus surprising. Briefly, between participants, the structure of the environment was manipulated to be negatively correlated, positively correlated, or uncorrelated (Figure 1). In both experiments, participants indicated the price at which they would be willing to sell a monetary gamble of the form “p chance of winning x, otherwise nothing.”
There were two classes of gambles. Environment gambles, denoted as black circles in Figure 1, defined the structure of the environment in a given condition. Test gambles, denoted as triangles in Figure 1, were common to all conditions and used to test our hypotheses. Depending on the environment they were interspersed in, the test gambles belonged to one of three different groups. One subset of the test gambles were surprising gambles (red triangles). These gambles were inconsistent with the risk–reward structure when it was present. A second subset of gambles were expected gambles (blue triangles). These gambles were consistent with the risk–reward structure when it was present (in negative or positive risk–
1In this article we are concerned with people’s evaluations of options in the gain domain; thus, “surprisingly bad” options still involve a gain (but a small and unlikely one).
2Reward-prediction errors, in contrast, involve a mismatch between expected and experienced rewards. Unlike risk-prediction errors, their neurobiological correlates differ depending on the direction of the misprediction: Rewards exceeding expectations lead to a positive reward-prediction error and an increase in dopaminergic firing; rewards worse than expectations lead to a negative reward-prediction error and a decrease in dopaminergic firing rates (Schultz, 2002; Schultz et al., 1997).
[Figure 1 panels: (A) Negative, (B) Positive, (C) Uncorrelated. Horizontal axis: Payoff (E$), ticks from 0 to 2500. Vertical axis: Probability (%), ticks from 0.0 to 1.0. Legend: Expected, Surprising, Reference.]
Figure 1. Prototypical gambles used in the two experiments. Environment gambles (black circles) were drawn from one of the three risk–reward environments. A set of test gambles, which was common to all three conditions, was randomly interspersed after two-thirds (Experiment 1) or one-third (Experiment 2) of the trials. Test gambles are color-coded by gamble type: Test gambles shown in blue were consistent with the risk–reward relationships in a condition and could therefore be expected (in the following referred to as “expected gambles”). Test gambles shown in red were inconsistent with risk–reward relationships in a condition and were therefore surprising (in the following referred to as “surprising gambles”). Test gambles shown in light gray (panel C) served as reference gambles. Here, participants were unlikely to have any expectations about particular risk–reward relationships, due to the environment gambles being uncorrelated (in the following referred to as “reference gambles”).
reward environments). Finally, test gambles that were neither expected nor surprising (in the uncorrelated condition) are referred to as reference gambles (grey triangles). Participants were only exposed to test gambles after sufficient exposure to the environment gambles (see procedures for each experiment for details). Our hypotheses refer to a comparison between either the surprising and expected gambles or the surprising and reference gambles.3
Participants were not instructed to pay attention to the underlying risk–reward structure in either experiment. We examined how pricing and response times (RTs) varied as a function of whether the gambles were surprising or expected, relative to the reference gambles in the uncorrelated condition. In Experiment 2, in addition to the behavioral responses, we tracked pupil size in response to surprising, expected, and reference gambles.
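The labelling of test gambles described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each correlated environment follows a linear risk–reward rule over the payoff range shown in Figure 1, and that a gamble counts as surprising when its probability departs from that rule by more than a fixed tolerance. The linear rule, the tolerance value, and the function names are hypothetical and are not taken from the experiments.

```python
MAX_PAYOFF = 2500.0  # upper end of the payoff axis in Figure 1
TOLERANCE = 0.25     # hypothetical cutoff, not taken from the article


def implied_probability(payoff, environment):
    """Probability that an assumed linear risk-reward rule predicts for a payoff."""
    scaled = payoff / MAX_PAYOFF
    if environment == "negative":
        return 1.0 - scaled  # high payoffs go with low probabilities
    if environment == "positive":
        return scaled        # high payoffs go with high probabilities
    raise ValueError("no risk-reward rule in environment: " + environment)


def classify_test_gamble(p, payoff, environment):
    """Label a test gamble as 'expected', 'surprising', or 'reference'."""
    if environment == "uncorrelated":
        # No structure to learn, so the gamble is neither expected nor surprising.
        return "reference"
    gap = abs(p - implied_probability(payoff, environment))
    return "surprising" if gap > TOLERANCE else "expected"


# The same high-payoff, high-probability test gamble in all three conditions.
labels = {
    env: classify_test_gamble(0.9, 2250, env)
    for env in ("negative", "positive", "uncorrelated")
}
```

Under these assumptions, one and the same test gamble receives a different label in each condition, which is the property that allows test gambles to be compared across environments.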
Hypotheses
Pricing
Our first set of hypotheses focuses on the prices people provide in response to surprising options. Prices may deviate from the gambles’ expected values in the direction of what can be expected: When it was learned that high payoffs usually co-occur with low probabilities, prices given for high payoff/high probability options may undershoot the gambles’ expected values. Conversely, when it was learned that high payoffs usually co-occur with high probabilities, prices given to high payoff/low probability options may overshoot the gambles’ expected values. Alternatively, surprising options could just lead to more error in entering
3Technically, there was another subset of test gambles that could be called “average” gambles. These gambles appeared in each environment and were in the mid-range of the payoffs and probabilities. Because they fit all risk–reward environments equally well and had exactly the same characteristics in every condition, these gambles were used as control stimuli to examine condition-dependent differences. Briefly, as expected, there were no condition-dependent differences for these gambles.
the price of an option as these options have not been encountered in the past. In this case, there should be no systematicity in the direction in which the prices for surprising options deviate from their expected values.
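The two pricing accounts can be told apart by the sign of the deviation between a stated price and the gamble's expected value, which for a gamble of the form “p chance of winning x, otherwise nothing” is p times x. The following is a minimal sketch of that comparison; the function names and the example prices are invented for illustration and are not data from the experiments.

```python
from statistics import mean


def expected_value(p, x):
    """Expected value of a gamble 'p chance of winning x, otherwise nothing'."""
    return p * x


def pricing_deviation(price, p, x):
    """Signed deviation of a stated selling price from the expected value.

    Negative values mean the price undershoots the expected value,
    positive values mean it overshoots.
    """
    return price - expected_value(p, x)


# Invented prices for one high payoff/high probability gamble, as it might be
# priced after learning a negatively correlated environment.
p, x = 0.8, 2000
prices = [1400, 1500, 1350, 1550]

deviations = [pricing_deviation(price, p, x) for price in prices]
mean_deviation = mean(deviations)
```

A mean deviation reliably below zero for such gambles would fit the directional account (undershooting), whereas deviations scattered around zero with larger spread would fit the account in which surprise only adds unsystematic error.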
Response times
Our second set of hypotheses deals with the processing time people allocate to surprising options. One hypothesis was that increasing familiarity with a risk–reward structure of an environment should accelerate the processing of subsequent options consistent with that structure, and decelerate the processing of subsequent options inconsistent with that structure. Such a pattern would be consistent with findings in the domain of event sequence learning (i.e., responses to stimuli in different locations that follow a sequence, e.g., 4–3–2–1–4–3–2–1). Here, longer RTs for stimuli inconsistent with a learned sequence are taken as direct evidence that a sequential stimulus structure has been learned (Rüsseler and Rösler, 2000).
Moreover, and not mutually exclusive with the first hypothesis, response times may vary as a function of the direction of surprise. As noted earlier, a unique feature of surprising value-based stimuli is that options can be surprisingly good or surprisingly bad. Prior research found that people may have a mechanism in place that “prevent[s] impulsive responding due to the presence of high value options” (Cavanagh et al., 2014, p. 2; also see Frank et al., 2007). By extension, participants may respond more slowly to surprisingly good options than to surprisingly bad ones. Note that such a mechanism would also lead to response times increasing as a function of absolute value, rather than as a function of surprise.
Lastly, someone who has learned that risks and rewards are almost perfectly correlated may exploit this statistical regularity (Simon, 1956) and infer one attribute from the other (payoffs from probabilities or probabilities from payoffs) instead of looking up both attributes. This would prevent a decision maker from detecting a surprising stimulus altogether. In this case, response times would not differ as a function of surprise, but there should be strong deviations in the pricing of the options toward what is expected.
Pupil dilation
Inspired by paradigms investigating feedback-based “risk-prediction errors” (Preuschoff et al., 2011), in Experiment 2 we modified the pricing paradigm used in Experiment 1 to measure pupil dilation as participants inspected the properties of a gamble. We did so by sequentially presenting the payoff first, and then the probability for each gamble. We predicted that participants would be surprised by options for which the probability information deviated from the learned risk–reward environment, and that surprise would only manifest itself as the probability information was revealed. That is, similar to Preuschoff et al. (2011), we hypothesized that participants would show greater pupil dilation when an option turned out to be surprising (i.e., to have an unexpected probability, given the payoff), but not in response to just seeing high or low payoffs.
Pupil dilation may be linked to surprising options for different reasons. Recent research has shown that pupil dilation is associated with an increase in the amount of information that is collected when people are presented with very similar options (see Cavanagh et al., 2014, for choices and pupil dilation modeled as a drift diffusion process). This increase in the amount of information collected results in a more rigorous evaluation of the alternatives and longer RTs. A similar mechanism might emerge for surprising options in risk–reward environments, in which a more rigorous evaluation of surprising options may be linked to both longer RTs and increases in pupil size. Thus, participants may scrutinize more carefully only those surprising options that offer a surprisingly high expected value and not those that offer a surprisingly low one (e.g., due to less impulsive responding in the presence of high-value options; Frank et al., 2007).