5.3 Other human optimization tasks
5.3.2 Vision
Another potential application domain of our paradigm involves human vision. Psychologists who investigate the desirability of color combinations may do so by exhaustive search (Schloss & Palmer, 2011). For instance, the experimental paradigm for evaluating preferences for color pairs involves first defining a set of colors, then asking subjects to rate their preference for all possible color pairs in the Cartesian product of that set. This exhaustive procedure is very time-consuming and limits the number of colors that can be considered. It also practically limits the experimenter to considering only color pairs, since the space of all possible color triplets or quadruplets is vast even with a relatively small set of colors.
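To make this scaling concrete, the sketch below counts how many combinations an exhaustive design would have to present for a palette of n colors. The palette size of 30 is a hypothetical value chosen only for illustration and is not taken from Schloss and Palmer (2011). Rating the full Cartesian product requires n^k judgements for k-color combinations, so each additional color in a combination multiplies the workload by n.

```python
from math import comb


def num_combinations(n_colors: int, k: int, ordered: bool = True) -> int:
    """Number of k-color combinations drawn from a palette of n_colors.

    ordered=True counts the full Cartesian product (n**k), e.g. (figure, ground)
    pairs where the role of each color matters; ordered=False counts unordered
    sets of k distinct colors.
    """
    if ordered:
        return n_colors ** k
    return comb(n_colors, k)


# Hypothetical palette size, for illustration only.
n = 30
for k in (2, 3, 4):
    print(k, num_combinations(n, k))
```

With 30 colors, pairs need 900 ratings, but quadruplets already need 810,000. This is why exhaustive designs are effectively limited to pairs.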
Formally, we want to use preference judgements to infer the desirability of color combinations and to interpolate to new, unjudged color combinations. We present a sequence of color stimuli $x_1, x_2, \ldots, x_N$ and collect preference judgements $y_1, y_2, \ldots, y_N$ (Figure 5.10). Each judgement lies on a bounded rating scale, $y_i \in [a, b]$ (e.g., 0-5 stars). We assume that each stimulus $x_i$ has an associated unobservable latent affinity, or quality, $f(x_i) \in \mathbb{R}$, where $f$ is distributed as a Gaussian process. Preference judgements are a noisy mapping from $f(x_i)$ to the rating scale.
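The generative model can be simulated directly, which helps make the assumptions explicit. The sketch below makes several assumptions that the text does not specify. A squared-exponential kernel is used. A constant prior mean is set at the middle of a 0-5 scale. Each stimulus is encoded as a 6-D vector of two concatenated 3-D color coordinates. The latent affinity is drawn from the GP, and Gaussian noise is added. The result is then clipped to the ends of the rating scale, which is the noisy, bounded mapping described above. The function names are hypothetical.

```python
import numpy as np


def sq_exp_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of X1 and X2 (assumed kernel)."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def simulate_ratings(X, a=0.0, b=5.0, sigma=0.5, mean=2.5, rng=None):
    """Sample latent affinities f ~ GP(mean, k) at stimuli X, then clipped ratings y.

    The constant prior mean, kernel, and noise level are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    K = sq_exp_kernel(X, X) + 1e-6 * np.eye(n)     # small jitter keeps Cholesky stable
    f = mean + np.linalg.cholesky(K) @ rng.standard_normal(n)
    y_latent = f + sigma * rng.standard_normal(n)  # noisy, unbounded judgement
    y = np.clip(y_latent, a, b)                    # the rating scale clips to [a, b]
    return f, y


rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 6))   # hypothetical encodings of 20 color pairs
f, y = simulate_ratings(X, rng=rng)
```

The clipping step is the reason a plain Gaussian likelihood is inappropriate. Any simulated rating that lands exactly on a or b stands in for a whole range of latent values.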
Because the rating scale is bounded, if a subject has a strong preference for a color pair, he or she may give it the maximum rating; if the subject has a very strong preference for the pair, he or she is forced to give the same maximal rating. The rating scale thus “clips” $y$ to the range $[a, b]$. Accordingly, the likelihood we use is a normal distribution censored to $[a, b]$: all probability mass above $b$ is moved to $b$, and all probability mass below $a$ is moved to $a$. This gives the mixed distribution
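$$p(y_i \mid f, x_i) = \Phi\!\left(\frac{a - f(x_i)}{\sigma}\right)\mathbb{1}_{y_i = a} + \sigma^{-1}\,\mathcal{N}\!\left(\frac{y_i - f(x_i)}{\sigma}\right)\mathbb{1}_{a < y_i < b} + \Phi\!\left(\frac{f(x_i) - b}{\sigma}\right)\mathbb{1}_{y_i = b} \qquad (5.11)$$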
where $\mathcal{N}$ denotes the standard normal probability density function, $\Phi$ denotes the standard normal cumulative distribution function, $\sigma^2$ is the noise variance, and $\mathbb{1}$ is the indicator function. By assumption, observations are conditionally independent given $f$, so the full data likelihood factorizes as $p(\mathbf{y} \mid f, X) = \prod_i p(y_i \mid f, x_i)$. For posterior inference, we use Laplace's method. It approximates the intractable posterior $p(f \mid X, \mathbf{y})$ with a Gaussian centered at the posterior mode, obtained from a second-order Taylor expansion of the log posterior around that mode.
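As a sketch of how Laplace's method can be carried out with the censored likelihood of Eq. (5.11), the code below first evaluates the log-likelihood, its gradient, and the negative second derivative $W$ with respect to $f$. Because the ratings are conditionally independent, the Hessian is diagonal, so both the gradient and $W$ are vectors. The code then finds the posterior mode with the Newton update commonly used for Laplace approximations in GP classification (Rasmussen & Williams, 2006, Algorithm 3.1), with this likelihood substituted.

This is not necessarily the implementation used for the results here. The constant prior mean, the function names, and the stopping rule are assumptions. SciPy's `log_ndtr` is used for a numerically stable $\log \Phi$.

```python
import numpy as np
from scipy.special import log_ndtr

LOG_SQRT_2PI = 0.5 * np.log(2.0 * np.pi)


def censored_loglik_terms(y, f, a, b, sigma):
    """Eq. (5.11) log-likelihood, its gradient, and W = -(second derivative) in f."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    ll = np.empty_like(f)
    grad = np.empty_like(f)
    W = np.empty_like(f)

    lo = y <= a               # ratings clipped to the bottom of the scale
    hi = y >= b               # ratings clipped to the top of the scale
    mid = ~(lo | hi)          # ratings strictly inside (a, b)

    # Interior ratings: Gaussian density sigma^{-1} N((y - f) / sigma).
    r = (y[mid] - f[mid]) / sigma
    ll[mid] = -0.5 * r**2 - LOG_SQRT_2PI - np.log(sigma)
    grad[mid] = r / sigma
    W[mid] = 1.0 / sigma**2

    # Censored ratings: log Phi(z), z = (a - f)/sigma at y = a and (f - b)/sigma at y = b.
    for mask, sign, z in ((lo, -1.0, (a - f[lo]) / sigma),
                          (hi, 1.0, (f[hi] - b) / sigma)):
        log_cdf = log_ndtr(z)                               # stable log Phi(z)
        lam = np.exp(-0.5 * z**2 - LOG_SQRT_2PI - log_cdf)  # inverse Mills ratio phi/Phi
        ll[mask] = log_cdf
        grad[mask] = sign * lam / sigma
        W[mask] = lam * (z + lam) / sigma**2                # always positive: log-concave
    return ll.sum(), grad, W


def laplace_mode(K, y, a, b, sigma, mean=0.0, tol=1e-8, max_iter=100):
    """Newton iterations for the mode of p(f | X, y) under a GP(mean, K) prior.

    Returns the mode f_hat plus the gradient and W evaluated there; at the mode,
    f_hat - mean = K @ grad.
    """
    n = len(y)
    g = np.zeros(n)                                    # g = f - mean
    for _ in range(max_iter):
        _, grad, W = censored_loglik_terms(y, mean + g, a, b, sigma)
        sW = np.sqrt(np.maximum(W, 0.0))               # guard against round-off below 0
        B = np.eye(n) + sW[:, None] * K * sW[None, :]  # B = I + W^1/2 K W^1/2
        rhs = W * g + grad
        alpha = rhs - sW * np.linalg.solve(B, sW * (K @ rhs))
        g_new = K @ alpha                              # = (K^-1 + W)^-1 (W g + grad)
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    _, grad, W = censored_loglik_terms(y, mean + g, a, b, sigma)
    return mean + g, grad, W
```

The Gaussian approximation is then $\mathcal{N}(\hat f, (K^{-1} + W)^{-1})$. Working through $B$ avoids inverting $K$ directly. Because $W > 0$ for every rating, $B$ is always well conditioned.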
Using this model on a dataset from Schloss and Palmer (2011) in which subjects rated their affinity for pairs of colors, we can make predictions about the optimality of color combinations, even if the colors involved have not been tested. For example, in Figure 5.11, we systematically varied one color of the pair, the ground color, which is shown as the background. For each background color, we used the model to predict the best- and worst-matching colors; these predictions are shown as the smaller squares. This method of interpolating across subjects' preferences to new, unseen colors shows promise for allowing researchers to systematically explore larger color spaces.
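The prediction step for such a figure can be sketched as follows. Under the Laplace approximation, the predictive mean at a new pair $x_*$ is the prior mean plus $k(x_*, X)^\top \nabla \log p(\mathbf{y} \mid \hat f)$, because $\hat f - m = K \nabla \log p(\mathbf{y} \mid \hat f)$ at the mode. For each ground color, we pair it with every candidate figure color and take the arg max and arg min of that mean.

The following are assumptions and not details given in the text: the encoding of a pair as concatenated ground and figure coordinates, the squared-exponential kernel, and the hypothetical function names. The vector `alpha` is assumed to come from a separate Laplace fit.

```python
import numpy as np


def sq_exp_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of X1 and X2 (assumed kernel)."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def best_and_worst_figures(grounds, candidates, X_train, alpha, mean=0.0, lengthscale=1.0):
    """For each ground color, return indices of the candidate figure colors with the
    highest and lowest predicted affinity.

    Predictive mean: mean + k(x*, X_train) @ alpha, where alpha is the
    log-likelihood gradient at the posterior mode of a Laplace fit.
    """
    best, worst = [], []
    for ground in grounds:
        # Pair this ground color with every candidate figure color.
        pairs = np.hstack([np.tile(ground, (len(candidates), 1)), candidates])
        mu = mean + sq_exp_kernel(pairs, X_train, lengthscale) @ alpha
        best.append(int(np.argmax(mu)))
        worst.append(int(np.argmin(mu)))
    return np.array(best), np.array(worst)
```

Because the candidate figure colors need not appear in the training pairs, this is the step that interpolates to untested colors.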
[Figure 5.10 plot; axes: Ground Color, Figure Color, Average Preference Rating]
Figure 5.10: A visualization of the color preference ratings dataset. Each bar represents a particular color pair. The edges of a bar represent one color from the pair, and the interior color represents the other color from the pair. Each subject rated his or her preference for every color pair shown. The height of each bar represents the across-subject average preference.
Figure 5.11: Predicted most and least preferred color pairings for a fixed ground lightness level with varying hue and saturation levels.
Figure 5.12: Predicted most and least harmonious color pairings for a fixed ground lightness level with varying hue and saturation levels.
Chapter 6
Effectiveness of different study formats
Retrieval practice study, which involves both quizzing and reviewing, results in stronger and more durable memories than reviewing alone (H. Roediger & Karpicke, 2006a). However, incorporating quizzing into electronic tutoring systems is impractical for many common types of study materials: quiz answers that are visual, auditory, or procedural in nature cannot readily be entered into or assessed by a computer. A leading theoretical account of the mnemonic benefits of testing holds that these benefits result from memory traces being strengthened by the act of memory retrieval (Bjork, 1975). In this chapter, we investigate an important practical implication of this theory: once memory retrieval has occurred, it should not be necessary to physically enter the response into a computer to reap the benefits of retrieval practice.
Most studies of retrieval practice effects have required subjects to make overt responses during study, producing a response by writing, typing, or speaking (Smith, Roediger, & Karpicke, 2013). Some studies suggest that covert retrieval, in which subjects mentally rehearse their response without physically producing it, is more beneficial than simply restudying (Izawa, 1976; Carpenter & Pashler, 2007; S. Kang, 2010; Putnam & Roediger, 2013). However, few studies directly compare the effectiveness of overt and covert retrieval practice. Smith et al. (2013) compared the two and found no difference in recall on a test shortly following study. In this chapter, we provide empirical evidence that the covert response modality can be more effective than the overt response modality on tests shortly after study. We also show that the apparent equivalence of the two is an artifact of holding time per trial constant. Though individual covert retrieval practice trials may be less effective than individual overt retrieval practice trials, they are faster, so students can complete substantially more trials in any fixed time window. This yields higher recall at short retention intervals and equivalent recall at longer retention intervals.
6.1 Experiment 1: Constant time per trial
Experiment 1 was a two-session experiment that used a between-subjects design to compare the efficacy of covert and overt retrieval practice study on foreign-language vocabulary when time per trial is held constant. A within-subject factor varied the heuristic used to determine which vocabulary item to present next to a subject during study. The first session was divided into two blocks, one per scheduling heuristic. Within each block, students received an initial presentation of the material followed by 10 minutes of retrieval practice study. They were then tested on a random subset of the material after a 10-minute retention interval filled with a distractor task. The second session occurred 48 hours later and tested students on all the remaining material.
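The session structure can be summarized as a small planning sketch for one subject. Each block's items are split at random into a subset tested after the 10-minute retention interval and a remainder tested 48 hours later.

Several details are not given in the text and are hypothetical here:
- the fraction of items tested in session 1,
- whether each block uses its own item set,
- the block order,
- the heuristic labels and function names.

In practice, block order would presumably be counterbalanced or randomized across subjects, but that is not stated.

```python
import random


def split_test_items(items, session1_fraction, rng):
    """Randomly split a block's items into a session-1 test subset and the remainder."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    k = round(len(shuffled) * session1_fraction)
    return shuffled[:k], shuffled[k:]


def build_session_plan(items_by_heuristic, session1_fraction=0.5, seed=None):
    """Hypothetical plan for one subject: one block per scheduling heuristic.

    Each block: initial presentation, 10 min of retrieval practice, a 10 min
    distractor-filled retention interval, then a test on a random subset; the
    remaining items are tested in session 2, 48 hours later.
    """
    rng = random.Random(seed)
    plan = []
    for heuristic, items in items_by_heuristic.items():
        session1, session2 = split_test_items(items, session1_fraction, rng)
        plan.append({
            "heuristic": heuristic,
            "practice_minutes": 10,
            "retention_interval_minutes": 10,
            "session1_test": session1,
            "session2_delay_hours": 48,
            "session2_test": session2,
        })
    return plan
```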