B. F. Skinner (1938, 1953, 1958, 1966b, 1971, 1989; Skinner & Epstein, 1982) is probably the best-known learning theorist in the behaviorist tradition. Like Thorndike, Skinner proposed that organisms acquire behaviors that are followed by certain consequences. In order to study the effects of consequences both objectively and precisely, Skinner developed a piece of equipment, now known as a Skinner box, that has gained widespread popularity in animal learning research.
As shown in Figure 4.1, the Skinner box used in studying rat behavior includes a metal bar that, when pushed down, causes a food tray to swing into reach long enough for the rat to grab a food pellet. In the pigeon version of the box, instead of a metal bar, a lighted plastic disk is located on one wall; when the pigeon pecks the disk, the food tray swings into reach for a short time.
Skinner found that rats will learn to press metal bars and pigeons will learn to peck on plastic disks in order to get pellets of food. From his observations of rats and pigeons in their respective Skinner boxes under varying conditions, Skinner (1938) formulated a basic principle of operant conditioning, which can be paraphrased as follows:
A response that is followed by a reinforcer is strengthened and therefore more likely to occur again.
In other words, responses that are reinforced tend to increase in frequency, and this increase—a change in behavior—means that learning is taking place.
Figure 4.1 A prototypical Skinner box: The food tray swings into reach to provide reinforcement. (Labeled part: metal bar.)
Skinner intentionally used the term reinforcer instead of reward to describe a consequence that increases the frequency of a behavior. The word reward implies that the stimulus or event following a behavior is somehow both pleasant and desirable, an implication Skinner wanted to avoid for two reasons. First, some individuals will work for what others believe to be unpleasant consequences; for example, as a child my daughter Tina occasionally did something she knew would irritate me because she enjoyed watching me blow my stack. Second, like many behaviorists, Skinner preferred that psychological principles be restricted to the domain of objectively observable events. A reinforcer is defined not by allusion to “pleasantness” or “desirability” (both of which involve subjective judgments) but instead by its effect on behavior:
A reinforcer is a stimulus or event that increases the frequency of a response it follows. (The act of following a response with a reinforcer is called reinforcement.)
Notice how I have just defined a reinforcer totally in terms of observable phenomena, without reliance on any subjective judgment.
Now that I’ve given you definitions of both operant conditioning and a reinforcer, I need to point out a major problem: Taken together, the two definitions constitute circular reasoning. I’ve said that operant conditioning is an increase in a behavior when it’s followed by a reinforcer, but I can’t seem to define a reinforcer in any other way except to say that it increases behavior. I’m therefore using reinforcement to explain a behavior increase and a behavior increase to explain reinforcement! Fortunately, an article by Meehl (1950) has enabled learning theorists to get out of this circular mess by pointing out the transituational generality of a reinforcer: Any single reinforcer (whether it be food, money, a sleepover with a friend, or something else altogether) is likely to increase many different behaviors in many different situations. Once a consequence has been shown to increase one behavior, we can predict that it will increase other behaviors as well, so the concept explains more than the single observation that defined it.
Skinner’s principle of operant conditioning has proven to be a very useful and powerful explanation of why human beings often act as they do, and its applications to instructional and therapeutic situations are almost limitless. Virtually any behavior (academic, social, psychomotor) can be learned or modified through operant conditioning. Unfortunately, undesirable behaviors can be reinforced just as easily as desirable ones. Aggression and criminal activity often lead to successful outcomes: Crime usually does pay. And in school settings, disruptive behaviors can often get teachers’ and classmates’ attention when more productive behaviors don’t (Flood, Wilder, Flood, & Masuda, 2002; McGinnis, Houchins-Juárez, McDaniel, & Kennedy, 2010; J. C. Taylor & Romanczyk, 1994).
As a teacher, I keep reminding myself of what student behaviors I want to increase and try to follow those behaviors with positive consequences. For example, when typically quiet students raise their hands to answer a question or make a comment, I call on them and give them whatever positive feedback I can. I also try to make my classes not only informative but also lively, interesting, and humorous, so that students are reinforced for coming to class in the first place. Meanwhile, I try not to reinforce behaviors that aren’t in students’ long-term best interests. For instance, when a student comes to me at semester’s end pleading for a chance to complete an extra-credit project in order to improve a failing grade, I invariably turn the student down, for a simple reason: I want good grades to result from good study habits and high achievement throughout the semester, not from begging behavior at my office door. Teachers must be extremely careful about what they reinforce and what they don’t.
Important Conditions for Operant Conditioning
Three key conditions influence the likelihood that operant conditioning will occur:
◆ The reinforcer must follow the response. “Reinforcers” that precede a response rarely have an effect on that response. For example, some instructors were concerned that the practice of assigning course grades made students so anxious that they couldn’t learn effectively. Thus, the instructors announced on the first day of class that all class members would receive a final course grade of A. Many students never attended class after that first day, so there was little learning with which any grade might interfere.
◆ Ideally, the reinforcer should follow immediately. A reinforcer tends to reinforce the response that immediately preceded it. As an example, consider Ethel, a pigeon I worked with when I was an undergraduate psychology major. My task was to teach Ethel to peck a plastic disk in a Skinner box, and she was making some progress in learning this behavior. But on one occasion I waited too long after her peck before reinforcing her, and in the meantime she had begun to turn around. After eating her food pellet, Ethel began to spin frantically in counterclockwise circles, and it was several minutes before I could get her back to the disk-pecking response.
Immediate reinforcement is especially important when working with young children and animals (e.g., Critchfield & Kollins, 2001; Green, Fry, & Myerson, 1994). Even many adolescents behave in ways that bring them immediate pleasure (e.g., partying on school nights) despite potentially adverse consequences of their behavior down the road (V. F. Reyna & Farley, 2006; Steinberg et al., 2009). Yet our schools are notorious for delayed reinforcement—for instance, in the form of end-of-semester grades rather than immediate feedback for a job well done.
◆ The reinforcer must be contingent on the response. Ideally, the reinforcer should be presented only when the desired response has occurred, that is, when the reinforcer is contingent on the response. For example, teachers often specify certain conditions that children must meet before going on a field trip: They must complete previous assignments, bring signed permission slips, and so on. When these teachers feel bad for children who haven’t met the stated conditions and allow them to go on the field trip anyway, the reinforcement isn’t contingent on the response, and the children aren’t learning acceptable behavior. If anything, they’re learning that rules can be broken.
Contrasting Operant Conditioning with Classical Conditioning
In both classical conditioning and operant conditioning, an organism shows an increase in a particular response. But operant conditioning differs from classical conditioning in three important ways (see Figure 4.2). As you learned in Chapter 3, classical conditioning results from the pairing of two stimuli: an unconditioned stimulus (UCS) and an initially neutral stimulus that becomes a conditioned stimulus (CS). The organism learns to make a new, conditioned response (CR) to the CS, thus acquiring a CS→CR association. The CR is automatic and involuntary, such that the organism has virtually no control over what it is doing. Behaviorists typically say that the CS elicits the CR.
                       Classical Conditioning                 Operant Conditioning
Occurs when            Two stimuli (UCS and CS) are paired    A response (R) is followed by a reinforcing stimulus (SRf)
Nature of response     Involuntary: elicited by a stimulus    Voluntary: emitted by the organism
Association acquired   CS→CR                                  R→SRf
Figure 4.2
In contrast, operant conditioning results when a response is followed by a reinforcing stimulus (we’ll use the symbol SRf). Rather than acquiring an S→R association (as in classical conditioning), the organism comes to associate a response with a particular consequence, thus acquiring an R→SRf association. The learned response is a voluntary one emitted by the organism, with the organism having complete control over whether the response occurs. Skinner coined the term operant to reflect the fact that the organism voluntarily operates on, and thereby has some effect on, the environment.
Some theorists have suggested that both classical and operant conditioning are based on the same underlying learning processes (G. H. Bower & Hilgard, 1981; Donahoe & Vegas, 2004). In most situations, however, the classical and operant conditioning models are differentially useful in explaining different learning phenomena, so many psychologists continue to treat them as distinct forms of learning.