The New Behaviorism
14 Theoretical Behaviorism
avoided dealing with the deeply historical nature of learned behavior in higher organisms.
I will give two examples of historical effects, the first hypothetical, and the second involving actual data. Imagine an experimental situation with pigeons that involves not one but two response keys. Pecks on each key are paid off like a Las Vegas–style one-armed bandit, probabilistically (I discussed a setup like this in Chapter 8). Let’s look at two different experimental histories. Pigeon A is trained at first to peck on the left response key and paid off on the probabilistic schedule; pecks on the right have no effect. After fifty or so reinforcements for pecking left, bird A is trained to peck only on the right; pecks on the left go unrewarded. After 300 reinforcements for right pecks (this might take four or five daily experimental sessions), pecks on neither key are rewarded (this is termed extinction). Pigeon B is trained throughout (350 reinforcements) to peck the right key only for food, then extinguished, like A.
There are two things to notice about this simple experiment. On the last day of rewarded training, the behavior of the two pigeons will be indistinguishable: both will be pecking exclusively on the right key. But in extinction, their behavior will be very different. Bird B, which has never gotten food for pecking left, will simply peck more and more slowly on the right, with maybe one or two pecks on the left, until pecking ceases altogether. But Bird A, with its history of a few rewards for pecking left, will show many pecks to both keys before quitting entirely. This is usually called regression. 1 What can this difference in behavior in extinction mean? If behavior is all that matters, the two birds were in the same state at the end of right training. So why did they behave differently when conditions changed?
There are two ways to deal with this divergence. The solution adopted by teleological behaviorism is to redefine behavior in a temporally extended way. Behavior becomes behavioral history. Since our two pigeons have different histories, the fact that their responses to a new condition (extinction) are not the same is no longer a puzzle. The limitation of this view is that it provides no compression of the data. If each history is unique, there is no way to predict the effect of a novel history. More than one history produces the kind of divergence I have just described. It would be nice to understand why, to have some rationale for similarities among histories.
It is possible, in principle and to some degree in practice, to group together histories that are equivalent, in the sense that the future behavior of the animal (in response to new conditions) will be the same after any history in the set.
For example, in a version of the experiment I have described, we might find that an animal with the history (number of rewards in parentheses) right (10), left (10) behaves the same in extinction as an animal whose history is left (10), right (15), left (5). The usual way to describe this equivalence is to say that the two animals are in the same state after these two equivalent histories. It might be possible to go further, and find some formula or process that would allow us to say that all histories of the form left (x), right (y), left (z) are equivalent (yield the same subsequent behavior) so long as some function f(x, y, z) is the same.
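The idea of grouping histories by a summary function can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a history is written as a list of (key, rewards) segments, and the summary function `f` below (the fraction of all rewards earned on the left) is purely hypothetical; the text does not say what the right function is. Histories that share a value of `f` fall into one class, and the claim to be tested experimentally would be that members of a class behave the same in extinction.

```python
from collections import defaultdict
from fractions import Fraction


def f(history):
    """Hypothetical summary function: fraction of all rewards earned on the left.

    `history` is a list of (key, rewards) segments, e.g. [("R", 10), ("L", 10)].
    This particular f is an illustrative assumption, not a rule from the text.
    """
    left = sum(n for key, n in history if key == "L")
    total = sum(n for _, n in history)
    return Fraction(left, total)


def equivalence_classes(histories):
    """Group histories that share the same value of f (the same putative 'state')."""
    classes = defaultdict(list)
    for history in histories:
        classes[f(history)].append(history)
    return dict(classes)


histories = [
    [("R", 10), ("L", 10)],              # right (10), left (10)
    [("L", 10), ("R", 15), ("L", 5)],    # left (10), right (15), left (5)
    [("L", 50), ("R", 300)],             # roughly Pigeon A's training history
]

for state, members in equivalence_classes(histories).items():
    print(state, members)
```

With this choice of `f`, the two example histories from the paragraph above land in the same class, since each has half of its rewards on the left; the third, modeled loosely on Pigeon A, forms a class of its own. A different candidate `f` would partition the same histories differently, which is exactly what experiments would have to decide.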
My second example presents just such a formula and shows how it works to account for many aspects of behavior in the two-armed bandit situation. First, the experiment, which is on what is called reversal learning: a single pigeon getting his daily food (a total of sixty cracks at the food magazine, plus an after-experiment supplement to maintain his weight) by pecking at one of two response keys. The schedule on each key is either extinction (i.e., pecking is ineffective) or a variable-ratio schedule with a payoff probability of 1/8. Which key is “hot” changes from day to day according to some rule.
Now just look at two data points. After a history during which the bird gets food for pecking both keys, but never from both on the same day, on day N he gets food for pecking, say, left (L). He does well: Ninety percent of responses are correct (on the left). Now on the next day, the situation is reversed. He gets food only for pecking on the right (R). Again, he does very well: around 90%
correct. Now look at what happens a little later in the experiment, say on day N + 20. Again, we begin with food just for L pecks; again responding is pretty accurate, around 90%. Again, the situation is reversed the next day, but now the bird does less well: only about 80% correct.
Why this difference? The reason is of course the animal’s history. The first data point was taken after a history of daily reversals: LRLR. . . . The second was taken after reversals on every fourth day: LLLLRRRRLLLL. . . .
From a commonsense point of view, it’s pretty obvious what is happening here. When reinforcement alternates every day, the pigeon learns to quickly reverse its preference. Not spontaneously—it needs one or two reinforcers before it switches—but quite quickly. But when reinforcement alternates only every fourth day, the pigeon is more cautious. He waits for several reinforcers before switching, hence the poor performance on each switch day—80% when reversal is every fourth day versus 90% when it is every day.
The cognitive account for this behavior parallels common sense. It just adds a label. The pigeon is said to have developed a reversal learning set when alternation happens every day. But of course, he might have developed such a set when reinforcement alternates every fourth day. Why doesn’t he? Or, at least, why doesn’t he do it as well?
One way to answer this question is to try to summarize the bird’s history in a way that allows us to predict on each day how well he will do in this experiment. What seems to be required is a model that reflects two separate aspects of “response strength.” One is simply performance: “percent correct,” or how well the organism is doing on a particular day. The second aspect, which might be called “stickiness,” 2 is a measure of how quickly performance changes when conditions change. Stickiness captures the difference between performance after four days of payoff on the left (four-day alternation) followed by a day on the right versus performance after just one day on the left (daily alternation). Performance clearly becomes stickier day by day, so long as conditions do not change: stickier (slower to change) after four days of left reward than after just one day.
A model that captures these two properties of choice behavior is as follows: 3
1. Assume that the animal’s tendency to peck left or right is determined by response strength, $V_{\text{Left}}$ or $V_{\text{Right}}$, according to a winner-take-all rule. In other words, in each discrete time step, the strongest response wins.
2. Assume that $V_{\text{Left}}$ = (total reinforcers on left)/(total responses on left); in other words, that $V_{\text{Left}}$ = average payoff probability on the left, and similarly for the right.
3. All that remains is to decide over what time period these two totals are to be computed. A simple possibility that turns out to work surprisingly well is the following:

$$V_{\text{Left}} = \frac{R_{\text{Left}} + R_{L0}}{x_{\text{Left}} + x_{L0}} \qquad (14.1)$$

where $R_{\text{Left}}$ is the total number of reinforcements received for the left response in the experiment, and $x_{\text{Left}}$ is the total number of responses on the left. These totals are updated with every left response (and similarly for the right). $x_{L0}$ and $R_{L0}$ are constants representing the animal’s initial tendency to respond left (with corresponding constants for the right).
No matter what the values for x and R, each unreinforced response reduces V (because x, the denominator, increases) and each reinforced response increases it (because the numerator, R, increases 4), the exact amount depending on the current values of $x_{\text{Left}} + x_{L0}$ and $R_{\text{Left}} + R_{L0}$.
This is called the cumulative effects (CE) model. It is another variant of the Law of Effect model I discussed in Chapter 4. In the CE model, all histories that yield the same set of values for $x_{\text{Left}}$, $x_{L0}$, $R_{\text{Left}}$, and $R_{L0}$ (and similarly for the right) are equivalent in terms of the future behavior of the model. It describes quite accurately the performance of individual subjects in a long series of daily, two-day, or four-day reversals. (See Figure 14.1.)
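Equation 14.1 and the winner-take-all rule are simple enough to state directly in code. The Python sketch below is one way to do it, not the implementation used for Figure 14.1: the initial constants (R0 = 1, x0 = 10 for both keys) and the tie-breaking rule are assumptions, and exact fractions are used so that equal strengths compare as equal. The last lines check the equivalence property: two histories that differ in order but leave the same totals put the model in the same state.

```python
from fractions import Fraction


class CEModel:
    """Cumulative effects model: V_key = (R_key + R0) / (x_key + x0)  (Eq. 14.1).

    R0 and x0 are the initial constants; the defaults here are arbitrary assumptions.
    """

    def __init__(self, R0=1, x0=10):
        self.R0, self.x0 = R0, x0
        self.R = {"L": 0, "R": 0}  # reinforcers earned on each key
        self.x = {"L": 0, "R": 0}  # responses made on each key

    def V(self, key):
        return Fraction(self.R[key] + self.R0, self.x[key] + self.x0)

    def choose(self):
        # Winner-take-all: the stronger response wins (ties go to "L", an assumption).
        return "L" if self.V("L") >= self.V("R") else "R"

    def update(self, key, rewarded):
        # Every response raises the denominator; a reinforced one also raises the numerator.
        self.x[key] += 1
        if rewarded:
            self.R[key] += 1

    def state(self):
        return (self.R["L"], self.x["L"], self.R["R"], self.x["R"])


def run(history):
    """Feed a history of (key, rewarded) responses to a fresh model."""
    model = CEModel()
    for key, rewarded in history:
        model.update(key, rewarded)
    return model


m = CEModel()
print(m.V("L"))                 # 1/10 before any responses
m.update("L", rewarded=False)
print(m.V("L"))                 # 1/11: an unreinforced response lowers V
m.update("L", rewarded=True)
print(m.V("L"))                 # 1/6 (= 2/12): a reinforced response raises it

h1 = [("L", True), ("L", False), ("R", False), ("R", True)]
h2 = [("R", True), ("L", False), ("R", False), ("L", True)]
print(run(h1).state() == run(h2).state())  # True: same totals, same state
```

Because the model's entire state is the four totals, any two histories with the same totals are interchangeable as far as the model's future choices are concerned.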
It works in a very simple way. The more experience the model accumulates, the larger the numerator and denominator of Equation 14.1 become, hence the smaller the change in V produced by each response. Also, the more the schedule rewards, say, the left choice, the greater the difference between the strengths of the left and right responses, $V_{\text{Left}} - V_{\text{Right}}$, becomes.
Thus, it takes longer for the model to reverse its preference after four days of left reinforcement than after just a single day, just as the data show. It is stickier after four days of training on one side than after just one. What the model does not do is learn the rule “reverse every day”; it does not show a real “learning set.” On the other hand, there is no evidence that pigeons do either, although smarter animals—monkeys, dogs, children—may.
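The stickiness argument can be checked with a little arithmetic. Under winner-take-all, after a reversal the model keeps pecking the formerly rewarded key, unrewarded, until its strength R/(x + n) falls below that of the other key, which does not change meanwhile because that key is not being pecked. The Python sketch below counts those wasted responses n. Its assumptions are simplifications, not values from the text: the left totals are set to their expected values (60 reinforcers per day, 8 responses per reinforcer), the initial constants are ignored, and the right key's strength is fixed at 1/10. The absolute numbers are therefore not meant to match Figure 14.1; only the comparison matters.

```python
from fractions import Fraction


def responses_to_reverse(R, x, v_other):
    """Unreinforced responses on a key with totals R (reinforcers) and x (responses)
    before its strength R / (x + n) drops below v_other and preference reverses."""
    n = 0
    while Fraction(R, x + n) >= v_other:
        n += 1
    return n


PER_DAY = 60                # reinforcers per daily session, as in the experiment
RESPONSES_PER_REWARD = 8    # expected responses per reinforcer when p = 1/8
V_RIGHT = Fraction(1, 10)   # assumed strength of the other (right) key

for days in (1, 4):         # daily versus four-day alternation
    R = days * PER_DAY
    x = R * RESPONSES_PER_REWARD
    print(days, responses_to_reverse(R, x, V_RIGHT))
# 1 121
# 4 481
```

Four days of left training produce about four times as many unreinforced responses before the switch, because each response changes V less when the totals are large; this is the sense in which the model becomes stickier.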
Figure 14.1 (Top panel) The entire course of a discrimination-reversal experiment with a single pigeon. The solid line shows the proportion of responses each day on the key that currently produced food (with probability 1/8) when those conditions were alternated every day, every two days, or every four days, as shown by the staggered letters at the top: L = only left responses reinforced; R = only right responses reinforced. The dotted line shows the rate of learning from Day N to Day N + 1: the higher the “a” value, the smaller the change in performance from day to day—the “stickier”
the pigeon’s behavior. (Bottom panel): Simulation of these data by the cumulative effects model with initial conditions 1,000, 2,000. (From Staddon, 1993, “Conventional wisdom . . .” op. cit.; see Davis et al., 1993, for more details.)
The model also explains the response-based spontaneous recovery (regression) in my first example. Recall: Train first with left-only reinforcement, then with right-only, enough that performance is close to 100% correct at the end of each phase. Then extinguish. The result is regression to an intermediate preference that reflects the total amount of training on left and right. But if the bird’s history includes no left reinforcement, this regression will not happen: In extinction, most responses will be on the right. The CE model easily duplicates this pattern. 5
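The regression result can be reproduced with a short simulation of the CE model in extinction. This Python sketch is illustrative only. Assumptions: both keys start with the same hypothetical constants (R0 = 1, x0 = 10); each key's training totals are set to their expected values (8 responses per reinforcer), ignoring the unreinforced pecks made while the bird was switching keys; and ties go to the right key. The histories follow the first example: Pigeon A gets 50 left then 300 right reinforcers, Pigeon B gets 350 right reinforcers.

```python
from fractions import Fraction

R0, X0 = 1, 10        # assumed initial constants, same for both keys
PAYOFF = 8            # expected responses per reinforcer (p = 1/8)


def extinction_left_pecks(left_rewards, right_rewards, n_responses=2000):
    """Set CE totals from a training history, then run extinction (nothing is
    reinforced) with winner-take-all choice; return the number of left pecks.

    Simplification: each key's totals are R = rewards and x = PAYOFF * rewards,
    plus the initial constants.
    """
    R = {"L": left_rewards + R0, "R": right_rewards + R0}
    x = {"L": PAYOFF * left_rewards + X0, "R": PAYOFF * right_rewards + X0}
    left = 0
    for _ in range(n_responses):
        V = {k: Fraction(R[k], x[k]) for k in R}
        key = "L" if V["L"] > V["R"] else "R"   # ties go to right (assumption)
        x[key] += 1                              # unreinforced: only x grows
        left += key == "L"
    return left


bird_a = extinction_left_pecks(left_rewards=50, right_rewards=300)
bird_b = extinction_left_pecks(left_rewards=0, right_rewards=350)
print(bird_a, bird_b)
```

Over the first 2,000 extinction responses, the simulated Bird A makes a few hundred left pecks while Bird B makes only a handful, even though both models prefer the right key at the end of training. The difference lies entirely in the left-key totals carried over from training.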
The model doesn’t capture every aspect of reversal learning, even in pigeons, however. The CE model is response based, not time based. Hence, it cannot explain time-based phenomena like spontaneous recovery and something called Jost’s law 6 (the weakening of newer memories relative to older ones with the passage of time). Despite the fact that the model is not time based, it can nevertheless duplicate matching in the interval-schedule concurrent VI VI procedure (see Chapter 4), 7 which is yet another demonstration that the matching result is both robust and overdetermined, in the sense that almost any law-of-effect learning process in the conc VI VI situation will conform to it.
The CE model illustrates my main point: how past histories can be summarized by a formula that has predictive value. The study of learning can do better than just make lists of particular histories and particular results. The aim of theoretical behaviorism (TB), therefore, is to understand the internal states of the organism by discovering rules that allow us to group together sets of histories that are equivalent in terms of its future behavior.
Theoretical behaviorism shares some features of both classical and Hullian behaviorism. It gets from classical behaviorism the conviction that we learn about the organism only through its behavior. It rejects, however, the view shared by Watson and Skinner that psychology need refer only to stimuli and responses. Contra Skinner, it argues that the skin does make a difference:
Events inside the organism (e.g., the changes wrought by past history) are state variables, not stimuli or responses. Contra cognitivism, internal states are not necessarily conscious (mental, introspectable). Contra Hullian behaviorism, internal states are not necessarily physiological. 8 In other words, theoretical behaviorism respects the distinction between intervening variables, which claim no necessary relation to brain physiology, and hypothetical constructs, which do. 9 TB models may make contact with physiology eventually, but the first priority is to explain behavior. TB sees internal states as purely theoretical constructions based on historical information from behavioral experiments.
Nevertheless, it shares with Hullian behaviorism the idea that the ultimate aim of behavioral study is the derivation of mechanisms or models. As these models evolve, they will surely make some connection with brain physiology.
Theoretical behaviorism is interested in mechanisms for entirely practical reasons. The argument runs like this: Classes of equivalent histories must be discovered by putting together the results from the appropriate set of experiments. But real organisms are very complicated and historical experiments take time. There is no way that the full set of internal states of a real animal can be fully enumerated experimentally. Theoretical creativity is necessary, and theories arise not just from “orderly arrangement of data,” but also through invention. In practice, therefore, the main way to specify sets of equivalent histories is through dynamic theories that define how moment-by-moment experience changes the state of the organism. These theories can be compared with data, tested (if they do well enough with what is already known), overthrown (all theories are eventually overthrown), revised, and tested again, in the usual scientific way.
How does theoretical behaviorism differ from cognitivism? Both are theoretical and both assume internal states. One difference is that theoretical behaviorism is explicitly historical and dynamic. It is not concerned directly with representation, but instead with the way that the organism is changed by its experience. A second difference is that theoretical behaviorism makes no presumptions about either its subject matter or its theoretical constructs. Cognitive psychology is “the [computational] study of mental life”; theoretical behaviorism is not committed to a prejudged view of what theory must do. It looks for models/mechanisms of behavior, where mechanism is whatever works to account for behavior; and behavior is whatever can be usefully observed or measured, including reports of conscious experience (see Chapter 16). Theoretical behaviorism assumes in advance neither that mental categories are inevitable ingredients of any valid theory, nor that they must be immediately explicable by such a theory.
And, finally, TB contends that the sole purpose of science is to frame parsimonious laws, and not to “explain mental phenomena” in terms of familiar mentalistic ingredients like “expectations,” “representations,” and the like. An early advocate of the view that science is simply the simplest possible description of nature was Isaac Newton, who famously wrote “hypotheses non fingo” (“I make no hypotheses”), by which he meant that he intended not to “explain” phenomena but simply to discover their rules of operation. To questions such as “But what do you mean by force?” and the like, he could simply respond by pointing to the appropriate law.
So why theoretical behaviorism? What is specifically behavioristic about this approach? Pure behaviorism, a psychology constructed entirely of physical stimuli and “uninterpreted physical movements,” is impossible. Why? Because the same physical act can mean many different things: “Not waving, but drowning,” or saying “No!” or “Goodbye!” or whatever. The fact that perception interprets rather than records means that the same physical stimulus can look very different under different conditions. And different physical stimuli can look the same: “Red” is not just a wavelength. The sensation “red” can be produced in many ways, some involving no red wavelengths at all. 10 The question, therefore, is not whether we should focus just on behavior and stimuli, defined in a physical way (the answer is “no”), but how we should interpret physical movements and stimuli. My suggestion is that we do so through parsimonious dynamic models. These models then are the “behavior,” interpreted in the simplest way that makes sense. This is what the organism is “doing,” described in the simplest possible way, and in a way that assumes as little as possible about the relation between subjective experience and observed activity. This is where TB differs from cognitivism, which happily assumes “expectations,” “representations,” “information,” and the like, all with only a tenuous connection to observables. Theoretical behaviorism also assumes as little as possible about brain-behavior relations. What it does attempt is to provide an accurate real-time description of what the organism is doing. It is this, not the “behavior” tout court, that the neurophysiologist must explain.
Figure 14.2 shows the standard framework for representing a finite-state machine. 11 It is a perfectly general picture. It just describes the logic of any process whose future behavior depends on its current input and its state, where state just summarizes the effect of all its past inputs. Since this is how we must think logically about the behavior of historical systems, it is an obvious place to begin. Figure 14.2 is also the framework for theoretical behaviorism. It tells us nothing about how stimuli and responses are to be defined. It does not specify the properties of the states: how many there are, the rules by which they change, and so on. These are the concern of specific theories, and I refer the reader to technical sources for more details. 12 One emerging theme is the idea that many of the properties of simple learning can be explained by interactions among independent agents (“integrators”), each of which retains a memory of its past effectiveness in a given context. I summarized one version of this idea in Chapter 6. The same theme appears in the earlier discussion of behavior-based AI.
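The framework of Figure 14.2 amounts to a pair of lookup tables, one giving the response to each (state, stimulus) pair and one giving the next state. A minimal Python sketch follows; the states S0 and S1, the stimuli, and the responses are hypothetical placeholders, not a model of any real experiment. It also shows the sense in which a state summarizes history: two different stimulus sequences that end in the same state yield identical responses to any future input.

```python
# Response table (hypothetical): (state, stimulus) -> response B_t.
OUTPUT = {
    ("S0", "tone"): "none",
    ("S0", "food"): "eat",
    ("S1", "tone"): "peck",
    ("S1", "food"): "eat",
}
# Transition table (hypothetical): (state, stimulus) -> next state S_{t+1}.
NEXT = {
    ("S0", "tone"): "S0",
    ("S0", "food"): "S1",
    ("S1", "tone"): "S1",
    ("S1", "food"): "S1",
}


def run(state, stimuli):
    """Present stimuli I_t one per time step; return the responses and final state."""
    responses = []
    for stimulus in stimuli:
        responses.append(OUTPUT[(state, stimulus)])
        state = NEXT[(state, stimulus)]
    return responses, state


_, s1 = run("S0", ["food"])
_, s2 = run("S0", ["tone", "food", "tone"])
probe = ["tone", "tone"]
print(s1 == s2, run(s1, probe)[0], run(s2, probe)[0])
# True ['peck', 'peck'] ['peck', 'peck']
```

The two tables are the whole theory here; a specific theory would replace these toy entries with rules for how states are defined and how they change.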
Models along these general lines can describe the basic properties of operant learning, and they can account for some properties of complex choice, for temporal properties of habituation (the progressive decrease in responding to repeated “neutral” stimulation), and for some properties of the kind of interval timing seen on fixed-interval reinforcement schedules. I discuss a model for habituation in more detail in the next chapter.
***
Behaviorism was once the dominant movement in American psychology. It was eclipsed by the “cognitive revolution” in the late 1970s. Two things seem

Figure 14.2 The view of the organism in theoretical behaviorism. For simplicity, time is divided into discrete instants (down arrow). At each instant, a stimulus (which may be “no stimulus”), $I_t$, can have either or both of two effects: It can produce a response, $B_t$, and change the organism’s state from $S_t$ to $S_{t+1}$. The states are defined by two tables: one that shows the effect of each stimulus on the response, the other that shows the effect of each stimulus on the subsequent state. This notation is just the standard way of