How to Evolve Irrational Preferences¹


Discussion paper for the colloquium on Philosophical Perspectives on Irrationality, University of Montreal, 10-12 Oct 1997²

Department of Philosophy & Centre for Applied Ethics, University of British Columbia

Abstract

Cooperation in a single-play prisoner’s dilemma is irrational. We consider a simple model where this irrational behavior will evolve, via the evolution of cooperative preferences. These players cooperate rationally, given their preferences and common knowledge, but their preferences distort – fail to track their interests in – their situation. Generalizing, we show that distortions are predictable; our evolved agents’ preferences will track all but two problematic situations, and here they will take on cooperative distortions. These models introduce the concepts and tools of evolutionary game theory – reproductive fitness values, replicator dynamics and the genetic algorithm – to address the question: when should we expect agents to be rational, and when should we expect them to be something else?

Keywords: Prisoner’s Dilemma, evolution of preferences, replicator dynamics, genetic algorithm

1 Problems and Opportunities

Cooperation in a single-play prisoner’s dilemma is irrational, yet we find cooperation in animals in nature, in our social life, and in experiments on human subjects. Those who dismiss this behaviour as simply irrational miss an opportunity to understand it better. In contrast, this paper will use small models of artificial evolution to show one way a special kind of irrationality can evolve, selectively, to solve social dilemmas.

1.1 Rational choice foundations for ethics

This paper is motivated by problems in the rational choice approach to moral theory and by opportunities afforded by recent work in cognitive and biological science. The highly developed state of the theory of rational choice makes it an attractive starting point for a pragmatic account of morality. But this approach complicates the moral theorist’s task and possibly distracts us from simpler moral solutions that come about before or alongside rationality. First, the subjective theory of value forces accounts of moral motivation to start with complex constructions like counter-preferential dispositions to choose dominated cooperative strategies. Second, on the received theory “there is no gap between preference and choice” (Heath 1996, p. 531), so those who choose to cooperate in the prisoner’s dilemma (PD), where cooperation is a dominated strategy, must have non-PD preferences. Therefore, this line of reasoning continues, cooperators are just playing a different game. This conclusion is obviously intended as a criticism – suggesting that the moral theorist has avoided the PD game, not solved it – so one is tempted to resist it. But I propose instead to take up this line and use it constructively. It reminds us how subjective game theory is; a game is constituted by the players’ preferences. Moreover, if agents can avoid traps like the PD, and do better by playing some other game, this may be pragmatically as good as a solution to the PD (whatever that might be).4 Thus we come to the question the present paper raises: how can we account for the games successful interactors end up playing?

1

The advertised title of the paper ("The Evolution of Irrational Preferences") was too naturalistic. This version does not reflect the generalization ("… and Wishful False Beliefs") that came out in discussion at the conference.

2

This paper was begun during a leave from UBC, while visiting the International Institute for Applied Systems Analysis, Laxenburg, Austria. Thanks to UBC for leave time, IIASA for an excellent research environment, and SSHRC for research support for my project Evolving Artificial Moral Ecologies, of which this paper is part. Special thanks to Chris MacDonald for providing research assistance and comments on a draft, to Daniel Weinstock for the invitation to present the conference paper and to the excellent group of philosophers in the audience for their questions and feedback. This version does not reflect Françoise Lepage's stimulating commentary, for which I thank him.

3

Author’s Address: WWW: http://www.ethics.ubc.ca/pad.html. Centre for Applied Ethics, 227-Agricultural Rd, University of British Columbia, Vancouver, B.C. V6T 1Z2.

1.2 Situated Agents

These concerns about rational choice are complemented by some recent work in cognitive science. From the perspective of the situated approach to cognitive science, rational choice theory, which focuses on how an agent calculates moves in a remarkably complete internal model of the social world, looks somewhat old-fashioned. While the standard approach to AI operated within the world model of a single agent, recent criticism has pointed to the importance of situating agents in an environment (Brooks 1991, Clark 1997). These critics go further, conjecturing that agents will allow the world to do much of the work traditional models have loaded on the mind.5 Each of these steps is suggestive. First, when we focus on several agents interacting in a task environment, the individual agent’s mental model of its world moves off centre stage. The mental model becomes part of the agent’s equipment, open to test and refinement; the environmental task is the testing ground. From this point of view, subjective theories of value take the agent’s internal representations too seriously. In particular, an agent’s preferences are a fallible motivational model of the world. We should not take satisfying them as the ultimate end, but only as the agent’s proximate end. Second, agents whose preferences tracked their interests would be simpler.

1.3 Evolutionary Game Theory

Thus we are led to an evolutionary model. On the cognitive side, it is natural to let the agents’ cognitive equipment vary and adapt to its environment under evolutionary pressure. In our case, we are interested in a purely social environment modeled by evolutionary game theory.6 The payoffs in the matrix, which determine differential reproduction, then provide an alternative to the subjective theory of value. Reproductive interests are objective values induced by the evolutionary dynamics that define the model. Finally, the configuration of reproductive interests gives us a way to model a situation – for example as a Prisoner’s Dilemma (see Figure 1) – independently of the preferences of the agents. The evolutionary dynamics drive the model to search a space of populations of possible agents. If we allow agents to vary in their preferences, this process will, by selecting successful agents, also select the way they choose among their alternatives – their preferences. Since these preferences determine the game the agents are playing, we should be able to discover what games successful agents play.7

Schema:
         C      D
    C   R,R    S,T
    D   T,S    P,P

Payoff Values:
         C      D
    C   2,2    0,3
    D   3,0    1,1

Figure 1 Prisoner's Dilemma8

4

To decide, we need to be able to compare the costs of externalist solutions to the PD, etc.

5

Herbert Simon’s (1969) ant, whose complex path reflects a complex environment not a complex mind, is the prototype here.

6

We do so for simplicity, not because we endorse the neglect of other environmental factors that game theory encourages. See (Hegselmann, 1996) for some interesting minimal geography in a game theoretic context.

7

Not, it is important to notice, “choose to play,” since these agents can only choose moves, not preferences. Indeed, this is as close as this paper will get to the question whether evolutionary models have normative content over and above their clear explanatory content. Danielson (1997b) argues that they do have normative content.

8


What has this to do with irrationality, the topic of this colloquium? First, it deploys the received conception of rationality in a new way. One often hears about the importance of the logical discipline of rationality, that defection in a PD, for example, follows from a tautology, namely that preferences determine action.9 The temptation is to dismiss any instances of agents cooperating in a PD as simple, that is, uninteresting, irrationality. But I conjecture that if we require agents to be rational (in a sense to be specified below in section 3.1), and they still cooperate, we might discover some interesting structure in their residual irrationality. Second, our agents will be irrational in a special way: in some situations, their preferences will not track their interests. Obviously, this is a subject that has not been given much attention by the subjective theory of value. Notice that I am not defending an objective theory of value in this paper. Rather, I am stipulating a model in which objective values exist – by design, they are decisive – and testing whether successful rational agents’ preferences track these values.10

2 Extending Sen¹¹

The proposal that non-PD preferences might account for mutual cooperation in the Prisoner’s Dilemma is due to Sen (1974), who argues for the moral superiority of what he calls Assurance Game (AG) preferences in a situation where welfare (his term) is determined by PD outcomes. The name Assurance Game “derives from the fact that a player would be willing to co-operate if he or she could be assured that the partner would co-operate.” (Kollock, 1997, p. 188). AG preferences differ from PD preferences in ranking R>T, (cf. Figure 1 and Figure 3).12 Sen argues that “if everyone behaved as if they had AG-preferences, and had the assurance of similar good behaviour by others, they would be better off even if they actually had PD-preferences” (p. 60).

Sen’s argument is complicated by two factors: his subjective value framework and a simultaneous game. I will simplify my argument by using objective values and an extended game. Let us call the configuration of PD interests in Figure 1 a PD situation and use the term game for the configuration of players’ preferences over these outcomes.

2.1 Extended Situations & Games

[Figure 2 game trees: P1 chooses C or D at T-1, then P2 chooses C or D at T-2; the outcomes CC, CD, DC and DD are labeled R,R; S,T; T,S; P,P. Panel 2A: P1 PD, P2 PD. Panel 2B: P1 PD, P2 AG. Panel 2C: P1 PD, P2 OR.]

Figure 2 The Extended Prisoner’s Dilemma (XPD)

Figure 2 shows a PD situation extended over time. First player 1 (P1) moves at T-1, then player 2 (P2) moves at T-2. The labels at the bottom are payoffs to P1 and P2, respectively. If the players’ preferences track their payoffs, this will also be a PD game, as shown on the left, in 2A. (The labels in the boxes indicate the players’ preferences; in 2A both P1 and P2 have PD-preferences.) P2 will choose D at either node at T-2. P1, facing a choice between C→S and D→P, will choose D. These choices are drawn with solid lines. Thus we have a decision path to the sub-optimal outcome P,P, the rational equilibrium outcome for the XPD.

9

For example, Binmore (1994) names a chapter “Toying with Tautologies”.

10

Having mentioned cognitive science and evolution, I should note that it is not my intention to produce realistic models of either the evolution or the cognitive structure of preferences. (Indeed, in order to focus on the theory of rationality, I will help myself to one of its more extravagant assumptions, agents’ common knowledge of each other’s preferences.) We will employ cognitive and evolutionary tools to explore the theory of rationality, not to ground it in psychological or biological fact.

11

The argument in this section derives from Danielson (1991).

12

AG preferences are related to the conditionally cooperative strategy disposition defended by Gauthier (1986); cf. Danielson (1988 and 1992), but the relation between preference/interest and disposition/preference models is too complex to take up here.

Second, consider the case where P2 has AG preferences, that is, ranks the PD situation’s outcomes R>T>P>S (see Figure 2B). P2 will now choose C at the left T-2 node, and P1 faces the choice between C→R and D→P and, ranking R>P, will choose C. So this game, where just one agent has AG preferences, will lead to mutual cooperation.13 Therefore, P2’s AG-preferences have better consequences for both P1 and P2 than PD-preferences in an XPD situation. (Note that the consequences are measured by their payoffs, {2,2}, not the rankings indicating the players’ preferences.)

Sen (1974, pp. 60-1) goes on to consider what he labels Other Regarding (OR) preferences that “appear to be adamant on not letting the other player down”; OR-preferences rank R>S>P>T (see Figure 2C). Sen sees OR-preferences as morally superior to AG preferences. This is in part because OR-preferences resolve a simultaneous PD cooperatively, while AG-preferences do not themselves provide the assurance needed in this case. But OR-preferences are extremely compliant, and do miserably in the case where the other player retains PD-preferences (see Figure 2C). The OR-player in role P2 is willing to cooperate even when P1 defects, allowing P1 the option of D→T. The OR-player ends up with the S=0 outcome. It is important to note that Sen ignores this cost because he assumes that all players will have the same preferences, not because he scores the outcome in terms of the OR-player’s preferences. To do the latter would be to give (in Sen’s case, second-order) preferences too much weight. The point (which I take over from Sen) is the substantial one that some non-PD preferences do better in terms of PD outcomes, not the trivial one that they do better in their own terms. (Trivial, first, because any preferences can do better than PD preferences by simply placing maximum value on the equilibrium outcome; second, because the subjective measure does not distinguish OR preferences, which fail to change the situation for the better, from AG preferences, which do.)
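The backward induction behind Figure 2 can be sketched in Python. The sketch assumes Figure 1's payoff values, writes each ordering as a string of outcome labels from best to worst (an illustrative encoding, not the paper's), and gives P1 PD-preferences in all three panels, as in the text.

```python
# (P1 move, P2 move) -> (P1's outcome label, P2's outcome label), per Figure 1.
OUTCOME = {("C", "C"): ("R", "R"), ("C", "D"): ("S", "T"),
           ("D", "C"): ("T", "S"), ("D", "D"): ("P", "P")}
PAYOFF = {"R": 2, "S": 0, "T": 3, "P": 1}  # Figure 1 payoff values

# Ordinal preferences written best-first (illustrative encoding).
PD = "TRPS"  # T>R>P>S
AG = "RTPS"  # R>T>P>S
OR = "RSPT"  # R>S>P>T


def prefers(order, a, b):
    """True if an agent with ranking `order` strictly prefers outcome a to b."""
    return order.index(a) < order.index(b)


def p2_reply(order2, m1):
    """P2 moves last: after P1's move m1, pick the move whose outcome P2 ranks higher."""
    c, d = OUTCOME[(m1, "C")][1], OUTCOME[(m1, "D")][1]
    return "C" if prefers(order2, c, d) else "D"


def play_xpd(order1, order2):
    """P1 predicts P2's reply to each of its moves, then picks the better outcome."""
    if_c = OUTCOME[("C", p2_reply(order2, "C"))][0]
    if_d = OUTCOME[("D", p2_reply(order2, "D"))][0]
    m1 = "C" if prefers(order1, if_c, if_d) else "D"
    m2 = p2_reply(order2, m1)
    o1, o2 = OUTCOME[(m1, m2)]
    return (m1, m2), (PAYOFF[o1], PAYOFF[o2])


for panel, p2 in (("2A", PD), ("2B", AG), ("2C", OR)):
    print(panel, play_xpd(PD, p2))
# 2A (('D', 'D'), (1, 1))
# 2B (('C', 'C'), (2, 2))
# 2C (('D', 'C'), (3, 0))
```

The three lines of output correspond to the mutual-defection path of 2A, the mutual cooperation that AG in role P2 secures in 2B, and the exploitation of the OR-player in 2C.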

2.2 Preferences & Dispositions

Finally, although I do not adopt Sen’s assumption that all players share the same (second order) preferences, I can appreciate its likely motivation. The first problem Sen faces is one of communication: how does a player know the other player’s motivation? Assuming identical motivation solves this problem. But it is too strong an assumption and yields misleading results, like the recommendation of OR-preferences as morally superior. My proposal is to follow the standard game theoretic common knowledge assumption that each player knows the other’s preferences. I realize I am stretching the assumption by invoking it in the rather unusual context where preferences do not track interests.14 Moreover, this stretching does have one interesting consequence. Game theorists often dismiss cooperative solutions to the PD that depend on dispositions to cooperate. For example, Heath asks rhetorically whether Gauthier assumes that agents can read minds.15 However, the standard assumption of common knowledge of preferences makes exactly the same claim; players know each other’s dispositions to choose. Presumably their cognitive task is eased by the (tacit) assumption that each has preferences that track their interests in a situation. Then they don’t need to read minds, but merely perceive their common environment. But the common knowledge assumption is not explicitly linked to the tracking assumption, so I will use it here independently.

13

Watkins complains about “Sen’s suggestion that the primary task for a moral agent is not to get results, but to lick one’s preferences into a good shape (the idea being that, if everybody did this, good results would follow of themselves)” (1974, p. 75). However, in the extended game with full information … (reformed, moralized) preferences can assure the other player.

14

I realize that some find the common knowledge assumption the least plausible part of game theory. For example, Bob Sugden has urged me, in discussion, to drop it from my model. But in the context of demonstrating an element of irrationality in an otherwise rational preference-mediated interaction, it seems appropriate fully to invoke the theory of rational choice. But I do not want, thereby, to distract attention from my argument’s focus to common knowledge. In particular, my argument is not intended to be a reductio criticism of the common knowledge assumption. Second, I only rely on this assumption in this context as a loan (to use Dan Dennett’s metaphor) from the theory of rational choice. In a more complete account of the evolution of less rational forms of interaction, I spell out a way to argue for some common knowledge; cf. Danielson (1997a). Finally, this whole exercise is limited to simple models; for any attempt to apply these results practically I agree with Kollock (1997, p. 206), who does not assume that agents’ transformed subjective game matrices are common knowledge: “Given that within an Assurance Game people are conditional willing co-operators, the task becomes to assure actors that others can be counted on to co-operate. Hence, attempts to signal and advertise one’s commitment to co-operate will be critical.”

A second problem avoided by assuming identical (second order) preferences is complexity. By assuming all agents have identical preferences, Sen has only sixteen games to compare. Allowing preferences to differ greatly increases the number of games to 16² = 256, too many for Sen’s straightforward analytic method. Evolutionary techniques allow us to deal with this increase in complexity and test preferences in the more demanding general situation.

2.3 More Recent Work

Satz and Ferejohn (1994) argue for “supplementing [rational-choice] theory with an externally derived theory of interests.” Several authors contrast rational choice with evolutionary game theory in particular. Skyrms (1996) provides a good survey. Wilson and Sober (1994, p. 601) suggest something close to the approach I take in this paper, combining evolutionary game theory and deliberative rationality:

To distinguish mechanisms from metaphors, it is useful to think of a psychological motive as a strategy in the game-theoretic sense, which produces a set of outcomes when it interacts with itself and with other strategies. Thus, a psychologically selfish individual (however defined) will be motivated to behave in certain ways with consequences for itself and others. A psychologically altruistic individual (however defined) will be motivated to behave in other ways with a different set of payoffs. Within an evolutionary framework, the empirical claim that individuals are motivated entirely by self-interest must be supported by showing that the psychologically selfish strategy prevails in competition with all other strategies.

Even closer to my approach, Sober (1997, p. 417) sketches a model in which, “if rational deliberation is to evolve … then the agents’ preferences must not be perfectly correlated with maximizing fitness.” Skyrms (1996, p. 42) sketches a way to relate objective interests to subjective utilities:

In contrast with subjective expected utility theory, both evolution and experimentation share an interest in tangible income. … We have seen that strategies that are not modular rational in payoffs in evolutionary fitness may evolve. It remains to be seen how useful it is to conceptualize these strategies within the framework of subjective expected utility theory. If they are treated in this way, one could think of evolution as generating bounds on utility function for the species.

In what follows, we take up these suggestions, constructing a simple model of agents whose preferences are molded by evolution. Finally, Kollock offers experimental evidence for our model. He finds “people transform interdependent situations into essentially different games” (1997, p. 207). “There is a general tendency to subjectively transform Prisoner's Dilemma into an Assurance Game rather than a Prisoner’s Dilemma Game” (1997, p. 186).

3 An Evolutionary Model

An evolutionary model allows us to simplify our account by providing a two-level structure, which separates interests and preferences. At the first level, agents are selected on the basis of the outcomes of a round of a tournament of XPD games played against each of the other agents.16 Since reproduction of agents is determined by the values in the matrix, these constitute the reproductive interests of the agents. The evolutionary basis of the model operationalizes a conception of interests.

15

“... the more problematic assumption, which is that agents are able to determine their interaction partner’s dispositions without any information about their actions. If this is in fact the claim, then it is very difficult to interpret. In order for it to be true, it would have to be the case that one’s disposition is act in a particular way were revealed somehow other than through one’s actions. And unless one’s choice of disposition affected one’s physical appearance, or agents were possessed of some unusual telepathic abilities, this is extremely obscure” (Heath, 1996, p. 534).

16

Since each player is evaluated against the evolved roster of agents, this is a co-evolutionary model. Contrast Axelrod’s (1987) non-co-evolutionary model, where each strategy plays a fixed roster of strategies.

3.1 Preferences

At the second level, agents’ moves in a situation are determined by some sort of internal structure. In our case, we want this to be a structure of preferences, so it must satisfy two demands. First, it must determine the agent’s choices and second, it must be known to the other player in the situation. Given players governed by preferences, a situation is transformed into a game. Now rationality in games – even simplified games like the XPD – is a complex cognitive task. We will not attempt to evolve this ability. Our strategy is to assume game theoretic rationality on the part of all agents, and only try to evolve preferences (that serve as input to their built-in rationality function).17

In the extended PD, rationality demands very little of the player in role P2: a choice between outcomes R and T at the left-hand node and a choice between S and P at the right. It is the P1 role that calls for more equipment. P1 must decide between the outcome of choosing C and that of choosing D. In each case a rational P1 should look ahead to predict what P2 will do, given P2’s preferences. It is this look-ahead algorithm that we give to each player. The outcome of this function is an outcome for P1: C leads to R or S, D leads to T or P. So P1 needs to choose between four pairs – a superset of P2’s two pairs. This suggests that an operationally simple model of preferences will be a choice for each of these pairs. (These can be implemented as standard condition/action rules.) Given a canonical ordering of pairs, we can represent preferences simply as vectors of C/D choices. Figure 3 illustrates this scheme with our three running examples of the sixteen possible preferences.

                         Pairwise Rankings
Label  Ordering    R or T?  R or P?  S or T?  S or P?   Choice Vector
PD     T>R>P>S     T        R        T        P         <DCDD>
AG     R>T>P>S     R        R        T        P         <CCDD>
OR     R>S>P>T     R        R        S        S         <CCCC>

Figure 3 Preference Representations
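The mapping from an ordinal ranking to a choice vector can be sketched as follows, assuming rankings are written as in Figure 3 (for example "T>R>P>S"). In each canonical pair the first label is the outcome reached by cooperating (R or S) and the second the outcome reached by defecting (T or P), so C records a preference for the first. The function name is illustrative.

```python
# Canonical order of the four pairwise comparisons used in Figure 3.
PAIRS = [("R", "T"), ("R", "P"), ("S", "T"), ("S", "P")]


def choice_vector(ordering):
    """Map an ordering such as 'T>R>P>S' (best first) to its C/D choice vector."""
    rank = {label: i for i, label in enumerate(ordering.split(">"))}
    # C if the cooperative outcome ranks above the defecting one, else D.
    bits = ["C" if rank[c] < rank[d] else "D" for c, d in PAIRS]
    return "<" + "".join(bits) + ">"


for label, ordering in (("PD", "T>R>P>S"), ("AG", "R>T>P>S"), ("OR", "R>S>P>T")):
    print(label, ordering, choice_vector(ordering))
# PD T>R>P>S <DCDD>
# AG R>T>P>S <CCDD>
# OR R>S>P>T <CCCC>
```

Because only four of the six pairwise comparisons are recorded (R against S and T against P never bear on a choice in the XPD), distinct orderings can share a vector: R>S>P>T and R>S>T>P both give <CCCC>.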

Since all agents share the same rationality function, they differ only in their preferences, so we can identify players and preferences in what follows. Players are operationalized preferences. To test preferences, we pair combinations of associated players in an XPD situation. We have already considered the AG/PD pair in section 2.1, where we had PD go first. When the roles are reversed, the PD defects whatever AG does, so AG chooses D (since it shares with the PD the preference P>S). Averaging the two encounters, we get the results in the four shaded boxes in Figure 4. AG preferences dominate PD preferences in their interaction, but, of course, there are other interactions to consider, because the sums of outcomes across all encounters in a round of the tournament determine an agent’s reproductive success. For example, when we include OR players in the tournament, PD players, which successfully exploit the OR, do as well as AG.18 However, this effect should be short-lived, as exploited OR players die out and PD players do poorly with each other. These examples show why we need to test the interactions of all preference combinations and project their interaction over evolutionary time. We predict that although PD preferences will initially do well, AG preferences will eventually prevail in the evolutionary dynamics.

17

This is not what is usually done in evolutionary game theory models. They invoke neither common knowledge nor full rationality in order to be more realistic about simpler biological agents and to try to derive conclusions from weaker assumptions. For example, Skyrms (1996, pp. 93f): “[T]he evolutionary process gives an explanation of the stability of signaling system equilibria that is perfectly good in the absence of common knowledge, or of any knowledge at all!” (emphasis in original).

18

Recall that by giving both players the same preferences, Sen only considered the NW-SE diagonal outcomes in this tournament matrix, ignoring some of OR’s bad effects.

        OR     AG     PD     Sum
OR      2      2      0      4
AG      2      2      1.5    5.5
PD      3      1.5    1      5.5

Figure 4 A Tournament (Payoffs to Row Strategy)
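The look-ahead rule of section 3.1 and the averaging over both role assignments can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's code: it assumes Figure 1's payoff values, writes choice vectors without angle brackets, and the function names are hypothetical. With these assumptions it reproduces the numbers in Figure 4.

```python
PAYOFF = {"R": 2, "S": 0, "T": 3, "P": 1}  # Figure 1 payoff values
# Bit positions in a choice vector (Figure 3 order): R or T?, R or P?, S or T?, S or P?
BIT = {("R", "T"): 0, ("R", "P"): 1, ("S", "T"): 2, ("S", "P"): 3}
LABELS = {("C", "C"): ("R", "R"), ("C", "D"): ("S", "T"),
          ("D", "C"): ("T", "S"), ("D", "D"): ("P", "P")}


def choose(vec, coop_outcome, defect_outcome):
    """Look up the agent's C/D choice between a cooperative and a defecting outcome."""
    return vec[BIT[(coop_outcome, defect_outcome)]]


def play(v1, v2):
    """One XPD with v1 in role P1 and v2 in role P2; returns (payoff to P1, payoff to P2)."""
    # P2 answers C by choosing between R and T, and answers D by choosing between S and P.
    reply = {"C": choose(v2, "R", "T"), "D": choose(v2, "S", "P")}
    # P1 looks ahead: C yields R or S, D yields T or P, depending on P2's reply.
    if_c = "R" if reply["C"] == "C" else "S"
    if_d = "T" if reply["D"] == "C" else "P"
    m1 = choose(v1, if_c, if_d)
    o1, o2 = LABELS[(m1, reply[m1])]
    return PAYOFF[o1], PAYOFF[o2]


def score(row, col):
    """Average payoff to `row` over both role assignments against `col`."""
    return (play(row, col)[0] + play(col, row)[1]) / 2


AGENTS = {"OR": "CCCC", "AG": "CCDD", "PD": "DCDD"}
for name, v in AGENTS.items():
    row = [score(v, w) for w in AGENTS.values()]
    print(name, row, sum(row))
# OR [2.0, 2.0, 0.0] 4.0
# AG [2.0, 2.0, 1.5] 5.5
# PD [3.0, 1.5, 1.0] 5.5
```

Since the only difference between agents is the four-character vector, extending the tournament to all sixteen preferences is a matter of enumerating the vectors.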

3.2 Replicator Dynamics

To test this hypothesis, we create players with each of the sixteen possible preferences for the XPD situation. Since we need to compare only a small number of possible preferences (i.e. 16), we can use the simplest evolutionary model, the replicator dynamics, where (relatively) successful agents are cloned.19 We begin with a population in which each preference is equally represented,20 and pair each agent with each of the others (and itself) in each role of the XPD. Those players that score better than average in a round of this tournament are proportionally better represented in the next generation. Agents are cloned, without mutation, to construct the next generation. The process is repeated until the population proportions stabilize. One run suffices because the replicator dynamics are deterministic.
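A minimal sketch of the discrete replicator dynamics, assuming the standard update in which a preference's share is multiplied by its fitness relative to the population mean. To stay self-contained it uses only the three preferences and averaged payoffs of Figure 4 rather than all sixteen, so it illustrates the mechanism rather than reproducing Figure 5; the names are illustrative.

```python
# Averaged tournament payoffs from Figure 4 (row strategy against column strategy).
NAMES = ["OR", "AG", "PD"]
A = [[2.0, 2.0, 0.0],
     [2.0, 2.0, 1.5],
     [3.0, 1.5, 1.0]]


def replicator_step(x, A):
    """One discrete replicator update: x_i' = x_i * f_i / mean fitness."""
    n = len(x)
    # Fitness is the expected payoff against the current population, own type included.
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] * fitness[i] / mean for i in range(n)]


def run(A, generations):
    """Start from equal shares and iterate; the process is deterministic."""
    x = [1.0 / len(A)] * len(A)
    history = [x]
    for _ in range(generations):
        x = replicator_step(x, A)
        history.append(x)
    return history


history = run(A, 40)
for g in (0, 1, 10, 40):
    print(g, dict(zip(NAMES, (round(s, 3) for s in history[g]))))
```

In this reduced run PD's share grows for the first few generations and then falls as the OR share it exploits shrinks, while AG's share keeps rising, in line with the prediction at the end of section 3.1.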

3.3 Results

The results are clear by generation 30 of the 40-generation run shown in Figure 5. PD-preferences (<DCDD>) do better at first, but from the eighth generation, AG-preferences (<CCDD>) do best and soon dominate the population. An even more cooperative preference (<CCDC>) evolves along with the AG-preferences, because it cooperates with the latter. Conversely, the reason PD-preferences do best at first is that they do better – by exploiting – some of the soon-to-be-eliminated preferences in the initial population.

[Figure 5 plot: % of population (0 to 80) against generations (0 to 40), one line for each of the sixteen preferences, CCCC through DDDD.]

Figure 5 Evolution of Preferences

4 Calibration & Tracking

Our task is only half complete. We have shown that in the XPD the preferences that evolve do not directly track the agents’ interests in the situation. But this failure of tracking may be the fault of our evolutionary device. To complete our argument, we should provide a background where tracking preferences do evolve, so that if the XPD case stands out as an exception, it does so for interesting reasons.

19

Cf. Skyrms 1996, chapter 1 and Axelrod 1984, chapter 2, where this process is called “ecological projection” because it involves no evolutionary change in the agents.

20

Indeed, there is nothing special about the initial population in which each possible preference is represented equally. Equality is less biased than giving greater representation to any particular preference(s), of course, but a fuller analysis could consider all variations on the initial population. However, this generality would complicate our model beyond our introductory purposes.


To calibrate our model we test it in a more general set of situations. Roughly, we construct a set of situations with an arbitrary structure, in order to test whether preferences will track this structure of interests. More precisely, we embed the XPD in a set of thirteen situations that can be distinguished with our 4-bit preferences.21 Now we ask: if agents are presented with the situations having the reproductive payoffs in Figure 6, what preferences will the agents evolve? According to the tracking hypothesis, the surviving, fittest, agents should have preferences that match their interests. In particular, the situation in Figure 6 line 1 (R=3, S=2, T=1, P=0) should lead to the preferences in line 1 (R>S>T>P or <CCCC>), etc.22 To test this, we need to complicate our model in two respects. First, we need to give our agents the ability to differentiate the situations. We do this by the simple expedient of concatenating thirteen four-bit choice vectors per agent. The agents in this model are “combinatorial, meaning that all possible conditions … are explicitly associated with a particular action (i.e. cooperate – C or defect – D)” (Crowley et al, 1996, p. 52). For example, an agent whose choice vector begins DDDD CCDD... has all-D preferences in the first situation, AG preferences in the second situation, and so on.

             Preferences                              Situations: Payoff Values
#    Label   Ordering    Choice Vector    Index #      R    S    T    P
1    OR      R>S>P>T     <CCCC>            0           3    2    1    0
2            R>P>S>T     <CCCD>            1           3    1    0    2
3            R>T>S>P     <CCDC>            2           3    1    2    0
4    AG      R>T>P>S     <CCDD>            3           3    0    2    1
5    ~PD     S>P>R>T     <CDCC>            4           1    3    0    2
6            P>R>T>S     <CDDD>            7           2    0    1    3
7            S>T>R>P     <DCCC>            8           1    3    2    0
8    CK      T>R>S>P     <DCDC>           10           4    1    5    0
9    PD      T>R>P>S     <DCDD>           11           2    0    3    1
10   BS      S>T>P>R     <DDCC>           12           0    3    2    1
11           P>S>T>R     <DDCD>           13           0    2    1    3
12           T>S>P>R     <DDDC>           14           0    2    3    1
13   ~OR     P>T>S>R     <DDDD>           15           0    1    2    3

Figure 6 Preferences and Situations

Notice, incidentally, that there is no direct link between the interests structuring a situation and the agents’ preferences. That is, different situations point to different parts of a player’s preference vector, but give the players no other information. For example, the XPD is signaled only as situation number 9, which the players use as a pointer to their ninth choice vector. All information about interests is conveyed by the effects of the payoffs on differential reproduction. We arrange the situations in a particular order – the OR situation is first, etc. – to make our test for tracking simpler.
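Two details above can be checked mechanically: which 4-bit vectors correspond to some strict ordering of R, S, T and P (note 21 says two do not), and how a situation number points into an agent's concatenated vector. The sketch below brute-forces all 24 orderings; the helper names are illustrative, and writing a genome as a plain C/D string is an assumption of the sketch.

```python
from itertools import permutations, product

# Canonical pairs (Figure 3 order); C means the first (cooperative) outcome is preferred.
PAIRS = [("R", "T"), ("R", "P"), ("S", "T"), ("S", "P")]


def is_consistent(vec):
    """True if some strict ordering of R, S, T, P produces this C/D vector."""
    for order in permutations("RSTP"):
        rank = {label: i for i, label in enumerate(order)}
        if all((rank[c] < rank[d]) == (bit == "C") for (c, d), bit in zip(PAIRS, vec)):
            return True
    return False


def situation_slice(genome, k):
    """Choice vector an agent uses in situation k (1-based): 4 bits per situation."""
    return genome[4 * (k - 1):4 * k]


vectors = ["".join(bits) for bits in product("CD", repeat=4)]
inconsistent = [v for v in vectors if not is_consistent(v)]
print(len(vectors) - len(inconsistent), inconsistent)  # 14 ['CDDC', 'DCCD']
print(situation_slice("DDDDCCDD", 1), situation_slice("DDDDCCDD", 2))  # DDDD CCDD
```

The two rejected vectors are exactly the cyclic ones named in note 21, and the XPD, as situation 9, would be read from `situation_slice(genome, 9)`.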

21

There are only 14, not 16, for two reasons. First, 2 preferences – <CDDC> (i.e. R>T, P>R, T>S and S>P, which, given transitivity, entails P>R>T>S>P) and <DCCD> (with a similar problem) – are inconsistent, so they cannot be represented as cardinal sets of interests. Agents can be inconsistent; the environment cannot be. Note as well that our ordinal preference representation underdetermines the cardinal situation outcomes. For example, the preferences in row 11 are the inverse of the AG game, but I have selected the interests to create a Battle of the Sexes situation (hence BS preferences). Second, with a tournament of situations, each situation should have the same range of payoffs to give them equal weight in the result. But using the {3,2,1,0} values leaves the Chicken situation poorly defined, with two optimum outcomes: CC and alternating CD/DC. To correct this problem, we re-scaled the Chicken outcomes to the values {5,4,1,0} and used this one situation in place of the original two (Chicken and its mirror) so that it would not have extra weight in the tournament.

22

Note that the labels R,S,T, and P lose their original sense (see Figure 2) once we generalize beyond the XPD situation. I retain them for simplicity; Sen uses suitably abstract notation.


4.1 A Genetic Algorithm

Our brute force method of constructing agents for multiple situations suffers from combinatorial explosion. There are 2^(13 × 4) = 2^52 different agents, so clearly we cannot use the replicator dynamics of section 3.2 to test them explicitly. Instead we use Holland’s (1992a,b) genetic algorithm (GA) to explore this large search space. An initial population consists of 100 agents of 56 random bits. Each agent plays each of the (99) others in each of the P1 and P2 roles for 14 different situations. The players are ranked by their total scores. The top quarter are copied directly into the next generation (to provide continuity for the co-evolutionary model), one quarter are selected proportionally to fitness, one quarter are created by crossover (from two parents each, selected proportionally to fitness) and one quarter are subject to single point mutation of agents selected proportionally to fitness. Note that although our players each play many games (99 x 13 x 2), these games are not iterated, because the players have no cognitive means to connect earlier and later games.23
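The generation step just described can be sketched as follows. This is a hypothetical reconstruction, not the author's code: it uses the thirteen situations of Figure 6 (so each genome here has 52 bits rather than the 56 mentioned above), assumes one-point crossover, and uses a toy population of 20 for speed. All function names are illustrative.

```python
import random

# Payoffs (R, S, T, P) of the thirteen situations in Figure 6, in table order.
SITUATIONS = [(3, 2, 1, 0), (3, 1, 0, 2), (3, 1, 2, 0), (3, 0, 2, 1),
              (1, 3, 0, 2), (2, 0, 1, 3), (1, 3, 2, 0), (4, 1, 5, 0),
              (2, 0, 3, 1), (0, 3, 2, 1), (0, 2, 1, 3), (0, 2, 3, 1),
              (0, 1, 2, 3)]
BITS = 4 * len(SITUATIONS)  # 52 bits per genome in this sketch
BIT = {("R", "T"): 0, ("R", "P"): 1, ("S", "T"): 2, ("S", "P"): 3}
LABELS = {("C", "C"): ("R", "R"), ("C", "D"): ("S", "T"),
          ("D", "C"): ("T", "S"), ("D", "D"): ("P", "P")}


def play(v1, v2, pay):
    """One extended game: v1 moves first (P1), v2 replies (P2); returns both payoffs."""
    value = dict(zip("RSTP", pay))
    reply = {"C": v2[BIT[("R", "T")]], "D": v2[BIT[("S", "P")]]}
    if_c = "R" if reply["C"] == "C" else "S"  # P1's outcome if it cooperates
    if_d = "T" if reply["D"] == "C" else "P"  # P1's outcome if it defects
    m1 = v1[BIT[(if_c, if_d)]]
    o1, o2 = LABELS[(m1, reply[m1])]
    return value[o1], value[o2]


def scores(pop):
    """Each genome's total payoff against every other genome, both roles, every situation."""
    total = [0.0] * len(pop)
    for i, a in enumerate(pop):
        for j, b in enumerate(pop):
            if i == j:
                continue
            for k, pay in enumerate(SITUATIONS):
                va, vb = a[4 * k:4 * k + 4], b[4 * k:4 * k + 4]
                total[i] += play(va, vb, pay)[0]  # a in role P1
                total[i] += play(vb, va, pay)[1]  # a in role P2
    return total


def next_generation(pop, rng):
    """Quarters: elite copies, fitness-proportional copies, crossover children, point mutants."""
    fit = scores(pop)
    ranked = [g for _, g in sorted(zip(fit, pop), key=lambda t: t[0], reverse=True)]
    q = len(pop) // 4

    def pick(k):
        return rng.choices(pop, weights=fit, k=k)

    elite = ranked[:q]
    copies = pick(q)
    children = []
    for _ in range(q):
        a, b = pick(2)
        cut = rng.randrange(1, BITS)  # one-point crossover (an assumption)
        children.append(a[:cut] + b[cut:])
    mutants = []
    for g in pick(len(pop) - 3 * q):
        i = rng.randrange(BITS)  # flip one randomly chosen bit
        mutants.append(g[:i] + ("D" if g[i] == "C" else "C") + g[i + 1:])
    return elite + copies + children + mutants


rng = random.Random(0)
pop = ["".join(rng.choice("CD") for _ in range(BITS)) for _ in range(20)]
for _ in range(3):
    pop = next_generation(pop, rng)
```

Copying the top quarter unchanged is what keeps the co-evolving opponent pool from shifting too abruptly between generations; the other three quarters supply variation.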

4.2 Predictions and Results

Our GA model has two jobs: first, to calibrate and confirm our appeal to the evolution of preferences; second, to extend that appeal to the many-situation case. For the first task, we predict that the GA model will show general improvement, that is, that the preferences will adapt to the situations. They clearly do. For example, in a typical run of 200 rounds, the best of the initial random players scored an average of 2.09 in each of its 13 x 2 x 99 = 2574 interactions, while in the last round this had increased to 2.61 (out of a maximum near 3, in a population where half the agents are newly created and can therefore be expected to make errors).

For the second task, we predict that preferences will track interests, except in the social dilemmas, PD and Chicken (see Figure 6, lines 5, 8, and 9). The results of four runs of 200 rounds each, with 100 agents, are fairly clear: preferences generally tracked interests. To display these results, we use a binary index scheme, reading situations and preferences as binary numbers with D=1 and C=0. Figure 7 displays a line on which preferences that tracked interests perfectly would fall. (The bumps are due to the situations we deleted; cf. note 21.) We chart the preferences of the best player in the last round of each of the four runs. The distance is adjusted to be the difference in bits between the target and the result.24 In every run, preferences differed from interests in the PD, its inverse (where C is defecting) and Chicken; AG preferences (or their inverse) evolved. The PD situations led to AG preferences (point 11); Chicken (point 10) led 3 out of 4 times to AG preferences (the three below the target line) with a willingness to accept S in role P2, which I have called “broad compliance” in Danielson (1992, chapter 9). That is, evolved agents promise to reciprocate cooperation but do not threaten.
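The binary index and the bit-distance adjustment can be illustrated with a short sketch. It assumes a preference or situation is written as a four-letter choice string such as "CCCD"; the text does not say which end of the string is most significant, so this sketch reads the leftmost letter as the least significant bit. The names `to_index` and `bit_distance` are hypothetical. The point it shows is the one made in note 24: two orderings that each differ from the target by a single bit can sit far apart or close together on the numeric index.

```python
def to_index(ordering):
    """Read a choice string such as "CCCD" as a binary number, D=1 and C=0.

    Assumption: the leftmost letter is the least significant bit.
    """
    bits = ordering.replace("C", "0").replace("D", "1")
    return int(bits[::-1], 2)


def bit_distance(a, b):
    """Number of positions at which two orderings differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("orderings must have the same length")
    return sum(x != y for x, y in zip(a, b))


target = "CCCC"
for result in ("CCCD", "DCCC"):
    print(result,
          "index gap:", abs(to_index(result) - to_index(target)),
          "bit distance:", bit_distance(result, target))
# CCCD index gap: 8 bit distance: 1
# DCCC index gap: 1 bit distance: 1
```

Plotting by bit distance rather than by index gap keeps a one-bit deviation from looking like a large one.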

[Figure 7 plot: axes Interests and Preferences (binary index, 0 to 14); series Target, Run-1, Run-2, Run-3, Run-4; marked situations OR, ~PD, PD.]

23

This point meets my concern in Danielson (1992, p. 201) that adding more games would make the games iterated. Of course, one can only have this assurance of independence for cognitively simple agents. Agents with internal state, say agents who learn via neural networks or classifier rules, may connect several games.

24

That is, choice orderings <CCCD> and <DCCC> each differ from <CCCC> by one bit, but a plot by binary index would make the first = 9 and the second = 1, and thus exaggerate the difference in the case of the first.


Figure 7 Preferences Track Interests

4.3 Further Work

Our models simplify both situations and agents. On the situation side, we have tested the tracking hypothesis for only a small sample of situations: those distinguishable by our original preference set.25 The test could be extended to additional situations. On the agent side, our players are cognitively simple; they simply evolve preferences for each situation their built-in sensors differentiate. On the other hand, they are profligate with cognitive resources; they need 13 x 4 bits to encode a simple function: “prefer your interests except in the case of XPD and Chicken”. And, of course, they pay (via costly opening rounds) for learning so little in such a backhanded way. So we might conjecture that if agents had to pay for cognitive resources, or for this learning, they might make do with the simple function identifying preferences and interests, or, even better, a direct sensor for interests. Whether they would bother to have (evolve or learn) a special function for the exceptions would depend on the frequency of the exceptions and the difficulty of finding a simple function to identify them.26

Identifying preferences and interests might appeal to defenders of rational choice. Can they say, “We told you so”? Not quite. If agents (of some cognitive type, in some situations) prosper by identifying preferences and interests, including PD preferences in the XPD situation, they are rational (in tracking) but they are making a pragmatic mistake in their particular situation. They would do better to have AG preferences in the XPD situation. Their mistake is explicable in terms of their cognitive limitations and the environments that they face. Nonetheless it remains a mistake, similar to that made by agents who remained committed to cooperation in the one-shot PD, having conflated it with the iterated case.27

5 Irrationality?

Have we demonstrated the evolution of irrationality as promised in the title? That depends on what rationality requires. Consider a weak, subjective conception of rationality, which requires only that behavior be consistent with an agent’s beliefs and preferences. Our evolved agents do not violate this standard of rationality. Indeed, I have gone to some lengths to incorporate it into their design. That is, given their preferences, our agents choose the preferred outcome, constructing beliefs about other agents’ behavior on the assumption that other agents will be rational as well. But if rationality requires, in addition, that one’s preferences track one’s objective values in the situation, then our agents are not rational. Agents with preferences that fail to track interests, in the case of the extended PD and Chicken, will do better, in terms of those very interests.

Or, to put this point another way, our model shows rationality in the subjective sense to be an incomplete theory. It can allow that our agents just happen to interpret the PD and Chicken as the AG (and its tolerant variant), and that given these preferences, our agents are rational. But it cannot explain why the PD and Chicken induce these distorted preferences, although this looks like something that a full theory of instrumental choice should be able to explain. I have argued that evolutionary models provide a way to extend the theory towards a more complete pragmatic account of rationality and its alternatives.

We conclude that rational choice is not a natural assumption. That is, we do not have reason to expect that rational agency will evolve in all situations (cf. Sober 1997, Skyrms 1996, and Börgers 1996). In particular we should not expect it to evolve in social dilemmas like the Prisoner’s Dilemma and Chicken. But we prefer to stress the positive message: evolutionary models provide a pragmatic account for some

25

For example, all of our situations are symmetrical around the NW/SE diagonal axis and (therefore) there are no constant sum games represented.

26

The latter suggests a role for moral theory in discovering the needed functions and the former, ironically, a second order moral interest in increasing the proportion of situations that are social dilemmas, in order to motivate the use of these functions.

27

Skyrms (1996, p. 28) discusses the related case, where “[t]he sequential problem of dividing a cake in ultimatum bargaining may not seem to subjects much different than the problem where claims are simultaneous and submitted independently. In the latter case, fair division is a perfectly acceptable game theoretic solution. Perhaps subjects generalize from the simultaneous case to the sequential case.”


interesting deviations from rationality.28 We can predict what sort of agents, rational and irrational, will prosper in some problematic situations.

References

Axelrod, Robert. 1987. “The evolution of strategies in the iterated prisoner’s dilemma.” In L. Davis, ed., Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Angeles.

Binmore, Kenneth. 1994. Game Theory and the Social Contract, Vol. 1: Playing Fair. MIT Press, Cambridge, Mass.

Börgers, Tilman. 1996. “On the Relevance of Learning and Evolution to Economic Theory.” Working Paper, Centre for Economic Learning and Social Evolution, University College London.

Brooks, Rodney. 1991. “Intelligence without reason.” In Proceedings of the 12th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, Los Angeles.

Clark, Andy. 1997. Being There: Putting Brain, Body, and World Together Again. MIT Press, Cambridge, Mass.

Crowley, P.H., Provencher, L., Sloane, S., Dugatkin, L.A., Spohn, B., Rogers, B.L. and Alfieri, M. 1996. “Evolving cooperation: the role of individual recognition.” BioSystems 37: 49-66.

Danielson, Peter. 1988. “The visible hand of morality.” The Canadian Journal of Philosophy 18: 357-84.

--- 1991. “Is Game Theory Good for Instrumental Ethics?” Presented at a symposium on Game Theory at the American Philosophical Association Pacific Division meeting, San Francisco.

--- 1992. Artificial Morality. Routledge, London.

--- 1997a. “The Co-Evolution of Rationality and Constraint.” In Chris Morris and Arthur Ripstein, editors, <Title Withheld>.

--- 1997b. “How to Evolve Something Better than Rationality or Ethics.” The First International Graduate Student Conference on Evolutionary Perspectives in the Social Sciences and Humanities, University of B.C., October 3-4.

Gauthier, David. 1986. Morals by Agreement. Oxford University Press, Oxford.

Heath, Joseph. 1996. “A Multi-Stage Game Model of Morals by Agreement.” Dialogue XXXV: 529-52.

Hegselmann, R. 1996. “Social Dilemmas in Lineland and Flatland.” In W. Liebrand and D. Messick, eds., Frontiers in Social Dilemmas Research. Springer, Berlin.

Holland, John. 1992a. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 2nd ed. MIT Press, Cambridge, Mass.

--- 1992b. “Genetic algorithms.” Scientific American 267(1).

Kollock, Peter. 1997. “Transforming Social Dilemmas: Group Identity and Co-operation.” In P. Danielson, ed., Modeling Rationality, Morality and Evolution. Oxford University Press, New York.

Satz, Debra and John Ferejohn. 1994. “Rational choice and social theory.” Journal of Philosophy 91(2): 71-87.

Sen, Amartya. 1974. “Choice, Orderings and Morality.” In S. Körner, ed., Practical Reason. Blackwell, Oxford.

Simon, Herbert. 1969. The Sciences of the Artificial. MIT Press, Cambridge, Mass.

Skyrms, Brian. 1996. Evolution of the Social Contract. Cambridge University Press, Cambridge.

Sober, Elliott. 1997. “Three Differences between Deliberation and Evolution.” In P. Danielson, ed., Modeling Rationality, Morality and Evolution. Oxford University Press, New York.

28


Watkins, J.W.N. 1974. “Comment: ‘Self-interest and Morality’.” In S. Körner, ed., Practical Reason. Blackwell, Oxford.

Wilson, David S. and Elliott Sober. 1994. “Reintroducing group selection to the human behavioral sciences.” Behavioral and Brain Sciences 17: 585-654.