Lecture Notes 12: The Repeated Prisoners’ Dilemma
Finitely Repeated Prisoners’ Dilemma
Recall the prisoners’ dilemma, given below. The $C$ strategy is called “cooperate” and the $D$ strategy is called “defect”.
          C         D
C       2, 2     -1, 3
D       3, -1     0, 0
$D$ is a strictly dominant strategy for both players and so $(D, D)$ is the only Nash Equilibrium. This is perhaps the most disconcerting result in all of game theory. The $(C, C)$ outcome is better for both players, so it seems like there should be some way to support it.
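The dominance claim can be checked mechanically. The sketch below is only an illustration: the names `PAYOFFS`, `is_strictly_dominant`, and `pure_nash_equilibria` are made up for these notes, and the table is read with the row player's payoff listed first.

```python
# Payoffs from the table above: PAYOFFS[(row, col)] = (row payoff, column payoff).
# Assumption: the first number in each cell belongs to the row player.
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}
ACTIONS = ("C", "D")


def is_strictly_dominant(action, player):
    """True if `action` pays strictly more than every alternative,
    whatever the other player does. player 0 = row, player 1 = column."""
    for other in ACTIONS:
        if other == action:
            continue
        for opp in ACTIONS:
            mine = (action, opp) if player == 0 else (opp, action)
            alt = (other, opp) if player == 0 else (opp, other)
            if PAYOFFS[mine][player] <= PAYOFFS[alt][player]:
                return False
    return True


def pure_nash_equilibria():
    """All action profiles from which neither player gains by deviating alone."""
    equilibria = []
    for a in ACTIONS:
        for b in ACTIONS:
            row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(x, b)][0] for x in ACTIONS)
            col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, y)][1] for y in ACTIONS)
            if row_ok and col_ok:
                equilibria.append((a, b))
    return equilibria


print(is_strictly_dominant("D", 0), is_strictly_dominant("D", 1))
print(pure_nash_equilibria())
```

Enumerating every cell is feasible here only because the game is 2×2; the point is simply that no cell other than the mutual-defection cell survives the unilateral-deviation check.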
In a one-shot game, the problem seems hopeless. $(C, C)$ cannot be an equilibrium because each player has an unavoidable incentive to unilaterally deviate to $D$ and increase his payoff from 2 to 3. But what if the game is repeated many times? Might players be able to keep cooperation going?
Suppose that the same two players repeat this prisoners’ dilemma 100 times. Your first guess might be that the players can cooperate early on because there is a penalty to defecting. The other player can punish a defector in future periods, right? But does this argument hold up? Let’s apply backwards induction to this game.
• In period 100 players must play the Nash Equilibrium $(D, D)$. There can be no punishment for defecting in the last period because the game is over. Thus, it cannot be an equilibrium for any player to play $C$ in the last period because he could unilaterally deviate to $D$ and increase his payoff with no possibility of a future consequence. The last period of a repeated game must always involve a Nash Equilibrium of the one-shot game.
• In period 99 players realize that the only possible outcome in period 100 is $(D, D)$ no matter what they do in period 99. As a result, there is no way to enforce cooperation in period 99. It is not an equilibrium for any player to cooperate in period 99 because there is no consequence for defecting and increasing his payoff from 2 to 3. He receives a payoff of 0 in period 100 whether he defects or not. Equilibrium play in period 99 must be $(D, D)$.
• In period 98 the same logic applies. A player who defects in period 98 increases his payoff from 2 to 3 without any future consequence. After all, $(D, D)$ is played in periods 99 and 100 no matter what players do in period 98. So any subgame perfect equilibrium must have players playing $(D, D)$ in period 98 as well.
But this same argument can be iterated all the way back to period 1! The only subgame perfect equilibrium of a finitely repeated prisoners’ dilemma is for both players to defect in all periods.
The basic idea of the argument is just that cooperation unravels backwards because there is a known ending period. There is no reason to keep cooperating in period 100. Knowing this, there is no reason to keep cooperating in period 99 either, etc… The only reason not to defect is to improve your prospects in future periods, but that reason falls apart in period 100. But then since there is no cooperation in period 100 no matter what, there is no reason to cooperate in period 99, or period 98, or period 97, etc… all the way back to period 1.
Here is another demonstration. Suppose that there was a Nash Equilibrium that specified cooperation up to some period $J$ and then defection after that. If both players cooperate, this leads to a payoff stream $(2, 2, \ldots, 2, 2, 0, 0, \ldots, 0, 0)$. But suppose that our player unilaterally defects in period $J$. This leads to a payoff stream $(2, 2, \ldots, 2, 3, 0, 0, \ldots, 0, 0)$, which is strictly better. This is a profitable deviation, so there cannot be a Nash Equilibrium with this structure. No cooperation up to any point is possible.
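The two payoff streams in this argument can be written out and compared for every possible last cooperative period. This is a minimal sketch for the 100-period game with the stage payoffs 2, 3, and 0 from the table above; the function names are invented for illustration.

```python
T = 100  # number of repetitions, as in the example above


def cooperate_until(J, T=T):
    """Our player's stream when both players cooperate in periods 1..J
    and defect in periods J+1..T."""
    return [2] * J + [0] * (T - J)


def defect_in_period(J, T=T):
    """Same plan, except our player unilaterally defects in period J."""
    return [2] * (J - 1) + [3] + [0] * (T - J)


# The deviation gains exactly 1 no matter where cooperation was meant to stop.
gains = {J: sum(defect_in_period(J)) - sum(cooperate_until(J)) for J in range(1, T + 1)}
print(set(gains.values()))
```

Because the punishment phase is already all defection, defecting one period early costs nothing later, which is why the gain is the same for every $J$.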
Infinitely Repeated Prisoners’ Dilemma
Things are different for an infinite repetition of the prisoners’ dilemma. The unraveling argument above relied crucially on the last period – no cooperation is possible in the last period, so no cooperation is possible in the second-to-last period, etc… The key difference in an infinitely repeated game is that there is never a last period. At whatever period players find themselves, there are always more periods ahead. Now there is always a reason to cooperate because, in a game that goes on forever, there are always more periods ahead where players can face consequences for defecting. The reason that cooperation doesn’t work with finite repetition is because of that last period where players could not face any consequence for defection.
The key to cooperation is for players to use contingent strategies, which specify different actions depending upon the history of what was played in previous periods. The basic idea is to specify some punishment associated with defection. Here are two well-known contingent strategies that can be applied in repeated prisoners’ dilemmas:
• Tit-for-tat strategy: Cooperate in the first period. Thereafter, copy whatever your opponent did in the previous period.
• Grim trigger strategy: Cooperate in the first period. Thereafter, continue to cooperate as long as all players have cooperated in all previous periods. If any player ever defects, then all players defect forever after.
Grim trigger punishments are much more serious than tit-for-tat punishments. Tit-for-tat punishments allow a defecting player to return to a cooperative path as long as that defecting player starts cooperating again – once he starts cooperating, his opponent will copy his cooperation. With grim trigger strategies, as soon as there is a single defection, everyone defects forever after.
Again, note that both of these crucially rely on infinite repetition. Grim trigger strategies are no help in a finitely repeated game because, in the final period, there is no opportunity left for a punishment. In infinitely repeated games, there are always future periods available to punish a player who defects.
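To see the difference in severity between the two punishments, here is a small sketch that plays each strategy against an opponent who defects exactly once and otherwise cooperates. The function names and the six-period script are illustrative assumptions; tit-for-tat is coded as "start with C, then copy the opponent's previous move".

```python
def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]


def grim_trigger(own_history, opp_history):
    """Cooperate until anyone has ever defected, then defect forever."""
    return "D" if "D" in own_history or "D" in opp_history else "C"


def play_against_script(strategy, script):
    """Moves chosen by `strategy` when the opponent follows a fixed script."""
    mine, theirs = [], []
    for opp_move in script:
        mine.append(strategy(mine, theirs))  # choose before seeing today's move
        theirs.append(opp_move)
    return "".join(mine)


script = "CCDCCC"  # hypothetical opponent: a single defection in period 3
print(play_against_script(tit_for_tat, script))
print(play_against_script(grim_trigger, script))
```

Against this script, tit-for-tat punishes for a single period and then returns to cooperation, while grim trigger never cooperates again.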
Discounting
If players cooperate forever in the infinitely repeated version of the prisoners’ dilemma above, the payoff stream is {2,2,2,2, … }. Unfortunately, the value of this payoff stream is not well-defined because it adds up to infinity. Thus, evaluating payoffs in infinitely repeated games requires discounting, which just means that future payoffs are worth less than current payoffs, and increasingly less as the payoff gets farther and farther in the future. This is a well-accepted idea. In finance and in economics generally, it is well-understood that a payoff of $100 far in the future is worth less than a payoff of $100 today.
As such, we let $\delta$ indicate the discount factor. Using this discount factor, a payoff stream of $\{2, 2, 2, 2, \ldots\}$ is actually worth $\{2, 2\delta, 2\delta^2, 2\delta^3, \ldots\}$ in today’s valuation, with $0 < \delta < 1$.
The idea is that the discount factor $\delta$ measures patience. If $\delta$ is very close to 1, then the future payoffs $\{2\delta, 2\delta^2, 2\delta^3, \ldots\}$ are worth almost as much as the current payoff of 2. These players are patient since they place a lot of value on future payoffs. But if $\delta$ is closer to 0 then the future payoffs $\{2\delta, 2\delta^2, 2\delta^3, \ldots\}$ are worth very little compared to the current payoff of 2.¹ This represents impatient players since their future payoffs are not worth very much compared to payoffs today, meaning that players don’t care much what their payoffs are in the future.
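A quick numerical illustration of patience: the sketch below discounts a constant payoff of 2 for a patient and an impatient player. The helper name `discounted_stream` and the two sample discount factors are chosen only for illustration.

```python
def discounted_stream(payoff, delta, periods):
    """Today's value of receiving `payoff` in each of the first `periods` periods."""
    return [payoff * delta ** t for t in range(periods)]


patient = [round(v, 2) for v in discounted_stream(2, 0.98, 4)]
impatient = [round(v, 2) for v in discounted_stream(2, 0.6, 4)]
print(patient)    # [2.0, 1.96, 1.92, 1.88]
print(impatient)  # [2.0, 1.2, 0.72, 0.43]
```

With $\delta = 0.98$ the fourth payoff still retains most of its value, while with $\delta = 0.6$ it has lost most of it.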
An important mathematical result is the infinite geometric series. For any $0 < \delta < 1$:
$$1 + \delta + \delta^2 + \delta^3 + \cdots = \frac{1}{1 - \delta}$$
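The formula can be sanity-checked numerically by comparing a long partial sum with the closed form. This is only a sketch; the function names and the number of terms are arbitrary choices.

```python
def partial_sum(delta, n_terms):
    """1 + delta + delta^2 + ... with n_terms terms."""
    return sum(delta ** t for t in range(n_terms))


def closed_form(delta):
    """Limit of the geometric series for 0 < delta < 1."""
    return 1 / (1 - delta)


for delta in (0.6, 0.98):
    print(delta, partial_sum(delta, 2000), closed_form(delta))
```

The closer $\delta$ is to 1, the more terms are needed before the partial sum gets near the limit, which is another way of seeing that patient players put real weight on the distant future.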
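¹ If $\delta = 0.98$, then the payoff stream $\{2, 2\delta, 2\delta^2, 2\delta^3, \ldots\}$ is worth $\{2, 1.96, 1.92, 1.88, \ldots\}$. But if $\delta = 0.6$, then the payoff stream is worth $\{2, 1.2, 0.72, 0.43, \ldots\}$.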
Checking Contingent Strategies in a Repeated Game
There are a multitude of ways to deviate from an equilibrium strategy in an infinitely repeated game. After all, there are an infinite number of periods in which players might consider deviating. Do we have to check all these ways to see whether a strategy constitutes an equilibrium? Luckily, no. We only need to check for one particular kind of deviation to determine whether a strategy in an infinitely repeated game is a subgame perfect equilibrium strategy.
According to the one-shot deviation principle (OSDP), to check whether a player will deviate from a strategy in an infinitely repeated game, it is enough to check a one-shot deviation, followed by the specified strategy thereafter. The OSDP greatly simplifies analysis of strategies in repeated games because it tells us that we only have to check for one kind of deviation.
We will now apply the OSDP to check tit-for-tat and grim-trigger strategies in an infinitely repeated prisoners’ dilemma. Consider the following variation on a prisoners’ dilemma. $(D, D)$ is the Nash Equilibrium, but we might be interested in finding a way to sustain the $(C, C)$ outcome.
           C           D
C       90, 90      70, 100
D      100, 70      75, 75
Tit-for-tat
Let’s first check whether cooperation in a tit-for-tat strategy constitutes an equilibrium. Specifically, the strategy for each player is: “Begin by playing $C$. Thereafter, copy whatever your opponent did in the previous period.”
By cooperating, players receive 90 in every period, so the value of this payoff stream is:
$$\Pi_{\text{cooperate}} = 90 + 90\delta + 90\delta^2 + 90\delta^3 + \cdots$$
For defection, OSDP tells us that we only need to check for a one-shot defection. By deviating and playing $D$, a player can get a payoff of 100 today. His opponent punishes him the next period by playing $D$. But the deviating player only deviates once, so he returns to playing $C$, meaning that he accepts a payoff of 70 in the following period (his opponent plays $D$ while he plays $C$). Thereafter, our player’s opponent emulates his cooperation, so the payoff is back to 90 again. Putting this all together, the value of our player’s payoff stream from deviating is:
$$\Pi_{\text{deviate}} = 100 + 70\delta + 90\delta^2 + 90\delta^3 + \cdots$$
Thus, to determine whether perpetual cooperation is an equilibrium with tit-for-tat strategies, we need to check whether the value of the cooperative payoff stream exceeds the value of the payoff stream associated with deviating:
$$\Pi_{\text{cooperate}} \geq \Pi_{\text{deviate}}$$
$$90 + 90\delta + 90\delta^2 + 90\delta^3 + \cdots \geq 100 + 70\delta + 90\delta^2 + 90\delta^3 + \cdots$$
$$90 + 90\delta \geq 100 + 70\delta$$
$$\delta \geq \frac{1}{2}$$
In words, as long as the discount factor satisfies $\delta \geq \frac{1}{2}$, cooperating permanently with the tit-for-tat equilibrium is more profitable than deviation, and so both players can permanently cooperate using tit-for-tat strategies in a subgame perfect equilibrium of this game. But if $\delta < \frac{1}{2}$, then it is more profitable for a player to defect than to follow the cooperative strategy, so players will not cooperate using tit-for-tat strategies.
The necessary condition on the discount factor for an equilibrium usually looks something like this. In words, cooperation is an equilibrium when $\delta$ is sufficiently high. This makes good intuitive sense. When $\delta$ is high, players are patient and place a high value on future payoffs. In this case, it is not worth defecting today because you suffer a future punishment, and the future is important for patient players. Cooperating today makes sense. But when $\delta$ is close to 0, players are impatient and they don’t value future payments very much. In a setting like this, defecting today and getting a higher payoff makes sense because the reduced value of the future punishment payoffs is unimportant. Thus, there can be no equilibrium with cooperation. Defection is too attractive.
Cooperation is possible when players are patient but not when players are impatient. Thus, the condition for an equilibrium with cooperation is normally some lower bound on the discount factor $\delta$. Perpetual cooperation is possible as long as $\delta$ exceeds some critical value.
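The tit-for-tat comparison can be reproduced with exact arithmetic. The sketch below values the two streams exactly as written in these notes (90 forever versus 100, then 70, then 90 forever) using the geometric series formula; the helper `value` and the sample discount factors are assumptions made for illustration.

```python
from fractions import Fraction


def value(head, tail, delta):
    """Present value of the stream head[0], head[1], ..., followed by
    `tail` in every later period forever (requires 0 < delta < 1)."""
    n = len(head)
    pv_head = sum(x * delta ** t for t, x in enumerate(head))
    return pv_head + tail * delta ** n / (1 - delta)


def cooperate_value(delta):
    """90 in every period."""
    return value([], 90, delta)


def deviate_value_tft(delta):
    """100 today, 70 next period, then 90 forever (the stream used in the notes)."""
    return value([100, 70], 90, delta)


for delta in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
    print(delta, cooperate_value(delta), deviate_value_tft(delta))
```

At $\delta = \frac{1}{2}$ the two values coincide, below it deviation is worth more, and above it cooperation is worth more, matching the threshold derived above.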
Grim Trigger
Let’s check now whether cooperating permanently is a subgame perfect equilibrium in the infinitely repeated version of this game using grim trigger strategies. Specifically, the strategy followed by each player is: “Begin by playing $C$. Continue to play $C$ as long as every player has played $C$ in every previous period. If there is ever any defection, play $D$ forever thereafter.”
By cooperating, players receive a payoff of 90 in every period:
$$\Pi_{\text{cooperate}} = 90 + 90\delta + 90\delta^2 + 90\delta^3 + \cdots$$
Using the OSDP, we check for the player deviating immediately and getting a payoff of 100. Under a grim trigger strategy, the consequence of defection is to get a payoff of 75 in every period forever after the defection, since the grim trigger specifies both players using $D$ forever after the defection. Thus, the payoff stream from a defection under a grim trigger strategy is:
$$\Pi_{\text{deviate}} = 100 + 75\delta + 75\delta^2 + 75\delta^3 + \cdots$$
Again, to check whether perpetual cooperation is an equilibrium using a grim trigger strategy, we need to check whether the cooperative payoffs exceed the deviation payoffs. Here’s where the expression for the sum of a geometric series comes in handy.
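$$\Pi_{\text{cooperate}} \geq \Pi_{\text{deviate}}$$
$$90 + 90\delta + 90\delta^2 + 90\delta^3 + \cdots \geq 100 + 75\delta + 75\delta^2 + 75\delta^3 + \cdots$$
$$90(1 + \delta + \delta^2 + \delta^3 + \cdots) \geq 100 + 75\delta(1 + \delta + \delta^2 + \delta^3 + \cdots)$$
$$90\left(\frac{1}{1 - \delta}\right) \geq 100 + 75\delta\left(\frac{1}{1 - \delta}\right)$$
$$90 \geq 100(1 - \delta) + 75\delta$$
$$\delta \geq \frac{2}{5}$$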
So it is an equilibrium for both players to play $(C, C)$ forever in a subgame perfect equilibrium of the infinite repetition of this game using grim trigger strategies as long as their discount factors satisfy $\delta \geq \frac{2}{5}$. The logic is similar to the tit-for-tat example. When $\delta$ is low, getting the 100 today is worth a lot and players don’t care very much about the low payoffs in the future, so cooperation is not possible. But when $\delta \geq \frac{2}{5}$, cooperation is possible. Players don’t find it worthwhile to defect because it’s too costly in terms of future payoffs, which are more highly valued.
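The same algebra works for any prisoners’ dilemma under grim trigger. Writing the cooperative payoff as `reward`, the one-time defection payoff as `temptation`, and the mutual-defection payoff as `punishment` (labels introduced here, not used elsewhere in these notes), the inequality above rearranges to $\delta \geq (\text{temptation} - \text{reward}) / (\text{temptation} - \text{punishment})$. A minimal sketch:

```python
from fractions import Fraction


def grim_trigger_threshold(reward, temptation, punishment):
    """Smallest delta for which cooperating forever beats a one-shot defection
    followed by mutual defection forever."""
    return Fraction(temptation - reward, temptation - punishment)


def cooperate_value(reward, delta):
    """reward in every period."""
    return reward / (1 - delta)


def deviate_value(temptation, punishment, delta):
    """temptation today, then punishment in every later period."""
    return temptation + punishment * delta / (1 - delta)


threshold = grim_trigger_threshold(reward=90, temptation=100, punishment=75)
print(threshold)
print(cooperate_value(90, threshold), deviate_value(100, 75, threshold))
```

At the threshold the two values are equal, which is exactly where the weak inequality $\Pi_{\text{cooperate}} \geq \Pi_{\text{deviate}}$ starts to hold.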
Problems
1. Two players infinitely repeat the game below, and each player uses the following grim trigger strategy.
I. Play A initially and continue to play A as long as $(A, A)$ has been played in every previous period.
II. If any player has ever deviated from (I), then play B forever after.
          A        B
A       4, 4     0, 5
B       5, 0     1, 1
a. For what values of $\delta$ will players play A forever in this game?
b. Suppose the payoffs for $(B, B)$ increase to (3,3). For what values of $\delta$ will players play A forever in this new game?
c. Is cooperation easier in (a) or in (b)? Provide an intuitive explanation in terms of incentives.
2. Two firms are involved in an infinitely repeated game. If they cooperate on pricing, they can split the monopoly profit, each earning $500 per period. But if one firm undercuts slightly, then the firm that undercuts can collect the entire monopoly profit of $1000.
a. Suppose that the firms attempt to cooperate using Nash Reversion: They start by splitting the monopoly profit, but if anyone ever undercuts then the firms revert forever to the competitive outcome where both earn a profit of 0 forever after. For what values of $\delta$ will the firms be able to perpetually cooperate using this strategy?
b. Suppose now that the firms use a Nash Reversion strategy, but that it takes two periods to respond to a deviation. That is, if a firm cheats, the punishment cannot begin for two periods, allowing the cheater to collect $1000 for two periods. For what values of $\delta$ will the firms be able to perpetually cooperate now?