Lecture Notes 13: General Theory on Repeated Games

In the previous section, we looked at repeated games specifically in the context of the prisoners’ dilemma. In this section, we will develop some more general results.

Different Punishments

Consider the infinitely repeated version of the game below.

      A        B        C
A     10,10    3,15     0,7
B     15,3     7,7      -4,5
C     7,0      5,-4     -15,-15

Suppose we are interested in constructing strategies that will support the $(A, A)$ outcome played perpetually. We will consider two possibilities.

Nash Reversion

Notice that $(B, B)$ is a Nash Equilibrium and provides lower payoffs than $(A, A)$, so an obvious place to start is Nash Reversion. Use $(B, B)$ as a grim trigger threat to punish anyone who deviates from $(A, A)$. Specifically, the strategy for each player is: “Play $A$ in the first period and in every later period in which there has never been a deviation. If there is ever a deviation, play $B$ forever after.”

To check whether this strategy constitutes a subgame perfect equilibrium, we use the one-shot deviation principle. Cooperating forever gives a payoff stream:

$$\Pi_{\text{cooperate}} = 10 + 10\delta + 10\delta^2 + 10\delta^3 + \cdots$$

A deviating player can choose $B$ to get a payoff of 15 today, but after doing so both players revert to the Nash Equilibrium $(B, B)$ and get a payoff of 7 forever after (if the payoffs were not symmetric, you would have to see which player had the more profitable deviation). In this case, the payoff from deviating is:

$$\Pi_{\text{deviate}} = 15 + 7\delta + 7\delta^2 + 7\delta^3 + \cdots$$

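$$
\begin{aligned}
\Pi_{\text{cooperate}} &\ge \Pi_{\text{deviate}} \\
10 + 10\delta + 10\delta^2 + 10\delta^3 + \cdots &\ge 15 + 7\delta + 7\delta^2 + 7\delta^3 + \cdots \\
10(1 + \delta + \delta^2 + \delta^3 + \cdots) &\ge 15 + 7\delta(1 + \delta + \delta^2 + \delta^3 + \cdots) \\
10\left(\frac{1}{1-\delta}\right) &\ge 15 + 7\delta\left(\frac{1}{1-\delta}\right) \\
10 &\ge 15(1-\delta) + 7\delta \\
\delta &\ge 0.625
\end{aligned}
$$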
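The Nash Reversion threshold can be reproduced with a few lines of Python. This is a minimal sketch, assuming a symmetric game in which a deviator earns `deviate` once and `punish` in every later period; the names `grim_trigger_threshold` and `discounted_stream` are illustrative and do not come from the notes.

```python
from fractions import Fraction


def grim_trigger_threshold(cooperate, deviate, punish):
    """Smallest discount factor at which a grim trigger deters a one-shot deviation.

    Solves cooperate/(1-d) >= deviate + d*punish/(1-d) for d, assuming
    deviate > cooperate > punish (otherwise the formula is not meaningful).
    """
    if not deviate > cooperate > punish:
        raise ValueError("expected deviate > cooperate > punish")
    return Fraction(deviate - cooperate, deviate - punish)


def discounted_stream(first, rest, delta):
    """Value of receiving `first` today and `rest` in every later period."""
    return first + delta * rest / (1 - delta)


threshold = grim_trigger_threshold(cooperate=10, deviate=15, punish=7)
print(threshold, float(threshold))  # 5/8 0.625
```

At the threshold the two payoff streams are exactly equal, which `discounted_stream(10, 10, threshold)` and `discounted_stream(15, 7, threshold)` confirm.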

Carrot and Stick

Let’s try a different kind of punishment for deviation. Intuitively, the availability of the very bad outcome $(C, C)$ should give us some ability to construct more severe punishments. The following kind of strategy is known as a carrot and stick strategy.

1. Play $(A, A)$.

2. If there is any deviation from (1), play $(C, C)$ one time and then return to (1).

3. If there is any deviation from (2), then restart (2).

To determine whether this strategy is an equilibrium, we need to consider both deviations from the cooperative phase (1) and deviations from the punishment phase (2). This wasn’t an issue with the Nash Reversion strategy because the punishment phase was to play a Nash Equilibrium. And, by definition, there is no profitable deviation from a Nash Equilibrium. But, here, the punishment phase asks players to play $(C, C)$, which is not a Nash Equilibrium, so we need to check that too.

Let’s first use the one-shot deviation principle (OSDP) to check for deviations from the cooperative phase (1). By deviating to $B$ a player can get 15 today, but then must accept the punishment payoff of $-15$ for one period before returning to playing $A$. This is the best one-shot deviation from phase (1). Comparing the cooperative payoff against the deviation payoff:

$$
\begin{aligned}
10 + 10\delta + 10\delta^2 + 10\delta^3 + \cdots &\ge 15 - 15\delta + 10\delta^2 + 10\delta^3 + \cdots \\
10 + 10\delta &\ge 15 - 15\delta \\
\delta &\ge 0.2
\end{aligned}
$$

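Next, check for deviations from the punishment phase (2). A player who goes along with the punishment gets $-15$ today and then returns to the cooperative payoff of 10. The best one-shot deviation from $(C, C)$ is to play $A$, which yields 0 today but restarts the punishment, so the deviator gets $-15$ in the next period before cooperation resumes:

$$
\begin{aligned}
-15 + 10\delta + 10\delta^2 + 10\delta^3 + \cdots &\ge 0 - 15\delta + 10\delta^2 + 10\delta^3 + \cdots \\
-15 + 10\delta &\ge 0 - 15\delta \\
\delta &\ge 0.6
\end{aligned}
$$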

Summarizing, we need $\delta \ge 0.2$ for players not to deviate from the cooperative phase (1), and $\delta \ge 0.6$ for players not to deviate from the punishment phase (2). Both requirements have to hold, because the punishment in (2) is not credible unless carrying it out is itself equilibrium behavior once it is reached. The second requirement is more demanding, so overall this carrot and stick strategy will sustain the outcome $(A, A)$ forever as long as $\delta \ge 0.6$.
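Both carrot and stick conditions compare only two periods, because the payoff streams coincide from the third period on. The sketch below is one way to compute the two thresholds under that assumption; the helper name `one_shot_threshold` and its parameter names are hypothetical.

```python
from fractions import Fraction


def one_shot_threshold(on_path_today, on_path_tomorrow,
                       deviate_today, deviate_tomorrow):
    """Smallest delta satisfying
    on_path_today + delta*on_path_tomorrow >= deviate_today + delta*deviate_tomorrow.

    Assumes deviating gains today and loses tomorrow, so both differences
    below are positive.
    """
    gain_today = deviate_today - on_path_today
    loss_tomorrow = on_path_tomorrow - deviate_tomorrow
    if gain_today <= 0 or loss_tomorrow <= 0:
        raise ValueError("expected a gain today and a loss tomorrow")
    return Fraction(gain_today, loss_tomorrow)


# Phase (1): conform gives 10, 10; deviating to B gives 15, then -15.
cooperative_phase = one_shot_threshold(10, 10, 15, -15)
# Phase (2): conform gives -15, 10; deviating to A gives 0, then -15 again.
punishment_phase = one_shot_threshold(-15, 10, 0, -15)
# Both conditions must hold, so the binding threshold is the larger one.
overall = max(cooperative_phase, punishment_phase)
print(cooperative_phase, punishment_phase, overall)  # 1/5 3/5 3/5
```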

We have now considered two different strategies that can lead to the $(A, A)$ outcome being played forever in the infinitely-repeated game. The Nash Reversion strategy will generate this outcome for $\delta \ge 0.625$ but the carrot and stick strategy will work as long as $\delta \ge 0.6$. The carrot and stick strategy is therefore the “better” punishment because it will support cooperation for a wider range of discount factors.

The Folk Theorem

Let’s start with a simple example: the prisoners’ dilemma given below.

      C       D
C     2,2     -1,3
D     3,-1    0,0

Our question is this: which payoffs are actually possible in an equilibrium of the infinitely repeated game? Clearly the mutually cooperative payoff of $(2, 2)$ is possible as long as the discount factor $\delta$ is high enough, by using a Nash Reversion punishment.

But are there equilibrium strategies that would produce other payoffs too (e.g. by alternating, or by playing some correlated or mixed strategy)? The answer is yes. According to the folk theorem, for any feasible payoff vector in which both players get at least their minmax payoffs, there is some SPE of the infinitely repeated game that produces it, as long as $\delta$ is high enough.

The folk theorem is an existence result. It does not tell us how to construct strategies that generate any particular payoff vector. But the folk theorem does tell us that for any feasible payoff vector where both players get at least their minmax payoffs, there exists some strategy such that this payoff is the equilibrium payoff in an infinitely repeated game.

For our prisoners’ dilemma, the minmax payoffs are clearly $(\Pi_1^{\text{minmax}}, \Pi_2^{\text{minmax}}) = (0, 0)$. The worst thing player 2 can do to player 1 is to defect, to which player 1 will respond by defecting himself. Thus, the worst payoff that the players can impose on each other in this game is 0.
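The minmax reasoning for the prisoners’ dilemma can be sketched in code. This example restricts the punishing player to pure strategies, which is an assumption that happens to be harmless here because defecting is the harshest punishment; in general the punisher may need to mix. The function name `pure_minmax` is illustrative.

```python
def pure_minmax(own_payoffs):
    """Minmax payoff when the opponent is restricted to pure strategies.

    own_payoffs[i][j] is the player's payoff from own action i when the
    opponent plays action j. For each opponent action the player best
    responds (max over rows); the opponent picks the column that makes
    that best response as small as possible (min over columns).
    """
    n_opponent_actions = len(own_payoffs[0])
    return min(
        max(row[j] for row in own_payoffs)
        for j in range(n_opponent_actions)
    )


# Player 1's payoffs in the prisoners' dilemma: rows C, D; columns C, D.
player1 = [[2, -1],
           [3, 0]]
print(pure_minmax(player1))  # 0
```

Since the game is symmetric, the same table gives player 2's minmax payoff of 0.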

Accordingly, by the folk theorem, if the game is infinitely repeated, there is a SPE that would produce any feasible payoffs where both players get at least their minmax payoffs of 0. This set is sketched below. In other words, there is some strategy in the repeated game that will produce any payoff in this set, as long as the players are patient enough (sufficiently high discount factors $\delta$).

The shaded set is called the set of feasible, individually rational (FIR) payoffs. Feasible means that the payoff is attainable in the game (inside the convex hull that gives the feasible payoffs). Individually rational means that each player gets at least his minmax payoff.

The folk theorem tells us that there are equilibrium strategies in the infinitely repeated game that would produce any payoff vector in the FIR set sketched above, as long as the discount factor $\delta$ is high enough.

Applying the Folk Theorem: Another Example

Consider the infinitely repeated version of the following game. Our goal is to find the set of payoffs that can be obtained in an equilibrium of the infinitely repeated game.

      C      D
A     1,5    6,3
B     2,3    5,4

The set of feasible payoffs of this game is shown below. It is formed by the convex hull created by the pure strategy payoffs.
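The feasible set for this game can be computed rather than drawn. The sketch below uses Andrew's monotone chain algorithm, a standard convex hull method chosen here for illustration (the notes do not prescribe one), and then tests whether a payoff vector lies inside the hull. The function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def is_feasible(payoff, hull):
    """True if payoff lies inside or on a counterclockwise hull (3+ vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], payoff) >= 0 for i in range(n))


# Pure-strategy payoffs (player 1, player 2) of the stage game above.
stage_payoffs = [(1, 5), (6, 3), (2, 3), (5, 4)]
hull = convex_hull(stage_payoffs)
print(hull)  # [(1, 5), (2, 3), (6, 3), (5, 4)]
print(is_feasible((3, 4), hull), is_feasible((6, 5), hull))  # True False
```

All four pure-strategy payoffs turn out to be vertices of the feasible set, so it is a quadrilateral.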

Only those payoffs in which both players obtain at least their minmaxes are attainable in subgame perfect equilibrium, so our next task is to find the minmax payoffs for each player.

Consider player 1. His payoff from each of his two strategies is written below, as a function of player 2’s strategy choices.

$$\Pi_A = 1p_C + 6p_D \qquad \Pi_B = 2p_C + 5p_D$$

For any strategy of player 2, player 1 will choose the strategy giving him the highest payoff, so his payoff lies on the upper envelope of these two payoff functions. On the upper envelope, the lowest payoff for player 1 is obtained when player 2’s strategy is $p_C = 1$. In other words, player 2 minmaxes player 1 by using the pure strategy $C$. This results in player 1 getting a payoff of 2, so player 1’s minmax payoff is $\Pi_1^{\text{minmax}} = 2$.

We now need to find player 2’s minmax payoff. Her payoff from each of her two strategies is written below, again as a function of player 1’s strategy choices.

$$\Pi_C = 5p_A + 3p_B \qquad \Pi_D = 3p_A + 4p_B$$

Looking at the upper envelope of these payoff functions, the lowest payoff that player 1 can impose on player 2 comes from a mixed strategy, at the point where the two payoff functions cross. We can calculate precisely where $\Pi_C = \Pi_D$.

$$
\begin{aligned}
5p_A + 3p_B &= 3p_A + 4p_B \\
5p_A + 3(1 - p_A) &= 3p_A + 4(1 - p_A) \\
p_A &= \tfrac{1}{3}
\end{aligned}
$$

In words, player 1 minmaxes player 2 by using the strategy $\tfrac{1}{3}A + \tfrac{2}{3}B$. The corresponding payoff for player 2 can be calculated by plugging into either $\Pi_C$ or $\Pi_D$, since they are equal at this point.

$$\Pi_C = 5p_A + 3p_B = 5\left(\tfrac{1}{3}\right) + 3\left(\tfrac{2}{3}\right) = \tfrac{11}{3} \approx 3.67$$

$$\Pi_D = 3p_A + 4p_B = 3\left(\tfrac{1}{3}\right) + 4\left(\tfrac{2}{3}\right) = \tfrac{11}{3} \approx 3.67$$

We conclude that player 1 minmaxes player 2 by playing $\tfrac{1}{3}A + \tfrac{2}{3}B$ and that player 2’s minmax payoff is $\Pi_2^{\text{minmax}} = \tfrac{11}{3} \approx 3.67$.
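The upper-envelope argument can be checked numerically. The sketch below assumes the punishing player has exactly two actions and mixes with probability `p` on the first; because the upper envelope of linear payoff functions is convex and piecewise linear, its minimum must occur at `p = 0`, `p = 1`, or a crossing point, so only those candidates are examined. The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def minmax_two_action_punisher(victim_payoffs):
    """Return (p, value): the punisher's probability on its first action
    that minimizes the victim's best-response payoff, and that payoff.

    victim_payoffs[k] = (a, b): the victim's payoff from its k-th action
    when the punisher plays its first action (a) or second action (b).
    """
    candidates = {Fraction(0), Fraction(1)}
    for (a1, b1), (a2, b2) in combinations(victim_payoffs, 2):
        # Each payoff line is b + (a - b) * p; find where two lines cross.
        slope_difference = (a1 - b1) - (a2 - b2)
        if slope_difference != 0:
            p = Fraction(b2 - b1, slope_difference)
            if 0 <= p <= 1:
                candidates.add(p)

    def best_response_payoff(p):
        return max(a * p + b * (1 - p) for a, b in victim_payoffs)

    p_star = min(candidates, key=best_response_payoff)
    return p_star, best_response_payoff(p_star)


# Player 2 is the victim; p is the probability that player 1 plays A.
p_star, value = minmax_two_action_punisher([(5, 3), (3, 4)])
print(p_star, value)  # 1/3 11/3
```

Applying the same function to player 1's payoffs, `minmax_two_action_punisher([(1, 6), (2, 5)])`, returns a probability of 1 on $C$ and a payoff of 2, matching the earlier result.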

Finitely Repeated Games

We close our discussion of repeated games by returning briefly to finitely repeated games. In the finitely repeated prisoners’ dilemma, we established in the previous section that the only SPE is for all players to defect in all periods. However, other games can open up more possibilities even with finite repetition. For example, suppose that we repeat the game below two times. The pure-strategy equilibria of the stage game are $(B, B)$ and $(C, C)$.

      A      B      C
A     5,5    0,6    0,0
B     6,0    4,4    0,0
C     0,0    0,0    1,1

When the game is repeated twice, it is straightforward that the players must play a Nash Equilibrium in the last period. For example, it is not possible for players to play $(A, A)$ in the last period, because either player could unilaterally deviate to $B$ to increase his payoff. The game is over, so there can be no consequence to doing so.

We will restrict our attention here to pure strategies. There are four “trivial” subgame perfect equilibria of this game that just involve playing a Nash Equilibrium in both stages. This is obviously a subgame perfect equilibrium since a Nash Equilibrium by definition has no profitable deviation. Here are these four “trivial” equilibria, with strategies to be used by both players.

• $B$ in period 1 and then $B$ in period 2 for any choices in period 1.

• $B$ in period 1 and then $C$ in period 2 for any choices in period 1.

• $C$ in period 1 and then $B$ in period 2 for any choices in period 1.

• $C$ in period 1 and then $C$ in period 2 for any choices in period 1.

Can we do any better? It is hopeless to try to play anything other than a Nash Equilibrium in the last period, but might it be possible for $(A, A)$ to be played in the first period?

Here is the big idea. The last period has to involve a Nash Equilibrium, but this game has two different equilibria (unlike the prisoners’ dilemma). By using different Nash Equilibria in the last period, depending upon whether there is cooperation in the first period, we have a punishment available for defection. Thus, we can construct a subgame perfect equilibrium of the twice-repeated game where players play $(A, A)$ in the first period and it’s not worth deviating from this.

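Specifically, the equilibrium calls for $(A, A)$ in the first period. If both players comply, they play $(B, B)$ in the second period (the “good” Nash Equilibrium). If anybody deviates in the first period, the equilibrium instead calls for $(C, C)$ in the second period (the “bad” Nash Equilibrium). We can state this equilibrium precisely as follows: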

• Play $A$ in period 1. Play $B$ in period 2 as long as both players played $A$ in period 1. Play $C$ in period 2 if any player did not choose $A$ in period 1.

With this strategy, it is a subgame perfect equilibrium for players to choose the non-Nash equilibrium $(A, A)$ in period 1. The second-period strategies are Nash Equilibria so we don’t need to check those. What about the first-period strategy? A player could unilaterally deviate in period 1 to $B$ and get a payoff of 6 instead of 5. But the cost is that period 2 play will involve the “bad” equilibrium $(C, C)$ rather than the “good” equilibrium $(B, B)$. This reduces the period 2 payoff from 4 to 1. It’s not worth cheating in period 1 to increase your payoff by 1 since the consequence is reducing your payoff by 3 in period 2.
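The first-period check can be verified by enumerating every unilateral deviation. This is a minimal sketch that, like the argument above, adds the two periods' payoffs without discounting; the table encoding and the function names are assumptions made for illustration.

```python
# Row player's stage-game payoffs, indexed as PAYOFF[own_action][other_action].
# The game is symmetric, so the same table serves both players.
PAYOFF = {
    "A": {"A": 5, "B": 0, "C": 0},
    "B": {"A": 6, "B": 4, "C": 0},
    "C": {"A": 0, "B": 0, "C": 1},
}
ACTIONS = list(PAYOFF)


def is_symmetric_nash(action):
    """True if (action, action) is a Nash Equilibrium of the stage game."""
    return all(PAYOFF[action][action] >= PAYOFF[d][action] for d in ACTIONS)


def total_payoff(first_period_action):
    """Two-period payoff when the opponent follows the proposed strategy.

    The opponent plays A in period 1; period 2 is (B, B) after (A, A)
    and (C, C) after any deviation.
    """
    second_period = "B" if first_period_action == "A" else "C"
    return PAYOFF[first_period_action]["A"] + PAYOFF[second_period][second_period]


on_path = total_payoff("A")
best_deviation = max(total_payoff(d) for d in ACTIONS if d != "A")
print(on_path, best_deviation)  # 9 7
```

The calls `is_symmetric_nash("B")` and `is_symmetric_nash("C")` confirm that both second-period prescriptions are stage-game equilibria, while `is_symmetric_nash("A")` is false.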

This strategy works only because there are multiple Nash Equilibria in the last stage. Thus, which

Problems

1. Consider an infinite repetition of the game below, and consider the following strategy, to be used by both players.

(I) Play $C$ initially, or if $C$ was played in the previous period.

(II) If there is a deviation from (I), then play $P$ one time and restart (I).

(III) If there is a deviation from (II), then restart (II).

For what values of $\delta$ will players play $C$ forever in a subgame perfect equilibrium?

      C      D      P
C     4,4    0,6    0,0
D     6,0    2,2    2,0
P     0,0    0,2    1,1

2. Consider an infinite repetition of the game below, and consider the following strategy, to be used by both players.

(I) Play $C$ for two periods.

(II) Play $B$ forever after (I) is completed.

(III) If there is any deviation, restart (I).

For what values of $\delta$ is this strategy a subgame perfect equilibrium?

      A       B       C
A     1,1     0,-1    -1,1
B     -1,0    0,0     -1,0
C     1,-1    0,-1    -2,-2

3. Consider an infinite repetition of the game below. Sketch the set of payoffs attainable in some SPE of this game for sufficiently high values of $\delta$.

a b

4. Consider an infinite repetition of the game below. Sketch the set of payoffs attainable in some SPE of this game for sufficiently high values of $\delta$.

      Y      Z
A     2,1    0,0
B     0,0    1,3

5. You and a friend are computing the subgame perfect equilibria of an infinite repetition of some game $G$. Your friend suggests eliminating strictly dominated strategies from $G$ in order to speed the computation. What do you think of your friend’s suggestion?

6. The game $G$ is a zero-sum game with Nash Equilibrium payoffs of $(\Pi_1, \Pi_2) = (3, -3)$. Your friend claims to you that, if this game is infinitely repeated, then the Folk Theorem implies that the game will have many subgame perfect equilibria with different payoffs as long as $\delta$ is high enough. Is your friend correct?

7. This problem refers to the two simultaneous games below. You may restrict all of your answers to pure strategies.

a. Suppose that $G_1$ is played twice. Is there a SPE where $(A, A)$ is played during the first period? If so, present such an equilibrium. If not, then explain why not.

b. Suppose that $G_1$ is played one time and then $G_2$ is played one time. Is there a SPE where $(A, A)$ is played during the first period? If so, present such an equilibrium. If not, then explain why not.

Game $G_1$ Game $G_2$

a b y z

A 5,5 0,7 Y 8,8 1,2