Myopic Best-Reply with Mistakes - Interaction patterns, learning processes and equilibria

It rests by changing. (LII)

6.1 Introduction

In this Chapter we complement the analysis of Chapter 3 by focusing on some models of myopic best-reply dynamics with mistakes. Mistakes in the decision process undertaken by players model perturbations that affect the underlying dynamics. Such a perturbation is customarily referred to as noise. Extensively analyzed in the recent literature, noisy dynamics formalize the idea that a small amount of random noise in players’ actions may drive population behaviour towards particular equilibria and away from others. As a result, some equilibria may be selected by the dynamics as the most likely outcome of play when interaction is repeated over time.

A generic model of noisy best-reply dynamics involves at least two aspects: the specification of the noisy decision process on the part of players and the study of the dynamics of the aggregate population. As far as the first is concerned, we take the underlying noiseless process to be exactly the myopic best-reply dynamics of Chapter 3. However, the latter is "perturbed", in that we assume that when players are called to choose actions, they best-respond with high probability but, with some small probability, they do something else instead.
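The perturbed decision rule just described can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the 2×2 coordination payoffs in `PAYOFF`, the function names, and the choice to model "something else" as playing the other action with probability `eps`.

```python
import random

# Hypothetical 2x2 coordination game (not taken from the text):
# PAYOFF[a][b] is the payoff from playing action a against action b.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]

def best_reply(freq_action0):
    """Myopic best reply to the fraction of opponents playing action 0."""
    expected = [
        PAYOFF[a][0] * freq_action0 + PAYOFF[a][1] * (1.0 - freq_action0)
        for a in (0, 1)
    ]
    # Ties are broken in favour of action 0 (an arbitrary assumption).
    return 0 if expected[0] >= expected[1] else 1

def noisy_best_reply(freq_action0, eps, rng):
    """Play the best reply with probability 1 - eps, the other action otherwise."""
    reply = best_reply(freq_action0)
    if rng.random() < eps:
        return 1 - reply  # a "mistake": the action that is not a best reply
    return reply
```

With `eps = 0` the rule reduces to the noiseless myopic best reply; a small positive `eps` gives the perturbed rule.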

We shall focus on two different specifications of a model of population dynamics. In the first, noisy best-reply dynamics, we assume that each player makes a mistake with a probability that is fixed, equal for all players and uncorrelated over time. As we shall see, the dynamics of the aggregate population is described by a Markov chain over the same state space as the underlying myopic best-reply dynamics. Moreover, this chain is regular, in that all transitions may occur with strictly positive probability. Regular Markov chains are known to admit a unique limit distribution (in the terminology of Chapter 3, this means that the process admits a single ergodic set that contains all states). Uniqueness of the limit distribution then motivates further characterizations of the process as the "noise" becomes negligible. These dynamics have been studied extensively in the recent literature on equilibrium selection; in the first part of this Chapter, we report the main findings and focus on particular cases that seem to undermine the generality of the results.
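A small numerical sketch may help fix ideas about regularity and the unique limit distribution. It is not the model of this Chapter: the population size `N`, the best-reply cutoff `THRESHOLD`, simultaneous revision by all players, and the use of plain power iteration are all assumptions chosen to keep the example short. The state is the number of players using action 0.

```python
from math import comb

N = 6          # hypothetical population size
THRESHOLD = 2  # assumed: action 0 is a best reply iff at least 2 players use it

def transition_matrix(eps):
    """Row k: distribution of next period's state when k players use action 0."""
    rows = []
    for k in range(N + 1):
        # Each player independently picks action 0 with probability q:
        # 1 - eps if action 0 is the best reply, eps if it is a mistake.
        q = 1.0 - eps if k >= THRESHOLD else eps
        rows.append([comb(N, j) * q**j * (1.0 - q)**(N - j)
                     for j in range(N + 1)])
    return rows

def limit_distribution(P, steps=20000):
    """Approximate the limit distribution by iterating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n  # any starting distribution works for a regular chain
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

For any `eps` strictly between 0 and 1 every entry of the matrix is positive, which is the regularity property in this toy setting. Under these assumptions, leaving the state where everyone uses action 0 takes five simultaneous mistakes, while leaving the state where nobody does takes only two, so for small `eps` the limit distribution puts most of its mass on the former.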

In the second part of this Chapter, we analyze a model where the probability with which mistakes occur explicitly depends on the expected payoff from the interaction. We refer to this model as best-reply dynamics with payoff dependent mistakes. Whenever expected payoffs differ across players, then so do the probabilities with which each player adopts each single action. This introduces a high degree of correlation among players’ choices. As a result, though the process still inherits Markovian properties from the underlying myopic best-reply dynamics, aggregate transition probabilities depend on the specific configuration of play. We shall formalize the process as a Markov random field. For some classes of games we are able to characterize it in terms of the set of Gibbs measures associated to a particular specification of the interaction potential, given by the average expected payoff in the population, introduced in Chapter 3. The behaviour of this category of processes is, in general, less well understood than that of simple Markov chains. Under asynchronous dynamics, we are able to explicitly derive the unique limit distribution and, in analogy with what we do in the first part of this Chapter, address equilibrium selection issues.
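One common way to make mistake probabilities depend on expected payoffs is an exponential (log-linear) choice rule, sketched below. The exponential form, the parameter `beta` and the function name are assumptions introduced here for illustration; the precise rule used in this Chapter is the one specified in the sections that follow.

```python
from math import exp

def logit_choice_probs(expected_payoffs, beta):
    """Choice probabilities proportional to exp(beta * expected payoff).

    beta >= 0 is a hypothetical sensitivity parameter: beta = 0 gives uniform
    choice, and larger values put more weight on the best reply.
    """
    top = max(expected_payoffs)
    # Subtracting the maximum leaves the probabilities unchanged and
    # avoids overflow in exp for large beta.
    weights = [exp(beta * (u - top)) for u in expected_payoffs]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this rule the probability of a mistake shrinks as the payoff loss from making it grows, so two players facing different expected payoffs err at different rates.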

The analysis of noisy best-reply dynamics is carried out as in Kandori, Mailath and Rob (1993). The way we formalize the stochastic process is slightly different: exploiting the fact that the underlying Markov chain is regular, from the original chain (that will be called p(ε) and is defined over Θ), we derive a new chain (that we will call p(ε)) that is defined over the set of states that are absorbing in the unperturbed process (that is, a subset of Θ). Doing this has the advantage that, whenever we are able to identify the set of absorbing states of the unperturbed process with the set of Nash equilibria of the underlying game, we can think of the derived chain p(ε) as ranging over the latter set. This formalizes the intuition that appears in Kandori, Mailath and Rob (1993) and in Canning (1992) for an underlying 2×2 coordination game. Although we do not pursue this explicitly, we conjecture that, for particular classes of games, the rates at which the transitions in the derived chain p(ε) occur correspond exactly to the radius and coradius identified in Ellison (1995), quoted in Footnote 4. We do not report the equilibrium selection result obtained in that paper, because the result refers to a "Darwinian dynamics" that does not correspond exactly to the best-reply dynamics we analyze here. I further conjecture that the extension of that result to best-reply dynamics may be obtained by using the definition of risk-dominance we use in this work, rather than the one introduced by the author. The result of Theorem 43 is only a re-statement of a result of Kandori and Rob (1993), which underlines the fact that the definition of risk-dominance we adopt incorporates many of the requirements needed in the proof. Example 42 makes, once again in this dissertation, the point that a model of local interaction is to be described by looking at the original state space, Θ, and not at a lumped version of it. Although a derived chain p(ε) can be obtained for a locally interactive model in exactly the same fashion as for a population matching model, the set of relevant states, among which selection is to take place, may radically differ.

The analysis of best-reply dynamics with payoff dependent mistakes relates to Blume (1993) and An and Kiefer (1992). The class of behavioural rules we study is exactly the same. The main difference from Blume (1993) is that, in the model we study, the population is finite. The adjustment process studied in that work relies on a continuous time formulation that can be thought of as a limit, for the time interval becoming infinitesimally small, of the asynchronous dynamics we analyze. The nature of our model is closer to that of An and Kiefer (1992), where players play with all neighbours and take into consideration the average payoff obtained in a round of interaction. Beyond the result itself, I believe the main value added of our model, with respect to others in the same line of research, is that it aims at providing a motivation, in terms of average payoff in the population, for an otherwise exquisitely technical formalization. On the one hand, I hope this helps to partially alleviate the sense of frustration that the reader might experience when faced with aseptic techniques. On the other hand, as Example 48 shows, the quantity that appears in the exponent of the formula in Theorem 46 a) is easy to calculate and b) is reminiscent of a correlated equilibrium, which is what we started with in Chapter 1. Having said that, Section 6.3.2 can be skipped without loss of continuity.
