Behavior in Games*
Andreas Blume
Department of Economics
University of Iowa
November 8, 1994
Abstract
This paper investigates a class of population-learning dynamics. In every period
agents either adopt a best reply to the current distribution of actual play, or a
best reply to a sample, taken with replacement, from the distribution of intended
play (the strategies adopted at the end of last period), or they are inactive. If
sampling with replacement and being inactive have strictly positive probability, these
dynamics converge globally to minimal curb sets in the absence of mistakes. For
two-player $i \times j$-games, $i, j \le 3$, the same result holds even if only best
responding to actual play and being inactive have positive probability. If players make
mistakes in the implementation of their strategies, these dynamics select among
minimal curb sets.
* I wrote the first draft of this paper while I enjoyed the hospitality of CentER. I thank, without
implicating, Sjaak Hurkens and Eric van Damme for comments. I have also benefitted from comments
made in the IO MEGT workshop at the University of Iowa. I am grateful to the NSF for financial
support.
1 Introduction
This paper characterizes the long-run outcomes of a class of learning dynamics in games.
The characterization is in terms of properties of subsets of the space of strategy profiles
of the underlying games. Young [1993] has shown that under some conditions on players'
memory and the completeness of their information, adaptive play converges almost surely
to a pure strategy equilibrium, provided the game satisfies an acyclicity requirement.
The present paper is in the same spirit. It looks at a different dynamic and, more
importantly, it drops the acyclicity condition. Without this condition, the question
arises which objects can take the place of the pure strategy Nash equilibria. In general,
one suspects that this will depend on details of the dynamic. This paper argues that
there are interesting classes of dynamics whose long-run outcomes can be characterized
in terms of curb (closed under rational behavior) sets.
A product set of strategies is closed under inclusion of best replies if it contains all
best responses to independent beliefs supported on itself. Basu and Weibull [1991], who
first examined the properties of such sets, refer to them as closed under rational behavior
(curb). Curb sets which do not properly contain another curb set are referred to as
minimal curb sets. In generic normal form games these coincide with persistent sets (the
extreme points of persistent retracts), Kalai and Samet [1984], Balkenborg [1992]. While
the set of rationalizable strategies, Bernheim [1984] and Pearce [1984], is a maximal fixed
point under the best reply mapping, a minimal curb set is a minimal fixed point under
this mapping.
Curb sets have a number of attractive features. They share with strict equilibria the
property that they contain all best replies against themselves. Every curb set contains the
support of a Nash equilibrium; such equilibria are referred to as curb equilibria. Blume
[1994] shows that in games with one-sided pre-play communication, the minimal curb
condition selects the communicating player's favorite equilibrium if it is not too risky.
Hurkens [1993] demonstrates that in pre-play communication games where messages are
[...] communicating players their most preferred payoff.
Like strict equilibria, curb sets will be locally stable under a large class of plausible
dynamic adjustment rules. The question I want to pose in this paper is whether it is
possible to provide a firmer dynamic foundation for minimal curb sets. This requires
that one address two issues. Are there dynamics which converge globally to minimal
curb sets? Are there dynamics which select among minimal curb sets?
Hurkens [1994] provides one answer to the first question. He examines a dynamic in
the spirit of Young [1993] which converges almost surely to a minimal curb set from any
initial condition. In his dynamic, only one pair of agents plays in any given time period.
Each of them takes a possibly incomplete sample from finite length histories of past play
and best responds to some distribution over the sample. The fact that only the support
of the sample matters guarantees that every belief over strategies in the current state
is possible. Therefore the dynamic will eventually leave any set of strategies that is not
curb. Finite length histories guarantee that once the process has spent sufficient time
in a minimal curb set it cannot exit the minimal curb set anymore. Hurkens shows that
his dynamic converges globally to minimal curb sets, almost surely. He then goes on
to ask whether adding mutations to his dynamic yields selection among minimal curb
sets. This is not the case because as soon as a mistake enters the current state, it is
possible that the active players attribute any probability to the corresponding strategy.
Therefore one mistake is sufficient to upset any minimal curb set in Hurkens' framework.
I want to propose a different dynamic. I consider large populations of agents. All
agents play in every period; when they play, they use the strategies they had adopted at
the end of last period, unless they are experimenting. If they are experimenting, they
randomize, choosing each of their available strategies with strictly positive probability.
Agents differ in how they process information. In each period every agent either best
responds, or gathers information, or is inactive. Best responding agents learn the true
current distribution of play from playing against the entire population. Agents who
[...] best reply against their information.¹
Inactive agents carry over the strategy they had
adopted at the end of last period into the next period. Because play partners are
identifiable, the information available to best responders is modelled as coming from
sampling without replacement and for simplicity as learning the true distribution of
play, whereas information gathering is potentially indirect and therefore modelled as
sampling with replacement.
I derive two main results; one under the condition that the mistake probability is
zero and one for positive but small mistake probabilities. Without mistakes the dynamic
converges to minimal curb sets regardless of the initial condition. With mistakes the
dynamic selects among minimal curb sets. For two-player games with two exhaustive
minimal curb sets, I characterize the condition for selection of one of the minimal curb
sets; this condition reduces to Harsanyi-Selten risk dominance in $2 \times 2$ games. Intuitively,
the two sets of states where the populations play entirely according to one of the two
minimal curb sets (minimal curb state sets) are exceptional. The dynamic can leave
these sets of states only if sufficiently many mistakes occur simultaneously. Any other
state, outside of these sets, belongs to the basins of attraction of both minimal curb state
sets. Thus all that matters is how many mistakes it takes to upset either one of the two
minimal curb state sets.
I also investigate to what extent we need sampling with replacement or, as in Hurkens
[1994], a direct assumption that every distribution which is supported on the current state
has positive probability. I show that we may be able to do without such assumptions.
At least in two-player $i \times j$-games with $i, j \le 3$, a simple best-reply rule where agents are
either inactive or move to one of their best replies converges globally to minimal curb
sets almost surely.
Besides the problem of finding dynamics which lead to and select among minimal
curb sets in a game, there is the dual problem of when it is possible to find simple
characterizations of stable sets of a dynamic in terms of the game. For a Markov process
with stationary transition probabilities the stable sets are the recurrent communication
classes of the process. As Young [1993] has shown, one can select among the recurrent
communication classes by considering limits of perturbed processes as the perturbation
vanishes. The limiting distribution will have its support concentrated on the
communication classes with the least stochastic potential. In general, it will be difficult to
characterize these communication classes in terms of the underlying game. The present
paper proposes a class of dynamics for which such a characterization is possible; for these
dynamics the recurrent communication classes of the unperturbed dynamic correspond
to the minimal curb sets of the underlying game. The correspondence is as follows: for
a given recurrent communication class there is exactly one minimal curb set such that
each state has support only on this minimal curb set; conversely, given any minimal
curb set, every state with support on this minimal curb set belongs to one and the same
recurrent communication class. Furthermore, for a non-trivial class of games one can
derive sufficient conditions for the selection of a particular minimal curb set via perturbed
dynamics. Interestingly, it turns out that the payoffs in the equilibria belonging to the
minimal curb sets play a secondary role as far as selection is concerned. Suppose one
starts with a strict equilibrium that is selected by the perturbed dynamic. If one then
replaces this equilibrium with a game that is a minimal curb set in the newly formed
game and has a unique equilibrium with the same payoffs as the original equilibrium,
this minimal curb set need not be selected by the perturbed dynamic.
The paper is organized as follows. The next section describes the model. Section 3
introduces the dynamic without mutations and derives the global convergence to
minimal curb sets. Section 4 introduces mistakes in the implementation of strategies and
demonstrates that the dynamic selects among exhaustive minimal curb sets in two-player
games. Section 5 concludes.
2 The Setup
Consider a finite set of populations $P$ with typical element $p \in P$. Denote the size of
population $p$ by $N_p$. Each population corresponds to one of the players in the game
[...] that $N_p > \#(S_p)$, the cardinality of player $p$'s strategy space.
$S := \times_{p \in P} S_p$ with typical element $s \in S$, and $u_p$ is a type $p$ player's
utility function, $u_p : S \to \Re$. Let $\Sigma_p$ denote a type $p$ player's set of mixed
strategies, and $\Sigma := \times_{p \in P} \Sigma_p$. A typical element of $\Sigma$ will be
denoted by $\sigma$. If we exclude the $p$th element from $\sigma$, the resulting vector will be
written as $\sigma_{-p}$. $u_p$ extends to $\Sigma$ in the usual way.
For any finite set $X$, let $\Delta(X)$ stand for the set of probability distributions over
$X$. Let $BR_p(\cdot)$ denote player $p$'s pure best reply correspondence, and define
$BR(\sigma) := \times_{p \in P} BR_p(\sigma_{-p})$. I will also use the natural extension of
$BR(\cdot)$ to sets of strategies as arguments.
Basu and Weibull [1991] introduced the notion of curb (closed under rational behavior)
sets. A product set of strategies $Q = \times_{p \in P} Q_p$, $Q_p \subseteq S_p$, is closed under
inclusion of best replies (curb) if each $Q_p$ is nonempty and
$BR(\times_{p \in P} \Delta(Q_p)) \subseteq Q$.
If a curb set does not properly contain another curb set, it is called minimal. The
strategies which form a minimal curb set are called curb strategies, and equilibria belonging
to minimal curb sets are curb equilibria.
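The curb condition can be illustrated with a short Python sketch for two-player games. It is not part of the original text: the names `grid_beliefs`, `best_replies` and `is_curb` are hypothetical, and beliefs over $Q_p$ are only approximated on a finite grid, so a best reply that arises solely at an off-grid belief would be missed (an exact test would need linear programming). The payoffs are those of the game $G_1$ discussed in this section.

```python
from itertools import product

def grid_beliefs(support, steps=20):
    """Probability vectors over `support` with weights that are multiples of 1/steps."""
    support = list(support)
    beliefs = []
    for weights in product(range(steps + 1), repeat=len(support)):
        if sum(weights) == steps:
            beliefs.append({s: k / steps for s, k in zip(support, weights)})
    return beliefs

def best_replies(payoff, own, belief, tol=1e-12):
    """Pure best replies in `own` against `belief` (opponent strategy -> probability).
    payoff[(a, b)] is the payoff of own strategy a against opponent strategy b."""
    value = {a: sum(p * payoff[(a, b)] for b, p in belief.items()) for a in own}
    top = max(value.values())
    return {a for a in own if value[a] >= top - tol}

def is_curb(u_row, u_col, rows, cols, q_rows, q_cols, steps=20):
    """Grid-approximate check that all best replies to beliefs on Q stay inside Q."""
    if not q_rows or not q_cols:
        return False
    for belief in grid_beliefs(q_cols, steps):      # row player's beliefs about columns
        if not best_replies(u_row, rows, belief) <= set(q_rows):
            return False
    for belief in grid_beliefs(q_rows, steps):      # column player's beliefs about rows
        if not best_replies(u_col, cols, belief) <= set(q_cols):
            return False
    return True

# Game G_1; the column player's payoffs are keyed (column strategy, row strategy).
ROWS, COLS = ("U", "D"), ("L", "R")
U_ROW = {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1}
U_COL = {("L", "U"): 3, ("R", "U"): 0, ("L", "D"): 0, ("R", "D"): 1}

print(is_curb(U_ROW, U_COL, ROWS, COLS, {"U"}, {"L"}))       # strict equilibrium (U,L)
print(is_curb(U_ROW, U_COL, ROWS, COLS, {"U"}, {"R"}))       # D is the best reply to R
print(is_curb(U_ROW, U_COL, ROWS, COLS, {"U", "D"}, {"L"}))  # R is a best reply to D
```

Under these assumptions the first call reports a curb set and the other two do not, which matches the statement that the strict equilibria of $G_1$ form its minimal curb sets.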
Blume [1994] and Hurkens [1993] show that the curb equilibrium requirement selects
efficient outcomes in games with pre-play communication. Blume considers games with
costless messages; Hurkens analyzes the case of nominal message costs. Consider the two
games below. Both games have two minimal curb sets corresponding to the two strict
Nash equilibria $(U,L)$ and $(D,R)$:
    $G_1$      L      R
    U         3,3    0,0
    D         0,0    1,1
    $G_2$      L      R
    U         9,9    0,8
    D         8,0    7,7
If we allow player one to send one of two messages $m_1$ or $m_2$ before playing the game,
[...]
    $\Gamma_1$       LL     LR     RL     RR
    $(m_1,U)$       3,3    3,3    0,0    0,0
    $(m_1,D)$       0,0    0,0    1,1    1,1
    $(m_2,U)$       3,3    0,0    3,3    0,0
    $(m_2,D)$       0,0    1,1    0,0    1,1
    $\Gamma_2$       LL     LR     RL     RR
    $(m_1,U)$       9,9    9,9    0,8    0,8
    $(m_1,D)$       8,0    8,0    7,7    7,7
    $(m_2,U)$       9,9    0,8    9,9    0,8
    $(m_2,D)$       8,0    7,7    8,0    7,7
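The two communication games can be rebuilt mechanically from $G_1$ and $G_2$. The following Python sketch is an illustration only; the function name `communication_game` and the string labels are hypothetical. It assumes costless messages, that a sender strategy is a pair (message, action), and that a receiver strategy such as LR means "play L after $m_1$ and R after $m_2$".

```python
from itertools import product

def communication_game(base, messages=("m1", "m2"), rows=("U", "D"), cols=("L", "R")):
    """Reduced normal form when the row player sends a costless message first.

    base[(r, c)] is the payoff pair of the underlying game. The receiver's plan
    lists one action per message, in the order of `messages`.
    """
    sender = [(m, r) for m in messages for r in rows]
    plans = list(product(cols, repeat=len(messages)))
    table = {}
    for (m, r) in sender:
        for plan in plans:
            reply = plan[messages.index(m)]          # receiver's action after message m
            table[((m, r), "".join(plan))] = base[(r, reply)]
    return sender, ["".join(plan) for plan in plans], table

G1 = {("U", "L"): (3, 3), ("U", "R"): (0, 0), ("D", "L"): (0, 0), ("D", "R"): (1, 1)}
G2 = {("U", "L"): (9, 9), ("U", "R"): (0, 8), ("D", "L"): (8, 0), ("D", "R"): (7, 7)}

for name, base in (("Gamma_1", G1), ("Gamma_2", G2)):
    sender, receiver, table = communication_game(base)
    print(name, receiver)
    for s in sender:
        print(s, [table[(s, c)] for c in receiver])
```

Each printed row lists the payoff pairs against LL, LR, RL and RR, in the same order as the tables above.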
If messages are costless, then Blume [1994] shows that all curb equilibria in $\Gamma_1$ support
the efficient payoff pair $(3,3)$. This result generalizes provided a condition that trades
off the risk of the efficient equilibrium in the underlying game against the size of the message
space is satisfied. In $\Gamma_2$, which is based on $G_2$, all equilibria are curb equilibria because
there is a tension between risk dominance and Pareto dominance in the underlying
game. If messages carry a nominal cost which distinguishes them, these results can
be strengthened considerably. Hurkens [1993] shows that with nominal message costs,
$0 \le m_1 < m_2$, $\{(m_1,U)\} \times \{LL, LR\}$ is the unique minimal curb set in both of the above
communication games. He shows that this result generalizes to n-player games in which
a subset of the player set can send a message.
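The tension between risk dominance and Pareto dominance in $G_2$ can be checked numerically. The sketch below is an illustration that is not taken from the paper: it uses the standard Harsanyi-Selten criterion for $2 \times 2$ games with two strict equilibria (the equilibrium with the larger product of deviation losses is risk dominant), and the function names are hypothetical.

```python
def deviation_loss_product(game, eq, rows=("U", "D"), cols=("L", "R")):
    """Product of the two players' losses from unilaterally deviating at `eq`."""
    r, c = eq
    other_r = rows[1 - rows.index(r)]
    other_c = cols[1 - cols.index(c)]
    row_loss = game[(r, c)][0] - game[(other_r, c)][0]
    col_loss = game[(r, c)][1] - game[(r, other_c)][1]
    return row_loss * col_loss

def risk_dominant(game, eq_a, eq_b):
    """Return the risk-dominant strict equilibrium, or None if the products tie."""
    a = deviation_loss_product(game, eq_a)
    b = deviation_loss_product(game, eq_b)
    if a == b:
        return None
    return eq_a if a > b else eq_b

G1 = {("U", "L"): (3, 3), ("U", "R"): (0, 0), ("D", "L"): (0, 0), ("D", "R"): (1, 1)}
G2 = {("U", "L"): (9, 9), ("U", "R"): (0, 8), ("D", "L"): (8, 0), ("D", "R"): (7, 7)}

print(risk_dominant(G1, ("U", "L"), ("D", "R")))  # ('U', 'L'): products 9 versus 1
print(risk_dominant(G2, ("U", "L"), ("D", "R")))  # ('D', 'R'): products 1 versus 49
```

In $G_1$ the Pareto-efficient equilibrium $(U,L)$ is also risk dominant, whereas in $G_2$ the Pareto-efficient $(U,L)$ is risk dominated by $(D,R)$.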
3 Dynamics
In this section I describe the learning dynamic, and characterize its long-run outcomes
in the absence of experimentation. I will show that the process converges almost surely
to a curb set, regardless of the initial population state. In the following section I will
examine this process further under the condition that the experimentation probability
is different from zero.
The state of population $p$ at time $t$ is given by the vector $\omega_{p,t} = \{s_{it}\}_{i \in p}$. The state of
the dynamic system at time $t$ is given by $\omega_t = \{\omega_{p,t}\}_{p \in P}$. In period $t$ state
$\omega_{t-1}$ is replaced [...]
Every agent in $p$ plays against all agents in $P \setminus p$. When she plays in period $t$, an agent uses
strategy $s_{i,t-1}$, unless she experiments (or makes a mistake). Each agent experiments
with probability $\lambda \ge 0$; in that case she chooses each of her pure strategies with strictly
positive probability. There are three different ways in which agents process information
in period $t$ and thereby generate the new state $s_t$. With probability $\alpha_1$ an agent adopts
a best reply against the current distribution of actual play in period $t$, with probability
$\alpha_2$ she gathers information about the strategies that were adopted last period, and with
probability $\alpha_3 = 1 - \alpha_1 - \alpha_2$ she is inactive in period $t$. Subsequently I will refer to agents
in these various roles as best responding, information gathering and inactive players.
A best responding agent meets all agents from populations she does not belong to and
learns the true distribution of current play (including mistakes) in the current period in
those populations; she then adopts a best reply against this distribution. An information
gathering agent takes a sample (with replacement) from the strategies that were adopted
in period $t-1$ (the intended play of period $t$); she then adopts an $\epsilon$-best reply against
uncorrelated beliefs based on this sample. An agent who is inactive in period $t$ passes
through that period without changing her strategy.
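One period of this dynamic can be sketched in Python for the special case of two populations. This is a simplified illustration, not the paper's formal model: the function names and argument names (`a1`, `a2`, `lam`, `eps`, `sample_size`) are hypothetical stand-ins for the role probabilities, the experimentation probability, the best-reply tolerance and the sample size. Two further assumptions are made for concreteness: experimenting agents and indifferent agents randomize uniformly, whereas the text only requires strictly positive probabilities.

```python
import random

def eps_best_replies(payoff, own, belief, eps):
    """Strategies in `own` whose expected payoff is within eps of the best one."""
    value = {a: sum(p * payoff[(a, b)] for b, p in belief.items()) for a in own}
    top = max(value.values())
    return [a for a in own if value[a] >= top - eps]

def frequencies(strategies):
    """Relative frequencies of the strategies in a list (an uncorrelated belief)."""
    n = len(strategies)
    return {s: strategies.count(s) / n for s in set(strategies)}

def step(state, payoffs, strategy_sets, a1, a2, lam, eps, sample_size, rng):
    """One period. state[p] lists the strategies adopted by population p (0 or 1)."""
    # Actual play: each agent experiments with probability lam.
    actual = [[rng.choice(strategy_sets[p]) if rng.random() < lam else s
               for s in state[p]] for p in (0, 1)]
    new_state = []
    for p in (0, 1):
        q = 1 - p
        updated = []
        for s in state[p]:
            draw = rng.random()
            if draw < a1:
                # Best responder: exact best reply to the actual play of the other population.
                belief = frequencies(actual[q])
                s = rng.choice(eps_best_replies(payoffs[p], strategy_sets[p], belief, 0.0))
            elif draw < a1 + a2:
                # Information gatherer: sample intended play with replacement.
                sample = [rng.choice(state[q]) for _ in range(sample_size)]
                belief = frequencies(sample)
                s = rng.choice(eps_best_replies(payoffs[p], strategy_sets[p], belief, eps))
            # Otherwise the agent is inactive and keeps her strategy.
            updated.append(s)
        new_state.append(updated)
    return new_state

# Game G_1; payoffs[p][(own strategy, opponent strategy)].
SETS = [("U", "D"), ("L", "R")]
PAYOFFS = [
    {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
    {("L", "U"): 3, ("R", "U"): 0, ("L", "D"): 0, ("R", "D"): 1},
]

state = [["U", "D", "U", "D", "U"], ["L", "R", "R", "L", "L"]]
rng = random.Random(0)
for _ in range(50):
    state = step(state, PAYOFFS, SETS, 0.3, 0.3, 0.0, 0.05, 3, rng)
print(state)
```

Without experimentation (`lam = 0`), a state in which everyone plays a strict equilibrium of $G_1$ is never left in this sketch, because every best reply and every sample-based reply points back to the same strategies.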
Play with mistakes in period $t$ generates a temporary state $\tilde\omega_t$ to which best responding
players best respond at the end of period $t$. Note that for each agent $i \in p$ each partial
temporary state $\tilde\omega_{-p,t}$ naturally can be identified with an uncorrelated belief
$\sigma_{-p} = \sigma_1 \times \cdots \times \sigma_{p-1} \times \sigma_{p+1} \times \cdots \times \sigma_{\#(P)}$
for agent $i \in p$ over $S_{-p} := \times_{q \ne p} S_q$, which itself can be
identified with an element of $\Sigma_{-p} = \times_{q \ne p} \Sigma_q$. This belief is based on the observed
relative frequencies of strategies in the populations not including $i$. With this identification of
states and beliefs we can say that a best responding agent $i \in p$ adopts a pure strategy
in $BR_i(\tilde\omega_{-p})$. For the remainder of this section I will set the experimentation probability
$\lambda$ to zero; I return to the case of $\lambda > 0$ in the next section.
Denote the period $t$ sample from population $p$ obtained by an information gathering
agent $i \notin p$ by $X_{ipt}$. Agent $i$'s entire period $t$ sample is then
$X_{it} := \{X_{ipt}\}_{p \in P, p \not\ni i}$. Like
states, samples give rise to uncorrelated beliefs. Therefore it makes sense to consider
$BR^{\epsilon}_i(X_{it})$ [...] the sample $X_{it}$. An information gathering agent $i$ adopts one of
these $\epsilon$-best replies;² in case of indifference, she randomizes, putting strictly positive
probability on each of the strategies in $BR^{\epsilon}_i(X_{it})$. Let sample sizes be time invariant
and denote the size of player $i$'s sample from population $p$, $i \notin p$, by $N_{ip}$.
The dynamic process described here is a Markov chain with stationary transition
probabilities on the state space, denoted by $\Omega$. The first objective of this paper is to
characterize the recurrent communication classes of this Markov chain. The recurrent
communication classes are subsets of $\Omega$ such that (i) from every state there is a finite
length sequence of positive probability transitions to at least one of these classes, (ii)
within each class every state can be reached from every other state via a finite length
sequence of positive probability transitions, and (iii) no state outside one of the classes
can be reached from a state inside through a positive probability transition. Since
(minimal) curb sets will figure prominently in this characterization, define a set of states
supported entirely on one (minimal) curb set, and including all such states, as a
(minimal-)curb-state set. Let
$\Psi(\Theta) := \{\sigma \in \Sigma \mid \mathrm{supp}(\sigma_i) \subseteq \Theta_i\}$, $\forall \Theta \subseteq S$.
For any subset $\Theta$ of the set of pure strategy profiles, this is the set of all mixed strategy
profiles or equivalently uncorrelated beliefs with support in $\Theta$. For any such $\Theta$ one can
define the set
$V(\Theta) := \Theta \cup BR(\Psi(\Theta))$, $\forall \Theta \subseteq S$,
consisting of the union of $\Theta$ and all best replies against uncorrelated beliefs concentrated
on $\Theta$. Let $V^t$ denote the $t$-fold iteration of $V$. It is easily seen that, starting with a set $\Theta$,
one reaches a fixed point of $V$ by iterating $V$ sufficiently often.
Lemma 1 $\forall \Theta \subseteq S$, $\exists T$ : $(\forall t > T, V^{t+1}(\Theta) = V^t(\Theta))$.
Proof: $V : 2^S \to 2^S$ is monotonic and $2^S$ is finite. □
² I use $\epsilon$-best replies because the sampling process only allows one to approximate the set of possible
beliefs over a given support of strategies. Alternatively one could proceed like Hurkens and argue directly
in terms of supports; i.e. one could simply postulate an updating rule in which players can move to any
strategy which is a best reply to some beliefs over a given support. Also, in a generic class of games,
we can replace $\epsilon$-best replies by best replies in our dynamic. The generic property to look for is that
[...]
We next need notation to describe the fixed point that is reached if one starts the
iteration with the support $\Theta(\sigma)$ of a strategy profile $\sigma$. Let
$\Theta(\sigma) := \{s \in S \mid s_i \in \mathrm{supp}(\sigma_i)\}$
$t(\sigma) := \min\{t \in N \mid V^{t+1}(\Theta(\sigma)) = V^t(\Theta(\sigma))\}$
$W(\sigma) := V^{t(\sigma)}(\Theta(\sigma))$
$t(\sigma)$ is the minimal number of periods needed before one reaches the fixed point from
$\Theta(\sigma)$, and $W(\sigma)$ is the fixed point reached from $\sigma$.
Lemma 2 $W(\sigma)$ contains a minimal curb set for all $\sigma \in \Sigma$.
Proof: $W(\sigma)$ is closed under inclusion of best replies. □
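The iteration of $V$ to its fixed point $W(\sigma)$ can be sketched in Python for two-player games. This is an illustration under stated assumptions, not the paper's construction: the names are hypothetical, beliefs are approximated on a finite grid (so off-grid best replies can be missed), and the sketch tracks the componentwise strategy sets of the two players rather than arbitrary subsets of $S$.

```python
from itertools import product

def grid_beliefs(support, steps=20):
    """Probability vectors over `support` with weights that are multiples of 1/steps."""
    support = sorted(support)
    return [{s: k / steps for s, k in zip(support, w)}
            for w in product(range(steps + 1), repeat=len(support)) if sum(w) == steps]

def best_replies(payoff, own, belief, tol=1e-12):
    """Pure best replies in `own`; payoff[(a, b)] is own strategy a versus opponent b."""
    value = {a: sum(p * payoff[(a, b)] for b, p in belief.items()) for a in own}
    top = max(value.values())
    return {a for a in own if value[a] >= top - tol}

def V(theta, u_row, u_col, rows, cols, steps=20):
    """One application of V: add all (grid) best replies to beliefs supported on theta."""
    t_rows, t_cols = theta
    new_rows, new_cols = set(t_rows), set(t_cols)
    for belief in grid_beliefs(t_cols, steps):
        new_rows |= best_replies(u_row, rows, belief)
    for belief in grid_beliefs(t_rows, steps):
        new_cols |= best_replies(u_col, cols, belief)
    return frozenset(new_rows), frozenset(new_cols)

def W(theta, u_row, u_col, rows, cols, steps=20):
    """Iterate V until nothing changes; return the fixed point and the number of steps."""
    current = (frozenset(theta[0]), frozenset(theta[1]))
    t = 0
    while True:
        nxt = V(current, u_row, u_col, rows, cols, steps)
        if nxt == current:
            return current, t
        current, t = nxt, t + 1

# Game G_1; the column player's payoffs are keyed (column strategy, row strategy).
ROWS, COLS = ("U", "D"), ("L", "R")
U_ROW = {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1}
U_COL = {("L", "U"): 3, ("R", "U"): 0, ("L", "D"): 0, ("R", "D"): 1}

print(W(({"U"}, {"L"}), U_ROW, U_COL, ROWS, COLS))  # already a fixed point, t = 0
print(W(({"U"}, {"R"}), U_ROW, U_COL, ROWS, COLS))  # grows to the whole game, t = 1
```

The loop terminates for the reason given in the proof of Lemma 1: each application of `V` can only add strategies, and there are finitely many of them.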
The set of states $\Omega$ can be identified with a finite subset of the set of mixed strategies $\Sigma$.
The dynamics can be described by a transition probability $\pi(\cdot \mid \cdot)$ such that
$\forall \sigma, \sigma' \in \Omega$, $\pi(\sigma' \mid \sigma)$ denotes the probability that the system will be
in state $\sigma'$ in period $t+1$, if in period $t$ it is in state $\sigma$. $\pi$ depends on population
sizes, sample sizes and $\epsilon$. Let
$N := \{\{N_p\}_{p \in P}, \{N_{ip}\}_{p \in P, i \notin p}\}$. Let $\pi_{\epsilon,N}$
be the transition probability as a function of $\epsilon$ and $N$.
The set $\Omega$ does not contain states corresponding to every belief. Therefore it is possible
that our dynamic with best replies, instead of $\epsilon$-best replies, does not leave a given set
of states even though that set is not a curb-state set. The following game provides an
example:
              $t_1$     $t_2$     $t_3$
    $s_1$     1,1     1,0     0,b
    $s_2$     1,0     1,a     0,b
    $s_3$     0,0     0,0     .1,.1
where $a = \sqrt{2}$ and $b = \sqrt{2}/(1+\sqrt{2})$.
In this game only the strict equilibrium $(s_3, t_3)$ forms a minimal curb set. However
the product set $\{s_1, s_2\} \times \{t_1, t_2\}$ is closed under inclusion of best replies against beliefs
for which each probability must be a rational number. Only if the column player puts
probability $b$ on the first strategy of her opponent is her third strategy a best reply
against beliefs concentrated on the first two strategies of her opponent.
On the other hand, with sufficiently large sample sizes any belief can be approximated
arbitrarily closely. Therefore, with $\epsilon$-best replies there is a chance that our dynamic
eventually leaves every set that is not a curb-state set.³
This motivates the next lemma
which says that every best reply to a given belief is an $\epsilon$-best reply to an open
neighborhood of that belief. Thus, if we can approximate beliefs arbitrarily closely, the set
of best replies to any product set of strategies is a subset of the set of $\epsilon$-best replies to
the finite approximation of the same set, provided the approximation is sufficiently close.
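The role of $\epsilon$-best replies in the $3 \times 3$ example can be checked numerically. The Python sketch below is an illustration only; the function names are hypothetical, payoffs are evaluated in floating point, and the sample sizes (up to 50) and the tolerance 0.01 are arbitrary choices. It looks at the column player's payoffs when a fraction $p$ of the row population plays $s_1$ and the rest plays $s_2$.

```python
import math
from fractions import Fraction

A = math.sqrt(2)
B = math.sqrt(2) / (1 + math.sqrt(2))

def column_payoffs(p):
    """Column player's payoffs to t1, t2, t3 against s1 with probability p, s2 otherwise."""
    p = float(p)
    return {"t1": p, "t2": (1 - p) * A, "t3": B}

def is_eps_best_reply(strategy, p, eps=0.0):
    """True if `strategy` is within eps of the best payoff against belief p."""
    u = column_payoffs(p)
    return u[strategy] >= max(u.values()) - eps

def sample_beliefs(n):
    """Beliefs that a sample of size n can produce: relative frequencies k/n."""
    return [Fraction(k, n) for k in range(n + 1)]

# Exact best replies: no sample-based (rational) belief makes t3 a best reply.
exact = [p for n in range(1, 51) for p in sample_beliefs(n) if is_eps_best_reply("t3", p)]
print(exact)

# With a small tolerance, a sample of size 29 already gets close enough to b.
approx = [p for p in sample_beliefs(29) if is_eps_best_reply("t3", p, eps=0.01)]
print(approx)
```

Under these assumptions the first list is empty, since $b$ is irrational and $t_3$ is a best reply only at $p = b$, while the second list contains the belief $17/29$, which lies within about $0.0005$ of $b$.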
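Lemma 3 $\forall \epsilon > 0$, $\forall \sigma \in \Sigma$, $\forall s_i \in BR_i(\sigma)$, $\exists \delta > 0$ :
$|\tilde\sigma - \sigma| < \delta \Rightarrow s_i \in BR^{\epsilon}_i(\tilde\sigma)$.
Proof: Suppose not, and let $s_i \in BR_i(\sigma)$. Then there exists $\tilde\sigma_n \to \sigma$,
$t_i(\tilde\sigma_n) \in S_i$ such that
$u_i(t_i(\tilde\sigma_n), \tilde\sigma_n) > u_i(s_i, \tilde\sigma_n) + \epsilon$ $\forall n$.
Compactness of the strategy space and continuity of $u$ imply that there exists a
$t_i \in S_i$ such that
$u_i(t_i, \sigma) \ge u_i(s_i, \sigma) + \epsilon$,
which contradicts $s_i \in BR_i(\sigma)$. □
For any given $\sigma \in \Omega$, $\pi_{\epsilon,N}$ induces a probability distribution over the set $2^S$ of
supports. This is the probability that next period's state will have a certain support
given that the current state is $\sigma$. Let the probability of support $\Theta \in 2^S$ given that the current
state is $\sigma$ be denoted $P_{\epsilon,N}(\Theta \mid \sigma)$, given the transition probability
$\pi_{\epsilon,N}(\cdot \mid \cdot)$. Let $P^t_{\epsilon,N}(\Theta \mid \sigma)$
denote the probability of support $\Theta$ after $t$ periods if the initial state is $\sigma$. In particular
$P^1_{\epsilon,N}(\Theta \mid \sigma) = P_{\epsilon,N}(\Theta \mid \sigma)$. Let $N_{\min} := \min\{N_{ip}\}_{i \in I, p \in P}$.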
A central characteristic of our learning rule is that from any state $\sigma$ there is positive
probability that next period's state will have support $V(\Theta(\sigma))$. Iterating this argument,
one may conclude that from any $\sigma$ there is positive probability that after a finite number
of periods the state has a support which is a fixed point of $V(\cdot)$. This is the content of
the following lemma.
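Lemma 4 If $\alpha_2, \alpha_3 > 0$, then $\forall \epsilon > 0$, $\exists \bar N, T$ :
$N_{\min} > \bar N \Rightarrow P^T_{\epsilon,N}(W(\sigma) \mid \sigma) > 0$, $\forall \sigma \in \Omega$.
Proof: Let $\chi(\sigma)$ be the set of possible samples with replacement, given the current state
$\sigma$. Given the identification of samples with beliefs, $\chi(\sigma)$ is a finite approximation of
$\Psi(\Theta(\sigma))$. Furthermore, $\chi(\sigma)$ converges to $\Psi(\Theta(\sigma))$ in the Hausdorff sense as
$N_{\min} \to \infty$.
Let $BR_i(\Psi(\Theta(\sigma)))$ be the (finite) set of (pure) best replies by agent $i$ to the beliefs
concentrated on $\Theta(\sigma)$. For every $i$, $\forall s_i \in BR_i(\Psi(\Theta(\sigma)))$, choose
$\mu(s_i) \in \Psi(\Theta(\sigma))$ such that $s_i \in BR_i(\mu(s_i))$. Note that for every $\delta > 0$,
$\exists \bar N$ : $N_{\min} > \bar N \Rightarrow \forall \sigma, \forall i, \forall s_i \in BR_i(\Psi(\Theta(\sigma))),
\exists \tilde\mu(s_i) \in \chi(\sigma)$ such that $|\mu(s_i) - \tilde\mu(s_i)| < \delta$. To see this note that there
are finitely many $i$, finitely many combinations of $\Theta(\sigma)$ and
$s_i \in BR_i(\Psi(\Theta(\sigma)))$, and that each single $\mu(s_i)$ can be approximated by a belief in $\chi(\sigma)$.
By Lemma 3, $\forall \epsilon$, $\forall \mu(s_i)$, $\exists \delta > 0$ :
$|\mu(s_i) - \tilde\mu| < \delta \Rightarrow s_i \in BR^{\epsilon}_i(\tilde\mu)$. Since there
are finitely many such $\mu(s_i)$ to consider across all individuals and all supports, we can
interchange quantifiers to obtain $\forall \epsilon$, $\exists \delta$ : $\forall i, \forall \mu(s_i)$,
$|\mu(s_i) - \tilde\mu| < \delta \Rightarrow s_i \in BR^{\epsilon}_i(\tilde\mu)$.
Combining the last two observations, we may conclude that: $\forall \epsilon > 0$, $\exists \bar N$ :
$N_{\min} > \bar N \Rightarrow \forall \sigma, \forall i, \forall s_i \in BR_i(\Psi(\Theta(\sigma))),
\exists \tilde\mu(s_i) \in \chi(\sigma)$ such that $s_i \in BR^{\epsilon}_i(\tilde\mu(s_i))$. Since all
samples from $\chi(\sigma(t))$ have positive probability, each
$s_i \in BR_i(\Psi(\Theta(\sigma(t))))$ has positive
probability of being in the support of $\sigma(t+1)$; because $\alpha_3 > 0$, any
$s_i \in \Theta(\sigma(t))$ also
has positive probability of being in the support of $\sigma(t+1)$.
Since $N_p > \#(S_p)$, with positive probability all the strategies in $V(\Theta(\sigma(t)))$ are
present in the population in period $t+1$. Therefore, $\forall \epsilon > 0$, $\exists \bar N$ :
$N_{\min} > \bar N \Rightarrow P_{\epsilon,N}(V(\Theta(\sigma)) \mid \sigma) > 0$, $\forall \sigma \in \Omega$.
The conclusion follows by applying this last observation
repeatedly and combining it with Lemma 1. □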
According to the lemma there exists an upper bound on the minimal number of
positive probability transitions it takes from any initial state to reach a state which "covers"
a curb set. At that point there exists a positive probability transition into a minimal
curb set.
Corollary 1 If $\alpha_2, \alpha_3 > 0$, then $\forall \epsilon > 0$, $\exists \bar N, T'$ such that for
$N_{\min} > \bar N$, from any
initial state $\sigma \in \Omega$, the system moves into a minimal-curb-state set after no more than
$T'$ iterations with positive probability.
Proof: From Lemma 4, after $T$ steps the system reaches a state whose support
"includes" a minimal curb set. There is positive probability that in the next round all
agents are active and draw samples from the curb set. Let $T' = T + 1$. □
The following lemma verifies that if, for a given game, $\epsilon$ is chosen sufficiently small,
then the learning dynamic cannot exit a minimal-curb-state set once it has entered
it.
Lemma 5 $\exists \bar\epsilon > 0$ : $\forall\, 0 < \epsilon < \bar\epsilon$, if $\sigma(t)$ is an element of a curb-state set $\Theta$, then
$\mathrm{supp}(\sigma(t+k)) \subseteq \Theta$, $\forall k \ge 0$.
Proof: If $s_i \notin BR_i(\Theta)$, then there exists $\epsilon(s_i) > 0$ such that
$\forall\, 0 < \epsilon < \epsilon(s_i)$, $s_i \notin BR^{\epsilon}_i(\Theta)$. Consider
$\bar\epsilon := \min\{\epsilon(s_i) \mid i \in I, s_i \notin BR_i(\Theta), \Theta \subseteq S\}$. □
[...] the learning process will eventually end up in one of the minimal-curb-state sets.
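Proposition 1 If $\alpha_2, \alpha_3 > 0$, then $\exists \bar\epsilon > 0$ such that
$\forall\, 0 < \epsilon < \bar\epsilon$, $\exists \bar N$ such that for
$N_{\min} > \bar N$, and for any initial state $\sigma \in \Omega$, the learning process converges almost surely
to a minimal-curb-state set.
Proof: For a given $\epsilon$ let $N_{\min}$ and $T'$ be given as in Corollary 1 such that
$P_{\epsilon,N}(\sigma(t+T') \in \text{minimal-curb-state set} \mid \sigma(t) = \sigma) \ge \delta > 0$, $\forall \sigma \in \Omega$.
Then
$P_{\epsilon,N}(\sigma(t+kT') \notin \text{minimal-curb-state set} \mid \sigma(t) = \sigma) \le (1-\delta)^k$.
Thus the probability that the system does not converge to a minimal-curb-state set
equals $\lim_{k \to \infty} (1-\delta)^k = 0$. □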
Not only does the learning dynamic converge globally almost surely to one of the
minimal-curb-state sets. Inside such sets every state is reached from every other state
via a finite length sequence of positive probability transitions. This follows from Lemma
4. Therefore we have the following corollary:
Corollary 2 If $\alpha_2, \alpha_3 > 0$, then $\exists \bar\epsilon > 0$ such that
$\forall\, 0 < \epsilon < \bar\epsilon$, $\exists \bar N$ such that for
$N_{\min} > \bar N$, the minimal-curb-state sets are the recurrent communication classes of the
learning dynamic.
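For a small finite Markov chain, the recurrent communication classes in the sense of conditions (i) to (iii) above can be computed directly from the pattern of positive transition probabilities. The Python sketch below is an illustration that is not part of the paper; the function name and the four-state example chain are hypothetical. It uses a transitive closure of the one-step reachability relation and keeps the classes from which no other state can be reached.

```python
def recurrent_classes(transition):
    """Recurrent communication classes of a finite Markov chain.

    transition[i][j] is the one-step probability of moving from state i to state j;
    only whether it is positive matters here.
    """
    n = len(transition)
    reach = [[i == j or transition[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    classes, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        cls = [j for j in range(n) if reach[i][j] and reach[j][i]]
        seen.update(cls)
        # The class is recurrent if every state reachable from i can return to i.
        if all(reach[j][i] for j in range(n) if reach[i][j]):
            classes.append(cls)
    return classes

# Hypothetical chain: state 0 is absorbing, state 1 is transient, states 2 and 3 cycle.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(recurrent_classes(P))  # [[0], [2, 3]]
```

In the learning dynamic the state space is far too large for this brute-force approach in general; the sketch only makes the definition concrete.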
3.1 A Simple Best-Reply Rule
In some interesting classes of games one obtains convergence to minimal-curb-state sets
from a simple best-reply rule.⁴
Consider a dynamic in which agents are either inactive,
with probability $1 - \alpha$, or move to a best reply, with probability $\alpha$, $0 < \alpha < 1$. One can
easily check that this process converges almost surely to the unique minimal-curb-state
set in the pre-play communication game with reduced normal form $\Gamma_1$. In this case there
is no need to introduce sampling with replacement to generate a rich enough set of beliefs
out of the current state.
It is easy to see that this observation generalizes for this class of communication
games. However, it is not clear how far one can extend it beyond this class. In this
subsection I will show that the observation is valid for all two-player $i \times j$-games with
$i, j \le 3$. In this class of games it is sufficient that some agents move to a best reply while
others don't to generate beliefs which will induce exit from any product set that does
not contain a minimal curb set.
For technical reasons I will again replace best replies in the dynamic by $\epsilon$-best replies
and refer to the "$\epsilon$-best-reply rule." In the proofs I will argue in terms of best replies.
This suffices for generic games; the arguments for general games in terms of $\epsilon$-best replies
are analogous to the ones made above and therefore omitted. There are two populations,
$p = 1, 2$.
Proposition 2 Let $G$ be any $i \times j$-game with $i, j \le 3$. Then, for all $0 < \alpha < 1$, there
exists $\bar\epsilon$, such that for all $0 < \epsilon < \bar\epsilon$ there exists $\bar N$ such that $N_p > \bar N$ implies that under
the $\epsilon$-best-reply rule the process will almost surely converge to a minimal-curb-state set.
Proof: Note that from any state there is positive probability that in one step the process
moves to a state such that within each population all agents use the same strategy. If
one of these strategies is a minimal curb strategy, we are done, because there is positive
probability that only the other population moves and that within that population every
agent moves to a best reply. Therefore continue under the assumption that we start with
a state in which every member of a population uses the same strategy which is not a
minimal curb strategy.
Since this initial state $(s_1, t_1)$ [...]
one and the other is concentrated on two strategies; e.g., the new state is supported on
$\{s_1, s_2\} \times \{t_1\}$. If $s_2$ is a curb strategy we are done because $(s_2, t_1)$ was one of the
states reached with positive probability from $(s_1, t_1)$. Otherwise, $\{s_1, s_2\} \times \{t_1\}$ is not
curb, which means that either $s_3$ is a best reply to $t_1$, in which case we are done because
there are only three strategies and one of them has to be curb, or there exist beliefs
concentrated on $\{s_1, s_2\}$ such that the column player has a best reply which is not $t_1$;
without loss of generality let it be $t_2$. If $t_3$ is also a best reply to some beliefs on
$\{s_1, s_2\}$ we are done because either $t_2$ or $t_3$ would have to be a curb strategy. Suppose not,
i.e. let $t_2$ be the only best reply and let it not be a curb strategy. Note that from
the initial state $(s_1, t_1)$ (almost) every distribution over the two strategies $s_1$ and $s_2$ has
positive probability. Therefore, with positive probability we move to a state supported
on $\{s_1, s_2\} \times \{t_1, t_2\}$. In particular, for (almost) every distribution over $\{t_1, t_2\}$, there is
a corresponding state which can be reached with positive probability.
None of the strategies $s_1$, $s_2$, $t_1$, and $t_2$ is a curb strategy. And since there is no belief
over $\{s_1, s_2\}$ to which $t_3$ is a best reply, it must be the case that $s_3$ is a best reply to some
beliefs over $\{t_1, t_2\}$. Suppose we are at a positive probability state where the distribution
over $\{t_1, t_2\}$ is such that $s_3$ is a best reply. At that point there is positive probability
that the entire row population moves to $s_3$, which from the foregoing must be a curb
strategy. □
There is still the question of whether it is possible that the dynamics may end up
being confined to a subset of a minimal-curb-state set. Ideally we would want to show
that the recurrent communication classes of the $\epsilon$-best reply dynamic coincide with the
minimal-curb-state sets. I will prove a somewhat weaker result.
Proposition 3 Let $G$ be any two-player $i \times j$-game with $i, j \le 3$. Then, for all $0 < \alpha < 1$,
there exists $\bar\epsilon$, such that for all $0 < \epsilon < \bar\epsilon$ there exists $\bar N$ such that for $N_p > \bar N$ the following
holds: For every minimal curb set $Q = Q_1 \times Q_2$ of $G$, every $q \in Q_i$, and every recurrent
[...]
Proof: Obvious for $1 \times j$, $j = 1, 2, 3$, minimal curb sets.
What about $2 \times 2$? Consider a minimal curb set of the form $\{s_1, s_2\} \times \{t_1, t_2\}$. Without
loss of generality we can start with a pure strategy combination, say $(s_1, t_1)$. Also wlog $s_2$
is a best reply, and there exists a belief, $\mu$, over $\{s_1, s_2\}$ to which $t_2$ is a best reply.
The result for $2 \times 2$ minimal curb sets follows because we can move any fraction of the
population to a best reply.
Next consider minimal curb sets of the form $\{s_1, s_2\} \times \{t_1, t_2, t_3\}$. Without loss of
generality we can start the process at $(s_1, t_1)$. Then either $s_2$, or $t_2$, or $t_3$ is a best reply.
Suppose first that $s_2$ is a best reply to $t_1$. Then the dynamic can generate all possible
beliefs of the column player over $\{s_1, s_2\}$. Since we are dealing with a minimal curb set,
there must exist beliefs $\mu_2$ and $\mu_3$ concentrated on this set such that
$t_j \in BR(\mu_j)$, $j = 2, 3$.
Now suppose instead that $s_2$ is not a best reply to $t_1$. Then, if $t_2$ and $t_3$ are both best
replies to $s_1$, the dynamic can generate all distributions over $\{t_1, t_2, t_3\}$ and there will be
at least one distribution over these three strategies which makes $s_2$ a best reply.
It rema ins to consider the ca se where o nly t
2
is a b est reply to s
1
: In that cas e s
2
mus t be ab est reply to t
2 ; for otherwise fs 1 g2ft 1 ;t 2
g would forma minimalcurb set.
This is a nalog ous to the cas e w heres
2
wa sa b est replyto t
1 :
Next consider minimal curb sets of the form $\{s_1, s_2, s_3\} \times \{t_1, t_2, t_3\}$. As before, wlog, we can start the dynamic out at a state corresponding to the pure strategy combination $(s_1, t_1)$. Also, wlog, $s_2$ is a best reply to $t_1$, which means that we can move to any belief concentrated on $\{s_1, s_2\}$. If $s_3$ is a best reply as well, we are done because we can move to any mixture over $\{s_1, s_2, s_3\}$ and there is at least one such mixture for each of $t_2$ and $t_3$ which makes them best replies. Therefore suppose that $s_3$ is not a best reply to $t_1$. If there are beliefs $\beta_2$ and $\beta_3$ over $\{s_1, s_2\}$ such that $t_j \in BR(\beta_j)$, $j = 2, 3$, we can generate all beliefs over $\{t_1, t_2, t_3\}$ in the following way. First move to a state supported on $\{s_1, s_2\} \times \{t_1\}$ corresponding to the belief $\beta_j$, $j = 2, 3$, with the least weight on $s_2$, say $\beta_2$. Then, simultaneously move the row population to $\beta_3$ and the desired fraction of the column population to $t_2$. In the
because we have been able to generate all possible beliefs over $\{t_1, t_2, t_3\}$.
Suppose next that only for $t_2$ there is a belief $\beta_2$ supported on $\{s_1, s_2\} \times \{t_1\}$ such that $t_2 \in BR(\beta_2)$. Then $s_3$ must be a best reply to some beliefs over $\{t_1, t_2\}$. By an argument analogous to the one just given it follows that we can generate all beliefs over $\{s_2, s_3\}$. If there is such a belief such that $t_3$ is a best reply, we are done. Otherwise, $s_1$ must be a best reply to some beliefs concentrated on $t_1, t_2$. Note also that $t_2$ must be a best reply to either $s_1$ or $s_2$. In either case we can repeat the construction from the previous paragraph to generate all beliefs over $\{s_1, s_2, s_3\}$, which concludes the argument. $\Box$
4 Selection
In the previous section I considered learning without experimentation, mistakes or mutations. I showed that from any initial condition the dynamic converges almost surely to one of the minimal-curb-state sets. The unperturbed dynamic does not select among curb sets. Work by Young [1993], Kandori, Mailath and Rob [1993], Ellison [1993] shows that similar dynamics select among strict Nash equilibria, provided they are augmented to allow for mistakes. Samuelson [1993], Noldeke and Samuelson [1993] [1994] investigate selection among nonsingleton recurring communication classes. Hurkens [1994] shows that an intuitive class of dynamics which converges globally to curb sets does not select among them once mistakes are added. I will show in this section that adding mistakes to the population learning dynamics leads to selection much like in the works cited above. In this section I will assume that 1 ; > 0.
The key idea is that mistakes must remain transient; it must not be possible for a small number of mistakes to propagate through the system and to induce large effects. This property, transience of mistakes, is shared by the dynamics of Kandori, Mailath and Rob, Noldeke and Samuelson, etc.
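The main result of this section relies on a property of Markov chains with stationary transition probabilities which was established by Young [1993]; Freidlin and Wentzell
ability $\Pi^0$. Assume that with high probability the process follows $\Pi^0$, but with some probability agents make mistakes. Let the corresponding noisy transition probability be denoted by $\Pi^{\varepsilon}$, where $\varepsilon$ is a parameter measuring the overall level of noise in the system. Assume that $\Pi^{\varepsilon}$ satisfies the following three properties:
1. $\Pi^{\varepsilon}$ is aperiodic and irreducible for all $\varepsilon \in (0, \bar{\varepsilon}]$;
2. $\lim_{\varepsilon \to 0} \Pi^{\varepsilon}(\omega' \mid \omega) = \Pi^0(\omega' \mid \omega)$, $\forall \omega, \omega' \in \Omega$; and
3. $\Pi^{\varepsilon}(\omega' \mid \omega) > 0$ for some $\varepsilon$ implies $\exists r \ge 0$: $0 < \lim_{\varepsilon \to 0} \varepsilon^{-r} \Pi^{\varepsilon}(\omega' \mid \omega) < \infty$.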
It is well known that the first property implies that $\Pi^{\varepsilon}$ has a unique stationary distribution, and that this stationary distribution describes the long-run behavior of the dynamic irrespective of initial conditions. For any stationary distribution $\mu^0$ of $\Pi^0$ let $\mu^0_{\omega}$ denote the probability assigned to the state $\omega$ by $\mu^0$.
If the transition from $\omega$ to $\omega'$ is not impossible under $\Pi^{\varepsilon}$, $r(\omega, \omega') = r$ is called the resistance of the transition from $\omega$ to $\omega'$. Let $\Omega_1, \Omega_2, \ldots, \Omega_J$ denote the recurrent communication classes of $\Pi^0$. For all $i, j$, $i \ne j$, let $r_{ij}$ be the least resistance among all directed paths beginning in $\Omega_i$ and ending in $\Omega_j$. Define a graph $G$ with vertices indexed by $\{1, 2, \ldots, J\}$ and for each $i,j$-pair a directed edge $(i, j)$ with weight $r_{ij}$. A $j$-tree in $G$ is a spanning subtree of $G$, i.e., for every vertex $i \ne j$ there exists exactly one directed path from $i$ to $j$. The total resistance of a $j$-tree is the sum of the resistances of the directed edges in that tree. The least total resistance among all $j$-trees, denoted $\gamma_j$, is the stochastic potential of the recurrent communication class $\Omega_j$. Young proves the following proposition.
Let $\mu^{\varepsilon}$ be the unique stationary distribution of $\Pi^{\varepsilon}$, for any $\varepsilon$. Then,
1. as $\varepsilon \to 0$, $\mu^{\varepsilon}$ converges to a stationary distribution $\mu^0$ of $\Pi^0$; and
2. $\omega$ is stochastically stable ($\mu^0_{\omega} > 0$) if and only if $\omega$ is an element of the recurrent communication class with minimum stochastic potential.
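The tree construction behind the stochastic potential can be made concrete with a small brute-force computation. The Python sketch below is only an illustration under stated assumptions: the resistance matrix `EXAMPLE_RESISTANCES`, the function names, and the three-class numbers are hypothetical and not taken from the paper, and the enumeration is exponential in the number of classes, so it is suitable for small examples only.

```python
from itertools import product


def _reaches_root(start, root, successor):
    """Follow successor edges from `start`; True if the walk ends at `root`."""
    seen = set()
    node = start
    while node != root:
        if node in seen:  # walked into a cycle that avoids the root
            return False
        seen.add(node)
        node = successor[node]
    return True


def stochastic_potential(r, root):
    """Least total resistance over all trees directed into `root`.

    r[i][k] is the (hypothetical) least resistance of moving from
    recurrent class i to recurrent class k.
    """
    n = len(r)
    others = [i for i in range(n) if i != root]
    best = float("inf")
    # Every non-root vertex picks exactly one outgoing edge.
    for choice in product(range(n), repeat=len(others)):
        successor = dict(zip(others, choice))
        if any(i == k for i, k in successor.items()):
            continue  # self-loops are not edges of the graph
        if all(_reaches_root(i, root, successor) for i in others):
            total = sum(r[i][successor[i]] for i in others)
            best = min(best, total)
    return best


# Hypothetical resistances between three recurrent classes.
EXAMPLE_RESISTANCES = [
    [0, 1, 5],
    [4, 0, 3],
    [2, 6, 0],
]

potentials = [stochastic_potential(EXAMPLE_RESISTANCES, j) for j in range(3)]
print(potentials)  # [5, 3, 4]
print(min(range(3), key=lambda j: potentials[j]))  # class index 1 has the minimum
```

With only two recurrent classes the construction collapses: the potential of each class is simply the resistance of the single edge leading into it from the other class.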
Note that the learning process we examine in this paper is aperiodic and irreducible,
perturbed process has only one recurrent communication class. Young's proposition and Corollary 2 imply:
Proposition 4 If 1 ; 2 ; 3 ; > 0, then there exists > 0 such that for all 0 < < there exists $N$ such that for $N_{\min} > N$, a state is stochastically stable if and only if it belongs to a minimal-curb-state set with minimal stochastic potential.
Also, a moment's reflection shows that for any two states $\omega$ and $\omega'$, the resistance $r(\omega, \omega')$, if it is finite, is equal to the minimum number of mistakes needed to move from $\omega$ to $\omega'$. Recall that for the dynamic proposed here the role of mistakes is to activate certain best replies of players. The mistakes themselves do not move the system. If there are sufficiently many mistakes, a strategy may become a best reply that wasn't before; more mistakes, of the right kind, will achieve the same effect.
I will demonstrate selection among minimal curb sets for a class of two-player games. This is the class of games with two minimal curb sets, $Q^1$ and $Q^2$, such that each strategy of each player is in the projection of at least one minimal curb set. In that case I will say that the two minimal curb sets are exhaustive.
From Corollary 2 and Proposition 4, we know that if 1 ; 2 ; and 3 are all positive, then there are two recurrent communication classes of the unperturbed process, $\Omega_1$ and $\Omega_2$, corresponding to the two minimal curb sets, and the limit stationary distribution will assign positive weight only to the states in the minimal-curb-state set with minimum stochastic potential.
In order to find out which $\Omega_j$ is selected, we need to calculate the paths of least resistance from $\Omega_1$ to $\Omega_2$ and vice versa. We can move from $\Omega_1$ to $\Omega_2$ whenever sufficiently many type-1 (or type-2) players make a mistake and use a strategy in $Q^2$. Sufficiently many mistakes of the right kind eventually turn actions in $Q^2$ into best replies. Once the current state is supported on strategies from both curb sets, the unperturbed component of the process takes over and moves the state into either one of the two
belongs to the basins of attraction of both minimal-curb-state sets, is a consequence of sampling (not necessarily with replacement). The least number of mistakes needed to transit from $\Omega_1$ to $\Omega_2$ can be expressed in terms of the players' beliefs. Let $\beta_p$ be player $p$'s belief and $\beta_p(Q^j_{-p})$ the probability which player $p$'s beliefs assign to $Q^j_{-p}$. Define
$\rho^j_p := \min\{\beta_p(Q^j_{-p}) \mid BR(\beta_p) \cap Q^j_p \ne \emptyset\}.$
This is the least probability player $p$ can attach to the set of strategies $Q^j_{-p}$ and still have a best reply in $Q^j_p$. For simplicity assume that population sizes are the same and equal to $N$, and for any real number $x$ let $[x]$ be the smallest integer greater than or equal to $x$. The least number of mistakes needed to transit from $\Omega_i$ to $\Omega_j$ is then equal to
$\min\{[\rho^j_1 N], [\rho^j_2 N]\}.$
This observation uses the fact that any state with positive support on strategies from both curb sets belongs to the basins of attraction of both curb sets. Define
$\rho^j := \min\{\rho^j_1, \rho^j_2\}.$
With these preliminaries we have the following result:
Proposition 5 If $G$ is a two-player game with two exhaustive minimal curb sets $Q^1$ and $Q^2$, and if 1 ; 2 ; 3 ; > 0, then there exists > 0 such that for all 0 < < there exists $N$ such that for $N_{\min} > N$, $\omega$ is stochastically stable if and only if $\omega \in \Omega_j$ and $\rho^j = \min\{\rho^1, \rho^2\}$.
In the case where $G$ is a symmetric game and the two curb sets are strict equilibria, the selection criterion in the theorem reduces to the familiar risk dominance criterion of Harsanyi and Selten [1988].
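The mistake-counting rule behind this criterion can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: the function names and the threshold values in the usage lines are hypothetical, and the sketch assumes equal population sizes and exactly two curb sets, so that the set that is cheapest to reach is the one with minimum stochastic potential.

```python
from fractions import Fraction
from math import ceil


def mistakes_to_reach(threshold_player1, threshold_player2, population_size):
    """Least number of mistakes that makes some strategy in the target curb set
    a best reply for at least one player: the smaller of the two rounded-up
    thresholds, each scaled by the (common) population size."""
    return min(
        ceil(threshold_player1 * population_size),
        ceil(threshold_player2 * population_size),
    )


def cheapest_targets(thresholds, population_size):
    """thresholds[j] = (player-1 threshold, player-2 threshold) for reaching
    curb set j. Returns the curb sets needing the fewest mistakes to reach,
    together with all mistake counts."""
    costs = {
        j: mistakes_to_reach(t1, t2, population_size)
        for j, (t1, t2) in thresholds.items()
    }
    fewest = min(costs.values())
    return sorted(j for j, c in costs.items() if c == fewest), costs


# Hypothetical thresholds, chosen only to exercise the functions.
example = {
    1: (Fraction(1, 3), Fraction(1, 4)),
    2: (Fraction(1, 2), Fraction(3, 5)),
}
selected, costs = cheapest_targets(example, population_size=10)
print(costs)     # {1: 3, 2: 5}
print(selected)  # [1]
```

Exact fractions are used so that the rounding-up step is not affected by floating-point error when a threshold times the population size is an integer.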
5,1   1,5   0,0
1,5   5,1   0,0
0,0   0,0   2,2
In this example the upper-left-hand curb set will be selected by the population learning dynamic. After all, both players can guarantee themselves a payoff of 3 against any beliefs concentrated on this set, which shows that more mistakes are needed to upset a state supported on this curb set than a state where all agents use their third strategy. In this example, a specification of the underlying deterministic process in Kandori, Mailath and Rob's (KMR) [1993] dynamic such that adjustment speeds are equal in both dimensions in regions where the basins of attraction of the equilibria overlap would yield the same selection. This shows once more that sampling with replacement is not a necessary condition for selection among curb sets which are not strict equilibria.
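The comparison in this example can be checked numerically. The sketch below is an assumption-laden illustration rather than the paper's method: it approximates, by an exact-arithmetic grid search over beliefs, the least probability a player must place on a curb set's opponent strategies for one of the player's own strategies in that set to be a best reply. The function name, the grid resolution of 35, and the 0-based strategy indices are choices made here; a grid search in general only gives an upper bound, although this particular grid happens to contain the exact minimizers.

```python
from fractions import Fraction

# Payoffs of the 3x3 example: entry [i][k] belongs to row strategy i, column strategy k.
U_ROW = [[5, 1, 0], [1, 5, 0], [0, 0, 2]]
U_COL = [[1, 5, 0], [5, 1, 0], [0, 0, 2]]
# Column player's payoffs indexed as [own strategy][opponent strategy].
COL_OWN = [list(column) for column in zip(*U_COL)]

UPPER_LEFT = {0, 1}  # first two strategies of each player
THIRD = {2}          # third strategy of each player


def threshold(own_payoff, target, grid=35):
    """Smallest belief mass on the opponent's `target` strategies (over a grid
    of beliefs with denominators `grid`) such that some own strategy in
    `target` is a best reply. In this example the same index set describes
    both players' parts of each curb set."""
    best = None
    for a in range(grid + 1):
        for b in range(grid + 1 - a):
            belief = (
                Fraction(a, grid),
                Fraction(b, grid),
                Fraction(grid - a - b, grid),
            )
            values = [sum(p * u for p, u in zip(belief, row)) for row in own_payoff]
            top = max(values)
            if any(values[s] == top for s in target):
                mass = sum(belief[s] for s in target)
                if best is None or mass < best:
                    best = mass
    return best


print(threshold(U_ROW, UPPER_LEFT))  # 2/7
print(threshold(U_ROW, THIRD))       # 3/5
```

Under these assumptions a belief mass of 2/7 on the upper-left set already makes one of its strategies a best reply, whereas the third strategy needs mass 3/5, so leaving the upper-left curb set takes more mistakes than entering it, in line with the selection described above.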
Note that the selection we obtain here does not depend on the values of 1 ; 2 and 3 ; as long as they are all positive. Furthermore, these values can be different across populations, or even within a population. Thus there is a wide range of mistake-free dynamics which yield the same selection. This is a result of the fact that the basins of attraction of different curb sets overlap. In the above example and in $2 \times 2$-coordination games it is the case that from any state not supported entirely on one of the two curb sets either of the two curb sets can be reached without mistakes. This phenomenon is already noted in KMR's paper. They also point out that there are dynamics in the two-population scenario which satisfy their Darwinian condition, that only the best strategy in a population grows, yet select the (2,2)-equilibrium. This would be the case, for example, if in a region where the basins of attraction overlap the speed of adjustment toward one equilibrium is much faster than toward the other. Thus in the framework presented here selection of and among curb sets is obtained under a large set of
That a dynamic satisfies the Darwinian condition is no guarantee that curb sets will be selected, as the next example shows. 5
x,x   2,2   2,2   2,2
2,2   5,0   0,5   0,0
2,2   0,0   5,0   0,5
2,2   0,5   0,0   5,0
Let $x > 2$ and close to 2. The game has a unique equilibrium, with payoff vector $(x, x)$. The unique minimal curb set coincides with this equilibrium.
Consider a dynamic in which agents from two populations are randomly matched to play this game. As they play, they make mistakes with probability $\varepsilon$. When they make a mistake, they put strictly positive probability on each of their strategies. After each round of play they learn the true distribution of play in the last period and move to a best reply against this distribution. In the framework of this paper, this corresponds to the case of 1 = 1. It is easily checked that for $\varepsilon = 0$ this Markov process has three recurrent communication classes, one corresponding to the equilibrium, one to a cycle in which agents use only their last three strategies, and one to a cycle in which the payoff vector is always (2,2).