
Learning, Experimentation, and Long-Run Behavior in Games*


Andreas Blume

Department of Economics

University of Iowa

November 8, 1994

Abstract

This paper investigates a class of population-learning dynamics. In every period agents either adopt a best reply to the current distribution of actual play, or a best reply to a sample, taken with replacement, from the distribution of intended play (the strategies adopted at the end of last period), or they are inactive. If sampling with replacement and being inactive have strictly positive probability, these dynamics converge globally to minimal curb sets in the absence of mistakes. For two-player $i \times j$-games, $i, j \le 3$, the same result holds even if only best responding to actual play and being inactive have positive probability. If players make mistakes in the implementation of their strategies, these dynamics select among minimal curb sets.

* I wrote the first draft of this paper while I enjoyed the hospitality of CentER. I thank, without implicating, Sjaak Hurkens and Eric van Damme for comments. I have also benefitted from comments made in the IO MEGT workshop at the University of Iowa. I am grateful to the NSF for financial support.

1 Introduction

This paper characterizes the long-run outcomes of a class of learning dynamics in games. The characterization is in terms of properties of subsets of the space of strategy profiles of the underlying games. Young [1993] has shown that under some conditions on players' memory and the completeness of their information, adaptive play converges almost surely to a pure strategy equilibrium, provided the game satisfies an acyclicity requirement. The present paper is in the same spirit. It looks at a different dynamic and, more importantly, it drops the acyclicity condition. Without this condition, the question arises which objects can take the place of the pure strategy Nash equilibria. In general, one suspects that this will depend on details of the dynamic. This paper argues that there are interesting classes of dynamics whose long-run outcomes can be characterized in terms of curb (closed under rational behavior) sets.

A product set of strategies is closed under inclusion of best replies if it contains all best responses to independent beliefs supported on itself. Basu and Weibull [1991], who first examined the properties of such sets, refer to them as closed under rational behavior (curb). Curb sets which do not properly contain another curb set are referred to as minimal curb sets. In generic normal form games these coincide with persistent sets (the extreme points of persistent retracts), Kalai and Samet [1984], Balkenborg [1992]. While the set of rationalizable strategies, Bernheim [1984] and Pearce [1984], is a maximal fixed point under the best reply mapping, a minimal curb set is a minimal fixed point under this mapping.

Curb sets have a number of attractive features. They share with strict equilibria the property that they contain all best replies against themselves. Every curb set contains the support of a Nash equilibrium; such equilibria are referred to as curb equilibria. Blume [1994] shows that in games with one-sided pre-play communication, the minimal curb condition selects the communicating player's favorite equilibrium if it is not too risky. Hurkens [1993] demonstrates that in pre-play communication games where messages are [...] communicating players their most preferred payoff.

Like strict equilibria, curb sets will be locally stable under a large class of plausible dynamic adjustment rules. The question I want to pose in this paper is whether it is possible to provide a firmer dynamic foundation for minimal curb sets. This requires that one address two issues. Are there dynamics which converge globally to minimal curb sets? Are there dynamics which select among minimal curb sets?

Hurkens [1994] provides one answer to the first question. He examines a dynamic in the spirit of Young [1993] which converges almost surely to a minimal curb set from any initial condition. In his dynamic, only one pair of agents plays in any given time period. Each of them takes a possibly incomplete sample from finite length histories of past play and best responds to some distribution over the sample. The fact that only the support of the sample matters guarantees that every belief over strategies in the current state is possible. Therefore the dynamic will eventually leave any set of strategies that is not curb. Finite length histories guarantee that once the process has spent sufficient time in a minimal curb set it cannot exit the minimal curb set anymore. Hurkens shows that his dynamic converges globally to minimal curb sets, almost surely. He then goes on to ask whether adding mutations to his dynamic yields selection among minimal curb sets. This is not the case because as soon as a mistake enters the current state, it is possible that the active players attribute any probability to the corresponding strategy. Therefore one mistake is sufficient to upset any minimal curb set in Hurkens' framework.

I want to propose a different dynamic. I consider large populations of agents. All agents play in every period; when they play, they use the strategies they had adopted at the end of last period, unless they are experimenting. If they are experimenting, they randomize, choosing each of their available strategies with strictly positive probability. Agents differ in how they process information. In each period every agent either best responds, or gathers information, or is inactive. Best responding agents learn the true current distribution of play from playing against the entire population. Agents who [...] best reply against their information. Inactive agents carry over the strategy they had adopted at the end of last period into the next period. Because play partners are identifiable, the information available to best responders is modelled as coming from sampling without replacement and, for simplicity, as learning the true distribution of play, whereas information gathering is potentially indirect and therefore modelled as sampling with replacement.

I derive two main results; one under the condition that the mistake probability is zero and one for positive but small mistake probabilities. Without mistakes the dynamic converges to minimal curb sets regardless of the initial condition. With mistakes the dynamic selects among minimal curb sets. For two-player games with two exhaustive minimal curb sets, I characterize the condition for selection of one of the minimal curb sets; this condition reduces to Harsanyi-Selten risk dominance in $2 \times 2$ games. Intuitively, the two sets of states where the populations play entirely according to one of the two minimal curb sets (minimal curb state sets) are exceptional. The dynamic can leave these sets of states only if sufficiently many mistakes occur simultaneously. Any other state, outside of these sets, belongs to the basins of attraction of both minimal curb state sets. Thus all that matters is how many mistakes it takes to upset either one of the two minimal curb state sets.

I also investigate to what extent we need sampling with replacement or, as in Hurkens [1994], a direct assumption that every distribution which is supported on the current state has positive probability. I show that we may be able to do without such assumptions. At least in two-player $i \times j$-games with $i, j \le 3$, a simple best-reply rule, where agents are either inactive or move to one of their best replies, converges globally to minimal curb sets almost surely.

Besides the problem of finding dynamics which lead to and select among minimal curb sets in a game, there is the dual problem of when it is possible to find simple characterizations of stable sets of a dynamic in terms of the game. For a Markov process with stationary transition probabilities the stable sets are the recurrent communication classes of the process. As Young [1993] has shown, one can select among the recurrent communication classes by considering limits of perturbed processes as the perturbation vanishes. The limiting distribution will have its support concentrated on the communication classes with the least stochastic potential. In general, it will be difficult to characterize these communication classes in terms of the underlying game. The present paper proposes a class of dynamics for which such a characterization is possible; for these dynamics the recurrent communication classes of the unperturbed dynamic correspond to the minimal curb sets of the underlying game. The correspondence is as follows: for a given recurrent communication class there is exactly one minimal curb set such that each state has support only on this minimal curb set; conversely, given any minimal curb set, every state with support on this minimal curb set belongs to one and the same recurrent communication class. Furthermore, for a non-trivial class of games one can derive sufficient conditions for the selection of a particular minimal curb set via perturbed dynamics. Interestingly, it turns out that the payoffs in the equilibria belonging to the minimal curb sets play a secondary role as far as selection is concerned. Suppose one starts with a strict equilibrium that is selected by the perturbed dynamic. If one then replaces this equilibrium with a game that is a minimal curb set in the newly formed game and has a unique equilibrium with the same payoffs as the original equilibrium, this minimal curb set need not be selected by the perturbed dynamic.
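The notion of a recurrent communication class used above can be made concrete for any finite Markov chain. The following is a minimal sketch, not taken from the paper: the function names `reachable` and `recurrent_classes` and the four-state example chain are hypothetical, and the chain is represented only by which transitions have positive probability.

```python
def reachable(succ, start):
    """Return the set of states reachable from `start` (including itself)."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt in succ[state]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def recurrent_classes(succ):
    """Recurrent communication classes of a finite Markov chain.

    `succ[x]` is the set of states reached from x with positive probability.
    A state x is recurrent iff every state reachable from x can reach x back;
    its class is then exactly the set of states reachable from x.
    """
    reach = {x: reachable(succ, x) for x in succ}
    classes = []
    for x in succ:
        if all(x in reach[y] for y in reach[x]):
            cls = frozenset(reach[x])
            if cls not in classes:
                classes.append(cls)
    return classes


# Hypothetical chain: state 0 is transient, {1, 2} and {3} are recurrent classes.
example = {0: {0, 1, 3}, 1: {2}, 2: {1, 2}, 3: {3}}
print(recurrent_classes(example))
```

Only the support of the transition probabilities matters for this classification, which is why the sketch takes successor sets rather than a full transition matrix.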

The paper is organized as follows. The next section describes the model. Section 3 introduces the dynamic without mutations and derives the global convergence to minimal curb sets. Section 4 introduces mistakes in the implementation of strategies and demonstrates that the dynamic selects among exhaustive minimal curb sets in two-player games. Section 5 concludes.

2 The Setup

Consider a finite set of populations $P$ with typical element $p \in P$. Denote the size of population $p$ by $N_p$. Each population corresponds to one of the players in the game [...] that $N_p > \#(S_p)$, the cardinality of player $p$'s strategy space. $S := \times_{p \in P} S_p$ with typical element $s \in S$, and $u_p$ is a type $p$ player's utility function, $u_p : S \to \mathbb{R}$. Let $\Sigma_p$ denote a type $p$ player's set of mixed strategies, and $\Sigma := \times_{p \in P} \Sigma_p$. A typical element of $\Sigma$ will be denoted by $\sigma$. If we exclude the $p$th element from $\sigma$, the resulting vector will be written as $\sigma_{-p}$. $u_p$ extends to $\Sigma$ in the usual way.

For any finite set $X$, let $\Delta(X)$ stand for the set of probability distributions over $X$. Let $BR_p(\cdot)$ denote player $p$'s pure best reply correspondence, and define $BR(\sigma) := \times_{p \in P} BR_p(\sigma_{-p})$. I will also use the natural extension of $BR(\cdot)$ to sets of strategies as arguments.

Basu and Weibull [1991] introduced the notion of curb (closed under rational behavior) sets. A product set of strategies $Q = \times_{p \in P} Q_p$, $Q_p \subseteq S_p$, is closed under inclusion of best replies (curb) if each $Q_p$ is nonempty and $BR(\times_{p \in P} \Delta(Q_p)) \subseteq Q$.

If a curb set does not properly contain another curb set, it is called minimal. The strategies which form a minimal curb set are called curb strategies, and equilibria belonging to minimal curb sets are curb equilibria.

Blume [1994] and Hurkens [1993] show that the curb equilibrium requirement selects efficient outcomes in games with pre-play communication. Blume considers games with costless messages; Hurkens analyzes the case of nominal message costs. Consider the two games below. Both games have two minimal curb sets corresponding to the two strict Nash equilibria $(U,L)$ and $(D,R)$:

          L      R                     L      R
    U    3,3    0,0              U    9,9    0,8
    D    0,0    1,1              D    8,0    7,7

           $G_1$                        $G_2$

If we allow player one to send one of two messages $m_1$ or $m_2$ before playing the game, [...]

                 LL     LR     RL     RR
    $(m_1,U)$    3,3    3,3    0,0    0,0
    $(m_1,D)$    0,0    0,0    1,1    1,1
    $(m_2,U)$    3,3    0,0    3,3    0,0
    $(m_2,D)$    0,0    1,1    0,0    1,1

                     $\Gamma_1$

                 LL     LR     RL     RR
    $(m_1,U)$    9,9    9,9    0,8    0,8
    $(m_1,D)$    8,0    8,0    7,7    7,7
    $(m_2,U)$    9,9    0,8    9,9    0,8
    $(m_2,D)$    8,0    7,7    8,0    7,7

                     $\Gamma_2$

If messages are costless, then Blume [1994] shows that all curb equilibria in $\Gamma_1$ support the efficient payoff pair $(3,3)$. This result generalizes provided a condition that trades off the risk of the efficient equilibrium in the underlying game against the size of the message space is satisfied. In $\Gamma_2$, which is based on $G_2$, all equilibria are curb equilibria because there is a tension between risk dominance and Pareto dominance in the underlying game. If messages carry a nominal cost which distinguishes them, these results can be strengthened considerably. Hurkens [1993] shows that with nominal message costs, $0 \le m_1 < m_2$, $\{(m_1,U)\} \times \{LL, LR\}$ is the unique minimal curb set in both of the above communication games. He shows that this result generalizes to $n$-player games in which a subset of the player set can send a message.
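The tension between risk dominance and Pareto dominance in $G_2$, and its absence in $G_1$, can be checked directly. The sketch below assumes the standard Harsanyi-Selten comparison for $2 \times 2$ games with two strict equilibria, namely comparing the products of the two players' losses from unilateral deviation; the function name `risk_dominant` and the dictionary encoding of the games are illustrative choices, not notation from the paper.

```python
def risk_dominant(game):
    """Return the risk-dominant strict equilibrium of a 2x2 coordination game.

    `game[(row, col)]` is the payoff pair (u_row, u_col). The strict
    equilibria are assumed to be (U, L) and (D, R), as in G_1 and G_2.
    """
    # Product of both players' losses from deviating unilaterally at (U, L).
    loss_ul = (game["U", "L"][0] - game["D", "L"][0]) * (
        game["U", "L"][1] - game["U", "R"][1]
    )
    # Product of both players' losses from deviating unilaterally at (D, R).
    loss_dr = (game["D", "R"][0] - game["U", "R"][0]) * (
        game["D", "R"][1] - game["D", "L"][1]
    )
    if loss_ul == loss_dr:
        return None  # neither equilibrium risk dominates the other
    return ("U", "L") if loss_ul > loss_dr else ("D", "R")


G1 = {("U", "L"): (3, 3), ("U", "R"): (0, 0), ("D", "L"): (0, 0), ("D", "R"): (1, 1)}
G2 = {("U", "L"): (9, 9), ("U", "R"): (0, 8), ("D", "L"): (8, 0), ("D", "R"): (7, 7)}

print(risk_dominant(G1))  # ('U', 'L'): Pareto dominant and risk dominant
print(risk_dominant(G2))  # ('D', 'R'): risk dominant but Pareto dominated
```

In $G_1$ the deviation-loss products are 9 for $(U,L)$ and 1 for $(D,R)$; in $G_2$ they are 1 and 49, so the Pareto-dominated equilibrium is the risk-dominant one.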

3 Dynamics

In this section I describe the learning dynamic, and characterize its long-run outcomes in the absence of experimentation. I will show that the process converges almost surely to a curb set, regardless of the initial population state. In the following section I will examine this process further under the condition that the experimentation probability is different from zero.

The state of population $p$ at time $t$ is given by the vector $\omega_{p,t} = \{s_{it}\}_{i \in p}$. The state of the dynamic system at time $t$ is given by $\omega_t = \{\omega_{p,t}\}_{p \in P}$. In period $t$ state $\omega_{t-1}$ is replaced [...] Every agent in $p$ plays against all agents in $P \setminus p$. When she plays in period $t$, an agent uses strategy $s_{i,t-1}$, unless she experiments (or makes a mistake). Each agent experiments with probability $\lambda \ge 0$; in that case she chooses each of her pure strategies with strictly positive probability. There are three different ways in which agents process information in period $t$ and thereby generate the new state $\omega_t$. With probability $\alpha_1$ an agent adopts a best reply against the current distribution of actual play in period $t$; with probability $\alpha_2$ she gathers information about the strategies that were adopted last period; and with probability $\alpha_3 = 1 - \alpha_1 - \alpha_2$ she is inactive in period $t$. Subsequently I will refer to agents in these various roles as best responding, information gathering and inactive players. A best responding agent meets all agents from populations she does not belong to and learns the true distribution of current play (including mistakes) in the current period in those populations; she then adopts a best reply against this distribution. An information gathering agent takes a sample (with replacement) from the strategies that were adopted in period $t-1$ (the intended play of period $t$); she then adopts an $\epsilon$-best reply against uncorrelated beliefs based on this sample. An agent who is inactive in period $t$ passes through that period without changing her strategy.
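One period of this updating rule can be sketched in a few lines for the two-population game $G_1$. The sketch makes several simplifying assumptions that are not in the text: there is no experimentation, information gatherers use exact best replies rather than $\epsilon$-best replies, ties are broken uniformly, and the role probabilities `alpha1`, `alpha2`, the sample size, and the population size of 10 are arbitrary illustrative values.

```python
import random
from collections import Counter

# Payoffs in G_1 for the row population (0) and the column population (1).
PAYOFF = {
    0: {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
    1: {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
}
STRATS = {0: ["U", "D"], 1: ["L", "R"]}


def best_replies(pop, opp_counts):
    """Pure best replies of a member of `pop` to observed opponent counts."""
    def payoff(own):
        total = 0
        for other, n in opp_counts.items():
            profile = (own, other) if pop == 0 else (other, own)
            total += n * PAYOFF[pop][profile]
        return total

    values = {s: payoff(s) for s in STRATS[pop]}
    top = max(values.values())
    return [s for s in STRATS[pop] if values[s] == top]


def step(state, rng, alpha1=0.4, alpha2=0.3, sample_size=5):
    """One period without mistakes; `state[pop]` lists last period's strategies."""
    new_state = {}
    for pop in (0, 1):
        opp = state[1 - pop]
        updated = []
        for current in state[pop]:
            draw = rng.random()
            if draw < alpha1:
                # Best responder: learns the true distribution of play.
                belief = Counter(opp)
            elif draw < alpha1 + alpha2:
                # Information gatherer: samples with replacement.
                belief = Counter(rng.choices(opp, k=sample_size))
            else:
                # Inactive: carries last period's strategy over.
                updated.append(current)
                continue
            updated.append(rng.choice(best_replies(pop, belief)))
        new_state[pop] = updated
    return new_state


def run(state, periods, seed=0):
    """Iterate `step` for a number of periods from the given state."""
    rng = random.Random(seed)
    for _ in range(periods):
        state = step(state, rng)
    return state


# A state supported on the minimal curb set {U} x {L} is never left.
print(run({0: ["U"] * 10, 1: ["L"] * 10}, periods=50))
```

Starting the same simulation from a mixed state instead shows the process wandering until it is absorbed in one of the two monomorphic states; which one is reached depends on the random draws.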

Play with mistakes in period $t$ generates a temporary state $\tilde\omega_t$ to which best responding players best respond at the end of period $t$. Note that for each agent $i \in p$ each partial temporary state $\tilde\omega_{-p,t}$ naturally can be identified with an uncorrelated belief $\sigma_{-p} = \sigma_1 \times \cdots \times \sigma_{p-1} \times \sigma_{p+1} \times \cdots \times \sigma_{\#(P)}$ for agent $i \in p$ over $S_{-p} := \times_{q \ne p} S_q$, which itself can be identified with an element of $\Sigma_{-p} = \times_{q \ne p} \Sigma_q$. This belief is based on the observed relative frequencies of strategies in the populations not including $i$. With this identification of states and beliefs we can say that a best responding agent $i \in p$ adopts a pure strategy in $BR_i(\tilde\omega_{-p})$. For the remainder of this section I will set the experimentation probability $\lambda$ to zero; I return to the case of $\lambda > 0$ in the next section.

Denote the period $t$ sample from population $p$ obtained by an information gathering agent $i \notin p$ by $X_{ipt}$. Agent $i$'s entire period $t$ sample is then $X_{it} := \{X_{ipt}\}_{p \in P,\, p \not\ni i}$. Like states, samples give rise to uncorrelated beliefs. Therefore it makes sense to consider $BR_i^\epsilon(X_{it})$ [...] the sample $X_{it}$. An information gathering agent $i$ adopts one of these $\epsilon$-best replies;^2 in case of indifference, she randomizes, putting strictly positive probability on each of the strategies in $BR_i^\epsilon(X_{it})$. Let sample sizes be time invariant and denote the size of player $i$'s sample from population $p$, $i \notin p$, by $N_{ip}$.

The dynamic process described here is a Markov chain with stationary transition probabilities on the state space, denoted by $\Omega$. The first objective of this paper is to characterize the recurrent communication classes of this Markov chain. The recurrent communication classes are subsets of $\Omega$ such that (i) from every state there is a finite length sequence of positive probability transitions to at least one of these classes, (ii) within each class every state can be reached from every other state via a finite length sequence of positive probability transitions, and (iii) no state outside one of the classes can be reached from a state inside through a positive probability transition. Since (minimal) curb sets will figure prominently in this characterization, define a set of states supported entirely on one (minimal) curb set, and including all such states, as a (minimal-)curb-state set.

Let

$$\Psi(\Theta) := \{\sigma \in \Sigma \mid \operatorname{supp}(\sigma_i) \subseteq \Theta_i\} \quad \forall \Theta \subseteq S.$$

For any subset $\Theta$ of the set of pure strategy profiles, this is the set of all mixed strategy profiles, or equivalently uncorrelated beliefs, with support in $\Theta$. For any such $\Theta$ one can define the set

$$V(\Theta) := \Theta \cup BR(\Psi(\Theta)) \quad \forall \Theta \subseteq S$$

consisting of the union of $\Theta$ and all best replies against uncorrelated beliefs concentrated on $\Theta$. Let $V^t$ denote the $t$-fold iteration of $V$. It is easily seen that, starting with a set $\Theta$, one reaches a fixed point of $V$ by iterating $V$ sufficiently often.

Lemma 1 $\forall \Theta \subseteq S$, $\exists T$: $(\forall t > T,\; V^{t+1}(\Theta) = V^t(\Theta))$.

Proof: $V : 2^S \to 2^S$ is monotonic and $2^S$ is finite. $\Box$

^2 I use $\epsilon$-best replies because the sampling process only allows one to approximate the set of possible beliefs over a given support of strategies. Alternatively one could proceed like Hurkens and argue directly in terms of supports; i.e. one could simply postulate an updating rule in which players can move to any strategy which is a best reply to some beliefs over a given support. Also, in a generic class of games, we can replace $\epsilon$-best replies by best replies in our dynamic. The generic property to look for is that [...]

We next need notation to describe the fixed point that is reached if one starts the iteration with the support $\Theta(\sigma)$ of a strategy profile $\sigma$. Let

$$\Theta(\sigma) := \{s \in S \mid s_i \in \operatorname{supp}(\sigma_i)\}$$

$$t(\sigma) := \min\{t \in \mathbb{N} \mid V^{t+1}(\Theta(\sigma)) = V^t(\Theta(\sigma))\}$$

$$W(\sigma) := V^{t(\sigma)}(\Theta(\sigma))$$

$t(\sigma)$ is the minimal number of periods needed before one reaches the fixed point from $\Theta(\sigma)$, and $W(\sigma)$ is the fixed point reached from $\sigma$.

Lemma 2 $W(\sigma)$ contains a minimal curb set for all $\sigma \in \Sigma$.

Proof: $W(\sigma)$ is closed under inclusion of best replies. $\Box$
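The iteration of $V$ to its fixed point $W$ can be sketched for two-player games as follows. This is an approximation under stated assumptions, not the paper's construction: beliefs are restricted to a grid of rational probabilities with denominator `n`, so best replies that arise only at off-grid beliefs are missed, and the names `simplex_grid`, `best_replies`, `V`, `W` and the encoding of $G_1$ are illustrative.

```python
from fractions import Fraction
from itertools import product


def simplex_grid(support, n):
    """All beliefs on `support` whose probabilities are multiples of 1/n."""
    support = sorted(support)
    for counts in product(range(n + 1), repeat=len(support)):
        if sum(counts) == n:
            yield {s: Fraction(c, n) for s, c in zip(support, counts)}


def best_replies(payoff, own_strats, belief):
    """Pure best replies to `belief`; `payoff[own, other]` is the own payoff."""
    values = {
        s: sum(p * payoff[s, o] for o, p in belief.items()) for s in own_strats
    }
    top = max(values.values())
    return {s for s in own_strats if values[s] == top}


def V(theta, game, n=20):
    """Theta united with all best replies to grid beliefs concentrated on Theta."""
    rows, cols = theta
    u_row, u_col, all_rows, all_cols = game
    new_rows = set(rows)
    for belief in simplex_grid(cols, n):
        new_rows |= best_replies(u_row, all_rows, belief)
    new_cols = set(cols)
    for belief in simplex_grid(rows, n):
        new_cols |= best_replies(u_col, all_cols, belief)
    return frozenset(new_rows), frozenset(new_cols)


def W(theta, game, n=20):
    """Iterate V until a fixed point is reached (guaranteed, as in Lemma 1)."""
    theta = (frozenset(theta[0]), frozenset(theta[1]))
    while True:
        nxt = V(theta, game, n)
        if nxt == theta:
            return theta
        theta = nxt


# G_1: u_row is keyed by (row, col), u_col by (col, row).
u_row = {("U", "L"): 3, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1}
u_col = {("L", "U"): 3, ("R", "U"): 0, ("L", "D"): 0, ("R", "D"): 1}
G1 = (u_row, u_col, ["U", "D"], ["L", "R"])

print(W(({"U"}, {"R"}), G1))  # grows to the whole strategy space
print(W(({"U"}, {"L"}), G1))  # already a fixed point: a minimal curb set
```

Starting from the miscoordinated profile $(U,R)$ the iteration expands to all of $S$, which contains both minimal curb sets, while the strict equilibrium $(U,L)$ is its own fixed point.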

The set of states $\Omega$ can be identified with a finite subset of the set of mixed strategies $\Sigma$. The dynamics can be described by a transition probability $\pi(\cdot \mid \cdot)$ such that $\forall \sigma, \sigma' \in \Omega$, $\pi(\sigma' \mid \sigma)$ denotes the probability that the system will be in state $\sigma'$ in period $t+1$, if in period $t$ it is in state $\sigma$. $\pi$ depends on population sizes, sample sizes and $\epsilon$. Let $N := \{\{N_p\}_{p \in P}, \{N_{ip}\}_{p \in P,\, i \notin p}\}$. Let $\pi_{\epsilon,N}$ be the transition probability as a function of $\epsilon$ and $N$.

The set $\Omega$ does not contain states corresponding to every belief. Therefore it is possible that our dynamic with best replies, instead of $\epsilon$-best replies, does not leave a given set of states even though that set is not a curb-state set. The following game provides an example:

             $t_1$    $t_2$    $t_3$
    $s_1$     1,1      1,0      0,b
    $s_2$     1,0      1,a      0,b
    $s_3$     0,0      0,0     .1,.1

where $a = \sqrt{2}$ and $b = \frac{\sqrt{2}}{1+\sqrt{2}}$.

In this game only the strict equilibrium $(s_3, t_3)$ forms a minimal curb set. However, the product set $\{s_1, s_2\} \times \{t_1, t_2\}$ is closed under inclusion of best replies against beliefs for which each probability must be a rational number. Only if the column player puts probability $b$ on the first strategy of her opponent is her third strategy a best reply against beliefs concentrated on the first two strategies of her opponent.

On the other hand, with sufficiently large sample sizes any belief can be approximated arbitrarily closely. Therefore, with $\epsilon$-best replies there is a chance that our dynamic eventually leaves every set that is not a curb-state set. This motivates the next lemma, which says that every best reply to a given belief is an $\epsilon$-best reply to an open neighborhood of that belief. Thus, if we can approximate beliefs arbitrarily closely, the set of best replies to any product set of strategies is a subset of the set of $\epsilon$-best replies to the finite approximation of the same set, provided the approximation is sufficiently close.
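The role of $\epsilon$-best replies in the $3 \times 3$ example with $a = \sqrt{2}$ and $b = \sqrt{2}/(1+\sqrt{2})$ can be illustrated numerically. The sketch below is an assumption-laden illustration: it looks only at the column player's beliefs over $\{s_1, s_2\}$ generated by samples of size `n`, uses floating point arithmetic, and the function names are hypothetical.

```python
import math

A = math.sqrt(2)
B = math.sqrt(2) / (1 + math.sqrt(2))


def column_payoffs(q):
    """Column player's payoffs when s1 has probability q and s2 has 1 - q."""
    return {"t1": q, "t2": A * (1 - q), "t3": B}


def samples_making_t3_a_reply(n, eps=0.0):
    """Counts k such that t3 is an eps-best reply to the sample belief q = k/n."""
    hits = []
    for k in range(n + 1):
        pay = column_payoffs(k / n)
        if pay["t3"] >= max(pay.values()) - eps:
            hits.append(k)
    return hits


# With exact best replies no sample of size up to 50 ever makes t3 a best reply.
print(all(samples_making_t3_a_reply(n) == [] for n in range(1, 51)))

# With eps = 0.01 a sample of size 12 containing s1 seven times is enough.
print(samples_making_t3_a_reply(12, eps=0.01))
```

The belief $7/12$ is close to $b \approx 0.586$, so $t_3$ is within $0.01$ of the best payoff there, whereas a sample of size 2 yields no belief close enough to $b$.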

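Lemma 3 $\forall \epsilon > 0$, $\forall \sigma \in \Sigma$, $\forall s_i \in BR_i(\sigma)$, $\exists \delta > 0$: $|\tilde\sigma - \sigma| < \delta \Rightarrow s_i \in BR_i^\epsilon(\tilde\sigma)$.

Proof: Suppose not, and let $s_i \in BR_i(\sigma)$. Then there exist $\tilde\sigma_n \to \sigma$ and $t_i(\tilde\sigma_n) \in S_i$ such that

$$u_i(t_i(\tilde\sigma_n), \tilde\sigma_n) > u_i(s_i, \tilde\sigma_n) + \epsilon \quad \forall n.$$

Compactness of the strategy space and continuity of $u$ imply that there exists a $t_i \in S_i$ such that

$$u_i(t_i, \sigma) \ge u_i(s_i, \sigma) + \epsilon,$$

which contradicts $s_i \in BR_i(\sigma)$. $\Box$

For any given $\sigma \in \Omega \subset \Sigma$, $\pi_{\epsilon,N}$ induces a probability distribution over the set $2^S$ of supports. This is the probability that next period's state will have a certain support given that the current state is $\sigma$. Let the probability of support $\Theta \in 2^S$, given that the current state is $\sigma$, be denoted $P_{\epsilon,N}(\Theta \mid \sigma)$, given the transition probability $\pi_{\epsilon,N}(\cdot \mid \cdot)$. Let $P^t_{\epsilon,N}(\Theta \mid \sigma)$ denote the probability of support $\Theta$ after $t$ periods if the initial state is $\sigma$. In particular $P^1_{\epsilon,N}(\Theta \mid \sigma) = P_{\epsilon,N}(\Theta \mid \sigma)$. Let $N_{\min} := \min\{N_{ip}\}_{i \in I,\, p \in P}$.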

A central characteristic of our learning rule is that from any state $\sigma$ there is positive probability that next period's state will have support $V(\Theta(\sigma))$. Iterating this argument, one may conclude that from any $\sigma$ there is positive probability that after a finite number of periods the state has a support which is a fixed point of $V(\cdot)$. This is the content of the following lemma.

Lemma 4 If $\alpha_2, \alpha_3 > 0$, then $\forall \epsilon > 0$, $\exists \bar N, T$: $N_{\min} > \bar N \Rightarrow P^T_{\epsilon,N}(W(\sigma) \mid \sigma) > 0$, $\forall \sigma \in \Omega$.

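Proof: Let $\Xi(\sigma)$ be the set of possible samples with replacement, given the current state $\sigma$. Given the identification of samples with beliefs, $\Xi(\sigma)$ is a finite approximation of $\Psi(\Theta(\sigma))$. Furthermore, $\Xi(\sigma)$ converges to $\Psi(\Theta(\sigma))$ in the Hausdorff sense as $N_{\min} \to \infty$. Let $BR_i(\Psi(\Theta(\sigma)))$ be the (finite) set of (pure) best replies by agent $i$ to the beliefs concentrated on $\Theta(\sigma)$. For every $i$, $\forall s_i \in BR_i(\Psi(\Theta(\sigma)))$, choose $\sigma(s_i) \in \Psi(\Theta(\sigma))$ such that $s_i \in BR_i(\sigma(s_i))$. Note that for every $\delta > 0$, $\exists N_\delta$: $N_{\min} > N_\delta \Rightarrow \forall \sigma, \forall i, \forall s_i \in BR_i(\Psi(\Theta(\sigma)))$, $\exists \tilde\sigma(s_i) \in \Xi(\sigma)$ such that $|\sigma(s_i) - \tilde\sigma(s_i)| < \delta$. To see this note that there are finitely many $i$, finitely many combinations of $\Theta(\sigma)$ and $s_i \in BR_i(\Psi(\Theta(\sigma)))$, and that each single $\sigma(s_i)$ can be approximated by a belief in $\Xi(\sigma)$.

By Lemma 3, $\forall \epsilon$, $\forall \sigma(s_i)$, $\exists \delta > 0$: $|\sigma(s_i) - \tilde\sigma| < \delta \Rightarrow s_i \in BR_i^\epsilon(\tilde\sigma)$. Since there are finitely many such $\sigma(s_i)$ to consider across all individuals and all supports, we can interchange quantifiers to obtain $\forall \epsilon$, $\exists \delta$: $\forall i, \forall \sigma(s_i)$, $|\sigma(s_i) - \tilde\sigma| < \delta \Rightarrow s_i \in BR_i^\epsilon(\tilde\sigma)$.

Combining the last two observations, we may conclude that: $\forall \epsilon > 0$, $\exists \bar N$: $N_{\min} > \bar N \Rightarrow \forall \sigma, \forall i, \forall s_i \in BR_i(\Psi(\Theta(\sigma)))$, $\exists \tilde\sigma(s_i) \in \Xi(\sigma)$ such that $s_i \in BR_i^\epsilon(\tilde\sigma(s_i))$. Since all samples from $\Xi(\sigma(t))$ have positive probability, each $s_i \in BR_i(\Psi(\Theta(\sigma(t))))$ has positive probability of being in the support of $\sigma(t+1)$; because $\alpha_3 > 0$, any $s_i \in \Theta(\sigma(t))$ also has positive probability of being in the support of $\sigma(t+1)$.

Since $N_p > \#(S_p)$, with positive probability all the strategies in $V(\Theta(\sigma(t)))$ are present in the population in period $t+1$. Therefore, $\forall \epsilon > 0$, $\exists \bar N$: $N_{\min} > \bar N \Rightarrow P_{\epsilon,N}(V(\Theta(\sigma)) \mid \sigma) > 0$, $\forall \sigma \in \Omega$. The conclusion follows by applying this last observation repeatedly and combining it with Lemma 1. $\Box$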

According to the lemma there exists an upper bound on the minimal number of positive probability transitions it takes from any initial state to reach a state which "covers" a curb set. At that point there exists a positive probability transition into a minimal curb set.

Corollary 1 If $\alpha_2, \alpha_3 > 0$, then $\forall \epsilon > 0$, $\exists \bar N, T'$ such that for $N_{\min} > \bar N$, from any initial state $\sigma \in \Omega$, the system moves into a minimal-curb-state set after no more than $T'$ iterations with positive probability.

Proof: From Lemma 4, after $T$ steps the system reaches a state whose support "includes" a minimal curb set. There is positive probability that in the next round all agents are active and draw samples from the curb set. Let $T' = T + 1$. $\Box$

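The following lemma verifies that if, for a given game, $\epsilon$ is chosen sufficiently small, then the learning dynamic cannot exit a minimal-curb-state set once it has entered it.

Lemma 5 $\exists \bar\epsilon > 0$: $\forall\, 0 < \epsilon < \bar\epsilon$, if $\sigma(t)$ is an element of a curb-state set $\Theta$, then $\operatorname{supp}(\sigma(t+k)) \subseteq \Theta$, $\forall k \ge 0$.

Proof: If $s_i \notin BR_i(\Theta)$, then there exists $\epsilon(s_i) > 0$ such that $\forall\, 0 < \epsilon < \epsilon(s_i)$, $s_i \notin BR_i^\epsilon(\Theta)$. Consider $\bar\epsilon := \min\{\epsilon(s_i) \mid i \in I,\; s_i \notin BR_i(\Theta),\; \Theta \subseteq S\}$. $\Box$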

[...] the learning process will eventually end up in one of the minimal-curb-state sets.

Proposition 1 If $\alpha_2, \alpha_3 > 0$, then $\exists \bar\epsilon > 0$ such that $\forall\, 0 < \epsilon < \bar\epsilon$, $\exists \bar N$ such that for $N_{\min} > \bar N$, and for any initial state $\sigma \in \Omega$, the learning process converges almost surely to a minimal-curb-state set.

Proof: For a given $\epsilon$ let $N_{\min}$ and $T'$ be given as in Corollary 1 such that

$$P_{\epsilon,N}(\sigma(t+T') \in \text{minimal-curb-state set} \mid \sigma(t) = \sigma) \ge \rho > 0 \quad \forall \sigma \in \Omega.$$

Then

$$P_{\epsilon,N}(\sigma(t+kT') \notin \text{minimal-curb-state set} \mid \sigma(t) = \sigma) \le (1-\rho)^k.$$

Thus the probability that the system does not converge to a minimal-curb-state set equals $\lim_{k \to \infty} (1-\rho)^k = 0$. $\Box$

Not only does the learning dynamic converge globally almost surely to one of the minimal-curb-state sets; inside such sets every state is reached from every other state via a finite length sequence of positive probability transitions. This follows from Lemma 4. Therefore we have the following corollary:

Corollary 2 If $\alpha_2, \alpha_3 > 0$, then $\exists \bar\epsilon > 0$ such that $\forall\, 0 < \epsilon < \bar\epsilon$, $\exists \bar N$ such that for $N_{\min} > \bar N$, the minimal-curb-state sets are the recurrent communication classes of the learning dynamic.

3.1 A Simple Best-Reply Rule

In some interesting classes of games one obtains convergence to minimal-curb-state sets from a simple best-reply rule. Consider a dynamic in which agents are either inactive, with probability $1 - \beta$, or move to a best reply, with probability $\beta$, $0 < \beta < 1$. One can easily check that this process converges almost surely to the unique minimal-curb-state set in the pre-play communication game with reduced normal form $\Gamma_1$. In this case there is no need to introduce sampling with replacement to generate a rich enough set of beliefs out of the current state.

It is easy to see that this observation generalizes for this class of communication games. However, it is not clear how far one can extend it beyond this class. In this subsection I will show that the observation is valid for all two-player $i \times j$-games with $i, j \le 3$. In this class of games it is sufficient that some agents move to a best reply while others don't to generate beliefs which will induce exit from any product set that does not contain a minimal curb set.

For technical reasons I will again replace best replies in the dynamic by $\epsilon$-best replies and refer to the "$\epsilon$-best-reply rule." In the proofs I will argue in terms of best replies. This suffices for generic games; the arguments for general games in terms of $\epsilon$-best replies are analogous to the ones made above and therefore omitted. There are two populations, $p = 1, 2$.

Proposition 2 Let $G$ be any $i \times j$-game with $i, j \le 3$. Then, for all $0 < \beta < 1$, there exists $\bar\epsilon$ such that for all $0 < \epsilon < \bar\epsilon$ there exists $\bar N$ such that $N_p > \bar N$ implies that under the $\epsilon$-best-reply rule the process will almost surely converge to a minimal-curb-state set.

Proof: Note that from any state there is positive probability that in one step the process moves to a state such that within each population all agents use the same strategy. If one of these strategies is a minimal curb strategy, we are done, because there is positive probability that only the other population moves and that within that population every agent moves to a best reply. Therefore continue under the assumption that we start with a state in which every member of a population uses the same strategy which is not a minimal curb strategy.

Since this initial state $(s_1, t_1)$ [...] one and the other is concentrated on two strategies; e.g., the new state is supported on $\{s_1, s_2\} \times \{t_1\}$. If $s_2$ is a curb strategy we are done because $(s_2, t_1)$ was one of the states reached with positive probability from $(s_1, t_1)$. Otherwise, $\{s_1, s_2\} \times \{t_1\}$ is not curb, which means that either $s_3$ is a best reply to $t_1$, in which case we are done because there are only three strategies and one of them has to be curb, or there exist beliefs concentrated on $\{s_1, s_2\}$ such that the column player has a best reply which is not $t_1$; without loss of generality let it be $t_2$. If $t_3$ is also a best reply to some beliefs on $\{s_1, s_2\}$ we are done because either $t_2$ or $t_3$ would have to be a curb strategy. Suppose not, i.e. let $t_2$ be the only best reply and let it not be a curb strategy. Note that from the initial state $(s_1, t_1)$ (almost) every distribution over the two strategies $s_1$ and $s_2$ has positive probability. Therefore, with positive probability we move to a state supported on $\{s_1, s_2\} \times \{t_1, t_2\}$. In particular, for (almost) every distribution over $\{t_1, t_2\}$, there is a corresponding state which can be reached with positive probability.

None of the strategies $s_1$, $s_2$, $t_1$, and $t_2$ is a curb strategy. And since there is no belief over $\{s_1, s_2\}$ to which $t_3$ is a best reply, it must be the case that $s_3$ is a best reply to some beliefs over $\{t_1, t_2\}$. Suppose we are at a positive probability state where the distribution over $\{t_1, t_2\}$ is such that $s_3$ is a best reply. At that point there is positive probability that the entire row population moves to $s_3$, which from the foregoing must be a curb strategy. $\Box$

There is still the question of whether it is possible that the dynamics may end up being confined to a subset of a minimal-curb-state set. Ideally we would want to show that the recurrent communication classes of the $\epsilon$-best-reply dynamic coincide with the minimal-curb-state sets. I will prove a somewhat weaker result.

Proposition 3 Let $G$ be any two-player $i \times j$-game with $i, j \le 3$. Then, for all $0 < \beta < 1$, there exists $\bar\epsilon$ such that for all $0 < \epsilon < \bar\epsilon$ there exists $\bar N$ such that for $N_p > \bar N$ the following holds: For every minimal curb set $Q = Q_1 \times Q_2$ of $G$, every $q \in Q_i$, and every recurrent [...]

Proof: Obvious for $1 \times j$, $j = 1, 2, 3$, minimal curb sets.

What about $2 \times 2$? Consider a minimal curb set of the form $\{s_1, s_2\} \times \{t_1, t_2\}$. Without loss of generality we can start with a pure strategy combination, say $(s_1, t_1)$. Also wlog $s_2$ is a best reply, and there exists a belief, $\sigma$, over $\{s_1, s_2\}$ to which $t_2$ is a best reply. The result for $2 \times 2$ minimal curb sets follows because we can move any fraction of the population to a best reply.

Next consider minimal curb sets of the form $\{s_1, s_2\} \times \{t_1, t_2, t_3\}$. Without loss of generality we can start the process at $(s_1, t_1)$. Then either $s_2$, or $t_2$, or $t_3$ is a best reply. Suppose first that $s_2$ is a best reply to $t_1$. Then the dynamic can generate all possible beliefs of the column player over $\{s_1, s_2\}$. Since we are dealing with a minimal curb set, there must exist beliefs $\mu_2$ and $\mu_3$ concentrated on this set such that $t_j \in BR(\mu_j)$, $j = 2, 3$.

Now suppose instead that $s_2$ is not a best reply to $t_1$. Then, if $t_2$ and $t_3$ are both best replies to $s_1$, the dynamic can generate all distributions over $\{t_1, t_2, t_3\}$ and there will be at least one distribution over these three strategies which makes $s_2$ a best reply.

It remains to consider the case where only $t_2$ is a best reply to $s_1$. In that case $s_2$ must be a best reply to $t_2$, for otherwise $\{s_1\} \times \{t_1, t_2\}$ would form a minimal curb set. This is analogous to the case where $s_2$ was a best reply to $t_1$.

Next consider minimal curb sets of the form $\{s_1, s_2, s_3\} \times \{t_1, t_2, t_3\}$. As before, wlog, we can start the dynamic out at a state corresponding to the pure strategy combination $(s_1, t_1)$. Also, wlog, $s_2$ is a best reply to $t_1$, which means that we can move to any belief concentrated on $\{s_1, s_2\}$. If $s_3$ is a best reply as well, we are done because we can move to any mixture over $\{s_1, s_2, s_3\}$ and there is at least one such mixture for each of $t_2$ and $t_3$ which makes them best replies. Therefore suppose that $s_3$ is not a best reply to $t_1$. If there are beliefs $\mu_2$ and $\mu_3$ over $\{s_1, s_2\}$ such that $t_j \in BR(\mu_j)$, $j = 2, 3$, we can generate all beliefs over $\{t_1, t_2, t_3\}$ in the following way. First move to a state supported on $\{s_1, s_2\} \times \{t_1\}$ corresponding to the belief $\mu_j$, $j = 2, 3$, with the least weight on $s_2$, say $\mu_2$. Then, simultaneously move the row population to $\mu_3$ and the desired fraction of the column population to $t_2$. In the


because we have been able to generate all possible beliefs over $\{t_1, t_2, t_3\}$.

Suppose next that only for $t_2$ there is a belief $\mu_2$ supported on $\{s_1, s_2\} \times \{t_1\}$ such that $t_2 \in BR(\mu_2)$. Then $s_3$ must be a best reply to some beliefs over $\{t_1, t_2\}$. By an argument analogous to the one just given it follows that we can generate all beliefs over $\{s_2, s_3\}$. If there is such a belief such that $t_3$ is a best reply, we are done. Otherwise, $s_1$ must be a best reply to some beliefs concentrated on $t_1, t_2$. Note also that $t_2$ must be a best reply to either $s_1$ or $s_2$. In either case we can repeat the construction from the previous paragraph to generate all beliefs over $\{s_1, s_2, s_3\}$, which concludes the argument. $\Box$

4 Selection

In the previous section I considered learning without experimentation, mistakes or mutations. I showed that from any initial condition the dynamic converges almost surely to one of the minimal-curb-state sets. The unperturbed dynamic does not select among curb sets. Work by Young [1993], Kandori, Mailath and Rob [1993], Ellison [1993] shows that similar dynamics select among strict Nash equilibria, provided they are augmented to allow for mistakes. Samuelson [1993], Noldeke and Samuelson [1993] [1994] investigate selection among nonsingleton recurring communication classes. Hurkens [1994] shows that an intuitive class of dynamics which converges globally to curb sets does not select among them once mistakes are added. I will show in this section that adding mistakes to the population learning dynamics leads to selection much like in the works

cited above. In this section I will assume that $\lambda_1, \varepsilon > 0$.

The key idea is that mistakes must remain transient; it must not be possible for a small number of mistakes to propagate through the system and to induce large effects. This property, transience of mistakes, is shared by the dynamics of Kandori, Mailath and Rob, Noldeke and Samuelson, etc.

The main result of this section relies on a property of Markov chains with stationary transition probabilities which was established by Young [1993]; Freidlin and Wentzell


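ability $P^0$. Assume that with high probability the process follows $P^0$, but with some probability agents make mistakes. Let the corresponding noisy transition probability be denoted by $P^\varepsilon$, where $\varepsilon$ is a parameter measuring the overall level of noise in the system. Assume that $P^\varepsilon$ satisfies the following three properties:

1. $P^\varepsilon$ is aperiodic and irreducible for all $\varepsilon \in (0, \bar\varepsilon]$;

2. $\lim_{\varepsilon \to 0} P^\varepsilon(\theta' \mid \theta) = P^0(\theta' \mid \theta)$, $\forall \theta, \theta' \in \Theta$; and

3. $P^\varepsilon(\theta' \mid \theta) > 0$ for some $\varepsilon$ implies $\exists r \ge 0$: $0 < \lim_{\varepsilon \to 0} \varepsilon^{-r} P^\varepsilon(\theta' \mid \theta) < \infty$.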

It is well known that the first property implies that $P^\varepsilon$ has a unique stationary distribution, and that this stationary distribution describes the long-run behavior of the dynamic irrespective of initial conditions. For any stationary distribution $\mu^0$ of $P^0$ let $\mu^0_\theta$ denote the probability assigned to the state $\theta$ by $\mu^0$.
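As an illustration of how the stationary distribution of a perturbed chain behaves as the noise vanishes, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical two-state chain in which leaving state 0 takes two simultaneous mistakes and leaving state 1 takes one, so the exit probabilities are `eps**2` and `eps`. The function names and the exponents 2 and 1 are assumptions made for the example.

```python
def perturbed_chain(eps, r01=2, r10=1):
    """Two-state transition matrix of a hypothetical noisy process.

    Leaving state 0 has probability eps**r01 and leaving state 1 has
    probability eps**r10, so r01 and r10 play the role of resistances.
    """
    p01, p10 = eps ** r01, eps ** r10
    return [[1 - p01, p01],
            [p10, 1 - p10]]


def stationary(P):
    """Unique stationary distribution of an irreducible two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    total = p01 + p10
    return [p10 / total, p01 / total]


# As eps shrinks, the stationary mass concentrates on state 0,
# the state that needs more mistakes to leave.
for eps in (0.1, 0.01, 0.001):
    print(eps, stationary(perturbed_chain(eps)))
```

In this toy chain the weight on state 0 equals 1/(1 + eps), so it tends to one as the noise level goes to zero, while for every positive noise level both states keep positive weight.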

If the transition from $\theta$ to $\theta'$ is not impossible under $P^\varepsilon$, $r(\theta, \theta') = r$ is called the resistance of the transition from $\theta$ to $\theta'$. Let $\Theta_1, \Theta_2, \ldots, \Theta_J$ denote the recurrent communication classes of $P^0$. For all $i, j$, $i \ne j$, let $r_{ij}$ be the least resistance among all directed paths beginning in $\Theta_i$ and ending in $\Theta_j$. Define a graph $\mathcal{G}$ with vertices indexed by $\{1, 2, \ldots, J\}$ and for each $i,j$-pair a directed edge $(i, j)$ with weight $r_{ij}$. A $j$-tree in $\mathcal{G}$ is a spanning subtree of $\mathcal{G}$, i.e., for every vertex $i \ne j$ there exists exactly one directed path from $i$ to $j$. The total resistance of a $j$-tree is the sum of the resistances of the directed edges in that tree. The least total resistance among all $j$-trees, denoted $\gamma_j$, is the stochastic potential of the recurrent communication class $\Theta_j$. Young proves the following proposition.

Let $\mu^\varepsilon$ be the unique stationary distribution of $P^\varepsilon$, for any $\varepsilon$. Then,

1. as $\varepsilon \to 0$, $\mu^\varepsilon$ converges to a stationary distribution $\mu^0$ of $P^0$, and

2. $\theta$ is stochastically stable ($\mu^0_\theta > 0$) if and only if $\theta$ is an element of the recurrent communication class with minimum stochastic potential.
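The tree construction behind the stochastic potential can be made concrete with a small brute-force sketch. It is only an illustration under stated assumptions: the function names and the 3-by-3 resistance matrix are hypothetical and not taken from the paper, and the enumeration is practical only for a handful of recurrent classes.

```python
from itertools import product


def reaches(i, j, succ):
    """True if following the chosen outgoing edges from i leads to j without cycling."""
    seen = set()
    while i != j:
        if i in seen:
            return False
        seen.add(i)
        i = succ[i]
    return True


def stochastic_potential(r, j):
    """Least total resistance over all j-trees, where r[i][k] is the weight of edge (i, k)."""
    n = len(r)
    others = [i for i in range(n) if i != j]
    best = None
    # A j-tree gives every vertex i != j exactly one outgoing edge (i, succ[i]),
    # chosen so that every vertex has a directed path to j.
    for choice in product(range(n), repeat=len(others)):
        succ = dict(zip(others, choice))
        if not all(reaches(i, j, succ) for i in others):
            continue
        total = sum(r[i][k] for i, k in succ.items())
        if best is None or total < best:
            best = total
    return best


# Hypothetical least resistances between three recurrent classes.
r = [[0, 3, 4],
     [1, 0, 2],
     [5, 1, 0]]
potentials = [stochastic_potential(r, j) for j in range(3)]
print(potentials)  # class 0 has the smallest potential in this example
```

With only two recurrent classes the enumeration is trivial: the potential of one class is simply the least resistance of a path from the other class into it.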

Note that the learning process we examine in this paper is aperiodic and irreducible,


perturbed process has only one recurrent communication class. Young's proposition and Corollary 2 imply:

Proposition 4 If $\lambda_1, \lambda_2, \lambda_3, \varepsilon > 0$, then $\exists \bar\eta > 0$ such that $\forall\, 0 < \eta < \bar\eta$, $\exists \bar N$ such that for $N_{\min} > \bar N$, $\theta$ is stochastically stable if and only if it belongs to a minimal-curb-state set with minimal stochastic potential.

Also, a moment's reflection shows that for any two states $\theta$ and $\theta'$, the resistance $r(\theta, \theta')$, if it is finite, is equal to the minimum number of mistakes needed to move from $\theta$ to $\theta'$. Recall that for the dynamic proposed here the role of mistakes is to activate certain best replies of players. The mistakes themselves do not move the system. If there are sufficiently many mistakes, a strategy may become a best reply that wasn't before; more mistakes, of the right kind, will achieve the same effect.

I will demonstrate selection among minimal curb sets for a class of two-player games. This is the class of games with two minimal curb sets, $Q^1$ and $Q^2$, such that each strategy of each player is in the projection of at least one minimal curb set. In that case I will say that the two minimal curb sets are exhaustive.

From Corollary 2 and Proposition 4, we know that if $\lambda_1$, $\lambda_2$, and $\lambda_3$ are all positive, then there are two recurrent communication classes of the unperturbed process, $\Theta_1$ and $\Theta_2$, corresponding to the two minimal curb sets, and the limit stationary distribution will assign positive weight only to the states in the minimal-curb-state set with minimum stochastic potential.

In order to find out which $\Theta_j$ is selected, we need to calculate the paths of least resistance from $\Theta_1$ to $\Theta_2$ and vice versa. We can move from $\Theta_1$ to $\Theta_2$ whenever sufficiently many type-1 (or type-2) players make a mistake and use a strategy in $Q^2$. Sufficiently many mistakes of the right kind eventually turn actions in $Q^2$ into best replies. Once the current state is supported on strategies from both curb sets, the unperturbed component of the process takes over and moves the state into either one of the two


2

belongs to the basins of attraction of both minimal-curb-state sets, is a consequence of sampling (not necessarily with replacement). The least number of mistakes needed to transit from $\Theta_1$ to $\Theta_2$ can be expressed in terms of the players' beliefs. Let $\mu_p$ be player $p$'s belief and $\mu_p(Q^j_{-p})$ the probability which player $p$'s beliefs assign to $Q^j_{-p}$. Define

$$\rho^j_p := \min\{\mu_p(Q^j_{-p}) \mid BR(\mu_p) \cap Q^j_p \ne \emptyset\}.$$

This is the least probability player $p$ can attach to the set of strategies $Q^j_{-p}$ and still have a best reply in $Q^j_p$. For simplicity assume that population sizes are the same and equal to $N$, and for any real number $x$ let $[x]$ be the smallest integer greater than or equal to $x$. The least number of mistakes needed to transit from $\Theta_i$ to $\Theta_j$ is then equal to

$$\min\{[\rho^j_1 N], [\rho^j_2 N]\}.$$

This observation uses the fact that any state $\theta$ with positive support on strategies from both curb sets belongs to the basins of attraction of both curb sets. Define

$$\rho^j := \min\{\rho^j_1, \rho^j_2\}.$$

With these preliminaries we have the following result:

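Proposition 5 If $G$ is a two-player game, with two exhaustive minimal curb sets $Q^1$ and $Q^2$, if $\lambda_1, \lambda_2, \lambda_3, \varepsilon > 0$, then $\exists \bar\eta > 0$ such that $\forall\, 0 < \eta < \bar\eta$, $\exists \bar N$ such that for $N, N_{\min} > \bar N$, $\theta$ is stochastically stable if and only if $\theta \in \Theta_j$ and $\rho^j = \min\{\rho^1, \rho^2\}$.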

In the case where $G$ is a symmetric game and the two curb sets are strict equilibria, the selection criterion in the theorem reduces to the familiar risk dominance criterion of Harsanyi and Selten [1988].
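The reduction to risk dominance can be checked by a short calculation. The following worked example is an illustration rather than part of the original argument; it assumes a hypothetical symmetric $2 \times 2$ coordination game with strategies $A$ and $B$ and payoffs $u(A,A) = a$, $u(A,B) = b$, $u(B,A) = c$, $u(B,B) = d$, where $a > c$ and $d > b$, and it writes $p$ for the belief that the opponent plays $A$.

```latex
\begin{align*}
A \in BR(p)
  &\iff p\,a + (1-p)\,b \;\ge\; p\,c + (1-p)\,d \\
  &\iff p \;\ge\; \frac{d-b}{(a-c)+(d-b)} , \\[4pt]
B \in BR(p)
  &\iff 1-p \;\ge\; \frac{a-c}{(a-c)+(d-b)} .
\end{align*}
% Least belief mass needed on each equilibrium strategy:
%   on A:  (d-b) / ((a-c)+(d-b)),   on B:  (a-c) / ((a-c)+(d-b)).
% The threshold for A is the smaller one exactly when
\[
  d - b \;<\; a - c ,
\]
% which is the condition for (A,A) to risk-dominate (B,B).
```

Because the game is symmetric, both players face the same thresholds, so the equilibrium that requires the smaller belief mass, and hence fewer mistakes to reach, is the risk-dominant one.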


5,1   1,5   0,0

1,5   5,1   0,0

0,0   0,0   2,2

In this example the upper-left hand curb set will be selected by the population learning dynamic. After all, both players can guarantee themselves a payoff of 3 against any beliefs concentrated on this set, which shows that more mistakes are needed to upset a state supported on this curb set than a state where all agents use their third strategy. In this example, a specification of the underlying deterministic process in Kandori, Mailath and Rob's (KMR) [1993] dynamic such that adjustment speeds are equal in both dimensions in regions where the basins of attraction of the equilibria overlap would yield the same selection. This shows once more that sampling with replacement is not a necessary condition for selection among curb sets which are not strict equilibria.
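The comparison of mistake counts in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it searches a rational grid of beliefs for the least probability mass a player must place on the opponent's part of a curb set before one of his own strategies in that set becomes a best reply. The function names, the zero-based strategy indices, the grid denominator 35 (chosen so that the minimizers of this particular game lie on the grid), and the population size 100 are all hypothetical; on a coarser or different grid the search would only give an upper bound.

```python
from fractions import Fraction
from itertools import product
from math import ceil

# Payoffs indexed [own strategy][opponent strategy] for the 3x3 example.
ROW_PAYOFFS = [[5, 1, 0],
               [1, 5, 0],
               [0, 0, 2]]
COL_PAYOFFS = [[1, 5, 0],
               [5, 1, 0],
               [0, 0, 2]]


def beliefs(n, grid):
    """Yield every belief over n strategies whose probabilities are multiples of 1/grid."""
    for ks in product(range(grid + 1), repeat=n - 1):
        if sum(ks) <= grid:
            last = grid - sum(ks)
            yield [Fraction(k, grid) for k in ks] + [Fraction(last, grid)]


def min_weight(payoffs, own_target, opp_target, grid=35):
    """Least grid belief mass on opp_target that makes some strategy in own_target a best reply."""
    best = None
    for mu in beliefs(len(payoffs[0]), grid):
        values = [sum(u * m for u, m in zip(row, mu)) for row in payoffs]
        top = max(values)
        if any(values[s] == top for s in own_target):
            mass = sum(mu[t] for t in opp_target)
            if best is None or mass < best:
                best = mass
    return best


upper_left = {0, 1}   # first two strategies of each player
third = {2}           # third strategy of each player
N = 100               # hypothetical population size

into_upper_left = min(min_weight(ROW_PAYOFFS, upper_left, upper_left),
                      min_weight(COL_PAYOFFS, upper_left, upper_left))
into_third = min(min_weight(ROW_PAYOFFS, third, third),
                 min_weight(COL_PAYOFFS, third, third))
print(into_upper_left, ceil(into_upper_left * N))
print(into_third, ceil(into_third * N))
```

On this grid the threshold for entering the upper-left set is 2/7 and the threshold for entering the third-strategy set is 3/5, so far fewer mistakes are needed to reach the upper-left curb set than to leave it, in line with the selection described above.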

Note that the selection we obtain here does not depend on the values of $\lambda_1$, $\lambda_2$ and $\lambda_3$, as long as they are all positive. Furthermore, these values can be different across populations, or even within a population. Thus there is a wide range of mistake-free dynamics which yield the same selection. This is a result of the fact that the basins of attraction of different curb sets overlap. In the above example and in $2 \times 2$-coordination games it is the case that from any state not supported entirely on one of the two curb sets either of the two curb sets can be reached without mistakes. This phenomenon is already noted in KMR's paper. They also point out that there are dynamics in the two-population scenario which satisfy their Darwinian condition, that only the best strategy in a population grows, yet select the (2,2)-equilibrium. This would be the case, for example, if in a region where the basins of attraction overlap the speed of adjustment toward one equilibrium is much faster than toward the other. Thus in the framework presented here selection of and among curb sets is obtained under a large set of


That a dynamic satisfies the Darwinian condition is no guarantee that curb sets will be selected, as the next example shows.$^5$

x,x   2,2   2,2   2,2

2,2   5,0   0,5   0,0

2,2   0,0   5,0   0,5

2,2   0,5   0,0   5,0

Let $x > 2$ and close to 2. The game has a unique equilibrium, with payoff vector $(x, x)$. The unique minimal curb set coincides with this equilibrium.

Consider a dynamic in which agents from two populations are randomly matched to play this game. As they play, they make mistakes with probability $\varepsilon$. When they make a mistake, they put strictly positive probability on each of their strategies. After each round of play they learn the true distribution of play in the last period and move to a best reply against this distribution. In the framework of this paper, this corresponds to the case of $\lambda_1 = 1$. It is easily checked that for $\varepsilon = 0$ this Markov process has three recurrent communication classes, one corresponding to the equilibrium, one to a cycle in which agents use only their last three strategies, and one to a cycle in which the payoff vector is always (2,2).

References
