Tutorial on Computational Game Theory
NIPS 2002
Michael Kearns
Computer and Information Science, University of Pennsylvania
mkearns@cis.upenn.edu
For an updated and expanded version of these slides, visit http://www.cis.upenn.edu/~mkearns/nips02tutorial


Thanks To: Avrim Blum, Dean Foster, Sham Kakade, Jon Kleinberg, Daphne Koller, John Langford, Michael Littman, Yishay Mansour, Andrew Ng, Luis Ortiz, David Parkes, Lawrence Saul, Rob Schapire, Yoav Shoham, Satinder Singh, Moshe Tennenholtz, Manfred Warmuth


Road Map (1)
- Examples of Strategic Conflict as Matrix Games
- Basic Definitions of (Matrix) Game Theory
- Notions of Equilibrium: Overview
- Definition and Existence of Nash Equilibria
- Computing Nash Equilibria for Matrix Games
- Graphical Models for Multiplayer Game Theory
- Computing Nash Equilibria in Graphical Games


Road Map (2)
- Other Equilibrium Concepts:
  - Correlated Equilibria
  - Correlated Equilibria and Graphical Games
  - Evolutionary Stable Strategies
  - Nash's Bargaining Problem, Cooperative Equilibria
- Learning in Repeated Games
  - Classical Approaches; Regret Minimizing Algorithms
- Games with State
  - Connections to Reinforcement Learning
- Other Directions and Conclusions


Example: Prisoner's Dilemma
- Two suspects in a crime are interrogated in separate rooms
- Each has two choices: confess or deny
- With no confessions, enough evidence to convict on lesser charge; one confession enough to establish guilt
- Police offer plea bargains for confessing
- Encode strategic conflict as a payoff matrix (row player's payoff first):

    payoffs    confess    deny
    confess    -3, -3     0, -4
    deny       -4, 0      -1, -1

- What should happen?


Example: Hawks and Doves
- Two players compete for a valuable resource
- Each has a confrontational strategy ("hawk") and a conciliatory strategy ("dove")
- Value of resource is V; cost of losing a confrontation is C
- Suppose C > V (think nuclear first strike)
- Encode strategic conflict as a payoff matrix (row player's payoff first):

    payoffs   hawk                dove
    hawk      (V-C)/2, (V-C)/2    V, 0
    dove      0, V                V/2, V/2

- What should happen?


A (Weak) Metaphor
- Actions of the players can be viewed as (binary) variables
- Under any reasonable notion of "rationality", the payoff matrix imposes constraints on the joint behavior of these two variables
- Instead of being probabilistic, these constraints are strategic
- Instead of computing conditional distributions given the other actions, players optimize their payoff
- Players are selfish and play their best response


Basics of Game Theory
- Set of players $i = 1, \dots, n$ (assume $n = 2$ for now)
- Each player has a set of m basic actions or pure strategies (such as "hawk" or "dove")
- Notation: $a_i$ will denote the pure strategy chosen by player i
- Joint action: $\vec{a}$
- Payoff to player i given by matrix or table $M_i(\vec{a})$
- Goal of players: maximize their own payoff


Notions of Equilibria: Overview (1)
- An equilibrium among the players is a strategic standoff
- No player can improve on their current strategy
- But under what model of communication, coordination, collusion among the players?
- All standard equilibrium notions are descriptive rather than prescriptive


Notions of Equilibria: Overview (2)
- No communication or bargaining: Nash Equilibria
- Communication via correlation or shared randomness: Correlated Equilibria
- Full communication and coalitions: (Assorted) Cooperative Equilibria
- Equilibrium under evolutionary dynamics: Evolutionary Stable Strategy
- We'll begin with Nash Equilibria


Mixed Strategies
- Need to introduce mixed strategies
- Each player i has an independent distribution $p_i$ over their pure strategies ($p_i \in [0,1]$ in the 2-action case)
- Use $\vec{p} = (p_1, \dots, p_n)$ to denote the product distribution induced over the joint action $\vec{a}$
- Use $\vec{a} \sim \vec{p}$ to indicate $\vec{a}$ distributed according to $\vec{p}$
- Expected return to player i: $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})]$
- (What about more general distributions over $\vec{a}$?)
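To make the expected return under a product distribution concrete, here is a minimal Python sketch. The function name `expected_return`, the dictionary encoding of the payoff table, and the Hawks and Doves values V = 2, C = 4 are illustrative assumptions, not part of the slides.

```python
from itertools import product

def expected_return(payoff, strategies):
    """Expected payoff E_{a ~ p}[M_i(a)] under a product distribution.

    payoff: dict mapping joint-action tuples to player i's payoff M_i(a).
    strategies: one dict per player mapping pure actions to probabilities.
    """
    total = 0.0
    for joint in product(*(s.keys() for s in strategies)):
        prob = 1.0
        for s, action in zip(strategies, joint):
            prob *= s[action]  # players randomize independently
        total += prob * payoff[joint]
    return total

# Hypothetical example: row player's payoffs in Hawks and Doves, V = 2, C = 4.
V, C = 2, 4
M1 = {("hawk", "hawk"): (V - C) / 2, ("hawk", "dove"): V,
      ("dove", "hawk"): 0, ("dove", "dove"): V / 2}
p1 = {"hawk": 0.5, "dove": 0.5}
p2 = {"hawk": 0.5, "dove": 0.5}
value = expected_return(M1, [p1, p2])
```

The loop visits all $m^n$ joint actions, which is fine for small games and is exactly the blow-up that motivates compact representations for many players.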


Nash Equilibria
- A product distribution $\vec{p}$ such that no player has a unilateral incentive to deviate
- All players know all payoff matrices
- Informal: no communication, deals or collusion allowed; everyone for themselves
- Let $\vec{p}[i : p'_i]$ denote $\vec{p}$ with $p_i$ replaced by $p'_i$
- Formally: $\vec{p}$ is a Nash equilibrium (NE) if for every player i and every mixed strategy $p'_i$,
  $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \ge E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})]$
- Nash 1951: NE always exist in mixed strategies
- Players can announce their strategies
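The NE condition can be checked directly for a 2-player game. The sketch below assumes payoff matrices stored as nested lists (`M1[j][k]` is the row player's payoff, `M2[j][k]` the column player's) and uses Rock-Paper-Scissors as a hypothetical test game; the helper names are illustrative. Because expected payoff is linear in a player's own mixed strategy, it is enough to test deviations to pure strategies.

```python
def payoff(M, p1, p2):
    """Expected payoff sum_{j,k} p1[j] * M[j][k] * p2[k]."""
    return sum(p1[j] * M[j][k] * p2[k]
               for j in range(len(p1)) for k in range(len(p2)))

def pure(n, j):
    """Mixed strategy that puts all weight on pure strategy j."""
    return [1.0 if t == j else 0.0 for t in range(n)]

def is_nash(M1, M2, p1, p2, tol=1e-9):
    """True if neither player gains more than tol by deviating unilaterally."""
    v1, v2 = payoff(M1, p1, p2), payoff(M2, p1, p2)
    row_ok = all(payoff(M1, pure(len(p1), j), p2) <= v1 + tol
                 for j in range(len(p1)))
    col_ok = all(payoff(M2, p1, pure(len(p2), k)) <= v2 + tol
                 for k in range(len(p2)))
    return row_ok and col_ok

# Rock-Paper-Scissors, actions ordered (rock, paper, scissors); zero-sum.
M1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
M2 = [[-x for x in row] for row in M1]
uniform = [1 / 3, 1 / 3, 1 / 3]
```

Here `is_nash(M1, M2, uniform, uniform)` holds, while any pair of pure strategies fails because one player can always switch to a winning action.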


Approximate Nash Equilibria
- A set of mixed strategies $(p_1, \dots, p_n)$ such that no player has "too much" unilateral incentive to deviate
- Formally: $\vec{p}$ is an $\epsilon$-Nash equilibrium ($\epsilon$-NE) if for every player i and every mixed strategy $p'_i$,
  $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \ge E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})] - \epsilon$
- Motivation: inertia, cost of change, ...
- Computational advantages
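One way to read the definition is that every strategy profile is an $\epsilon$-NE for some $\epsilon$, namely the largest gain any player can obtain by deviating. A minimal sketch for 2-player games follows; the name `nash_gap` and the matching-pennies example are assumptions made for illustration.

```python
def nash_gap(M1, M2, p1, p2):
    """Smallest eps for which (p1, p2) is an eps-Nash equilibrium."""
    rows, cols = range(len(p1)), range(len(p2))
    # Current expected payoffs of the two players.
    v1 = sum(p1[j] * M1[j][k] * p2[k] for j in rows for k in cols)
    v2 = sum(p1[j] * M2[j][k] * p2[k] for j in rows for k in cols)
    # Best pure-strategy deviation payoff for each player.
    best1 = max(sum(M1[j][k] * p2[k] for k in cols) for j in rows)
    best2 = max(sum(p1[j] * M2[j][k] for j in rows) for k in cols)
    return max(best1 - v1, best2 - v2)

# Matching pennies (hypothetical example): row wants to match, column to differ.
M1 = [[1, -1], [-1, 1]]
M2 = [[-1, 1], [1, -1]]
gap = nash_gap(M1, M2, [0.6, 0.4], [0.5, 0.5])
```

With the row player slightly off the uniform strategy, the column player can gain 0.2 by deviating, so this profile is a 0.2-NE but not an exact NE.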


NE for Prisoner's Dilemma
- Recall payoff matrix:

    payoffs    confess    deny
    confess    -3, -3     0, -4
    deny       -4, 0      -1, -1

- One (pure) NE: (confess, confess)
- Failure to cooperate despite benefits
- Source of great and enduring angst in game theory
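The pure NE can be found by brute-force enumeration of joint actions. This sketch assumes the payoffs are negative (for example, years in prison), which is the sign convention under which (confess, confess) is the equilibrium; the function and constant names are illustrative.

```python
from itertools import product

ACTIONS = ("confess", "deny")
# (row payoff, column payoff); negative values are an assumed sign convention.
PAYOFFS = {
    ("confess", "confess"): (-3, -3),
    ("confess", "deny"): (0, -4),
    ("deny", "confess"): (-4, 0),
    ("deny", "deny"): (-1, -1),
}

def pure_nash_equilibria(payoffs, actions):
    """All joint pure actions from which no player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        row_ok = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        col_ok = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if row_ok and col_ok:
            equilibria.append((a1, a2))
    return equilibria

equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
```

Confessing is a dominant strategy for both players here, so mutual denial is not stable even though both would prefer it.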


NE for Hawks and Doves
- Recall payoff matrix (V < C):

    payoffs   hawk                dove
    hawk      (V-C)/2, (V-C)/2    V, 0
    dove      0, V                V/2, V/2

- Three NE:
  - pure: (hawk, dove)
  - pure: (dove, hawk)
  - mixed: (Pr[hawk] = V/C, Pr[hawk] = V/C)
- Rock-Paper-Scissors: only mixed NE
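The mixed equilibrium follows from an indifference argument: when the opponent plays hawk with probability V/C, hawk and dove earn the same expected payoff, so any mixture is a best response. The sketch below checks this with exact arithmetic for hypothetical values V = 2, C = 5; the helper names are illustrative.

```python
from fractions import Fraction

def hawk_dove_payoffs(V, C):
    """Row player's payoff matrix, actions ordered (hawk, dove)."""
    half = Fraction(1, 2)
    return [[(V - C) * half, Fraction(V)], [Fraction(0), V * half]]

def action_values(M, q_hawk):
    """Expected payoff of hawk and of dove when the opponent plays hawk w.p. q_hawk."""
    q = [q_hawk, 1 - q_hawk]
    return [sum(M[j][k] * q[k] for k in range(2)) for j in range(2)]

V, C = 2, 5  # hypothetical values with V < C
M = hawk_dove_payoffs(V, C)
hawk_value, dove_value = action_values(M, Fraction(V, C))
```

The same function also confirms the pure equilibria: against a sure dove, hawk is the better reply, and against a sure hawk, dove is.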


NE Existence Intuition
- Suppose that $\vec{p}$ is not a NE
- For some player i, there must be some pure strategy giving higher return against $\vec{p}$ than $p_i$
- For each such player, shift some of the weight of $p_i$ to this pure strategy
- Leave all other $p_j$ alone
- Formalize as continuous mapping $\vec{p} \to F(\vec{p})$
- Brouwer Fixed Point Theorem: a continuous mapping F of a compact convex set into itself must possess $\vec{p}^*$ such that $F(\vec{p}^*) = \vec{p}^*$
- One-dimensional case easy, high-dimensional difficult
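One concrete choice of the weight-shifting map is the one from Nash's proof, which moves weight toward every pure strategy that currently does better than the player's mix and then renormalizes. Treating this as the map F the slide has in mind is an assumption; the sketch below is for 2-player games with payoffs as nested lists, and the names are illustrative.

```python
from fractions import Fraction

def payoff(M, p1, p2):
    """Expected payoff sum_{j,k} p1[j] * M[j][k] * p2[k]."""
    return sum(p1[j] * M[j][k] * p2[k]
               for j in range(len(p1)) for k in range(len(p2)))

def nash_map(M1, M2, p1, p2):
    """One application of the improvement map F to the profile (p1, p2)."""
    v1, v2 = payoff(M1, p1, p2), payoff(M2, p1, p2)
    # Positive part of the gain from switching to each pure strategy.
    gain1 = [max(0, sum(M1[j][k] * p2[k] for k in range(len(p2))) - v1)
             for j in range(len(p1))]
    gain2 = [max(0, sum(p1[j] * M2[j][k] for j in range(len(p1))) - v2)
             for k in range(len(p2))]
    # Shift weight toward improving strategies and renormalize.
    new1 = [(p1[j] + gain1[j]) / (1 + sum(gain1)) for j in range(len(p1))]
    new2 = [(p2[k] + gain2[k]) / (1 + sum(gain2)) for k in range(len(p2))]
    return new1, new2

# Matching pennies (hypothetical example).
M1 = [[1, -1], [-1, 1]]
M2 = [[-1, 1], [1, -1]]
half = Fraction(1, 2)
fixed = nash_map(M1, M2, [half, half], [half, half])
```

At the uniform profile all gains are zero, so it is a fixed point of the map; at a non-equilibrium profile some weight moves. Iterating the map is not guaranteed to converge, since the fixed-point theorem only asserts existence.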


Some NE Facts
- Existence not guaranteed in pure strategies
- May be multiple NE
- In multiplayer case, may be exponentially many NE
- Suppose $(p_1, p_2)$ and $(p'_1, p'_2)$ are two NE
  - Zero-sum: $(p_1, p'_2)$ and $(p'_1, p_2)$ also NE, and give players same payoffs (games have a unique value)
  - General sum: $(p_1, p'_2)$ may not be a NE; different NE may give different payoffs
- Which will be chosen?
  - dynamics, additional criteria, structure of interaction?


Computing NE
- Inputs:
  - Payoff matrices $M_i$
  - Note: each has $m^n$ entries (n players, m actions each)
- Output:
  - Any NE?
  - All NE? (output size)
  - Some particular NE?


Complexity Status of Computing a NE
- Zero-sum, 2-player case (input size $m^2$):
  - Linear Programming
  - Polynomial time solution
- General-sum case, 2 players (input size $m^2$):
  - Closely related to Linear Complementarity Problems
  - Can be solved with the Lemke-Howson algorithm
  - Exponential worst-case running time
  - Probably not in P, but probably not NP-complete?


Complexity Status of Computing a NE (2)
- Maximizing sum of rewards NP-complete for 2 players
- General-sum case, multiplayer (input size $m^n$):
  - Simplicial subdivision methods (Scarf's algorithm)
  - Exponential worst-case running time
  - Not clear small action spaces ($m = 2$) help
- Missing: compact models of large player and action spaces


2-Player, Zero-Sum Case: LP Formulation
- Assume 2 players, $M = M_1 = -M_2$
- Let $p_1 = (p_1^1, \dots, p_1^m)$ and $p_2$ be mixed strategies
- Minimax theorem says: $\max_{p_1} \min_{p_2} \{p_1 M p_2\} = \min_{p_2} \max_{p_1} \{p_1 M p_2\}$
- Solved by standard LP methods
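The left-hand side of the minimax identity can be written as a linear program for player 1. The following is a sketch of the standard formulation, assuming $M_{jk}$ denotes the payoff to player 1 when player 1 plays pure strategy j and player 2 plays pure strategy k; the auxiliary variable $v$ (the payoff player 1 can guarantee) is introduced here and does not appear on the slide.

```latex
\begin{aligned}
\max_{v,\; p_1} \quad & v \\
\text{subject to} \quad
  & \sum_{j=1}^{m} p_1^j \, M_{jk} \;\ge\; v && \text{for } k = 1, \dots, m, \\
  & \sum_{j=1}^{m} p_1^j = 1, \\
  & p_1^j \ge 0 && \text{for } j = 1, \dots, m.
\end{aligned}
```

The inner minimum over $p_2$ is always attained at a pure strategy, which is why one constraint per column k suffices. At the optimum, $v$ is the value of the game, and the dual program yields player 2's strategy.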


General Sum Case: A Sampling Folk Theorem
- Suppose $(p_1, p_2)$ is a NE
- Idea: let $\hat{p}_i$ be an empirical distribution obtained by sampling $p_i$
- If we sample enough, $\hat{p}_i$ and $p_i$ will get nearly identical returns against any opponent strategy (uniform convergence)
- Thus, $(\hat{p}_1, \hat{p}_2)$ will be an $\epsilon$-NE
- From Chernoff bounds, on the order of $(1/\epsilon^2) \log(m)$ samples suffice
- Yields an $m^{(1/\epsilon^2) \log(m)}$ algorithm for approximate NE
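The sampling idea can be tried out numerically: draw k actions from each player's equilibrium strategy, form the empirical distributions, and measure how far the resulting pair is from equilibrium. The sketch below uses Rock-Paper-Scissors, a fixed random seed, and k = 2000 as hypothetical choices; `empirical` and `nash_gap` are illustrative names.

```python
import random

def empirical(p, k, rng):
    """Empirical distribution of k independent draws from mixed strategy p."""
    counts = [0] * len(p)
    for action in rng.choices(range(len(p)), weights=p, k=k):
        counts[action] += 1
    return [c / k for c in counts]

def nash_gap(M1, M2, p1, p2):
    """Smallest eps for which (p1, p2) is an eps-Nash equilibrium."""
    rows, cols = range(len(p1)), range(len(p2))
    v1 = sum(p1[j] * M1[j][k] * p2[k] for j in rows for k in cols)
    v2 = sum(p1[j] * M2[j][k] * p2[k] for j in rows for k in cols)
    best1 = max(sum(M1[j][k] * p2[k] for k in cols) for j in rows)
    best2 = max(sum(p1[j] * M2[j][k] for j in rows) for k in cols)
    return max(best1 - v1, best2 - v2)

rng = random.Random(0)  # fixed seed so the run is repeatable
M1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # Rock-Paper-Scissors
M2 = [[-x for x in row] for row in M1]
p = [1 / 3, 1 / 3, 1 / 3]                  # the exact NE strategy
q1, q2 = empirical(p, 2000, rng), empirical(p, 2000, rng)
eps = nash_gap(M1, M2, q1, q2)
```

The algorithmic consequence is that an approximate NE exists among strategies that are uniform over a multiset of k pure actions, so searching over all such multisets for each player gives the quasi-polynomial running time stated above.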


Compact Models for Multiplayer Games
- Even in 2-player games, computational barriers appear
- Multiplayer games make things even worse
- Maybe we need better representations
- See accompanying PowerPoint presentation.


Correlated Equilibria
- NE $\vec{p}$ is a product distribution over the joint action $\vec{a}$
- Suffices to guarantee existence of NE
- Now let P be an arbitrary joint distribution over $\vec{a}$
- Informal intuition: assuming all others play "their part" of P, i has no unilateral incentive to deviate from P
- Let $\vec{a}_{-i}$ denote all actions except $a_i$
- Say that P is a Correlated Equilibrium (CE) if for any player i, and any actions $a, a'$ for i:
  $\sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a) \, M_i(\vec{a}_{-i}, a) \ge \sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a) \, M_i(\vec{a}_{-i}, a')$


Advantages of CE
- Conceptual: some CE payoff vectors are not achievable by NE
- Everyday example: traffic signal
- CE allows "cooperation" via shared randomization
- Any mixture of NE is a CE, but there are other CE as well
- Computational: note that
  $\sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)} M_i(\vec{a}_{-i}, a) \ge \sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)} M_i(\vec{a}_{-i}, a')$
  is linear in the variables $P(\vec{a}_{-i}, a_i = a) = P(\vec{a})$ once both sides are multiplied by $P(a_i = a)$
- Thus have just a linear feasibility problem
- 2-player case: compute CE in polynomial time
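The linear form of the CE constraints is easy to check for a given joint distribution. The sketch below does this for a 2-player game, with `P[j][k]` the probability of joint action (j, k); the traffic-intersection payoffs (a crash costs 10, passing through earns 1) are hypothetical numbers chosen for illustration.

```python
def is_correlated_equilibrium(M1, M2, P, tol=1e-9):
    """Check the linear CE constraints for a 2-player game.

    For each player, each recommended action a and each alternative b,
    following the recommendation must be at least as good as switching to b,
    weighted by the joint probabilities P[j][k].
    """
    n1, n2 = len(P), len(P[0])
    for a in range(n1):
        for b in range(n1):
            if sum(P[a][k] * (M1[a][k] - M1[b][k]) for k in range(n2)) < -tol:
                return False
    for a in range(n2):
        for b in range(n2):
            if sum(P[j][a] * (M2[j][a] - M2[j][b]) for j in range(n1)) < -tol:
                return False
    return True

# Hypothetical traffic game, actions ordered (go, stop).
M1 = [[-10, 1], [0, 0]]   # row driver's payoffs
M2 = [[-10, 0], [1, 0]]   # column driver's payoffs
signal = [[0.0, 0.5], [0.5, 0.0]]     # fair signal: exactly one driver goes
both_go = [[1.0, 0.0], [0.0, 0.0]]    # everyone always goes
```

The signal distribution is not a product distribution, yet no driver wants to disobey it. Finding a CE, rather than checking one, amounts to solving these same constraints together with the requirements that the entries of P are nonnegative and sum to one.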


Correlated Equilibria and Graphical Games
- No matter how complex the game, NE factor
- Thus, NE always have compact representations
- Any mixture of NE is a CE
- Thus, even simple games can have CE of arbitrary complexity
- How do we represent the CE of a graphical game?
- Restrict attention to CE up to expected payoff equivalence


Markov Nets and Graphical Games
- Let G be the graph of a graphical game
- Can define a Markov net MN(G):
  - Form cliques of local neighborhoods in G
  - For each clique C, introduce potential function $\psi_C \ge 0$ on just the settings in C
  - Markov net semantics: $\Pr[\vec{a}] = (1/Z) \prod_C \psi_C(\vec{a}_C)$
- For any CE of a game with graph G, there is a CE with identical expected payoffs representable in MN(G)
- Link between strategic and probabilistic structure
- If G is a tree, can compute a (random) CE efficiently
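The Markov net semantics can be illustrated on a tiny example. The sketch below builds the joint distribution for a hypothetical 3-player chain 0 - 1 - 2 with binary actions, using one clique per local neighborhood (each player together with its neighbors); the potential functions are arbitrary illustrative choices, not ones derived from any game.

```python
from itertools import product

def markov_net_distribution(num_players, potentials):
    """Joint distribution Pr[a] = (1/Z) * prod_C psi_C(a_C) over binary actions.

    potentials: list of (clique, psi) pairs, where clique is a tuple of player
    indices and psi maps that clique's setting (a tuple) to a value >= 0.
    """
    weights = {}
    for joint in product((0, 1), repeat=num_players):
        w = 1.0
        for clique, psi in potentials:
            w *= psi(tuple(joint[v] for v in clique))
        weights[joint] = w
    Z = sum(weights.values())  # normalizing constant
    return {joint: w / Z for joint, w in weights.items()}

def agree(setting):
    """Hypothetical pairwise potential favoring equal actions."""
    return 2.0 if setting[0] == setting[1] else 1.0

def count_ones(setting):
    """Hypothetical three-way potential favoring action 1."""
    return 1.0 + sum(setting)

# Neighborhoods of the chain 0 - 1 - 2: {0,1}, {0,1,2}, {1,2}.
dist = markov_net_distribution(
    3, [((0, 1), agree), ((0, 1, 2), count_ones), ((1, 2), agree)])
```

The brute-force normalization over all joint actions is only for illustration; the point of the representation is that the potentials themselves have size governed by the neighborhoods, not by the number of players.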


Evolutionary Game Theory
- A different model of multiplayer games
- Assume an infinite population of players, but ones that meet in random, pairwise confrontations
- Assume symmetric payoff matrix M (as in Hawks and Doves)
- Let P be the distribution over actions induced by the (averaged) population mixed strategies $p_i$
- Then fitness of $p_i$ is expected return against P
- Assume evolutionary dynamics: the higher the fitness of $p_i$, the more offspring player i has in the next generation


Evolutionary Stable Strategies
- Let P be the population mixed strategy
- Let Q be an invading "mutant" population
- Let $M(P, Q)$ be the expected payoff to a random player from P facing a random player from Q
- Suppose population is $(1 - \epsilon) P + \epsilon Q$
- Fitness of incumbent population: $(1 - \epsilon) M(P, P) + \epsilon M(P, Q)$
- Fitness of invading population: $(1 - \epsilon) M(Q, P) + \epsilon M(Q, Q)$
- Say P is an ESS if for any $Q \ne P$ and sufficiently small $\epsilon > 0$,
  $(1 - \epsilon) M(P, P) + \epsilon M(P, Q) > (1 - \epsilon) M(Q, P) + \epsilon M(Q, Q)$
- That is, either $M(P, P) > M(Q, P)$, or $M(P, P) = M(Q, P)$ and $M(P, Q) > M(Q, Q)$
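The two-case ESS condition can be tested numerically for Hawks and Doves. In this sketch a population is summarized by its probability of playing hawk, the values V = 2 and C = 4 are hypothetical, and the mutants are taken from a finite grid, so the check is a sanity test rather than a proof. The names `M` and `resists` are illustrative.

```python
from fractions import Fraction

def M(payoff, x, y):
    """Expected payoff to a random player from population x facing population y.

    x and y are probabilities of playing hawk; payoff is the row player's
    2x2 matrix with actions ordered (hawk, dove).
    """
    px, py = [x, 1 - x], [y, 1 - y]
    return sum(px[j] * payoff[j][k] * py[k] for j in range(2) for k in range(2))

def resists(payoff, P, Q):
    """Two-case ESS condition for incumbent P against a mutant Q != P."""
    if M(payoff, P, P) > M(payoff, Q, P):
        return True
    return (M(payoff, P, P) == M(payoff, Q, P)
            and M(payoff, P, Q) > M(payoff, Q, Q))

V, C = 2, 4  # hypothetical values with V < C
payoff = [[Fraction(V - C, 2), Fraction(V)], [Fraction(0), Fraction(V, 2)]]
P = Fraction(V, C)
mutants = [Fraction(k, 10) for k in range(11) if Fraction(k, 10) != P]
stable = all(resists(payoff, P, Q) for Q in mutants)
```

Against the population with Pr[hawk] = V/C every mutant earns the same payoff as the incumbent, so only the second case of the condition applies. An all-hawk population fails the test because a dove mutant does strictly better against it.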


ESS for Hawks and Doves
- Recall payoff matrix (V < C):

    payoffs   hawk                dove
    hawk      (V-C)/2, (V-C)/2    V, 0
    dove      0, V                V/2, V/2

- ESS: P(hawk) = V/C


Remarks on ESS
- Do not always exist!
- Special type of (symmetric) NE
- Biological field studies
- Sources of randomization
- Mixed strategies vs. population averages
- Market models


Richer Game Representations
- Have said quite a lot about single-shot matrix games
- What about:
  - Repeated games
  - Games with state (chess, checkers)
  - Stochastic games (multi-player MDPs)
- Can always (painfully) express in normal form
- Normal form equilibria concepts relevant


Repeated Games
- Still have underlying game matrices
- Now play the single-shot game repeatedly, examine cumulative or average reward
- Game has no internal state (though players might)
- Relevant detail: how many rounds of play?


Learning in Repeated Games
- "Classical" algorithms:
  - Fictitious Play: best response to empirical distribution of opponent play
  - Various (stochastic) gradient approaches
- Common question: when will such dynamics converge to NE?
- Positive results fairly restrictive
- Generalizations to parametric strategy representations?
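Fictitious Play is short to write down for a 2-player matrix game. The sketch below has both players update simultaneously and breaks ties in favor of the lowest-indexed action; both choices, along with the matching-pennies example and the round count, are assumptions made for illustration.

```python
def fictitious_play(M1, M2, rounds):
    """Simultaneous fictitious play in a 2-player matrix game.

    Each round, each player best-responds to the empirical distribution of the
    opponent's past actions. Returns both players' empirical action frequencies.
    """
    n1, n2 = len(M1), len(M1[0])
    counts1, counts2 = [0] * n1, [0] * n2
    for _ in range(rounds):
        # Raw counts give the same best response as normalized frequencies.
        a1 = max(range(n1),
                 key=lambda j: sum(M1[j][k] * counts2[k] for k in range(n2)))
        a2 = max(range(n2),
                 key=lambda k: sum(M2[j][k] * counts1[j] for j in range(n1)))
        counts1[a1] += 1
        counts2[a2] += 1
    return [c / rounds for c in counts1], [c / rounds for c in counts2]

# Matching pennies (zero-sum), a hypothetical test case.
M1 = [[1, -1], [-1, 1]]
M2 = [[-1, 1], [1, -1]]
freq1, freq2 = fictitious_play(M1, M2, 10000)
```

In this zero-sum example the actual play keeps cycling, but the empirical frequencies approach the mixed NE (1/2, 1/2). Zero-sum games are one of the restrictive cases with a positive convergence result; the same loop can cycle without converging in general-sum games.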


Exponential Updates and Regret Minimization
- View repeated play as a sequence of trials against an arbitrary opponent
- Maintain a weight on each pure strategy
- On each trial, multiply each weight by a factor exponentially decreasing in its regret
- General setting: near-minimization of regret on sequence, but no guarantee of NE
- Zero-sum case: two "copies" will converge to NE
- Regret minimization and NE vs. CE
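The zero-sum claim can be illustrated with two copies of an exponential-weights learner playing each other. This sketch uses a Hedge-style update in which each weight is multiplied by exp(eta * payoff), equivalently a factor exponentially decreasing in the loss; that specific update, the learning rate, the starting weights, and the Rock-Paper-Scissors example are all assumptions, and each learner is given full expected-payoff feedback rather than sampled actions.

```python
import math

def hedge_self_play(M, rounds, eta, w1, w2):
    """Two exponential-weights learners in the zero-sum game with row payoff M.

    Returns the time-averaged mixed strategies of the row and column players.
    """
    m1, m2 = len(M), len(M[0])
    avg1, avg2 = [0.0] * m1, [0.0] * m2
    for _ in range(rounds):
        s1, s2 = sum(w1), sum(w2)
        p1 = [w / s1 for w in w1]
        p2 = [w / s2 for w in w2]
        avg1 = [a + p / rounds for a, p in zip(avg1, p1)]
        avg2 = [a + p / rounds for a, p in zip(avg2, p2)]
        # Expected payoff of each pure strategy against the opponent's mix.
        gain1 = [sum(M[j][k] * p2[k] for k in range(m2)) for j in range(m1)]
        gain2 = [-sum(p1[j] * M[j][k] for j in range(m1)) for k in range(m2)]
        # Multiplicative update applied to the normalized weights.
        w1 = [p * math.exp(eta * g) for p, g in zip(p1, gain1)]
        w2 = [p * math.exp(eta * g) for p, g in zip(p2, gain2)]
    return avg1, avg2

RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
avg1, avg2 = hedge_self_play(RPS, 5000, 0.05,
                             [0.6, 0.3, 0.1], [1 / 3, 1 / 3, 1 / 3])
```

It is the time-averaged strategies that approach the uniform NE, not necessarily the strategies played on any given round. The row player starts from non-uniform weights because two uniform learners would simply stay at the equilibrium.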


Repeated Games and Bounded Rationality
- Consider restricting the complexity of strategies in T rounds of a repeated game
- Example: next action computed by a finite state machine on the history of play so far
- New equilibria may arise from the restriction
- Prisoner's Dilemma: if number of states is $o(\log(T))$, mutual cooperation (denial) becomes a NE


Games with State
- Standard board games: chess, checkers
- Often feature partial or hidden information (poker)
- Might involve randomization (backgammon)


Stochastic Games
- Generalize MDPs to multiple players
- At each state s, have payoff matrix $M_i^s$ for player i
- Immediate reward to i at state s under joint action $\vec{a}$ is $M_i^s(\vec{a})$
- Markovian dynamics: $P(s' \mid s, \vec{a})$
- Discounted sum of rewards
- Every player has a policy $\pi_i(s)$
- Generalize optimal policy to (Nash) equilibrium $(\pi_1, \dots, \pi_n)$
- Don't just have to worry about influence on future state, but everyone else's policy
- Exploration even more challenging


Stochastic Games and RL
- For fixed policies of opponents, can define value functions
- What happens when independent Q-learners play?
- Results with different amounts and type of shared [...]
- Generalization of $E^3$ algorithm to stochastic games
- Generalization of sparse sampling methods


Conclusions
- Classical game theory is a rich and varied formalism for strategic reasoning, a complement to more passive reasoning
- Like probability theory, it provides sound foundations but lacks emphasis on representation and computation
- Computational game theory aims to provide these emphases
- Many substantive connections to NIPS topics already under way (graphical models, learning algorithms, dynamical systems, reinforcement learning) ...
- ... but even more lie ahead. Come find me to chat about open problems!


Contact Information
- Email: mkearns@cis.upenn.edu
- Web: www.cis.upenn.edu/~mkearns
- This tutorial: www.cis.upenn.edu/~mkearns/nips02tutorial
  - will morph into Penn course page
- COLT/SVM 2003 special session on game theory



