Reverse-Mapping Strategies - Chapter 3 Endnotes

3.7 Chapter 3 Endnotes

3.7.2 Reverse-Mapping Strategies

Once an equilibrium strategy for an abstracted game has been obtained, there are many ways it can be used to play the real game. The most obvious is to simply do the reverse mapping of the abstraction to translate the strategy back onto the full game. However, numerous embellishments are worth consideration. One method in particular, called bucket splitting, was implemented but was not discussed in the original paper, due to space limitations.

Suppose that for a particular context we have a bucket 4 hand, and the equilibrium strategy gives us a mixed strategy of {0.00, 0.60, 0.40} for fold, call, and raise, respectively. The straightforward way to use this information is to spin a spinner (i.e., generate a random number in the range 0.0 - 1.0) and play the corresponding action. However, this treats all hands in the bucket identically. That was a necessary assumption for accomplishing the coarse-granularity abstraction, but when the time comes to choose an action for the real game, it might be advantageous to distinguish between hands within that wide bucket range.
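The "spinner" described above can be sketched in a few lines. This is only an illustration, not the thesis implementation: the function name `spin`, the `ACTIONS` tuple, and the optional argument `u` (which lets a caller supply the random number directly) are all assumptions made for the example.

```python
import random

# Hypothetical action ordering, matching the {fold, call, raise} triple in the text.
ACTIONS = ("fold", "call", "raise")


def spin(mixed_strategy, u=None):
    """Pick one action from a mixed strategy by cumulative probability.

    mixed_strategy: probabilities for fold, call, raise (should sum to 1).
    u: a number in [0, 1); drawn at random when not supplied.
    """
    if u is None:
        u = random.random()
    cumulative = 0.0
    chosen = None
    for action, p in zip(ACTIONS, mixed_strategy):
        if p <= 0.0:
            continue  # never select an action the strategy rules out
        cumulative += p
        chosen = action
        if u < cumulative:
            return action
    return chosen  # reached only through floating-point round-off


# Usage: every hand in the bucket is treated identically.
action = spin((0.00, 0.60, 0.40))
```

Note that the hand itself never enters the computation, which is exactly the limitation that the following paragraphs address.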

In simple bucket splitting, we distinguish between the top half and the bottom half of the hands in the bucket, and then bias our actions in a way that should produce better results. In the given example, we might call 100% of the time with hands from the bottom half, and with hands from the top half raise 80% of the time and call the remaining 20%. Averaged over the two halves, this gives 0.5(1.0) + 0.5(0.2) = 0.60 call and 0.5(0.8) = 0.40 raise. Thus, we maintain the same recommended game-theoretic frequency (ratio) of actions overall, but we improve our hand selection for those designated actions. More generally, we can split the class into as many subdivisions as we like, up to the total number of distinct hands in the class. We call this more general process bucket splintering. In the extreme case, almost all hands in the class can be assigned a pure strategy, while the mixed strategy is still maintained for the class as a whole.
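One way to picture bucket splitting and splintering is to lay the actions out along the hand-strength axis and give each subdivision whatever slice of that axis it covers. The sketch below follows that idea under two assumptions that are not stated in the text: hands are spread evenly across the subdivisions, and the weakest hands take the most passive action (fold, then call, then raise). The function name `split_bucket` is hypothetical.

```python
def split_bucket(mixed_strategy, k):
    """Return k per-subdivision strategies, weakest subdivision first.

    mixed_strategy: class-level probabilities for fold, call, raise.
    k: number of equal-sized subdivisions (k = 2 is simple bucket splitting).
    """
    # Cumulative boundaries of each action along the strength axis [0, 1].
    bounds = [0.0]
    for p in mixed_strategy:
        bounds.append(bounds[-1] + p)

    result = []
    for i in range(k):
        lo, hi = i / k, (i + 1) / k  # strength interval of subdivision i
        local = []
        for a in range(len(mixed_strategy)):
            # Length of the overlap between this subdivision and action a.
            overlap = max(0.0, min(hi, bounds[a + 1]) - max(lo, bounds[a]))
            local.append(overlap * k)  # rescale so the subdivision sums to 1
        result.append(local)
    return result


# Usage: the example from the text, split into bottom and top halves.
bottom, top = split_bucket((0.00, 0.60, 0.40), 2)
```

For the strategy in the text, `bottom` comes out as all call and `top` as roughly 20% call and 80% raise (up to floating-point round-off), and averaging the subdivisions recovers the class-level frequencies. Increasing `k` toward the number of distinct hands moves toward the splintering extreme, where most subdivisions receive a pure strategy.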

Technically, aligning the actions with an ordering of hands by strength could violate the equilibrium, and could potentially leak information. However, in practice the differences might be almost impossible to detect. Ironically, we can take advantage of the stochastic noise and partial observability of the game (which are usually impediments to us), to provide more than adequate obfuscation of our slightly biased actions. Moreover, in view of the crudeness of the entire approach, we can only claim that pseudo-optimal playing strategies are “in the spirit” of equilibrium strategies. Going to any length to preserve the integrity of the abstracted mixed strategies would almost certainly be needlessly pedantic.

Bibliography

[1] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.

[2] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000.

[3] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.

[4] D. Koller and N. Megiddo. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior, 4(4):528–552, 1992.

[5] D. Koller, N. Megiddo, and B. von Stengel. Fast algorithms for finding randomized strategies in game trees. In Annual ACM Symposium on Theory of Computing, STOC’94, pages 750–759, 1994.

[6] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997.

[7] H. W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97–103. Princeton University Press, 1950.

[8] J. F. Nash. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.

[9] M. Sakaguchi and S. Sakai. Solutions of some three-person stud and draw poker. Mathematica Japonica, 6(37):1147–1160, 1992.

[10] A. Selby. Optimal heads-up pre-flop hold’em. WWW, 1999. http://www.archduke.demon.co.uk/simplex/.

[11] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In T. A. Marsland and I. Frank, editors, Computers and Games 2000, LNCS 2063, pages 333–345. Springer-Verlag, 2002.

[12] D. Sklansky and M. Malmuth. Hold’em Poker for Advanced Players. Two Plus Two Publishing, 2nd edition, 1994.

[13] K. Takusagawa. Nash equilibrium of Texas Hold’em poker, 2000. Undergraduate thesis, Computer Science, Stanford University.

[14] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 1944.

Chapter 4 Adaptive Imperfect Information Game-Tree Search (2004)

Game-Tree Search with Adaptation in Stochastic Imperfect Information Games

T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, volume 3846 of Lecture Notes in Computer Science, pages 21–34. Springer-Verlag GmbH, 2004.

4.1 Introduction

Modeling the preferences and biases of users is an important topic in recent AI research. For example, this type of information can be used to anticipate user program commands [14], predict web buying and access patterns [8], and automatically generate customized interfaces [23]. It is usually easy to gather a corpus of data on a user (e.g., web page accesses), but mining that data to predict future patterns (e.g., the next web page request) is challenging. Predicting human strategies in a competitive environment is even more challenging.

The game of poker has become a popular domain for exploring challenging AI problems. This has led to the development of programs that are competitive with strong human players. The current best program, PSOPTI (also known as SPARBOT), is based on approximating a game-theoretic Nash equilibrium solution [4].

However, there remains a significant obstacle to overcome before programs can play at a world-class level: opponent modeling. As the world-class poker player who played against PSOPTI insightfully recognized [4]:

“You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”

This issue has been studied in two-player perfect information games (e.g., [16, 10, 15]), but has not played a significant role in developing strong programs. In poker, however, opponent modeling is a critical facet of strong play. Since a player has imperfect information (does not know the opponent’s cards), any information that a player can glean from the opponent’s past history of play can be used to improve the quality of future decisions. Skillful opponent modeling is often the differentiating factor among world-class players.

Opponent modeling is a challenging learning problem, and there have been several attempts to apply standard machine learning techniques to poker. Recent efforts include neural nets [5], reinforcement learning [11], and Bayesian nets [18], all of which have had only limited success. There are a number of key issues that make this problem difficult:

1. Learning must be rapid (within 100 games, preferably fewer). Matches with human players do not last for many thousands of games, but the information presented over a short term can provide valuable insights into the opponent’s strategy.

2. Strong players change their playing style during a session; a fixed style is predictable and exploitable.

3. There is only partial feedback on opponent decisions. When a player folds (a common scenario), their cards are not revealed. They might have had little choice with a very weak hand; or they might have made a very good play by folding a strong hand that was losing; or they might have made a mistake by folding the best hand. Moreover, understanding the betting decisions they made earlier in that game becomes a matter of speculation.
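The first two issues pull in the same direction: a model must react within a few dozen observations, and it must not cling to a style the opponent has since abandoned. One simple way to get both properties is to keep action counts that are discounted at every new observation. The sketch below is illustrative only; the class name `DecayedActionCounts`, the decay factor, and the uniform prior are assumptions chosen for the example, not values taken from the paper.

```python
class DecayedActionCounts:
    """Action-frequency model in which older observations fade geometrically."""

    def __init__(self, decay=0.95, prior=1.0):
        # decay < 1 shrinks every existing count whenever a new action is seen.
        # prior is a hypothetical pseudo-count so early estimates are not extreme.
        self.decay = decay
        self.counts = {"fold": prior, "call": prior, "raise": prior}

    def observe(self, action):
        """Record one observed opponent action."""
        for a in self.counts:
            self.counts[a] *= self.decay
        self.counts[action] += 1.0

    def probabilities(self):
        """Current estimate of the opponent's action frequencies."""
        total = sum(self.counts.values())
        return {a: c / total for a, c in self.counts.items()}


# Usage: an opponent who raised early but has folded ever since.
model = DecayedActionCounts(decay=0.9)
for observed in ["raise", "raise", "fold", "fold", "fold", "fold"]:
    model.observe(observed)
estimate = model.probabilities()
```

A smaller decay factor adapts faster to a change of style but produces noisier estimates. The third issue, partial feedback, is not addressed by this sketch, because it only counts actions that were actually observed.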

This paper presents a novel technique to automatically compute an exploitive counter-strategy in stochastic imperfect information domains like poker. The program searches an imperfect information game tree, consulting an opponent model at all opponent decision nodes and all leaf nodes. The most challenging aspects of the computation are determining: 1) the probability that each branch will be taken at an opponent decision node, and 2) the expected value (EV) of a leaf node. These difficulties are due to the hidden information and partial observability of the domain, and opponent models are used to estimate the unknown probabilities. As more games are played, the opponent modeling information used by the game-tree search generally becomes more accurate, thus improving the quality of the evaluations. Opponents can and will change their style during a playing session, so old data needs to be gradually phased out.
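The search described above can be pictured as an Expectimax-style recursion in which the branch probabilities at opponent nodes and the values at leaves come from the opponent model. The following is a minimal sketch under stated assumptions, not the program's actual implementation: the `Node` class and its field names are hypothetical, the modelled probabilities and leaf EVs are supplied as plain numbers, and taking the maximum at the program's own decision nodes is an assumed reading of the Miximax variant.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "chance", "opponent", "player", or "leaf"
    children: list = field(default_factory=list)
    # For chance nodes: card probabilities. For opponent nodes: branch
    # probabilities estimated by the opponent model (assumed given here).
    probs: list = field(default_factory=list)
    value: float = 0.0  # leaf only: EV estimated with the opponent model


def miximax(node):
    """Expected value of a node for the searching player."""
    if node.kind == "leaf":
        return node.value
    values = [miximax(child) for child in node.children]
    if node.kind == "player":
        return max(values)  # our own decision: take the best branch
    # Chance and opponent nodes: probability-weighted average of the branches.
    return sum(p * v for p, v in zip(node.probs, values))


# Usage: fold for a sure loss of 1, or bet into an opponent who is modelled
# as folding 70% of the time (we win 2) and calling 30% (we lose 4).
bet = Node("opponent", children=[Node("leaf", value=2.0), Node("leaf", value=-4.0)],
           probs=[0.7, 0.3])
root = Node("player", children=[Node("leaf", value=-1.0), bet])
best_ev = miximax(root)
```

In this toy tree the bet is preferred only because the model says the opponent folds often; if the modelled fold probability dropped far enough, the same search would choose to fold instead. This is why the accuracy of the opponent model directly drives the quality of the evaluations.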

This paper makes the following contributions:

1. Miximax and Miximix, applications of the Expectimax search algorithm to stochastic imperfect information adversarial game domains.

2. Using opponent modeling to refine expected values in an imperfect information game-tree search.

3. Abstractions for compressing the large set of observable poker situations into a small number of highly correlated classes.

4. The program VEXBOT, which convincingly defeats strong poker programs including PSOPTI, and is competitive with strong human players.
