• No results found

Knowledge and Strategy-based Computer Player for Texas Hold'em Poker

N/A
N/A
Protected

Academic year: 2021

Share "Knowledge and Strategy-based Computer Player for Texas Hold'em Poker"

Copied!
102
0
0

Loading.... (view fulltext now)

Full text

(1)

Knowledge and Strategy-based

Computer Player for

Texas Hold'em Poker

Ankur Chopra

Master of Science

School of Informatics

University of Edinburgh

(2)

Abstract

The field of Imperfect Information Games has interested researchers for many years, yet the field has failed to provide good competitive players to play some of the complex card games at the master level. The game of Poker is observed in this project, along with providing two Computer Poker Player solutions to the gaming problem, Anki – V1 and Anki – V2. These players, along with a few generic ones, were created in this project using methods ranging from Expert Systems to that of Simulation and Enumeration. Anki – V1 and Anki – V2 were tested against a range of hard-coded computer players, and a variety of human players to reach the conclusion that Anki – V2 displays behaviour at the intermediate level of human players. Finally, many interesting conclusions regarding poker strategies and human heuristics are observed and presented in this thesis.

(3)

Acknowledgments

I would like to thank Dr. Jessica Chen-Burger for her overwhelming support and help throughout the life-cycle of this project, and for the late nights she spent playing my Poker Players. I would also like to thank Mr. Richard Carter for his insight into the workings of some of the Poker players, and all the authors of the research quoted in my bibliography, especially the creators of Gala, Loki, Poki and PsOpti.

I would also like to thank my parents, who have always been there to me, and inspire me every step of the way. And finally, I would like to acknowledge the calming contribution of my lab-fellows, without whom, completing this dissertation couldn't have been nearly this much fun.

(4)

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(5)

Table of Contents

Chapter 1 – Introduction...1

Chapter 2 – Literature Review...3

2.1 - Imperfect Information games...4

2.2 - Poker history...4

2.3 - Gala...5

2.4 - Loki...7

2.5 - Poki...9

2.6 - PsOpti...11

Chapter 3 – Playing Poker...14

3.1 – Basic rules and aim of tournament...14

3.2 - Sequence of each game (specific to Texas Hold'em)...15

3.3 - Betting Rounds...15

3.4 - Winning combinations...16

3.5 - Basic player/strategy types...17

3.6 - Advanced strategies and Poker complexity...18

3.7 - Abstraction Techniques of 2-Person Bet-Limit Poker...19

Chapter 4 – Design and Methodology...20

4.1 - Choice of Prolog...20

4.2 – General...20

4.2.1 - Incorporation of Rules...20

4.2.2 - Basic Two-Human-Player Poker...23

4.2.3 - General Strategic Behavior...24

4.3 - Anki – V1...25

4.3.1 - Strategy based player...25

4.3.2 - Overview of functioning and method...26

(6)

4.4 - Anki – V2...33

4.4.1 - A Randomised Rational Strategy Player...33

4.4.2 - Statistical Method vs. Random Generator ...34

4.4.3 - Formulas and Evaluation...35

4.4.3.1 - Pseudo games and Winning Potential...36

4.4.3.2 - Calculation of Probability Triples...37

4.4.3.3 – Betting Strategies, Randomised Numbers and Enumeration...38

Chapter 5 – Testing and Evaluation...41

5.1 - System Test – White and Black Box Testing...41

5.2 - Random -1 player's Evaluation...45

5.3 - Evaluation of Anki – V1...46

5.3.1 - Anki – V1 vs. Computer players...46

5.3.2 - Anki – V1's Evaluation against Human Players...50

5.4 - Evaluation of Anki - V2...54

5.4.1 - Anki - V2 vs. Computer players...55

5.4.2 - Anki – V2's Evaluation against vs. Human...58

5.5 - Anki – V1 vs. Anki – V2...60

5.5.1 – Direct Anki Comparison...60

5.5.2 – Anki Comparison in Human and Random Players...61

5.6 - Anki and the Previous Research...65

Chapter 6 – Conclusion and Future Work...68

6.1 - General conclusions...68

6.2 - Conclusions of Anki – V1 and Anki – V2...68

6.3 - Conclusions of Poker Game and Betting Strategies...70

6.4 - Future Work...71

6.4.1 – The Anki Poker Teaching Tool...71

6.4.2 - Testing and Extensions to project...72

6.4.3 - Resource based extensions to project...73

Bibliography...75

(7)

Chapter 1 – Introduction

Poker has recently become one of the most popular games in the gaming community with many online poker rooms and programs. Despite the high level of human interest in this game, the computer programs and existing AI of this game are still in its infancy. A very interesting and influential paper was written in this field in 1995 by Daphne Koller on Imperfect Information Games[1]; and after more than a decade of research and advancements, the best of Poker programs are still known to lose regularly to master level players.

The concept of Imperfect Information Games makes this field and project very significant. The AI community has come leaps and bounds in creating world champion level players for Perfect Information Games, e.g. chess, backgammon, etc. The approach of these games is very different from Imperfect Information Games, as all players can view the entire state of the game or gaming world at any point of time. This allows all the required information to be coded into the AI player's game design. On the other hand, extensive work done on the Imperfect Information field has only yielded either theoretical solutions or restricted Poker players. The statistically best Poker player, PsOpti, is considered better than most human players, but is still below the master level. PsOpti, also, only works with a restricted form of Texas Hold'em Poker, a variant of poker, optimised for 2 players.

The reason why Imperfect Information Games are considered hard is because it requires players, human and AI, to cope with uncertainty, taking risks and deploying strategies. Also in such games, there is no concept of strict strategy, whereby a method could be found which would always lead to the optimal result. To have a strict strategy in most Imperfect Information Games, including poker, is considered suboptimal, as it provides additional information to the opponent. Thus, in addition to playing well with good strategies, an aspect of randomness needs to be incorporated into the player's strategy. It is sometimes beneficial to play a bad game, to encourage the opponent to bet more in future games, and thus win more in the long run.

(8)

represent a Knowledge Base, and thus Prolog has been chosen as the language for implementing this project. There are a variety of computer Poker players that were created, including Random – 1, Random – 2,Anki – V1 and Anki – V2.

The players created in this project are tested against many pre-coded Strict Strategy computer players, and also against various categories of Human Players, ranging from Beginner to

Advanced. A comparison to previous research has also been done, all of which is documented and presented in the form of results and conclusions.

The next chapter, Chapter 2, presents the Literature Review on the subjects of Imperfect

Information Games and previous Computer Poker Player Solutions to the Poker gaming problem. It also provides in-depth information about three of the best and well-documented players in the field, i.e. Loki, Poki and PsOpti.

Chapter 3, Playing Poker, deals with the Poker game and Texas Hold'em in particular. The game, its rules, sequence of play and winning or losing conditions are presented along with the basic and complex strategies used by Human Players during the game. The abstraction techniques being utilised in this program, i.e. Bet-Limit Two-Player Texas Hold'em Poker Player, is explained in Section 3.7.

Chapter 4, Methodology, presents the design and structure of the system and computer players created. The system architecture is represented along with the strategies and specifications of Random – 1, Random – 2, Anki – V1 and Anki – V2.

The testing documentation, results and evaluation of Anki – V1 and Anki – V2 is presented in Chapter 5, Testing and Evaluation. Both these players are compared against strict or random strategy players explained in Chapter 4, against Human Players and also against each other. They are finally compared to the previous research and Computer Poker Players created in the field.

(9)

Chapter 2 – Literature Review

Game theory has been a very prominent area of research since the 1940's. It has been utilised and successfully applied to various other fields such as

“...inspection of nuclear facilities, models of biological evolution, the FCC auctions for radio and television bandwidth, and much more.” [1]

It has also helped accomplish big advances in the AI gaming community. AI player capabilities have reached an extent whereby games such as Chess and Backgammon are no longer under human dominance. Computer players compete against grandmasters in such games and win on a very frequent basis. Other games with lesser strategical requirements can no longer even be won by humans, when playing against a competent computer player.

It is, however, found that AI dominance exists only in one section of the gaming field. The games mentioned above and others like it are called Perfect Information Games. These are games in which the entire state of the game is visible to all players, or, all players of the game have equal “Knowledge”. This complete knowledge of the game state allows brute-search algorithms to compute scenarios and possible future moves. Success has been achieved in this field through improvement in speed of searches, for example, Deep Blue searched over 250 million chess positions a second to allow it to make the most optimal move.

The other section of the gaming field lies in Imperfect Information Games. Examples of these are Bridge, Poker and RoShamBo (also known as Stone-Paper-Scissors). These are games that only allow partial world knowledge to be available to players, and requires them to base decisions on the same uncertainty. In the case of poker, or other similar card games, imperfect information is created by the hidden cards that the opponent holds. Similarly, as your cards are hidden, your opponent too has imperfect information.

(10)

Perfect Information Games have a well-defined optimal move, i.e. there is always a move which is at least as good as some other move. In addition to this, if the opponent gains knowledge of this move, it would make no difference to the current play, as it is by definition, the optimal move. This property is exploited by search algorithms to find that optimal move in the game tree.

2.1 - Imperfect Information games

“Although game-tree search works well in perfect-information games, there are problems in trying to use it for imperfect information games such as bridge (and poker). The lack of knowledge about the opponents' possible moves gives the game tree a very large branching factor, making the game tree search infeasible.” [4] Imperfect Information Games are played with a constant knowledge-gap between players with both having partial knowledge of the game state. Unlike Perfect Information Games, a

deterministic strategy cannot be utilised in these games as that would allow the opponent to have an advantage of guessing the complete knowledge of the game. For this reason, optimal strategies in Imperfect Information Games can be described as a random combination of strategies with optimal evaluation of the partially known game state.

“Kuhn ([19]) has shown for a simplified poker game that the optimal strategy does, indeed, use randomization.” [1]

2.2 - Poker history

It is evident from the previous citation that people have been interested in computer poker players for a very long time. Apart from Kuhn, illustrious mathematicians and economists have also worked on theoretical and practical solutions for poker. This research field dates back to John von Neumann and John Nash, who used simplified poker to illustrate fundamental principles

[16][17][18].

With such a long history of focus on Imperfect Information Games, especially Poker, it would be expected that the current computer players at least be of commendable standard. However, this is not the case. The most prolific players have all been created under the same umbrella of Darse Billings' work. Also, the best full-game Poker players are intermediate quality at best. The

(11)

creators of these programs confess to this fact in [10][13]. Their latest player, PsOpti is shown to perform well at the master level but only after a variety of changes to game-play, for example, restricting the game to a two person play. Most games of poker can see between 8 to 10 players playing a single game (full-game Poker).

There is now a need for Imperfect Information Game computer players to catch up with their Perfect Information Game counterparts and offer a similar challenge to the human gaming community. This is the purpose on the basis of which this project has been proposed and evaluated.

Over the previous two decades, a variety of solutions have been offered to the computer Poker player creation problem. These solutions can be sub-divided into the following four categories;

Theoretical, Expert Systems, Simulation and Enumeration and finally, Game-Theoretic.

These terms are further explained in detail below along with a case study of their most prominent example.

2.3 - Gala

Gala was created by Daphne Koller and Avi Pfeffer, first documented in [1]. It offers a theoretical solution to a restricted form of Poker. Once again, a two player system was evaluated, to which Gala, a logic programming language, was applied.

Even though it was only a theoretical solution, Gala offered to write down the rules and

constraints of Poker play in pseudo code form for the first time. This allowed future programmers such as Darse Billings and this project to gain insight into the sequential methods and workings of computer Poker players. Darse Billings, one of the major contributors of modern computer Poker players frequently refers to both [1] and [5], the works of Koller and Pfeffer.

(12)

In addition to the language, Gala also provided algorithms that prevented exponential growth of the game-trees with Imperfect Information.

Figure 1. An abbreviated Gala description of Poker, from [1]

However, as the game-trees were still found to be proportional to the unknown states in the game, the algorithm was found to be impractical for its application to full-scale poker. The authors concluded,

“. . . we are nowhere close to being able to solve huge games such as full-scale poker, and it is unlikely that we will ever be able to do so” [1]

(13)

This statement of Koller and Pfeffer and been repeated many times in the various papers by Darse Billings, e.g. [10].

2.4 - Loki

Loki was one of the first ever full-game computer poker players. It was developed by Darse Billings et. al., and is documented in [11]. It was created in two stages. Both versions have an expert knowledge and rule-based engine hard-coded into them, i.e. the experience of Darse Billings as a master poker player facilitated the formation of a form of game tree. This game tree consisted of scenarios and strategies suggested by the expert, and thus the game engine could brute-search through these strategies to obtain a near-optimal or random solution.

The first version of the program relied solely on this expert knowledge to play against other players. It can be argued if this was truly an AI player or just the code representation of a particular master's player. Strategies were given priority rankings, and were randomly selected using weights, this allowed them to be relatively random.

The first version, created on 1998, was shown to perform well solely against beginners, as even with the expert system, certain situations resulted in near deterministic working. As discussed earlier, any form of fixed strategy by an opponent can be used by the player to their advantage, and this occurred with Loki when it played against more experienced players. Figure 2 shows the basic Loki Architecture, and its functions.

(14)

Figure 2. Loki Architecture

The second version, updated in 1999, added more computing concepts into the player. It, however, still kept the core expert engine,

“ Loki uses expert knowledge in four places:

1. Pre-computed tables of expected income rates are used to evaluate its hand before the pre-flop, and to assign initial weight probabilities for the various possible opponent hands.

2. The Opponent Modeler applies re-weighting rules to modify the opponent hand weights based on the previous weights, new cards on the board, opponent betting actions, and other contextual information.

3. The Hand Evaluator uses enumeration techniques to compute hand strength and hand potential for any hand based on the game state and the opponent model.

4. The Bettor uses a set of expert-defined rules and a hand assessment provided by the Hand Evaluator for each betting decision: fold, call/check or raise/bet.” [13]

In the above statement, there is a mention of 'hand strength' and 'hand potential'. Hand Strength is the probability that the cards held by a player are the best possible cards in the game. On the other

(15)

hand, Hand Potential refers to the probability of the current cards becoming the best cards in the game in the future. These two terms together form the basis of a Hand Evaluator.

There were major changes installed in the hand evaluator. Figure 3 shows the transformation of Loki from one form to another, and its reduction of dependence on expert knowledge. Another major update was the introduction of simulation to determine the best future action. This is explained in detail under the next sub-heading of Poki, as the transformation from Loki – 2 to Poki was almost immediate.

Figure 3. Transformation from Loki - 1 to Loki – 2

Loki offered some useful insight into the design of the player created in this project. The most prominent of these was Bucketing, which was incorporated in Loki – 2 and is present in Anki - V1. Bucketing is a method of abstraction, whereby, scenarios or game states are bunched together in groups. These groups are based on the expected reaction of the player, for example, in poker, hands of King-King and Ace-Ace can be grouped together as they are nearly as good as each other. More poker glossary and information is available from Chapter 3.

Another important concept introduced in Loki - 2 was the usage of Triples to represent

probability values for fold, bet and raise. The definition of these terms can be found in Chapter 3, but they are basically the options available to a player during the course of a Poker game betting round. This concept has also been incorporated into Anki – V2.

(16)

2.5 - Poki

Poki, the step up from Loki, introduced the strategy of Simulation and Enumeration to the computer poker players. It was created in 1999 – 2000, once again by Darse Billings et. al., [10]. Poki is also currently the best full-game Texas Hold'em Poker player, the variant of poker under consideration in this project. Once again, more information is available in Chapter 3 regarding Texas Hold'em.

“Poki supports a simulation-based betting strategy. It consists of playing out many likely scenarios, keeping track of how much each decision will win or lose. Every time it faces a decision, Poki invokes the Simulator to get an estimate of the expected value (EV) of each betting action. ... A single trial consists of playing out the hand from the current state of the game through to the end. Many trials produce a full information simulation.” [10]

Enumeration, on the other hand, refers to the updating of the current belief or partial information state, with data received through interaction from the outside world. In the case of any card game, this can be implemented in the form of increasing the apparent chance of good cards with the opponent, if the opponent continues to bet excessive money during the game. This requires some form of an opponent modeler, which tries to guess an opponent's hand or strategy through an opponent's actions.

Figure 4 shows the architecture of Poki. It can be seen how it is very similar to that of Loki – 2 in Figure 3. The most major upgrade that resulted in a different name was concerning the opponent modeling.

(17)

Figure 4. Poki's Architecture

In spite of the above upgrades to the AI player, Poki was still found to be inadequate in certain situations,

“The program’s weaknesses are easily exploited by strong players, especially in the 2-player game.” [2]

So, the Computer Poker Community managed to increase their level of play, but they were still below par in comparison to good human players.

Anki – V2, the system developed in this thesis, also uses the concepts of Simulation and

Enumeration in addition with Triple Generator to decide on its actions during a betting round. This is further explained in Chapter 4, Design and Methodology.

2.6 - PsOpti

PsOpti is the most recent player in the market, created once again by Darse Billings et. al. in 2003 [2]. It is a player based on game-theoretic strategies, using concepts of game theory. Ironically, the approach is quite similar to Gala, but only at the most fundamental level. PsOpti is the first game-theoretic AI player that was successfully tested against a master poker player, and was

(18)

“A game-theoretic solution produces a sophisticated strategy that is well-balanced in all respects. It is also safe and "robust", because it is guaranteed to obtain the theoretical (optimal) value of the game against *any* other strategy.“ [1]

This value allows the player to play a pseudo-optimal game throughout its tournament. As it finds the value against any game, this also holds true for randomised, aggressive and bluffing game-play, all of which exist to a very high extent in master level poker play. Figure 5 shows PsOpti's player against a master level player 'the count'. The unit of measure used here is 'small bets won', which can basically be replaced in this case by 10s of dollars.

Figure 5. “the count's” performance against PsOpti

However, this player is severely restricted in its game play.

“... abstraction techniques are combined to represent the game of 2- player Hold’ em, having size O(1018), using closely related models each having size O(107) .” [2]

Some of these abstraction techniques remove the possibility of PsOpti being utilised in the full-game field in its current state. For example, one of the abstractions removes one of the betting rounds from the game of play, and another restricts the play to only a player game. A two-player game can be viewed as a simplified version of a full-game, wherein upto ten people can be playing at the same time.

(19)

This can still be seen as a great beginning, as all the abstractions are mostly computational reductions, and such players can easily be scaled up once they have proven their might.

“The drawback is that this type of strategy is fixed -- it can't adapt to the style of a particular opponent. Although it will break even against any opponent, it might only win at a slow rate against a weak or predictable player.” [1]

Another manner of interpreting this statement is that the game-theoretic player evaluates its current state and the state of the game, to try and obtain a near-optimal strategy. It has has no opponent modeling and thus cannot take advantage of human inexperience or mistakes. Opponent Modeling cannot be expressed in a similar manner as a game-theoretic approach; and thus the creators of PsOpti are currently looking into a player which plays game-theoretic till it develops enough knowledge about an opponent, so that it can switch to more sub-optimal strategies. These are strategies, which do not necessarily give the best result against an optimal player, but are the expertly determined best response to the manner in which the opponent is playing.

Many of the abstractions used in PsOpti's creation are also being used to create the program of this project. This is because these abstractions offer a smaller set of constraints to satisfy, which is a commodity in a time constrained project such as this. Also, PsOpti's creation shows how these abstractions are mere computational releases of regulations, and offer a faster cycle of design, prototyping and results which are still relevant to the full-game analysis.

(20)

Chapter 3 – Playing Poker

3.1 – Basic rules and aim of tournament

Poker is a game played with a normal deck of 52 cards. The cards are divided into four set of thirteen cards, with each set have a distinctive symbol. These sets are called suits, and are named; spades, clubs, diamonds and hearts. The first two suits are usually represented in black, and the latter two in red. Each suit has cards from two till ten, followed by Jack, Queen, King and Ace, in order of value.

The aim of a poker game is to obtain the best possible pattern with a 'hand' of 5 cards. 'Hand' has two definitions in poker, it refers to the final set of cards available to a player to form his/her pattern, and also the initial cards that are provided to each player and are hidden from the rest. This project will use the word in both contexts, but the difference can be found through the use of the words 'final hand' and 'initial hand'.

A game in Poker is defined as the entire sequence of play from receiving the initial hand, to the point where some-one 'folds'; forfeits the current money in the pot, or a showdown determines which of the players have the strongest final hands. A game starts with a couple of blinds, which is small amount of money put into the game by two players without looking at individual initial hands. This allows each game to be worth at least some amount of money, and encourages more aggressive strategies by players. The game ends with the last remaining player, or the strongest player taking all the money in the 'pot' after the various betting rounds.

A poker tournament consists of various players who start with equal money. As the games proceed, any player who reaches the end of their monies is said to have been eliminated and leaves the table. The last person left on the table with all the money is the winner of that tournament.

Basic aim of poker can be described as “Win as much as you can, but when you about to lose, lose as little as you can.” Quite clearly the statement portrays the need for evaluation and deception.

(21)

3.2 - Sequence of each game (specific to Texas Hold'em)

“ ... the game of Texas Hold’em, the poker variation used to determine the world champion in the annual World Series of Poker. Hold’em is generally considered to be the most strategically complex poker variant that is widely played in casinos and card clubs. It is also convenient because it has particularly simple rules and logistics.” [10]

The popularity of Poker has already been discussed in Chapter 1, so the choice of Texas Hold'em should be obvious, as it is seen as the most popular form of the game.

Texas Hold'em starts with all the players receiving a two-card initial hand. These cards are hidden from all the players bar the one to whom they are dealt. With deception or future potential of cards in mind, a betting round is held. The exact semantics of a betting round are explained in the next sub-section. At the end of the betting round, three community cards are dealt face-up. In computer play, this is represented by making the knowledge of the three cards available to both players. These three cards are called the 'Flop'.

The Flop is followed by another round of betting, at the end of which another community card is shown. This card is known as the 'Turn'. This is followed by another round of betting and then another community card called the 'River' is revealed. The final betting round takes place afterwards, and if more than one person remains till the end, the seven cards are checked for the best hand of five. The winner takes all the money collected in the betting rounds of that game.

3.3 - Betting Rounds

The aim of the betting round is to collect equal money from all players before proceeding to the next stage, be it the revealing of community cards or a showdown. The options available to a player at any point in the round is fold and bet. Folding results in immediate forfeit of the money in the pot. Bad as it may sound, it is usually better to fold a hand which you are quite certain would not win, rather than lose money on it.

(22)

there has been no money put down at the start of a betting round. So, at the start of any betting round, the person with the first turn has the option to check, and after that the second person has the option to check, bet or fold.

The final option available to a player is that of a raise. Raising puts 20 units of money into the pot and is almost the opposite of checking, as it is allowed only when there has already been a bet on the table. Also, raising is restricted to a maximum of three times per player.

Betting rounds can follow many patterns from the choices available above. For a two player game, an example is both players checking, in which case neither player puts any money into the pot. Another is that of both players betting, whereby both put in 10 units each. Certain complex betting rounds can also occur, such as check, bet, raise, raise, bet. Here, the first player checked, only to have the second player bet. The first player now has the option of betting, folding or raising, and he chooses the latter most. The second player re-raises and finally the first player bets to bring the contribution of both players at the end of the betting round to 30 units each.

A betting round terminates as soon as a person folds, in which case the game ends, or when a person matches the opponent's contribution to the pot by checking or betting. Raising is usually used to 'raise-the-stakes'.

3.4 - Winning combinations

The strength of a winning five-card hand is determined by the table shown in Figure 6. The top-most 'Royal Flush' is considered the best hand in the game, whereas a 'High Card' is the worst. When determining a winner at the showdown, the player with the pattern which is highest on the table in Figure 6 wins.

(23)

Figure 6. Winning combinations in Texas Hold'em Poker

Certain circumstances require further evaluation to find the winner of a hand. In such a case, the entire best-five hand is seen. For example, if by some strange luck, the community cards are the four Jacks and an 8, then the best five-card hand for each player is going to be the four jacks, as royal and straight flush and not possible, followed by the next highest card. It is this card that determines the winner of the game. If Player 1 has King-Four, and Player 2 has Queen-Ten, then the winner is Player 1, as his 'Four Jacks with King kicker' beat Player 2's 'Four Jacks with Queen kicker'. As is the case with Four of a Kind, sometimes, the second or third kickers are seen to determine the winner, but never beyond the best-five hand of a player.

Draws occur in poker usually when both player 'play-the-board', i.e. the best-five hand for both players is actually the community cards. In this case, as the ranking of all the five cards is the same for both players, the pot is split halfway between the two players.

(24)

frequently bets or raises rather than checking or calling (more than the norm), while passive means the opposite. Another simple set of categories is loose, moderate and tight. A tight player will play fewer hands than the norm, and tend to fold in marginal situations, while a loose player is the opposite. Players may be classified as differently for pre-flop and post-flop play” [3]

Pre-flop refers to the betting round held before the flop is revealed, whereas post-flop refers to the rest of the game after the display of the flop. A differentiation is usually made between these stages as the amount of information change is very great. Initially, each player has knowledge of 2 of the 4 cards that have been dealt, and after the flop, each player has knowledge of 5 of the 7 cards in play.

The different strategies mentioned above also result in a specific form of reaction from the opponent. Games against aggressive or loose players would be worth a lot more, and at the same time, raising by an aggressive player may not always mean that the player has a good hand. On the contrary, raising by a tight or conservative player should be considered with greater concern.

Apart from these basic player types, there are also known to exist many complex strategies which make the game of Poker both interesting and exceedingly hard.

3.6 - Advanced strategies and Poker complexity

“Poker is a complex game, with many different aspects, from mathematics and hidden information to human psychology and motivation. To master the game, a player must handle all of them at least adequately, and excel in most.” [10]

This is the task being handed to a computer player to successfully excel in the game of Poker. Quite clearly, this is a huge task, and it probably would not be accomplished in the next few years. Mostly this is due to the inability of computing players to have a gut instinct or the ability to rapidly change one's strategy to combat another's. This topic is discussed further in Future Work in Chapter 6.

Computing players have to deal with a lot of advanced strategies like check-raising, whereby a false impression is imparted on the opponent by checking on a good hand to raise when the turn comes back to the player. This would usually cause the opponent to at least respond with a bet,

(25)

and thereby allow the current Player to extract more money from the opponent. In addition to this, a check-raise may be made only in order to scare an opponent, as it is a well-known aggressive strategy.

The best player to date, i.e. PsOpti incorporates no opponent modeling.

“It is important to understand that a game-theoretic optimal player is, in

principle, not designed to win. Its purpose is to not lose. An implicit assumption is that the opponent is also playing optimally, and nothing can be gained by

observing the opponent for patterns or weaknesses.” [2]

It is in the face of these challenges that this project hopes to show a brave front and come up with some important conclusions that may assist in furthering the field of AI poker players.

3.7 - Abstraction Techniques of 2-Person Bet-Limit Poker

The form of poker being considered for design and evaluation is that of two-player, bet-limit Texas Hold'em Poker. These abstractions have been made as they allow a player to be built under the required time constraints, and yet have conclusions which are applicable to full-scale poker players.

Bet-limit poker is a form of poker whereby each bet is of a set limit, i.e. 10 money units in this case. Texas Hold'em tournaments are usually held with no-limit poker, whereby a player can dedicate his/her entire money on the first bet of the first game itself. Bet-limit poker provides both a regulated and a beginner level of understanding, which can be matched by the scope of this project.

(26)

Chapter 4 – Design and Methodology

4.1 - Choice of Prolog

The first decision taken for this thesis was regarding the choice of programming base. The duration of time available or the project and the heuristics constraints required a language that expressed a knowledge base and rule-based reasoning in a clear and concise manner, and yet had to ability to produce a quick design-to-result cycle. For this reason, Logic Programming was chosen to be the fundamental base upon which the program would be created.

This decision does result in a disadvantage to the computing power efficiency of the final result, however, Prolog's ability to create programs and obtain prototypes and results much faster, makes it an ideal choice for this thesis. Rewriting the program to allow interoperability through the usage of platform-independent languages such as Java has been discussed in Chapter 6, Future Work and Conclusion.

4.2 – General

4.2.1 - Incorporation of Rules

The primary design of the thesis program required a structure, this was provided through insight into the Gala system. An in-depth knowledge of Poker rules was also discussed with the people mentioned in the Acknowledgment section to finalise certain discrepancies in the system. Like all popular games, Texas Hold'em has a variety of different rules, which are used by different organisations in their tournaments. For example, the first step in any game is the small and big blinds, whereby the first and second players put the money equivalent to half the betting amount and the betting amount respectively without looking at their cards. This allows each game to be worth at least some amount, and thereby more interesting.

This is a rule seen for major multi-player tournaments where the chance to be the first or the last in a betting round circulates and thus allows every player to be at an advantage or disadvantage at some point in the game. This does not hold true for two-player poker, as the significant pro-con

(27)

scenario of a tournament is lost in two-player poker. For example, every player that is required to put in a big blind that player also gets to be the final player to bet in that betting round. Thus the pros and cons seem to balance each other out, a system which would be redundant in a smaller table of two-player poker. Also, the program designed for this project has very limited opponent modeling, which makes the changing of sequence of play unnecessary as a player's strategy doesn't effect the betting decision of the AI Player. For this reason, the blind system has been replaced with an ante. An ante is a fixed amount of money contributed by each player at the start of each game, due to the same reason as blinds. It is worth a bet amount, i.e. 10 units of money in this case.

Figure 7 shows a clear view of the playing sequence of each game, in the form of cards and Betting Rounds. The Start represents the submission of ante, this is followed by the individual hands being allocated. The next step on the board is the revealing of the Flop, however, there is a betting round, known as Pre-Betting Round that is played before the Flop is revealed. Similarly, the Post-Flop and Post-Turn Betting Rounds sit between the Flop and Turn, and the Turn and the River respectively. The tournament continues onto the Post-River Betting Round, which is also called Final Betting Round sometimes. The final step, i.e. the Showdown represents the

(28)

Figure 7. Sequence of Texas Hold'em Play

Poker tournaments also have a variety of rules to decide the winning pattern at a showdown, or the final comparison of players' cards. Certain organisations do not recognise the role of kickers in the winning hand, and are restricted only to the basic pattern. For example, in the example used

(29)

in Section 3.4 regarding the comparison of “Four Jacks with a King Kicker” and “Four Jacks with a Queen Kicker”, the above mentioned organisations would regard the result to be a draw, as the basic pattern is Four-of-a-Kind and both players have it in the form of Jacks. The kicker plays no role in the decision.

The program created for this thesis follows the more generally accepted rule of recognising the role of a kicker. This is due to both the rules' popularity and the extensive strategy management required on the addition of this rule. The cards available to both players, i.e. community cards, can sometimes lead to a very good hand by themselves. But even in such a case, this rule creates the need for additional strategy, as kickers can be used to determine the outcome.

4.2.2 - Basic Two-Human-Player Poker

The next step after the creation of a framework like the one described in the above section, is the formation of players. The most basic form of play was implemented first, i.e. Human vs. Human. Clearly, this required no knowledge base or strategy on behalf of the computer in terms of game-play. The primary purpose of a Human vs. Human player was to brute-force check the stability and rules of the Poker framework.

A user-friendly Prolog interface was created for the users to allow multiple games to be played. Section 3.3 discusses the various betting choices available to each player, most of them only under certain circumstances. Choices of checking and raising were encoded along with their necessary rules and restrictions. A recursive error checker was also added to return the game state to its previous form in the case of an invalid entry from a human player.

Finally, at the end of each game, if no player has folded, the program displays the cards of both players and specifies the winning pattern and player (showdown scenario). In the case of a player folding, there is no showdown and the cards are scratched, i.e. none of the players get to see each other's cards. This follows the working of real poker, whereby you can only see your opponent's cards if you fight them all the way to the end, and they do the same.

(30)

4.2.3 - General Strategic Behavior

Basic player strategies have been discussed in Section 3.5, and the keywords such as aggressive,

conservative, tight and loose will be used very frequently in this thesis. These keywords are used to describe the playing strategies of a poker player, and the basic aim of such a player is to randomise these strategies as much as possible. Sticking to just one of the strategies can allow the opponent to recognise it and respond accordingly. The first step towards finding a near-optimal poker strategy is to create poker players which utilise these basic strategies strictly or randomly, and then to compare their performances. The comparison is given in Chapter 5, Testing and Evaluation, and the detail of the players created is provided below.

Most of the previous research in the field also points towards this form of project cycle, whereby known bad or random players are created, and newer versions of the actual AI players are played against these players to obtain performance data. [2]

The single most important decision in Poker can be stated as “knowing when to play your hand”. The keywords associated with this feature are tight and loose, where tightness signifies fewer hand plays and loose signifies a more liberal approach. Based on these keywords, two players were created.

The first type of player randomly chose between acting loose and tight when it was it's turn to act. This player was coined 'Random - 1'. In addition to having a randomised betting strategy, it also had a randomised strategy to determine its aggressiveness. So, in theory, this Random - 1 was a completely random player, which could fold, bet, check or raise at any time; the final two options being restricted under the relevant circumstances.

The second type of player chose a randomised aggression but has a strictly loose policy. It played every hand that it was dealt, randomly choosing how much money it wanted to bet on it. This player is called 'Random - 2'. As the policy of the player forces it to be completely loose, it never folds a hand, and bets and raises quite frequently. Thus, it is also seen as a more aggressive player by humans.

(31)

A third type of player with a completely tight strategy was not required, as such a player would always fold, or check, and would thus lose against any other player which chose to bet even once during the whole game.

4.3 - Anki – V1

Anki – V1 is best introduced through its architectural diagram. Figure 8 shows the overview of Anki – V1 along with all its functions and working. These are further explained in the later subsections of this chapter.

Figure 8. Anki - V1's Architectural Overview

4.3.1 - Strategy based player

After the creation of the generic Random - 1 and Random - 2, the need for a more intelligent player arose. The secret to a human-level intelligence player may lie in the strategies used by human beings to decide their betting strategies.

(32)

The evaluation before each betting round is done by looking at the 'type of hand' that the player has. 'Type of Hand' here refers to both the current hand strength and the future potential to have a strong hand. This is done by grouping together similar 'types of hands'. There are check, bet and raise 'buckets' or groups, and by matching the current hand to the grouping, the final betting strategy is decided.

4.3.2 - Overview of functioning and method

“The most important method of abstraction for the computation of our pseudo-optimal strategies is called bucketing. This is an extension of the natural and intuitive concept that has been applied many times in previous research

[22][23][14]. The set of all possible hands is partitioned into equivalence classes (called buckets or bins). A many-to-one mapping function determines which hands will be grouped together. Ideally, the hands should be grouped according to strategic similarity, meaning that they can all be played in a similar manner without much loss in EV (Expected Value).” [2]

Before the working of the code and player is understood, there is a need to express the method by which the above mentioned buckets have been created. The most difficult time to correctly judge a betting strategy is that of the pre-flop betting round. This is the time when the player has the least percentage of information available to him/her, only 50%, as only two of four cards are visible, compared to the end, where 7 of the 9 cards are known to the player.

However, playing strategies for initial hand can be determined as well. This can be done through simulation and expert knowledge. [3] provides an extensive table for comparison of performance of opening hands, however, it was designed specifically for Loki, and thus uses certain

unspecified changes and heuristics. A similar table specific to this project needed to be created, with estimated performance values of opening hands. These performance values were used to determine the bucket or group that the hand would be assigned in. More details regarding this simulation is given in the next subsection, 4.3.3.

After having dealt with the pre-flop, the post-flop also needs some extensive strategies. Groups in this case were decided on the basis of an optimistic hand potential. Optimistic here can be defined has a fairly loose strategy that looks for patterns and hopes to complete them in the future. All the

(33)

possible winning patters such as sequences, flush or pairs are seen, along with their completeness to finally determine the group of a particular state of a hand.

It is important to note that unlike the pre-flop hand, the post-flop works on patterns rather than on actual cards. For example, simulation determines the playing value of each possible initial hand, i.e. all 2,652 of them, yet, the post-flop hands are taken on the basis of pattern, instead of individual values. A sequence of 3,4,5,6,7 would be treated in this system exactly the same as a sequence of 4,5,6,7,8 as they both follow the same pattern. The sequences pattern, is divided into low and high number sequences, and is not judged on the exact value of the cards. This is because of the information explosion of the game state. For example, leading up to the final betting round, a player could have 6.74 * 1011 combinations of cards available. This number is clearly too large to

allow individual analysis of each hand.

Aggressive strategies are commonplace among expert human players. This is one of the well known strategies of master play, i.e. to intelligently utilise aggressive play to increase the doubt in an opponent's mind about a player's luck or bluff. Always staying aggressive is obviously not considered wise, as it discloses the player's strategy, but general aggressive behaviour is accepted at the master level. This reasoning has lead to a tight aggressive showdown player.

The above categorisation requires more explanation. The tightness refers to its decision to only play to the end if it feels it has at least some form of a pattern available to it, i.e. it does not play to the end with a 'High Card'. The aggressive nature is apparent from Section 4.3.4 where the table shows that anything above a pair is automatically put into the raising group, and is thus given the maximum aggression.

Another important point to be mentioned is that the evaluation of the betting strategy for Anki – V1 is done before each betting round, and the decision is maintained throughout that betting round, irrespective of the strategy of the opponent.

(34)

4.3.3 - Probability realisation of all possible starting hands

As it has been discussed before, the pre-flop strategy evaluator looks at all the possible hands to determine their performance. Performance can be measured in many ways; Darse Billings uses sb/hds, i.e. small bets won per hand. In a tournament game, the size of the smallest bet allowed keeps increasing, and thus the winning is described by small bets won per hand, which is can be constant over the tournament. If money per hand was considered, latter games would be given undue additional weights due to their larger small bets. For the purposes of this project, a more generic definition has been implemented. Performance is described as the probability of winning a game with a particular hand.

An exhaustive method to determine the probability of winning is not possible, simply due to the large game state that results O(1011). As a result, a combination of simulation and previously

defined optimistic hand potential is referenced to determine the various betting strategy groups.

Figure 9 shows a tabular form of the result of playing any particular starting hand. This table was created by the author of this thesis, and the exact manner of its creation is explained below the figure. There is a reduction in the number of observed possible hands when the need for specific suits is abstracted. A hand can either be suited, i.e. both cards are of the same suit, or unsuited, i.e. different suits. Either way, the exact suit is unimportant, a suited 'spade' is considered at par with a suited 'diamond', and the same is true for all other combinations.

Figure 9. Probability of victory of initial hands

A K Q J T 9 8 7 6 5 4 3 2 A 88.68 69.95 68.92 67 66.62 65.35 63.48 64.26 61.62 60.43 59.8 59.6 58.61 K 66.15 81.88 63.25 62.57 62.49 61.17 57.81 57.09 55.86 56.02 54.54 53.43 53.1 Key : Unsuited Q 66.5 61.59 80.92 60.36 58.46 57.54 55.16 53.88 53.52 53.73 52.26 50.27 50.3 Suited J 65.87 61.19 58.66 77.52 57.23 55.11 53.76 53.41 50.69 50.6 49.33 48.45 48.67 Pair T 64.01 59.68 56.48 54.48 75 53.85 51.75 50.77 49.41 47.25 46.83 45.35 46.32 A – Ace 9 62.3 57.34 53.97 53.57 51.23 71.88 51.16 50 46.75 45.72 43.92 43.67 42.66 K – King 8 61.68 54.78 51.99 51.49 49.96 47.96 68.04 48.39 46.35 44.42 42.95 41.52 41.78 Q- Queen 7 61.15 54.31 52.83 49.52 48.08 46.65 44.32 65.76 45.18 42.73 40.95 39.87 39.52 J – Jack 6 60.6 54.18 50.25 48.77 45.09 45.7 44.23 42.77 63.33 43.08 41.46 39.7 37.9 T – 10 5 58.95 53.82 50.91 47.29 45.34 41.91 41.36 39.88 38.07 59.8 41.66 38.73 36.7 4 57.67 52.13 48.35 46.99 43.48 39.83 39.48 38.37 37.61 37.68 56.5 38.52 35.94 3 54.88 50.41 48.15 45.09 42.68 41.05 37.74 37.08 36.32 37.22 36.28 52.95 35.96 2 54.72 50.26 46.41 44.79 42.22 39.78 36.22 35.52 33.6 35.66 32.35 31.62 49.83

(35)

Each of the numbers given above is obtained in same manner:

1. The specified hand of the player is used along with a random hand for the opponent. 2. A random flop, turn and river are generated.

3. The final hands of both the player and opponent are compared to decide the victor. 4. Steps 1 – 3 are repeated for 5000 games.

5. Finally, the percentage of times that the player won is calculated, by dividing the victories by total number of games, i.e. 5000 in this case.

The approach followed above is similar to the one used to create the table in [3], however, the use of heuristics and strategies has been removed. This is because the calculation is used to find the Winning Potential, and not the sb/hds value that Loki uses. This eases both the creation and use of the data from the table.

The buckets or groups of the pre-flop strategies are decided on the basis of the numbers created in Figure 9 and optimistic potential. The latter term forces any suited or sequenced player to be played irrespective of the number that was actually received from the table above. For example, even though a 2-3 unsuited only has a 31.62% chance of winning at the end, it is still played as a bet, as it may result in a high winner in the form of a sequence.

The exact grouping of the pre-flop strategies is provided in Figure 10. The hands have been explained in plain English, followed by Figure 11, which is a modified form of Figure 9, and shows the grouping explicitly over the all the initial hands in the game.

Figure 10. Grouping of Pre-Flop Strategies

If possible Raise, otherwise Bet Bet If possible Check, otherwise Fold

Everything Else High Pair (i.e. 9-9 or higher)

(36)

Figure 11. A modified version of Figure 8 to show grouping of initial hands

It may seem from the Figure 11 that the majority of the hands are bet, with a comparatively smaller set being checked or folded. However, this is not the case. Each example of a suited cell in the table above can exist in four different forms, i.e. through the four suits of the game. On the other hand, each non-suited cell, exists 12 times in the actual game, through all non-suited combinations of the four suits.

4.3.4 - Grouping form of strategy

Figure 12 provides the rest of the betting strategies used to group hands in later stages of the game. Post-Flop refers to the betting hand right after the Flop, and similarly with Post-Turn and Post-River. Post-River is also the final betting round of the game.

Figure 12. Buckets of betting strategy at various stages

A K Q J 10 9 8 7 6 5 4 3 2 A 88.68 69.95 68.92 67 66.62 65.35 63.48 64.26 61.62 60.43 59.8 59.6 58.61 K 66.15 81.88 63.25 62.57 62.49 61.17 57.81 57.09 55.86 56.02 54.54 53.43 53.1 Key : Raise Q 66.5 61.59 80.92 60.36 58.46 57.54 55.16 53.88 53.52 53.73 52.26 50.27 50.3 Bet J 65.87 61.19 58.66 77.52 57.23 55.11 53.76 53.41 50.69 50.6 49.33 48.45 48.67 Check T 64.01 59.68 56.48 54.48 75 53.85 51.75 50.77 49.41 47.25 46.83 45.35 46.32 A – Ace 9 62.3 57.34 53.97 53.57 51.23 71.88 51.16 50 46.75 45.72 43.92 43.67 42.66 K – King 8 61.68 54.78 51.99 51.49 49.96 47.96 68.04 48.39 46.35 44.42 42.95 41.52 41.78 Q- Queen 7 61.15 54.31 52.83 49.52 48.08 46.65 44.32 65.76 45.18 42.73 40.95 39.87 39.52 J – Jack 6 60.6 54.18 50.25 48.77 45.09 45.7 44.23 42.77 63.33 43.08 41.46 39.7 37.9 T – 10 5 58.95 53.82 50.91 47.29 45.34 41.91 41.36 39.88 38.07 59.8 41.66 38.73 36.7 4 57.67 52.13 48.35 46.99 43.48 39.83 39.48 38.37 37.61 37.68 56.5 38.52 35.94 3 54.88 50.41 48.15 45.09 42.68 41.05 37.74 37.08 36.32 37.22 36.28 52.95 35.96 2 54.72 50.26 46.41 44.79 42.22 39.78 36.22 35.52 33.6 35.66 32.35 31.62 49.83

If possible Raise, else Bet Bet If possible Check, else Fold

Post – Flop Everything Else

Post-Turn Everything Else

Post – River Any winning pattern better than a low pair Low pair and High Cards Everything Else Flush, or one card away from it

Seq or 1 away from a High Seq (more than 9) Three or Four of a kind, or Full House Two pair, with at least one of the no. in hand

High Pair (i.e. 10-10 or higher)

2 cards away from Flush or High Seq One away from a Seq

All other pairs Pairs in board with high cards in

Both cards higher than a 10 Three or Four of a kind, or Full House

Flush, Seq, or High Pair Two pair with both numbers in hand

One away from Flush or Seq Pairs in Board, or just one in hand

(37)

As can be seen from the figure, the player continues with its optimistic policy, whereby any chance of a flush or sequence is not abandoned. The headings of the columns of Figures 10 and 12 also show the exact strategies being utilised by the player. For example, it is not always possible to raise in a situation, thus in that case, the player decides to bet, and similarly when the player is unable to check, it folds.

4.3.5 - Similarities and differences from human beings

There are many similarities between the workings of Anki – V1 and that of a human being. One of the major ones is that of Bucketing. Human players tend to form pre-defined groups in their minds that allow them to act in face of familiar situations. For example, a player with K-Q suited would probably behave similar to when given the cards J-Q suited.

There are only a maximum of three betting choices available to a player at any given time, and a human player is required to play to the merit of the cards. Pre-Flop strategies are usually quite strict with intermediate or beginner level players. Lower-level human players decide beforehand about which kind of cards they would play with, and which ones they would usually fold. This method is similar to one being utilised in Anki – V1.

Post-Flop bucketing resembles human play even more in Anki – V1, whereby the player works by matching the best patterns that they can find. Human players also tend to look for flushes, sequences, etc., before looking at the exact cards or suits available.

Apart from the expressed similarities, there are also a number of human features with which Anki – V1 differs. Human players do not have the capability to create a probability table similar to the one shown in Figure 9, instead, they rely on instinct and experience. Anki – V1 can benefit from its higher computation power to allow such a table to tone it's betting strategies. A definite advantage that humans hold over Anki – V1 is that of opponent modeling and the randomisation of their strategies to some extent. This plays a great part in the final result of its Anki – V1's play against humans, more of which is expressed in Chapter 5, Testing and Evaluation.

(38)

handled very well in Anki – V1, the introduction of randomised betting strategy is required. This leads to the creation of Anki – V2.

4.4 - Anki – V2

Figure 13. Anki - V2's Architectural Overview

Figure 13 shows the emphasis on the creation of a randomised betting strategy, along with the ability to tone the randomisation through methods such as adjusting the Tightness Threshold, etc. More on the player is discussed below.

4.4.1 - A Randomised Rational Strategy Player

Following the expert-rule based approach of Anki – V1, a player was required that

followed a more Simulation and Enumeration form of strategy build-up. It is obvious that

an AI player has higher computation power than that of a normal human player, thus the

computer needs to given the opportunity to utilise this power to better itself at Poker.

(39)

Simulation and Enumeration allows real-time strategy build up, and limited reaction to

opponent's responses in a game. This new player is called Anki – V2. It primarily uses the

concept of Probability Triples.

“ A probability triple is an ordered triple of values, PT = [f ,c ,r ], such that f + c + r = 1.0, representing the probability distribution that the next betting action in a given context is a fold (check), call (bet), or raise, respectively.” [13]

The probability triples allow the program to create a controlled randomised strategy over a betting round, as compared to Anki – V1, which worked with a strict strategy over the whole round. Each of the numbers in the Triple represent the individual probability of a certain action. A significant difference between the quotation given above and the implementation in this project lies in the range of the numbers used. Poki and Loki used a range between 0.0 – 1.0 to express the Triples, whereas this thesis utilises all the real numbers between 0.0 and 100.0. It makes no major change to the expressive power of the program, but allows a percentage output for each of the Triple values.

4.4.2 - Statistical Method vs. Random Generator

The exact working of the Anki – V2 is explained in sequence below, this provides an overview of the entire Figure 13 seen previously :

1. Like Anki – V1 for the initial hand, Anki – V2 also plays simulated games with randomised values for the flop, turn, river and opponent hands. In the end, it receives a winning percentage. The games played are called 'pseudo games' and the winning percentage is called WP or Winning potential.

2. Using this WP, and some pre-defined formulas, the player creates a Probability Triple for the next betting round.

3. During the course of the betting round, at every decision point, a random real number between 0 and 100 is generated which is compared against the probabilities of the Probability Triples, to decide on the betting action.

(40)

5. At the end of the betting round, when the next set of community cards are shown, the pseudo games are played again, by including information that is now available. For example, if the flop and turn have been revealed, Steps 1 through 4 are repeated, but Step 1 only generates random rivers and opponent hands, and uses the flop and turn

information to get a more exact WP.

The exact formulas and values used to calculate all the data mentioned above is given in Section 4.4.3. However, it is important to understand the need for this simulation method against a more mathematical statistical model.

One of the major drawbacks of simulation is that it is essentially a form of approximation. Experts have documented and proven that luck or a strange coincidence of events can effect a hand's performance to make it seem better or worse than its actual value [10]. This phenomenon can show its effects for a couple of thousand hands at a time. For this reason, even though extensive simulation can be considered to be a very good approximation, it is always exactly that, and can unknowingly contain high levels of noise. The other option that can be considered is that of statistics.

A statistical method to find the Winning Potential of a given hand would consist of finding all the scenarios under which the current hand is stronger than an opponents'. This is clearly a more exact method of hand evaluation. Also, there is definite possibilities of this method in the final betting round, when the final state or pattern of a player is known. The amount of unknown information is quite scarce, i.e. only the opponent's two-card hand. Thus an exact statistical model of hands better than the players' can be generated.

This method however has a near exponential blow-out when the first betting round is considered. The number of possibilities of future cards have been discussed earlier, i.e. 6.74 * 1011 possible

hands. To group these hands in terms of those which are better, equivalent or worse is clearly a mammoth task, the kind for which there is neither computational power nor time. This is further proven by the fact that no computer Poker player created till date has ever tried to obtain the exact statistical evaluation of a game state.

(41)

Another major reason for the choice of simulation is that fact that it mimics human play. It offers both optimistic or pessimistic viewpoints at times, quite like another human player. This allows a more random strategy than what a strict statistical model would provide.

4.4.3 - Formulas and Evaluation

There are a variety of formulas and numbers that have been utilised in the formation of Anki – V2. These features need further explanation, both for their function in the program and their justification. The exact formation of the Probability Triple is also explained, and so is the working of the betting action evaluator.

4.4.3.1 - Pseudo games and Winning Potential

Each pre-betting round evaluation begins with the simulation of 1000 pseudo games, at the end of which the number of games that the player won, drew or lost are reported back. These numbers are then used to determine a Winning Potential. The WP is then adjusted using enumeration, which is explained in Section 4.4.3.3. And finally, the WP is converted into a Probability Triple. Figure 14 shows a part of the program code which relates to this exact sequence of work, along with a representation of the formula used to calculate WP from the data of pseudo games.

Figure 14. Sequence of evaluation of Winning Potential

The 'X' written in the final line of the Prolog code is the Probability Triple that is generated. The method 'assign_str' is explained in Section 4.4.3.2.

The first justification is regarding the 1000 pseudo games being played. This is due to the time constraints specified by Darse Billings that a player should not ponder over a decision for more than two seconds. And with the provided computational power, 1000 pseudo games were found to require 1 – 2 seconds of computation time. Higher computation power would allow more pseudo

play_pseudo_game(...),

WP is ((W + (D / 2)) / (W + D + L)) * 100, WP1 is WP + N,

assign_str(WP1, X), ... Winning Potential=

WinDraw

2

WinDrawLose ×100

(42)

Another formula created by the author was the regarding the finding of the Winning Potential. Previous papers explain Positive Potential as Winning probability over Total number of games. Drawn games are not discussed, and definitely need to be addressed, especially in the case of a smaller number of pseudo games such as this. The decision was taken to give half importance to drawn games, as they offer half the return of a normal winning game.

4.4.3.2 - Calculation of Probability Triples

Winning Potential is converted to Probability triples using a method 'assign_str'. This method uses the value of the Winning Potential to return percentage values for the next betting round. There are three sub-sets that have been introduced in the method:

1. When the WP is less than or equal to 50 : The chances of raising are quite low. In addition to this, at 0 WP, the chances of folding are maximum, with a low betting potential. As the WP increases, so do the chances of betting.

2. When the WP is less than or equal to 75 : Chances of raising are higher, with betting getting a major bonus, and checking reducing significantly.

3. When the WP is greater than 75 : Checking has a low probability, with the chance of raising slowly increasing and betting reaching a moderate level.

Figure 15 shows the formulas used to portray the above three points. More explanation regarding the numbers follows the figure.

Figure 15. Formulas used to calculate Probability Triples, given WinPot

The formulas arise by firstly looking at the most basic premises, i.e. the Probability Triples required at the crucial points of 0, 50 and 100% probability. It is obvious that at no point should the value of any of these be 0, as that would lead to a strict strategy, and would thus be

recognisable by humans. It was decided that the value of betting or raising should not fall below

Check Bet Raise

<= 50 10 <= 75 20 > 75 15 90 – WinPot WinPot – 10 105 – WinPot WinPot – 20 WinPot 2 10 80−WinPot 2

(43)

10%, as this allows substantial bluffing power to the player even in the case of bad hands. As a result, the probability triple for 0% WP becomes [80,10,10] in the form [Check, Bet, Raise].

Human players can be found to change their strategy rather dramatically with knowledge of the current hand having more than 50% or 75% Winning Potential. This strategy change has been dulled to a large extent in the program, to allow a more gradual change relating to the exact Winning Potential.

The hand with an exact 50% WP can be seen to have the Triple [45, 45, 10], with a jump to [40, 40, 20] for a hand with a number just above 50 but tending to it. From this point on, the checking power falls at a greater speed, while the betting power builds up.

The final change occurs at 75%WP, at which point the need for a greater raise probability is introduced. Exact 75% WP has a Triple [15, 65, 20] and a WP just over 70, but tending to it, has a Triple [15, 30, 55]. This may seem like a dramatic jump, but in practice it is quite mellow due to the inability to raise in most situations. In these situations, the player simply bets, thereby reducing the apparent jump in probability. This is discussed further in the next subsection.

4.4.3.3 – Betting Strategies, Randomised Numbers and Enumeration

This section deals with the happening inside a betting round. The Anki – V2 Player has a Probability Triple available to it, guiding its future moves, however, these moves still need to be monitored and executed.

“The choice (of betting action) can be made by generating a random number, allowing the program to vary its play, even in identical situations. This is analogous to a mixed strategy in game theory, but the probability triple implicitly contains contextual information resulting in better informed decisions which, on average, can outperform a game theoretic approach.” [13]

Figure 16 provides a Prolog code excerpt from the actual program that details the working of Anki – V2's betting turn. The random number generated is a real number between 0 and 100. That

References

Related documents

commissioner shall ensure that a Response to Intervention framework implemented by a school district includes, at a minimum, the following elements: high quality research-

The Researcher concluded that due to real change in their attitudes, the Branch Manager and key staff members at Ilkeston branch were most likely to continually adopt

O debate sobre formas institucionais de cooperação entre níveis federativas o eleva a nível constitucional e de mudanças de determinações constitucionais onde estão se

Together with our BDO Member Firms in the Greater China Region, we provide a comprehensive range of services to assist you through your planning and implementation stages to

Traàn Quang Tieán [email protected] Nguyeãn Maïnh Huøng [email protected] Phaïm Thò Thu Thuyû [email protected] Voõ Vaên Minh [email protected] Ñoã Phuù

Data R is expected to con- tain contributions from point sources below the detection limit of our image, diffuse sources, system noise, limitations in our imaging and source

ARMSTRONG ATLANTIC STATE UNIVERSITY 14 NORTH GEORGIA COLLEGE &amp; STATE UNIVERSITY 13. GEORGIA SOUTHERN

The wall thickness designations &#34;Standard&#34;, &#34;Extra·Strong&#34;, and Double Extra·Strong&#34; have been commercially used designations for many years. As ex·