
2.3 The Anatomy of Chess Playing Programs

2.3.1 The Minimax Algorithm

The Minimax theorem, first proven by John von Neumann in 1928 [113, 114], is a fundamental result in game theory. It states that in two-person, zero-sum games with a finite action space, there exists a value v such that:

1. the first player can guarantee himself a payoff of v, irrespective of the second player’s strategy, and

2. the second player can guarantee himself a payoff of −v, irrespective of the first player’s strategy.

While both players strive to maximize their own payoff (or equivalently, minimize their opponent’s), it is traditional to label one player as the maximizing player (henceforth, Max) and the other as the minimizing player (Min). The value v is referred to as the Minimax value of the game, and represents the payoff that would be awarded to the victor of the game (and the penalty imposed on the loser) if both players were to act optimally. This outcome is also referred to as the Nash Equilibrium, a scenario in which neither player stands to gain anything by unilaterally changing their strategy [71].
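For matrix games, the theorem is commonly written as an equality between two optimization problems. The statement below is a sketch under notation that is assumed here and does not appear in the text: A is the m × n matrix of payoffs to the first player, x and y are mixed strategies (probability vectors) for the first and second player, and Δ_k denotes the set of probability vectors of length k.

```latex
\max_{x \in \Delta_m} \, \min_{y \in \Delta_n} \; x^{\top} A \, y
\;=\;
\min_{y \in \Delta_n} \, \max_{x \in \Delta_m} \; x^{\top} A \, y
\;=\; v
```

The left-hand side is the payoff the first player can guarantee regardless of the second player's strategy. The right-hand side is the least payoff to which the second player can hold the first.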

In games like Chess where players alternate turns, the optimal first action for the player on move from a given position s (indeed, the entire sequence of optimal moves by both players from s that leads to the equilibrium outcome) can be computed recursively using the Minimax algorithm [85]. Intuitively, the algorithm determines the best action for the player on move at the initial position (assumed to be Max) by a process of reasoning that goes as follows: “What move can Max initially make, such that regardless of Min’s response, Max can make a counter-move, such that regardless of Min’s response to that counter-move, . . ., such that in the end, Max’s payoff is maximized?” This is accomplished by expanding the entire game tree in a depth-first fashion, starting at position s. The outcome of each possible line of play, which is given by the value of the terminal node reached (for example, −1 for a win for Min, +1 for a win for Max and 0 for a draw), is then backed up all the way to the root node, while assuming that each player picks the move most beneficial to them at every step. The move leading to the position with the best backed-up value (i.e., the Minimax value) is then the optimal move at position s.

Unfortunately, the number of positions examined by this procedure grows exponentially with the length of the game; in a game with a constant branching factor of b that lasts d plies, this naïve algorithm will expand b^d nodes, which is impractical for all but the most trivial games. In games like Chess, Minimax searches are thus truncated at some maximum depth and a heuristic evaluation function is applied at the leaf nodes of this search tree. These leaf value estimates are treated as if they were “true” Minimax values and are backed up as before, to guide decision-making at the root node. The pseudocode for this depth-bounded search procedure is shown in Algorithm 2.1. The procedure is invoked by supplying the starting state, the side on move (Max or Min) and a maximum lookahead depth (depthToGo). Figure 2.1 demonstrates the outcome of this procedure when applied to a mid-game position in Tic-Tac-Toe.

Algorithm 2.1: The Minimax algorithm

    procedure MINIMAX(state, depthToGo, side)
        if state is terminal or depthToGo == 0 then
            return EVALUATE(state)
        end if
        if side == Max then
            score ← −∞
            for each successor s′ of state do
                eval ← MINIMAX(s′, depthToGo − 1, Min)
                score ← max(score, eval)
            end for
        else
            score ← ∞
            for each successor s′ of state do
                eval ← MINIMAX(s′, depthToGo − 1, Max)
                score ← min(score, eval)
            end for
        end if
        return score
    end procedure
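Algorithm 2.1 can be transcribed almost line for line into Python. The sketch below applies it to Tic-Tac-Toe under assumptions that are not part of the text: the board is a 9-character string with "." for an empty cell, X plays Max and O plays Min, and the helper names (`winner`, `is_terminal`, `successors`, `evaluate`) are illustrative. The `evaluate` function returns the true outcome at terminal positions and uses 0 as a placeholder heuristic when the depth limit cuts the search short.

```python
import math
from typing import List, Optional

MAX, MIN = "X", "O"  # assumption: X is the maximizing player, O the minimizing one

# The eight winning lines of a 3x3 board, as index triples into the board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(board: str) -> bool:
    """A position is terminal if someone has won or the board is full."""
    return winner(board) is not None or "." not in board


def evaluate(board: str) -> float:
    """+1 for a Max win, -1 for a Min win, 0 otherwise.

    The 0 covers both true draws and non-terminal positions reached at the
    depth limit (a placeholder heuristic, chosen only for illustration).
    """
    w = winner(board)
    if w == MAX:
        return 1.0
    if w == MIN:
        return -1.0
    return 0.0


def successors(board: str, side: str) -> List[str]:
    """All positions reachable by `side` placing one mark on an empty cell."""
    return [board[:i] + side + board[i + 1:]
            for i, cell in enumerate(board) if cell == "."]


def minimax(board: str, depth_to_go: int, side: str) -> float:
    """Depth-bounded Minimax value of `board` with `side` on move."""
    if is_terminal(board) or depth_to_go == 0:
        return evaluate(board)
    if side == MAX:
        score = -math.inf
        for child in successors(board, side):
            score = max(score, minimax(child, depth_to_go - 1, MIN))
    else:
        score = math.inf
        for child in successors(board, side):
            score = min(score, minimax(child, depth_to_go - 1, MAX))
    return score


if __name__ == "__main__":
    # X (Max) to move with two in a row on the top rank: X can win at once.
    print(minimax("XX.OO....", 9, MAX))  # 1.0
```

A depth of 9 is enough to reach every terminal position in Tic-Tac-Toe, so the value returned there is the true Minimax value; with a smaller `depth_to_go` the result depends on the placeholder heuristic.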

Figure 2.1: Tree labeling produced by executing Algorithm 2.1 on the given Tic-Tac-Toe position. The order of edge expansion can be inferred from the alphabetic ordering of edge labels.

It is unsurprising then that the performance of practical game-playing programs is heavily dependent on how well the heuristic function approximates the true utility of a state. Given that modern Chess engines typically examine many millions of positions to make a single move, this heuristic function must also be fast to compute. A typical approach to heuristic construction is to first calculate features of the state description; in Chess, these may include the material imbalance, whether castling has occurred, whether pawns are blocked and so on. The individual contributions of these features are then combined in a weighted fashion to produce an overall score for a given position. Unfortunately, this design process is more of an art than an exact science, requiring heavy input from domain experts. Deep Blue, for example, used about 8000 features in its evaluation function [24], with the weights requiring extensive hand-tuning. Automatic methods for tuning weights have produced some promising results: variants of TD-learning, employed in the programs KnightCap [11] and Meep [112], and genetic algorithms, used in Blondie25 [37], have produced programs that play Chess at the master level. However, these are still no match for the hand-crafted heuristics used in the top-of-the-line programs. Modern programs also supplement the stock heuristic function with opening books and endgame databases. These help with planning in the early and late stages of the game, where static evaluation of positions is harder, but a large body of knowledge is readily available thanks to centuries of intensive human study [64].
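The weighted combination of features described above can be sketched as a simple linear function. Everything specific in this example is hypothetical: the feature names, the weight values, and the convention that each feature is measured as Max's count minus Min's count are chosen for illustration and are not taken from any of the programs cited.

```python
from typing import Dict

# Hypothetical weights, in arbitrary units (not taken from any real engine).
# Positive weights reward Max; the negative weight penalizes blocked pawns.
WEIGHTS: Dict[str, float] = {
    "material": 1.0,        # material imbalance (Max minus Min)
    "castled": 0.5,         # 1 if only Max has castled, -1 if only Min, else 0
    "blocked_pawns": -0.25  # Max's blocked pawns minus Min's blocked pawns
}


def evaluate(features: Dict[str, float],
             weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear evaluation: the weighted sum of the supplied feature values."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    # Max is two units of material up, has castled, and has two more
    # blocked pawns than Min: 1.0*2 + 0.5*1 - 0.25*2 = 2.0
    position = {"material": 2.0, "castled": 1.0, "blocked_pawns": 2.0}
    print(evaluate(position))  # 2.0
```

A function of this shape is cheap to compute, which matters when it is called at every leaf of the search tree. Tuning then reduces to choosing the entries of `WEIGHTS`, whether by hand or by an automatic method.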

In document Understanding Sampling-Based Adversarial Search Methods (Page 30-34)