Optimal decisions MiniMax Algorithm α-β pruning Imperfect, real-time decisions

(1)

Lecture 06

(2)

Outline

• Optimal decisions

• MiniMax Algorithm

• α-β pruning

(3)

Games vs. search problems

(4)

Game Playing (1)

We will be concerned with games that, like chess, have the following characteristics:

• Two players

• Turn-taking

• Deterministic

The same move in the same situation always has the same effect.

• Perfect information

(5)

Two-Agent Games

• Two-agent, perfect information, zero-sum games

• Two agents move in turn until either one of them wins or the result is a draw.

• Each player has a complete model of

(6)

Minimax Procedure (1)

• Two players: MAX and MIN

• Task: find a “best” move for MAX

• Assume that MAX moves first, and that the two players move alternately.

• MAX node

– nodes at even-numbered depths correspond to positions in which it is MAX’s move next

• MIN node

(7)

Game tree (2-player, deterministic, turns)

(8)

Minimax

• Perfect play for deterministic games

• Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play
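The idea above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the nested-list tree representation, the function names `minimax` and `best_move`, and the leaf values are all assumptions made for the example.

```python
def minimax(node, is_max):
    """Return the minimax value of `node`.

    Assumed representation: a node is either a number (the utility of a
    terminal position, from MAX's point of view) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # terminal position: its utility is its value
    values = [minimax(child, not is_max) for child in node]
    # MAX picks the largest backed-up value, MIN the smallest.
    return max(values) if is_max else min(values)


def best_move(root):
    """Index of the root move with the highest minimax value (MAX to move)."""
    values = [minimax(child, False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


# Illustrative two-ply tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))  # 3
print(best_move(tree))      # 0 (the first move)
```

The recursion visits every leaf, which is what makes the time cost grow exponentially with depth.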

(9)

An Example

Compute the backed-up values calculated by the minimax algorithm. Show your answer by writing values at the appropriate nodes in the above tree.

(10)

(11)

Properties of minimax

• Complete? Yes (if tree is finite)

• Optimal? Yes (against an optimal opponent)

• Time complexity? O(b^m) (b = number of legal moves; m = maximum tree depth)

• Space complexity? O(bm) (depth-first exploration)

For chess, b ≈ 35, m ≈ 100 for "reasonable" games
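To get a feel for these figures, the short calculation below plugs the chess numbers into the two bounds. It is only a back-of-the-envelope sketch; the variable names are chosen for the example.

```python
import math

b, m = 35, 100                  # branching factor and depth quoted for chess
nodes = b ** m                  # exact value; Python integers do not overflow
exponent = m * math.log10(b)    # log10 of b^m

print(f"b^m is about 10^{exponent:.1f}")                     # 10^154.4
print(f"depth-first storage is on the order of {b * m} nodes")  # 3500
```

The time bound is astronomically large while the space bound stays small, which is why the search has to be cut short rather than stored more cleverly.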

(12)

Example : Tic-Tac-Toe (1)

• MAX marks crosses; MIN marks circles

• It is MAX’s turn to play first.

– With a depth bound of 2, conduct a breadth-first search

– evaluation function e(p) of a position p

• If p is not a winning position for either player, e(p) = (no. of complete rows, columns, or diagonals that are still open for MAX) - (no. of complete rows, columns, or diagonals that are still open for MIN)

• If p is a win for MAX, e(p) =
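The open-lines evaluation for non-winning positions can be sketched as follows. The board encoding (a 3x3 list of lists with "X" for MAX, "O" for MIN and " " for empty) and the names `LINES`, `open_lines` and `e` are assumptions for this example; the sketch deliberately does not handle won positions.

```python
# The eight lines of a 3x3 board, each as three (row, col) cells.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]                    # rows
    + [[(r, c) for r in range(3)] for c in range(3)]                  # columns
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]    # diagonals
)


def open_lines(board, opponent):
    """Count lines containing no mark of `opponent`, i.e. still open for the other player."""
    return sum(all(board[r][c] != opponent for r, c in line) for line in LINES)


def e(board):
    """Lines still open for MAX ('X') minus lines still open for MIN ('O')."""
    return open_lines(board, "O") - open_lines(board, "X")


# MAX has played the centre; all 8 lines are open for MAX, 4 for MIN.
centre = [[" ", " ", " "],
          [" ", "X", " "],
          [" ", " ", " "]]
print(e(centre))  # 4
```

A line counts as open for a player as long as the opponent has no mark on it, so an empty board scores 8 - 8 = 0.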

(13)


(14)

Question?

• How to improve search efficiency?

• It is possible to

(15)

Alpha-Beta Pruning

• It is possible to compute the correct minimax decision without looking at every node in the game tree.

• Alpha-beta pruning allows us to eliminate large parts of the tree from consideration without influencing the final decision.

• Alpha-beta pruning gets its name from two parameters.

– They describe bounds on the values that appear anywhere along the path under consideration:

• α = the value of the best (i.e., highest value) choice found so far along the path for MAX

(16)



• The leaves below B have the values 3, 12 and 8.

• The value of B is exactly 3.

• It can be inferred that the value at the root is at least 3, because MAX has a choice worth 3.

(17)



• C, which is a MIN node, has a value of at most 2.

• But B is worth 3, so MAX would never choose C.

• Therefore, there is no point in looking at the other successors of C.

(18)

• D, which is a MIN node, is worth at most 14.

• This is still higher than MAX’s best alternative (i.e., 3), so D’s other successors are explored.

(19)

• The second successor of D is worth 5, so the exploration continues.

(20)



• The third successor is worth 2, so now D is worth exactly 2.

• MAX’s decision at the root is to move to B, giving a value of 3.
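The walk-through over B, C and D can be replayed with a small alpha-beta routine. This is a sketch under stated assumptions: the nested-list tree representation and the `seen` list are illustrative, and the second and third leaves of C (4 and 6) are placeholder values, since the slides only give C's first leaf.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, seen=None):
    """Minimax value of `node` computed with alpha-beta pruning.

    A node is a number (leaf utility) or a list of children. Every leaf
    that is actually evaluated is appended to `seen`, if a list is given.
    """
    if not isinstance(node, list):
        if seen is not None:
            seen.append(node)
        return node
    value = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta(child, not is_max, alpha, beta, seen)
        if is_max:
            value = max(value, child_value)
            alpha = max(alpha, value)   # best choice so far for MAX
        else:
            value = min(value, child_value)
            beta = min(beta, value)     # best choice so far for MIN
        if alpha >= beta:
            break                       # remaining siblings cannot matter
    return value


B = [3, 12, 8]
C = [2, 4, 6]    # 4 and 6 are placeholders; they are never examined
D = [14, 5, 2]
seen = []
root_value = alphabeta([B, C, D], True, seen=seen)
print(root_value)  # 3
print(seen)        # [3, 12, 8, 2, 14, 5, 2]
```

The recorded leaves show that C is abandoned after its first successor, while all three successors of D are examined because the first two stay above MAX's best alternative of 3.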

(21)

Properties of α-β



• Pruning does not affect final result

• Good move ordering improves effectiveness of pruning

• With "perfect ordering," time complexity = O(b^(m/2)), which doubles the depth of search

• A simple example of the value of reasoning about which computations are relevant (a form of
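The link between the perfect-ordering bound and a doubled search depth can be made explicit with a short derivation. Here N is an assumed symbol for the number of nodes that can be examined in the available time, d is the depth plain minimax reaches with that budget, and d' is the depth reached by alpha-beta with perfect ordering.

```latex
\[
  b^{m/2} = \bigl(\sqrt{b}\bigr)^{m}
  \quad\text{(the effective branching factor drops from } b \text{ to } \sqrt{b}\text{)}
\]
\[
  b^{d} = N \;\Rightarrow\; d = \log_b N,
  \qquad
  b^{d'/2} = N \;\Rightarrow\; d' = 2\log_b N = 2d .
\]
```

For b ≈ 35 the effective branching factor is roughly 6.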

(22)



• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max

• If v is worse than α, max will avoid it: prune that branch

• Define β similarly for min

(23)

(24)

(25)

Resource limits

Suppose we have 100 secs and explore 10^4 nodes/sec, which gives 10^6 nodes per move

Standard approach:

• cutoff test: e.g., depth limit (perhaps add quiescence search)

• evaluation function
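A depth-limited search that combines the two ingredients might look like the sketch below. Everything in it is an assumption for illustration: each node is a pair `(estimate, children)`, the stored estimate stands in for an evaluation function, and terminal nodes have no children. Quiescence search is not modelled.

```python
def h_minimax(node, depth_limit, is_max, depth=0):
    """Minimax with a cutoff test and a (stored) heuristic evaluation."""
    estimate, children = node
    # Cutoff test: stop at terminal positions or once the depth limit is hit.
    if not children or depth >= depth_limit:
        return estimate  # evaluation function: here simply the stored estimate
    values = [h_minimax(c, depth_limit, not is_max, depth + 1) for c in children]
    return max(values) if is_max else min(values)


def leaf(utility):
    """Terminal node: its estimate is its true utility."""
    return (utility, [])


# Internal estimates (4, 5, 1) are made-up heuristic guesses.
tree = (0, [
    (4, [leaf(3), leaf(12), leaf(8)]),
    (5, [leaf(2), leaf(4), leaf(6)]),
    (1, [leaf(14), leaf(5), leaf(2)]),
])

print(h_minimax(tree, 1, True))  # 5: cut off after one ply, trusts the estimates
print(h_minimax(tree, 2, True))  # 3: deep enough to reach the true utilities
```

With the shallow cutoff the search prefers the second move because its estimate is highest; the deeper search shows the first move is actually best, so the quality of the result depends on the quality of the evaluation function.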

(26)

Games That Include an Element of Chance

(27)

(28)

• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. Checkers is now solved.

• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

(29)

• Othello: human champions refuse to compete against computers, which are too good.

• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.

(30)

• A game is defined by the initial state (how the board is set up), the legal actions in each state, the result of each action, a terminal test (which says when the game is over), and a utility function that applies to terminal states.

• In two-player zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first enumeration of the game tree.

• The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much greater efficiency by eliminating subtrees that are provably irrelevant.

(31)

• Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state.

• Games of chance can be handled by an extension to the minimax algorithm that evaluates a chance node by taking the average utility of all its children, weighted by the probability of each child.
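The chance-node rule in the last point can be sketched as follows. The tagged-tuple node encoding (`"leaf"`, `"max"`, `"min"`, `"chance"`), the function name `expectiminimax`, and the example probabilities are assumptions made for this illustration.

```python
def expectiminimax(node):
    """Value of a game tree that may contain chance nodes.

    Assumed encoding:
      ("leaf", utility)
      ("max", [children])  /  ("min", [children])
      ("chance", [(probability, child), ...])  with probabilities summing to 1
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(c) for c in payload)
    if kind == "min":
        return min(expectiminimax(c) for c in payload)
    if kind == "chance":
        # Average of the children's values, weighted by their probabilities.
        return sum(p * expectiminimax(c) for p, c in payload)
    raise ValueError(f"unknown node kind: {kind!r}")


def min_of(*utilities):
    """Helper: a MIN node over the given leaf utilities."""
    return ("min", [("leaf", u) for u in utilities])


move_a = ("chance", [(0.5, min_of(2, 4)), (0.5, min_of(6, 8))])    # 0.5*2 + 0.5*6
move_b = ("chance", [(0.9, min_of(1, 3)), (0.1, min_of(20, 30))])  # 0.9*1 + 0.1*20
root = ("max", [move_a, move_b])

print(expectiminimax(root))  # 4.0
```

MAX prefers the first move even though the second has the larger best-case outcome, because the unlikely branch contributes little to the weighted average.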

(32)

Alpha Beta Pruning (Exercise)

[Figure: game tree with alternating MAX and MIN levels. Leaf values, left to right: 8 7 3 9 1 6 2 4 1 1 3 5 3 9 2 6 5 2 1 2 3 9 7 2 8 6 4]

(33)

Alpha Beta Pruning (Exercise)

[Figure: the exercise tree (levels MAX, MIN, MAX; leaf values 8 7 3 9 1 6 2 4 1 1 3 5 3 9 2 6 5 2 1 2 3 9 7 2 8 6 4) annotated with the order in which nodes are examined (steps 1 to 39) and the backed-up values. The value at the root is 5.]

(34)

1. Differentiate between games and search problems.

2. Explain the adversarial search strategy.

3. Describe the working of the minimax algorithm.

4. Write down the properties of the minimax algorithm.

5. Describe the working of the alpha-beta pruning algorithm.

6. Write down the properties of the alpha-beta pruning algorithm.
