Optimal decisions MiniMax Algorithm α-β pruning Imperfect, real-time decisions

Academic year: 2020

(1)

Lecture 06

(2)

Outline

Optimal decisions

MiniMax Algorithm

α-β pruning

(3)

Games vs. search problems

(4)

Games Playing (1)

We will be concerned with games that, like chess, have the following characteristics:

• Two players

• Turn-taking

• Deterministic: the same move in the same situation always has the same effect.

• Perfect information

(5)

Two-Agent Games

Two-agent, perfect information, zero-sum games

Two agents move in turn until either one of them wins or the result is a draw.

Each player has a complete model of

(6)

Minimax Procedure (1)

Two players: MAX and MIN

Task: find a “best” move for MAX

Assume that MAX moves first, and that the two players move alternately.

MAX node: nodes at even-numbered depths correspond to positions in which it is MAX’s move next

MIN node: nodes at odd-numbered depths correspond to positions in which it is MIN’s move next

(7)

Game tree (2-player, deterministic, turns)

(8)

Minimax

Perfect play for deterministic games

Idea: choose the move to the position with the highest minimax value = best achievable payoff against best play

(9)

An Example

Compute the backed-up values calculated by the minimax algorithm. Show your answer by writing values at the appropriate nodes in the above tree.

(10)
(11)

Properties of minimax

Complete? Yes (if the tree is finite)

Optimal? Yes (against an optimal opponent)

Time complexity? O(b^m) (b = number of legal moves; m = maximum tree depth)

Space complexity? O(bm) (depth-first exploration)

For chess, b ≈ 35, m ≈ 100 for "reasonable" games
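To get a feel for these bounds, the chess figures can be plugged in directly. The sketch below assumes a uniform tree (exactly b moves in every position), which is a simplification; the function names are hypothetical.

```python
def leaf_count(b, m):
    """Leaves of a uniform game tree with b legal moves per position and depth m."""
    return b ** m


def depth_first_memory(b, m):
    """Nodes held in memory by depth-first exploration, on the order of b * m."""
    return b * m


chess_leaves = leaf_count(35, 100)
# Python integers are exact, so the number of digits gives the order of magnitude.
chess_magnitude = len(str(chess_leaves)) - 1

print(f"time:  about 10^{chess_magnitude} leaves")
print(f"space: about {depth_first_memory(35, 100)} nodes")
```

The time bound is on the order of 10^154 leaves, while the space bound stays in the thousands of nodes, which is why an exact minimax search of chess is out of reach even though memory is not the obstacle.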

(12)

Example : Tic-Tac-Toe (1)

MAX marks crosses; MIN marks circles. It is MAX’s turn to play first.

With a depth bound of 2, conduct a breadth-first search.

Evaluation function e(p) of a position p:

If p is not a winning position for either player,

e(p) = (no. of complete rows, columns, or diagonals that are still open for MAX) - (no. of complete rows, columns, or diagonals that are still open for MIN)

If p is a win of MAX, e(p) =

(13)

Evaluation function e(p) of a position p:

If p is not a winning position for either player,

e(p) = (no. of complete rows, columns, or diagonals that are still open for MAX) - (no. of complete rows, columns, or diagonals that are still open for MIN)
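The evaluation function for non-winning positions can be sketched as follows. The board encoding is an assumption made for illustration: a 9-character string read row by row, with "X" for MAX's crosses, "O" for MIN's circles and "." for an empty square. Winning positions are not handled here.

```python
# The 8 complete lines of the board: 3 rows, 3 columns, 2 diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def open_lines(board, opponent):
    """Count lines with no mark of `opponent`, i.e. still open for the other player."""
    return sum(all(board[i] != opponent for i in line) for line in LINES)


def evaluate(board):
    """e(p) for a non-winning position: lines open for MAX minus lines open for MIN."""
    return open_lines(board, "O") - open_lines(board, "X")


# MAX in the centre, MIN in the top-middle square.
print(evaluate(".O..X...."))
```

In that position six lines are still open for MAX and four for MIN, so e(p) = 6 - 4 = 2.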

(14)

Question?

How to improve search efficiency?

It is possible to

(15)

Alpha-Beta Pruning

• It is possible to compute the correct minimax decision without looking at every node in the game tree.

• Alpha-beta pruning makes it possible to eliminate large parts of the tree from consideration without influencing the final decision.

• Alpha-beta pruning gets its name from two parameters.

– They describe bounds on the values that appear anywhere along the path under consideration:

• α = the value of the best (i.e., highest value) choice found so far along the path for MAX

(16)

The leaves below B have the values 3, 12 and 8. The value of B is exactly 3.

It can be inferred that the value at the root is at least 3, because MAX has a choice worth 3.

[Figure: game tree showing node B]

(17)

C, which is a MIN node, has a value of at most 2.

But B is worth 3, so MAX would never choose C.

Therefore, there is no point in looking at the other successors of C.

[Figure: game tree showing nodes B and C]

(18)

D, which is a MIN node, is worth at most 14.

This is still higher than MAX’s best alternative (i.e., 3), so D’s other successors are explored.

[Figure: game tree showing nodes B, C and D]

(19)

The second successor of D is worth 5, so the exploration continues.

[Figure: game tree showing nodes B, C and D]

(20)

The third successor is worth 2, so now D is worth exactly 2.

MAX’s decision at the root is to move to B, giving a value of 3.

[Figure: game tree showing nodes B, C and D]
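The walkthrough on the preceding slides can be replayed with a short alpha-beta sketch. The tree is assumed to be nested lists with MIN nodes B, C and D below the root. The slides give B's leaves (3, 12, 8), C's first leaf (2) and D's leaves (14, 5, 2); the values 4 and 6 used for C's remaining leaves are an assumption, and they are never examined anyway. The `visited` list is a hypothetical helper that records which leaves are looked at.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of `node`, skipping branches that cannot change the decision."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)          # record every leaf that is examined
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)     # best choice found so far for MAX
            if alpha >= beta:
                break                     # MIN would never allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)           # best choice found so far for MIN
        if alpha >= beta:
            break                         # MAX already has a better alternative
    return value


B, C, D = [3, 12, 8], [2, 4, 6], [14, 5, 2]   # 4 and 6 are assumed values
examined = []
root_value = alphabeta([B, C, D], visited=examined)

print(root_value)   # value of the root
print(examined)     # leaves examined, in order
```

The root value comes out as 3, and only seven of the nine leaves are examined: after C's first leaf (2) the rest of C is skipped, exactly as in the slides.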

(21)

Properties of α-β

Pruning

Pruning does not affect the final result

Good move ordering improves the effectiveness of pruning

With "perfect ordering," time complexity = O(b^(m/2))

→ doubles the depth of search

A simple example of the value of reasoning about which computations are relevant (a form of

(22)

α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max

If v is worse than α, max will avoid it → prune that branch

Define β similarly for min
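The pruning rule, and the effect of move ordering on it, can be illustrated by counting how many leaves are examined when the same leaf values are arranged in a favourable and an unfavourable order. Both trees and all names below are hypothetical; the search is a compact alpha-beta over nested lists.

```python
import math


def alphabeta_count(node, alpha, beta, maximizing, counter):
    """Alpha-beta over nested lists; counter[0] accumulates the leaves examined."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta_count(child, alpha, beta, not maximizing, counter)
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:
            break     # the remaining children cannot affect the result
    return value


def leaves_examined(tree):
    """Return (root value, number of leaves examined) with MAX to move at the root."""
    counter = [0]
    value = alphabeta_count(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


# Same leaf values, different ordering of moves.
good_order = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]   # best moves tried first
bad_order = [[6, 4, 2], [14, 5, 2], [12, 8, 3]]    # best moves tried last

print(leaves_examined(good_order))
print(leaves_examined(bad_order))
```

Both orderings give the same root value of 3, but the favourable one examines 5 leaves and the unfavourable one all 9, which matches the point that pruning changes the cost of the search and not its result.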

(23)
(24)
(25)

Resource limits

Suppose we have 100 secs, explore 10^4 nodes/sec → 10^6 nodes per move

Standard approach:

• cutoff test: e.g., depth limit (perhaps add quiescence search)

• evaluation function
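A minimal sketch of this standard approach is shown below: minimax with a depth-limit cutoff test that falls back on an evaluation function. The toy game (a single pile of stones, each player removes 1 to 3, whoever takes the last stone wins) and every name in the block are assumptions chosen for illustration; quiescence search is not included.

```python
def h_minimax(state, depth, maximizing, successors, evaluate, depth_limit):
    """Depth-limited minimax: at the cutoff, use evaluate() instead of searching on."""
    children = successors(state)
    if depth >= depth_limit or not children:       # cutoff test
        return evaluate(state, maximizing)
    values = [
        h_minimax(child, depth + 1, not maximizing, successors, evaluate, depth_limit)
        for child in children
    ]
    return max(values) if maximizing else min(values)


def nim_successors(stones):
    """Positions reachable by removing 1, 2 or 3 stones."""
    return [stones - k for k in (1, 2, 3) if k <= stones]


def nim_evaluate(stones, max_to_move):
    """Exact utility at the end of the game, a neutral estimate of 0 otherwise."""
    if stones == 0:
        # The player to move faces an empty pile: the opponent took the last stone.
        return -1 if max_to_move else 1
    return 0


# Node budget from the slide: 100 seconds at 10^4 nodes per second.
node_budget = 100 * 10 ** 4

print(node_budget)
print(h_minimax(5, 0, True, nim_successors, nim_evaluate, depth_limit=10))
print(h_minimax(5, 0, True, nim_successors, nim_evaluate, depth_limit=1))
```

With a generous depth limit the search reaches the end of the game and reports a win for MAX from a pile of 5; with a depth limit of 1 it stops immediately and can only return the neutral estimate, which shows how much the result depends on the quality of the evaluation function at the cutoff.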

(26)

Games That Include an Element of Chance

(27)
(28)

• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. Checkers is now solved.

• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

(29)

Othello: human champions refuse to compete against computers, who are too good.

Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.

(30)

A game is defined by the initial state (how the board is set up), the legal actions in each state, the result of each action, a terminal test (which says when the game is over), and a utility function that applies to terminal states.
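These five components map naturally onto a small interface. The sketch below uses a hypothetical toy game (one pile of stones, each player removes 1 to 3, whoever takes the last stone wins); the class and method names are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NimState:
    stones: int          # stones left in the pile
    max_to_move: bool    # True when it is MAX's turn


class Nim:
    def __init__(self, stones=5):
        # Initial state: how the "board" is set up, with MAX moving first.
        self.initial_state = NimState(stones, True)

    def actions(self, state):
        """Legal actions in a state: remove 1, 2 or 3 stones."""
        return [k for k in (1, 2, 3) if k <= state.stones]

    def result(self, state, action):
        """Result of an action: fewer stones, and the turn passes to the other player."""
        return NimState(state.stones - action, not state.max_to_move)

    def is_terminal(self, state):
        """Terminal test: the game is over when the pile is empty."""
        return state.stones == 0

    def utility(self, state):
        """Utility of a terminal state for MAX: the player left to move has lost."""
        return -1 if state.max_to_move else 1


game = Nim(5)
after_move = game.result(game.initial_state, 2)
print(game.actions(game.initial_state), after_move, game.is_terminal(after_move))
```

Keeping the game definition separate from the search means the same minimax or alpha-beta code can be reused for any game that provides these five pieces.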

In two-player zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first enumeration of the game tree.

The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much greater efficiency by eliminating subtrees that are provably irrelevant.

(31)

Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state.

Games of chance can be handled by an extension to the minimax algorithm that evaluates a chance node by taking the average utility of all its children, weighted by the probability of each child.
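The chance-node rule can be sketched as a small extension of minimax. The tree encoding is an assumption made for this example: each node is a tuple whose first element is "max", "min", "chance" or "leaf", and a chance node lists (probability, child) pairs. The tree itself is hypothetical.

```python
def expectiminimax(node):
    """Minimax extended with chance nodes that average over their children."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    if kind == "chance":
        # Average utility of the children, weighted by their probabilities.
        return sum(p * expectiminimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")


# MAX chooses between two gambles.
gamble_a = ("chance", [(0.5, ("leaf", 2)), (0.5, ("leaf", 4))])
gamble_b = ("chance", [(0.9, ("leaf", 1)), (0.1, ("leaf", 20))])
tree = ("max", [gamble_a, gamble_b])

print(expectiminimax(gamble_a), expectiminimax(gamble_b), expectiminimax(tree))
```

Here gamble A averages to 3 and gamble B to about 2.9, so MAX prefers A even though B contains the single largest payoff; the decision depends on the size of the utilities and not only on their order.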

(32)

Alpha Beta Pruning (Exercise)

Leaf values, left to right: 8 7 3 9 1 6 2 4 1 1 3 5 3 9 2 6 5 2 1 2 3 9 7 2 8 6 4

[Figure: game tree for the exercise, with levels labeled MAX and MIN]

(33)

Alpha Beta Pruning (Exercise)

Leaf values, left to right: 8 7 3 9 1 6 2 4 1 1 3 5 3 9 2 6 5 2 1 2 3 9 7 2 8 6 4

[Figure: the exercise tree with levels labeled MAX, MIN, MAX and nodes annotated with numbers and backed-up values; the annotations are not recoverable from the extracted text]

(34)

1. Differentiate between games and search problems.

2. Explain the Adversarial search strategy.

3. Describe the working of minimax algorithm.

4. Write down the properties of minimax algorithm.

5. Describe the working of alpha-beta pruning algorithm.

6. Write down the properties of alpha-beta pruning algorithm.

(35)
(36)
(37)
(38)
