
10.3 Non-Simultaneous Two-Player Games

10.3.2 General Non-Simultaneous Two-Player Games

In general game playing, games are not necessarily zero-sum with only the rewards 0, 50, and 100; the players can get any reward in {0, ..., 100}, independently of what the opponent gets. Also, it is not trivial to find out whether a game actually is zero-sum. One way to determine this would be to calculate all reachable terminal states and check their rewards. A different approach, which uses answer-set programming to prove certain properties (among them the zero-sum property), is due to Schiffel and Thielscher (2009). We have not yet implemented such an approach, so in general the algorithm of the previous section is not sufficient to solve such games.
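The first of these ideas, enumerating the reachable terminal states and checking their rewards, can be sketched on an explicit state graph. This is a minimal Python sketch, not a BDD-based implementation: the dictionaries `succ` and `reward`, the function names, and the reading of "zero-sum" as "every terminal outcome is (100, 0), (50, 50) or (0, 100)" are assumptions made for illustration.

```python
from collections import deque

# Assumption of this sketch: "zero-sum" in the sense of the previous section,
# i.e. every terminal outcome is a win (100, 0), a draw (50, 50) or a loss (0, 100).
ZERO_SUM_OUTCOMES = {(100, 0), (50, 50), (0, 100)}

def reachable_terminals(init, succ, terminal):
    """Breadth-first search from init; returns the set of reachable terminal states."""
    seen, queue, found = {init}, deque([init]), set()
    while queue:
        s = queue.popleft()
        if s in terminal:
            found.add(s)
            continue  # terminal states are not expanded
        for t in succ.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return found

def is_zero_sum(init, succ, terminal, reward):
    """True if every reachable terminal state has a win/draw/loss reward pair."""
    return all(reward[s] in ZERO_SUM_OUTCOMES
               for s in reachable_terminals(init, succ, terminal))

# Hypothetical toy game: "a" is the initial state, "b" and "c" are terminal.
succ = {"a": ["b", "c"]}
reward = {"b": (100, 0), "c": (50, 50)}
print(is_zero_sum("a", succ, set(reward), reward))  # True
```

Replacing the reward of "c" by, say, (75, 100) makes the check fail, which is exactly the situation this section addresses. On large games this enumeration is what the symbolic forward search replaces.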

Our first approach (Edelkamp and Kissmann, 2007) relied on restarts whenever a state was assigned different rewards, which resulted in an algorithm with exponential runtime. Further analysis showed that all this approach really does is make sure that a state is finally solved only once all of its successor states are solved. With this insight we were able to implement a much more efficient algorithm (cf. Algorithm 10.3) (Edelkamp and Kissmann, 2008e).

While Algorithm 10.2 works only for turn-taking games, i. e., games in which the player to choose the next move changes after each step, this algorithm works for any non-simultaneous two-player game, so it can also solve games in which one player is active in several consecutive states.

We use a 101 × 101 matrix of BDDs, solution, to store the solved states. The BDD at position (i, j) represents those states in which player 0 can achieve a reward of i and player 1 a reward of j. The matrix is initialized with the reachable terminal states according to their rewards (line 3). To save time later on, we maintain two additional BDDs, solved and unsolved, which store the states that are already solved (i. e., the reachable states that have been inserted into the matrix) and those that are not, respectively.
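The initialization (lines 2 to 5 of Algorithm 10.3) can be illustrated with plain Python sets standing in for the BDDs. This is a hypothetical explicit-state sketch: `init_solution`, `reach`, `terminal` and `reward` are names chosen here, and instead of computing reach ∧ T ∧ R(0, i) ∧ R(1, j) for each of the 101 × 101 positions, it places each reachable terminal state directly into its bucket, which yields the same matrix.

```python
# Explicit-state stand-in for the BDD matrix (illustration only):
# each solution[i][j] is a Python set of states instead of a BDD.
def init_solution(reach, terminal, reward):
    """Lines 2-5 of Algorithm 10.3 on explicit sets of states."""
    solution = [[set() for _ in range(101)] for _ in range(101)]
    for s in reach & terminal:          # reach ∧ T
        i, j = reward[s]                # R(0, i) ∧ R(1, j)
        solution[i][j].add(s)
    # solved: union of all buckets; unsolved: reach ∧ ¬solved
    solved = set().union(*(bucket for row in solution for bucket in row))
    unsolved = reach - solved
    return solution, solved, unsolved

reach = {"a", "b", "c"}
solution, solved, unsolved = init_solution(
    reach, {"b", "c"}, {"b": (100, 0), "c": (0, 100)})
print(sorted(solved), sorted(unsolved))  # ['b', 'c'] ['a']
```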

To extend the solution we perform a backward search. In each step we call the strong pre-image to find those predecessors of the solved states all of whose successors are solved and in which the currently considered player chooses the next move (line 8). If such states exist, they are added to the set of solved states, removed from the set of unsolved states, and finally solved by the function solveStates (cf. Algorithm 10.4), which inserts them into the matrix. Because the sets of solved and unsolved states are kept up to date, we do not need to evaluate the entire matrix after every step to find the states that are already solved. Once all states are solved, the algorithm stops and returns the solution matrix.
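The strong pre-image and the computation of the solvable states in line 8 can be sketched on an explicit successor map. In this hypothetical sketch, `succ` maps each state to its successors and `mover` records which player moves in a non-terminal state (the role of move_p). Unlike a purely universal formulation, it additionally requires at least one successor, so that states without moves are not counted vacuously.

```python
def strong_preimage(target, succ):
    """States that have at least one successor and all of whose successors
    lie in target (pre-image(target) ∧ ¬pre-image(¬target) in BDD terms)."""
    return {s for s, nxt in succ.items() if nxt and all(t in target for t in nxt)}

def solvable_states(solved, unsolved, mover, p, succ):
    """Line 8 of Algorithm 10.3: unsolved states in which player p moves
    and whose successors are all solved."""
    return {s for s in strong_preimage(solved, succ) & unsolved if mover.get(s) == p}

# Hypothetical toy game: player 1 moves in "d", player 0 in "a"; "b", "c" are terminal.
succ = {"d": ["a", "b"], "a": ["b", "c"], "b": [], "c": []}
mover = {"d": 1, "a": 0}
solved, unsolved = {"b", "c"}, {"a", "d"}
print(solvable_states(solved, unsolved, mover, 0, succ))  # {'a'}
print(solvable_states(solved, unsolved, mover, 1, succ))  # set()
```

State "d" only becomes solvable for player 1 after "a" has been solved, which mirrors how the outer loop of Algorithm 10.3 alternates between the players until no unsolved states remain.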

The insertion into the matrix works as follows. For the current player, the algorithm checks all positions of the matrix in a predefined order (see below). If the current position is not empty, i. e., if at least one state resides there, we need to check whether one of the solvable states can generate a successor that equals one of those states. As the transition relation can be applied in both directions, we can instead check whether some predecessors of the stored states are among the solvable states (line 3 of Algorithm 10.4). If that is the case, those predecessors are added to the BDD at the current position and removed from the set of solvable states.

Algorithm 10.3: Symbolic solution of general non-simultaneous two-player games
Input: General non-simultaneous two-player game G = ⟨P, L, N, A, I, T, R⟩.
Output: Strong solution represented as a 101 × 101 matrix of BDDs.
1  reach ← BFS(G)                                          // BFS for finding all reachable states.
2  for all i, j ∈ {0, ..., 100} do                         // For all possible combinations of rewards...
3      solution_{i,j} ← reach ∧ T ∧ R(0, i) ∧ R(1, j)      // Initialize bucket of solution matrix.
4  solved ← ⋁_{0 ≤ i,j ≤ 100} solution_{i,j}               // All states in solution matrix are solved.
5  unsolved ← reach ∧ ¬solved                              // All reachable states not solved are unsolved.
6  while unsolved ≠ ⊥ do                                   // While there are unsolved states left...
7      for all p ∈ {0, 1} do                               // For both players...
8          solvable ← strong pre-image(solved) ∧ unsolved ∧ move_p   // Determine solvable states where p has control.
9          if solvable ≠ ⊥ then                            // If there is a solvable state...
10             solved ← solved ∨ solvable                  // Add solvable states to set of solved states.
11             unsolved ← unsolved ∧ ¬solvable             // Remove solvable states from set of unsolved states.
12             solution ← solveStates(G, solution, p, solvable)   // Solve the solvable states.
13 return solution                                         // Return solution matrix.

Algorithm 10.4: Symbolic solution of a set of states for a general non-simultaneous two-player game (solveStates)
Input: General non-simultaneous two-player game G = ⟨P, L, N, A, I, T, R⟩.
Input: Matrix solution containing the solution so far.
Input: Index p denoting the current player.
Input: BDD solvable representing the states to be solved.
Output: Extended solution.
1  for all i, j ∈ {0, ..., 100} do in order               // Evaluate all reward combinations in predefined order.
2      if solution_{i,j} ≠ ⊥ then                          // If the reward combination is reachable...
3          newSolved ← pre-image(solution_{i,j}) ∧ solvable   // Determine which solvable states are a predecessor of a state in the current position.
4          solution_{i,j} ← solution_{i,j} ∨ newSolved     // Add those states to the solution.
5          solvable ← solvable ∧ ¬newSolved                // Remove those states from the set of solvable states.
6          if solvable = ⊥ then return solution            // If all solvable states are solved we are done.

Orders to Process the Rewards

For general rewards it is important to find a reasonable order in which to process the rewards. For two-player games, the two most reasonable assumptions are that the players try to maximize their own reward (cf. Figure 10.7a) or to maximize the difference to the opponent's reward (cf. Figure 10.7b).
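The insertion procedure of Algorithm 10.4 can be sketched with explicit sets. In this hypothetical sketch, `solution` is a dictionary from positions (i, j) to sets of states, and instead of the player index p the function receives `order`, the positions listed from most to least preferred for the current player; in a real run this would cover all 101 × 101 positions in one of the orders discussed in this section.

```python
def pre_image(states, succ):
    """All states with at least one successor in `states`."""
    return {s for s, nxt in succ.items() if any(t in states for t in nxt)}

def solve_states(solution, solvable, order, succ):
    """Sketch of Algorithm 10.4 on explicit sets.

    solution: dict (i, j) -> set of states, a stand-in for the BDD matrix
    order:    (i, j) positions, best first for the current player (assumed given)
    """
    solvable = set(solvable)
    for ij in order:
        bucket = solution.get(ij)
        if not bucket:                                   # line 2: skip empty positions
            continue
        new_solved = pre_image(bucket, succ) & solvable  # line 3
        bucket |= new_solved                             # line 4 (updates solution in place)
        solvable -= new_solved                           # line 5
        if not solvable:                                 # line 6: all states placed
            break
    return solution

# Player 0 moves in "a" and can reach a win (t1) or a draw (t2).
succ = {"a": ["t1", "t2"], "t1": [], "t2": []}
solution = {(100, 0): {"t1"}, (50, 50): {"t2"}}
solve_states(solution, {"a"}, [(100, 0), (50, 50), (0, 100)], succ)
print("a" in solution[(100, 0)])  # True
```

Because a state leaves `solvable` as soon as it is placed, it ends up in the first (most preferred) non-empty position that contains one of its successors, and in no other.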

The first of these is the more general assumption: in a competition with more than two players and a number of games to be played, gaining a large total number of points may matter more than preventing one of the opponents from gaining a larger reward. In a scenario with only two players, however, this might not be the best strategy; there it appears more reasonable to maximize the difference. This way the player might get somewhat fewer points but gains an advantage over the opponent.

As an example, take the two possible outcomes 75–100 and 50–0. When maximizing its own reward, the first player would prefer the first outcome, achieving 75 points and ignoring how many points the opponent gains; in this case the opponent would get even more points (100). When maximizing the difference, the first player would prefer the second outcome, giving up 25 points but gaining an advantage of 50 points over the opponent.

Figure 10.7: The different orders to traverse the matrix (axes: own reward and opponent's reward, each from 0 to 100). (a) Maximizing the own reward. (b) Maximizing the difference.

In a zero-sum game these two orders induce the same preferences: in both cases the players prefer a win over a draw, which in turn they prefer over a loss. Thus, the algorithm using either of these orders generates the same results as the zero-sum algorithm.
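Both traversal orders can be written as sort keys over the 101 × 101 positions, which makes the example above and the zero-sum remark easy to check. This is a hypothetical sketch: the function names are chosen here, and because the exact tie-breaking within a row or diagonal of Figure 10.7 is not recoverable from the text, the tie-breaking rules below are assumptions.

```python
CELLS = [(i, j) for i in range(101) for j in range(101)]

def own_reward_order(p):
    """Positions best first for player p when maximizing the own reward.
    Ties: prefer a smaller opponent reward (assumption of this sketch)."""
    return sorted(CELLS, key=lambda c: (-c[p], c[1 - p]))

def difference_order(p):
    """Positions best first for player p when maximizing own minus opponent's reward.
    Ties: prefer a larger own reward (assumption of this sketch)."""
    return sorted(CELLS, key=lambda c: (-(c[p] - c[1 - p]), -c[p]))

def ranking(order):
    """Map each position to its rank in the order (0 = most preferred)."""
    return {c: k for k, c in enumerate(order)}

def preferred(outcomes, order):
    """The outcome that comes first in the given order."""
    rank = ranking(order)
    return min(outcomes, key=rank.__getitem__)

# The example from the text, seen from the first player (p = 0).
print(preferred([(75, 100), (50, 0)], own_reward_order(0)))  # (75, 100)
print(preferred([(75, 100), (50, 0)], difference_order(0)))  # (50, 0)
```

Restricted to the zero-sum outcomes (100, 0), (50, 50) and (0, 100), both orders rank win, draw, and loss identically for either player, matching the observation above.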

Soundness of the Algorithm

Theorem 10.9 (Soundness of Algorithm 10.3). The algorithm for symbolically solving well-formed general non-simultaneous two-player games (Algorithm 10.3) is sound and calculates a strong solution corresponding to the specified traversal order.

Proof. We prove soundness by induction.

When the backward search starts, all states reachable in forward direction have been determined; in particular, all reachable terminal states have been calculated (they form a subset of all states satisfying the termination criterion). For these states the rewards of both players are unambiguous, and the terminal states are inserted into the corresponding positions of the matrix.

Now, calculating the strong pre-image in line 8 yields all states whose successors are already in the matrix and thus solved according to the specified traversal order, so these states are solvable as well. The additional condition that the current player must be the one to choose a move does not affect the argument, as we apply this step for both players in turn and stop the algorithm only once no unsolved states are left.

In the end, the strong pre-image captures all states reached in forward direction. This is because, on the one hand, the normal backward search results in a superset of the states reachable in forward direction (cf. Lemma 10.6) and, on the other hand, the strong pre-image finds all predecessors whose successors are already solved. We start with all terminal states as solved states, and a well-formed game ends in a terminal state after a finite number of steps, so no loops are possible. Hence, for every unsolved reachable state, at some point during the backward search all of its successors are solved, and solving new states may in turn give rise to further solvable states.

The states are inserted as described in Algorithm 10.4, where the matrix is traversed in the order specified before. If a predecessor of the states at a non-empty position is found in the set of solvable states, it is inserted into this position and removed from the set of solvable states. This ensures that each state is inserted only once. Furthermore, as the traversal follows the specified order, each state is inserted into the position that the player choosing the move prefers over all other rewards reachable from that state. Thus, all solvable states are inserted into the correct positions within the matrix.

In this algorithm it is also possible to omit the forward search for finding all reachable states. In that case the BDD representing the unsolved states becomes redundant, as we do not know which states can yet be solved. Thus, the loop must be adapted to stop once no new solvable states have been found for either player. As the set of states reachable in backward direction from the terminal states is a superset of the set of states reachable in forward direction from the initial state, this still results in a sound algorithm.

Algorithm 10.5: Symbolic calculation of the reachable states in a layered approach (layeredBFS)
Input: General non-simultaneous two-player game G = ⟨P, L, N, A, I, T, R⟩.
Output: Index of last non-empty layer (plus layers of reachable states on hard disk).
1  current ← I                                   // Start at the initial state.
2  layer ← 0                                     // Store index of the current layer, starting with 0.
3  while current ≠ ⊥ do                          // While the current layer is not empty...
4      store current as reach_layer on hard disk // Store the current layer.
5      current ← image(current ∧ ¬T)             // Calculate the successor layer.
6      layer ← layer + 1                         // Increase the layer index.
7  return layer − 1                              // Return the index of the last non-empty layer.
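Algorithm 10.5 can be mirrored on an explicit successor map, with a Python list standing in for the layers written to hard disk. This is a hypothetical sketch; `image`, `layered_bfs` and the dictionary encoding are names chosen here. Like the pseudocode, it performs no duplicate elimination across layers, so a state may appear in several layers, and termination relies on the game being well-formed.

```python
def image(states, succ):
    """All successors of the given states."""
    return {t for s in states for t in succ.get(s, [])}

def layered_bfs(init, succ, terminal):
    """Sketch of Algorithm 10.5; layers are kept in a list instead of on disk.

    As in the pseudocode there is no duplicate elimination across layers, so
    termination relies on the game being well-formed (finite, without loops).
    """
    layers = []
    current = {init}                                # line 1
    while current:                                  # line 3
        layers.append(current)                      # line 4: "store on hard disk"
        current = image(current - terminal, succ)   # line 5: image(current ∧ ¬T)
    return len(layers) - 1, layers                  # line 7: last non-empty layer

succ = {"a": ["b", "c"], "b": ["c"]}
last, layers = layered_bfs("a", succ, {"c"})
print(last)  # 2
```

Here the terminal state "c" appears in layers 1 and 2, since it is reachable after one and after two steps.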

To use this solution for optimal play (according to the specified traversal order), we must choose a move that leads to a successor state in the same bucket as the current state. This way we make sure that we do not lose any points, while still improving our result whenever the opponent makes a sub-optimal move.
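Move selection from the solved matrix can be sketched as follows. In this hypothetical sketch, `solution` is a dictionary from positions (i, j) to sets of states, and `moves` maps each legal move in the current state to its successor; both encodings are assumptions, not part of the original algorithm.

```python
def bucket_of(state, solution):
    """Return the (i, j) position whose set contains state."""
    for ij, states in solution.items():
        if state in states:
            return ij
    raise KeyError(state)

def optimal_moves(state, moves, solution):
    """Moves whose successor lies in the same bucket as the current state.

    moves: hypothetical mapping from each legal move to its successor state.
    """
    target = solution[bucket_of(state, solution)]
    return [m for m, nxt in moves.items() if nxt in target]

solution = {(100, 0): {"a", "t1"}, (50, 50): {"t2"}}
print(optimal_moves("a", {"left": "t1", "right": "t2"}, solution))  # ['left']
```

With BDDs the same test amounts to intersecting the image of the current state under each move with the BDD of the current state's bucket.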

In document Symbolic Search in Planning and General Game Playing (Page 154-157)