Best Response Computation - Solving Extensive-Form Games Using Imperfect Recall Abstraction

One of the main computational components in algorithmic game theory is the problem of computing a best response since it forms a subproblem of many existing algorithms (e.g., [7, 8]). Formally, a strategy of a player (e.g., pure, mixed, behavioral) is a best response to the given strategy of his opponent if its expected utility is maximal against this strategy compared to all other strategies from the particular class. To denote the best response regardless of the type of the strategy (i.e., regardless whether we consider mixed or behavioral strategies), we use the term ex-ante best response.

In perfect recall EFGs, it is sufficient to consider a pure best response. However, this is no longer true in imperfect recall EFGs. Consider a one-player game called the absentminded driver [47] depicted on the left in Figure 3.4. We see that by playing any pure, or even a mixed strategy, the player cannot reach outcome higher than 1. Note that, as described in Section 2.1.1, in a mixed strategy a player samples a pure strategy from the given distribution before the game begins; hence, there is no randomization when the information set is reached, and player 1 always follows the pure strategy sampled from the mixed strategy. When using a behavioral strategy, however, player samples the action from a given distribution independently every time an information set is reached. Hence, the ex-ante optimal strategy is a behavioral strategy b(s) = ²₃ that reaches the expected value of ⁴₃. In general, we need to consider randomized best responses in so-called absent minded games (games where there exists a path from the root of the game tree to some leaf such that at least one information set Ii ∈ I_i is visited more than once). In the following 23

text, we will assume that the games are without absentmindedness, where it is sufficient to consider only pure strategies to find an ex-ante best response.

Lemma 3.3.1. Let G be an imperfect recall game without absentmindedness and b₁ strat-egy of player 1. There exists an ex-ante pure best response of player 2.

Proof. If the strategy of player 1 is fixed, finding the best response for player 2 corresponds to finding an optimum of a function over a closed convex polytope of all possible behavioral strategies of player 2 created by constraining the strategy by

a∈A(I)

b2(a) = 1, ∀I ∈ I2.

Vertices of this polytope are formed by pure behavioral strategies. Since each action can be chosen at most once in a game without absentmindedness, the objective function is multilinear in a form

z∈Z

C(z)b1(z)u2(z) Y

a∈seq₂(z)

b2(a).

Notice that, thanks to the assumption of no absentmindedness, the variables b₂(a) in every product always describe behavior at most once for one information set and so the variables are independent. Hence, an optimum must be in one of the vertices of this polytope – that is a pure behavioral strategy.

The problem of finding an ex-ante best response in imperfect recall games without absentmindedness is still NP-hard (follows from complexity results in [34]) while the prob-lem is easy (polynomial) in perfect recall games. The main difference is that in imperfect recall games an ex-ante best response cannot be found by selecting an action with the highest expected utility to be played in each information set (called a time consistent strategy [33]). This is caused by the fact that the belief in an information set of a player is not perfectly determined by the strategy of the opponent and nature, but also by the strategy of the best-responding player.

Example 3.3.1. Consider the game in Figure 3.4 (right) between player 1 and chance.

The ex-ante best response of player 1 in this game is to play B,D,F getting the utility of

5−ε

2 . Note, however, that since the belief of player 1 in his imperfect recall information sets depends on his behavior above the information set, one can reach a time consistent strategy playing B,C,E with the expected utility of 2. This strategy is time consistent since when checking every information set separately, there is no deviation of player 1, which could increase his expected value. Note that player 1 does not have A-loss recall in the game in Figure 3.4 (right) since parents of the nodes in the information set I3 are in two distinct information sets I₁, I₂ and their common predecessor is a chance node.

Lemma 3.3.2. Time consistent strategies and ex-ante best responses are guaranteed to be equivalent for player i in a game G if and only if i has A-loss recall in G [33].

Proof. See proof of Theorem 1 in [33].

Consequently, the computation of the best response is in P in A-loss recall games [31, 33]. The following lemmas are a consequence of these facts.

Solving Extensive-Form Games Lemma 3.3.3. Let G be an imperfect recall game where player i has A-loss recall. Let G⁰ be the coarsest perfect recall refinement of G for player i. Every pure behavioral strategy b⁰_i of player i from G⁰ has realization equivalent pure behavioral strategy b_i in G and vice versa.

Proof. First we show how b_i is constructed from b⁰_i. Consider an information set I of player i in G and corresponding information sets I¹, . . . , I^j created according to H(I) in the coarsest perfect refinement G⁰ of G.

Let us first show that at most one of information sets I¹, . . . , I^j can be reached when players play according to strategy profile (b⁰_i, b−i), for every b−i ∈ B_−iin G⁰. Due to A-loss recall of player i, for every pair of nodes h_k, h_l from two different information sets I^k, I^l ∈ {I¹, . . . , I^j}, there exists an information set I⁰ of player i and two distinct actions a, a⁰ ∈ A_i(I⁰), a 6= a⁰ such that a ∈ seqi(hk) ∧ a⁰ ∈ seq_i(hl). However, since b⁰_i is a pure strategy, only one action among the pair of actions a, a⁰ can be played with a non-zero probability and consequently, only one information set in every pair I^k, I^l ∈ {I¹, . . . , I^j} can be reached.

We use this property to construct b_i. For information set I from G that is divided into information sets I¹, . . . , I^j in G⁰ we define b_i(I, a) = 1 for action a ∈ A(I^k), where b⁰_i(I^k, a) = 1 and I^k is the only reachable set from I¹, . . . , I^j as shown above. If no information set from I¹, . . . , I^j is reachable, we set bi in I arbitrarily. For all information sets I⁰∈ I⁰that are not split in the coarsest perfect recall refinement (and therefore are the same as in G), we set bi(I⁰) = b⁰_i(I⁰). Due to the construction, the realization equivalence between b⁰_i and bi follows immediately.

The same construction yields realization equivalent strategy also in the opposite direc-tion.

Lemma 3.3.4. Let G be an imperfect recall game where player 2 has A-loss recall and b₁ a strategy of player 1. Let G⁰ be the coarsest perfect recall refinement of G for player 2.

Let b⁰₂ be a pure best response to b1 in G⁰ and let b2 be a realization equivalent behavioral strategy to b⁰₂ in G, then b₂ is a pure best response to b₁ in G.

Proof. Note that b₁ is a valid strategy in both games since the information set structure for player 1 in G and G⁰ is identical. Since player 2 is not absentminded in G, it is enough to consider pure behavioral strategies (Lemma 3.3.1) as a best response to b1 in G.

Furthermore from Lemma 3.3.3 we know that every pure behavioral strategy ˆb⁰₂ from G⁰ has realization equivalent pure behavioral strategy ˆb₂ in G, hence also the expected utility u1(b1, ˆb⁰₂) in G⁰ is equal to u1(b1, ˆb2) in G. Since b⁰₂ is a best response to b1 in G⁰, it holds that for every pure behavioral strategy ˆb⁰₂ in G⁰ and its realization equivalent counterpart ˆb₂ in G,

u₂(b₁, ˆb₂) = u₂(b₁, ˆb⁰₂) ≤ u₂(b₁, b⁰₂) = u₂(b₁, b₂).

Finally, since also every pure behavioral strategy ˆb2 in G has realization equivalent pure behavioral strategy in G⁰ (Lemma 3.3.3), there can be no ˆb₂ for which u₂(b₁, ˆb₂) >

u2(b1, b2).

Lemma 3.3.4 allows us to formulate the concise mathematical program described in Section 4.3 which is used as the core of algorithms discussed in Section 4.4.

In document Solving Extensive-Form Games Using Imperfect Recall Abstraction (Page 39-42)