Distributed Methods for Computing Approximate Equilibria

(1)

Equilibria.

Artur Czumaj · Argyrios Deligkas ·

Michail Fasoulakis · John Fearnley ·

Marcin Jurdzi´nski · Rahul Savani

Abstract We present a new, distributed method to compute approximate Nash equilibria in bimatrix games. In contrast to previous approaches that analyze the two payoff matrices at the same time (for example, by solving a single LP that combines the two players’ payoffs), our algorithm first solves two independent LPs, each of which is derived from one of the two payoff ma-trices, and then computes an approximate Nash equilibrium using only lim-ited communication between the players. Our method gives improved bounds on the complexity of computing approximate Nash equilibria in a number of different settings. Firstly, it gives a polynomial-time algorithm for comput-ing approximate well supported Nash equilibria (WSNE) that always finds a 0.6528-WSNE, beating the previous best guarantee of 0.6608. Secondly, since our algorithm solves the two LPs separately, it can be applied to give an improved bound in the limited communication setting, giving a randomized expected-polynomial-time algorithm that uses poly-logarithmic communica-tion and finds a 0.6528-WSNE, which beats the previous best known guaran-tee of 0.732. It can also be applied to the case ofapproximate Nash equilibria, where we obtain a randomized expected-polynomial-time algorithm that uses poly-logarithmic communication and always finds a 0.382-approximate Nash

The first, third and fifth author were partially supported by the Centre for Discrete Mathe-matics and its Applications (DIMAP) and EPSRC award EP/D063191/1. The second, fourth and sixth author were supported by EPSRC award EP/L011018/1. The second author was also supported by ISF award #2021296.

A. Czumaj·M.Fasoulakis·M. Jurdzi´nski

Department of Computer Science and DIMAP, University of Warwick, UK A. Deligkas

Faculty of Industrial Engineering and Management, Technion, Israel J. Fearnley·R. Savani

Department of Computer Science, University of Liverpool, UK M.Fasoulakis

Institute of Computer Science, Foundation for Research and Technology-Hellas (ICS-FORTH), Greece

(2)

equilibrium, which improves the previous best guarantee of 0.438. Finally, the method can also be applied in the query complexity setting to give an algo-rithm that makesO(nlogn) payoff queries and always finds a 0.6528-WSNE, which improves the previous best known guarantee of 2/3.

1 Introduction

The problem of finding equilibria in non-cooperative games is a central problem in game theory. Nash’s seminal theorem proved that every finite normal-form game has at least oneNash equilibrium[18], and this raises the natural question of whether we can find one efficiently. After several years of extensive research, it was shown that finding a Nash equilibrium is PPAD-complete [7] even for two-playerbimatrix games [3], which is considered to be strong evidence that there is no polynomial-time algorithm for this problem.

Approximate equilibria. The fact that computing an exact Nash equilib-rium of a bimatrix game is unlikely to be tractable, has led to the study of approximate Nash equilibria. There are two natural notions of approximate equilibrium, both of which will be studied in this paper. An -approximate Nash equilibrium (-NE) is a pair of strategies in which neither player can increase their expected payoff by more than by unilaterally deviating from their assigned strategy. An-well-supported Nash equilibrium (-WSNE) is a pair of strategies in which both players only place probability on strategies whose payoff is withinof the best response payoff. Every-WSNE is an-NE but the converse does not hold, so a WSNE is a more restrictive notion.

Approximate Nash equilibria are the more well studied of the two concepts. A line of work has studied the best guarantee that can be achieved in poly-nomial time [2, 6, 8]. The best algorithm known so far is the gradient descent method of Tsaknakis and Spirakis [20] that finds a 0.3393-NE in polynomial time, and examples upon which the algorithm finds no better than a 0.3385-NE have been found [12]. On the other hand, progress on computing approximate-well-supported Nash equilibria has been less forthcoming. The first correct algorithm was provided by Kontogiannis and Spirakis [16] (which shall hence-forth be referred to as the KS algorithm), who gave a polynomial time algo-rithm for finding a 2

3-WSNE. This was later slightly improved by Fearnley

et al. [10] (whose algorithm we shall refer to as the FGSS-algorithm), who gave a new polynomial-time algorithm that extends the KS algorithm and finds a 0.6608-WSNE; prior to this work, this was the best known approxima-tion guarantee for WSNEs. For the special case of symmetric games, there is a polynomial-time algorithm for finding a 1₂+δ-WSNE [5].

Previously, it was considered a strong possibility that there is a PTAS for this problem (either for finding an -NE or -WSNE, since their complexity is polynomially related). A very recent result of Rubinstein [19] sheds serious doubt on this possibility. _EndOfTheLineis the canonical problem that de-fines the complexity class PPAD. The “Exponential Time Hypothesis” (ETH) forEndOfTheLinesays that any algorithm that solves anEndOfTheLine

(3)

instance withn-bit circuits, requires 2Ω˜(n)_{time. Rubinstein’s result says that,}

subject to the ETH for_EndOfTheLine, there exists a constant, but so far undetermined, ∗, such that for < ∗, every algorithm for finding an -NE takes quasi-polynomial time, so the quasi-PTAS of Lipton et al. [17] is optimal.

Communication complexity.Approximate Nash equilibria can also be stud-ied from the communication complexity point of view, which captures the amount of communication the players need to find a good approximate Nash equilibrium. It models a natural scenario where the two players each know their own payoff matrix, but do not know their opponent’s payoff matrix. The players must then follow a communication protocol that eventually produces strategies for both players. The goal is to design a protocol that produces a sufficiently good-NE or-WSNE while also minimizing the amount of com-munication between the two players.

Communication complexity of equilibria in games has been studied in pre-vious works [4, 15]. The recent paper of Goldberg and Pastink [13] initiated the study of communication complexity in the bimatrix game setting. There they showed Θ(n2_{) communication is required to find an exact Nash}

equilib-rium of an n×n bimatrix game. Since these games have Θ(n2_{) payoffs in}

total, this implies that there is no communication-efficient protocol for find-ing exact Nash equilibria in bimatrix games. For approximate equilibria, they showed that one can find a 3₄-NEwithout any communication, and that in the no-communication setting, finding a 1₂-NE is impossible. Motivated by these positive and negative results, they focused on the most interesting setting, which allows only a polylogarithmic (in n) number of bits to be exchanged between the players. They showed that one can compute 0.438-NE and 0. 732-WSNE in this context. Recently Babichenko and Rubinstein [1] proved the first lower bounds for the communication complexity of -NE. They proved that for bimatrix games there is constant > 0 such that polynomial com-munication (inn) is needed in order to compute an -NE. Furthermore, they showed that in N-player binary-action games there exists a constant > 0 such that 2Ω(N)_{communication is needed for computing an}_-NE.

Query complexity. The payoff query model is motivated by practical ap-plications of game theory. It is often the case that we know that there is a game to be solved, but we do not know what the payoffs are, and in order to discover the payoffs, we would have to play the game. This may be costly, so it is natural to ask whether we can find an equilibrium while minimizing the number of experiments that we must perform.

Payoff queries model this situation. In the payoff query model we are told the structure of the game, i.e., the strategy space, but we are not told the payoffs. We can then make payoff queries, where we propose a pure strategy profile, and we are told the payoff of each player under that strategy profile. Our task is to compute an equilibrium of the game while minimizing the number of payoff queries that we make.

The study of query complexity in bimatrix games was initiated by Fearn-ley et al. [11], who gave a deterministic algorithm for finding a 1₂-NE using

(4)

2n−1 payoff queries. A subsequent paper of Fearnley and Savani [9] showed a number of further results. Firstly, they showed anΩ(n2) lower bound on the query complexity of finding an -NE with < 1₂, which combined with the result above, gives a complete view of the deterministic query complexity of approximate Nash equilibria in bimatrix games. They then give a randomized algorithm that finds a (3−√5

2 +)-NE usingO(

n·logn

2 ) queries, and a

random-ized algorithm that finds a (2₃+)-WSNE usingO(n·log4 n) queries.

Our contribution.In this paper we introduce adistributedtechnique that al-lows us to efficiently compute-NE and-WSNE using limited communication between the players.

Traditional methods for computing WSNEs have used an LP based ap-proach that, when used on a bimatrix game (R, C), solves the zero-sum game (R−C, C−R). The KS algorithm uses the fact that if there is no pure 2₃ -WSNE, then the solution to this zero-sum game is a 2₃-WSNE. The slight improvement of the FGSS-algorithm [10] to 0.6608 was obtained by adding two further methods to the KS algorithm: if the KS algorithm does not pro-duce a 0.6608-WSNE, then either there is a 2×2matching pennies sub-game that is 0.6608-WSNE or the strategies from the zero-sum game can be im-proved by shifting the probabilities of both players within their supports in order to produce a 0.6608-WSNE.

In this paper, we take a different approach. We first show that the bound of 2₃ can be matched using a pair of distributed LPs. Given a bimatrix game (R, C), we solve the two zero-sum games (R,−R) and (−C, C), and then give a simple procedure that we call the base algorithm, which uses the solutions to these games to produce a 2₃-WSNE of (R, C). Goldberg and Pastink [13] also considered this pair of LPs, but their algorithm only produces a 0.732-WSNE. We then show that the base algorithm can be improved by applying the probability-shifting and matching-pennies ideas from the FGSS-algorithm. That is, if the base algorithm fails to find a WSNE, then a 0.6528-WSNE can be obtained either by shifting the probabilities of one of the two players, or by identifying a 2×2 sub-game. This gives a polynomial-time algorithm that computes a 0.6528-WSNE, which provides the best known ap-proximation guarantees for WSNEs.

It is worth pointing out that, while these techniques are thematically sim-ilar to the ones used by the FGSS-algorithm, the actual implementation is significantly different. The FGSS-algorithm attempts to improve the strate-gies by shifting probabilitieswithin the supports of the strategies returned by the two player game, with the goal of reducing the other player’s payoff. In our algorithm, we shift probabilitiesaway from bad strategies in order to im-prove that player’s payoff. This type of analysis is possible because the base algorithm produces a strategy profile in which one of the two players plays a pure strategy, which simplifies the analysis that we need to carry out. On the other hand, the KS-algorithm can produce strategies in which both play-ers play many strategies, and so the analysis used for the FGSS-algorithm is necessarily more complicated.

(5)

Complexity setting Payoffs Solution Previous best approximation This paper

Computational (polynomial) [0,1] -WSNE 0.6608 [10] 0.6528

Query (n·log(n) queries) [0,1] -WSNE 0.6667 [9] 0.6528 +

Communication (polylogarithmic) [0,1] -WSNE 0.7321 [13] 0.6528 +

Communication (polylogarithmic) {0,1} -WSNE 0.7321 [13] 0.5 +

Communication (polylogarithmic) [0,1] -NE 0.4384 [13] 0.3820 +

Table 1 Comparison of our approximation guarantees with the previous best-known guar-antees.

Since our algorithm solves the two LPs separately, it can be used to improve upon the best known algorithms in the limited communication setting. This is because no communication is required for the row player to solve (R,−R) and the column player to solve (−C, C). The players can then carry out the rest of the algorithm using only poly-logarithmic communication. Hence, we obtain a randomized expected-polynomial-time algorithm that uses poly-logarithmic communication and finds a 0.6528-WSNE. Moreover, the base algorithm can be implemented as a communication efficient algorithm for finding a (1₂+ )-WSNE in awin-lose bimatrix game, where all payoffs are either 0 or 1.

The algorithm can also be used to beat the best known bound in the query complexity setting. It has already been shown by Goldberg and Roth [14] that an -NE of a zero-sum game can be found by a randomized algorithm that usesO(nlog2n) payoff queries. Since the rest of the steps used by our algorithm

can also be carried out usingO(nlogn) payoff queries, this gives us a query efficient algorithm for finding a 0.6528-WSNE.

We also show that the base algorithm can be adapted to find a 3−

√ 5 2 -NE

in a bimatrix game. Once again, this can be implemented in a communication efficient manner, and so we obtain an algorithm that computes a (3−

√ 5

2 +)-NE

(i.e., 0.382-NE) using only poly-logarithmic communication.

Finally, we provide a lower bound against the base algorithm that is essen-tially tight, namely within 0.0034 of the theoretical upper bound.

2 Preliminaries

Bimatrix games.Throughout, we use [n] to denote the set{1,2, . . . , n}. An

n×n bimatrix game is a pair (R, C) of two n×n matrices: R gives payoffs for the row player andC gives the payoffs for the column player. We make the standard assumption that all payoffs lie in the range [0,1]. For simplicity, as in [13], we assume that each payoff has constant bit-length1. A win-lose bimatrix game has all payoffs in{0,1}.

Each player has n pure strategies. To play the game, both players simul-taneously select a pure strategy: the row player selects a row i∈[n], and the

1 _{The statements of our results can easily be extended to the case where all payoffs can be} represented usingbbits by including a factorbin all our communication complexity bounds.

(6)

column player selects a column j ∈[n]. The row player then receives payoff

Ri,j, and the column player receives payoffCi,j.

Amixed strategy is a probability distribution over [n]. We denote a mixed strategy for the row player as a vector x of length n, such that xi is the

probability that the row player assigns to pure strategy i. A mixed strategy of the column player is a vectory of length n, with the same interpretation. Given a mixed strategyxfor either player, the supportofxis the set of pure strategies i with xi > 0. If x and y are mixed strategies for the row and

the column player, respectively, then we call (x,y) a mixed strategy profile. The expected payoff for the row player under strategy profile (x,y) is given by xT_R_y _{and for the column player by} _xT_C_y_{. We denote the} _support _of

a strategy x as supp(x), which gives the set of pure strategies i such that

xi>0.

Nash equilibria.Letybe a mixed strategy for the column player. The set of pure best responsesagainstyfor the row player is the set of pure strategies that maximize the payoff againsty. More formally, a pure strategyi∈[n] is a best response againstyif, for all pure strategiesi0 ∈[n] we have:P

j∈[n]yj·Ri,j≥ P

j∈[n]yj·Ri0,j. Column player best responses are defined analogously.

A mixed strategy profile (x,y) is a mixed Nash equilibrium if every pure strategy in supp(x) is a best response against y, and every pure strategy in supp(y) is a best response againstx. Nash [18] showed that all bimatrix games have a mixed Nash equilibrium.

Approximate equilibria. There are two commonly studied notions of ap-proximate equilibrium, and we consider both of them in this paper. The first notion is of an-approximate Nash equilibrium (-NE), which weakens the re-quirement that a player’s expected payoff should be equal to their best response payoff. Formally, given a strategy profile (x,y), we define theregret suffered by the row player to be the difference between the best response payoff and actual payoff, i.e.,

max

i∈[n] (R·y)i

−xT ·R·y.

The termR·y is a vector where thei-th entry of the vector, (R·y)i,

corre-sponds the payoff the row player gets from playing hisi-th pure strategy when the column player playsy. Hence, the maximum over this vector is the best response payoff for the row player. The termxT _·_R_·_y_{encodes the expected}

payoff to the row player under the strategy profile (x,y).

Regret for the column player is defined analogously. We have that (x,y) is an-NE if and only if both players have regret less than or equal to.

The other notion is of an -approximate-well-supported equilibrium ( -WSNE), which weakens the requirement that players only place probability on best response strategies. Given a strategy profile (x,y) and a pure strategy

j∈[n], we say that j is an-best-response for the row player if max

i∈[n] (R·y)i

(7)

An-WSNE requires that both players only place probability on-best-responses. In an -WSNE both players place probability only on -best-responses. For-mally, the row player’spure strategy regret under (x,y) is defined to be

max i∈[n] (R·y)i − min i∈supp(x) (R·y)i .

Pure strategy regret for the column player is defined analogously. A strategy profile (x,y) is an-WSNE if both players have pure strategy regret less than or equal to.

Communication complexity. We consider the communication model for bimatrix games introduced by Goldberg and Pastink [13]. In this model, both players know the payoffs in their own payoff matrix, but do not know the payoffs in their opponent’s matrix. The players then follow an algorithm that uses a number of communication rounds, where in each round they exchange a single bit of information. Between each communication round, the players are permitted to perform arbitrary randomized computations (although it should be noted that, in this paper, the players will only perform polynomial-time computations) using their payoff matrix, and the bits that they have received so far. At the end of the algorithm, the row player outputs a mixed strategy

x, and the column player outputs a mixed strategyy. The goal is to produce a strategy profile (x,y) that is an-NE or -WSNE for a sufficiently small

while limiting the number of communication rounds used by the algorithm. The algorithms given in this paper will use at mostO(log2n) communication rounds. In order to achieve this, we use the following result of Goldberg and Pastink [13].

Lemma 1 ([13]) Given a mixed strategyx for the row-player and an >0, there is a randomized expected-polynomial-time algorithm that usesO(log22n

)-communication to transmit a strategyxsto the column player where|supp(xs)| ∈

O(log2n)and for every strategyi∈[n] we have: |(xT ·R)i−(xTs ·R)i| ≤.

The algorithm uses the well-known sampling technique of Lipton, Markakis, and Mehta to construct the strategyxs, so for this reason we will call the

strat-egyxs thesampled strategy fromx. Since this strategy has a logarithmically

sized support, it can be transmitted by sendingO(log2n) strategy indexes, each

of which can be represented using lognbits. By symmetry, the algorithm can obviously also be used to transmit approximations of column player strategies to the row player.

Query complexity. In the query complexity setting, the algorithm knows that the players will play an n×n game (R, C), but it does not know any of the entries of R or C. These payoffs are obtained using payoff queries in which the algorithm proposes a pure strategy profile (i, j), and then it is told the value of Rij and Cij. After each payoff query, the algorithm can make

(8)

we consider take polynomial time) in order to decide the next pure strategy profile to query. After making a sequence of payoff queries, the algorithm then outputs a mixed strategy profile (x,y). Again, the goal is to ensure that this strategy profile is an-NE or-WSNE, while minimizing the number of queries made overall.

There are two results that we will use for this setting. Goldberg and Roth [14] have given a randomized algorithm that, with high probability, finds an -NE of a zero-sum game usingO(n·log2 n) payoff queries. Given a mixed strategy

profile (x,y), an -approximate payoff vector for the row player is a vectorv

such that, for all i ∈ [n] we have |vi−(R·y)i| ≤ . Approximate payoff

vectors for the column player are defined symmetrically. Fearnley and Savani [9] observed that there is a randomized algorithm that when given the strategy profile (x,y), finds approximate payoff vectors for both players usingO(n·log2 n)

payoff queries and that succeeds with high probability. We summarise these two results in the following lemma.

Lemma 2 ([9, 14])Given ann×nzero-sum bimatrix game, with probability at least (1−n−18)(1−2

n)

2_{, we can compute an}_{-Nash equilibrium}₍_x_,_y_{), and} -approximate payoff vectors for both players under (x,y), using O(n·log2 n)

payoff queries.

3 The base algorithm

In this section, we introduce thebase algorithm, which provides a simple way to find a2₃-WSNE. We present this algorithm separately for three reasons. Firstly, the algorithm is interesting in its own right, since it provides a relatively straightforward method for finding a 2₃-WSNE that is quite different from the technique used in the KS-algorithm. Secondly, our algorithm for finding a 0.6528-WSNE will build on this algorithm and replace its final step with two more involved procedures. Thirdly, at the end of this section, we show how this algorithm can be adapted to provide a communication efficient way to find a (0.5 +)-WSNE in win-lose games.

(9)

Algorithm 1

1. Solve the zero-sum games (R,−R) and (−C, C).

– Let (x∗,y∗) be a NE of (R,−R), and let (ˆx,yˆ) be a NE of (−C, C).

– Let vr be the value secured by x∗ in (R,−R), and let vc be

the value secured by ˆy in (−C, C). Without loss of generality assume thatvc ≤vr.

2. Ifvr≤2/3, then return (ˆx,y∗).

3. If for allj ∈[n] it holds thatCT

j ·x∗≤2/3, then return (x∗,y∗).

4. Otherwise:

– Letj∗ be a pure best response tox∗.

– Find a rowisuch thatRij∗ >1/3 andC_ij∗ >1/3.

– Return (i,j∗).

We argue that this algorithm is correct. For that, we must prove that the rowiused in Step 4 actually exists.

Lemma 3 If Algorithm 1 reaches Step 4, then there exists a row i such that

Rij∗ >1/3 andC_ij∗>1/3.

Proof Leti be a row sampled fromx∗. We will show that there is a positive probability that rowi satisfies the desired properties.

We begin by showing that the probability that Pr(Rij∗≤1

3)<0.5. Let the

random variableT = 1−Rij∗, whereiis chosen according to the probability distribution x∗. Note that E[Rij∗] = (x∗R)_j. Since v_r > 2

3, we claim that E[T] < 1₃. This follows from the fact that x∗ is a minmax strategy for the row player, and thus it guarantees a payoff more than 2/3 against any pure strategy of the column player.

Thus, applying Markov’s inequality we obtain: Pr(T ≥2 3)≤ E[T] 2/3 <0.5. Since Pr(Rij∗ ≤ 1 3) = Pr(T ≥ 2

3) we can therefore conclude that Pr(Rij∗ ≤ 1

3)<0.5. The exact same technique can be used to prove that Pr(Cij∗≤ 1 3)<

0.5, by using the fact thatC_jT∗·x∗> 2₃.

We can now apply the union bound to argue that: Pr(Rij∗≤

1

3 or Cij∗ ≤ 1 3)<1.

Hence, there is positive probability that rowisatisfiesRij∗> 1

3 andCij∗ > 1 3,

so such a row must exist. ut

We now argue that the algorithm always produces a 2₃-WSNE. There are three possible strategy profiles that can be returned by the algorithm, which we consider individually.

(10)

The algorithm returns in Step 2: Sincevc ≤vrby assumption, and since

vr≤ 2₃, we have that (R·y∗)i≤₃2 for every rowi, and ((ˆx)T·C)j≤ 2₃ for

every column j. So, both players can have pure strategy regret at most 2₃ in (ˆx,y∗), and thus this profile is a 2₃-WSNE.

The algorithm returns in Step 3: Much like in the previous case, when the column player playsy∗, the row player can have pure strategy regret at most 2₃; observe that in this case the row player actually suffers zero regret since x∗ is a best response against y∗. The requirement that C_jTx∗ ≤ 2 3

also ensures that the column player has pure strategy regret at most 2₃. Thus, we have that (x∗,y∗) is a 2₃-WSNE.

The algorithm returns in Step 4: Both players have payoff at least 1₃ un-der (i,j∗) for the sole strategy in their respective supports. Hence, the maximum pure strategy regret that can be suffered by a player is 1−1

3 = 2 3.

Observe that the zero-sum games solved in Step 1 can be solved via linear programming, and so the algorithm runs in polynomial time. Therefore, we have shown the following.

Theorem 4 Algorithm 1 always produces a 2

3-WSNE in polynomial time.

3.1 Win-lose games

The base algorithm can be adapted to provide a communication efficient method for finding a (0.5 +)-WSNE in win-lose games. In brief, the algo-rithm can be modified to find a 0.5-WSNE in a win-lose game by making Steps 2 and 3 check against the threshold of 0.5. It can then be shown that if these steps fail, then there exists a pure Nash equilibrium in column j∗. This can then be implemented as a communication efficient protocol using the algorithm from Lemma 1.

Formally, we will study the following simple modification of Algorithm 1.

Algorithm 2

– Let (x∗,y∗) be a NE of (R,−R), and let (ˆx,yˆ) be a NE of (C,−C).

the value secured by ˆy in (−C, C). Without loss of generality assume thatvc ≤vr.

2. Ifvr≤0.5, then return (ˆx,y∗).

3. If for allj ∈[n] it holds thatCT

j ·x∗≤0.5, then return (x∗,y∗).

4. Otherwise:

– Letj∗ be a pure best response tox∗.

– Find a rowisuch thatRij∗ = 1 andC_ij= 1.

(11)

We will show that this algorithm always finds a 0.5-WSNE in a win-lose game. Firstly, we show that the pure Nash equilibrium found in Step 4 always exists. The following lemma is similar to Lemma 3, but exploits the fact that the game is win-lose to obtain a stronger conclusion.

Lemma 5 If Algorithm 2 is applied to a win-lose game, and it reaches Step 4, then there exists a row i∈supp(x∗)such thatRij∗= 1 andC_ij∗= 1.

Proof Leti be a row sampled fromx∗. We will show that there is a positive probability that rowi satisfies the desired properties.

We begin by showing that the probability that Pr(Rij∗= 0)<0.5. Let the random variableT = 1−Rij∗. Sincev_r> 1

2, we have thatE[T]<0.5. Thus,

applying Markov’s inequality we obtain: Pr(T ≥1)≤E[T]

1 <0.5.

Since Pr(Rij∗= 0) = Pr(T ≥1) we can therefore conclude that Pr(R_ij∗= 0)< 0.5. The exact same technique can be used to prove that Pr(Cij∗= 0)<0.5, by using the fact thatCT

j∗·x∗>0.5.

We can now apply the union bound to argue that: Pr(Rij∗= 0 orC_ij∗ = 0)<1.

Hence, there is positive probability that rowisatisfiesRij∗>0 andC_ij∗>0, so such a row must exist. The final step is to observe that, since the game is win-lose, we have that Rij∗ >0 implies R_ij∗ = 1, and that C_ij∗ >0 implies

Cij∗= 1. ut

We now prove that the algorithm always finds a 0.5-WSNE. The reasoning is very similar to the analysis of the base algorithm. The strategy profiles returned by Steps 2 and 3 are 0.5-WSNEs by the same reasoning that was given for the base algorithm. Step 4 always returns a pure Nash equilibrium, which is a 0-WSNE.

Communication complexity. We now show that Algorithm 2 can be im-plemented in a communication efficient way.

The zero-sum games in Step 1 can be solved by the two players indepen-dently without any communication. Then, the players exchange vr and vs

using O(logn) rounds of communication. If both vr and vs are smaller than

0.5, then the players use the algorithm from Lemma 1 to communicate ˆxsand

y∗s between themselves, using the parameter 2 in place of . Since the payoffs

under the sampled strategies are within ₂ of the originals, we have that all pure strategies have payoff less than or equal to 0.5 +under (ˆxs,y∗s), so this

strategy profile is a (0.5 +)-WSNE.

We will assume from now on thatvr> vc. If the algorithm reaches Step 3,

then the row player uses the algorithm of Lemma 1 to communicatex∗_sto the column player. The column player then computes a best response j∗s against

(12)

x∗_s, and checks whether the payoff of j∗_s against x∗_s is less than or equal to 0.5 +. If so, then the players output (x∗s,j∗s), which is a (0.5 +)-WSNE

Otherwise, we claim that there is a pure strategy i∈ supp(x∗_s) such that (i,j∗_s) is a pure Nash equilibrium. This can be shown by observing that the expected payoff ofx∗_sagainstj∗_sis at least 0.5−, while the expected payoff of

j∗s againstx∗s is at least 0.5 +. Repeating the proof of Lemma 5 using these

inequalities then shows that the pure Nash equilibrium does indeed exist. Since supp(x∗s) has logarithmic size, the row player can simply transmit to the

column player all payoffsRij∗

s for whichi∈supp(x

∗

s), and the column player

can then send back a row corresponding to a pure Nash equilibrium. In conclusion, we have shown the following theorem.

Theorem 6 For every win-lose game and >0, there is a randomized expected-polynomial-time algorithm that finds a (0.5 +)-WSNE with Olog22n

com-munication.

4 An algorithm for finding a 0.6528-WSNE

In this section, we show how Algorithm 1 can be modified to produce a 0. 6528-WSNE.

Outline.Our algorithm replaces Step 4 of Algorithm 1 with a more involved procedure. This procedure uses two techniques, that both find an-WSNE with

< 2

3. Firstly, we attempt to turn (x

∗_,_j∗_{) into a WSNE by}_{shifting probabilities.}

Observe that, sincej∗is a best response, the column player has a pure strategy regret of 0 in (x∗,j∗). On the other hand, we have no guarantees about the row player sincex∗ might place a small amount of probability strategies with payoff strictly less than 1₃. However, since x∗ achieves a high expected payoff (due to Step 2,) it cannot place too much probability on these low payoff strategies. Thus, the idea is to shift the probability thatx∗ assigns to entries ofj∗ with payoff less than or equal to 1₃ to entries with payoff strictly greater than 1₃, and thus ensure that the row player’s pure strategy regret is below 2₃. Of course, this procedure will increase the pure strategy regret of the column player, but if it is also below 2₃ once all probability has been shifted, then we have found an-WSNE with < 2₃.

If shifting probabilities fails to find an-WSNE with < 2₃, then we show that the game contains amatching penniessub-game. More precisely, we show that there exists a column j0, and rowsb andssuch that the 2×2 sub-game induced byj∗, j0,b, andshas the following form:

(13)

@ @ I II b s j∗ j0 ≈1 0 0 ≈1 0 ≈1 ≈1 0

Thus, if both players play uniformly over their respective pair of strategies, thenj∗_,_j0_,_b_{, and}_s_{with have payoff}_≈₀_._{5, and so this yields an}_{-WSNE with} < 2

3.

The algorithm. We now formalize this approach, and show that it always finds an-WSNE with < 2₃. In order to quantify the precisethat we obtain, we parametrise the algorithm by a variablez, which we constrain to be in the range 0≤z < ₂₄1. With the exception of the matching pennies step, all other steps of the algorithm will return a (2₃−z)-WSNE, while the matching pennies step will return a (1

2+f(z))-WSNE for some increasing functionf. Optimizing

the trade off between 2 3−zand

1

2+f(z) then allows us to determine the quality

of WSNE found by our algorithm.

The algorithm is displayed as Algorithm 3. Steps 1, 2, and 3 are versions of the corresponding steps from Algorithm 1, which have been adapted to produce a (2₃−z)-WSNE. Step 4 implements the probability shifting procedure, while Step 5 finds a matching pennies sub-game.

Observe that the probabilities used in xmpandymp are only well defined whenz≤ 1

24, because we have that 1−15z

2−39z >1 wheneverz >

1

24, which explains

(14)

Algorithm 3

– Letvrbe the value secured byx∗in (R,−R), and letvcbe the value secured by ˆyin (−C, C). Without loss of generality assume thatvc≤vr.

2. Ifvr≤2/3−z, then return (ˆx,y∗). 3. If for allj∈[n] it holds thatCT

jx∗≤2/3−z, then return (x∗,y∗). 4. Otherwise:

– Letj∗_{be a pure best response against}_x∗_{. Define:}

S:={i∈supp(x∗) :Rij∗<1/3 +z} B:= supp(x∗)\S

– Define the strategyxBas follows. For eachi∈[n] we have:

(xB)i= ( ₁ Pr(B)·x ∗ i ifi∈B 0 otherwise. – If (xBT_·_C₎ j∗≥1 3+z, then return (xB,j ∗_). 5. Otherwise:

– Letj0 be a pure best response againstxB.

– If there exists ani∈supp(x∗) such that (i,j∗) or (i,j0) is a pure (2₃−z )-WSNE, then return it.

– Find a rowb∈Bsuch thatRbj∗>1−_1+3z18z andCbj0>1− 18z

1+3z.

– Find a rows∈Ssuch thatCsj∗>1− 27z

1+3z andRsj0 >1− 27z

1+3z.

– Define the row player strategyxmp and the column player strategyymp

as follows. For eachi∈[n] we have:

xmp_i=      1−24z 2−39z ifi=b, 1−15z 2−39z ifi=s, 0 otherwise. ymp_i=      1−24z 2−39z ifi=j ∗_, 1−15z 2−39z ifi=j 0_, 0 otherwise. – Return (xmp,ymp).

The correctness of Step 5.This step of the algorithm relies on the existence of the rowsb ands, which is shown in the following lemma.

Lemma 7 Suppose that the following conditions hold:

1. x∗ has payoff at least 2₃−z against j∗. 2. j∗ has payoff at least 2₃−z against x∗. 3. x∗ has payoff at least 2₃−z against j0.

4. Neitherj∗ norj0 contains a pure (2₃−z)-WSNE(i, j) withi∈supp(x∗). Then, both of the following are true:

– There exists a row b∈B such thatRbj∗>1− 18z

1+3z andCbj0 >1− 18z

1+3z.

– There exists a row s∈S such that Csj∗>1− 27z

1+3z andRsj0 >1−

27z

(15)

We defer the proof of this lemma to Section 4.1. Observe that the precon-ditions are indeed true if the Algorithm reaches Step 5. The first and third conditions hold because, due to Step 2, we know thatx∗is a min-max strategy that secures payoff at least vr > 2₃ −z. The second condition holds because

Step 3 ensures that the column player’s best response payoff is at least 2 3−z.

The fourth condition holds because Step 5 explicitly checks for these pure strategy profiles.

Quality of approximation.We now analyse the quality of WSNE our algo-rithm produces. Steps 2, 3, 4, 5 each return a strategy profile. Observe that Steps 2 and 3 are the same as the respective steps in the base algorithm, but with the threshold changed from 2₃ to 2₃ −z. Hence, we can use the same reasoning as we gave for the base algorithm to argue that these steps return (2₃−z)-WSNE. We now consider the other two steps.

The algorithm returns in Step 4: By definition all rows r ∈ B satisfy

Rij∗ ≥ 1

3 +z, so since supp(xB) ⊆ B, the pure strategy regret of the

row player can be at most 1−(1₃+z) = 2₃−z. For the same reason, since (xBT _·_C₎

j∗ ≥ 1

3+z holds, the pure strategy regret of the column player

can also be at 2₃−z. Thus, the profile (xB,j∗) is a (2₃−z)-WSNE.

The algorithm returns in Step 5: Since Rbj∗ >1− 18z

1+3z, the payoff of b

when the column player playsymp is at least: 1−24z 2−39z · 1− 18z 1 + 3z =1−39z+ 360z 2 2−33z−117z2 Similarly, since Rsj0 > 1− 27z

1+3z, the payoff ofs when the column player

plays ympis at least: 1−15z 2−39z · 1− 27z 1 + 3z =1−39z+ 360z 2 2−33z−117z2

In the same way, one can show that the payoffs of j∗ and j0 are also

1−39z+360z2

2−33z−117z2 when the row player playsxmp. Thus, we have that (xmp,ymp)

is a (1−1−39z+360z2

2−33z−117z2)-WSNE.

To find the optimal value forz, we need to find the largest value ofzfor which the following inequality holds.

1−1−39z+ 360z 2

2−33z−117z2 ≤

2 3 −z.

Setting the inequality to an equality and rearranging gives us a cubic poly-nomial equation: 117z3_{+ 432}_z2₋₃₀_z₊ 1

3 = 0. Since the discriminant of

this polynomial is positive, this polynomial has three real roots, which can be found via the trigonometric method. Only one of these roots lies in the range

(16)

0≤z < ₂₄1, which is the following: z= 1 117 √ 3 √2434√3 cos ₁ 3 arctan ₃₉ 240073 √ 9749√3 −3√2434 sin ₁ 3 arctan ₃₉ 240073 √ 9749√3 −48√3 ! .

Thus, we get z ≈0.013906376, and we have found an algorithm that always produces a 0.6528-WSNE. So we have the following theorem.

Theorem 8 There is a polynomial time algorithm that, given a bimatrix game, finds a0.6528-WSNE.

4.1 Proof of Lemma 7

In this section we assume that Steps 1 through 4 of our algorithm did not return a (2

3−z)-WSNE, and that neither j

∗ _nor_j0 _{contained a pure (}2 3−z

)-WSNE. We show that, under these assumptions, the rowsbandsrequired by Step 5 do indeed exist.

Probability bounds.We begin by proving bounds on the amount of proba-bility thatx∗ can place on S andB. The following lemma uses the fact that

x∗ secures an expected payoff of at least 2₃−zto give an upper bound on the amount of probability that x∗ can place on S. To simplify notation, we use Pr(B) to denote the probability assigned byx∗ to the rows in B, and we use Pr(S) to denote the probability assigned byx∗ to the rows inS.

Lemma 9 Pr(S)≤1+3z

2−3z.

Proof We will prove our claim using Markov’s inequality. Consider the random variable T = 1−Rij∗ where i is sampled from x∗. Since by our assumption the expected payoff of the row player is greater than 2/3−z we get that

E(T)≤1/3 +z. If we apply Markov’s inequality we get

P r(T≥ 2 3 −z)≤ E(T) 2 3 −z ≤ 1 + 3z 2−3z

which is the claimed result. ut

Next we show an upper bound on Pr(B). Here we use the fact thatj∗does not contain a (2₃−z)-WSNE to argue that all column player payoffs inB are smaller than 1₃+z. Since we know that the payoff ofj∗ againstx∗ is at least

2

3−z, this can be used to prove a upper bound on the amount of probability

thatx∗ assigns to B.

Lemma 10 Pr(B)≤ 1+3z

(17)

Proof Since there is noi∈supp(x∗) such that (i,j∗) is a pure (2₃−z)-WSNE, and since each rowi∈BsatisfiesRij∗≥1

3+z, we must have thatCij∗< 1 3+z

for every i∈B. By assumption we know thatC_jT∗x∗ >2/3−z. So, we have the following inequality:

2 3 −z <Pr(B)·( 1 3+z) + 1−Pr(B) ·1.

Solving this inequality for Pr(B) gives the desired result. ut Payoff inequalities forj∗.We now show properties about the average payoff obtained from the rows in B and S. Recall that xB was defined in Step 4 of our algorithm, and that it denotes the normalization of the probability mass assigned by x∗ to rows in B. The following lemma shows that the expected payoff to the row player in the strategy profile (xB,j∗) is close to 1.

Lemma 11 We have(xBT _·_R₎

j∗>1−6z

1+3z.

Proof By definition we have that: (xBT ·R)j∗ = 1 Pr(B)· X i∈B x∗_i ·Rij∗. (1)

We begin by deriving a lower bound forP i∈Bx

∗

i ·Rij∗. Using the fact that

x∗ secures an expected payoff of at least 2/3−zagainstj∗and then applying the bound from Lemma 9 gives:

2 3 −z < X i∈B x∗_i ·Rij∗+ ( 1 3 +z)·Pr(S) ≤X i∈B x∗_i ·Rij∗+ ( 1 3 +z)· 1 + 3z 2−3z.

Hence we can conclude that:

X i∈B x∗_i ·Rij∗ > 2 3 −z− 1 3 · (1 + 3z)2 2−3z =1−6z 2−3z.

Substituting this into Equation (1), along with the upper bound on Pr(B) from Lemma 10, allows us to conclude that:

(xBT·R)j∗≥ 2−3z 1 + 3z · X i∈B x∗_i ·Rij∗ > 2−3z 1 + 3z · 1−6z 2−3z = 1−6z 1 + 3z. u t

(18)

Next we would like to show a similar bound on the expected payoff to the column player of the rows inS. To do this, we definexSto be the normalisation of the probability mass that x∗ assigns to the rows in S. More formally, for eachi∈[n], we define: (xS)i= ( ₁ Pr(S)·x ∗ i ifi∈S 0 otherwise.

The next lemma shows that the expected payoff to the column player in the profile (xS,j∗) is close to 1.

Lemma 12 We have(xST _·_C₎

j∗> 1−6z

1+3z.

Proof By definition we have that: (xST·C)j∗= 1 Pr(S)· X i∈S x∗i ·Cij∗. (2)

We begin by deriving a lower bound for P i∈Sx

∗

i ·Cij∗. By assumption, we know that CT

j∗x∗ > 2/3−z. Moreover, since j∗ does not contain a (2₃ −z )-WSNE we have that all rowsiinB satisfyCij∗ ≤1/3 +z. If we combine these facts that with Lemma 10 we obtain:

2 3−z < X i∈S x∗_i ·Cij∗+ ( 1 3 +z)·Pr(B) ≤X i∈S x∗_i ·Cij∗+ ( 1 3 +z)· 1 + 3z 2−3z.

Hence we can conclude that:

X i∈S x∗i ·Cij∗> 2 3−z− 1 3· (1 + 3z)2 2−3z = 1−6z 2−3z.

Substituting this into Equation (2), along with the upper bound on Pr(S) from Lemma 10, allows us to conclude that:

(xBT·R)j∗≥ 2−3z 1 + 3z · X i∈B x∗_i ·Rij∗ > 2−3z 1 + 3z · 1−6z 2−3z = 1−6z 1 + 3z. u t

(19)

Payoff inequalities forj0.We now want to prove similar inequalities for the column j0. The next lemma shows that the expected payoff for the column player in the profile (xB,j0) is close to 1. This is achieved by first showing a lower bound on the payoff to the column player in the profile (xB,j∗), and then using the fact thatj∗is not a (2

3−z)-best-response againstxB, and that

j0 _{is a best response against}_x B.

Lemma 13 We have(xBT ·C)j0 >1−6z

1+3z.

Proof We first establish a lower bound on (xBT·C)j∗. By assumption, we know thatC_jT∗x∗>2/3−z. Using this fact, along with the bounds from Lemmas 9 and 10 gives: 2 3−z <Pr(B)·(xB T _·_C₎ j∗+ Pr(S)·1 ≤ 1 + 3z 2−3z·(xB T _·_C₎ j∗+ 1 + 3z 2−3z.

Solving this inequality for (xBT _·_C₎

j∗ yields: (xBT ·C)j∗> 1 3 · 1−21z+ 9z2 1 + 3z .

Now we can prove the lower bound on (xBT·C)j0. Sincej∗ is not a (2

3−z

)-best-response againstxB, and sincej0is a best response againstxBwe obtain: (xBT·C)j0 >(x_BT ·C)_j∗+ 2 3−z (xBT·C)j0 > 1 3· 1−21z+ 9z2 1 + 3z + 2 3−z = 1−6z 1 + 3z. u t

The only remaining inequality that we require is a lower bound on the expected payoff to the row player in the profile (xS,j0). However, before we can do this, we must first prove an upper bound on the expected payoff to the row player in (xB,j0), which we do in the following lemma. Here we first prove that most of the probability mass of xB is placed on rows i in which

Cij0 >1

3+z, which when combined with the fact that there is noi∈supp(x ∗₎

such that (i,j0) is a pure (2₃ −z)-WSNE, is sufficient to provide an upper bound. Lemma 14 We have(xBT _·_R₎ j0 <1 3· 1+33z+9z2 1+3z .

Proof We begin by proving an upper bound on the amount of probability mass assigned by xB to rows i with Cij0 < 1

3 +z. Let T = 1−Cij0 be a random variable where the rowiis sampled according to xB. Lemma 13 implies that:

E[T]<1−1−6z

1 + 3z =

9z

(20)

Observe that Pr(T ≥1−(1₃+z)) = Pr(T ≥ 2

3−z) is equal to the amount of

probability that xB assigns to rowsi with Cij0 < 1

3 +z. Applying Markov’s

inequality then establishes our bound. Pr(T ≥ 2 3−z)≤ 9z 1+3z 2 3−z .

So, if p = ₍₁₊₃_z₎₍₂9z_/₃₋_z₎ then we know that at least 1−p probability is assigned byxBto rowsi such thatCij0 ≥ 1

3+z. Since we have assumed that

there is no i ∈ supp(x∗_{) such that (}_i,_j0_{) is a pure (}2

3 −z)-WSNE, we know

that any such rowimust satisfyRij0 <1

3+z. Hence, we obtain the following

bound: (xBT ·R)j0 <(1−p)·( 1 3 +z) +p = 1 3 · 1 + 33z+ 9z2 1 + 3z . u t

Finally, we show that the expected payoff to the row player in the profile (xS,j0) is close to 1. Here we use the fact thatx∗is a min-max strategy along with the bound from Lemma 14 to prove our lower bound.

Lemma 15 We have(xST _·_R₎

j0> 1−15z

1+3z.

Proof Sincex∗ is a min-max strategy that secures a value strictly larger than

2 3−z, we have: 2 3 −z <Pr(B)·(xB T ·R)j0+ Pr(S)·(x_ST·R)_j0. Substituting the bounds from Lemmas 9, 10, and 14 then gives:

2 3 −z < 1 + 3z 2−3z · 1 3 · 1 + 33z+ 9z2 1 + 3z + 1 + 3z 2−3z ·(xS T_·_R₎ j0. Solving for (xST_·_R₎

j0 then yields the desired result. ut

Finding rowsbandu.So far, we have shown that the expected payoff to the row player in (xB,j∗) is close to 1, and that the expected payoff to the column

player in (xB,j0) is close to 1. We now show that there exists a row b ∈ B

such thatRbj∗ is close to 1, andC_bj0 is close to 1, and that there exists a row

s∈ S in whichCsj∗ and R_sj0 are both close to 1. The following lemma uses Markov’s inequality to show a pair of probability bounds that will be critical in showing the existence ofb.

Lemma 16 We have:

– xBassigns strictly more than0.5probability to rowsiwithRij∗>1− 18z

1+3z.

– xBassigns strictly more than0.5probability to rowsiwithCij0>1− 18z

(21)

Proof We begin with the first case. Consider the random variableT= 1−Rij∗ whereiis sampled fromxB. By Lemma 11, we have that:

E[T]<1−1−6z 1 + 3z = 9z 1 + 3z. We have thatT ≥ 18z 1+3z wheneverRij∗≤1− 18z

1+3z, so we can apply Markov’s

inequality to obtain: Pr(T ≥ 18z 1 + 3z)< 9z 1+3z 18z 1+3z = 0.5.

The proof of the second case is identical to the proof given above, but uses the (identical) bound from Lemma 13. ut

The next lemma uses the same techniques to prove a pair of probability bounds that will be used to prove the existence ofs.

Lemma 17 We have:

– xS assigns strictly more than 1₃ probability to rowsi withCij∗>1− 27z

1+3z.

– xS assigns strictly more than 2₃ probability to rowsi withRij0 >1− 27z

1+3z.

Proof We begin with the first claim. Consider the random variableT = 1−Cij∗ whereiis sampled fromxS. By Lemma 12, we have that:

E[T]<1−1−6z 1 + 3z = 9z 1 + 3z. We have thatT ≥ 27z 1+3z wheneverCij∗≤1− 27z

inequality to obtain: Pr(T ≥ 27z 1 + 3z)< 9z 1+3z 27z 1+3z =1 3.

We now move on to the second claim. Consider the random variable T = 1−Rij∗ whereiis sampled fromx_B. By Lemma 15, we have that:

E[T]<1−1−15z 1 + 3z = 18z 1 + 3z. We have thatT ≥ 27z 1+3z wheneverRij∗≤1− 27z

inequality to obtain: Pr(T ≥ 27z 1 + 3z)< 18z 1+3z 27z 1+3z =2 3. u t

Finally, we can formally prove the existence of b ands, which completes the proof of correctness for our algorithm.

(22)

Proof (of Lemma 7)We begin by proving the first claim. If we sample a row

b randomly from xB, then Lemma 16 implies that probability that Rbj∗ ≤ 1− 18z

1+3z is strictly less than 0.5 and that the probability thatCbj0 ≤1−

18z

1+3z

is strictly less than 0.5. Hence, by the union bound, the probability that at least one of these events occurs is strictly less than 1. So, there is a positive probability that neither of the events occurs, which implies that there exists at least one rowb that satisfies the desired properties.

The second claim is proved using exactly the same technique, but using the bounds from Lemma 17, again observing that the probability that a randomly sampled row fromxSsatisfies the desired properties with positive probability.

u t

5 Communication complexity

We claim that Algorithm 3 can be adapted for the limited communication set-ting. We make the following modification to our algorithm. After computing

x∗,y∗,xˆ,and ˆy, we then use Lemma 1 to construct and communicate the sam-pled strategiesx∗_s,y∗_s,xˆs,and ˆys. These strategies are communicated between

the two players using 4·(logn)2 _{bits of communication, and the players also}

exchangevr= (x∗s)T ·Ry∗s andvc = ˆxTsCyˆs using logn rounds of

communi-cation. The algorithm then continues as before, except the sampled strategies are used in place of their non-sampled counterparts. Finally, in Steps 2 and 3, we test against the threshold 2₃−z+instead of 2₃−z.

Observe that, when sampled strategies are used, all steps of the algorithm can be carried out in at most (logn)2 _{communication. In particular, to}

imple-ment Step 4, the column player can communicate j∗ to the row player, and then the row player can communicate Rij∗ for all rows i ∈ supp(x∗_s) using (logn)2 _{bits of communication, which allows the column player to determine}

j0. Oncej0 has been determined, there are only 2·lognpayoffs in each matrix that are relevant to the algorithm (the payoffs in rowsi∈supp(x∗_s) in columns

j∗ andj0,) and so the two players can communicate all of these payoffs to each other, and then no further communication is necessary.

Now, we must argue that this modified algorithm is correct. Firstly, we argue that if the modified algorithm reaches Step 5, then the rows b and s

exist. To do this, we observe that the required preconditions of Lemma 7 are satisfied byx∗_s,j∗, andj0. Condition 2 holds because the modified Step 3 ensures that the column player’s best response payoff is at least 2₃−z+ > 2₃−z, while Condition 4 is ensured by the explicit check in Step 5. For Conditions 1 and 3, we use the fact that (x∗,y∗) is an -Nash equilibrium of the zero-sum game (R,−R). The following lemma shows that any approximate Nash equilibrium of a zero-sum game behaves like an approximate min-max strategy.

Lemma 18 If(x,y)is an-NE of a zero-sum game(M,−M), then for every strategy y0 we have:

(23)

Proof Letv=xT·M·ybe the payoff to the row player under (x,y). Suppose, for the sake of contradiction, that there exists a column player strategy y0

such that:

xT·M·y0< v−.

Since the game is zero-sum, this implies that the column player’s payoff under (x,y0_{) is strictly larger than}₋_v₊_{, which then directly implies that the best}

response payoff for the column player againstxis strictly larger than−v+. However, since the column player’s expected payoff under (x,y) is −v, this then implies that (x,y) is not an-NE, which provides our contradiction. ut

Since Step 2 implies that the row player’s payoff in (x∗,y∗) is at least

2

3−z+, Lemma 18 implies thatx

∗ _{secures a payoff of} 2

3−z no matter what

strategy the column player plays, which then implies that Conditions 1 and 3 of Lemma 7 hold.

Finally, we argue that the algorithm finds a (0.6528 +). The modified Steps 2 and 3 now return a (2₃ −z+)-WSNE, whereas the approximation guarantees of the other steps are unchanged. Thus, we our original analysis gives the following theorem.

Theorem 19 For every >0, there is a randomized expected-polynomial-time algorithm that usesOlog22n

communication and finds a(0.6528 +)-WSNE.

6 Query complexity

We now show that Algorithm 3 can be implemented in a payoff-query efficient manner. Let >0 be a positive constant. We now outline the changes needed in the algorithm.

– In Step 1 we use the algorithm of Lemma 2 to find ₂-NEs of (R,−R), and (C,−C). We denote the mixed strategies found as (x∗a,y∗a) and (ˆxa,yˆa),

respectively, and we use these strategies in place of their original counter-parts throughout the rest of the algorithm. We also compute ₂-approximate payoff vectors for each of these strategies, and use them whenever we need to know the payoff of a particular strategy under one of these strategies. In particular, we set vr to be the payoff ofx∗a according to the approximate

payoff vector ofy_a∗, and we setvc to be the payoff of ˆya according to the

approximate payoff vector for ˆxa.

– In Steps 2 and 3 we test against the threshold of 2₃−z+ rather than

2 3−z.

– In Step 4 we selectj∗to be the column that is maximal in the approximate payoff vector against x∗_a. We then spend n payoff queries to query every row in column j∗, which allow us to proceed with the rest of this step as before.

– In Step 5 we use the algorithm of Lemma 2 to find an approximate payoff vectorvfor the column player againstxB. We then selectj0to be a column

(24)

that maximizesv, and then spend n payoff queries to query every row in

j∗, which allows us to proceed with the rest of this step as before.

Observe that the query complexity of the algorithm is O(n·log2 n), where

the dominating term arises due to the use of the algorithm from Lemma 2 to approximate solutions to the zero-sum games.

We now argue that this modified algorithm produces a (0.6528 +)-WSNE. Firstly, we need to reestablish the existence of the rowsbandsused in Step 5. To do this, we observe that the preconditions of Lemma 7 hold forx∗_a. We start with Conditions 1 and 3. Note that the payoff for the row player under (x∗_a,y∗_a) is at least vr−₂ (since vr was estimated with approximate payoff vectors,)

and Step 2 ensures that vr > 2₃−z+. Hence, we can apply Lemma 18 to

argue thatx∗_asecures payoff at least 2₃−zagainst every strategy of the column player, which proves that Conditions 1 and 3 hold. Condition 2 holds because the check in Step 3, ensures that the approximate payoff ofj∗ againstx∗is at least 2₃−z+, and therefore the actual payoff of of j∗ againstx∗ is at least

2 3−z+

2. Finally, Condition 4 holds because pure strategy profiles of this form

are explicitly checked for in Step 5.

Steps 2 and 3 in the modified algorithm return a (2

3−z+)-WSNE, while

the other steps provided the same approximation guarantee as the original algorithm. So, we can reuse the analysis for the original algorithm to prove the following theorem.

Theorem 20 There is a randomized algorithm that, with high probability, finds a(0.6528 +)-WSNE using O(n·log2 n)payoff queries.

7 A communication-efficient algorithm for finding a (0.382 +)-NE

We study the following algorithm.

Algorithm 4

the value secured by ˆy in (−C, C). Without loss of generality assume thatvc ≤vr. – Ifvr≤ 3− √ 5 2 , return (ˆx,y ∗_). 2. Otherwise:

– Letj be a best response for the column player againstx∗_. – Letrbe a best response for the row player againstj.

– Define the strategy profilex0= 1 2−vr ·x

∗₊1−vr

2−vr ·r.

– Return (x0, j).

We show that this algorithm always produces a 3−

√ 5

2 -NE. We start by

(25)

the row player can achieve againsty∗ isvr, so the row player’s regret can be

at mostvr. Similarly, the maximum payoff that the column layer can achieve

against ˆxis vc ≤vr, so the column player’s regret can be at most vr. Step 1

only returns a strategy profile in the case wherevr≤3−

√ 5

2 , so this step always

produces a 3−

√ 5 2 -NE.

To analyse the quality of approximate equilibrium found by Step 2, we use the following Lemma.

Lemma 21 The strategy profile(x0, j)is a 1₂−₋vr_vr-NE.

Proof We start by analysing the regret of the row player. By definition, row

ris a best response against columnj. So, the regret of the row player can be expressed as: Rrj−(x0·R)j=Rrj− 1 2−vr ·((x∗)T ·R)j− 1−vr 2−vr ·Rrj ≤ 1 2−vr ·Rrj− 1 2−vr ·vr ≤ 1 2−vr ·1− 1 2−vr ·vr = 1−vr 2−vr ,

where in the first inequality we use the fact that x∗ is a min-max strategy that secures payoff at least vr, and the second inequality uses the fact that

Rrj ≤1.

We now analyse the regret of the column player. Letcbe a best response for the column player againstx0. The regret of the column player can be expressed as: ((x0)T ·C)c−((x0)T ·C)j = 1 2−vr ·((x∗)T ·C)c+ 1−vr 2−vr ·Crc− 1 2−vr ·((x∗)T ·C)x∗_j− 1−vr 2−vr ·Crj ≤1−vr 2−vr ·Crc− 1−vr 2−vr ·Crj ≤1−vr 2−vr .

The first inequality holds sincej is a best response againstx∗, and therefore ((x∗)T ·C)c≤(x∗)T·C)j, and the second inequality holds sinceCrc≤1 and

Crj ≥ 0. Thus, we have shown that both players have regret at most 1₂−₋v_vrr

under (x0_{, j}_{), and therefore (}_x0_{, j}_{) is a} 1−vr

2−vr-NE. ut

Step 2 is only triggered in the case where vr > 3−

√ 5

2 , and we have that 1−vr 2−vr = 3−√5 2 when vr = 3−√5 2 . Since 1−vr 2−vr decreases as vr increases, we

therefore have that Step 2 always produces a 3−

√ 5

2 -NE. This completes the

(26)

Communication complexity. We now argue that, for every > 0 the al-gorithm can be used to find a 3−√5

2 +

-NE using Olog22n

rounds of communication.

We begin by considering Step 1. Obviously, the zero-sum games can be solved by the two players independently without any communication. Then, the players exchange vr and vc using O(logn) rounds of communication. If

both vr and vc are smaller than 3−

√ 5

2 , then the algorithm from Lemma 1 is

applied to communicate ˆxs to the row player, andys∗ to the column player.

Since the payoffs under the sampled strategies are withinof the originals, we have that (ˆxs,y∗s) is a 3−√5 2 + -NE.

If the algorithm reaches Step 2, then the row player uses the algorithm of Lemma 1 to communicate x∗_s to the column player. The column player then computes a best responsejsagainstx∗s, and uses logncommunication rounds

to transmit it to the row player. The row player then computes a best response

rsagainstjs, then computes:x0s=

1 2−vr ·x

∗

s+

1−vr

2−vr·r, and the players output

(x0_s, js). To see that this produces a

3−√5

2 +

-NE, observe thatx∗_ssecures a payoff of at leastvr−for the row player, and repeating the proof of Lemma 21

with this weaker inequality gives that this strategy profile is a1−vr

2−vr +

-NE. Therefore, we have shown the following theorem.

Theorem 22 For every >0, there is a randomized expected-polynomial-time algorithm that usesOlog22n

communication and finds a 3−

√ 5 2 + -NE. 8 Lower bounds

Lower bound against Algorithm 3. We start by showing a tight lower bound against the base algorithm (Algorithm 1). Consider the following game, which we will denote as (R, C).

@ @ @ I II t b l r 0 1 1 0.9 2 3 0.9 0 2₃

In the game (R,−R), the unique Nash equilibrium is (b, l), which can be found by iterated elimination of dominated strategies. Similarly, in the game (−C, C), the unique Nash equilibrium is (b, r), which can again be found by elimination of dominated strategies. Note, however, that the game itself does not contain any dominated strategies. Hence, we havevR=vC= 2₃, so Step 2

(27)

is triggered, and the resulting strategy profile is (b, l). Under this strategy profile, the column player receives payoff 0, while the best response payoff to the column player is 2₃, so this is a 2₃-WSNE and no better.

This lower bound can be modified to work against Algorithm 3 changing both 2

3 payoffs to 0.6528. Then, by the same reasoning given above, Step 2 is

triggered, and the algorithm returns a 0.6528-WSNE.

Lower bounds against a better implementation.The lower bound given above exploits the fact that, as specified, our algorithm will return a 0. 6528-WSNE as soon as it finds one. In fact, given the game above, the algorithm will return in Step 2, and the vast majority of the algorithm will never run.

A more thoughtful implementation of the algorithm would be to run all of the steps, and then return the best WSNE found during this process. In particular:

– Step 2 should check the quality of WSNE provided by (ˆx,y∗).

– Step 3 should check the quality of WSNE provided by (x∗,y∗) and (ˆx,yˆ).

– If Pr(B)>0, then Step 4 should check the quality of the WSNE provided by (xB,j∗).

– If Pr(B)>0, then Step 5 should:

– find the best pure WSNE that can be found in the support ofxBand the columnj∗.

– determine if there are two rows b and s that satisfy the payoff con-straints listed in Step 5, and if so, find the quality of the WSNE pro-vided by the specified strategy profile.

The algorithm should then return the best WSNE found by one of these steps. Note that Steps 4 and 5 cannot be applied to all games, since their precondition is only guaranteed to hold in games where the previous steps failed to find a good WSNE.

A recent paper of Fearnley, Igwe, and Savani successfully applied genetic algorithms to find lower bounds against algorithms that compute approximate Nash equilibria [12]. Using the same approach, along with some hand tweaking of the output, we found the following game.

R=      0.35056 0.99 1 0 0 0 1 0.25 0.3      C=      1 0 0.3 0 0 0 0.3 0 1      In this game:

– The row player can secure payoff 0.64944 in (R,−R), and we have that

x∗ = (0.53979,0,0.46021)T _and_y∗_{= (0}_.₅₃₂₅₉_,₀_.₄₆₇₄₁_,_0).

– The column player can secure payoff 0 in (−C, C). Note that the column player can actually play any strategy to secure this payoff, but the row player must play the middle row. We will focus on the case where ˆyplays the middle column, and ˆxplays the middle row.

(28)

It can be verified that under these strategy profiles, all steps of the algorithm produce no better than a 0.64944-WSNE. We remark that this lower bound is within 0.0034 of the theoretical upper bound.

9 Conclusion

In this paper, we have developed a new technique for computing approximate Nash equilibria, and approximate well-supported Nash equilibria. This new technique has allowed us to improve upon the best known results in multiple settings. For well-supported Nash equilibria, we have presented a polynomial-time algorithm for finding a 0.6528-WSNE, and we have shown how to imple-ment it in a communication efficient manner, and a query efficient manner, improving upon the best known results in those settings. For approximate Nash equilibria, our techniques obtain a 0.382-NE, and again we showed how this can be carried out in a communication efficient manner, improving the best known results in that setting.

References

1. Y. Babichenko and A. Rubinstein. Communication complexity of approximate Nash equilibria. CoRR, abs/1608.06580, 2016. URLhttp://arxiv.org/abs/1608.06580. 2. H. Bosse, J. Byrka, and E. Markakis. New algorithms for approximate Nash equilibria

in bimatrix games. Theoretical Computer Science, 411(1):164–173, 2010.

3. X. Chen, X. Deng, and S.-H. Teng. Settling the complexity of computing two-player Nash equilibria.Journal of the ACM, 56(3):14:1–14:57, 2009.

4. V. Conitzer and T. Sandholm. Communication complexity as a lower bound for learning in games. InProc. of ICML, pages 185–192, 2004.

5. A. Czumaj, M. Fasoulakis, and M. Jurdzinski. Approximate well-supported Nash equi-libria in symmetric bimatrix games. InProc. of SAGT, pages 244–254, 2014.

6. C. Daskalakis, A. Mehta, and C. H. Papadimitriou. Progress in approximate Nash equilibria. InProc. of EC, pages 355–358, 2007.

7. C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.

8. C. Daskalakis, A. Mehta, and C. H. Papadimitriou. A note on approximate Nash equilibria. Theoretical Computer Science, 410(17):1581–1588, 2009.

9. J. Fearnley and R. Savani. Finding approximate Nash equilibria of bimatrix games via payoff queries. InProc. of EC, pages 657–674, 2014.

10. J. Fearnley, P. W. Goldberg, R. Savani, and T. B. Sørensen. Approximate well-supported Nash equilibria below two-thirds. InProc. of SAGT, pages 108–119, 2012. To appear inAlgorithmica.

11. J. Fearnley, M. Gairing, P. W. Goldberg, and R. Savani. Learning equilibria of games via payoff queries. InProc. of EC, pages 397–414, 2013.

12. J. Fearnley, T. P. Igwe, and R. Savani. An empirical study of finding approximate equilibria in bimatrix games. InProc. of SEA, pages 339–351, 2015.

13. P. W. Goldberg and A. Pastink. On the communication complexity of approximate Nash equilibria.Games and Economic Behavior, 85:19–31, 2014.

14. P. W. Goldberg and A. Roth. Bounds for the query complexity of approximate equilib-ria. InProc. of EC, pages 639–656, 2014.

15. S. Hart and Y. Mansour. How long to equilibrium? the communication complexity of uncoupled equilibrium procedures. Games and Economic Behavior, 69(1):107–126, 2010.

(29)

16. S. C. Kontogiannis and P. G. Spirakis. Well supported approximate equilibria in bima-trix games.Algorithmica, 57(4):653–667, 2010.

17. R. J. Lipton, E. Markakis, and A. Mehta. Playing large games using simple strategies. InProc. of EC, pages 36–41, 2003.

18. J. Nash. Non-cooperative games. The Annals of Mathematics, 54(2):286–295, 1951. 19. A. Rubinstein. Settling the complexity of computing approximate two-player Nash

equilibria. InProc. of FOCS, pages 258–265, 2016.

20. H. Tsaknakis and P. G. Spirakis. An optimization approach for approximate Nash equilibria. Internet Mathematics, 5(4):365–382, 2008.