Nash Memory - MoL 2017 22: Turing Learning with Nash Memory

4.1.1 Memory Methods and Solution Concepts

In coevolution, each player manages a population of candidate solutions. Using the “methods of fitness assignment”, the fitter candidates are “remembered” and remain in the population, while the rest are “forgotten” before the next generation. The problem of designing a heuristic “memory mechanism” is to prevent the forgetting of good candidate solutions with certain traits. The contribution of a trait to fitness may be highly contingent upon the context of evaluation. Even when candidate solutions with and without a certain trait are equally fit, the candidates are at risk of drifting due to sampling error in the population dynamics, leading to the loss of traits [8]. A solution concept specifies precisely which subset of candidate solutions qualifies as optimal and is to be kept in the population [25]. This is a more general definition of the “methods of fitness assignment” introduced in Section 3.1.2. The study of solution concepts is important for evolutionary mechanisms that deal with candidate solutions with many traits while maintaining only a limited number of solutions in memory [9]. As described in Section 3.2, the ING game is a coevolutionary domain permeated by intransitive cycles. For a coevolutionary domain where a strategy consists of n integers, there are n traits to be “watched” by the solution concept.

Most solution concepts in the literature are instances of a general Best-of-Generation (BOG) model, where the fittest individuals over the past few generations are retained in the memory. In contrast to this approach, Stanley and Miikkulainen proposed the dominance tournament (DT) [29]. The principle is to add the fittest individual of a generation to the memory only if it beats all the individuals already in the memory. This prevents intransitive relations, such as those of the ING game, from arising among the stored individuals.
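The admission rule of the dominance tournament can be sketched as follows. This is a minimal illustration rather than the implementation of [29]: the `beats` predicate, the function name and the rock-paper-scissors domain are assumptions chosen only to show how an intransitive challenger is rejected.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")


def dominance_tournament_update(memory: List[S], champion: S,
                                beats: Callable[[S, S], bool]) -> bool:
    """Append `champion` to `memory` only if it beats every stored individual.

    Returns True when the champion was admitted.
    """
    if all(beats(champion, old) for old in memory):
        memory.append(champion)
        return True
    return False


# Hypothetical intransitive domain: rock-paper-scissors.
RPS_WINS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def rps_beats(a: str, b: str) -> bool:
    return (a, b) in RPS_WINS


memory: List[str] = []
admitted = [dominance_tournament_update(memory, c, rps_beats)
            for c in ["rock", "paper", "scissors"]]
```

In this toy run, rock and paper are admitted, while scissors is rejected because it loses to rock, so the stored individuals never form a cycle.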

4.1.2 Introduction to Nash Memory

Nash Memory [8] is a solution concept introduced by Ficici and Pollack as a memory mechanism for coevolution. It was designed to better handle cases where one or more previously acquired traits are needed only later in a coevolutionary process. This memory mechanism maintains two sets of solutions and uses a game-theoretical method for selecting candidates, in order to better balance different traits in coevolution. Before introducing the memory mechanism in detail, some concepts need to be clarified:

Support Sup(s): The support of a mixed strategy $s$ is the set of pure strategies $s_i$ with non-zero probability: $\mathrm{Sup}(s) = \{ s_i \mid \Pr(s_i \mid s) > 0 \}$. For a set of mixed strategies $S$, we define $\mathrm{Sup}(S) = \bigcup_i \mathrm{Sup}(s_i),\ s_i \in S$.¹

Security set Sec(s): The security set of a strategy $s$ is the set of pure strategies against which $s$ has a non-negative expected payoff. In other words, $\mathrm{Sec}(s) = \{ s_i \mid E(s, s_i) \geq 0 \}$. Similarly, $\mathrm{Sec}(S) = \bigcap_i \mathrm{Sec}(s_i),\ s_i \in S$.²

Domination: A strategy $s_a$ dominates a strategy $s_b$ if $E(s_a, s) \geq E(s_b, s)$ for every strategy $s$, and there exists a strategy $s'$ such that $E(s_a, s') > E(s_b, s')$.

Support-secure: If the security set of a strategy contains its support, then the strategy is support-secure.
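The definitions above can be made concrete with a small sketch. It assumes a hypothetical symmetric zero-sum game (rock-paper-scissors) and represents a mixed strategy as a dictionary from pure strategies to probabilities; the function names are illustrative and do not come from [8].

```python
from typing import Callable, Dict, Iterable, Set

Mixed = Dict[str, float]              # pure strategy -> probability
Payoff = Callable[[str, str], float]  # payoff of first pure vs. second pure

# Hypothetical example game: rock-paper-scissors (symmetric, zero-sum).
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def rps_payoff(a: str, b: str) -> float:
    if (a, b) in BEATS:
        return 1.0
    if (b, a) in BEATS:
        return -1.0
    return 0.0


def support(s: Mixed) -> Set[str]:
    """Sup(s): pure strategies played with non-zero probability."""
    return {si for si, p in s.items() if p > 0}


def expected(s: Mixed, opponent: str, payoff: Payoff) -> float:
    """E(s, s_i): expected payoff of mixed strategy s against pure s_i."""
    return sum(p * payoff(si, opponent) for si, p in s.items())


def security_set(s: Mixed, pures: Iterable[str], payoff: Payoff) -> Set[str]:
    """Sec(s): pure strategies against which s has non-negative payoff."""
    return {t for t in pures if expected(s, t, payoff) >= 0}


def is_support_secure(s: Mixed, pures: Iterable[str], payoff: Payoff) -> bool:
    """True when Sup(s) is contained in Sec(s)."""
    return support(s) <= security_set(s, pures, payoff)


PURES = ["rock", "paper", "scissors"]
half_half = {"rock": 0.5, "paper": 0.5}
uniform = {p: 1 / 3 for p in PURES}
```

Here `half_half` loses on average to paper, so paper is in its support but not in its security set and the strategy is not support-secure, whereas the uniform mixture is secure against all three pure strategies.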

Most population management methods in coevolutionary computation maintain a population of candidate solutions according to some explicit organizing principle discovered over generations. In contrast, Nash Memory uses an implicit measurement [8] based on pairs of mixed strategies retrieved from Nash equilibria. These pairs are recommendations to the players on which action to take. The following (non-zero-sum) example gives a taste of this approach. The game is an extended version of the battle of the sexes game described in Section 2.2.3. Now there are four options for the man: a) watch a football match, b) watch an opera, c) prepare for an exam and d) write a thesis.

$$M_{\text{man}} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \\ 1 & 2 \\ 8 & -1 \end{pmatrix}, \qquad M_{\text{woman}} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ -1 & -6 \\ -8 & 4 \end{pmatrix}$$

¹ The support of a strategy s was named C(s) in the original paper [8].

² The security set of a strategy was named S(s) in the original paper [8].

Classical explicit methods would calculate the fitness of each strategy by summing its payoffs uniformly, i.e. by summing the man's payoffs for that strategy over the woman's options. The fitness values of the man's four strategies are thus 3, 2, 3 and 7 respectively. However, Nash Memory gives a different solution concept. It computes the Nash equilibria first:

1. The man plays b) and c) with probabilities 0.625 and 0.375 respectively, while the woman goes to the opera.

2. The man plays c) and d) with probabilities of approximately 0.706 and 0.294 respectively, while the woman goes to the opera with probability 0.7.

3. Both go to the opera (i.e. the man plays strategy b).

Imagine the man has to remove an option for next week (i.e. forget a strategy in the memory). According to the fitness computed by the explicit method, he should forget about going to the opera. However, a) does not participate in any Nash equilibrium; thus, using the approach of Nash Memory, the man would instead stop considering watching a football match next week.
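The example can be checked with a short script. It is a sketch under the assumption that the rows of both matrices are the man's options a) to d) and the columns are the woman's options (football match, opera); the equilibrium probabilities 12/17, 5/17 and 7/10 are the exact fractions behind the rounded values in the list, and all names are illustrative.

```python
from fractions import Fraction as F

M_MAN = [[3, 0], [0, 2], [1, 2], [8, -1]]
M_WOMAN = [[2, 0], [0, 3], [-1, -6], [-8, 4]]
OPTIONS = ["a", "b", "c", "d"]


def uniform_fitness(matrix):
    """Explicit fitness: sum of each strategy's payoffs over all opponents."""
    return [sum(row) for row in matrix]


def is_nash(x, y):
    """Check that neither player gains by deviating to a pure strategy.

    x is the man's mixed strategy (length 4), y the woman's (length 2).
    """
    man_pure = [sum(M_MAN[i][j] * y[j] for j in range(2)) for i in range(4)]
    woman_pure = [sum(M_WOMAN[i][j] * x[i] for i in range(4)) for j in range(2)]
    man_value = sum(x[i] * man_pure[i] for i in range(4))
    woman_value = sum(y[j] * woman_pure[j] for j in range(2))
    return max(man_pure) <= man_value and max(woman_pure) <= woman_value


EQUILIBRIA = [
    ([F(0), F(5, 8), F(3, 8), F(0)], [F(0), F(1)]),
    ([F(0), F(0), F(12, 17), F(5, 17)], [F(3, 10), F(7, 10)]),
    ([F(0), F(1), F(0), F(0)], [F(0), F(1)]),
]

fitness = uniform_fitness(M_MAN)
least_fit = OPTIONS[fitness.index(min(fitness))]
all_nash = all(is_nash(x, y) for x, y in EQUILIBRIA)
used = {OPTIONS[i] for x, _ in EQUILIBRIA for i in range(4) if x[i] > 0}
unused = sorted(set(OPTIONS) - used)
```

Under these assumptions, the explicit method singles out b) as the least fit option, while a) is the only option that appears in the support of none of the three equilibria.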

4.1.3 The Operation of Nash Memory

The Nash Memory mechanism consists of two mutually exclusive sets of pure strategies, N and M [8]. N is defined as the support of the Nash strategy $s_N$, which is at least secure against the elements of N and M ($\mathrm{Sec}(s_N) \supseteq N \cup M$). The objective of this set is to represent a mixed strategy that is secure against the candidate solutions discovered thus far. The security set is expected to increase monotonically as the search progresses, thereby forming a better and better approximation of strategies for the game. M, on the other hand, plays the role of a memory or an accumulator. It contains pure strategies that are not currently in N but were in the past, and that may be in N again in the future. By the definition in [8], N is unbounded in size and M is a finite set. The size limit of M is c, known as the capacity of the memory (written $b_m$ in Algorithm 4). For simplicity, in this section we assume there is only one Nash equilibrium.

Initialization

Both N and M are initialized as empty sets. Let T be the first set delivered by the search heuristic.³ We initialize N so that $\mathrm{Sup}(N) \subseteq T$ and $T \subseteq \mathrm{Sec}(N)$.

Data: the size limit of M, b_m
Result: the best strategy at the end of the iteration

Initialize N and M as described in Section 4.2.
while the termination condition is not satisfied do
    Obtain a set of new strategies T;
    Evaluate the strategies in T against the current Nash equilibria and store the winners W;
    Obtain U = N ∪ M ∪ W;
    Evaluate the candidate strategies in U and obtain the payoff values;
    Compute the Nash equilibria and update N;
    Update M as described in Section 4.1.3 using b_m.
end
return the best-fit pure strategies in N.

Algorithm 4: Nash Memory.

Figure 4.1: The Updating of N and M in Nash Memory. (Diagram labels: T, W, N, M, U = W ∪ N ∪ M, N′, M′.)

Updating N and M

We compute a set $W = \{ t \in T \mid E(t, s_N) > 0 \}$ as the winners against the current best strategy $s_N$. Using linear programming, we can obtain a new Nash strategy $s'_N$ out of the set $U = W \cup N \cup M$.⁴

Next, $N' := \mathrm{Sup}(s'_N)$ and $M' := U - N'$. This process guarantees that $W \subseteq \mathrm{Sec}(N')$. Figure 4.1 gives an illustration of this process. Note that the size of the updated $M'$ may exceed the capacity $b_m$. A number of policies can be used to reduce the size of $M'$:
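One update step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this project: the Nash solver is passed in as a callable (`solve_nash`, a hypothetical name), and for the rock-paper-scissors demonstration it is replaced by a lookup table of known equilibria instead of a linear program.

```python
from typing import Callable, Dict, Set, Tuple

Mixed = Dict[str, float]  # pure strategy -> probability

# Hypothetical example game: rock-paper-scissors (symmetric, zero-sum).
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def rps_payoff(a: str, b: str) -> float:
    if (a, b) in BEATS:
        return 1.0
    if (b, a) in BEATS:
        return -1.0
    return 0.0


def expected(t: str, s_n: Mixed, payoff: Callable[[str, str], float]) -> float:
    """E(t, s_N): payoff of pure strategy t against the mixed strategy s_N."""
    return sum(p * payoff(t, si) for si, p in s_n.items())


def update_nash_memory(N: Set[str], M: Set[str], T: Set[str], s_n: Mixed,
                       payoff: Callable[[str, str], float],
                       solve_nash: Callable[[Set[str]], Mixed]
                       ) -> Tuple[Set[str], Set[str], Mixed]:
    W = {t for t in T if expected(t, s_n, payoff) > 0}   # winners against s_N
    U = W | N | M
    new_s_n = solve_nash(U)                              # external solver (assumed)
    new_N = {si for si, p in new_s_n.items() if p > 0}   # N' = Sup(s'_N)
    new_M = U - new_N                                    # M' = U - N'
    return new_N, new_M, new_s_n


# Stand-in for an external solver: known equilibria of the restricted games.
KNOWN_EQUILIBRIA = {
    frozenset({"rock", "paper"}): {"paper": 1.0},
    frozenset({"rock", "paper", "scissors"}):
        {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3},
}


def lookup_solver(U: Set[str]) -> Mixed:
    return KNOWN_EQUILIBRIA[frozenset(U)]


N0, M0, s0 = {"rock"}, set(), {"rock": 1.0}
N1, M1, s1 = update_nash_memory(N0, M0, {"paper", "scissors"}, s0,
                                rps_payoff, lookup_solver)
N2, M2, s2 = update_nash_memory(N1, M1, {"scissors"}, s1,
                                rps_payoff, lookup_solver)
```

In this toy run, rock is first moved from N to M when paper arrives, and returns to N once scissors is discovered, which is the kind of trait recovery the memory is meant to support. Trimming the resulting memory to its capacity is left out of this sketch.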

• remove those with the lowest fitness value, calculated in the classical way.
• remove those that participated the least over the past generations.
• remove items at random, first from M and then from those released from N.
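Two of these policies can be sketched as follows. The function names, the `fitness` callable and the reuse of the fitness values from the example in Section 4.1.2 are assumptions made for illustration; the random policy follows one reading of the third bullet, in which old items of M are dropped before items newly released from N.

```python
import random
from typing import Callable, Set


def trim_by_fitness(memory: Set[str], capacity: int,
                    fitness: Callable[[str], float]) -> Set[str]:
    """Keep the `capacity` items with the highest classical fitness."""
    ranked = sorted(memory, key=fitness, reverse=True)
    return set(ranked[:capacity])


def trim_at_random(old_M: Set[str], released: Set[str], capacity: int,
                   rng: random.Random) -> Set[str]:
    """Drop random items from the old M first, then from those released from N."""
    old, rel = sorted(old_M), sorted(released)
    rng.shuffle(old)
    rng.shuffle(rel)
    excess = len(old) + len(rel) - capacity
    drop_old = min(max(excess, 0), len(old))
    drop_rel = min(max(excess - drop_old, 0), len(rel))
    return set(old[drop_old:]) | set(rel[drop_rel:])


# Fitness values reused from the four-option example, for illustration only.
UNIFORM_FITNESS = {"a": 3, "b": 2, "c": 3, "d": 7}
kept_by_fitness = trim_by_fitness({"a", "b", "c", "d"}, 3, UNIFORM_FITNESS.get)
kept_at_random = trim_at_random({"x", "y", "z"}, {"r"}, 2, random.Random(0))
```

With a capacity of 3, the fitness policy drops b), the option with the lowest explicit fitness. The random policy keeps the released item `r` here because the excess can be absorbed entirely by the old contents of M.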

⁴ To verify that a mixed strategy is Nash, Ficici and Pollack proposed a simple testing method [8]. In this project, we use external game solvers, so no verification is necessary.
