The Study of Strategic form of Repeated Game Theory

(1)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 4, April 2016)

295

The Study of Strategic form of Repeated Game Theory

V.Vinoba

1

, P. Hema

2

1

K.N. Government Arts College, Tamil Nadu, India

2_{Assistant Professor, RMKCET}

Abstract— In this paper, we consider a strategic form behavioural game theory. It is standard multi agent settings to assume that agents will adopt Nash equilibrium strategies. This paper gives a brief overview of game theory. Therefore in the first section we want to outline what game theory generally is and where it is applied. In the next section, we introduce some of the most important terms of Repeated game theory such as strategic form (or) normal form games, semi-extensive form and Nash equilibrium.

Keywords— Game theory, Semi-extensive form strategic form, Nash equilibrium

I. INTRODUCTION

A game is a description of strategic interaction that includes the constraints on the actions that the players can take and the player‟s interests, but does not specify the actions that the players do take. A solution is a systematic description of the outcomes that may emerge in a family of games. Game theory suggests reasonable solutions for classed of games and examines their properties. Nash equilibrium is one of the most basic concepts in game theory.

II. GAME THEORY

A game is made of three basic components: a set of players, a set of actions, and a set of preferences. These are collectively known as the rules of the game, and the modeller‟s objective is to describe a situation in terms of the rules of a game so as to explain what will happen in that situation. Trying to maximize their payoffs, the

player will devise plane known as strategies that pick

actions depending on the information that has arrived at each moment. The combination of strategies chosen by each player is known as the equilibrium. Given an equilibrium, the modeller can see what actions come out of the conjunction of all the players‟ plans, and this tells him the outcome of the game.

The number of players will be denoted by n . Let us

label the players with the integers 1 to n , and denote the

set of players by

N





1,2,...

n



. We assume

throughout that there are atleast two players, that is

n 

2

. There are three main mathematical models or forms

used in the study of games,(i) the extensive form (ii) the strategic or normal form and (iii) the coalitional form.

In the strategic form, many of the details of the game such as position and move are lost; the main concepts are those of a strategy and a payoff. In the strategic form, each player chooses a strategy from a set of possible strategies

We denote the strategy set or action space of player i

by Each player considers all the other players and their possible strategies, and then chooses a specific strategy from his strategy set. All players make such a choice simultaneously, the choices are revealed and the game ends with each player receiving some payoff. Each player‟s choice may influence the final outcome for all players. We model the payoffs as taking as numerical values. The mathematical and philosophical justification behind the assumption that each player can replace such

payoffs with numerical values is said to be utility theory.

We therefore assume that each player receives a numerical payoff that depends on the actions chosen by all the players.

The strategic form of a game is defined by the three objects:

(i) the set,

N





1,2,...

n



of players.

(ii)the sequence A1,A2,…….An of strategy set of players ,and

(iii)the sequence f1(a1,…..an)……..fn(a1……an) of real valued

payoff functions of the players.

A game in strategic form is said to be zero-sum if the sum of the payoffs to the players is zero no matter what actions are by the players.

(2)

International Journal of Emerging Technology and Advanced Engineering

296

We will develop a useful formalism, the semi-extensive form, for analysing repeated games, i.e. those which are repetitions of the same one-shot game (called the stage game). We will describe strategies for such repeated games as sequences of history-dependent stage-game strategies. The payoffs to the players in this repeated game will be functions of the stage-game payoffs. We will define the concept of Nash equilibrium and—after identifying the sub games in this formalism—the concept of a sub game-perfect equilibrium for a repeated game.

We will prove one result which applies to both finitely and infinitely repeated games, which makes it easy to identify at least some of the Nash equilibrium of a repeated game: any sequence of stage-game Nash equilibrium is a sub game-perfect repeated-game equilibrium.

We will then focus our attention on finitely-repeated games. We will exploit the existence of a final period— which wouldn‟t exist in an infinitely-repeated game—to establish a necessary condition for a repeated-game strategy profile to be a Nash equilibrium: the last period‟s play must be Nash on the equilibrium path.

We then turn to sub game perfection within the finitely-repeated game context and prove two theorems about sub game-perfect equilibrium. The first is a necessity theorem which is much stronger than the one we obtained for Nash equilibrium: the last period‟s play must be Nash even off the equilibrium path. The second pertains to the special case in which the stage game has a unique Nash equilibrium payoff.1 In this case the sub game-perfect equilibrium of the repeated game are any period-by-period repetitions of the stage-game Nash equilibrium.

Consider a game G (which we‟ll call the stage game or

the constituent game). As usual we let the player set be I={1,…,n}. In our present repeated-game context it will be clarifying to refer to a player‟s stage game choices as

actions rather than strategies. (We‟ll reserve “strategy” for

choices in the repeated game.) So each player has a

pure-action space Ai. The space of action profiles is

i i

i

A

X

A



_ . Each player has a von Neumann-Morgenstern

utility function defined over the outcomes of G,

R

A

g

i

:



. Here we use “U” for the payoff to the

entire repeated game.)

Let G be played several times (perhaps an infinite number of times) and award each player a payoff which is the sum of the payoffs she got in each period from playing G. Then this sequence of stage games is itself a game: a

repeated game or a super game.

Two statements are implicit when we say that in each

period we‟re playing the same stage game: a For each

player the set of actions available to her in any period in the game G is the same regardless of which period it is and regardless of what actions have taken place in the past.

The payoffs to the players from the stage game in any period depend only on the action profile for G which was played in that period, and this stage-game payoff to a player for a given action profile for G is independent of which period it is played.

Statements a and b are for our repeated game is

stationary (or, alternatively, independent of time and

history). This does not mean the actions themselves must

be chosen independently of time or history.

Then we interpret a and b above as saying that the payoff matrix is the same in every period.

We make the typical “observable action” or “standard signaling” assumption that the play which occurred in each repetition of the stage game is revealed to all the players before the next repetition. Therefore even if the stage game is one of imperfect information (as it is in simultaneous-move games)—so that during the stage game one of the players doesn‟t know what the others are doing/have done

that period—each player does learn what the others did

before another round is played. This allows subsequent choices to be conditioned on the past actions of other players.

IV. HISTORY AND PERIOD-TSTAGE-GAME STRATEGIES

We let the first period be labeled t=0. The last period, if

one exists, is period T, so we have a total of T+1 periods in

our game. We allow the case where T=



, i.e. we can have

an infinitely repeated game.

We‟ll refer to the action of the stage game G which player i executes in period t as

a

_it

The action profile played in period t is just the n-tuple of

individuals‟ stage-game actions



t



n t

t

a



₁

,

₂

,

...

…………. (1)

We want to be able to condition the players‟ stage-game action choices in later periods upon actions taken earlier by

other players. To do this we need the concept of a history: a

description of all the actions taken up through the previous

period. We define the history at time t to be



0 1 1



...

,





t

a

(3)

International Journal of Emerging Technology and Advanced Engineering

297

In other words, the history at time t specifies which stage-game action profile (i.e., combination of individual stage-game actions) was played in each previous period.

Note that the specification of

h

tincludes within it a

specification of all previous histories



h

0

,

h

1

,

...

h

t1



.

For example, the history

h

t is just the concatenation of

1  t

h

with the action profile

a

t1; i.e.

h

t=(

h

t1 ,

a

t1 ) The history of the entire

game is

h

T1





a

0

,

a

1

,

...

a

T



Note also that the set of all possible histories

h

t at time t is just

A

X

A

t

t 1

0 



……… (3)

To condition our strategies on past events, then, is to

make them functions of history. So we write player i‟s

period-t stage-game strategy as the function

s

_it, where

t i

a

=

s

_it(

h

t) is the stage-game action would play in period

t if the previous play had followed the history

h

t. A

player‟s stage-game action in any period and after any history must be drawn from her action space for that period, but because the game is stationary her stage-game

action space Ai does not change with time. Therefore we

write:



 





i

t t i t t

A

h

s

A

h

t

I

i









(

)

. Alternatively, we

can write





i



I

 



t

s

_it:

A

_it



A

_i

The period-t stage game strategy profile

s

tis



t



n t t t

s



₁

,

₂

...

This profile can be described by

s

_it:

s

t

;

A

t



A

i.e)



(

),

(

)...

.

(

)



)

(

,

t t ₁t t ₂t t t_n t

t t

h

s

h

s

h

s

h

s

A

h







V. STRATEGIES IN THE REPEATED GAME

So far we have been referring to stage-game strategies for a particular period. Now we can write, using these stage-game entities as building blocks, a specification for a

player‟s strategy for the repeated game. We write player i’s

strategy for the repeated game as

……….(5)

i.e. a (T+1)-tuple of history-contingent player-i

stage-game strategies. Each

s

_it takes a history as

h

t



A

t its

argument. The space Si of player-i repeated-game strategies

is the set of all such (T+1)-tuples of player-i stage game strategies

s

_it:

A

t



A

.

We can write a strategy profile s for the repeated game in two ways. We can write it as the n- tuple

Profile of players repeated game strategies



s

n



s



1, 2

,...

…………(6) as defined in (5).

Alternatively, we can write the repeated-game strategy profile s as



T



s



0

,

1

,...

………….(7)

i.e., as a collection of stage-game strategy profiles, one for each period, as defined in (4).

VI. PLAYING THE REPEATED GAME

In (6) and (7) we have written a repeated-game strategy profile as a collection of functions of histories. It is interesting to note that a repeated-game strategy profile is not itself a function of histories. The reason is simple: Once every one of the individual players‟ repeated-game strategies

s

_i is specified ( or alternatively once each time

period‟s history-contingent stage-game strategy profile t

s

is specified), the sequence of actions, and therefore the history of the entire game, is determined. Perhaps this will become clearer when we see below explicitly how the game is played out.

Let‟s see how this repeated game is played out once

every player has specified her repeated-game strategy si. It

is more convenient at this point to view this repeated-game strategy profile as expressed in (7), i.e. as a sequence of

T+1 history-dependent stage-game strategy profiles. When

the game starts, there is no past play, so the history

h

0 is

degenerate: Every player executes her

a

_i0



s

_i0stage-game

strategy from (5). This zero-th period play generates the history

h

1



(

a

0

)

, where

a

0





a

₁0_,

a

₂0

,...

a

_n0



. This history is then revealed to the players so that they can

condition their period-1 play upon the period-0 play. Each

(4)

International Journal of Emerging Technology and Advanced Engineering

298

Consequently, in the t=1 stage game the stage-game

strategy profile

(

)



11

(

1

),...

1

(

1

)



1 1

h

s

h

s

h

s

a

n

t



is

played. In order to form the updated history this stage-game strategy profile is then concatenated onto the

previous history:

h

2





a

0

,

a

1



. This new history is

revealed to all the players and they each then choose their

period-2 stage-game strategy

s

_i2

(

h

2

)

. And so it goes.…

We say that

h

T1 is the path generated by the

repeated-game strategy profile s.

VII. PAYOFF IN REPEATED GAME

To complete our description of the semi-extensive form we need to define the payoffs for this repeated game.

Consider some repeated-game strategy profile s and some

time period t. As the game is played out according to this

repeated-game strategy profile a history ht of actions up to

this period is compiled. The history-contingent stage-game strategy profile to be played this period is

s

t

(

h

t

)

The

payoff to player i from the period-t stage game when each

player executes her component of the stage-game strategy profile

s

t

(

h

t

)

is

g

_i

(

s

t

(

h

t

))

We define the payoff to

player i for the repeated game, when the repeated game

strategy profile s [as expressed in (7)] is played, to be









T

t

t t i t s

i

g

s

h

u

0

)

(



…….(8)

Where



is the common discount factor. Note the

appearance of the histories ht in the summand of (8).

These are the histories which are generated sequentially as the game is played. I.e

h

t





h

t1

,

h

t

(

s

t1

)



.

8. Equilibrium in Repeated Games Nash Equilibrium in the Repeated Game

Now that we know what strategies and payoffs are in the repeated game, we are in position to make clear what a Nash equilibrium is in this semi-extensive form of a repeated game. Now we will use the expression of a repeated-game strategy profile as a collection of players‟ repeated-game strategies, as in (6). As usual, we say that a

strategy profile s for the repeated game is a Nash

equilibrium if each player‟s part of

s

—i.e., the strategy

i

s

i, as given by (5)—is a best response given that the other

players are playing their parts of

s

.

More formally, we say that the repeated-game strategy

profile

s

is a Nash equilibrium if for all players i



i i



i S s

s

u

s

i i

 



arg

max

,

…………..(9) Let T1

h

be the

history generated by the repeated-game strategy profile

s

;

i.e. let

h

T1 be the path associated with

s

. If

s

is a

Nash-equilibrium strategy profile, then

h

T1 is its

associated equilibrium path.

VIII. CONCLUSION

Note from (8) that

u

_i

(

s

)

depends only upon

stage-game strategy profiles played along the equilibrium path. Therefore in a Nash equilibrium each player‟s repeated-game strategy need only be optimal along the equilibrium path.

REFERENCES

[1] F. Akyldiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, ―Wireless sensor networks: a survey,‖ Computer Networks, vol.38, pp:393-422

[2] I. F. Akyildiz and I. H. Kasimoglu, ―Wireless sensor and actor networks:research challenges,‖ to be published Ad Hoc Networks, 2004.

[3] T. Basar and G. T. Olsder, Dynamic Non cooperative Game Theory, 2nd_{Ed., Society of Industrial and Applied Mathematics, 1999.}

[4] S. Buchegger and J. L. Boudec, ―Performance Analysis of the CONFIDANT Protocol Cooperation Of Nodes-Fairness In Dynamic Ad-hoc NeTworks,‖ MobiHoc, 2002.

[5] N. Bulusu, D. Estrin, L. Girod and J. Heidemann, ―Scalable coordination for wireless sensor networks: self-configuring localization systems,‖International Symposium on Communication Theory and Applications [6] Dan Li, Kerry D. Wong, Yu Hen Hu, Akbar M. Sayeed."Detection,

Classification, and Tracking of Targets,"IEEE Signal Processing Magazine, Volume 19, pp: 17-19, (2002).

[7] M. Felegyhazi, L. Buttyan and J. P. Hubaux, ―Equilibrium Analysis of Packet Forwarding Strategies in Wireless Ad Hoc Networks the Static Case,‖ Proceedings of Personal Wireless Communications (PWC „03),

[8] Wei-Peng Chen, Hou, J.C, Lui Sha. "Dynamicclustering for acoustic target tracking in wireless sensornetworks", IEEE Transactions on Mobile Computing,Volume 3, pp: 258-271, (2004).

[9] L. Buttyan and J.P. Hubaux, ―Nuglets: AVirtual Currency to Stimulate Cooperation in Self organized Mobile Ad-Hoc Networks,‖Technical Report DSC/2001/001,Swiss Fed. Inst. Of Technology, Jan. 2001 [10] S. Marti, T. Giuli, K. Lai and M. Baker, ―Mitigating routing

misbehavior in mobile ad hoc networks,‖ Proceedings of Mobicom 2000, Boston, MA,USA, August 2000.

(5)

International Journal of Emerging Technology and Advanced Engineering

299

[12] A. Perrig, R. Szewczyk, V. Wen, D. Culler and J. D. Tygar, ―SPINS: Security Protocols for Sensor Networks,‖ MobiCom, pp: 189-199, July 2001.

[13] V. Srinivasan, P. Nuggehalli, C. Chiasserini, and R. ao,―Cooperation in Wireless Ad Hoc Networks,‖ Proc. IEEEINFOCOM, vol. 2, pp. 808-817, Apr. 2003

[14] E. Shih, S. Cho, N. Ickes, R. Min, A. Sinha, A. Wang and A.Chandrakasan, ―Physical layer driven protocol and algorithm design for energy-efficient wireless sensor networks,‖ Proceedings of ACMMobiCom, Italy, pp:272-286, July 2001

[15] Nuggehalli, C. F. Chiasserini and R. R. Rao, ―Cooperation in Wireless Ad Hoc Networks,‖ INFOCOM,