Games on graphs - Zero-sum games - Games for optimal controller synthesis on hybrid automata

2.2 Zero-sum games

2.2.2 Games on graphs

We will be considering turn-based games on doubly-weighted labelled transition systems. A labelled transition system is a (possibly infinite) graph with labels assigned to edges. In this context, vertices are referred to as states, edges are referred to as transitions, and the set of edges is referred to as the labelled transition relation. The two weight functions assign real numbers to every transition, and are referred to as the price and reward functions.

To define turn-based games on labelled transition systems, we need a notion of a game graph. We obtain a game graph from a doubly-weighted transition system by partitioning its set of states between the two players, Min and Max. Intuitively, the game is played by moving a token along the transitions, from one state to another. Every move incurs a price and a reward. The owner of the current state decides along which transition to move the token.

A game graph can be seen as an alternative to the strategic form representation of a zero-sum game. In this context a payoff function determines how the prices (rewards) of individual moves contribute to the overall value of a particular play.

Transition systems. We will first introduce a doubly-weighted labelled transition system, then we will explain how to define a game graph, and finally we will define the notion of a game.

Definition 2.12 (Doubly-weighted labelled transition system). A doubly-weighted labelled transition system, or simply a transition system, $T = \langle S, \Lambda, \to, \pi, \rho \rangle$ consists of the following components:

State space. A (possibly infinite) set S.

Transition relation. A (possibly infinite) set of transition labels $\Lambda$ and a ternary transition relation ${\to} \subseteq S \times \Lambda \times S$. An element of $\to$ will be referred to as a transition, and we will write $s \xrightarrow{\lambda} s'$ to denote the transition $(s, \lambda, s') \in {\to}$. Note that $\to$ induces a binary relation ${\xrightarrow{\lambda}} \subseteq S \times S$ for every label $\lambda \in \Lambda$.

Weight functions. Two weight functions $\pi, \rho : S \times \Lambda \times S \to \mathbb{R}$ that assign a real number to every transition. The functions $\pi$ and $\rho$ are referred to as the price and reward functions, respectively.

We say that the transition system $T$ is deterministic if, for every $\lambda \in \Lambda$, the binary relation $\xrightarrow{\lambda}$ is in fact a partial function ${\xrightarrow{\lambda}} : S \to S$.
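The determinism condition can be checked mechanically on a finite fragment of a transition system. The following is a minimal sketch, assuming the transition relation is given as a finite collection of `(s, label, s')` triples; the function name `is_deterministic` and this representation are illustrative choices, not part of the formal development.

```python
from typing import Dict, Hashable, Iterable, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (s, label, s')


def is_deterministic(transitions: Iterable[Transition]) -> bool:
    """Return True if every (state, label) pair has at most one successor."""
    successor: Dict[Tuple[Hashable, Hashable], Hashable] = {}
    for (s, label, target) in transitions:
        # setdefault stores the first successor seen for (s, label);
        # a different successor later on violates determinism.
        if successor.setdefault((s, label), target) != target:
            return False
    return True
```

For instance, the relation containing `("s", "a", "t")` and `("s", "b", "u")` passes the check, whereas adding `("s", "a", "u")` makes the label `a` ambiguous in state `s`.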

Remark 2.13. In reachability-price games the reward information is not taken into account, and hence the function $\rho$ will be omitted from the specification in this context.

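We now briefly recall the standard notions related to transition systems and their executions, also called runs. Given a state $s$, a run of the transition system from $s$ is a (possibly infinite) sequence of transitions $\omega = s_0 \xrightarrow{\lambda_1} s_1 \xrightarrow{\lambda_2} s_2 \cdots$, where $s = s_0$. If two runs $\omega = s_0 \xrightarrow{\lambda_1} \cdots \xrightarrow{\lambda_k} s_k$ and $\omega' = s'_0 \xrightarrow{\lambda'_1} \cdots$ are such that $s_k = s'_0$, then $\omega\omega'$ denotes the run $s_0 \xrightarrow{\lambda_1} \cdots \xrightarrow{\lambda_k} s_k \xrightarrow{\lambda'_1} \cdots$. Given a finite run $\omega$, $\mathrm{Len}(\omega)$ will denote its length, i.e., the total number of transitions, $\mathrm{Last}(\omega)$ will denote its final state, and $\omega_n$ will denote its prefix of length $n$, where $n \le \mathrm{Len}(\omega)$. The set of all runs of $T$ is denoted by $\mathrm{Runs}$. The set of all finite runs of $T$ is denoted by $\mathrm{Runs}_{\mathrm{fin}}$. Note that $\mathrm{Runs}_{\mathrm{fin}} \subseteq \mathrm{Runs}$. We will also write $\mathrm{Runs}(s)$ ($\mathrm{Runs}_{\mathrm{fin}}(s)$) to denote the set of all runs (all finite runs) starting in a state $s$. Whenever the transition system $T$ is not clear from the context, we will use a superscript $T$, e.g., we will write $\mathrm{Runs}^{T}_{\mathrm{fin}}$ to denote the set of finite runs of the transition system $T$.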

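Given a finite run $\omega \in \mathrm{Runs}_{\mathrm{fin}}$, its price is the accumulated price of all of its transitions, i.e.:

$$\mathrm{Price}(\omega) = \sum_{i=1}^{\mathrm{Len}(\omega)} \pi(\lambda_i),$$

and its reward is the accumulated reward of all of its transitions, i.e.:

$$\mathrm{Reward}(\omega) = \sum_{i=1}^{\mathrm{Len}(\omega)} \rho(\lambda_i).$$

Here $\pi(\lambda_i)$ and $\rho(\lambda_i)$ abbreviate the price and the reward of the $i$-th transition $s_{i-1} \xrightarrow{\lambda_i} s_i$ of $\omega$.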
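As a minimal sketch of these two definitions, the following Python fragment accumulates the price and the reward of a finite run. The representation of a run as a list of `(s, label, s')` triples, the dictionaries holding the weights, and all names and values in the example are assumptions made for illustration only.

```python
from typing import Dict, Hashable, List, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (s, label, s')


def run_price(run: List[Transition], price: Dict[Transition, float]) -> float:
    """Price of a finite run: the sum of the prices of its transitions."""
    return sum(price[t] for t in run)


def run_reward(run: List[Transition], reward: Dict[Transition, float]) -> float:
    """Reward of a finite run: the sum of the rewards of its transitions."""
    return sum(reward[t] for t in run)


# Hypothetical run s0 -a-> s1 -b-> s2 with illustrative weights.
example_run = [("s0", "a", "s1"), ("s1", "b", "s2")]
example_price = {("s0", "a", "s1"): 2.0, ("s1", "b", "s2"): 3.5}
example_reward = {("s0", "a", "s1"): 1.0, ("s1", "b", "s2"): 1.0}
```

With these illustrative weights the run has price 5.5 and reward 2.0; a run of length zero has price and reward 0.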

Games. We will now define games on doubly-weighted labelled transition systems, together with the standard concepts such as strategies and game value. In order to define the notion of a zero-sum game on a weighted labelled transition system we need to define a game graph, which is obtained by partitioning the state space between the two players, Min and Max. The following definition formalises this concept.

Definition 2.14 (Doubly-weighted game graph). A doubly-weighted game graph $\Gamma = \langle T, S_{\mathrm{Min}}, S_{\mathrm{Max}} \rangle$ on a transition system consists of

• a transition system $T$, and

• a partition of its state space $S = S_{\mathrm{Min}} \cup S_{\mathrm{Max}}$.

A particular game is defined when a payoff function P is provided. The

following definition formalises the concept of a game.

Definition 2.15 (Doubly-weighted game). A doubly-weighted game is a pair $\langle \Gamma, P \rangle$, where

• $\Gamma$ is a doubly-weighted game graph, and

• $P : \mathrm{Runs} \to \mathbb{R}$ is a payoff function.

We will abuse notation and, when the payoff function is clear from the context, we will write $\Gamma$ to denote the game. In this context, a transition $s \xrightarrow{\lambda} s'$ will often be referred to as a move.

Assumption 2.16. When considering games on transition systems, we will be considering deterministic weighted labelled transition systems.

The results regarding games on finite transition systems presented in this section will refer to deterministic transition systems. Games on doubly-weighted transition systems will be used to define games on single-clock timed automata (introduced in Chapter 3), which yield deterministic transition systems. Hybrid automata with strong resets (introduced in Chapter 5) yield transition systems that are potentially non-deterministic, but in that case the games are defined differently without referring to the concepts presented here. In this context, the restriction made in Assumption 2.16 does not apply. We place this restriction because it allows for simpler definitions, e.g., we rely on this restriction when defining the notion of a run induced by the players’ strategies.

We now introduce the standard game-related definitions. A strategy of player Min is a partial function $\mu : \mathrm{Runs}_{\mathrm{fin}} \to \Lambda$ such that for every finite run $\omega$ ending in a state of player Min, $\mathrm{Last}(\omega) \xrightarrow{\mu(\omega)} s'$ for some state $s'$. We say that $\mu$, a strategy of player Min, is positional if for every two runs $\omega, \omega' \in \mathrm{Runs}_{\mathrm{fin}}$, if $\omega$ and $\omega'$ end in the same state then $\mu(\omega) = \mu(\omega')$, i.e., it can be viewed as a function $\mu : S \to \Lambda$. We will write $\Sigma_{\mathrm{Min}}$ for the set of all strategies of player Min, and $\Pi_{\mathrm{Min}}$ for the subset of all of her positional strategies. The sets of strategies of player Max are defined analogously. Given a run $\omega'$ ending in a state $s_0$, and a pair of strategies $\mu \in \Sigma_{\mathrm{Min}}$ and $\chi \in \Sigma_{\mathrm{Max}}$, we write $\mathrm{Run}(\omega', \mu, \chi)$ to denote the unique run $\omega \in \mathrm{Runs}(s_0)$ satisfying:

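if $s_i \xrightarrow{\lambda_{i+1}} s_{i+1}$ is a transition of $\omega$ (such a run is unique for deterministic transition systems only), then $\mu(\omega'\omega_i) = \lambda_{i+1}$ if $s_i \in S_{\mathrm{Min}}$, and $\chi(\omega'\omega_i) = \lambda_{i+1}$ otherwise.

Note that if $\mu$ and $\chi$ are positional then $\mathrm{Run}(\omega', \mu, \chi) = \mathrm{Run}(\mathrm{Last}(\omega'), \mu, \chi)$.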
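For positional strategies on a deterministic transition system, a finite prefix of the induced run can be generated step by step. The sketch below assumes that the transition function is given as a dictionary `delta` mapping `(state, label)` to the successor state, and that positional strategies are dictionaries from states to labels; these representations and all names are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Label = Hashable
Transition = Tuple[State, Label, State]


def induced_run(
    start: State,
    steps: int,
    delta: Dict[Tuple[State, Label], State],
    min_states: Set[State],
    mu: Dict[State, Label],
    chi: Dict[State, Label],
) -> List[Transition]:
    """First `steps` transitions of the run induced by positional mu and chi."""
    run: List[Transition] = []
    state = start
    for _ in range(steps):
        # The owner of the current state picks the label of the next move.
        label = mu[state] if state in min_states else chi[state]
        # Determinism: (state, label) fixes the successor uniquely.
        target = delta[(state, label)]
        run.append((state, label, target))
        state = target
    return run


# Hypothetical two-state example: p belongs to Min, q belongs to Max.
example_delta = {("p", "a"): "q", ("p", "b"): "p", ("q", "c"): "p"}
example_prefix = induced_run("p", 3, example_delta, {"p"}, {"p": "a"}, {"q": "c"})
```

Because both strategies are positional, the prefix depends only on the starting state, in line with the observation that $\mathrm{Run}(\omega', \mu, \chi) = \mathrm{Run}(\mathrm{Last}(\omega'), \mu, \chi)$.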

Determinacy. We conclude by introducing the concept of game value. Firstly, we introduce the notion of the value of a game from a state. We do this by defining a zero-sum game in strategic form (see Section 2.2.1), which is specific for the given state. Secondly, we introduce the notion of game value as a function that assigns the value of the state-specific game to every state. Note that the game value is in fact a partial function from states to reals.

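Given a state $s$, let $a_s = \langle \Sigma_{\mathrm{Min}}, \Sigma_{\mathrm{Max}}, P(\mathrm{Run}(s, \cdot, \cdot)) \rangle$ be a zero-sum game in strategic form. We say that the game $\Gamma$ is determined for state $s$ if $a_s$ is determined, i.e., $\mathrm{Val}^*(a_s) = \mathrm{Val}_*(a_s)$, and positionally determined if:

$$\mathrm{Val}(a_s) = \inf_{\mu \in \Pi_{\mathrm{Min}}} \mathrm{Val}^{\mu}(a_s) = \sup_{\chi \in \Pi_{\mathrm{Max}}} \mathrm{Val}_{\chi}(a_s).$$

We say that a game $\Gamma$ is determined if it is determined from every state. For simplicity, we will write $\mathrm{Val}(s)$ rather than $\mathrm{Val}(a_s)$ in this context. Hence, $\mathrm{Val}$ can be viewed as a partial function $S \to \mathbb{R}$.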

Definition 2.17 (Game value decidability). We will say that a game $\Gamma$ is decidable if the partial function $\mathrm{Val} : S \to \mathbb{R}$ has decidable value.

In the definition above we have emphasised that $\mathrm{Val}$ is a partial function because $\Gamma$ does not have to be determined from every state.

Games on finite graphs

We will now focus on games on finite doubly-weighted game graphs. A finite doubly-weighted game graph is a game graph of a finite doubly-weighted labelled transition system with the property that every transition label uniquely determines the transition. This means that, given a state, choosing a transition amounts to choosing the next state. Such transition systems are clearly deterministic. In this context we will be using the term “edge” rather than “transition”.

We start by introducing the definition of a finite doubly-weighted game graph, and showing how this definition relates to that of a game graph on a doubly-weighted labelled transition system. Then we proceed to discuss two techniques used in the context of games on finite doubly-weighted game graphs, which are relevant to this thesis.

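Definition 2.18 (Finite doubly-weighted game graph). A finite doubly-weighted game graph, or simply a finite game graph, is given by $\Gamma = \langle S_{\mathrm{Min}}, S_{\mathrm{Max}}, E, \pi, \rho \rangle$, where:

• $S_{\mathrm{Min}}$ and $S_{\mathrm{Max}}$, with $S_{\mathrm{Min}} \cap S_{\mathrm{Max}} = \emptyset$, are the sets of states for players Min and Max respectively,

• $\langle S_{\mathrm{Min}} \cup S_{\mathrm{Max}}, E \rangle$ is a finite directed graph,

• $\pi : E \to \mathbb{R}$ is the price function,

• $\rho : E \to \mathbb{R}$ is the reward function.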
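One possible concrete representation of Definition 2.18 is sketched below. The class name `FiniteGameGraph`, its field names, and the choice to store the edge set implicitly as the keys of the weight dictionaries are assumptions made for illustration, not part of the definition.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Tuple

State = Hashable
Edge = Tuple[State, State]


@dataclass(frozen=True)
class FiniteGameGraph:
    min_states: FrozenSet[State]
    max_states: FrozenSet[State]
    price: Dict[Edge, float]   # pi : E -> R
    reward: Dict[Edge, float]  # rho : E -> R

    def __post_init__(self) -> None:
        # The two players' state sets must be disjoint.
        if self.min_states & self.max_states:
            raise ValueError("S_Min and S_Max must be disjoint")
        # Both weight functions must be defined on the same edge set E.
        if set(self.price) != set(self.reward):
            raise ValueError("price and reward must share the edge set")
        states = self.min_states | self.max_states
        for (s, t) in self.price:
            if s not in states or t not in states:
                raise ValueError("edge endpoint is not a state")

    @property
    def edges(self) -> FrozenSet[Edge]:
        return frozenset(self.price)

    def owner(self, state: State) -> str:
        return "Min" if state in self.min_states else "Max"
```

Validating the constraints at construction time keeps later code free of consistency checks; the edge set $E$ is recovered from the domain of the price function.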

Remark 2.19. If $\Gamma$ were to be viewed as a doubly-weighted labelled transition system, then the transition relation could be defined as $s \xrightarrow{(s, s')} s'$ for every $(s, s') \in E$.

The notion of a run is similar to the notion of a run of a doubly-weighted transition system, with the only difference that there are no labels (they are omitted as they are redundant). We write $s \to s'$ to denote a move, where $e = (s, s') \in E$. The price of the move is $\pi(e)$ and the reward is $\rho(e)$. A run is a (possibly infinite) sequence of moves $\omega = s_0 \to s_1 \to s_2 \cdots$.

The definitions of players’ strategies differ slightly. In this context a strategy, rather than choosing a label of the next transition, chooses the next edge (transition) to traverse. Thus, a strategy of player Min is a function $\mu : \mathrm{Runs}_{\mathrm{fin}} \to E$ that is defined for all finite runs ending in a state $s \in S_{\mathrm{Min}}$. A strategy of player Max is defined analogously, and the remaining definitions follow naturally. Note that this definition is in fact equivalent to the original, but better reflects the syntax of finite doubly-weighted game graphs (Remark 2.19).

Optimality equations. We want to capture global optimality of players’ strategies using local conditions. For that purpose, we introduce the notion of optimality equations [Bel57; Ber01], sometimes referred to as Bellman equations. A solution to these equations can be seen as a witness for the existence of positional optimal strategies for both players.

Optimality equations will be discussed in more detail later in this section, in the context of finite average-price-per-reward games and finite reachability-price games. Below, we present only the basic idea behind the concept. Moreover, we explain how optimality equations are used to obtain the results presented in this thesis.

We say that a function $f : S \to \mathbb{R}$ is a solution to the optimality equations for a game $\Gamma$, denoted by $f \models \mathrm{Opt}(\Gamma)$, if the following equations are satisfied:

$$f(s) = \min_{(s, s') \in E} \mathrm{expr}(s, s', f(s')),$$

for every state $s \in S_{\mathrm{Min}}$, and:

$$f(s) = \max_{(s, s') \in E} \mathrm{expr}(s, s', f(s')),$$

for every state $s \in S_{\mathrm{Max}}$, where $\mathrm{expr}(s, s', f(s'))$ is an algebraic expression involving $s$, $s'$, and $f(s')$.
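Whether a candidate function satisfies equations of this shape can be checked state by state. The sketch below takes `expr` as a parameter, since its actual form depends on the game. The example instantiates it with `price + f(s')`, a reachability-price-style choice assumed here purely for illustration, on a small hypothetical graph with a goal state `g`. Skipping states without outgoing edges is likewise an assumption of this sketch.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

State = Hashable
Edge = Tuple[State, State]


def satisfies_opt(
    f: Dict[State, float],
    edges: Set[Edge],
    min_states: Set[State],
    expr: Callable[[State, State, float], float],
    tol: float = 1e-9,
) -> bool:
    """Check f(s) = min/max over edges (s, s') of expr(s, s', f(s'))."""
    for s, value in f.items():
        candidates = [expr(s, t, f[t]) for (u, t) in edges if u == s]
        if not candidates:
            continue  # assumption: states without successors impose no equation
        target = min(candidates) if s in min_states else max(candidates)
        if abs(value - target) > tol:
            return False
    return True


# Hypothetical acyclic example: a and c belong to Min, b to Max, g is the goal.
example_price = {("a", "b"): 1.0, ("a", "g"): 4.0, ("b", "g"): 2.0,
                 ("b", "c"): 5.0, ("c", "g"): 0.0}


def example_expr(s: State, t: State, f_t: float) -> float:
    # Assumed instance of expr: price of the edge plus the value of the successor.
    return example_price[(s, t)] + f_t


example_f = {"a": 4.0, "b": 5.0, "c": 0.0, "g": 0.0}
```

In this example `example_f` passes the check: Max prefers the edge to `c` in state `b` (5 rather than 2), and Min therefore moves from `a` directly to `g` (4 rather than 1 + 5).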

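The actual form of $\mathrm{expr}(s, s', f(s'))$ depends on the game $\Gamma$, and needs to be chosen in such a way that we can formulate and prove a theorem of the following form:

Given a finite game $\Gamma$, if a function $f : S \to \mathbb{R}$ is such that $f \models \mathrm{Opt}(\Gamma)$, then $\Gamma$ is determined and $\mathrm{Val}(s) = f(s)$ for every $s \in S$.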

The proof of the theorem is typically established by proving the following two inequalities:

$$f(s) \ge \mathrm{Val}^*(s) - \varepsilon, \qquad f(s) \le \mathrm{Val}_*(s) + \varepsilon,$$

for every $s \in S$ and $\varepsilon > 0$. To prove the first inequality we construct a positional strategy $\mu_\varepsilon$ for player Min in such a way that for every counter-strategy $\chi$ we have $P(\mathrm{Run}(s, \mu_\varepsilon, \chi)) \le f(s) + \varepsilon$ for every $s \in S$. The construction is based on the following principle: in every state $s \in S_{\mathrm{Min}}$ player Min can choose a state $s'$ such that:

$$f(s) \ge \mathrm{expr}(s, s', f(s')) - \varepsilon,$$

and if $s \in S_{\mathrm{Max}}$, the above inequality holds for all states $s'$ such that $(s, s') \in E$.

Similarly, the second inequality is proved by constructing an appropriate strategy for player Max.

The strategy constructed in the proofs of the inequalities is an $\varepsilon$-optimal positional strategy for a given player. Notice that a solution to the optimality equations establishes local optimality constraints, and that the strategy is constructed in such a way that these constraints are satisfied. In this sense, the solution to the optimality equations can be seen as a witness for the existence of $\varepsilon$-optimal positional strategies for both players.

To summarise, optimality equations will be used as a characterisation of game values, and their solutions as a witness for existence of ε-optimal positional

strategies. In Chapter 4 we will use this characterisation to prove that certain games are indeed determined, and in Chapter 5 we will use it to show that a certain reduction is correct. In the latter case, we will be dealing with a complex and a simple game. In both cases we will establish an optimality equation characterisation of game values, and then prove that the solutions to those two sets of optimality

equations are related to one another.

Strategy improvement. In the previous paragraph we have introduced the concept of optimality equations, and explained how to prove that if a solution to those equations exists then the game is determined and both players have $\varepsilon$-optimal positional strategies. We will use the concept of strategy improvement to prove that the solutions indeed exist. Strategy improvement is a technique that is used to compute $\varepsilon$-optimal positional strategies by applying local improvements.

We introduce the basic idea behind strategy improvement and its connection to optimality equations. Furthermore, we explain how it will be used in this thesis. Strategy improvement will be discussed in more detail in Chapter 4, in the context of average-price-per-reward games.

The algorithm relies on two assumptions. First, that in a single-player game we can compute an optimal positional strategy for the distinguished player, i.e., given a positional strategy of one player, we can compute a positional best response strategy for the other player. Second, that when two positional strategies are fixed, and hence we are dealing with a zero-player game, a solution to the optimality equations exists.

Given a finite game Γ, a strategy improvement algorithm proceeds according to the following scheme:

1. Arbitrarily choose a strategy $\mu \in \Pi_{\mathrm{Min}}$.

2. Compute the best response strategy $\chi \in \Pi_{\mathrm{Max}}$.

3. Compute a solution $f$ to the optimality equations $\mathrm{Opt}(\Gamma_{\mu\chi})$. Positional strategies $\mu$ and $\chi$ induce a sub-game $\Gamma_{\mu\chi}$, a game in which every state has only one outgoing edge.

4. If for some $s \in S_{\mathrm{Min}}$ there exists $s'$ such that $(s, s') \in E$ and $f(s) > \mathrm{expr}(s, s', f(s'))$, then set $\mu(s) = s'$ and go to Step 2; otherwise terminate and output the functions $f$, $\mu$, and $\chi$.
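The scheme above can be illustrated on a deliberately simple instance. The following sketch assumes an acyclic finite game graph with a single goal state of value 0 and takes `expr(s, s', f(s'))` to be `price[(s, s')] + f(s')`. Under these assumptions, which are not those of the games studied in this thesis, the best response of Max (Step 2) and the valuation of the zero-player game (Step 3) can be computed together by a memoised recursion. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Edge = Tuple[State, State]


def strategy_improvement(price: Dict[Edge, float], min_states: Set[State], goal: State):
    """Sketch for acyclic graphs with expr(s, s', f(s')) = price[(s, s')] + f(s')."""
    succ: Dict[State, List[State]] = {}
    for (s, t) in price:
        succ.setdefault(s, []).append(t)

    # Step 1: arbitrary initial positional strategy for Min.
    mu = {s: succ[s][0] for s in min_states}

    while True:
        # Steps 2 and 3: best response chi and valuation f of (mu, chi).
        f: Dict[State, float] = {goal: 0.0}
        chi: Dict[State, State] = {}

        def val(s: State) -> float:
            if s not in f:
                if s in min_states:
                    f[s] = price[(s, mu[s])] + val(mu[s])
                else:
                    # Max picks the successor maximising the assumed expr.
                    chi[s] = max(succ[s], key=lambda t: price[(s, t)] + val(t))
                    f[s] = price[(s, chi[s])] + val(chi[s])
            return f[s]

        for s in succ:
            val(s)

        # Step 4: look for a strict local improvement for Min.
        switch = next(
            ((s, t) for s in min_states for t in succ[s]
             if f[s] > price[(s, t)] + f[t]),
            None,
        )
        if switch is None:
            return f, mu, chi
        mu[switch[0]] = switch[1]


# Hypothetical example: a and c belong to Min, b to Max, g is the goal.
example_price = {("a", "b"): 1.0, ("a", "g"): 4.0, ("b", "g"): 2.0,
                 ("b", "c"): 5.0, ("c", "g"): 0.0}
example_f, example_mu, example_chi = strategy_improvement(example_price, {"a", "c"}, "g")
```

In this example Min initially moves from `a` to `b`, which is valued at 6 against the best response of Max. The only improving switch redirects `a` to `g`, after which no state of Min admits a strict improvement and the loop terminates with `f(a) = 4`.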

Upon termination, the algorithm outputs the solution to the optimality equations $\mathrm{Opt}(\Gamma)$, i.e., the function $f$, and the $\varepsilon$-optimal positional strategies $\mu$ and $\chi$ for players Min and Max, respectively.

In Step 4, we are myopically improving the strategy of player Min. We are using the solution to the optimality equations for the zero-player game as an indicator of where the improvement should be made. The strategy $\chi$ is a best response strategy of player Max to $\mu$, a strategy of player Min. Therefore, the solution $f$ to $\mathrm{Opt}(\Gamma_{\mu\chi})$ is such that for every $s \in S_{\mathrm{Max}}$, we have $f(s) \ge \mathrm{expr}(s, \chi(s), f(\chi(s))) - \varepsilon$. If this were not true, we would have a contradiction to the fact that $\chi$ is a best response, since $f$ is a witness to the existence of an $\varepsilon$-optimal positional strategy.

If no improvement is possible then $\mu$, and hence $\chi$, are $\varepsilon$-optimal. Moreover, they are exactly the kind of strategies that would have been constructed in the proof of the theorem that establishes that optimality equations characterise the game values, as discussed in the previous paragraph.

Given a strategy µ of player Min and a strategy χ, the best response of

player Max, the solution to Opt(Γµχ) can be seen as a valuation of strategy µ in

every state. The sets of positional strategies for both players are finite. To prove termination of the strategy improvement algorithm it suffices to prove that after every iteration of the strategy improvement algorithm the new strategy of player Min in every state has a valuation that is not higher than the valuation of the previous strategy. Formally, to establish termination and correctness of a strategy improvement algorithm, it suffices to prove a theorem of the following type:

Let $\mu$ be a positional strategy of player Min, let $\chi$ be a positional best response strategy of player Max, and let $f$ be a solution to the optimality equations for the zero-player game induced by those strategies. If $\mu'$ is an improvement of $\mu$, as described in Step 4, $\chi'$ is the respective best response, and $f'$ is the respective solution to the optimality equations, then the following holds for all $s \in S$:

$$f'(s) \le f(s),$$

and the inequality is strict for some $s \in S$.

In Chapter 4 strategy improvement, together with optimality equations, will be used to prove determinacy of average-price-per-reward games on an extension of the finite doubly-weighted game graphs.

Average-price-per-reward games

In this section we introduce average-price-per-reward games on doubly-weighted labelled transition systems. We will focus on a variant of those games, referred to as finite average-price-per-reward games, which are played on finite game graphs. We will provide a characterisation of game values using a set of equations, referred to as average-price-per-reward optimality equations. The aim of this section is to provide
