Introduction - 5 Reachability-Time Games - Competitive optimisation on timed automata

5 Reachability-Time Games

5.1. Introduction

5.1.1. Definition

DEFINITION 5.1.1 (Reachability-Time Games). A reachability-time game on a timed automaton is a tuple $(\Gamma, \mathrm{RT}_{\mathrm{Min}}, \mathrm{RT}_{\mathrm{Max}})$, where:

– $\Gamma = (\mathcal{T}, L_{\mathrm{Min}}, L_{\mathrm{Max}})$ is a timed game automaton such that $\mathcal{T} = (L, C, S, A, E, \delta, \xi, F)$ is a timed automaton, $L_{\mathrm{Min}}$ is the set of locations controlled by player Min, and $L_{\mathrm{Max}}$ is the set of locations controlled by player Max;

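– $\mathrm{RT}_{\mathrm{Min}} : \mathit{Runs} \to \mathbb{R}$ and $\mathrm{RT}_{\mathrm{Max}} : \mathit{Runs} \to \mathbb{R}$ are payoff functions, which for every run of the timed automaton return the amount that player Min loses and the amount that player Max wins, respectively. They are defined in the following way: for a run $r = \langle s_0, (t_1, a_1), s_1, (t_2, a_2), \ldots \rangle \in \mathit{Runs}$ we have

\[
\mathrm{RT}_{\mathrm{Min}}(r) = \mathrm{RT}_{\mathrm{Max}}(r) =
\begin{cases}
\sum_{i=1}^{\mathrm{Stop}(r)} t_i & \text{if } \mathrm{Stop}(r) < \infty, \\
\infty & \text{otherwise.}
\end{cases}
\]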

Since the functions $\mathrm{RT}_{\mathrm{Min}}$ and $\mathrm{RT}_{\mathrm{Max}}$ are equal, we write $\mathrm{RT} : \mathit{Runs} \to \mathbb{R}$ for this function.
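As an illustration of the payoff function RT, the following minimal Python sketch computes the payoff of a run from its sequence of delays. It assumes that Stop(r) is given as an index into the run (the definition of Stop lies outside this section), and the function name `reachability_time` and its parameters are hypothetical names chosen for this example.

```python
import math
from typing import Optional, Sequence


def reachability_time(delays: Sequence[float], stop: Optional[int]) -> float:
    """Payoff RT(r) of a run, given its delays t_1, t_2, ... and Stop(r).

    delays[i - 1] holds the delay t_i. `stop` plays the role of Stop(r);
    None encodes Stop(r) = infinity (assumption: no final state is reached).
    """
    if stop is None:
        return math.inf
    if stop > len(delays):
        raise ValueError("need at least Stop(r) delays")
    # Sum of t_1, ..., t_Stop(r); delays after Stop(r) do not count.
    return sum(delays[:stop])


finite_payoff = reachability_time([1.5, 0.5, 2.0], stop=2)
infinite_payoff = reachability_time([1.0, 1.0], stop=None)
```

Only the delays up to Stop(r) contribute, so in this sketch any behaviour of the run after a final state is reached has no effect on the payoff.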

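We define $Q_{\mathrm{Min}} = \{(\ell, \nu) \in Q : \ell \in L_{\mathrm{Min}}\}$, $Q_{\mathrm{Max}} = Q \setminus Q_{\mathrm{Min}}$, $S_{\mathrm{Min}} = S \cap Q_{\mathrm{Min}}$, $S_{\mathrm{Max}} = S \setminus S_{\mathrm{Min}}$, $\mathcal{R}_{\mathrm{Min}} = \{[s] : s \in Q_{\mathrm{Min}}\}$, and $\mathcal{R}_{\mathrm{Max}} = \mathcal{R} \setminus \mathcal{R}_{\mathrm{Min}}$.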

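The strategies of player Min and player Max are defined as usual (see Section 3.4.2). We write $\Sigma_{\mathrm{Min}}$ for the set of strategies for player Min, and we write $\Sigma_{\mathrm{Max}}$ for the set of strategies for player Max. We write $\Pi_{\mathrm{Min}}$ and $\Pi_{\mathrm{Max}}$ for the sets of positional strategies for player Min and for player Max, respectively. The reachability-time payoff function $\mathrm{RT} : \mathit{Runs} \to \mathbb{R}$ naturally gives rise to the function $\mathrm{RT} : S \times \Sigma_{\mathrm{Min}} \times \Sigma_{\mathrm{Max}} \to \mathbb{R}$ in the following way. For strategies $\mu \in \Sigma_{\mathrm{Min}}$ and $\chi \in \Sigma_{\mathrm{Max}}$ of the respective players and a state $s \in S$ we have

\[
\mathrm{RT}(s, \mu, \chi) = \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)).
\]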

5.1.2. Value of Reachability-Time Game

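If player Min uses the strategy $\mu \in \Sigma_{\mathrm{Min}}$ and player Max uses the strategy $\chi \in \Sigma_{\mathrm{Max}}$ then player Min loses the value $\mathrm{RT}(s, \mu, \chi)$ and player Max wins the value $\mathrm{RT}(s, \mu, \chi)$. In a reachability-time game player Min is interested in minimising the value she loses and player Max is interested in maximising the value he wins. We define the upper value $\overline{\mathrm{Val}}(s)$ and the lower value $\underline{\mathrm{Val}}(s)$ of the reachability-time game at the state $s \in S$ by

\[
\overline{\mathrm{Val}}(s) = \inf_{\mu \in \Sigma_{\mathrm{Min}}} \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)),
\quad \text{and} \quad
\underline{\mathrm{Val}}(s) = \sup_{\chi \in \Sigma_{\mathrm{Max}}} \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)).
\]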

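From Proposition 1.2.4 the inequality $\underline{\mathrm{Val}}(s) \le \overline{\mathrm{Val}}(s)$ always holds. A reachability-time game is determined if for every $s \in S$, the lower and upper values at $s$ are equal to each other; then we say that the value $\mathrm{Val}(s)$ exists and $\mathrm{Val}(s) = \underline{\mathrm{Val}}(s) = \overline{\mathrm{Val}}(s)$.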
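The relation between the lower and the upper value can be illustrated on a toy example. The sketch below is not a reachability-time game: it assumes that each player has only finitely many strategies, so that the payoff for every strategy pair fits in a table and inf and sup become min and max. The table `payoff` and the function names are hypothetical.

```python
from typing import Sequence


def upper_value(payoff: Sequence[Sequence[float]]) -> float:
    """inf over Min's choices (rows) of sup over Max's choices (columns)."""
    return min(max(row) for row in payoff)


def lower_value(payoff: Sequence[Sequence[float]]) -> float:
    """sup over Max's choices (columns) of inf over Min's choices (rows)."""
    return max(min(column) for column in zip(*payoff))


# payoff[i][j]: time Min loses if Min picks row i and Max picks column j.
payoff = [
    [1.0, 3.0],
    [4.0, 2.0],
]

upper = upper_value(payoff)  # Min commits first, Max responds
lower = lower_value(payoff)  # Max commits first, Min responds
```

In this table the lower value (2.0) is strictly smaller than the upper value (3.0), so the toy game is not determined; determinacy asks that the two quantities coincide at every state.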

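For strategies $\mu \in \Sigma_{\mathrm{Min}}$ and $\chi \in \Sigma_{\mathrm{Max}}$, we define

\[
\mathrm{Val}^{\mu}(s) = \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)),
\quad \text{and} \quad
\mathrm{Val}^{\chi}(s) = \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)).
\]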

We say that a strategy $\mu \in \Sigma_{\mathrm{Min}}$ or $\chi \in \Sigma_{\mathrm{Max}}$, respectively, is optimal if for every $s \in S$, we have $\mathrm{Val}^{\mu}(s) = \mathrm{Val}(s)$ or $\mathrm{Val}^{\chi}(s) = \mathrm{Val}(s)$, respectively. For an $\varepsilon > 0$, we say that a strategy $\mu \in \Sigma_{\mathrm{Min}}$ or $\chi \in \Sigma_{\mathrm{Max}}$, respectively, is $\varepsilon$-optimal if for every $s \in S$, we have $\mathrm{Val}^{\mu}(s) \le \mathrm{Val}(s) + \varepsilon$ or $\mathrm{Val}^{\chi}(s) \ge \mathrm{Val}(s) - \varepsilon$, respectively. Note that if a game is determined then for every $\varepsilon > 0$, both players have $\varepsilon$-optimal strategies.

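We say that a reachability-time game is positionally determined if for every $s \in S$, we have

\[
\inf_{\mu \in \Pi_{\mathrm{Min}}} \sup_{\chi \in \Sigma_{\mathrm{Max}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi))
= \mathrm{Val}(s) =
\sup_{\chi \in \Pi_{\mathrm{Max}}} \inf_{\mu \in \Sigma_{\mathrm{Min}}} \mathrm{RT}(\mathrm{Run}(s, \mu, \chi)).
\]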

Note that if the reachability-time game is positionally determined then for every $\varepsilon > 0$, both players have positional $\varepsilon$-optimal strategies. ([…] and Theorem 5.4.12) yield a constructive proof of the following fundamental result for reachability-time games.

THEOREM 5.1.2 (Positional determinacy). Reachability-time games are positionally determined.

5.1.3. Optimality Equations

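Let $\Gamma$ be a timed game automaton, and let $T : S \to \mathbb{R}$ and $D : S \to \mathbb{N}$. We write $(T, D) \models \mathrm{OE}^{\mathrm{RT}}_{\mathrm{MinMax}}(\Gamma)$, and we say that $(T, D)$ is a solution of optimality equations $\mathrm{OE}^{\mathrm{RT}}_{\mathrm{MinMax}}(\Gamma)$, if for all $s \in S$, we have:

– if $D(s) = \infty$ then $T(s) = \infty$; and

– if $s \in F$ then $(T(s), D(s)) = (0, 0)$; and

– if $s \in S_{\mathrm{Min}} \setminus F$ then
\[
T(s) = \inf_{a,t} \{ t + T(s') : s \xrightarrow{a}_{t} s' \}, \quad \text{and} \quad
D(s) = \min \Big\{ 1 + d' : T(s) = \inf_{a,t} \{ t + T(s') : s \xrightarrow{a}_{t} s' \text{ and } D(s') = d' \} \Big\};
\]

– if $s \in S_{\mathrm{Max}} \setminus F$ then
\[
T(s) = \sup_{a,t} \{ t + T(s') : s \xrightarrow{a}_{t} s' \}, \quad \text{and} \quad
D(s) = \max \Big\{ 1 + d' : T(s) = \sup_{a,t} \{ t + T(s') : s \xrightarrow{a}_{t} s' \text{ and } D(s') = d' \} \Big\}.
\]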
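To make the shape of the optimality equations concrete, the following Python sketch checks them on a small finite game graph. This is a simplification and not the setting of the text: it assumes finitely many states, finitely many timed transitions per state (so inf and sup become min and max), and at least one outgoing transition from every non-final state. All names (`check_optimality_equations`, `edges`, the states `a`, `b`, `g`) are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[float, str]  # (delay t, successor state s')


def check_optimality_equations(
    edges: Dict[str, List[Edge]],
    min_states: Set[str],
    final: Set[str],
    T: Dict[str, float],
    D: Dict[str, float],
    tol: float = 1e-9,
) -> bool:
    """Return True if (T, D) satisfies the four conditions at every state."""

    def close(x: float, y: float) -> bool:
        # The equality test comes first so that inf compares equal to inf.
        return x == y or abs(x - y) <= tol

    for s in edges:
        if D[s] == math.inf and T[s] != math.inf:
            return False
        if s in final:
            if (T[s], D[s]) != (0, 0):
                return False
            continue
        # Min states take the minimum, all remaining states the maximum.
        pick = min if s in min_states else max
        costs = [(t + T[succ], D[succ]) for t, succ in edges[s]]
        best = pick(cost for cost, _ in costs)
        if not close(T[s], best):
            return False
        # D(s) is 1 plus the min (or max) distance among optimal successors.
        attaining = [d for cost, d in costs if close(cost, best)]
        if D[s] != 1 + pick(attaining):
            return False
    return True


# Hypothetical acyclic example: Min owns "a", Max owns "b", "g" is final.
edges = {"a": [(1.0, "b"), (5.0, "g")], "b": [(2.0, "g"), (3.0, "g")], "g": []}
T = {"a": 4.0, "b": 3.0, "g": 0.0}
D = {"a": 2, "b": 1, "g": 0}
ok = check_optimality_equations(edges, {"a"}, {"g"}, T, D)
```

In this example Max delays for 3 time units at `b`, and Min prefers moving to `b` (total 1 + 3 = 4) over the direct transition to `g` (total 5), which is why T(a) = 4 and D(a) = 2.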

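LEMMA 5.1.3 ($\varepsilon$-Optimal strategies from optimality equations). If $(T, D) \models \mathrm{OE}^{\mathrm{RT}}_{\mathrm{MinMax}}(\Gamma)$, then for all $s \in S$, we have $\mathrm{Val}(s) = T(s)$ and for every $\varepsilon > 0$, both players have positional $\varepsilon$-optimal strategies.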

PROOF. We show that for every $\varepsilon > 0$, there exists a positional strategy $\mu_{\varepsilon} : S_{\mathrm{Min}} \to A \times \mathbb{R}_{\oplus}$ for player Min such that for every strategy $\chi$ for player Max, if $s \in S$ is such that $D(s) < \infty$, then we have $\mathrm{RT}(\mathrm{Run}(s, \mu_{\varepsilon}, \chi)) \le T(s) + \varepsilon$. The proof that for every $\varepsilon > 0$ there exists a positional strategy $\chi_{\varepsilon} : S_{\mathrm{Max}} \to A \times \mathbb{R}_{\oplus}$ for player Max such that for every strategy $\mu$ for player Min, if $s \in S$ is such that $D(s) < \infty$, then we have $\mathrm{RT}(\mathrm{Run}(s, \mu, \chi_{\varepsilon})) \ge T(s) - \varepsilon$, is similar and omitted. The proof that if $D(s) = \infty$ then player Max has a strategy to prevent ever reaching a final state is routine and omitted as well. Together, these facts imply that $T$ is equal to the value function of the reachability-time game, and that the positional strategies $\mu_{\varepsilon}$ and $\chi_{\varepsilon}$, defined in the proof below for all $\varepsilon > 0$, are $\varepsilon$-optimal.

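For $\varepsilon' > 0$, $T : S \to \mathbb{R}$, and $s \in S_{\mathrm{Min}} \setminus F$, we say that a timed action $(a, t) \in A \times \mathbb{R}_{\oplus}$ is $\varepsilon'$-optimal for $(T, D)$ in $s$ if $s \xrightarrow{a}_{t} s'$, and

\[
D(s') \le D(s) - 1, \quad \text{and} \tag{5.1.1}
\]
\[
t + T(s') \le T(s) + \varepsilon'. \tag{5.1.2}
\]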

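Observe that for every state $s \in S_{\mathrm{Min}}$ and for every $\varepsilon' > 0$, there is an $\varepsilon'$-optimal timed action for $(T, D)$ in $s$ because $(T, D) \models \mathrm{OE}^{\mathrm{RT}}_{\mathrm{MinMax}}(\Gamma)$. Moreover, again because $(T, D) \models \mathrm{OE}^{\mathrm{RT}}_{\mathrm{MinMax}}(\Gamma)$, we have that for every $s \in S_{\mathrm{Max}} \setminus F$ and timed action $(a, t)$, such that $s \xrightarrow{a}_{t} s'$, we have

\[
D(s') \le D(s) - 1, \quad \text{and} \tag{5.1.3}
\]
\[
t + T(s') \le T(s). \tag{5.1.4}
\]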

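Let $\varepsilon > 0$; we define $\mu_{\varepsilon} : S_{\mathrm{Min}} \to A \times \mathbb{R}_{\oplus}$ by setting $\mu_{\varepsilon}(s)$, for every $s \in S_{\mathrm{Min}}$, to be a timed action which is $\varepsilon'(s)$-optimal for $(T, D)$ in $s$, where $\varepsilon'(s) > 0$ is sufficiently small (to be determined later). Let $\chi$ be an arbitrary strategy for player Max and let $r = \mathrm{Run}(s, \mu_{\varepsilon}, \chi) = \langle s_0, (a_1, t_1), s_1, (a_2, t_2), \ldots \rangle$. Let $N = \mathrm{Stop}(r)$. Our goal is to prove that $\mathrm{RT}(r) \le T(s) + \varepsilon$, i.e., that $T(s) \ge \sum_{k=1}^{N} t_k - \varepsilon$.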

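For every state $s \in S$ such that $D(s) < \infty$, define $\varepsilon'(s) = \varepsilon \cdot 2^{-D(s)}$. Note that if we add the left- and right-hand sides of the inequalities (5.1.2) or (5.1.4), respectively, for all states $s_i$, with the $\varepsilon'(s_i)$-optimal timed actions $\mu_{\varepsilon}(s_i)$ if $s_i \in S_{\mathrm{Min}}$, where $i = 0, 1, \ldots, N-1$, then the intermediate terms $T(s_1), \ldots, T(s_{N-1})$ cancel and we get

\[
T(s) = T(s_0) \ge \sum_{k=1}^{N} t_k - \sum_{k=0}^{N-1} \varepsilon'(s_k) \ge \sum_{k=1}^{N} t_k - \varepsilon.
\]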

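The first inequality holds by $T(s_N) = T(s_{\mathrm{Stop}(r)}) = 0$, and the second inequality holds because

\[
\sum_{k=0}^{N-1} \varepsilon'(s_k) = \sum_{k=0}^{N-1} \big( \varepsilon \cdot 2^{-D(s_k)} \big) \le \varepsilon \cdot \sum_{d=1}^{\infty} 2^{-d} \le \varepsilon,
\]

where the first inequality follows by (5.1.1) and (5.1.3): the value of $D$ decreases by at least 1 in every step of the run, so $D(s_0), \ldots, D(s_{N-1})$ are pairwise distinct positive integers.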

It may be worth noting that if the finite values of the function $D$ are bounded, i.e., if $B < \infty$, where $B = \sup_{s \in S} \{ D(s) : D(s) < \infty \}$, then in the above proof it is sufficient to define $\varepsilon'(s) = \varepsilon / B$, for all $s \in S$, which gives arguably more realistically "physically implementable" $\varepsilon$-optimal strategies.
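The two choices of the per-state error budget can be compared numerically. The sketch below assumes a hypothetical run along which D takes the values 5, 4, 3, 2, 1 (strictly decreasing, as (5.1.1) and (5.1.3) require) and a bound B = 5; the function names and the concrete numbers are illustrative only.

```python
from typing import List


def geometric_budget(eps: float, d: int) -> float:
    """Budget eps * 2^(-D(s)) for a state s with D(s) = d."""
    return eps * 2.0 ** (-d)


def uniform_budget(eps: float, bound: int) -> float:
    """Budget eps / B, the same for every state."""
    return eps / bound


eps = 0.1
bound = 5
distances: List[int] = [5, 4, 3, 2, 1]  # D(s_0), ..., D(s_{N-1}) on the run

geometric = [geometric_budget(eps, d) for d in distances]
uniform = [uniform_budget(eps, bound) for _ in distances]

# Total error accumulated along the run under each scheme.
geometric_total = sum(geometric)
uniform_total = sum(uniform)
```

Both totals stay within eps (up to floating-point rounding in the uniform case), but the smallest slack a single timed action may use is eps/32 under the geometric scheme and eps/5 under the uniform one. This is one way to read the remark about "physically implementable" strategies: the uniform budget never demands arbitrarily fine precision from a single move.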
