Computing time-bounded reachability probabilities

In the preceding section we have established the theory which is necessary for the main contribution of this chapter. In particular, we will make use of the fixed-point charac-terization in Thm. 5.1 and the fact (provided by Thm. 5.2) that we may restrict ourselves to TTPDL schedulers. With these preliminaries, we are now ready to reduce the prob-lem of computing maximum time-bounded reachability in CTMDPs to the probprob-lem of maximizing the step-bounded reachability probability in (discrete-time) MDPs.

The latter is a well-studied problem which can be solved efficiently, e.g. by value itera-tion algorithms [Ber95]. The discretizaitera-tion that we use for our reducitera-tion is defined such that it is exact up to an a priori given error boundε > 0; hence, the results can be made arbitrarily precise. We study the complexity of our approach and show how to synthesize ε-optimal schedulers automatically.

5.3.1 Discretizing time in locally uniform CTMDPs

As before, letC be a locally uniform CTMDP, G ⊆ S a set of goal states, s ∈ S an initial state andz ∈ R≥0a time bound. We aim at computing pmax(s, z) up to an a priori fixed errorε > 0. If s ∈ G, this is trivial as pmax(s, z) = 1 for all z ∈ R≥0. To computepmax(s, z) fors ∉ G, we use the fixed point characterization of pmaxfrom Thm. 5.1. More precisely, consider the first sub-interval [0, τ] of the integral in Eq. (5.4) separately and split the whole integral accordingly:

pmax(s, z) = Ω(pmax)(s, z)

∫

₀^τ^E^(s)e^−E^(s)t^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ pmax(s^′,z − t) dt +

∫

_τ^z^E^(s)e⁻^E(s)t^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ p^max(s^′,z − t) dt. (5.6)

Now, letA(s, z) and B(s, z) denote the first, resp. second summand in Eq. (5.6). Shifting the range of integration in B(s, z) by (−τ), the next Lemma derives a straightforward recursive representation of the probabilityB(s, z) which can easily be used for our dis-cretization purposes:

Lemma 5.3. For all s ∈S, z ∈ R≥0and τ ∈[0, z] it holds that

B(s, z) = e⁻^E(s)τ⋅ p_max(s, z − τ). (5.7)

Proof. We proceed as follows:

B(s, z) =

∫

τ^zE(s)e⁻^E(s)t⋅ maxα∈Act ∑

s^′∈S

P(s, α, s^′) ⋅ p^max(s^′,z − t) dt

∫

₀^z−τ^E^(s)e⁻^E(s)(t+τ) ^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ pmax(s^′,z −(t + τ)) dt

∫

0^z−τE(s)e^−E^(s)te^−E^(s)τ⋅ max_α∈Act ∑

s^′∈S

P(s, α, s^′) ⋅ pmax(s^′,z −(t + τ)) dt

=e⁻^E(s)τ⋅

∫

₀^z−τ^E^(s)e⁻^E(s)t^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ p^max(s^′,(z − τ) − t) dt

´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶

pmax(s,z−τ)

=e^−E^(s)τ⋅ p_max(s, z − τ). ◻

Note that the factor e⁻^E(s)τ in Eq. (5.7) is the probability that no transition occurs in states in the first τ time units. Hence, B(s, z) is the maximum probability of the event that starting from states, the set G is reached within z time units while no transition occurs in the time interval[0, τ]. To be more precise, let #[0,τ]∶ Paths^ω→ N be the random variable which describes the number of transitions that occur in time interval [0, τ]. Then, it holds thatB(s, z) = supD∈MLPr^ω_ν_s_,D(◇^[0,z]G ∩ #_[0,τ]= 0). With the same reasoning, the first summandA(s, z) of (5.6) is the maximum probability to reach G within time z with at least one transition taking place in[0, τ]. Hence,

A(s, z) = sup

D∈ML

Pr_ν^ω_s_,D(◇^[0,z]G ∩ #[0,τ]≥ 1).

Now, decompose the underlying event of A(s, z) into disjoint subsets according to the number of transitions that occur in time interval[0, τ]:

(◇^[0,z]G ∩ #_[0,τ]≥ 1) =⊍^∞

n=1(◇^[0,z]G ∩ #_[0,τ]=n).

Accordingly, letAn(s, z) be the maximum probability to reach G in z time units with exactlyn transitions occurring in the first time slice[0, τ]. In this way, we maximize the probability of each event(◇^[0,z]G ∩ #[0,τ]=n) separately:

An(s, z) = sup

D∈ML

Pr^ω_ν_s_,D(◇^[0,z]G ∩ #[0,τ]=n) . (5.8)

To relateA(s, z) with the probabilities An(s, z), observe that A(s, z) = sup

D∈ML

Pr_ν^ω_s_,D(◇^[0,z]G ∩ #[0,τ]≥ 1)

= sup

D∈ML

Pr_ν^ω_s_,D(⊍^∞

n=1(◇^[0,z]G ∩ #_[0,τ]=n))

≤

∑∞ n=1( sup

D∈ML

Pr^ω_ν_s_,D(◇^[0,z]G ∩ #[0,τ]=n))

∞

∑n=1

An(s, z).

(5.9)

The next major step is to derive an analytic expression for the probabilityA1(s, z):

Lemma 5.4. Let C = (S, Act, R, ν) be a CTMDP, G ⊆ S a set of goal states, s ∈ S an initial state, z ∈ R≥0a time bound and τ > 0 a step duration. For A₁(s, z) as defined in Eq. (5.8) it holds

A1(s, z) =

∫

₀^τÊ^(s)e⁻Ê(s)t^{⋅ max}^α∈Act_s^∑_′_∈S^P^{(s, α, s}^′^{) ⋅ e}⁻Ê(s^′^)(τ−t)^{⋅ p}^max^(s^′^,^{z − τ}^{) dt.}

(5.10)

Note thatA1(s, z) is the maximum probability to reach G within z time units and that exactly one transition occurs within time interval[0, τ]. This is reflected in the integral in Lemma 5.4: Here, the integration variablet corresponds to the precise point in time when state s is left; further, if we move to state s^′ after t units of time, we stay in the successor state s^′ for at least (τ − t) time units (i.e. the time that remains in the first step duration) with probability e⁻^E(s^′^)(τ−t). Finally, we multiply with pmax(s^′,z−τ), i.e.

with the maximum achievable probability to reachG in the remaining(z − τ) time units, starting in states^′.

Proof. Let E = (◇^[0,z]G ∩ #_[0,τ] = 1) be the event that corresponds to the probabil-ity A1(s, z). Given an ML scheduler D, the measure of the event E differs from the time-bounded reachability event ◇^[0,z]G in the additional requirement that exactly one transition occurs in time interval[0, τ]. Hence, we obtain the probability

Pr^ω_ν_s_,D(◇^[0,z]G ∩ #_[0,τ]= 1) =

∫

₀^τ^E^(s)e⁻^E(s)t^⋅_α∈Act^∑ ^D^{(s, t)(α)}

⋅∑

s^′∈S

P(s, α, s^′) ⋅ e^−E^(s^′^)(τ−t)⋅ Pr^ω

ν_s′,D(sÐ→^{α ,t} ^⋅,⋅^+(τ−t))(◇^[0,z−τ]G) dt. (5.11) The terme^−E^(s^′^)(τ−t)in Eq. (5.11) is the probability that after leaving states at time point t and entering the successor states^′, no transition occurs for the next(τ − t) time units.

The ML-scheduler D(s Ð→ ⋅, ⋅^α,t ⁺^(τ−t)) is defined such that if π = s^′ ÐÐ→ π^α^′^,t^′ ^′′ for some π^′′∈Paths^⋆, thenD(sÐ→ ⋅, ⋅^α,t ⁺^(τ−t))(π, t) = D(sÐ→ π^α,t ^′,t), where π^′=s^′ÐÐÐÐÐ→ π^α^′^,t^′⁺^(τ−t) ^′′.

Hence, if states is left at time t and no transition occurs in the successor state s^′within the followingτ − t time units, then D(sÐ→ ⋅, ⋅^α,t ⁺^(τ−t)) decides on the remaining path as D does on the suffix of the complete path. Note that due to the memoryless property of the exponential distribution, we may split the sojourn time in states^′in two parts: First, the sojourn in states^′beforeτ and the remaining sojourn time. Hence Eq. (5.11) expresses the probability to reachG from state s within time bound z and that exactly one transition occurs in time interval[0, τ].

With these preliminaries, we introduce theML-scheduler D₁^z, which induces the maxi-mum probability for the eventE. Similar to the scheduler D^z, it is deterministic; however, it is not fully positional: To ease its definition, letg(s, α, t) ∈ [0, 1] be the maximum prob-ability to reachG in z time units, if state s has been left at time t and action α has been chosen and no transition occurs in the remainingτ − t time units:

g(s, α, t) = ∑ that at least one transition has occurred. Hence, we defineD^z₁ such that it optimizes the probability to reachG in the remaining time z −(∆(π) + t). Therefore we set D₁^z(π, t) = D^z(π↓, ∆(π) + t) if ∣π∣ > 0.

Now we prove thatD^z₁ is optimal w.r.t.E by contraposition: Assume that there exists D^′ ∈ ML such that Pr_ν^ω_s_,D^′(E) > Pr^ωνs,D₁^z(E). By the following derivation, this leads to a

∫

₀^τ^E^(s)e⁻^E(s)t^⋅_s^∑′∈S

P(s, D^z1(s, t), s^′) ⋅ e⁻^E(s^′^)(τ−t)⋅ Pr^ω_ν_s′_,Dz−τ(◇^[0,z−τ]G) dt

=Pr^ω_ν_s_,D^z

1(E).

Hence, the schedulerD₁^zyields the maximum probability for the eventE and we obtain A₁(s, z) = sup

D∈ML

Pr^ω_ν_s_,D(◇^[0,z]G ∩ #_[0,τ]= 1)

∫

₀^τ^E^(s)e⁻^E(s)t^⋅_α∈Act^∑ ^D^z¹(s, t)(α) ⋅ ∑

s^′∈S

P(s, α, s^′) ⋅ e⁻^E(s^′^)(τ−t)

⋅ Pr^ω

ν_s′,D^z₁(sÐ→^{α ,t} ^⋅,⋅^+(τ−t))(◇^[0,z−τ]G) dt

∫

₀^τ^E^(s)e⁻^E(s)t^⋅_s^∑′∈S

P(s, D^z1(s, t), s^′) ⋅ e⁻^E(s^′^)(τ−t)

⋅ Pr^ω_ν

s′,D^z(⋅,⋅^+τ)(◇^[0,z−τ]G) dt

∫

₀^τ^E^(s)e⁻^E(s)t^⋅_s^∑′∈S

P(s, D^z1(s, t), s^′) ⋅ e⁻^E(s^′^)(τ−t)

⋅ Pr^ω_ν_s′_,D^z−τ(◇^[0,z−τ]G) dt

∫

0^τE(s)e⁻^E(s)t⋅∑

s^′∈S

P(s, D^z1(s, t), s^′) ⋅ e⁻^E(s^′^)(τ−t)⋅ p_max(s^′,z − τ) dt

∫

₀^τ^E^(s)e^−E^(s)t^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ e^−E^(s^′^)(τ−t)⋅ p_max(s^′,z − τ) dt,

completing the proof. ◻

Now we approximate the probabilityA(s, z) from below via a new probability X(s, z), which is closely related to A1(s, z): More precisely, we obtain X(s, z) by bounding the probabilitye⁻^E(s^′^)(τ−t)in Eq. (5.10) from above by 1. HenceA₁(s, z) ≤ X(s, z); moreover, by a continuity argument we can prove thatX(s, z) ≤ A(s, z).

With these two inequalities and the definition ofX(s, z) we establish an error bound for our discretization. Formally, the following sandwich lemma proves thatX(s, z) con-verges toA(s, z) for τ → 0:

Lemma 5.5 (One-step approximation). LetC = (S, Act, R, ν) be a CTMDP, G ⊆ S a set of goal states, λ = max_s∈SE(s), s ∈ S an initial state, z ∈ R≥0a time bound and τ > 0 a step duration. If we define

X(s, z) =

∫

₀^τ^E^(s)e⁻^E(s)t^{⋅ max}^α∈Act_s^∑_′_∈S^P^{(s, α, s}^′^{) ⋅ p}^max^(s^′^,^{z − τ}^{) dt,} ^(5.12)

then X(s, z) approximates A(s, z) in the following sense:

X(s, z) ≤ A(s, z) ≤ X(s, z) + (λτ)²

2 . (5.13)

Proof. To establish the lower bound in Eq. (5.13), it suffices to note that

By definition, for all s^′ ∈ S, the function pmax(s^′, ⋅) is monotonically increasing in its second argument, that is, for increasing time boundsz, the maximum reachability prob-ability pmax(s^′,z) increases. Reversely, the function p^max(s^′,z − t) is monotonically de-creasing for inde-creasingt.

Hencet < τ implies that p_max(s^′,z − t) ≥ pmax(s^′,z − τ) and we obtain A(s, z) ≥

∫

₀^τ^E^(s)e⁻^E(s)t^{⋅ max}^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ p^max(s^′,z − τ) dt^(5.12)= X(s, z).

For the upper bound in Eq. (5.13), let us first investigate the relation betweenX(s, z) andA₁(s, z). By Lemma 5.4, we derive:

Therefore, we have proved thatX(s, z) is an upper bound for A1(s, z); formally:

A1(s, z) ≤ X(s, z). (5.14)

In the next step, we also obtain an upper bound for the sum∑^∞n=2A_n(s, z): To see how this works, recall that for an exponential distribution with rateλ and a time interval[0, τ], the Poisson distributionρ(n, λτ) = e⁻^λτ⋅^(λτ)_n!ⁿ expresses the probability thatn transitions occur within[0, τ]. This allows us to derive an upper bound, first for each Aⁿ(s, z) sepa-rately:

Moreover, by maximality ofλ, the probability that more than n transitions occur in any states ∈S within time interval[0, τ] is at most

whereRn(x) = ∑^∞i=n+1 xⁱ

i! is the remainder term of the Taylor expansion of f(x) = e^x for the point 0. By Taylor’s theorem, there existsξ ∈[0, λτ] such that

Rn(λτ) = f⁽ⁿ⁺¹⁾(ξ)

(n + 1)! ⋅(λτ)ⁿ⁺¹ = e^ξ

(n + 1)!⋅(λτ)ⁿ⁺¹, (5.17) where f⁽ⁿ⁺¹⁾ denotes the(n + 1)-th derivative of f .

Summarizing the above reasoning, we obtain an upper bound for the error that is induced by approximatingA(s, z) by only considering X(s, z):

A(s, z)^(5.9)≤

∑∞ n=1

An(s, z)^(5.14)≤ X(s, z) +∑^∞

n=2

An(s, z)^(5.15)≤ X(s, z) +∑^∞

n=2

ρ(n, λτ)

(5.16)

= X(s, z) + e⁻^λτ ⋅ R₁(λτ).

By Taylor’s theorem and Eq. (5.17), there existsξ ∈ [0, λτ] such that R1(λτ) = ^e₂^ξ ⋅(λτ)². For an upper bound, chooseξ maximal in[0, λτ]. Then

A(s, z) ≤ X(s, z) + e⁻^λτ ⋅ R₁(λτ) ≤ X(s, z) + e⁻^λτ⋅ e^λτ

2 (λτ)²=X(s, z) + (λτ)² 2 . Thus we haveA(s, z) ≤ X(s, z) +^(λτ)₂², completing the proof for the upper bound. ◻

By Eq. (5.13), we can approximateA(s, z) from below via X(s, z), allowing for an error of at most ^(λτ)₂². Thus, forτ → 0⁺it holds that X(s, z) = A(s, z). We use the one-step error bound ^(λτ)₂² later in Thm. 5.5 to derive the overall error bound for our analysis.

5.3.2 Reduction to step-bounded reachability in MDPs

Based onX(s, z) and B(s, z), we are now ready to derive a discretization for pmax(s, z) in a locally uniform CTMDPC with respect to a step duration τ:

Definition 5.9 (Discretization). LetC =(S, Act, R, ν) be a CTMDP, and let τ > 0 be a step duration. The induced MDPC_τ =(S, Act, Pτ,ν) is defined such that for all s, s^′ ∈S and α ∈ Act(s):

P_τ(s, α, s^′) =⎧⎪⎪

⎨⎪⎪⎩

(1 − e^−E^(s)τ) ⋅ P(s, α, s^′) if s /= s^′ (1 − e^−E^(s)τ) ⋅ P(s, α, s^′) + e^−E^(s)τ if s = s^′. Further, for all α ∉ Act(s), we define P^τ(s, α, s^′) = 0.

In the MDPC_τ, each step corresponds to one time slice of lengthτ in the CTMDPC.

For a single step and a fixed successor states^′ /= s, P^τ(s, α, s^′) equals the probability that

a transition tos^′ occurs withinτ time units, given that α is chosen. In case that s^′ = s, the first summand of Pτ(s, α, s) is the probability to take the loop back to s; the second summand denotes the probability that no transition occurs within timeτ and thus s = s^′.

Letp^Cmax^τ (s, k) be the maximum probability to reach G starting from state s in at most k discrete steps in the (discrete time) MDP Cτ. Obviously p^Cmax^τ (s, k) = 1 if s ∈ G and p^Cmax^τ (s, 0) = 0 if s ∉ G. Further, for s ∉ G and k > 0, p^Cmax^τ (s, k) is defined recursively:

p^C_max^τ (s, k) = maxα∈Act ∑

s^′∈S

P_τ(s, α, s^′) ⋅ p^Cmax^τ (s^′,k − 1). (5.18)

The next theorem proves that the probability to reachG from state s within at most k =^z_τ steps in the discrete-time MDPCτconverges from below (forτ→ 0) to the corresponding time-bounded reachability probability in the CTMDPC:

Theorem 5.5. LetC =(S, Act, R, ν) be a CTMDP, λ = max^s∈SE(s), G ⊆ S a set of goal states, z ∈ R≥0 a time bound and k ∈ N>0 the number of discretization steps, such that τ = ^z_k. Then it holds for all s ∈S:

p^C_max^τ (s, k) ≤ p^Cmax(s, z) ≤ p^Cmax^τ (s, k) + (λz)²

2k . (5.19)

The proof is by induction on the numberk of discretization steps, where the lower and upper bounds are established for each step of lengthτ using Lemma 5.4 and Lemma 5.5.

Proof. Recall that p^C_max(s, z) = A(s, z) + B(s, z) and X(s, z) ≤ A(s, z) ≤ X(s, z) +^(λτ)₂² by Eq. (5.13). We prove Eq. (5.19) by induction onk:

1. For k = 1, we have z = τ. If s ∈ G, then p^Cmax^τ (s, 1) = 1 = p^Cmax(s, τ), proving (5.19);

if k = 1 and s ∉ G, the lower bound in (5.19) holds as p^Cmax^τ (s, 1) = maxα∈Act(1 − e⁻^E(s)τ) ⋅ P(s, α, G) = X(s, τ) ≤ p^Cmax(s, τ). For the upper bound, note that s ∉ G impliesB(s, τ) = 0. Thus p^Cmax(s, τ) = A(s, τ) + B(s, τ) = A(s, τ). By Lemma 5.5, we know thatA(s, τ) ≤ X(s, τ)+^(λτ)₂². Moreover,X(s, τ) = p^Cmax^τ (s, τ) by definition.

Thereforep^C_max(s, τ) ≤ p^Cmax^τ (s, τ) +^(λτ)₂².

2. For the induction step, together with Lemma 5.5 (which provides X(s, z)) and Lemma 5.3 (the analytic expression forB(s, z)) we have

X(s, z) + B(s, z) = [

∫

₀^τ^E^(s)e⁻^E(s)t^max^α∈Act_s^∑′∈S

P(s, α, s^′) ⋅ p^Cmax(s^′,z − τ) dt]

+[e⁻^E(s)τ⋅ p^C_max(s, z − τ)]

=[maxα∈Act ∑

By definition of P_τ(s, α, s^′) (where the second summand in Eq. (5.20) corresponds to the special case ofs = s^′), we derive from Eq. (5.20):

X(s, z) + B(s, z) = maxα∈Act ∑

s^′∈S

P_τ(s, α, s^′) ⋅ p^Cmax(s^′,z − τ). (5.21) First we consider the lower bound on the left part of Eq. (5.19): By induction hy-pothesis, it holds thatp^Cmax^τ (s^′,k − 1) ≤ p^Cmax(s^′,z − τ) for all s^′∈S. Then

The proof for the upper bound is as follows: By Lemma 5.5 it holds thatA(s, z) ≤ X(s, z) +^(λτ)₂². Together with Eq. (5.21) we derive From here, we complete the induction step: Therefore, rewrite the summands^(λτ)₂ ² and ^(λ(z−τ))_2(k−1)² in the right part of Eq. (5.22) further:

(λτ)²

2 + (λ(z − τ))²

2(k − 1) = (λτ)²k(k − 1) + (λ(z − τ))²k

2k(k − 1) (* ask = z

τ *)

= (λτ)²⋅^z_τ ⋅^z−τ_τ +(λ(z − τ))²⋅^z_τ 2k ⋅^z−τ_τ

= λ²z(z − τ) + λ²(z − τ)²⋅^z_τ 2k ⋅ ^z−τ_τ

= λ²τz + λ²z(z − τ)

2k = λ²(τz + z²− τz)

2k = (λz)²

2k . In this way, the right part of Eq. (5.22) can be simplified to p^Cmax^τ (s, k) +^(λz)_2k². ◻ Example 5.3. Consider the CTMDPC in Fig. 5.3(a). To compute the maximum probability to reach G ={s2} within z time units up to a precision of ε, choose k ∈ N such that ε ≥ ^(λz)2k², where λ = max_s∈SE(s) = 3. The step duration τ = ^z_k induces the discretized MDPCτ which

is depicted in Fig. 5.3(b). ♢

5.3.3 Algorithm and complexity

LetC = (S, Act, R, ν) be a locally uniform CTMDP, G a set of goal states and z a time bound. For some error boundε > 0, let k be the number of steps needed to satisfy ε ≥

(λz)²

2k . Thenτ = ^z_k induces the discretized MDPCτofC with step duration τ. By Thm. 5.5, the maximum probability to reachG within z time units inC can be approximated (up to ε) by maximizing the step-bounded reachability p^Cmax^τ forG inC_τwithink steps. The latter can be computed efficiently by the well-knownvalue iteration approach [Ber95]. Briefly, it starts with a probability vector ⃗v0 with ⃗v0(s) = 1 if s ∈ G and 0, otherwise. In each iteration,⃗vi is obtained from⃗vi−1 according to Eq. (5.18). In each round,i corresponds to the number of steps in the MDPCτ; hence,⃗vi(s) equals p^C^max^τ (s, i).

The value iteration approach on the discretized MDPCτ has the following complex-ity. For s ∈ S and α ∈ Act(s), let post(s, α) = {s^′∈S ∣ R(s, α, s^′) > 0} be the set of α-successors of state s. The size ofC is denoted by m = ∑s∈S∑α∈Act∣post(s, α)∣. In the worst case,Cτis obtained by adding a self-loop for each states ∈S and action α ∈ Act(s).

Thus, the size ofCτ is bounded by 2m. For a given error bound ε, it is easy to derive the numberk of value-iteration steps: By Thm. 5.5,∣p^Cmax(s, z) − p^Cmax^τ (s, k)∣ ≤ ^(λz)_2k². Letting

(λz)²

2k ≤ ε, we conclude that the smallest k to guarantee ε is^(λz)_2ε². In each value iteration step, the update of the vector⃗vi takes time 2m. Thus, the worst-case time complexity of our approach isO(m ⋅ (λz)²/ε).

5.3.4 Synthesis of ε-optimal schedulers

Let C, G, z, k, τ = ^z_k and Cτ be as before. A byproduct of the value iteration on the discretized MDPC_τis anε-optimal scheduler for the set of goal states G and time bound z.

More precisely, in any of the i value iteration steps, for each state s ∈ S, an action αs,i

Figure 5.3: The discretization of a locally uniform CTMDP.

is chosen according to Eq. (5.18). In this way, we obtain a history-dependent (or, to be more precise, step-dependent) scheduler for the MDP Cτ. This scheduler induces a τ-scheduler (denoted Dτ) of the original CTMDP C as follows: Dτ(s, tπ) = αs,i if tπ ∈ [(k − i)τ, (k − i + 1)τ). The following theorem shows that Dτ is anε-optimal scheduler in the underlying CTMDPC:

Theorem 5.6 (ε-optimal scheduler). The scheduler Dτ is an ε-optimal scheduler forC w.r.t. the maximum time-bounded reachability probability.

Proof. LetC = (S, Act, R, ν) be a locally uniform CTMDP, G a set of goal states and z a time bound. For some error bound ε > 0, let k be the number of steps needed to satisfyε ≥ ^(λz)_2k². LetC_τ be the induced MDP withτ = ^z_k, andDτ be theτ-scheduler as described. To show that Dτ is an ε-optimal scheduler for C w.r.t. the maximum time-bounded reachability probability, we prove that for all statess ∈S it holds that

∣Pr^ωνs,Dτ(◇^[0,z]G) − p^Cmax^τ (s, k)∣ ≤ ε.

It is sufficient to show the following equality:

p^C_max^τ (s, k) ≤ Pr^ωνs,Dτ(◇^[0,z]G) ≤ p^Cmax^τ (s, k) + ε. (5.23) By Theorem 5.5, the upper bound can be shown directly:

Pr_ν^ω_s_,D_τ(◇^[0,z]G) ≤ p^Cmax(s, z) ≤ p^Cmax^τ (s, k) + (λz)²

2k ≤p^C_max^τ (s, k) + ε.

Now we discuss how to show the lower bound of Eq. (5.23). First, note that under any TTPDL scheduler D, the CTMDPC is totally stochastic and for s ∉ G, the probability Pr^ω_ν_s_,D(◇^[0,z]G) can be computed by:

Pr_ν^ω_s_,D(◇^[0,z]G) =

∫

₀^z^E^(s)e⁻^E(s)t^⋅_s^∑′∈S

P(s, D(s, t), s^′) ⋅ Prν^ω_s′,D(◇^[0,z−t]G) dt.

NoteDτis aTTPDL scheduler, thus it holds that Pr^ω_ν_s_,D_τ(◇^[0,z]G) =

∫

₀^z^E^(s)e^−E^(s)t^⋅_s^∑′∈S

P(s, Dτ(s, t), s^′) ⋅ Pr^ων_s′,Dτ(◇^[0,z−t]G) dt.

This integral can then be split into two partsA(s, z) and B(s, z) at time t = τ: it follows in a similar way as Eq. (5.6) with the difference of taking the actionDτ(s, t) instead of the maximum over allα ∈ Act. The lower bound can then be established by induction onk, by adapting the lower bound proof of Eq. (5.19) of Thm. 5.5 appropriately. ◻

In document Model checking nondeterministic and randomly timed systems (Page 146-157)