In the preceding section we have established the theory which is necessary for the main contribution of this chapter. In particular, we will make use of the fixed-point charac-terization in Thm. 5.1 and the fact (provided by Thm. 5.2) that we may restrict ourselves to TTPDL schedulers. With these preliminaries, we are now ready to reduce the prob-lem of computing maximum time-bounded reachability in CTMDPs to the probprob-lem of maximizing the step-bounded reachability probability in (discrete-time) MDPs.
The latter is a well-studied problem which can be solved efficiently, e.g. by value itera-tion algorithms [Ber95]. The discretizaitera-tion that we use for our reducitera-tion is defined such that it is exact up to an a priori given error boundε > 0; hence, the results can be made arbitrarily precise. We study the complexity of our approach and show how to synthesize ε-optimal schedulers automatically.
5.3.1 Discretizing time in locally uniform CTMDPs
As before, letC be a locally uniform CTMDP, G ⊆ S a set of goal states, s ∈ S an initial state andz ∈ R≥0a time bound. We aim at computing pmax(s, z) up to an a priori fixed errorε > 0. If s ∈ G, this is trivial as pmax(s, z) = 1 for all z ∈ R≥0. To computepmax(s, z) fors ∉ G, we use the fixed point characterization of pmaxfrom Thm. 5.1. More precisely, consider the first sub-interval [0, τ] of the integral in Eq. (5.4) separately and split the whole integral accordingly:
pmax(s, z) = Ω(pmax)(s, z)
=
∫
0τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,z − t) dt +
∫
τzE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,z − t) dt. (5.6)
Now, letA(s, z) and B(s, z) denote the first, resp. second summand in Eq. (5.6). Shifting the range of integration in B(s, z) by (−τ), the next Lemma derives a straightforward recursive representation of the probabilityB(s, z) which can easily be used for our dis-cretization purposes:
Lemma 5.3. For all s ∈S, z ∈ R≥0and τ ∈[0, z] it holds that
B(s, z) = e−E(s)τ⋅ pmax(s, z − τ). (5.7)
Proof. We proceed as follows:
B(s, z) =
∫
τzE(s)e−E(s)t⋅ maxα∈Act ∑s′∈S
P(s, α, s′) ⋅ pmax(s′,z − t) dt
=
∫
0z−τE(s)e−E(s)(t+τ) ⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,z −(t + τ)) dt
=
∫
0z−τE(s)e−E(s)te−E(s)τ⋅ maxα∈Act ∑s′∈S
P(s, α, s′) ⋅ pmax(s′,z −(t + τ)) dt
=e−E(s)τ⋅
∫
0z−τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,(z − τ) − t) dt
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
pmax(s,z−τ)
=e−E(s)τ⋅ pmax(s, z − τ). ◻
Note that the factor e−E(s)τ in Eq. (5.7) is the probability that no transition occurs in states in the first τ time units. Hence, B(s, z) is the maximum probability of the event that starting from states, the set G is reached within z time units while no transition occurs in the time interval[0, τ]. To be more precise, let #[0,τ]∶ Pathsω→ N be the random variable which describes the number of transitions that occur in time interval [0, τ]. Then, it holds thatB(s, z) = supD∈MLPrωνs,D(◇[0,z]G ∩ #[0,τ]= 0). With the same reasoning, the first summandA(s, z) of (5.6) is the maximum probability to reach G within time z with at least one transition taking place in[0, τ]. Hence,
A(s, z) = sup
D∈ML
Prνωs,D(◇[0,z]G ∩ #[0,τ]≥ 1).
Now, decompose the underlying event of A(s, z) into disjoint subsets according to the number of transitions that occur in time interval[0, τ]:
(◇[0,z]G ∩ #[0,τ]≥ 1) =⊍∞
n=1(◇[0,z]G ∩ #[0,τ]=n).
Accordingly, letAn(s, z) be the maximum probability to reach G in z time units with exactlyn transitions occurring in the first time slice[0, τ]. In this way, we maximize the probability of each event(◇[0,z]G ∩ #[0,τ]=n) separately:
An(s, z) = sup
D∈ML
Prωνs,D(◇[0,z]G ∩ #[0,τ]=n) . (5.8)
To relateA(s, z) with the probabilities An(s, z), observe that A(s, z) = sup
D∈ML
Prνωs,D(◇[0,z]G ∩ #[0,τ]≥ 1)
= sup
D∈ML
Prνωs,D(⊍∞
n=1(◇[0,z]G ∩ #[0,τ]=n))
≤
∑∞ n=1( sup
D∈ML
Prωνs,D(◇[0,z]G ∩ #[0,τ]=n))
=
∞
∑n=1
An(s, z).
(5.9)
The next major step is to derive an analytic expression for the probabilityA1(s, z):
Lemma 5.4. Let C = (S, Act, R, ν) be a CTMDP, G ⊆ S a set of goal states, s ∈ S an initial state, z ∈ R≥0a time bound and τ > 0 a step duration. For A1(s, z) as defined in Eq. (5.8) it holds
A1(s, z) =
∫
0τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ e−E(s′)(τ−t)⋅ pmax(s′,z − τ) dt.(5.10)
Note thatA1(s, z) is the maximum probability to reach G within z time units and that exactly one transition occurs within time interval[0, τ]. This is reflected in the integral in Lemma 5.4: Here, the integration variablet corresponds to the precise point in time when state s is left; further, if we move to state s′ after t units of time, we stay in the successor state s′ for at least (τ − t) time units (i.e. the time that remains in the first step duration) with probability e−E(s′)(τ−t). Finally, we multiply with pmax(s′,z−τ), i.e.
with the maximum achievable probability to reachG in the remaining(z − τ) time units, starting in states′.
Proof. Let E = (◇[0,z]G ∩ #[0,τ] = 1) be the event that corresponds to the probabil-ity A1(s, z). Given an ML scheduler D, the measure of the event E differs from the time-bounded reachability event ◇[0,z]G in the additional requirement that exactly one transition occurs in time interval[0, τ]. Hence, we obtain the probability
Prωνs,D(◇[0,z]G ∩ #[0,τ]= 1) =
∫
0τE(s)e−E(s)t⋅α∈Act∑ D(s, t)(α)⋅∑
s′∈S
P(s, α, s′) ⋅ e−E(s′)(τ−t)⋅ Prω
νs′,D(sÐ→α ,t ⋅,⋅+(τ−t))(◇[0,z−τ]G) dt. (5.11) The terme−E(s′)(τ−t)in Eq. (5.11) is the probability that after leaving states at time point t and entering the successor states′, no transition occurs for the next(τ − t) time units.
The ML-scheduler D(s Ð→ ⋅, ⋅α,t +(τ−t)) is defined such that if π = s′ ÐÐ→ πα′,t′ ′′ for some π′′∈Paths⋆, thenD(sÐ→ ⋅, ⋅α,t +(τ−t))(π, t) = D(sÐ→ πα,t ′,t), where π′=s′ÐÐÐÐÐ→ πα′,t′+(τ−t) ′′.
Hence, if states is left at time t and no transition occurs in the successor state s′within the followingτ − t time units, then D(sÐ→ ⋅, ⋅α,t +(τ−t)) decides on the remaining path as D does on the suffix of the complete path. Note that due to the memoryless property of the exponential distribution, we may split the sojourn time in states′in two parts: First, the sojourn in states′beforeτ and the remaining sojourn time. Hence Eq. (5.11) expresses the probability to reachG from state s within time bound z and that exactly one transition occurs in time interval[0, τ].
With these preliminaries, we introduce theML-scheduler D1z, which induces the maxi-mum probability for the eventE. Similar to the scheduler Dz, it is deterministic; however, it is not fully positional: To ease its definition, letg(s, α, t) ∈ [0, 1] be the maximum prob-ability to reachG in z time units, if state s has been left at time t and action α has been chosen and no transition occurs in the remainingτ − t time units:
g(s, α, t) = ∑ that at least one transition has occurred. Hence, we defineDz1 such that it optimizes the probability to reachG in the remaining time z −(∆(π) + t). Therefore we set D1z(π, t) = Dz(π↓, ∆(π) + t) if ∣π∣ > 0.
Now we prove thatDz1 is optimal w.r.t.E by contraposition: Assume that there exists D′ ∈ ML such that Prνωs,D′(E) > Prωνs,D1z(E). By the following derivation, this leads to a
=
∫
0τE(s)e−E(s)t⋅s∑′∈SP(s, Dz1(s, t), s′) ⋅ e−E(s′)(τ−t)⋅ Prωνs′,Dz−τ(◇[0,z−τ]G) dt
=Prωνs,Dz
1(E).
Hence, the schedulerD1zyields the maximum probability for the eventE and we obtain A1(s, z) = sup
D∈ML
Prωνs,D(◇[0,z]G ∩ #[0,τ]= 1)
=
∫
0τE(s)e−E(s)t⋅α∈Act∑ Dz1(s, t)(α) ⋅ ∑s′∈S
P(s, α, s′) ⋅ e−E(s′)(τ−t)
⋅ Prω
νs′,Dz1(sÐ→α ,t ⋅,⋅+(τ−t))(◇[0,z−τ]G) dt
=
∫
0τE(s)e−E(s)t⋅s∑′∈SP(s, Dz1(s, t), s′) ⋅ e−E(s′)(τ−t)
⋅ Prων
s′,Dz(⋅,⋅+τ)(◇[0,z−τ]G) dt
=
∫
0τE(s)e−E(s)t⋅s∑′∈SP(s, Dz1(s, t), s′) ⋅ e−E(s′)(τ−t)
⋅ Prωνs′,Dz−τ(◇[0,z−τ]G) dt
=
∫
0τE(s)e−E(s)t⋅∑s′∈S
P(s, Dz1(s, t), s′) ⋅ e−E(s′)(τ−t)⋅ pmax(s′,z − τ) dt
=
∫
0τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ e−E(s′)(τ−t)⋅ pmax(s′,z − τ) dt,
completing the proof. ◻
Now we approximate the probabilityA(s, z) from below via a new probability X(s, z), which is closely related to A1(s, z): More precisely, we obtain X(s, z) by bounding the probabilitye−E(s′)(τ−t)in Eq. (5.10) from above by 1. HenceA1(s, z) ≤ X(s, z); moreover, by a continuity argument we can prove thatX(s, z) ≤ A(s, z).
With these two inequalities and the definition ofX(s, z) we establish an error bound for our discretization. Formally, the following sandwich lemma proves thatX(s, z) con-verges toA(s, z) for τ → 0:
Lemma 5.5 (One-step approximation). LetC = (S, Act, R, ν) be a CTMDP, G ⊆ S a set of goal states, λ = maxs∈SE(s), s ∈ S an initial state, z ∈ R≥0a time bound and τ > 0 a step duration. If we define
X(s, z) =
∫
0τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,z − τ) dt, (5.12)then X(s, z) approximates A(s, z) in the following sense:
X(s, z) ≤ A(s, z) ≤ X(s, z) + (λτ)2
2 . (5.13)
Proof. To establish the lower bound in Eq. (5.13), it suffices to note that
By definition, for all s′ ∈ S, the function pmax(s′, ⋅) is monotonically increasing in its second argument, that is, for increasing time boundsz, the maximum reachability prob-ability pmax(s′,z) increases. Reversely, the function pmax(s′,z − t) is monotonically de-creasing for inde-creasingt.
Hencet < τ implies that pmax(s′,z − t) ≥ pmax(s′,z − τ) and we obtain A(s, z) ≥
∫
0τE(s)e−E(s)t⋅ maxα∈Acts∑′∈SP(s, α, s′) ⋅ pmax(s′,z − τ) dt(5.12)= X(s, z).
For the upper bound in Eq. (5.13), let us first investigate the relation betweenX(s, z) andA1(s, z). By Lemma 5.4, we derive:
Therefore, we have proved thatX(s, z) is an upper bound for A1(s, z); formally:
A1(s, z) ≤ X(s, z). (5.14)
In the next step, we also obtain an upper bound for the sum∑∞n=2An(s, z): To see how this works, recall that for an exponential distribution with rateλ and a time interval[0, τ], the Poisson distributionρ(n, λτ) = e−λτ⋅(λτ)n!n expresses the probability thatn transitions occur within[0, τ]. This allows us to derive an upper bound, first for each An(s, z) sepa-rately:
Moreover, by maximality ofλ, the probability that more than n transitions occur in any states ∈S within time interval[0, τ] is at most
whereRn(x) = ∑∞i=n+1 xi
i! is the remainder term of the Taylor expansion of f(x) = ex for the point 0. By Taylor’s theorem, there existsξ ∈[0, λτ] such that
Rn(λτ) = f(n+1)(ξ)
(n + 1)! ⋅(λτ)n+1 = eξ
(n + 1)!⋅(λτ)n+1, (5.17) where f(n+1) denotes the(n + 1)-th derivative of f .
Summarizing the above reasoning, we obtain an upper bound for the error that is induced by approximatingA(s, z) by only considering X(s, z):
A(s, z)(5.9)≤
∑∞ n=1
An(s, z)(5.14)≤ X(s, z) +∑∞
n=2
An(s, z)(5.15)≤ X(s, z) +∑∞
n=2
ρ(n, λτ)
(5.16)
= X(s, z) + e−λτ ⋅ R1(λτ).
By Taylor’s theorem and Eq. (5.17), there existsξ ∈ [0, λτ] such that R1(λτ) = e2ξ ⋅(λτ)2. For an upper bound, chooseξ maximal in[0, λτ]. Then
A(s, z) ≤ X(s, z) + e−λτ ⋅ R1(λτ) ≤ X(s, z) + e−λτ⋅ eλτ
2 (λτ)2=X(s, z) + (λτ)2 2 . Thus we haveA(s, z) ≤ X(s, z) +(λτ)22, completing the proof for the upper bound. ◻
By Eq. (5.13), we can approximateA(s, z) from below via X(s, z), allowing for an error of at most (λτ)22. Thus, forτ → 0+it holds that X(s, z) = A(s, z). We use the one-step error bound (λτ)22 later in Thm. 5.5 to derive the overall error bound for our analysis.
5.3.2 Reduction to step-bounded reachability in MDPs
Based onX(s, z) and B(s, z), we are now ready to derive a discretization for pmax(s, z) in a locally uniform CTMDPC with respect to a step duration τ:
Definition 5.9 (Discretization). LetC =(S, Act, R, ν) be a CTMDP, and let τ > 0 be a step duration. The induced MDPCτ =(S, Act, Pτ,ν) is defined such that for all s, s′ ∈S and α ∈ Act(s):
Pτ(s, α, s′) =⎧⎪⎪
⎨⎪⎪⎩
(1 − e−E(s)τ) ⋅ P(s, α, s′) if s /= s′ (1 − e−E(s)τ) ⋅ P(s, α, s′) + e−E(s)τ if s = s′. Further, for all α ∉ Act(s), we define Pτ(s, α, s′) = 0.
In the MDPCτ, each step corresponds to one time slice of lengthτ in the CTMDPC.
For a single step and a fixed successor states′ /= s, Pτ(s, α, s′) equals the probability that
a transition tos′ occurs withinτ time units, given that α is chosen. In case that s′ = s, the first summand of Pτ(s, α, s) is the probability to take the loop back to s; the second summand denotes the probability that no transition occurs within timeτ and thus s = s′.
LetpCmaxτ (s, k) be the maximum probability to reach G starting from state s in at most k discrete steps in the (discrete time) MDP Cτ. Obviously pCmaxτ (s, k) = 1 if s ∈ G and pCmaxτ (s, 0) = 0 if s ∉ G. Further, for s ∉ G and k > 0, pCmaxτ (s, k) is defined recursively:
pCmaxτ (s, k) = maxα∈Act ∑
s′∈S
Pτ(s, α, s′) ⋅ pCmaxτ (s′,k − 1). (5.18)
The next theorem proves that the probability to reachG from state s within at most k =zτ steps in the discrete-time MDPCτconverges from below (forτ→ 0) to the corresponding time-bounded reachability probability in the CTMDPC:
Theorem 5.5. LetC =(S, Act, R, ν) be a CTMDP, λ = maxs∈SE(s), G ⊆ S a set of goal states, z ∈ R≥0 a time bound and k ∈ N>0 the number of discretization steps, such that τ = zk. Then it holds for all s ∈S:
pCmaxτ (s, k) ≤ pCmax(s, z) ≤ pCmaxτ (s, k) + (λz)2
2k . (5.19)
The proof is by induction on the numberk of discretization steps, where the lower and upper bounds are established for each step of lengthτ using Lemma 5.4 and Lemma 5.5.
Proof. Recall that pCmax(s, z) = A(s, z) + B(s, z) and X(s, z) ≤ A(s, z) ≤ X(s, z) +(λτ)22 by Eq. (5.13). We prove Eq. (5.19) by induction onk:
1. For k = 1, we have z = τ. If s ∈ G, then pCmaxτ (s, 1) = 1 = pCmax(s, τ), proving (5.19);
if k = 1 and s ∉ G, the lower bound in (5.19) holds as pCmaxτ (s, 1) = maxα∈Act(1 − e−E(s)τ) ⋅ P(s, α, G) = X(s, τ) ≤ pCmax(s, τ). For the upper bound, note that s ∉ G impliesB(s, τ) = 0. Thus pCmax(s, τ) = A(s, τ) + B(s, τ) = A(s, τ). By Lemma 5.5, we know thatA(s, τ) ≤ X(s, τ)+(λτ)22. Moreover,X(s, τ) = pCmaxτ (s, τ) by definition.
ThereforepCmax(s, τ) ≤ pCmaxτ (s, τ) +(λτ)22.
2. For the induction step, together with Lemma 5.5 (which provides X(s, z)) and Lemma 5.3 (the analytic expression forB(s, z)) we have
X(s, z) + B(s, z) = [
∫
0τE(s)e−E(s)tmaxα∈Acts∑′∈SP(s, α, s′) ⋅ pCmax(s′,z − τ) dt]
+[e−E(s)τ⋅ pCmax(s, z − τ)]
=[maxα∈Act ∑
By definition of Pτ(s, α, s′) (where the second summand in Eq. (5.20) corresponds to the special case ofs = s′), we derive from Eq. (5.20):
X(s, z) + B(s, z) = maxα∈Act ∑
s′∈S
Pτ(s, α, s′) ⋅ pCmax(s′,z − τ). (5.21) First we consider the lower bound on the left part of Eq. (5.19): By induction hy-pothesis, it holds thatpCmaxτ (s′,k − 1) ≤ pCmax(s′,z − τ) for all s′∈S. Then
The proof for the upper bound is as follows: By Lemma 5.5 it holds thatA(s, z) ≤ X(s, z) +(λτ)22. Together with Eq. (5.21) we derive From here, we complete the induction step: Therefore, rewrite the summands(λτ)2 2 and (λ(z−τ))2(k−1)2 in the right part of Eq. (5.22) further:
(λτ)2
2 + (λ(z − τ))2
2(k − 1) = (λτ)2k(k − 1) + (λ(z − τ))2k
2k(k − 1) (* ask = z
τ *)
= (λτ)2⋅zτ ⋅z−ττ +(λ(z − τ))2⋅zτ 2k ⋅z−ττ
= λ2z(z − τ) + λ2(z − τ)2⋅zτ 2k ⋅ z−ττ
= λ2τz + λ2z(z − τ)
2k = λ2(τz + z2− τz)
2k = (λz)2
2k . In this way, the right part of Eq. (5.22) can be simplified to pCmaxτ (s, k) +(λz)2k2. ◻ Example 5.3. Consider the CTMDPC in Fig. 5.3(a). To compute the maximum probability to reach G ={s2} within z time units up to a precision of ε, choose k ∈ N such that ε ≥ (λz)2k2, where λ = maxs∈SE(s) = 3. The step duration τ = zk induces the discretized MDPCτ which
is depicted in Fig. 5.3(b). ♢
5.3.3 Algorithm and complexity
LetC = (S, Act, R, ν) be a locally uniform CTMDP, G a set of goal states and z a time bound. For some error boundε > 0, let k be the number of steps needed to satisfy ε ≥
(λz)2
2k . Thenτ = zk induces the discretized MDPCτofC with step duration τ. By Thm. 5.5, the maximum probability to reachG within z time units inC can be approximated (up to ε) by maximizing the step-bounded reachability pCmaxτ forG inCτwithink steps. The latter can be computed efficiently by the well-knownvalue iteration approach [Ber95]. Briefly, it starts with a probability vector ⃗v0 with ⃗v0(s) = 1 if s ∈ G and 0, otherwise. In each iteration,⃗vi is obtained from⃗vi−1 according to Eq. (5.18). In each round,i corresponds to the number of steps in the MDPCτ; hence,⃗vi(s) equals pCmaxτ (s, i).
The value iteration approach on the discretized MDPCτ has the following complex-ity. For s ∈ S and α ∈ Act(s), let post(s, α) = {s′∈S ∣ R(s, α, s′) > 0} be the set of α-successors of state s. The size ofC is denoted by m = ∑s∈S∑α∈Act∣post(s, α)∣. In the worst case,Cτis obtained by adding a self-loop for each states ∈S and action α ∈ Act(s).
Thus, the size ofCτ is bounded by 2m. For a given error bound ε, it is easy to derive the numberk of value-iteration steps: By Thm. 5.5,∣pCmax(s, z) − pCmaxτ (s, k)∣ ≤ (λz)2k2. Letting
(λz)2
2k ≤ ε, we conclude that the smallest k to guarantee ε is(λz)2ε2. In each value iteration step, the update of the vector⃗vi takes time 2m. Thus, the worst-case time complexity of our approach isO(m ⋅ (λz)2/ε).
5.3.4 Synthesis of ε-optimal schedulers
Let C, G, z, k, τ = zk and Cτ be as before. A byproduct of the value iteration on the discretized MDPCτis anε-optimal scheduler for the set of goal states G and time bound z.
More precisely, in any of the i value iteration steps, for each state s ∈ S, an action αs,i
s0
Figure 5.3: The discretization of a locally uniform CTMDP.
is chosen according to Eq. (5.18). In this way, we obtain a history-dependent (or, to be more precise, step-dependent) scheduler for the MDP Cτ. This scheduler induces a τ-scheduler (denoted Dτ) of the original CTMDP C as follows: Dτ(s, tπ) = αs,i if tπ ∈ [(k − i)τ, (k − i + 1)τ). The following theorem shows that Dτ is anε-optimal scheduler in the underlying CTMDPC:
Theorem 5.6 (ε-optimal scheduler). The scheduler Dτ is an ε-optimal scheduler forC w.r.t. the maximum time-bounded reachability probability.
Proof. LetC = (S, Act, R, ν) be a locally uniform CTMDP, G a set of goal states and z a time bound. For some error bound ε > 0, let k be the number of steps needed to satisfyε ≥ (λz)2k2. LetCτ be the induced MDP withτ = zk, andDτ be theτ-scheduler as described. To show that Dτ is an ε-optimal scheduler for C w.r.t. the maximum time-bounded reachability probability, we prove that for all statess ∈S it holds that
∣Prωνs,Dτ(◇[0,z]G) − pCmaxτ (s, k)∣ ≤ ε.
It is sufficient to show the following equality:
pCmaxτ (s, k) ≤ Prωνs,Dτ(◇[0,z]G) ≤ pCmaxτ (s, k) + ε. (5.23) By Theorem 5.5, the upper bound can be shown directly:
Prνωs,Dτ(◇[0,z]G) ≤ pCmax(s, z) ≤ pCmaxτ (s, k) + (λz)2
2k ≤pCmaxτ (s, k) + ε.
Now we discuss how to show the lower bound of Eq. (5.23). First, note that under any TTPDL scheduler D, the CTMDPC is totally stochastic and for s ∉ G, the probability Prωνs,D(◇[0,z]G) can be computed by:
Prνωs,D(◇[0,z]G) =
∫
0zE(s)e−E(s)t⋅s∑′∈SP(s, D(s, t), s′) ⋅ Prνωs′,D(◇[0,z−t]G) dt.
NoteDτis aTTPDL scheduler, thus it holds that Prωνs,Dτ(◇[0,z]G) =
∫
0zE(s)e−E(s)t⋅s∑′∈SP(s, Dτ(s, t), s′) ⋅ Prωνs′,Dτ(◇[0,z−t]G) dt.
This integral can then be split into two partsA(s, z) and B(s, z) at time t = τ: it follows in a similar way as Eq. (5.6) with the difference of taking the actionDτ(s, t) instead of the maximum over allα ∈ Act. The lower bound can then be established by induction onk, by adapting the lower bound proof of Eq. (5.19) of Thm. 5.5 appropriately. ◻