3.3 Upper-reward bounded quantiles
3.3.3 Qualitative quantiles
Now, we investigate a special case for the computation of upper-reward bounded quantile queries, i.e., the computation of qualitative quantiles where the probability threshold is either 0 or 1. So, we consider queries of the form
$$\mathrm{qu}_s\bigl(Q\mathrm{P}_{\theta}(A \,\mathcal{U}^{\leq ?}\, B)\bigr)$$
with $Q \in \{\forall, \exists\}$ and $\theta \in \{{>}\,0,\ {=}\,1\}$. The computation of qualitative quantiles permits less expensive methods and was studied in [UB13, Section 4], restricted to the case of state rewards. A polynomial-time algorithm for this computation was presented in [UB13, Algorithm 1]; it relies on iterated reachability computations that analyse the structure of the graph representing the model under consideration. Figure 3.4 shows this algorithm for the computation of qualitative quantiles where the considered reward function is restricted to state rewards. As can be seen, no methods for solving linear programs are needed in this case. The algorithm could therefore be implemented in a straightforward manner using the graph-based methods that probabilistic model checkers already provide for their standard computations.
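To illustrate the kind of graph-based check that suffices here, the following minimal sketch decides the positive-probability case $s \models \exists\mathrm{P}_{>0}(Z \,\mathcal{U}\, Y)$ by a backward search over the underlying graph. It uses the standard graph characterisation of this check and is not code taken from [UB13]; the dictionary-based encoding of the MDP and the function name `exists_positive_until` are assumptions made for illustration only.

```python
from collections import deque


def exists_positive_until(P, Z, Y):
    """Return the set of states satisfying "exists P>0 (Z U Y)".

    P maps (state, action) to a dict {successor: probability}
    (an assumed encoding). A state qualifies iff it lies in Y, or it
    lies in Z and some action has a positive-probability successor
    that qualifies.
    """
    # Predecessor relation over all positive-probability edges.
    preds = {}
    for (s, _action), dist in P.items():
        for t, prob in dist.items():
            if prob > 0:
                preds.setdefault(t, set()).add(s)

    sat = set(Y)
    queue = deque(Y)
    while queue:
        t = queue.popleft()
        for s in preds.get(t, ()):
            # Only Z-states may be passed through on the way to Y.
            if s in Z and s not in sat:
                sat.add(s)
                queue.append(s)
    return sat
```

The search touches every edge at most once, so no numerical computation is involved. The remaining three query types need different, but likewise purely graph-based, fixed-point computations that are not shown here.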
Since applying quantiles to the analysis of energy-critical systems revealed that tool support restricted to state rewards does not meet the requirements in many cases, the demand arose that the framework should provide support for
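Input: MDP $M = (S, \mathit{Act}, P)$, state-reward function $\mathit{rew} : S \to \mathbb{N}$,
       quantile query $\mathrm{qu}(Q(A \,\mathcal{U}^{\leq ?} B))$ with $Q \in \{\forall\mathrm{P}_{>0},\ \exists\mathrm{P}_{>0},\ \forall\mathrm{P}_{=1},\ \exists\mathrm{P}_{=1}\}$

    for each $s \in S$ do
        if $s \in B$ then $v(s) \leftarrow 0$ else $v(s) \leftarrow \infty$
    done
    $X \leftarrow \{s \in S : v(s) = 0\}$
    $R \leftarrow \{0\}$
    $Z \leftarrow \{s \in S : s \in A \setminus B \text{ and } \mathit{rew}(s) = 0\}$
    while $R \neq \emptyset$ do
        $r \leftarrow \min R$
        $Y \leftarrow \{s \in X : v(s) \leq r\} \setminus Z$
        for each $s \in S \setminus X$ with $s \in A$ and $s \models Q(Z \,\mathcal{U}\, Y)$ do
            $v(s) \leftarrow r + \mathit{rew}(s)$
            $X \leftarrow X \cup \{s\}$
            $R \leftarrow R \cup \{v(s)\}$
        done
        $R \leftarrow R \setminus \{r\}$
    done
    return $v$

Figure 3.4: Algorithm 1 from [UB13]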
transition rewards as well. In order to allow transition rewards, a transformation is used to encode the transition-reward function $\mathit{rew} : S \times \mathit{Act} \to \mathbb{N}$ over the system $M$ into a new reward function $\widetilde{\mathit{rew}} : \widetilde{S} \to \mathbb{N}$ (consisting of state rewards only) over the transformed system $\widetilde{M}$ by introducing intermediate states. These additional states simply emulate the transition rewards of the original model. This encoding then allows the application of [UB13, Algorithm 1] (as shown in Figure 3.4) for the computation of qualitative quantiles when the model under consideration is equipped with transition rewards. Figure 3.5 depicts the idea of the encoding for the case $\mathit{rew}(s, \alpha) = 0$, $\mathit{rew}(s, \beta) > 0$, and $\mathit{rew}(t, \gamma) > 0$. The figure shows that only rewards greater than zero are taken into account, whereas zero-reward transitions remain unmodified.
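Figure 3.5: Encoding of transition rewards into state rewards (the original model with states $s, t, u_1, u_2, u_3$ and actions $\alpha, \beta, \gamma$, and the transformed model with the additional intermediate states $s_\beta$, $t_\gamma$ and the additional actions $\hat{\beta}$, $\hat{\gamma}$)

Formally, for a given MDP $M = (S, \mathit{Act}, P)$ and a specific transition-reward function $\mathit{rew} : S \times \mathit{Act} \to \mathbb{N}$, the transformation defines a new MDP $\widetilde{M} = (\widetilde{S}, \widetilde{\mathit{Act}}, \widetilde{P})$ with
• $\widetilde{S} = S \cup \{s_\alpha : s \in S,\ \alpha \in \mathit{Act}(s),\ \mathit{rew}(s, \alpha) > 0\}$
• $$\widetilde{P}(s, \alpha, t) = \begin{cases} P(s, \alpha, t) & \text{if } \mathit{rew}(s, \alpha) = 0 \\ 0 & \text{if } \mathit{rew}(s, \alpha) > 0 \text{ and } t \neq s_\alpha \\ 1 & \text{if } \mathit{rew}(s, \alpha) > 0 \text{ and } t = s_\alpha \\ P(u, \beta, t) & \text{if } s = u_\beta \text{ for } u \in S,\ \beta \in \mathit{Act}(u),\ \mathit{rew}(u, \beta) > 0 \end{cases}$$
and a new state-reward function $\widetilde{\mathit{rew}} : \widetilde{S} \to \mathbb{N}$ with
$$\widetilde{\mathit{rew}}(t) = \begin{cases} 0 & \text{if } t \in S \\ \mathit{rew}(s, \alpha) & \text{if } t = s_\alpha \text{ for } s \in S,\ \alpha \in \mathit{Act}(s) \end{cases}$$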
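The encoding can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the tool: the MDP is assumed to be given as a dictionary from state-action pairs to successor distributions, the tuples `("mid", s, a)` and `("hat", a)` are hypothetical names for the intermediate state $s_\alpha$ and the action $\hat{\alpha}$, and the use of a separate action $\hat{\alpha}$ to enter $s_\alpha$ follows the description given in the proof of Lemma 3.3.2.

```python
def encode_transition_rewards(P, rew):
    """Encode transition rewards as state rewards via intermediate states.

    P   maps (state, action) to {successor: probability} (assumed encoding).
    rew maps (state, action) to a non-negative integer; missing pairs count as 0.
    Returns (P_tilde, rew_tilde) describing the transformed model.
    """
    P_tilde = {}
    rew_tilde = {}
    for (s, a), dist in P.items():
        # All original states carry state reward 0 in the transformed model.
        rew_tilde.setdefault(s, 0)
        for t in dist:
            rew_tilde.setdefault(t, 0)

        r = rew.get((s, a), 0)
        if r == 0:
            # Zero-reward transitions are kept without modification.
            P_tilde[(s, a)] = dict(dist)
        else:
            mid = ("mid", s, a)             # hypothetical name for s_alpha
            hat = ("hat", a)                # hypothetical name for alpha-hat
            P_tilde[(s, hat)] = {mid: 1.0}  # enter s_alpha with probability 1
            P_tilde[(mid, a)] = dict(dist)  # s_alpha replays the distribution of alpha
            rew_tilde[mid] = r              # s_alpha carries rew(s, alpha)
    return P_tilde, rew_tilde
```

Only state-action pairs with a positive reward produce an additional state, so the transformed model grows by at most the number of such pairs.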
From a theoretical point of view it is possible to introduce a newly created intermediate state for each state-action pair (even if its reward is zero), but since the calculation of qualitative quantiles is currently only supported for an explicit representation of the model’s state space (see Section 5.2), this transformation would increase the size of the transformed model dramatically. So, for practical purposes it is much better to consider only the states that are essential for the encoding, and therefore utilise the presented transformation.
The following lemma shows that the transformation preserves reachability probabilities and is therefore sound.
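Lemma 3.3.2. Let $s \in S$, $A, B \subseteq S$ and $r \in \mathbb{N}$. For each scheduler $\mathfrak{S}$ in $M$ there exists a scheduler $\widetilde{\mathfrak{S}}$ in $\widetilde{M}$ such that
$$\mathrm{Pr}^{\mathfrak{S}}_{M,s}(A \,\mathcal{U}^{\leq r} B) = \mathrm{Pr}^{\widetilde{\mathfrak{S}}}_{\widetilde{M},s}(A' \,\mathcal{U}^{\leq r} B),$$
where $A' = A \cup \{s_\alpha : s \in A,\ \alpha \in \mathit{Act}(s)\}$. For each scheduler $\widetilde{\mathfrak{S}}$ in $\widetilde{M}$ there exists a corresponding scheduler $\mathfrak{S}$ in $M$ fulfilling the same equation.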
Proof. Let $\rho = s_0 \alpha_0 s_1 \alpha_1 \ldots \alpha_{n-1} s_n$ with $s = s_0$ and $t = s_n$ denote a finite path from state $s$ to state $t$ in $M$. Under the transformation from $M$ to $\widetilde{M}$ there exists a unique finite path $\widetilde{\rho}$ in $\widetilde{M}$ starting in $s$ and ending in $t$. So, it is possible to assign each finite path in $M$ to exactly one finite path in $\widetilde{M}$. Vice versa, when a path in $\widetilde{M}$ starts in an arbitrary state of $S$ and also ends in a state of $S$, it is possible to assign this path unambiguously to a path in $M$. In this way one obtains a bijective function between paths in $M$ and paths in $\widetilde{M}$ starting in a state of $S$ and also terminating in a state of $S$.
The probabilities along the paths $\rho$ and $\widetilde{\rho}$ are the same, and the accumulated reward along both paths is also the same, since the transformation assigns exactly the reward $\mathit{rew}(s, \alpha)$ to a newly created state $s_\alpha$ when $\mathit{rew}(s, \alpha) > 0$.
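The path correspondence used in this argument can be made concrete with a small sketch. It assumes that a finite path is given as a list alternating between states and actions, that original states are plain strings, and that the tuples `("mid", s, a)` and `("hat", a)` are hypothetical names for the intermediate state $s_\alpha$ and the action $\hat{\alpha}$; none of these names stem from the original text.

```python
def lift_path(path, rew):
    """Map a finite path [s0, a0, s1, ..., sn] of M to its counterpart in the transformed model."""
    lifted = [path[0]]
    for i in range(1, len(path), 2):
        s, a, t = path[i - 1], path[i], path[i + 1]
        if rew.get((s, a), 0) > 0:
            # Detour through the intermediate state that carries rew(s, a).
            lifted += [("hat", a), ("mid", s, a)]
        lifted += [a, t]
    return lifted


def reward_in_original(path, rew):
    """Accumulated transition reward along a path of M."""
    return sum(rew.get((path[i - 1], path[i]), 0) for i in range(1, len(path), 2))


def reward_in_transformed(lifted, rew):
    """Accumulated state reward along a lifted path.

    States sit at the even positions; only intermediate states carry reward.
    """
    total = 0
    for x in lifted[::2]:
        if isinstance(x, tuple) and len(x) == 3 and x[0] == "mid":
            total += rew[(x[1], x[2])]
    return total
```

For a path that takes one positive-reward action followed by one zero-reward action, `lift_path` inserts exactly one intermediate state, and both reward functions return the same value.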
It is possible to find a mimicking scheduler $\widetilde{\mathfrak{S}}$ over $\widetilde{M}$ for an arbitrary scheduler $\mathfrak{S}$ over $M$ using the following observations:
• For $s \in S$ and $\alpha \in \mathit{Act}(s)$ with $\mathit{rew}(s, \alpha) = 0$, $\widetilde{\mathfrak{S}}$ simply picks $\alpha \in \widetilde{\mathit{Act}}(s)$ in $\widetilde{M}$, as this action was preserved by the transformation.
• If $s \in S$ and $\alpha \in \mathit{Act}(s)$ with $\mathit{rew}(s, \alpha) > 0$, the scheduler $\widetilde{\mathfrak{S}}$ will pick the action $\hat{\alpha} \in \widetilde{\mathit{Act}}(s)$ which was introduced by the transformation. Since $\hat{\alpha}$ has the newly created intermediate state $s_\alpha$ as its sole successor, the positive transition reward $\mathit{rew}(s, \alpha)$ will be added to the accumulated reward in state $s_\alpha$ in any case. In $s_\alpha$ the only available action is $\alpha$, which will therefore be taken, and all successors of $\alpha$ in $M$ can be reached with their respective probabilities.
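Therefore, it is possible to mimic any scheduler over $M$ in $\widetilde{M}$, and the accumulated reward is the same for both schedulers.
Since the transformation from $M$ to $\widetilde{M}$ does not introduce any nondeterminism into the transformed model $\widetilde{M}$, and since the previous observations can be inverted, each scheduler $\widetilde{\mathfrak{S}}$ over $\widetilde{M}$ can also be mimicked by a scheduler over $M$, again accumulating the same reward.
So, the following holds for arbitrary $r$ and $s \in S$:
$$\mathrm{Pr}^{\mathfrak{S}}_{M,s}(A \,\mathcal{U}^{\leq r} B) = \mathrm{Pr}^{\widetilde{\mathfrak{S}}}_{\widetilde{M},s}\bigl((A \cup \{s_\alpha : s \in A,\ \alpha \in \mathit{Act}(s)\}) \,\mathcal{U}^{\leq r} B\bigr)$$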
This shows the claim.
Lemma 3.3.2 is of help when considering the computation of quantiles since direct consequences of this lemma are as follows:
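Corollary 3.3.3. Let $s \in S$ and $A, B \subseteq S$. For $Q \in \{\exists, \forall\}$, $\trianglerighteq \in \{\geq, >\}$, and $p \in [0, 1] \cap \mathbb{Q}$ the following holds:
$$\mathrm{qu}^{M}_{s}\bigl(Q\mathrm{P}_{\trianglerighteq p}(A \,\mathcal{U}^{\leq ?} B)\bigr) = \mathrm{qu}^{\widetilde{M}}_{s}\bigl(Q\mathrm{P}_{\trianglerighteq p}(A' \,\mathcal{U}^{\leq ?} B)\bigr),$$
with $A' = A \cup \{s_\alpha : s \in A,\ \alpha \in \mathit{Act}(s)\}$ as in Lemma 3.3.2.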
Corollary 3.3.3 shows that the transformation is indeed quantile-preserving, and especially the computation of qualitative quantiles (where p is either 0 or 1) over models using a transition-reward function is therefore possible utilising the presented transformation and afterwards the qualitative algorithm introduced in [UB13, Algorithm 1].