Interval Markov Decision Processes - Decision algorithms for modelling, optimal control and verification of probabilistic systems

Real-world systems operate in the presence of uncertainty. Various types of uncertainty can influence the modelling of complex (physical or artificial) systems. One of the most important is the inherent imprecision introduced by measurement errors and discretization artifacts, which necessarily arise from incomplete knowledge about the system behaviour.

A severe limitation of Markov Decision Processes (MDPs) is that the transition probabilities are specific, fixed values, and these values can have a considerable impact on the outcome of model checking [KU02, NG05, Kat16]. In fact, due to measurement uncertainties or modelling errors, it is possible to obtain two MDP models of an underlying (physical) system with slightly different values on the transition probabilities such that they give very different outcomes, even though the two MDPs model the same system. A favorable method for avoiding this issue is to include such uncertainties and errors in the model itself.

We now define Interval Markov Decision Processes (IMDPs) as an extension of MDPs that allows transition probability uncertainties to be included as intervals. That is, the probabilities assigned by transition probability distributions to states are not fixed numbers; rather, they are known to lie within a given interval. IMDPs belong to the family of uncertain MDPs and allow one to describe a set of MDPs with identical (graph) structures that differ in the distributions associated with transitions. Formally,

Definition 4.10 (Interval Markov decision processes). An Interval Markov Decision Process (IMDP) $\mathcal{M}$ is a tuple $(S, \bar{s}, \mathcal{A}, \mathit{AP}, L, I)$, where $S$ is a finite set of states, $\bar{s} \in S$ is the initial state, $\mathcal{A}$ is a finite set of actions, $\mathit{AP}$ is a finite set of atomic propositions, $L : S \to 2^{\mathit{AP}}$ is a labelling function, and $I : S \times \mathcal{A} \times S \to \mathbb{I} \cup \{[0, 0]\}$ is a total interval transition probability function with $\mathbb{I} = \{\, [l, u] \subseteq \mathbb{R} \mid 0 < l \le u \le 1 \,\}$. We denote by $[\mathcal{M}]$ the class of all finite-state finite-transition IMDPs.

We denote the set of available actions at state $s \in S$ by $\mathcal{A}(s)$. Furthermore, for each state $s$ and action $a \in \mathcal{A}(s)$, we write $s \xrightarrow{a} h^a_s$ if $h^a_s \in \mathrm{Disc}(S)$ is a feasible distribution, i.e. for each state $s' \in S$ we have $h^a_{ss'} = h^a_s(s') \in I(s, a, s')$. By $\mathcal{H}^a_s$, we denote the set of feasible distributions for state $s$ and action $a$. We require that the set $\mathcal{H}^a_s = \{\, h^a_s \mid s \xrightarrow{a} h^a_s \,\}$ is non-empty for each state $s$ and action $a \in \mathcal{A}(s)$. Hence, the set of actions that are enabled from $s$ can also be described as $\mathcal{A}(s) = \{\, a \in \mathcal{A} \mid \mathcal{H}^a_s \neq \emptyset \,\}$.
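The feasibility condition above can be illustrated with a minimal Python sketch. The interval values and the names `intervals`, `is_feasible` and `has_feasible_distribution` are hypothetical and chosen only for illustration; successors that do not appear in `intervals` are assumed to carry the interval [0, 0]. The non-emptiness test relies on a standard observation that is not stated in the source: a set of box-constrained distributions is non-empty exactly when the lower bounds sum to at most 1 and the upper bounds sum to at least 1.

```python
from fractions import Fraction as F

# Hypothetical intervals I(s, a, .) for one fixed state-action pair (s, a):
# successor -> (lower, upper). Absent successors are treated as [0, 0].
intervals = {"t": (F(1, 3), F(2, 3)), "u": (F(1, 10), F(1))}

def is_feasible(h, intervals):
    """True iff h sums to 1 and h(s') lies in I(s, a, s') for every s'."""
    zero = (F(0), F(0))
    states = set(h) | set(intervals)
    in_bounds = all(
        intervals.get(s, zero)[0] <= h.get(s, F(0)) <= intervals.get(s, zero)[1]
        for s in states
    )
    return in_bounds and sum(h.values()) == 1

def has_feasible_distribution(intervals):
    """True iff the set of feasible distributions is non-empty (assumed criterion)."""
    lower = sum(l for l, _ in intervals.values())
    upper = sum(u for _, u in intervals.values())
    return lower <= 1 <= upper

print(is_feasible({"t": F(1, 2), "u": F(1, 2)}, intervals))   # True
print(is_feasible({"t": F(9, 10), "u": F(1, 10)}, intervals)) # False: 9/10 > 2/3
print(has_feasible_distribution(intervals))                   # True
```

Exact rationals are used so that the test "sums to 1" is not affected by floating-point rounding.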

We extend $I$ to sets of states as follows: given $S' \subseteq S$, we let

$$I(s, a, S') = \Big[\, \min\Big\{ 1, \sum_{s' \in S'} \inf I(s, a, s') \Big\},\; \min\Big\{ 1, \sum_{s' \in S'} \sup I(s, a, s') \Big\} \,\Big].$$
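As a small sketch of this extension, the following Python function sums the lower and upper bounds over a target set and caps both sums at 1. The dictionary `I`, the helper name `interval_to_set` and the numeric intervals are illustrative assumptions, not taken from the source; transitions missing from `I` are assumed to have the interval [0, 0].

```python
from fractions import Fraction as F

ZERO = (F(0), F(0))  # interval assumed for transitions that are not listed

def interval_to_set(I, s, a, targets):
    """Return (lower, upper) of I(s, a, S') for the set of states `targets`."""
    lower = sum((I.get((s, a, t), ZERO)[0] for t in targets), F(0))
    upper = sum((I.get((s, a, t), ZERO)[1] for t in targets), F(0))
    return min(F(1), lower), min(F(1), upper)

# Hypothetical interval function with two successors of ("s", "a").
I = {("s", "a", "t"): (F(1, 3), F(2, 3)), ("s", "a", "u"): (F(1, 10), F(1))}

lo, hi = interval_to_set(I, "s", "a", {"t", "u"})
print(lo, hi)  # lower bound 13/30, upper bound capped at 1
```

The cap matters only for the upper bound in this example: the upper bounds sum to 5/3, which is not a probability, so the result is truncated to 1.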

Remark 4.4. The size of a given $\mathcal{M}$ is determined as follows. Let $|S|$ denote the number of states in $\mathcal{M}$. Then each state has $O(|\mathcal{A}|)$ actions and at most $O(|\mathcal{A}| \cdot |S|)$ transitions, each of which is associated with a probability interval. Therefore, the overall size of $\mathcal{M}$, i.e. $|\mathcal{M}|$, is in $O(|S|^2 |\mathcal{A}|)$.

The formal semantics of an IMDP is as follows. A path in $\mathcal{M}$ is a finite or infinite sequence of states of the form $\xi = s_1 h^{a_1}_{s_1 s_2} s_2 \cdots$, where $s_1 = \bar{s}$ and for each $i \ge 1$, $s_i \in S$, $a_i \in \mathcal{A}(s_i)$, and the transition probability $h^{a_i}_{s_i s_{i+1}} > 0$. Path $\xi$ can be finite or infinite. The sets of all finite and infinite paths in $\mathcal{M}$ are denoted by $\mathit{Paths}^{\mathit{fin}}_{\mathcal{M}}$ and $\mathit{Paths}^{\mathit{inf}}_{\mathcal{M}}$, respectively. The $i$-th state and action along the path $\xi$ are denoted by $\xi[i]$ and $\xi(i)$, respectively. For a finite path $\xi \in \mathit{Paths}^{\mathit{fin}}_{\mathcal{M}}$, let $\mathit{last}(\xi)$ indicate its last state. Moreover, let $\mathit{Paths}^{\xi}_{\mathcal{M}} = \{\, \xi' \in \mathit{Paths}^{\mathit{inf}}_{\mathcal{M}} \mid \xi \text{ is a prefix of } \xi' \,\}$ denote the set of infinite paths with the prefix $\xi \in \mathit{Paths}^{\mathit{fin}}_{\mathcal{M}}$, which is also known as the cylinder set of $\xi$.

In order to resolve nondeterministic transitions, schedulers and natures need to be defined for IMDPs. Intuitively, a scheduler represents a possible resolution of the nondeterminism, while a nature represents a possible resolution of the uncertainty. Formally,

Figure 4.5: An example of IMDPs: the IMDP $\mathcal{M}$. [Figure not reproduced. Recoverable labels: states $\bar{s}$, $t$, $u$; actions $a$, $b$, $c$, $d$; transitions labelled with probability intervals, where $c$ and $d$ carry the interval $[1, 1]$.]

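Definition 4.11 (Scheduler and nature in IMDPs). Given an IMDP $\mathcal{M}$, a scheduler is a function $\sigma : \mathit{Paths}^{\mathit{fin}}_{\mathcal{M}} \to \mathrm{Disc}(\mathcal{A})$ that to each finite path $\xi$ assigns a distribution over the set of actions enabled by the last state of $\xi$, that is, $\sigma(\xi) \in \mathrm{Disc}(\mathcal{A}(\mathit{last}(\xi)))$. A nature is a function $\pi : \mathit{Paths}^{\mathit{fin}}_{\mathcal{M}} \times \mathcal{A} \to \mathrm{Disc}(S)$ that to each finite path $\xi$ and action $a \in \mathcal{A}(\mathit{last}(\xi))$ assigns a feasible distribution, i.e. an element of $\mathcal{H}^a_s$ where $s = \mathit{last}(\xi)$. The sets of all schedulers and all natures of $\mathcal{M}$ are denoted by $\Sigma$ and $\Pi$, respectively.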

A scheduler $\sigma$ is said to be deterministic (D) if, for every finite path $\xi$, $\sigma(\xi) = \delta_a$ for some $a \in \mathcal{A}(\mathit{last}(\xi))$. Similarly, a nature is said to be deterministic if, for every finite path $\xi$ and every $a \in \mathcal{A}(\mathit{last}(\xi))$, $\pi(\xi, a) = \delta_{h^a_{\mathit{last}(\xi)}}$ for some $h^a_{\mathit{last}(\xi)} \in \mathcal{H}^a_{\mathit{last}(\xi)}$. Furthermore, a scheduler $\sigma$ (nature $\pi$) is Markovian (M) if it depends only on $\mathit{last}(\xi)$.

Given a finite path $\xi$ of an IMDP, a scheduler $\sigma$, and a nature $\pi$, the system evolution proceeds as follows. First, an action $a \in \mathcal{A}(s_i)$, where $s_i = \mathit{last}(\xi)$, is chosen nondeterministically by $\sigma$. Then, $\pi$ resolves the uncertainties and chooses nondeterministically one feasible distribution $h^a_{s_i} \in \mathcal{H}^a_{s_i}$. Finally, the next state $s_{i+1}$ is chosen randomly according to the distribution $h^a_{s_i}$, and the path $\xi$ is extended by $s_{i+1}$.
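The three-stage evolution described above (scheduler, then nature, then random choice of the successor) might be sketched in Python as follows. This is only an illustration under simplifying assumptions: a path is represented as a plain list of states, the scheduler and nature are passed in as ordinary functions returning dictionaries of probabilities, and the name `step` is hypothetical.

```python
import random

def step(path, scheduler, nature, rng):
    """Extend a finite path (list of states) by one state.

    scheduler(path)      -> dict action -> probability (a distribution over actions)
    nature(path, action) -> dict state  -> probability (a feasible distribution)
    """
    # 1. The scheduler resolves the nondeterminism: draw an action from sigma(path).
    action_dist = scheduler(path)
    action = rng.choices(list(action_dist), weights=list(action_dist.values()))[0]
    # 2. The nature resolves the uncertainty: pick one feasible distribution.
    h = nature(path, action)
    # 3. The next state is drawn at random according to that distribution.
    next_state = rng.choices(list(h), weights=list(h.values()))[0]
    return path + [next_state], action

# Hypothetical Markovian scheduler and nature for a two-successor example.
scheduler = lambda path: {"a": 1.0}
nature = lambda path, action: {"t": 0.5, "u": 0.5}

rng = random.Random(0)
new_path, chosen = step(["s"], scheduler, nature, rng)
print(new_path, chosen)
```

Passing the random generator explicitly keeps the sketch reproducible; whether the nature's distribution is actually feasible for the given intervals is not checked here.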

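For a scheduler $\sigma$ and a nature $\pi$, let $\mathrm{Pr}^{\sigma,\pi}_{\mathcal{M}}$ denote the unique probability measure over $(\mathit{Paths}^{\mathit{inf}}_{\mathcal{M}}, \mathcal{B})$ such that the probability $\mathrm{Pr}^{\sigma,\pi}_{\mathcal{M}}[\mathit{Paths}^{s'}_{\mathcal{M}}]$ of starting in $s'$ equals $1$ if $s' = \bar{s}$ and $0$ otherwise; and the probability $\mathrm{Pr}^{\sigma,\pi}_{\mathcal{M}}[\mathit{Paths}^{\xi h^a_{\mathit{last}(\xi)s'} s'}_{\mathcal{M}}]$ of traversing a finite path $\xi\, h^a_{\mathit{last}(\xi)s'}\, s'$ equals

$$\mathrm{Pr}^{\sigma,\pi}_{\mathcal{M}}\big[\mathit{Paths}^{\xi h^a_{\mathit{last}(\xi)s'} s'}_{\mathcal{M}}\big] = \mathrm{Pr}^{\sigma,\pi}_{\mathcal{M}}\big[\mathit{Paths}^{\xi}_{\mathcal{M}}\big] \cdot \sigma(\xi)(a) \cdot \pi(\xi, a)(s').$$

Here, $\mathcal{B}$ is the standard $\sigma$-algebra over $\mathit{Paths}^{\mathit{inf}}_{\mathcal{M}}$ generated from the set of all cylinder sets $\{\, \mathit{Paths}^{\xi}_{\mathcal{M}} \mid \xi \in \mathit{Paths}^{\mathit{fin}}_{\mathcal{M}} \,\}$.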

The unique probability measure is obtained by the application of the extension theorem (see, e.g. [Bil79]).
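The recursive definition of the measure on cylinder sets can be unrolled into a product along the path. The Python sketch below does this under stated simplifications: a finite path is given as a list of states plus the list of actions taken between them (rather than the feasible distributions themselves), and the scheduler, nature and function name `cylinder_probability` are hypothetical illustrations.

```python
from fractions import Fraction as F

def cylinder_probability(states, actions, initial, scheduler, nature):
    """Probability of the cylinder set of the finite path
    states[0] -actions[0]-> states[1] -actions[1]-> ... under scheduler and nature."""
    if states[0] != initial:
        return F(0)                        # base case: only the initial state has mass 1
    prob = F(1)
    for i, a in enumerate(actions):
        prefix = states[: i + 1]           # the finite path xi built so far
        prob *= scheduler(prefix).get(a, F(0))              # sigma(xi)(a)
        prob *= nature(prefix, a).get(states[i + 1], F(0))  # pi(xi, a)(s')
    return prob

# Hypothetical Markovian scheduler and nature (they inspect only the last state).
def scheduler(path):
    return {"a": F(1, 2), "b": F(1, 2)} if path[-1] == "s" else {"c": F(1)}

def nature(path, action):
    if path[-1] == "s" and action == "a":
        return {"t": F(1, 2), "u": F(1, 2)}
    if path[-1] == "s":
        return {"t": F(1)}
    return {path[-1]: F(1)}                # self-loop in every other state

print(cylinder_probability(["s", "t", "t"], ["a", "c"], "s", scheduler, nature))  # 1/4
```

In the example, the first step contributes 1/2 (scheduler) times 1/2 (nature) and the second step contributes 1 times 1, which gives 1/4.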

It is worthwhile to note that the scheduler does not choose an action but a distribution over actions. Such randomization typically simplifies and speeds up the procedure of solving difficult problems. For instance, it is well known that randomization is useful in the context of bisimulations, as it allows one to define coarser equivalence relations [Seg95]. By contrast, nature is not allowed to randomize over the set of feasible distributions $\mathcal{H}^a_s$. This is in fact not necessary, since the set $\mathcal{H}^a_s$ is closed under convex combinations.
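The closure under convex combinations can be checked directly from the definition of feasibility. The short derivation below is a sketch; the symbols $l_{s'}$ and $u_{s'}$ for the endpoints of $I(s, a, s')$ and the weight $\lambda$ are notation introduced here for illustration.

```latex
Let $h_1, h_2 \in \mathcal{H}^a_s$, let $\lambda \in [0, 1]$, and write
$I(s, a, s') = [l_{s'}, u_{s'}]$ for each $s' \in S$.
Define $h = \lambda h_1 + (1 - \lambda) h_2$. Then, for every $s' \in S$,
\begin{align*}
  h(s') &= \lambda h_1(s') + (1 - \lambda) h_2(s')
         \ge \lambda l_{s'} + (1 - \lambda) l_{s'} = l_{s'}, \\
  h(s') &= \lambda h_1(s') + (1 - \lambda) h_2(s')
         \le \lambda u_{s'} + (1 - \lambda) u_{s'} = u_{s'}, \\
  \sum_{s' \in S} h(s') &= \lambda \sum_{s' \in S} h_1(s')
         + (1 - \lambda) \sum_{s' \in S} h_2(s') = \lambda + (1 - \lambda) = 1.
\end{align*}
Hence $h \in \mathrm{Disc}(S)$ and $h(s') \in I(s, a, s')$ for all $s'$,
that is, $h \in \mathcal{H}^a_s$.
```

Any randomized choice of nature between $h_1$ and $h_2$ therefore induces the same successor probabilities as the single feasible distribution $h$.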

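In order to avoid ambiguity, we sometimes describe the IMDP $\mathcal{M}$ as a tuple $(S_{\mathcal{M}}, \bar{s}_{\mathcal{M}}, \mathcal{A}_{\mathcal{M}}, \mathit{AP}_{\mathcal{M}}, L_{\mathcal{M}}, I_{\mathcal{M}})$ by adding the IMDP model symbol as a subscript to its generic elements.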

In document Decision algorithms for modelling, optimal control and veriﬁcation of probabilistic systems (Page 62-65)