Markov Decision Processes 4
4.5 Model Checking
In this section, we summarize the standard existing algorithms to perform exhaustive model checking of probabilistic reachability properties for finite-state MDP. The first two, which use linear programming and value iteration, are similar to the algorithms we presented for DTMC in Section 3.2.3. The third approach, policy iteration, is based on the enumeration of schedulers and thus has no DTMC counterpart. Again, we use the notions and assumptions from LTS analogously here when describing memory usage and runtime. In particular, the MDPs considered have n states and maximum fan-out $b = \max_{s \in S} |T(s)|$, and we are able to evaluate/store the transition function in constant time/memory.
Using Linear Programming
Exact reachability probabilities for DTMC can be computed by solving a linear equation system. The equations capture the relation between the probability values of a state and those of its successors: a state’s reachability probability is the weighted sum of the probabilities of its successors according to the probabilistic transition function. For MDP, this relationship is more complex due to the additional nondeterministic choice between probability distributions over successors. Depending on whether we are interested in maximum or minimum reachability probabilities, we need to determine, for all states at once, the maximum or minimum of the weighted sums over all the successor distributions.
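Written as an equation, the relationship described above takes the following form for maximum reachability probabilities. This is a sketch that assumes the notation used by the algorithms in this section: $x_s$ is the value of state s and $\langle a, \mu \rangle \in T(s)$ ranges over the transitions of s. It applies to states that are neither target states nor unable to reach one. For minimum probabilities, min takes the place of max.

```latex
x_s \;=\; \max_{\langle a, \mu \rangle \in T(s)} \; \sum_{s' \in S} \mu(s') \cdot x_{s'}
```

For a DTMC, every state has exactly one distribution, so the maximum disappears and the linear equation system mentioned at the start of the paragraph remains.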
This can be formulated as a linear programming (LP) problem as shown in Algorithms 9 and 12, which are based on [BK08,FKNP11]. Let us focus on Algorithm 9 for the moment, which computes maximum reachability probabilities. It builds the LP problem based on the sets of states that reach a target state with maximum probability one ($S^{\max}_{=1}$) or zero ($S^{\max}_{=0}$). As for DTMC, these sets can be obtained with standard graph algorithms inside functions Smax0 and Smax1, the details of which we omit here. In particular, no numeric computations are necessary. The variables $x_s$ of the LP problem represent, for each state s, the (maximum) probability of reaching a φ-state. Line 7 produces one constraint for every transition of every relevant state. By using the greater-or-equal operator here and asking for maximal values in line 3, we achieve the global maximisation of the reachability probability.
Input: Finite MDP $M = \langle S, A, T, s_{init}, AP, L \rangle$ and property $P_{\max}(\varphi)$
Output: $[\![P_{\max}(\varphi)]\!]_M$ (value in [0,1])
1 $S^{\max}_{=1}$ := Smax1(M, φ), e.g. $\{ s \in S \mid \varphi(L(s)) \}$
2 $S^{\max}_{=0}$ := Smax0(M, φ), e.g. $\{ s \in S \setminus S^{\max}_{=1} \mid \forall s' \colon (\mathit{Paths}(s, s') \neq \emptyset \Rightarrow \neg\varphi(L(s'))) \}$
3 lp := maximise $\sum_{s \in S} x_s$ subject to
4     $x_s = 1$ for all $s \in S^{\max}_{=1}$
5     $x_s = 0$ for all $s \in S^{\max}_{=0}$
6     $0 < x_s < 1$ for all $s \notin S^{\max}_{=1} \cup S^{\max}_{=0}$
7     $x_s \geq \sum_{s' \in S} \mu(s') \cdot x_{s'}$ for all $s \notin S^{\max}_{=1} \cup S^{\max}_{=0}$ and $\langle a, \mu \rangle \in T(s)$
8 end
9 solve the linear program lp and return $x_{s_{init}}$
Algorithm 9: MDP max. reachability checking with linear programming

Minimum reachability probabilities can be computed with LP as shown in Algorithm 12. In the LP problem itself, we see that maximisation has been replaced by minimisation, and the probability values $x_s$ are required to be less than or equal to the weighted sum over the successors. However, the graph-based computations necessary to obtain the sets of probability-one and probability-zero states are also different and cannot directly be expressed with our established notions like states reachable via finite paths. We thus give the (non-numeric) fixpoint computations to obtain these sets in Algorithms 10 and 11.

1 function Smin0($M = \langle S, T, s_{init}, AP, L \rangle$, φ)
2     $R := \{ s \in S \mid \varphi(L(s)) \}$
3     repeat
4         $R' := R$
5         $R := R' \cup \{ s \in S \mid \forall \langle a, \mu \rangle \in T(s) \colon \exists s' \in R \colon \mu(s') > 0 \}$
6     until $R = R'$
7     return $S \setminus R$
Algorithm 10: Computing the set of states $S^{\min}_{=0}$ for an MDP [FKNP11]

1 function Smin1($M = \langle S, T, s_{init}, AP, L \rangle$, $S^{\min}_{=0}$)
2     $R := S \setminus S^{\min}_{=0}$
3     repeat
4         $R' := R$
5         $R := R' \setminus \{ s \in R' \mid \exists \langle a, \mu \rangle \in T(s) \colon \exists s' \in S \setminus R' \colon \mu(s') > 0 \}$
6     until $R = R'$
7     return $R$
Algorithm 11: Computing the set of states $S^{\min}_{=1}$ for an MDP [FKNP11]
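The two fixpoint computations of Algorithms 10 and 11 can be sketched in Python as follows. The dictionary encoding of the MDP, the function names s_min0 and s_min1 and the example model are assumptions made for illustration only. Action labels are dropped because the computations only look at the supports of the distributions.

```python
# Hypothetical encoding (an assumption of this sketch, not taken from the text):
# an MDP is a dict mapping each state to a list of distributions, one per
# transition <a, mu>. Each distribution maps successors to probabilities > 0.
# Every state is assumed to have at least one transition.
EXAMPLE_MDP = {
    "s0": [{"s1": 0.5, "goal": 0.5}, {"sink": 1.0}],
    "s1": [{"goal": 1.0}],
    "s2": [{"s1": 0.5, "s0": 0.5}],
    "goal": [{"goal": 1.0}],
    "sink": [{"sink": 1.0}],
}


def s_min0(mdp, targets):
    """States with minimum reachability probability zero (cf. Algorithm 10)."""
    r = set(targets)
    while True:
        r_old = set(r)
        # Add every state all of whose transitions have a successor in R.
        r |= {s for s, dists in mdp.items()
              if all(any(t in r_old for t in mu) for mu in dists)}
        if r == r_old:
            return set(mdp) - r


def s_min1(mdp, zero_states):
    """States with minimum reachability probability one (cf. Algorithm 11)."""
    r = set(mdp) - set(zero_states)
    while True:
        r_old = set(r)
        # Remove every state with some transition that can leave R.
        r -= {s for s in r_old
              if any(any(t not in r_old for t in mu) for mu in mdp[s])}
        if r == r_old:
            return r


zero = s_min0(EXAMPLE_MDP, {"goal"})
one = s_min1(EXAMPLE_MDP, zero)
print(sorted(zero), sorted(one))
```

In the example, a scheduler can always pick the second transition of s0 and move to sink, so s0 ends up in the probability-zero set together with sink. State s2 lands in neither set, which leaves it as the only state whose value would have to be computed numerically.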
There is a wide range of methods to solve LP problems. The complexity of solving an LP problem is polynomial in the number of variables. The worst-case runtime of Algorithms 9 and 12 with an asymptotically optimal LP solution algorithm is consequently a polynomial in n. In practice, the state spaces of realistic and relevant MDP models are too large to be analysed with LP methods [FKNP11]. Memory usage depends on the way the LP problem is represented and solved; what we see in our algorithms is that the number of constraints we generate is in O(n ∙ b). Clearly, exhaustive model checking for MDP suffers from the state space explosion problem again (which affects the value and policy iteration approaches presented below just as well as this LP-based one).
Value Iteration
For DTMC models, the practical limitations of the exact linear equation system-based technique can be alleviated by using numeric dynamic programming in the form of value iteration. As mentioned, state space explosion is still the principal problem, but value iteration can be used on much larger state spaces before it becomes a limitation in practice. The value iteration algorithm for DTMC (Algorithm 5) only needs small changes to be applicable to MDP. The result is shown as Algorithm 13, which is the “Gauss-Seidel” variant of [FKNP11]. The main difference is that we need to compute the maximum over all transitions in addition to the weighted sum according to the probability distributions in line 8.
To model-check minimum instead of maximum reachability queries, we would simply replace every occurrence of max by min, except for the one in line 9.
Memory usage is in O(n) as for DTMC, and runtime is dependent on ε and the structure of the model, too.
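A minimal Python sketch of Gauss-Seidel value iteration for maximum reachability probabilities is given below. It makes several assumptions that are not part of the text: the dictionary encoding of the MDP, the function name value_iteration_max, and the fact that the probability-one and probability-zero sets are passed in precomputed. It also measures the relative error against the new value rather than the old one, which avoids a division by zero in the first sweep.

```python
# Hypothetical encoding (an assumption of this sketch): each state maps to a
# list of distributions, one per transition; each distribution maps successor
# states to their probabilities.
EXAMPLE_MDP = {
    "s0": [{"s0": 0.5, "goal": 0.25, "sink": 0.25}, {"goal": 0.4, "sink": 0.6}],
    "goal": [{"goal": 1.0}],
    "sink": [{"sink": 1.0}],
}


def value_iteration_max(mdp, prob1, prob0, s_init, eps=1e-6):
    """Approximate the maximum probability of reaching a prob1 state."""
    v = {s: 1.0 if s in prob1 else 0.0 for s in mdp}
    while True:
        error = 0.0
        for s in mdp:
            if s in prob1 or s in prob0:
                continue  # these values are fixed by the precomputation
            # Maximum over all transitions of the weighted sum of successors.
            v_new = max(sum(p * v[t] for t, p in mu.items()) for mu in mdp[s])
            if v_new > 0:
                # Assumption: relative error with respect to the new value.
                error = max(error, abs(v_new - v[s]) / v_new)
            v[s] = v_new  # Gauss-Seidel: later states already see this value
        if error < eps:
            return v[s_init]


result = value_iteration_max(EXAMPLE_MDP, {"goal"}, {"sink"}, "s0")
print(result)
```

In the example, always taking the first transition of s0 yields x = 0.5 ∙ x + 0.25 and thus probability 0.5, which beats the 0.4 of the second transition. The iteration approaches 0.5 from below and stops once the relative change drops under eps, so the returned value is an approximation rather than the exact result.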
Policy Iteration
A new approach for MDP is to use policy iteration, also called Howard’s algorithm for its discoverer Ronald A. Howard. Policy is an alternative name for what we called a scheduler in Definition 40. The idea of the algorithm, which is shown as Algorithm 14 [FKNP11], is to
1. start with some arbitrary (memoryless) scheduler (line 1),
2. use one of the DTMC model checking techniques to obtain the reachability probabilities for the Markov chain induced by the current scheduler (line 4),
3. improve the scheduler by changing its decisions where it does not yet go to the successors with the highest probabilities (line 5), and
4. repeat until the scheduler cannot be improved any more.

Input: Finite MDP $M = \langle S, A, T, s_{init}, AP, L \rangle$ and property $P_{\min}(\varphi)$
Output: $[\![P_{\min}(\varphi)]\!]_M$ (value in [0,1])
1 $S^{\min}_{=0}$ := Smin0(M, φ)
2 $S^{\min}_{=1}$ := Smin1(M, $S^{\min}_{=0}$)
3 lp := minimise $\sum_{s \in S} x_s$ subject to
4     $x_s = 1$ for all $s \in S^{\min}_{=1}$
5     $x_s = 0$ for all $s \in S^{\min}_{=0}$
6     $0 < x_s < 1$ for all $s \notin S^{\min}_{=1} \cup S^{\min}_{=0}$
7     $x_s \leq \sum_{s' \in S} \mu(s') \cdot x_{s'}$ for all $s \notin S^{\min}_{=1} \cup S^{\min}_{=0}$ and $\langle a, \mu \rangle \in T(s)$
8 end
9 solve the linear program lp and return $x_{s_{init}}$
Algorithm 12: MDP min. reachability checking with linear programming

Input: Finite MDP $M = \langle S, T, s_{init}, AP, L \rangle$, property $P_{\max}(\varphi)$ and $\varepsilon > 0$
Output: $[\![P_{\max}(\varphi)]\!]_M$ (value in [0,1])
1 $S^{\max}_{=0}$ := Smax0(M, φ)
2 $S^{\max}_{=1}$ := Smax1(M, φ)
3 with $v \in S \to [0,1]$:
4     foreach $s \in S$ do $v(s) := 1$ if $s \in S^{\max}_{=1}$, otherwise 0
5     repeat
6         error := 0
7         foreach $s \in S \setminus (S^{\max}_{=1} \cup S^{\max}_{=0})$ do
8             $v_{new} := \max \{ \sum_{s' \in support(\mu)} \mu(s') \cdot v(s') \mid \langle a, \mu \rangle \in T(s) \}$
9             if $v_{new} > 0$ then error := $\max \{ error, |v_{new} - v(s)| / v(s) \}$
10            $v(s) := v_{new}$
11        end
12    until error < ε
13    return $v(s_{init})$
14 end
Algorithm 13: MDP model checking with value iteration [FKNP11]

Input: Finite MDP $M = \langle S, A, T, s_{init}, AP, L \rangle$, property $P_{\max}(\varphi)$
Output: $[\![P_{\max}(\varphi)]\!]_M$ (value in [0,1])
1 $\mathfrak{S}$ := arbitrary scheduler for M
2 repeat
3     $\mathfrak{S}' := \mathfrak{S}$
4     compute $p_s := [\![P(\varphi)]\!]$ in $ind(M, \mathfrak{S})$ with initial state s for all $s \in S$
5     foreach $s \in S$ do $\mathfrak{S}(s) := \arg\max_{\langle a, \mu \rangle \in T(s)} \sum_{s' \in support(\mu)} \mu(s') \cdot p_{s'}$
6 until $\mathfrak{S} = \mathfrak{S}'$
7 return $p_{s_{init}}$
Algorithm 14: MDP model checking with policy iteration [FKNP11]
As shown, Algorithm 14 computes maximum probabilities. To compute minimum probabilities instead, the max operation in line 5 would need to be replaced by min. Note also that the computation in line 4 can be performed for all states in one go using only a slightly modified version of any of the techniques presented for DTMC model checking in Section 3.2.3.
The runtime of policy iteration is bounded via the number of different memoryless schedulers, of which there can be up to $b^n$, i.e. exponentially many in n. However, the performance of policy iteration in practice is competitive and comparable to that of value iteration [FKNP11].
Memory usage is obviously in O(n).
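The four steps of policy iteration can be sketched in Python as below. Several parts are assumptions of this sketch rather than parts of Algorithm 14: the dictionary encoding of the MDP, the names evaluate and policy_iteration_max, the use of a simple iterative approximation to evaluate the induced Markov chain, and the rule that a decision only changes on a strict improvement beyond a small tolerance, which keeps numerical noise from causing endless switching.

```python
# Hypothetical encoding (an assumption of this sketch): each state maps to a
# list of distributions; a memoryless scheduler maps each state to the index
# of the distribution it picks.
EXAMPLE_MDP = {
    "s0": [{"goal": 0.4, "sink": 0.6}, {"s0": 0.5, "goal": 0.25, "sink": 0.25}],
    "goal": [{"goal": 1.0}],
    "sink": [{"sink": 1.0}],
}


def evaluate(mdp, sched, targets, sweeps=10_000, tol=1e-12):
    """Approximate reachability probabilities in the chain induced by sched."""
    p = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(sweeps):
        delta = 0.0
        for s in mdp:
            if s in targets:
                continue
            new = sum(q * p[t] for t, q in mdp[s][sched[s]].items())
            delta = max(delta, abs(new - p[s]))
            p[s] = new
        if delta < tol:
            break
    return p


def policy_iteration_max(mdp, targets, s_init, tol=1e-9):
    sched = {s: 0 for s in mdp}  # step 1: some arbitrary memoryless scheduler
    while True:
        p = evaluate(mdp, sched, targets)  # step 2: analyse the induced chain
        changed = False
        for s in mdp:  # step 3: improve the decisions where possible
            if s in targets:
                continue
            vals = [sum(q * p[t] for t, q in mu.items()) for mu in mdp[s]]
            best = max(range(len(vals)), key=vals.__getitem__)
            # Assumption: switch only on a strict improvement beyond tol.
            if vals[best] > vals[sched[s]] + tol:
                sched[s] = best
                changed = True
        if not changed:  # step 4: stop once no decision changes
            return p[s_init], sched


value, scheduler = policy_iteration_max(EXAMPLE_MDP, {"goal"}, "s0")
print(value, scheduler["s0"])
```

In the example, the initial scheduler picks the first transition of s0 (probability 0.4). One improvement step switches to the second transition, whose induced chain reaches goal with probability 0.5, and the next round finds nothing left to improve.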
Other Approaches and Implementations
A prominent and widely-used tool that implements these standard techniques for exhaustive MDP model checking is PRISM [KNP11]. There are also other approaches, implemented in different tools, in addition to the three described so far. To name a few examples:
– The PASS tool [HHWZ10b] uses probabilistic counterexample-guided abstraction refinement [HWZ08] to combat the state space explosion problem. Instead of performing its analysis on the full state space of the given MDP, it extracts predicates from the high-level description of the process (i.e. a VMDP) to compute a more abstract model. It exploits the bounds given as part of the qualitative form of probabilistic properties to improve the abstraction until the probability in that abstraction is lower or higher than the bound. This approach works very well whenever a coarse abstraction is sufficient to satisfy the bound, but takes very long if many refinement steps are needed. By representing an infinite number of variable valuations in a finite number of predicates, PASS can deal with some infinite-state models.
– Another tool that can deal with infinite-state MDPs is INFAMY [HHWZ09]. Originally developed for continuous-time Markov chains, it has been extended to also work for MDP models. The key idea is to explore only the part of the model’s state space, similar to a number of BFS layers around the initial state, that is sufficient for the bounds of the given property or the acceptable error in answering a query.
– Finally, an early implementation that used refinement and very specific state space reduction techniques [DJJL01,DJJL02] was the RAPTURE tool.