3.6 Sparse Matrices
4.1.1 Schemes for Encoding
We begin by considering the problem of representing DTMCs and CTMCs as MTBDDs. These two types of models are both described by real-valued matrices. From their inception, MTBDDs [CMZ+93, CFM+93, BFG+93] have been used to represent matrices, a process we described in Section 3.7. The basic idea is that a matrix can be thought of as a function mapping pairs of indices to real numbers. Given an encoding of these indices into Boolean variables, we can instead view the matrix as a function mapping Boolean variables to real numbers, which is exactly what an MTBDD represents.
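The following minimal Python sketch illustrates this view of a matrix as a function over Boolean variables. It is only an illustration: the function names (`to_bits`, `matrix_as_boolean_function`) and the example matrix are our own assumptions, the function is stored as a plain lookup table rather than as an MTBDD, and the row bits are simply placed before the column bits, without regard to variable ordering.

```python
def to_bits(index, width):
    """Standard binary representation of index, most significant bit first."""
    return tuple((index >> (width - 1 - k)) & 1 for k in range(width))


def matrix_as_boolean_function(matrix):
    """Return (f, width), where f maps row bits followed by column bits
    to the corresponding matrix entry (hypothetical helper)."""
    n = len(matrix)
    width = max(1, (n - 1).bit_length())  # bits needed per index
    table = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            table[to_bits(i, width) + to_bits(j, width)] = value

    def f(*bits):
        # Bit patterns that do not correspond to a valid index map to 0.
        return table.get(tuple(bits), 0.0)

    return f, width


# Example 3x3 matrix (made up for illustration).
M = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
f, width = matrix_as_boolean_function(M)
entry = f(0, 1, 1, 0)  # row 1 (bits 01), column 2 (bits 10)
```

An MTBDD stores the same function as a reduced decision diagram, so that identical sub-functions are shared instead of being tabulated separately.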
In the simple example of Section 3.7, matrix indices were simply integers and we encoded them using their standard binary representation. In our case, however, the transition matrix of the DTMC or CTMC is indexed by states. Hence, what we actually need is an encoding of the model’s state space into MTBDD variables. One approach is to enumerate the set of states in the model, assigning each one a unique integer, and then proceed as before. As we will see, though, by taking a more structured approach to the encoding, we can dramatically improve the efficiency of this representation.
The use of MTBDDs to represent probabilistic models has been proposed on a number of occasions, for example [BFG+93, HMPS94, BCHG+97]. The issue of developing an efficient encoding, however, was first considered by Hermanns et al. in [HMKS99]. One of the main contributions of that paper is a set of ‘rules of thumb’ for deriving compact MTBDD encodings of CTMCs from descriptions in high-level formalisms such as process algebras and queueing networks. This extends previous work [EFT91, DB95] which considers the efficient encoding of non-probabilistic process algebra descriptions into BDDs.
The key observation of Hermanns et al. is that one should try to preserve structure and regularity from the high-level description of the CTMC in its MTBDD encoding. For example, in the process-algebraic setting, a system is typically described as the parallel composition of several sequential components. They show that it is more efficient to first obtain a separate encoding for each of these components, and only then combine them into a global encoding. Regularity in the high-level description that is reflected in the low-level MTBDD representation increases the number of shared nodes and, consequently, decreases the size of the data structure.
In [dAKN+00], we described how these ideas can be applied and extended to encode models described using the PRISM language. In this case, a model’s state space is defined by a number of integer-valued PRISM variables and its behaviour by a description given in terms of these variables. Hence, to benefit from structure in this high-level description, there must be a close correspondence between PRISM variables and MTBDD variables. To achieve this, we encode each PRISM variable with its own set of MTBDD variables. For the encoding of each one, we use the standard binary representation of integers.
N     States    MTBDD Nodes
                ‘Enumerated’   ‘Structured’
5     240       807            271
7     1,344     3,829          482
9     6,912     15,127         765
11    33,792    54,389         1,096
13    159,744   184,157        1,491
15    737,280   594,309        1,942
Table 4.1: MTBDD sizes for two different encoding schemes
Consider a model with three PRISM variables, v1, v2 and v3, each of range {0, 1, 2}. Our structured encoding would use 6 MTBDD variables, say x1, ..., x6, with two for each PRISM variable, i.e. x1, x2 for v1, x3, x4 for v2 and x5, x6 for v3. The state (2, 1, 1), for example, would become (1, 0, 0, 1, 0, 1).
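The structured encoding of this example can be sketched in a few lines of Python. The helper names (`bits_needed`, `encode_state`) are hypothetical and chosen only for illustration. We assume each variable ranges over {0, ..., size - 1} and is written most significant bit first.

```python
def bits_needed(range_size):
    """Number of Boolean variables needed for values 0 .. range_size - 1."""
    return max(1, (range_size - 1).bit_length())


def encode_state(state, range_sizes):
    """Encode each variable separately in binary and concatenate the bits."""
    bits = []
    for value, size in zip(state, range_sizes):
        width = bits_needed(size)
        bits.extend((value >> (width - 1 - k)) & 1 for k in range(width))
    return tuple(bits)


# Three variables, each of range {0, 1, 2}, as in the example above.
encoded = encode_state((2, 1, 1), (3, 3, 3))
print(encoded)  # (1, 0, 0, 1, 0, 1)
```

Because each variable owns a fixed block of bits, a change to one PRISM variable affects only its own MTBDD variables.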
An interesting consequence of this encoding is that we effectively introduce a number of extra states into the model. In our example, 6 MTBDD variables encode 2^6 = 64 states, but the model actually only has 3^3 = 27 states, leaving 37 unused. We refer to these extra states as dummy states. To ensure that these do not interfere with model checking, when we store the transition matrix for the model, we leave the rows and columns corresponding to dummy states blank (i.e. all zero).
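As a quick check of this count, the sketch below enumerates the dummy states for the three-variable example. The function name `dummy_states` is our own, and the representation (tuples of integer values rather than bit vectors) is chosen purely for readability.

```python
from itertools import product


def dummy_states(range_sizes):
    """List the value tuples that the bits can express but the model never uses."""
    widths = [max(1, (size - 1).bit_length()) for size in range_sizes]
    # Every combination of values expressible with the allotted bits.
    expressible = product(*(range(2 ** w) for w in widths))
    # A tuple is a dummy state if any component lies outside its variable's range.
    return [t for t in expressible
            if any(v >= size for v, size in zip(t, range_sizes))]


dummies = dummy_states((3, 3, 3))
print(len(dummies))  # 37, i.e. 2^6 - 3^3
```

In the stored transition matrix, the rows and columns indexed by these tuples would simply contain zeros.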
We now present some experimental results to illustrate the effect that the choice of encoding can have on the size of the MTBDD. We use a CTMC model of the cyclic server polling system of [IT90]. By varying N, the number of stations attached to the server, we consider several models of different sizes (for more information, see Appendix E). Table 4.1 shows statistics for each model. We give the number of states and the size of the MTBDD (number of nodes) which represents it for the two different encoding schemes described above: ‘enumerated’, where we assign each state an integer and encode it using the standard binary encoding; and ‘structured’, where we work from a high-level description, encoding each PRISM variable with its own set of MTBDD variables. It is clear from the table that the ‘structured’ encoding results in far more compact storage.
This encoding scheme has two other important advantages. These both result from the close correspondence between PRISM variables and MTBDD variables. Firstly, it facilitates the process of constructing an MTBDD, i.e. the conversion of a description in the PRISM language into an MTBDD representing the corresponding model. Since the description is given in terms of PRISM variables, this can be done with an almost direct translation. We discuss this process in more detail in Section 4.3 and Appendix C.
Secondly, we find that useful information about the model is implicitly encoded in the MTBDD. As described in Section 3.4, when using PRISM, the atomic propositions used in PCTL or CSL specifications are predicates over PRISM variables. It is therefore simple, when model checking, to construct a BDD which represents the set of states satisfying such a predicate by transforming it into one over MTBDD variables. We will give some examples of this in Section 5.1. With most other encodings, it would be necessary to use a separate data structure to keep track of which states satisfy which atomic propositions.
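To illustrate how a predicate over PRISM variables becomes one over MTBDD variables, the sketch below computes the set of bit assignments satisfying a predicate under the structured encoding. This is only an explicit truth-table stand-in for the BDD: an actual implementation would build the BDD symbolically rather than enumerate assignments. The names `decode` and `satisfying_assignments`, and the example predicate v1 = 2, are assumptions made for illustration.

```python
from itertools import product


def decode(bits, widths):
    """Split a bit vector into per-variable blocks and read each as an integer."""
    values, pos = [], 0
    for w in widths:
        v = 0
        for b in bits[pos:pos + w]:
            v = (v << 1) | b
        values.append(v)
        pos += w
    return tuple(values)


def satisfying_assignments(predicate, range_sizes):
    """Bit vectors of non-dummy states whose variable values satisfy the predicate."""
    widths = [max(1, (size - 1).bit_length()) for size in range_sizes]
    result = set()
    for bits in product((0, 1), repeat=sum(widths)):
        values = decode(bits, widths)
        in_range = all(v < size for v, size in zip(values, range_sizes))
        if in_range and predicate(*values):
            result.add(bits)
    return result


# Predicate "v1 = 2" over three variables of range {0, 1, 2}.
sat = satisfying_assignments(lambda v1, v2, v3: v1 == 2, (3, 3, 3))
```

Every assignment in `sat` has x1 = 1 and x2 = 0, which reflects the fact that the predicate v1 = 2 depends only on the MTBDD variables encoding v1.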