Dissertations, Theses, and Masters Projects Theses, Dissertations, & Master Projects 2000
Data structures for the analysis of large structured Markov models
Andrew S. Miner
College of William & Mary - Arts & Sciences
Part of the Computer Sciences Commons
Recommended Citation
Miner, Andrew S., "Data structures for the analysis of large structured Markov models" (2000). Dissertations, Theses, and Masters Projects. Paper 1539623985.
https://dx.doi.org/doi:10.21220/s2-sjja-aj08
This Dissertation is brought to you for free and open access by the Theses, Dissertations, & Master Projects at W&M ScholarWorks. It has been accepted for inclusion in Dissertations, Theses, and Masters Projects by an authorized administrator of W&M ScholarWorks. For more information, please contact [email protected].
DATA STRUCTURES FOR THE ANALYSIS OF LARGE
STRUCTURED MARKOV MODELS
A Dissertation
Presented to
The Faculty of the Department of Computer Science
The College of William & Mary in Virginia
In Partial Fulfillment
Of the Requirements for the Degree of
Doctor of Philosophy
by
Andrew Stephen Miner
2000
Copyright 2001 by Miner, Andrew Stephen
All rights reserved.
UMI Microform 9989345
Copyright 2001 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
Bell & Howell Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346
APPROVAL SHEET
This dissertation is submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
Andrew S. Miner
Approved, June 2000
Gianfranco Ciardo, Thesis Advisor
Steve Park
Virginia Torczon
Alex Pothen, Old Dominion University
Table of Contents
Acknowledgments
List of Tables
List of Figures
List of Algorithms
List of Symbols
Abstract
1 Introduction
1.1 Contributions
1.2 Organization
2 Background
2.1 Notation and basic definitions
2.2 Sparse matrix storage
2.3 Solving linear systems
3 Markov Chains
3.1 Random variables and important distributions
3.2 Stochastic processes and Markov chains
3.3 Discrete-time Markov chains
3.3.1 Transient analysis
3.3.2 Stationary analysis
3.3.3 Mean time to absorption
3.4 Continuous-time Markov chains
3.4.1 Transient analysis
3.4.2 Stationary analysis
3.4.3 Mean time to absorption
3.5 Phase-type distributions
3.5.1 Discrete phase-types
3.5.2 Continuous phase-types
3.5.3 Phase-types and general distributions
4 High-level formalisms
4.1 Model paradigm
4.2 Petri nets
4.3 Logical analysis
4.4 Markov analysis
4.5 Structured models
5 Explicit State Space Generation
5.1 Generation algorithm
5.2 Traditional data structures
5.2.1 Storage of states
5.2.2 Storing a set of states
5.2.3 Storage of unexplored states
5.2.4 Compression
5.3 Multi-level trees for structured models
5.3.1 Representing unexplored states
5.3.2 Exploiting locality
5.3.3 Compression
5.4 Generating local states first
5.4.1 Traditional structure
5.4.2 Bit vectors
5.4.3 Multi-level arrays
5.5 Experimental results
5.6 Conclusion
6 Symbolic State Space Generation
6.1 Decision diagrams
6.1.1 Manipulating MDDs
6.2 Generating $\mathcal{S}$ with BDDs
6.3 Generating $\mathcal{S}$ with MDDs
6.3.2 Occurrence of synchronizing events
6.3.3 Complete generation algorithm
6.4 Logical queries on the state space
6.5 Experimental results
6.5.1 Dining philosophers model
6.5.2 Slotted ring model
6.5.3 FMS model
6.5.4 Kanban model
6.6 Conclusion
7 Transition Rate Matrix Storage
7.1 Kronecker algebra
7.2 Sparse Kronecker representations
7.3 Representing $\mathbf{Q}$ with Kronecker algebra
7.4 Kronecker overheads
7.4.1 Overheads from using the potential states
7.4.2 Overheads from using the actual states
7.5 Decision diagrams to store the state space
7.5.1 State searches
7.5.2 Computing state indices
7.5.3 Determining the next reachable state
7.6 Matrix diagrams to store the transition rate matrix
7.6.1 Kronecker products with matrix diagrams
7.6.2 Addition of matrix diagrams
7.6.3 Submatrix selection with matrix diagrams
7.6.4 Representing $\mathbf{R}$ with matrix diagrams
7.6.5 Matrix diagram column access
7.7 Experimental results
7.8 Conclusion
8 A Stationary Approximation
8.1 Exact aggregation
8.2 Approximate aggregation using decision diagrams
8.2.1 Our decision diagram structure
8.2.2 A partition based on decision diagrams
8.2.3 A simple case of exact aggregation
8.2.4 Our approximation
8.2.5 An example
8.2.6 Exploiting event locality
8.3 Algorithmic details
8.3.1 The fixed-point computation
8.3.2 Data structures
8.3.3 Computing measures
8.4 Product-form models
8.5 Experimental results
8.5.2 Load-dependent service model
8.5.3 Kanban model
8.5.4 Flexible manufacturing system (FMS) model
8.6 Conclusion
9 Applications
9.1 Distributed algorithm verification
9.2 Web server performance evaluation
10 Conclusion
10.1 Future research
10.1.1 Extensions
10.1.2 Related work
10.1.3 New directions
A SMART
A.1 SMART Language
A.1.1 Function declarations
A.1.2 Arrays
A.1.3 Fixed-point iterations
A.2 Random variables
A.3 Model formalisms
A.4 Distributed version (under development)
A.4.1 Distributed algorithms
A.4.2 Concurrent solutions
B Benchmarks
B.1 Kanban model
B.2 Flexible manufacturing system (FMS) model
B.3 Dining philosophers model
B.4 Slotted ring network protocol model
C Analysis of Case
D Dining philosophers states
Bibliography
Acknowledgments
This work would have been impossible without the help of many people. First and foremost, I would like to thank my advisor, Gianfranco Ciardo, who tolerated my unusual sleeping schedules. His office door was always open and we had countless fruitful discussions about SMART, research ideas, and papers. We also attended some excellent conferences and shared an extremely dangerous daily commute via rickety bicycles in Torino.
I would also like to thank Steve Park, who somehow managed to find time in his busy schedule as Department Chair (and now as Dean of Arts and Sciences) to help me with grant proposals, cover letters, and other delicate matters of diplomacy and word-smithing. I must also profusely thank Evgenia Smirni. Although I could not possibly enumerate the entire list of items for which I owe her thanks (which grows by the day), I will at least say that I would almost certainly be unemployed without her help. Many thanks also to Andreas Stathopoulos, Virginia Torczon, and Alex Pothen (and to the rest of the committee members already named) for reading this rather large body of work and providing many valuable constructive comments. Thanks also to Susanna Donatelli for arranging my productive research visit to the Università di Torino, Italy. I hope it was the first of many such visits. I would also like to thank the Virginia Space Grant Consortium, the NASA Graduate Student Researchers Program, and the Department of Computer Science at William and Mary for financial support during my graduate studies.
Finally, on a personal note, I thank my family and friends for their support, especially Shannon. I will always remember Wednesday pool, “celebrity” Cheese Shop lunches, disc golf, frisbee at all hours, Mystery Science Theater 3000, and 1am trips to Denny's.
This document was prepared using the LaTeX document preparation system [62]. The figures were drawn with Tgif.
List of Tables
2.1 Memory required to store a matrix
3.1 Computing $\boldsymbol{\pi}(6.0)$ using uniformization
4.1 Comparison of memory required for vs.
4.2 Event rates for the structured model of Figure 4.5
5.1 State space sizes for benchmark models
5.2 Memory usage for traditional techniques
5.3 Memory usage for structured techniques
6.1 Results for Dining Philosophers, two philosophers per submodel
6.2 Results for Slotted Ring, one node per submodel
6.3 FMS decompositions
6.4 Results for FMS, 10 submodel decomposition
6.5 Results for Kanban, 4 submodel decomposition
7.1 Next functions for the structured model of Figure 4.5
7.2 CTMC sizes and memory requirements for Kanban and FMS
8.1 CTMC sizes for the fork-join model
8.2 Iterations for the fork-join model, $N = 40$, $\lambda = 1$
8.3 CTMC sizes for the load-dependent service model
8.4 Iterations for the load-dependent service model, $N = 40$, $\lambda = 1$
8.5 CTMC sizes for the Kanban model
8.6 Iterations for the Kanban model, $N = 66$
8.7 CTMC sizes for the FMS model
8.8 Iterations for the FMS model, $N = 33$
9.1 Results for verification of Algorithm 9.1
9.2 Results for verification of modified Algorithm 9.1
9.3 Rates for the model of Figure 9.4 with 5 servers
9.4 Rates for the model of Figure 9.4 with 10 servers
B.1 Transition rates for the Kanban model
B.2 Transition rates for the Flexible Manufacturing System model
List of Figures
2.1 Storage of a sparse matrix
3.1 Example DTMCs
3.2 Example CTMCs
3.3 Discrete phase distributions
3.4 Continuous phase distributions
4.1 Petri net of an open queueing network
4.2 Producer / Consumer Petri net
4.3 A simple model and its state space
4.4 Underlying CTMC based on $\mathcal{S}$
4.5 Example of a structured model
4.6 Structured models and logical product form
5.1 Some P-semiflows and invariants of a Petri net
5.2 A hash table with chaining
5.3 Storing unexplored states using an extra pointer per node
5.4 Storing unexplored states using a linked list
5.6 Compression using a sorted array of states
5.7 Compression using an unordered array of states and an ordering array
5.8 Chiola’s multi-level technique to store the reachability set of a SPN
5.9 Mathematical representation of a multi-level structure
5.10 A multi-level tree representing the structure in Figure 5.9
5.11 Representing $\mathcal{U}$ with a second multi-level tree
5.12 Representing $\mathcal{U}$ and Tt in one multi-level tree
5.13 The compressed version of the multi-level tree of Figure 5.10
5.14 A multi-level array representing the structure in Figure 5.9
5.15 Traditional generation times for Kanban
5.16 Traditional generation times for FMS
5.17 Traditional generation times for Dining Philosophers
5.18 Traditional generation times for Slotted Ring
6.1 MDD representations of $\min(\{x, y, z\})$
6.2 Computing $f = \mathrm{Case}(\min(\{x, y\}), z < 0, z < 1, z < 2)$
6.3 MDDs for the Case computation of Figure 6.2
6.4 A BDD encoding reachable markings for a simple Petri net
6.5 The MDD equivalent of Figure 5.14
6.6 Application of Equation 6.3 for an event local to submodel 2
6.7 MDD example for a synchronizing event $e$
6.8 Performing a query on $\mathcal{S}$
6.9 Symbolic generation times and memory usage for Dining Philosophers
6.10 Symbolic generation times and memory usage for Slotted Ring
6.11 Symbolic generation times and memory usage for FMS
6.12 Symbolic generation times and memory usage for Kanban
7.1 Example Kronecker product
7.2 Example Kronecker sum
7.3 Local matrices for the structured model of Figure 4.5
7.4 Kronecker matrices for the structured model of Figure 4.5
7.5 State space data structure
7.6 An example matrix diagram and the matrix it represents
7.7 Matrix diagram of the Kronecker product of Figure 7.1
7.8 Example of matrix diagram addition
7.9 Example of a matrix diagram representing a submatrix
7.10 Matrix diagram for the structured model of Figure 4.5
7.11 Interesting portion of the matrix diagram in Figure 7.10
7.12 A trace of Algorithm 7.4
7.13 Column multiplication times, Kanban model
7.14 Column multiplication times, FMS model
8.1 A simple aggregation example
8.2 Adding redundant nodes to an ROMDD
8.3 Example of sets $\mathcal{A}(p)$ and $\mathcal{B}(p)$
8.4 Decision diagram with node labels
8.6 Computing $\mathbf{A}$ matrices at level 3
8.7 Level 3 CTMC
8.8 Level 2 CTMC
8.9 Level 1 CTMC
8.10 MDD for a product-form network with 4 queues and 4 customers
8.11 A fork-join model
8.12 Error for the fork-join model against $N$
8.13 Error for the fork-join model against $\lambda$
8.14 A load-dependent service model
8.15 Error for the load-dependent service model against $N$
8.16 Error for the load-dependent service model against $\lambda$
8.17 Relative error for the Kanban model
8.18 Relative error for the FMS model
9.1 Message passing subnet for machine $i$, Algorithm 9.1
9.2 Probe subnet for machine $i$, Algorithm 9.1
9.3 Special subnet for machine 0, Algorithm 9.1
9.4 Model of a group of $K$ Web servers
9.5 Probability of a full system, $K = 5$ servers and $J = 5$ jobs
9.6 Average number of requests in the system, $K = 5$ servers and $J = 5$ jobs
9.7 CPU times for $K = 5$ servers and $J = 5$ jobs
9.8 Probability of a full system, $K = 10$ servers and $J = 10$ jobs
9.9 Average number of requests in the system, $K = 10$ servers and $J = 10$ jobs
9.10 CPU times for $K = 10$ servers and $J = 10$ jobs
B.1 Petri net of the Kanban model
B.2 Petri net of the Flexible Manufacturing System model
B.3 The $i$-th philosopher subnet
B.4 The Petri net for ten dining philosophers
B.5 The $i$-th network node for slotted ring
D.1 MDD encoding for $\mathcal{S}$, 5 dining philosophers
List of Algorithms
2.1 Computing a Jacobi iteration by rows
2.2 Computing a Jacobi iteration by columns
2.3 Computing a Gauss-Seidel iteration by rows
5.1 Traditional state space generation
5.2 Inserting a state into a multi-level tree
5.3 Choosing and removing a state from a multi-level tree
6.1 The Case operator on MDDs
6.2 Generating $\mathcal{S}$ using BDDs
6.3 Adding states due to local events
6.4 Determining states due to synchronizing events
6.5 Generating $\mathcal{S}$ using MDDs
7.1 Building a matrix diagram of a Kronecker product
7.2 Adding two matrix diagrams
7.3 Computing a submatrix using matrix diagrams
7.4 Obtaining a matrix diagram column
8.1 Computing the $\mathbf{A}$ matrices
8.2 Computing the $\mathbf{B}$ matrices
8.3 Our fixed-point iteration
9.1 Modified termination detection
B.1 SMART code for the Kanban model
List of Symbols
$\mathbb{N}$  Set of natural numbers
$\mathbb{R}$  Set of real numbers
$\eta(\mathbf{M})$  The number of non-zero elements of matrix $\mathbf{M}$
RowSum($\mathbf{M}$)  A vector of row sums of matrix $\mathbf{M}$
Diag($\mathbf{x}$)  The matrix with vector $\mathbf{x}$ along the diagonal
$\mathbf{P}$  The probability matrix of a DTMC
$\mathbf{p}$  The stationary probability vector of a DTMC
$\mathbf{R}$  The transition rate matrix of a CTMC
$\mathbf{Q}$  The infinitesimal generator matrix of a CTMC
$\boldsymbol{\pi}$  The stationary probability vector of a CTMC
$\mathcal{E}$  The set of events of a model
$\hat{\mathcal{S}}$  The set of potential (possible) states
$\mathcal{S}$  The set of actual (reachable) states
s s'  n-step reachability
$\hat{\mathbf{R}}$  Transition matrix based on potential states
$\mathbf{R}^e$  Transitions due to event $e$
$K$  The number of submodels in a structured model
first($e$)  The first submodel affected by event $e$
last($e$)  The last submodel affected by event $e$
$f_{x=i}$  Cofactors of a function
$\chi_{\mathcal{S}}$  Characteristic function of set $\mathcal{S}$
$\otimes$  Kronecker product
$\oplus$  Kronecker sum
$\mathcal{B}(p)$  Substates encoded below node $p$
$\mathcal{A}(p)$  Substates above node $p$
Abstract
High-level modeling formalisms are increasingly popular tools for studying complex systems. Given a high-level model, we can automatically verify certain system properties or compute performance measures about the system. In the general case, measures must be computed using discrete-event simulations. In certain cases, exact numerical analysis is possible by constructing and analyzing the underlying stochastic process of the system, which is a continuous-time Markov chain (CTMC) in our case. Unfortunately, the number of states in the underlying CTMC can be extremely large, even if the high-level model is “small”. In this thesis, we develop data structures and techniques that can tolerate these large numbers of states.
First, we present a multi-level data structure for storing the set of reachable states of a model. We then introduce the concept of event “locality”, which considers the components of the model that an event may affect. We show how a state generation algorithm using our multi-level structure can exploit event locality to reduce CPU requirements.
Then, we present a symbolic generation technique based on our multi-level structure and our concept of event locality, in which operations are applied to sets of states. The extremely compact data structure and efficient manipulation routines we present allow for the examination of much larger systems than was previously possible.
The transition rate matrix of the underlying CTMC can be represented with Kronecker algebra under certain conditions. However, the use of Kronecker algebra introduces several sources of CPU overhead during numerical solution. We present data structures, including our new data structure called matrix diagrams, that can reduce this CPU overhead. Using our techniques, we can compute measures for large systems in a fraction of the time required by current state-of-the-art techniques.
Finally, we present a technique for approximating stationary measures using aggregations of the underlying CTMC. Our technique utilizes exact knowledge of the underlying CTMC using our compact data structure for the reachable states and a Kronecker representation for the transition rates. We prove that the approximation is exact for models possessing a product-form solution.
Chapter 1
Introduction
Advancements in technology demand increasingly complex systems. The widespread growth of the internet and wireless communications, for instance, has fueled research in techniques for analyzing large systems. The design of such a system almost certainly requires the use of computer models and simulations to assist engineers in making important design decisions. As a result, high-level modeling formalisms, such as stochastic Petri nets, are gaining acceptance as tools to study such systems. These high-level models allow for automatic verification and performance evaluation of systems whose analysis would otherwise be impossible.
Generally, a model to be analyzed has certain properties that need to be verified, such as “the system never reaches a deadlocked state”. In some cases, there are also performance or reliability measures of interest to be determined, such as “what is the probability that the system is down”. The former type of analysis can only be performed in general by a systematic examination of the states of the high-level model [19, 27, 49, 57]. For certain types of models, efficient symbolic techniques can be used, which do not require explicit examination of every state [17, 32, 68, 78, 79, 80, 81]. These techniques are quite promising, as the states described by a high-level model can easily number in the millions or billions.
Performance evaluation of a high-level model can be performed by discrete-event simulation, by exact analysis, or by approximation. While discrete-event simulation is applicable to an extremely general class of problems, accurate solutions may require long simulation runs, especially if the analysis involves the study of rare events. Exact analysis, on the other hand, is applicable to certain types of stochastic models only. In our work, we consider a fairly general class of formalisms in which the underlying stochastic process is a Markov chain. In this case, analysis of the model requires generation and analysis of the underlying Markov chain. As mentioned above, it is not uncommon for these Markov chains to contain millions or billions of states. This leads to difficulties, as exact analysis requires us to represent the reachable states of the model, the transition rate matrix of the Markov chain, and a solution vector corresponding to the computed probability for each state. These three structures pose obvious storage difficulties when the number of states of the Markov chain becomes large. Much attention has been given to the transition rate matrix, as it is the largest of the three structures. Techniques based on Kronecker algebra have received much attention [12, 14, 16, 18, 28, 29, 41, 42, 44, 56, 84, 85, 86, 91], although some alternatives have also been investigated [38, 39, 52].
Approximation techniques often involve decomposing the model into submodels, which are then analyzed in isolation. The results obtained for the submodels are then combined. Fixed-point iterations can be used to resolve the dependencies between submodels. The overall storage and CPU requirements for the analyses of the submodels are usually a small fraction of those for an exact analysis of the entire model. As a result, model-based decompositions have been successfully used [20, 30, 50, 51, 53, 69, 93, 98, 100, 105, 107] to accurately approximate performance measures when an exact solution is infeasible.
1.1 Contributions
In this work, we address each of the three major structures required for exact stationary analysis: the set of reachable states $\mathcal{S}$, the transition rate matrix $\mathbf{R}$ of the Markov chain, and the stationary probability vector $\boldsymbol{\pi}$. We consider only a very small subset of model verification problems; namely, that of generating and examining the set of reachable states $\mathcal{S}$.
First, we develop a multi-level data structure for storing $\mathcal{S}$ that can be used with structured models. We then show how, when an event occurs, we can update only a portion of the data structure by exploiting structural properties of the model. This new concept of the “locality” of an event can substantially reduce generation times, and is used throughout the work.
Second, we develop a technique for symbolically generating $\mathcal{S}$ that can be applied to a general class of structured models. We present an encoding scheme that combines ideas from our multi-level structure and from decision diagrams. We then develop specialized manipulation routines for our encoding which allow us to generate $\mathcal{S}$ extremely efficiently.
It has been shown that a Kronecker representation for the transition rate matrix $\mathbf{R}$ can reduce the storage requirements for $\mathbf{R}$ by orders of magnitude. However, Kronecker techniques suffer from significant sources of CPU overhead. Our third contribution consists of new data structures and techniques that eliminate or reduce these CPU overheads.
With efficient representations for $\mathcal{S}$ and $\mathbf{R}$, the only remaining bottleneck for exact analysis is the solution vector $\boldsymbol{\pi}$. Our fourth major contribution is a technique for approximating $\boldsymbol{\pi}$. Unlike other approximations, ours makes use of exact knowledge of $\mathcal{S}$ and $\mathbf{R}$ by using our previous contributions. This enables our technique to correctly assign a zero probability to unreachable states.
Finally, the software tool SMART [26], which is discussed in Appendix A, represents a considerable contribution to the academic and modeling community.
1.2 Organization
The remainder of the thesis is organized as follows. The next three chapters are background chapters. Chapter 2 introduces our notation and gives some important background information about storing matrices and solving linear systems. Chapter 3 presents an overview of random variables and stochastic processes, with particular emphasis on Markov chains. Chapter 4 describes how a high-level formalism can be used to generate a Markov chain. Various classes of structured models are defined.
Our main contributions are presented in four chapters. Chapter 5 describes our multi-level data structure for explicit storage of the states of the model. We compare our data structure with several other explicit storage schemes. Chapter 6 presents our symbolic technique for generating and storing states of a structured model with certain properties. Our new approach is compared with existing symbolic approaches. Chapter 7 discusses approaches in which the transition rate matrix of the underlying Markov chain for a structured model is represented algebraically using Kronecker products and sums. We present data structures and techniques for reducing or eliminating overheads inherent in Kronecker approaches. Chapter 8 describes a novel technique for approximating stationary measures of a structured model, based on exact knowledge of the underlying Markov chain.
Example applications of our techniques are presented in Chapter 9. Concluding remarks and directions for future work are given in Chapter 10. There are four appendices. Appendix A discusses SMART, a software package that incorporates the techniques described in this thesis. Appendix B describes the models we use as benchmarks throughout the work. Appendix C presents a detailed analysis of one of the algorithms described in Chapter 6. Finally, Appendix D derives an expression for the number of reachable states for one of our benchmark models.
Chapter 2
Background
This chapter covers basic concepts that are used throughout our work. Section 2.1 introduces our notation and presents a few basic definitions used in the remainder of this thesis. Section 2.2 gives an overview of data structures used to store sparse matrices. Finally, Section 2.3 briefly describes the iterative techniques we use to solve the linear systems that arise in our work. For in-depth treatment of these topics, we refer the reader to [58, 83, 96] on the subject of sparse matrix storage, and to [6, 47, 96, 103] on the subject of solving linear systems.
2.1 Notation and basic definitions
Sets are denoted in upper-case calligraphic letters, such as $\mathcal{S}$. The fundamental sets are exceptions: the set of naturals is denoted $\mathbb{N}$, the set of reals is denoted $\mathbb{R}$, and the sets of positive and non-negative reals are denoted $\mathbb{R}^+$ and $\mathbb{R}^*$, respectively. The minimal and maximal elements of a set $\mathcal{S}$ of reals are denoted by $\min(\mathcal{S})$ and $\max(\mathcal{S})$, respectively.
Matrices are written in upper-case bold letters, such as $\mathbf{M}$. We say a real matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ has $m$ rows and $n$ columns. The identity matrix of size $n \times n$ is denoted as $\mathbf{I}_n$, although if the size is clear from the context it will be written as simply $\mathbf{I}$. The matrix $\mathbf{1}_{m \times n}$ ($\mathbf{0}_{m \times n}$) is the matrix of all ones (zeroes) with $m$ rows and $n$ columns, although it will be written as simply $\mathbf{1}$ ($\mathbf{0}$) if the size is clear from the context. The matrix element at row $i$ and column $j$ is denoted as $\mathbf{M}[i, j]$, for $i \in \{0, \ldots, m-1\}$ and $j \in \{0, \ldots, n-1\}$. A set is used to indicate more than one row or column. For example, $\mathbf{M}[\mathcal{I}, \mathcal{J}]$ refers to the submatrix of $\mathbf{M}$ with rows $\mathcal{I}$ and columns $\mathcal{J}$. Row $i$ (column $j$) of matrix $\mathbf{M}$ is denoted $\mathbf{M}[i, \cdot]$ ($\mathbf{M}[\cdot, j]$). The number of non-zero elements in matrix $\mathbf{M}$ is denoted $\eta(\mathbf{M})$. The transpose of matrix $\mathbf{M}$ is denoted $\mathbf{M}^T$. The inverse of a square matrix $\mathbf{M}$ is denoted $\mathbf{M}^{-1}$.
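The indexing conventions above can be illustrated with a short Python sketch. This is only an illustration under our own assumptions: matrices are plain lists of rows with 0-based indices, and the helper names (`submatrix`, `column`, `nnz`, `transpose`) are hypothetical, not taken from the thesis or from SMART.

```python
# Illustrative helpers for the matrix notation; all names are hypothetical.
# A matrix is a list of rows, indexed from 0 as in the text.

def submatrix(M, rows, cols):
    """M[I, J]: the submatrix with the given rows and columns, in the given order."""
    return [[M[i][j] for j in cols] for i in rows]

def column(M, j):
    """M[., j]: column j of M, returned as a list."""
    return [r[j] for r in M]

def nnz(M):
    """eta(M): the number of non-zero elements of M."""
    return sum(1 for r in M for v in r if v != 0)

def transpose(M):
    """M^T: rows and columns exchanged."""
    return [list(c) for c in zip(*M)]

# A 2 x 3 example matrix (m = 2 rows, n = 3 columns).
M = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 3.0]]
```

For this example, `submatrix(M, [1], [0, 2])` selects row 1 and columns 0 and 2, and `nnz(M)` counts the three non-zero entries.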
A matrix with a single column (row) is called a column (row) vector. Vectors will be denoted with lower-case bold letters, such as $\mathbf{x}$. Elements of a vector are denoted as $\mathbf{x}[i]$. If $\mathbf{x}$ is a column vector, then $\mathbf{x}[i] = \mathbf{x}[i, 0]$, otherwise $\mathbf{x}[i] = \mathbf{x}[0, i]$. The same notation is used for both row and column vectors, as usually it is clear from the context if a vector is a row or column vector. A probability vector is a vector whose elements are non-negative and sum to one:
\[ \mathbf{x} \in (\mathbb{R}^*)^n \;\wedge\; \sum_{i=0}^{n-1} \mathbf{x}[i] = 1. \]
The dot product of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is defined as
\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=0}^{n-1} \mathbf{x}[i]\, \mathbf{y}[i]. \]
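RowSum($\mathbf{M}$) is the vector whose elements are the row sums of matrix $\mathbf{M}$. That is, if $\mathbf{M} \in \mathbb{R}^{m \times n}$ then RowSum($\mathbf{M}$) $= \mathbf{M} \cdot \mathbf{1}_{n \times 1} \in \mathbb{R}^m$. Diag($\mathbf{x}$) is the square matrix with $\mathbf{x}$ along the diagonal and zeroes elsewhere:
\[ \mathrm{Diag}(\mathbf{x})[i, j] = \begin{cases} \mathbf{x}[i] & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]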
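As a quick check of the vector definitions above, the following Python sketch implements the probability-vector test, the dot product, RowSum, and Diag on plain lists. The function names and the floating-point tolerance are our own assumptions for illustration; they are not part of the thesis.

```python
import math

def is_probability_vector(x, tol=1e-12):
    """True if every element is non-negative and the elements sum to one.
    The tolerance is an assumption needed for floating-point sums."""
    return all(v >= 0 for v in x) and math.isclose(sum(x), 1.0, abs_tol=tol)

def dot(x, y):
    """Dot product: the sum over i of x[i] * y[i]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def row_sum(M):
    """RowSum(M): equivalent to multiplying M by a column vector of ones."""
    return [sum(r) for r in M]

def diag(x):
    """Diag(x): square matrix with x on the diagonal and zeroes elsewhere."""
    n = len(x)
    return [[x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
```

For instance, `row_sum([[1, 2], [3, 4]])` yields one entry per row, and `diag([1.0, 2.0])` yields a 2 × 2 matrix whose off-diagonal entries are zero.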
2.2 Sparse matrix storage
A matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ can be stored using full storage, in which every element of $\mathbf{M}$ is explicitly stored. This is typically done using a two-dimensional array of size $m \times n$, or a one-dimensional array of size $mn$. In either case, full storage requires exactly $m \cdot n \cdot b_f$ bits of memory, where $b_f$ is the number of bits to store a floating-point number of desired precision. A matrix is called sparse if it contains relatively few non-zero entries: $\eta(\mathbf{M}) \ll m \cdot n$. Memory can be conserved by storing only the non-zero elements of sparse matrices [83, 96]. To do so, we must also store some indexing information. Thus, sparse-storage structures may be inefficient when applied to dense matrices due to the overhead of the indexing information. The amount of “sparseness” required for a sparse-storage structure to be more memory-efficient than full storage depends on the structure, the number of bits required for floating-point representation, and other factors.
One way to represent a sparse matrix is to use a linked list for each row, which stores only the non-zero elements of that row. Each node in the list stores a column index and the associated value for that column. We say such a matrix is stored in sparse row-wise format. While it is relatively easy to access a row of a matrix stored in this format, it is not so easy to access a column of the matrix. If we require column access, we can use sparse column-wise format. This is essentially the same structure, except each list stores the non-zero elements of a column. If we require both row and column access, we can store linked lists for both the rows and the columns [58]. Alternatively, we can convert from row-wise format to column-wise format in $O(\eta(\mathbf{M}))$ operations.

[Figure 2.1: Storage of a sparse matrix. The same matrix is drawn in six forms: full storage; sparse row-wise using linked lists; sparse column-wise using linked lists; sparse with row and column access; sparse row-wise using arrays; sparse column-wise using arrays. The full-storage form is:

    0.0  0.0  3.1  0.0  0.0  4.1  0.0
    0.0  0.0  0.0  5.9  0.0  0.0  2.6
    5.3  0.0  0.0  0.0  5.8  0.0  9.7
    0.0  0.0  0.0  0.0  0.0  9.3  0.0
]

If we have prior knowledge of $\eta(\mathbf{M})$, the rows or columns can be represented using arrays instead of linked lists. For efficiency, instead of using a separate array for each row or column, we use a single array. In place of the pointers to linked lists, we maintain array indices to the first non-zero element of each row or column. To mark the last non-zero element of the matrix, an extra index is added.
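
The array-based row-wise structure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work; the names `to_row_arrays`, `get_row`, `row_start`, `col_index`, and `value` are assumptions introduced here. The matrix is the 4 x 7 example of Figure 2.1.

```python
from typing import List, Tuple


def to_row_arrays(dense: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Build sparse row-wise array storage from a full-storage matrix."""
    row_start = [0]              # row_start[r]: position of the first non-zero of row r
    col_index: List[int] = []    # column of each stored non-zero
    value: List[float] = []      # the stored non-zero itself
    for row in dense:
        for c, v in enumerate(row):
            if v != 0.0:
                col_index.append(c)
                value.append(v)
        # After the last row, this appends the extra index that marks the end.
        row_start.append(len(value))
    return row_start, col_index, value


def get_row(r: int, row_start, col_index, value):
    """Return the (column, value) pairs of row r without scanning other rows."""
    return [(col_index[k], value[k]) for k in range(row_start[r], row_start[r + 1])]


# The 4 x 7 matrix shown in Figure 2.1.
M = [
    [0.0, 0.0, 3.1, 0.0, 0.0, 4.1, 0.0],
    [0.0, 0.0, 0.0, 5.9, 0.0, 0.0, 2.6],
    [5.3, 0.0, 0.0, 0.0, 5.8, 0.0, 9.7],
    [0.0, 0.0, 0.0, 0.0, 0.0, 9.3, 0.0],
]
row_start, col_index, value = to_row_arrays(M)
```

An empty row simply repeats the previous start index, so no special marker is needed for it.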
Storage of matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$, where:
    $b_f$ = # bits for a floating-point number of desired precision
    $b_p$ = # bits for a pointer
    $b_i$ = # bits for an integer of appropriate size

Technique                                 Memory
Full storage                              $m n b_f$
By rows with linked lists                 $m b_p + \eta(\mathbf{M})(b_i + b_f + b_p)$
By columns with linked lists              $n b_p + \eta(\mathbf{M})(b_i + b_f + b_p)$
By rows and columns with linked lists     $m b_p + n b_p + \eta(\mathbf{M})(2 b_i + b_f + 2 b_p)$
By rows with arrays                       $(m + 1) b_i + \eta(\mathbf{M})(b_i + b_f)$
By columns with arrays                    $(n + 1) b_i + \eta(\mathbf{M})(b_i + b_f)$

Table 2.1: Memory required to store a matrix
An example illustrating the data structures used for sparse storage of a matrix is given in Figure 2.1. Each structure represents the same $4 \times 7$ matrix. For clarity, null pointers are not drawn. The storage requirement for each structure is shown in Table 2.1. Note that integers of size one or two bytes can be used if the number of rows, the number of columns, and the number of non-zero elements are sufficiently small. Further memory savings can be achieved by using the minimal number of bits to store each integer. For instance, if sparse column-wise storage is used with arrays, the row indices can be stored in $\lceil \log_2 m \rceil$ bits, and the “pointers” to the elements can be stored in $\lceil \log_2(\eta(\mathbf{M}) + 1) \rceil$ bits. Also, note that the sparse structures described do not work well for “ultra-sparse” matrices, in which many rows and columns are empty. Row-wise sparse-storage structures can be modified to store only the non-empty rows, and these modified structures will conserve memory if most of the rows are empty. Similar modifications can be made for column-wise storage.
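
To make the formulas of Table 2.1 concrete, the following Python sketch evaluates them for given dimensions. The function names and the default word sizes (64-bit floating-point values and pointers, 32-bit integers) are assumptions chosen only for illustration.

```python
def bits_needed(count: int) -> int:
    """Smallest number of bits that can distinguish `count` values: ceil(log2(count))."""
    return max(count - 1, 0).bit_length()


def storage_bits(m: int, n: int, nnz: int, bf: int = 64, bp: int = 64, bi: int = 32) -> dict:
    """Memory in bits for each technique of Table 2.1 (nnz plays the role of eta(M))."""
    return {
        "full": m * n * bf,
        "rows_linked": m * bp + nnz * (bi + bf + bp),
        "cols_linked": n * bp + nnz * (bi + bf + bp),
        "rows_cols_linked": m * bp + n * bp + nnz * (2 * bi + bf + 2 * bp),
        "rows_arrays": (m + 1) * bi + nnz * (bi + bf),
        "cols_arrays": (n + 1) * bi + nnz * (bi + bf),
    }


def packed_cols_arrays_bits(m: int, n: int, nnz: int, bf: int = 64) -> int:
    """Column-wise arrays with minimal-width integers for row indices and "pointers"."""
    return (n + 1) * bits_needed(nnz + 1) + nnz * (bits_needed(m) + bf)


# The 4 x 7 example of Figure 2.1 has 8 non-zero elements.
example = storage_bits(4, 7, 8)
packed = packed_cols_arrays_bits(4, 7, 8)
```

For a matrix this small the linked-list structures save little over full storage because pointer overhead dominates; the array structures are already noticeably smaller.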
Another important benefit to using sparse-storage structures is the savings in computational complexity. A frequently used matrix operation is that of vector-matrix multiplication. Given a matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ and vectors $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$, all stored using full storage, the cost of computing $\mathbf{x}\mathbf{M}$ or $\mathbf{M}\mathbf{y}$ is $mn$ floating-point multiplications. However, multiplication algorithms for sparse matrices require only $\eta(\mathbf{M})$ floating-point multiplications, assuming the matrix is stored using a sparse structure. For large, sparse matrices this difference is substantial.
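
As an illustration of the multiplication count, the sketch below computes both products for a matrix held in row-wise array storage. The array names `row_start`, `col_index`, and `value` and both function names are assumptions made for this example; each inner loop body executes exactly once per stored non-zero.

```python
def vec_times_matrix(x, row_start, col_index, value, n):
    """Compute the row vector x M; one multiplication per stored non-zero."""
    result = [0.0] * n
    for r in range(len(row_start) - 1):
        for k in range(row_start[r], row_start[r + 1]):
            result[col_index[k]] += x[r] * value[k]
    return result


def matrix_times_vec(row_start, col_index, value, y):
    """Compute the column vector M y; one multiplication per stored non-zero."""
    result = []
    for r in range(len(row_start) - 1):
        total = 0.0
        for k in range(row_start[r], row_start[r + 1]):
            total += value[k] * y[col_index[k]]
        result.append(total)
    return result


# A small 2 x 3 example:  M = [[0, 2, 0],
#                              [1, 0, 3]]
row_start = [0, 1, 3]
col_index = [1, 0, 2]
value = [2.0, 1.0, 3.0]

xM = vec_times_matrix([1.0, 2.0], row_start, col_index, value, 3)
My = matrix_times_vec(row_start, col_index, value, [1.0, 1.0, 1.0])
```

With row-wise storage, the product $\mathbf{x}\mathbf{M}$ scatters into the result while $\mathbf{M}\mathbf{y}$ gathers one dot product per row; column-wise storage reverses the two access patterns.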
2.3 Solving linear systems
Many computations of interest will require us to solve a linear system of equations of the form $\mathbf{x}\mathbf{M} = \mathbf{y}$ for an unknown vector $\mathbf{x}$. This form can be rearranged by
\[ \begin{aligned} \mathbf{x}\mathbf{M} &= \mathbf{y} \\ (\mathbf{x}\mathbf{M})^T &= \mathbf{y}^T \\ \mathbf{M}^T \mathbf{x}^T &= \mathbf{y}^T \end{aligned} \]
to obtain the preferred form $\mathbf{A}\mathbf{x} = \mathbf{b}$. It is important to note that the techniques we discuss apply to $\mathbf{A}\mathbf{x} = \mathbf{b}$; thus if our solution technique requires row access of $\mathbf{A}$, then this translates to column access of $\mathbf{M}$.
Solution of the linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ is a thoroughly discussed problem [6, 47, 96, 103] and several techniques are available. For our applications, $\mathbf{A}$ is typically a very large, extremely sparse, square matrix. Usually, we do not use techniques that compute the inverse of $\mathbf{A}$, for two reasons.
1. Time requirements: Computing the inverse of an $n \times n$ matrix requires $O(n^3)$ floating-point operations.
2. Memory requirements: Since $\mathbf{A}$ is sparse, it can be stored using $O(\eta(\mathbf{A}))$ memory. However, the inverse of a sparse matrix is not necessarily sparse (and usually is not sparse), so storage of $\mathbf{A}^{-1}$ will require $O(n^2)$ memory.
Instead, we prefer “indirect” techniques that perform a series of matrix-vector multiplications, each requiring $O(\eta(\mathbf{A}))$ floating-point operations. Since the matrix $\mathbf{A}$ is never modified, relatively low-precision floating-point representation can be used for $\mathbf{A}$. This combined with a sparse-storage structure results in significant memory savings.
We consider iterative techniques that compute a sequence $\mathbf{x}_n$ of approximations to $\mathbf{x}$. Given an initial guess $\mathbf{x}_0$, the remainder of the sequence is computed from an equation of the form
\[ \mathbf{x}_{n+1} = \mathbf{B}\mathbf{x}_n + \mathbf{k} \qquad (2.1) \]
where the matrix $\mathbf{B}$ and the vector $\mathbf{k}$ are specified by our iterative technique. The sequence is guaranteed to converge for any initial guess $\mathbf{x}_0$, provided
\[ \lim_{n \to \infty} \mathbf{B}^n = \mathbf{0}. \]
This occurs when $\rho(\mathbf{B})$, the spectral radius of $\mathbf{B}$ (the largest magnitude of any eigenvalue of $\mathbf{B}$), is strictly less than one. The asymptotic rate of convergence depends on $\rho(\mathbf{B})$: the smaller the value of $\rho(\mathbf{B})$, the faster the sequence computed by Equation 2.1 is likely to converge.
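
A small numerical sketch of Equation 2.1 follows. The matrix `B`, the vector `k`, and the function name `iterate` are made up for illustration. The eigenvalues of this `B` are 0.5 and -0.5, so its spectral radius is 0.5 and the sequence approaches the fixed point of $\mathbf{x} = \mathbf{B}\mathbf{x} + \mathbf{k}$, which is $(2, 2)$, from any starting vector.

```python
def iterate(B, k, x0, steps):
    """Apply x <- B x + k (Equation 2.1) `steps` times using full storage."""
    x = list(x0)
    size = len(x)
    for _ in range(steps):
        x = [sum(B[i][j] * x[j] for j in range(size)) + k[i] for i in range(size)]
    return x


B = [[0.0, 0.5],
     [0.5, 0.0]]
k = [1.0, 1.0]

from_zero = iterate(B, k, [0.0, 0.0], 60)
from_elsewhere = iterate(B, k, [10.0, -7.0], 60)
```

Each step multiplies the error by `B`, so here the error shrinks by a factor of two per iteration regardless of the initial guess.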
Of course, we cannot compute $\mathbf{x}_\infty$; instead we must compute $\mathbf{x}_N$ for some large value of $N$, and hope that $\mathbf{x}_N$ is an accurate approximation to $\mathbf{x}$. The number of iterations required to satisfy a tolerance $\epsilon$ can be obtained by the approximate relationship [96]
\[ \rho(\mathbf{B})^N = \epsilon, \]
which is not used in practice, since $\rho(\mathbf{B})$ is usually not known. Instead, the technique used most often in practice is to somehow compare successive vectors. One technique frequently used is to compare some norm of the difference of successive vectors with a desired tolerance $\epsilon$. The iterations then continue until an absolute precision has been achieved:
\[ \| \mathbf{x}_N - \mathbf{x}_{N-1} \| < \epsilon. \]
Alternatively, we can use a relative measure. The technique we use is to continue iterations until the maximum relative difference between elements of $\mathbf{x}_N$ and $\mathbf{x}_{N-1}$ is within the desired precision:
\[ \max_i \left| \frac{\mathbf{x}_N[i] - \mathbf{x}_{N-1}[i]}{\mathbf{x}_N[i]} \right| < \epsilon. \]
Relative precision is safer to use when entries in the vector $\mathbf{x}$ differ in size by orders of magnitude. This is not uncommon, especially in computing probability vectors for Markov chains.
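
The stopping rules above can be sketched as follows. The function names are hypothetical; the absolute test uses the maximum norm, which is one possible choice of norm; and the handling of entries that are exactly zero in the relative test is an assumption of this sketch rather than something specified in the text.

```python
import math


def estimated_iterations(rho: float, eps: float) -> int:
    """Solve rho**N = eps for N and round up (needs 0 < rho < 1 and 0 < eps < 1)."""
    return math.ceil(math.log(eps) / math.log(rho))


def abs_converged(x_new, x_old, eps: float) -> bool:
    """Absolute test with the max norm: ||x_N - x_{N-1}|| < eps."""
    return max(abs(a - b) for a, b in zip(x_new, x_old)) < eps


def rel_converged(x_new, x_old, eps: float) -> bool:
    """Relative test: max_i |x_N[i] - x_{N-1}[i]| / |x_N[i]| < eps."""
    worst = 0.0
    for a, b in zip(x_new, x_old):
        if a != 0.0:
            worst = max(worst, abs(a - b) / abs(a))
        elif b != 0.0:
            return False  # assumption: an entry that just became zero is "not converged"
    return worst < eps


# Entries that differ by orders of magnitude: the small entry doubled,
# yet the absolute test cannot see the change.
x_old = [0.5, 1.0e-9]
x_new = [0.5, 2.0e-9]
absolute_says_done = abs_converged(x_new, x_old, 1.0e-6)
relative_says_done = rel_converged(x_new, x_old, 1.0e-6)
```

In this example the absolute test reports convergence while the relative test does not, which is the situation the relative measure is meant to guard against.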
Another frequently used technique is that of residual testing. Since we are solving the system $\mathbf{A}\mathbf{x} = \mathbf{b}$, the idea is that $\mathbf{A}\mathbf{x}_N$ will be “close” to $\mathbf{b}$ if $\mathbf{x}_N$ is “close” to $\mathbf{x}$. Of course, if $\mathbf{A}$ is a large matrix, the cost of computing the residual can be high. Residual testing may not work well for ill-conditioned systems.
2.3.1 Jacobi and Gauss-Seidel
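Conceptually, we split the matrix $\mathbf{A}$ into matrices $\mathbf{L}$, $\mathbf{D}$, and $\mathbf{U}$ such that $\mathbf{A} = \mathbf{D} - \mathbf{L} - \mathbf{U}$, where $\mathbf{L}$ and $\mathbf{U}$ are strictly lower- and upper-triangular matrices, respectively, and $\mathbf{D}$ is a diagonal matrix. Thus we have the system $\mathbf{D}\mathbf{x} - \mathbf{L}\mathbf{x} - \mathbf{U}\mathbf{x} = \mathbf{b}$. For the Jacobi technique, we use the following iteration:
\[ \mathbf{D}\mathbf{x}_{n+1} - \mathbf{L}\mathbf{x}_n - \mathbf{U}\mathbf{x}_n = \mathbf{b}. \]
In this case we can compute $\mathbf{x}_{n+1}$ using
\[ \mathbf{x}_{n+1} = \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})\mathbf{x}_n + \mathbf{D}^{-1}\mathbf{b} \]
where $\mathbf{D}^{-1}$ is trivial to compute since $\mathbf{D}$ is a diagonal matrix. A single Jacobi iteration can be computed either using Algorithm 2.1, which accesses elements of $\mathbf{A}$ by rows, or using Algorithm 2.2, which accesses elements of $\mathbf{A}$ by columns. In the algorithms, we store the diagonal elements of $\mathbf{A}$ separately in a vector $\mathbf{d}$. Thus, matrix $\mathbf{A}$ is represented by the two structures $\mathbf{A}'$ and $\mathbf{d}$, where $\mathbf{A}' = \mathbf{A} - \mathrm{Diag}(\mathbf{d})$. Another common practice is to store $\mathbf{d}^{-1}$, where $\mathbf{D}^{-1} = \mathrm{Diag}(\mathbf{d}^{-1})$; in that case, the divisions by $\mathbf{d}[i]$ in Algorithm 2.1 and Algorithm 2.2 are replaced with multiplications by $\mathbf{d}^{-1}[i]$.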
RowJacobi(x_old, x_new, A', d, b)
• Inputs: vector x_old, the current probability vector x_n; matrix A' = A - D; vector d, where D = Diag(d); and vector b.
• Output: vector x_new, the next probability vector x_{n+1}.
1: for each row r do
2:     x_new[r] ← (b[r] - A'[r,•] · x_old) / d[r]        • Dot product of A'[r,•] and x_old
3: end for
Algorithm 2.1: Computing a Jacobi iteration by rows

ColJacobi(x_old, x_new, A', d, b)
• Inputs: vector x_old, the current probability vector x_n; matrix A' = A - D; vector d, where D = Diag(d); and vector b.
• Output: vector x_new, the next probability vector x_{n+1}.
1: x_new ← b
2: for each column c do
3:     x_new ← x_new - x_old[c] A'[•,c]                  • Vector equation
4: end for
5: for each column c do
6:     x_new[c] ← x_new[c] / d[c]
7: end for
Algorithm 2.2: Computing a Jacobi iteration by columns

Jacobi does not use the newest approximation of $\mathbf{x}$ during the computation. That is, once we have determined $\mathbf{x}_{n+1}[i]$, we do not use it until we compute $\mathbf{x}_{n+2}$. Thus Jacobi is insensitive to the ordering of the rows and columns of $\mathbf{A}$. However, it makes sense to use $\mathbf{x}_{n+1}[i]$ if it is known when computing the remaining entries of $\mathbf{x}_{n+1}$. This is the idea behind the Gauss-Seidel iteration. Formally, we have
\[ \mathbf{D}\mathbf{x}_{n+1} - \mathbf{L}\mathbf{x}_{n+1} - \mathbf{U}\mathbf{x}_n = \mathbf{b} \]
which can also be written
\[ \mathbf{x}_{n+1} = (\mathbf{D} - \mathbf{L})^{-1}\mathbf{U}\mathbf{x}_n + (\mathbf{D} - \mathbf{L})^{-1}\mathbf{b} \]
although in practice we do not compute the inverse of $\mathbf{D} - \mathbf{L}$. Since $\mathbf{x}_{n+1}[i+1]$ must be computed after we have computed $\mathbf{x}_{n+1}[i]$, Gauss-Seidel is usually implemented using row access of $\mathbf{A}$, as in Algorithm 2.3. One benefit of row Gauss-Seidel is that we only need to store a single vector $\mathbf{x}$, in which $\mathbf{x}_{n+1}[i]$ overwrites $\mathbf{x}_n[i]$. This is possible because $\mathbf{x}_n[i]$ is no longer used once $\mathbf{x}_{n+1}[i]$ has been computed. An algorithm for Gauss-Seidel that uses column access of $\mathbf{A}$ was recently developed in [39]; this algorithm requires an auxiliary vector $\mathbf{w}$ in addition to the single vector $\mathbf{x}$.

RowGaussSeidel(x, A', d, b)
• Inputs: vector x, the current probability vector x_n; matrix A' = A - D; vector d, where D = Diag(d); and vector b.
• Output: vector x (overwritten), the next probability vector x_{n+1}.
1: for each row r do
2:     x[r] ← (b[r] - A'[r,•] · x) / d[r]                • Dot product of A'[r,•] and x
3: end for
Algorithm 2.3: Computing a Gauss-Seidel iteration by rows
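
The three iterations can be sketched in Python, assuming the off-diagonal part $\mathbf{A}'$ is held in sparse array storage (start indices, indices, and values) and the diagonal in a list `d`. The function and array names are assumptions of this sketch; the example system is made up and has the exact solution $(1, 1)$.

```python
def row_jacobi(x_old, row_start, col_index, value, d, b):
    """One Jacobi iteration by rows, in the manner of Algorithm 2.1."""
    x_new = [0.0] * len(x_old)
    for r in range(len(x_old)):
        dot = sum(value[k] * x_old[col_index[k]]
                  for k in range(row_start[r], row_start[r + 1]))
        x_new[r] = (b[r] - dot) / d[r]
    return x_new


def col_jacobi(x_old, col_start, row_index, value, d, b):
    """One Jacobi iteration by columns, in the manner of Algorithm 2.2."""
    x_new = list(b)
    for c in range(len(x_old)):
        for k in range(col_start[c], col_start[c + 1]):
            x_new[row_index[k]] -= x_old[c] * value[k]
    return [x_new[c] / d[c] for c in range(len(d))]


def row_gauss_seidel(x, row_start, col_index, value, d, b):
    """One Gauss-Seidel iteration by rows, overwriting x as in Algorithm 2.3."""
    for r in range(len(x)):
        dot = sum(value[k] * x[col_index[k]]
                  for k in range(row_start[r], row_start[r + 1]))
        x[r] = (b[r] - dot) / d[r]  # entries updated earlier in this sweep are reused
    return x


# A = [[2, -1], [-1, 2]], b = [1, 1].  A' holds the off-diagonal entries; because
# A' is symmetric here, the same arrays serve as row-wise and column-wise storage.
start, index, value = [0, 1, 2], [1, 0], [-1.0, -1.0]
d, b = [2.0, 2.0], [1.0, 1.0]

jacobi_step = row_jacobi([0.0, 0.0], start, index, value, d, b)
jacobi_step_by_cols = col_jacobi([0.0, 0.0], start, index, value, d, b)
gauss_seidel_step = row_gauss_seidel([0.0, 0.0], start, index, value, d, b)
```

After one sweep from the zero vector, Gauss-Seidel is already closer to the solution than Jacobi because the second row uses the freshly computed first entry.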
Both the Jacobi and Gauss-Seidel iterations fall under the type of Equation 2.1: for Jacobi we have $\mathbf{B}_{Jac} = \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})$, and for Gauss-Seidel we have $\mathbf{B}_{GS} = (\mathbf{D} - \mathbf{L})^{-1}\mathbf{U}$. The Stein-Rosenberg theorem [103] states that for non-negative Jacobi matrices $\mathbf{B}_{Jac}$, exactly one of the following statements holds.
1. $\rho(\mathbf{B}_{Jac}) = \rho(\mathbf{B}_{GS}) = 0$.
2. $0 < \rho(\mathbf{B}_{GS}) < \rho(\mathbf{B}_{Jac}) < 1$.
3. $\rho(\mathbf{B}_{Jac}) = \rho(\mathbf{B}_{GS}) = 1$.
4. $1 < \rho(\mathbf{B}_{Jac}) < \rho(\mathbf{B}_{GS})$.
Thus if the Jacobi matrix $\mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})$ is non-negative, then Jacobi and Gauss-Seidel will either both converge or both diverge. Furthermore, if both techniques converge, then Gauss-Seidel has a faster asymptotic rate of convergence (meaning that Gauss-Seidel is expected to converge faster). Since updated values are used immediately with Gauss-Seidel, the variable ordering may affect the rate of convergence [96]. In contrast, the convergence rate of Jacobi is independent of the variable ordering.
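
As a small worked illustration of the second case of the theorem, consider a made-up $2 \times 2$ system (not taken from this work) and split it as $\mathbf{A} = \mathbf{D} - \mathbf{L} - \mathbf{U}$:

```latex
\[
\mathbf{A} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad
\mathbf{D} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \quad
\mathbf{L} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
\mathbf{U} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
\[
\mathbf{B}_{Jac} = \mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})
  = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix},
\qquad \text{eigenvalues } \pm\tfrac{1}{2},
\qquad \rho(\mathbf{B}_{Jac}) = \tfrac{1}{2}
\]
\[
\mathbf{B}_{GS} = (\mathbf{D} - \mathbf{L})^{-1}\mathbf{U}
  = \begin{pmatrix} 1/2 & 0 \\ 1/4 & 1/2 \end{pmatrix}
    \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
  = \begin{pmatrix} 0 & 1/2 \\ 0 & 1/4 \end{pmatrix},
\qquad \text{eigenvalues } 0, \tfrac{1}{4},
\qquad \rho(\mathbf{B}_{GS}) = \tfrac{1}{4}
\]
\[
0 < \rho(\mathbf{B}_{GS}) = \tfrac{1}{4} < \rho(\mathbf{B}_{Jac}) = \tfrac{1}{2} < 1
\]
```

Here $\mathbf{B}_{Jac}$ is non-negative, both techniques converge, and each Gauss-Seidel iteration reduces the error asymptotically as much as two Jacobi iterations.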
Both the Jacobi and Gauss-Seidel techniques can make use of relaxation [96, 103]. The idea is that for each iteration, we are changing our approximation of $\mathbf{x}$ by
\[ \mathbf{x}_{n+1} = \mathbf{x}_n + \delta(\mathbf{x}_n) \]
where $\delta(\mathbf{x}_n)$ is determined by our iterative technique. We can alter the speed of convergence by instead computing
\[ \mathbf{x}_{n+1} = \mathbf{x}_n + \omega\,\delta(\mathbf{x}_n) \]
where $\omega$ is called the relaxation parameter. If we use $1 < \omega < 2$ ($0 < \omega < 1$), then the technique is called over-relaxation (under-relaxation). Determining an optimal value for $\omega$ is not a trivial task.
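
A relaxed Gauss-Seidel sweep can be sketched as below. Applying the relaxation entry by entry, as in successive over-relaxation, is an assumption of this sketch, as are the function and array names; the off-diagonal part of the matrix is again held in row-wise arrays and the diagonal in `d`.

```python
def relaxed_gauss_seidel(x, row_start, col_index, value, d, b, omega):
    """One Gauss-Seidel sweep in which each entry moves omega times its usual change."""
    for r in range(len(x)):
        dot = sum(value[k] * x[col_index[k]]
                  for k in range(row_start[r], row_start[r + 1]))
        gauss_seidel_value = (b[r] - dot) / d[r]
        delta = gauss_seidel_value - x[r]     # change the unrelaxed technique would make
        x[r] = x[r] + omega * delta
    return x


# Same made-up system as before: A = [[2, -1], [-1, 2]], b = [1, 1], solution (1, 1).
row_start, col_index, value = [0, 1, 2], [1, 0], [-1.0, -1.0]
d, b = [2.0, 2.0], [1.0, 1.0]

plain = relaxed_gauss_seidel([0.0, 0.0], row_start, col_index, value, d, b, 1.0)
over = relaxed_gauss_seidel([0.0, 0.0], row_start, col_index, value, d, b, 1.5)

x = [0.0, 0.0]
for _ in range(100):
    relaxed_gauss_seidel(x, row_start, col_index, value, d, b, 1.2)
```

With `omega = 1.0` the sweep reduces to ordinary Gauss-Seidel, which gives a convenient sanity check when experimenting with other values.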
Another popular group of techniques to solve a linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ are projection techniques, in which an exact solution is approximated from a sequence of approximations taken from an $m$-dimensional subspace [96]. While these techniques are quite sophisticated, they require storage of $m$ vectors in addition to the solution vector $\mathbf{x}$. As we will see in later chapters, the size of the solution vector can become quite large; thus the storage of $m$ additional vectors is often not possible due to excessive memory requirements.
Chapter 3
Markov Chains
This chapter presents some mathematical background on the concept of “randomness”, which is fundamental to much of our work. Section 3.1 discusses random variables and describes the important distribution functions for our work. Section 3.2 continues the discussion with an overview of stochastic processes (families of random variables) and Markov chains (a special type of stochastic process). For more detail on this material, we refer the reader to [31, 89]. Section 3.3 and Section 3.4 give thorough discussions on discrete-time and continuous-time Markov chains, which are critical to our work. For more on Markov chains, the reader is referred to [55], an excellent treatment of discrete-time Markov chains, and to [89, 96]. Finally, Section 3.5 gives a brief overview of phase-type random variables, which use Markov chains in their definition. This topic is covered particularly well in [75].
3.1 Random variables and important distributions
Suppose we conduct some experiment. The set of all possible outcomes of the experiment is called the sample space. Given some sample space $W$, a random variable [31, 89] is a function $X : W \to \mathcal{S}$. If the set $\mathcal{S}$ is countable, $X$ is a discrete random variable; otherwise $X$ is a continuous random variable. Typically, if $X$ is discrete then $\mathcal{S} \subseteq \mathbb{N}$, and if $X$ is continuous then $\mathcal{S} \subseteq \mathbb{R}$. Random variables are written in upper-case.
A discrete random variable can be completely described by its probability distribution or probability mass function [89], which specifies the probability of each possible value of $X$. An important point is that two random variables $X$ and $Y$ may have the same probability distribution, but this does not imply that $X$ and $Y$ are equal. Arguably the simplest distribution is when the random variable is not random at all: random variable $X$ is said to be a constant $x$, written $X \sim \mathrm{Const}(x)$, if
\[ \Pr\{X = n\} = \begin{cases} 1 & \text{if } n = x \\ 0 & \text{otherwise} \end{cases} \]
A constant random variable is then just what its name implies: a random variable that is only allowed to take on a single value. A more interesting distribution is the Bernoulli distribution: random variable $X$ is said to be a Bernoulli random variable with success parameter $p$, written $X \sim \mathrm{Bernoulli}(p)$, if
\[ \Pr\{X = n\} = \begin{cases} 1 - p & \text{if } n = 0 \\ p & \text{if } n = 1 \\ 0 & \text{otherwise} \end{cases} \]
where $0 \le p \le 1$. Thus, a $\mathrm{Bernoulli}(p)$ random variable can take on values 0 and 1, except for the limiting cases $\mathrm{Bernoulli}(0) = \mathrm{Const}(0)$ and $\mathrm{Bernoulli}(1) = \mathrm{Const}(1)$.
Consider an infinite sequence of independent Bernoulli random variables $X_1, X_2, \ldots$ all with success parameter $p$. Let $J$ be the position of the first occurrence of the value 1. That is, $X_J = 1$ and for all $0 < i < J$, $X_i = 0$. Then $J$ is said to be a Geometric random variable with success parameter $p$, written $J \sim \mathrm{Geom}(p)$, and
\[ \Pr\{J = n\} = \begin{cases} (1 - p)^{n-1} p & \text{if } n > 0 \\ 0 & \text{otherwise} \end{cases} \]
Let $K$ be the number of 0's before the first 1. Note that $K$ is one less than $J$. Then $K$ is said to be a Modified Geometric random variable with success parameter $p$, written $K \sim \mathrm{ModGeom}(p)$, and
\[ \Pr\{K = n\} = \begin{cases} (1 - p)^{n} p & \text{if } n \ge 0 \\ 0 & \text{otherwise} \end{cases} \]
Let $Y$ be the sum of the first $n$ variables, $Y = \sum_{i=1}^{n} X_i$. Then $Y$ is said to be a Binomial random variable with parameters $n$ and $p$, written $Y \sim \mathrm{Binomial}(n, p)$, and
\[ \Pr\{Y = i\} = \begin{cases} \binom{n}{i} p^i (1 - p)^{n-i} & \text{if } 0 \le i \le n \\ 0 & \text{otherwise} \end{cases} \]
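
The discrete distributions defined above can be written as short Python functions. This is a sketch with hypothetical function names; the Modified Geometric mass function is obtained by shifting the Geometric one, using the remark that $K$ is one less than $J$.

```python
from math import comb


def geom_pmf(n: int, p: float) -> float:
    """Pr{J = n} for J ~ Geom(p): n - 1 zeros followed by the first one."""
    return (1 - p) ** (n - 1) * p if n > 0 else 0.0


def modgeom_pmf(n: int, p: float) -> float:
    """Pr{K = n} for K ~ ModGeom(p): since K = J - 1, this is Pr{J = n + 1}."""
    return (1 - p) ** n * p if n >= 0 else 0.0


def binomial_pmf(i: int, n: int, p: float) -> float:
    """Pr{Y = i} for Y ~ Binomial(n, p): exactly i ones among n Bernoulli(p) trials."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i) if 0 <= i <= n else 0.0


total_binomial = sum(binomial_pmf(i, 10, 0.3) for i in range(11))
```

Summing `binomial_pmf` over all values of `i` gives a quick check that the probabilities add up to one, up to floating-point rounding.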
Consider the limiting case of a Binomial random variable where $n \to \infty$ and $p \to 0$ such that the product $np$ remains a constant $\lambda$. This is the Poisson distribution with parameter $\lambda$, written $\mathrm{Poisson}(\lambda)$. If $Z \sim \mathrm{Poisson}(\lambda)$, then we have
\[
\begin{aligned}
\Pr\{Z = i\} &= \lim_{n \to \infty,\, p \to 0,\, np = \lambda} \binom{n}{i} p^i (1 - p)^{n-i} \\
&= \lim_{n \to \infty,\, p \to 0,\, np = \lambda} \frac{n!}{i!\,(n - i)!}\, p^i (1 - p)^{-i} (1 - p)^n \\
&= \lim_{n \to \infty,\, p \to 0,\, np = \lambda} \frac{1}{i!}\,(np)\,((n - 1)p) \cdots ((n - i + 1)p)\,(1 - p)^{-i} \left(1 - \frac{\lambda}{n}\right)^n \\
&= \frac{1}{i!}\,\lambda^i \cdot 1 \cdot e^{-\lambda} = \frac{\lambda^i}{i!}\, e^{-\lambda}.
\end{aligned}
\]
With continuous random variables, the distribution cannot be specified by the probabilities for each possible value of $X$. Since $X$ can take on an uncountably infinite number of values, the probability of $X$ taking on a single value is always zero. Instead, a continuous random variable is described by either its probability density function or by its cumulative distribution function (CDF) [89], which specifies $\Pr\{X \le x\}$ for every value of $x \in \mathbb{R}$. A continuous random variable $X$ is said to be an Exponential random variable with rate $\lambda$, written $X \sim \mathrm{Expo}(\lambda)$, if
\[ \Pr\{X \le x\} = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]
In our work we are primarily interested in the Exponential distribution. The Exponential distribution is often used to build other distributions. For instance, a continuous random variable $Y$ is said to be an Erlang random variable with $n$ stages and parameter $\mu$, written $Y \sim \mathrm{Erlang}(n, \mu)$, if
\[ Y = X_1 + \cdots + X_n, \]
where $X_1, \ldots, X_n$ are independent and identically distributed random variables with distribution $\mathrm{Expo}(\mu)$.
We are especially interested in distributions that satisfy the memoryless property
\[ \Pr\{X > s + t \mid X > t\} = \Pr\{X > s\} \qquad (3.1) \]
for any non-negative values of $s$ and $t$. If $X \sim \mathrm{Geom}(p)$, then we have
\[ \Pr\{X > n\} = \Pr\{\text{The first } n \text{ Bernoulli random variables are zero}\} = (1 - p)^n, \]
which gives us
\[
\begin{aligned}
\Pr\{X > s + t \mid X > t\} &= \frac{\Pr\{X > s + t \wedge X > t\}}{\Pr\{X > t\}} \\
&= \frac{\Pr\{X > s + t\}}{\Pr\{X > t\}} \\
&= \frac{(1 - p)^{s+t}}{(1 - p)^{t}} \\
&= (1 - p)^{s} \\
&= \Pr\{X > s\},
\end{aligned}
\]
thus satisfying Equation 3.1. If $X \sim \mathrm{Expo}(\lambda)$, then we have
\[
\begin{aligned}
\Pr\{X > s + t \mid X > t\} &= \frac{\Pr\{X > s + t\}}{\Pr\{X > t\}} \\
&= \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} \\
&= e^{-\lambda s} \\
&= \Pr\{X > s\}
\end{aligned}
\]
and Equation 3.1 is satisfied. Thus the Geometric and Exponential distributions are memoryless, and it can be shown that they are the only memoryless distributions.
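
Equation 3.1 can also be checked numerically. The sketch below uses exact rational arithmetic for the Geometric case and floating-point arithmetic for the Exponential case; the function names and the parameter values are arbitrary choices made for illustration.

```python
import math
from fractions import Fraction


def geom_tail(n: int, p: Fraction) -> Fraction:
    """Pr{X > n} for X ~ Geom(p): the first n Bernoulli trials are all zero."""
    return (1 - p) ** n


def expo_tail(x: float, lam: float) -> float:
    """Pr{X > x} for X ~ Expo(lam), for x >= 0."""
    return math.exp(-lam * x)


p = Fraction(1, 3)
geometric_is_memoryless = all(
    geom_tail(s + t, p) / geom_tail(t, p) == geom_tail(s, p)
    for s in range(6) for t in range(6)
)

lam = 2.5
exponential_is_memoryless = all(
    math.isclose(expo_tail(s + t, lam) / expo_tail(t, lam), expo_tail(s, lam))
    for s in (0.0, 0.3, 1.0, 2.0) for t in (0.0, 0.5, 1.5)
)
```

The conditional probability is computed as a ratio of tail probabilities, exactly as in the derivations above.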
The Exponential distribution is closely related to both the Geometric and Poisson distributions. Suppose we have a sequence of independent random variables $X_1, X_2, \ldots$ that are all exponentially distributed with rate $\lambda$. Let $N$ be a Geometric random variable with success parameter $p$. Then the random variable $Y$ given by
\[ Y = \sum_{i=1}^{N} X_i \]
is an Exponential random variable with rate $\lambda p$. If $J$ is the integer such that
\[ \sum_{i=1}^{J} X_i \le 1 < \sum_{i=1}^{J+1} X_i \qquad (3.2) \]
then $J$ is a Poisson random variable with parameter $\lambda$.
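
Both relationships can be explored with a small simulation. This sketch only compares sample means with the expected values $1/(\lambda p)$ and $\lambda$, which is a sanity check rather than a proof; the function names, the seed, and the parameter values are arbitrary assumptions.

```python
import random


def geometric(p: float, rng: random.Random) -> int:
    """Position of the first success in a sequence of Bernoulli(p) trials."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def geometric_sum_of_exponentials(lam: float, p: float, rng: random.Random) -> float:
    """Y = X_1 + ... + X_N with X_i ~ Expo(lam) and N ~ Geom(p)."""
    return sum(rng.expovariate(lam) for _ in range(geometric(p, rng)))


def arrivals_in_unit_time(lam: float, rng: random.Random) -> int:
    """The integer J of Equation 3.2: X_1 + ... + X_J <= 1 < X_1 + ... + X_{J+1}."""
    total, count = rng.expovariate(lam), 0
    while total <= 1.0:
        count += 1
        total += rng.expovariate(lam)
    return count


rng = random.Random(12345)
lam, p, trials = 2.0, 0.25, 100_000
mean_y = sum(geometric_sum_of_exponentials(lam, p, rng) for _ in range(trials)) / trials
mean_j = sum(arrivals_in_unit_time(lam, rng) for _ in range(trials)) / trials
```

With these parameters an Exponential random variable with rate $\lambda p = 0.5$ has mean 2, and a Poisson random variable with parameter $\lambda = 2$ also has mean 2, so both sample means should be close to 2.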
3.2 Stochastic processes and Markov chains
A stochastic process [31, 89] is a collection of random variables $\{X(t) : t \in T\}$. The set $\mathcal{S}$ of possible values for $X(t)$ is called the state space. The parameter $t$ is often considered to be time. If $T$ is countable, the stochastic process is a discrete-time process; otherwise it is a continuous-time process. We say the process is in state $s \in \mathcal{S}$ at time $t \in T$ if $X(t) = s$.
Markov processes are special cases of stochastic processes that obey the memoryless or Markovian property: only the current state of the process determines the probability or rate of switching to another state [96]. This is expressed formally as
\[ \forall n \in \mathbb{N},\ \forall i_0, \ldots, i_{n+1} \in \mathcal{S},\ \forall t_0, \ldots, t_{n+1} \in T,\ t_0 < \cdots < t_{n+1}: \]
\[ \Pr\{X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n \wedge \cdots \wedge X(t_0) = i_0\} = \Pr\{X(t_{n+1}) = i_{n+1} \mid X(t_n) = i_n\}. \]
A Markov process with a discrete state space $\mathcal{S}$ is called a Markov chain. A Markov chain whose transition probabilities do not depend on time is called homogeneous. We will limit our discussion to homogeneous Markov chains with finite state spaces.
The states of a Markov chain can be classified based on their ability to reach other states [55]. In a Markov chain, we say state $j$ is reachable from state $i$ if
\[ \Pr\{X(t) = j \mid X(0) = i\} > 0 \]
for some time $t \ge 0$. A state $i$ is called transient if there exists a state $j$ such that $j$ is reachable from $i$ but $i$ is not reachable from $j$. Conversely, a state $i$ is called recurrent if for every state $j$ reachable from $i$, $i$ is reachable from $j$. A recurrent state is called absorbing if no other state is reachable from it. Two states $i$ and $j$ are said to be mutually reachable if $i$ is reachable from $j$ and $j$ is reachable from $i$. The equivalence relation “mutually reachable” creates equivalence classes over the set of states, where all states in a given class are either transient or recurrent. A set of states $\mathcal{X} \subseteq \mathcal{S}$ is called a recurrent class if all pairs of states in $\mathcal{X}$ are mutually reachable and no state outside of $\mathcal{X}$ is reachable from a state in $\mathcal{X}$. A
Markov chain is called irreducible if <