Markov Chains - Thesis Overview - Performance Modelling for System-Level Design


2.1.1 Markov Chains

Consider a discrete-time stochastic process $\{X_i \mid i \ge 1\}$ on an appropriate probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where each random variable¹ $X_i$ is discrete and assumes values in a countable set of states $\mathcal{S}$. The set $\mathcal{S}$ is called the state space of $\{X_i \mid i \ge 1\}$ and, for each $S \in \mathcal{S}$, $\mathbb{P}(X_i = S)$ indicates the probability that $\{X_i \mid i \ge 1\}$ is in $S$ or visits $S$ at time-epoch $i$. If for all $i \ge 1$ and $S_1, \ldots, S_{i+1} \in \mathcal{S}$

$$\mathbb{P}(X_{i+1} = S_{i+1} \mid X_k = S_k \text{ for all } 1 \le k \le i) = \mathbb{P}(X_{i+1} = S_{i+1} \mid X_i = S_i) \tag{2.1}$$

then $\{X_i \mid i \ge 1\}$ is said to satisfy the Markovian property and is called a discrete-time Markov chain. For any $S, T \in \mathcal{S}$, the probability $\mathbb{P}(X_{i+1} = T \mid X_i = S) = P_{S,T}(i)$ is called the (one-step) transition probability from $S$ to $T$ and indicates the probability that the discrete-time Markov chain $\{X_i \mid i \ge 1\}$ transfers from $S$ to $T$ at time-epoch $i$.

Of special interest are time-homogeneous discrete-time Markov chains, for which the transition probability $P_{S,T}(i)$ is independent of $i$ for all $S, T \in \mathcal{S}$. If $P_{S,T}(i) = P_{S,T}(j)$ for any two time-epochs $i, j$ and all $S, T \in \mathcal{S}$, then the discrete-time Markov chain is said to have stationary transition probabilities. In that case, $P_{S,T}$ is defined as

$$P_{S,T} = \mathbb{P}(X_{i+1} = T \mid X_i = S)$$

for all time-epochs $i \ge 1$ and $S, T \in \mathcal{S}$. For any fixed enumeration of the states in $\mathcal{S}$, the matrix $P = (P_{S,T})$ with $S, T \in \mathcal{S}$ is called the transition matrix of $\{X_i \mid i \ge 1\}$.

¹ Strictly speaking, the $X_i$ are not random variables in the sense of appendix A because $\mathcal{S}$ is not a subset of $\bar{\mathbb{R}}$. This is however immaterial, since a complete function can be defined that uniquely assigns an element of $\bar{\mathbb{R}}$ to each element of $\mathcal{S}$.

Denoting $\mathbb{P}(X_1 = S)$ with $I_S$ for $S \in \mathcal{S}$, the $n$-dimensional joint probabilities of a time-homogeneous discrete-time Markov chain $\{X_i \mid i \ge 1\}$ can be written as

$$\mathbb{P}(X_i = S_i \text{ for all } 1 \le i \le n) = I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i, S_{i+1}}$$

The probability $I_S$ is called the initial probability of $S$ and indicates the probability that $\{X_i \mid i \ge 1\}$ departs from state $S$. The distribution $I$ of initial probabilities over $\mathcal{S}$ is referred to as the initial distribution of $\{X_i \mid i \ge 1\}$. In this thesis, time-homogeneous discrete-time Markov chains are conveniently called Markov chains.
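The product form of the joint probabilities follows from the chain rule for conditional probabilities, the Markovian property (2.1) and time-homogeneity. A short derivation sketch, under the assumption that every conditioning event has non-zero probability so that the conditional probabilities are well defined:

```latex
\begin{align*}
\mathbb{P}(X_i = S_i \text{ for all } 1 \le i \le n)
  &= \mathbb{P}(X_1 = S_1) \cdot \prod_{i=1}^{n-1}
     \mathbb{P}(X_{i+1} = S_{i+1} \mid X_k = S_k \text{ for all } 1 \le k \le i) \\
  &= \mathbb{P}(X_1 = S_1) \cdot \prod_{i=1}^{n-1}
     \mathbb{P}(X_{i+1} = S_{i+1} \mid X_i = S_i) \\
  &= I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i, S_{i+1}}
\end{align*}
```

The first step is the chain rule, the second applies (2.1) to each factor, and the third uses the definitions of $I_S$ and of the stationary transition probabilities $P_{S,T}$.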

Definition 2.1 (Markov Chain) A Markov chain is a discrete-time stochastic process $\{X_i \mid i \ge 1\}$, where each discrete random variable $X_i$ assumes values in a countable state space $\mathcal{S}$ such that, for each time-epoch $i \ge 1$, the transition probabilities

$$\mathbb{P}(X_{i+1} = S_{i+1} \mid X_k = S_k \text{ for all } 1 \le k \le i) = \mathbb{P}(X_{i+1} = S_{i+1} \mid X_i = S_i)$$

are independent of $i$, for any $S_1, \ldots, S_{i+1} \in \mathcal{S}$.

An important theorem in Markov theory is the existence theorem [39, 24]. It states that for any countable set of states $\mathcal{S}$, sequence $\{I_S \mid S \in \mathcal{S}\}$ and matrix $(P_{S,T})$ with $S, T \in \mathcal{S}$, satisfying respectively

$$I_S \ge 0 \ \text{and} \ \sum_{S \in \mathcal{S}} I_S = 1, \qquad P_{S,T} \ge 0 \ \text{and} \ \sum_{T \in \mathcal{S}} P_{S,T} = 1$$

there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a Markov chain $\{X_i \mid i \ge 1\}$ defined on it with state space $\mathcal{S}$, initial distribution $I$ and transition matrix $P = (P_{S,T})$. As a result, the Markov chain $\{X_i \mid i \ge 1\}$ is completely determined by the triple $(\mathcal{S}, I, P)$, with which it is therefore often conveniently represented.

If the state space $\mathcal{S}$ is finite, the Markov chain can be visualised as a graph. Each state in $\mathcal{S}$ is represented as a node labelled with the corresponding name of the state. For every non-zero transition probability $P_{S,T}$, a directed arrow is drawn from the node representing $S$ to the node representing $T$, labelled with transition probability $P_{S,T}$. For any state $S \in \mathcal{S}$ with non-zero initial probability, a symbol > directed towards the node representing $S$ is drawn, labelled with initial probability $I_S$. In case $I_S = 1$ for state $S$, the label 1 to the symbol > is usually omitted. This is illustrated with an example.
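The conditions of the existence theorem and the graph construction can be sketched in a few lines of Python for a finite state space. This is an illustrative sketch only: the function names `is_markov_triple` and `graph_edges`, the dictionary representation and the two-state chain are assumptions made here and do not come from the thesis.

```python
from fractions import Fraction


def is_markov_triple(states, initial, transition):
    """Check the conditions of the existence theorem for a finite (S, I, P).

    `initial` maps a state S to I_S and `transition` maps a state S to a
    dict of entries P_{S,T}. Missing entries are read as 0.
    """
    # I_S >= 0 and the initial probabilities sum to 1.
    if any(initial.get(s, 0) < 0 for s in states):
        return False
    if sum(initial.get(s, 0) for s in states) != 1:
        return False
    # P_{S,T} >= 0 and every row of P sums to 1.
    for s in states:
        row = transition.get(s, {})
        if any(t not in states or p < 0 for t, p in row.items()):
            return False
        if sum(row.values()) != 1:
            return False
    return True


def graph_edges(transition):
    """Return one labelled arrow (S, T, P_{S,T}) per non-zero transition."""
    return [(s, t, p) for s, row in transition.items()
            for t, p in row.items() if p != 0]


# Hypothetical two-state chain, used only for illustration.
states = {"on", "off"}
initial = {"on": Fraction(1)}
transition = {
    "on": {"on": Fraction(3, 4), "off": Fraction(1, 4)},
    "off": {"on": Fraction(1)},
}

print(is_markov_triple(states, initial, transition))
print(graph_edges(transition))
```

Exact fractions are used so that the row sums can be compared with 1 without rounding error. A countably infinite state space would need a different representation.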

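Example 2.1 Let $(\mathcal{S}, I, P)$ represent a Markov chain, where $\mathcal{S} = \{A, B, C\}$, $I$ is defined as $I_A = 1$, $I_B = I_C = 0$, and where the transition matrix $P$ is given by

$$P = \begin{pmatrix} P_{A,A} & P_{A,B} & P_{A,C} \\ P_{B,A} & P_{B,B} & P_{B,C} \\ P_{C,A} & P_{C,B} & P_{C,C} \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{3} & \frac{2}{3} \\ 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 \end{pmatrix}$$

[Figure: graph with nodes A, B and C and arrows labelled 1/3, 2/3, 1/2, 1/2 and 1.]

Figure 2.1: Visualisation of a Markov chain as a graph.

A graphical representation of this Markov chain is depicted in figure 2.1.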
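The joint-probability product can be evaluated directly on this example. The following minimal Python sketch assumes that the matrix of Example 2.1 reads row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0); the function name `path_probability` is chosen here for illustration and is not notation from the thesis.

```python
from fractions import Fraction

# Example 2.1 as read here (the row order of P is an assumption where the
# source layout is unclear): I_A = 1 and rows A, B, C of P.
I = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(0)}
P = {
    "A": {"A": Fraction(0), "B": Fraction(1, 3), "C": Fraction(2, 3)},
    "B": {"A": Fraction(0), "B": Fraction(1, 2), "C": Fraction(1, 2)},
    "C": {"A": Fraction(0), "B": Fraction(1), "C": Fraction(0)},
}


def path_probability(path, initial=I, transition=P):
    """Return I_{S_1} * prod_{i=1}^{n-1} P_{S_i,S_{i+1}} for a finite path."""
    prob = initial[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= transition[s][t]
    return prob


print(path_probability(("A", "B", "C", "B")))  # 1 * 1/3 * 1/2 * 1
print(path_probability(("B", "C")))            # zero, since I_B = 0
```

Under the stated reading of $P$, the sequence $(A, B, C, B)$ has probability $\frac{1}{6}$, while any sequence that does not depart from $A$ has probability 0.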

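Probability Space. The remainder of this section elaborates on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which a Markov chain $\{X_i \mid i \ge 1\}$ represented by $(\mathcal{S}, I, P)$ is defined. The sample space $\Omega$ is the set of all infinite sequences $\mathbf{S} = (S_1, S_2, \ldots)$, where $S_i \in \mathcal{S}$ for all $i \ge 1$. Hence, $\Omega$ is the set $\mathcal{S}^\infty$ of all infinite state sequences. A cylinder of rank $n$ is a subset $C_n$ of $\Omega$ of the form $C_n = \{\mathbf{S} \in \Omega \mid \mathbf{S}_{1..n} \in A\}$, where $A \subseteq \mathcal{S}^n$ and $\mathbf{S}_{1..n} = (S_1, \ldots, S_n)$. In case $A$ is a singleton set $\{\mathbf{S}\}$, the set $\{\mathbf{T} \in \Omega \mid \mathbf{T}_{1..n} = \mathbf{S}\}$, which is denoted by $\langle \mathbf{S} \rangle$, is called a thin cylinder of rank $n$. The σ-algebra $\mathcal{F}$ is the σ-algebra generated by the set of cylinders and the probability measure $\mathbb{P}$ is defined as

$$\mathbb{P}(\langle \mathbf{S}_{1..n} \rangle) = I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i, S_{i+1}}$$

for each thin cylinder $\langle \mathbf{S}_{1..n} \rangle \in \mathcal{F}$ with $\mathbf{S} = (S_1, S_2, \ldots)$ in $\Omega$. Since any cylinder $C_n$ can be written as a countable union of disjoint thin cylinders of the same rank,

$$\mathbb{P}(C_n) = \sum_{\langle \mathbf{S}_{1..n} \rangle \subseteq C_n} \mathbb{P}(\langle \mathbf{S}_{1..n} \rangle)$$

by the property of countable additivity. Furthermore, it can be shown that any subset $C$ of $\Omega$ can be written as a countable intersection of cylinders in $\mathcal{F}$. For example,

$$\{\mathbf{S}\} = \bigcap_{n=1}^{\infty} \langle \mathbf{S}_{1..n} \rangle$$

for any $\mathbf{S} \in \Omega$. Since $\mathcal{F}$ is closed under countable intersection, it follows that $C \in \mathcal{F}$. As a result $\mathcal{F} = 2^{\Omega}$, which implies that any discrete random variable defined on $(\Omega, \mathcal{F}, \mathbb{P})$ is measurable. For brevity, the probability $\mathbb{P}(\{\mathbf{S}\})$ on a singleton set $\{\mathbf{S}\} \in \mathcal{F}$ is also denoted by $\mathbb{P}(\mathbf{S})$.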
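The additivity over thin cylinders can be illustrated on a finite example. The sketch below is an illustration under stated assumptions: it reuses the chain of Example 2.1 with the matrix read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0), represents a rank-$n$ cylinder by its set $A$ of length-$n$ prefixes, and uses the hypothetical names `thin_cylinder_probability` and `cylinder_probability`.

```python
from fractions import Fraction
from itertools import product

# Example 2.1 as read here (the row order of P is an assumption).
STATES = ("A", "B", "C")
I = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(0)}
P = {
    "A": {"A": Fraction(0), "B": Fraction(1, 3), "C": Fraction(2, 3)},
    "B": {"A": Fraction(0), "B": Fraction(1, 2), "C": Fraction(1, 2)},
    "C": {"A": Fraction(0), "B": Fraction(1), "C": Fraction(0)},
}


def thin_cylinder_probability(prefix):
    """Probability of the set of all infinite sequences starting with prefix."""
    prob = I[prefix[0]]
    for s, t in zip(prefix, prefix[1:]):
        prob *= P[s][t]
    return prob


def cylinder_probability(prefixes):
    """P(C_n) for the rank-n cylinder whose first n states lie in prefixes."""
    if len({len(p) for p in prefixes}) > 1:
        raise ValueError("all prefixes must have the same length n")
    # Distinct thin cylinders of equal rank are disjoint, so probabilities add.
    return sum((thin_cylinder_probability(p) for p in prefixes), Fraction(0))


rank = 3
visits_b_at_3 = {p for p in product(STATES, repeat=rank) if p[-1] == "B"}
everything = set(product(STATES, repeat=rank))

print(cylinder_probability(visits_b_at_3))  # probability of being in B at epoch 3
print(cylinder_probability(everything))     # the whole sample space
```

The second call sums over all thin cylinders of rank 3, which together partition $\Omega$, so the result must be 1 for any valid triple $(\mathcal{S}, I, P)$.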

Next to the probability on infinite state sequences, the probability on finite state sequences is frequently used in this chapter. Although the probability measure $\mathbb{P}$ is only defined for (sets of) infinite state sequences, the notation $\mathbb{P}(\mathbf{S})$ is also used for any finite state sequence $\mathbf{S} \in \mathcal{S}^n$, which is justified by defining $\mathbb{P}(\mathbf{S}) = \mathbb{P}(\langle \mathbf{S} \rangle)$, the probability of the thin cylinder of $\mathbf{S}$.

Figure 2.2: Visualisation of different cylinder types. a) Thin cylinder $\langle \mathbf{S}_{1..n} \rangle$ of rank $n$. b) Cylinder of rank $n$. c) Generalised cylinder.

Now, a generalisation of the probability on cylinders is introduced. Let $U$ be a set of finite state sequences (possibly of different lengths). Set $U$ will be called proper if there exists no state sequence in $U$ that is a prefix of any other state sequence in $U$. In case $U$ is proper, the probability $\mathbb{P}(U)$ on $U$ is defined as $\mathbb{P}(U) = \mathbb{P}(\langle U \rangle)$, where

$$\langle U \rangle = \bigcup_{\mathbf{S} \in U} \langle \mathbf{S} \rangle$$

is the union of the thin cylinders $\langle \mathbf{S} \rangle$ of the state sequences in $U$. Such a set $\langle U \rangle$ will be called a generalised cylinder. Figure 2.2 illustrates the differences between thin cylinders, cylinders and generalised cylinders.
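Because no sequence in a proper set is a prefix of another, the thin cylinders of its sequences are pairwise disjoint, so the probability on a proper set is the sum of the probabilities on its sequences. A minimal Python sketch of the prefix test and of this sum follows, assuming the chain of Example 2.1 with the matrix read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0); the names `is_proper` and `generalised_cylinder_probability` are illustrative only.

```python
from fractions import Fraction

# Example 2.1 as read here (the row order of P is an assumption).
I = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(0)}
P = {
    "A": {"A": Fraction(0), "B": Fraction(1, 3), "C": Fraction(2, 3)},
    "B": {"A": Fraction(0), "B": Fraction(1, 2), "C": Fraction(1, 2)},
    "C": {"A": Fraction(0), "B": Fraction(1), "C": Fraction(0)},
}


def path_probability(path):
    """Return I_{S_1} * prod P_{S_i,S_{i+1}} for a finite state sequence."""
    prob = I[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= P[s][t]
    return prob


def is_proper(sequences):
    """True if no sequence is a prefix of another sequence in the set."""
    return not any(s != t and t[:len(s)] == s
                   for s in sequences for t in sequences)


def generalised_cylinder_probability(sequences):
    """P(U) for a proper set U of finite sequences (given as tuples)."""
    if not is_proper(sequences):
        raise ValueError("set is not proper")
    # Thin cylinders of a proper set are disjoint, so probabilities add.
    return sum((path_probability(s) for s in sequences), Fraction(0))


reaches_c = {("A", "C"), ("A", "B", "C")}  # sequences of different lengths
print(is_proper(reaches_c))
print(generalised_cylinder_probability(reaches_c))
print(is_proper({("A", "B"), ("A", "B", "C")}))  # (A, B) is a prefix
```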

A proper set of finite state sequences $U$ will be called an initial set if all the state sequences in $U$ have the same initial state. For an initial set $U$, $\mathbb{P}^*(U)$ is defined as

$$\mathbb{P}^*(U) = \frac{1}{I_{S_1}} \cdot \sum_{\mathbf{S} \in U} \mathbb{P}(\mathbf{S})$$

and will be referred to as the conditional probability on $U$. Intuitively, $\mathbb{P}^*(U)$ denotes the sum of the probabilities on the state sequences $\mathbf{S}$ in $U$ conditional on departing from state $S_1$, the common initial state of these sequences. A proper set of finite state sequences $U$ is called a final set if all state sequences in $U$ have the same final state.
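The conditional probability on an initial set can be computed by following the defining formula literally. The sketch below is an illustration under stated assumptions: it takes the transition matrix of Example 2.1 read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0), replaces the initial distribution by a hypothetical one with $I_B > 0$ so that the division is defined, and assumes the input set is proper. The name `conditional_probability` is not from the thesis.

```python
from fractions import Fraction

# Transition matrix of Example 2.1 as read here (row order is an assumption).
# The initial distribution is hypothetical, chosen so that I_B > 0.
I = {"A": Fraction(1, 2), "B": Fraction(1, 2), "C": Fraction(0)}
P = {
    "A": {"A": Fraction(0), "B": Fraction(1, 3), "C": Fraction(2, 3)},
    "B": {"A": Fraction(0), "B": Fraction(1, 2), "C": Fraction(1, 2)},
    "C": {"A": Fraction(0), "B": Fraction(1), "C": Fraction(0)},
}


def path_probability(path):
    """Return I_{S_1} * prod P_{S_i,S_{i+1}} for a finite state sequence."""
    prob = I[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= P[s][t]
    return prob


def conditional_probability(initial_set):
    """P*(U) = (1 / I_{S_1}) * sum of P(S) over S in U, for an initial set U."""
    first_states = {s[0] for s in initial_set}
    if len(first_states) != 1:
        raise ValueError("sequences must share one initial state")
    (start,) = first_states
    if I[start] == 0:
        raise ValueError("the formula needs a non-zero initial probability")
    total = sum((path_probability(s) for s in initial_set), Fraction(0))
    return total / I[start]


U = {("B", "C"), ("B", "B", "C")}
print(conditional_probability(U))  # P_{B,C} + P_{B,B} * P_{B,C}
```

The result no longer depends on $I_B$: the factor $I_{S_1}$ in every $\mathbb{P}(\mathbf{S})$ cancels against the division, leaving only products of transition probabilities.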

Consider a final set of finite state sequences $U$ and an initial set of finite state sequences $V$ such that the initial state of the state sequences in $V$ is equal to the final state of the state sequences in $U$. Then the concatenation of $U$ and $V$, denoted by $U \circ V$, is defined as

$$U \circ V = \{(S_1, \ldots, S_{n-1}, S_n = T_1, T_2, \ldots, T_m) \mid (S_1, \ldots, S_n) \in U,\ (T_1, \ldots, T_m) \in V\}$$

Because both $U$ and $V$ are proper, the concatenation $U \circ V$ is also proper. Hence, $\mathbb{P}(U \circ V) = \mathbb{P}(U) \cdot \mathbb{P}^*(V)$. If $U$ is also an initial set, then $\mathbb{P}^*(U \circ V) = \mathbb{P}^*(U) \cdot \mathbb{P}^*(V)$.
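The product rule for concatenations can be checked term by term. The following is a derivation sketch, assuming $I_{T_1} > 0$ so that $\mathbb{P}^*(V)$ is well defined, and writing $\mathbf{S} \circ \mathbf{T}$ for the concatenation of single sequences $\mathbf{S} = (S_1, \ldots, S_n) \in U$ and $\mathbf{T} = (T_1, \ldots, T_m) \in V$ with $S_n = T_1$ (a shorthand introduced here, not notation from the thesis):

```latex
\begin{align*}
\mathbb{P}(\mathbf{S} \circ \mathbf{T})
  &= I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i, S_{i+1}}
     \cdot \prod_{j=1}^{m-1} P_{T_j, T_{j+1}}
   = \mathbb{P}(\mathbf{S}) \cdot \frac{\mathbb{P}(\mathbf{T})}{I_{T_1}} \\
\mathbb{P}(U \circ V)
  &= \sum_{\mathbf{S} \in U} \sum_{\mathbf{T} \in V}
     \mathbb{P}(\mathbf{S}) \cdot \frac{\mathbb{P}(\mathbf{T})}{I_{T_1}}
   = \Bigl( \sum_{\mathbf{S} \in U} \mathbb{P}(\mathbf{S}) \Bigr)
     \cdot \frac{1}{I_{T_1}} \sum_{\mathbf{T} \in V} \mathbb{P}(\mathbf{T})
   = \mathbb{P}(U) \cdot \mathbb{P}^*(V)
\end{align*}
```

The double sum counts every element of $U \circ V$ exactly once: since $U$ is proper, two distinct pairs of sequences cannot produce the same concatenated sequence. When $U$ is an initial set with $I_{S_1} > 0$, dividing both sides by $I_{S_1}$ gives the second identity.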

It can now be observed that the initial probability $I_S = \mathbb{P}(X_1 = S)$, indicating that the Markov chain $\{X_i \mid i \ge 1\}$ departs from state $S$, represents in fact the probability $\mathbb{P}((S))$ on the finite state sequence $(S)$. On the other hand, the probability $P_{S,T} = \mathbb{P}(X_{i+1} = T \mid X_i = S)$, indicating that the Markov chain $\{X_i \mid i \ge 1\}$ transfers from state $S$ to state $T$ (at any time-epoch $i \ge 1$), represents in fact the probability $\mathbb{P}^*((S, T))$ on the finite state sequence $(S, T)$ by the property of time-homogeneity.
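Both observations follow from the definitions in one line each. A short worked check, assuming $I_S > 0$ for the second line so that the conditional probability is defined:

```latex
\begin{align*}
\mathbb{P}((S)) &= I_S \cdot \prod_{i=1}^{0} P_{S_i, S_{i+1}} = I_S \\
\mathbb{P}^*((S, T)) &= \frac{1}{I_S} \cdot \mathbb{P}((S, T))
  = \frac{1}{I_S} \cdot I_S \cdot P_{S,T} = P_{S,T}
\end{align*}
```

The first line uses that the product over an empty index range equals 1.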
