1.4 Thesis Overview
2.1.1 Markov Chains
Consider a discrete-time stochastic process $\{X_i \mid i \geq 1\}$ on an appropriate probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where each random variable¹ $X_i$ is discrete and assumes values in a countable set of states $\mathcal{S}$. The set $\mathcal{S}$ is called the state space of $\{X_i \mid i \geq 1\}$ and, for each $S \in \mathcal{S}$, $\mathbb{P}(X_i = S)$ indicates the probability that $\{X_i \mid i \geq 1\}$ is in $S$ or visits $S$ at time-epoch $i$. If for all $i \geq 1$ and $S_1, \ldots, S_{i+1} \in \mathcal{S}$
$$\mathbb{P}(X_{i+1} = S_{i+1} \mid X_k = S_k \text{ for all } 1 \leq k \leq i) = \mathbb{P}(X_{i+1} = S_{i+1} \mid X_i = S_i) \tag{2.1}$$
then $\{X_i \mid i \geq 1\}$ is said to satisfy the Markovian property and is called a discrete-time Markov chain. For any $S, T \in \mathcal{S}$, the probability $\mathbb{P}(X_{i+1} = T \mid X_i = S) = P_{S,T}(i)$ is called the (one-step) transition probability from $S$ to $T$ and indicates the probability that the discrete-time Markov chain $\{X_i \mid i \geq 1\}$ transfers from $S$ to $T$ at time-epoch $i$. Of special interest are time-homogeneous discrete-time Markov chains, for which the transition probability $P_{S,T}(i)$ is independent of $i$ for all $S, T \in \mathcal{S}$. If $P_{S,T}(i) = P_{S,T}(j)$ for any two time-epochs $i, j$ and all $S, T \in \mathcal{S}$, then the discrete-time Markov chain is said to have stationary transition probabilities. In that case, $P_{S,T}$ is defined as
$$P_{S,T} = \mathbb{P}(X_{i+1} = T \mid X_i = S)$$
for all time-epochs $i \geq 1$ and $S, T \in \mathcal{S}$. For any fixed enumeration of the states in $\mathcal{S}$, the matrix $P = (P_{S,T})$ with $S, T \in \mathcal{S}$ is called the transition matrix of $\{X_i \mid i \geq 1\}$.

¹ Strictly speaking, the $X_i$ are not random variables in the sense of appendix A because $\mathcal{S}$ is not a subset of $\bar{\mathbb{R}}$. This is immaterial, however, since a complete function can be defined which uniquely assigns an element of $\bar{\mathbb{R}}$ to each element of $\mathcal{S}$.
Denoting $\mathbb{P}(X_1 = S)$ by $I_S$ for $S \in \mathcal{S}$, the $n$-dimensional joint probabilities of a time-homogeneous discrete-time Markov chain $\{X_i \mid i \geq 1\}$ can be written as
$$\mathbb{P}(X_i = S_i \text{ for all } 1 \leq i \leq n) = I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i,S_{i+1}}$$
The probability $I_S$ is called the initial probability of $S$ and indicates the probability that $\{X_i \mid i \geq 1\}$ departs from state $S$. The distribution $I$ of initial probabilities over $\mathcal{S}$ is referred to as the initial distribution of $\{X_i \mid i \geq 1\}$. In this thesis, time-homogeneous discrete-time Markov chains are conveniently called Markov chains.
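The product formula for the joint probabilities can be evaluated mechanically. The following is a minimal sketch, not part of the original text: the two-state chain, its state names and the helper `path_probability` are hypothetical and chosen only for illustration. Exact rational arithmetic is used so that the result can be compared with a hand calculation.

```python
from fractions import Fraction


def path_probability(initial, transition, path):
    """Return I_{S_1} * prod_{i=1}^{n-1} P_{S_i,S_{i+1}} for a finite state sequence."""
    if not path:
        raise ValueError("path must contain at least one state")
    prob = initial.get(path[0], Fraction(0))
    # Multiply by one transition probability per consecutive pair of states.
    for s, t in zip(path, path[1:]):
        prob *= transition.get(s, {}).get(t, Fraction(0))
    return prob


# Hypothetical two-state chain (an assumption, not taken from the thesis).
initial = {"sunny": Fraction(3, 4), "rainy": Fraction(1, 4)}
transition = {
    "sunny": {"sunny": Fraction(4, 5), "rainy": Fraction(1, 5)},
    "rainy": {"sunny": Fraction(1, 2), "rainy": Fraction(1, 2)},
}

p = path_probability(initial, transition, ["sunny", "sunny", "rainy"])
print(p)  # 3/4 * 4/5 * 1/5 = 3/25
```

For a sequence of length one the product is empty, so the function simply returns the initial probability of that state.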
Definition 2.1 (Markov Chain) A Markov chain is a discrete-time stochastic process $\{X_i \mid i \geq 1\}$, where each discrete random variable $X_i$ assumes values in a countable state space $\mathcal{S}$ such that, for each time-epoch $i \geq 1$, the transition probabilities
$$\mathbb{P}(X_{i+1} = S_{i+1} \mid X_k = S_k \text{ for all } 1 \leq k \leq i) = \mathbb{P}(X_{i+1} = S_{i+1} \mid X_i = S_i)$$
are independent of $i$, for any $S_1, \ldots, S_{i+1} \in \mathcal{S}$.
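To make the definition concrete, a finite Markov chain can be simulated by drawing the first state from the initial distribution and every later state from the row of the transition matrix that belongs to the current state only. This is a minimal sketch under assumptions: the chain, the function name `simulate` and the seed are hypothetical and do not appear in the original text.

```python
import random


def simulate(initial, transition, n, rng):
    """Draw X_1, ..., X_n for a time-homogeneous chain given as (I, P)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    states = list(initial)
    # X_1 is drawn from the initial distribution I.
    path = rng.choices(states, weights=[initial[s] for s in states])
    while len(path) < n:
        # Markovian property: the next state depends on the current state only,
        # and the same row of P is used at every time-epoch (time-homogeneity).
        row = transition[path[-1]]
        targets = list(row)
        path.append(rng.choices(targets, weights=[row[t] for t in targets])[0])
    return path


# Hypothetical chain, chosen only for illustration.
initial = {"sunny": 0.75, "rainy": 0.25}
transition = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

print(simulate(initial, transition, 10, random.Random(0)))
```

The generator is passed in explicitly so that a run can be reproduced by fixing the seed.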
An important theorem in Markov theory is the existence theorem [39, 24]. It states that for any countable set of states $\mathcal{S}$, sequence $\{I_S \mid S \in \mathcal{S}\}$ and matrix $(P_{S,T})$ with $S, T \in \mathcal{S}$, satisfying respectively
$$I_S \geq 0 \text{ and } \sum_{S \in \mathcal{S}} I_S = 1, \qquad P_{S,T} \geq 0 \text{ and } \sum_{T \in \mathcal{S}} P_{S,T} = 1,$$
there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a Markov chain $\{X_i \mid i \geq 1\}$ defined on it with state space $\mathcal{S}$, initial distribution $I$ and transition matrix $P = (P_{S,T})$. As a result, the Markov chain $\{X_i \mid i \geq 1\}$ is completely determined by the triple $(\mathcal{S}, I, P)$, with which it is therefore often conveniently represented.

If the state space $\mathcal{S}$ is finite, the Markov chain can be visualised as a graph. Each state in $\mathcal{S}$ is represented as a node labelled with the corresponding name of the state. For every non-zero transition probability $P_{S,T}$, a directed arrow is drawn from the node representing $S$ to the node representing $T$, labelled with the transition probability $P_{S,T}$. For any state $S \in \mathcal{S}$ with non-zero initial probability, a symbol > directed towards the node representing $S$ is drawn, labelled with the initial probability $I_S$. In case $I_S = 1$ for state $S$, the label 1 on the symbol > is usually omitted. This is illustrated with an example.
Example 2.1 Let $(\mathcal{S}, I, P)$ represent a Markov chain, where $\mathcal{S} = \{A, B, C\}$, $I$ is defined as $I_A = 1$, $I_B = I_C = 0$, and where the transition matrix $P$ is given by
$$P = \begin{pmatrix} P_{A,A} & P_{A,B} & P_{A,C} \\ P_{B,A} & P_{B,B} & P_{B,C} \\ P_{C,A} & P_{C,B} & P_{C,C} \end{pmatrix} = \begin{pmatrix} 0 & \tfrac{1}{3} & \tfrac{2}{3} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 1 & 0 \end{pmatrix}$$
Figure 2.1: Visualisation of a Markov chain as a graph.
A graphical representation of this Markov chain is depicted in figure 2.1.
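The conditions of the existence theorem and the graph construction can both be checked mechanically for a finite triple. The sketch below is illustrative only: it assumes the entries of Example 2.1 read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2) and (0, 1, 0), and the helper names `is_valid_chain` and `graph_edges` are hypothetical.

```python
from fractions import Fraction as F

states = ["A", "B", "C"]
I = {"A": F(1), "B": F(0), "C": F(0)}
# Transition matrix of Example 2.1 as read here (an assumption about the layout).
P = {
    "A": {"A": F(0), "B": F(1, 3), "C": F(2, 3)},
    "B": {"A": F(0), "B": F(1, 2), "C": F(1, 2)},
    "C": {"A": F(0), "B": F(1), "C": F(0)},
}


def is_valid_chain(states, I, P):
    """Check non-negativity and unit sums for I and for every row of P."""
    initial_ok = all(I[s] >= 0 for s in states) and sum(I[s] for s in states) == 1
    rows_ok = all(
        all(P[s][t] >= 0 for t in states) and sum(P[s][t] for t in states) == 1
        for s in states
    )
    return initial_ok and rows_ok


def graph_edges(states, P):
    """One labelled arrow per non-zero transition probability."""
    return [(s, t, P[s][t]) for s in states for t in states if P[s][t] != 0]


print(is_valid_chain(states, I, P))  # True
for s, t, p in graph_edges(states, P):
    print(f"{s} -> {t} [{p}]")
```

Under this reading the graph has five arrows (A to B, A to C, B to B, B to C and C to B), and no arrow enters A.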
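Probability Space The remainder of this section elaborates on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which a Markov chain $\{X_i \mid i \geq 1\}$ represented by $(\mathcal{S}, I, P)$ is defined.
The sample space $\Omega$ is the set of all infinite sequences $\mathbf{S} = (S_1, S_2, \ldots)$, where $S_i \in \mathcal{S}$ for all $i \geq 1$. Hence, $\Omega$ is the set $\mathcal{S}^\infty$ of all infinite state sequences. A cylinder of rank $n$ is a subset $C_n$ of $\Omega$ of the form $C_n = \{\mathbf{S} \in \Omega \mid \mathbf{S}_{1..n} \in A\}$, where $A \subseteq \mathcal{S}^n$ and $\mathbf{S}_{1..n} = (S_1, \ldots, S_n)$ denotes the prefix of length $n$ of $\mathbf{S}$. In case $A$ is a singleton set $\{\mathbf{S}\}$, the set $\{\mathbf{T} \in \Omega \mid \mathbf{T}_{1..n} = \mathbf{S}\}$, which is denoted by $\langle \mathbf{S} \rangle$, is called a thin cylinder of rank $n$. The σ-algebra $\mathcal{F}$ is the σ-algebra generated by the set of cylinders, and the probability measure $\mathbb{P}$ is defined as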
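$$\mathbb{P}(\langle \mathbf{S}_{1..n} \rangle) = I_{S_1} \cdot \prod_{i=1}^{n-1} P_{S_i,S_{i+1}}$$
for each thin cylinder $\langle \mathbf{S}_{1..n} \rangle \in \mathcal{F}$ with $\mathbf{S} = (S_1, S_2, \ldots)$ in $\Omega$. Since any cylinder $C_n$ can be written as a countable union of disjoint thin cylinders of the same rank,
$$\mathbb{P}(C_n) = \sum_{\langle \mathbf{S}_{1..n} \rangle \subseteq C_n} \mathbb{P}(\langle \mathbf{S}_{1..n} \rangle)$$
by the property of countable additivity. Furthermore, it can be shown that any subset $C$ of $\Omega$ can be written as a countable intersection of cylinders in $\mathcal{F}$. For example,
$$\{\mathbf{S}\} = \bigcap_{n=1}^{\infty} \langle \mathbf{S}_{1..n} \rangle$$
for any $\mathbf{S} \in \Omega$. Since $\mathcal{F}$ is closed under countable intersection, it follows that $C \in \mathcal{F}$. As a result $\mathcal{F} = 2^{\Omega}$, which implies that any discrete random variable defined on $(\Omega, \mathcal{F}, \mathbb{P})$ is measurable. For brevity, the probability $\mathbb{P}(\{\mathbf{S}\})$ on a singleton set $\{\mathbf{S}\} \in \mathcal{F}$ is also denoted by $\mathbb{P}(\mathbf{S})$.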
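For a finite state space, the probability of a cylinder of rank $n$ can be computed by summing the probabilities of the thin cylinders it contains. The following sketch assumes the chain of Example 2.1 with its matrix read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0); the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

STATES = ["A", "B", "C"]
I = {"A": F(1), "B": F(0), "C": F(0)}
P = {
    "A": {"A": F(0), "B": F(1, 3), "C": F(2, 3)},
    "B": {"A": F(0), "B": F(1, 2), "C": F(1, 2)},
    "C": {"A": F(0), "B": F(1), "C": F(0)},
}


def thin_cylinder_probability(prefix):
    """Probability of the thin cylinder induced by a finite prefix (S_1, ..., S_n)."""
    prob = I[prefix[0]]
    for s, t in zip(prefix, prefix[1:]):
        prob *= P[s][t]
    return prob


def cylinder_probability(A):
    """Probability of the rank-n cylinder induced by a set A of length-n prefixes."""
    if len({len(prefix) for prefix in A}) > 1:
        raise ValueError("all prefixes of a rank-n cylinder must have length n")
    # Thin cylinders of the same rank are disjoint, so their probabilities add up.
    return sum((thin_cylinder_probability(p) for p in A), F(0))


# Cylinder of rank 3: all sequences whose third state is C.
third_is_c = {p for p in product(STATES, repeat=3) if p[2] == "C"}
print(cylinder_probability(third_is_c))  # 1/6

# The thin cylinders of rank 3 partition the sample space.
print(cylinder_probability(set(product(STATES, repeat=3))))  # 1
```

The only prefix with non-zero probability in the first cylinder is (A, B, C), which contributes 1 · 1/3 · 1/2 = 1/6.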
Next to the probability on infinite state sequences, the probability on finite state sequences is frequently used in this chapter. Although the probability measure $\mathbb{P}$ is only defined for (sets of) infinite state sequences, the notation $\mathbb{P}(\mathbf{S})$ is also used for any finite state sequence $\mathbf{S} \in \mathcal{S}^n$, which is justified by defining that $\mathbb{P}(\mathbf{S}) = \mathbb{P}(\langle \mathbf{S} \rangle)$, and
Figure 2.2: Visualisation of different cylinder types. a) Thin cylinder $\langle \mathbf{S}_{1..n} \rangle$ of rank $n$. b) Cylinder of rank $n$. c) Generalised cylinder.
Now, a generalisation of the probability on cylinders is introduced. Let $U$ be a set of finite state sequences (possibly of different lengths). The set $U$ will be called proper if no state sequence in $U$ is a prefix of any other state sequence in $U$. In case $U$ is proper, the probability $\mathbb{P}(U)$ on $U$ is defined as $\mathbb{P}(U) = \mathbb{P}(\langle U \rangle)$, where
$$\langle U \rangle = \bigcup_{\mathbf{S} \in U} \langle \mathbf{S} \rangle$$
Such a set $\langle U \rangle$ will be called a generalised cylinder. Figure 2.2 illustrates the differences between thin cylinders, cylinders and generalised cylinders.
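Properness is a prefix-freeness condition, and it is what makes the thin cylinders of the sequences in $U$ pairwise disjoint, so that their probabilities can simply be added. The sketch below illustrates this for the chain of Example 2.1 (matrix read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0), which is an assumption); the function names are hypothetical.

```python
from fractions import Fraction as F

I = {"A": F(1), "B": F(0), "C": F(0)}
P = {
    "A": {"A": F(0), "B": F(1, 3), "C": F(2, 3)},
    "B": {"A": F(0), "B": F(1, 2), "C": F(1, 2)},
    "C": {"A": F(0), "B": F(1), "C": F(0)},
}


def sequence_probability(seq):
    """Probability on a finite state sequence, given as a tuple of states."""
    prob = I[seq[0]]
    for s, t in zip(seq, seq[1:]):
        prob *= P[s][t]
    return prob


def is_proper(U):
    """True if no sequence in U is a prefix of another sequence in U."""
    return not any(s != t and t[:len(s)] == s for s in U for t in U)


def generalised_cylinder_probability(U):
    """Probability on a proper set U of finite sequences of possibly different lengths."""
    if not is_proper(U):
        raise ValueError("U is not proper: some sequence is a prefix of another")
    return sum((sequence_probability(s) for s in U), F(0))


U = {("A", "B", "B"), ("A", "C")}
print(is_proper(U))                         # True
print(generalised_cylinder_probability(U))  # 1/6 + 2/3 = 5/6
print(is_proper({("A",), ("A", "B")}))      # False
```

The set {(A), (A, B)} is rejected because every infinite sequence starting with (A, B) also starts with (A), so adding the two probabilities would count those sequences twice.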
A proper set of finite state sequences $U$ will be called an initial set if all the state sequences in $U$ have the same initial state. For an initial set $U$, $\mathbb{P}^*(U)$ is defined as
$$\mathbb{P}^*(U) = \frac{1}{I_{S_1}} \sum_{\mathbf{S} \in U} \mathbb{P}(\mathbf{S})$$
where $S_1$ is the common initial state of the state sequences in $U$, and will be referred to as the conditional probability on $U$. Intuitively, $\mathbb{P}^*(U)$ denotes the sum of the probabilities on the state sequences $\mathbf{S}$ in $U$ conditional on departing from state $S_1$. A proper set of finite state sequences $U$ is called a final set if all state sequences in $U$ have the same final state.
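The conditional probability on an initial set can be computed directly from its definition, provided the common initial state has non-zero initial probability. The sketch below is illustrative only. It reuses the transition matrix of Example 2.1 (read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0)) but replaces the initial distribution by a hypothetical one with strictly positive entries, so that the division is defined for every state; all names are assumptions.

```python
from fractions import Fraction as F

# Hypothetical initial distribution with positive entries (not the one of Example 2.1).
I = {"A": F(1, 2), "B": F(1, 4), "C": F(1, 4)}
P = {
    "A": {"A": F(0), "B": F(1, 3), "C": F(2, 3)},
    "B": {"A": F(0), "B": F(1, 2), "C": F(1, 2)},
    "C": {"A": F(0), "B": F(1), "C": F(0)},
}


def sequence_probability(seq):
    prob = I[seq[0]]
    for s, t in zip(seq, seq[1:]):
        prob *= P[s][t]
    return prob


def is_proper(U):
    return not any(s != t and t[:len(s)] == s for s in U for t in U)


def conditional_probability(U):
    """Sum of the sequence probabilities in U divided by I of the common initial state."""
    initial_states = {s[0] for s in U}
    if len(initial_states) != 1 or not is_proper(U):
        raise ValueError("U must be a proper set with a single initial state")
    (s1,) = initial_states
    if I[s1] == 0:
        raise ValueError("the definition divides by I of the initial state, which is zero")
    return sum((sequence_probability(s) for s in U), F(0)) / I[s1]


print(conditional_probability({("B", "C", "B")}))              # 1/2
print(conditional_probability({("B", "B"), ("B", "C", "B")}))  # 1
```

Dividing by the initial probability cancels the factor contributed by the first state, so the result depends on the transition probabilities only.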
Consider a final set of finite state sequences $U$ and an initial set of finite state sequences $V$ such that the initial state of the state sequences in $V$ is equal to the final state of the state sequences in $U$. Then the concatenation of $U$ and $V$, denoted by $U \circ V$, is defined as
$$U \circ V = \{(S_1, \ldots, S_{n-1}, S_n = T_1, T_2, \ldots, T_m) \mid (S_1, \ldots, S_n) \in U,\ (T_1, \ldots, T_m) \in V\}$$
Because both $U$ and $V$ are proper, the concatenation $U \circ V$ is also proper. Hence, $\mathbb{P}(U \circ V) = \mathbb{P}(U) \cdot \mathbb{P}^*(V)$. If $U$ is also an initial set, then $\mathbb{P}^*(U \circ V) = \mathbb{P}^*(U) \cdot \mathbb{P}^*(V)$.
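Both product rules for concatenation can be checked numerically on a small case. The sketch below is not from the original text: it uses the transition matrix of Example 2.1 (read row by row as (0, 1/3, 2/3), (0, 1/2, 1/2), (0, 1, 0)) with a hypothetical positive initial distribution, and the helper names are assumptions.

```python
from fractions import Fraction as F

# Hypothetical initial distribution with positive entries (not the one of Example 2.1).
I = {"A": F(1, 2), "B": F(1, 4), "C": F(1, 4)}
P = {
    "A": {"A": F(0), "B": F(1, 3), "C": F(2, 3)},
    "B": {"A": F(0), "B": F(1, 2), "C": F(1, 2)},
    "C": {"A": F(0), "B": F(1), "C": F(0)},
}


def sequence_probability(seq):
    prob = I[seq[0]]
    for s, t in zip(seq, seq[1:]):
        prob *= P[s][t]
    return prob


def probability(U):
    """Probability on a proper set of finite sequences (properness is assumed here)."""
    return sum((sequence_probability(s) for s in U), F(0))


def conditional_probability(U):
    """Conditional probability on an initial set U with non-zero initial probability."""
    (s1,) = {s[0] for s in U}
    return probability(U) / I[s1]


def concatenate(U, V):
    """Glue every sequence of U to every sequence of V at the shared state."""
    finals = {s[-1] for s in U}
    initials = {t[0] for t in V}
    if len(finals) != 1 or finals != initials:
        raise ValueError("final state of U must equal initial state of V")
    return {s + t[1:] for s in U for t in V}


U = {("A", "B"), ("A", "C", "B")}  # final set (ends in B) and initial set (starts in A)
V = {("B", "C", "B")}              # initial set (starts in B)
UV = concatenate(U, V)

print(sorted(UV))
print(probability(UV), probability(U) * conditional_probability(V))  # 1/4 1/4
print(conditional_probability(UV),
      conditional_probability(U) * conditional_probability(V))      # 1/2 1/2
```

The shared state appears once in each concatenated sequence, which is why the first element of every sequence in $V$ is dropped when gluing.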
It can now be observed that the initial probability $I_S = \mathbb{P}(X_1 = S)$, indicating that the Markov chain $\{X_i \mid i \geq 1\}$ departs from state $S$, in fact represents the probability $\mathbb{P}((S))$ on the finite state sequence $(S)$. On the other hand, the probability $P_{S,T} = \mathbb{P}(X_{i+1} = T \mid X_i = S)$, indicating that the Markov chain $\{X_i \mid i \geq 1\}$ transfers from state $S$ to state $T$ (at any time-epoch $i \geq 1$), in fact represents the probability $\mathbb{P}^*((S, T))$ on the finite state sequence $(S, T)$ by the property of time-homogeneity.
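Both observations can be verified by substituting the one-element and two-element sequences into the definitions of the probability on finite state sequences and of the conditional probability on an initial set. The short derivation below is a sketch that assumes $I_S > 0$, so that the division in the definition of $\mathbb{P}^*$ is defined.

```latex
\begin{align*}
\mathbb{P}((S)) &= I_S \cdot \prod_{i=1}^{0} P_{S_i,S_{i+1}} = I_S
  && \text{(empty product for a sequence of length 1)} \\
\mathbb{P}^*((S,T)) &= \frac{1}{I_S}\,\mathbb{P}((S,T))
  = \frac{1}{I_S} \cdot I_S \cdot P_{S,T} = P_{S,T}
  && \text{(initial set } \{(S,T)\} \text{ with initial state } S\text{)}
\end{align*}
```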