1.6 Stochastic Processes
1.6.3 Markov Chains
The simplest stochastic process is a sequence of exchangeable random variables; that is, a sequence with no structure. A simple structure can be imposed by substituting conditioning for independence. A sequence of random variables with the Markov property is called a Markov process. A Markov process in which the state space is countable is called a Markov chain. (The term “Markov chain” is also sometimes used to refer to any Markov process, as in the phrase “Markov chain Monte Carlo”, in applications of which the state space is often continuous.)
The theory of Markov chains is usually developed first for discrete-time chains, that is, those with a countable index set, and then extended to continuous-time chains.
If the state space is countable, it is equivalent to $\mathcal{X} = \{1, 2, \ldots\}$. If $X$ is a random variable from some sample space to $\mathcal{X}$, and
$$\pi_i = \Pr(X = i), \qquad (1.270)$$
then the vector $\pi = (\pi_1, \pi_2, \ldots)$ defines a distribution of $X$ on $\mathcal{X}$. Formally, we define a Markov chain (of random variables) $X_0, X_1, \ldots$ in terms of an initial distribution $\pi$ and a conditional distribution for $X_{t+1}$ given $X_t$. Let $X_0$ have distribution $\pi$, and given $X_t = j$, let $X_{t+1}$ have distribution $(p_{ij}; i \in \mathcal{X})$; that is, $p_{ij}$ is the probability of a transition from state $j$ at time $t$ to state $i$ at time $t+1$, and $K = (p_{ij})$ is called the transition matrix of the chain. The initial distribution $\pi$ and the transition matrix $K$ characterize the chain, which we sometimes denote as Markov$(\pi, K)$. It is clear that $K$ is a stochastic matrix, and hence $\rho(K) = \|K\|_\infty = 1$, and $(1, 1)$ is an eigenpair of $K$.
If $K$ does not depend on the time (and our notation indicates that we are assuming this), the Markov chain is stationary.
A discrete-time Markov chain $\{X_t\}$ with discrete state space $\{x_1, x_2, \ldots\}$ can be characterized by the probabilities $p_{ij} = \Pr(X_{t+1} = x_i \mid X_t = x_j)$. Clearly, $\sum_{i \in I} p_{ij} = 1$. A vector such as $p_{*j}$ whose elements sum to 1 is called a stochastic vector or a distribution vector.
Because for each $j$, $\sum_{i \in I} p_{ij} = 1$, $K$ is a right stochastic matrix.
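The definition of Markov$(\pi, K)$ can be made concrete with a small simulation. The following is a minimal sketch, not a definitive implementation: it assumes a finite state space with states labeled 0, 1, ... (0-based, unlike the 1-based labels in the text), stores $K$ as a nested list with `K[i][j]` $= p_{ij}$, and uses a hypothetical two-state matrix chosen only for illustration. The names `is_column_stochastic` and `simulate_chain` are not from the text.

```python
import random

def is_column_stochastic(K, tol=1e-12):
    """Check that K is square, nonnegative, and that every column sums to 1."""
    n = len(K)
    if any(len(row) != n for row in K):
        return False
    if any(p < 0 for row in K for p in row):
        return False
    return all(abs(sum(K[i][j] for i in range(n)) - 1.0) <= tol
               for j in range(n))

def simulate_chain(pi0, K, n_steps, rng):
    """Draw X_0 from pi0, then X_{t+1} from column X_t of K (0-based states)."""
    states = list(range(len(pi0)))
    x = rng.choices(states, weights=pi0)[0]
    path = [x]
    for _ in range(n_steps):
        # Column x of K is the distribution of X_{t+1} given X_t = x.
        column = [K[i][x] for i in states]
        x = rng.choices(states, weights=column)[0]
        path.append(x)
    return path

# Hypothetical two-state example: K[i][j] = Pr(X_{t+1} = i | X_t = j).
K = [[0.9, 0.5],
     [0.1, 0.5]]
pi0 = [1.0, 0.0]          # the chain starts in state 0 with probability 1
path = simulate_chain(pi0, K, 10, random.Random(0))
```

Each column of `K` is a stochastic vector, so the sampling step only ever reads one column at a time; this mirrors the convention that $p_{ij}$ indexes the destination state first.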
The properties of a Markov chain are determined by the properties of the transition matrix. Transition matrices have a number of special properties, which we discuss in Section 0.3.6, beginning on page 818.
(Note that many people who work with Markov chains define the transition matrix as the transpose of $K$ above. This is not a good idea, because in applications with state vectors, the state vectors would naturally have to be row vectors. Until about the middle of the twentieth century, many mathematicians thought of vectors as row vectors; that is, a system of linear equations would be written as $xA = b$. Nowadays, almost all mathematicians think of vectors as column vectors in matrix algebra. Even in some of my previous writings, e.g., Gentle (2007), I have called the transpose of $K$ the transition matrix, and I defined a stochastic matrix in terms of the transpose. The transpose of a right stochastic matrix is a left stochastic matrix, which is what is commonly meant by the unqualified phrase “stochastic matrix”. I think that it is time to adopt a notation that is more consistent with current matrix/vector notation. This is merely a change in notation; no concepts require any change.)
If we assume that $X_t$ is a random variable taking values in $\{x_1, x_2, \ldots\}$ and with a PDF (or probability mass function) given by
$$\Pr(X_t = x_i) = \pi_i^{(t)}, \qquad (1.271)$$
and we write $\pi^{(t)} = (\pi_1^{(t)}, \pi_2^{(t)}, \ldots)$, then the PDF at time $t+1$ is
$$\pi^{(t+1)} = K\pi^{(t)}. \qquad (1.272)$$
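Equation (1.272) is an ordinary matrix-vector product. As a small sketch under stated assumptions (a finite state space, `K[i][j]` $= p_{ij}$ so that columns sum to 1, and a hypothetical two-state matrix), one step of the update can be written as follows; the function name `step` is illustrative only.

```python
def step(K, pi):
    """Return K pi, the distribution at time t+1 given the distribution pi at time t."""
    n = len(pi)
    return [sum(K[i][j] * pi[j] for j in range(n)) for i in range(n)]

# Hypothetical two-state transition matrix with columns summing to 1.
K = [[0.9, 0.5],
     [0.1, 0.5]]
pi_t = [1.0, 0.0]            # all mass on state 0 at time t
pi_next = step(K, pi_t)      # equals column 0 of K, up to rounding
```

Because each column of `K` is a stochastic vector, the result is again a stochastic vector; starting from a point mass on state $j$, one step simply returns column $j$.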
Many properties of a Markov chain depend on whether the transition matrix is reducible or not.
Because 1 is an eigenvalue and the vector $1$ is the eigenvector associated with 1, from equation (0.3.70), we have
$$\lim_{t\to\infty} K^t = 1\pi_s^{\mathrm{T}}, \qquad (1.273)$$
where $\pi_s$ is the Perron vector of $K^{\mathrm{T}}$.
This also gives us the limiting distribution for an irreducible, primitive Markov chain,
$$\lim_{t\to\infty} \pi^{(t)} = \pi_s.$$
The Perron vector has the property $\pi_s = K^{\mathrm{T}}\pi_s$ of course, so this distribution is the invariant distribution of the chain.
The definition means that $(1, 1)$ is an eigenpair of any stochastic matrix. It is also clear that if $K$ is a stochastic matrix, then $\|K\|_\infty = 1$, and because $\rho(K) \le \|K\|$ for any norm and 1 is an eigenvalue of $K$, we have $\rho(K) = 1$. A stochastic matrix may not be positive, and it may be reducible or irreducible. (Hence, $(1, 1)$ may not be the Perron root and Perron eigenvector.)
If the state space is countably infinite, the vectors and matrices have infinite order; that is, they have “infinite dimension”. (Note that this use of “dimension” is different from our standard definition that is based on linear independence.)
We write the initial distribution as $\pi^{(0)}$. A distribution at time $t$ can be expressed in terms of $\pi^{(0)}$ and $K$:
$$\pi^{(t)} = K^t\pi^{(0)}. \qquad (1.274)$$
$K^t$ is often called the $t$-step transition matrix.
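The $t$-step relation (1.274) can be checked numerically by comparing $K^t\pi^{(0)}$ with $t$ successive one-step updates. This is only a sketch: it assumes a finite state space, the convention `K[i][j]` $= p_{ij}$ (columns sum to 1), and a hypothetical two-state matrix; the helper names `matmul`, `matpow`, and `matvec` are illustrative.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(K, t):
    """t-step transition matrix K^t for a nonnegative integer t."""
    n = len(K)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(K, result)
    return result

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical two-state transition matrix with columns summing to 1.
K = [[0.9, 0.5],
     [0.1, 0.5]]
pi0 = [0.0, 1.0]

K3 = matpow(K, 3)                 # 3-step transition matrix
pi3 = matvec(K3, pi0)             # distribution at time 3 via (1.274)

pi_iter = pi0                     # same distribution via three one-step updates
for _ in range(3):
    pi_iter = matvec(K, pi_iter)
```

The two routes agree up to floating-point rounding, and each column of `K3` is again a stochastic vector.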
The transition matrix determines various relationships among the states of a Markov chain. State $i$ is said to be accessible from state $j$ if it can be reached from state $j$ in a finite number of steps. This is equivalent to $(K^t)_{ij} > 0$ for some $t$. If state $i$ is accessible from state $j$ and state $j$ is accessible from state $i$, states $i$ and $j$ are said to communicate. Communication is clearly an equivalence relation. The set of all states that communicate with each other is an equivalence class. States belonging to different equivalence classes do not communicate, although a state in one class may be accessible from a state in a different class. If all states in a Markov chain are in a single equivalence class, the chain is said to be irreducible.
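For a finite chain, accessibility depends only on which entries of $K$ are positive, so the equivalence classes can be found by a graph search. The sketch below assumes the convention `K[i][j]` $= p_{ij}$ (a positive entry is an edge from state $j$ to state $i$), 0-based state labels, and that $t = 0$ is allowed in "$(K^t)_{ij} > 0$ for some $t$", so that every state communicates with itself. The matrix and the function names are hypothetical.

```python
def reachable_from(K, j):
    """States accessible from j: depth-first search along edges j -> i with K[i][j] > 0."""
    n = len(K)
    seen = {j}
    stack = [j]
    while stack:
        s = stack.pop()
        for i in range(n):
            if K[i][s] > 0 and i not in seen:
                seen.add(i)
                stack.append(i)
    return seen

def communicating_classes(K):
    """Partition the states into classes of mutually accessible states."""
    n = len(K)
    reach = [reachable_from(K, j) for j in range(n)]
    classes, assigned = [], set()
    for j in range(n):
        if j in assigned:
            continue
        cls = {i for i in reach[j] if j in reach[i]}
        classes.append(sorted(cls))
        assigned |= cls
    return classes

def is_irreducible(K):
    """A chain is irreducible when all states form a single class."""
    return len(communicating_classes(K)) == 1

# Hypothetical 3-state chain; column j holds the probabilities of leaving state j.
# States 0 and 1 communicate; state 2 is absorbing and accessible from both.
K = [[0.5, 0.3, 0.0],
     [0.5, 0.3, 0.0],
     [0.0, 0.4, 1.0]]
classes = communicating_classes(K)
```

In this example state 2 is accessible from the class containing states 0 and 1, but not conversely, which illustrates accessibility across classes without communication.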
The limiting behavior of the Markov chain is of interest. This of course can be analyzed in terms of $\lim_{t\to\infty} K^t$. Whether or not this limit exists depends on the properties of $K$.
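A small numerical experiment shows both possibilities. The sketch below assumes the update $\pi^{(t+1)} = K\pi^{(t)}$ with `K[i][j]` $= p_{ij}$, and uses two hypothetical two-state matrices: one with all entries positive, for which the iterates settle down, and one that swaps the two states at every step, for which they oscillate and no limit exists.

```python
def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def iterate(K, pi0, n_steps):
    """Apply the one-step update n_steps times, starting from pi0."""
    pi = list(pi0)
    for _ in range(n_steps):
        pi = matvec(K, pi)
    return pi

K_positive = [[0.9, 0.5],     # every entry positive: the iterates converge
              [0.1, 0.5]]
K_periodic = [[0.0, 1.0],     # deterministic swap: irreducible but periodic
              [1.0, 0.0]]
pi0 = [1.0, 0.0]

limit = iterate(K_positive, pi0, 200)   # close to (5/6, 1/6)
even = iterate(K_periodic, pi0, 200)    # back at (1, 0)
odd = iterate(K_periodic, pi0, 201)     # at (0, 1)
```

For `K_positive` the vector $(5/6, 1/6)$ satisfies $K\pi = \pi$, and the iterates approach it geometrically. For `K_periodic` the distribution alternates between the two point masses, so the sequence has no limit from this starting point.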
Galton-Watson Process
An interesting class of Markov chains is the class of branching processes, which model numbers of particles generated by existing particles. One of the simplest branching processes is the Galton-Watson process, in which at time $t$ each particle is assumed to be replaced by $0, 1, 2, \ldots$ particles with probabilities $\pi_0, \pi_1, \pi_2, \ldots$, where $\pi_k \ge 0$, $\pi_0 + \pi_1 < 1$, and $\sum_k \pi_k = 1$. The replacements of all particles at any time $t$ are independent of each other. The condition $\pi_0 + \pi_1 < 1$ prevents the process from being trivial.
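The replacement mechanism is easy to simulate. The following is a minimal sketch, assuming an offspring distribution with finite support given as a list `offspring_probs` (entry `k` is $\pi_k$) and a process started from a single particle; the function name, the starting size `z0`, and the example probabilities are all hypothetical.

```python
import random

def galton_watson(offspring_probs, n_generations, rng, z0=1):
    """Return the population sizes Z_0, ..., Z_n of a Galton-Watson process."""
    if any(p < 0 for p in offspring_probs) or abs(sum(offspring_probs) - 1.0) > 1e-12:
        raise ValueError("offspring_probs must be a probability distribution")
    if offspring_probs[0] + offspring_probs[1] >= 1.0:
        raise ValueError("need pi_0 + pi_1 < 1 for a nontrivial process")
    counts = list(range(len(offspring_probs)))
    sizes = [z0]
    for _ in range(n_generations):
        z = sizes[-1]
        # Each of the z particles is replaced independently; k=0 yields no draws.
        children = rng.choices(counts, weights=offspring_probs, k=z)
        sizes.append(sum(children))
    return sizes

# Hypothetical offspring distribution: 0, 1, or 2 particles.
sizes = galton_watson([0.25, 0.25, 0.5], 10, random.Random(0))
```

The population size at time $t+1$ depends on the past only through the size at time $t$, which is what makes the sequence of sizes a Markov chain on the nonnegative integers; once the size reaches 0 it stays there.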
*** add more
Continuous Time Markov Chains
In many cases it seems natural to allow the index of the Markov process to range over a continuous interval. The simplest type of continuous time Markov chain is a Poisson process.
Example 1.32 Poisson process
Consider a sequence of iid random variables, $Y_1, Y_2, \ldots$, distributed as exponential$(0, \theta)$, and build the random variables $T_k = \sum_{i=1}^{k} Y_i$. (The $Y_i$s are the exponential spacings as in Example 1.18.) *** prove Markov property *** complete
birth process *** add
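The construction in Example 1.32 can be sketched in a few lines. This is an illustration under an explicit assumption: exponential$(0, \theta)$ is read as the exponential distribution with location 0 and scale (mean) $\theta$, so the rate passed to `random.expovariate` is $1/\theta$. The names `arrival_times` and `count_at`, and the value of `theta`, are hypothetical.

```python
import random

def arrival_times(theta, n, rng):
    """Return T_1, ..., T_n, where T_k is the sum of k iid exponential spacings with mean theta."""
    times = []
    t = 0.0
    for _ in range(n):
        t += rng.expovariate(1.0 / theta)   # expovariate takes the rate, 1/theta
        times.append(t)
    return times

def count_at(times, t):
    """Number of arrivals T_k <= t (meaningful only for t up to the last simulated time)."""
    return sum(1 for s in times if s <= t)

rng = random.Random(1)
T = arrival_times(theta=2.0, n=1000, rng=rng)
n_by_10 = count_at(T, 10.0)     # value of the counting process at time 10
```

The counting process obtained from `count_at` is the continuous-time, integer-valued process; the list `T` only stores its jump times.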
$$K(t) = e^{tR}$$
*** fix $R$ intensity rate. $r_{ii}$ nonpositive, $r_{ij}$ for $i \ne j$ nonnegative, $\sum_{i \in I} r_{ij} = 0$ for all $j$.
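For a finite state space, $K(t) = e^{tR}$ can be evaluated numerically. The sketch below is an assumption-laden illustration: it approximates the matrix exponential by a truncated power series (adequate only when the entries of $tR$ are small), and it uses a hypothetical two-state intensity matrix whose columns sum to zero, with rates `a` and `b` chosen arbitrarily. For this two-state case the $(0,0)$ entry has the closed form $b/(a+b) + \frac{a}{a+b}e^{-(a+b)t}$, which gives an independent check.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(R, t, n_terms=60):
    """Approximate exp(tR) by the truncated series sum_k (tR)^k / k!."""
    n = len(R)
    A = [[t * r for r in row] for row in R]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, n_terms):
        # term becomes (tR)^k / k! from (tR)^(k-1) / (k-1)!
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical intensity matrix: diagonal nonpositive, off-diagonal nonnegative,
# and each column sums to zero.
a, b = 1.0, 2.0
R = [[-a, b],
     [a, -b]]
K_half = expm_taylor(R, 0.5)
closed_form_00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 0.5)
```

Because the columns of `R` sum to zero, the columns of `K_half` sum to 1, so $K(t)$ has the same column-sum property as the discrete-time transition matrix. A production computation would normally use a library routine rather than a raw power series.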