INTRODUCTORY APPLICATIONS
2.2 Finite Markov Chains
Contents
• Finite Markov Chains – Overview – Definitions – Simulation
– Marginal Distributions – Stationary Distributions – Ergodicity
– Forecasting Future Values – Exercises
– Solutions
Overview
Markov chains are one of the most useful classes of stochastic processes Attributes:
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• very useful in their own right
You will find them in many of the workhorse models of economics and finance
In this lecture we review some of the theory of Markov chains, with a focus on numerical methods Prerequisite knowledge is basic probability and linear algebra
Definitions
The following concepts are fundamental
Stochastic Matrices A stochastic matrix (or Markov matrix) is an n×n square matrix P = P[i, j] such that
1. each element P[i, j]is nonnegative, and 2. each row P[i,·]sums to one
Let S := {0, . . . , n−1}
Evidently, each row P[i,·]can be regarded as a distribution (probability mass function) on S It is not difficult to check3that if P is a stochastic matrix, then so is the k-th power Pkfor all k∈N
Markov Chains A stochastic matrix describes the dynamics of a Markov chain{Xt} that takes values in the state space S
Formally, we say that a discrete time stochastic process{Xt}taking values in S is a Markov chain with stochastic matrix P if
P{Xt+1= j|Xt=i} =P[i, j] for any t≥0 and i, j ∈S; hereP means probability
Remark: This definition implies that{Xt}has the Markov property, which is to say that, for any t, P{Xt+1|Xt} =P{Xt+1|Xt, Xt−1, . . .}
Thus the state Xtis a complete description of the current position of the system Thus, by construction,
• P[i, j]is the probability of going from i to j in one unit of time (one step)
• P[i,·]is the conditional distribution of Xt+1given Xt =i
Another way to think about this process is to imagine that, when Xt = i, the next value Xt+1 is drawn from the i-th row P[i,·]
Rephrasing this using more algorithmic language
• At each t, the new state Xt+1is drawn from P[Xt,·]
Example 1 Consider a worker who, at any given time t, is either unemployed (state 0) or em-ployed (state 1)
Let’s write this mathematically as Xt =0 or Xt =1 Suppose that, over a one month period,
1. An employed worker loses her job and becomes unemployed with probability β∈ (0, 1) 2. An unemployed worker finds a job with probability α ∈ (0, 1)
3 Hint: First show that if P and Q are stochastic matrices then so is their product — to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that Pnis a stochastic matrix using induction.
In terms of a stochastic matrix, this tells us that P[0, 1] =αand P[1, 0] = β, or P=
1−α α β 1−β
Once we have the values α and β, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?
• Etc.
We’ll cover such applications below
Example 2 Using US unemployment data, Hamilton[Ham05]estimated the stochastic matrix
P :=
0.971 0.029 0 0.145 0.778 0.077
0 0.508 0.492
where
• the frequency is monthly
• the first state represents “normal growth”
• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process{Xt}
This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities
Here “ng” is normal growth, “mr” is mild recession, etc.
Simulation
One of the most natural ways to answer questions about Markov chains is to simulate them (As usual, to approximate the probability of event E, we can simulate many times and count the fraction of times that E occurs)
To simulate a Markov chain, we need its stochastic matrix P and a probability distribution ψ for the initial state
Here ψ is a probability distribution on S with the interpretation that X0is drawn from ψ The Markov chain is then constructed via the following two rules
1. At time t=0, the initial state X0is drawn from ψ
2. At each subsequent time t, the new state Xt+1is drawn from P[Xt,·]
In order to implement this simulation procedure, we need a function for generating draws from a given discrete distribution
We already have this functionality in hand—in the filediscrete_rv.py
The module is part of theQuantEconpackage, and defines a class DiscreteRV that can be used as follows
In [64]: from quantecon import DiscreteRV In [65]: psi = (0.1, 0.9)
In [66]: d = DiscreteRV(psi) In [67]: d.draw(5)
Out[67]: array([0, 1, 1, 1, 1]) Here
• psi is understood to be a discrete distribution on the set of outcomes 0, ..., len(psi) -1
• d.draw(5) generates 5 independent draws from this distribution
Let’s now write a function that generates time series from a specified pair P, ψ Our function will take the following three arguments
• A stochastic matrix P,
• An initial state or distribution init
• A positive integer sample_size representing the length of the time series the function should return
Let’s allow init to either be
• an integer in 0, . . . , n−1 providing a fixed starting value for X0, or
• a discrete distribution on this same set that corresponds to the initial distribution ψ
In the latter case, a random starting value for X0is drawn from the distribution init The function should return a time series (sample path) of length sample_size
One solution to this problem can be found in filemc_tools.pyfrom theQuantEconpackage The relevant function is mc_sample_path
Let’s see how it works using the small matrix P :=
0.4 0.6 0.2 0.8
(2.5) It happens to be true that, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25 — we’ll see why later on
If you run the following code you should get roughly that answer import numpy as np
from quantecon import mc_sample_path P = np.array([[.4, .6], [.2, .8]])
s = mc_sample_path(P, init=(0.5, 0.5), sample_size=100000) print((s == 0).mean()) # Should be about 0.25
Marginal Distributions
Suppose that
1. {Xt}is a Markov chain with stochastic matrix P 2. the distribution of Xtis known to be ψt
What then is the distribution of Xt+1, or, more generally, of Xt+m? (Motivation for these questions is given below)
Solution Let’s consider how to solve for the distribution ψt+mof Xt+m, beginning with the case m=1
Throughout, ψtwill refer to the distribution of Xtfor all t Hence our first aim is to find ψt+1given ψtand P
To begin, pick any j∈ S.
Using thelaw of total probability, we can decompose the probability that Xt+1 = j as follows:
P{Xt+1= j} =
∑
i∈S
P{Xt+1= j|Xt =i} ·P{Xt=i}
(In words, to get the probability of being at j tomorrow, we account for all ways this can happen and sum their probabilities)
Rewriting this statement in terms of marginal and conditional probabilities gives ψt+1[j] =
∑
i∈S
P[i, j]ψt[i]
There are n such equations, one for each j∈ S
If we think of ψt+1and ψtas row vectors, these n equations are summarized by the matrix expres-sion
ψt+1=ψtP
In other words, to move the distribution forward one unit of time, we postmultiply by P By repeating this m times we move forward m steps into the future
Hence ψt+m =ψtPmis also valid — here Pmis the m-th power of P As a special case, we see that if ψ0is the initial distribution from which X0is drawn, then ψ0Pmis the distribution of Xm
This is very important, so let’s repeat it
X0 ∼ψ0 =⇒ Xm∼ ψ0Pm (2.6)
and, more generally,
Xt ∼ψt =⇒ Xt+m ∼ψtPm (2.7)
Note: Unless stated otherwise, we follow the common convention in the Markov chain literature that distributions are row vectors
Example: Powers of a Markov Matrix We know that the probability of transitioning from i to j in one step is P[i, j]
It turns out that that the probability of transitioning from i to j in m steps is Pm[i, j], the[i, j]-th element of the m-th power of P
To see why, consider again (2.7), but now with ψtput all probability on state i
If we regard ψtas a vector, it is a vector with 1 in the i-th position and zero elsewhere
Inserting this into (2.7), we see that, conditional on Xt =i, the distribution of Xt+mis the i-th row of Pm
In particular
P{Xt+m= j} =Pm[i, j] = [i, j]-th element of Pm
Example: Future Probabilities Recall the stochastic matrix P for recession and growthconsidered above
Suppose that the current state is unknown — perhaps statistics are available only at the end of the current month
We estimate the probability that the economy is in state i to be ψ[i]
The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner product
Example 2: Cross-Sectional Distributions Recall our model of employment / unemployment dynamics for a given workerdiscussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime experi-ences are described by the specified dynamics, independently of one another
Let ψ be the current cross-sectional distribution over{0, 1}
• For example, ψ[0]is the unemployment rate
The cross-sectional distribution records the fractions of workers employed and unemployed at a given moment
The same distribution also describes the fractions of a particular worker’s career spent being em-ployed and unemem-ployed, respectively
Stationary Distributions
As stated in the previous section, we can shift probabilities forward one unit of time via postmul-tiplication by P
Some distributions are invariant under this updating process — for example, In [2]: P = np.array([[.4, .6], [.2, .8]]) # after import numpy as np In [3]: psi = (0.25, 0.75)
In [4]: np.dot(psi, P)
Out[4]: array([ 0.25, 0.75])
Such distributions are called stationary, or invariant Formally, a distribution ψ∗ on S is called stationary for P if ψ∗= ψ∗P
From this equality we immediately get ψ∗ =ψ∗Ptfor all t
This tells us an important fact: If the distribution of X0 is a stationary distribution, then Xt will have this same distribution for all t
Hence stationary distributions have a natural interpretation as stochastic steady states — we’ll discuss this more in just a moment
Mathematically, a stationary distribution is just a fixed point of P when P is thought of as the map ψ7→ψPfrom (row) vectors to (row) vectors
At least one such distribution exists for each stochastic matrix P — applyBrouwer’s fixed point theorem, or seeEDTC, theorem 4.3.5
There may in fact be many stationary distributions corresponding to a given stochastic matrix P For example, if P is the identity matrix, then all distributions are stationary
One sufficient condition for uniqueness is uniform ergodicity:
Def. Stochastic matrix P is called uniformly ergodic if there exists a positive integer m such that all elements of Pmare strictly positive
For further details on uniqueness and uniform ergodicity, see, for example,EDTC, theorem 4.3.18
Example Recall our model of employment / unemployment dynamics for a given worker dis-cussed above
Assuming α∈ (0, 1)and β∈ (0, 1), the uniform ergodicity condition is satisfied
Let ψ∗ = (p, 1−p)be the stationary distribution, so that p corresponds to unemployment (state 0)
Using ψ∗ =ψ∗P and a bit of algebra yields
p= β α+β
This is, in some sense, a steady state probability of unemployment — more on interpretation below Not surprisingly it tends to zero as β→0, and to one as α→0
Calculating Stationary Distribution As discussed above, a given Markov matrix P can have many stationary distributions
That is, there can be many row vectors ψ such that ψ=ψP
In fact if P has two distinct stationary distributions ψ1, ψ2then it has infinitely many, since in this case, as you can verify,
ψ3:= λψ1+ (1−λ)ψ2 is a stationary distribuiton for P for any λ∈ [0, 1]
If we restrict attention to the case where only one stationary distribution exists, one option for finding it is to try to solve the linear system ψ(In−P) =0 for ψ, where Inis the n×n identity But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution One function that will do this for us and implement a suitable algorithm is mc_compute_stationary from mc_tools.py
Let’s test it using the matrix (2.5) import numpy as np
from quantecon import mc_compute_stationary P = np.array([[.4, .6], [.2, .8]])
print(mc_compute_stationary(P))
If you run this you should find that the unique stationary distribution is (0.25, 0.75)
Convergence to Stationarity Let P be a stochastic matrix such that the uniform ergodicity as-sumption is valid
We know that under this condition there is a unique stationary distribution ψ∗
In fact, under the same condition, we have another important result: for any nonnegative row vector ψ summing to one (i.e., distribution),
ψPt→ψ∗ as t→∞ (2.8)
In view of ourpreceding discussion, this states that the distribution of Xtconverges to ψ∗, regardless of the distribution of X0
This adds considerable weight to our interpretation of ψ∗as a stochastic steady state For one of several well-known proofs, seeEDTC, theorem 4.3.18
The convergence in (2.8) is illustrated in the next figure
Here
• P is the stochastic matrix for recession and growthconsidered above
• The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as a vector inR3
• The other red dots are the distributions ψPtfor t=1, 2, . . .
• The black dot is ψ∗
The code for the figure can be found in the file examples/mc_convergence_plot.py in themain repository— you might like to try experimenting with different initial conditions
Ergodicity
Under the very same condition of uniform ergodicity, yet another important result obtains: If 1. {Xt}is a Markov chain with stochastic matrix P
2. P is uniformly ergodic with stationary distribution ψ∗ then,∀j∈ S,
1 n
∑
n t=11{Xt =j} →ψ∗[j] as n→∞ (2.9) Here
• 1{Xt =j} =1 if Xt= j and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of X0
The result tells us that the fraction of time the chain spends at state j converges to ψ∗[j]as time goes to infinity This gives us another way to interpret the stationary distribution — provided that the convergence result in (2.9) is valid
Technically, the convergence in (2.9) is a special case of a law of large numbers result for Markov chains — seeEDTC, section 4.3.4 for details
Example Recall our cross-sectional interpretation of the employment / unemployment model discussed above
Assume that α∈ (0, 1)and β∈ (0, 1), so the uniform ergodicity condition is satisfied We saw that the stationary distribution is(p, 1−p), where
p= β α+β
In the cross-sectional interpretation, this is the fraction of people unemployed
In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed
Thus, in the long-run, cross-sectional averages for a population and time-series averages for a given person coincide
This is one interpretation of the notion of ergodicity
Forecasting Future Values
Let P be an n×n stochastic matrix with
Pij =P{xt+1 =ej|xt =ei} where eiis the i-th unit vector inRn.
We are said to be “in state i” when xt=ei Let ¯y be an n×1 vector and let yt = ¯y0xt In other words, yt = ¯yi if xt =ei
Here are some useful prediction formulas:
Premultiplication by(I−βP)−1amounts to “applying the resolvent operator“
Exercises
Exercise 1 According to the discussionimmediately above, if a worker’s employment dynamics obey the stochastic matrix
P= Your exercise is to illustrate this convergence
First,
• generate one simulated time series{Xt}of length 10,000, starting at X0=0
• plot ¯Xn−p against n, where p is as defined above Second, repeat the first step, but this time taking X0 =1 In both cases, set α= β=0.1
The result should look something like the following — modulo randomness, of course (You don’t need to add the fancy touches to the graph—see the solution if you’re interested)