Finite Markov Chains - INTRODUCTORY APPLICATIONS

2.2 Finite Markov Chains

Contents

• Finite Markov Chains

  – Overview
  – Definitions
  – Simulation
  – Marginal Distributions
  – Stationary Distributions
  – Ergodicity
  – Forecasting Future Values
  – Exercises
  – Solutions

Overview

Markov chains are one of the most useful classes of stochastic processes

They are:

• simple, flexible and supported by many elegant theoretical results

• valuable for building intuition about random dynamic models

• very useful in their own right

You will find them in many of the workhorse models of economics and finance

In this lecture we review some of the theory of Markov chains, with a focus on numerical methods

Prerequisite knowledge is basic probability and linear algebra

Definitions

The following concepts are fundamental

Stochastic Matrices

A stochastic matrix (or Markov matrix) is an n × n square matrix P = P[i, j] such that

1. each element P[i, j] is nonnegative, and

2. each row P[i, ·] sums to one

Let S := {0, . . . , n − 1}

Evidently, each row P[i, ·] can be regarded as a distribution (probability mass function) on S

It is not difficult to check³ that if P is a stochastic matrix, then so is the k-th power P^k for all k ∈ ℕ
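The two defining conditions, and the claim about powers, are easy to check numerically. The following is a minimal sketch using NumPy only; the helper name is_stochastic and the 2 × 2 matrix are illustrative choices, not part of the QuantEcon package.

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """Return True if P is square, nonnegative and has rows summing to one."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    nonnegative = bool(np.all(P >= 0))
    rows_sum_to_one = bool(np.allclose(P.sum(axis=1), 1.0, atol=tol))
    return nonnegative and rows_sum_to_one

# Illustrative matrix (an assumption for this sketch)
P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# The k-th power of a stochastic matrix should again be stochastic
P5 = np.linalg.matrix_power(P, 5)
print(is_stochastic(P), is_stochastic(P5))
```

Checking a single power is of course no proof; it only illustrates the statement for one matrix and one value of k.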

Markov Chains

A stochastic matrix describes the dynamics of a Markov chain {X_t} that takes values in the state space S

Formally, we say that a discrete time stochastic process {X_t} taking values in S is a Markov chain with stochastic matrix P if

$$\mathbb{P}\{X_{t+1} = j \mid X_t = i\} = P[i, j]$$

for any t ≥ 0 and i, j ∈ S; here ℙ means probability

Remark: This definition implies that {X_t} has the Markov property, which is to say that, for any t,

$$\mathbb{P}\{X_{t+1} \mid X_t\} = \mathbb{P}\{X_{t+1} \mid X_t, X_{t-1}, \ldots\}$$

Thus the state X_t is a complete description of the current position of the system

Hence, by construction,

• P[i, j] is the probability of going from i to j in one unit of time (one step)

• P[i, ·] is the conditional distribution of X_{t+1} given X_t = i

Another way to think about this process is to imagine that, when X_t = i, the next value X_{t+1} is drawn from the i-th row P[i, ·]

Rephrasing this using more algorithmic language

• At each t, the new state X_{t+1} is drawn from P[X_t, ·]

Example 1

Consider a worker who, at any given time t, is either unemployed (state 0) or employed (state 1)

Let’s write this mathematically as X_t = 0 or X_t = 1

Suppose that, over a one month period,

1. An employed worker loses her job and becomes unemployed with probability β ∈ (0, 1)

2. An unemployed worker finds a job with probability α ∈ (0, 1)

In terms of a stochastic matrix, this tells us that P[0, 1] = α and P[1, 0] = β, or

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$

³ Hint: First show that if P and Q are stochastic matrices then so is their product (to check the row sums, try postmultiplying by a column vector of ones). Finally, argue that P^k is a stochastic matrix using induction.

Once we have the values α and β, we can address a range of questions, such as

• What is the average duration of unemployment?

• Over the long-run, what fraction of time does a worker find herself unemployed?

• Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?

• Etc.

We’ll cover such applications below

Example 2

Using US unemployment data, Hamilton [Ham05] estimated the stochastic matrix

$$P := \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$

where

• the frequency is monthly

• the first state represents “normal growth”

• the second state represents “mild recession”

• the third state represents “severe recession”

For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month with probability 0.971

In general, large values on the main diagonal indicate persistence in the process {X_t}

This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities

Here “ng” is normal growth, “mr” is mild recession, etc.

Simulation

One of the most natural ways to answer questions about Markov chains is to simulate them

(As usual, to approximate the probability of event E, we can simulate many times and count the fraction of times that E occurs)

To simulate a Markov chain, we need its stochastic matrix P and a probability distribution ψ for the initial state

Here ψ is a probability distribution on S with the interpretation that X₀ is drawn from ψ

The Markov chain is then constructed via the following two rules

1. At time t = 0, the initial state X₀ is drawn from ψ

2. At each subsequent time t, the new state X_{t+1} is drawn from P[X_t, ·]

In order to implement this simulation procedure, we need a function for generating draws from a given discrete distribution

We already have this functionality in hand, in the file discrete_rv.py

The module is part of the QuantEcon package, and defines a class DiscreteRV that can be used as follows

In [64]: from quantecon import DiscreteRV

In [65]: psi = (0.1, 0.9)

In [66]: d = DiscreteRV(psi)

In [67]: d.draw(5)

Out[67]: array([0, 1, 1, 1, 1])

Here

• psi is understood to be a discrete distribution on the set of outcomes 0, ..., len(psi) -1

• d.draw(5) generates 5 independent draws from this distribution
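For intuition about what such a draw involves, here is a minimal NumPy sketch of one common technique, inverse transform sampling. It is not the DiscreteRV source code; the function name draw_discrete is a hypothetical one chosen for this illustration.

```python
import numpy as np

def draw_discrete(psi, k, rng):
    """Draw k independent outcomes from the distribution psi on 0, ..., len(psi) - 1."""
    cdf = np.cumsum(psi)              # cumulative probabilities
    u = rng.uniform(size=k)           # uniform draws on [0, 1)
    # Outcome i is selected when u falls in [cdf[i-1], cdf[i])
    idx = np.searchsorted(cdf, u, side="right")
    # Guard against cdf[-1] falling slightly below one through rounding
    return np.minimum(idx, len(cdf) - 1)

rng = np.random.default_rng(0)        # seeded only to make the sketch reproducible
draws = draw_discrete((0.1, 0.9), 100_000, rng)
print(draws[:5], draws.mean())        # the mean should be close to 0.9
```

Each uniform draw lands in the i-th interval of the cumulative distribution with probability psi[i], which is why the resulting outcomes have the required distribution.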

Let’s now write a function that generates time series from a specified pair P, ψ

Our function will take the following three arguments

• A stochastic matrix P,

• An initial state or distribution init

• A positive integer sample_size representing the length of the time series the function should return

Let’s allow init to either be

• an integer in 0, . . . , n−1 providing a fixed starting value for X₀, or

• a discrete distribution on this same set that corresponds to the initial distribution ψ

In the latter case, a random starting value for X₀ is drawn from the distribution init

The function should return a time series (sample path) of length sample_size

One solution to this problem can be found in the file mc_tools.py from the QuantEcon package

The relevant function is mc_sample_path

Let’s see how it works using the small matrix

$$P := \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix} \tag{2.5}$$

It happens to be true that, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25 (we’ll see why later on)

If you run the following code you should get roughly that answer

import numpy as np
from quantecon import mc_sample_path

P = np.array([[.4, .6], [.2, .8]])
s = mc_sample_path(P, init=(0.5, 0.5), sample_size=100000)
print((s == 0).mean())  # Should be about 0.25
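If the QuantEcon package is not at hand, the two simulation rules can be sketched with NumPy alone. The function below is an illustrative stand-in written for this purpose, not the mc_sample_path implementation; the name sample_path and the seed argument are assumptions.

```python
import numpy as np

def sample_path(P, init, sample_size, seed=None):
    """Simulate a Markov chain with stochastic matrix P.

    init is either an integer starting state or an initial distribution.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    cdfs = np.cumsum(P, axis=1)           # one cumulative distribution per row
    u = rng.uniform(size=sample_size)     # one uniform draw per period
    X = np.empty(sample_size, dtype=int)

    if isinstance(init, (int, np.integer)):
        X[0] = init                       # fixed starting value
    else:
        init_cdf = np.cumsum(init)        # X_0 is drawn from the distribution init
        X[0] = min(np.searchsorted(init_cdf, u[0], side="right"), n - 1)

    for t in range(sample_size - 1):
        # X_{t+1} is drawn from row P[X_t, .]
        X[t + 1] = min(np.searchsorted(cdfs[X[t]], u[t + 1], side="right"), n - 1)
    return X

P = np.array([[0.4, 0.6], [0.2, 0.8]])
s = sample_path(P, init=(0.5, 0.5), sample_size=100_000, seed=0)
print((s == 0).mean())                    # should be roughly 0.25
```

The explicit loop keeps the correspondence with the two rules visible; it is not written for speed.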

Marginal Distributions

Suppose that

1. {X_t} is a Markov chain with stochastic matrix P

2. the distribution of X_t is known to be ψ_t

What then is the distribution of X_{t+1}, or, more generally, of X_{t+m}? (Motivation for these questions is given below)

Solution

Let’s consider how to solve for the distribution ψ_{t+m} of X_{t+m}, beginning with the case m = 1

Throughout, ψ_t will refer to the distribution of X_t for all t

Hence our first aim is to find ψ_{t+1} given ψ_t and P

To begin, pick any j ∈ S

Using the law of total probability, we can decompose the probability that X_{t+1} = j as follows:

$$\mathbb{P}\{X_{t+1} = j\} = \sum_{i \in S} \mathbb{P}\{X_{t+1} = j \mid X_t = i\} \cdot \mathbb{P}\{X_t = i\}$$

(In words, to get the probability of being at j tomorrow, we account for all ways this can happen and sum their probabilities)

Rewriting this statement in terms of marginal and conditional probabilities gives

$$\psi_{t+1}[j] = \sum_{i \in S} P[i, j] \, \psi_t[i]$$

There are n such equations, one for each j ∈ S

If we think of ψ_{t+1} and ψ_t as row vectors, these n equations are summarized by the matrix expression

$$\psi_{t+1} = \psi_t P$$

In other words, to move the distribution forward one unit of time, we postmultiply by P

By repeating this m times we move forward m steps into the future

Hence ψ_{t+m} = ψ_t P^m is also valid, where P^m is the m-th power of P

As a special case, we see that if ψ₀ is the initial distribution from which X₀ is drawn, then ψ₀P^m is the distribution of X_m

This is very important, so let’s repeat it

$$X_0 \sim \psi_0 \quad \implies \quad X_m \sim \psi_0 P^m \tag{2.6}$$

and, more generally,

$$X_t \sim \psi_t \quad \implies \quad X_{t+m} \sim \psi_t P^m \tag{2.7}$$

Note: Unless stated otherwise, we follow the common convention in the Markov chain literature that distributions are row vectors
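As a small numerical illustration of (2.6), the sketch below moves a row vector forward m steps in two ways: by repeated postmultiplication and by a single multiplication with P^m. The matrix is the one in (2.5); the initial distribution and the value of m are arbitrary choices made for this example.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
psi_0 = np.array([0.5, 0.5])   # illustrative initial distribution (row vector)
m = 3

# One step at a time: psi_{t+1} = psi_t P
psi = psi_0.copy()
for _ in range(m):
    psi = psi @ P

# All at once: psi_m = psi_0 P^m
psi_m = psi_0 @ np.linalg.matrix_power(P, m)

print(psi)     # approximately [0.252 0.748]
print(psi_m)   # the same vector
```

Because distributions are row vectors, the vector sits on the left of the product; writing P @ psi_0 instead would compute something different.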

Example: Powers of a Markov Matrix

We know that the probability of transitioning from i to j in one step is P[i, j]

It turns out that the probability of transitioning from i to j in m steps is P^m[i, j], the [i, j]-th element of the m-th power of P

To see why, consider again (2.7), but now with ψ_t putting all probability on state i

If we regard ψ_t as a vector, it is a vector with 1 in the i-th position and zero elsewhere

Inserting this into (2.7), we see that, conditional on X_t = i, the distribution of X_{t+m} is the i-th row of P^m

In particular

$$\mathbb{P}\{X_{t+m} = j \mid X_t = i\} = P^m[i, j] = [i, j]\text{-th element of } P^m$$
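The argument can be traced numerically with Hamilton’s matrix from Example 2. In the sketch below, a unit row vector concentrated on state 0 is pushed forward two steps and compared with row 0 of P²; the choice m = 2 is only for illustration.

```python
import numpy as np

# Hamilton's matrix from Example 2 (states: 0 = ng, 1 = mr, 2 = sr)
P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])

m = 2
P_m = np.linalg.matrix_power(P, m)

# Distribution that puts all probability on state i = 0
e_0 = np.array([1.0, 0.0, 0.0])
dist_after_m = e_0 @ P_m          # equals row 0 of P^m

print(dist_after_m)
print(P_m[0, 2])                  # two-step probability of ng -> sr
```

The one-step probability P[0, 2] is zero, but the two-step probability is not: the only route from normal growth to severe recession in two months passes through mild recession, giving 0.029 × 0.077 = 0.002233.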

Example: Future Probabilities

Recall the stochastic matrix P for recession and growth considered above

Suppose that the current state is unknown (perhaps statistics are available only at the end of the current month)

We estimate the probability that the economy is in state i to be ψ[i]

The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner product

$$\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
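A sketch of this calculation: by (2.7) the distribution six months ahead is ψP⁶, and taking its inner product with the indicator vector (0, 1, 1) adds up the probabilities of the two recession states. The estimate ψ used below is a made-up value for illustration only.

```python
import numpy as np

# Hamilton's matrix for recession and growth
P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])

psi = np.array([0.8, 0.15, 0.05])        # hypothetical estimate of the current state
recession = np.array([0.0, 1.0, 1.0])    # indicator of states 1 and 2

psi_6 = psi @ np.linalg.matrix_power(P, 6)   # distribution six months ahead
prob_recession = psi_6 @ recession           # inner product with the indicator

print(psi_6)
print(prob_recession)
```

The same pattern works for any event that is a set of states: replace the indicator vector with one that has ones in the positions of interest.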

Example 2: Cross-Sectional Distributions

Recall our model of employment / unemployment dynamics for a given worker discussed above

Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime experiences are described by the specified dynamics, independently of one another

Let ψ be the current cross-sectional distribution over {0, 1}

• For example, ψ[0] is the unemployment rate

The cross-sectional distribution records the fractions of workers employed and unemployed at a given moment

The same distribution also describes the fractions of a particular worker’s career spent being employed and unemployed, respectively

Stationary Distributions

As stated in the previous section, we can shift probabilities forward one unit of time via postmultiplication by P

Some distributions are invariant under this updating process, for example,

In [2]: P = np.array([[.4, .6], [.2, .8]])  # after import numpy as np

In [3]: psi = (0.25, 0.75)

In [4]: np.dot(psi, P)

Out[4]: array([ 0.25, 0.75])

Such distributions are called stationary, or invariant

Formally, a distribution ψ^∗ on S is called stationary for P if ψ^∗ = ψ^∗P

From this equality we immediately get ψ^∗ = ψ^∗P^t for all t

This tells us an important fact: If the distribution of X₀ is a stationary distribution, then X_t will have this same distribution for all t

Hence stationary distributions have a natural interpretation as stochastic steady states — we’ll discuss this more in just a moment

Mathematically, a stationary distribution is just a fixed point of P when P is thought of as the map ψ ↦ ψP from (row) vectors to (row) vectors

At least one such distribution exists for each stochastic matrix P: apply Brouwer’s fixed point theorem, or see EDTC, theorem 4.3.5

There may in fact be many stationary distributions corresponding to a given stochastic matrix P

For example, if P is the identity matrix, then all distributions are stationary

One sufficient condition for uniqueness is uniform ergodicity:

Def. Stochastic matrix P is called uniformly ergodic if there exists a positive integer m such that all elements of P^m are strictly positive

For further details on uniqueness and uniform ergodicity, see, for example, EDTC, theorem 4.3.18

Example

Recall our model of employment / unemployment dynamics for a given worker discussed above

Assuming α ∈ (0, 1) and β ∈ (0, 1), the uniform ergodicity condition is satisfied

Let ψ^∗ = (p, 1 − p) be the stationary distribution, so that p corresponds to unemployment (state 0)

Using ψ^∗ = ψ^∗P and a bit of algebra yields

$$p = \frac{\beta}{\alpha + \beta}$$

This is, in some sense, a steady state probability of unemployment (more on interpretation below)

Not surprisingly it tends to zero as β → 0, and to one as α → 0
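For readers who want the algebra spelled out, one way to obtain the expression for p is to write out the first component of ψ^∗ = ψ^∗P, using the employment matrix with rows (1 − α, α) and (β, 1 − β):

```latex
\begin{aligned}
p &= p\,(1 - \alpha) + (1 - p)\,\beta
    && \text{(first component of } \psi^* = \psi^* P \text{)} \\
p\,\alpha &= (1 - p)\,\beta
    && \text{(subtract } p\,(1 - \alpha) \text{ from both sides)} \\
p\,(\alpha + \beta) &= \beta \\
p &= \frac{\beta}{\alpha + \beta}
\end{aligned}
```

The second component of ψ^∗ = ψ^∗P gives the same condition, so no further restriction arises.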

Calculating Stationary Distribution

As discussed above, a given Markov matrix P can have many stationary distributions

That is, there can be many row vectors ψ such that ψ = ψP

In fact if P has two distinct stationary distributions ψ₁, ψ₂ then it has infinitely many, since in this case, as you can verify,

$$\psi_3 := \lambda \psi_1 + (1 - \lambda) \psi_2$$

is a stationary distribution for P for any λ ∈ [0, 1]

If we restrict attention to the case where only one stationary distribution exists, one option for finding it is to try to solve the linear system ψ(I_n − P) = 0 for ψ, where I_n is the n × n identity

But the zero vector solves this equation

Hence we need to impose the restriction that the solution must be a probability distribution

One function that will do this for us and implement a suitable algorithm is mc_compute_stationary from mc_tools.py

Let’s test it using the matrix (2.5)

import numpy as np
from quantecon import mc_compute_stationary

P = np.array([[.4, .6], [.2, .8]])
print(mc_compute_stationary(P))

If you run this you should find that the unique stationary distribution is (0.25, 0.75)
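To see how the restriction can be imposed in practice, here is one possible approach sketched with NumPy: append the equation "entries sum to one" to the linear system and solve the stacked system by least squares. This is not necessarily the algorithm used by mc_compute_stationary, and it assumes the stationary distribution is unique; the function name stationary_distribution is hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve psi (I - P) = 0 together with sum(psi) = 1 (uniqueness assumed)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Transpose so that the unknown psi appears as a column vector:
    # (I - P)^T psi^T = 0, plus one extra row enforcing sum(psi) = 1
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    psi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return psi

P = np.array([[0.4, 0.6], [0.2, 0.8]])
psi_star = stationary_distribution(P)
print(psi_star)            # close to [0.25 0.75]
print(psi_star @ P)        # unchanged by postmultiplication with P
```

The extra row rules out the zero vector, which is the issue noted above with solving ψ(I_n − P) = 0 on its own.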

Convergence to Stationarity

Let P be a stochastic matrix such that the uniform ergodicity assumption is valid

We know that under this condition there is a unique stationary distribution ψ^∗

In fact, under the same condition, we have another important result: for any nonnegative row vector ψ summing to one (i.e., distribution),

$$\psi P^t \to \psi^* \quad \text{as } t \to \infty \tag{2.8}$$

In view of our preceding discussion, this states that the distribution of X_t converges to ψ^∗, regardless of the distribution of X₀

This adds considerable weight to our interpretation of ψ^∗ as a stochastic steady state

For one of several well-known proofs, see EDTC, theorem 4.3.18

The convergence in (2.8) is illustrated in the next figure

Here

• P is the stochastic matrix for recession and growth considered above

• The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as a vector in ℝ³

• The other red dots are the distributions ψP^t for t = 1, 2, . . .

• The black dot is ψ^∗

The code for the figure can be found in the file examples/mc_convergence_plot.py in the main repository (you might like to try experimenting with different initial conditions)
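The convergence in (2.8) can also be checked without a plot. The sketch below is not the code in mc_convergence_plot.py; it simply iterates ψ ↦ ψP from two arbitrarily chosen initial distributions and compares where they end up. The function name and the number of steps are illustrative choices.

```python
import numpy as np

# Hamilton's matrix for recession and growth
P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])

def iterate_distribution(psi, P, num_steps):
    """Return the array of distributions psi, psi P, ..., psi P^num_steps."""
    psi = np.asarray(psi, dtype=float)
    path = [psi]
    for _ in range(num_steps):
        psi = psi @ P
        path.append(psi)
    return np.array(path)

# Two arbitrary starting points (assumptions for this sketch)
path_a = iterate_distribution([0.0, 0.2, 0.8], P, 500)
path_b = iterate_distribution([1.0, 0.0, 0.0], P, 500)

print(path_a[-1])
print(path_b[-1])
print(np.abs(path_a[-1] - path_b[-1]).max())   # gap between the two limits
```

Both paths should settle on the same vector, and that vector should be unchanged by a further postmultiplication by P.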

Ergodicity

Under the very same condition of uniform ergodicity, yet another important result obtains: If

1. {X_t} is a Markov chain with stochastic matrix P

2. P is uniformly ergodic with stationary distribution ψ^∗

then, ∀ j ∈ S,

$$\frac{1}{n} \sum_{t=1}^n \mathbf{1}\{X_t = j\} \to \psi^*[j] \quad \text{as } n \to \infty \tag{2.9}$$

Here

• 1{X_t = j} = 1 if X_t = j and zero otherwise

• convergence is with probability one

• the result does not depend on the distribution (or value) of X₀

The result tells us that the fraction of time the chain spends at state j converges to ψ^∗[j] as time goes to infinity

This gives us another way to interpret the stationary distribution, provided that the convergence result in (2.9) is valid

Technically, the convergence in (2.9) is a special case of a law of large numbers result for Markov chains; see EDTC, section 4.3.4 for details

Example

Recall our cross-sectional interpretation of the employment / unemployment model discussed above

Assume that α ∈ (0, 1) and β ∈ (0, 1), so the uniform ergodicity condition is satisfied

We saw that the stationary distribution is (p, 1 − p), where

$$p = \frac{\beta}{\alpha + \beta}$$

In the cross-sectional interpretation, this is the fraction of people unemployed

In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed

Thus, in the long-run, cross-sectional averages for a population and time-series averages for a given person coincide

This is one interpretation of the notion of ergodicity
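The coincidence of the two averages can be illustrated by simulation. The sketch below compares the unemployment rate in a large simulated population at one late date with the fraction of time a single simulated worker spends unemployed. The parameter values, population size and horizons are arbitrary choices for this illustration.

```python
import numpy as np

alpha, beta = 0.3, 0.1              # illustrative job-finding and job-loss probabilities
p = beta / (alpha + beta)           # stationary probability of unemployment
rng = np.random.default_rng(0)      # seeded only for reproducibility

def step(state, alpha, beta, rng):
    """Advance an array of worker states (0 = unemployed, 1 = employed) by one month."""
    u = rng.uniform(size=state.shape)
    finds_job = (state == 0) & (u < alpha)
    loses_job = (state == 1) & (u < beta)
    new_state = state.copy()
    new_state[finds_job] = 1
    new_state[loses_job] = 0
    return new_state

# Cross-sectional average: many independent workers observed at a single late date
workers = np.zeros(50_000, dtype=int)        # everyone starts unemployed
for _ in range(200):
    workers = step(workers, alpha, beta, rng)
cross_section_rate = (workers == 0).mean()

# Time-series average: one worker followed over many months
num_periods = 100_000
u = rng.uniform(size=num_periods)
x, months_unemployed = 1, 0                  # this worker starts employed
for t in range(num_periods):
    if x == 0:
        x = 1 if u[t] < alpha else 0
    else:
        x = 0 if u[t] < beta else 1
    months_unemployed += (x == 0)
time_series_rate = months_unemployed / num_periods

print(p, cross_section_rate, time_series_rate)
```

All three printed numbers should be close to one another, with the two simulated rates differing from p only by sampling error.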

Forecasting Future Values

Let P be an n × n stochastic matrix with

$$P_{ij} = \mathbb{P}\{x_{t+1} = e_j \mid x_t = e_i\}$$

where e_i is the i-th unit vector in ℝⁿ

We are said to be “in state i” when x_t = e_i

Let ȳ be an n × 1 vector and let y_t = ȳ′x_t

In other words, y_t = ȳ_i if x_t = e_i

Here are some useful prediction formulas:

Premultiplication by (I − βP)⁻¹ amounts to “applying the resolvent operator”
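As a hedged illustration of the kind of calculation involved, the sketch below assumes two standard identities for this setup: the k-step-ahead forecast E[y_{t+k} | x_t = e_i] is the i-th element of P^k ȳ, and the expected discounted sum E[Σ_{j≥0} β^j y_{t+j} | x_t = e_i] is the i-th element of (I − βP)⁻¹ȳ. Here β denotes a discount factor in (0, 1), not the job-loss probability used earlier, and the matrix, the vector ȳ and the parameter values are invented for the example.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
y_bar = np.array([1.0, 5.0])     # hypothetical payoff in each state
discount = 0.95                  # hypothetical discount factor (beta in the text)
k = 3
n = P.shape[0]

# k-step-ahead forecasts, one entry per current state i
forecast_k = np.linalg.matrix_power(P, k) @ y_bar

# Expected discounted sums via the resolvent: solve (I - beta P) v = y_bar
discounted_sum = np.linalg.solve(np.eye(n) - discount * P, y_bar)

# Cross-check with the truncated series I + beta P + beta^2 P^2 + ...
series = np.zeros(n)
term = y_bar.copy()
for _ in range(2000):
    series += term
    term = discount * (P @ term)

print(forecast_k)
print(discounted_sum)
print(series)                    # should agree with discounted_sum
```

Solving the linear system is generally preferable to forming the inverse explicitly; the truncated series is included only as a check on the resolvent calculation.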

Exercises

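Exercise 1

According to the discussion immediately above, if a worker’s employment dynamics obey the stochastic matrix

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$

with α ∈ (0, 1) and β ∈ (0, 1), then, in the long run, the fraction of time spent unemployed will be p = β / (α + β)

In other words, if {X_t} is the Markov chain for employment and X̄_n is the fraction of the first n periods spent in state 0, then X̄_n → p as n → ∞

Your exercise is to illustrate this convergence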

First,

• generate one simulated time series {X_t} of length 10,000, starting at X₀ = 0

• plot X̄_n − p against n, where p is as defined above

Second, repeat the first step, but this time taking X₀ = 1

In both cases, set α = β = 0.1

The result should look something like the following, modulo randomness, of course

(You don’t need to add the fancy touches to the graph; see the solution if you’re interested)
