• No results found

Finite Markov Chains

In document Quantitative Economics With Python (Page 190-200)

INTRODUCTORY APPLICATIONS

2.2 Finite Markov Chains

Contents

• Finite Markov Chains Overview Definitions Simulation

Marginal Distributions Stationary Distributions Ergodicity

Forecasting Future Values Exercises

Solutions

Overview

Markov chains are one of the most useful classes of stochastic processes Attributes:

• simple, flexible and supported by many elegant theoretical results

• valuable for building intuition about random dynamic models

• very useful in their own right

You will find them in many of the workhorse models of economics and finance

In this lecture we review some of the theory of Markov chains, with a focus on numerical methods Prerequisite knowledge is basic probability and linear algebra

Definitions

The following concepts are fundamental

Stochastic Matrices A stochastic matrix (or Markov matrix) is an n×n square matrix P = P[i, j] such that

1. each element P[i, j]is nonnegative, and 2. each row P[i,·]sums to one

Let S := {0, . . . , n−1}

Evidently, each row P[i,·]can be regarded as a distribution (probability mass function) on S It is not difficult to check3that if P is a stochastic matrix, then so is the k-th power Pkfor all k∈N

Markov Chains A stochastic matrix describes the dynamics of a Markov chain{Xt} that takes values in the state space S

Formally, we say that a discrete time stochastic process{Xt}taking values in S is a Markov chain with stochastic matrix P if

P{Xt+1= j|Xt=i} =P[i, j] for any t≥0 and i, j ∈S; hereP means probability

Remark: This definition implies that{Xt}has the Markov property, which is to say that, for any t, P{Xt+1|Xt} =P{Xt+1|Xt, Xt1, . . .}

Thus the state Xtis a complete description of the current position of the system Thus, by construction,

• P[i, j]is the probability of going from i to j in one unit of time (one step)

• P[i,·]is the conditional distribution of Xt+1given Xt =i

Another way to think about this process is to imagine that, when Xt = i, the next value Xt+1 is drawn from the i-th row P[i,·]

Rephrasing this using more algorithmic language

• At each t, the new state Xt+1is drawn from P[Xt,·]

Example 1 Consider a worker who, at any given time t, is either unemployed (state 0) or em-ployed (state 1)

Let’s write this mathematically as Xt =0 or Xt =1 Suppose that, over a one month period,

1. An employed worker loses her job and becomes unemployed with probability β∈ (0, 1) 2. An unemployed worker finds a job with probability α ∈ (0, 1)

3 Hint: First show that if P and Q are stochastic matrices then so is their product — to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that Pnis a stochastic matrix using induction.

In terms of a stochastic matrix, this tells us that P[0, 1] =αand P[1, 0] = β, or P=

 1−α α β 1−β



Once we have the values α and β, we can address a range of questions, such as

• What is the average duration of unemployment?

• Over the long-run, what fraction of time does a worker find herself unemployed?

• Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?

• Etc.

We’ll cover such applications below

Example 2 Using US unemployment data, Hamilton[Ham05]estimated the stochastic matrix

P :=

0.971 0.029 0 0.145 0.778 0.077

0 0.508 0.492

 where

• the frequency is monthly

• the first state represents “normal growth”

• the second state represents “mild recession”

• the third state represents “severe recession”

For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month with probability 0.97

In general, large values on the main diagonal indicate persistence in the process{Xt}

This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities

Here “ng” is normal growth, “mr” is mild recession, etc.

Simulation

One of the most natural ways to answer questions about Markov chains is to simulate them (As usual, to approximate the probability of event E, we can simulate many times and count the fraction of times that E occurs)

To simulate a Markov chain, we need its stochastic matrix P and a probability distribution ψ for the initial state

Here ψ is a probability distribution on S with the interpretation that X0is drawn from ψ The Markov chain is then constructed via the following two rules

1. At time t=0, the initial state X0is drawn from ψ

2. At each subsequent time t, the new state Xt+1is drawn from P[Xt,·]

In order to implement this simulation procedure, we need a function for generating draws from a given discrete distribution

We already have this functionality in hand—in the filediscrete_rv.py

The module is part of theQuantEconpackage, and defines a class DiscreteRV that can be used as follows

In [64]: from quantecon import DiscreteRV In [65]: psi = (0.1, 0.9)

In [66]: d = DiscreteRV(psi) In [67]: d.draw(5)

Out[67]: array([0, 1, 1, 1, 1]) Here

• psi is understood to be a discrete distribution on the set of outcomes 0, ..., len(psi) -1

• d.draw(5) generates 5 independent draws from this distribution

Let’s now write a function that generates time series from a specified pair P, ψ Our function will take the following three arguments

• A stochastic matrix P,

• An initial state or distribution init

• A positive integer sample_size representing the length of the time series the function should return

Let’s allow init to either be

• an integer in 0, . . . , n−1 providing a fixed starting value for X0, or

• a discrete distribution on this same set that corresponds to the initial distribution ψ

In the latter case, a random starting value for X0is drawn from the distribution init The function should return a time series (sample path) of length sample_size

One solution to this problem can be found in filemc_tools.pyfrom theQuantEconpackage The relevant function is mc_sample_path

Let’s see how it works using the small matrix P :=

 0.4 0.6 0.2 0.8



(2.5) It happens to be true that, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25 — we’ll see why later on

If you run the following code you should get roughly that answer import numpy as np

from quantecon import mc_sample_path P = np.array([[.4, .6], [.2, .8]])

s = mc_sample_path(P, init=(0.5, 0.5), sample_size=100000) print((s == 0).mean()) # Should be about 0.25

Marginal Distributions

Suppose that

1. {Xt}is a Markov chain with stochastic matrix P 2. the distribution of Xtis known to be ψt

What then is the distribution of Xt+1, or, more generally, of Xt+m? (Motivation for these questions is given below)

Solution Let’s consider how to solve for the distribution ψt+mof Xt+m, beginning with the case m=1

Throughout, ψtwill refer to the distribution of Xtfor all t Hence our first aim is to find ψt+1given ψtand P

To begin, pick any j∈ S.

Using thelaw of total probability, we can decompose the probability that Xt+1 = j as follows:

P{Xt+1= j} =

iS

P{Xt+1= j|Xt =i} ·P{Xt=i}

(In words, to get the probability of being at j tomorrow, we account for all ways this can happen and sum their probabilities)

Rewriting this statement in terms of marginal and conditional probabilities gives ψt+1[j] =

iS

P[i, j]ψt[i]

There are n such equations, one for each j∈ S

If we think of ψt+1and ψtas row vectors, these n equations are summarized by the matrix expres-sion

ψt+1=ψtP

In other words, to move the distribution forward one unit of time, we postmultiply by P By repeating this m times we move forward m steps into the future

Hence ψt+m =ψtPmis also valid — here Pmis the m-th power of P As a special case, we see that if ψ0is the initial distribution from which X0is drawn, then ψ0Pmis the distribution of Xm

This is very important, so let’s repeat it

X0ψ0 =⇒ Xmψ0Pm (2.6)

and, more generally,

Xtψt =⇒ Xt+mψtPm (2.7)

Note: Unless stated otherwise, we follow the common convention in the Markov chain literature that distributions are row vectors

Example: Powers of a Markov Matrix We know that the probability of transitioning from i to j in one step is P[i, j]

It turns out that that the probability of transitioning from i to j in m steps is Pm[i, j], the[i, j]-th element of the m-th power of P

To see why, consider again (2.7), but now with ψtput all probability on state i

If we regard ψtas a vector, it is a vector with 1 in the i-th position and zero elsewhere

Inserting this into (2.7), we see that, conditional on Xt =i, the distribution of Xt+mis the i-th row of Pm

In particular

P{Xt+m= j} =Pm[i, j] = [i, j]-th element of Pm

Example: Future Probabilities Recall the stochastic matrix P for recession and growthconsidered above

Suppose that the current state is unknown — perhaps statistics are available only at the end of the current month

We estimate the probability that the economy is in state i to be ψ[i]

The probability of being in recession (state 1 or state 2) in 6 months time is given by the inner product

Example 2: Cross-Sectional Distributions Recall our model of employment / unemployment dynamics for a given workerdiscussed above

Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime experi-ences are described by the specified dynamics, independently of one another

Let ψ be the current cross-sectional distribution over{0, 1}

• For example, ψ[0]is the unemployment rate

The cross-sectional distribution records the fractions of workers employed and unemployed at a given moment

The same distribution also describes the fractions of a particular worker’s career spent being em-ployed and unemem-ployed, respectively

Stationary Distributions

As stated in the previous section, we can shift probabilities forward one unit of time via postmul-tiplication by P

Some distributions are invariant under this updating process — for example, In [2]: P = np.array([[.4, .6], [.2, .8]]) # after import numpy as np In [3]: psi = (0.25, 0.75)

In [4]: np.dot(psi, P)

Out[4]: array([ 0.25, 0.75])

Such distributions are called stationary, or invariant Formally, a distribution ψ on S is called stationary for P if ψ= ψP

From this equality we immediately get ψ =ψPtfor all t

This tells us an important fact: If the distribution of X0 is a stationary distribution, then Xt will have this same distribution for all t

Hence stationary distributions have a natural interpretation as stochastic steady states — we’ll discuss this more in just a moment

Mathematically, a stationary distribution is just a fixed point of P when P is thought of as the map ψ7→ψPfrom (row) vectors to (row) vectors

At least one such distribution exists for each stochastic matrix P — applyBrouwer’s fixed point theorem, or seeEDTC, theorem 4.3.5

There may in fact be many stationary distributions corresponding to a given stochastic matrix P For example, if P is the identity matrix, then all distributions are stationary

One sufficient condition for uniqueness is uniform ergodicity:

Def. Stochastic matrix P is called uniformly ergodic if there exists a positive integer m such that all elements of Pmare strictly positive

For further details on uniqueness and uniform ergodicity, see, for example,EDTC, theorem 4.3.18

Example Recall our model of employment / unemployment dynamics for a given worker dis-cussed above

Assuming α∈ (0, 1)and β∈ (0, 1), the uniform ergodicity condition is satisfied

Let ψ = (p, 1−p)be the stationary distribution, so that p corresponds to unemployment (state 0)

Using ψ =ψP and a bit of algebra yields

p= β α+β

This is, in some sense, a steady state probability of unemployment — more on interpretation below Not surprisingly it tends to zero as β0, and to one as α→0

Calculating Stationary Distribution As discussed above, a given Markov matrix P can have many stationary distributions

That is, there can be many row vectors ψ such that ψ=ψP

In fact if P has two distinct stationary distributions ψ1, ψ2then it has infinitely many, since in this case, as you can verify,

ψ3:= λψ1+ (1−λ)ψ2 is a stationary distribuiton for P for any λ∈ [0, 1]

If we restrict attention to the case where only one stationary distribution exists, one option for finding it is to try to solve the linear system ψ(In−P) =0 for ψ, where Inis the n×n identity But the zero vector solves this equation

Hence we need to impose the restriction that the solution must be a probability distribution One function that will do this for us and implement a suitable algorithm is mc_compute_stationary from mc_tools.py

Let’s test it using the matrix (2.5) import numpy as np

from quantecon import mc_compute_stationary P = np.array([[.4, .6], [.2, .8]])

print(mc_compute_stationary(P))

If you run this you should find that the unique stationary distribution is (0.25, 0.75)

Convergence to Stationarity Let P be a stochastic matrix such that the uniform ergodicity as-sumption is valid

We know that under this condition there is a unique stationary distribution ψ

In fact, under the same condition, we have another important result: for any nonnegative row vector ψ summing to one (i.e., distribution),

ψPtψ as t→ (2.8)

In view of ourpreceding discussion, this states that the distribution of Xtconverges to ψ, regardless of the distribution of X0

This adds considerable weight to our interpretation of ψas a stochastic steady state For one of several well-known proofs, seeEDTC, theorem 4.3.18

The convergence in (2.8) is illustrated in the next figure

Here

• P is the stochastic matrix for recession and growthconsidered above

• The highest red dot is an arbitrarily chosen initial probability distribution ψ, represented as a vector inR3

• The other red dots are the distributions ψPtfor t=1, 2, . . .

• The black dot is ψ

The code for the figure can be found in the file examples/mc_convergence_plot.py in themain repository— you might like to try experimenting with different initial conditions

Ergodicity

Under the very same condition of uniform ergodicity, yet another important result obtains: If 1. {Xt}is a Markov chain with stochastic matrix P

2. P is uniformly ergodic with stationary distribution ψ then,∀j∈ S,

1 n

n t=1

1{Xt =j} →ψ[j] as n (2.9) Here

• 1{Xt =j} =1 if Xt= j and zero otherwise

• convergence is with probability one

• the result does not depend on the distribution (or value) of X0

The result tells us that the fraction of time the chain spends at state j converges to ψ[j]as time goes to infinity This gives us another way to interpret the stationary distribution — provided that the convergence result in (2.9) is valid

Technically, the convergence in (2.9) is a special case of a law of large numbers result for Markov chains — seeEDTC, section 4.3.4 for details

Example Recall our cross-sectional interpretation of the employment / unemployment model discussed above

Assume that α∈ (0, 1)and β∈ (0, 1), so the uniform ergodicity condition is satisfied We saw that the stationary distribution is(p, 1−p), where

p= β α+β

In the cross-sectional interpretation, this is the fraction of people unemployed

In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed

Thus, in the long-run, cross-sectional averages for a population and time-series averages for a given person coincide

This is one interpretation of the notion of ergodicity

Forecasting Future Values

Let P be an n×n stochastic matrix with

Pij =P{xt+1 =ej|xt =ei} where eiis the i-th unit vector inRn.

We are said to be “in state i” when xt=ei Let ¯y be an n×1 vector and let yt = ¯y0xt In other words, yt = ¯yi if xt =ei

Here are some useful prediction formulas:

Premultiplication by(I−βP)1amounts to “applying the resolvent operator“

Exercises

Exercise 1 According to the discussionimmediately above, if a worker’s employment dynamics obey the stochastic matrix

P= Your exercise is to illustrate this convergence

First,

• generate one simulated time series{Xt}of length 10,000, starting at X0=0

• plot ¯Xn−p against n, where p is as defined above Second, repeat the first step, but this time taking X0 =1 In both cases, set α= β=0.1

The result should look something like the following — modulo randomness, of course (You don’t need to add the fancy touches to the graph—see the solution if you’re interested)

In document Quantitative Economics With Python (Page 190-200)