Homework set 4 - Solutions


Math 495 – Renato Feres


R for continuous time Markov chains

The sequence of random variables of a Markov chain may represent the states of a random system recorded at a succession of time steps. For a full description of the process we often need to specify another sequence of random variables, T_0 < T_1 < · · · < T_N, giving the random times at which the state transitions occur. A continuous time Markov chain with state space D consists of a sequence X_0, X_1, . . . , X_N taking values in D together with a sequence T_0, T_1, . . . , T_N in [0, ∞), where T_i is the random time when the chain jumps to the state X_i. A more precise definition will be given shortly.

It will be helpful to keep in mind the following diagram as we discuss some of the main ideas related to continuous time Markov chains, at least in the case of discrete D. The nodes (circles) stand for the elements of D. In this discrete case, they may be labeled by integers: D = {1, 2, . . . , k}. The black dot (a token) indicates which state is presently occupied, and the arrows represent state transitions. The significance of the labels λ_{i j}, called transition rates (they are not probabilities; they can take on any positive value, possibly greater than 1), will be explained below in the context of Theorem 0.1. They are numbers that encode information specifying both the transition probabilities and the waiting times between state transitions. The resulting process can be imagined as a kind of random walk of the token around the diagram. The walk starts at X_0 ∈ D at time T_0, jumping at time T_i to state X_i for each i = 1, 2, . . . .

Figure 1: Diagram representing a continuous time Markov chain system.

• A useful identity. The following general observation will be used in the proof of Theorem 0.1. Suppose that X and Y are random variables where Y is of the continuous type, having probability density function f_Y(y). Then

P(X ∈ A) = ∫_{−∞}^{∞} P(X ∈ A | Y = y) f_Y(y) dy.

This can be derived as follows. First observe that P(X ∈ A) = E(1_A(X)), where 1_A(x) is the indicator function of the set A. (Recall that, by definition, 1_A(x) equals 1 if x ∈ A and 0 if x ∉ A.) This is clear since

E(1_A(X)) = 0 · P(1_A(X) = 0) + 1 · P(1_A(X) = 1) = P(1_A(X) = 1) = P(X ∈ A).

On the other hand we know that E[X_2] = E[E[X_2 | X_1]] for any random variables X_1, X_2. Therefore,

P(X ∈ A) = E(1_A(X)) = E[E[1_A(X) | Y]] = ∫_{−∞}^{∞} E[1_A(X) | Y = y] f_Y(y) dy = ∫_{−∞}^{∞} P(X ∈ A | Y = y) f_Y(y) dy.

The same argument shows, more generally, that given another random variable Z,

P(X ∈ A | Z = z) = ∫_{−∞}^{∞} P(X ∈ A | Z = z, Y = y) f_Y(y) dy.   (1)

An example of the use of this identity will appear in the proof of Theorem 0.1, below.
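As a quick numerical illustration (my own example, not from the text), the identity can be checked in R for X = Y + Z with Y, Z independent standard normals and A = {X ≤ 1}. Then X is normal with variance 2, while the right-hand side integrates P(X ≤ 1 | Y = y) = pnorm(1 − y) against the density of Y:

```r
#Left-hand side: X ~ N(0,2), so P(X <= 1) is computed directly
lhs = pnorm(1, mean = 0, sd = sqrt(2))

#Right-hand side: integrate P(X <= 1 | Y = y) f_Y(y) over the real line
integrand = function(y) pnorm(1 - y) * dnorm(y)
rhs = integrate(integrand, -Inf, Inf)$value

abs(lhs - rhs)   #negligible difference
```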

• Exponential random variables. Exponential random variables will be used as probabilistic models of waiting times between successive events in a random sequence. We recall here their main properties.

1. The exponential distribution. A random variable T of the continuous type is said to be exponential with parameter λ if its probability density function is

f(t) = λ e^(−λt) for all t ≥ 0, and 0 elsewhere.

Figure 2: Graphs of exponential densities for different values of the parameter λ. If T is an exponentially distributed random waiting time for some random event to happen, we will think of P(T ≤ t) = 1 − e^(−λt) as the probability that the event will have happened by time t.


By simple integration it follows that

F_T(t) = 1 − e^(−λt),  E(T) = 1/λ,  Var(T) = 1/λ².

The four functions in R associated with the exponential distribution (random numbers, p.d.f., c.d.f., and quantile) are rexp, dexp, pexp, and qexp.
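For instance, the formulas above can be checked directly with these functions (a quick sketch of my own; the particular values λ = 2 and t = 1.3 are arbitrary):

```r
lambda = 2
t = 1.3
#c.d.f.: pexp matches 1 - exp(-lambda*t)
pexp(t, lambda) - (1 - exp(-lambda*t))   #0
#the quantile function inverts the c.d.f.
qexp(pexp(t, lambda), lambda)            #recovers t = 1.3
#random sample: mean and variance close to 1/lambda and 1/lambda^2
set.seed(1)
x = rexp(10^5, lambda)
mean(x)   #about 0.5
var(x)    #about 0.25
```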

The graph of Figure 2 was obtained with the following script:

x=seq(0,5,by=0.05)
y0=dexp(x,0.5)
y1=dexp(x,1.0)
y2=dexp(x,2.0)
plot(x,y0,type='l',lty=1,ylim=range(c(0,2)),xlab='waiting time',ylab='probability density')
lines(x,y1,lty=2)
lines(x,y2,lty=4)
legend(x=3.0,y=2, #place a legend at an appropriate place on the graph
  c('lambda=0.5','lambda=1.0','lambda=2.0'), #legend text
  lty=c(1,2,4), #define the line types (dashed, solid, etc.)
  bty='n')
grid()

2. The memoryless property. In all the examples discussed in this assignment, random events of some kind will happen in succession at random times. I will use the term waiting time to refer to the time difference between a random event and its successor. (I am using the term ‘event’ here in its ordinary sense, as some sort of discrete occurrence, and not in the technical sense as sets of the σ-algebra of a probability space.) Waiting times will be modeled using exponential random variables.

The property of exponential random variables that makes them natural mathematical models of a random waiting time is that they have the memoryless property. This means that, if T is an exponentially distributed waiting time, then

P(T > s + t | T > s) = P(T > t).

In words, if the random event with exponentially distributed waiting time has not happened by time s (T > s) and if the distribution of T has parameter λ, then your best prediction of how much longer to wait is still E [T ] = 1/λ. The underlying mechanism that causes the event to happen cannot have any sort of internal clock telling the time s already elapsed. Imagine, for example, that you and a friend regularly meet 3 hours past noon to study and that your friend is perfectly reliable (deterministic). Then you know at 2:00 PM that you will need to wait for him exactly one more hour. But if 3 hours is only the expected waiting time and your friend is perfectly unreliable (memoryless), then the time you have already waited for him does not affect your estimation of how much longer you’ll still have to wait, which is 3 hours. (A much better example is the decay process of radioactive atoms. The time of decay of an unstable atomic nucleus is, by quantum theory, exactly exponentially distributed. You may have seen elsewhere the concept of half-life, which is the median waiting time till decay of a radioactive atom.)

To see that the memoryless property holds for an exponential random variable T , set A = {T > t + s} and B = {T > s}, so A∩B = A. Therefore, P(A|B) = P(A∩B)/P(B) = P(A)/P(B) and the proof reduces to showing

P (T > t + s) = P(T > t)P(T > s).


Note that P(T > t) = 1 − P(T ≤ t) = 1 − F_T(t) = e^(−λt). Therefore, the memoryless property reduces to the relation e^(−λ(t+s)) = e^(−λt) e^(−λs), which is, of course, a property of the exponential function.

It is useful to keep in mind this characterization of an exponential random variable: the probability that the random event has not happened by time t decreases exponentially as e^(−λt) if the waiting time is exponential with parameter λ.
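The memoryless property can be verified by simulation (a sketch of my own; the choices λ = 1, s = 2, t = 1 are arbitrary). Among sampled waiting times exceeding s, the fraction also exceeding s + t should match P(T > t):

```r
set.seed(2)
w = rexp(10^6, 1)          #a large sample of Exp(1) waiting times
lhs = mean(w[w > 2] > 3)   #estimate of P(T > 3 | T > 2)
rhs = exp(-1)              #P(T > 1) = e^(-1)
c(lhs, rhs)                #the two values are close
```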

3. Sums of independent exponential random variables. Suppose you meet on campus at random and independently with each of 3 friends once every two hours on average. The respective waiting times T_1, T_2, T_3 of meeting each friend are assumed exponentially distributed. The waiting time to meet any one of the three friends is naturally M = min{T_1, T_2, T_3}. What is the expected value of M? It may be intuitively clear that the answer should be two thirds of an hour. This is indeed true due to property (a) in the following theorem.

Theorem 0.1 (Independent exponential random variables). Let T_1, . . . , T_k be independent, exponentially distributed random variables with parameters λ_1, . . . , λ_k, respectively. Let M = min{T_1, . . . , T_k}. Then

(a) M is exponentially distributed with parameter λ_1 + · · · + λ_k.

(b) M is independent of which T_i is minimum. In other words, P(M > t | T_i = M) = P(M > t).

(c) The probability that T_i is the minimum is P(T_i = M) = λ_i / (λ_1 + · · · + λ_k).

Proof. Due to independence and noting that P(T_i > t) = 1 − F_{T_i}(t) = e^(−λ_i t), we have

P(M > t) = P(T_1 > t, . . . , T_k > t) = P(T_1 > t) · · · P(T_k > t) = e^(−λ_1 t) · · · e^(−λ_k t) = e^(−(λ_1+···+λ_k)t).

But this means that M is exponentially distributed with parameter λ_1 + · · · + λ_k, proving (a). For property (b) note:

P(M > t | T_i = M) = P(T_i > t | T_i = M) = ∫_0^{∞} P(T_i > t | T_i = s, M = s) f_M(s) ds.

We have used here identity (1) given at the beginning of this tutorial. Now,

P(T_i > t | T_i = s, M = s) = 1 if s > t, and 0 if s ≤ t.

Therefore,

P(M > t | T_i = M) = ∫_t^{∞} f_M(s) ds = P(M > t).

But this is the claim of part (b). Now for part (c), using again identity (1),

P(T_i = M) = ∫_0^{∞} P(T_i = M | T_i = t) f_{T_i}(t) dt = ∫_0^{∞} P(T_j ≥ t for all j ≠ i | T_i = t) λ_i e^(−λ_i t) dt.   (2)

Using the assumption that T_1, . . . , T_k are independent,

P(T_j ≥ t for all j ≠ i | T_i = t) = P(T_j ≥ t for all j ≠ i) = ∏_{j≠i} P(T_j ≥ t) = ∏_{j≠i} e^(−λ_j t) = e^(−(λ_1+···+λ_k)t + λ_i t).   (3)

Putting together identities (2) and (3),

P(T_i = M) = ∫_0^{∞} λ_i e^(−λ_i t) e^(−(λ_1+···+λ_k)t + λ_i t) dt = ∫_0^{∞} λ_i e^(−(λ_1+···+λ_k)t) dt = λ_i / (λ_1 + · · · + λ_k).

This is what we wanted to prove. ∎
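The theorem can be checked by simulation for the three-friends example above, each friend met at rate 1/2 per hour (a sketch of my own, not part of the assignment):

```r
set.seed(3)
N = 10^5
W = matrix(rexp(3*N, rate = 1/2), ncol = 3)  #columns are T1, T2, T3
M = pmin(W[,1], W[,2], W[,3])
mean(M)           #close to 2/3: M is exponential with rate 3/2, part (a)
mean(W[,1] == M)  #close to 1/3: part (c) with equal rates
```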


4. Interpretation of the theorem and the diagram. We are now ready to interpret the diagram of Figure 1 shown above.

In Figure 3, below, I have simplified the first diagram by eliminating the nodes that are not connected to the one containing the token, labeled i, as well as the arrows that are not directed from i to one of the other nodes. This is the part of the diagram needed for determining the next state transition.

Figure 3: Part of diagram 1 indicating the current state i (which contains the moving token) and the states to which the token can jump in the next step. Only the arrows issuing from i are shown.

The mechanism of state transition is as follows. Suppose that there are k arrows issuing from i, which I indicate by the pairs (i, 1), . . . , (i, k). Among these is the pair (i, j) with the label λ_{i j} shown on the diagram.

Think of each arrow (i, j) as a possible action, or event, happening at an exponentially distributed waiting time T_{i j} with parameter λ_{i j}. The times T_{i 1}, . . . , T_{i k} are assumed to be independent. Then, according to Theorem 0.1, the transition goes as follows.

(a) Generate a random number t ≥ 0 from an exponential distribution with parameter λ_i = λ_{i 1} + · · · + λ_{i k}. This is the time when the next transition will happen. (Part (a) of the theorem.)

(b) Now, independently of the value obtained for the minimal time M = t, generate an integer j between 1 and k with p.m.f.

p(j) = λ_{i j} / (λ_{i 1} + · · · + λ_{i k}).

This integer is then the index of the node to which the token is moved and t is the new time. (Part (c) of the theorem.)
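The two-stage rule above can be sketched as a short R function. The name NextTransition and its argument rates (the vector of rates λ_{i 1}, . . . , λ_{i k} out of the current state) are my own labels, not from the text:

```r
NextTransition = function(rates) {
  #(a) waiting time: exponential with the total rate lambda_i
  dt = rexp(1, sum(rates))
  #(b) target node: index j drawn with probability rates[j]/sum(rates),
  #independently of the waiting time
  j = sample(seq_along(rates), 1, prob = rates/sum(rates))
  c(time = dt, state = j)
}

set.seed(4)
NextTransition(c(0.5, 1.5))   #a positive waiting time and a node index (1 or 2)
```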

5. Example: random switching. The simplest example is defined by the diagram of Figure 4.

Figure 4: Diagram representing an exponential random variable with parameter λ.

The set of states is {0, 1} and the system is initially in state 0. After an exponentially distributed waiting time with parameter λ, the state switches to 1 and remains there forever. Here the chain has only two steps: X_0 = 0 and X_1 = 1. The only quantity of interest is the random time when the switch occurs. So, in a sense, the diagram represents an exponential random variable with parameter λ.


6. Example: branching out. In this case the chain starts at X_0 = 0, jumps to X_1 ∈ {1, 2, 3}, then stops there. The information of interest here is the transition time T and the value of the state X_1.

Figure 5: Switching to one of several possible states in one step.

According to Theorem 0.1, the transition time is exponentially distributed with parameter λ_1 + λ_2 + λ_3, and the new state is taken from {1, 2, 3} with p.m.f.

p(i) = λ_i / (λ_1 + λ_2 + λ_3).

7. Example: reversible switching. This process is described in Figure 6. The chain is

X_0 = 0, X_1 = 1, X_2 = 0, X_3 = 1, . . .

The states alternate between 0 and 1 in a perfectly deterministic way. The quantities of interest are the random times of the back-and-forth state jumps: if the current state is 0, the system will next switch to 1 after an exponentially distributed waiting time with parameter λ; and if the current state is 1, the next switch to 0 will happen after an exponentially distributed waiting time with parameter µ.

Figure 6: A reversible switching process.

For a concrete example, suppose λ = 2 and µ = 1. A typical sample history of the process from time 0 to tmax = 20 can be obtained as follows. (The use of stepfun is somewhat tricky. You should look at ?stepfun for the details. Note that the vector times has length one unit less than the vector states.)

#We generate a sample history of the process up to time tmax
tmax=20
#The exponential parameters are:
lambda=2
mu=1
#Initialize the current time to 0
t=0
#The vector 'times' will record the state transition times
times=matrix(0,1,1)
#The vector 'states' will record the sequence of states,
#which may be either 0 or 1
states=matrix(0,1,1)
#The step index is n
n=1
while (t<tmax) {
  if (states[n]==0) {
    dt=rexp(1,lambda)
    states[n+1]=1
  } else {
    dt=rexp(1,mu)
    states[n+1]=0
  }
  t=t+dt
  times[n]=t
  n=n+1
}
#We can visualize the history by plotting states
#against times. Note the use of the 'step function' operation.
#Check ?stepfun for the details of how to use it.
F=stepfun(times,states,f=0)
plot(F,xlab='Time',xlim=range(c(0,tmax)),ylab='State',main='A sample history')
grid()

Figure 7: A sample history of the Markov chain defined by the diagram of Figure 6. Each circle indicates the value of the state (0 or 1) over the time interval to the right of the circle.

8. Example: Poisson processes. The Poisson process is introduced in Section 3.2 of the textbook. It can be defined by the diagram of Figure 8. The token is initially in the circle node representing state X_0 = 0; if at any given time t the token is in state n, it will jump to state n + 1 at time t + T, where T is an exponentially distributed waiting time with parameter λ. Let N(t) denote the state of the process at time t. This is a piecewise constant, discontinuous function of t that counts the total number of transitions up to time t.

Figure 8: Diagram defining a Poisson process with rate parameter λ.

The following graph shows one sample history of a Poisson process with rate λ = 1 (per minute, say) over 10 minutes.

Figure 9: A sample history of the Poisson process with λ = 1 over the time interval [0,10].

The graph was generated by the following R script.

lambda=1
tmax=10
#Initialize the current time to 0
t=0
#The vector 'times' will record the state transition times
times=matrix(0,1,1)
#The vector 'states' will record the sequence of states
#from the set {0, 1, 2, ...}
states=matrix(0,1,1)
#The step index is n
n=1
while (t<tmax) {
  dt=rexp(1,lambda)
  t=t+dt
  times[n]=t
  states[n+1]=n #The 'states' vector is one unit longer than 'times'
  n=n+1
}
#We now plot the sample history.
F=stepfun(times,states,f=0)
plot(F,xlab='Time',xlim=range(c(0,tmax)),ylab='State',
  main='Sample history of a Poisson process')
grid()

The following definition gives an alternative way to introduce the Poisson process.

Definition 0.2. A Poisson process with rate λ > 0 is a random process N(t) ∈ {0,1,2,...}, which we interpret as the number of random events occurring between times 0 and t , such that

(a) The process starts at 0. That is, N (0) = 0.

(b) The numbers of events occurring over disjoint time intervals are independent. That is, if (a, b] and (c, d ] are disjoint time intervals then N (b) − N (a) and N (d) − N (c) are independent random variables.

(c) The process is time homogeneous. This means that N (b) − N (a) and N (b + s) − N (a + s), which are the numbers of events over an interval (a, b] and over this interval’s time translate (a + s,b + s], have the same probability distribution for all s ≥ 0.

(d) The probability of transition from N(0) = 0 to N(h) = 1 for a small time h > 0 is, up to first order in h, given by λh. More precisely,

lim_{h→0} P(N(h) = 1) / h = λ.

(e) The probability of more than one transition over a small time interval is 0 up to first order in the length of the interval. That is,

lim_{h→0} P(N(h) ≥ 2) / h = 0.

From this definition, we can recover the characterization of the Poisson process in terms of the diagram.

This is shown in the next theorem. (The details of the proof of Theorem 0.3 are not needed for solving the homework problems. It is OK to simply skim through it, at least for now. I hope to return to some of this in class when covering chapter 3.)

Theorem 0.3. The Poisson process N(t) with rate λ > 0 satisfies the following properties:

(a) For each non-negative integer n,

P(N(t) = n) = (λt)^n e^(−λt) / n!

(b) Let T_1 < T_2 < · · · be the state transition times of N(t) and D_j = T_j − T_{j−1} for j = 1, 2, . . . the waiting times between transitions. Then D_1, D_2, . . . are independent, exponentially distributed random variables with parameter λ.

Proof. We begin by observing that, by properties (a) and (b),

P(N(s + t) = n | N(s) = m) = P(N(s + t) − N(s) = n − m | N(s) − N(0) = m) = P(N(s + t) − N(s) = n − m).

By property (c),

P(N(s + t) = n | N(s) = m) = P(N(t) − N(0) = n − m) = P(N(t) = n − m).

Defining p_i(t) = P(N(t) = i) we obtain

p_n(t + s) = P(N(s + t) = n) = Σ_{m=0}^{n} P(N(s + t) = n | N(s) = m) P(N(s) = m) = Σ_{m=0}^{n} p_{n−m}(t) p_m(s).

Note: if we define a matrix P(t) = (p_{i j}(t)) whose elements are p_{i j}(t) = p_{j−i}(t), the expression just proved can be written in matrix form as

P(t + s) = P(t) P(s).

Note that P(t)P(s) = P(t + s) = P(s + t) = P(s)P(t). Properties (a), (d), and (e) imply that P(t) converges to the identity matrix as t approaches 0. That is,

p_{mn}(t) = P(N(t) = n − m) → 1 if m = n, and 0 if m ≠ n,

as t → 0. Let I denote the identity matrix. (Since the number of states is infinite, P(t) is an infinite matrix. This, however, does not create any essential difficulties.) We have shown that P(s) → I as s converges to 0. The matrix-valued function P(t) can then be shown to be continuous:

P(t + s) − P(t) = P(t)P(s) − P(t) = P(t)[P(s) − I] → 0, therefore P(t + s) → P(t)

as s converges to 0.

We can also show that P(t) is a differentiable function of t. First observe the following consequence of properties (d) and (e):

lim_{h→0} [p_{mn}(h) − p_{mn}(0)] / h =
  0                                                         if n ≥ m + 2
  lim_{h→0} p_1(h)/h = λ                                    if n = m + 1
  lim_{h→0} [p_0(h) − 1]/h = −lim_{h→0} P(N(h) ≥ 1)/h = −λ  if n = m

Therefore, p′_{mn}(0) = λ_{mn}, where

λ_{mn} =
  0    if n ≥ m + 2
  λ    if n = m + 1
  −λ   if n = m
  0    if n < m

We conclude that

P′(t) = lim_{h→0} [P(t + h) − P(t)] / h = lim_{h→0} [P(h)P(t) − P(t)] / h = lim_{h→0} [(P(h) − I)P(t)] / h = ΛP(t),

where Λ = (λ_{mn}). The matrix differential equation P′(t) = ΛP(t) can be written out explicitly as a system of ordinary differential equations in the entries of P(t). Each equation has the form

p′_i(t) = Σ_j λ_{i j} p_j(t).


As most entries of Λ are zero, the above sum has only finitely many nonzero terms for each n. Explicitly,

p′_0(t) = −λ p_0(t)
p′_1(t) = −λ p_1(t) + λ p_0(t)
p′_2(t) = −λ p_2(t) + λ p_1(t)
· · ·
p′_n(t) = −λ p_n(t) + λ p_{n−1}(t)
· · ·

It is now a simple induction argument to check that

p_n(t) = (λt)^n e^(−λt) / n!

is a solution of the system of ordinary differential equations with the correct initial condition. We can finally appeal to the uniqueness property of solutions of systems of differential equations to conclude that part (a) of the theorem holds.

It remains to show part (b): the waiting times D_1, D_2, . . . are independent and exponentially distributed with parameter λ. Note the equivalences of events (keep in mind that N(T_n) = n):

N(T_{n−1} + t) − N(T_{n−1}) = 0 ⇔ N(T_{n−1} + t) = n − 1 ⇔ T_{n−1} + t < T_n ⇔ D_n > t.

Therefore,

P(D_n > t) = P(N(T_{n−1} + t) − N(T_{n−1}) = 0) = P(N(t) = 0) = p_0(t) = e^(−λt).

This means that the D_n are all exponentially distributed with parameter λ. To prove independence, let m < n and suppose that D_m = s. Then T_{m−1} + s = T_m ≤ T_{n−1}. So the intervals (T_{m−1}, T_{m−1} + s] and (T_{n−1}, ∞) are disjoint. Thus for all t ≥ 0, we have by property (b) that the random variables N(T_{n−1} + t) − N(T_{n−1}) and N(T_{m−1} + s) − N(T_{m−1}) are independent. Independence of D_m and D_n now results from the observation that (1) the event D_n > t is the same as N(T_{n−1} + t) − N(T_{n−1}) = 0 and (2) D_m = s is the same as N(T_{m−1} + s) − N(T_{m−1}) = 1 and N(T_{m−1} + s′) − N(T_{m−1}) = 0 for all s′ < s. ∎

9. General continuous time Markov chains. Arguments used in the proof of Theorem 0.3 also prove a much more general result. Going back to the diagram of Figure 1, let λ_{i j} be the rate constant for the arrow connecting state i to state j. Define λ_{i i} = −Σ_{j≠i} λ_{i j} and the matrix Λ = (λ_{i j}). Note that the elements of each row of Λ add up to 0. For the Poisson process this matrix is

Λ =
  −λ    λ    0    0   · · ·
   0   −λ    λ    0   · · ·
   0    0   −λ    λ   · · ·
   ⋮    ⋮    ⋮    ⋮

Also define the matrix-valued function P(t) = (p_{i j}(t)), where each element p_{i j}(t) gives the probability that the process started at time 0 in state i will be in state j at time t. Then it can be shown that

(a) P(0) = I, where I is the identity matrix;

(b) P(t + s) = P(t)P(s) for all non-negative t and s;

(c) P′(t) = ΛP(t) = P(t)Λ, where P′(t) is the derivative of P(t) in t.

If you have taken a course in ordinary differential equations or matrix algebra, you may have learned that these conditions characterize the matrix exponential:

P(t) = e^{Λt}.

In the proof of Theorem 0.3 we have effectively computed a matrix exponential by solving a system of differential equations. The resulting matrix was in that case

e^{Λt} =
  e^(−λt)   λt e^(−λt)   ((λt)²/2!) e^(−λt)   · · ·
     0        e^(−λt)        λt e^(−λt)       · · ·
     0           0             e^(−λt)        · · ·
     ⋮           ⋮                ⋮

Methods for finding matrix exponentials are studied in matrix algebra courses. I simply note here that the Taylor series expansion

P(t) = I + Λt + · · · + Λ^n t^n / n! + · · ·

makes sense for matrices in general, and indeed holds true for finite matrices. It is possible to prove this fact for infinitely many states in many cases.
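As an illustration (my own sketch, not from the text), the Taylor series can be summed numerically for a finite truncation of the Poisson generator Λ, and the first row of the result compared with the formula (λt)^n e^(−λt)/n! of Theorem 0.3(a):

```r
lambda = 1; t = 0.5; m = 30          #m-by-m truncation of the infinite matrix
Lambda = matrix(0, m, m)
for (i in 1:(m-1)) { Lambda[i,i] = -lambda; Lambda[i,i+1] = lambda }
Lambda[m,m] = -lambda
#Taylor series P(t) = I + Lambda t + (Lambda t)^2/2! + ...
P = diag(m); term = diag(m)
for (n in 1:50) { term = term %*% Lambda * t / n; P = P + term }
#First row vs. dpois; truncation effects are negligible for small n
round(P[1, 1:4], 6)
round(dpois(0:3, lambda*t), 6)
```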

10. The Poisson distribution. A random variable Y is said to have the Poisson distribution with parameter λ if Y = N(1), where N(t) is the Poisson process with rate parameter λ discussed above. From Theorem 0.3 it follows that Y has probability mass function supported on the non-negative integers 0, 1, 2, . . . and given by

p(x) = λ^x e^(−λ) / x!

Therefore, a Poisson random variable gives the random number of transition events occurring in the time interval [0, 1] for the process described by the diagram of Figure 8.

The R functions associated to Poisson random variables are

dpois(x, lambda)
ppois(q, lambda)
qpois(p, lambda)
rpois(n, lambda)

More generally, Theorem 0.3 says that the random variable N(t), t > 0, is also supported on the set of non-negative integers, and has p.m.f.

P(N(t) = x) = (λt)^x e^(−λt) / x!

11. The gamma distribution. Recall that T_n is the time of the nth transition event for the process represented by the diagram of Figure 8. From Theorem 0.3 we obtain the cumulative distribution function of T_n:

F_{T_n}(t) = P(T_n ≤ t) = P(N(t) ≥ n) = Σ_{j=n}^{∞} p_j(t) = Σ_{j=n}^{∞} (λt)^j e^(−λt) / j!

The probability density function of T_n is obtained by taking the derivative of F_{T_n}(t) with respect to t. In problem 1, below, you will show that the p.d.f. is

f_{T_n}(t) = λ e^(−λt) (λt)^{n−1} / (n − 1)!   (4)

Definition 0.4 (The Gamma distribution). A continuous random variable X has the gamma distribution with parameters α, β (or X has the Γ(α, β) distribution) for α > 0, β > 0, if the probability density function of X is

f_X(x) = x^{α−1} e^(−x/β) / (Γ(α) β^α)

supported on the interval (0, ∞). The parameter α is called the shape and β the scale of the gamma distribution; 1/β is called the rate.

Recall that the gamma function Γ(α) equals (α − 1)! if α is a positive integer. For values of α > 0 that are not necessarily integer, the gamma function is defined by

Γ(α) = ∫_0^{∞} y^{α−1} e^(−y) dy.

It follows from the above discussion that T_n is a gamma random variable with a Γ(n, 1/λ) distribution. This shows the close relationship among the exponential, Poisson, and gamma distributions. The associated R functions are:

dgamma(x, shape=a, scale = b)
pgamma(q, shape=a, scale = b)
qgamma(p, shape=a, scale = b)
rgamma(n, shape=a, scale = b)
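A quick simulation check of this relationship (my own sketch; the values λ = 2, n = 5 are arbitrary): the sum of n independent Exp(λ) waiting times should match the Γ(n, 1/λ) distribution as computed by pgamma.

```r
set.seed(5)
lambda = 2; n = 5
#10^5 samples of T_n, each a sum of n independent Exp(lambda) waiting times
Tn = colSums(matrix(rexp(n*10^5, lambda), nrow = n))
mean(Tn)                             #close to n/lambda = 2.5
mean(Tn <= 3)                        #close to the gamma c.d.f. at 3:
pgamma(3, shape = n, scale = 1/lambda)
```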

12. Example: birth-and-death processes. A widely studied Markov chain goes by the name birth-and-death chain. It may be defined by the following diagram.

Figure 10: The continuous time birth-and-death Markov chain, with birth rate λ and death rate µ.

This chain can serve as a crude probabilistic model of a queueing system, in which new customers join the queue at a rate λ and are served at a rate µ. The random process N(t), giving the number of customers in line at time t, has time dependent p.m.f. p_n(t) = P(N(t) = n) that can be computed using matrix exponentiation. Writing P(t) = (p_{i j}(t)) we have, as indicated in the remark about general continuous time Markov chains, that P(t) = e^{Λt}. Here Λ is the matrix

Λ =
  −λ      λ       0       0      0   . . .
   µ   −(µ+λ)     λ       0      0   . . .
   0      µ    −(µ+λ)     λ      0   . . .
   ⋮      ⋮       ⋮       ⋮      ⋮

You will simulate this chain in one of the problems below.


• A random service line. In an idealized waiting line (or queue), the time for a new person to join the line is exponentially distributed with parameter λ. The service time for the person at the front of the line is exponential with parameter µ. (I assume that µ and λ are measured in units of reciprocal minutes.) We wish to simulate the process for 30 minutes with λ = 1 and µ = 1 and graph the length of the line as a function of time.

Figure 11: Sample history of the birth-and-death process of problem 3.

It was produced by the following script.

#Birth and death chain
#We generate a sample history of the process up to time tmax
tmax=30
#The parameters are:
lambda=1
mu=1
#Initialize the current time to 0
t=0
#The vector 'times' will record the state transition times
times=matrix(0,1,1)
#The vector 'states' will record the sequence of states
#from the set {0, 1, 2, ...}
states=matrix(0,1,1)
#I will let the first state be 0.
#The step index is n
n=1
while (t<tmax) {
  if (states[n]==0) {
    dt=rexp(1,lambda)
    states[n+1]=1
  }else{
    dt=rexp(1,lambda+mu)
    s=states[n]
    s=sample(c(s-1,s+1),1,prob=c(mu/(mu+lambda),lambda/(mu+lambda)))
    states[n+1]=s #The 'states' vector is one unit longer than 'times'
  }
  t=t+dt
  times[n]=t
  n=n+1
}
#We now plot the sample history.
F=stepfun(times,states,f=0)
plot(F,xlab='Time',xlim=range(c(0,tmax)),ylab='State',
  main='Sample history of a birth and death process')
grid()

• Growth in a cell culture. This is an example of a branching process. We wish to simulate the growth in the number of cells by the following algorithm. When a cell is born, draw sample exponential random times T_b and T_d with rates λ and µ, respectively. If T_b < T_d, then the simulated cell divides at T_b into two new cells (and T_d is discarded). If T_d < T_b, then the cell dies (and T_b is discarded). We simulate this process for 20 minutes starting from a single cell with µ = 1 (per minute) and

1. λ = 1.00 per minute.
2. λ = 1.05 per minute.
3. λ = 1.10 per minute.

Observe that if there are n cells presently in the culture, the rate at which a new division or death happens is nλ and nµ, respectively. This is due to Theorem 0.1.

Figure 12: Diagram for problem

The R function NumberCells(tmax,lambda,mu,k,plotgraph) (see below) gives the number of cells at the end of tmax (here k is the initial number of cells and plotgraph toggles plotting of the sample history).

Prior to showing the program, think about the following questions:

1. If the current number of cells is n, what is the probability distribution of the waiting time till the next event (division or death)?

2. Let p_b and p_d be the probabilities that the next event is a birth (cell division) or a death. What are the values of p_b and p_d in terms of λ and µ?


Note: If the number of cells at the end of the 20 minute period is recorded in a vector Cells of length N = 10^4, then the mean and standard deviation of the data are given in R by mean(Cells) and sd(Cells). If you have some problem with the sd function, try this: sd(as.numeric(Cells)).

The function NumberCells can be as follows.

NumberCells=function(tmax,lambda,mu,k,plotgraph) {
  #tmax is the final time
  #lambda is the division rate
  #mu is the death rate
  #k is the initial number of cells
  #plotgraph is 1 if a sample history of the process is to
  #be plotted, and 0 if not.
  #Initialize the current time to 0
  t=0
  #The vector 'times' will record the state transition times
  times=matrix(0,1,1)
  #The vector 'states' will record the sequence of states
  #from the set {0, 1, 2, ...}
  states=matrix(0,1,1)
  #Initial state:
  states[1]=k
  #The step index is n
  n=1
  while (t<tmax) {
    if (states[n]==0) {
      dt=tmax
      states[n+1]=0
    }else{
      s=states[n]
      #The next event happens at an exponential waiting time with rate
      rate=s*(lambda+mu)
      dt=rexp(1,rate)
      #Probability that the next event is a division (birth)
      pb=lambda/(lambda+mu)
      #Probability that the next event is a death
      pd=mu/(lambda+mu)
      s=sample(c(s-1,s+1),1,prob=c(pd,pb))
      states[n+1]=s #The 'states' vector is one unit longer than 'times'
    }
    t=t+dt
    times[n]=t
    n=n+1
  }
  if (plotgraph==1) {
    #We now plot the sample history.
    F=stepfun(times,states,f=0)
    plot(F,xlab='Time',xlim=range(c(0,tmax)),ylab='Number of cells',
      main='Sample history of population growth process')
    grid()
  }
  #The final number of cells is
  states[n]
}

We can now perform the experiments asked for in the problem.

N=10^4
tmax=20
lambda=1.00
mu=1
k=1
plotgraph=0
Cells=matrix(0,1,N)
for (i in 1:N){
  Cells[i]=NumberCells(tmax,lambda,mu,k,plotgraph)
}
mean(as.numeric(Cells))
sd(as.numeric(Cells))

#########################
lambda=1
> mean(as.numeric(Cells))
[1] 0.9485
> sd(as.numeric(Cells))
[1] 5.944761
#########################
lambda=1.05
> mean(as.numeric(Cells))
[1] 2.4901
> sd(as.numeric(Cells))
[1] 13.21272
#########################
lambda=1.1
> mean(as.numeric(Cells))
[1] 7.7333
> sd(as.numeric(Cells))
[1] 32.93234

Each sample history can end in extinction (number of cells equal to 0) or not. Here is a typical looking graph for the number of cells when extinction does not occur, for λ = 1.3. Note that the final number of cells can vary drastically, since the standard deviation can be very large.



Figure 13: Sample history of the cell culture process with λ = 1.3 and µ = 1.

Problems

1. (Text, Exercise 2.3, page 58.) Consider the Markov chain with state space S = {0, 1, 2, . . .} and transition probabilities

p(x, x + 1) = 2/3;  p(x, 0) = 1/3.

Show that the chain is positive recurrent and give the limiting probability π.

Solution. The chain is clearly irreducible. To show that the chain is positive recurrent it is sufficient to show that it admits an invariant probability measure. The equation for an invariant measure is

π(x) = Σ_{y∈S} π(y) p(y, x).

We obtain the equations:

π(x) = 2

3 π(x − 1) = ··· = µ 2 3

n

π(0) for x ≥ 1 and π(0) = 1 3

X

x=0

π(x) = 1 3 . Then

π(x) = 1 3

µ 2 3

n

for n ≥ 0 .
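As a sanity check (not part of the required solution), one can simulate the chain in R and compare the empirical occupation frequencies of the first few states with π(x) = (1/3)(2/3)^x:

```r
#Simulate the chain: from any state x, jump to x+1 with probability 2/3
#and reset to 0 with probability 1/3; tally visits to states 0..5.
set.seed(1)
nsteps=10^5
x=0
counts=numeric(6)
for (i in 1:nsteps) {
  if (runif(1) < 2/3) x=x+1 else x=0
  if (x <= 5) counts[x+1]=counts[x+1]+1
}
empirical=counts/nsteps
exact=(1/3)*(2/3)^(0:5)
round(rbind(empirical,exact),4)
```

The two rows of the printed table should agree to about two decimal places.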

2. (Text, Exercise 2.13, page 60.) Consider a population of animals with the following rule for (asexual) reproduction: an individual that is born has probability q of surviving long enough to produce offspring. If the individual does produce offspring, she produces one or two offspring, each with equal probability. After this the individual no longer reproduces and eventually dies. Suppose the population starts with four individuals.

(a) For which values of q is it guaranteed that the population will eventually die out?

(b) If q = 0.9, what is the probability that the population survives forever?

Solution. (a) Let Y be the random variable giving the number of offspring of one individual. It is given that

P(Y = 0) = 1 − q,  P(Y = 1) = q/2,  P(Y = 2) = q/2.

The expected value of Y is

µ := E(Y) = 0(1 − q) + 1(q/2) + 2(q/2) = 3q/2.

We know that if µ ≤ 1 (the subcritical and critical cases, here with P(Y = 1) < 1), then the population eventually dies out with probability 1, while if µ > 1 it survives forever with positive probability. Extinction is therefore guaranteed exactly when 3q/2 ≤ 1, that is, for q ≤ 2/3.

(b) Let φ(s) be the generating function of Y. This is the function

φ(s) = Σ_{k=0}^{2} p_k s^k = (1 − q) + (q/2)s + (q/2)s².

The extinction probability, defined by

a = P_1(population eventually dies out),

is the smallest positive root of φ(s) = s. Now φ(s) = s reads

s = (1 − q) + (q/2)(s + s²).

Equivalently, a is the smallest positive root of

s² − ((2 − q)/q)s + 2(1 − q)/q = 0.

When q = 0.9, the equation becomes

s² − (11/9)s + 2/9 = 0.

This equation has roots s = 1 and s = 2/9, so the extinction probability starting from one individual is a = 2/9. Since the four initial individuals generate independent family lines, the extinction probability for initial population size X_0 = 4 is a⁴ = (2/9)⁴, and the probability of the population surviving forever when q = 0.9 is

1 − (2/9)⁴ ≈ 0.9976.
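These numbers are easy to verify in R (a quick check, not part of the original solution): iterating s ↦ φ(s) from s = 0 converges to the smallest root of φ(s) = s.

```r
#Fixed-point iteration for the extinction probability when q = 0.9.
q=0.9
phi=function(s) (1-q)+(q/2)*s+(q/2)*s^2
s=0
for (i in 1:200) s=phi(s)
s        #approximately 2/9
1-s^4    #survival probability starting from 4 individuals, approximately 0.9976
```

The iteration converges because φ is increasing and convex on [0, 1], so the iterates climb monotonically to the smallest fixed point.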

3. (Text, Exercise 3.1, page 82.) Suppose that the number of calls per hour arriving at an answering service follows a Poisson process with λ = 4.

(a) What is the probability that fewer than two calls come in the first hour?


(b) Suppose that six calls arrive in the first hour. What is the probability that at least two calls will arrive in the second hour?

(c) The person answering the phones waits until fifteen phone calls have arrived before going to lunch. What is the expected amount of time that the person will wait?

(d) Suppose it is known that exactly eight calls arrived in the first two hours. What is the probability that exactly five of them arrived in the first hour?

(e) Suppose it is known that exactly k calls arrived in the first four hours. What is the probability that exactly j of them arrived in the first hour?

Solution. (a) We know that the probability of k calls by time t is

P(X_t = k) = e^{−λt}(λt)^k/k!.

The probability of fewer than 2 calls in the first hour is

P(X_1 = 0) + P(X_1 = 1) = e^{−4} + 4e^{−4} = 5/e^4 ≈ 0.092.

(b) From the result of (a), the probability of at least 2 calls in the first hour is 1 − 5/e^4 ≈ 0.908. Because the increments X_2 − X_1 and X_1 − X_0 are independent and identically distributed, the probability that at least two calls arrive in the second hour is the same as in the first hour. So this probability is approximately 0.908.

(c) Let T_1, ..., T_15 be the times between consecutive calls. We know that these are independent exponentially distributed random variables with parameter λ = 4. Thus the expected time is

E(T_1 + ··· + T_15) = 15E(T_1) = 15/λ = 15/4.

Therefore, the expected time is 15/4 hours.
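Parts (a)-(c) can be checked numerically with R's built-in Poisson distribution functions (a verification sketch, not part of the original solution):

```r
#Poisson checks with rate lambda = 4 calls per hour.
lambda=4
a=ppois(1,lambda)    #part (a): P(at most 1 call in the first hour) = 5/e^4, about 0.092
b=1-a                #part (b): P(at least 2 calls in an hour), about 0.908
cc=15/lambda         #part (c): expected wait for 15 calls, 15/4 = 3.75 hours
c(a,b,cc)
```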

(d) We want to find the probability P(X_1 = 5 | X_2 = 8). The definition of conditional probability implies that

P(X_1 = 5 | X_2 = 8) = P(X_1 = 5, X_2 = 8)/P(X_2 = 8) = P(X_2 = 8 | X_1 = 5)P(X_1 = 5)/P(X_2 = 8).

This gives

P(X_1 = 5 | X_2 = 8) = P(X_2 − X_1 = 3)P(X_1 = 5)/P(X_2 = 8)

= [(4·1)³/3!]e^{−4·1} · [(4·1)⁵/5!]e^{−4·1} / ( [(4·2)⁸/8!]e^{−4·2} )

= C(8, 5)(1/2)⁸ = 7/32,

where we used the independence of the increments X_2 − X_1 and X_1. Therefore,

P(X_1 = 5 | X_2 = 8) = 7/32.

(e) We now want the conditional probability P(X_1 = j | X_4 = k) for k ≥ j. The argument is the same as in (d): given X_4 = k, the number of those k calls that fall in the first hour is binomial with k trials and success probability 1/4. The result is

P(X_1 = j | X_4 = k) = C(k, j)(1/4)^j (3/4)^{k−j}.
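The binomial form of the answers in (d) and (e) is easy to confirm in R (a verification sketch; the values k = 10 and j = 3 in the second check are just an illustration, not from the problem):

```r
#Part (d): P(X_1 = 5 | X_2 = 8) = C(8,5)(1/2)^8 = 7/32 = 0.21875
d=dbinom(5,size=8,prob=1/2)
#Part (e) for k = 10, j = 3: C(10,3)(1/4)^3(3/4)^7
e1=dbinom(3,size=10,prob=1/4)
e2=choose(10,3)*(1/4)^3*(3/4)^7   #same value, computed directly
c(d,e1,e2)
```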

4. Do a stochastic simulation of the Markov chain of exercise 3.5, page 84 of the textbook: X_t is the Markov chain with state space {1, 2} and rates α(1,2) = 1, α(2,1) = 4. Then

(a) Plot the graph of a sample history of the process over the time interval [0, 20].

(b) Give the mean (sample) waiting time.

(c) What should be the exact value of the mean waiting time for a very long sample history? (Check whether your experimentally obtained result is close to the exact value. For better precision, you may take a longer time interval, say [0, 500].)

Solution. (a) The graph of the sample history over the time interval from 0 to 20 is shown below:


Figure 14: Sample history of the process of problem 4.

This graph can be obtained using the program

#We generate a sample history of the process up to time tmax
tmax=20

#The exponential parameters are:
lambda=1
mu=4

#Initialize the current time to 0
t=0

#The vector 'times' will record the state transition times
times=matrix(0,1,1)

#The vector 'states' will record the sequence of states,
#which may be either 0 or 1
states=matrix(0,1,1)

#The step index is n
n=1

while (t<tmax) {
if (states[n]==0) {
dt=rexp(1,lambda)
states[n+1]=1
} else {
dt=rexp(1,mu)
states[n+1]=0
}
t=t+dt
times[n]=t
n=n+1
}

#We can visualize the history by plotting states
#against times. Note the use of the 'step function' operation.
#Check ?stepfun for the details of how to use it.
F=stepfun(times,states,f=0)
plot(F,xlab='Time',xlim=range(c(0,tmax)),ylab='State',main='A sample history')
grid()

(b) To obtain the mean waiting time, we may do as follows: run the previous program with a larger value of tmax, say tmax=500. Then

n=length(times)
times_shift=c(0,times[1:(n-1)])
difference=times-times_shift
mean(difference)

gives the mean waiting time. I obtained the value 0.621.

(c) The exact value can be obtained as follows. The mean waiting time in state 1 (before a jump to 2) is 1/λ = 1, and in state 2 it is 1/µ = 1/4. Because the two states alternate, the mean one-step waiting time over a long history is the average (1 + 1/4)/2 = 0.625. So the value obtained in (b) is reasonable.
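A quick independent check of the value 0.625 (not required by the problem): over a long history roughly half the waits are Exp(1) and half are Exp(4), so their pooled sample mean should be near (1 + 1/4)/2.

```r
#Pool a large number of waiting times from each state and average them.
set.seed(2)
n=10^5
waits=c(rexp(n,1),rexp(n,4))
mean(waits)   #close to 0.625
```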
