1.4 Thesis overview
2.1.3 Absorbing CTMCs
For any J ĎS, if Pijptq ‰0 for some tě0 implies j PJ for all iPJ then we say the set J is closed.
Definition 36 (Absorbing state).
If for all j‰i, i, jPS, Pijptq “0 for all tě0 then we say that state iis absorbing. Definition 37 (Absorbing set).
We call the collection A of all absorbing states of S the absorbing set, and A is necessarily closed.
Definition 38 (Absorbing CTMC).
We call a CTMC tXptq :t ě0u absorbing if every state is either absorbing or tran- sient.
Not all CTMCs with absorbing states fit our definition of an ‘absorbing CTMC’. The results which follow do not necessarily hold for such processes.
Definition 39 (Transient set).
For an absorbing CTMC with state space S and absorbing set A we call the set of transient statesS˚
“SzAthe transient set. We say that the transient set is irreducible if it is a communicating class.
For the following discussion, suppose that an absorbing CTMC tXptq : t ě 0u has state space S, absorbing set A, transient set S˚ and generator matrix Q “ rq
ijs.
Denote thejth (arbitrarily ordered) absorbing state by a
j, and letvj “ rvijsiPS˚ be a
vector such thatvij “qiaj for alliPS
˚ for eachaj
PA. Further, letV“ rvijsiPS,ajPA
30 Markov processes
Definition 40 (Subgenerator matrix).
We define the subgenerator matrix by Q˚“ rq
ijsi,jPS˚.
The canonical form of the generator matrixQ is given by the block matrix form Q“ « Q˚ V O O ff . (2.26)
In the case of a single absorbing state, V is a vector, and we writeV“v so that the canonical form of the generator is given by
Q“ « Q˚ v 0 0 ff . (2.27)
The following proposition and theorem are proved by Theorem 2.4.3 in Latouche and Ramaswami [72].
Proposition 9.
The subgenerator matrix Q˚ of an absorbing CTMC is invertible.
Theorem 8.
Given an absorbing CTMC,limtÑ8PpXptq “iq “0for alliPS˚. That is, absorption into some jPA occurs eventually with probability equal to 1.
Definition 41 (Time-to-absorption).
We call random variable T “mintt:Xptq “i, iPAu the time-to-absorption. Definition 42 (Phase-type distribution).
Consider an absorbing CTMC tXptq :tě0u and initial distribution α “ rαis, where
αi “ PpXp0q “ iq for all i P S. The phase-type distribution is the distribution of
time-to-absorption of such a process, and is parameterized by the subgenerator matrix Q˚ together with initial distribution α; we write T
„ P HpQ˚, αq to denote such distribution.
Most often, absorbing CTMCs are considered with only one absorbing state, and the phase-type distribution is usually defined as such. Throughout this thesis we will be particularly interested in absorbing CTMCs with multiple absorbing states. Therefore, we give a corresponding treatment of the phase-type distribution.
A proof of the following theorem follows from a very slight modification (to account for multiple absorbing states) of Theorem 2.4.1 in Latouche and Ramaswami [72].
Markov processes 31
Theorem 9.
The phase-type distribution P HpQ˚, αq has cumulative distribution function
Fptq “PpT ătq “1´αeQ˚t1. (2.28)
Further, P HpQ˚, αq has probability density function
fptq “F1ptq “ ´αeQ˚t
Q˚1
“αeQ˚tV1. (2.29)
The two alternative forms are due to the property Q1 “ 0 of the generator of the Markov chain, which gives
Q˚1
`V1“0,
Q˚1
“ ´V1. (2.30)
In the case of a single absorbing state V1 is replaced throughout by v. Definition 43 (Hazard function).
The hazard functionλiptq given that the process starts in stateiPS˚ is defined for all
tě0 as, λiptq “ lim hÑ0` PptăT ăt`h | T ąt, Xp0q “iq h “ fiptq 1´Fiptq, (2.31)
where fiptq is the probability density of absorption occurring at time t given that the process starts in state i, and Fiptq is the corresponding cumulative distribution func- tion.
The hazard function can be interpreted as the conditional (on not having been ab- sorbed before time t) expected exponential rate of absorption at timet.
Definition 44 (Survival function).
The survival function Siptq given that the process starts in state i is defined for all
tě0 as,
Siptq “1´Fiptq. (2.32)
The survival function can be interpreted as the probability that the process has not been absorbed by timet.
The next theorem follows from Theorem 9 (Theorem 2.4.1 in [72]), and the fact that
T „P HpQ˚, e
32 Markov processes
Theorem 10.
For all iPS˚ and all tě0, we have
λiptq “ eie Q˚t V1 eieQ˚t 1 , (2.33) and Siptq “eieQ˚t1. (2.34) Definition 45 (Time-to-absorption intoj,J).
For each statejPA, we define the time-to-absorption intoj byTj “mintt:Xptq “ju
given that such t exists, and Tj “ 8 otherwise.
The definition for time-to-absorbption into a set J Ď A is analogous, with TJ “
mintt:Xptq PJu, given sucht exists, and TJ “ 8 otherwise. Definition 46 (Cause-specific hazard function).
Suppose that ||A|| ą1 (i.e. suppose that process tXptqu is associated with more than one absorbing state). We define the cause-specific hazard function associated with state (cause) j PA given the process starts in state ias
λijptq “ lim hÑ0` PptăT ăt`h, XpTq “j | T ąt, Xp0q “iq h “ fijptq 1´Fiptq, (2.35)
where fijptq the probability density associated with Tj and Fiptq is the cumulative
distribution function associated with T, each given that the process starts in state i. The definition for the cause-specific hazard rate associated with a set J ĎA is anal- ogous, λiJptq “ lim hÑ0` PptăT ăt`h, XpTq PJ |T ąt, Xp0q “iq h ““ fiJptq 1´Fiptq, (2.36)
where fiJ is the probability density function associated withTJ given that the process starts in state i.
The cause-specific hazard function is interpreted as the contribution to the hazard function associated with state j conditional on not having been absorbed into any
kPAbefore timet. In Chapter 3 we introduce a modified-cause-specific hazard rate, conditional only on not having been absorbed into some subset ofA, which is relevant to the evolution of gene duplicates.
By the analysis of the Markov chain, applying Proposition 4 we have
fijptq “ ” eieQ ˚t V ı j “eieQ˚tvj, (2.37)
Markov processes 33
where vj is the jth column of V. Together with Theorem 9 (Theorem 2.4.1 in [72]), we have the following proposition.
Proposition 10.
For all iPS˚ and jPA
λijptq “
eieQ˚tvj eieQ˚t
1. (2.38)
The following proposition follows from the fact that eventstXptq “iuandtXptq “ju
are disjoint for i‰j. Proposition 11.
For all iPS˚, J ĎA, we have
fiJptq “ ÿ jPJ fijptq, λiJptq “ ÿ jPJ λijptq, fiptq “ ÿ jPA fijptq, λiptq “ ÿ jPA λijptq. (2.39)
Definition 47 (Probability given initial distribution).
We denote the probability of an event A given initial distribution α0 by Pα0pAq.
We denote the probability of an eventA conditional on event B given initial distribu- tion distribution α0 by Pα0pA | Bq.
The following proposition follows immediately from the law of total probability Proposition 12.
For any distribution α0 and event A,
Pα0pAq “
ÿ
iPS
α0iPpA|Xp0q “iq. (2.40)
The next proposition establishes thatPα0p.qis analogous toPp.qin terms of conditional probabilities.
Proposition 13.
For any distribution α0 and events A, B Pα0pA |Bq “
Pα0pA, Bq
34 Markov processes
Proof.
Applying the law of total probability (and noting that B and tXp0q “ iu are not necessarily independent) we have
Pα0pA|Bq “ ÿ iPS PpA |B, Xp0q “iqPpXp0q “i|Bq “ÿ iPS PpA, B, Xp0q “iqPpXp0q “i, Bq PpB, Xp0q “iqPpBq “ ř iPSPpA, B, Xp0q “iq PpBq “ ř iPSPpA, B |Xp0q “iqPpXp0q “iq PpBq (2.42)
Applying the law of total probability to the denominator we have
Pα0pA |Bq “ ř iPSPpA, B |Xp0q “iqPpXp0q “iq ř iPSPpB |Xp0q “iqPpXp0q “iq “ ř iPSα0iPpA, B |Xp0q “iq ř iPSα0iPpB |Xp0q “iq (2.43) Finally, applying Proposition 12 we have
Pα0pA |Bq “ Pα0pA, Bq
Pα0pBq . (2.44)
The following proposition combines Proposition 4 with the observations from [72; 87] (e.g. the first equation in the proof of Theorem 2.4.1 in [72] is enough).
Proposition 14 (Distribution over transient states).
For an absorbing CTMC, the distribution over the transient statesP˚ptq “ rP
ijptqsi,jPS˚
is given by, for all tě0,
P˚ptq “eQ˚t. (2.45)
Definition 48 (Conditional transient distribution).
We definepiptqto be the distribution over the transient states conditional on the process not having been absorbed, and given start in state iPS˚. That is,p
iptq “ rpijptqsjPS˚
where for all jPS˚ for each iPS˚,
pijptq “PpXptq “j | Xptq RA, Xp0q “iq. (2.46)
When the initial distribution in the transient states is α0 “ rα0jsjPS˚, we denote the
analogous distribution by pα
0
ptq. That is pα
0
ptq “ rpα0jptqs where, for allj PS˚,
Markov processes 35
The following proposition is stated without demonstration in [31]. We provide a proof here.
Proposition 15.
For all tě0, and any initial distribution in the transient states α0,
pα 0 ptq “ α0e Q˚t α0eQ˚t 1. (2.48) Proof.
For eachjPS˚, and alltě0, we have
pα0jptq “Pα0pXptq “j |Xptq RAq. (2.49) Applying Proposition 13 we have
pα0jptq “ Pα0pXptq “j, Xptq RAq Pα0pXptq RAq “ Pα0pXptq “jq Pα0pXptq RAq , (2.50)
since the events tpXptq “juand tpXptq “jq X pXptq RAquare equivalent. Now, applying Proposition 12, and noting that PpXptq P S˚
| Xp0q R S˚q “ 0, we have pα0jptq “ ř iPS˚α0iPpXptq “j |Xp0q “iq ř iPS˚α0iPpXptq RA|Xp0q “iq “ ř iPS˚α0iPijptq ř iPS˚α0iPpT ąt |Xp0q “iq . (2.51)
Now applying Proposition 14 and Theorem 9 we have
pα0jptq “ ř iPS˚α0ieieQ ˚t ej ř iPS˚α0jeieQ˚t1 “ α0eQ˚tej α0eQ˚t 1, (2.52)
which in vector form is
pα0ptq “ α0eQ˚t α0eQ˚t 1. (2.53) Corollary 3. For all tě0, p iptq “ eieQ ˚t eieQ˚t 1. (2.54)
36 Markov processes
Definition 49 (Quasi-stationary distribution).
A distribution α “ rαisiPS˚ is called a quasi-stationary distribution (QSD) if, for all
tě0
pαptq p
αptq1
“α. (2.55)
As shown by Darroch and Seneta [31], when the set of transient states is irreducible the quasi-stationary distribution is unique. We will be primarily concerned with the case that the QSD is unique. Van Doorn and Pollett [121] provide a comprehensive review of quasi-stationary distributions, including conditions for existence and uniqueness, and a guide for computing QSDs in MATLAB.
Definition 50 (Yaglom Limit).
The Yaglom limit given initial distribution α0 is defined (given the limits exist) by
yα
0
“ ryjsjPS˚, where for all jPS˚,
yj “ lim
tÑ8
Pα0pXptq “jq
Pα0pXptq RAq “tlimÑ8Pα0pXptq “j | Xptq RAq. (2.56)
The following observation (stated here as a proposition) follows immediately from Definitions 48 and 50.
Proposition 16.
For initial distribution α0 we have
y
α0 “tlimÑ8pα0
ptq. (2.57)
Theorems 1 and 2 of Vere Jones [122] prove that the Yaglom limit is a quasi-stationary distribution. Thus, in the case of a unique QSD, the Yaglom limit is also unique, and we denote it by y. We can interpret yi “ αi as the long run probability that the
process is in state igiven that it has not been absorbed [31].
When the Yaglom limit depends on the initial distribution, it remains the case that every Yaglom limit is a QSD, and every QSD is a Yaglom limit for some initial distribu- tion (namely, itself) [121]. We say that a QSDα is associated with initial distribution
α0 when α“y
α0.
Definition 51 (Ratio of means distribution).
We define the ratio of means distributionα1 “ rα1jsjPS˚ associated with initial distri-
bution α0 by, for all jPS˚,
α1j “ Eα0pT
˚
jq
Likelihood and model selection 37 or equivalently α1 “ Eα0pT ˚ q Eα0pTq , (2.59) where T˚
j is a random variable tracking time spent in state j PS˚ before absorption,
and T˚“ rT˚
jsjPS˚.
The ratio of means distribution was introduced by Darroch and Seneta [30], but has received relatively little attention. Darroch and Seneta [30] noted that the ratio of means distribution is dependent on the initial distribution, while (in the context under discussion) the quasi-stationary is not. Artalejo and Lopez-Herrero [3] revived some interest in this distribution by noting that in certain contexts, dependence on the initial distribution is a desirable property. Often the initial distribution is known, and the assumption that the process is at equilibrium is not well justified, but analysis of long term behaviour is still pertinent.
Artalejo and Lopez-Herrero [3] discussed the examples of population biology, and in particular, epidemic models. In Chapter 5 we will see that this distribution also has applications in evolutionary biology.
Our final result for this section follows from Proposition 4, and Theorem 9, and was first shown by Darroch and Seneta [31]. Darroch and Seneta [30] also showed an equivalent result for the discrete-time case.
Theorem 11.
For any initial distribution α0, the ratio of means distribution is given by
α1 “ ş8 0 α0eQ ˚t ş8 0 α0eQ ˚t 1 “ α0pQ ˚ q´1 α0pQ˚q´11. (2.60)