2.5 Branching Process Conditioned On Total Progeny
2.5.7 Branching Process Approximation To Finite Random Digraph
In Section 2.5.5 we showed that the Poisson offspring branching process is invariant to the rate of the Poisson distribution. As described in Section 2.5.2, a branching process with Poisson offspring can be used to approximate an epidemic with fixed infectious periods in its early stages. We now investigate this approximation using the random digraph representation, to give a relationship between the rank chain of the digraph and the path of the branching process.
The random digraph conditioned on connectedness and the branching process conditioned on its total progeny are both tools to consider the final size of an epidemic. Since the branching process approximation assumes a large population, it would seem that increasing s, the number of susceptibles in the finite graph, should affect how close the two processes are. In particular, the random digraph is λ-dependent whereas the branching process is not.
For the random digraph, consider the constant infectious period case and let c = 1 without loss of generality. Figure 2.15 shows the effect of varying λ at various values of s, given one root node and a conditioned connectedness of five, on the expected size of the first generation, X1. The graph was produced using the expression derived in Section 2.4.4 for the digraph simulations.
In Figure 2.15 a horizontal line has been added to represent the equivalent branching process, with a single ancestor and conditioned total progeny of five, i.e. a = 1 and k = 5. The line is horizontal as the path probabilities, and hence the expectations, are invariant to λ. The line corresponds to the digraph in the limit of an infinite population, i.e. letting s → ∞.
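The λ-invariance of the branching process line can be checked numerically. The following is a minimal sketch only: it does not use the path probabilities of Section 2.5.5, but instead assumes the standard hitting-time identity for the total progeny of j ancestors, P(T_j = n) = (j/n) P(S_n = n - j), where S_n is Poisson with mean nλ. The function names are hypothetical and chosen for illustration.

```python
from math import exp, factorial


def poisson_pmf(n, mean):
    """Probability that a Poisson(mean) variable equals n."""
    return exp(-mean) * mean**n / factorial(n)


def joint_prob(j, k, lam):
    """P(X1 = j, T = k) for one ancestor with Poisson(lam) offspring.

    Assumption: the hitting-time identity gives the probability that
    j first-generation individuals have total progeny n = k - 1.
    """
    n = k - 1  # individuals other than the ancestor
    return poisson_pmf(j, lam) * (j / n) * poisson_pmf(n - j, n * lam)


def expected_first_generation(k, lam):
    """E(X1 | T = k) for a single ancestor, k >= 2."""
    sizes = range(1, k)
    weights = [joint_prob(j, k, lam) for j in sizes]
    return sum(j * w for j, w in zip(sizes, weights)) / sum(weights)


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5, 2.0):
        print(lam, expected_first_generation(5, lam))
```

Under this identity every factor of λ cancels between numerator and denominator, so each line should report the same value, 8/5, for a = 1 and k = 5.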
[Figure 2.15 plot: E(X1) against λ, with digraph curves for s = 5, 15, 25, 75 and a horizontal line for the branching process (BP).]
Figure 2.15: Comparing the expected size of the first generation in a conditioned random digraph and a conditioned Poisson branching process. With a single root and ancestor, r = a = 1, and conditioned on d = k = 5, while varying λ. The digraphs have different numbers of initial susceptibles s.
seventy five susceptibles, s = 75, which is not particularly large. The approximation is fairly good even for small populations. Conversely, for the smallest population shown, s = 5, the approximation is very poor.
Finally, the limiting behaviour illustrated in Figure 2.15 can be shown algebraically with the following example. The step probabilities of a conditioned random digraph are s-invariant for suitable λ (see Section 2.4.9), although the number of susceptibles, s, is implicit in the definition of λ. Using the example derived in Section 2.4.7.1, for
(r, s, d) = (1, s, 2) and letting s → ∞ we have,
\begin{align*}
P_{1,s}[Z_1 = (1,2) \mid Z_0 = (1,1),\, D = 2] &= \frac{2e^{-\frac{\lambda c}{r+s}}}{1 + 2e^{-\frac{\lambda c}{r+s}}} \\
&\to \frac{2}{3} \quad \text{as } s \to \infty \\
&= P_1[W_1 = (1,2) \mid W_0 = (1,1),\, T = 2].
\end{align*}
That is, the probability tends to the conditioned step probability of the equivalent conditioned branching process, with a Poisson offspring distribution corresponding to the fixed infectious period.
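The convergence can also be seen numerically. The sketch below assumes the step probability for (r, s, d) = (1, s, 2) reads 2e^{-λc/(r+s)} / (1 + 2e^{-λc/(r+s)}); the function name and the default values of c and r are illustrative choices, not part of the derivation in Section 2.4.7.1.

```python
from math import exp


def digraph_step_prob(s, lam, c=1.0, r=1):
    """Conditioned digraph step probability for (r, s, d) = (1, s, 2).

    Assumption: the expression is x / (1 + x) with
    x = 2 * exp(-lam * c / (r + s)).
    """
    x = 2.0 * exp(-lam * c / (r + s))
    return x / (1.0 + x)


if __name__ == "__main__":
    # Population sizes from Figure 2.15, plus one very large value.
    for s in (5, 15, 25, 75, 10**6):
        print(s, digraph_step_prob(s, lam=1.0))
```

As s grows the exponent tends to zero, so the printed values increase towards 2/3, the branching process value.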
Since the digraph step probabilities cannot be expressed in an easily obtainable algebraic form, we cannot generalise this to an arbitrary step probability. However, by the definition of the branching process approximation, we expect the same limit to hold.
Chapter 3
Inference For Final Size Data Using Markov
Chain Monte Carlo Methods
3.1 Introduction And Motivation
We are motivated by the need to analyse epidemic data, specifically final size data, to gain insight from previous outbreaks. As discussed in Sections 1.2.6 and 2.1, observations of epidemics are often incomplete with regard to each individual, and often cover only a subset of the population.
In this chapter we consider the following problem. Given final size data and a stochastic epidemic model, what can be inferred about the parameters of this model from the data, and what insight can be gained?
The final size data will consist only of counts of the number of susceptibles infected by the end of the epidemic. We shall use the stochastic Susceptible-Infective-Removed (SIR) epidemic model defined in Section 1.2.2, using the directed random graph representation investigated in Chapter 2. To make statistical inference about the parameters of the model given the data we use Markov Chain Monte Carlo (MCMC) methods, as outlined in Section 1.3.2, and we shall present update algorithms specific to final size data.
For final size data, the likelihood under the simple SIR model is intractable, since the only information in the data is the state of the population at the beginning and end of the epidemic. There is no explicit information about the start of the epidemic; specifically, the number of initial infectives is unknown. At first we shall assume a single initial infective; later this number will be treated as another unknown parameter. To proceed we must augment the likelihood with sufficient information about the course of the epidemic to obtain a tractable expression. The imputed course of the epidemic will take the form of the representations investigated in Chapter 2: first we shall consider the edge representation and then the generation representation.
Using MCMC we will make inference for the infection rates of the SIR model. Without more detailed temporal information, it is not possible to make inference about the infectious period directly. For the fixed infectious period case, it is impossible to separate the infection rate and the infectious period; the two parameters are indistinguishable. In this chapter we shall only consider a fixed infectious period, which will be treated as a known constant of the model.
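A small numerical illustration shows why the two parameters cannot be separated. It is a sketch under an assumed form for the contact probability: each infective contacts a given individual at rate λ divided by the population size, for a fixed period c, so that a directed edge is present with probability 1 - e^{-λc/N}. The function and argument names are hypothetical.

```python
from math import exp, isclose


def contact_prob(rate, period, population):
    """Probability that an infective contacts a given individual.

    Assumption: contacts with each individual occur at rate
    rate / population throughout a fixed infectious period.
    """
    return 1.0 - exp(-rate * period / population)


if __name__ == "__main__":
    n = 100
    # Doubling the rate while halving the period leaves rate * period fixed.
    p1 = contact_prob(rate=2.0, period=1.0, population=n)
    p2 = contact_prob(rate=1.0, period=2.0, population=n)
    print(p1, p2, isclose(p1, p2))
```

Only the product of the rate and the period enters the edge probability, so final size data generated through such edges can inform that product alone, which is why the infectious period is held fixed.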
In Section 3.2 we consider the simple SIR epidemic model with a single type of individual, treating the course of the epidemic as missing data that we impute. Imputation of edges has been investigated by Demiris and O'Neill (2005a); we present this approach alongside the generation representation developed in Chapter 2. The two methods are compared using sample data sets.
We define an epidemic with missing data to include all types of data that are observations of a process omitting some detail, e.g. the exact infection and removal times. Final size data can then be viewed as an example of missing data, where the infection times, the removal times and which individuals infect each other are unknown. Partially observed epidemics are defined to be those where the data only represent a subset of the population, i.e. a specified fraction of the total population. In Section 3.3 we extend the algorithm to enable the analysis of such data.
Including unobserved individuals naturally leads to incorporating multiple types of individuals, some of which may be unobserved. We briefly expand the algorithm in Section 3.4, though we present a more complete general framework in Section 3.5, allowing individuals to have multiple levels of mixing. Thus, the general framework allows for arbitrary types of individuals and an arbitrary number of levels of mixing, together with a general form for the rates of contacts among individuals. There are limits to the type of model that can be fitted, namely no temporal effects can be included, e.g. weekday-weekend cycles. Also, the data may be too sparse to implement the complicated general model, which may lead to overfitting or poorly converging Markov Chain Monte Carlo algorithms.
The multi-type multi-level algorithm is applied to the household data presented in Longini et al. (1988), and comparisons are made to the edge imputation methods of O'Neill (2009) on the same data set.
Finally, in Section 3.7 we discuss practical aspects of implementing the MCMC algorithms, in particular the use of parallel computing with GNU OpenMP and the need for arbitrary precision arithmetic using GNU MPFR. To implement the MCMC algorithms in the C programming language we have also used the GNU Scientific Library (GSL); see Galassi et al. (2003) for further details.