Modelling and Bayesian analysis of the Abakaliki smallpox data


Modelling and Bayesian Inference of the Abakaliki Smallpox Data: Supplementary Material

We describe (i) for the likelihood, how to integrate out the unknown protection statuses of individuals outside the compound; (ii) the details of the MCMC algorithm. For ease of reference we start by recalling the notation used.

1 Notation

The principal notation used is listed in Table 1, most of which is also used in the main manuscript. We also extend the notation of the main manuscript for sets of individuals; specifically, $N^a_b$ is used to denote a set of individuals with characteristics $a$ and $b$, where $a$ denotes location (such as OC, meaning outside the compound) and $b$ denotes status such as FTC or n-FTC (i.e. not FTC), sus (initially susceptible), inf (ever infected) or n-inf (never infected). Note that ‘initially susceptible’ can mean either ‘not vaccinated’ or ‘vaccinated, but not protected’.

2 Transformation of the Likelihood

We now explain how to write the likelihood in a form that does not include the protection statuses for individuals outside of the compounds. In section 2.1 we will show that

\[
\pi(\mathbf{r}, \boldsymbol{\gamma} \mid \Phi) = \pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}}, x, y \mid \Phi), \tag{1}
\]

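where, instead of separate protection statuses for each outside individual, we only require the quantities $x$ and $y$, where

\[
\begin{aligned}
x &= \text{number of FTC, vaccinated, unprotected but never infected individuals outside the compounds},\\
y &= \text{number of non-FTC, vaccinated, unprotected but never infected individuals outside the compounds},
\end{aligned}
\]

and where $\tilde{\boldsymbol{\gamma}} = (\mathbf{e}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}, \tilde{\mathbf{p}})$ with $\tilde{\mathbf{p}} = (p_0, p_1, \ldots, p_{n_{\mathrm{com}}-1})$. Thus $\tilde{\boldsymbol{\gamma}}$ is the same as $\boldsymbol{\gamma}$ but with protection statuses for individuals inside the compounds only.

Table 1: Principal notation.

Parameter                      Meaning
$N$                            Population size
$n_{\mathrm{com}}$             Number of individuals in the compounds
$\lambda_a$                    Global infection rate
$\lambda_f$                    FTC infection rate
$\lambda_h$                    Compound same-confession infection rate
$b$                            Infectivity factor during fever
$v$                            Vaccine efficacy
$t_q$                          Time quarantine measures introduced
$\boldsymbol{\theta}$          Fixed parameters for disease progression
$\kappa$                       Identity of initial infective
$e_\kappa$                     Exposure time of $\kappa$
$\mathbf{s}$                   Vector of vaccination statuses (all individuals)
$\mathbf{p}$                   Vector of protection statuses (all individuals)
$\tilde{\mathbf{p}}$           Vector of protection statuses (compound individuals only)
$\Phi$                         $(\kappa, e_\kappa, t_q, b, v, \lambda_a, \lambda_f, \lambda_h, \boldsymbol{\theta}, \mathbf{s})$
$\mathbf{e}$                   Vector of exposure times other than $e_\kappa$
$\mathbf{i}$                   Vector of fever-period start times
$\mathbf{r}$                   Vector of rash times (data)
$\boldsymbol{\tau}$            Vector of removal times
$\mathbf{q}$                   Vector of quarantine times
$\boldsymbol{\gamma}$          $(\mathbf{e}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}, \mathbf{p})$
$\tilde{\boldsymbol{\gamma}}$  $(\mathbf{e}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}, \tilde{\mathbf{p}})$
$N_{\mathrm{inf}}$             Set of ever-infected individuals
$N^a_b$                        Set of individuals with location $a$ and status $b$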

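Given (1), $x$ and $y$ can then be integrated out to yield a likelihood which is computationally much faster to calculate. First note that

\[
\pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}}, x, y \mid \Phi) = \pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}} \mid x, y, \Phi)\, \pi(x, y \mid \Phi)
\]

and hence

\[
\pi(\Phi, \tilde{\boldsymbol{\gamma}}, x, y \mid \mathbf{r}) = \frac{\pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}} \mid x, y, \Phi)\, \pi(x, y \mid \Phi)\, \pi(\Phi)}{\pi(\mathbf{r})}.
\]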

Integrating out x and y, which is equivalent to summing in this case since they take discrete values, we obtain the new target density

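\[
\pi(\Phi, \tilde{\boldsymbol{\gamma}} \mid \mathbf{r}) = \sum_{x,y} \pi(\Phi, \tilde{\boldsymbol{\gamma}}, x, y \mid \mathbf{r}) = \frac{\pi(\Phi)}{\pi(\mathbf{r})} \sum_{x,y} \pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}} \mid x, y, \Phi)\, \pi(x, y \mid \Phi).
\]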
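Because $x$ and $y$ take finitely many values, the sum above can in principle be evaluated by brute force. The following is a minimal sketch of such a sum in log space, assuming a user-supplied function `log_term(x, y)` that returns the log of one summand; the function names and the toy summand (a product of two binomial probabilities, which must sum to one) are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import product


def log_sum_over_xy(log_term, x_max, y_max):
    """Return log( sum_{x=0..x_max} sum_{y=0..y_max} exp(log_term(x, y)) )."""
    logs = [log_term(x, y)
            for x, y in product(range(x_max + 1), range(y_max + 1))]
    m = max(logs)
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    return m + math.log(sum(math.exp(t - m) for t in logs))


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes (0 < p < 1)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def toy_log_term(x, y):
    """Hypothetical summand: independent binomial weights for x and y."""
    return log_binom_pmf(x, 3, 0.2) + log_binom_pmf(y, 4, 0.5)


# The toy weights are probabilities, so the total is 1 and its log is ~0.
toy_total = log_sum_over_xy(toy_log_term, 3, 4)
```

The cost of this direct approach grows with the product of the ranges of $x$ and $y$, which is why a closed form for the sum is preferable when one exists.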

2.1 Derivation of equation (1)

First, note that both $x$ and $y$ are determined by the data, the vaccination status of individuals outside the compound, and the protection status of individuals outside the compound. Since only the latter is unknown, we may think of $x$ and $y$ as functions of $\mathbf{p}$, or more precisely those elements of $\mathbf{p}$ that correspond to vaccinated individuals outside the compound. In particular, summation over all values of $x$ and $y$ is equivalent to summation over such elements of $\mathbf{p}$.

Recall that Λ(t) denotes the total infectious pressure acting on all susceptibles at time t, and that we have

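\[
\begin{aligned}
\pi(\mathbf{r}, \boldsymbol{\gamma} \mid \Phi) ={}& \Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) \times \exp\left( -\int_{e_\kappa}^{T} \Lambda(t)\, dt \right) \\
&\times \prod_{j \in N_{\mathrm{inf}}} f_I(i_j - e_j)\, f_F(r_j - i_j)\, f_R(\tau_j - r_j)\, f_Q(q_j - \max(r_j, t_q)) \\
&\times v^{\sum_{r=0}^{N-1} p_r s_r}\, (1-v)^{\sum_{r=0}^{N-1} (1-p_r) s_r}.
\end{aligned}
\tag{2}
\]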

We may write $\Lambda(t) = \Lambda_{OC}(t) + \Lambda_{CN}(t) + \Lambda_{CC}(t)$, where (i) $\Lambda_{OC}(t)$ is the pressure on individuals outside the compounds; (ii) $\Lambda_{CN}(t)$ is the pressure on those inside the [...] infected. Then the only terms in (2) with unknown protection statuses for outside individuals are $\Lambda_{OC}(t)$ and the terms involving $v$. We now expand the likelihood of the protection status, according to whether each individual is inside/outside and by whether they do/do not become infected, as follows:

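\[
\begin{aligned}
v^{\sum_{r=0}^{N-1} p_r s_r}\, (1-v)^{\sum_{r=0}^{N-1} (1-p_r) s_r}
={}& v^{\sum_{r=0}^{n_{\mathrm{com}}-1} p_r s_r}\, (1-v)^{\sum_{r=0}^{n_{\mathrm{com}}-1} (1-p_r) s_r} \\
&\times v^{\sum_{r=n_{\mathrm{com}}}^{N-1} p_r s_r}\, (1-v)^{\sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{inf}}}}^{N-1} (1-p_r) s_r} \\
&\times (1-v)^{\sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{n\text{-}inf}}}}^{N-1} (1-p_r) s_r}.
\end{aligned}
\]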

The terms corresponding to individuals inside the compounds do not depend on x and y, and neither does the term concerning outside infectives, since their protection status is known. Thus the part of the likelihood dependent on x and y, which are determined by p as previously explained, is

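\[
L_{x,y}(\mathbf{p}) = \exp\left( -\int_{e_\kappa}^{T} \Lambda_{OC}(t)\, dt \right) v^{\sum_{r=n_{\mathrm{com}}}^{N-1} p_r s_r}\, (1-v)^{\sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{n\text{-}inf}}}}^{N-1} (1-p_r) s_r}. \tag{3}
\]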

The integral of $\Lambda_{OC}$ is the sum over all infectives $j$ of the pressure from $j$ to any given FTC or non-FTC individual outside of the compounds (throughout all time $(e_\kappa, T)$), summed over all the initially susceptible FTC and non-FTC individuals. Thus,

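\[
\exp\left( -\int_{e_\kappa}^{T} \Lambda_{OC}(t)\, dt \right) = \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \sum_{k \in N^{\mathrm{oc}}_{\mathrm{sus}}} \Psi_{jk} \Bigg) \tag{4}
\]

with $\Psi_{jk}$ = total infectious pressure from $j$ to susceptible $k$ during the time interval $(e_\kappa, T)$.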

We now expand the right-hand side of (4) according to whether susceptibles become infected or not, and whether they are FTC or not, yielding

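\[
\begin{aligned}
\exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \sum_{k \in N^{\mathrm{oc}}_{\mathrm{sus}}} \Psi_{jk} \Bigg)
&= \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \Bigg[ \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{n\text{-}inf}}} \Psi_{jk} \Bigg] \Bigg) \\
&= \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \Bigg[ \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,FTC}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,n\text{-}FTC}}} \Psi_{jk} \\
&\qquad\qquad\qquad + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}} \Psi_{jk} \Bigg] \Bigg).
\end{aligned}
\tag{5}
\]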


Figure 1: Tree diagram for protection status of individuals outside the compounds.

Recall that the data only tell us if individuals are vaccinated, not whether they are protected. Of the four sums of $\Psi_{jk}$ values in equation (5), the numbers of initially susceptible FTC and non-FTC individuals outside that do become infective are known, but not the number of initial susceptibles that are never infected. Figure 1 displays how individuals can be in the latter two categories.

So, the total number of initially susceptible, never infected individuals outside the compounds is given by

\[
|N^{\mathrm{oc}}_{\mathrm{n\text{-}inf}}| = a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}} + a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}} + x + y,
\]

where $a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}$ is the known number of non-vaccinated, never infective FTC individuals outside the compounds and $a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}$ is the corresponding number of non-FTC individuals. Using this and (5), (3) becomes

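\[
\begin{aligned}
L_{x,y}(\mathbf{p}) ={}& \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \Bigg[ \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,FTC}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,n\text{-}FTC}}} \Psi_{jk} \\
&\qquad + \chi_F(j) \times (a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}} + x) + \chi_{NF}(j) \times (a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}} + y) \Bigg] \Bigg) \\
&\times v^{\sum_{r=n_{\mathrm{com}}}^{N-1} p_r s_r}\, (1-v)^{\sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{n\text{-}inf}}}}^{N-1} (1-p_r) s_r},
\end{aligned}
\tag{6}
\]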

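where

\[
\chi_F(j) = \big[ b(r_j - i_j) + (\min(q_j, \tau_j) - r_j) \big] \times
\begin{cases}
\dfrac{\lambda_a}{N-1} + \dfrac{\lambda_f}{n-1} & \text{if } f_j = \text{FTC}, \\[2ex]
\dfrac{\lambda_a + \lambda_f}{N-1} & \text{otherwise}
\end{cases}
\]

and

\[
\chi_{NF}(j) = \big[ b(r_j - i_j) + (\min(q_j, \tau_j) - r_j) \big] \times
\begin{cases}
\dfrac{\lambda_a}{N-1} & \text{if } f_j = \text{FTC}, \\[2ex]
\dfrac{\lambda_a + \lambda_f}{N-1} & \text{otherwise}
\end{cases}
\]

represent respectively the contribution from infective $j$ to a never infected FTC and non-FTC susceptible outside the compounds over all time $(e_\kappa, T)$, this contribution being equal for all susceptibles $k$.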

Considering the protection status likelihood parts of equation (6), note that

\[
(1-v)^{\sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{n\text{-}inf}}}}^{N-1} (1-p_r) s_r} = (1-v)^{x+y},
\]

since the sum is equal to the number of vaccinated but unprotected individuals outside who do not become infected, and hence

\[
v^{\sum_{r=n_{\mathrm{com}}}^{N-1} p_r s_r} = v^{\, b^{\mathrm{oc}}_{\mathrm{n\text{-}inf}} - x - y},
\]

where $b^{\mathrm{oc}}_{\mathrm{n\text{-}inf}}$ is the number of never-infected vaccinated individuals outside the compounds. Splitting up the terms involving $x$ and $y$, equation (6) can be written in the form

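\[
\begin{aligned}
L_{x,y}(\mathbf{p}) ={}& \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \Bigg[ \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,FTC}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,n\text{-}FTC}}} \Psi_{jk} + a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}\, \chi_F(j) + a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}\, \chi_{NF}(j) \Bigg] \Bigg) \\
&\times \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \chi_F(j) \times x \Bigg) \times (1-v)^{x}\, v^{\, b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}} - x} \\
&\times \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \chi_{NF}(j) \times y \Bigg) \times (1-v)^{y}\, v^{\, b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}} - y}.
\end{aligned}
\tag{7}
\]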

2.2 Removing x and y

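We now sum (7) over $x$ and $y$. Recall that this is equivalent to summing $L_{x,y}(\mathbf{p})$ over all possible values of the elements of $\mathbf{p}$ relating to vaccinated individuals outside the compound. For fixed $x$ and $y$, write $\mathbf{p}(x, y)$ for a typical vector $\mathbf{p}$ that yields $x$ and $y$. First note that the number of possible vectors $\mathbf{p}(x, y)$ is

\[
\binom{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}}{x} \binom{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}}{y},
\]

where $b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}$ and $b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}$ are respectively the (known) numbers of vaccinated, never-infected FTC and non-FTC individuals outside the compounds. We may thus write

\[
L_{x,y} = \sum_{\mathbf{p}(x,y)} L_{x,y}(\mathbf{p}) = \binom{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}}{x} \binom{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}}{y} L_{x,y}(\mathbf{p}),
\]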

which in combination with (7) means that the required sums (over x and y, which can be separated from each other) are of the form

\[
\sum_{x=0}^{b} \binom{b}{x} e^{-\alpha x} (1-v)^{x} v^{b-x} = \big( (1-v) e^{-\alpha} + v \big)^{b}.
\]
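The identity above is the binomial theorem applied to $(1-v)e^{-\alpha}$ and $v$, and it can be checked numerically. The sketch below compares the explicit sum with the closed form; the function names and the values of `b`, `alpha` and `v` are illustrative assumptions, not quantities from the paper.

```python
import math


def binomial_sum(b: int, alpha: float, v: float) -> float:
    """Left-hand side: explicit sum over x = 0, ..., b."""
    return sum(
        math.comb(b, x) * math.exp(-alpha * x) * (1 - v) ** x * v ** (b - x)
        for x in range(b + 1)
    )


def closed_form(b: int, alpha: float, v: float) -> float:
    """Right-hand side: ((1 - v) * exp(-alpha) + v) ** b."""
    return ((1 - v) * math.exp(-alpha) + v) ** b


# Illustrative values only (not estimates from the paper).
b, alpha, v = 30, 0.4, 0.8
lhs = binomial_sum(b, alpha, v)
rhs = closed_form(b, alpha, v)
```

The closed form replaces a sum of $b + 1$ terms by a single power, which is the source of the computational saving.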

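Writing $L = \sum_{x,y} L_{x,y}$, $\chi_F = \sum_{j \in N_{\mathrm{inf}}} \chi_F(j)$ and $\chi_{NF} = \sum_{j \in N_{\mathrm{inf}}} \chi_{NF}(j)$ we thus obtain

\[
\begin{aligned}
L ={}& \exp\Bigg( -\sum_{j \in N_{\mathrm{inf}}} \Bigg[ \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,FTC}}} \Psi_{jk} + \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf,n\text{-}FTC}}} \Psi_{jk} \Bigg] - a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}\, \chi_F - a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}\, \chi_{NF} \Bigg) \\
&\times \big( (1-v) e^{-\chi_F} + v \big)^{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}} \big( (1-v) e^{-\chi_{NF}} + v \big)^{b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}}.
\end{aligned}
\tag{8}
\]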

In the MCMC algorithm we work with the log-likelihood, so taking logs gives an overall log-likelihood of

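\[
\begin{aligned}
\log \pi(\mathbf{r}, \tilde{\boldsymbol{\gamma}} \mid \Phi) ={}& \log\Bigg( \prod_{\substack{j \in N_{\mathrm{inf}} \\ j \neq \kappa}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt - \sum_{j \in N_{\mathrm{inf}}} \Bigg( \sum_{k \in N^{\mathrm{oc}}_{\mathrm{inf}}} \Psi_{jk} \Bigg) \\
&- a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}}\, \chi_F - a^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}}\, \chi_{NF} \\
&+ b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,FTC}} \log\big( v + (1-v) e^{-\chi_F} \big) + b^{\mathrm{oc}}_{\mathrm{n\text{-}inf,n\text{-}FTC}} \log\big( v + (1-v) e^{-\chi_{NF}} \big) \\
&+ \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} f_I(i_j - e_j)\, f_F(r_j - i_j)\, f_R(\tau_j - r_j)\, f_Q(q_j - \max(r_j, t_q)) \Bigg) \\
&+ \sum_{r=0}^{n_{\mathrm{com}}-1} s_r \big( p_r \log(v) + (1-p_r) \log(1-v) \big) + \sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{inf}}}}^{N-1} s_r \log(1-v).
\end{aligned}
\]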
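As an illustration of the terms in the log-likelihood that replace the sum over $x$ (or $y$), the sketch below evaluates $-a\chi + b \log(v + (1-v)e^{-\chi})$ for one FTC class. The function name and argument names are hypothetical, and it is assumed that $0 < v < 1$ so that both logarithms are finite; the log-sum-exp step is a numerical convenience chosen here, not something described in the source.

```python
import math


def log_marginal_factor(a: int, b: int, chi: float, v: float) -> float:
    """Return -a*chi + b*log(v + (1 - v)*exp(-chi)), assuming 0 < v < 1.

    a:   number of non-vaccinated, never-infected outside individuals
    b:   number of vaccinated, never-infected outside individuals
    chi: total pressure on one such individual, summed over infectives
    """
    # log(v + (1 - v) e^{-chi}) evaluated by log-sum-exp for stability
    # when chi is large and e^{-chi} underflows.
    log_terms = [math.log(v), math.log1p(-v) - chi]
    m = max(log_terms)
    log_mix = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -a * chi + b * log_mix
```

Calling the function once with the FTC counts and $\chi_F$, and once with the non-FTC counts and $\chi_{NF}$, gives the two pairs of terms that appear in the log-likelihood.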

3 MCMC

Here we give details of the MCMC algorithm.

3.1 Unknown vaccination statuses in MCMC algorithm

Although the total number of vaccinated individuals within the Abakaliki compounds is known, we do not have complete information on the composition of compound individuals with respect to vaccination status and FTC membership. Table 2 shows 5 potential configurations of the 12 individuals with unknown details. This means that all the elements of $\mathbf{s}$ (the vector of vaccination statuses) are assumed known, with 12 exceptions, which can take 5 possible configurations. As described below in the MCMC algorithm, updates to $\mathbf{s}$ involve potentially changing the configuration from the 5 that are possible, while all other elements of $\mathbf{s}$ are unchanged.

3.2 Full conditional Distributions

Here we give, up to proportionality, the full conditional distributions for the parameters that will be updated in the MCMC algorithm. Although in principle we could update all parameters individually using a Metropolis-Hastings step involving the full likelihood, algorithm efficiency is increased if we only include those parts of the likelihood that can change during a particular update.

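The parameters to be updated are $t_q$, $\lambda_a$, $\lambda_f$, $\lambda_h$, $v$, $b$, $\tilde{\mathbf{p}}$, $\mathbf{s}$ and the exposure, fever, quarantine and removal times.

[Table 2: potential configurations of the 12 compound individuals with unknown details.]

In the expressions below, $c$ denotes a generic unknown constant (arising from the fact that we generally only know full conditional distributions up to proportionality).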

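\[
\begin{aligned}
\log \pi(t_q \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, b, v, \lambda_a, \lambda_f, \lambda_h, e_\kappa, \mathbf{s}) + c ={}& \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt \\
&+ \log(L) + \log(\pi(t_q)).
\end{aligned}
\]

\[
\begin{aligned}
\log \pi(b \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, v, \lambda_a, \lambda_f, \lambda_h, e_\kappa, \mathbf{s}) + c ={}& \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt \\
&+ \log(L) + \log(\pi(b)).
\end{aligned}
\]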

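\[
\begin{aligned}
\log \pi(v \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, b, \lambda_a, \lambda_f, \lambda_h, e_\kappa, \mathbf{s}) + c ={}& \log(L) + \sum_{r=0}^{n_{\mathrm{com}}-1} p_r s_r \log(v) + \sum_{r=0}^{n_{\mathrm{com}}-1} (1-p_r) s_r \log(1-v) \\
&+ \sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{inf}}}}^{N-1} (1-p_r) s_r \log(1-v) + \log(\pi(v)).
\end{aligned}
\]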

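\[
\begin{aligned}
\log \pi(\lambda_a \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, v, b, \lambda_f, \lambda_h, e_\kappa, \mathbf{s}) + c ={}& \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt \\
&+ \log(L) + \log(\pi(\lambda_a)).
\end{aligned}
\]

\[
\begin{aligned}
\log \pi(\lambda_f \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, v, b, \lambda_a, \lambda_h, e_\kappa, \mathbf{s}) + c ={}& \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt \\
&+ \log(L) + \log(\pi(\lambda_f)).
\end{aligned}
\]

\[
\log \pi(\lambda_h \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, v, b, \lambda_f, \lambda_a, e_\kappa, \mathbf{s}) + c = \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \cdots
\]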

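For any $i = 0, 1, \ldots, n_{\mathrm{com}}-1$, defining $\tilde{\mathbf{p}}^{*}_{i} = (p_0, p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_{n_{\mathrm{com}}-1})$,

\[
\log \pi(\tilde{p}_i \mid \mathbf{r}, \Phi, \mathbf{e}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}, \tilde{\mathbf{p}}^{*}_{i}) + c = -\int_{e_\kappa}^{T} \Lambda_{CN}(t)\, dt + p_i \log(v) + (1-p_i) \log(1-v).
\]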

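\[
\begin{aligned}
\log \pi(\mathbf{s} \mid \mathbf{r}, \boldsymbol{\theta}, \tilde{\boldsymbol{\gamma}}, \kappa, t_q, v, b, \lambda_a, \lambda_h, e_\kappa) + c ={}& -\int_{e_\kappa}^{T} \Lambda_{CN}(t)\, dt + \sum_{r=0}^{n_{\mathrm{com}}-1} p_r s_r \log(v) + \sum_{r=0}^{n_{\mathrm{com}}-1} (1-p_r) s_r \log(1-v) \\
&+ \sum_{\substack{r=n_{\mathrm{com}} \\ r \in N_{\mathrm{inf}}}}^{N-1} (1-p_r) s_r \log(1-v).
\end{aligned}
\]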

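Finally, for $i = 0, 1, \ldots, N-1$ where $i$ is infective, defining $\mathbf{e}^{*}_{i} = (e_0, e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_{N-1})$ and similarly defining $\mathbf{i}^{*}_{i}$, $\mathbf{q}^{*}_{i}$ and $\boldsymbol{\tau}^{*}_{i}$,

\[
\log \pi(e_i \mid \mathbf{r}, \Phi, \mathbf{e}^{*}_{i}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}, \tilde{\mathbf{p}}) + c = \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \big( \Lambda_{CN}(t) + \Lambda_{CC}(t) \big)\, dt + \log(L).
\]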

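\[
\log \pi(i_i \mid \mathbf{r}, \Phi, \mathbf{e}, \mathbf{i}^{*}_{i}, \mathbf{q}, \boldsymbol{\tau}, \tilde{\mathbf{p}}) + c = \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \cdots
\]

\[
\log \pi(q_i \mid \mathbf{r}, \Phi, \mathbf{e}, \mathbf{i}, \mathbf{q}^{*}_{i}, \boldsymbol{\tau}, \tilde{\mathbf{p}}) + c = \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \cdots
\]

\[
\log \pi(\tau_i \mid \mathbf{r}, \Phi, \mathbf{e}, \mathbf{i}, \mathbf{q}, \boldsymbol{\tau}^{*}_{i}, \tilde{\mathbf{p}}) + c = \log\Bigg( \prod_{j \in N_{\mathrm{inf}}} \Lambda_j(e_j-) \Bigg) - \int_{e_\kappa}^{T} \cdots
\]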

3.3 MCMC Algorithm

The MCMC algorithm consists of individual Metropolis-Hastings updates for all of the required parameters. For a generic parameter $\alpha$ the update consists of a proposed new value $\alpha^{*}$ drawn from proposal density (or mass function) $q(\alpha^{*} \mid \alpha)$, which is then accepted with probability

\[
\min\left\{ \frac{\pi(\alpha^{*})\, q(\alpha \mid \alpha^{*})}{\pi(\alpha)\, q(\alpha^{*} \mid \alpha)},\; 1 \right\},
\]

where $\pi(\alpha^{*})$ and $\pi(\alpha)$ denote the full conditional density (or mass function) for $\alpha$ evaluated at the proposed and current value, respectively. In most cases below, the proposal density $q$ is symmetric in the sense that $q(\alpha \mid \alpha^{*}) = q(\alpha^{*} \mid \alpha)$, which simplifies the computations.
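Since the full conditionals in Section 3.2 are given on the log scale, the acceptance step is conveniently carried out with log densities. The following is a minimal sketch of that step; the function name `mh_accept` and its arguments are assumptions made for illustration, and the proposal terms default to zero for the symmetric case.

```python
import math
import random


def mh_accept(log_pi_new, log_pi_cur,
              log_q_reverse=0.0, log_q_forward=0.0, rng=random):
    """Metropolis-Hastings accept/reject decision on the log scale.

    log_pi_new, log_pi_cur: log full conditional (up to a common constant)
        at the proposed and current values.
    log_q_reverse: log q(current | proposed).
    log_q_forward: log q(proposed | current).
    Returns True if the proposal is accepted.
    """
    log_ratio = (log_pi_new + log_q_reverse) - (log_pi_cur + log_q_forward)
    if log_ratio >= 0.0:
        return True  # acceptance probability is 1
    # Accept with probability exp(log_ratio) < 1.
    return rng.random() < math.exp(log_ratio)
```

The unknown constant $c$ cancels in the difference of the two log densities, so it never needs to be evaluated.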

For $t_q$, $\lambda_a$, $\lambda_f$, $\lambda_h$, $v$, and $b$, we use a Gaussian proposal distribution with pre-specified variance, centered on the current parameter value. Proposed values outside the possible set for each parameter are immediately rejected (for example, all the infection rates must be positive, $0 \le v \le 1$, etc.). Note that if such a rejection occurs, no further proposal attempt is made. The acceptance probability is calculated using the full conditional distributions listed above.
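A random-walk update of this kind can be sketched as follows. The function name, the way bounds are passed, and the toy target used in the demonstration are all assumptions for illustration; in the actual algorithm the log target would be the relevant full conditional from Section 3.2.

```python
import math
import random


def random_walk_update(current, log_target, sd, lower, upper, rng=random):
    """One Gaussian random-walk Metropolis update with a support check.

    A proposal outside [lower, upper] is rejected immediately and the
    current value is returned; no second proposal is drawn.
    """
    proposal = rng.gauss(current, sd)
    if not (lower <= proposal <= upper):
        return current
    # The Gaussian proposal is symmetric, so only the target ratio is needed.
    log_ratio = log_target(proposal) - log_target(current)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


def toy_log_target(v):
    """Hypothetical log density on (0, 1), proportional to v^3 (1 - v)^2."""
    if v <= 0.0 or v >= 1.0:
        return float("-inf")
    return 3.0 * math.log(v) + 2.0 * math.log(1.0 - v)


rng = random.Random(1)
chain = [0.5]
for _ in range(2000):
    chain.append(random_walk_update(chain[-1], toy_log_target, 0.3, 0.0, 1.0, rng))
```

Returning the current value on an out-of-range proposal counts that iteration as a rejection, which matches the statement that no further proposal attempt is made.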

Protection status updates (i.e. updates to $\tilde{\mathbf{p}}$) are achieved by first randomly selecting a fixed number (e.g. 5 or 10) of compound individuals among those who were vaccinated but never infected, and proposing to change the protection status of each selected individual.
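The proposal step for the protection statuses might be sketched as below. The function name and data layout (a list of 0/1 statuses and a list of eligible indices) are assumptions; the source does not say whether the selected individuals are accepted jointly or one at a time, so only the proposal itself is shown.

```python
import random


def propose_protection_flips(p_tilde, eligible, n_flip, rng=random):
    """Propose flipping the protection status of randomly chosen individuals.

    p_tilde:  list of 0/1 protection statuses for compound individuals.
    eligible: list of indices of vaccinated, never-infected individuals.
    n_flip:   number of individuals to select (e.g. 5 or 10).
    Returns the proposed status list and the chosen indices.
    """
    chosen = rng.sample(eligible, min(n_flip, len(eligible)))
    proposal = list(p_tilde)  # copy so the current state is left untouched
    for idx in chosen:
        proposal[idx] = 1 - proposal[idx]
    return proposal, chosen
```

Because each flip can be undone by the same move with the same probability, this proposal is symmetric and the proposal terms cancel in the acceptance probability.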

Updates to s are similar: a proposed new value is chosen uniformly at random from one of the five possible configurations of the 12 individuals whose vaccination status is uncertain, as shown in Table 2. Following this, all of the individuals among the 12 who are now proposed to be vaccinated are, independently with probability v, proposed to be protected, while those proposed to be unvaccinated are proposed to be unprotected.
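The proposal for $\mathbf{s}$ can be sketched in the same way. The configurations below are placeholders (three made-up configurations of four individuals), since the actual five configurations of the 12 individuals are those of Table 2; the function name is hypothetical and $0 < v < 1$ is assumed. The log proposal probability is returned because the protection statuses are drawn with probability $v$, so this proposal depends on the proposed state and is not symmetric in general.

```python
import math
import random

# Placeholder configurations, NOT the ones in Table 2 of the paper.
TOY_CONFIGS = [
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (0, 1, 1, 0),
]


def propose_vaccination_config(configs, v, rng=random):
    """Propose a vaccination configuration and matching protection statuses.

    A configuration is chosen uniformly at random; each individual who is
    vaccinated under it is proposed to be protected with probability v,
    and unvaccinated individuals are proposed to be unprotected.
    Returns (index, s_new, p_new, log_q) where log_q is the log probability
    of having proposed this (s_new, p_new) pair.
    """
    k = rng.randrange(len(configs))
    s_new = list(configs[k])
    p_new = [1 if (s == 1 and rng.random() < v) else 0 for s in s_new]
    n_vacc, n_prot = sum(s_new), sum(p_new)
    log_q = (-math.log(len(configs))
             + n_prot * math.log(v)
             + (n_vacc - n_prot) * math.log1p(-v))
    return k, s_new, p_new, log_q
```

In a full update, the returned `log_q` for the proposed state and the corresponding value for the current state would enter the acceptance probability as the proposal terms.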

Finally, updates to the event times are carried out in pairs, the reason being that we expect these quantities to be correlated. Specifically, an individual is chosen uniformly at random from the set of infected individuals, and proposed new exposure and fever times are simulated from the distributions assumed in the model. These are then accepted or rejected as usual. Following this, a pair of removal and quarantine times are proposed for the same individual, regardless of whether or not the exposure and fever times were changed. In practice we repeat the entire event-time-updating procedure a number of times (e.g. 5) within each full MCMC iteration in order to improve the overall algorithm mixing.