
2.3 Areal data

2.3.2 Modelling

Spatial models

First, we consider a basic model for areal disease mapping on a purely spatial scale. Because the data are generally counts of deaths in each area, it is common to assume a Poisson distribution (Lawson et al., 2003).

Wakefield et al. (2000) provide a good example of a basic three-level hierarchical model based on the aggregation of the underlying individual risk level:

$$Y_i \sim \mathrm{Poisson}(e^{S_i} E_i), \quad i = 1, 2, 3, \ldots, n$$

$$S_i \sim p(\cdot \mid \theta)$$

$$\theta \sim \pi(\cdot)$$

where $Y_i$ is the observed number of cases and $E_i$ is the expected number of cases in area $i$ (under the assumption of uniform risk), $S_i$ is the log relative risk in area $i$, $p(\cdot \mid \theta)$ is an appropriate second-stage prior distribution for $S_i$, and $\pi$ is the prior distribution for the parameter vector $\theta$. Their model is suitable when the disease is rare, and it is based on the assumptions that the individual risk level varies randomly within the area and that the risk associated with a particular area acts proportionally on the baseline risk for each area.
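As a minimal sketch of the first stage of this hierarchy, the following Python snippet evaluates the Poisson log-likelihood for a set of log relative risks. The counts are hypothetical and the function names are illustrative only. With no second-stage prior, the likelihood alone is maximised at $S_i = \log(Y_i / E_i)$, which the snippet compares against the uniform-risk case $S_i = 0$.

```python
import math


def poisson_log_pmf(y, mu):
    """Log of the Poisson probability mass function at count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def log_likelihood(observed, expected, log_rr):
    """First-stage log-likelihood: Y_i ~ Poisson(exp(S_i) * E_i), independent areas."""
    return sum(
        poisson_log_pmf(y, math.exp(s) * e)
        for y, e, s in zip(observed, expected, log_rr)
    )


# Hypothetical counts for four areas (illustrative only, not from the source).
observed = [3, 7, 1, 12]
expected = [4.2, 5.0, 2.5, 8.1]

# Ignoring the prior, the likelihood is maximised at S_i = log(Y_i / E_i).
mle_log_rr = [math.log(y / e) for y, e in zip(observed, expected)]
# Uniform risk corresponds to a relative risk of 1, i.e. S_i = 0, in every area.
null_log_rr = [0.0] * len(observed)

ll_mle = log_likelihood(observed, expected, mle_log_rr)
ll_null = log_likelihood(observed, expected, null_log_rr)
print(f"log-likelihood at MLE: {ll_mle:.3f}, under uniform risk: {ll_null:.3f}")
```

In the full model the second-stage prior $p(\cdot \mid \theta)$ pulls these area-level estimates towards each other, which matters most for areas with small expected counts.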

For many outbreaks the time scale of the events is also known. This information can be included in the modelling process.

Spatiotemporal models

When the time component is known, the response becomes the number of cases in each area in each time period. For modelling purposes we denote by $Y_{it}$ the number of cases in area $i$ in time period $t$. The main focus of spatiotemporal modelling is estimating the Poisson mean, which varies with $i$ and $t$. It is generally assumed that

$$Y_{it} \sim \mathrm{Poisson}(E_{it}\,\theta_{it})$$

where $\theta_{it}$ is the unknown true relative risk and $E_{it}$ is the expected number of cases under the assumption of uniform risk. For a basic spatiotemporal model, the logarithm of the relative risk can be specified as:

$$\log(\theta_{it}) = u_i + v_i + \tau_t + \gamma_{it}$$

with $u_i$ being the spatially correlated extra variation, $v_i$ the uncorrelated extra variation, $\tau_t$ the temporal variation and $\gamma_{it}$ the space-time interaction. Commonly $\tau_t$ and $\gamma_{it}$ are assumed to follow random walks, $\tau_t \sim N(\tau_{t-1}, \sigma^2_\tau)$ and $\gamma_{it} \sim N(\gamma_{i,t-1}, \sigma^2_\gamma)$ respectively, which allows
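A minimal simulation sketch of how the four terms of the log relative risk combine is given below. All numerical values are hypothetical, the spatial effects are simply fixed rather than drawn from a spatial prior, and starting each random walk at zero is an assumption made only for illustration.

```python
import math
import random


def random_walk(n_steps, sd, rng, start=0.0):
    """Draw x_1..x_n with x_t ~ N(x_{t-1}, sd^2), taking x_0 = start (an assumption)."""
    path, current = [], start
    for _ in range(n_steps):
        current = rng.gauss(current, sd)
        path.append(current)
    return path


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_areas, n_periods = 3, 5

# Hypothetical values: the source gives no numbers for these terms.
u = [0.20, -0.10, 0.05]  # spatially correlated effects u_i (fixed here)
v = [0.00, 0.10, -0.05]  # uncorrelated effects v_i (fixed here)
sigma_tau, sigma_gamma = 0.10, 0.05

tau = random_walk(n_periods, sigma_tau, rng)  # temporal trend shared by all areas
gamma = [random_walk(n_periods, sigma_gamma, rng) for _ in range(n_areas)]  # one walk per area

# log(theta_it) = u_i + v_i + tau_t + gamma_it
theta = [
    [math.exp(u[i] + v[i] + tau[t] + gamma[i][t]) for t in range(n_periods)]
    for i in range(n_areas)
]

for i, row in enumerate(theta):
    print(f"area {i}: " + ", ".join(f"{rr:.3f}" for rr in row))
```

Each row gives the simulated relative risk for one area over the five time periods; multiplying by $E_{it}$ would give the Poisson mean for $Y_{it}$.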

Veterinary models

The majority of the initial research and modelling was driven by human health data. Although the basic principles carry over to veterinary data, there are added complications: human intervention is used to control an outbreak, and the animals themselves cannot report sickness. Control methods used to limit spread can have a dramatic effect, altering the progression of the disease. Even though the reduction in spread is desirable from an epidemiological point of view, from the analytical side it can lead to a significant amount of missing information, which can later reduce the quality of model predictions. Because the reporting process first requires someone (owner, farm worker, veterinarian, etc.) to detect the disease and then report it to the appropriate authorities, the true level of the disease and its spread can be under-represented. This makes the disease harder to model accurately (Lawson, 2013).

An example of spatial veterinary modelling is provided by Stevenson et al. (2005), who use Bayesian Poisson models to describe the geographical distribution of bovine spongiform encephalopathy (BSE). The disease is not contagious and is relatively rare, so the observed number of cases in each area ($O_i$) was assumed to follow an independent Poisson distribution with the average number of cases ($\mu_i$) equal to the product of the expected number of cases ($E_i$) and an estimated area-level relative risk. Their model can be expressed by

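$$\log(\mu_i) = \log E_i + (\alpha + \beta_1 x_{1i} + \ldots + \beta_m x_{mi}) + U_i + S_i$$

where $E_i$ is estimated by

$$E_i = n_i \left( \frac{\sum_{i=1}^{178} O_i}{\sum_{i=1}^{178} n_i} \right)$$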

with $n_i$ being the total cattle population in the $i$th area and 178 being the total number of areas. The model includes $m$ area-specific fixed effects ($\beta_1, \ldots, \beta_m$) associated with explanatory variables $x_1, \ldots, x_m$, and spatially correlated and non-spatially correlated terms $S_i$ and $U_i$ respectively. They applied flat priors to the intercept $\alpha$ and the regression coefficients $\beta_1, \ldots, \beta_m$, a normal prior to $U_i$, and a conditional intrinsic Gaussian autoregressive (CAR) prior to $S_i$.
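The estimate of the expected counts can be illustrated with a short Python sketch: each area's population is multiplied by the overall rate (total observed cases divided by total population). The four areas and their numbers are hypothetical and are used here in place of the 178 areas of the study. The function name is illustrative only.

```python
def expected_cases(observed, population):
    """Expected cases per area under uniform risk: n_i * (sum of O) / (sum of n)."""
    overall_rate = sum(observed) / sum(population)
    return [n * overall_rate for n in population]


# Hypothetical data for four areas (illustrative only, not from the source).
observed = [2, 0, 5, 1]                    # O_i: observed cases
population = [12000, 3000, 20000, 5000]    # n_i: cattle population

expected = expected_cases(observed, population)

# Crude relative risk per area: observed divided by expected.
crude_rr = [o / e for o, e in zip(observed, expected)]

for i, (o, e, rr) in enumerate(zip(observed, expected, crude_rr)):
    print(f"area {i}: O = {o}, E = {e:.2f}, O/E = {rr:.2f}")
```

By construction the expected counts sum to the total number of observed cases, so the crude ratios $O_i / E_i$ describe how cases are distributed across areas relative to a uniform risk.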

CAR priors are commonly used in Bayesian analysis of spatial data (Hodges et al., 2003). They were first proposed by Besag (1974) and made popular in disease mapping by Besag et al. (1991). CAR priors allow the posterior estimate for a region to take its neighbouring regions into account (Hodges et al., 2003; Jin et al., 2005).
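To make the role of the neighbours concrete, the sketch below computes the conditional distribution of one area's effect under a commonly used intrinsic CAR specification with binary adjacency weights: given the other areas, the effect is normal with mean equal to the average of its neighbours and variance equal to a common variance divided by the number of neighbours. The adjacency structure, the effect values and the variance are hypothetical, and this weighting scheme is an assumption rather than a detail taken from the studies cited above.

```python
def icar_conditional(values, neighbours, area, sigma2):
    """Conditional mean and variance of one area's effect under an intrinsic CAR
    prior with binary adjacency weights (an assumed, commonly used specification)."""
    nbrs = neighbours[area]
    mean = sum(values[j] for j in nbrs) / len(nbrs)   # average of neighbouring effects
    variance = sigma2 / len(nbrs)                     # more neighbours, tighter prior
    return mean, variance


# Hypothetical map of four areas: area index -> indices of adjacent areas.
neighbours = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
s = [0.3, -0.1, 0.2, 0.5]  # hypothetical current values of the spatial effects

mean, variance = icar_conditional(s, neighbours, area=1, sigma2=0.5)
print(f"area 1: conditional mean = {mean:.4f}, conditional variance = {variance:.4f}")
```

Area 1 has three neighbours, so its conditional mean is the average of their three effects and its conditional variance is one third of the common variance.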

An example of spatiotemporal veterinary modelling is provided by Branscum et al. (2008). They used a flexible Bayesian Poisson regression model to describe the annual provincial occurrences of foot-and-mouth disease (FMD) in Turkey from 1996 to 2003. They defined their model as

$$Y_{i,t} \sim \mathrm{Poisson}(\mu_{i,t})$$

$$\log(\mu_{i,t}) = g_{i,t} + \beta x_{i,t} + \eta_i$$

where $Y_{i,t}$ is the yearly count of cases in province $i$ in year $t$ and $x_{i,t}$ denotes the explanatory variables, with corresponding regression coefficients $\beta$. The function $g_{i,t}$ models the longitudinal trend and is specified by a Gaussian process, chosen because it can represent a wide variety of temporal shapes. As in the spatial model example above, a CAR prior was applied to the spatial process $\eta_i$.
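The flexibility of a Gaussian process trend can be illustrated by drawing one temporal curve over the study years. The sketch below is not the specification used by Branscum et al. (2008): the squared-exponential covariance function, its variance and length-scale, and the helper names are all assumptions chosen for illustration.

```python
import math
import random


def sq_exp_cov(times, variance, length_scale):
    """Squared-exponential covariance matrix (an assumed kernel, for illustration)."""
    return [
        [variance * math.exp(-0.5 * ((s - t) / length_scale) ** 2) for t in times]
        for s in times
    ]


def cholesky(a, jitter=1e-8):
    """Lower-triangular factor L with L L^T = a + jitter * I."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(acc + jitter)  # jitter guards against round-off
            else:
                low[i][j] = acc / low[j][j]
    return low


def sample_gp(times, variance, length_scale, rng):
    """One draw from a zero-mean Gaussian process evaluated at the given times."""
    low = cholesky(sq_exp_cov(times, variance, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in times]  # independent standard normal draws
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]


years = list(range(1996, 2004))  # the study period, 1996 to 2003
rng = random.Random(0)           # fixed seed so the sketch is reproducible
g = sample_gp(years, variance=1.0, length_scale=1.5, rng=rng)

for year, value in zip(years, g):
    print(f"{year}: g = {value:+.3f}")
```

Changing the hypothetical length-scale alters how quickly the sampled trend can change from year to year, which is one way such a prior accommodates different temporal shapes.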

Further examples of veterinary disease mapping models for FMD are described in sections 3.3.3 and 3.4.3.

In document Statistical tools for spatio temporal epidemiology, with application to veterinary diseases : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North (Page 43-46)