3. DATA AND METHODS
3.3 Bayesian Spatial Smoothing
A full Bayesian spatial smoothing method is applied to the municipal-level infant mortality rates to improve the quality of the estimates. As discussed in Chapter 2, the parameters of the prior distribution in full Bayesian smoothing are considered to be
random variables with their own distributions, resulting in a hierarchical model. The first level of the model is defined by the observed data itself while in the second level the prior distribution defines spatial dependence between nearby areas through its hyper-parameters.
In this study an adjacency matrix is used to identify neighbouring areas.
Neighbours are defined as municipalities that are physically connected to one another.
For example, the selected municipality shown in Figure 3-4 has seven neighbouring municipalities contributing information to its estimate of child mortality. There are a total of 234 municipalities and 1244 distinct adjacent pairs of municipalities (neighbours) in South Africa, which gives an average of 5.3 neighbours per municipality, with a minimum of 1 and a maximum of 11 neighbours.
Figure 3-4 An example showing the neighbours for a municipality
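The neighbour structure described above can be illustrated with a minimal sketch. The four-area map and the helper names `to_car_inputs` and `is_symmetric` are hypothetical and are not taken from the study; the sketch only shows how a neighbour list might be flattened into the `adj`, `weights` and `num` vectors that a CAR prior typically expects, assuming every neighbour receives a weight of 1.

```python
# Hypothetical toy map: 4 areas, neighbours listed by 1-based area id.
neighbours = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2],
    4: [2],
}


def to_car_inputs(neighbours):
    """Flatten a neighbour dictionary into adj, weights and num vectors."""
    adj, num = [], []
    for area in sorted(neighbours):
        adj.extend(sorted(neighbours[area]))   # ids of the neighbours of this area
        num.append(len(neighbours[area]))      # number of neighbours of this area
    weights = [1] * len(adj)                   # assumption: equal weights
    return adj, weights, num


def is_symmetric(neighbours):
    """Check that if j is a neighbour of i, then i is a neighbour of j."""
    return all(i in neighbours[j] for i in neighbours for j in neighbours[i])


adj, weights, num = to_car_inputs(neighbours)
mean_neighbours = sum(num) / len(num)          # average neighbours per area
```

In this layout `len(adj)` counts every adjacent pair twice, once from each side, so the average number of neighbours per area is `len(adj)` divided by the number of areas.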
To use prior distributions obtained from neighbouring areas, a hierarchical Bayesian model is employed. The first level of the model describes the level of child mortality in an area: the number of child deaths reported in each municipality, $Y_i$, is modelled using a binomial distribution as given below.
$Y_i \sim \text{Binomial}(p_i, n_i)$ (3.3)
where $p_i$ is the probability that a child dies before reaching the first birthday in municipality $i$ and $n_i$ is the total number of children in the municipality. The resulting fitted values of $p_i$ will be used as a smoothed estimate of infant mortality in municipality $i$. This parameter of interest is modelled using a generalised linear model:
$\text{logit}(p_i) = \alpha + S_i$ (3.4)
where $\alpha$ is an unstructured random effect representing the global mean of the log-relative risks for all areas and $S_i$ is a spatially structured random effect representing the municipal-specific effects or the deviation from the global mean (Lunn, Jackson, Best et al. 2013).
To further improve the estimates, it is good practice to include some important determinants of child mortality in the model specified above. In this regard, two variables are included: the level of females’ education and the level of HIV in the municipalities. Females’ education has been found to be a strong predictor of child mortality in many studies, and HIV has significantly affected the mortality of children in South Africa. Therefore, the average years of schooling of women aged 49 in each municipality and the provincial HIV prevalence rate among adults in the 15-49 age group are included in the model. HIV prevalence rates are taken from the 2012 South African national HIV prevalence, incidence and behaviour survey conducted by the Human Sciences Research Council (HSRC 2014b). The revised generalised linear model for the probability of death controlling for these variables becomes:
$\text{logit}(p_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + S_i$ (3.6)
where $X_{1i}$ and $X_{2i}$ are, respectively, the education and HIV variables as defined above. Including these two variables allows the model to use the spatial neighbourhood, females’ education and the HIV prevalence rate together to predict the probability of death for each municipality.
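As a worked illustration of the logit link, the sketch below converts the linear predictor of a model with an intercept, two covariates and a spatial effect into a probability of death. All numerical values (the intercept, the coefficients, the covariate values and the spatial effect) are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def death_probability(alpha, beta1, beta2, x1, x2, s):
    """p_i implied by logit(p_i) = alpha + beta1*x1 + beta2*x2 + s."""
    return inv_logit(alpha + beta1 * x1 + beta2 * x2 + s)


# Hypothetical inputs: x1 = average years of schooling, x2 = HIV prevalence (%),
# s = spatially structured effect of the municipality.
params = dict(alpha=-3.0, beta1=-0.1, beta2=0.02)
p_low_schooling = death_probability(x1=8.0, x2=15.0, s=0.2, **params)
p_high_schooling = death_probability(x1=12.0, x2=15.0, s=0.2, **params)
```

With a negative education coefficient, as assumed here, the municipality with more schooling receives the lower predicted probability, and a positive spatial effect shifts the municipality above the global level on the logit scale.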
The second level of the hierarchical Bayesian model is the prior distributions for the random effects. An improper uniform prior distribution is assigned for the unstructured random effect, α (Lunn, Jackson, Best et al. 2013).
$\alpha \sim \text{dflat}()$ (3.7)
Since there is very little prior information on how much education or HIV affects child mortality in each municipality, very weak prior distributions are given to $\beta_1$ and $\beta_2$ by assigning a small value for the precision. In doing so, the data become the main determinant of the estimates.
$\beta_1, \beta_2 \sim N(0, 0.001)$ (3.8)
The spatially structured random effect is assigned a conditional autoregressive (CAR) distribution with parameter $\tau$:
$S_i \sim \text{CAR}(\tau)$ (3.9)
The CAR model specifies how each $S_i$ is related to the $S_j$ at all other locations via a set of univariate conditional distributions. One of the most commonly used formulations (see Lunn, Jackson, Best et al.), which is applied in this research, is the one in which the conditional mean of $S_i$ is a weighted average of the other $S_j$. This model is available in WinBUGS (Bayesian Inference Using Gibbs Sampling), software dedicated to Bayesian modelling, as
S[1:n] ~ car.normal(adj[], weights[], num[], inv.tau.squared) (3.11)
The CAR model also includes the hyper-parameter $\tau$, the precision (the inverse of the variance), which denotes how similar or variable neighbouring areas should be. Due to uncertainty in the degree of similarity in neighbouring areas, in the third level of the hierarchical model $\tau$ is assigned its own distribution, a hyper-prior, in the form of a very weak gamma distribution.
$\tau \sim \text{Gamma}(0.5, 0.0005)$ (3.12)
To determine the standard deviation of $S$, $\tau$ is normally converted into the form $\sigma_S = 1/\sqrt{\tau}$, where $\tau$ is a scalar.
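To make the CAR prior more concrete, the sketch below assumes the standard intrinsic CAR form, in which $S_i \mid S_{j \neq i} \sim N\left(\sum_j w_{ij} S_j / \sum_j w_{ij},\; 1/(\tau \sum_j w_{ij})\right)$, with the sums taken over the neighbours of area $i$. The function name `car_conditional`, the four-area map and all numerical values are hypothetical; the code only illustrates how the conditional mean and standard deviation follow from the neighbours' values and from $\tau$.

```python
import math


def car_conditional(s, adj, weights, num, tau, i):
    """Conditional mean and sd of S_i given its neighbours (i is 0-based)."""
    start = sum(num[:i])                 # offset of area i's block in adj
    stop = start + num[i]
    neighbour_ids = adj[start:stop]      # 1-based ids, as in the adj vector
    w = weights[start:stop]
    w_sum = sum(w)
    mean = sum(wj * s[j - 1] for wj, j in zip(w, neighbour_ids)) / w_sum
    sd = 1.0 / math.sqrt(tau * w_sum)    # conditional precision is tau * w_sum
    return mean, sd


# Hypothetical four-area map with unit weights.
adj = [2, 3, 1, 3, 4, 1, 2, 2]
num = [2, 3, 2, 1]
weights = [1] * len(adj)
s = [0.2, -0.1, 0.4, 0.0]                # current values of the spatial effects
tau = 4.0                                # precision of the CAR prior

mean_2, sd_2 = car_conditional(s, adj, weights, num, tau, i=1)   # area 2
sigma_s = 1.0 / math.sqrt(tau)           # precision converted to sd scale
```

With unit weights the conditional mean is simply the average of the neighbouring effects, and the conditional standard deviation shrinks as the number of neighbours grows, so areas with many neighbours are smoothed more strongly towards their local average.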
3.3.2 Model fitting
The parameters of the specified Bayesian models are estimated using WinBUGS, which performs Bayesian inference based on Markov chain Monte Carlo (MCMC) sampling.
Two models are fitted and compared using the deviance information criterion (DIC).
The first includes only the spatial structure, and the second also incorporates females’ education and HIV prevalence rates. For each model 100 000 iterations are run, with the initial 10 000 discarded as burn-in and not used for parameter estimation. After convergence, the model with the lowest DIC is selected. Convergence is evaluated by inspecting trace and autocorrelation plots of samples for each chain, as well as other numerical summaries as shown below.
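The model comparison step can be sketched as follows, using the usual definition DIC = D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ − D(θ̄) is the effective number of parameters. The deviance values below are hypothetical and far fewer than a real run would produce; the function names are illustrative only.

```python
def discard_burn_in(samples, burn_in):
    """Drop the first burn_in draws of a chain."""
    return samples[burn_in:]


def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean deviance + effective number of parameters (pD)."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


# Hypothetical post-burn-in deviance draws for the two candidate models.
spatial_only = dic([1502.0, 1498.0, 1500.0], deviance_at_posterior_mean=1480.0)
with_covariates = dic([1462.0, 1458.0, 1460.0], deviance_at_posterior_mean=1445.0)

models = {"spatial only": spatial_only, "with covariates": with_covariates}
selected = min(models, key=models.get)   # lowest DIC is preferred

# Number of draws kept when 10 000 of 100 000 iterations are discarded.
kept = len(discard_burn_in(list(range(100_000)), 10_000))
```

In practice the DIC values would be read directly from the software output; the sketch only makes explicit how the two components combine and how the lower value determines the selected model.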
3.3.3 Model diagnostics in Bayesian modelling
To ensure that the simulated posterior distribution is an accurate representation of the true posterior distribution, some diagnostic tests are necessary. The diagnostics performed here are the Gelman-Rubin statistic and examinations of autocorrelations and Monte Carlo errors. The Gelman-Rubin statistic is used for assessing convergence of the MCMC simulation. For a given parameter, this statistic compares the variability within parallel chains with the variability between parallel chains. The chains are judged to have converged if the ratio of the pooled (between- plus within-chain) variability to the within-chain variability is close to 1.
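A minimal sketch of this diagnostic is given below. It uses the classic variance-based form of the Gelman-Rubin statistic, which is an assumption made for illustration: the version reported by a particular software package may be computed differently (for example from interval widths). The simulated chains are hypothetical.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means
    between_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


rng = random.Random(2024)
# Hypothetical chains: three that sample the same distribution ...
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# ... and three that remain around different values.
stuck = [[rng.gauss(centre, 1.0) for _ in range(2000)]
         for centre in (0.0, 3.0, 6.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Chains that explore the same distribution give a value very close to 1, whereas chains centred on different values give a value well above 1, signalling that more iterations or a different parameterisation may be needed.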
Examination of the autocorrelation function between successive iterations of the chains is the other important tool used in Bayesian model diagnostics. This is done for each of the parameters: the proportion of children who have died ($p_i$), the spatially structured random effects ($S_i$), the education effect ($\beta_1$), the HIV effect ($\beta_2$) and the standard deviation of $S_i$. For all these parameters the autocorrelation should decline towards 0 as the lag increases, which indicates that the chains are mixing well.
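The autocorrelation check can be illustrated with a short sketch that computes the sample autocorrelation of a chain at a given lag. The two simulated chains are hypothetical: one consists of independent draws, and the other is a strongly dependent first-order autoregressive series that stands in for a poorly mixing sampler.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    total = sum((x - mean) ** 2 for x in chain)
    lagged = sum((chain[t] - mean) * (chain[t + lag] - mean)
                 for t in range(n - lag))
    return lagged / total


rng = random.Random(7)
# Independent draws: autocorrelation should be near 0 at every lag.
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Dependent draws: each value is 0.9 times the previous one plus noise.
sticky = [0.0]
for _ in range(4999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

acf_independent = [autocorrelation(independent, k) for k in (1, 5, 10)]
acf_sticky = [autocorrelation(sticky, k) for k in (1, 5, 10)]
```

High autocorrelation does not by itself invalidate a model, but it means that each iteration carries less independent information, so longer runs are needed to reach the same accuracy.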
To assess the accuracy of the posterior estimates, the Monte Carlo (MC) error for each parameter of interest is investigated. As a rule of thumb, to have accurate posterior estimates the simulation should be run until the MC error for each parameter of interest is less than about 5% of the sample standard deviation. This check helps to confirm that convergence and adequate accuracy of the posterior estimates have been attained, so that the model can be used to estimate posterior statistics.
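The 5% rule of thumb can be sketched as follows. The MC error is estimated here with a batch-means estimator, which is one common choice and is an assumption of this sketch; the number of batches, the function names and the simulated chains are hypothetical.

```python
import math
import random
import statistics


def batch_means_mc_error(chain, n_batches=50):
    """Estimate the Monte Carlo standard error of the chain mean."""
    size = len(chain) // n_batches
    means = [statistics.fmean(chain[b * size:(b + 1) * size])
             for b in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


def mc_error_ok(chain, threshold=0.05):
    """True if the MC error is below threshold times the sample sd."""
    return batch_means_mc_error(chain) < threshold * statistics.stdev(chain)


rng = random.Random(11)
long_chain = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # hypothetical draws
short_chain = [rng.gauss(0.0, 1.0) for _ in range(100)]    # too few draws

long_ok = mc_error_ok(long_chain)
short_ok = mc_error_ok(short_chain)
```

For the longer chain the estimated MC error falls well below 5% of the sample standard deviation, while for the short chain it does not, which is the situation in which the simulation would be continued.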