2.3 Bayesian Inference
2.3.3 Integrated Nested Laplace Approximation
A key obstacle in Bayesian statistics is carrying out inference in practice: the mathematics may seem straightforward, but computational problems arise. For example, MCMC can be computationally expensive and can suffer from convergence problems when applied to latent Gaussian models, in which a possibly non-Gaussian response variable depends on a latent Gaussian field. To avoid such obstacles, Integrated Nested Laplace Approximation (INLA) provides a faster alternative to simulation-based MCMC schemes within the class of latent Gaussian models (LGM) by approximating the posterior marginals directly. In brief, an LGM has a hierarchical structure in which the response variables are conditionally independent given a latent Gaussian field, and the hyperparameters describing the latent Gaussian field are assigned prior distributions. More details on LGMs are given in a later section.
What is INLA?
For a given latent Gaussian field $x$, the posterior marginals $\pi(x_i \mid y)$ are approximated in three steps:
1. compute the Laplace approximation $\tilde\pi(\theta \mid y)$ to the posterior of the hyperparameters $\theta$;
2. approximate $\pi(x_i \mid y, \theta)$ for selected values of $\theta$;
3. combine the results of steps 1 and 2 by numerical integration over $\theta$ to obtain $\tilde\pi(x_i \mid y)$.
In this section, approximate densities are denoted by $\tilde\pi$. The steps are outlined below; full details can be found in Rue et al. (2009) and Rue et al. (2017).
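Step 3 above amounts to a finite weighted sum, $\tilde\pi(x_i \mid y) \approx \sum_k \tilde\pi(x_i \mid y, \theta_k)\, \tilde\pi(\theta_k \mid y)\, \Delta_k$, where the $\Delta_k$ are integration weights. A minimal Python sketch of this sum follows. It assumes purely illustrative Gaussian conditional marginals and made-up values of $\log\tilde\pi(\theta_k \mid y)$; the function name `integrate_out_theta` is hypothetical and is not part of any INLA software.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def integrate_out_theta(x_grid, cond_marginals, log_post_theta, deltas):
    """Approximate pi(x_i | y) on x_grid as sum_k pi~(x_i | y, theta_k) * w_k.

    cond_marginals : one callable x -> pi~(x_i | y, theta_k) per evaluation point
    log_post_theta : unnormalised log pi~(theta_k | y) at each evaluation point
    deltas         : integration weights Delta_k
    """
    shift = max(log_post_theta)  # subtract the maximum for numerical stability
    w = [math.exp(lp - shift) * d for lp, d in zip(log_post_theta, deltas)]
    total = sum(w)
    w = [wk / total for wk in w]  # normalise so the weights sum to one
    return [sum(wk * dens(x) for wk, dens in zip(w, cond_marginals)) for x in x_grid]


# Illustrative inputs (assumed, not from a fitted model): three values theta_k.
cond = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 0.2, 0.8),
    lambda x: normal_pdf(x, 0.5, 0.6),
]
log_post = [-1.3, -0.2, -0.9]
deltas = [1.0, 1.0, 1.0]

x_grid = [-8.0 + 0.01 * n for n in range(1601)]
marginal = integrate_out_theta(x_grid, cond, log_post, deltas)
```

Because the weights are normalised, the resulting mixture is itself a proper density whenever each conditional marginal is.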
It is crucial to select sufficiently good evaluation points when approximating the posterior of the hyperparameters, $\tilde\pi(\theta \mid y)$, because this approximation is used to integrate out the uncertainty in $\theta$ when approximating the posterior marginals of the latent Gaussian field $x$. Because this process is computationally expensive, an interpolant to $\log\tilde\pi(\theta \mid y)$ is used instead of direct numerical integration of $\tilde\pi(\theta \mid y)$. This can be done in the following steps:
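• Obtain the mode $\theta^*$ of $\tilde\pi(\theta \mid y)$.
• Compute the negative Hessian matrix $H$ of $\log\tilde\pi(\theta \mid y)$ at $\theta^*$. With the eigendecomposition $H^{-1} = V \Lambda V^T$, define $\theta(z) = \theta^* + V \Lambda^{1/2} z$, so that $z \sim N(0, I)$ whenever $\tilde\pi(\theta \mid y)$ is Gaussian.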
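The reparameterisation in these steps can be sketched in a few lines of Python. The example assumes a toy two-dimensional $\log\tilde\pi(\theta \mid y)$ that is exactly Gaussian, takes the mode as already known, and uses a regular $z$-grid with a log-density drop threshold of 2.5; all function names and numerical choices are illustrative assumptions, not the R-INLA implementation.

```python
import math


def log_post(theta):
    """Toy stand-in for log pi~(theta | y): an exact Gaussian with mode (1.0, -0.5)."""
    d0, d1 = theta[0] - 1.0, theta[1] + 0.5
    return -0.5 * (2.0 * d0 * d0 + 1.2 * d0 * d1 + 1.0 * d1 * d1)


def neg_hessian(f, t, h=1e-3):
    """Negative Hessian of f at t by central finite differences."""
    n = len(t)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(t)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            second = (shifted(1, 1) - shifted(1, -1)
                      - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
            H[i][j] = -second
    return H


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def eig_sym_2x2(m):
    """Eigenvalues and orthonormal eigenvectors (columns of V) of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    if abs(b) < 1e-15:
        return [a, c], [[1.0, 0.0], [0.0, 1.0]]
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    lams = [mid + rad, mid - rad]
    cols = []
    for lam in lams:
        vx, vy = lam - c, b  # eigenvector for eigenvalue lam
        norm = math.hypot(vx, vy)
        cols.append((vx / norm, vy / norm))
    V = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return lams, V


def theta_of_z(mode, lams, V, z):
    """theta(z) = theta* + V Lambda^{1/2} z."""
    scaled = [math.sqrt(lams[k]) * z[k] for k in range(2)]
    return [mode[r] + V[r][0] * scaled[0] + V[r][1] * scaled[1] for r in range(2)]


mode = [1.0, -0.5]  # assumed to come from a numerical optimiser
H = neg_hessian(log_post, mode)
lams, V = eig_sym_2x2(inverse_2x2(H))

# Candidate evaluation points on a regular z-grid; keep those whose log-density
# lies within 2.5 of the value at the mode (step and threshold are illustrative).
points = []
for z1 in range(-2, 3):
    for z2 in range(-2, 3):
        th = theta_of_z(mode, lams, V, (z1, z2))
        if log_post(mode) - log_post(th) < 2.5:
            points.append(th)
```

In the $z$-parameterisation a Gaussian $\tilde\pi(\theta \mid y)$ has spherical contours, so a step of fixed length along any $z$-axis lowers the log-density by the same amount.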
CHAPTER 2. BAYESIAN METHODS IN SPACE-TIME MODELLING
Conditioning on the set of evaluation points $\{\theta_k\}$ obtained above, one can then approximate the posterior marginals of the $x_i$, denoted $\tilde\pi(x_i \mid y, \theta_k)$. This can be done via the Laplace approximation
\[ \tilde\pi(x_i \mid \theta, y) \propto \left. \frac{\pi(x, \theta, y)}{\tilde\pi_G(x_{-i} \mid x_i, \theta, y)} \right|_{x_{-i} = x^*_{-i}(x_i, \theta)}, \]
where $\tilde\pi_G$ denotes the Gaussian approximation to the density of $x_{-i} \mid x_i, \theta, y$ and $x^*_{-i}(x_i, \theta)$ is its mode.
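To make the ratio in the Laplace approximation concrete, the following Python sketch evaluates it on a grid for a toy model. Everything here is an assumption made for illustration: a two-component latent field with a fixed prior precision matrix `Q`, Poisson observations with log link, made-up counts `y`, and hypothetical function names. For each $x_1$ the conditional mode of $x_2$ is found by Newton iterations, and the Gaussian approximation is evaluated at that mode.

```python
import math

# Assumed toy model: x = (x1, x2) ~ N(0, Q^{-1}), y_i | x_i ~ Poisson(exp(x_i)).
Q = [[2.0, -0.5], [-0.5, 2.0]]  # prior precision matrix (theta held fixed)
y = [2, 3]                      # made-up counts


def log_joint(x1, x2):
    """log pi(x, theta, y) up to an additive constant."""
    quad = Q[0][0] * x1 * x1 + 2.0 * Q[0][1] * x1 * x2 + Q[1][1] * x2 * x2
    return -0.5 * quad + y[0] * x1 - math.exp(x1) + y[1] * x2 - math.exp(x2)


def conditional_mode(x1, iters=50):
    """Newton iterations for the mode x2*(x1) of x2 | x1, theta, y."""
    x2 = 0.0
    for _ in range(iters):
        grad = -Q[0][1] * x1 - Q[1][1] * x2 + y[1] - math.exp(x2)
        curv = Q[1][1] + math.exp(x2)  # minus the second derivative, always > 0
        x2 += grad / curv
    return x2


def log_laplace(x1):
    """log of pi(x, theta, y) / pi_G(x2 | x1, theta, y), evaluated at x2 = x2*(x1)."""
    x2 = conditional_mode(x1)
    curv = Q[1][1] + math.exp(x2)
    log_gauss_at_mode = 0.5 * math.log(curv / (2.0 * math.pi))
    return log_joint(x1, x2) - log_gauss_at_mode


def normalise(log_vals, step):
    """Exponentiate and rescale so the values integrate to one (trapezoid rule)."""
    m = max(log_vals)
    dens = [math.exp(v - m) for v in log_vals]
    area = sum(0.5 * (dens[k] + dens[k + 1]) * step for k in range(len(dens) - 1))
    return [d / area for d in dens]


step = 0.05
x1_grid = [-3.0 + step * k for k in range(121)]  # grid from -3 to 3
laplace = normalise([log_laplace(x1) for x1 in x1_grid], step)
```

In this two-dimensional toy the result can be checked against brute-force integration over $x_2$; for a realistic latent field that check is not available, which is the reason for using the approximation in the first place.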
To avoid recomputing this mode for every value of $x_i$ and $\theta$, it is proposed to approximate $x^*_{-i}(x_i, \theta)$ by $E_{\tilde\pi_G}(x_{-i} \mid x_i)$, which is readily available from the Gaussian approximation computed during the exploration of $\tilde\pi(\theta \mid y)$. In spatial cases, it is intuitive that only those $x_j$ in the neighbourhood of $x_i$ should contribute to the marginal distribution of $x_i$. The conditional expectation $E_{\tilde\pi_G}(x_{-i} \mid x_i)$ implies that
\[ \frac{E_{\tilde\pi_G}(x_j \mid x_i) - \mu_j(\theta)}{\sigma_j(\theta)} = a_{ij}(\theta)\, \frac{x_i - \mu_i(\theta)}{\sigma_i(\theta)} \]
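for some $a_{ij}(\theta)$, $i \neq j$. Thus a region of interest $R_i(\theta)$ can be defined as the collection of indices $j$ such that $|a_{ij}(\theta)| > 0.001$, and this can be used to simplify the calculation of $\tilde\pi_G(x_{-i} \mid x_i, \theta, y)$. To save the computational cost of evaluating the density at every value of $x_i$, the approximation is computed only at selected points. These points are chosen using the mean and variance of the Gaussian approximation, through the standardised variable $x_i^{(s)} = (x_i - \mu_i(\theta))/\sigma_i(\theta)$, with abscissas given by the Gauss-Hermite quadrature rule. The Laplace-approximated density is then represented as $\tilde\pi_{LA}(x_i \mid \theta, y) \propto N\{x_i; \mu_i(\theta), \sigma_i^2(\theta)\} \exp\{\text{cubic spline}(x_i)\}$, where the cubic spline is fitted to the difference $\log\tilde\pi_{LA}(x_i \mid \theta, y) - \log\tilde\pi_G(x_i \mid \theta, y)$ at the selected points.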
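For a joint Gaussian with covariance matrix $\Sigma$, the conditional mean is $E(x_j \mid x_i) = \mu_j + (\Sigma_{ij}/\Sigma_{ii})(x_i - \mu_i)$, so the coefficient in the standardised relation above is the correlation $\Sigma_{ij}/(\sigma_i \sigma_j)$. The Python sketch below uses this to select a region of interest. The covariance matrix is made up, the function names are hypothetical, and the absolute value of $a_{ij}$ is thresholded so that negative dependence also counts, which is an assumption of this sketch.

```python
import math


def standardised_slopes(cov, i):
    """a_ij = Cov(x_i, x_j) / (sd_i * sd_j) for all j != i, under a joint Gaussian."""
    sd = [math.sqrt(cov[k][k]) for k in range(len(cov))]
    return {j: cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov)) if j != i}


def region_of_interest(cov, i, tol=1e-3):
    """Indices j whose standardised dependence on x_i exceeds tol in absolute value."""
    return sorted(j for j, a in standardised_slopes(cov, i).items() if abs(a) > tol)


# Made-up covariance of the Gaussian approximation: correlation 0.05 ** |i - j|.
n, rho = 6, 0.05
cov = [[rho ** abs(r - c) for c in range(n)] for r in range(n)]
region = region_of_interest(cov, 0)
```

With correlations that decay this quickly, only the two nearest components pass the threshold for the first node, so the remaining components can be left out when computing the conditional Gaussian approximation.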
Purely for computational benefit, a simplified Laplace approximation is also derived in ?, especially for spatial cases. The simplified Laplace approximation is based on a series expansion of $\tilde\pi(x_i \mid y, \theta_k)$ around the mean $x_i = \mu_i(\theta)$ up to third order. By fitting a skew-normal distribution, it corrects the errors in skewness and location incurred by the Gaussian approximation.
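For orientation, the expansion and the skew-normal family can be written as below. This is a sketch in notation that is assumed here rather than defined in the text: $x_i^{(s)}$ is the standardised variable, $\gamma_i^{(1)}(\theta)$ and $\gamma_i^{(3)}(\theta)$ stand for the first- and third-order correction coefficients, $\phi$ and $\Phi$ are the standard normal density and distribution function, and $\xi$, $\omega$, $a$ are the skew-normal location, scale and skewness parameters. Rue et al. (2009) should be consulted for the exact expressions.

```latex
\log \tilde\pi_{SLA}\bigl(x_i^{(s)} \mid \theta, y\bigr)
  = \text{const} - \tfrac{1}{2}\bigl(x_i^{(s)}\bigr)^2
    + \gamma_i^{(1)}(\theta)\, x_i^{(s)}
    + \tfrac{1}{6}\bigl(x_i^{(s)}\bigr)^3 \gamma_i^{(3)}(\theta) + \cdots

\pi_{SN}(z) = \frac{2}{\omega}\,
  \phi\!\left(\frac{z - \xi}{\omega}\right)
  \Phi\!\left(a\, \frac{z - \xi}{\omega}\right)
```

The linear term shifts the location and the cubic term introduces skewness, which are the two features the fitted skew-normal density is meant to match.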
Why do we use INLA?
With the increasing popularity of Bayesian hierarchical models for complex data, fitting general models via simulation-based methods (e.g. MCMC) is likely to be computationally expensive. By combining analytic approximation with numerical integration, INLA bypasses the convergence issues that arise with MCMC methods. INLA typically delivers faster inference and allows estimation of hyperparameters, which is a challenging task for MCMC sampling. However, these gains are almost
guaranteed to come with some bias, arising from the errors introduced by the analytic approximations used to compute the posterior distributions. Although INLA provides estimates for hyperparameters, it should be borne in mind that identifying hyperparameters is itself a challenging task and that quick inference is not necessarily correct inference (Taylor and Diggle, 2014). Despite these potential downsides, INLA together with the R-INLA package (http://www.r-inla.org/) provides a routine toolbox for dealing with complex data.
R-INLA package
In this thesis, INLA is applied mainly through the R-INLA package, which implements approximate Bayesian inference for latent Gaussian Markov random field (GMRF) models. It is a readily available package allowing fast inference based on INLA. Details are described in Rue et al. (2009). Briefly, for observed values $y$, latent parameters $\eta$ and further parameters $\theta$, R-INLA supports hierarchical GMRF models of the following form:
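\[ y_j \mid \eta_j, \theta_1 \sim \pi(y_j \mid \eta_j, \theta_1), \quad j \in J, \]
\[ \eta_i = \alpha + \sum_{k=0}^{n_f - 1} w_{ki}\, f_k(c_{ki}) + z_i^T \beta + \epsilon_i, \quad i \in I. \]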
The hyperparameters $\theta$ are assigned a prior distribution $\pi(\theta)$.
Here $J$ is a subset of $I$, since not all latent parameters have to be learnt from the data. The observations $y$ are assumed to be conditionally independent given the latent variables and the parameters, which enter the likelihood through known link functions. The unstructured random effects in the linear predictor are assumed to be independent and identically distributed as $N(0, \lambda_\eta^{-1} I)$, where $\lambda_\eta$ denotes the precision. Offsets and weights for each data point are normally known and are included in the linear predictor. Covariate effects that are nonlinear and/or continuous are captured by $f_k(c_{ki})$, where $c_{ki}$ denotes the value of covariate $k$ for observation $i$; each $f_k \mid \theta_{f_k}$ is a GMRF, $N(0, Q_k^{-1})$, for some parameters $\theta_{f_k}$. The term $z_i^T \beta$ denotes the linear effects of the covariate values $z_i$ with coefficients $\beta$, which are assumed to follow Gaussian distributions with mean zero and fixed precisions. Thus, the full latent field is $x = (\eta^T, f_0^T, \ldots, f_{n_f - 1}^T, \beta^T)$, which is also
additive models (GAM), generalised additive mixed models for longitudinal data, geoadditive models, ANOVA type interactive models and univariate stochastic volatility models.
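The additive structure of the linear predictor in the model above, $\eta_i = \alpha + \sum_k w_{ki} f_k(c_{ki}) + z_i^T \beta$ plus an unstructured term, can be illustrated with a few lines of Python. This is only an arithmetic sketch with made-up numbers: each $f_k$ is represented as a lookup table over covariate levels, and the function name `linear_predictor` and all inputs are hypothetical rather than part of R-INLA.

```python
def linear_predictor(alpha, f, c, w, z, beta, eps):
    """eta_i = alpha + sum_k w[k][i] * f[k][c[k][i]] + z[i]^T beta + eps[i]."""
    eta = []
    for i in range(len(eps)):
        smooth = sum(w[k][i] * f[k][c[k][i]] for k in range(len(f)))
        linear = sum(zi * b for zi, b in zip(z[i], beta))
        eta.append(alpha + smooth + linear + eps[i])
    return eta


# Made-up example with four observations and n_f = 2 structured effects.
alpha = 0.5
f = [{0: 0.1, 1: -0.2, 2: 0.3}, {0: 1.0, 1: -1.0}]   # f_k value at each covariate level
c = [[0, 1, 2, 1], [0, 0, 1, 1]]                     # c[k][i]: level of covariate k at i
w = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]     # w[k][i]: known weights
z = [[1.0], [2.0], [0.0], [-1.0]]                    # linear covariate values z_i
beta = [0.2]                                         # linear coefficients
eps = [0.0, 0.0, 0.0, 0.0]                           # unstructured effects, set to zero

eta = linear_predictor(alpha, f, c, w, z, beta, eps)
```

In a fitted model the values in `f`, `beta` and `eps` would be components of the latent field $x$ rather than fixed numbers, and the likelihood would connect each $\eta_j$ with $j \in J$ to the corresponding observation $y_j$.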