5.3 Marginal likelihood for variance components
5.3.2 Restricted maximum likelihood estimation
In this case, estimation is based on the likelihood of a set of error contrasts $u = A'y$ rather than on the likelihood of $y$ itself. This approach also fits better into the Bayesian model formulation adopted in the previous sections. An error contrast is a linear combination $a'y$ with expectation zero, so that the distribution of $a'y$ does not depend on $\beta$. One example of such error contrasts is the set of residuals $\hat\varepsilon_i = y_i - x_i'\hat\beta$, where $a'$ is the $i$th row of the residual matrix $R = I - X(X'X)^{-1}X'$. Consequently, for a linear model with $\operatorname{Cov}(y) = \sigma^2 I$, the vector of all residuals $\hat\varepsilon$ has the distribution
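$$\hat\varepsilon \sim N(0, R\sigma^2)$$
with the desired property $E(\hat\varepsilon) = 0$. However, $R$ is idempotent with rank $n - \dim(\beta)$ and is therefore singular. The distribution is consequently degenerate (it has no density on $\mathbb{R}^n$), so using the residuals directly is not advisable in general.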
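A quick numerical check shows why $R$ is singular. It is symmetric and idempotent with rank $n - \dim(\beta)$, so the normal distribution of $\hat\varepsilon$ is concentrated on an $(n - \dim(\beta))$-dimensional subspace. This is a minimal NumPy sketch: the design matrix is randomly generated hypothetical data, and `residual_matrix` is an illustrative helper name.

```python
import numpy as np

def residual_matrix(X):
    """R = I - X (X'X)^{-1} X' for a design matrix X with full column rank."""
    n = X.shape[0]
    # Solve (X'X) B = X' instead of forming an explicit inverse.
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(0)
n, p = 8, 3                        # p plays the role of dim(beta)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
R = residual_matrix(X)

print(np.allclose(R, R.T), np.allclose(R @ R, R))  # symmetric and idempotent
print(np.linalg.matrix_rank(R))                     # rank n - p = 5 < n, so R is singular
print(np.allclose(R @ X, 0))                        # R y = R eps, free of beta
```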
Since only $n - \dim(\beta)$ linearly independent error contrasts exist, REML estimation is commonly based on error contrasts obtained from the decomposition
$$AA' = I - X(X'X)^{-1}X' \quad\text{with}\quad A'A = I, \tag{5.18}$$
where $A$ is an $n \times (n - \dim(\beta))$ matrix with full column rank. The resulting error contrasts $u = A'y$ satisfy $E(u) = 0$. To see this, note that $AA'X = 0$ implies $A'X = A'AA'X = 0$, and hence $E(u) = A'X\beta = 0$.
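One way to obtain a matrix $A$ with $A'A = I$ and $AA' = I - X(X'X)^{-1}X'$ is to take the last $n - \dim(\beta)$ columns of the complete QR factorization of $X$. Any orthonormal basis of the orthogonal complement of the column space of $X$ would work equally well. The sketch below uses the QR construction and checks the defining properties, including $A'X = 0$, on a hypothetical design. It assumes $X$ has full column rank, and `error_contrast_matrix` is an illustrative name.

```python
import numpy as np

def error_contrast_matrix(X):
    """Return an n x (n - p) matrix A with A'A = I and AA' = I - X(X'X)^{-1}X'.

    The last n - p columns of the complete QR factorization of X are an
    orthonormal basis of the orthogonal complement of the column space of X.
    Assumes X has full column rank p.
    """
    n, p = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")  # Q is an n x n orthogonal matrix
    return Q[:, p:]

rng = np.random.default_rng(1)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = error_contrast_matrix(X)
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(A.shape)                               # (10, 8)
print(np.allclose(A.T @ A, np.eye(n - p)))   # A'A = I
print(np.allclose(A @ A.T, R))               # AA' = I - X(X'X)^{-1}X'
print(np.allclose(A.T @ X, 0))               # A'X = 0, hence E(u) = 0
```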
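Now, the marginal density of $u$ is given by (Harville 1974)
$$p(u) = \left(\frac{1}{2\pi}\right)^{\frac{n-\dim(\beta)}{2}} |X'X|^{\frac12}\, |\Sigma|^{-\frac12}\, |X'\Sigma^{-1}X|^{-\frac12} \exp\left\{-\frac12 (y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)\right\}, \tag{5.19}$$
where $\Sigma$ denotes the marginal covariance matrix of $y$ and $\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ is the generalized least squares estimator of $\beta$. Restricted maximum likelihood estimators of $\tau^2$ and $\sigma^2$ are obtained by maximizing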
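$$l^*(\tau^2, \sigma^2) = -\frac12\log|\Sigma| - \frac12\log|X'\Sigma^{-1}X| - \frac12(y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)$$
with a numerical optimization technique. Here $l^*$ equals $\log p(u)$ up to an additive constant. Note that the restricted log-likelihood does not depend on the particular choice of $A$ as long as $n - \dim(\beta)$ linearly independent error contrasts are used (Verbeke & Molenberghs 2000, Sec. 5.3).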
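The following sketch evaluates $l^*$ for a given $\Sigma$ and uses simulated data to check two statements. The first is that $\log p(u)$, computed directly from $u = A'y \sim N(0, A'\Sigma A)$, equals $l^*$ plus the constant $-\frac{n-\dim(\beta)}{2}\log(2\pi) + \frac12\log|X'X|$. The second is that replacing $A$ by $AH$ with an orthogonal $H$, which is another valid choice of error contrasts, leaves it unchanged. The random-intercept covariance $\Sigma = \sigma^2 I + \tau^2 ZZ'$, the parameter values and the helper names are assumptions made for illustration only.

```python
import numpy as np

def restricted_loglik(y, X, Sigma):
    """l* = -1/2 log|Sigma| - 1/2 log|X' Sigma^-1 X| - 1/2 r' Sigma^-1 r,
    with r = y - X beta_hat and beta_hat the GLS estimate (constants dropped)."""
    Si_X = np.linalg.solve(Sigma, X)
    XtSiX = X.T @ Si_X
    beta_hat = np.linalg.solve(XtSiX, Si_X.T @ y)   # (X'S^-1 X)^-1 X'S^-1 y
    r = y - X @ beta_hat
    logdet_S = np.linalg.slogdet(Sigma)[1]
    logdet_XtSiX = np.linalg.slogdet(XtSiX)[1]
    return -0.5 * (logdet_S + logdet_XtSiX + r @ np.linalg.solve(Sigma, r))

def gaussian_logpdf_zero_mean(u, C):
    """log N(u; 0, C) for a positive definite covariance matrix C."""
    k = u.shape[0]
    logdet_C = np.linalg.slogdet(C)[1]
    return -0.5 * (k * np.log(2 * np.pi) + logdet_C + u @ np.linalg.solve(C, u))

rng = np.random.default_rng(2)
groups, per_group = 4, 3
n = groups * per_group
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = X.shape[1]                                           # dim(beta)
Z = np.kron(np.eye(groups), np.ones((per_group, 1)))     # random-intercept design
tau2, sigma2 = 0.7, 1.3                                  # hypothetical variances
Sigma = sigma2 * np.eye(n) + tau2 * Z @ Z.T
y = X @ np.array([1.0, -0.5]) + rng.multivariate_normal(np.zeros(n), Sigma)

Q, _ = np.linalg.qr(X, mode="complete")
A = Q[:, p:]                           # A'A = I, AA' = I - X(X'X)^{-1}X'
log_pu = gaussian_logpdf_zero_mean(A.T @ y, A.T @ Sigma @ A)

const = -0.5 * (n - p) * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(X.T @ X)[1]
print(np.isclose(log_pu, restricted_loglik(y, X, Sigma) + const))  # True

H, _ = np.linalg.qr(rng.normal(size=(n - p, n - p)))   # random orthogonal matrix
A2 = A @ H                                             # another valid choice of A
print(np.isclose(gaussian_logpdf_zero_mean(A2.T @ y, A2.T @ Sigma @ A2), log_pu))  # True
```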
Generalizing the restricted maximum likelihood technique to nonnormal data is not straightforward. For more general responses, error contrasts cannot be defined because the mean of $y$ depends nonlinearly on $\beta$ in generalized linear mixed models. Therefore, we present an alternative approach by Harville (1974) to the estimation of variances in Gaussian mixed models. It leads to exactly the same restricted log-likelihood for $\tau^2$ and $\sigma^2$, and it can additionally be extended to more general responses.
Recall that in the Bayesian formulation of mixed models, not only $b$ but also $\beta$ is assumed to be a random variable. While the prior distribution of $b$ is proper, the prior of $\beta$ is flat (and hence improper), i.e.
$$p(\beta) \propto \text{const}.$$
From the Bayesian perspective it seems reasonable to integrate both $b$ and $\beta$ out of the distribution of $y$. The resulting marginal distribution of $y$ (with respect to $b$ and $\beta$) is then maximized with respect to the variance parameters. Harville (1974) showed that proceeding in this way leads to the same restricted log-likelihood as above, up to an additive constant. For this reason, REML estimation is also sometimes referred to as marginal likelihood estimation in the literature.
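Under the flat prior, the integration over $\beta$ can be carried out in closed form. The following derivation is a sketch that assumes $b$ has already been integrated out, so that $y \mid \beta \sim N(X\beta, \Sigma)$, and writes $\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$:

$$
\begin{aligned}
(y-X\beta)'\Sigma^{-1}(y-X\beta) &= (y-X\hat\beta)'\Sigma^{-1}(y-X\hat\beta) + (\beta-\hat\beta)'X'\Sigma^{-1}X(\beta-\hat\beta),\\
\int p(y\mid\beta)\,d\beta &= (2\pi)^{-\frac{n}{2}}|\Sigma|^{-\frac12}\exp\Big\{-\tfrac12(y-X\hat\beta)'\Sigma^{-1}(y-X\hat\beta)\Big\}\int \exp\Big\{-\tfrac12(\beta-\hat\beta)'X'\Sigma^{-1}X(\beta-\hat\beta)\Big\}\,d\beta\\
&= (2\pi)^{-\frac{n-\dim(\beta)}{2}}\,|\Sigma|^{-\frac12}\,|X'\Sigma^{-1}X|^{-\frac12}\exp\Big\{-\tfrac12(y-X\hat\beta)'\Sigma^{-1}(y-X\hat\beta)\Big\}.
\end{aligned}
$$

The cross term in the first line vanishes because $X'\Sigma^{-1}(y - X\hat\beta) = 0$, and the remaining Gaussian integral over $\beta$ contributes $(2\pi)^{\dim(\beta)/2}|X'\Sigma^{-1}X|^{-1/2}$. Compared with the density of the error contrasts $u$, only the factor $|X'X|^{1/2}$ is missing. That factor does not involve the variance parameters, so the logarithm coincides with the restricted log-likelihood $l^*$ up to an additive constant.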
Replacing ML variance estimates with their REML counterparts allows for a further interpretation. The ML estimators are obtained by jointly maximizing the posterior with respect to the regression coefficients $\beta$ and the variances $\tau^2$, so they correspond to the variance components of the joint posterior mode. In contrast, REML estimates are given by the mode of the marginal posterior of the variances. The latter strategy coincides with the usual strategy in an empirical Bayes approach, where hyperparameters are treated as fixed constants that are estimated from their marginal posterior. To derive REML estimates for GLMMs, we approximate the log-likelihood using the Pearson $\chi^2$ statistic, yielding
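$$l(y, \beta, b) \approx -\frac12 \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\omega_i v(\mu_i)/\phi} = -\frac12 (y - \mu)' S^{-1} (y - \mu)$$
up to an additive constant. This is in fact equivalent to approximating $l(y, \beta, b)$ by a quadratic function, as in the Laplace approximation (compare Tierney & Kadane 1986). Using the definition of the working observations $\tilde y$ in (5.13) gives
$$y - \mu = D(\tilde y - X\beta - Zb),$$
and therefore we have
$$l(y, \beta, b) \approx -\frac12 (\tilde y - X\beta - Zb)' D' S^{-1} D (\tilde y - X\beta - Zb) = -\frac12 (\tilde y - X\beta - Zb)' W (\tilde y - X\beta - Zb).$$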
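The identity between the Pearson form and the weighted quadratic form in the working observations can be checked numerically for a concrete case. This sketch assumes a Poisson model with log link, unit weights $\omega_i = 1$ and $\phi = 1$, so that $D = S = W = \operatorname{diag}(\mu)$. It also assumes the working observations $\tilde y = X\beta + Zb + D^{-1}(y - \mu)$, which is the form implied by $y - \mu = D(\tilde y - X\beta - Zb)$. All data and parameter values are simulated and hypothetical.

```python
import numpy as np

# Hypothetical Poisson GLMM with log link, phi = 1 and unit weights (assumptions):
# mu = exp(eta), D = diag(d mu / d eta) = diag(mu), S = Cov(y | b) = diag(mu).
rng = np.random.default_rng(3)
n, groups = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(groups), np.ones((n // groups, 1)))
beta = np.array([0.5, 0.3])
b = rng.normal(scale=0.4, size=groups)

eta = X @ beta + Z @ b
mu = np.exp(eta)
y = rng.poisson(mu).astype(float)

D = np.diag(mu)
S = np.diag(mu)
W = D.T @ np.linalg.solve(S, D)          # W = D' S^{-1} D, here diag(mu)

# Working observations, chosen so that y - mu = D (y_tilde - X beta - Z b).
y_tilde = eta + (y - mu) / mu

pearson = np.sum((y - mu) ** 2 / mu)     # (y - mu)' S^{-1} (y - mu)
quad = (y_tilde - eta) @ W @ (y_tilde - eta)
print(np.isclose(pearson, quad))         # True
```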
If, following Breslow & Clayton (1993), we ignore the dependence of $W$ on the variance parameters, the log-likelihood can be approximated by the log-likelihood of a linear mixed model for the working observations $\tilde y$. More specifically, we assume
$$\tilde y \mid \beta, b \overset{a}{\sim} N(X\beta + Zb, W^{-1}).$$
Determining the marginal distribution of $\tilde y$ with respect to $b$ and $\beta$ yields an approximate restricted log-likelihood for the generalized linear mixed model:
$$l^*(\tau^2, \phi) = -\frac12\log|\Sigma| - \frac12\log|X'\Sigma^{-1}X| - \frac12(\tilde y - X\hat\beta)'\Sigma^{-1}(\tilde y - X\hat\beta), \tag{5.20}$$
with $\Sigma = W^{-1} + ZQZ'$ being an approximation to the marginal covariance matrix of $\tilde y$. To finally obtain REML estimates of the variance parameters, we have to maximize (5.20) with respect to $\tau^2$ (and $\phi$ if necessary). One suitable optimization procedure is the Newton-Raphson algorithm, which is based on the first and second derivatives of $l^*(\tau^2, \phi)$ with respect to the variance parameters. A modification of the Newton-Raphson algorithm is Fisher scoring, where the second derivative is replaced by its expectation. Since this leads to simpler estimating equations, we focus on Fisher scoring. Note that in several models the derivatives with respect to the dispersion parameter $\phi$ are not needed, e.g. for Poisson or binomial data, where the dispersion parameter is fixed. In these cases the corresponding derivatives are dropped from the formulae for the score function and the expected Fisher information presented in the following sections.
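Below is a minimal sketch of Fisher scoring for the Gaussian case. It assumes a covariance matrix that is linear in the variance parameters, $\Sigma(\theta) = \sum_k \theta_k G_k$; for example, $\Sigma = \sigma^2 I + \tau^2 ZZ'$ in a random-intercept model. The sketch uses the standard textbook REML score $s_k = -\frac12\operatorname{tr}(PG_k) + \frac12 y'PG_kPy$ and expected information $F_{kl} = \frac12\operatorname{tr}(PG_kPG_l)$, with $P = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$, so that $y'Py = (y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)$. These formulas are not necessarily in the exact form derived in the following sections. The function name and the balanced example data are hypothetical. For a GLMM, the same step could in principle be applied to $\tilde y$ with $\Sigma = W^{-1} + ZQZ'$, holding $W$ fixed as described above.

```python
import numpy as np

def reml_fisher_scoring(y, X, G, theta0, max_iter=100, tol=1e-10):
    """Fisher scoring for REML with Sigma(theta) = sum_k theta[k] * G[k].

    Score:                s_k  = -1/2 tr(P G_k) + 1/2 y' P G_k P y
    Expected information: F_kl = 1/2 tr(P G_k P G_l)
    with P = Sigma^-1 - Sigma^-1 X (X' Sigma^-1 X)^-1 X' Sigma^-1.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Sigma = sum(t * Gk for t, Gk in zip(theta, G))
        Si = np.linalg.inv(Sigma)
        Si_X = Si @ X
        P = Si - Si_X @ np.linalg.solve(X.T @ Si_X, Si_X.T)
        Py = P @ y
        PG = [P @ Gk for Gk in G]
        score = np.array([-0.5 * np.trace(PGk) + 0.5 * Py @ Gk @ Py
                          for PGk, Gk in zip(PG, G)])
        info = np.array([[0.5 * np.trace(PGk @ PGl) for PGl in PG] for PGk in PG])
        step = np.linalg.solve(info, score)   # Fisher scoring update
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Hypothetical balanced one-way design: y_ij = mu + a_i + e_ij.
rng = np.random.default_rng(4)
k, m = 6, 5
n = k * m
X = np.ones((n, 1))
Z = np.kron(np.eye(k), np.ones((m, 1)))
y = 2.0 + Z @ rng.normal(scale=2.0, size=k) + rng.normal(size=n)

G = [np.eye(n), Z @ Z.T]                      # theta = (sigma^2, tau^2)
sigma2_hat, tau2_hat = reml_fisher_scoring(y, X, G, theta0=[1.0, 1.0])

# Balanced one-way ANOVA estimators, which coincide with REML when positive.
groups = y.reshape(k, m)
msw = np.sum((groups - groups.mean(axis=1, keepdims=True)) ** 2) / (k * (m - 1))
msb = m * np.sum((groups.mean(axis=1) - y.mean()) ** 2) / (k - 1)
print(np.allclose([sigma2_hat, tau2_hat], [msw, (msb - msw) / m]))  # True
```

The final comparison relies on the known fact that, for balanced data, REML estimates coincide with the ANOVA estimators $\hat\sigma^2 = \mathrm{MSW}$ and $\hat\tau^2 = (\mathrm{MSB} - \mathrm{MSW})/m$ whenever the latter are positive. This gives a simple sanity check for the scoring step.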