
5.3 Marginal likelihood for variance components

5.3.2 Restricted maximum likelihood estimation

In restricted maximum likelihood (REML) estimation, inference on the variance parameters is based on the likelihood of a set of error contrasts $u = A'y$ rather than on the likelihood of $y$ itself. This approach also fits better into the Bayesian model formulation adopted in the previous sections. An error contrast is defined as a linear combination $a'y$ with expectation zero, so that the distribution of $a'y$ does not depend on $\beta$. An example of such a set of error contrasts are the residuals $\hat\varepsilon_i = y_i - x_i'\hat\beta$, where the vector $a$ is the $i$th row of the residual matrix $R = I - X(X'X)^{-1}X'$. Consequently, the vector of all residuals $\hat\varepsilon$ has the distribution

$$\hat\varepsilon \sim N(0, R\sigma^2)$$

with the desired property $E(\hat\varepsilon) = 0$. However, since $R$ is singular (its rank is only $n - \dim(\beta)$), this normal distribution is degenerate, and using the full vector of residuals is therefore not advisable in general.
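The properties of $R$ used above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical design matrix `X` (intercept plus one random covariate): $R$ annihilates $X$, is idempotent, and has rank $n - \dim(\beta)$, so $R\sigma^2$ is singular.

```python
import numpy as np

# Hypothetical design: n = 6 observations, intercept plus one covariate, so dim(beta) = 2
rng = np.random.default_rng(0)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])

# Residual matrix R = I - X (X'X)^{-1} X'; the residual vector is eps_hat = R y
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

rank_R = np.linalg.matrix_rank(R)
print("rank(R) =", rank_R)                      # 4 = n - dim(beta)
print("R X = 0:", np.allclose(R @ X, 0.0))      # so eps_hat = R y does not depend on beta
print("R idempotent:", np.allclose(R @ R, R))
print("det(R):", np.linalg.det(R))              # numerically zero: R sigma^2 is singular
```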

Since only $n - \dim(\beta)$ linearly independent error contrasts exist, REML estimation is commonly based on error contrasts obtained from the decomposition

$$AA' = I - X(X'X)^{-1}X' = R \quad\text{with}\quad A'A = I, \qquad (5.18)$$

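where $A$ is an $n \times (n - \dim(\beta))$ matrix with full column rank. It is easy to show that the resulting error contrasts $u = A'y$ fulfill $E(u) = 0$: from $A'A = I$ it follows that $A' = A'AA' = A'R$, so $A'X = A'RX = 0$ because $RX = 0$, and hence $E(u) = A'X\beta = 0$.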
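One convenient way to construct a matrix $A$ with $AA' = R$ and $A'A = I$ uses the eigendecomposition of the symmetric idempotent matrix $R$: its eigenvectors for eigenvalue 1 form the columns of $A$. The sketch below assumes NumPy and a hypothetical random design; any other orthonormal basis of the same column space would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3                                          # hypothetical sizes, dim(beta) = 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# R is symmetric idempotent: eigenvalues are 0 (p times) and 1 (n - p times).
eigval, eigvec = np.linalg.eigh(R)                   # orthonormal eigenvectors
A = eigvec[:, eigval > 0.5]                          # keep those for eigenvalue 1

print(A.shape)                                       # (8, 5)
print(np.allclose(A @ A.T, R))                       # AA' = R
print(np.allclose(A.T @ A, np.eye(n - p)))           # A'A = I
print(np.allclose(A.T @ X, 0.0))                     # A'X = 0, hence E(A'y) = 0
```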

Now, the marginal density of u is given by (Harville 1974)

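$$p(u) = \left(\frac{1}{2\pi}\right)^{\frac{n-\dim(\beta)}{2}} |X'X|^{\frac{1}{2}}\, |\Sigma|^{-\frac{1}{2}}\, |X'\Sigma^{-1}X|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)\right), \qquad (5.19)$$

where $\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ denotes the generalized least squares estimator of $\beta$. Restricted maximum likelihood estimators of $\tau^2$ and $\sigma^2$ are obtained by maximizing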

$$l^*(\tau^2, \sigma^2) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|X'\Sigma^{-1}X| - \frac{1}{2}(y - X\hat\beta)'\Sigma^{-1}(y - X\hat\beta)$$

using a numerical optimization technique. Note that the restricted log-likelihood does not depend on the particular choice of $A$ as long as $n - \dim(\beta)$ linearly independent error contrasts are used (Verbeke & Molenberghs 2000, Sec. 5.3).
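Both statements can be checked numerically. The sketch below assumes NumPy, a hypothetical random-intercept model with $\Sigma = \sigma^2 I + \tau^2 ZZ'$ (i.e. $Q = \tau^2 I$), and hypothetical values of $(\sigma^2, \tau^2)$. It evaluates the exact log-density of $u = A'y \sim N(0, A'\Sigma A)$ and compares it with $l^*$ plus the constant terms of (5.19). It then repeats the computation for a non-orthonormal set of contrasts $B = AM$ (with a hypothetical invertible matrix `M`), whose log-density differs from $l^*$ only by a constant that does not change with the variance parameters, so both choices have the same maximizer.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.shape[0] * np.log(2 * np.pi) + logdet + quad)

def restricted_loglik(y, X, Sigma):
    """l*(tau2, sigma2) as in the text, for a given marginal covariance Sigma."""
    Si_X = np.linalg.solve(Sigma, X)
    beta_hat = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)      # GLS estimator
    r = y - X @ beta_hat
    return (-0.5 * np.linalg.slogdet(Sigma)[1]
            - 0.5 * np.linalg.slogdet(X.T @ Si_X)[1]
            - 0.5 * r @ np.linalg.solve(Sigma, r))

rng = np.random.default_rng(3)
n, p, groups = 12, 2, 4
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.kron(np.eye(groups), np.ones((n // groups, 1)))      # random-intercept design
y = rng.standard_normal(n)

# Orthonormal contrasts A (AA' = R, A'A = I) and a non-orthonormal alternative B = A M
R = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
eigval, eigvec = np.linalg.eigh(R)
A = eigvec[:, eigval > 0.5]
M = rng.standard_normal((n - p, n - p)) + 3.0 * np.eye(n - p)  # hypothetical, invertible
B = A @ M

# Constant part of log p(u) in (5.19): -(n - p)/2 log(2 pi) + 1/2 log|X'X|
const = -0.5 * (n - p) * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(X.T @ X)[1]

gaps = []
for sigma2, tau2 in [(1.0, 0.5), (2.0, 3.0)]:                 # hypothetical values
    Sigma = sigma2 * np.eye(n) + tau2 * Z @ Z.T
    l_star = restricted_loglik(y, X, Sigma)
    log_pA = gaussian_logpdf(A.T @ y, A.T @ Sigma @ A)       # u = A'y ~ N(0, A' Sigma A)
    log_pB = gaussian_logpdf(B.T @ y, B.T @ Sigma @ B)
    gaps.append((log_pA - l_star, log_pB - l_star))

for gap_A, gap_B in gaps:
    # gap_A matches the constant of (5.19); gap_B is the same for both parameter values
    print(np.isclose(gap_A, const), gap_B)
```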

Generalizing the restricted maximum likelihood technique to nonnormal data is not straightforward, since error contrasts cannot be defined for more general responses: in generalized linear mixed models the expectation of $y$ depends nonlinearly on $\beta$. Therefore, we present an alternative approach to the estimation of variances in Gaussian mixed models, due to Harville (1974), that leads to exactly the same restricted log-likelihood for $\tau^2$ and $\sigma^2$ and that can additionally be extended to more general responses.

Recall that in the Bayesian formulation of mixed models not only $b$ but also $\beta$ is assumed to be a random variable. While the prior distribution of $b$ is proper, the prior of $\beta$ is flat, i.e.,

$$p(\beta) \propto \text{const}.$$

From the Bayesian perspective it seems reasonable to integrate both $b$ and $\beta$ out of the distribution of $y$. The resulting marginal likelihood, which no longer depends on $b$ and $\beta$, is then maximized with respect to the variance parameters. Harville (1974) showed that proceeding in this way leads to the same restricted log-likelihood $l^*$ as above, up to an additive constant that does not involve the variance parameters. For this reason, REML estimation is sometimes also referred to as marginal likelihood estimation in the literature.
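Harville's marginalization argument can be checked numerically in a small case. The sketch below assumes NumPy, an intercept-only design so that $\beta$ is scalar, a hypothetical random-intercept covariance $\Sigma = \sigma^2 I + \tau^2 ZZ'$ (which already has $b$ integrated out), and hypothetical variance values. It integrates $N(y; X\beta, \Sigma)$ over $\beta$ on a fine grid, i.e. under a flat prior, and compares the result with $l^*$ minus the constant $\frac{n - \dim(\beta)}{2}\log(2\pi)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, groups = 12, 4
X = np.ones((n, 1))                                     # intercept only, dim(beta) = 1
Z = np.kron(np.eye(groups), np.ones((n // groups, 1)))
sigma2, tau2 = 1.0, 0.7                                 # hypothetical variance values
Sigma = sigma2 * np.eye(n) + tau2 * Z @ Z.T             # b already integrated out
y = 2.0 + rng.standard_normal(n)
Sinv = np.linalg.inv(Sigma)
_, logdet_S = np.linalg.slogdet(Sigma)

# log N(y; X beta, Sigma) on a fine grid of beta values (flat prior p(beta) = const)
grid = np.linspace(-10.0, 14.0, 24001)
resid = y[None, :] - grid[:, None] * X[:, 0][None, :]
quad = np.einsum("gi,ij,gj->g", resid, Sinv, resid)
log_f = -0.5 * (n * np.log(2 * np.pi) + logdet_S + quad)

# Log of the integral over beta, computed as a log-sum-exp Riemann sum
step = grid[1] - grid[0]
m = log_f.max()
log_marginal = m + np.log(np.exp(log_f - m).sum() * step)

# Closed form: restricted log-likelihood l* minus (n - dim(beta))/2 * log(2 pi)
XtSiX = X.T @ Sinv @ X
beta_hat = np.linalg.solve(XtSiX, X.T @ Sinv @ y)      # GLS estimator
r = y - X @ beta_hat
l_star = -0.5 * logdet_S - 0.5 * np.linalg.slogdet(XtSiX)[1] - 0.5 * r @ Sinv @ r
closed_form = l_star - 0.5 * (n - 1) * np.log(2 * np.pi)
print(log_marginal, closed_form)                        # agree up to quadrature error
```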

Replacing ML variance estimates with their REML counterparts allows for a further interpretation. The ML estimators are obtained by jointly maximizing the posterior with respect to the regression coefficients $\beta$ and the variances $\tau^2$, so the ML estimators correspond to the variance components of the posterior mode. In contrast, REML estimates are given by the mode of the marginal posterior of the variances. The latter strategy coincides with the usual strategy in an empirical Bayes approach, where hyperparameters are treated as fixed constants that are estimated from their marginal posterior. To derive REML estimates for GLMMs, we approximate the log-likelihood by means of the Pearson $\chi^2$ statistic, yielding

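$$l(y, \beta, b) \approx -\frac{1}{2}\sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\phi\, v(\mu_i)/\omega_i} = -\frac{1}{2}(y - \mu)'S^{-1}(y - \mu).$$

This is in fact equivalent to approximating $l(y, \beta, b)$ by a quadratic function as in the Laplace approximation (compare Tierney & Kadane 1986). Using the definition of the working observations $\tilde y$ in (5.13) gives

$$y - \mu = D(\tilde y - X\beta - Zb),$$

and therefore we have

$$l(y, \beta, b) \approx -\frac{1}{2}(\tilde y - X\beta - Zb)'D'S^{-1}D(\tilde y - X\beta - Zb) = -\frac{1}{2}(\tilde y - X\beta - Zb)'W(\tilde y - X\beta - Zb).$$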
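For a concrete instance, consider a Poisson model with log link (an assumption made for illustration), where $v(\mu) = \mu$, $\phi = 1$, $\omega_i = 1$ and $\partial\mu/\partial\eta = \mu$, so that $D = S = W = \mathrm{diag}(\mu)$. The sketch below assumes NumPy and hypothetical values of $\beta$ and $b$, and uses $\tilde y = \eta + D^{-1}(y - \mu)$, which is the form of the working observations implied by the relation $y - \mu = D(\tilde y - X\beta - Zb)$. It checks that the Pearson statistic equals the $W$-weighted quadratic form in the working residuals.

```python
import numpy as np

rng = np.random.default_rng(6)
n, groups = 20, 5
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.kron(np.eye(groups), np.ones((n // groups, 1)))
beta = np.array([0.5, 0.3])                       # hypothetical current values
b = 0.4 * rng.standard_normal(groups)
eta = X @ beta + Z @ b
mu = np.exp(eta)                                  # Poisson model with log link
y = rng.poisson(mu).astype(float)

S_diag = mu                                       # Var(y_i) = phi v(mu_i) / omega_i = mu_i
D_diag = mu                                       # d mu / d eta = mu for the log link
W_diag = D_diag * D_diag / S_diag                 # W = D' S^{-1} D (all diagonal)
y_work = eta + (y - mu) / D_diag                  # working observations

pearson = np.sum((y - mu) ** 2 / S_diag)          # (y - mu)' S^{-1} (y - mu)
working_quad = np.sum(W_diag * (y_work - eta) ** 2)
print(np.isclose(pearson, working_quad))          # identical quadratic forms
print(np.allclose(W_diag, mu))                    # W = diag(mu) in this model
```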

If we ignore the dependence of $W$ on the variance parameters (Breslow & Clayton 1993), the log-likelihood can be approximated by the log-likelihood of a linear mixed model for the working observations $\tilde y$. More specifically, we assume

$$\tilde y \mid \beta, b \overset{a}{\sim} N(X\beta + Zb, W^{-1}).$$

Determining the marginal distribution of $\tilde y$, i.e. integrating both $b$ and $\beta$ out, yields an approximate restricted log-likelihood for the generalized linear mixed model:

$$l^*(\tau^2, \phi) = -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\log|X'\Sigma^{-1}X| - \frac{1}{2}(\tilde y - X\hat\beta)'\Sigma^{-1}(\tilde y - X\hat\beta) \qquad (5.20)$$

with $\Sigma = W^{-1} + ZQZ'$ being an approximation to the marginal covariance matrix of $\tilde y$. To finally obtain REML estimates of the variance parameters, we have to maximize (5.20) with respect to $\tau^2$ (and $\phi$ if necessary). One suitable optimization procedure is the Newton-Raphson algorithm, which is based on the first and second derivatives of $l^*(\tau^2, \phi)$ with respect to the variance parameters. A modification of the Newton-Raphson algorithm is Fisher scoring, where the second derivative is replaced by its expectation. Since this leads to simpler estimation equations, we will focus on Fisher scoring. Note that in several models the derivatives with respect to the dispersion parameter $\phi$ are not needed, e.g. for Poisson or binomial data, where the dispersion parameter is fixed. In these cases the corresponding derivatives have to be eliminated from the formulae for the score function and the expected Fisher information presented in the following sections.
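To make the Fisher scoring idea concrete, here is a minimal sketch, assuming NumPy, a Poisson model with log link (so $\phi = 1$ and no dispersion derivatives are needed), a random-intercept structure with $Q = \tau^2 I$, and working observations and weights held fixed at a hypothetical current linear predictor (here simply the simulated truth). It uses the generic REML score $-\frac{1}{2}\mathrm{tr}(P\,\partial\Sigma/\partial\tau^2) + \frac{1}{2}\tilde y' P (\partial\Sigma/\partial\tau^2) P \tilde y$ and expected information $\frac{1}{2}\mathrm{tr}(P\,\partial\Sigma/\partial\tau^2\, P\,\partial\Sigma/\partial\tau^2)$ with $P = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$; these are standard textbook expressions, not necessarily the exact form of the formulae in the following sections. In a full penalized quasi-likelihood algorithm this inner step would alternate with updates of $\beta$, $b$, $W$ and $\tilde y$.

```python
import numpy as np

def reml_score_info(tau2, y_work, X, ZZt, W_inv_diag):
    """Score and expected Fisher information of (5.20) w.r.t. tau2 (phi fixed at 1)."""
    Sigma = np.diag(W_inv_diag) + tau2 * ZZt            # Sigma = W^{-1} + Z Q Z'
    Si = np.linalg.inv(Sigma)
    Si_X = Si @ X
    P = Si - Si_X @ np.linalg.solve(X.T @ Si_X, Si_X.T)  # REML projection matrix
    P_dS = P @ ZZt                                       # P * dSigma/dtau2
    Py = P @ y_work
    score = -0.5 * np.trace(P_dS) + 0.5 * Py @ ZZt @ Py
    info = 0.5 * np.trace(P_dS @ P_dS)
    return score, info

rng = np.random.default_rng(7)
groups, per = 30, 5
n = groups * per
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.kron(np.eye(groups), np.ones((per, 1)))
ZZt = Z @ Z.T
eta = X @ np.array([0.5, 0.3]) + Z @ rng.standard_normal(groups)   # simulated, tau2 = 1
mu = np.exp(eta)
y = rng.poisson(mu).astype(float)

# Working model at the current linear predictor (Poisson, log link): W = diag(mu)
W_inv_diag = 1.0 / mu
y_work = eta + (y - mu) / mu

tau2 = 0.5                                   # starting value
for _ in range(200):
    score, info = reml_score_info(tau2, y_work, X, ZZt, W_inv_diag)
    new = tau2 + score / info                # Fisher scoring update
    new = new if new > 0 else tau2 / 2       # crude safeguard: keep tau2 positive
    converged = abs(new - tau2) < 1e-10
    tau2 = new
    if converged:
        break

print("REML estimate of tau2 (working model):", tau2)
```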

Source: Kneib, Thomas (2006): Mixed model based inference in structured additive regression. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik, pages 79–82.