Generalized Linear (Mixed) Models - ASREML user guide release 3.0

Table 6.3 Link qualifiers and functions

Qualifier      Link                Inverse link             Available with

!IDENTITY      η = µ               µ = η                    All
!SQRT          η = √µ              µ = η²                   Poisson
!LOGARITHM     η = ln(µ)           µ = exp(η)               Normal, Poisson, Negative Binomial, Gamma
!INVERSE       η = 1/µ             µ = 1/η                  Normal, Gamma, Negative Binomial
!LOGIT         η = ln(µ/(1−µ))     µ = 1/(1+exp(−η))        Binomial, Multinomial Threshold
!PROBIT        η = Φ⁻¹(µ)          µ = Φ(η)                 Binomial, Multinomial Threshold
!COMPLOGLOG    η = ln(−ln(1−µ))    µ = 1 − exp(−exp(η))     Binomial, Multinomial Threshold

where µ is the mean on the data scale and η = Xτ is the linear predictor on the underlying scale.
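The link and inverse link pairs of Table 6.3 can be checked numerically. The following is a minimal Python sketch, not ASReml code; the dictionary `LINKS` and the helper `round_trip` are names invented for this illustration. It assumes µ lies in the valid range for each link (for example 0 < µ < 1 for the logit, probit and complementary log-log links).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Hypothetical lookup: qualifier name -> (link, inverse link), as in Table 6.3
LINKS = {
    "IDENTITY":   (lambda mu: mu,
                   lambda eta: eta),
    "SQRT":       (lambda mu: math.sqrt(mu),
                   lambda eta: eta ** 2),
    "LOGARITHM":  (lambda mu: math.log(mu),
                   lambda eta: math.exp(eta)),
    "INVERSE":    (lambda mu: 1.0 / mu,
                   lambda eta: 1.0 / eta),
    "LOGIT":      (lambda mu: math.log(mu / (1.0 - mu)),
                   lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "PROBIT":     (lambda mu: _STD_NORMAL.inv_cdf(mu),
                   lambda eta: _STD_NORMAL.cdf(eta)),
    "COMPLOGLOG": (lambda mu: math.log(-math.log(1.0 - mu)),
                   lambda eta: 1.0 - math.exp(-math.exp(eta))),
}


def round_trip(name, mu):
    """Map mu to the linear predictor scale and back again."""
    link, inverse = LINKS[name]
    return inverse(link(mu))
```

Applying `round_trip` to any admissible µ should return µ up to rounding error, which is a quick way to confirm that a link and its inverse have been transcribed consistently.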

ASReml includes facilities for fitting the family of Generalized Linear Models (GLMs, McCullagh and Nelder, 1994). A GLM is defined by a mean variance function and a link function. In this context

y is the observation,

6 Command file: Specifying the terms in the mixed model 109

φ is a parameter set with the !PHI qualifier,

µ is the mean on the data scale calculated using the inverse link function from the predicted value η on the underlying scale where η = Xτ,

v is the variance under some distributional assumption calculated as a function of µ and n, and

d is the deviance (minus twice the log-likelihood) for that distribution.

GLMs are specified by qualifiers after the name of the dependent variable but before the ∼ character. Table 6.3 lists the link function qualifiers which relate the linear predictor (η) scale to the observation (µ=E[y]) scale. Table 6.4 lists the distribution and other qualifiers.

Table 6.4: GLM distribution qualifiers

The default link is listed first followed by permitted alternatives.

qualifiers action

!NORMAL [!IDENTITY | !LOGARITHM | !INVERSE]

allows the model to be fitted on the log/inverse scale but with the residuals on the natural scale. !NORMAL !IDENTITY is the default.

!BINOMIAL [!LOGIT | !IDENTITY | !PROBIT | !COMPLOGLOG] [!TOTAL n]

v = µ(1−µ)/n
d = 2n[y ln(y/µ) + (1−y) ln((1−y)/(1−µ))]

Proportions or counts [r = ny] are indicated if !TOTAL specifies the variate containing the binomial totals. Proportions are assumed if no response value exceeds 1. A binary variate [0, 1] is indicated if !TOTAL is unspecified. The expression for d above applies when y is proportions (or binary). The logit is the default link function. The variance on the underlying scale is π²/3 ≈ 3.3 (underlying logistic distribution) for the logit link.

!MULTINOMIAL k !CUMULATIVE [!LOGIT | !PROBIT | !COMPLOGLOG] [!TOTAL n]

v_ij = µ_i(1 − µ_j)/n for i ≤ j ≤ t
d = 2n Σ_{i=1}^{k} y_i ln(y_i/p_i)
where Y_i = Σ_{j=1}^{i} y_j, µ_i = E(Y_i) and p_i = µ_i − µ_{i−1}

fits a multiple threshold model with t = k−1 thresholds to polytomous ordinal data with k categories assuming a multinomial distribution.

Typically, the response variable is a single variable containing the ordinal score (1:k) or a set of k variables containing counts (r_i) in the k categories. The response may also be a series of t binary variables or a series of t variables containing counts. If t counts are supplied, the total (including the kth category) must be given with the !TOTAL qualifier.


The multinomial threshold model is fitted as a cumulative probability model. The proportions (y_i = r_i/n) in the ordered categories are summed to form the cumulative proportions (Y_i), which are modelled with logit (!LOGIT), probit (!PROBIT) or complementary log-log (!COMPLOGLOG) link functions. The implicit residual variance on the underlying scale is π²/3 ≈ 3.3 (underlying logistic distribution) for the logit link and 1 for the probit link. The distribution underlying the complementary log-log link is the Gumbel distribution, with implicit residual variance on the underlying scale of π²/6 ≈ 1.645. For example,

Lodging !MULTINOMIAL 4 !CUMULATIVE ∼ Trait Variety !r block
predict Variety

where Lodging is a factor with 4 ordered categories. Predicted values are reported for the cumulative proportions.
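The relationship between category counts, cumulative proportions and category probabilities in the threshold model can be illustrated with a small Python sketch. It is not ASReml code. The function names are invented for this example, and writing the cumulative mean as the inverse logit of (threshold − η) is a sign convention assumed here, not one stated in this guide.

```python
import math


def cumulative_proportions(counts):
    """Cumulative proportions Y_i from the counts r_i in k ordered categories."""
    n = sum(counts)
    running, result = 0, []
    for r in counts:
        running += r
        result.append(running / n)
    return result


def category_probabilities(thresholds, eta=0.0):
    """Category probabilities p_i = mu_i - mu_(i-1) under a logit link.

    thresholds: the t = k - 1 ordered cut points (assumed increasing).
    eta: linear predictor; the (threshold - eta) form is an assumption.
    """
    mu = [1.0 / (1.0 + math.exp(-(theta - eta))) for theta in thresholds]
    mu = [0.0] + mu + [1.0]  # mu_0 = 0 and mu_k = 1 close the cumulative scale
    return [mu[i] - mu[i - 1] for i in range(1, len(mu))]
```

With four categories there are three thresholds, so `category_probabilities` returns four probabilities that sum to one, and the last element returned by `cumulative_proportions` is always 1.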

!POISSON [!LOGARITHM | !IDENTITY | !SQRT]

v = µ
d = 2[y ln(y/µ) − (y − µ)]

Natural logarithms are the default link function. ASReml assumes the Poisson variable is not negative.

!GAMMA [!INVERSE | !IDENTITY | !LOGARITHM] [!PHI φ] [!TOTAL n]

v = µ²/(φn)
d = 2n[−φ ln(φy/µ) + (φy − µ)/µ]

The inverse is the default link function. n is defined with the !TOTAL qualifier and would be degrees of freedom in the typical application to mean-squares. The default value of φ is 1.

!NEGBIN [!LOGARITHM | !IDENTITY | !INVERSE] [!PHI φ]

v = µ + µ²/φ
d = 2[(φ + y) ln((µ + φ)/(y + φ)) + y ln(y/µ)]

fits the Negative Binomial distribution. Natural logarithms are the default link function. The default value of φ is 1.
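The variance functions v and deviances d listed for the binomial, Poisson, gamma and negative binomial distributions can be evaluated directly. The sketch below is illustrative Python, not ASReml code, and the function names are invented here. It assumes the usual convention that 0·ln(0) = 0, and it reads the gamma deviance as 2n[−φ ln(φy/µ) + (φy − µ)/µ], which reduces to the standard gamma deviance when φ = 1.

```python
import math


def _xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial(y, mu, n=1.0):
    """(v, d) for a proportion y with binomial total n."""
    v = mu * (1.0 - mu) / n
    d = 2.0 * n * (_xlogy(y, y / mu) + _xlogy(1.0 - y, (1.0 - y) / (1.0 - mu)))
    return v, d


def poisson(y, mu):
    """(v, d) for a non-negative count y."""
    v = mu
    d = 2.0 * (_xlogy(y, y / mu) - (y - mu))
    return v, d


def gamma(y, mu, phi=1.0, n=1.0):
    """(v, d) for a positive response y; assumed reading of the gamma deviance."""
    v = mu ** 2 / (phi * n)
    d = 2.0 * n * (-phi * math.log(phi * y / mu) + (phi * y - mu) / mu)
    return v, d


def negbin(y, mu, phi=1.0):
    """(v, d) for a non-negative count y with parameter phi."""
    v = mu + mu ** 2 / phi
    d = 2.0 * ((phi + y) * math.log((mu + phi) / (y + phi)) + _xlogy(y, y / mu))
    return v, d
```

With φ = 1, each deviance is zero when y = µ and positive otherwise, which gives a simple sanity check on the transcribed formulas.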

General qualifiers

!AOD requests an Analysis of Deviance table be generated. This is formed by fitting a series of sub models for terms in the DENSE part building up to the full model, and comparing the deviances. An example of its use is

LS !BIN !TOT COUNT !AOD ∼ mu SEX GROUP

Caution: !AOD may not be used in association with PREDICT.

!DISP [h] includes an overdispersion scaling parameter (h) in the weights. If !DISP is specified with no argument, ASReml estimates it as the residual variance of the working variable. Traditionally it is estimated from the deviance residuals, reported by ASReml as Variance heterogeneity. An example of its use is given in the !OFFSET entry below.


!OFFSET [o] is used especially with binomial data to include an offset in the model, where o is the number or name of a variable in the data. The offset is only included in binomial and Poisson models (for Normal models just subtract the offset variable from the response variable), for example

count !POIS !OFFSET base !DISP ∼ mu group

The offset is included in the model as η = Xτ + o. The offset will often be something like ln(n).

!TOTAL [n] is used especially with binomial and ordinal data where n is the field containing the total counts for each sample. If omitted, count is taken as 1.
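The effect of an offset of the form ln(n), as described under !OFFSET, can be seen in a short Python sketch. It is not ASReml code; the function name and arguments are invented for this illustration, and a Poisson model with the default logarithmic link is assumed.

```python
import math


def poisson_mean_with_offset(x_row, tau, exposure):
    """Mean count under a log link with offset o = ln(exposure)."""
    offset = math.log(exposure)
    # Linear predictor with offset: eta = X tau + o
    eta = sum(x * t for x, t in zip(x_row, tau)) + offset
    # Inverse log link: mu = exp(eta) = exposure * exp(X tau)
    return math.exp(eta)
```

Because the offset enters with a fixed coefficient of 1, the fitted effects τ describe a rate per unit of exposure, and the mean count scales in proportion to the exposure.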

Residual qualifiers control the form of the residuals returned in the .yht file. The predicted values returned in the .yht file will be on the linear predictor scale if the !WORK or !PVW qualifiers are used. They will be on the observation scale if the !DEVIANCE, !PEARSON, !RESPONSE or !PVR qualifiers are used.

!DEVIANCE produces deviance residuals, the signed square root of d/h from Table 6.4, where h is the dispersion parameter controlled by the !DISP qualifier. This is the default.

!PEARSON writes Pearson residuals, (y − µ)/√v, in the .yht file.

!PVR writes fitted values on the response scale in the .yht file. This is the default.

!PVW writes fitted values on the linear predictor scale in the .yht file.

!RESPONSE produces simple residuals, y − µ.

!WORK produces residuals on the linear predictor scale, (y − µ)/(dµ/dη).
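The four residual definitions can be compared for a single observation. The Python sketch below is illustrative only and is not ASReml code; the function name is invented, and it assumes a Poisson response with a logarithmic link, so that v = µ and dµ/dη = µ.

```python
import math


def poisson_log_residuals(y, mu, h=1.0):
    """Residuals for one Poisson observation under a log link.

    h is the dispersion parameter; 1.0 is assumed when none is supplied.
    """
    v = mu                                   # Poisson variance function
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0
    d = max(2.0 * (y_log_term - (y - mu)), 0.0)   # guard against rounding below zero
    dmu_deta = mu                            # mu = exp(eta), so dmu/deta = mu
    return {
        "DEVIANCE": math.copysign(math.sqrt(d / h), y - mu),
        "PEARSON": (y - mu) / math.sqrt(v),
        "RESPONSE": y - mu,
        "WORK": (y - mu) / dmu_deta,
    }
```

All four residuals share the sign of y − µ and differ only in scaling, which is why the choice of qualifier also determines the scale on which the accompanying predicted values are reported.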

A second dependent variable may be specified (except with a multinomial response (!MULTINOMIAL)) if a bivariate analysis is required, but it will always be treated as a normal variate (no syntax is provided for specifying GLM attributes for it). The !ASUV qualifier is required in this situation for the GLM weights to be utilized.


Generalized Linear Mixed Models

This section was written by Damian Collins

A Generalized Linear Mixed Model (GLMM) is an extension of a GLM to include random terms in the linear predictor. Inference concerning GLMMs is impeded by the lack of a closed form expression for the likelihood. ASReml currently uses an approximate likelihood technique called penalized quasi-likelihood, or PQL (Breslow and Clayton, 1993), which is based on a first order Taylor series approximation. This technique is also known as Schall's technique (Schall, 1991), pseudo-likelihood (Wolfinger and O'Connell, 1993) and joint maximisation (Harville and Mee, 1984, Gilmour et al., 1985). Implementations of PQL are found in many statistical packages, for instance, in the GLMM (Welham, 2005) and the IRREML procedures of Genstat (Keen, 1994), the MLwiN package (Goldstein et al., 1998), the GLIMMIX macro in SAS (Wolfinger, 1994), and in the glmmPQL function in R.
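The first order Taylor series approximation behind PQL-type methods replaces the response by a working variate on the linear predictor scale, to which a linear mixed model is then fitted repeatedly. The Python sketch below shows one such linearisation step for a binomial response with a logit link. It is a generic illustration of the standard working variate and iterative weight, not a description of ASReml's internal implementation, and the function name is invented.

```python
import math


def logit_working_step(y, eta, n=1.0):
    """Working variate z and iterative weight w at the current eta.

    y is a proportion (or binary value) and n the binomial total.
    """
    mu = 1.0 / (1.0 + math.exp(-eta))   # inverse logit link
    dmu_deta = mu * (1.0 - mu)          # derivative of the inverse link
    v = mu * (1.0 - mu) / n             # binomial variance function
    z = eta + (y - mu) / dmu_deta       # first order Taylor expansion about eta
    w = dmu_deta ** 2 / v               # equals n * mu * (1 - mu) for the logit link
    return z, w
```

In an iterative scheme, z would be analysed as a normal response with weights w, the linear predictor updated from that fit, and the step repeated until the estimates stabilise.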

The PQL technique is well-known to suffer from estimation biases for some types of GLMMs. For grouped binary data with small group sizes, estimation biases can be over 50% (e.g. Breslow and Lin, 1995, Goldstein and Rasbash, 1996, Rodriguez and Goldman, 2001, Waddington et al., 1994). For other GLMMs, PQL has been reported to perform adequately (e.g. Breslow, 2003). McCulloch and Searle (2001) also discuss the use of PQL for GLMMs.

The performance of PQL in other respects, such as for hypothesis testing, has received much less attention, and most studies into PQL have examined only relatively simple GLMMs. Anecdotal evidence suggests that this technique may give misleading results in certain situations. Therefore we cannot recommend this technique for general use; it is included in the current version of ASReml for advanced users. If this technique is used, we recommend the use of cross-validatory assessment, such as applying PQL to simulated data from the same design (Millar and Willis, 1999).

Caution: The standard GLM Analysis of Deviance (!AOD) should not be used when there are random terms in the model, as the variance components are re-estimated for each submodel.
