Generalized linear models
and $\mu_u$ similarly defined with a plus sign. Here $z$ is the appropriate point on the $N(0, 1)$ distribution. The confidence interval is exact in the case of an identity link and normal response. In other cases it is an approximation, the accuracy of which improves with increasing sample size. The dispersion $\phi$ is replaced by an estimate, causing further approximation error.
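The estimate $\hat{\mu}$ is unbiased when using an identity link. For other links it is biased. To illustrate, with a log link and assuming $\hat{\beta}$ is approximately normal, the lognormal mean gives
$$E(\hat{\mu}) = E\left(e^{x\hat{\beta}}\right) \approx \exp\left\{x\beta + \tfrac{1}{2}\mathrm{Var}(x\hat{\beta})\right\} = \mu \exp\left\{\tfrac{1}{2}\mathrm{Var}(x\hat{\beta})\right\} \neq \mu .$$
The bias increases with $\mathrm{Var}(x\hat{\beta})$. For large sample sizes, $\mathrm{Var}(\hat{\beta})$ is small and the bias is expected to be negligible.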
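The size of this bias can be checked numerically. The sketch below assumes $x\hat{\beta}$ is exactly normal with mean $x\beta$ and compares the closed form $\mu \exp\{\tfrac{1}{2}\mathrm{Var}(x\hat{\beta})\}$ with a direct numerical integration. The true mean of 100 and the three variances are hypothetical values chosen only for illustration.

```python
import math


def mean_of_exp_normal(m, v, n=20001, width=10.0):
    """Trapezoidal approximation of E[exp(Z)] for Z ~ N(m, v)."""
    s = math.sqrt(v)
    lo = m - width * s
    h = 2.0 * width * s / (n - 1)
    total = 0.0
    for k in range(n):
        z = lo + k * h
        density = math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * math.exp(z) * density
    return total * h


mu = 100.0                      # hypothetical true mean, so x*beta = ln(100)
variances = [0.01, 0.04, 0.25]  # hypothetical values of Var(x*beta_hat)

results = []
for v in variances:
    closed_form = mu * math.exp(0.5 * v)             # mu * exp(Var / 2)
    numerical = mean_of_exp_normal(math.log(mu), v)  # direct integration
    results.append((v, closed_form, numerical))
    print(f"Var={v:5.2f}  E(mu_hat)={closed_form:8.3f}  "
          f"relative bias={closed_form / mu - 1:6.3%}")
```

The relative bias depends only on the variance of the linear predictor, not on $\mu$ itself, which is why it vanishes as the sample size grows.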
The interval $(\mu_\ell, \mu_u)$ is a confidence interval for the mean of $y$ given $x$.
Any particular case is a random outcome from the response distribution with mean at $\mu$, where $g(\mu) = x\beta$. An actual outcome of $y$ will deviate from $\mu$. An interval constructed for an actual case or outcome is called a prediction interval. The width of a prediction interval factors in the uncertainty associated with both $\hat{\mu}$ and the outcome from the response distribution.
Consider predicting an actual value of $y$ given the value $x$ of the explanatory variables. This prediction is taken as $\hat{\mu}$ and hence is the same as the estimate of $\mu$. If the response distribution is, for example, $G(\mu, \nu)$ at the given value of $x$, then a 95% prediction interval for a single realization of $y$ is the central 95% portion of the $G(\mu, \nu)$ distribution. However, $\mu$ and $\nu$ are estimated.
Taking the prediction interval as the central portion of the $G(\hat{\mu}, \hat{\nu})$ distribution ignores the variability associated with the parameter estimates, and is thus optimistic. A more conservative approach is to take the lower point of the prediction interval as the 2.5% point of $G(\mu_\ell, \hat{\nu})$, and the upper point as the 97.5% point of $G(\mu_u, \hat{\nu})$. This ad hoc approach ignores any biases, which arise in the same way as for the mean.
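The two constructions can be compared in a short sketch. To stay within the Python standard library, a Poisson response with a log link stands in for the $G(\mu, \nu)$ example, so there is no second parameter $\hat{\nu}$. The linear predictor estimate, its standard error, and the helper `poisson_quantile` are hypothetical, and the confidence limits for the mean are assumed to be obtained by back-transforming limits on the link scale.

```python
import math
from statistics import NormalDist


def poisson_quantile(p, mu):
    """Smallest k with P(Y <= k) >= p for Y ~ Poisson(mu)."""
    k = 0
    pmf = math.exp(-mu)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= mu / k  # recursive update of the Poisson probability
        cdf += pmf
    return k


eta_hat = math.log(20.0)         # hypothetical estimate of x*beta (log link)
se_eta = 0.10                    # hypothetical standard error of that estimate
z = NormalDist().inv_cdf(0.975)  # appropriate point on N(0, 1)

mu_hat = math.exp(eta_hat)
mu_lower = math.exp(eta_hat - z * se_eta)  # lower confidence limit for the mean
mu_upper = math.exp(eta_hat + z * se_eta)  # upper confidence limit for the mean

# Optimistic interval: central 95% of the response distribution at mu_hat.
naive = (poisson_quantile(0.025, mu_hat), poisson_quantile(0.975, mu_hat))

# Conservative interval: 2.5% point at mu_lower, 97.5% point at mu_upper.
conservative = (poisson_quantile(0.025, mu_lower),
                poisson_quantile(0.975, mu_upper))

print("confidence interval for the mean:", (round(mu_lower, 2), round(mu_upper, 2)))
print("naive prediction interval:       ", naive)
print("conservative prediction interval:", conservative)
```

The conservative interval is never narrower than the naive one, because the Poisson quantiles are non-decreasing in the mean.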
5.7 Assessing fits and the deviance
The goodness of fit of a model to data is a natural question arising with all statistical modeling. The principles of significance testing, model selection and diagnostic testing, as discussed in Chapter 4, are the same for GLMs as for normal regression; however, the technical details of the methods differ somewhat.
One way of assessing the fit of a given model is to compare it to the model with the best possible fit. The best fit will be obtained when there are as many parameters as observations: this is called a saturated model. A saturated model will ensure there is complete flexibility in fitting $\theta_i$. Since each $\theta_i$ can then be chosen freely, its maximum likelihood estimate $\check{\theta}_i$ satisfies $\dot{a}(\check{\theta}_i) = y_i$, so each fitted value is equal to the observation and the saturated model fits perfectly.
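The value of the saturated log-likelihood is
$$\check{\ell} \equiv \sum_{i=1}^{n} \left\{ \ln c(y_i, \phi) + \frac{y_i \check{\theta}_i - a(\check{\theta}_i)}{\phi} \right\},$$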
which is the maximum possible log-likelihood for $y$ given the response distribution specified by $a(\theta)$. This value is compared to $\hat{\ell}$, the value of the maximum of the log-likelihood based on $y$ and the given explanatory variables.
The “deviance,” denoted as Δ, is defined as a measure of distance between the saturated and fitted models:
$$\Delta \equiv 2(\check{\ell} - \hat{\ell}) .$$
• When the model provides a good fit, then $\hat{\ell}$ is expected to be close to (but not greater than) $\check{\ell}$. A large value of the deviance indicates a badly fitting model.
• The size of $\Delta$ is assessed relative to the $\chi^2_{n-p}$ distribution (Dobson 2002). This is the approximate sampling distribution of the deviance, assuming the fitted model is correct and $n$ is large. The expected value of the deviance is $n - p$, and typically the deviance divided by its degrees of freedom $n - p$ is examined: a value much greater than one indicates a poorly fitting model.
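• A direct calculation shows that for the exponential family
$$\Delta = \frac{2}{\phi} \sum_{i=1}^{n} \left\{ y_i(\check{\theta}_i - \hat{\theta}_i) - a(\check{\theta}_i) + a(\hat{\theta}_i) \right\} . \qquad (5.11)$$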
• When $\phi$ is unknown and estimated, then the $\chi^2_{n-p}$ distribution for the deviance is compromised. In the case of the Poisson distribution, $\phi = 1$ and the $\chi^2$ approximation is useful. In the case of the normal distribution, when $\sigma^2$ is known then the $\chi^2$ distribution of the deviance is exact; however, when $\sigma^2$ is estimated then we cannot rely on the deviance being $\chi^2$ distributed.
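As a rough illustration of these points, the sketch below evaluates the deviance as $(2/\phi)\sum_i \{y_i(\check{\theta}_i - \hat{\theta}_i) - a(\check{\theta}_i) + a(\hat{\theta}_i)\}$ for a Poisson response and divides it by $n - p$. The function name `deviance`, the counts, the fitted means and the value $p = 2$ are hypothetical; the fitted means are not the output of an actual model fit.

```python
import math


def deviance(y, mu_hat, a, theta, phi=1.0):
    """(2/phi) * sum of y*(theta_sat - theta_fit) - a(theta_sat) + a(theta_fit)."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        theta_sat = theta(yi)  # saturated model: fitted value equals y_i
        theta_fit = theta(mi)  # fitted model: canonical parameter at mu_hat_i
        total += yi * (theta_sat - theta_fit) - a(theta_sat) + a(theta_fit)
    return 2.0 * total / phi


# Hypothetical counts and fitted means from a model with p = 2 parameters.
y = [2, 3, 6, 7, 8, 9, 10, 12, 15]
mu_hat = [2.6, 3.9, 5.1, 6.4, 7.9, 9.3, 10.8, 12.4, 14.1]
p = 2

# Poisson: a(theta) = exp(theta), theta = ln(mu), phi = 1.
delta = deviance(y, mu_hat, a=math.exp, theta=math.log)
df = len(y) - p

print(f"deviance = {delta:.3f}, df = {df}, deviance/df = {delta / df:.3f}")
```

Passing `a` and `theta` as functions keeps the same routine usable for other exponential family members, for example `a=lambda t: 0.5 * t * t` and `theta=lambda m: m` with `phi` set to $\sigma^2$ for the normal. All counts here are positive; a zero count would need separate handling because $\ln 0$ is undefined.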
Several authors, for example McCullagh and Nelder (1989, pp. 119, 122), caution against using the deviance as an overall goodness of fit measure in general, as its approximate $\chi^2_{n-p}$ distribution depends on assumptions which are frequently not tenable. However, the deviance is useful for testing the significance of explanatory variables in nested models: see Section 5.8.
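Table 5.2. Deviance for exponential family response distributions

Distribution         Deviance $\Delta$
Normal               $\frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \hat{\mu}_i)^2$
Poisson              $2 \sum_{i=1}^{n} \left\{ y_i \ln\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right\}$
Inverse Gaussian     $\frac{1}{\sigma^2} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i^2 \, y_i}$
Negative binomial    $2 \sum_{i=1}^{n} \{ \ldots \}$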
SAS notes. The deviance $\Delta$ as defined here is called the scaled deviance in SAS (the SAS manual also calls it the residual deviance), while $\phi\Delta$ is what SAS calls the deviance. It is therefore the scaled deviance in SAS output that is relevant. (Both scaled and unscaled deviances are given in SAS output.)
Deviance for well-known distributions. Table 5.2 gives the expressions for the deviance for the exponential family distributions discussed previously. The derivations of the deviance expressions for the normal and Poisson distributions are illustrated and the others are left as exercises.
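Normal. In this case $\phi = \sigma^2$ and
$$a(\theta) = \tfrac{1}{2}\theta^2, \quad \dot{a}(\theta) = \theta, \quad \check{\theta}_i = y_i, \quad \hat{\theta}_i = \hat{\mu}_i .$$
Hence each term in sum (5.11) is
$$y_i(y_i - \hat{\mu}_i) - \tfrac{1}{2}(y_i^2 - \hat{\mu}_i^2) = \tfrac{1}{2}(y_i - \hat{\mu}_i)^2 .$$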
Thus the deviance is as given in Table 5.2 and is proportional to the residual sum of squares.
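The normal case can also be checked directly from the definition of the deviance as twice the gap between the saturated and fitted log-likelihoods. The sketch below treats $\sigma^2$ as known; the observations, fitted values and the helper `normal_loglik` are hypothetical.

```python
import math


def normal_loglik(y, mu, sigma2):
    """Log-likelihood of independent N(mu_i, sigma2) observations."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma2) - (yi - mi) ** 2 / (2.0 * sigma2)
        for yi, mi in zip(y, mu)
    )


y = [4.1, 5.3, 6.8, 7.2, 9.9]       # hypothetical observations
mu_hat = [4.0, 5.5, 7.0, 7.5, 9.5]  # hypothetical fitted values
sigma2 = 0.5                        # variance, treated as known

loglik_saturated = normal_loglik(y, y, sigma2)   # fitted value equals y_i
loglik_fitted = normal_loglik(y, mu_hat, sigma2)

delta = 2.0 * (loglik_saturated - loglik_fitted)
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_hat))

print(f"deviance = {delta:.6f}, RSS / sigma^2 = {rss / sigma2:.6f}")
```

The two printed numbers agree up to rounding, since the normalizing terms cancel in the difference of log-likelihoods.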
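Poisson. In this case $\phi = 1$ and
$$a(\theta) = e^{\theta}, \quad \dot{a}(\theta) = e^{\theta}, \quad \check{\theta}_i = \ln y_i, \quad \hat{\theta}_i = \ln \hat{\mu}_i .$$
Hence each term in sum (5.11) is
$$y_i(\ln y_i - \ln \hat{\mu}_i) - (y_i - \hat{\mu}_i) = y_i \ln\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i)$$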
and the deviance is as given in Table 5.2.
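A practical wrinkle is that $\check{\theta}_i = \ln y_i$ is undefined for a zero count. The sketch below adopts the usual convention, not stated in the text above, that $y_i \ln y_i$ is taken as 0 when $y_i = 0$, and checks the resulting Poisson deviance against twice the difference between the saturated and fitted log-likelihoods. The counts, fitted means and function names are hypothetical.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood; a term with y = 0 and mu = 0 contributes 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if mi == 0:
            if yi != 0:
                raise ValueError("a positive count has probability 0 when mu = 0")
            continue
        total += yi * math.log(mi) - mi - math.lgamma(yi + 1)
    return total


def poisson_deviance(y, mu_hat):
    """2 * sum of y*ln(y/mu_hat) - (y - mu_hat), with y*ln(y) taken as 0 at y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


y = [0, 1, 3, 0, 5, 2]                   # hypothetical counts, including zeros
mu_hat = [0.4, 1.2, 2.5, 0.7, 4.6, 1.6]  # hypothetical fitted means

by_formula = poisson_deviance(y, mu_hat)
by_definition = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu_hat))

print(f"formula: {by_formula:.6f}, definition: {by_definition:.6f}")
```

With this convention a zero count contributes $2\hat{\mu}_i$ to the deviance under either calculation.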