6.4 MODELLING POPULATION PHENOLOGY USING STAGE FREQUENCY DATA
6.4.2 Estimation methods
Fitting the CR models as generalised linear models
Model (6.6) can be fitted as a generalised linear model (GLM) (Candy, 1991) in a number of widely available statistical packages using a binomial error distribution for $n_{ij}$ with binomial denominator $N_{ij}^*$, linear predictor $\eta$ and the required link function (e.g. logit). Observations where the binomial denominator $N_{ij}^*$ is zero should be given a prior weight of zero in the fit.
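The role of the zero prior weight can be illustrated with a minimal sketch of iteratively reweighted least squares for a binomial GLM with logit link. This is not the code used for model (6.6): the function name `fit_binomial_logit`, the single covariate `x` and the example counts are all hypothetical, and the linear predictor is reduced to an intercept and one slope for brevity.

```python
import math

def fit_binomial_logit(x, n, N, max_iter=50, tol=1e-10):
    """IRLS fit of logit(mu_i) = b0 + b1 * x_i with n_i ~ Binomial(N_i, mu_i).

    Observations with a zero binomial denominator receive prior weight 0,
    so they contribute nothing to the estimating equations.
    """
    prior = [0.0 if Ni == 0 else 1.0 for Ni in N]
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, ni, Ni, pi in zip(x, n, N, prior):
            if pi == 0.0:
                continue  # zero-weight observation is skipped entirely
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            v = mu * (1.0 - mu)
            w = pi * Ni * v                  # iterative weight
            z = eta + (ni / Ni - mu) / v     # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2 x 2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical data; the last observation has a zero denominator.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
n = [2, 5, 9, 14, 18, 0]
N = [20, 20, 20, 20, 20, 0]
b0, b1 = fit_binomial_logit(x, n, N)
```

Giving the zero-denominator observation a prior weight of zero, rather than deleting it, keeps the data layout rectangular (one row per sampling time and stage) while leaving the estimates unchanged.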
Accounting for overdispersion using the methods of Williams (1982)
The standard errors of parameter estimates obtained in the fit of equation (6.6) using the binomial likelihood need to be scaled upwards if overdispersion is apparent (Finney, 1971, p.33; Williams, 1982; McCullagh and Nelder, 1989, p.126). Williams (1982) gives three methods of accounting for overdispersion. Williams method I
(Williams, 1982) assumes that the $N_{ij}^*$ are the same within a stage and that the variance of the observed counts $n_{ij}$ conditional on $N_{ij}^*$ is given by

$$\mathrm{Var}(n_{ij} \mid N_{ij}^*, t_i) = \phi N_{ij}^* \mu_{ij}(1 - \mu_{ij})$$
where $\phi$ is the overdispersion parameter or heterogeneity factor (Finney, 1971, p.33). Williams' method II assumes that the $\mu_{ij}$ themselves have a sampling distribution with expected value $\gamma_{ij}$ and variance $\phi\gamma_{ij}(1-\gamma_{ij})$. This gives

$$\mathrm{Var}(n_{ij} \mid N_{ij}^*, t_i) = N_{ij}^* \gamma_{ij}(1-\gamma_{ij})\{1 + \phi(N_{ij}^* - 1)\}.$$
Williams' method III, described in detail in Chapter 2 for the case of the complementary log-log link, assumes that the linear predictors have constant variance, $\sigma^2$, so that for the logit link function

$$\mathrm{Var}(n_{ij} \mid N_{ij}^*, t_i) \cong N_{ij}^* \mu_{ij}(1-\mu_{ij})\{1 + \sigma^2 (N_{ij}^* - 1)\mu_{ij}(1-\mu_{ij})\}.$$
The modification to the GLM fitting algorithm needed to fit method III was described in Section 2.3. Method II involves a similar modification (Williams, 1982).
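The three variance functions can be compared directly with a short sketch. The function names are hypothetical, and the helper `pearson_heterogeneity` uses the Pearson chi-square divided by the residual degrees of freedom, which is a common estimator of the heterogeneity factor for method I and is assumed here rather than taken from the fitting procedure described in Section 2.3.

```python
import math

def var_method_1(N, mu, phi):
    """Method I: binomial variance scaled by a constant heterogeneity factor."""
    return phi * N * mu * (1.0 - mu)

def var_method_2(N, gamma, phi):
    """Method II: the inflation factor 1 + phi * (N - 1) grows with N."""
    return N * gamma * (1.0 - gamma) * (1.0 + phi * (N - 1.0))

def var_method_3(N, mu, sigma2):
    """Method III (logit link): inflation also depends on mu * (1 - mu)."""
    v = mu * (1.0 - mu)
    return N * v * (1.0 + sigma2 * (N - 1.0) * v)

def pearson_heterogeneity(n, N, mu, n_params):
    """Pearson chi-square over residual df (assumed estimator of phi, method I).

    Observations with a zero binomial denominator are excluded, matching
    a prior weight of zero in the fit.
    """
    used = [(ni, Ni, mi) for ni, Ni, mi in zip(n, N, mu) if Ni > 0]
    chi2 = sum((ni - Ni * mi) ** 2 / (Ni * mi * (1.0 - mi))
               for ni, Ni, mi in used)
    return chi2 / (len(used) - n_params)

# Hypothetical counts and fitted proportions.
phi_hat = pearson_heterogeneity([3, 7], [10, 10], [0.5, 0.5], n_params=1)
se_scale = math.sqrt(phi_hat)  # multiplier for binomial standard errors
```

With $\phi = 0$ (method II) or $\sigma^2 = 0$ (method III) both functions reduce to the binomial variance, and with a binomial denominator of 1 neither method inflates the variance, which is why these methods require denominators that vary or exceed 1 to identify the extra parameter.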
Incorporating sites as random effects
Williams' methods of modelling overdispersion are appropriate when there is a simple, single-level error structure. However, if sampling is multi-level or nested then this should be reflected in a multi-level error structure (e.g. Section 2.2). In modelling stage-frequency data the stages are a common, fixed set of population attributes of intrinsic interest, whereas sites and seasons can be considered sample units that are nested within stages (i.e. random effects). The inclusion of SITE as a ‘fixed effect’ in the above models is useful in indicating whether the physiological time scales can account for the difference between sites in C. bimaculata phenology. The inference is that this difference is attributable to the difference in altitude between the sites and the effect of altitude on temperature via the lapse rate (Munn, 1966, p.43). However, in applying model (6.6) (Section 7.9) parameter estimates cannot be site-specific, since obtaining such estimates would require stage-frequency data to be collected over a number of seasons for each site for which predictions are required. Such data are not routinely available but are restricted to research programs such as that described here. Therefore, sites and seasons need to be considered as random effects within the coincident regression model (i.e. the $\tau_j$ become random intercepts and the $\kappa_j$ random slopes).
A relatively straightforward way to do this is to fit a Generalised Linear Mixed Model (GLMM) such as that described in Section 2.2 for Poisson and negative binomial conditional error structures. Here the error structure for each CR model, conditional on the random site effects, is assumed binomial. Fitting the CR models as a binomial GLMM was carried out using GENSTAT’s GLMM Procedure (Payne et al., 1997) using the marginal form of the model given by Breslow and Clayton (1993). The coincident GLM regression model is therefore of the same form as the GLMM, but parameter estimation for the GLMM takes into account the random site (and season) effects in the error structure.
More realistic models could be considered, such as the hierarchical GLMs (HGLMs) of Lee and Nelder (1996). In HGLMs the random effects are not restricted to be Gaussian, as in the GLMMs, and can for example incorporate a beta (error) distribution for the random intercepts. However, with data from only two sites and a single season available in this study, the ability to discriminate between competing random effects distributions using data analytic techniques, such as residual plots, is severely limited. Since HGLMs are subject-specific models (Section 2.3), obtaining the marginal means given the predictor variables is more difficult, so the extra effort required was not considered justified.
Assessing goodness of fit of the GLM
The goodness of fit of the model, combining all stages, can be assessed using the deviance statistic (McCullagh and Nelder, 1989) calculated for a multinomial distribution as $-2\ln(L_C/L_0)$, where $L_C$ is the likelihood with $\hat{q}_{ij}$ calculated using (6.7), and $L_0$ is the likelihood calculated for the saturated model (i.e. $\hat{q}_{ij} = n_{ij}/N_i$). This gives

$$D_C = 2\sum_{i=1}^{m}\sum_{j=1}^{s} n_{ij}\ln\left(\frac{n_{ij}}{N_i\hat{q}_{ij}}\right)$$

where $\hat{q}_{ij}$ is given by (6.7) and $m$ is the total number of sampling times. The deviance is approximately distributed as a chi-square with $\{ms - m - 2(s-1)\}$ degrees of freedom. An analogous statistic to the coefficient of determination for multiple linear regression was defined by Candy (1991) as the percentage of deviance explained by (6.7), which is given by

$$P = 100\,\frac{D_N - D_C}{D_N}$$

where $D_N$, the null deviance, is calculated in the same way as $D_C$ except that $\hat{q}_{ij}$ is calculated as $\sum_i n_{ij} / \sum_i N_i$, the common-across-times proportion in stage $j$.
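The deviance and the percentage of deviance explained can be computed as in the following sketch. It assumes that $N_i$ is the total count over stages at sampling time $i$ and that terms with $n_{ij} = 0$ contribute zero; the function names and the example counts are hypothetical, and the fitted proportions would in practice come from (6.7).

```python
import math

def multinomial_deviance(counts, q_hat):
    """D = 2 * sum_i sum_j n_ij * ln(n_ij / (N_i * q_ij)).

    counts[i][j] is n_ij; q_hat[i][j] is the fitted proportion in stage j
    at time i. N_i is taken as the row total (an assumption), and cells
    with n_ij = 0 contribute zero.
    """
    dev = 0.0
    for n_i, q_i in zip(counts, q_hat):
        N_i = sum(n_i)
        for n_ij, q_ij in zip(n_i, q_i):
            if n_ij > 0:
                dev += 2.0 * n_ij * math.log(n_ij / (N_i * q_ij))
    return dev

def null_proportions(counts):
    """Common-across-times proportion in each stage, repeated for every time."""
    total = sum(sum(row) for row in counts)
    n_stages = len(counts[0])
    q = [sum(row[j] for row in counts) / total for j in range(n_stages)]
    return [q for _ in counts]

def percent_deviance_explained(counts, q_hat):
    """P = 100 * (D_N - D_C) / D_N."""
    d_c = multinomial_deviance(counts, q_hat)
    d_n = multinomial_deviance(counts, null_proportions(counts))
    return 100.0 * (d_n - d_c) / d_n

# Hypothetical stage-frequency counts: 3 sampling times by 3 stages.
counts = [[8, 2, 0], [3, 5, 2], [0, 3, 7]]
saturated = [[n / sum(row) for n in row] for row in counts]
```

Two limiting cases serve as a check: the saturated proportions give a deviance of zero and hence $P = 100$, while the null proportions give $D_C = D_N$ and hence $P = 0$.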