Correlated data
9.1 Random effects
An “effect” is a model coefficient. Random effects are coefficients which are random variables. Random effects are used to model the correlation between observed responses in a cluster. The same outcome of the random variable manifests itself in all observations in the same cluster. Random effect outcomes are different for different clusters. Under repeated sampling the outcomes of all the random effects change. However, all responses in the same cluster receive the same outcome.
The β coefficients of the previous chapters are fixed effects. They remain constant under hypothetical repeated sampling. For example, the effect of driver’s age on car insurance claims is typically regarded as a fixed effect. The effect is assumed to be the same if, hypothetically, a new set of claims data is collected.
To illustrate random effects, consider a study of household insurance claims over a five-year period. Claims are observed on each policy in each of five years. Other demographics such as the geographical area or suburb are also recorded. The policies are clusters since claims on a given policy, in successive years, are correlated. Now consider a regression model with area and policy as explanatory variables. Area is regarded as a fixed effect in that under hypothetical repeated sampling, area effects are expected to be the same: interest centers on the effect of the given suburbs. The policy effect, however, is random in that under hypothetical repeated sampling, different policies are enacted and there is no interest in any particular policy per se. Both the fixed and random effects influence the outcome.
Random intercept model. The simplest random effects model is the “random intercept” model
$$ g(\mu) = \alpha + x\beta, \qquad \alpha \sim N(0, \nu^2). \tag{9.1} $$
Here $\alpha$ is the random intercept term. The random variable $\alpha$ has the same outcome for all cases in a cluster, but different outcomes across clusters. The $\beta$ effect is constant both within and across clusters. As before, $g(\mu)$ models the mean of the exponential family response.
In (9.1) the explanatory variables $x$ contain the constant term 1, and $\beta$ contains the corresponding intercept parameter $\beta_0$. Hence the intercept for a cluster is $\alpha + \beta_0$. The effect of the cluster-specific draw $\alpha$ is to induce correlation between all responses within a cluster, while there is no correlation between clusters. Given $\alpha$, observations in a cluster are independent.
In terms of the cluster notation, model (9.1) is written as
$$ g(\mu_{ic}) = \alpha_c + x_{ic}\beta, \qquad \alpha_c \sim N(0, \nu^2), \tag{9.2} $$
where $c$ denotes cluster and $i$ a case within the cluster. Thus case $i$ in cluster $c$ receives the random intercept applicable to cluster $c$. To this cluster-specific random effect is added the case-specific effect $x_{ic}\beta$, where $\beta$ is common to all clusters.
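The way a shared intercept draw induces within-cluster correlation can be checked by simulation. The following is a minimal sketch, not taken from the text: it assumes a Bernoulli response with logit link, two cases per cluster, no explanatory variables other than the constant, and hypothetical values for the intercept (`beta0`) and the random intercept standard deviation (`nu`).

```python
import math
import random


def expit(t):
    """Inverse logit, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_pairs(n_clusters, beta0, nu, seed=1):
    """Simulate two Bernoulli responses per cluster sharing one intercept draw."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clusters):
        # One draw of alpha per cluster; nu = 0 switches the random effect off.
        alpha = rng.gauss(0.0, nu) if nu > 0 else 0.0
        p = expit(beta0 + alpha)  # the same probability applies to both cases
        pairs.append((int(rng.random() < p), int(rng.random() < p)))
    return pairs


def correlation(pairs):
    """Pearson correlation between the first and second response in each pair."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


# Hypothetical parameter values, chosen only for illustration.
with_effect = correlation(simulate_pairs(20000, beta0=-2.65, nu=2.0))
no_effect = correlation(simulate_pairs(20000, beta0=-2.65, nu=0.0))
print(f"within-cluster correlation, nu = 2: {with_effect:.3f}")
print(f"within-cluster correlation, nu = 0: {no_effect:.3f}")
```

With a nonzero `nu` the two responses in a cluster are clearly positively correlated, while with `nu = 0` the estimated correlation is close to zero. Conditional on the draw of alpha the two responses are generated independently, as in the model.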
Credibility theory or experience rating. Random intercept models are closely related to the credibility models of insurance. Suppose individual $c$ enacts an insurance policy over each of the years $i = 1, 2, \ldots$. In terms of (9.2), $\alpha_c$ is the individual or policy effect. In the first year $i = 1$ there is no claims experience for the individual $c$ and hence the price of insurance is based on rating variables contained in $x$ such as age, sex, suburb of residence and so on. In year $i = 2$ the claim for the first year is known – there is information on $\alpha_c$. As years pass, experience increases and information about $\alpha_c$ increases. Increasing weight is placed on the claims history of the policy as opposed to the general rating variables. The optimal relative weights are called the credibility weights and credibility formulas are sequential updating schemes for premiums as more information about a particular risk emerges.
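To make the idea of a credibility weight concrete, here is a minimal sketch under assumptions that go beyond the text: a normal response with identity link, known variances, and the classical Bühlmann-type weight $Z = n\nu^2 / (n\nu^2 + \sigma^2)$, where $n$ is the number of years of experience and $\sigma^2$ is the within-policy variance. The function names and numerical values are hypothetical.

```python
def credibility_weight(n_years, nu2, sigma2):
    """Weight on a policy's own experience after n_years of claims history.

    nu2    : variance of the policy effect alpha_c (between-policy variance)
    sigma2 : variance of a single year's claim around the policy mean
    """
    return n_years * nu2 / (n_years * nu2 + sigma2)


def credibility_premium(group_mean, own_history, nu2, sigma2):
    """Blend the rating-variable (group) mean with the policy's own average."""
    n = len(own_history)
    if n == 0:
        return group_mean  # year 1: no experience, price on rating variables only
    z = credibility_weight(n, nu2, sigma2)
    own_mean = sum(own_history) / n
    return z * own_mean + (1.0 - z) * group_mean


# Hypothetical values: the weight on own experience grows as years accumulate.
weights = [credibility_weight(n, nu2=1.0, sigma2=4.0) for n in range(0, 6)]
print([round(w, 3) for w in weights])
print(credibility_premium(100.0, [200.0], nu2=1.0, sigma2=1.0))
```

The weight starts at zero with no experience and increases towards one, which mirrors the shift in emphasis from the general rating variables to the policy's own claims history.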
In the GLM context, the experience is typically regarded as given and there is no special significance accorded to sequential updating. Of course fitting a random intercept model on say three years’ data will lead to different estimates from the same analysis using four years’ data. Estimated premiums $\hat\mu_{ic}$ derived from a fit of $g(\mu_{ic}) = \alpha_c + x_{ic}\beta$ implicitly factor in the relative precisions of the estimates of $\alpha_c$ and $\beta$, that is the precision associated with the individual effect and the “group” effect.
General random effects model. Coefficients of explanatory variables may be regarded as random. If a coefficient is a random effect, then it varies from cluster to cluster, again inducing within-cluster homogeneity or correlation. To facilitate the discussion write $z$ as those explanatory variables with random coefficients and $\gamma$ as the corresponding parameter vector. The vectors $z$ and $\gamma$ have the same dimension. To ensure $\gamma$ has zero mean, write the model as
$$ g(\mu) = x\beta + z\gamma, \qquad \gamma \sim N(0, G), \tag{9.3} $$
where $z$ is repeated in $x$ and $G$ is the covariance matrix of $\gamma$. To illustrate, suppose $x$ has a single explanatory variable: $x = (1, x_1)$. If the coefficient of $x_1$ is a random effect with mean $\beta_1$, then
$$ g(\mu) = \beta_0 + \beta_1 x_1 + \gamma x_1, \qquad \gamma \sim N(0, \nu^2), $$
where $\beta_1 + \gamma$ is the random effect. In this case $x_1$ is retained in $x$ and $z$ can be thought of as a copy of the subset of $x$ which has random coefficients. The random intercept model is the special case where $z = 1$.
Maximum likelihood estimation. The treatment and estimation of the random coefficients model depends on the distribution of the response $y$, the link $g$ and the distribution of random effects. The simplest and classical situation is a normal response, identity link and normal random effects. This is an example of a fully specified model. Fully specified models – those that spell out both distributions – do lead to estimation intricacies.
For simplicity consider the random intercepts model (9.2). The extension to the more general random effects model (9.3) is straightforward. The conditional density for observation $i$ in cluster $c$, given $\alpha_c$, is $f(y_{ic} \mid \alpha_c)$. Given $\alpha_c$, observations in cluster $c$ are independent and the joint density of all observations in cluster $c$ is $\prod_i f(y_{ic} \mid \alpha_c)$. Integrating this density with respect to the density $f(\alpha_c)$ of $\alpha_c$ yields the joint distribution of cluster $c$ responses
$$ \int f(\alpha_c) \prod_i f(y_{ic} \mid \alpha_c) \, d\alpha_c . $$
Usually $f(\alpha_c)$ is the $N(0, \nu^2)$ distribution. Clusters are independent and hence the overall joint density is the product over $c$ of the cluster densities displayed above.
The joint density depends on the unknown parameters, $\beta$, $\phi$ and $\nu^2$. The joint density, regarded as a function of the parameters and conditional on the given observations, is the likelihood. Generally this likelihood cannot be expressed in closed form and MLEs of the unknown parameters are computed iteratively, using numerical integration.
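The numerical integration step can be sketched for a single cluster. The example below is illustrative only: it assumes a Bernoulli response with logit link and a normal random intercept, and it uses a simple trapezoid rule on a fixed grid, which is a stand-in for whatever quadrature a production routine would use. The names `cluster_likelihood`, `etas` (the fixed part $x_{ic}\beta$ for each case), `n_grid` and `width` are hypothetical.

```python
import math


def expit(t):
    """Inverse logit, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def normal_pdf(a, nu):
    """Density of N(0, nu^2) evaluated at a."""
    return math.exp(-0.5 * (a / nu) ** 2) / (nu * math.sqrt(2.0 * math.pi))


def cluster_likelihood(ys, etas, nu, n_grid=2001, width=8.0):
    """Integrate f(alpha) * prod_i f(y_i | alpha) over alpha by the trapezoid rule.

    ys   : 0/1 responses for the cases in one cluster
    etas : fixed-effect linear predictors x_ic * beta for the same cases
    nu   : standard deviation of the random intercept
    """
    lo, hi = -width * nu, width * nu
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        alpha = lo + k * h
        value = normal_pdf(alpha, nu)
        for y, eta in zip(ys, etas):
            p = expit(eta + alpha)
            value *= p if y == 1 else 1.0 - p  # Bernoulli density given alpha
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * value
    return total * h


def log_likelihood(clusters, nu):
    """Clusters are independent, so the log-likelihood is a sum over clusters."""
    return sum(math.log(cluster_likelihood(ys, etas, nu)) for ys, etas in clusters)


# Hypothetical data: two policies observed for three years each.
clusters = [([0, 0, 1], [-2.6, -2.5, -2.3]), ([0, 0, 0], [-2.6, -2.5, -2.3])]
print(log_likelihood(clusters, nu=2.0))
```

An optimizer would evaluate `log_likelihood` repeatedly at trial values of the parameters, which is why the integration has to be redone at every iteration.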
Estimation of cluster-specific effects. Estimates of $\beta$, $\phi$ and $\nu^2$ can be used to compute estimates of the $\alpha_c$ outcomes. Note that these are not model parameters, but random outcomes from the random effects distribution, so usual maximum likelihood estimation is not used. The distribution of $y$ given $\alpha_c$ is $f(y \mid \alpha_c)$. Using Bayes’ theorem from elementary probability,
$$ f(\alpha_c \mid y) = \frac{f(y \mid \alpha_c)\, f(\alpha_c)}{f(y)} . \tag{9.4} $$
The above is called the “posterior” distribution of $\alpha_c$. Using the parameter estimates $\hat\beta$, $\hat\phi$ and $\hat\nu^2$ in the right hand side allows one to evaluate the expression. The estimate of $\alpha_c$ is taken as that value which maximizes (9.4), i.e. the mode of $f(\alpha_c \mid y)$. The resultant $\hat\alpha_c$, $c = 1, \ldots, m$, are called the “empirical Bayes estimates.” Computation of the estimates is again achieved using numerical methods.
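As an illustration of the numerical step, the sketch below finds the mode of $f(\alpha_c \mid y)$ for one cluster. It assumes a Bernoulli response with logit link and treats the fixed-effect predictors and the variance estimate as given. For this case the derivative of the log of the numerator of (9.4) with respect to $\alpha_c$ is $\sum_i (y_{ic} - \pi_{ic}) - \alpha_c/\nu^2$, which is strictly decreasing in $\alpha_c$, so bisection is used here as one simple option; it is not necessarily what any particular software does. The names `posterior_mode`, `etas` and `nu2` are hypothetical.

```python
import math


def expit(t):
    """Inverse logit, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def posterior_mode(ys, etas, nu2, n_iter=100):
    """Mode of f(alpha | y) for one cluster under a logit random intercept model.

    ys   : 0/1 responses for the cases in the cluster
    etas : estimated fixed-effect predictors x_ic * beta-hat
    nu2  : estimated variance of the random intercept
    """
    n = len(ys)

    def score(alpha):
        # Derivative of log f(y | alpha) + log f(alpha) with respect to alpha.
        data_part = sum(y - expit(eta + alpha) for y, eta in zip(ys, etas))
        return data_part - alpha / nu2

    # The data part lies strictly between -n and n, so the root is bracketed.
    lo, hi = -n * nu2, n * nu2
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical policies sharing the same rating variables over three years.
etas = [-2.65, -2.65, -2.65]
claim_free = posterior_mode([0, 0, 0], etas, nu2=3.8)
one_claim = posterior_mode([1, 0, 0], etas, nu2=3.8)
print(f"claim-free policy: {claim_free:.3f}, one claim: {one_claim:.3f}")
```

A claim-free history pulls the estimate below zero and a claim pushes it above zero, while a cluster with no observations stays at the prior mode of zero.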
In the case where the response is say a claim size, the $\hat\alpha_c$ are the “experience” portions of the premium. As more data on a given risk is collected the experience portion of the premium will become more certain.
Normal response distribution: mixed models. For this model $g(\mu) = \mu$ and the response distribution is normal. This implies
$$ y = x\beta + z\gamma + \epsilon, \qquad \gamma \sim N(0, G), \qquad \epsilon \sim N(0, \sigma^2), \tag{9.5} $$
where the $\gamma$ and $\epsilon$ are independent. Here the draws of $\epsilon$ are different for each case, while draws of $\gamma$ are different across clusters. Model (9.5) is called the “mixed model.” The likelihood of (9.5) follows from the multivariate normal distribution, and MLEs are readily computed. However the normal response distribution is typically inappropriate in insurance applications.
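To see why the likelihood of (9.5) is multivariate normal, it may help to stack the cases of one cluster. The notation $y_c$, $X_c$ and $Z_c$ (the response vector and the matrices whose rows are $x_{ic}$ and $z_{ic}$ for cluster $c$) is introduced here for illustration and is not used elsewhere in the text. Since $\gamma$ and $\epsilon$ are independent normal variables,

$$ y_c \sim N\!\left( X_c\beta,\; Z_c G Z_c^{\top} + \sigma^2 I \right). $$

In the random intercept special case, $z = 1$ and $G = \nu^2$, so for two different cases $i \neq j$ in the same cluster

$$ \mathrm{Var}(y_{ic}) = \nu^2 + \sigma^2, \qquad \mathrm{Cov}(y_{ic}, y_{jc}) = \nu^2, \qquad \mathrm{Corr}(y_{ic}, y_{jc}) = \frac{\nu^2}{\nu^2 + \sigma^2} . $$

The within-cluster correlation is therefore zero when $\nu^2 = 0$ and approaches one as $\nu^2$ grows relative to $\sigma^2$.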
Generalized linear mixed models (GLMMs). This model assumes the response $y$ arises from an exponential family distribution with mean $\mu$, where (9.3) applies. This fully specifies a distribution, albeit a complicated one. The likelihood can be maximized with respect to $\beta$, $\phi$ and $G$, as discussed above in relation to the random intercept model. This involves burdensome numerical integration.
SAS notes.
• Mixed models (9.5) are implemented in proc mixed. The explanatory variables associated with random effects are specified in the random statement, together with cluster information.
• The GLMM is estimated using either proc nlmixed or proc glimmix. The latter procedure is, at time of writing, still experimental. The glimmix syntax is similar to that of proc mixed and proc genmod. However, in its current state it is limited in terms of memory and is not able to perform the analysis below, which is performed using proc nlmixed. Model specification in nlmixed is intricate compared with genmod, mixed and glimmix. For all three of these procedures, correlated data is entered in the same structure as in the mathematical model, i.e. a line or case corresponds to one observation within a cluster. A cluster identification variable is required. This is typically policy number or customer number.
Vehicle insurance claims. For the simulated three-year data set introduced on page 130, suppose the occurrence of a claim to be the response of interest. The random intercept model is:
$$ y \sim B(1, \pi), \qquad \ln\frac{\pi}{1 - \pi} = \alpha + x\beta, \qquad \alpha \sim N(0, \nu^2) . $$
Parameter estimates are given in Table 9.2 – see code and output on page 183. The estimate of the variance of the random intercept is $\hat\nu^2 = 3.82$ (p-value < 0.0001), which is significant. A $\hat\nu^2$ not significantly different from zero suggests the random effect has zero variance, implying no within-cluster correlation.

Table 9.2. Logistic regression GLMM for vehicle insurance claims

Response variable        Occurrence of a claim
Response distribution    Bernoulli
Correlation structure    random intercept
Link                     logit

Parameter               Level       β̂        se     exp(β̂)      t      p-value
Intercept                        −2.654    0.039    0.070   −68.79   <0.0001
Driver’s age            1         0.274    0.059    1.315     4.65   <0.0001
                        2         0.008    0.047    1.009     0.18    0.8560
                        3         0.000    .        1.000     .       .
                        4        −0.053    0.045    0.948    −1.19    0.2331
                        5        −0.275    0.051    0.760    −5.41   <0.0001
                        6        −0.225    0.059    0.799    −3.78    0.0002
Vehicle value ($000’s)  <25       0.000    .        1.000     .       .
                        25–50     0.236    0.040    1.266     5.88   <0.0001
                        50–75     0.087    0.114    1.091     0.77    0.4438
                        75–100   −0.886    0.453    0.413    −1.95    0.0507
                        100–125  −0.613    0.696    0.542    −0.88    0.3784
                        >125     −1.314    0.775    0.269    −1.69    0.0902
Time period             1        −0.302    0.025    0.740   −12.25   <0.0001
                        2        −0.172    0.024    0.842    −7.08   <0.0001
                        3         0.000    .        1.000     .       .
Variance of random
intercept ν̂²                      3.818    0.082             46.52   <0.0001
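The size of the estimated random intercept variance can be put on the probability scale. The sketch below is illustrative only: it takes the intercept, the driver's age band 1 coefficient and $\hat\nu^2$ from Table 9.2, considers a policy at the base levels of all factors, and evaluates the conditional claim probability for a policy whose random intercept is zero or one standard deviation either side of zero. The function and variable names are hypothetical.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


# Estimates copied from Table 9.2.
beta_intercept = -2.654
beta_age_band_1 = 0.274
nu_hat = math.sqrt(3.818)  # standard deviation of the random intercept

# exp(beta) is the multiplicative effect on the odds of a claim,
# conditional on the policy's random intercept.
odds_ratio_age_band_1 = math.exp(beta_age_band_1)


def claim_probability(alpha, eta=beta_intercept):
    """Conditional claim probability given the policy's random intercept."""
    return expit(eta + alpha)


p_low = claim_probability(-nu_hat)   # policy one sd below the average
p_base = claim_probability(0.0)      # policy with alpha = 0
p_high = claim_probability(nu_hat)   # policy one sd above the average

print(f"odds ratio, age band 1 vs band 3: {odds_ratio_age_band_1:.3f}")
print(f"claim probability at alpha = -sd, 0, +sd: "
      f"{p_low:.4f}, {p_base:.4f}, {p_high:.4f}")
```

The spread between the three probabilities shows how much policies with identical rating variables can differ under the fitted model.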
In practice some policies come into force after the start, and others terminate before the end of the three-year observation period. This is catered for by incorporating exposure into the model.
Other random effects distributions. The assumption of normal random effects in mixed models and GLMMs is adequate for most applications. However there are some situations where the assumption is inappropriate. Lee and Nelder (1996) and Lee and Nelder (2001) have developed hierarchical generalized linear models (HGLMs), which are GLMMs with random effects having non-normal distributions. HGLMs are implemented in Genstat. They are not covered in this text.