Random Effects Models for Longitudinal Survey Data


CHAPTER 14


C. J. Skinner and D. J. Holmes

14.1. INTRODUCTION

Random effects models have a number of important uses in the analysis of longitudinal survey data. The main use, which we shall focus on in this chapter, is in the study of individual-level dynamics. Random effects models enable variation in individual responses to be decomposed into variation between the `permanent' characteristics of individuals and temporal `transitory' variation within individuals.

Another important use of random effects models in the analysis of longitudinal data is in allowing for the effects of time-constant unobserved covariates in regression models (e.g. Solon, 1989; Hsiao, 1986; Baltagi, 2001). Failure to allow for these unobserved covariates in regression analysis of cross-sectional survey data may lead to inconsistent estimation of regression coefficients. Consistent estimation may, however, be achievable with the use of random effects models and longitudinal data.

A `typical' random effects model may be conceived of as follows. It is supposed that a response variable $Y$ is measured at each of a number of successive waves of the survey. The measurement for individual $i$ at wave $t$ is denoted $y_{it}$, and this value is assumed to be generated in two stages. First, `permanent' random effects $\mu_i$ are generated from some distribution for each individual $i$. Then, at each wave, $y_{it}$ is generated from $\mu_i$. In the simplest case this generation follows the same process independently at each wave. For example, we may have

$$\mu_i \sim N(\mu, \sigma_1^2), \qquad y_{it} \mid \mu_i \sim N(\mu_i, \sigma_2^2). \tag{14.1}$$

Under this model, longitudinal data enable the `cross-sectional' variance $\sigma_1^2 + \sigma_2^2$ of $y_{it}$ to be decomposed into the variance $\sigma_1^2$ of the `permanent' component $\mu_i$ and the variance $\sigma_2^2$ of the `transitory' component at each wave.
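The decomposition implied by (14.1) can be checked numerically. The sketch below is a minimal illustration, not part of the chapter's methodology: it assumes Gaussian draws, a balanced panel with no weights or nonresponse, and illustrative values $\mu = 10$, $\sigma_1^2 = 1$, $\sigma_2^2 = 0.5$ chosen here. It simulates a panel and recovers the two variance components with the classical one-way analysis-of-variance (method-of-moments) estimators; the function names are hypothetical.

```python
import random
import statistics


def simulate_model_14_1(n, T, mu, s2_1, s2_2, seed=1):
    """Draw a balanced panel under (14.1):
    mu_i ~ N(mu, s2_1) and y_it | mu_i ~ N(mu_i, s2_2), independently over waves."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu_i = rng.gauss(mu, s2_1 ** 0.5)  # 'permanent' component
        panel.append([rng.gauss(mu_i, s2_2 ** 0.5) for _ in range(T)])  # 'transitory' draws
    return panel


def anova_decomposition(panel):
    """Method-of-moments estimates of (s2_1, s2_2) from a balanced panel."""
    n, T = len(panel), len(panel[0])
    means = [sum(row) / T for row in panel]
    # Deviations from each individual's own mean reflect only transitory variation.
    s2_2_hat = sum((y - m) ** 2 for row, m in zip(panel, means) for y in row) / (n * (T - 1))
    # The variance of the individual means estimates s2_1 + s2_2 / T.
    s2_1_hat = statistics.variance(means) - s2_2_hat / T
    return s2_1_hat, s2_2_hat


panel = simulate_model_14_1(n=5000, T=5, mu=10.0, s2_1=1.0, s2_2=0.5)
s2_1_hat, s2_2_hat = anova_decomposition(panel)
print(f"permanent: {s2_1_hat:.3f}, transitory: {s2_2_hat:.3f}")
```

With these settings the two printed estimates should lie close to 1 and 0.5: subtracting $\sigma_2^2/T$ from the variance of the individual means is what isolates the `permanent' variance.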

Analysis of Survey Data. Edited by R. L. Chambers and C. J. Skinner. Copyright © 2003 John Wiley & Sons, Ltd. ISBN: 0-471-89987-9


This may aid understanding of the mobility of individuals over time in terms of their place in the distribution of the response variable. An example, which we shall focus on, is where the response variable is earnings, subject to a log transformation, and a model of the form (14.1) enables us to study the degree of mobility of an individual's place in the earnings distribution (e.g. Lillard and Willis, 1978).

Rich classes of random effects models for longitudinal data have been developed for the above purposes. A number of different terms have been used to describe these models including variance component models, error component models, mixed effects models, multilevel models and hierarchical models (Baltagi, 2001; Hsiao, 1986; Diggle, Liang and Zeger, 1994; Goldstein, 1995).

The general aim of this chapter is to consider how to take account of complex sampling designs in the fitting of such random effects models. We shall suppose that there is a known probability sampling scheme employed to select the sample of individuals followed over the waves of the survey. Two additional complications will be that there may be wave nonresponse, so that not all sampled individuals will respond at each wave, and that the target population of individuals may not be fixed over time.

To provide a specific focus, we will consider data on earnings of male employees over the first five waves of the British Household Panel Survey (BHPS), that is, over the period 1991–5. As a basic model for the log earnings $y_{it}$ of individual $i$ at wave $t = 1, \ldots, T$ we shall suppose that

$$y_{it} = \beta_t + u_i + \nu_{it}, \qquad t = 1, \ldots, T, \tag{14.2}$$

where the random effect $u_i$ is the `permanent' random effect, referred to earlier, and the $\nu_{it}$ are transitory random effects, whose effects on the response variable may last beyond the current wave $t$ via the first-order autoregressive (AR(1)) model:

$$\nu_{it} = \rho\,\nu_{i,t-1} + \varepsilon_{it}, \qquad t = 1, \ldots, T. \tag{14.3}$$

Both $u_i$ and $\nu_{it}$ may include the effects of measurement errors (Abowd and Card, 1989). The random variables $u_i$ and $\varepsilon_{it}$ are assumed to be mutually independent with

$$E(u_i) = E(\varepsilon_{it}) = 0, \qquad \operatorname{var}(u_i) = \sigma_u^2, \qquad \operatorname{var}(\varepsilon_{it}) = \sigma_\varepsilon^2.$$

The unknown fixed parameters $\beta_t$ $(t = 1, \ldots, T)$ represent annual (inflation) effects. Lillard and Willis (1978) considered this model (amongst others) for log earnings for seven years (1967–73) of data from the US Panel Study of Income Dynamics. Letting $\sigma_\nu^2 = \operatorname{var}(\nu_{it})$ and assuming that $\varepsilon_{it}$ is independent of $\nu_{i,t-1}$ and that the $\nu_{it}$ are stationary, we obtain

$$\sigma_\nu^2 = \sigma_\varepsilon^2 / (1 - \rho^2). \tag{14.4}$$

We refer to the above model as Model B and to the more restricted `variance components' model in which $\rho = 0$ as Model A. See Goldstein, Healy and Rasbash (1994) for further discussion of such models.


We shall consider two broad approaches to fitting these models under a complex sample design. The first is a covariance structure approach, following Chamberlain (1982) and Skinner, Holt and Smith (1989, section 3.4.5; henceforth referred to as SHS), in which the observations on the $T$ waves are treated as a multivariate outcome with individuals as `single-level' units. This approach is set out in Section 14.2. The second approach treats the data as two-level (Goldstein, 1995), with the level 1 units as the waves $t = 1, \ldots, T$ and the level 2 units as the individuals $i$. The aim is to apply the methods developed by Pfeffermann et al. (1998). This approach is set out in Section 14.3. A related approach is developed by Feder, Nathan and Pfeffermann (2000) for a model with time-varying random effects. The application of both our approaches to earnings data from the British Household Panel Survey will be considered in Section 14.4.

14.2. A COVARIANCE STRUCTURE APPROACH

Following the notation in Section 14.1, let $y_i = (y_{i1}, \ldots, y_{iT})'$ be the $T \times 1$ vector representing the profile of values of individual $i$ over the $T$ waves of the survey. Under the model defined by (14.2)–(14.4), these multivariate outcomes are independent with mean vector and covariance matrix given respectively by

$$E(y_i) = \beta = (\beta_1, \ldots, \beta_T)', \tag{14.5}$$

$$\operatorname{var}(y_i) = \sigma_u^2 J_T + \sigma_\nu^2 V_T(\rho), \tag{14.6}$$

where $J_T$ is the $T \times T$ matrix of ones and $V_T(\rho)$ is the $T \times T$ matrix with $(t, t')$th element given by $\rho^{|t - t'|}$ $(1 \le t, t' \le T)$.

These equations define a `covariance structure' model in which the mean vector is unconstrained but the $k = T(T+1)/2$ distinct elements of the covariance matrix are constrained to be functions of the parameter vector $\theta = (\sigma_u^2, \sigma_\varepsilon^2, \rho)'$. Inference about these parameters may follow the approach outlined in SHS (section 3.4.5).
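To make (14.4)–(14.6) concrete, the following sketch builds the model covariance matrix $\operatorname{var}(y_i)$ for given $(\sigma_u^2, \sigma_\varepsilon^2, \rho)$ and stacks its $k = T(T+1)/2$ distinct elements with vech. It is a minimal pure-Python illustration with hypothetical function names; the parameter values are arbitrary, and the vech ordering (lower triangle, column by column) is a convention assumed here. Setting $\rho = 0$ gives Model A.

```python
def model_cov(T, s2_u, s2_e, rho):
    """var(y_i) from (14.6): s2_u * J_T + s2_nu * V_T(rho), with s2_nu from (14.4).

    Requires |rho| < 1; rho = 0 gives Model A.
    """
    s2_nu = s2_e / (1.0 - rho ** 2)  # stationary AR(1) variance
    return [[s2_u + s2_nu * rho ** abs(t - s) for s in range(T)] for t in range(T)]


def vech(M):
    """Stack the lower triangle of M (diagonal included), column by column."""
    T = len(M)
    return [M[r][c] for c in range(T) for r in range(c, T)]


S = model_cov(T=5, s2_u=0.15, s2_e=0.06, rho=0.4)
A = vech(S)
print(len(A))  # 15, i.e. k = T(T+1)/2 distinct elements for T = 5
```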

Assuming first no nonresponse, let the data consist of the values $y_i$ for units $i$ in a sample $s$. The usual survey estimator of the finite population covariance matrix $\Sigma$ is given by

$$\hat\Sigma = \sum_{i \in s} w_i (y_i - \bar y)(y_i - \bar y)' \Big/ \sum_{i \in s} w_i, \tag{14.7}$$

where

$$\bar y = \sum_{i \in s} w_i y_i \Big/ \sum_{i \in s} w_i,$$

and where $w_i$ is the survey weight for individual $i$. Let $\hat A = \operatorname{vech}(\hat\Sigma)$ denote the $k \times 1$ vector of distinct elements of $\hat\Sigma$ (the `vector half' of $\hat\Sigma$: see Fuller, 1987, p. 382) and let $A(\theta) = \operatorname{vech}[\operatorname{var}(y_i)]$ denote the corresponding vector of elements of $\operatorname{var}(y_i)$ from (14.6).
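A direct transcription of (14.7), again as a hypothetical pure-Python sketch with made-up toy data: the weighted mean and covariance use the sum of the weights as divisor, so multiplying all $w_i$ by a constant leaves $\hat\Sigma$ unchanged. The vech helper returns the vector $\hat A$ in the same (assumed) column-by-column ordering.

```python
def weighted_cov(ys, ws):
    """Survey-weighted covariance matrix (14.7): the divisor is the sum of the weights."""
    total = sum(ws)
    T = len(ys[0])
    ybar = [sum(w * y[t] for y, w in zip(ys, ws)) / total for t in range(T)]
    return [
        [sum(w * (y[a] - ybar[a]) * (y[b] - ybar[b]) for y, w in zip(ys, ws)) / total
         for b in range(T)]
        for a in range(T)
    ]


def vech(M):
    """Lower triangle of M, column by column: the k x 1 vector A-hat."""
    T = len(M)
    return [M[r][c] for c in range(T) for r in range(c, T)]


ys = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # three respondents, T = 2 waves (toy data)
S_hat = weighted_cov(ys, [1.0, 2.0, 1.0])
A_hat = vech(S_hat)
print(A_hat)
```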


Following Chamberlain (1982), Fuller (1984) and SHS (section 3.4.5), a general class of estimators of $\theta$ is obtained by minimising

$$[\hat A - A(\theta)]' V^{-1} [\hat A - A(\theta)], \tag{14.8}$$

where $V$ is a given $k \times k$ non-singular matrix. A generalised least squares (GLS) estimator $\hat\theta_{GLS}$ is obtained by taking $V$ to be a consistent estimator of the covariance matrix of $\hat A$. One choice, $V_c$, is obtained from the linearisation method (Wolter, 1985) by approximating the covariance matrix of the elements of $\hat\Sigma$ by the covariance matrix of the corresponding elements of the linear statistic

$$\sum_{i \in s} z_i, \tag{14.9}$$

where $z_i = w_i[(y_i - \bar y)(y_i - \bar y)' - \hat\Sigma] / \sum_{i \in s} w_i$ is treated as a fixed variable. The estimator $V_c$ may allow in the usual way for the complex design (Wolter, 1985).

Since $A(\theta)$ is a non-linear function of $\theta$, iterative minimisation of (14.8) is required. Note, however, that for a given value of $\rho$ under Model B, $A(\theta)$ is linear in $(\sigma_u^2, \sigma_\nu^2)$, and so closed-form expressions may be determined for the values $\hat\sigma_u^2(\rho)$ and $\hat\sigma_\nu^2(\rho)$ which minimise (14.8) for given $\rho$. The iterative minimisation may thus be reduced to a scalar problem. A consistent estimator of the covariance matrix of $\hat\theta_{GLS}$ is given by (Fuller, 1984)

$$V_L(\hat\theta_{GLS}) = \left[\dot A(\hat\theta_{GLS})' V_c^{-1} \dot A(\hat\theta_{GLS})\right]^{-1}, \tag{14.10}$$

where $\dot A(\theta) = \partial A(\theta) / \partial \theta$.
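The reduction of (14.8) to a scalar problem can be sketched as follows. For fixed $\rho$, $A(\theta) = \sigma_u^2 \operatorname{vech}(J_T) + \sigma_\nu^2 \operatorname{vech}(V_T(\rho))$ is linear, so the optimal $(\sigma_u^2, \sigma_\nu^2)$ solves a $2 \times 2$ weighted least squares problem, and the profiled criterion is then minimised over $\rho$. This is a minimal sketch under stated assumptions: the matrix `W` stands for $V^{-1}$ and must be supplied by the user (the identity gives the OLS choice), $\rho$ is restricted to $[0, 1)$, a coarse grid search stands in for a proper one-dimensional optimiser, non-negativity of the variance estimates is not imposed, and all function names are hypothetical.

```python
def model_parts(T, rho):
    """vech(J_T) and vech(V_T(rho)), so that A(theta) = s2_u * a1 + s2_nu * a2."""
    cells = [(r, c) for c in range(T) for r in range(c, T)]  # same ordering as vech
    a1 = [1.0] * len(cells)
    a2 = [rho ** (r - c) for r, c in cells]
    return a1, a2


def fit_given_rho(A_hat, T, rho, W=None):
    """Closed-form minimiser of (14.8) over (s2_u, s2_nu) for fixed rho.

    W stands for V^{-1}; W=None means the identity matrix (the OLS choice).
    Returns (criterion value, s2_u, s2_nu).
    """
    a1, a2 = model_parts(T, rho)
    k = len(A_hat)
    if W is None:
        W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

    def quad(x, y):  # x' W y
        return sum(x[i] * W[i][j] * y[j] for i in range(k) for j in range(k))

    # 2 x 2 weighted normal equations for the linear part of the model.
    m11, m12, m22 = quad(a1, a1), quad(a1, a2), quad(a2, a2)
    r1, r2 = quad(a1, A_hat), quad(a2, A_hat)
    det = m11 * m22 - m12 ** 2
    s2_u = (m22 * r1 - m12 * r2) / det
    s2_nu = (m11 * r2 - m12 * r1) / det
    resid = [A_hat[i] - s2_u * a1[i] - s2_nu * a2[i] for i in range(k)]
    return quad(resid, resid), s2_u, s2_nu


def fit_model_b(A_hat, T, W=None, steps=100):
    """Profile the criterion over a grid of rho in [0, 1); return (rho, s2_u, s2_e)."""
    best = min(
        (fit_given_rho(A_hat, T, j / steps, W) + (j / steps,) for j in range(steps)),
        key=lambda fit: fit[0],
    )
    _, s2_u, s2_nu, rho = best
    return rho, s2_u, s2_nu * (1.0 - rho ** 2)  # back to s2_e via (14.4)


# Check: an exactly fitting A-hat built from known parameters is recovered.
T, rho0 = 5, 0.4
a1, a2 = model_parts(T, rho0)
A_exact = [0.15 * x + (0.06 / (1 - rho0 ** 2)) * y for x, y in zip(a1, a2)]
result = fit_model_b(A_exact, T)
print(result)
```

Because the check feeds in an exactly fitting $\hat A$ whose $\rho$ lies on the grid, it returns $\rho = 0.4$ with $\sigma_u^2 \approx 0.15$ and $\sigma_\varepsilon^2 \approx 0.06$; with real data one would pass the inverse of $V_{iid}$ or $V_c$ as `W` for the GLS variants.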

An advantage of the GLS approach is that it provides a ready-made goodness-of-fit test as the minimised value of the criterion in (14.8), namely the Wald statistic

$$X_W^2 = [\hat A - A(\hat\theta_{GLS})]' V_c^{-1} [\hat A - A(\hat\theta_{GLS})]. \tag{14.11}$$

If the model is correct and if the sample is large enough for $V_c$ to be a good approximation to the covariance matrix of $\hat A$, then $X_W^2$ should be distributed approximately as chi-squared with $k - q$ degrees of freedom, where $q = 2$ and $3$ for Models A and B respectively.
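Given a fitted $A(\hat\theta_{GLS})$ and $V_c$, the statistic (14.11) is a quadratic form, and solving a linear system avoids forming $V_c^{-1}$ explicitly. The sketch below (hypothetical helper names, pure Python, no safeguards for an ill-conditioned $V_c$) also computes the degrees of freedom $k - q$; with $T = 5$ waves these are 13 for Model A and 12 for Model B, the values used in Table 14.1.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [b] for row, b in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def wald_statistic(A_hat, A_fit, V):
    """Quadratic form (14.11): [A_hat - A_fit]' V^{-1} [A_hat - A_fit], via a linear solve."""
    resid = [a - f for a, f in zip(A_hat, A_fit)]
    return sum(r * x for r, x in zip(resid, solve(V, resid)))


def wald_df(T, q):
    """Degrees of freedom k - q, with k = T(T+1)/2 distinct covariance elements."""
    return T * (T + 1) // 2 - q


print(wald_df(5, 2), wald_df(5, 3))  # 13 12 (Model A, Model B with T = 5 waves)
```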

One potential problem with the GLS estimator is that the covariance matrix estimator may be unstable if it is based on a relatively small number of degrees of freedom. This may lead to departures from the null distribution of $X_W^2$ assumed above. In this case, it may be preferable to consider alternative choices of $V$. One approach is to let $V$ be an estimator of the covariance matrix of $\hat A$ based upon the (false) assumption that observations are independent and identically distributed. Thus, if we write

the linear approximation to $\hat A - \operatorname{vech}(\Sigma)$ as

$$\sum_{i \in s} a_i, \tag{14.12}$$


where $a_i = \operatorname{vech}(z_i)$ denotes the $k \times 1$ vector of distinct elements of $z_i$, then we may set $V$ equal to

$$V_{iid} = n \sum_{i \in s} (a_i - \bar a)(a_i - \bar a)' / (n - 1), \tag{14.13}$$

where $\bar a = \sum_{i \in s} a_i / n$ and $n$ denotes the sample size. Although $V_{iid}$ may be more stable than a variance estimator which allows for the complex design, this choice of $V$ is still correlated with $\hat A$ and, as discussed by Altonji and Segal (1996), may lead to serious bias in the estimation of $\theta$. To avoid this problem, an even simpler approach is to set $V$ equal to the identity matrix, in which case the estimator of $\theta$ obtained by minimising (14.8) may be viewed as an ordinary least squares (OLS) estimator. In both cases, $V = V_{iid}$ and $V$ equal to the identity matrix, the resulting estimator $\hat\theta$ will still be consistent for $\theta$, but the Wald statistic $X_W^2$ will no longer follow a chi-squared distribution if the model is true. The large-sample distribution will instead be a mixture of chi-squared distributions, and this may be approximated by a chi-squared distribution using one- or two-moment Rao–Scott approximations (SHS, Ch. 4). It is also no longer appropriate to use expression (14.10) to obtain standard errors for the elements of $\hat\theta$. Instead, as noted in SHS (Ch. 3), a consistent estimator of the covariance matrix of $\hat\theta$ is

$$V(\hat\theta; V_0) = \left[\dot A(\hat\theta)' V_0^{-1} \dot A(\hat\theta)\right]^{-1} \left[\dot A(\hat\theta)' V_0^{-1} V_c V_0^{-1} \dot A(\hat\theta)\right] \left[\dot A(\hat\theta)' V_0^{-1} \dot A(\hat\theta)\right]^{-1},$$

where $V_0$ is the specified choice of $V$ ($V_{iid}$ or the identity matrix) used to determine $\hat\theta$ and $V_c$ is a consistent estimator of the covariance matrix of $\hat A$ under the complex design. Note that this expression reduces to (14.10) when $V_0 = V_c$.
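The sandwich form above can be computed with a few small matrix helpers. This minimal sketch assumes the $k \times q$ Jacobian $\dot A(\hat\theta)$ is already available (for example by numerical differentiation, not shown); the helper names are hypothetical and a linear-algebra library would normally replace them. The check uses arbitrary made-up matrices and confirms that the sandwich collapses to (14.10) when $V_0 = V_c$, and that the OLS choice is no more precise than GLS.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small, well-conditioned M)."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sandwich_cov(D, V0, Vc):
    """Covariance of theta-hat when V0 is used in (14.8); D is the k x q Jacobian."""
    Dt, V0inv = transpose(D), inverse(V0)
    bread = inverse(matmul(matmul(Dt, V0inv), D))
    meat = matmul(matmul(matmul(matmul(Dt, V0inv), Vc), V0inv), D)
    return matmul(matmul(bread, meat), bread)


D = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]                   # made-up Jacobian (k=3, q=2)
Vc = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]  # made-up design-based covariance
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
cov_ols = sandwich_cov(D, I3, Vc)                          # V0 = identity (OLS choice)
cov_gls = sandwich_cov(D, Vc, Vc)                          # V0 = Vc
fuller = inverse(matmul(matmul(transpose(D), inverse(Vc)), D))  # expression (14.10)
```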

The approach considered so far in this section is based on the estimated covariance matrix $\hat\Sigma$ in (14.7) and assumes no nonresponse. This is an unrealistic assumption. The simplest way of handling nonresponse is to consider only those individuals who respond on all $T$ waves, the so-called `attrition sample' $s_T$ at wave $T$. For longitudinal surveys designed for multipurpose longitudinal analyses, it is common to construct longitudinal weights $w_{it}$ at each wave $t$, which are appropriate for longitudinal analysis based upon data for the attrition sample $s_t$ of individuals who respond up to wave $t$ (Lepkowski, 1989).

Thus, the simplest approach is to use only data from the attrition sample $s_T$ and to replace the weights $w_i$, e.g. in (14.7), by the weights $w_{iT}$.

A more sophisticated approach, aimed at producing more efficient estimates, uses data from all attrition samples $s_1, \ldots, s_T$. A recursive approach to the estimation of the covariance matrix of $y_i$ may then be developed. Let $y_i^{(t)} = (y_{i1}, \ldots, y_{it})'$ and let $\hat\Sigma^{(t)}$ denote the estimated $t \times t$ covariance matrix of $y_i^{(t)}$. Begin the recursion by setting

$$\hat\Sigma^{(1)} = \sum_{i \in s_1} w_{i1} (y_{i1} - \bar y_1)^2 \Big/ \sum_{i \in s_1} w_{i1},$$


where

$$\bar y_1 = \sum_{i \in s_1} w_{i1} y_{i1} \Big/ \sum_{i \in s_1} w_{i1},$$

as in (14.7). At the $t$th step of the recursion $(t = 2, \ldots, T)$, set the $(t-1) \times (t-1)$ submatrix of $\hat\Sigma^{(t)}$ corresponding to $y_i^{(t-1)}$ equal to $\hat\Sigma^{(t-1)}$. Let $b^{(t)}$ be the vector of weighted regression coefficients of $y_{it}$ on $y_i^{(t-1)}$, given by

$$b^{(t)} = \left[\sum_{i \in s_t} w_{it} (y_i^{(t-1)} - \bar y^{(t-1)})(y_i^{(t-1)} - \bar y^{(t-1)})'\right]^{-1} \sum_{i \in s_t} w_{it} (y_i^{(t-1)} - \bar y^{(t-1)})\, y_{it},$$

where

$$\bar y^{(t-1)} = \sum_{i \in s_t} w_{it} y_i^{(t-1)} \Big/ \sum_{i \in s_t} w_{it}.$$

Then set the $(t, t)$th element of $\hat\Sigma^{(t)}$, corresponding to the variance of $y_{it}$, equal to

$$\hat\sigma_{et}^2 + b^{(t)\prime} \hat\Sigma^{(t-1)} b^{(t)},$$

where

$$\hat\sigma_{et}^2 = \sum_{i \in s_t} w_{it} (e_{it} - \bar e_t)^2 \Big/ \sum_{i \in s_t} w_{it}, \qquad e_{it} = y_{it} - y_i^{(t-1)\prime} b^{(t)}, \qquad \bar e_t = \sum_{i \in s_t} w_{it} e_{it} \Big/ \sum_{i \in s_t} w_{it}.$$

Finally, let $\hat\Sigma^{(t)}_{t,t-1}$ denote the $1 \times (t-1)$ vector of remaining elements of $\hat\Sigma^{(t)}$, corresponding to the covariances between $y_{it}$ and $y_i^{(t-1)}$, and set

$$\hat\Sigma^{(t)}_{t,t-1} = b^{(t)\prime} \hat\Sigma^{(t-1)}.$$

The recursive process is repeated for $t = 2, \ldots, T$. If $y_i$ is multivariate normal and there are no weights, the resulting $\hat\Sigma^{(t)}$ is a maximum likelihood estimator (Holt, Smith and Winter, 1980) for data from the set of attrition samples. In general, the estimator may be viewed as a form of pseudo-likelihood estimator (see Chapter 2). If the weights do not vary greatly, if $y_i$ is approximately multivariate normal and if the observations for most individuals fall into one of the attrition samples, the estimator $\hat\Sigma^{(t)}$ may be expected to be fairly efficient.
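The recursion over attrition samples can be sketched in pure Python. Assumptions: dropout is monotone (each individual responds up to some wave and never thereafter), each unit carries its longitudinal weights $w_{it}$, and the toy data and helper names are hypothetical. With no attrition and weights constant over waves, the recursion reproduces the direct estimator (14.7) up to rounding, which provides a useful check.

```python
import random


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [b] for row, b in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def recursive_cov(units):
    """Recursive estimate of Sigma^(T) from monotone attrition samples s_1, ..., s_T.

    units: list of (ys, ws); ys holds the responses up to dropout and
    ws[t-1] is the longitudinal weight w_it for wave t.
    """
    T = max(len(ys) for ys, _ in units)
    s1 = [(ys[0], ws[0]) for ys, ws in units]
    tot = sum(w for _, w in s1)
    ybar1 = sum(w * y for y, w in s1) / tot
    S = [[sum(w * (y - ybar1) ** 2 for y, w in s1) / tot]]  # Sigma^(1)
    for t in range(2, T + 1):
        st = [(ys, ws[t - 1]) for ys, ws in units if len(ys) >= t]  # attrition sample s_t
        tot, p = sum(w for _, w in st), t - 1
        xbar = [sum(w * ys[j] for ys, w in st) / tot for j in range(p)]
        # Weighted regression of y_it on y_i^(t-1); the common 1/tot factor cancels.
        sxx = [[sum(w * (ys[a] - xbar[a]) * (ys[c] - xbar[c]) for ys, w in st)
                for c in range(p)] for a in range(p)]
        sxy = [sum(w * (ys[a] - xbar[a]) * ys[p] for ys, w in st) for a in range(p)]
        b = solve(sxx, sxy)
        e = [ys[p] - sum(ys[j] * b[j] for j in range(p)) for ys, _ in st]
        ebar = sum(w * ei for (_, w), ei in zip(st, e)) / tot
        s2_et = sum(w * (ei - ebar) ** 2 for (_, w), ei in zip(st, e)) / tot
        cross = [sum(b[j] * S[j][c] for j in range(p)) for c in range(p)]  # b' Sigma^(t-1)
        var_t = s2_et + sum(cross[c] * b[c] for c in range(p))
        S = [row + [cross[r]] for r, row in enumerate(S)] + [cross + [var_t]]
    return S


def weighted_cov(units):
    """Direct estimator (14.7) for complete data, used here only as a check."""
    T = len(units[0][0])
    tot = sum(ws[0] for _, ws in units)
    m = [sum(ws[0] * ys[t] for ys, ws in units) / tot for t in range(T)]
    return [[sum(ws[0] * (ys[a] - m[a]) * (ys[c] - m[c]) for ys, ws in units) / tot
             for c in range(T)] for a in range(T)]


rng = random.Random(7)
full = []
for _ in range(300):
    w, u = rng.uniform(0.5, 2.0), rng.gauss(0.0, 0.4)
    full.append(([u + rng.gauss(0.0, 0.25) for _ in range(4)], [w] * 4))
attrited = [(ys[: rng.randint(1, 4)], ws) for ys, ws in full]  # monotone dropout
S_full, S_direct = recursive_cov(full), weighted_cov(full)
S_att = recursive_cov(attrited)
```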

Weighting can become unwieldy if one attempts to adjust for all possible wave nonresponse patterns in addition to the attrition samples. See, for example, Lepkowski (1989) for further discussion. For a more general discussion of inference in the presence of nonresponse, see Chapter 18. We return in Section 14.4 to the application of the methods discussed in this section.

14.3. A MULTILEVEL MODELLING APPROACH

A second approach to handling complex survey designs in the fitting of the models defined in Section 14.1 is to adapt standard approaches used for fitting random effects models, such as iterative generalised least squares (IGLS) (Goldstein, 1995). Pfeffermann et al. (1998) have considered modifying IGLS estimation using an approach analogous to the pseudo-likelihood method (see Chapter 2) for a model of the form (14.2) where the $\nu_{it}$ are not serially correlated. Here we consider the extension of their approach to a longitudinal context, allowing for serial correlation. A potential advantage of this approach is that covariates may be handled more directly in the model. A potential disadvantage is that goodness-of-fit tests are not generated so directly.

In multilevel modelling terminology (Goldstein, 1995), the individuals are the level 2 units and the repeated measurements at the different waves represent level 1 units. Pfeffermann et al. (1998) allow for a two-stage sampling scheme, whereby the level 2 units $i$ are selected with inclusion probabilities $\pi_i$ and the level 1 units $t$ with inclusion probabilities $\pi_{t|i}$, conditional on level 2 unit $i$ being selected. Weights $w_i$ and $w_{t|i}$ are then constructed equal to the reciprocals of these respective probabilities, which are assumed known.

To adapt this approach to our context of longitudinal surveys subject to wave nonresponse, it seems natural to let $\pi_i$ denote the probability that individual $i$ is sampled and $\pi_{t|i}$ the probability that this individual responds at wave $t$. While we may reasonably suppose that the $\pi_i$ are known, it is not straightforward to estimate the $\pi_{t|i}$ for general patterns of wave nonresponse (as noted in the covariance structure approach of Section 14.2). We therefore restrict attention to estimation using only the data derived from the attrition samples $s_t$. As noted in Section 14.2, it is common for longitudinal weights $w_{it}$ to be available for use with these attrition samples, and we shall suppose here that these approximate $(\pi_i \pi_{t|i})^{-1}$. We may then set $w_i$ equal to the design weight $\pi_i^{-1}$ and $w_{t|i}$ equal to $w_{it}/w_i$. Alternatively, given $w_{i1}, \ldots, w_{iT}$, we may set $w_i = w_{i1}$ and $w_{t|i} = w_{it}/w_{i1}$ $(t = 1, \ldots, T)$. Note, in particular, that in this case $w_{1|i} = 1$ for all $i$. This approach treats the sample selection and the response process at the first wave as a common selection process. In the approach of Pfeffermann et al. (1998), correction for bias by weighting tends to be more difficult at level 1 than at level 2, because there tends to be more non-linearity in the IGLS estimator as a function of level 1 sums than of level 2 sums. Hence setting $w_i = w_{i1}$ may be preferable to setting $w_i = \pi_i^{-1}$, because the resulting $w_{t|i}$ may be less variable and closer to one.
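The two ways of splitting the longitudinal weights described above can be written as a small helper. This is a sketch with a hypothetical function name and made-up weights; it simply divides each $w_{it}$ by the chosen level 2 weight.

```python
def split_weights(long_weights, design_weight=None):
    """Split longitudinal weights [w_i1, ..., w_it(i)] into (w_i, [w_1|i, ..., w_t(i)|i]).

    Default: w_i = w_i1, so w_t|i = w_it / w_i1 and w_1|i = 1.
    If design_weight (1/pi_i) is given, w_i = 1/pi_i and w_t|i = w_it / w_i.
    """
    w_i = long_weights[0] if design_weight is None else design_weight
    return w_i, [w / w_i for w in long_weights]


w_i, w_cond = split_weights([2.0, 2.4, 3.0])  # made-up longitudinal weights
print(w_i, w_cond)
```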

Having constructed the weights $w_i$ and $w_{t|i}$, the approach of Pfeffermann et al. (1998) may be applied to fit a model of the form (14.2) where the $\nu_{it}$ are not serially correlated. This is Model A. The basic approach is to modify the IGLS estimation procedure by weighting all sums over $i$ by the weights $w_i$ and weighting all sums over $t$ by the weights $w_{t|i}$.

Often survey weights are only available in a scaled form, for example scaled so that they sum to the sample size. For inference about many regression-type models, as in Parts B and C of this book, estimation procedures for the model parameters are invariant to such scaling. This is also true for multilevel modelling if the $w_i$ are scaled, but it is not true if the weights $w_{t|i}$ are scaled.

Pfeffermann et al. (1998) took advantage of this fact to choose a scaling that minimises small-sample estimation bias. In our context we consider scaling the weights $w_{t|i}$ to construct the scaled weights $w_{t|i}^*$ as

$$w_{t|i}^* = t(i)\, w_{t|i} \Big/ \sum_{t=1}^{t(i)} w_{t|i},$$

where $t(i)$ is the last wave at which individual $i$ responds $(1 \le t(i) \le T)$.

Hence the average scaled weight $w_{t|i}^*$ for individual $i$ across waves $1, \ldots, t(i)$ is equal to one.
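A sketch of this scaling, with a hypothetical function name and made-up input: by construction, the scaled weights for each individual average to one over waves $1, \ldots, t(i)$.

```python
def scale_level1_weights(w_cond):
    """Scaled weights w*_t|i = t(i) * w_t|i / sum_{t=1}^{t(i)} w_t|i.

    w_cond holds w_t|i for waves 1..t(i), the waves up to the last response.
    """
    t_i = len(w_cond)  # t(i), the last wave at which individual i responds
    total = sum(w_cond)
    return [t_i * w / total for w in w_cond]


scaled = scale_level1_weights([1.0, 1.2, 1.5])  # made-up conditional weights, t(i) = 3
print([round(w, 3) for w in scaled])
```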

We now consider how to adapt the approach of Pfeffermann et al. (1998) to allow for possible serial correlation of the $\nu_{it}$ in Model B. We follow an approach similar to that in Hsiao (1986, section 3.7), which is based on observing that, if we know $\rho$, then Model B may be transformed to the form of Model A by

$$y_{it} - \rho y_{i,t-1} = (\beta_t - \rho\beta_{t-1}) + (1 - \rho) u_i + \varepsilon_{it}. \tag{14.14}$$

The estimation procedure involves two steps:

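Step 1. Eliminate the random effect $u_i$ by differencing the responses,

$$\Delta_{it} = y_{it} - y_{i,t-1}, \qquad i \in s_t, \quad t = 2, \ldots, T,$$

and estimate the linear regression model

$$\Delta_{it} = \delta_t + \gamma \Delta_{i,t-1} + \zeta_{it}$$

by OLS, weighted by the weights $w_{it}$, for observations $i$ in the attrition samples $s_t$, where the parameters $\delta_t$ are unconstrained. The regression uses waves $t = 3, \ldots, T$, since the lagged difference $\Delta_{i,t-1}$ requires responses at three consecutive waves. Under Model B, the least squares estimator $\hat\gamma$ of $\gamma$ is consistent for

$$\gamma = \frac{\operatorname{cov}(\Delta_{it}, \Delta_{i,t-1})}{\operatorname{var}(\Delta_{i,t-1})} = \frac{-(1 - \rho)\sigma_\varepsilon^2 / (1 + \rho)}{2\sigma_\varepsilon^2 / (1 + \rho)} = -\frac{1 - \rho}{2}, \qquad t = 3, \ldots, T.$$

Set $\hat\rho = 1 + 2\hat\gamma$.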
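Step 1 can be illustrated on simulated data. This minimal sketch assumes a balanced panel (so every individual is in every $s_t$), Gaussian errors, illustrative parameter values and randomly generated weights; none of these come from the BHPS analysis, and the function names are hypothetical. The unconstrained wave intercepts $\delta_t$ are handled by centring the differences within each wave before pooling, which gives the same slope as weighted OLS with wave-specific intercepts. With attrition, the rows for wave $t$ would be restricted to the attrition sample $s_t$.

```python
import random


def simulate_model_b(n, T, beta, s2_u, s2_e, rho, seed=3):
    """Balanced panel from (14.2)-(14.3), starting nu_i1 from its stationary law (14.4)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0.0, s2_u ** 0.5)
        nu = rng.gauss(0.0, (s2_e / (1.0 - rho ** 2)) ** 0.5)
        ys = [beta[0] + u + nu]
        for t in range(1, T):
            nu = rho * nu + rng.gauss(0.0, s2_e ** 0.5)  # AR(1) transitory effect
            ys.append(beta[t] + u + nu)
        panel.append(ys)
    return panel


def step1_rho(panel, weights):
    """Weighted OLS of D_it on D_i,t-1 with free wave intercepts; returns 1 + 2 * gamma_hat.

    Wave intercepts are absorbed by weighted centring within each wave (Frisch-Waugh-Lovell).
    """
    T = len(panel[0])
    sxy = sxx = 0.0
    for t in range(2, T):  # 0-based index: wave t needs y_t, y_{t-1} and y_{t-2}
        rows = [(ys[t] - ys[t - 1], ys[t - 1] - ys[t - 2], w) for ys, w in zip(panel, weights)]
        tot = sum(w for _, _, w in rows)
        my = sum(w * dy for dy, _, w in rows) / tot
        mx = sum(w * dx for _, dx, w in rows) / tot
        sxy += sum(w * (dx - mx) * (dy - my) for dy, dx, w in rows)
        sxx += sum(w * (dx - mx) ** 2 for _, dx, w in rows)
    gamma_hat = sxy / sxx
    return 1.0 + 2.0 * gamma_hat


n, T = 20000, 5
panel = simulate_model_b(n, T, beta=[0.0, 0.05, 0.1, 0.15, 0.2], s2_u=0.15, s2_e=0.06, rho=0.4)
wr = random.Random(11)
weights = [wr.uniform(0.5, 2.0) for _ in range(n)]
rho_hat = step1_rho(panel, weights)
print(round(rho_hat, 3))
```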

Step 2. Let $\tilde y_{it} = y_{it} - \hat\rho\, y_{i,t-1}$ and fit the model obtained from (14.14) for the transformed data,

$$\tilde y_{it} = \tilde\beta_t + \tilde u_i + \tilde\varepsilon_{it}, \tag{14.15}$$

using the approach of Pfeffermann et al. (1998), with the assumptions of Model A applying to the model in (14.15). The estimated variance of $\tilde u_i$ is then divided by $(1 - \hat\rho)^2$ to obtain the estimate $\hat\sigma_u^2$.

This two-step approach produces consistent estimators of the parameters of Model B, but the resulting standard errors of $\hat\sigma_u^2$ and $\hat\sigma_\varepsilon^2$ will not allow for uncertainty in the estimation of $\rho$.
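For Step 2, the sketch below substitutes a simple unweighted method-of-moments (one-way ANOVA) estimator for the weighted IGLS procedure of Pfeffermann et al. (1998), which is not reproduced here. It assumes a balanced, unweighted panel simulated from Model B with illustrative parameters, plugs in the true $\rho$ for the check, and uses hypothetical function names. The wave effects $\tilde\beta_t$ are removed by centring each transformed wave; the small loss of degrees of freedom from this centring is ignored.

```python
import random


def simulate_model_b(n, T, beta, s2_u, s2_e, rho, seed=3):
    """Balanced panel from (14.2)-(14.3), starting nu_i1 from its stationary law (14.4)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        u = rng.gauss(0.0, s2_u ** 0.5)
        nu = rng.gauss(0.0, (s2_e / (1.0 - rho ** 2)) ** 0.5)
        ys = [beta[0] + u + nu]
        for t in range(1, T):
            nu = rho * nu + rng.gauss(0.0, s2_e ** 0.5)
            ys.append(beta[t] + u + nu)
        panel.append(ys)
    return panel


def step2_variances(panel, rho_hat):
    """Transform with rho_hat as in (14.15), then ANOVA estimates under Model A.

    Returns (s2_u_hat, s2_e_hat); s2_u_hat divides var(u~_i) by (1 - rho_hat)^2.
    """
    tr = [[ys[t] - rho_hat * ys[t - 1] for t in range(1, len(ys))] for ys in panel]
    n, m = len(tr), len(tr[0])
    wave_means = [sum(row[t] for row in tr) / n for t in range(m)]  # absorbs beta~_t
    tr = [[row[t] - wave_means[t] for t in range(m)] for row in tr]
    unit_means = [sum(row) / m for row in tr]
    s2_e_hat = sum((y - mb) ** 2 for row, mb in zip(tr, unit_means) for y in row) / (n * (m - 1))
    s2_ut_hat = sum(mb ** 2 for mb in unit_means) / (n - 1) - s2_e_hat / m
    return s2_ut_hat / (1.0 - rho_hat) ** 2, s2_e_hat


panel = simulate_model_b(20000, 5, [0.0, 0.05, 0.1, 0.15, 0.2], 0.15, 0.06, 0.4)
s2_u_hat, s2_e_hat = step2_variances(panel, rho_hat=0.4)  # true rho plugged in for the check
print(round(s2_u_hat, 3), round(s2_e_hat, 3))
```

In practice `rho_hat` would come from Step 1, and, as the text notes, any standard errors computed this way would ignore the uncertainty in $\hat\rho$.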

Finally, we note that Pfeffermann et al. (1998) only allowed for the sample to be clustered into level 2 units. In the application in Section 14.4 the sampling design will also lead to geographical clustering of the sample individuals into primary sampling units. The procedure for standard error estimation proposed by Pfeffermann et al. (1998) therefore needs to be extended to handle this case.

We shall not, however, consider this extension here, presenting only point estimates for the multilevel modelling approach in the next section.

14.4. AN APPLICATION: EARNINGS OF MALE EMPLOYEES IN GREAT BRITAIN

In this section we apply the approaches set out earlier to fit random effects models to longitudinal data on the monthly earnings of male full-time employees in Great Britain for the period 1991–5, using data from the British Household Panel Survey (BHPS). The BHPS is a household panel survey, based on a sample of around 10 000 individuals. Data were first collected in 1991 and successive waves have taken place annually (Berthoud and Gershuny, 2000).

We base our analysis on the work of Ramos (1999). Like him, we consider only men over the first five waves of the BHPS and divide the men into four age cohorts in order to control for life cycle effects. These cohorts consist of men (i) born before 1941, (ii) born between 1941 and 1950, (iii) born between 1951 and 1960 and (iv) born after 1960.

The variable y is taken as the logarithm of earnings, with earnings being defined as the usual monthly earnings or salary payment before tax, for a reference period determined in the survey. We avoid the problem of zero earnings by defining the target population at wave t to consist of those men in the age cohorts who have positive earnings. It is thus possible for individuals to move in and out of the target population between waves. It is clearly plausible that the earnings behaviour of those moving in and out of the target population will differ systematically from those remaining in the target population. For simplicity, we shall, however, assume that the models defined in Section 14.1 apply to all individuals when they have positive earnings.

The panel sample was selected by stratified multistage sampling, with postal sectors as primary sampling units (PSUs). We use the standard linearisation approach to variance estimation for stratified multistage samples (e.g. SHS, p. 50). The BHPS involves a stratified sample of 250 PSUs. For the purpose of variance estimation, we approximate this stratified design as being defined by 75 strata, obtained by first breaking down each of 18 regional strata into 2 or 3 `major strata', defined according to the proportion of `heads of household' in professional/managerial positions, and then by breaking down each of these major strata into 2 `minor strata', defined according to the proportion of the population of pensionable age.

We first assess the fit of Models A and B (defined in Section 14.1) for each of the four cohorts. The results are presented in Table 14.1. We use goodness-of-fit tests based on the covariance structure approach of Section 14.2, with three choices of the matrix $V$ in (14.8):

OLS: $V = I$, the identity matrix;

GLS (iid): $V = V_{iid}$, as defined in (14.13);

GLS (complex): $V = V_c$, the linearisation estimator of the covariance matrix of $\hat A$, based upon (14.9), allowing for the complex design.

Table 14.1 Goodness-of-fit test statistics for Models A and B for four cohorts and three estimation methods.

| Cohort (when born) | Model A: OLS | Model A: GLS (iid) | Model A: GLS (complex) | Model B: OLS | Model B: GLS (iid) | Model B: GLS (complex) |
|---|---|---|---|---|---|---|
| Before 1941 | 11.3 | 13.0 | 15.1 | 9.2 | 8.7 | 10.0 |
| 1941–50 | 41.2^b | 39.0^b | 39.9^b | 28.4^b | 27.0^b | 29.5^b |
| 1951–60 | 17.2 | 39.0^b | 43.3^b | 6.5 | 15.5 | 16.5 |
| After 1960 | 29.1^b | 37.4^b | 35.5^b | 15.8 | 16.7 | 17.7 |

Notes: 1. Test statistics are weighted and are referred to the chi-squared distribution with 13 df for Model A and 12 df for Model B.
2. ^a significant at 5 % level; ^b significant at 1 % level.
3. OLS and GLS (iid) test statistics involve a first-order Rao–Scott correction.

For $V = V_c$, the test statistic is $X_W^2$ in (14.11), with the null distribution indicated in Section 14.2. For $V = I$ or $V = V_{iid}$, the values of $\hat\theta_{GLS}$ and $V_c$ in (14.11) are replaced by the corresponding values of $\hat\theta$ and $V$, and a first-order Rao–Scott adjustment is applied to the test statistic (SHS, Ch. 4). The same null distributions as for $V_c$ are used. Test statistics based upon second-order Rao–Scott approximations were also calculated and led to similar results. All of the test statistics are based on data from the attrition sample $s_5$ at wave 5, that is, for individuals who gave full interviews at each of the five waves. Longitudinal weights $w_{i5}$ were used, which allow both for unequal sampling probabilities and for differential attrition from nonresponse over the five waves. To allow for the changing population, the expression for the estimated covariance matrix in (14.7) was modified by including, in the estimation of the covariance between the log earnings at two waves, only those who reported positive earnings at each wave.

The values of the test statistics in Table 14.1 are referred to a chi-squared null distribution with 13 degrees of freedom in the case of Model A and with 12 degrees of freedom in the case of Model B. The results suggest that Model A provides an adequate fit for the cohort born before 1941 but not for the other cohorts and that Model B provides an adequate fit for all cohorts, except the one consisting of those born between 1941 and 1950.

The values of the test statistics vary according to the three choices of $V$. The differences between the values of the test statistics for the GLS (iid) and GLS (complex) choices of $V$ are not large, reflecting the fact that there is a large number of degrees of freedom for estimating the covariance matrix of $\hat A$ (relative to the dimension of the matrix) and that the two $V$ matrices tend not to be dramatically disproportionate. The value of the test statistic with $V$ as the identity matrix suggests a much better fit of both Models A and B for the 1951–60 cohort and a somewhat better fit for the cohort born after 1960. This may be because this test statistic is sensitive to different departures from the null hypothesis than the GLS test statistics are. The 1951–60 cohort is distinctive in having less variation among the estimated variances of log earnings over the five waves and, more generally, displays the least evidence of non-stationarity. Because of the high positive correlation between the elements of $\hat A$, the test statistic with $V$ as the identity matrix may be expected to attach greater `weight' to such departures from Model A than the GLS test statistics, and this may explain the noticeable difference in values for the 1951–60 cohort. Strong graphical evidence against Model A for this cohort is provided by Figure 14.1.

This figure plots the elements $\hat\Sigma_{tt'}$ of $\hat\Sigma$ in (14.7) against $|t - t'|$, and there is a clear tendency for the covariances to decline as the number of years between waves increases. This suggests that the insignificant value of the test statistic for Model A, with $V$ as the identity matrix, reflects lack of power.
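The construction behind Figure 14.1 amounts to grouping the distinct elements of $\hat\Sigma$ by lag. In this sketch (hypothetical names, a made-up matrix rather than BHPS estimates) the average element at each lag is also reported: under Model A all covariances at lags of one or more share the common value $\sigma_u^2$, whereas Model B implies a decline with lag.

```python
def covariances_by_lag(S):
    """Group the distinct elements S[t][t'] (t <= t') by |t - t'|; lag 0 holds the variances."""
    T = len(S)
    groups = {}
    for t in range(T):
        for s in range(t, T):
            groups.setdefault(s - t, []).append(S[t][s])
    return groups


def mean_by_lag(S):
    """Average element at each lag: flat for lags >= 1 under Model A, declining under Model B."""
    return {lag: sum(v) / len(v) for lag, v in sorted(covariances_by_lag(S).items())}


S_toy = [[0.30, 0.20, 0.10], [0.20, 0.30, 0.20], [0.10, 0.20, 0.30]]  # made-up matrix
print(mean_by_lag(S_toy))
```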

Estimates of the parameters in Model B are presented in Table 14.2 for the three cohorts for which Model B shows no significant lack of fit in Table 14.1. Estimates are presented for the same three choices of $V$ matrix as in Table 14.1. While the estimates based on the two GLS choices of $V$ are fairly similar, the OLS estimates, with $V$ as the identity matrix, can be noticeably different, especially for the 1951–60 cohort. The effect of the differences for the cohort born after 1960 is illustrated in Figure 14.2, in which the estimated variances and covariances from (14.7) are presented together with fitted lines joining the variances and covariances under Model B implied by the parameter estimates in Table 14.2. The lines for the GLS choices of $V$ are surprisingly low, unlike the OLS line, which passes through the middle of the points. Similar underfitting of the variances and covariances occurs for the other cohorts, and this finding may reflect downward bias in such estimates employing sample-based $V$ matrices, as discussed, for example, by Altonji and Segal (1996) and Browne (1984). The inversion of $V$ implies that the lowest variances tend to receive most `weight', so that the fitted line follows the lower envelope of the points rather than their centre. The potential presence of non-negligible bias suggests that choosing $V$ as the identity matrix may be preferable here for the purpose of parameter estimation, as concluded by Altonji and Segal (1996).

[Figure 14.1 Estimated variances and covariances for cohort born 1951–60. Vertical axis: variances and covariances; horizontal axis: years apart (0 to 4).]

Table 14.2 Parameter estimates for Model B for three cohorts using the covariance structure approach.

| Cohort (when born) | Estimator | $\rho$ | $\sigma_u^2$ | $\sigma_\varepsilon^2$ |
|---|---|---|---|---|
| Before 1941 | OLS | 0.37 (0.16) | 0.165 (0.028) | 0.049 (0.018) |
| | GLS (iid) | 0.35 (0.16) | 0.150 (0.024) | 0.034 (0.011) |
| | GLS (complex) | 0.32 (0.13) | 0.143 (0.022) | 0.034 (0.009) |
| 1951–60 | OLS | 0.56 (0.11) | 0.146 (0.021) | 0.048 (0.015) |
| | GLS (iid) | 0.85 (0.09) | 0.109 (0.047) | 0.026 (0.047) |
| | GLS (complex) | 0.85 (0.09) | 0.106 (0.044) | 0.026 (0.045) |
| After 1960 | OLS | 0.49 (0.08) | 0.155 (0.018) | 0.071 (0.014) |
| | GLS (iid) | 0.41 (0.07) | 0.154 (0.016) | 0.063 (0.010) |
| | GLS (complex) | 0.40 (0.07) | 0.150 (0.016) | 0.061 (0.009) |

Notes: 1. Standard errors in parentheses.
2. Estimates are weighted and based only on data for the attrition sample at wave 5.
3. The 1941–50 cohort is excluded because of lack of fit of Model B in Table 14.1.

[Figure 14.2 Estimated variances and covariances for cohort born after 1960 with values fitted under Model B. Vertical axis: variances and covariances; horizontal axis: years apart (0 to 4); fitted lines for OLS, GLS (iid) and GLS (complex).]

Table 14.3 shows for one cohort the effects of weighting, of the use of data from all attrition samples and of the use of the multilevel modelling approach of Section 14.3.

For the covariance structure approach, the impact of weighting is similar for all three choices of the matrix V. The fairly modest impact of weighting is expected here, since the BHPS weights do not vary greatly and are not strongly related to earnings.

The impact of using data from all attrition samples $s_1, \ldots, s_5$, not just from $s_5$, appears to be a little more marked than the impact of weighting. This may reflect differences between the earnings behaviour of men who leave the sample before 1995 and that of men who remain in the sample for all five waves. In particular, the earnings of those who leave may be less stable, leading to a reduction in the estimated correlation $\rho$. Control for possible informative attrition might be attempted by including covariates in the model.

Table 14.3 Parameter estimates for Model B for cohort born after 1960.

| Approach | Data and weighting | Estimator | $\rho$ | $\sigma_u^2$ | $\sigma_\varepsilon^2$ |
|---|---|---|---|---|---|
| Covariance structure | Attrition sample at wave 5 only, weighted | OLS | 0.49 | 0.155 | 0.071 |
| | | GLS (iid) | 0.41 | 0.154 | 0.063 |
| | | GLS (complex) | 0.40 | 0.150 | 0.061 |
| | Attrition sample at wave 5 only, unweighted | OLS | 0.45 | 0.166 | 0.078 |
| | | GLS (iid) | 0.37 | 0.161 | 0.068 |
| | | GLS (complex) | 0.35 | 0.156 | 0.066 |
| | All five attrition samples, weighted | OLS | 0.36 | 0.169 | 0.052 |
| | | GLS (iid) | 0.38 | 0.158 | 0.048 |
| | | GLS (complex) | 0.30 | 0.155 | 0.047 |
| Multilevel modelling | Attrition sample at wave 5 only | Weighted, unscaled | 0.41 | 0.169 | 0.041 |
| | | Weighted, scaled | 0.41 | 0.167 | 0.042 |
| | | Unweighted | 0.41 | 0.170 | 0.045 |
| | All five attrition samples | Weighted, unscaled | 0.43 | 0.167 | 0.043 |
| | | Weighted, scaled | 0.43 | 0.163 | 0.045 |
| | | Unweighted | 0.43 | 0.165 | 0.047 |


The results for the multilevel modelling approach in Table 14.3 are based upon the two-step method described in Section 14.3. The estimated value of $\rho$ is first determined, and then the estimates $\hat\sigma_u^2$ and $\hat\sigma_\varepsilon^2$ are obtained by the method of Pfeffermann et al. (1998), either with or without weights and, in the former case, with the weights either scaled or unscaled.

The impact of weighting on the multilevel approach is again modest, indeed somewhat more modest than for the covariance structure approach. This may be because a common estimate of $\rho$ is used here. Scaling the weights also has little effect. This may be because all the weights $w_{t|i}$ are fairly close to one in this application, and thus scaling has less of an impact than in the two-stage sampling application in Pfeffermann et al. (1998).

The differences between the estimates from the covariance structure approach and the corresponding multilevel modelling approaches are not especially large in Table 14.3 relative to the standard errors in Table 14.2.

Nevertheless, across all four cohorts and both models, the main differences in the estimates between methods were between the three choices of V matrix for the covariance structure approach and between the covariance structure and the multilevel approaches. The impact of weighting and the scaling of weights tended to be less important.

14.5. CONCLUDING REMARKS

It is often useful to include random effects in the specification of models for longitudinal survey data. In this chapter we have considered two approaches to allowing for complex survey designs and sample attrition when fitting such models. The covariance structure approach is particularly natural with survey data. The complex survey design and attrition are allowed for when making inference about the covariance matrix of the longitudinal responses. Modelling of the structure of this matrix may then proceed in a standard way. The second approach is to adapt standard multilevel modelling procedures, extending the approach of Pfeffermann et al. (1998).

The two approaches may be compared in a number of ways:

• The multilevel approach incorporates the different attrition samples more directly, although the possible creation of bias with unequal $w_{t|i}$ (for given $i$) when there are small numbers of level 1 units (i.e. small $T$), as discussed by Pfeffermann et al. (1998), may be a problem.

• The multilevel approach incorporates covariates more naturally, although the extension of the covariance structure approach to include covariates using LISREL models is well established.

• The covariance structure approach handles serial correlation more easily.

• The covariance structure approach generates goodness-of-fit tests and residuals at the level of variances and covariances. The multilevel approach generates unit-level residuals.


Finally, our application of the covariance structure approach to the BHPS data showed evidence of bias in the estimation of the variance components when using GLS with a covariance matrix $V$ estimated from the data. This accords with the findings of Altonji and Segal (1996). This evidence suggests that it is safer to specify $V$ as the identity matrix and to use Rao–Scott adjustments for testing.
