
2.4 Non-response in Longitudinal Data

2.4.3 Longitudinal Weights

Another approach to attenuating non-response bias in survey data is to use sampling weights adjusted according to the response distribution. These adjusted weights compensate for unequal probabilities of selection and for the unplanned non-response that occurs in surveys (Kalton and Bryk, 2000). In panel surveys, the weights usually take the form of cross-sectional and longitudinal weights, which depend on the patterns of wave non-response and attrition. Because there are several measurement occasions, the weighting process in panel surveys is inevitably more complex than in cross-sectional surveys.

When a set of longitudinal weights is not released with the panel data, it can be constructed. There are different methods for calculating these weights, and the choice of which type of longitudinal weight to use depends on the objective of the analysis. They are usually calculated either to adjust for both wave non-response and attrition patterns or to adjust for attrition patterns only (Lepkowski, 1989). The objective of using longitudinal weights is to compensate for the data loss at each occasion (Kalton and Bryk, 2000). Therefore, there is a different set of weights for every occasion, which adjusts the responding patterns to compensate for the non-responding patterns (Lepkowski, 1989). The longitudinal weights are usually calculated for the set of respondents, and the non-response cases are either eliminated from the data set or assigned a weight of zero.

The sets of longitudinal weights that are calculated to account for both the wave non-response and the attrition patterns are the most complex type. There will be up to $2^T$ patterns of non-response in a panel data set with $T$ occasions. This would require the construction of up to $2^T - 1$ sets of longitudinal weights to allow the analysis of data for all possible combinations of the $T$ occasions. The set of longitudinal weights that accounts only for the attrition patterns, in turn, is the simplest. For example, in a longitudinal analysis that includes data from the first occasion to occasion $t$, only the set of weights at occasion $t$ is needed, which is adjusted to compensate for sample losses in all previous occasions (Kalton and Bryk, 2000). For that, only the set of individuals present from the first occasion to occasion $t$ needs to be considered in the analysis. However, this results in the elimination of valid data. Lepkowski (1989) suggested modifying the wave non-response patterns so that they are expressed as attrition patterns. This modification results in fewer data being eliminated when the attrition weights are calculated. However, it ignores the possibility that wave non-respondents might be fundamentally different from those who leave the survey prematurely.
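The pattern counts above can be checked by direct enumeration. The following is a minimal sketch, not taken from the cited sources: the function names are hypothetical, responses are coded 1 (responded) and 0 (missing), and `to_attrition` reflects only one assumed reading of the modification attributed to Lepkowski (1989), namely treating every occasion after the first missed one as missing.

```python
from itertools import product

def response_patterns(T):
    """All 2**T response patterns over T occasions (1 = responded, 0 = missing)."""
    return list(product((1, 0), repeat=T))

def is_attrition(pattern):
    """True when the pattern is monotone: once a unit is missing it never returns."""
    return all(a >= b for a, b in zip(pattern, pattern[1:]))

def to_attrition(pattern):
    """Assumed reading of the modification: truncate at the first missed occasion."""
    out, dropped = [], False
    for r in pattern:
        dropped = dropped or r == 0
        out.append(0 if dropped else 1)
    return tuple(out)

T = 4
patterns = response_patterns(T)
n_patterns = len(patterns)                             # 2**T patterns
n_weight_sets = 2 ** T - 1                             # non-empty combinations of occasions
n_attrition = sum(is_attrition(p) for p in patterns)   # T + 1 monotone patterns
```

Under this coding, a unit with pattern (1, 1, 0, 1) is a wave non-respondent; after truncation it becomes (1, 1, 0, 0) and can be handled with the attrition weights, at the cost of discarding its response at the fourth occasion.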

Panel surveys, such as the BHPS (Taylor et al., 2009), LFS (ONS, Office for National Statistics, 2009), Panel Study of Income Dynamics (PSID) (Gouskova et al., 2008), Survey of Income and Program Participation (SIPP) (Fuller and An, 1996; Kobilarcik and Singh, 1996; Allen and Petroni, 1994) and Survey of Labour and Income Dynamics (SLID) (LaRoche, 2003; Hunter et al., 1992), use different methods to calculate the longitudinal weight adjustments. These methods vary in complexity, but all begin by defining a base weight, which is usually the cross-sectional weight for the first occasion, already adjusted to account for initial wave non-response.

The simplest method involves classifying respondents and non-respondents into weighting cells or classes according to information that is available for both groups. The non-response adjustment factor is calculated as the inverse of the response rate in each class (LaRoche, 2003). This response rate is the weighted sum over the respondent sample divided by the weighted sum over the eligible sample in that class. Respondents have their base weights multiplied by this factor and non-respondents receive a weight of zero. This method often uses decision trees to define the cells and is usually followed by some kind of calibration method (Kalton and Bryk, 2000).
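The weighting-cell adjustment described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular survey: the function name, the dictionary keys (`cell`, `base_weight`, `responded`) and the toy data are hypothetical, and every cell is assumed to contain at least one respondent.

```python
from collections import defaultdict

def weighting_class_adjust(units):
    """Return adjusted weights; units are dicts with 'cell', 'base_weight', 'responded'."""
    eligible = defaultdict(float)    # weighted sum of the eligible sample per cell
    responding = defaultdict(float)  # weighted sum of the respondents per cell
    for u in units:
        eligible[u["cell"]] += u["base_weight"]
        if u["responded"]:
            responding[u["cell"]] += u["base_weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            # Adjustment factor = inverse of the weighted response rate in the cell.
            factor = eligible[u["cell"]] / responding[u["cell"]]
            adjusted.append(u["base_weight"] * factor)
        else:
            adjusted.append(0.0)     # non-respondents receive weight zero
    return adjusted

# Hypothetical toy sample: two cells with different response rates.
sample = [
    {"cell": "A", "base_weight": 10.0, "responded": True},
    {"cell": "A", "base_weight": 10.0, "responded": True},
    {"cell": "A", "base_weight": 20.0, "responded": False},
    {"cell": "B", "base_weight": 5.0, "responded": True},
    {"cell": "B", "base_weight": 15.0, "responded": False},
]
adjusted_weights = weighting_class_adjust(sample)
```

In this toy sample the weighted response rate is 0.5 in cell A and 0.25 in cell B, so the factors are 2 and 4. Within each cell the adjusted weights of the respondents sum to the base-weight total of the eligible sample, which is the sense in which respondents stand in for non-respondents.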

Another commonly used method involves fitting logistic regression models for the propensity of being a respondent (or non-respondent) (Hunter et al., 1992; Rizzo et al., 1996). The outcome variable for the logistic model is the response indicator, such as $R_{ti}$ from the previous section. The covariates are usually categorical variables taken from the data of the previous occasions. The model can include main effects and interactions between these variables and also sampling design variables (Lepkowski, 1989). The non-response adjustment factor is calculated as the inverse of the predicted probabilities for the respondents and applied to their base weights. Non-response bias is expected to be reduced once the model controls for the covariates that are related to the response propensity (Kalton and Bryk, 2000). When only categorical variables are used, this method works in a similar way to the adjustment cell method.
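A minimal sketch of the response-propensity adjustment follows. It makes several assumptions that are not in the cited sources: the logistic model is fitted unweighted by plain gradient ascent (a production analysis would normally use a statistical package and might include the base weights in the fit), the function names are hypothetical, and the toy data contain an intercept and a single categorical covariate coded as a dummy variable.

```python
import math

def predict(beta, x):
    """Predicted response probability under the logistic model."""
    eta = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            resid = yi - predict(beta, xi)
            for j in range(p):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_adjust(X, responded, base_weights):
    """Divide respondents' base weights by their predicted response probability."""
    beta = fit_logistic(X, responded)
    return [w / predict(beta, xi) if r else 0.0
            for xi, r, w in zip(X, responded, base_weights)]

# Hypothetical toy data: columns are (intercept, group dummy).
X = [[1, 0]] * 4 + [[1, 1]] * 4
responded = [1, 1, 1, 0, 1, 1, 0, 0]   # response indicator at occasion t
base_weights = [10.0] * 8
adjusted_weights = propensity_adjust(X, responded, base_weights)
```

Because the only covariate here is categorical and the model is saturated, the fitted propensities reproduce the response rates of the two groups (0.75 and 0.5), so the result coincides with a weighting-cell adjustment on the same groups.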

The use of longitudinal weights is advisable in order to compensate for sample losses between sequences of occasions. This practice ensures the sample is representative of the population at the time the sample was selected (LaRoche, 2003). Ignoring the non-response in a longitudinal analysis might yield biased estimates, as this implies the assumption of equal distributions of the outcome variable for respondents and non-respondents (Pfeffermann and Sikov, 2008). Non-response reduces the sample size and in longitudinal surveys it might have an effect on the availability of the longitudinal component. The use of methods that adjust for non-response reduces the bias in the estimation of population parameters while preserving the relationships between the survey variables, provided that the elimination of the available cases is not significant (Kalton and Bryk, 2000).

In document Methods for analysing complex panel data using multilevel models with an application to the Brazilian labour force survey (Page 63-65)