

3.2 Prevalence and co-occurrence of child maltreatment and household

3.2.1 Methods for missing data problems

Responses for child maltreatment and household dysfunction measures: Retrospective measures of child maltreatment and household dysfunction were collected at age 45y from 9,310 individuals. Participants in the 45y survey differed from non-participants with respect to prospective neglect and household dysfunction measures (as discussed in §2.6). Prospective indicators of neglect and household dysfunction were available for 11,202 to 15,583 participants. Among them, 6,294 had complete data for all 18 measures of child maltreatment. Differing response rates may affect prevalence estimates, as participants with complete data may differ in the outcome measure from those with missing data (selection bias).

To reduce the possible bias due to attrition and missing data, I applied multiple imputation and compared prevalence estimates in four different samples (Figure 3.1):

Sample 1: participants in the original birth cohort alive at age 45y (n = 9,310 for retrospective measures and 11,202 to 15,583 for prospective measures);

Sample 2: participants of the 45y survey (n = 9,310 for retrospective measures and 6,852 to 8,868 for prospective measures);

Sample 3: participants in the original birth cohort alive at age 45y, missing data imputed (n = 17,313);

Sample 4: participants of the 45y survey, missing data imputed (n = 9,310).

Figure 3.1: Data samples used to estimate prevalence and co-occurrence of child maltreatment and household dysfunction

Retrospective measures: Total (1), n = 17,313; 45y sample (2 & 4), n = 9,310; Total imputed (3), n = 17,313.

Prospective measures: Total (1), n = 17,313; 45y sample (2), n = 9,310; Total imputed (3), n = 17,313; 45y sample imputed (4), n = 9,310.

Legend: observed data; missing data; imputed data.

Graphical representation of data in each sample. *Data samples: 1) participants alive at age 45y, 2) participants of the 45y survey, 3) participants alive at age 45y with imputed data, and 4) participants of the 45y survey with imputed data.

Multiple imputation: There are several missing data mechanisms (Box 3.1). In an imputation model, all partially observed variables are treated as response variables and data are assumed to be missing at random (MAR), conditional on some given covariates. These covariates should include variables that predict the probability of missingness [374] and also the values of the response measures [375]. First, multiple copies of the dataset are generated, and missing values are replaced by imputed values using the chained equation method [376]. To account for the uncertainty in estimating missing values, the imputed values are drawn from their predictive distribution based on the observed data (average within-imputation variance). Second, the model of interest (i.e. the model to estimate the prevalence of a maltreatment measure) is fitted to each imputed dataset, and results are combined using Rubin's combination rules [377]. This accounts for the variability in results between the imputed datasets, and thus the uncertainty associated with the missing values (between-imputation variance) [378].
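The combination step can be sketched in a few lines of Python. This is a minimal illustration of Rubin's rules for a single prevalence estimate, not the software used in this thesis; the function name pool_rubin, the five prevalence values and the sample size of 1,000 are all hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical prevalence estimates from m = 5 imputed datasets
prevalences = [0.10, 0.12, 0.11, 0.09, 0.13]
n = 1000  # hypothetical number of participants in each imputed dataset
variances = [p * (1 - p) / n for p in prevalences]

pooled, se = pool_rubin(prevalences, variances)
print(f"pooled prevalence = {pooled:.3f}, standard error = {se:.4f}")
```

The pooled standard error is larger than the one obtained from any single imputed dataset, because the between-imputation term adds the uncertainty due to the missing values.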

Advantages and methodological issues of multiple imputation: The main advantages of multiple imputation are that it maximises power by retaining all observed data, whilst correcting for selection bias by including all the predictors of missing data in the imputation model [379]. Unlike ad hoc methods, or a single imputation model, multiple imputation accounts for the uncertainty associated with estimating missing values, thus increasing the precision of estimates [380]. Furthermore, multiple imputation has been shown to be robust to departures from normality assumptions, and provides adequate results for small sample sizes [379].

However, it is not possible to verify the MAR assumption required by multiple imputation. Similar estimates from the complete data and multiply imputed analyses may indicate that the MAR assumption is likely to have been met [380]. Even when estimates obtained from the two analyses differ, multiple imputation is an improvement on complete data analyses as it adjusts for missing data patterns [381, 382]. There is a debate about the upper limit of missing data which can be imputed and still provide reliable estimates. Multiple imputation tends to give less biased results than complete data analyses in studies where 50-80% of participants have missing observations [383-385]. The STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) Statement advises that if a large fraction of the data are imputed, observed and imputed values should be compared [386].

Box 3.1: Missing data mechanisms

Missing completely at random (MCAR) assumes that the probability of an observation being missing is unrelated to both the unobserved value itself and the values of other variables in the dataset. Thus there are no systematic differences between the missing and observed values.

Missing at random (MAR) assumes that the probability of an observation being missing can be predicted by other observed measurements and is unrelated to the unobserved value itself after controlling for other variables in the analyses.

Missing not at random (MNAR) assumes that missing data are systematically different from the observed data, even after accounting for observed data. In such cases, the reason for values being missing is dependent on the unseen observations themselves.
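The three mechanisms described in Box 3.1 can be illustrated with a small simulation. The sketch below uses invented data rather than the cohort: x stands for any fully observed binary covariate, y for a binary measure with missing values, and every probability is an assumption chosen only to make the contrast visible.

```python
import random

rng = random.Random(0)
n = 100_000

# x: fully observed binary covariate; y: binary measure of interest.
# All probabilities here are made up for illustration.
data = []
for _ in range(n):
    x = rng.random() < 0.3
    y = rng.random() < (0.30 if x else 0.10)
    data.append((x, y))


def observed_prevalence(p_missing):
    """Prevalence of y among records whose y value remains observed.

    p_missing(x, y) gives the probability that y is missing for a record.
    """
    kept = [y for x, y in data if rng.random() >= p_missing(x, y)]
    return sum(kept) / len(kept)


true_prev = sum(y for _, y in data) / n
mcar = observed_prevalence(lambda x, y: 0.4)                # unrelated to x and y
mar = observed_prevalence(lambda x, y: 0.6 if x else 0.2)   # depends on observed x only
mnar = observed_prevalence(lambda x, y: 0.6 if y else 0.2)  # depends on y itself

print(f"true {true_prev:.3f}  MCAR {mcar:.3f}  MAR {mar:.3f}  MNAR {mnar:.3f}")
```

Under MCAR the complete data estimate stays close to the true prevalence. Under MAR and MNAR it is biased in this simulation, but only under MAR can the bias be corrected using the observed covariate.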

In Chapter 4, I applied multiple imputation for child maltreatment and household dysfunction measures to a target sample of all participants alive at age 45y (n = 17,313). Variables associated with the probability of missing observations were identified previously by Atherton et al. [338]. The following measures were used in the imputation model: ethnicity, social class at birth, lone-mother household and reading ability at age 7y. Child maltreatment and household dysfunction measures were also incorporated in the imputation model, as they predicted missingness and were used in subsequent analyses. The relationships between child maltreatment and household dysfunction measures were assessed using complete and multiply imputed datasets. These results were compared to indicate whether MNAR was present. Findings were similar, and analyses from the complete dataset are presented (Chapter 4).
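The comparison of complete data and multiply imputed estimates can be sketched as follows. This is a deliberately simplified stand-in for the chained equation method: it imputes one binary measure within strata of one fully observed binary covariate, drawing each stratum's prevalence from a Beta posterior with a uniform prior. The simulated records, the covariate x, the number of imputations (m = 20) and all probabilities are assumptions for illustration, not values from the cohort.

```python
import math
import random
from statistics import mean, variance

rng = random.Random(1)

# Simulated records: y is missing more often when x is true (MAR given x).
records = []
for _ in range(20_000):
    x = rng.random() < 0.3
    y = rng.random() < (0.30 if x else 0.10)
    missing = rng.random() < (0.6 if x else 0.2)
    records.append((x, None if missing else y))


def prevalence(values):
    """Proportion positive and its sampling variance."""
    p = sum(values) / len(values)
    return p, p * (1 - p) / len(values)


# Complete data analysis: drop records with a missing y.
cc_est, cc_var = prevalence([y for _, y in records if y is not None])


def impute_once():
    """Create one imputed dataset: draw each stratum's prevalence from its
    Beta posterior, then draw the missing y values from that prevalence."""
    p_draw = {}
    for level in (False, True):
        obs = [y for x, y in records if x == level and y is not None]
        p_draw[level] = rng.betavariate(1 + sum(obs), 1 + len(obs) - sum(obs))
    return [y if y is not None else rng.random() < p_draw[x] for x, y in records]


m = 20
results = [prevalence(impute_once()) for _ in range(m)]
estimates = [est for est, _ in results]
mi_est = mean(estimates)
# Rubin's rules: within-imputation plus inflated between-imputation variance
mi_var = mean(var for _, var in results) + (1 + 1 / m) * variance(estimates)

print(f"complete data: {cc_est:.3f} (SE {math.sqrt(cc_var):.4f})")
print(f"imputed:       {mi_est:.3f} (SE {math.sqrt(mi_var):.4f})")
```

In this simulation the two estimates differ because missingness was made to depend strongly on x. Where the two analyses agree, as reported for the cohort data, the complete data results are less likely to be distorted by the missing data pattern.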

In document Investigating the effects of child maltreatment and household dysfunction on child physical development in a British birth cohort (Page 80-83)