
CHAPTER 3: RESEARCH DESIGN AND METHODS

3.6 Analytic Techniques

3.6.6 Missing data

The sample selection procedure outlined in Section 3.4 resulted in a minimal amount of missing data in the final sample. Contemporary missing data techniques require the researcher to establish the extent to which data is missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR; Enders, 2010) in order to justify the selected treatment of the missing data. The most likely and favourable scenario in real-world research settings is MAR, where missingness is not a result of the value that would have been provided if it were not missing, but is likely related to other variables or participant characteristics. This section describes the amount and treatment of missing data in each of the variable areas of self-regulation, parenting and behavioural problems.

Across the self-regulation indicator variables (sleep regulation, reactivity and persistence), there were 4 to 36 cases with missing data depending on the item and the wave. At most this represents 1.3% of the total sample and so was considered negligible. Similarly, across the parenting variables, there were 5 to 33 cases with missing data, depending on the item. At most this represents 1.1% of the total sample and so was again considered negligible. Missing data on the maternal depression items was also present for less than 1% of the sample. Tests for the extent to which missingness on these items was a function of other variables in the dataset were not conducted due to the very small amount of missing data present.
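The percentages above can be checked with a short calculation. The following is a minimal sketch that assumes the "total sample" is the study sample of 2880 cases reported in this section; the function name `missing_rate` and the dictionary labels are illustrative only.

```python
def missing_rate(n_missing, n_total):
    """Return the share of cases with missing data, as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_missing / n_total


# Assumption: the total sample is the study sample of 2880 cases.
N_TOTAL = 2880

# Largest missing-case counts reported for each variable area.
worst_case = {"self-regulation": 36, "parenting": 33}

for area, n_missing in worst_case.items():
    rate = missing_rate(n_missing, N_TOTAL)
    print(f"{area}: at most {rate:.2f}% of cases missing")
```

Under that assumption, both worst-case rates fall well below 2% of the sample, in line with the figures reported above.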

A greater extent of missing data was found for the outcome measures taken from Wave 4 of the dataset, as complete Wave 4 data was not a requirement for sample selection. A total of 123 cases within the study sample of 2880 did not have mother-reported behaviour problems or maternal mental health data for Wave 4 due to sample attrition within the longitudinal study. Differences between these 123 cases and the rest of the study sample with complete data were tested and no differences were found on Aboriginal or Torres Strait Islander status, main language other than English, child gender, self-regulation indicators or maternal history of depression. However, there were differences between those with complete and missing data at Wave 4 with regard to socio-economic disadvantage (SED), with those with missing data having a significantly higher SED score at Wave 1 than those with complete data (F = 15.4, df = 1, p < .001). Given that a relationship between self-regulation indicators and mother-reported behaviour problems at Wave 4 was hypothesised, the lack of relationship between the self-regulation variables and missingness on the outcome measure at Wave 4 was taken as a strong indication that the missing data would be unlikely to be predicted by the actual behaviour problem data if it were available. That is, it is unlikely that parents who had children with significantly higher or lower levels of behaviour problems comprised the participants that did not complete Wave 4 data. Therefore it was reasonable to assume that the data was MAR, that is, unlikely to be missing due to the scores that would have been provided, rather than MNAR (Enders, 2010). The same assumption was made in relation to the missing maternal mental health data because there was no relationship between history of maternal depression and missing maternal mental health data at Wave 4.
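The comparison of cases with missing and complete Wave 4 data can be illustrated with a one-way analysis of variance. The reported F statistic with df = 1 is consistent with such a test, although the text does not name the procedure used, so the following is a sketch under that assumption. The function name and the SED scores are hypothetical and are not drawn from the study data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Wave 1 SED scores, grouped by Wave 4 missingness status.
sed_complete = [1.0, 2.0, 3.0]
sed_missing = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f([sed_complete, sed_missing])
print(f"F = {f_stat:.2f}, df = {df_b}, {df_w}")
```

A significant F on an observed variable such as SED indicates that missingness depends on that variable, which is compatible with MAR but rules out MCAR.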

A total of 564 cases within the study sample of 2880 did not have teacher-reported behaviour problem data for Wave 4 due partly to sample attrition within the longitudinal study and partly to lower teacher response rates. Differences between these cases and the rest of the study sample with complete data were tested and no differences were found on Aboriginal and Torres Strait Islander status, main language other than English, SED, child age, child gender, or maternal history of depression. It was assumed that the teacher-reported behaviour problem data was at least MAR.

Working on the assumption that all missing data was MAR, a number of approaches to the treatment of missing data were considered. These were the use of maximum likelihood estimators in the estimation of models in Mplus, as these estimators are able to handle missing data without imputation; multiple imputation prior to analysis; and expectation maximisation (EM) imputation prior to analysis. The estimator chosen for the substantive analyses conducted in Mplus for this thesis was largely the WLSMV estimator, selected to accommodate the ordinal categorical nature of many of the variables (with the exception of the final study, which used the maximum likelihood estimator). This precluded the use of the maximum likelihood estimator to directly handle missing data. Multiple imputation was considered too computationally burdensome for the large dataset and large number of variables in this study. Therefore EM imputation was completed in the Statistical Package for the Social Sciences (SPSS) prior to the substantive analyses in order to allow a complete dataset to be analysed in the Mplus models. In the case of the single dichotomous item of history of maternal depression, missing cases (less than 1%) were imputed with the most common response of “no” (to a history of depression).
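The logic of EM imputation can be sketched for the simplest case of two variables, one fully observed and one with missing values. This is an illustration of the general technique under a bivariate normal model, not a reproduction of the SPSS procedure used in this thesis, which operates on many variables at once. The function name `em_impute_bivariate` and the example values are hypothetical.

```python
def em_impute_bivariate(xs, ys, tol=1e-10, max_iter=500):
    """Fill missing ys (None) using EM under a bivariate normal model.

    xs must be fully observed; ys may contain None. Returns a new list in
    which each None is replaced by its conditional expectation given x.
    """
    n = len(xs)
    observed = [y for y in ys if y is not None]
    if n != len(ys) or len(observed) < 2:
        raise ValueError("need equal-length inputs and at least two observed ys")

    # x is complete, so its mean and variance are fixed throughout.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    if s_xx == 0:
        raise ValueError("xs must vary")

    # Starting values from the observed ys; the covariance starts at zero.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((y - mu_y) ** 2 for y in observed) / len(observed)
    s_xy = 0.0

    for _ in range(max_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy

        # E-step: expected y and y^2 for every case given current estimates.
        sum_y = sum_xy = sum_yy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                e_y = mu_y + beta * (x - mu_x)
                e_yy = e_y ** 2 + resid_var
            else:
                e_y, e_yy = y, y ** 2
            sum_y += e_y
            sum_xy += x * e_y
            sum_yy += e_yy

        # M-step: update the estimates from the expected sufficient statistics.
        new_mu_y = sum_y / n
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        new_s_yy = sum_yy / n - new_mu_y ** 2

        change = max(
            abs(new_mu_y - mu_y), abs(new_s_xy - s_xy), abs(new_s_yy - s_yy)
        )
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break

    beta = s_xy / s_xx
    return [mu_y + beta * (x - mu_x) if y is None else y for x, y in zip(xs, ys)]


# Hypothetical scores: x observed for all cases, y missing for the last two.
x_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_scores = [2.0, 4.5, 5.5, 8.0, None, None]
completed = em_impute_bivariate(x_scores, y_scores)
print(completed)
```

Because the imputed values depend only on observed variables, the approach is appropriate under MAR. It produces a single completed dataset and so, unlike multiple imputation, does not carry the uncertainty of the imputed values into later standard errors.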

In document Self-regulation from birth to age seven : associations with maternal mental health, parenting, and social, emotional and behavioural outcomes for children (Page 108-111)