
2 Methods

2.2 Data cleaning, missing data and imputation

2.2.1 Data cleaning

All categorical variables were checked for data entry errors (i.e. categories that did not exist) and for inconsistencies with responses to other questionnaire items. Where responses were inconsistent (e.g. for parental smoking), an ‘inconsistent’ category was derived. This was later recoded to ‘missing’ to allow values to be imputed.

2.2.1.2 Continuous Variables

The continuous variables were checked for extreme values, and inconsistency between measures or between Waves. Each of the measures which required cleaning is described in turn.

Body Size Measures

Extreme values

Extreme values for height, weight, waist and hip were carefully examined to determine whether they were plausible. This was done by: comparing values, and their percentile scores, with other body size measures (e.g. did those with extremely high weights also have large waists); comparing values at Wave 1 and Wave 2 (e.g. were those who were very tall at Wave 1 still tall at Wave 2); and checking the sex and ethnicity of those with extreme values. In total, only one height measure, two weight measures, and one waist measure were implausible; these were recoded to missing.

Difference between waves

The differences in values between Waves were checked to ensure that the amount of weight/height/waist circumference gained (or lost) was plausible. Small decreases in height (<0.5cm) were found for 8 pupils and were assumed to be due to a slight measurement error at one or both Waves; in these cases, the Wave 1 height (the taller height) was assumed to be correct and replaced the Wave 2 height. Twenty people had a height difference of between -8.1cm and -0.5cm. All of the measured heights were plausible; the Wave 1 height was assumed to be correct, and the Wave 2 height was recoded to missing. One individual had a negative difference in height between the Waves much larger than any of the others; this White UK girl was measured at 158.1cm (centile 67.6) at Wave 1 and 125.4cm (centile 0) at Wave 2, giving a difference of -32.7cm. The Wave 1 height was deemed most plausible given the percentile values and the corresponding BMI values for each height (BMI of 24.2 at Wave 1 and 49.5 at Wave 2). The Wave 2 height was recoded to missing.
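The height rules described above can be summarised in a short sketch. This is an illustration only, not the code used in the study: the function name `clean_heights` and the use of `None` for a missing value are assumptions, and a decrease of exactly 0.5cm is treated as falling in the larger-decrease group, following the -8.1cm to -0.5cm range quoted in the text.

```python
from typing import Optional, Tuple

# Decreases smaller than this (in cm) were attributed to measurement error.
SMALL_DECREASE_CM = 0.5


def clean_heights(wave1_cm: float, wave2_cm: float) -> Tuple[float, Optional[float]]:
    """Return (Wave 1, Wave 2) heights after applying the decrease rules.

    Hypothetical helper: None stands for 'recoded to missing'.
    """
    decrease = wave1_cm - wave2_cm
    if decrease <= 0:
        # Height stayed the same or increased: keep both measures.
        return wave1_cm, wave2_cm
    if decrease < SMALL_DECREASE_CM:
        # Small decrease: the Wave 1 height replaces the Wave 2 height.
        return wave1_cm, wave1_cm
    # Larger decrease: Wave 1 is kept and Wave 2 is recoded to missing.
    return wave1_cm, None
```

Applied to the individual described above, `clean_heights(158.1, 125.4)` keeps the Wave 1 height and sets the Wave 2 height to missing.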

Differences in weight between the Waves were then examined. The mean weight difference between Wave 1 and Wave 2 was 9.79kg. However, the range was large: -91.9kg to 48.1kg. These values seemed implausible given that the mean time between the two measurements was only 2.6 years (range 2.2 to 3.5 years). Three pupils had lost 80kg or more (over 12.5 stone) but had increases in their hip and waist measures between Wave 1 and Wave 2. This suggested that no weight loss had occurred. Furthermore, the waist and hip measures suggested that the high Wave 1 weight measures were inaccurate; these were recoded to missing. The hip and waist measures of all those who had a weight loss of more than 12kg were checked. In most cases waist and hip measures had reduced, confirming that weight loss had occurred. However, in one case, a Black Caribbean girl who lost 38kg, the reduction in waist and hip measures was small and did not reflect such a large loss in weight. The Wave 1 weight measure was recoded to missing.

BMI was calculated for each individual from the cleaned height and weight measures. The measurements (height, weight, waist, hip, and BMI at both Wave 1 and Wave 2) were examined for the individuals with the highest and lowest BMIs at each Wave. At both Waves pupils with low (<14) and high (>35) BMIs had waist and hip data consistent with such BMI values.
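As a minimal sketch of this step, BMI is weight in kilograms divided by the square of height in metres. The function names below are hypothetical, `None` is assumed to represent a missing measure, and the example weight is illustrative rather than taken from the data. The thresholds of 14 and 35 are the ones quoted above for selecting low and high BMIs to inspect.

```python
from typing import Optional


def bmi(weight_kg: Optional[float], height_cm: Optional[float]) -> Optional[float]:
    """BMI = weight (kg) / height (m) squared; missing if either input is missing."""
    if weight_kg is None or height_cm is None:
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_flag(value: float) -> str:
    """Label a BMI as 'low' (<14), 'high' (>35) or 'typical' for inspection."""
    if value < 14:
        return "low"
    if value > 35:
        return "high"
    return "typical"


# Illustrative values only: a 60kg pupil who is 150cm tall.
example = bmi(60.0, 150.0)
```

Because heights were recorded in centimetres, the conversion to metres happens inside the function so that it is not forgotten at the call site.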

At both Waves, waist and hip were measured twice. Where the two measurements were >0.5cm apart, a third measure was taken. The mean waist was calculated as the average of the two closest measures. For waist, at Wave 1, 182 pupils had >0.5cm between their measures, of whom 28 had >2cm between their measures. Closer inspection of these pupils revealed that the majority had a difference of <5cm, and all were less than 8cm. No changes to the data were made. At Wave 2, thirty individuals had a mean waist where the difference between the two measures used to calculate the mean was greater than 0.5cm, but all were less than 1.5cm and deemed to be acceptable. Differences between the hip measures were also checked. At Wave 1, 148 people had a difference between measures of >0.5cm, and 23 of them had differences >2cm. Of these, the majority had a difference of <4cm. At Wave 2, eight pupils had a difference between their measures of >0.5cm, but all were less than 0.9cm apart.
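The measurement protocol can be illustrated with a small sketch. The function names are hypothetical and the example values are invented. How ties were resolved when two pairs of measures were equally close is not stated in the text; this sketch simply takes the first such pair.

```python
from itertools import combinations
from typing import Sequence

# A third measure was taken when the first two differed by more than this (cm).
REPEAT_THRESHOLD_CM = 0.5


def needs_third_measure(first_cm: float, second_cm: float) -> bool:
    """True when the first two measures are more than 0.5cm apart."""
    return abs(first_cm - second_cm) > REPEAT_THRESHOLD_CM


def mean_of_closest_pair(measures_cm: Sequence[float]) -> float:
    """Average of the two measures that are closest to each other."""
    if len(measures_cm) < 2:
        raise ValueError("at least two measures are required")
    a, b = min(combinations(measures_cm, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


# Invented example: the first two waist measures are 2cm apart, so a third is taken.
waist = [70.0, 72.0, 71.5]
mean_waist = mean_of_closest_pair(waist)  # uses 72.0 and 71.5
```

With only two measures the function reduces to an ordinary mean, so the same call works whether or not a third measure was needed.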

The difference between the hip and waist measurement for each individual at each Wave was calculated. At Wave 2, five pupils had waist measures which were greater than their hip measures. The difference ranged from 0.9cm to 4.8cm. These values were plausible and no changes were made.

Standard of living score

As previously described, the standard of living score could range from 0 to 19 items. It was decided that it was highly unlikely that any pupil could score 0 (as items such as bathroom/toilet were on the list). At Wave 1, one pupil reported no items and another only 1 item (a DVD player, which seemed implausible). The next lowest score was 5. At Wave 2, 15 pupils (all boys) reported having 0 items; the next lowest score was 8. Those reporting improbably low scores (0 or 1) were recoded to missing.

Physical Activity

The physical activity measures were checked systematically. At Wave 2 the first check was to ensure that a pupil had not reported doing an activity an extreme number of times or for an extreme total time in the previous 7 days. The range of values reported was examined to help inform sensible cut-offs. It was decided that it would be pragmatic to have the same cut-offs for all activities (although it is acknowledged that it is more plausible for some activities to have been done a greater number of times, or for a longer length of time, than others). Any pupil reporting >14hrs, or >14 sessions, of an activity in the previous 7 days was recoded to missing for both the time and number of sessions variables. It was decided that an activity session had to have lasted at least 5 minutes and no more than 240 minutes (or that the average length of a session had to fall within these limits for a pupil who reported doing an activity more than once). Therefore any pupils who reported an activity session which had a mean length outside of these limits had both their time and number of sessions recoded to missing.
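The per-activity cut-offs above could be applied along the following lines. This is a sketch under stated assumptions rather than the study's own code: the function name `clean_activity` is hypothetical, time is assumed to be stored in minutes, `None` stands for missing, and records with zero sessions are left unchanged because the text does not say how they were handled.

```python
from typing import Optional, Tuple

MAX_MINUTES = 14 * 60      # more than 14 hours of one activity in 7 days
MAX_SESSIONS = 14          # more than 14 sessions of one activity in 7 days
MIN_MEAN_MINUTES = 5       # shortest plausible mean session length
MAX_MEAN_MINUTES = 240     # longest plausible mean session length

Record = Tuple[Optional[float], Optional[float]]


def clean_activity(minutes: Optional[float], sessions: Optional[float]) -> Record:
    """Return (minutes, sessions), both set to None if either cut-off is breached."""
    if minutes is None or sessions is None:
        return minutes, sessions               # already missing: nothing to check
    if minutes > MAX_MINUTES or sessions > MAX_SESSIONS:
        return None, None                      # extreme time or extreme frequency
    if sessions > 0:
        mean_length = minutes / sessions
        if mean_length < MIN_MEAN_MINUTES or mean_length > MAX_MEAN_MINUTES:
            return None, None                  # implausible mean session length
    return minutes, sessions                   # assumption: zero sessions left as-is
```

For example, `clean_activity(600, 2)` returns `(None, None)` because the mean session length of 300 minutes exceeds the 240-minute limit, even though neither the total time nor the session count is extreme on its own.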

The time and number of session variables were then summed for all the activities in the list and checked for any extremes in these total variables. The total number of activity sessions allowed was 28, and the total length of time 1800 minutes (30 hours, i.e. 4 hours per weekday plus 5 hours per weekend day). Where pupils had values greater than these for either measure, both were recoded to missing.
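The arithmetic behind the 1800-minute limit, and the check on the totals, can be sketched as follows. The function name `clean_totals` is hypothetical, `None` stands for missing, and the sketch assumes every activity passed in has non-missing time and session values; how missing per-activity values entered the totals is not described in the text.

```python
from typing import Iterable, Optional, Tuple

MAX_TOTAL_SESSIONS = 28
# 4 hours on each of 5 weekdays plus 5 hours on each of 2 weekend days, in minutes.
MAX_TOTAL_MINUTES = (5 * 4 + 2 * 5) * 60


def clean_totals(
    activities: Iterable[Tuple[float, float]]
) -> Tuple[Optional[float], Optional[float]]:
    """Sum (minutes, sessions) over activities; both totals go missing if either is extreme."""
    total_minutes = 0.0
    total_sessions = 0.0
    for minutes, sessions in activities:
        total_minutes += minutes
        total_sessions += sessions
    if total_minutes > MAX_TOTAL_MINUTES or total_sessions > MAX_TOTAL_SESSIONS:
        return None, None
    return total_minutes, total_sessions
```

Note that a pupil can pass every per-activity check and still exceed the totals, which is why this second pass is applied after the activity-level cleaning.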

For the simpler Wave 1 physical activity measure, the total number of sessions was also set at 28 to match Wave 2. Pupils who reported more than this had their value recoded to missing.

2.2.2 Extent of Missing Data