2.2 Data sources
2.2.1 UK Household Longitudinal Study (UKHLS) dataset
The UKHLS data used in Chapters 4 and 5 of the present thesis came from the mainstage dataset. Data from the innovation panel were also examined, to assess how comparable the seven separate sleep characteristics generated from the bespoke sleep module items in the mainstage questionnaire were with answers to the validated Pittsburgh Sleep Quality Index (PSQI), on which the UKHLS sleep items were based and which participants in the innovation panel also completed (see Appendix, Section 8.1.2, page 284). The mainstage dataset of the UKHLS was chosen as the basis from which to study sleep patterns in the UK for three broad reasons:
First, the mainstage study’s sampling frame was specifically designed to be broadly representative of the UK population, comprising participants in more than 50,000 households from England, Scotland, Wales and Northern Ireland. The mainstage study comprised four separate sub-samples of UKHLS participants: the general population sub-sample; the general population comparison sub-sample; the ethnic minority boost sub-sample; and the (original) British Household Panel (BHP) survey sub-sample. The proportions and sizes of these sub-samples differed. To ensure the generalisability of the mainstage study (i.e. the ability of the final sample to represent the UK population, given the different probabilities of selection experienced by participants between, and within, the several smaller sub-samples, and the potential for selection bias; Gundi, 2016), the UKHLS research team developed special weighting variables for researchers to include in their analyses (i.e. n_indpxub_lw and n_indpxub_xw).⁴
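As a minimal illustration of what such a weighting variable does in practice, the Python sketch below compares an unweighted and a weighted proportion. The records, the outcome name `short_sleep` and the weight values are invented for illustration; only the variable name n_indpxub_xw comes from the text above, and the calculation is a plain weighted mean rather than the estimation procedure recommended by the UKHLS team.

```python
# Hypothetical respondent records: 'short_sleep' is an invented 0/1 outcome and
# the weights are made-up values standing in for a cross-sectional weight.
records = [
    {"short_sleep": 1, "n_indpxub_xw": 0.5},
    {"short_sleep": 0, "n_indpxub_xw": 1.5},
    {"short_sleep": 1, "n_indpxub_xw": 1.0},
    {"short_sleep": 0, "n_indpxub_xw": 1.0},
]


def unweighted_proportion(rows, outcome):
    """Simple share of rows in which the outcome equals 1."""
    return sum(r[outcome] for r in rows) / len(rows)


def weighted_proportion(rows, outcome, weight):
    """Weighted share: each row counts in proportion to its weight."""
    total_weight = sum(r[weight] for r in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(r[outcome] * r[weight] for r in rows) / total_weight


p_unweighted = unweighted_proportion(records, "short_sleep")
p_weighted = weighted_proportion(records, "short_sleep", "n_indpxub_xw")
print(p_unweighted, p_weighted)
```

When the two figures are close, as the thesis reports for its own analyses, presenting unweighted results loses little; when they diverge, the weighted figure is the one intended to reflect the target population.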
In summarising the sub-sampling technique of the mainstage UKHLS design, it should be emphasised that the ‘general population’ sub-sample was designed to be representative of the UK population by stratifying the UK on the basis of postcode sectors, with household addresses chosen randomly, using a multistage technique, within each of the postcode sectors. The general population comparison sub-sample was a smaller sample, randomly selected from within the larger general population sample. The ethnic minority boost sub-sample was designed to increase the numbers of participants from key ethnic minority groups, specifically the Indian, Pakistani, Bangladeshi, Caribbean and African communities. This approach aimed to ensure that the UKHLS had access to data from at least 1,000 households in each of these ethnic groups. To this end, the postcode sectors with the highest proportion of addresses where ethnic minorities were known to reside were identified, and these sectors were combined to generate sampling strata. Households within these strata were then selected through multistage random selection to ensure that a similar number of participants were recruited from each ethnic group. Finally, the ‘original’ BHP sub-sample comprised participants who had been enrolled in the final wave of the BHP study (in many respects, the precursor to the UKHLS) and who had agreed to participate in the UKHLS; these participants were incorporated into the UKHLS from Wave 2 onwards (University of Essex. Institute for Social and Economic Research and NatCen, 2015). The use of random, complex, multistage sampling (a technique that covered the majority of UK postcode sectors) by the UKHLS was viewed as a distinct advantage for the analysis of latent sleep patterns in the present thesis (i.e. for use in Chapter 4), since it made it likely that these analyses would be broadly representative of any such sleep patterns across the UK’s population as a whole.
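The general idea of stratified, multistage selection described above can be sketched in a few lines of Python. This is an illustration only: the sampling frame, the stratum and sector names and the function `multistage_sample` are all hypothetical, and simple random draws are assumed at each stage, which may differ from how the UKHLS actually selected sectors and addresses.

```python
import random


def multistage_sample(strata, sectors_per_stratum, addresses_per_sector, seed=0):
    """Two-stage draw: sectors within each stratum, then addresses within each sector."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    selected = []
    for stratum_name in sorted(strata):
        sectors = strata[stratum_name]
        # Stage 1: simple random sample of postcode sectors within the stratum.
        chosen_sectors = rng.sample(sorted(sectors), sectors_per_stratum)
        for sector in chosen_sectors:
            # Stage 2: simple random sample of addresses within the chosen sector.
            for address in rng.sample(sectors[sector], addresses_per_sector):
                selected.append((stratum_name, sector, address))
    return selected


# Invented sampling frame: 2 strata, 3 sectors each, 5 addresses per sector.
frame = {
    stratum: {
        f"{stratum}-sector{s}": [f"{stratum}-sector{s}-addr{a}" for a in range(5)]
        for s in range(3)
    }
    for stratum in ("stratumA", "stratumB")
}

sample = multistage_sample(frame, sectors_per_stratum=2, addresses_per_sector=3)
print(len(sample))  # 2 strata x 2 sectors x 3 addresses
```

Because every stratum contributes the same number of sectors and addresses in this sketch, participants in small strata are selected with higher probability than those in large strata, which is the kind of unequal selection probability that survey weights are meant to correct.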
In addition, the inclusion of an ethnic minority boost sample was felt to be helpful (and important) for the analysis of sleep and pregnancy outcomes (in Chapter 5), since ethnicity is considered a clinically relevant predictor for pregnancy outcomes, and having sufficient numbers of participants from the ethnic minority populations helped to ensure that ethnicity could be included as a potential confounder in these analyses.
4 Note: The results of the analyses using UKHLS dataset (as presented in Chapter 4 of the present thesis) did not include a weighting variable, since the results of these analyses were found to be the same with or without the inclusion of the weighting variable. Therefore, it was decided that all subsequent analyses would be undertaken without the use of these weights, though solely to facilitate greater simplicity in the presentation and interpretation of these analyses’ results.
The second reason for choosing the UKHLS data was that the UKHLS remains one of the few large-scale surveys to have included items on sleep (and, importantly, on far more than sleep duration alone). The UKHLS also collected extensive additional data on the sociodemographic, economic, cultural and behavioural background and circumstances of its participants, and included adult (>16 yrs) participants across all age groups and from both sexes, including women who were pregnant at the time of questionnaire completion.
The final reason for choosing the UKHLS was that it was designed as a powerful, prospective, longitudinal study, carried out in a series of interconnected cross-sectional waves. A new wave of data collection commenced each year and lasted for 24 months, giving a 12-month period of overlap between consecutive waves. Data collection for the first wave (‘Wave 1’) commenced in January 2009, and the last of the completed waves used in the present thesis (Wave 6) concluded in late 2015, with the data from this wave released in November 2016. Waves 1 and 4 included the complete UKHLS sleep module (comprising items designed to collect data on seven different, individual sleep characteristics). For this reason, data from these two waves were used to:
I. examine the nature of latent sleep patterns amongst all UKHLS participants (Chapter 4); and
II. identify pregnant women with complete sleep data, and analyse the association between sleep and pregnancy outcomes (in Chapter 5).
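The wave timetable described above (annual starts, 24-month fieldwork periods, 12-month overlaps) can be checked with a short calculation. The Python sketch below assumes that every wave, like Wave 1, starts in January; the function names are hypothetical and introduced only for this illustration.

```python
def wave_month_range(wave, first_start_year=2009, length_months=24):
    """Months covered by a wave, counted as year * 12 + (month - 1).

    Assumes wave k starts in January, k - 1 years after Wave 1.
    """
    start = (first_start_year + wave - 1) * 12
    return range(start, start + length_months)


def as_year_month(index):
    """Convert a month index back to a (year, month) pair."""
    return (index // 12, index % 12 + 1)


def wave_window(wave):
    """First and last (year, month) of fieldwork for a wave."""
    months = wave_month_range(wave)
    return as_year_month(months[0]), as_year_month(months[-1])


def overlap_months(wave_a, wave_b):
    """Number of calendar months in which both waves are in the field."""
    return len(set(wave_month_range(wave_a)) & set(wave_month_range(wave_b)))


for wave in range(1, 7):
    print(wave, wave_window(wave))
print(overlap_months(1, 2), overlap_months(1, 3))
```

Under these assumptions Wave 6 runs from January 2014 to December 2015, which agrees with the statement that it concluded in late 2015, and only consecutive waves overlap.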
The longitudinal design of the UKHLS was particularly beneficial to the analyses undertaken in the present thesis. It made it possible to examine the stability of any latent sleep patterns over time (i.e. between Waves 1 and 4), and to specify the DAGs used to inform the covariate adjustment sets for the multivariable statistical analyses of the association between sleep and pregnancy outcomes. The longitudinal nature of data collection helped to identify the temporal sequence in which potential confounders and likely mediators had occurred or were recorded, a crucial issue in the specification of DAGs, as explained later in this chapter.
UKHLS data from modules relevant to pregnancy-related issues were also examined, including those generated from other Waves (particularly Waves 2 and 5), since these contained information about pregnancies that had occurred during, or shortly after, the preceding Wave. Thus, the birth outcomes and pregnancy-related data of any participants who were pregnant in Waves 1 or 4 were carefully traced in the datasets for Waves 2 and 5. These pregnancy-related data were then used in the UKHLS pregnancy study (i.e. Chapter 5; see Figure 2-3).
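The tracing step described above amounts to linking records across waves by participant. The Python sketch below shows one way this could look, under stated assumptions: each record is assumed to carry a person identifier (named `pidp` here) and a wave number, while the field `birth_outcome`, the example records and the function `link_pregnancies` are hypothetical and not taken from the thesis.

```python
# Later wave searched for outcomes of pregnancies reported in each sleep wave
# (pairing taken from the text: Wave 1 with Wave 2, Wave 4 with Wave 5).
FOLLOW_UP_WAVE = {1: 2, 4: 5}


def link_pregnancies(pregnancy_records, outcome_records):
    """Attach follow-up outcome data to each pregnancy reported in Waves 1 or 4."""
    # Index outcome rows by (person id, wave) for direct lookup.
    outcomes_by_key = {(r["pidp"], r["wave"]): r for r in outcome_records}
    linked = []
    for preg in pregnancy_records:
        follow_up = FOLLOW_UP_WAVE.get(preg["wave"])
        if follow_up is None:
            continue  # pregnancy not reported in a wave carrying the sleep module
        match = outcomes_by_key.get((preg["pidp"], follow_up))
        linked.append({
            "pidp": preg["pidp"],
            "sleep_wave": preg["wave"],
            "outcome_wave": follow_up,
            # None marks a participant who could not be traced in the later wave.
            "birth_outcome": match["birth_outcome"] if match else None,
        })
    return linked


# Invented example records.
pregnancies = [
    {"pidp": 101, "wave": 1},
    {"pidp": 102, "wave": 4},
    {"pidp": 103, "wave": 4},
]
outcomes = [
    {"pidp": 101, "wave": 2, "birth_outcome": "live birth"},
    {"pidp": 102, "wave": 5, "birth_outcome": "live birth"},
    {"pidp": 102, "wave": 2, "birth_outcome": "not applicable"},
]

linked = link_pregnancies(pregnancies, outcomes)
for row in linked:
    print(row)
```

Keeping untraced participants in the output with a missing outcome, rather than dropping them, makes it easier to report how many pregnancies were lost to follow-up between waves.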
The UKHLS is primarily funded by the UK’s Economic and Social Research Council, with additional support from multiple other government departments. The UKHLS is coordinated and directed by the Institute for Social and Economic Research at the University of Essex, with collaborators at the University of Warwick and the London School of Economics and Political Science (University of London). The experienced National Centre for Social Research (NatCen; based at City University in London) conducted the data collection fieldwork (University of Essex. Institute for Social and Economic Research and NatCen, 2015).
Figure 2-3 Diagram showing how data from successive Waves of the UKHLS provided data on pregnancy and sociodemographic, health, and behavioural variables for use in analysing the association between sleep and pregnancy outcomes in Chapter 5 of this thesis.