
Chapter 3. Developing a national birth cohort using administrative linked

3.4 Methods for developing an English birth cohort using HES

3.5.1 Identifying births

3.5.1.1 Inclusion and exclusion criteria

I identified 11,523,422 birth episodes for 10,375,757 unique HESIDs. I excluded 310,433 multiple births (3.0% of HESIDs) and 536 terminations of pregnancy (0.01% of HESIDs). A further 55,586 HESIDs were marked as stillbirths. However, 8.6% of these records were linked to a death record, indicating either a false match between HES and ONS mortality data or a miscoding of the birth episode as a stillbirth in HES (e.g., an error in the discharge method field indicating a stillbirth). I assumed that high quality links between HES and ONS mortality data were live births miscoded as stillbirths, and I kept them in the cohort (1,046 HESIDs). High quality links required exact agreement of NHS number, sex and date of birth (indicated by a match rank of 1 or 2, see Box 3.2). Thus, I excluded 54,540 stillbirths (0.5% of HESIDs). The numbers of excluded multiple births and stillbirths were consistent with numbers reported for England and Wales by ONS (3.0% and 0.5%).44,162 All exclusions made are illustrated in Figure 3.3. Overall, I identified 11,058,361 birth episodes for 10,014,226 singleton live births in the HES-ONS birth cohort between 1st April 1997 and 31st March 2014.
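
The percentages quoted above are each exclusion count divided by the 10,375,757 unique HESIDs identified at the start. A minimal sketch of that arithmetic follows; the variable and function names are illustrative and are not taken from the thesis code.

```python
# Counts quoted in Section 3.5.1.1; all names here are illustrative.
TOTAL_HESIDS = 10_375_757

stillbirths_flagged = 55_586   # HESIDs marked as stillbirths in HES
kept_as_live_births = 1_046    # high quality links to an ONS death record
stillbirths_excluded = stillbirths_flagged - kept_as_live_births

exclusions = {
    "multiple births": 310_433,
    "terminations of pregnancy": 536,
    "stillbirths": stillbirths_excluded,
}


def pct_of_hesids(n: int, total: int = TOTAL_HESIDS) -> float:
    """Return n as a percentage of all unique HESIDs."""
    return 100 * n / total


for reason, n in exclusions.items():
    print(f"{reason}: {n:,} HESIDs ({pct_of_hesids(n):.2f}%)")
```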

Figure 3.3 - Flow diagram showing exclusions made to develop a representative birth cohort using HES-ONS data

HES=Hospital Episode Statistics; ONS=Office for National Statistics. Data are number and % of births (unique HESIDs) removed at each stage. The final cohort covered 65% of all births in

Episodes identified as births (1998-2012): unique HESIDs N=10,375,757

EXCLUSION CRITERIA (Section 3.5.1.1): multiple births (310,433 HESIDs, 3.0%); terminations of pregnancy (536 HESIDs, 0.01%); stillbirths (54,540 HESIDs, 0.5%). Remaining: N=10,014,222 (97%)

DATA CLEANING (Section 3.5.1.2): unfinished episodes (21,062 HESIDs, 0.2%); false matches (2,614 HESIDs, 0.03%); births outside 1998-2012 (1,186,615 HESIDs, 11%). Remaining: N=8,803,935 (85%)

EVALUATING LINKAGE QUALITY (Section 3.5.2.3): births before 2003 (2,686,841 HESIDs, 26%). Remaining: N=6,113,116 (59%)

MOTHER-BABY LINKAGE (Section 3.5.3.1): non-English residents (12,712 HESIDs, 0.1%). Remaining: N=6,100,404 (59%)

DEALING WITH MISSING DATA (Section 3.5.3.6): trusts with “poor” quality (1,770,419 HESIDs, 17%); births with missing data or 500g or gestational age <24 weeks (390,338 HESIDs, 3.8%); deaths on days 0-1 (6,823 HESIDs, 0.07%). Final cohort (2003-2012): N=3,932,886 (38%)

3.5.1.2 Data cleaning

I removed 21,062 HESIDs for which only unfinished HES episodes were available. These episodes were likely to be missed matches, where a new HESID was generated for equivalent finished episodes recorded in the following financial year. I excluded 2,614 HESIDs identified as false matches. I then dropped 62,592 duplicated birth episodes and a further 952,714 consecutive episodes of care after birth (leaving one birth episode per baby). Finally, I excluded 1,186,615 HESIDs for babies born outside the period covered by both the linked ONS mortality data and the Swedish birth cohort (1st January 1998 to 31st December 2012).
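
The step that leaves one birth episode per baby can be pictured as a de-duplication keyed on HESID. The sketch below is not the thesis code: the `Episode` record, its field names, and the rule of keeping the earliest finished episode are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    # Hypothetical fields; real HES variable names differ.
    hesid: str
    start: date
    finished: bool


def one_birth_episode_per_baby(episodes):
    """Keep the earliest finished episode for each HESID.

    HESIDs that only have unfinished episodes drop out entirely,
    mirroring the first cleaning step described in the text.
    """
    kept = {}
    for ep in sorted(episodes, key=lambda e: e.start):
        if ep.finished and ep.hesid not in kept:
            kept[ep.hesid] = ep
    return list(kept.values())


episodes = [
    Episode("A", date(2005, 3, 1), True),
    Episode("A", date(2005, 3, 1), True),    # exact duplicate
    Episode("A", date(2005, 3, 4), True),    # consecutive episode after birth
    Episode("B", date(2005, 3, 2), False),   # only an unfinished episode
]
cleaned = one_birth_episode_per_baby(episodes)
```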

3.5.1.3 Cohort coverage

Following these exclusions, I identified 8,803,935 births in 1998-2012. Assuming that the ratio of singleton live births to all live births was the same in England as in England and Wales (97.0% in 1998-2012; data for England only was not available from ONS publications), the HES-ONS birth cohort covered 96.4% of all singleton live births in England in 1998-2012.44,162
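
The coverage estimate is the number of cohort births divided by an estimated count of singleton live births in England, where the latter is all live births multiplied by the singleton ratio. The number of live births in England is not given in this section, so the sketch below back-calculates the denominator implied by the reported figures rather than using an ONS count; the function and variable names are illustrative.

```python
COHORT_BIRTHS = 8_803_935    # HES-ONS births in 1998-2012 (from the text)
SINGLETON_RATIO = 0.970      # singleton share of live births, England and Wales
REPORTED_COVERAGE = 0.964    # coverage reported in the text


def coverage(cohort_births, all_live_births, singleton_ratio=SINGLETON_RATIO):
    """Cohort births divided by the estimated number of singleton live births."""
    return cohort_births / (all_live_births * singleton_ratio)


# Denominator implied by the reported figures (an assumption, not an ONS count).
implied_all_live_births = COHORT_BIRTHS / (REPORTED_COVERAGE * SINGLETON_RATIO)

print(f"Implied live births in England: {implied_all_live_births:,.0f}")
print(f"Coverage check: {coverage(COHORT_BIRTHS, implied_all_live_births):.1%}")
```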

3.5.2 Longitudinal follow-up data until the fifth birthday

3.5.2.1 Hospital admission trajectories

Initially, there were 16,435,242 episodes of care (including birth episodes) with age at admission <5 years old identified for 8,803,935 singleton live births in the HES-ONS birth cohort. I removed 246,561 episodes during data cleaning as either duplicates or records with no recorded clinical information. The remaining 16,188,681 episodes were linked into 15,024,811 hospital admissions for 8,803,935 children born in 1998-2012, of which 6,220,870 were admissions after birth but before the age of five years.

In 1998-2002, 12.3-14.1% of babies had at least one hospital admission after birth in the first year of life (Figure 3.4). After 2002, when the NN4B service was implemented, the proportion rose to 18.1% in 2003 and increased annually, reaching 23.4% in 2012. This shift suggests that, before the introduction of the NN4B system, more links between birth admissions and subsequent hospital admissions were missed because the NHS number was more likely to be missing at birth.

Key finding from results Section 3.5.1:

• The HES-ONS birth cohort covered 96.4% of singleton live births in England in 1998-2012

Figure 3.4 – Percentage of children in HES-ONS birth cohort with at least one hospital admission after birth in the first year of life by the year of birth

HES=Hospital Episode Statistics; ONS=Office for National Statistics.

3.5.2.2 Linkage to ONS mortality data

3.5.2.2.1 Checking the linkage between HES and ONS mortality data

Initially, 43,491 deaths were linked to births in 1998-2012. I identified an additional 371 in-hospital deaths recorded only in HES, where the discharge method in the hospital admission record indicated a death but there was no link to the ONS mortality data. All of these ‘HES only’ deaths were for births in 1998, when the linkage between HES and ONS mortality data was first introduced.

I excluded the following death records (i.e. I removed the link between the death record and the HES-ONS birth cohort, so the birth records themselves remained in the cohort):

• 1,997 deaths which occurred after the child’s fifth birthday (4.5% of all deaths)

• 80 deaths which occurred after the 31st December 2012 (0.2%)

• 4 deaths with a date of death before the date of birth (0.01%)

• 149 deaths with subsequent hospital admissions after death (0.3%, see Section 3.4.2.2)

• 36 deaths for which the difference in the date of death according to HES and ONS mortality data was >1 day (0.1%)

After these exclusions, there were 41,616 child deaths in the HES-ONS birth cohort.
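
The five exclusion rules listed above can be expressed as a single check applied to each linked death record. This is a sketch under stated assumptions, not the thesis code: the `LinkedDeath` fields are hypothetical, the order in which the rules are applied is arbitrary, and a death on the fifth birthday itself is treated as outside follow-up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUDY_END = date(2012, 12, 31)


@dataclass
class LinkedDeath:
    # Hypothetical fields; the real HES and ONS variable names differ.
    dob: date
    ons_dod: date                          # date of death in ONS mortality data
    hes_dod: Optional[date] = None         # date of death in HES, if recorded
    last_admission: Optional[date] = None  # latest admission date in HES


def fifth_birthday(dob: date) -> date:
    try:
        return dob.replace(year=dob.year + 5)
    except ValueError:                     # born on 29 February
        return date(dob.year + 5, 3, 1)


def exclusion_reason(d: LinkedDeath) -> Optional[str]:
    """Return the first rule that rejects the link, or None to keep it."""
    if d.ons_dod >= fifth_birthday(d.dob):  # assumption: follow-up ends at age 5
        return "death on or after fifth birthday"
    if d.ons_dod > STUDY_END:
        return "death after 31 December 2012"
    if d.ons_dod < d.dob:
        return "death before birth"
    if d.last_admission is not None and d.last_admission > d.ons_dod:
        return "hospital admission after death"
    if d.hes_dod is not None and abs((d.hes_dod - d.ons_dod).days) > 1:
        return "HES and ONS dates of death differ by >1 day"
    return None
```

A record for which `exclusion_reason` returns a string would have its death link removed while the birth itself stays in the cohort.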

[Figure 3.4: % of births in the HES-ONS birth cohort (y-axis, 0% to 25%) by year of birth (x-axis)]

Similar to trends in the proportion of infants with at least one hospital admission after birth, infant mortality rates based on the HES-ONS birth cohort showed different patterns before and after implementation of the NN4B system in October 2002. The rates were underestimated relative to rates reported for England and Wales for births before 2003, and closely matched rates reported by the ONS for births in 2003-2012 (Figure 3.5).

Figure 3.5 – Comparison of infant mortality rates per 1000 births based on HES-ONS birth cohort and singleton live births registered in England and Wales by age at death and year of birth

HES=Hospital Episode Statistics; ONS=Office for National Statistics. Data from ONS publications for England and Wales based on birth and death registration data.34,42,161

[Figure 3.5 panels: neonatal mortality (0-27 days) and post-neonatal mortality (28-364 days). Each panel plots deaths per 1000 live births against year of birth for the HES-ONS birth cohort and for ONS national statistics for England and Wales.]

The differences between mortality rates calculated from the HES-ONS birth cohort and those reported by ONS for England and Wales before 2003 were larger for post-neonatal mortality than for neonatal mortality (Figure 3.5). Children who die in the neonatal period are more likely to be cared for in a neonatal intensive care unit and therefore to die in hospital. Figure 3.5 suggests that these babies, who had longer post-natal hospital stays, were more likely to have their NHS number added to their records during the birth admission once it had been allocated at birth registration, enabling linkage to an ONS mortality record. However, the vast majority of babies born before 2003 would have been discharged shortly after birth without their NHS number being updated in hospital records. If a birth episode did not contain an NHS number, it could only be linked to subsequent admissions in HES and to ONS mortality records using postcode, date of birth and sex (see Boxes 3.1 and 3.2), and no link would be established if, for example, the child changed address. Thus, I considered linkage to ONS mortality data and to longitudinal hospital admission records in HES to be unreliable before 2003 due to a high risk of missed matches. Hence, I excluded 2,683,451 births prior to 2003 from the HES-ONS birth cohort in further analyses.
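
The reasoning above can be made concrete with a toy deterministic matcher. It is only a sketch: the actual linkage algorithms are those described in Boxes 3.1 and 3.2, and the record fields, the two-step rule, and the example values below are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical fields and dummy values throughout.
    nhs_number: Optional[str]   # None when not recorded
    dob: str
    sex: str
    postcode: str


def same_child(a: Record, b: Record) -> bool:
    """Toy two-step match: NHS number first, otherwise postcode, date of birth and sex."""
    if a.nhs_number and b.nhs_number:
        return a.nhs_number == b.nhs_number and a.dob == b.dob and a.sex == b.sex
    return (a.postcode, a.dob, a.sex) == (b.postcode, b.dob, b.sex)


birth_with_nhs = Record("0000000000", "2001-05-04", "F", "AB1 2CD")
birth_no_nhs = Record(None, "2001-05-04", "F", "AB1 2CD")
death_after_move = Record("0000000000", "2001-05-04", "F", "EF3 4GH")

# With an NHS number the link survives a change of address.
print(same_child(birth_with_nhs, death_after_move))
# Without it, the changed postcode means the link is missed.
print(same_child(birth_no_nhs, death_after_move))
```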

3.5.2.2.2 Recording of causes of death

In 2003-2012, 8.2% of all deaths were recorded in HES but did not link to an ONS mortality record (2,197 deaths). Missed links with ONS were the most common for early deaths – 66.5% of these “HES only” deaths were on days 0-1 of life (1,460 deaths), 26.9% at 2-27 days (591 deaths), 5.7% in the post-neonatal period (126 deaths), and only 0.01% of deaths were beyond infancy (25 deaths).

Of the 39,419 deaths which did link to an ONS mortality record, 260 did not have any recorded causes of death, of which 98.1% were at age 28-30 days (255 deaths). These accounted for 78.5% of all deaths at 28-30 days (325 deaths). It is likely that these deaths were certified using neonatal death certificates, which should be used for deaths in the neonatal period, and that causes of death were removed by NHS Digital when processing the data. The data cleaning rules applied by NHS Digital to the ONS mortality data are not documented, so this could not be confirmed. Thus, I restricted comparisons of cause-specific mortality between England and Sweden (presented in Chapter 6) to deaths beyond 30 days of life to ensure comparability.

3.5.3 Recording of key risk factors for child mortality

3.5.3.1 Improving the completeness of risk factor variables using mother-baby linkage in HES

Overall, 96% of births in the HES-ONS birth cohort linked to a delivery record (Table 3.1). Linkage results were comparable with those of Harron et al.154

Table 3.1 – Percentage of births in HES-ONS birth cohort which were linked to a maternal delivery record by year of birth and linkage method, compared to results reported by Harron et al.

         Deterministically linked     Probabilistically linked     Overall linkage rate
Year     Harron et al.   My cohort    Harron et al.   My cohort    Harron et al.   My cohort
2003     24%             23%          73%             70%          97%             94%
2004     24%             23%          68%             72%          92%             95%
2005     25%             25%          70%             70%          95%             95%
2006     35%             35%          59%             61%          94%             96%
2007     24%             23%          71%             73%          95%             95%
2008     32%             31%          65%             65%          97%             96%
2009     36%             36%          61%             60%          98%             96%
2010     41%             41%          57%             55%          98%             96%
2011     43%             43%          55%             54%          99%             96%
2012     42%             41%          57%             55%          99%             96%
Total    33.1%           32.5%        63.4%           63.0%        96.5%           95.5%

HES=Hospital Episode Statistics; ONS=Office for National Statistics. Linkage rates for Harron et al.154 were obtained via personal communication with Dr Harron.169

Key findings for Section 3.5.2:

• Birth episodes recorded in HES prior to the introduction of NN4B in 2002 were more likely to be missing the NHS number, leading to an increased risk of missed links to longitudinal hospital admission records and ONS mortality records. Thus, analyses have to be limited to 6,113,116 births in 2003-2012.

• Mortality rates in infancy based on the HES-ONS birth cohort were representative for England and Wales for births in 2003-2012.

• Comparisons of cause-specific mortality with Sweden should be restricted to deaths beyond 30 days of life due to the use of neonatal death certificates in England and high rates of missing data on causes of death at 28-30 days.

Mother-baby linkage led to substantial improvements in the completeness of risk factors at birth in the HES-ONS birth cohort. After linkage to maternal delivery records, the proportion of births with recorded information increased from:

• 67% to 84% for birth weight

• 64% to 78% for gestational age

• 63% to 97% for maternal age

• 45% to 97% for IMD score

Importantly, the coverage of the complete case cohort increased from only 18% (driven by high rates of missing IMD scores at birth) to 75% of all births in the HES-ONS birth cohort. Using enhanced information on postcode, I excluded 12,712 births to non-English residents to match the inclusion criteria of the SMBR (Figure 3.3). As a result, 6,100,404 births remained in the HES-ONS birth cohort.
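
The gain in completeness from mother-baby linkage amounts to filling fields that are missing on the baby's record with the values held on the linked delivery record. The sketch below illustrates that idea only: the field names, the fill rule, and the two example records are made up and do not come from the thesis data or code.

```python
from typing import Optional

# Hypothetical field names for the four risk factors discussed in the text.
FIELDS = ("birth_weight", "gestational_age", "maternal_age", "imd_score")


def enhance(baby: dict, delivery: Optional[dict]) -> dict:
    """Fill fields missing from the baby's record using the linked delivery record."""
    out = dict(baby)
    if delivery is not None:
        for f in FIELDS:
            if out.get(f) is None:
                out[f] = delivery.get(f)
    return out


def completeness(records, field) -> float:
    """Share of records with a recorded value for `field`."""
    return sum(r.get(field) is not None for r in records) / len(records)


def complete_case_share(records) -> float:
    """Share of records with all four risk factors recorded."""
    return sum(all(r.get(f) is not None for f in FIELDS) for r in records) / len(records)


# Made-up example: the second baby has no linked delivery record.
babies = [
    {"birth_weight": 3200, "gestational_age": 39, "maternal_age": None, "imd_score": None},
    {"birth_weight": None, "gestational_age": None, "maternal_age": 31, "imd_score": 12.5},
]
deliveries = [
    {"birth_weight": 3200, "gestational_age": 39, "maternal_age": 27, "imd_score": 20.1},
    None,
]
enhanced = [enhance(b, d) for b, d in zip(babies, deliveries)]
```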

3.5.3.2 Cohort validation

3.5.3.2.1 Missing data

Rates of missing data were highest for gestational age and birth weight (missing for 22% and 17% of births respectively, Table 3.2). The rates of missing data were higher in children who died than for live births, and decreased with increasing age at death. Nearly half of deaths on days 0-1 of life did not have recorded gestational age, and a third did not have recorded birth weight. Due to an extraction error by NHS Digital, these early deaths were also more likely to have missing postcode and, consequently, no IMD score (unless they had more than one hospital episode at birth). High rates of missing data in the “baby tail” (as indicated by higher rates of missing gestational age and birth weight) meant that these births were less likely to link to a delivery record, as these variables were part of the linkage algorithm (see Appendix C.3).

Table 3.2 – Percentage of births and deaths by age at death recorded in the HES-ONS birth cohort in 2003-2012 with missing recording of risk factors of interest

Deaths by age at death

HES=Hospital Episode Statistics; IMD=Index of Multiple Deprivation; ONS=Office for National Statistics.

The complete case cohort covered 4,545,247 out of 6,100,404 births recorded in the HES-ONS birth cohort in 2003-2012 (75%). Mortality rates in infancy based on the whole HES-ONS birth cohort were comparable with rates reported for England and Wales (2.6 infant deaths/1000 live births vs 2.5/1000 live births, Table 3.3).42,161 However, infant mortality rates in the complete case cohort were underestimated (2.3/1000 live births). This was primarily driven by highly underestimated mortality at 0-1 days (0.80 deaths/1000 live births vs 1.6/1000 live births). Beyond the 1st day of life, the rates were underestimated by 0.07-0.10 deaths/1000 live births.

Table 3.3 - Comparison of crude mortality rates per 1000 births in whole and complete case HES-ONS birth cohorts, in England and Wales according to ONS national statistics in 2003-2012

Age at death        HES-ONS: whole birth cohort    HES-ONS: complete case birth cohort    England and Wales (ONS)
Number of births    6,100,404                      4,545,247                              6,604,156
0-1 days            1.59                           0.80                                   1.61
2-6 days            0.59                           0.46                                   0.54
7-27 days           0.67                           0.57                                   0.64
28-364 days         1.34                           1.22                                   1.33
0-364 days          2.60                           2.25                                   2.51

HES=Hospital Episode Statistics; ONS=Office for National Statistics. Rates for England and Wales were obtained from ONS mortality publications for 2003-2012.42,161 Mortality rate on days 0-1 in England and Wales was estimated by assuming that approximately ¾ of deaths on days 0-6 occur in the first two days.170
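
The split described in the table note can be written out as a short calculation. The sketch below assumes the published early neonatal rate (days 0-6) is the sum of the two England and Wales entries in Table 3.3 (1.61 + 0.54 = 2.15 per 1000); the function name and the constant are illustrative.

```python
EARLY_SHARE = 0.75   # assumed share of deaths on days 0-6 that occur on days 0-1


def split_early_neonatal(rate_0_6: float, share: float = EARLY_SHARE):
    """Split a day 0-6 mortality rate into day 0-1 and day 2-6 components."""
    return rate_0_6 * share, rate_0_6 * (1 - share)


# 2.15 = 1.61 + 0.54, taken from the England and Wales column of Table 3.3.
rate_0_1, rate_2_6 = split_early_neonatal(2.15)
print(f"0-1 days: {rate_0_1:.4f}, 2-6 days: {rate_2_6:.4f} per 1000 live births")
```

To two decimal places these agree with the England and Wales entries for days 0-1 and days 2-6 in Table 3.3.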

3.5.3.2.2 Distribution of risk factors in live births

The distributions of birth weight, gestational age, and maternal age for live births recorded in the HES-ONS birth cohort closely matched the distributions reported for England and Wales (both overall and for the complete case cohort, Table 3.4). Births at <24 weeks’ gestation or weighing <1000g at birth accounted for a low proportion of births, but were highly underreported (by 20% in the whole cohort and by 35% in the complete case cohort for a birth weight <1000g, and by 63% in the whole cohort and 88% in the complete case cohort for births at <24 weeks’ gestation).

Table 3.4 – Distribution of birth weight, maternal age, gestational age in whole and complete case HES-ONS birth cohorts, and in England and Wales according to ONS national statistics in 2003-2012

                                      HES-ONS: whole cohort    HES-ONS: complete case cohort    England and Wales (ONS)
Birth weight (g)
<1000                                 0.34%                    0.27%                            0.42%
1000-1499                             0.49%                    0.46%                            0.51%
1500-1999                             1.0%                     1.0%                             1.0%
2000-2499                             3.8%                     3.8%                             3.8%
2500-2999                             16%                      16%                              16%
3000-3499                             37%                      37%                              37%
3500-3999                             30%                      30%                              30%
≥4000                                 11%                      12%                              12%
Maternal age (years)
<20                                   6.3%                     6.2%                             6.3%
20-24                                 19%                      19%                              19%
25-29                                 27%                      27%                              27%
30-34                                 28%                      28%                              28%
35-39                                 16%                      16%                              16%
≥40                                   3.6%                     3.6%                             3.6%
Gestational age (weeks, 2010-2012)
<24                                   0.033%                   0.0087%                          0.085%
24-27                                 0.24%                    0.22%                            0.27%
28-31                                 0.56%                    0.53%                            0.60%
32-36                                 4.7%                     4.7%                             4.7%
37-41                                 90%                      90%                              90%
≥42                                   4.5%                     4.5%                             4.2%

HES=Hospital Episode Statistics; ONS=Office for National Statistics. All data are % of all singleton live births per risk factor category. Information for England and Wales was obtained from ONS mortality publications for 2003-2012.42,161 For gestational age tabulations, I used data from 2010-12, as the gestational age categories in ONS publications were sufficiently detailed only for these years.34 Column totals may not add up to 100% due to rounding.
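
The degree of underreporting quoted in the text is the relative shortfall of a cohort percentage against the ONS percentage. A minimal sketch of that calculation from the Table 3.4 entries follows; because the table values are rounded, the results only approximate the percentages quoted in the text. The function name is illustrative.

```python
def relative_shortfall(cohort_pct: float, ons_pct: float) -> float:
    """Fraction by which a cohort percentage falls short of the ONS percentage."""
    return 1 - cohort_pct / ons_pct


# (whole cohort %, complete case %, ONS %) as printed in Table 3.4
table_3_4 = {
    "birth weight <1000g": (0.34, 0.27, 0.42),
    "gestation <24 weeks": (0.033, 0.0087, 0.085),
}

for label, (whole, complete, ons) in table_3_4.items():
    print(
        f"{label}: whole cohort {relative_shortfall(whole, ons):.0%}, "
        f"complete case {relative_shortfall(complete, ons):.0%}"
    )
```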

3.5.3.2.3 Distribution of risk factors by age at death

Mortality rates for a birth weight <1000g in the complete case cohort were severely underestimated at 0-6 days (150/1000 births compared to 250/1000 births in England and Wales, Table 3.5), and overestimated for deaths at 7-27 and 28-364 days (60/1000 births compared to 48/1000 births and 63/1000 births compared to 52/1000 births, respectively). For other birth weight categories, the rates were comparable with those reported for England and Wales. The differences in birth weight-specific mortality based on the whole HES-ONS birth cohort compared to rates in England and Wales were smaller than for complete case HES-ONS birth cohort (Table 3.5).

For maternal age, mortality rates in the complete case HES-ONS birth cohort were underestimated at 0-6 days for all maternal age categories, reflecting underrepresentation of deaths on days 0-1 of life in the complete case cohort compared to England and Wales (as shown in Table 3.3) due to under-recording of risk factors of interest for these early deaths (as shown in Table 3.2). Mortality rates by each maternal age category based on the whole HES-ONS birth cohort were underestimated relative to rates in England and Wales, but the differences were smaller than for the complete case cohort (Table 3.5).

For gestational age, mortality rates were underestimated at <24 and 24-27 weeks in the neonatal period, and overestimated in the post-neonatal period relative to rates reported for England and Wales (for both whole and complete case HES-ONS birth cohorts). Mortality rates were representative for births at ≥28 weeks (Table 3.5).


Table 3.5 – Mortality rates per 1000 births by age at death and categories of birth weight, maternal age and gestational age in whole and complete case HES-ONS birth cohorts, and in England and Wales according to ONS national statistics in 2003-2012

Within each age-at-death group, the three values are: HES-ONS whole cohort, HES-ONS complete case cohort, England and Wales (ONS).

Birth weight (g)                      Early neonatal deaths (0-6 days)    Late neonatal deaths (7-27 days)    Post neonatal deaths (28-364 days)
<1000                                 230    150    250                   53    60    48                      56    63    52

Maternal age (years)                  Early neonatal deaths (0-6 days)    Late neonatal deaths (7-27 days)    Post neonatal deaths (28-364 days)
<20                                   2.4    1.7    2.8                   0.97    0.86    1.0                 2.3    2.3    2.4

Gestational age (weeks, 2010-2012)    Neonatal deaths (0-27 days)         Post neonatal deaths (28-364 days)
<24                                   800    510    860                   50    93    27

HES=Hospital Episode Statistics; ONS=Office for National Statistics. Information for England and Wales was obtained from ONS mortality publications for 2003-2012.42,161 For gestational age tabulations, I used data from 2010-12, as the gestational age categories in ONS publications were sufficiently detailed only for these years.34

3.5.4 Strategies for dealing with missing data

3.5.4.1 Multiple imputation using chained equations

Following multiple imputation (MI), the differences in proportions of live births by birth weight and gestational age categories relative to England and Wales remained, but were smaller than for the complete case cohort (Table 3.6). For example, a higher proportion of births in the imputed datasets had a birth weight <1000g (0.31% compared to 0.27% in the complete case cohort) and were born at <24 weeks’ gestation (0.027% compared to 0.0087%). However, these proportions were still underreported by 26% and 63% relative to national statistics published by the ONS for England and Wales. The distribution of maternal age was representative for the population of children in England and Wales. This was expected since only 3.2% of records had missing maternal age (after linkage to the mothers’ delivery records).

Key findings from Section 3.5.3:

• The complete case HES-ONS birth cohort cannot be used for fair comparison of child mortality in England and in Sweden as mortality rates are underestimated compared to national figures reported for England and Wales by the ONS (especially on days 0-1 of life).

• Underestimated infant mortality rates in the complete case HES-ONS birth cohort were primarily driven by underreporting of gestational age and birth weight among the most vulnerable babies: born at <24 weeks, weighing <1000g at birth, or those who died shortly after birth.

• There was no “pattern in missingness” of maternal age; that is, infant mortality rates were underestimated for all maternal age categories.

Table 3.6 – Distribution of birth weight, gestational age and maternal age among births in the complete case HES-ONS birth cohort, following MI, and in England and Wales according to ONS national statistics in 2003-2012

Risk factor                           Complete case cohort    Following MI    England and Wales (ONS)
Gestational age (weeks, 2010-2012)
<24                                   0.0087%                 0.027%          0.085%

HES=Hospital Episode Statistics; MI=multiple imputation; ONS=Office for National Statistics. All data show % of all live births. The % were calculated separately for each imputed dataset and pooled together using Rubin’s rules (that is, by taking an average).163,164 Information for England and Wales was obtained from ONS mortality publications for 2003-2012.42,161 For gestational age tabulations, I used data from 2010-12, as the gestational age categories in ONS publications were sufficiently detailed only for these years.34 Column totals may not add up to 100% due to rounding.
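
Pooling with Rubin's rules, as mentioned in the table note, reduces to averaging the per-imputation estimates when only a point estimate is needed. The sketch below shows that step and, going slightly beyond the text, the standard total-variance formula (within-imputation variance plus (1 + 1/m) times between-imputation variance). The input values are made up for illustration and are not results from the thesis.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances=None):
    """Pool per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance). The total variance is only
    computed when within-imputation variances are supplied.
    """
    m = len(estimates)
    pooled = mean(estimates)
    if within_variances is None:
        return pooled, None
    within = mean(within_variances)
    between = variance(estimates)   # sample variance across imputations
    return pooled, within + (1 + 1 / m) * between


# Made-up percentages of births <1000g in five imputed datasets.
pooled_pct, _ = pool_rubin([0.30, 0.31, 0.32, 0.31, 0.31])
print(f"Pooled percentage: {pooled_pct:.2f}%")
```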

Birth weight-specific mortality rates in the imputed datasets were not representative for England and Wales. The rates were overestimated for birth weight categories of: 1000-3499g for deaths at 0-6 days, <3000g at 7-27 days and <1499g at 28-364 days. Similarly, gestation-specific mortality rates did not match published rates for England and Wales (Table 3.7). Mortality rates by maternal age category in the imputed datasets, however, matched the national statistics published by the ONS for England and Wales.


Table 3.7 – Mortality rates per 1000 births by age at death and birth weight, gestational age, and maternal age categories based on the complete case HES-ONS birth cohort, following MI, and in England and Wales according to ONS national statistics in 2003-2012

Within each age-at-death group, the three values are: complete case HES-ONS birth cohort, following MI, England and Wales (ONS).

Birth weight (g)                      Early neonatal deaths (0-6 days)    Late neonatal deaths (7-27 days)    Post neonatal deaths (28-364 days)
<1000                                 150    230    250                   60    51    48                      63    57    52

Maternal age (years)                  Early neonatal deaths (0-6 days)    Late neonatal deaths (7-27 days)    Post neonatal deaths (28-364 days)
<20                                   1.7    2.8    2.8                   0.86    1.0    1.0                  2.3    2.3    2.4

Gestational age (weeks, 2010-2012)    Neonatal deaths (0-27 days)         Post neonatal deaths (28-364 days)
<24                                   510    790    860                   93    53    27

HES=Hospital Episode Statistics; MI=multiple imputation; ONS= Office for National Statistics. Mortality rates were calculated separately for each imputed dataset and
