
Improving the identification of diabetes mellitus in electronic health records

3. to be geographically generalizable, scalable, cost-effective and future-proof.

6.9 Rationale

In November 2014, I searched Medline to identify published articles that used the CPRD to examine ethnic differences in diabetes mellitus. The search found no study that had quantified ethnic differences in the incidence and prevalence of diabetes mellitus in the CPRD. The present study therefore set out to examine whether estimates of diabetes burden by ethnic group, derived from the observational data held in the CPRD, are comparable to those reported in cohort and interventional studies conducted throughout the UK.

Diabetes mellitus is one of the largest public health problems facing the UK, and its increasing prevalence means that substantial numbers of cases are available in the CPRD. Because the disease is predominantly managed in primary care, the vast majority of cases are likely to be captured in the database. Furthermore, findings from this research can be translated into action to address inequalities, since the majority of type 2 diabetes risk is attributable to modifiable risk factors. If CPRD data prove suitable for examining ethnic inequalities in outcomes, the scope for research into ethnicity and diabetes, and the opportunities for timely, high-impact research at low cost, will be expanded.

The purpose of adjudicating disease outcomes is two-fold: firstly, to enhance power by improving the identification of disease cases; and secondly, to increase the specificity of classification by differentiating between conditions of similar presentation but different aetiology.(225) Although primary care records have been linked for a proportion of Biobank participants in Wales and Scotland, no linkage to English or Irish primary care records yet exists.

The algorithms designed to adjudicate diabetes cases were developed by a team at the University of Surrey led by Simon de Lusignan, using the Welsh data held in the Secure Anonymised Information Linkage (SAIL) Databank at Swansea University. Although the algorithms were designed to use ethnicity data to help adjudicate diabetes type, no ethnicity data were available at the time of development; the results from the initial derivation cohort may therefore underestimate the prevalence of type 2 diabetes in ethnic minority subgroups of the SAIL database population.

The present study sought to improve on the original implementation by using the patient-level ethnicity data available in the CPRD. The performance of these algorithms in the CPRD was also examined, since improving the classification of diabetes by ethnic group is a critical first step towards validating the algorithms for use in other, more diverse settings. More specific diabetes diagnoses across the CPRD population should substantially improve power to identify ethnic differences in diabetes prevalence and incidence in large-scale epidemiological studies using this resource.

6.10 Study objectives and hypotheses

The objectives of this study were to:

1. Implement three algorithms designed to improve the classification of type 1 and type 2 diabetes mellitus, and develop a simplified version of the algorithms that does not require linked Hospital Episode Statistics (HES) data.

2. Compare the performance of the algorithms in improving the coding and classification of diabetes type between the Welsh SAIL database, where the algorithms were derived, and the CPRD.

3. Determine ethnic differences in the prevalence and incidence of type 1 and type 2 diabetes over time using populations derived from the algorithms.

4. Examine ethnic differences in age at onset of type 1 and type 2 diabetes, and in BMI value at diagnosis.

5. Quantify differences in the incidence of type 1 and type 2 diabetes for South Asian and Black African/Caribbean subgroups.

The hypotheses of this study were:

a) The use of algorithms incorporating routinely recorded information on prescribing, clinical measures and competing diagnoses will improve the classification of diabetes type over the use of diagnostic Read codes alone.

b) The prevalence and incidence of type 2 diabetes will increase over time, with incidence highest in the South Asian group, followed by the Black African/Caribbean and then the White group.

c) Onset of type 2 diabetes will be earlier in South Asian groups compared with White and Black African/Caribbean groups.

d) The BMI value closest to the date of diabetes diagnosis will not vary by ethnicity for T1DM, but will be lower in the South Asian group than in the White group for T2DM.

6.11 Methods

All clinical and therapeutic data were extracted from the August 2013 build of the CPRD for all patients with at least one diagnostic Read code for diabetes mellitus (see Appendix).

6.11.1 Data Extraction

a) Diagnostic Read codes for diabetes mellitus

Each Read code was assigned to one of seven categories (type 1 definite/probable/possible, type 2 definite/probable/possible, or other). If a patient had multiple Read codes in any one category, the earliest recorded code in that category was retained for analysis (Table 6.2).

Table 6.2 Categorization of Read codes for diabetes mellitus

Category | Type 1 diabetes | Type 2 diabetes | Other
Definite | Type 1 DM: C10E; not contradicted/ceased/superseded | Type 2 DM: C10F; not contradicted/ceased/superseded | Gestational: L180; Genetic: C10c-C10D; Other/secondary: C10G-J, L-N, C11y0; Insulin resistance: C10K, C1098, C10F8; Ceased: 21263, 212H
Probable | IDDM: C108; Adult onset: C1073; Gestational: L1805; not contradicted/ceased/superseded | NIDDM: C109; Gestational: L1806, L180X; not contradicted/ceased/superseded | -
Possible | Diabetes mellitus, adult onset: C10z1, C10y0, C110; not contradicted/ceased/superseded | Diabetes mellitus, adult onset: C10%, C112 (z), L180x; not contradicted/ceased/superseded | -
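The categorization step (assign each Read code to a category, then keep the earliest code per category for each patient) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the lookup READ_CODE_CATEGORY, the record type CodeEvent and the function earliest_code_per_category are all hypothetical; the lookup holds only a handful of the codes listed in Table 6.2, in the truncated form shown there; and the "not contradicted/ceased/superseded" conditions are not checked.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, deliberately incomplete lookup: a few Read codes from Table 6.2
# mapped to their category. The full code list and the "not contradicted/
# ceased/superseded" checks are NOT implemented in this sketch.
READ_CODE_CATEGORY = {
    "C10E": "type1_definite",
    "C108": "type1_probable",
    "C10F": "type2_definite",
    "C109": "type2_probable",
    "L180": "other",
}


@dataclass(frozen=True)
class CodeEvent:
    """One diagnostic Read code recorded for a patient (hypothetical record type)."""
    patient_id: str
    read_code: str
    event_date: date


def earliest_code_per_category(events):
    """Map (patient_id, category) to the earliest CodeEvent in that category."""
    earliest = {}
    for ev in events:
        category = READ_CODE_CATEGORY.get(ev.read_code)
        if category is None:
            continue  # code not in the illustrative lookup, so it is skipped
        key = (ev.patient_id, category)
        current = earliest.get(key)
        if current is None or ev.event_date < current.event_date:
            earliest[key] = ev
    return earliest


if __name__ == "__main__":
    events = [
        CodeEvent("p1", "C109", date(2005, 3, 1)),
        CodeEvent("p1", "C109", date(2003, 6, 15)),
        CodeEvent("p1", "C10F", date(2008, 1, 10)),
    ]
    for (pid, cat), ev in sorted(earliest_code_per_category(events).items()):
        print(pid, cat, ev.read_code, ev.event_date)
```

On the example records, the two NIDDM (C109) entries collapse to the 2003 code, while the Type 2 DM (C10F) code is retained separately in its own category, so a patient can carry one earliest date per category for later adjudication.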

b) Clinical Measures

For each clinical measure, the value closest to the date of diabetes diagnosis was retained for analysis.
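Selecting the measurement closest to the diagnosis date can be sketched as below. The text does not state how ties or measurement windows were handled, so the function closest_to_diagnosis is hypothetical: it treats readings before and after diagnosis symmetrically, breaks ties in favour of the earlier reading, and accepts an optional max_days cut-off that is an assumption rather than part of the method described here.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

# (measurement date, value), e.g. a BMI reading in kg/m^2
Measurement = Tuple[date, float]


def closest_to_diagnosis(
    measurements: Sequence[Measurement],
    diagnosis_date: date,
    max_days: Optional[int] = None,
) -> Optional[float]:
    """Return the value recorded closest in time to diagnosis_date, or None.

    Readings before and after diagnosis are treated symmetrically. Ties are
    broken in favour of the earlier reading (an assumption). max_days is a
    hypothetical cut-off: readings further from diagnosis are ignored.
    """
    best_key = None
    best_value = None
    for when, value in measurements:
        gap = abs((when - diagnosis_date).days)
        if max_days is not None and gap > max_days:
            continue
        key = (gap, when)  # smallest gap first, then earliest date
        if best_key is None or key < best_key:
            best_key, best_value = key, value
    return best_value


if __name__ == "__main__":
    bmi = [(date(2010, 1, 1), 31.2), (date(2010, 7, 1), 29.8), (date(2011, 3, 1), 30.5)]
    print(closest_to_diagnosis(bmi, date(2010, 6, 20)))  # 29.8 (11 days after diagnosis)
```

Using absolute distance means a reading taken shortly after diagnosis can be preferred over an older pre-diagnosis reading; if only pre-diagnosis values were wanted, the filter in the loop would need to change accordingly.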