Source electronic health databases
3.1 Summary
3.2 The Clinical Practice Research Datalink
Across the UK there are numerous primary care databases which bring together electronic patient records; however, most of these cover small geographical areas or small numbers of general practices. The Clinical Practice Research Datalink (CPRD, formerly GPRD) is one of three clinical research databases that provide patient data from practices across the UK, allowing research to be undertaken on samples generalisable to the whole population. The other two are the Health Improvement Network (THIN) database and the QRESEARCH database.
The CPRD was initially set up in 1987 as a commercial databank by the company VAMP (Value Added Medical Products). Now run by the Medicines and Healthcare products Regulatory Agency (MHRA), the CPRD is the largest primary care database in the UK, covering just over 8% of the UK population.(73–75)
The UK has the advantage of near-universal registration with general practitioners, with around 98% of the population registered. As such, analyses of the registered patient population are broadly representative of the UK population; notable exceptions include asylum seekers, the homeless, prison populations and those in the armed services, who are less likely to access GP services.(76–78) Additional linkages to secondary care data, disease registries, surveys and vital statistics give these databases unique value for observational studies and increasingly for pragmatic clinical trials.(79–82)
The CPRD currently contains longitudinal primary care records for approximately 13.5 million patients, of whom 5.5 million are currently active. Continuous observational data have been collected in most practices for over six years, yielding over 30 million patient years of observation.(78) Patients contributing to the CPRD have been shown to be representative of the UK population in terms of age and gender, though in terms of regional representation the north of England is slightly under-represented.(74) Importantly, the validity of a wide range of diagnostic and clinical measures has been established, with a 2010 systematic review demonstrating a mean positive predictive value of 88% across a range of 183 diagnoses.(83–86) The distribution of general practices contributing to the CPRD compared with the distribution of all general practices in the UK in July 2012 is shown in table 3.1.
Table 3.1 Regional distribution of practices contributing to the July 2012 CPRD compared with the UK distribution
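| Region | CPRD practices, July 2012 (n) | CPRD (%) | UK practices, April 2012 (n) | UK (%) |
| --- | --- | --- | --- | --- |
| England | 483 | 77% | 8123 | 82% |
| Scotland | 69 | 11% | 998 | 10% |
| Wales | 50 | 8% | 474 | 5% |
| NI | 22 | 4% | 354 | 4% |
| Total | 624 | 100% | 9949 | 100% |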
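The percentages in Table 3.1 follow directly from the practice counts. As a minimal sketch, the calculation can be reproduced as below; the variable and function names are illustrative only, and the counts are copied from the table.

```python
# Practice counts copied from Table 3.1.
cprd_practices = {"England": 483, "Scotland": 69, "Wales": 50, "NI": 22}
uk_practices = {"England": 8123, "Scotland": 998, "Wales": 474, "NI": 354}


def regional_shares(counts):
    """Return each region's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {region: round(100 * n / total) for region, n in counts.items()}


cprd_share = regional_shares(cprd_practices)
uk_share = regional_shares(uk_practices)

# Difference in percentage points between the CPRD share and the UK share.
difference = {region: cprd_share[region] - uk_share[region] for region in cprd_share}

for region in cprd_share:
    print(f"{region}: CPRD {cprd_share[region]}%, UK {uk_share[region]}%, "
          f"difference {difference[region]:+d} points")
```

Because each share is rounded separately, the rounded UK percentages sum to 101 rather than 100.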
All patients contributing to the CPRD are registered with one of 624 practices, all of which use the Vision clinical software system. Vision is one of several clinical software systems recommended for use in primary care by the GP Systems of Choice (GPSoC) Initiative, which supplies information technology systems to general practices across the UK. Other software systems include Egton Medical Information Systems (EMIS) and TPP SystmOne, amongst others.(87)
General practitioners and practice staff record data on their clinical systems and send anonymised patient data to the CPRD every 6 weeks. These data are then appended to the continually growing database, which contains information on diagnoses, symptoms, referrals, test results, medications, consultations, demographics, and lifestyle factors. Fifty per cent of English practices contributing to the CPRD also allow linkage to other data sources, such as the Hospital Episode Statistics for England and the Office for National Statistics (ONS) Mortality Data.
The quality of research data is audited at both the patient and practice level by the CPRD team. Individual patient data are defined as being of ‘Acceptable Research Quality’ (ARQ) if they are free of gaps or inconsistencies which cast doubt on the accuracy of the data recorded. Practices are required to record a minimum of 95% of prescribing and relevant patient encounter events. Data from practices are routinely validated by internal checks. Practice-level data are defined as being ‘Up to Standard’ (UTS) if they conform to a set of 10 metrics, including having a high proportion of patients with ARQ data, and having rates of prescriptions, deaths, pregnancies and referrals comparable to other practices. The first practice to meet these quality criteria did so in 1987, with most other general practices reaching the same level of quality by 1991.
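The two quality markers described above lend themselves to a simple eligibility filter. The following is a minimal sketch only: the class and field names (`acceptable`, `uts_date`, `registration_date`) are hypothetical rather than the CPRD schema, and starting follow-up at the later of the registration date and the practice UTS date is an assumption made for illustration, not a rule stated here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Practice:
    practice_id: int
    uts_date: Optional[date]  # hypothetical field: date the practice became UTS


@dataclass
class Patient:
    patient_id: int
    practice_id: int
    acceptable: bool          # hypothetical field: True if the record is ARQ
    registration_date: date


def eligible_patients(patients, practices):
    """Return (patient, start date) pairs for ARQ patients in UTS practices."""
    by_id = {p.practice_id: p for p in practices}
    eligible = []
    for patient in patients:
        practice = by_id.get(patient.practice_id)
        if practice is None or practice.uts_date is None:
            continue  # practice unknown or never reached UTS
        if not patient.acceptable:
            continue  # patient record is not ARQ
        # Assumption: usable follow-up starts once both conditions hold.
        start = max(patient.registration_date, practice.uts_date)
        eligible.append((patient, start))
    return eligible


# Made-up example data.
practices = [Practice(1, date(1990, 6, 1)), Practice(2, None)]
patients = [
    Patient(101, 1, True, date(1988, 1, 15)),
    Patient(102, 1, False, date(1995, 3, 2)),
    Patient(103, 2, True, date(1992, 7, 9)),
]
result = eligible_patients(patients, practices)
```

In this made-up example only patient 101 is retained: patient 102 is not ARQ, and patient 103 belongs to a practice with no UTS date.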
For research purposes, individual patient data are anonymised, with identifying information such as NHS number, name, date of birth, address and postcode removed. Information such as gender and year of birth is retained in order to conduct stratified analyses. In addition to these demographic data, researchers can access coded data pertaining to diagnoses, symptoms and processes of care. Free text entered by the primary care team is not routinely available to researchers, as it may contain identifiable information. Coded data are entered according to the Read clinical coding system, a hierarchical system of medical coding used across UK primary care.(88,89)
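The removal of identifying fields while retaining gender and year of birth can be sketched as follows. This is an illustration under stated assumptions, not the CPRD's actual procedure: the field names and the example record are invented.

```python
from datetime import date

# Assumption: these field names are illustrative, not the CPRD schema.
IDENTIFYING_FIELDS = {"nhs_number", "name", "date_of_birth", "address", "postcode"}


def anonymise(record):
    """Drop identifying fields, keeping year of birth in place of date of birth."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["year_of_birth"] = record["date_of_birth"].year
    return cleaned


# Made-up example record.
raw = {
    "nhs_number": "0000000000",
    "name": "Example Patient",
    "date_of_birth": date(1960, 5, 17),
    "address": "1 Example Street",
    "postcode": "XX0 0XX",
    "gender": "F",
}
research_record = anonymise(raw)
print(research_record)
```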
The CPRD is organised into 10 file types, each of which contains a subset of the patient record. For research purposes, information from these files can be joined using the anonymised patient or practice identifier. The file types are described in table 3.2.
Table 3.2 Description of Clinical Practice Research Datalink file types
| File type | Contents | Example fields |
| --- | --- | --- |
| Patient | Basic demographics and registration details | Anonymised identifier, year of birth, registration date, transfer out date, death date |
| Practice | Details for all participating practices | Practice identifier, geographical region, date of becoming “up to standard”, date of last data collection |
| Staff | Practice staff details | Staff identifier, gender, role |
| Consultation | Information about consultation type as entered by the GP | Consultation identifier, consultation type, consultation date, staff identifier, consultation duration |
| Clinical | All medical history including symptoms, signs and diagnoses | Date of clinical event, date of data entry, clinical code, episode type, additional details identifier |
| Additional | Details relating to events coded in the clinical file | Patient identifier, entity type, data fields (depends on entity type) |
| Referral | Information about referrals to external care centres | Referral method, referral specialty, referral type, attendance type, referral urgency |
| Immunisation | Details of immunisation records | Immunisation reason, type, stage, status, compound used, location, reason for immunisation, route of administration |
| Test | Test results linked to events coded in the clinical file | Type of test, result, normal range for result, unit of measure |
| Therapy | All prescriptions issued by the GP | CPRD product code, British National Formulary code, product name, dosage, quantity, pack size, number of days prescribed |
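Joining file types on the anonymised patient and practice identifiers can be sketched as below. The column names (`patid`, `pracid`, `yob`, `eventdate`, `code`), the placeholder clinical codes and all rows are assumptions made for illustration. They are not taken from the CPRD data specification.

```python
# Made-up rows for three of the file types in Table 3.2.
patient_file = [
    {"patid": 1001, "pracid": 1, "yob": 1950},
    {"patid": 2002, "pracid": 2, "yob": 1975},
]
practice_file = [
    {"pracid": 1, "region": "Scotland"},
    {"pracid": 2, "region": "Wales"},
]
clinical_file = [
    {"patid": 1001, "eventdate": "2005-03-14", "code": "CODE_A"},
    {"patid": 1001, "eventdate": "2009-11-02", "code": "CODE_B"},
    {"patid": 2002, "eventdate": "2010-01-20", "code": "CODE_A"},
]


def index_by(rows, key):
    """Build a lookup from a unique identifier to its row."""
    return {row[key]: row for row in rows}


def join_clinical(clinical, patients, practices):
    """Attach patient and practice fields to each clinical event."""
    patient_by_id = index_by(patients, "patid")
    practice_by_id = index_by(practices, "pracid")
    joined = []
    for event in clinical:
        patient = patient_by_id[event["patid"]]        # join on patient identifier
        practice = practice_by_id[patient["pracid"]]   # join on practice identifier
        joined.append({**event, "yob": patient["yob"], "region": practice["region"]})
    return joined


events = join_clinical(clinical_file, patient_file, practice_file)
for row in events:
    print(row)
```

Each clinical event keeps its own fields and gains the patient's year of birth and the practice's region, so one patient with several events appears on several rows.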