Creating the linked data needed for the programme required a number of steps to bring the different types of information together. Within NHS Digital, processes are already in place to link HES records and ONS mortality data. The new task for this project was to link ambulance service electronic records with these subsequent health records.
April–November 2012: Data approvals
• CAG (NIGB)
• Ethics
• R&D (YAS and EMAS)
January–July 2013: HES data submission to NHS Digital (formerly HSCIC) and sign off
• Application submitted to NHS Digital in January 2013
• Final sign off from NHS Digital in July 2013
March–November 2013: Ambulance data
• Obtaining, querying and modifying ambulance data sample and submission to NHS Digital
December 2013–August 2014: NHS Digital data processing
• Notification in April 2014 that data would not be delivered. No further correspondence from NHS Digital was received until August 2014
• In August 2014 we were asked to resubmit our application to NHS Digital, renew our CAG approval and obtain an IG toolkit with level 2 approval
September 2014–May 2015: Data approvals
• Resubmission to the CAG on 9 September, with approval granted on 19 December 2014
• IG toolkit submitted on 27 November 2014
• NHS Digital approval process 19 December 2014 to 19 May 2015
May–August 2015: Ambulance data
• Processing data to fit with new NHS Digital data requirements
• Investigation of missing DoB data (required as part of NHS Digital’s data-linking algorithm)
• Uploading data to NHS Digital
August–October 2015: NHS Digital data processing
• Received first batch of linked data on 29 September 2015
October 2015–March 2016: University of Sheffield data cleaning and formatting
• Linking HES and ONS data to ambulance CAD and ePRF
• Creating new variables for analysis
• Analysis of linkage rates. Identified that no ‘hear and treat’ data had linked, along with other non-linked cases
• Obtained improved data from ambulance service in order to rerun data linkage for missing cases
March–October 2016: NHS Digital’s data approvals and data processing
• Submission of improved ambulance data to rerun linkage for non-matched cases
• New DSA and data approvals required to rerun data linkage
• NHS Digital upgraded its systems and was unable to respond to e-mails for 1 month
• First tracing report available to UoS in June 2016
• Final data available to UoS in October 2016
FIGURE 5 Timetable of data permissions and processes for obtaining linked data. CAG, Confidentiality Advisory Group; DoB, date of birth; DSA, data sharing agreement; IG, Information Governance; NIGB, National Information Governance Board; R&D, research and development; UoS, University of Sheffield.
The first step was to retrieve the relevant information from ambulance service CAD and ePRF records. The starting point was all 999 calls received in the relevant time frame. Some calls were excluded at this point, such as attendances with no ePRF, interhospital transfers, calls passed to other ambulance services and duplicate calls for the same incident. The exception to the ePRF requirement was ‘hear and treat’ calls, defined as calls that received input from a clinician (nurse or paramedic) but that have no ePRF record because no ambulance was sent. The following stepwise process was then followed:
• Yorkshire Ambulance Service and EMAS selected and extracted the study data sample, based on all included ambulance service contacts within the specified time period.
• The study ambulance services linked the CAD and ePRF data (except ‘hear and treat’ calls) for all selected ambulance service contacts and produced a linked data set in Microsoft Excel® (Microsoft Corporation, Redmond, WA, USA). These data contain a large number of variables recording details of the patient, call processes, response provided, clinical assessment and treatment.
• The ambulance services assigned a unique ID code to each individual patient record.
• The ambulance services created a version of the data set that contained only the clinical data from the ePRF, non-identifiable emergency call and dispatch information from CAD and the unique ID number. This anonymised file, in the form of a password-protected Excel spreadsheet, was sent via secure encrypted e-mail to the research team at the University of Sheffield.
• The ambulance services created a second version of the data set that contained only the variables required for data linking, including patient-identifiable data. These included, for example, date, time and location of incident, patient name, date of birth, address, hospital attended, the unique ID number and (when available) NHS number. When no NHS number was available, it was traced by NHS Digital. This data set was sent to NHS Digital as a password-protected Excel spreadsheet via NHS Digital’s secure electronic file transfer system.
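The two-file split described in the steps above can be sketched as follows. This is a minimal illustration only, not the ambulance services' actual procedure: the field names, the `study_id` key and the use of a random UUID as the unique ID are all assumptions, because the text does not list the real CAD and ePRF variables or say how the ID was generated.

```python
import uuid

# Hypothetical field names; the real CAD/ePRF variables are not given in the text.
IDENTIFIABLE_FIELDS = {"patient_name", "date_of_birth", "address",
                       "nhs_number", "incident_location"}
LINKAGE_FIELDS = IDENTIFIABLE_FIELDS | {"incident_date", "incident_time",
                                        "hospital_attended"}


def split_for_linkage(records):
    """Return (research_rows, linkage_rows) that share a unique study_id."""
    research_rows, linkage_rows = [], []
    for record in records:
        # Assumption: a random ID that carries no patient information.
        study_id = uuid.uuid4().hex
        # Research file: everything except patient-identifiable fields.
        research = {k: v for k, v in record.items()
                    if k not in IDENTIFIABLE_FIELDS}
        # Linkage file: only the variables needed to link records.
        linkage = {k: v for k, v in record.items() if k in LINKAGE_FIELDS}
        research["study_id"] = study_id
        linkage["study_id"] = study_id
        research_rows.append(research)
        linkage_rows.append(linkage)
    return research_rows, linkage_rows
```

Because both outputs carry the same `study_id`, the clinical file and the identifiable file can travel by separate routes and still be rejoined later without the research team ever holding patient identifiers.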
The next step was to link the ambulance service data with HES and ONS mortality data. This was undertaken by NHS Digital using its data-linking algorithm. This was a deterministic linkage of NHS number, sex, date of birth and postcode using a series of progressive steps [39] to match the same information in one data set with that in another. When the NHS number was unavailable, we used NHS Digital’s NHS number-tracing service to look up NHS numbers using date of birth and patient name. NHS Digital linked ambulance data with a large number of variables from the HES A&E, HES patient admission and ONS death records so that we could identify all patients who subsequently attended an ED, were admitted to hospital or died. The unique patient ID provided by the ambulance service was retained in this linked data set. After all possible records were linked, NHS Digital removed identifiable data and, when necessary, replaced them with a pseudonymised variable; for example, date of birth was transformed into age. The de-identified data were returned to the research team using the same secure transfer processes.
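To illustrate what a deterministic linkage in progressive steps might look like, the sketch below tries the strictest combination of identifiers first and relaxes it for records that remain unmatched. It is an assumption-laden illustration and not NHS Digital's algorithm: the three match passes, the field names and the `hes_id` key are hypothetical, and the sketch treats the hospital data as one row per person.

```python
# Assumed match passes, strictest first. The actual NHS Digital steps are
# described in the cited reference and are not reproduced here.
PASSES = [
    ("nhs_number", "sex", "date_of_birth", "postcode"),
    ("nhs_number", "sex", "date_of_birth"),
    ("sex", "date_of_birth", "postcode"),
]


def _key(record, fields):
    """Build an exact-match key, or None if any field is missing."""
    values = tuple(record.get(field) for field in fields)
    return None if any(v in (None, "") for v in values) else values


def link_deterministic(ambulance, hospital):
    """Map each ambulance study_id to a hospital hes_id, or None if unlinked."""
    links = {}
    unmatched = list(ambulance)
    for fields in PASSES:
        # Index the hospital records on the fields used in this pass.
        index = {}
        for row in hospital:
            key = _key(row, fields)
            if key is not None:
                index.setdefault(key, []).append(row["hes_id"])
        still_unmatched = []
        for row in unmatched:
            candidates = index.get(_key(row, fields), [])
            if len(candidates) == 1:  # accept only unambiguous matches
                links[row["study_id"]] = candidates[0]
            else:
                still_unmatched.append(row)
        unmatched = still_unmatched
    for row in unmatched:
        links[row["study_id"]] = None
    return links
```

A record with no date of birth can never form a complete key in any of these passes, which is consistent with the timetable's note that missing DoB data had to be investigated before submission.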
The final step was for the research team to re-link the clinical and CAD data provided by the ambulance services with the HES and ONS data provided by NHS Digital, using the unique ID number contained in each data set to produce our final linked data set. Figure 6 shows the data flow processes used for workstream 2.
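The re-linking step amounts to a join on the shared unique ID. The following is a minimal sketch under stated assumptions: the `study_id` field name and the example variables are hypothetical, and the `linked` flag is added here only so that a linkage rate can be calculated.

```python
def relink(ambulance_rows, linked_rows, id_field="study_id"):
    """Left-join ambulance rows to the returned HES/ONS rows on the shared ID."""
    by_id = {row[id_field]: row for row in linked_rows}
    merged = []
    for row in ambulance_rows:
        outcome = by_id.get(row[id_field])
        combined = dict(row)  # copy so the input rows are not modified
        if outcome is not None:
            combined.update(outcome)
        combined["linked"] = outcome is not None
        merged.append(combined)
    return merged


def linkage_rate(merged):
    """Proportion of ambulance records that found a linked outcome record."""
    if not merged:
        return 0.0
    return sum(row["linked"] for row in merged) / len(merged)
```

Keeping unlinked ambulance records in the output, rather than dropping them, makes it possible to see which groups of cases failed to link.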
Because of the delays in obtaining linked data, we were unable to obtain the four complete data sets intended in our original plan. The first best-quality data set, received in October 2016, was the EMAS data set for the period January–June 2013. We did subsequently obtain linked data for YAS for the same period and also linked data for both EMAS and YAS for the second period of July–December 2013. However, given the time needed to process these data sets into the formats required for the programme research, we were unable to use them within the time available. These data will be available for further research, but the description below of data processing and of the number of cases included in the linked data used in this programme is confined to the first EMAS data set, which was the only one we were able to fully utilise.
DOI: 10.3310/pgfar07030 PROGRAMME GRANTS FOR APPLIED RESEARCH 2019 VOL. 7 NO. 3
© Queen’s Printer and Controller of HMSO 2019. This work was produced by Turner et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.