Chapter 4 Development of the research cohort data set
4.5 Housing and security
Data were obtained following an application to TPP, which was reviewed internally by their research committee. Following approval, a data extract was prepared by a TPP analyst, and this was delivered through a secure data link. The flow of data is shown in Figure 6.
Figure 6: Chart to illustrate data flow
Abbreviations
LIDA: Leeds Institute for Data Analytics; VRE: Virtual Research Environment; SQL: Structured Query Language; CSV: Comma-Separated Values
All data were housed within a secure Virtual Research Environment (VRE). This is a ‘private cloud’ with limited, secure access and strict protocols for transfer of data in and out. The VRE is managed by a team of data analysts who are responsible for disclosure control, information classification, security, and back-up arrangements. It is accredited to the international standard for information security management, ISO/IEC 27001:2013, and meets the requirements to store health data from NHS Digital, Public Health England and other NHS or social care organisations.277 A data management protocol was completed with input from the data services team, and was approved by the information governance manager for the Leeds Institute for Data Analytics (LIDA). A brief summary of the protocol follows.
4.5.1 Extract from data management protocol
4.5.1.1 Data Collection
What data will you collect or create? Patient records will be extracted from the ResearchOne database. Long-term access will not be allowed, or required. Data will be accessible for up to 5 years.
How will the data be collected or created? Data are extracted electronically from routine primary care records. Data will be transferred electronically.
4.5.1.2 Documentation and Metadata
What documentation and metadata will accompany the data? Some of the data will be coded using controlled terminologies such as ICD, British National Formulary (BNF) and Read, and the appropriate version of these terminologies will be stored with the data.
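As an illustration only, storing terminology versions alongside the data could be done with a small metadata record written next to the extract. The sketch below is not the protocol's actual mechanism: the file name, the field names and the version strings are all hypothetical placeholders, and the real versions would need to be taken from the data provider's documentation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TerminologyRecord:
    """One controlled terminology used to code fields in the extract."""
    name: str     # e.g. "Read", "ICD" or "BNF"
    version: str  # version in force when the extract was coded


def build_metadata(extract_name, terminologies):
    """Return a JSON string describing the coding systems used by an extract."""
    return json.dumps(
        {
            "extract": extract_name,
            "terminologies": [asdict(t) for t in terminologies],
        },
        indent=2,
        sort_keys=True,
    )


# Hypothetical file name and placeholder versions (assumptions, not real values).
sidecar = build_metadata(
    "cohort_extract.csv",
    [
        TerminologyRecord("Read", "VERSION-PLACEHOLDER"),
        TerminologyRecord("ICD", "VERSION-PLACEHOLDER"),
        TerminologyRecord("BNF", "VERSION-PLACEHOLDER"),
    ],
)
print(sidecar)
```

Keeping such a record in the same directory as the extract means that any later analysis can check which release of each terminology the codes refer to.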
4.5.1.3 Ethics and Legal Compliance
How will you manage any ethical issues? The data are de-identified. Routine clinical data will be used. This does not require specific ethical review, as the research is limited to secondary use of information previously collected in the course of normal care without the intention to use it for research at the time of collection. Patients are not identifiable to the research team. The ResearchOne database has NHS Research Ethics Committee and National Information Governance Board approval. The data will be saved securely on the university Integrated Research Campus (IRC).
How will you manage copyright and Intellectual Property Rights issues? Research findings can be freely published without interference, regardless of the nature of the findings. Where the ResearchOne dataset contributes toward any publication or presentation the source must be acknowledged and a copy of any journal or conference publication submitted to the ResearchOne Project Committee.
4.5.1.4 Storage and Backup
How will the data be stored and backed up during the research? The University of Leeds IRC is a secure data management platform. The IRC handles a large volume and variety of data so that it can be used securely and efficiently in research.
Data will be stored on a project-specific VRE on the IRC. The VRE enables data analysis through remote access into a secure virtual desktop, ensuring the data stay within the secure environment. Researchers sign an IRC User Agreement and undertake any required information governance training before being given access to the data through the VRE. Data cannot leave the environment without approval and intervention by the IRC Data Services Team, who check for unauthorised disclosure. Researchers disseminate non-disclosive findings or consented information, and publish these open access where possible. Data are subjected to volume-level snapshots periodically throughout the day and are synchronously replicated to a secondary data centre on campus.
How will you manage access and security? IRC processes are based on international standards and legal requirements for the confidentiality, availability and integrity of data. Data handling procedures are determined by the IRC’s Information Security Management System, which has gained accredited certification to ISO/IEC 27001:2013 and has been assessed as satisfactory against the NHS Information Governance Toolkit. The main risk to data security is re-identification of data subjects, either accidentally or intentionally. The use of a VRE on the IRC significantly reduces this risk. Researchers are not able to introduce additional data to the VRE, which prevents jigsaw attacks that attempt re-identification. Researchers are not able to download data from the VRE themselves, which prevents the release of potentially identifiable data. The platform itself has been designed to be secure in operation, has been penetration tested and undergoes regular patching and vulnerability scanning. Access control is strict: researchers can access only their own projects, and projects are isolated from each other so that data cannot leak between them.
Researchers accessing the IRC are bound by an IRC User Agreement which details their responsibilities. Researchers are also bound by the terms and conditions of their contract with the University of Leeds, and its requirement to be bound by the statutes, ordinances and policies of the institution. Any outputs of data from the VRE will be verified by the IRC Data Services Team as compliant with the relevant legislation, contracts and agreements by which the project is bound, in particular the Data Protection Act 1998. Researchers are also bound by the ResearchOne confidentiality agreement, which contains clauses that confer duties upon the institution and the individual in relation to confidentiality and data protection.
4.5.1.5 Selection and Preservation
Which data are of long-term value and should be retained, shared, and/or preserved? The data must be destroyed after five years by agreement with ResearchOne. The dataset is solely for use on projects that have approval from the ResearchOne Project Committee and relevant ethics and governance bodies.
What is the long-term preservation plan for the dataset? The data must be destroyed after five years by agreement with ResearchOne. The dataset is solely for use on projects that have approval from the ResearchOne Project Committee and relevant ethics and governance bodies.
4.5.1.6 Data Sharing
How will you share the data? These data must not be shared. Other researchers may apply to ResearchOne for the same data. The results of the research will be published in the academic literature, and will form an MD dissertation. The dataset is solely for use on projects that have approval from the ResearchOne Project Committee and relevant ethics and governance bodies.
4.5.1.7 Responsibilities and Resources
Who will be responsible for data management? The data will remain in the IRC, Leeds. Responsibility for good practice lies with each researcher using the dataset. The researchers are under the supervision of Professor Chris Gale (Professor of Cardiovascular Medicine, University of Leeds, and co-supervisor).