Data Standards and Distributed Security Challenges

As CVOs commonly span heterogeneous domains, constructing distributed queries and aggregating, or joining, distributed data requires the development and use of a standard method of classification or common vocabulary. This covers the naming of the data sets themselves, the people involved, and their roles (privileges) with regard to access to and use of these data sets. Preferably, these data sets and roles should be standardised so that comparisons can be made and queries joined together, for example across a range of clinical data sets.

There are several ongoing developments in standards for describing data sets used in the clinical trials domain. However, standardisation can be an involved process that depends on the standards groups developing and acting on strategies put together through major initiatives such as Health Level Seven (HL7) [135], SNOMED-CT [190] and OpenEHR [191]. There is a wide range of legacy data sets and naming conventions which impact upon standardisation processes and their acceptance. The International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) [192] is used for the recording of diseases and health-related problems and is supported by the World Health Organisation. For example, within NHS Scotland, ICD-10 is used along with ICD-9 and Read codes in the SMR data sets.

Linking standardised data descriptions between domains so that entities and relationships within one organisational hierarchy can be mapped or understood within the context of another domain is fundamental to any framework. Once it has been established how meaningful comparisons can be made between the differing domains, the framework can be applied to a generic clinical trial that could run queries across heterogeneous domains, bringing back results, richer in scope and information than if single local sites had been independently queried.
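A minimal sketch of such a mapping is given below in Python, assuming each site publishes a table that translates its local field names into a shared vocabulary. The site names, field names and the to_common and query_all helpers are hypothetical and are not taken from the VOTES implementation; the diagnosis values are illustrative ICD-10 style codes only.

```python
# Hypothetical per-site tables translating local field names to a shared
# vocabulary. Site and field names are illustrative assumptions.
FIELD_MAP = {
    "site_a": {"pat_dob": "date_of_birth", "diag": "diagnosis_code"},
    "site_b": {"birth_date": "date_of_birth", "icd": "diagnosis_code"},
}


def to_common(site, record):
    """Rename local fields to the shared vocabulary; unmapped fields are dropped."""
    mapping = FIELD_MAP[site]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query_all(sites_data, predicate):
    """Apply one predicate to the mapped records of every site."""
    matches = []
    for site, records in sites_data.items():
        for record in records:
            common = to_common(site, record)
            if predicate(common):
                matches.append({"site": site, **common})
    return matches


# Illustrative data only: two sites storing the same facts under different names.
SITES_DATA = {
    "site_a": [{"pat_dob": "1970-01-01", "diag": "C91.0", "ward": "7"}],
    "site_b": [{"birth_date": "1982-05-17", "icd": "I10"}],
}

leukaemia = query_all(SITES_DATA, lambda r: r["diagnosis_code"].startswith("C91"))
```

Because the predicate is written once against the shared vocabulary, the same query can be evaluated at every site without knowledge of local naming conventions. Fields with no agreed mapping (here, the hypothetical "ward" field) are simply not exposed.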

Since information stored in clinical trials is highly sensitive, data obtained or collected must be kept in the strictest confidence and the integrity of the data must be maintained. The exact data should only be revealed to a few roles in a trial. One of the most fundamental challenges of VOTES was to realise the opportunities and benefits that Grid technology can bring to clinical trials while maintaining the high security standards that must be strictly adhered to.

Security policies will naturally differ between local sites, which leads to several challenges when defining and implementing policies that take into account both local and remote security concerns. These include:

• Applying a generic policy that takes into account each local policy or links local policies together using a standard interface;

• Dynamically enforcing these policies so that, for example, restrictions applied by a site not providing pertinent information for a particular query will not impact on the other sites that are involved;

• Building a trust chain that allows local sites to authenticate with a VO and therefore, by proxy, be authenticated to access limited resources at other sites without compromising protected resources at those sites;

• Preventing inference (statistical disclosure) that may arise when data from multiple sources are joined together;

• Maintaining data ownership and enforcing ownership policies regardless of where the data might be moved, stored or used.
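
Two of the challenges above, per-site policy enforcement and inference prevention, can be illustrated with a short Python sketch. It assumes each site expresses its policy as a table from role to releasable fields, and that aggregate counts below a threshold are withheld. The policy tables, the role names, the helper names and the threshold of 5 are all assumptions made for illustration; they are not the VOTES policy model.

```python
# Hypothetical local policies: for each site, the fields each role may see.
SITE_POLICIES = {
    "site_a": {"investigator": {"age_band", "diagnosis_code"}, "nurse": {"age_band"}},
    "site_b": {"investigator": {"diagnosis_code"}},
    "site_c": {"investigator": set()},  # releases nothing to investigators
}


def allowed_fields(site, role):
    """Fields a role may see at a site; unknown sites or roles get nothing."""
    return SITE_POLICIES.get(site, {}).get(role, set())


def filter_record(site, role, record):
    """Keep only the fields that the site's own policy releases to the role."""
    allowed = allowed_fields(site, role)
    return {key: value for key, value in record.items() if key in allowed}


def release(role, records_by_site):
    """Filter each involved site's records using that site's policy only.

    Sites that contribute no records to the query are never consulted, so
    their restrictions do not affect the other sites.
    """
    return {
        site: [filter_record(site, role, record) for record in records]
        for site, records in records_by_site.items()
    }


def suppress_small_counts(counts, threshold=5):
    """Withhold aggregate counts below the threshold (assumed value) to limit inference."""
    return {key: (value if value >= threshold else None) for key, value in counts.items()}
```

In this sketch the restrictive policy of the hypothetical site_c has no effect on a query that involves only site_a and site_b, and a joined result containing a category with very few patients is reported with that count withheld rather than exact.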

8.6 Scenarios

To consider the challenges of clinical trials and provide the context in which DTN is explored, several key scenarios are outlined, together with the requirements these place on secure collaborations. The first scenario presents a representative sequence of interactions demonstrating how recruitment of patients can be ethically achieved. The second and third are overview scenarios for data collection and study management respectively.

Scenario 1 - Patient recruitment: This scenario presents a representative sequence of interactions demonstrating how recruitment of patients can be ethically achieved.

1. A trials coordinator logs into a web portal that provides a visual interface to various CVOs associated with a variety of clinical trials and/or tentative trials. After authenticating, a personalised environment is created based upon the specific role (in this case, that of the trials coordinator) in the CVO and the location from where they are accessing the portal. He/She is only shown the Grid services pertinent to the appropriate trials applicable to him/her, and hence the data sets associated with those services.

2. The trial coordinator wishes to recruit patients for a leukaemia cancer trial. Patient details are available in the local (and secure) databases of hospitals and GPs. Emails are sent to the GP practices or hospitals with information describing the particular trial to be conducted, the general criteria applicable to matching patients and other information, e.g. financial information about participating in the trial. The email contains a link to a Grid service (Leukaemia trial 2006). The GPs and Consultants themselves are described in policies associated with the tentative set up of a CVO, for patient identification and recruitment.

3. It is assumed that the GP/Consultant is interested in entering into the trial, i.e. has matching patients, and follows the attached link. The GP/Consultant may securely access the Grid service either using a username and password combination or using a digital certificate, e.g. an X.509 certificate. In this scenario, it is assumed that X.509 certificates are being used.

4. After extracting more information about the trial from the portal, the GP/Consultant decides to download a signed XML pro-forma pre-designed for the trial. This is a partially completed document describing the main information relevant to the trial as documented in the trial protocol, where the empty fields need to be filled through a query to the GP practice or hospital databases.

5. The signature of the signed pro-forma document is checked to ensure its authenticity and to ensure that it has not been corrupted. If these are both true, the document is used as the basis for an XML query against the GP practice or hospital databases (GPASS supports such an interface). This query might, in turn, result in further information being extracted from other resources.

6. At this point, letters describing the trial to selected patients can be automatically produced. These are used to obtain patient consent before continuing further with the recruitment.

7. The selected patients may then consent to participating in the trial. Note that their letters of consent may be sent directly to the trial coordinator instead of the GP/Consultant as described here.

8. The forms are automatically completed based on the results of the queries to the GP practice or hospital database. The forms are digitally signed and returned to the Grid service for that particular trial (Leukaemia trial 2006).

9. The returned and signed XML document is authenticated. Verification that the sender (the GP/Consultant) is authorised to upload the document is made, e.g. through checking that they were one of the GPs/Consultants contacted initially. The document is validated to ensure its correctness, e.g. by ensuring it satisfies the associated schema and the relevant data fields are meaningfully completed (and match the desired constraints associated with participation in the trial). At this point, the responding GP/Consultant is formally added to the CVO. Further follow-up information may subsequently be sought, e.g. monitoring information related to the selected patients.

10. The completed XML document and the associated meta-data describing the history of how this information was established, by whom, when, and for what trial are uploaded and securely added to the CVO repository for this particular trial.

It is important to note in this scenario that patient consent is sought and given (steps 6 and 7) before patient data is returned to the clinical trials team. Another important aspect is that the GP can decide whether participation might be in the patient’s interest. The patient may ultimately say no and hence is always involved in the consent process.
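
The checks applied to a returned pro-forma in step 9 (authenticity, sender authorisation and field validation) can be sketched as follows. This is an illustration under stated assumptions rather than the VOTES implementation: an HMAC over the document bytes, computed with the Python standard library, stands in for the X.509-based digital signature described in the scenario, and the shared key, the sender identifiers, the element names and the accept_upload helper are all hypothetical.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

SHARED_KEY = b"demo-key"               # assumption: stand-in for certificate-based signing
CONTACTED = {"gp-001", "gp-002"}       # assumption: GPs/Consultants contacted initially
REQUIRED_FIELDS = ("trial_id", "patient_count")  # assumption: fields the pro-forma must fill


def sign(document: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw document bytes."""
    return hmac.new(SHARED_KEY, document, hashlib.sha256).hexdigest()


def accept_upload(sender: str, document: bytes, signature: str):
    """Apply the three checks in order and report the first failure."""
    # 1. Authenticity and integrity: the tag must match the received bytes.
    if not hmac.compare_digest(sign(document), signature):
        return False, "bad signature"
    # 2. Authorisation: only initially contacted senders may upload.
    if sender not in CONTACTED:
        return False, "sender not authorised"
    # 3. Validation: every required element must be present and non-empty.
    root = ET.fromstring(document)
    for name in REQUIRED_FIELDS:
        node = root.find(name)
        if node is None or not (node.text or "").strip():
            return False, f"missing field: {name}"
    return True, "accepted"
```

The signature is checked before the document is parsed, so a corrupted or forged upload is rejected without its content being interpreted. A full implementation would additionally validate the document against the trial's XML schema, which this sketch reduces to a check for required fields.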

Scenario 2 - Data collection: A trial investigator submits a query for data that is generated or stored at different study centres. The types of data to collect will include lab results, patient medical history, trial data from trial databases, code lists for adverse events and follow-up data. Data is expected to be pulled and aggregated from geographically distributed locations, which include hospitals, primary care information systems, mobile centres, PDAs and laptops. Security concerns include statistical disclosure, confidentiality, privacy and data integrity.
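
As a rough illustration of this scenario, the Python sketch below pulls rows from several sources and records where each row came from. It assumes every source can be represented as a callable that accepts a query and returns rows as dictionaries, and that an unreachable source raises ConnectionError; the collect helper and the source names are hypothetical.

```python
def collect(sources, query):
    """Pull matching rows from each source, tagging every row with its origin.

    Returns the aggregated rows and the names of sources that could not be
    reached, so that a partial result is never mistaken for a complete one.
    """
    rows, unreachable = [], []
    for name, fetch in sources.items():
        try:
            for row in fetch(query):
                rows.append({"source": name, **row})
        except ConnectionError:
            unreachable.append(name)
    return rows, unreachable
```

Recording the originating source alongside each row supports the integrity requirement, since the investigator can see which centre supplied which data and which centres (for example, an offline laptop or PDA) did not respond.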

Scenario 3 - Study Management: A steering or data monitoring committee member investigating the adherence/compliance of a study to an agreed protocol requests access to collected data and trial investigators' audit trails. He/She is expected to generate reports based on his/her observations and analysis. He/She is expected to execute statistical programs on data collected from different sources. His/Her requirements include understanding the semantics and structure of collected data. Security requirements here include confidentiality, privacy and data integrity.

8.7 Summary

This chapter reported on the various clinical trials investigations that served as the basis for testing the DTN framework and its implementation. The chapter described how two case studies were carried out, of various magnitudes and scale, to test and validate the security-oriented requirements of collaborative centres in e-Health environments. The trial reviews revealed the need for trust realisation in security-oriented collaborative environments. These reviews formed the basis for the DTN framework, developed and applied for clinical trials. The chapter described how the VOTES project provided a basis for exploring this research work. Background information on VOTES was given including how it was a collaborative

universities of Oxford, Glasgow, Imperial, Nottingham and Leicester. The chapter described how the aim of VOTES was to build a clinical framework that supported a multitude of clinical virtual organisations. Among the findings from this work it was identified that a key need exists for data standardisation and distributed security. The chapter concluded with different scenarios tackling the issue of patient recruitment, data collection and study management in the e-Health clinical trials domain.

This chapter describes the DTN implementation in systems that have been developed to support clinical trials. Two systems developed for VOTES are described and the implementation of DTN in these systems is discussed. The chapter concludes with the performance and evaluation of the DTN implementation.

9.1 Virtual Organisations for Trials and Epidemiological Studies

Successful e-Health research depends on access to, and use of, a wide range of clinical, biomedical, social, geospatial, environmental and other data sets. In large scale, multi-centre clinical studies crossing geographical and organisational divides, the need to access, link and aggregate data securely is core. Whilst the Grid community have come up with a wide variety of technologies that support authentication and authorisation, experiences in the Virtual Organisations for Trials and Epidemiological Studies (VOTES) project have shown that irrespective of the technological advances and capabilities offered by the Grid community, data providers themselves are typically unwilling to provide direct access to their data sets, i.e. through the penetration of the NHS firewalls, for example, from Higher Education / Further Education (HE/FE). This is in addition to the European Union directive that says health providers, like the NHS, should only interact with parties they have explicit contracts with [157], e.g. informed consent [193, 194] given by a patient for his/her data to be accessible by another party is a form of a contract. To this end, prototype systems were developed as part of the VOTES project. These are described in the following sections.

In document Dynamic trust negotiation for decentralised e-health collaborations (Page 154-161)