Genomic and Clinical Data Sharing Policy Questions with Technology and Security Implications:
Consensus Position Statements from the Data Safe Havens Task Team
Delivery date: 18 October 2014
When the Security Working Group (SWG) was asked to expedite the development of a security technology infrastructure specification for the Global Alliance for Genomics and Health, the lack of a uniform privacy and security policy foundation confounded the task. To begin to establish such policy, the SWG posed to the Regulatory and Ethics Working Group (REWG) a set of eight key genomic and clinical data sharing policy questions that carry technology and security implications. To address these questions, the REWG formed the Data Safe Havens Task Team, with members from both working groups. The following constitutes the consensus position of the Data Safe Havens Task Team for each question. The Task Team intends for these position statements to help guide the policy work of the REWG and the technology infrastructure work of the SWG.
Question 1
One can envision a number of ways that Global Alliance for Genomics and Health (GA4GH) participants might make their genomic and clinical data available to other GA4GH participants, and each of these ways implies a different technology architecture. Which of the following most closely matches how you envision GA4GH participants sharing data?
a. Each GA4GH participant that is a data provider will hold and manage its own data and will provide means for other GA4GH participants to query the data, consistent with the data provider’s own privacy and security policy;
b. The GA4GH will help steward one or more shared repositories that hold genomic and clinical data contributed by GA4GH data providers, each of whom will manage its own data sets under its own privacy and security rules; or
c. Some other way (please describe).
Position Statement
The GA4GH as an entity will not hold any data. Rather, it will define policy for responsible stewardship and technology standards to serve as guidance for participants in the GA4GH ecosystem. Some genomic and clinical data shared within the GA4GH ecosystem will reside in large repositories that are managed by a single entity, under a single policy. Some GA4GH participants will hold their own data, managed under their own local rules. And some participants may adopt a hybrid approach wherein certain data are locally managed, and some data subsets are contributed to a shared repository. In general, data used in clinical practice may be more likely to be held locally and protected under local policy. The GA4GH will set forth a high-level set of principles of responsible data sharing to which all participating organizations and individuals should conform. Local rules are acceptable so long as they do not violate the overarching GA4GH mission and the principles and core elements articulated in the Framework for Responsible Sharing of Genomic and Health-Related Data. Layers of policy may apply.
Question 2
One can also envision a number of ways that GA4GH participants might access and use shared genomic and clinical data, and each of these ways implies different security mechanisms. Which of the following most closely matches how you envision GA4GH participants accessing shared data?
a. By searching for and retrieving data, or downloading an entire data set, from one or more repositories made available to GA4GH participants, and analyzing the data using a software application running on the participant’s own computer system;
b. By using a web-based query service that the data holder makes available to GA4GH participants and that returns a response to the user’s query;
c. By executing a software application that analyzes data stored in one or more GA4GH participants’ repositories, without copying any data to the user’s own machine; or
d. Some other way (please describe).
Position Statement
Options A and B will definitely happen, but it is not yet clear how much federation (option C) will be possible. The Regulatory and Ethics Working Group should address the policy implications of options B and C.
Question 3
We assume that identity needs to be managed, and actions attributed, at the level of an individual user (versus group affiliation). That is, each person will log in to the shared resource using her own validated identity; access to applications and data will be controlled based on the authorizations assigned to that identity and its associated role/affiliation; actions will be attributed to that identity; and digital signatures will be at the individual level. Is this a valid assumption?
a. Who will be responsible for identity-proofing individuals prior to giving them access to GA4GH-shared genomic and clinical data?
b. What level of assurance is required for identity proofing? For example, will the individual need to be present in person in order to be identity proofed?
c. Who will issue access credentials (e.g., password, digital certificate) to individuals?
d. Can an individual’s authenticated identity be passed from one server to another? That is, if an individual logs into a software server authorized to access data held by a GA4GH participant, can that server pass that authenticated identity to another server without requiring the user to log in again? If identities are shared in this way, how strong should the initial authentication be to make it trustworthy?
e. Who will identity-proof and issue credentials to software servers that access data held by GA4GH participants?
Position Statement
Yes, this is a valid assumption. The GA4GH may want to investigate the use of third-party identity-proofing options. The GA4GH definitely needs the ability for an individual’s authenticated identity to be passed from one server to another. Also, individuals’ local access authorizations are subject to change, such as when the individual changes roles or leaves the institution. The GA4GH needs a way to assure that any local change in an individual’s authorizations, or deletion of their local account, is propagated across the GA4GH ecosystem. The methods used for identity proofing and authentication, and the levels of assurance provided by those methods, may vary based on local policy and the sensitivity of the data. This variability will make it necessary for information regarding methods to be passed along with the authenticated identity. [1]
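As an illustration only, the requirement that method information travel with a passed identity, and that local revocations propagate, might be sketched as follows. The assurance scale in `LOA_RANK`, the `IdentityAssertion` fields, and the `accept` function are hypothetical names chosen for this sketch; the Task Team does not prescribe any particular scale, format, or protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical assurance scale (assumption): higher rank = stronger proofing.
LOA_RANK = {"self-asserted": 1, "remote-verified": 2, "in-person": 3}


@dataclass(frozen=True)
class IdentityAssertion:
    subject: str           # the individual user, never a group account
    issuer: str            # institution that authenticated the user
    proofing_method: str   # how the identity was proofed (a key of LOA_RANK)
    auth_method: str       # e.g. "password" or "certificate"
    issued_at: datetime    # timezone-aware time of initial authentication
    ttl: timedelta         # how long other servers may rely on the assertion


def accept(assertion, required_proofing, revoked, now):
    """Decide whether a relying server may honor a passed identity.

    `revoked` is a set of (issuer, subject) pairs that the issuing
    institutions have announced as deleted or de-authorized.
    """
    if (assertion.issuer, assertion.subject) in revoked:
        return False  # local account removed or authorization withdrawn
    if now >= assertion.issued_at + assertion.ttl:
        return False  # stale: the user must re-authenticate with the issuer
    # The data holder's policy sets the minimum proofing strength.
    return LOA_RANK[assertion.proofing_method] >= LOA_RANK[required_proofing]
```

A short time-to-live is one possible way to bound how long a revoked account could still be honored by a server that has not yet received the revocation; other designs are equally plausible.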
Question 4
We recognize that data privacy and security laws differ among GA4GH participant jurisdictions. How will these differences be accommodated within a GA4GH privacy and security policy?
a. Will the GA4GH adopt policy that reflects the most restrictive rules among all jurisdictions represented in the GA4GH organizational membership?
b. Will the GA4GH adopt minimal policy, assigning to each GA4GH participant responsibility for assuring compliance with more restrictive applicable law?
c. Will the GA4GH adopt some other type of policy?
d. Will the GA4GH require each participant to formally agree to comply with the GA4GH privacy and security policy?
Position Statement
The GA4GH will certainly need an agreed-upon privacy and security policy, guided in part by the Framework for Responsible Sharing of Genomic and Health-Related Data. Questions that need to be considered include: what technological mechanisms would need to be in place to prevent violations of policy, and who would be responsible for monitoring and enforcing these policies.
While performing an analysis in situ within a country of origin would provide a technological solution to enable simplified policy, practically, it must be assumed that data will pass between countries. Sharing data between countries may be simplified by ensuring that sufficient privacy protections are in place to comply with local data protection regulations. For countries with anonymization policies, a further difficulty is that the anonymizing process may strip the data of their usefulness.
Allowing data to leave countries for a small number of aggregation depots may create a simpler ecosystem with a lower overall policy management burden.
In the end, a privacy and security policy must drive the technological choices and final security architecture when sharing data among entities that may span institutional, geographic, and regulatory boundaries.
[1] The ability to pass authenticated identities makes the strength of the initial authentication even more important with regard to security assurance.
Question 5
What will be the policy regarding the generation and maintenance of an accounting of accesses and uses of genomic and clinical data shared among GA4GH participants?
Will the GA4GH maintain a centralized accounting of data accesses and uses, or will such logs reside with each participant? Who will review this/these log(s)? How will potential breaches and misuses be detected, and to whom will they be reported?
Position Statement
The GA4GH could act as a certification authority, possibly issuing a data safe haven badge of trust.
There will not be any centralized accounting. Who is responsible for reviewing logs matters less than ensuring that the logs are in fact available, reviewable, and in an interoperable common log format.
The GA4GH should develop guidelines on sanctions for breach of policies. A deliberate and material breach of a policy could lead to expulsion from the GA4GH, but notification should first be sent to the member. Options could also include contractual sanctions in a GA4GH data transfer agreement. We anticipate that most breaches will likely be accidental, and it will be important to set a threshold/gradation for seriousness of breaches. Both the airline and financial industries provide good models for incident handling.
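To make the idea of locally held but interoperable logs concrete, the following is a minimal sketch of one possible common record format, written as JSON lines. The field names in `REQUIRED_FIELDS` and the helper functions are assumptions made for illustration; no GA4GH log format is defined in these position statements.

```python
import json
from datetime import datetime, timezone

# Hypothetical minimum field set for an interoperable access record (assumption).
REQUIRED_FIELDS = ("timestamp", "actor", "institution", "action", "resource", "outcome")


def make_record(actor, institution, action, resource, outcome, when=None):
    """Build one access record; `when` must be timezone-aware if given."""
    when = when or datetime.now(timezone.utc)
    return {
        # Normalizing to UTC at one-second precision keeps timestamps
        # comparable as plain strings across participants.
        "timestamp": when.astimezone(timezone.utc).isoformat(timespec="seconds"),
        "actor": actor,
        "institution": institution,
        "action": action,
        "resource": resource,
        "outcome": outcome,
    }


def to_line(record):
    """Serialize a record as one JSON line, rejecting incomplete records."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return json.dumps(record, sort_keys=True)


def merge_logs(*logs):
    """Combine logs held by different participants into one time-ordered list."""
    records = [json.loads(line) for log in logs for line in log]
    return sorted(records, key=lambda r: r["timestamp"])
```

Because each participant keeps its own log, a reviewer investigating a suspected breach would request the relevant files and merge them on demand, which is consistent with there being no centralized accounting.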
Question 6
What types of data will be sharable among GA4GH participants? For example, will demographic data be included? What is the policy for protecting phenotypic data?
Position Statement
Both genomic and clinical data could be shareable within the GA4GH ecosystem. This could be broadened under the umbrella term “health-related data”. Policies for protecting different categories of data, based on sensitivity or other attributes, may be developed by the GA4GH.
Question 7
Will a GA4GH privacy and security policy contain any restrictions around the use of cloud services? For example, are public (i.e. commercial) clouds and private clouds equally acceptable? Does the GA4GH plan to use a community cloud for use only among GA4GH participants? Does the GA4GH plan to certify cloud service providers?
Position Statement
Foundational security protections will be consistent with generally accepted practices on all technology systems, i.e., a need for authentication, encryption, etc. Protections will be implemented at multiple layers (e.g., application layer, operating system layer, and hardware layer) both within local data centers and within virtual facilities providing cloud services. As discussed in Question 5, audit logs of security-relevant events, including data accesses, should be interoperable.
To enable compliance with jurisdictional and institutional policies, the physical location of stored data and application services should be transparent. For example, the virtualization model for cloud services moves data from server to server, data center to data center, and may cross countries or regions. Virtualization that crosses geographical boundaries may be problematic with respect to compliance with applicable laws (e.g. data privacy, intellectual property). Security and privacy policies, as well as consent policies, need to be respected by cloud service providers. A program to certify cloud service providers may be instituted by the GA4GH or a third party.
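If storage locations are made transparent, a data holder could check them mechanically against the jurisdictions its policy permits. The sketch below assumes a hypothetical inventory in which a cloud provider reports the regions holding each data set; the function name, the region labels, and the data shapes are illustrative assumptions, not part of any GA4GH specification.

```python
def location_violations(placements, allowed):
    """Report data sets stored outside their permitted jurisdictions.

    placements: data set name -> list of regions where copies currently reside
    allowed:    data set name -> set of regions permitted by applicable policy
    Returns a dict of data set name -> sorted list of offending regions.
    """
    violations = {}
    for dataset, regions in placements.items():
        # A data set with no declared policy is treated as permitted nowhere.
        offending = sorted(set(regions) - allowed.get(dataset, set()))
        if offending:
            violations[dataset] = offending
    return violations
```

For example, a cohort permitted only in region "EU" but replicated to "EU" and "US" would be reported with "US" as the offending region.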
Question 8
How will appropriate individual consent be obtained, managed, and enforced within the GA4GH community? How will an individual be able to change his or her authorizations for data sharing and use?
a. Each GA4GH participant will be responsible for obtaining unrestricted consent for any individual data shared with any other GA4GH participant, and for terminating sharing of the individual’s data to the GA4GH upon the individual’s request;
b. Each GA4GH participant will be responsible for obtaining the consent necessary for the intended usage before sharing individual data with any other GA4GH participant; for communicating to a recipient any restrictions the individual has placed on the use of those data; and for implementing any authorization changes the individual may make;
c. Each GA4GH participant will be responsible for obtaining consent authorizing the GA4GH to manage access to an individual’s data, and the GA4GH will enable individuals to select (and change) sharing preferences to be enforced within the GA4GH community; or
d. Some other consent scheme.
Position Statement
The entity that makes the data available within the GA4GH ecosystem will be responsible for assuring that, where required, the consent necessary for the intended usage has been obtained before sharing individual data within the GA4GH ecosystem; for assuring that any restrictions the individual has placed on the use of those data are conveyed along with the data; and for communicating any authorization changes the individual may make. The provider of the data service that enables users to access the data is responsible for enforcing these restrictions, and for communicating restrictions to the data recipient.
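The division of responsibility described above, in which the data provider conveys restrictions with the data and the data service enforces them, might look like the following sketch. The class names, the use labels such as "research", and the shape of the returned response are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    permitted_uses: frozenset   # hypothetical labels, e.g. {"research"}
    withdrawn: bool = False     # set when the individual revokes sharing


@dataclass
class SharedRecord:
    record_id: str
    payload: dict
    consent: Consent            # restrictions travel with the data


class DataService:
    """Holds records contributed by data providers and enforces their consent."""

    def __init__(self):
        self._records = {}

    def publish(self, record):
        # The provider is responsible for having obtained consent beforehand.
        self._records[record.record_id] = record

    def update_consent(self, record_id, consent):
        # The provider communicates any authorization change the individual makes.
        self._records[record_id].consent = consent

    def fetch(self, record_id, intended_use):
        record = self._records[record_id]
        consent = record.consent
        if consent.withdrawn or intended_use not in consent.permitted_uses:
            raise PermissionError(f"use {intended_use!r} is not permitted")
        # Restrictions are returned alongside the data for the recipient.
        return {"payload": record.payload,
                "restrictions": sorted(consent.permitted_uses)}
```

Checking consent at every fetch, rather than once at publication, is what allows a later withdrawal to take effect for subsequent requests; data already delivered to a recipient would need a separate mechanism.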
We recognize that privacy and consent laws vary among the countries involved in the GA4GH. We further recognize that institutional privacy and consent policies and practices vary among the institutions that hold and manage genomic and clinical data, including the granularity of permission and authorization rules. Our challenge is to discover and enable mechanisms for sharing data among a broad diversity of geographies and institutions, while adhering to applicable law, policies, and individual preferences.
Also, mechanisms for making legacy collections more efficiently available, which may include clinical information and tissues from deceased individuals, need to be developed.