Project Title: Linked2Safety Contract No. FP7-288328 Project Coordinator: INTRASOFT International S.A. www.linked2safety-project.eu
Linked2Safety
FP7-288328
A Next-Generation, Secure Linked Data Medical
Information Space for Semantically-Interconnecting
Electronic Health Records and Clinical Trials
Systems Advancing Patients Safety in Clinical
Research
Deliverable D3.3
Public Summary on Interoperable EHR Data Space
Editor(s): Cristi Potlog (SIVECO), Hasapis Panagiotis (INTRASOFT), Dimitris Ntalaperas (UBITECH), David Tian (UNIMAN), John Keane (UNIMAN), Eleni Kamateri (CERTH)
Responsible Partner: SIVECO
Status-Version: Final–v0.5
Date: 03/01/2014
Project Number: FP7-288328
Project Title: Linked2Safety
Title of Deliverable: D3.3 – Public Summary on Interoperable EHR
Data Space
Date of Delivery to the EC: 03/01/2014
Workpackage responsible for the Deliverable:
WP3 – Linked2Safety Interoperable EHR Data Space
Editor(s):
Panagiotis Hasapis (INTRASOFT), Dimitris Ntalaperas (UBITECH), David Tian (UNIMAN), John Keane (UNIMAN), Eleni Kamateri (CERTH)
Contributor(s): SIVECO, UNIMAN, INTRASOFT, UBITECH
Reviewer(s): CERTH
Approved by: All Partners
Abstract: The objective of this document is to define and document the Linked2Safety Common EHR Schema, respecting and aligning with the widely accepted EHR standards, as well as to design a lightweight semantic model that allows the alignment between EHR artefacts and standard medical vocabularies.
Keyword List: Interoperable EHR Data Space, Common EHR Schema, Semantic EHR Schema, Linked Data, Data Cubes
Document Description
Document Revision History
Version | Date | Modifications Introduced (Modification Reason) | Modified by
v0.1 | 10/12/2013 | ToC definition | SIVECO
v0.2 | 17/12/2013 | Submission of partners’ contributions | UNIMAN, INTRASOFT, UBITECH
v0.3 | 18/12/2013 | Creation of first stable draft of the report | SIVECO
v0.4 | 19/12/2013 | Internal review comments | CERTH
v0.5 | 03/01/2014 | Addressing internal review comments |
Contents
1 INTRODUCTION
1.1 DOCUMENT SCOPE
1.2 DOCUMENT STRUCTURE
2 INTEROPERABLE EHR DATA SPACE DESIGN
3 EHR ANONYMITY AND INTEROPERABILITY MECHANISM
3.1 SCHEMA MAPPING AND ALIGNMENT COMPONENT
3.2 TRANSFORMATION TO COMMON EHR SCHEMA COMPONENT
3.3 NON-IDENTIFIABLE DATA CUBES CREATION COMPONENT
3.4 RDFIZER COMPONENT
4 EHR REVISION, ANNOTATION AND ENRICHMENT MECHANISM
4.1 SEMANTIC ANNOTATION AND ENRICHMENT COMPONENT
4.2 PERSONAL DATA SECURITY, PRIVACY AND ANONYMITY POLICIES COMPONENT
5 CONCLUSIONS
List of Figures
FIGURE 1: SCHEMA MAPPING AND ALIGNMENT COMPONENT SCREENSHOT
FIGURE 2: TRANSFORMATION TO COMMON EHR SCHEMA COMPONENT SCREENSHOT
FIGURE 3: AN EXAMPLE OF EXPERIMENTING WITH VALUES FOR PERTURBATION AND CUT-OFF
FIGURE 4: AN EXTRACT FROM FAKE COMMON EHR ALIGNED DATA CONVERTED TO A CSV DATA CUBE
FIGURE 5: SEMANTIC ENRICHMENT BASIC SCREEN
FIGURE 6: BROWSING FOR AN RDF DATA CUBE
FIGURE 7: VISUALISING THE RDF DATA CUBE AS A TREE
FIGURE 8: ANNOTATION MENU
FIGURE 9: POLICY EDITOR
List of Tables
Definitions, Acronyms and Abbreviations
Table 1: Definitions, Acronyms and Abbreviations
Acronym Title
EHR Electronic Health Record
EDC Electronic Data Capture
OLAP Online Analytical Processing
OWL Web Ontology Language
1 Introduction
1.1 Document Scope
The present document constitutes Deliverable “D3.3 – Public Summary on Interoperable EHR Data Space” (henceforth referred to as D3.3) of the Linked2Safety project. The main objective of this deliverable is to provide a public summary of the work undertaken within WP3, namely a report summarizing the Linked2Safety Interoperable EHR Data Space. The deliverable will be published on the project portal for public access.
In this respect, this deliverable contains a summary of the entire work undertaken so far in WP3: Interoperable EHR Data Space, which allows the creation of a network of aligned and locally maintained EHR repositories (with clinical and healthcare data) and EDC databases (with clinical trials system information).
The work within this work package was divided into three tasks:
Task 3.1: Interoperable EHR Data Space Design
Task 3.2: EHR Anonymity and Interoperability Mechanism
Task 3.3: EHR Revision, Annotation and Enrichment Mechanism
The Interoperable EHR Data Space is designed to accept and align EHR-related data and information from up-to-date structured/controlled data sources (potentially any modern EHR repository), while respecting and enforcing patients’ data anonymity.
The objectives of Interoperable EHR Data Space were:
Designing and implementing, according to the produced specifications, the mechanisms and the software toolset of the Interoperable EHR data space;
Enabling the effective and viable utilization of the medical information retrieved from the heterogeneous EHR repositories for the design of clinical trials;
Enabling the alignment, transformation, annotation, interconnection and standardized bridging of the existing EHR and EDC repositories to a common-reference, widely-accepted EHR schema, based on openEHR[8] principles;
Ensuring the anonymity and protection of patients’ personal records (in compliance with the Linked2Safety Data Privacy Framework developed in WP2) by developing the relevant mechanism.
1.2 Document Structure
The document is divided into five major sections, briefly presented below.
Section 1 describes the scope and structure of the document, constituting an introduction to this deliverable.
Section 2 presents the work performed in Task 3.1 - Interoperable EHR Data Space Design.
Section 3 presents the work performed in Task 3.2 - EHR Anonymity and Interoperability Mechanism.
Section 4 presents the work performed in Task 3.3 - EHR Revision, Annotation and Enrichment Mechanism.
Section 5 presents the conclusions of work performed in this work package. At the end of the document there is also a references section.
2 Interoperable EHR Data Space Design
In this task, the software toolset components and the mechanisms comprising the technical design of the Interoperable EHR Data Space (WP3) were specified, and the way in which these components and mechanisms function as a whole was determined.
This task was led by the outcome of Task 1.3, in which the “Linked2Safety Common EHR Schema” was defined. It also took into account the outcomes of Tasks 1.4 and 2.3 and their respective deliverables, in order to achieve Milestone 3 - “Availability of the Linked2Safety Semantic Model and Data Privacy Framework”.
The outcome of this task was documented in D3.1 – “Interoperable EHR Data Space Design”[5].
The Interoperable EHR Data Space has been designed using the following methodology:
Review of the Outcomes of Tasks 1.3, 1.4 and 2.3: As mentioned in the Description of Work, the Interoperable EHR Data Space is designed taking into account the outcomes of Task 1.3 “Linked2Safety Common EHR Schema”, Task 1.4 “Linked2Safety Semantic EHR Model” and Task 2.3 “Linked2Safety Data Privacy Framework and Consent Forms”. Therefore, the outcomes of tasks 1.3, 1.4 and 2.3 are reviewed;
Identification of Components: The software components were identified in a top-down fashion based on the reference architecture of the Linked2Safety platform presented in D1.2 – “Reference Architecture”[2];
Design of Components: Each component is designed using UML, an object-oriented software design approach, in terms of activity diagrams, class diagrams and sequence diagrams. During the design of the components, the outcomes of tasks 1.3, 1.4 and 2.3 are taken into account;
Integration of Components: After completion of the design of each component, all components are integrated using UML activity diagrams, class diagrams and sequence diagrams in a bottom-up fashion to compose the Interoperable EHR Data Space.
Activity diagrams describe the operational step-by-step workflows of components in a system. An activity diagram shows the overall flow of control. A class diagram is a static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among the classes. A sequence diagram is an interaction diagram that shows how processes operate with one another in a given order. It is a construct of a message sequence chart and shows object interactions arranged in time. It depicts the objects and classes involved in a scenario and the sequence of messages exchanged between the objects, needed to carry out the scenario functionality.
Following the design methodology, the Interoperable EHR Data Space consists of the following software components:
Schema Mapping and Alignment Component;
Transformation to Common EHR Schema Component;
Non-identifiable Data Cubes Creation Component;
RDFizer Component;
RDF Cubes Semantic Annotation, Enrichment and Storage Component.
The inputs to the Interoperable EHR Data Space are the schemas and the healthcare instance data of some distributed heterogeneous EHR and EDC databases. The outputs of the Interoperable EHR Data Space are non-identifiable semantically-enriched RDF data cubes which can be linked into the Linked Medical Data Space (WP4).
The requirements and architecture of the Interoperable EHR Data Space, as described by deliverables D1.1 – “Requirements Analysis”[1] and D1.2 – “Reference Architecture”[2], have been translated into detailed specifications. The outcomes of Tasks 1.3, 1.4 and 2.3 have also been taken into account during the design of the Interoperable EHR Data Space, as follows. The Schema Mapping and Alignment component maps and aligns the schemas of the heterogeneous EHR and EDC databases to the Common EHR Schema – D1.3[3] (outcome of Task 1.3 “Linked2Safety Common EHR Schema Definition”). The Non-identifiable Data Cube Creation component creates non-identifiable data cubes based on the data cube schema defined in the Common EHR Schema conforming to the Data Privacy Framework (outcome of Task 2.3 “Linked2Safety Data Privacy Framework and Consent Forms”). The RDF Cubes Semantic Annotation, Enrichment and Storage component semantically enriches RDFized data cubes based on the Semantic EHR Model – D1.4[4] (outcome of Task 1.4 “Linked2Safety Semantic EHR Model”).
3 EHR Anonymity and Interoperability Mechanism
During this task, the software components of the Interoperable EHR Data Space that concern the anonymisation and alignment of patients’ health records were implemented and tested.
The produced mechanism makes possible the seamless mapping, alignment, transformation and standardized bridging of the existing EHR and EDC repositories to a common-reference EHR schema, ensuring the anonymity and protection of patients’ personal records.
The main concept behind this mechanism is the use of integration archetypes, through which most of the received information can be easily converted to typical openEHR[8] format.
The outcome of this task is documented in the first release of the Interoperable EHR Data Space prototype (D3.2a)[6].
3.1 Schema Mapping and Alignment Component
The Schema Mapping and Alignment component incorporates the common mapping and alignment engines required for the automatic transformation of EHRs and EDCs to the common EHR schema.
The Common EHR Schema is used as the baseline schema against which all medical data variables from the clinical partners are mapped. The mapping process is performed by expert users from the clinical partners, which allows them to align their medical data variables to the Common EHR Schema, thus enabling extensive processing of health data coming from heterogeneous sources, regardless of their origin.
This component outputs a mapping file that will be used by the transformation component to automatically transform data to the Common EHR Schema representation that can be stored in the Linked2Safety repository.
The Common EHR Schema is a step towards interoperability and interconnection of healthcare resources based on two interoperability dimensions:
semantics: by using “separation of concerns” to resolve heterogeneity issues (with data and schema) at a conceptual level;
data/schema: by employing interlinking/mapping of cross-domain data/schema (e.g. EHR, clinical trial) following the Linked Data[9] principles.
Figure 1: Schema Mapping and Alignment Component Screenshot
3.2 Transformation to Common EHR Schema Component
The Transformation to Common EHR Schema component performs automatic transformation of EHRs and EDCs to the reference Common EHR Schema. Output instance data is created using the Common EHR Schema, based on the openEHR[8] standard.
This component enables automatic transformation of medical information mapped and aligned to the Linked2Safety Common EHR Schema using the Schema Mapping and Alignment component, by employing custom developed software tools to ease the task of translating between metadata standards, thus bridging the gap between analysis and execution.
This component uses input data from the clinical partners in association with a given mapping file, and outputs data transformed using the given mappings to the Common EHR Schema representation that can be stored in the Linked2Safety repository.
Figure 2: Transformation to Common EHR Schema Component Screenshot
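The transformation step can be pictured with a minimal sketch. The mapping format used here (a dictionary from source variable names to Common EHR Schema variable names), the function name and all variable names are hypothetical; the actual mapping file format produced by the Schema Mapping and Alignment component is not described in this summary. Dropping unmapped variables is likewise an assumption.

```python
# Hypothetical mapping: source variable name -> Common EHR Schema variable name.
# The real mapping file format is not given in this summary.
MAPPING = {
    "body_mass": "BMI",
    "hypertension": "hasHypertension",
}


def transform_record(record, mapping):
    """Rename the mapped variables of one source record; drop unmapped ones."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Fake source records, for illustration only.
source_records = [
    {"body_mass": 27.5, "hypertension": 1, "local_id": "A-17"},
    {"body_mass": 22.0, "hypertension": 0, "local_id": "A-18"},
]
transformed = [transform_record(r, MAPPING) for r in source_records]
print(transformed)
```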
3.3 Non-identifiable Data Cubes Creation Component
Electronic Health Records (EHRs) contain a wealth of medical information. They have the potential to help significantly advance medical research, as well as improve health policies, providing society with additional benefits. However the European healthcare information space is fragmented due to the lack of legal and technical standards, cost effective platforms, and sustainable business models. This project aims to aggregate the data from each data provider’s site, sharing only the aggregated data in the form of data cubes with the rest of the Linked2Safety platform. Therefore the question of how to address legal and ethical considerations associated with confidentiality issues in each data resource in the project is key. The Linked2Safety approach is based upon the creation of aggregated data in the format of data cubes.
In general, a set of records is given as input to the component along with a set of attributes. The attributes are used to define the dimensions (or axes) of a data cube; if the values of an attribute are continuous, the user can define the ranges according to which the values of the attribute are to be discretized.
The algorithm was described in detail in D3.1 – “Interoperable EHR Data Space Design”[5]. Data cubes are created from records described by the Common EHR Schema. The steps of the algorithm (in brief) are the following:
Data from these files are aligned in order to be efficiently parsed by subsequent steps of the algorithm. Those records have been created in the previous step of the transformation process, where the data are aligned in this common format.
For each attribute, the component forms one-dimensional slices of the data cube. The data for each slice come from aggregating the count of patients that have specific values for the corresponding attribute.
The above process is repeated for each attribute.
After the cube is created, a perturbation phase follows, during which random noise is introduced to each cell.
The next phase filters out areas of the cube that contain little or no information. As m can be quite large and as most combinations may have zero or a small count, the cube created in the above step can be sparse.
If the count loss is acceptable, the process asks the user to optionally add notes to the resulting data cube.
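The aggregation part of these steps can be illustrated with a minimal sketch. It is not the algorithm of D3.1: it counts whole combinations of attribute values in a single pass rather than building the cube slice by slice, and the function names, the way ranges are given (ascending upper bounds) and the records are all assumptions made for illustration.

```python
from collections import Counter


def discretize(value, edges):
    """Return the index of the user-defined range that contains value.

    edges are ascending upper bounds; values above the last edge fall
    into one extra, final bin.
    """
    for index, upper in enumerate(edges):
        if value <= upper:
            return index
    return len(edges)


def build_cube(records, attributes, ranges=None):
    """Count records for every observed combination of attribute values."""
    ranges = ranges or {}
    cube = Counter()
    for record in records:
        cell = tuple(
            discretize(record[a], ranges[a]) if a in ranges else record[a]
            for a in attributes
        )
        cube[cell] += 1
    return cube


# Fake records, for illustration only.
records = [
    {"BMI": 22.0, "hasHypertension": 0},
    {"BMI": 31.5, "hasHypertension": 1},
    {"BMI": 33.0, "hasHypertension": 1},
]
cube = build_cube(records, ["BMI", "hasHypertension"], ranges={"BMI": [25.0, 30.0]})
print(dict(cube))
```

Only the aggregated counts per cell leave this step; the individual records are not part of the resulting cube.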
This approach is based upon two techniques that secure the process: adding random noise to the data cubes (perturbation) and applying a threshold on counts (cut-off). To apply the algorithm, appropriate values must be chosen for both techniques. Our reasoning on how to select them (as a base methodology) is presented in the following section as an example.
Selecting appropriate values for perturbation and cut-off
Perturbation (for the purposes of this project) takes as input a parameter referred to as the perturbation range. Perturbation is implemented allowing for both negative and positive perturbation. Thus if the perturbation range is 3, then a random integer will be generated under a normal distribution within the range of [-3, 3] and added to the original value. If the resulting value is negative then it will be replaced with 0.
Cut-off is a straightforward approach to limit the risk of having aggregated values that are so small that it may be possible to identify the subjects to which they relate. Each cell is tested against a pre-set threshold value referred to as the cut-off threshold. If the cell value exceeds the threshold it is reported as is; if not, it is replaced.
To make the methodology easier to follow, an example of selecting appropriate thresholds is given here. The results shown are taken from the Linked2Safety paper published in BIBE 2012¹. The aggregated values that failed to pass the cut-off threshold were replaced with the median between 0 and the threshold. Tests where cut-off was not applied were reported as cut-off with a threshold of 0, as that would cause no changes.
For cut-off, the thresholds tested were 1, 3, 5 and 10. These were selected based on a literature search, as being the most often used values for thresholds. For perturbation, the ranges that were used were [-1, 1], [-3, 3], [-5, 5], [-10, 10], indicated in all graphs by the absolute number of the integer at the edge of the range correspondingly 1, 3, 5, 10.
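The two procedures can be sketched as follows. Several points are assumptions rather than statements of the project's implementation: the text says the noise is drawn "under a normal distribution" within the range but gives no parameters, so the sketch uses a rounded zero-mean normal with standard deviation range / 2 and redraws until the value falls inside the range; perturbation is applied before cut-off; and all function names and cell labels are hypothetical.

```python
import random


def perturb(count, perturbation_range, rng):
    """Add a random integer from [-range, range] to count, flooring at 0.

    Assumption: the noise is a rounded zero-mean normal draw with standard
    deviation range / 2, redrawn until it falls inside the range.
    """
    if perturbation_range == 0:
        return count
    while True:
        noise = round(rng.gauss(0, perturbation_range / 2))
        if -perturbation_range <= noise <= perturbation_range:
            break
    return max(0, count + noise)


def cut_off(count, threshold):
    """Keep counts that exceed the threshold; replace the rest.

    The replacement is threshold / 2, the median between 0 and the threshold.
    """
    return count if count > threshold else threshold / 2


def protect(cube, perturbation_range, threshold, seed=None):
    """Apply perturbation and then cut-off to every cell of a cube."""
    rng = random.Random(seed)
    return {
        cell: cut_off(perturb(count, perturbation_range, rng), threshold)
        for cell, count in cube.items()
    }


# Fake cube, for illustration only.
cube = {("bmi_low", 0): 50, ("bmi_low", 1): 2, ("bmi_high", 1): 12}
print(protect(cube, perturbation_range=3, threshold=5, seed=42))
```

Setting both parameters to 0 leaves the cube numerically unchanged, which matches the convention of reporting unused procedures with a parameter of 0.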
¹ Antoniades, Athos, et al. "The effects of applying cell-suppression and perturbation to aggregated genetic data." Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on. IEEE, 2012.
For both procedures, when they were not used the parameter was set to 0. When both the cut-off and perturbation range were set to 0 the outcome was the one received from analyzing the original data cubes with no perturbation or cut-off applied. All possible unordered combinations of the two procedure parameters were tested.
In order to evaluate the performance of each tested cut-off and perturbation parameter, the analyses were compared with the same analyses performed on the raw data with no perturbation or cut-off applied. Pearson’s correlation is estimated between the analyses performed on the original data and each cut-off and perturbation parameter combination. The threshold for Pearson’s correlation is set a priori: two tests are accepted as having (reasonable) agreement if r > 0.95.
Figure 3: An example of experimenting with values for perturbation and cut-off
The correlation coefficient was estimated between each of the results of a perturbation and cut-off parameter combination and the same analyses on the original data with no cut-off and perturbation. Figure 3 shows the correlation coefficient between the original aggregated data and each of the cut-off threshold and perturbation range parameters. In this table the rows represent the perturbation range set while the columns represent the cut-off threshold set. All Pearson’s correlation coefficients that pass the a-priori defined 0.95 threshold are in bold while the rest are not.
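The acceptance check described above can be sketched as follows. The result vectors are made up for illustration and do not come from the BIBE 2012 study; the function and variable names are likewise hypothetical. Only the 0.95 threshold is taken from the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


AGREEMENT_THRESHOLD = 0.95  # a priori threshold stated in the text

# Made-up analysis results, for illustration only.
original_results = [0.10, 0.40, 0.35, 0.80, 0.55]
protected_results = [0.12, 0.38, 0.36, 0.77, 0.58]

r = pearson(original_results, protected_results)
print(f"r = {r:.3f}, acceptable: {r > AGREEMENT_THRESHOLD}")
```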
In the next figure, we present a sample of a Common EHR instance data file and the corresponding CSV data cube file. Both files are intermediate and are deleted after the RDFization of the CSV data cube file. Note that all data are fake sample data.
<rdf:Description rdf:about="↦ID:d5eb07a7-341a-1e0c-bd37-e9502058e627">
  <cehr:BMI>160.0</cehr:BMI>
  <cehr:hasHypertension>100.0</cehr:hasHypertension>
  <cehr:patientHasDiabetesTypeOne>70.0</cehr:patientHasDiabetesTypeOne>
</rdf:Description>
<rdf:Description rdf:about="↦ID:ffca5844-e1a9-1d35-86e6-bb8015c9283a">
  <cehr:BMI>185.0</cehr:BMI>
  <cehr:hasHypertension>110.0</cehr:hasHypertension>
  <cehr:patientHasDiabetesTypeOne>80.0</cehr:patientHasDiabetesTypeOne>
</rdf:Description>
<rdf:Description rdf:about="↦ID:19a48d68-fe75-1b94-92b3-c90169a1e320">
  <cehr:BMI>120.0</cehr:BMI>
  <cehr:hasHypertension>80.0</cehr:hasHypertension>
  <cehr:patientHasDiabetesTypeOne>76.0</cehr:patientHasDiabetesTypeOne>
</rdf:Description>
<rdf:Description rdf:about="↦ID:d344d5ac-28ba-1887-87b8-8288233c797d">
  <cehr:BMI>175.0</cehr:BMI>
  <cehr:hasHypertension>95.0</cehr:hasHypertension>
  <cehr:patientHasDiabetesTypeOne>78.0</cehr:patientHasDiabetesTypeOne>
</rdf:Description>
[….]

bmi,patienthashypertension,patienthasdiabetestypeone,number
107,64,87,50
109,68,82,32
110,55,67,6
110,60,66,12
110,65,76,23
110,65,78,24
110,70,60,10
[…]
Figure 4: An extract from fake Common EHR aligned data converted to a CSV data cube
3.4 RDFizer Component
The RDFizer component is responsible for converting the aggregated data (CSV data cubes) into RDF data cubes of the same data, adhering to the RDF Data Cube Vocabulary, as described in Deliverable D1.3[3] “Linked2Safety Common EHR Schema”.
The process of the RDFization of a data cube is straightforward and is described in detail in deliverable D3.1[5]. The core element of the RDF Data Cube Vocabulary is the dataset concept, which describes the structure of a data cube as a collection of “observations”. The observation class holds the description of an occurrence number, the coordinates of that number along the dimensions, and the semantics (based on Concepts) of those dimensions; it was also partially extended to handle our Semantic EHR concepts. All dimension names come from a closed list, which effectively consists of the Semantic EHR labels of variables and classes.
In the next figure, we present a sample of a CSV file and the corresponding RDF Turtle file. Note that all data are fake sample data.
bmi,patienthashypertension,patienthasdiabetestypeone,number
107,64,87,50
109,68,82,32
110,55,67,6
110,60,66,12
110,65,76,23
110,65,78,24
110,70,60,10
[...]

<http://www.linked2safety-project.eu/fake/data-cube/snp/1>
    dcterms:date "2013_06_21";
    dcterms:publisher "fakeprovdeddatacubes";
    rdfs:label "Prov";
    qb:structure <http://www.linked2safety-project.eu/fake />;
    a qb:DataSet.
<http://www.linked2safety-project.eu/fake/data-cube-snp/>
    qb:component <http://www.linked2safety-project.eu/fake/data-cube/d2/1>,
        <http://www.linked2safety-project.eu/fake/data-cube/d1/1>,
        <http://www.linked2safety-project.eu/fake/data-cube/d0/1>,
        <http://www.linked2safety-project.eu/fake/data-cube/d3>;
    a qb:DataStructureDefinition.
<http://www.linked2safety-project.eu/fake/data-cube/d2/1> qb:dimension l2s-dim:patienthasdiabetestypeone.
<http://www.linked2safety-project.eu/fake/data-cube/d1/1> qb:dimension l2s-dim:bmi.
<http://www.linked2safety-project.eu/fake/data-cube/d0/1> qb:dimension l2s-dim:patienthashypertension.
<http://www.linked2safety-project.eu/fake/data-cube/d3/1> qb:measure sdmx-measure:Cases.
<http://www.linked2safety-project.eu/fake/data-cube/snp/1/0>
    qb:dataSet <http://www.linked2safety-project.eu/fake/data-cube/snp/1>;
    l2s-dim:bmi "107";
    l2s-dim:patienthashypertension "64";
    l2s-dim:patienthasdiabetestypeone "87";
    sdmx-dimension:Cases "50";
    a qb:Observation.
<http://www.linked2safety-project.eu/fake/data-cube/snp/1/1>
    qb:dataSet <http://www.linked2safety-project.eu/fake/data-cube/snp/1>;
    l2s-dim:bmi "109";
    l2s-dim:patienthashypertension "68";
    l2s-dim:patienthasdiabetestypeone "82";
    sdmx-dimension:Cases "32";
    a qb:Observation.
[...]
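The row-to-observation part of this conversion can be sketched with plain string formatting. This is an illustration only, not the RDFizer implementation: the base URI is modelled on the fake sample, the function name and the `measure_column` parameter are hypothetical, prefix declarations are omitted as in the sample, and the count is written with `sdmx-measure:Cases` (the measure declared in the structure definition), whereas the sample observations use `sdmx-dimension:Cases`.

```python
import csv
import io

# Hypothetical base URI, modelled on the fake sample above.
BASE = "http://www.linked2safety-project.eu/fake/data-cube/snp/1"


def rdfize(csv_text, measure_column="number"):
    """Return one Turtle qb:Observation block per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    blocks = []
    for index, row in enumerate(reader):
        lines = [f"<{BASE}/{index}> qb:dataSet <{BASE}>;"]
        for column, value in row.items():
            if column == measure_column:
                continue  # the count is emitted separately as the measure
            lines.append(f'    l2s-dim:{column} "{value}";')
        lines.append(f'    sdmx-measure:Cases "{row[measure_column]}";')
        lines.append("    a qb:Observation.")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Fake CSV data cube, for illustration only.
csv_text = """bmi,patienthashypertension,patienthasdiabetestypeone,number
107,64,87,50
109,68,82,32
"""
print(rdfize(csv_text))
```

Each CSV row becomes one observation whose dimension properties are named after the CSV columns, which is why the column names need to come from the closed list of Semantic EHR labels.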
4 EHR Revision, Annotation and Enrichment Mechanism
This task implemented a mechanism for editing and annotating the aligned, interoperable EHR and EDC records that are ingested through the anonymity and interoperability mechanisms (delivered through the activities of Task 3.2). The mechanism allows the revision of existing EHR records as well as the creation of new annotations and enrichments with the appropriate, globally available controlled medical vocabularies for the homogenization of the EHR repositories’ content meaning and structure.
The outcome of this task is documented in the second release of the Interoperable EHR Data Space prototype (D3.2b)[7].
4.1 Semantic Annotation and Enrichment Component
The Semantic Annotation and Enrichment Component implements the semantic linking and the enrichment of annotations. Annotations are useful since they can provide additional explanations which may help with interpretation and understanding of data items. In addition, annotations act as sources of additional metadata which can be exploited to improve search and retrieval services, in particular for non-expert users who may be unfamiliar with domain-specific terminology. Lastly, annotations improve reusability and strengthen the applicability of a model by enabling it to interoperate with other existing vocabularies.
The Annotation and Enrichment component enables the semantic annotation and enrichment of the referenced (RDFized) data cubes with the use of the Linked2Safety Semantic EHR Model as well as other globally available clinical and medical ontologies and vocabularies. The produced annotations are incorporated in the “Linked2Safety Semantic EHR Model”[4] by enriching its existing entities (e.g. with the use of the owl:sameAs property).
The "Annotation and Enrichment Mechanism" can be accessed by typing the URL address in which the application resides. The main screen is depicted in Figure 5. The application window consists of the following panels:
The main canvas: here, a user can upload the RDF data cube and see its graphical representation
The annotation panel: the data cube can be semantically enriched by the user by adding the desired annotations
Figure 5: Semantic Enrichment Basic Screen
The main canvas is used for loading a data cube and displaying its data graphically. First, the user presses the "Choose File" button to select a data cube to upload.
Figure 6: Browsing for an RDF data cube
After the data cube is selected, it is presented in tabular form, giving the user a detailed view of the data cube's variables and the corresponding values for all the combinations of aggregation. Once the data cube is loaded, the user may proceed by adding annotations.
Figure 7: Visualising the RDF data cube as a tree
In the annotation panel the data cube can be semantically enriched. The panel is depicted in Figure 8. The annotations added are of two types:
Top level annotations: These annotations concern the whole data set. The user can add a general comment in text form and provide a description about the process by which the data cubes were collected. Furthermore, concepts from the SEHR ontology, namely the ones corresponding to diseases, may be linked to the data cube by selecting them from the check boxes. More than one concept may be applied.
Variable annotations: These are annotations that correspond to each variable of the data cube individually. For each variable, the user may enter annotations concerning information specific to this variable. The way that the data were collected (diagnosed or self reported) is selected by the appropriate drop down menu depicted in Figure 8, whereas further annotations regarding each variable can be entered in the appropriate text areas.
Figure 8: Annotation Menu
After pressing the submit button, an RDF file is saved in the local triplestore (public access room), along with the accompanying dimensions. The following listing presents the outcome of annotating a data cube with information about self-reported variables as well as enriching the data cube (the additional facts inserted into this data cube follow the marker comment in the listing):
<http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://purl.org/linked-data/cube#component>
<http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/2> .
<http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://purl.org/linked-data/cube#component>
<http://www.linked2safety-Project Title: Linked2Safety Contract No. FP7-288328 Project Coordinator: INTRASOFT International S.A. www.linked2safety-project.eu
Page 23 of 29 project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/0> . <http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://purl.org/linked-data/cube#component> <http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/1> . <http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://purl.org/linked-data/cube#component> <http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/4> . <http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#DataStructureDefinition> . <http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/2> <http://purl.org/linked-data/cube#dimension> <http://www.linked2safety-project.eu/properties/patientHasDiabetes> . <http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/0> <http://purl.org/linked-data/cube#dimension> <http://www.linked2safety-project.eu/properties/body_mass_index> . <http://www.linked2safety-project.eu/providerA/data-cube/dim/c29f611c5e0ff36afc3d657b2c35f6e8/1> <http://purl.org/linked-data/cube#dimension> <http://www.linked2safety-project.eu/properties/hasHypertension> . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/0> <http://purl.org/linked-data/cube#dataset> <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/> . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/0> <http://www.linked2safety-project.eu/properties/body_mass_index> "0" . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/0> <http://www.linked2safety-project.eu/properties/hasHypertension> "0" . 
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/0> <http://purl.org/linked-data/sdmx/2009/measure#Cases> "36" . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/1> <http://purl.org/linked-data/cube#dataset> <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/> . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/1> <http://www.linked2safety-project.eu/properties/body_mass_index> "70" . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/1> <http://www.linked2safety-project.eu/properties/hasHypertension> "120" . <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/1> <http://purl.org/linked-data/sdmx/2009/measure#Cases> "5" .
# Semantic Enrichment annotations begin here
<http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://www.w3.org/2000/01/rdf-schema#label> "some text describing the data cube" .
<http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://hcls.deri.ie/l2s/sehr/chuv/1.0#GastrointestinalDiseases> .
<http://www.linked2safety-project.eu/providerA/dc/c29f611c5e0ff36afc3d657b2c35f6e8/> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://hcls.deri.ie/l2s/sehr/chuv/1.0#AllergyImmunologyRheumatology> .
<http://www.linked2safety-project.eu/properties/body_mass_index> <http://www.linked2safety-project.eu/properties/variableComment> "Self Reported" .
<http://www.linked2safety-project.eu/properties/hasHypertension> <http://www.linked2safety-project.eu/properties/variableComment> "Diagnosed" .
<http://purl.org/linked-data/sdmx/2009/measure#Cases> <http://www.w3.org/2000/01/rdf-schema#comment> "A comment regarding the total count" .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/oa#Annotation> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno1> <http://www.w3.org/ns/oa#hasBody> <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body1> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno1> <http://www.w3.org/ns/oa#hasTarget> <http://www.linked2safety-project.eu/properties/body_mass_index> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno1> <http://www.w3.org/ns/oa#annotatedBy> <http://www.linked2safety-project.eu/providerA/smith> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno1> <http://www.w3.org/ns/oa#annotatedAt> "2013-01-28T12:00:00Z" .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body1> <http://www.w3.org/2008/content#chars> "Self Reported" .
<http://www.linked2safety-project.eu/providerA/smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.linked2safety-project.eu/providerA/smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .
# annotation of 2nd property: hasHypertension
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/oa#Annotation> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno2> <http://www.w3.org/ns/oa#hasBody> <http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body2> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno2> <http://www.w3.org/ns/oa#hasTarget> <http://www.linked2safety-project.eu/properties/hasHypertension> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno2> <http://www.w3.org/ns/oa#annotatedBy> <http://www.linked2safety-project.eu/providerA/smith> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/anno2> <http://www.w3.org/ns/oa#annotatedAt> "2013-01-28T12:00:00Z" .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text> .
<http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/body2> <http://www.w3.org/2008/content#chars> "Diagnosed" .
<http://www.linked2safety-project.eu/providerA/smith> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.linked2safety-project.eu/providerA/smith> <http://xmlns.com/foaf/0.1/name> "John Smith" .
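To illustrate how the observation triples of the listing above map back to the cells of a data cube, a minimal Python sketch follows. It assumes a simplified N-Triples subset (one triple per line, only IRIs and plain literals), and the names TRIPLE, read_observations and SAMPLE are hypothetical helpers introduced for this example, not part of the Linked2Safety components.

```python
import re
from collections import defaultdict

# Simplified N-Triples line: <subject> <predicate> (<iri> | "literal") .
# Assumption: no blank nodes, datatypes, language tags or escaped quotes.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*$')

PROPS = "http://www.linked2safety-project.eu/properties/"
CASES = "http://purl.org/linked-data/sdmx/2009/measure#Cases"
CUBE = "http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/"


def read_observations(ntriples: str) -> dict:
    """Return {observation IRI: {dimension name: value, "Cases": count}}."""
    cells = defaultdict(dict)
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank lines, comments and unsupported syntax
        subject, predicate, _iri_object, literal = match.groups()
        if literal is None:
            continue  # IRI-valued triple (e.g. the dataset link), not a cell value
        if predicate == CASES:
            cells[subject]["Cases"] = int(literal)
        elif predicate.startswith(PROPS):
            cells[subject][predicate[len(PROPS):]] = literal
    # Keep only subjects that carry a Cases measure, i.e. the observations.
    return {s: c for s, c in cells.items() if "Cases" in c}


# The two observations of the listing, rebuilt from the constants above.
SAMPLE = f"""
<{CUBE}0> <http://purl.org/linked-data/cube#dataset> <{CUBE}> .
<{CUBE}0> <{PROPS}body_mass_index> "0" .
<{CUBE}0> <{PROPS}hasHypertension> "0" .
<{CUBE}0> <{CASES}> "36" .
<{CUBE}1> <{PROPS}body_mass_index> "70" .
<{CUBE}1> <{PROPS}hasHypertension> "120" .
<{CUBE}1> <{CASES}> "5" .
"""

cells = read_observations(SAMPLE)
for subject, cell in cells.items():
    print(subject.rsplit("/", 1)[-1], cell)
```

Each observation resource thus corresponds to one cell of the cube: its dimension values identify the cell and the Cases measure holds the count.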
Furthermore, each data provider can enrich its RDF data cubes with the metadata needed to enforce access policies. An access policy defines the data to be protected and to whom access is granted or denied. To restrict a policy to data cubes with specific characteristics, the policy specifies a metadata profile that is satisfied only by data cubes annotated with those metadata. An additional set of metadata is therefore required, both to describe data cubes more fully and to let data providers express richer access constraints.
According to the access policy model, the following metadata are needed to describe data cubes:
hasDatacubeTitle defines the title of the data cube
hasDatacubeDescription defines the description of the data cube
hasDatacubeCreationDate defines the point in time associated with the creation of the data cube
hasDatacubeSource defines the source, i.e. the study project from which the data cube has been derived
hasDatacubeProvider defines the entity responsible for making the data cube available (i.e. the owner of the data cube)
hasDatacubeCreator defines the entity responsible for creating the data cube
Moreover, additional information can be given about the study project from which the data cube has been derived:
hasStudyProjectTitle defines the title of the study project from which the data cube has been derived
hasStudyProjectDescription defines the description of the study project
hasStudyProjectLocation defines the location where the study project has been conducted. This usually refers to the location of the healthcare institute that performed the clinical study
hasStudyProjectSponsor defines the sponsor of the study project
hasStudyProjectClinicalSite defines the clinical site/health institution where the study project has been run
hasStudyProjectCoordinator defines the principal investigator of the study project
hasStudyProjectPeriodOfTime defines the period of time during which the study project has been run
Existing widely known vocabularies can be reused instead of defining new concepts for the annotation of RDF data cubes. In particular, the following popular linked data vocabularies are exploited:
The FOAF vocabulary², which describes agents that can be either people or organizations.
The DDI Discovery vocabulary³, which describes research and survey datasets on the Web. This vocabulary could be used to describe the study project from which the data cube has been derived.
The DCMI Metadata Terms vocabulary⁴, which is a specification of all metadata terms used to describe a resource such as the data cube resource.
² http://xmlns.com/foaf/spec/
³ http://rdf-vocabulary.ddialliance.org/discovery.html
⁴ http://dublincore.org/documents/dcmi-terms/
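As an illustration of such reuse, the sketch below describes a data cube with DCMI Metadata Terms and FOAF rather than with project-specific properties. The pairing of each data cube metadata property with a DCMI term is an assumption made for this example, not a mapping defined by Linked2Safety; the title value, the provider IRI and the helper names (DCMI_FOR, term, describe_cube) are likewise invented.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed correspondence between the access-policy metadata and DCMI terms.
DCMI_FOR = {
    "hasDatacubeTitle": DCTERMS + "title",
    "hasDatacubeDescription": DCTERMS + "description",
    "hasDatacubeCreationDate": DCTERMS + "created",
    "hasDatacubeSource": DCTERMS + "source",
    "hasDatacubeProvider": DCTERMS + "publisher",
    "hasDatacubeCreator": DCTERMS + "creator",
}


def term(value: str) -> str:
    """Serialise a value as an IRI if it looks like one, else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_cube(cube_iri: str, metadata: dict) -> list:
    """Return N-Triples lines describing the cube with DCMI terms."""
    return [
        f"<{cube_iri}> <{DCMI_FOR[name]}> {term(value)} ."
        for name, value in metadata.items()
    ]


CUBE = "http://www.linked2safety-project.eu/providerA/data-cube/c29f611c5e0ff36afc3d657b2c35f6e8/"
PROVIDER = "http://www.linked2safety-project.eu/providerA"  # hypothetical agent IRI

lines = describe_cube(CUBE, {
    "hasDatacubeTitle": "Hypertension by BMI",  # invented title
    "hasDatacubeProvider": PROVIDER,
})
# The provider itself is typed with FOAF, which covers people and organizations.
lines.append(f"<{PROVIDER}> <{RDF_TYPE}> <{FOAF}Organization> .")
print("\n".join(lines))
```

Reusing dcterms and foaf terms in this way would let generic linked data tools interpret the cube description without knowing the project's own property names.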
4.2 Personal Data Security, Privacy and Anonymity Policies Component
The Personal Data Security, Privacy and Anonymity Policies Component, also called the Policy Editor Component, is used to create access policies on top of RDF data cubes. Each access policy defines the data to be protected and to whom access is granted or denied.
Access policies can be applied either to specific data cubes (named graphs) or to data cubes sharing common characteristics. In the first case, the policy names the graphs that encapsulate the specific data cubes. In the second case, the data provider defines a set of characteristics (metadata), and access is restricted for all data cubes annotated with them. Furthermore, each access policy defines a user profile, so that access permissions are given only to users matching its attributes: the expertise, the purpose and the qualification (as depicted in Figure 9), as well as other attributes such as the organization, the role and the working area of the user.
Figure 9 depicts the user interface of the Policy Editor along with a predefined set of options. Of particular interest is the Dimension drop-down menu, where the user specifies the level at which the policy will be applied. In the example depicted, the policy is applied to the whole data cube; the user also defines the user profile restrictions that the policy should enforce.
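The policy model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Policy Editor's implementation: the AccessPolicy class, its field names, the exact-match semantics and the sample attribute values are all hypothetical; only the two targeting modes (named graphs or a metadata profile) and the user profile check are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


def matches(required: dict, actual: dict) -> bool:
    """True if every required attribute is present in `actual` with the same value."""
    return all(actual.get(key) == value for key, value in required.items())


@dataclass
class AccessPolicy:
    effect: str                                           # "grant" or "deny"
    named_graphs: set = field(default_factory=set)        # specific data cubes
    metadata_profile: dict = field(default_factory=dict)  # cubes sharing characteristics
    user_profile: dict = field(default_factory=dict)      # e.g. expertise, purpose

    def applies_to_cube(self, graph: str, cube_metadata: dict) -> bool:
        if self.named_graphs:
            return graph in self.named_graphs
        # Assumption: a policy with neither named graphs nor a profile targets nothing.
        return bool(self.metadata_profile) and matches(self.metadata_profile, cube_metadata)

    def decide(self, graph: str, cube_metadata: dict, user: dict) -> Optional[str]:
        """Return the policy's effect, or None when the policy does not apply."""
        if self.applies_to_cube(graph, cube_metadata) and matches(self.user_profile, user):
            return self.effect
        return None


policy = AccessPolicy(
    effect="grant",
    metadata_profile={"hasDatacubeProvider": "providerA"},
    user_profile={"expertise": "cardiology", "purpose": "research"},
)
cube = {"hasDatacubeProvider": "providerA", "hasDatacubeSource": "study-1"}
researcher = {"expertise": "cardiology", "purpose": "research", "role": "analyst"}
visitor = {"expertise": "cardiology", "purpose": "marketing"}

print(policy.decide("graph-1", cube, researcher))  # grant
print(policy.decide("graph-1", cube, visitor))     # None
```

Returning None when a policy does not apply leaves the final decision to whatever combines several policies; how Linked2Safety resolves conflicting or missing policies is not covered by this sketch.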
Figure 9: Policy Editor
5 Conclusions
This deliverable has summarized the work carried out in each of the three tasks of work package 3 of the Linked2Safety project. The first task designed all the software components envisioned by WP3 – Interoperable EHR Data Space, while the remaining two tasks split the work into manageable activities led by the technical partners.
The Interoperable EHR Data Space comprises the following components:
Schema Mapping and Alignment Component
Transformation to Common EHR Schema Component
Non-identifiable Data-cubes Creation Component
RDFizer Component
Semantic Annotation, Enrichment and Storage Component
Personal Data Security, Privacy and Anonymity Policies Component
Figure 10 shows the inputs and outputs of each component as well as the interactions among them.
Figure 10: Interaction of Components of the Interoperable EHR Data Space
All work package 3 deliverables were submitted as planned and on time, including the present one, D3.3 – “Public Summary on Interoperable EHR Data Space”.