
Models of Meaning and Models of Use:

Binding Terminology to the EHR – An Approach using OWL

AL Rector MD PhD¹, R Qamar MSc¹ and T Marley MSc²

¹ School of Computer Science, University of Manchester, Manchester M13 9PL, UK

² Salford Health Informatics Research, University of Salford, Salford, UK

ABSTRACT: A method for representing the binding of terminology models (‘ontologies’) to information models using OWL-DL is presented. The binding of SNOMED-CT to the HL7 RIM is taken as an example. The key insight is that the information model is ‘meta’ to the terminology model – i.e. that classes in the terminology model are represented as individuals in the information model.

Introduction

In previous papers [1-3] we have discussed three sorts of models related to clinical care systems, each with its own standards and each developed by different groups at least semi-independently.

• The terminology model or “ontology” – the model of our conceptualisation of the entities in clinical medicine¹, of which the most widely discussed currently is SNOMED-CT².

• The information model – the model of the data structures in the healthcare record or message, typified by the HL7³ standards based on the Reference Information Model (RIM), expressed in variants of UML, or by the family of information models provided by CEN ENV 13606 and OpenEHR⁴, expressed in a mixture of UML and the Archetype Definition Language (ADL)⁵.

• The inference model or action model – the model of what actions should be taken and when: the knowledge for decision support and quality assurance systems.

Looked at from a different perspective, we can view these as:

• A model of meaning – the information we are trying to convey about our understanding of the world of medicine – the terminology model or “ontology”.

• Two models of use – how we structure that information for particular purposes – the model of information and the model of inference.

¹ There is controversy over the use of the term “ontology” and confusion with its use in philosophy. Whether one takes the realist stance that ontologies model the world or the cognitivist stance that they model our conceptualisation of the world is irrelevant to this paper.

² http://www.snomed.org
³ http://www.hl7.org
⁴ http://www.openehr.org
⁵ http://oceaninformatics.biz/

Because the model of meaning and models of use must interact, the interface between them must be clearly specified and testable. Because the models overlap, there are often mutual constraints between them. Because the models are large, factoring them into re-usable submodels is desirable.

In this paper, we sketch the results of experiments to address one part of this problem: defining the interface between models of meaning (terminology models) and information models, which we refer to as the “binding” of terminology to the information model. The methodology uses description logics (DLs) as implemented in the description logic variant of the new “Web Ontology Language” (OWL-DL). Despite its name, we treat OWL simply as a standardised syntax for the underlying description logic. The result is a logical model of the constraints on the information model rather than an ontology. More detailed accounts of description logics and OWL can be found in Baader et al. [4] and in a tutorial by Horridge et al. [5].

This work is part of a larger effort on describing the constraints on combined information and terminology models and their factoring into re-usable submodels. A separate paper [6] in this conference discusses the mechanisms for selecting which codes or terms from the model of meaning to bind to the information model. This paper concerns itself only with how to express the bindings selected. All work reported was performed using the Protégé-OWL tools⁶.

Objectives of the Representation

This effort has had two overall goals:

• To express the content of the information model and terminology model in OWL;

• To express the ‘binding’ between the two models in such a way that:

− there is a clear interface between the model of meaning and the information model, analogous to the API between program modules;

− the binding is expressive enough to capture a) enumerated lists of codes; b) all subcodes of a given code (with or without the root); c) all boolean combinations of a) and b);

− the mutual constraints between the models can be expressed.

Other objectives not reported in detail here include a) representation of all constraints in the HL7 models, including those in the text boxes on the diagrams of RMIMs, CMETs, etc.; b) “unfolding” the model to give a view based on containment analogous to the eventual XML serialisation; and c) factoring the resulting structure into re-usable fragments.

Information models are ‘meta’ to the model of meaning

The key to understanding the relation between the model of meaning and information model is to realise that they represent different kinds of things. The model of meaning represents our conceptualisation of entities in the world; the information model represents the data structures that we use to capture that conceptualisation. The information model refers to a model of our conceptualisation of the world, not to the conceptualisation itself.

This is seen most clearly when we consider negation. For example, it makes no sense to talk of a person who does not have a body temperature (even if the person is dead and the temperature is ambient). By contrast, it is perfectly reasonable to talk about whether or not a form or data structure for describing a person includes their body temperature.

Individuals in the model of meaning represent patients or specific patients’ conditions; individuals in the information model represent classes of patients’ conditions, i.e. they can be seen as proxies for those classes themselves. They therefore correspond to “codes” in typical coding systems.

This fits the practical requirements of representing the constraints in OWL. All constraints (“universal restrictions” in OWL) are of the form “All Ms have property only Ts” or “All Ms have property only {t_1, t_2, …, t_n}”, or boolean combinations of these forms, where M is a submodel and the t_i are individual members of the class T. Were we to treat the information model at the same level as the model of meaning, then the individuals in the information model (the t_i) would represent specific patients’ conditions. Hence, we could only restrict the submodel to a set of specific patients’ conditions, e.g. “John Smith’s diabetes”, which is obviously nonsense. If instead we use two levels, the t_i represent codes for classes of conditions. Hence, restrictions can be to any boolean combination of codes or sets (classes) of codes.
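To make the two-level reading concrete, here is a minimal Python sketch, assuming codes are plain string identifiers and that fillers are checked under a closed-world, unique-name reading (a DL reasoner makes neither assumption by default). All code names are hypothetical. Once the t_i are codes, both the enumerated form and the class form of a universal restriction reduce to membership in a set of codes.

```python
from typing import Iterable, Set


def only_restriction_holds(fillers: Iterable[str], allowed: Set[str]) -> bool:
    """'All Ms have property only ...': every stated filler code must be allowed.

    Vacuously true when there are no fillers, as for an OWL universal restriction.
    """
    return all(code in allowed for code in fillers)


# Enumerated form: "only {t_1, t_2}" (hypothetical code individuals)
enumerated = {"diabetes_type_1_code", "diabetes_type_2_code"}

# Class form: "only T", where T is a class of codes such as "diabetes or its subcodes"
diabetes_or_its_subcodes = {"diabetes_code", "diabetes_type_1_code", "diabetes_type_2_code"}

print(only_restriction_holds(["diabetes_type_1_code"], enumerated))  # True
print(only_restriction_holds(["asthma_code"], diabetes_or_its_subcodes))  # False
```

Because the fillers are codes rather than patients' conditions, the same check works whether the allowed set was written out by hand or derived from a class of codes.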

In OWL-DL, we cannot reason about the model and meta-model simultaneously (although this may eventually be possible, within limits, in experimental extensions [7])⁷. We must therefore apply the reasoner in stages: a) first classify and check the model of meaning; b) project the results to a set of individual ‘codes’ in the information model; c) classify and check the information model. However, this is not a handicap, since it corresponds to normal practice in healthcare, which is to deal with the information model (e.g. HL7) and the terminology model (e.g. SNOMED) separately.
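The staging can be sketched as follows, assuming the model of meaning has already been classified and reduced to a map from each class to its direct superclasses. The acyclicity test is only a stand-in for stage a), and all class names are hypothetical. Stage b) turns each class into a code individual and each subclass link into a has_subcode assertion; stage c) would then work only with these individuals.

```python
from typing import Dict, List, Set, Tuple

Hierarchy = Dict[str, List[str]]  # class -> direct superclasses (already classified)


def check_model_of_meaning(h: Hierarchy) -> None:
    """Stage a) stand-in: reject cycles in the subclass graph."""
    state: Dict[str, int] = {}  # 0 = being visited, 1 = finished

    def visit(c: str) -> None:
        if state.get(c) == 1:
            return
        if state.get(c) == 0:
            raise ValueError(f"subclass cycle through {c}")
        state[c] = 0
        for parent in h.get(c, []):
            visit(parent)
        state[c] = 1

    for c in h:
        visit(c)


def project_to_codes(h: Hierarchy) -> Set[Tuple[str, str]]:
    """Stage b): each class becomes the individual '<class>_code'; each subclass
    link becomes a has_subcode assertion (parent code, child code)."""
    return {(f"{parent}_code", f"{child}_code")
            for child, parents in h.items() for parent in parents}


meaning: Hierarchy = {
    "Diabetes": ["Disorder"],
    "Diabetes_type_1": ["Diabetes"],
    "Diabetes_type_2": ["Diabetes"],
}
check_model_of_meaning(meaning)          # stage a)
has_subcode = project_to_codes(meaning)  # stage b)
# Stage c) would classify and check the information model against these individuals only.
for parent_code, child_code in sorted(has_subcode):
    print(parent_code, "has_subcode", child_code)
```

The point of the separation is that no single reasoning step ever sees both the classes and their code proxies at once.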

⁷ The reasons for this ultimately involve avoiding the paradoxes of self-reference – see e.g. Sainsbury, R.M., Paradoxes. 1987, Cambridge University Press.


The relation between the models is shown diagrammatically in Figure 1 using the example of SNOMED and HL7. The model of meaning is shown in the upper half as subclasses of patient’s conditions (black dots). This corresponds to the SNOMED logical or “stated” form. These subclasses are “projected” to the information model as a network of individuals (“codes”) connected by the property has_subcode, which mirrors the subclass relation in the model of meaning. This corresponds to the SNOMED representation in HL7 Code phrases, or its “delivery form”. The information model (RIM) classes are shown in the lower left. The specialisation of “Observation” for “Diabetes” is bound to the set of “Diabetes or its subcodes”, represented by the light oval; the specialisation of “Observation” for “Diabetes type 1” is bound just to the individual code “Diabetes type 1”. We can use a reasoner to check the consistency of the bottom half and the top half separately, but never to check the two halves together.
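A minimal sketch of the lower half of Figure 1, assuming hypothetical code names and a dictionary for the has_subcode network. “Diabetes or its subcodes” is computed as a transitive closure, and each Observation specialisation is bound either to that set or to a single code.

```python
from typing import Dict, List, Set

# has_subcode network among code individuals (parent -> direct subcodes)
has_subcode: Dict[str, Set[str]] = {
    "Diabetes_code": {"Diabetes_type_1_code", "Diabetes_type_2_code"},
    "Diabetes_type_1_code": set(),
    "Diabetes_type_2_code": set(),
}


def code_or_its_subcodes(root: str) -> Set[str]:
    """The root code plus every code reachable from it via has_subcode."""
    result: Set[str] = set()
    stack: List[str] = [root]
    while stack:
        code = stack.pop()
        if code not in result:
            result.add(code)
            stack.extend(has_subcode.get(code, ()))
    return result


# Bindings of two hypothetical Observation specialisations
bindings: Dict[str, Set[str]] = {
    "Diabetes_observation": code_or_its_subcodes("Diabetes_code"),  # the light oval
    "Diabetes_type_1_observation": {"Diabetes_type_1_code"},        # a single code
}


def conforms(observation_class: str, code: str) -> bool:
    """Does a code recorded in an instance satisfy the class's binding?"""
    return code in bindings[observation_class]


print(conforms("Diabetes_observation", "Diabetes_type_2_code"))         # True
print(conforms("Diabetes_type_1_observation", "Diabetes_type_2_code"))  # False
```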

(It is worth noting that this mechanism is analogous to that advocated by SKOS, the Simple Knowledge Organisation System⁸, for representing other thesauri, and to methods discussed by the Semantic Web Best Practices Working Group; see option 3 in Noy [8].)

Representing the models in OWL

Representing the HL7 Reference Information Model (RIM) or the OpenEHR reference model in OWL is straightforward. We use a slightly simplified variant of the mapping described by OMG [9], with the additional assumption of a common directionality for associations, e.g. for the RIM from Act to Entity via Participation and Role. The class and property hierarchies for a fragment of the HL7 RIM are shown in Figure 2.

All HL7 classes, submodels and specialisations, from the RIM through RMIMs, DMIMs, CMETs etc., are represented as subclasses of the appropriate RIM class. Because OWL requires special properties for concrete data types – integers, strings, etc. – these are encapsulated in “holder” classes such as Text_holder and Date_holder (Figure 2a). A separate class, Code, is used to represent coded data types.

Since OWL makes no distinction analogous to UML’s distinction between associations and attributes, and because HL7 includes complex data types, the property hierarchy is used to distinguish between attributes, associations and “data type items”. Because of their importance, we provide a separate class Code and a separate branch of the property hierarchy, has_code_item, for the associated properties.

⁸ http://www.w3.org/2004/02/skos/

The representation of the model of meaning itself – the “stated form” of SNOMED-CT – does not concern us here. What is important is its meta-model in the information model. Each code is represented by an individual of type Code, linked to its super- and sub-codes by the property has_subcode. The information on its name, preferred term, etc. is represented by the datatype properties in Figure 2c.

For each code, the class of that “code or any of its subcodes” can then be defined in the information model. The subclass hierarchy of these “code classes” will parallel the subclass hierarchy in the model of meaning. However, the link between the two models is only indirect.

Info_model_entity
–Info_model_class
––Act_class
––Act_relation_class
––Participation_class
––…
–Data_type
––Structured_data_type
––Text_holder
––Date_holder
––…
–Code_entity
––Code
–––Internal_code
––––Code_sys_ID
––––Structural_code
––––…
–––External_code
––––SNCT_code
––––…
––Qualifier
––Placeholder_code

Figure 2a: Basic class hierarchy

Property               Domain                 Range                  Card
has_info_item
–has_association       Info_model_class       Info_model_class
––has_participation    Act_class              Participation_class
––has_act_relation     Act_class              Act_relation_class
–has_attribute         Act_class              Data_type OR Code
–has_datatype_item     Structured_data_type   Data_type
–has_code_item
––has_code             Act_class              Code                   *..1
––has_name_code        Qualifier              Code                   *..1
––has_value_code       Qualifier              Code                   *..1
––has_qualifier        Code OR Qualifier      Qualifier              *..1
––has_Code_sys_ID      Code                   Code_sys_ID            *..1
…
has_subcode            Code                   Code

Figure 2b: Property hierarchy showing domain, range, and cardinality (where not *..*)

has_code_id           text
has_term              text
has_preferred_term    text
has_synonym           text
…

Figure 2c: Example data type properties to apply to individual codes.
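The property hierarchy of Figure 2b can be mirrored as plain data to show why a statement about has_info_item reaches every association and coded item beneath it. This is a simplified sketch: domains and ranges are single class names, OR-ranges and several properties are omitted, and the ANCESTORS table is a hypothetical excerpt of Figure 2a. An OWL reasoner uses domains to infer class membership rather than to reject assertions, so in_domain is a modelling-discipline check, not OWL semantics.

```python
from typing import Dict, Optional, Set, Tuple

# property -> (parent property, domain, range); None means "not declared here"
PROPERTIES: Dict[str, Tuple[Optional[str], Optional[str], Optional[str]]] = {
    "has_info_item": (None, None, None),
    "has_association": ("has_info_item", "Info_model_class", "Info_model_class"),
    "has_participation": ("has_association", "Act_class", "Participation_class"),
    "has_act_relation": ("has_association", "Act_class", "Act_relation_class"),
    "has_code_item": ("has_info_item", None, None),
    "has_code": ("has_code_item", "Act_class", "Code"),
    "has_name_code": ("has_code_item", "Qualifier", "Code"),
    "has_value_code": ("has_code_item", "Qualifier", "Code"),
    "has_subcode": (None, "Code", "Code"),  # outside the has_info_item branch
}

# Hypothetical excerpt of Figure 2a: each class mapped to itself and its superclasses
ANCESTORS: Dict[str, Set[str]] = {
    "Act_class": {"Act_class", "Info_model_class", "Info_model_entity"},
    "Qualifier": {"Qualifier", "Code_entity", "Info_model_entity"},
}


def subproperties(prop: str) -> Set[str]:
    """All properties below prop in the hierarchy (excluding prop itself)."""
    direct = {p for p, (parent, _, _) in PROPERTIES.items() if parent == prop}
    result = set(direct)
    for p in direct:
        result |= subproperties(p)
    return result


def in_domain(prop: str, cls: str) -> bool:
    """True if cls falls under the property's declared domain (or none is declared)."""
    _, domain, _ = PROPERTIES[prop]
    return domain is None or domain in ANCESTORS.get(cls, {cls})


below = subproperties("has_info_item")
print("has_participation" in below, "has_subcode" in below)  # True False
print(in_domain("has_code", "Act_class"), in_domain("has_code", "Qualifier"))  # True False
```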

To bind the codes to the information model, we borrow from ADL the notion of a “Placeholder code”, which is used in the information model when it is created and later “bound” to codes by an equivalence axiom.

Outline Procedure

An example result of binding an extract of a CMET to a set of codes is given in Figure 3, in the abbreviated OWL syntax used in the Protégé-OWL tool, and summarised in Figure 4. Step by step, the procedure to generate such representations is as follows. For each submodel:

1. Create a subclass of the relevant Info_model_class, e.g. Act_class or a previously defined subclass of Act_class.

2. Enter the associations and attributes for the class as OWL restrictions, using the corresponding property with the min and max cardinalities given in the submodel. If there are cardinality constraints on the inverse, enter the reciprocal restrictions on the target class of the association using the appropriate inverse properties. Ensure that each class lies within the domain of every property used to restrict it, and that each filler lies within that property’s range.

3. For each occurrence of a code, insert a newly created subclass of Placeholder_code with a convenient name. (Do not make the placeholder codes disjoint or place them in a hierarchy.)
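Steps 1-3 can be sketched as a small builder that records a submodel and emits lines in the simplified syntax of Figure 3, with subClassOf spelled out. The Submodel class and its methods are hypothetical helpers for illustration; the example reuses names from Figure 3a.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Restriction = Tuple[str, int, Optional[int], str]  # (property, min, max or None, filler)


@dataclass
class Submodel:
    name: str
    parent: str  # step 1: the Info_model_class (or subclass) being specialised
    restrictions: List[Restriction] = field(default_factory=list)
    placeholders: List[str] = field(default_factory=list)

    def restrict(self, prop: str, lo: int, hi: Optional[int], filler: str) -> None:
        """Step 2: an association or attribute with the submodel's cardinalities."""
        self.restrictions.append((prop, lo, hi, filler))

    def coded_slot(self, prop: str, lo: int, hi: Optional[int], stem: str) -> str:
        """Step 3: a fresh Placeholder_code subclass for this occurrence of a code."""
        placeholder = f"{stem}_placeholder"
        self.placeholders.append(placeholder)
        self.restrict(prop, lo, hi, placeholder)
        return placeholder

    def render(self) -> List[str]:
        """Emit the restrictions in the simplified syntax (EXACTLY / MIN / MAX)."""
        lines = [f"{self.name} subClassOf {self.parent}"]
        for prop, lo, hi, filler in self.restrictions:
            if hi is not None and lo == hi:
                lines.append(f"  {prop} EXACTLY {lo} {filler}")
                continue
            if lo > 0:
                lines.append(f"  {prop} MIN {lo} {filler}")
            if hi is not None:
                lines.append(f"  {prop} MAX {hi} {filler}")
        # Placeholders are plain siblings: no hierarchy and no disjointness axioms.
        lines += [f"{p} subClassOf Placeholder_code" for p in self.placeholders]
        return lines


m = Submodel("RequestMedicationAdministrate", "Substance_administration_act_class")
m.restrict("has_participant", 1, 1, "Medication_consumable")
m.coded_slot("has_status_code", 1, 1, "Request_compatible_act_code")
print("\n".join(m.render()))
```

Keeping each placeholder as a bare sibling class is what lets the binding be supplied later without touching the submodel.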

To represent the codes, identify the ‘placeholder codes’ mentioned in the restrictions. These will form the interface to the coding system. (In the case of templates written in ADL, these should appear in the “ontologies” section.) We assume a separate interface to a terminology server for the bulk of the codes, so only the interface codes need to be represented in the information model proper.

Next, create the code representations using the following simplification that improves computational efficiency by not representing the has_subcode links explicitly:

1. Ensure that the parent code for each coding system is present, e.g. SNCT_code, and that it carries a restriction using has_Code_sys_ID to the appropriate coding system ID code.

2. For each code in the interface, create a subclass of the relevant code class in parallel with the subclass hierarchy of the coding system. Give the new class a name derived from the coding system, suffixed with “_code_or_its_subcodes”.

3. Create exactly one direct individual for each of the classes just created; give it the same name stem with the suffix “_code”. Use the properties has_code_id, has_term, etc. to add the concrete information that links the individual to the originating coding system.
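The three code-representation steps might be sketched as below. The system ID code snct_sys_id_code, the code ids (left as <id-n> placeholders rather than real SNOMED-CT identifiers) and the terms are hypothetical, and parent codes are assumed to be listed before their children. As in the simplification above, no has_subcode links are asserted; the subclass hierarchy of the _code_or_its_subcodes classes carries that information.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (name stem, parent stem or None for a top-level interface code, code id, term)
Entry = Tuple[str, Optional[str], str, str]


@dataclass
class CodeClass:
    name: str    # "<stem>_code_or_its_subcodes"
    parent: str  # mirrors the coding system's own hierarchy


@dataclass
class CodeIndividual:
    name: str         # "<stem>_code"
    direct_type: str  # its single direct class
    has_code_id: str
    has_term: str


def represent_codes(system_root: str, system_id_code: str,
                    interface: List[Entry]) -> Tuple[str, List[CodeClass], List[CodeIndividual]]:
    # Step 1: the coding system's parent code carries a has_Code_sys_ID restriction.
    root_axiom = f"{system_root} subClassOf has_Code_sys_ID VALUE {system_id_code}"
    classes: List[CodeClass] = []
    individuals: List[CodeIndividual] = []
    for stem, parent_stem, code_id, term in interface:
        # Step 2: one class per interface code, parallel to the source hierarchy.
        parent = f"{parent_stem}_code_or_its_subcodes" if parent_stem else system_root
        cls = CodeClass(f"{stem}_code_or_its_subcodes", parent)
        classes.append(cls)
        # Step 3: exactly one direct individual, linked back to the source system.
        individuals.append(CodeIndividual(f"{stem}_code", cls.name, code_id, term))
    return root_axiom, classes, individuals


root_axiom, classes, individuals = represent_codes(
    "SNCT_code", "snct_sys_id_code",
    [("SNCT_diabetes", None, "<id-1>", "Diabetes mellitus"),
     ("SNCT_diabetes_type_1", "SNCT_diabetes", "<id-2>", "Type 1 diabetes mellitus")])
print(root_axiom)
for c in classes:
    print(c.name, "subClassOf", c.parent)
```

Because SNCT_diabetes_type_1_code is a member of a subclass of SNCT_diabetes_code_or_its_subcodes, it is also a member of that class, which is exactly the “or its subcodes” reading.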

Finally, bind the external codes to the placeholder codes by equivalence axioms:

1. For each placeholder code class, create a new class with the same base name suffixed with “_binding_axiom”. (Internal codes may either be entered directly or bound via placeholders, as preferred.)

2. Create two equivalent class axioms – represented in Protégé-OWL as sets of necessary and sufficient conditions – one simply to the placeholder class, the other to an expression specifying the required binding to the external coding system. A typical presentation is shown in Figure 3.
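A sketch of the binding step, assuming binding expressions are treated as opaque strings. The helper binding_axioms applies the naming rule from step 1, and a union-find over the two equivalences shows the placeholder ending up in the same equivalence class as the external expression, which is the effect the reasoner achieves when it classifies the model. Both helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Axiom = Tuple[str, str]  # (named class, equivalent class or expression)


def binding_axioms(placeholder: str, expression: str) -> List[Axiom]:
    """Step 1 naming rule, then the two equivalences of step 2."""
    suffix = "_placeholder"
    base = placeholder[: -len(suffix)] if placeholder.endswith(suffix) else placeholder
    axiom_class = f"{base}_binding_axiom"
    return [(axiom_class, placeholder), (axiom_class, expression)]


def equivalence_classes(axioms: List[Axiom]) -> Dict[str, str]:
    """Union-find over stated equivalences: maps each term to a representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in axioms:
        parent[find(a)] = find(b)
    return {x: find(x) for x in list(parent)}


axioms = binding_axioms(
    "Req_for_Med_admin_placeholder",
    "(SNCT_medication_admin_act_code_or_its_subcodes AND "
    "SNCT_request_context_code_or_its_subcodes)")
rep = equivalence_classes(axioms)
print(axioms[0][0])  # Req_for_Med_admin_binding_axiom
print(rep["Req_for_Med_admin_placeholder"] == rep[axioms[1][1]])  # True
```

Keeping the axiom class separate means the placeholder module and the external-code module never refer to each other directly.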

The expressions in the final step can be any boolean combination of:

• an enumerated list of codes – represented as an enumerated nominal: {c1_code, c2_code, …}

• classes of the form C_or_its_subcodes.

Although more elaborate than the minimum logically necessary, this procedure keeps the binding axioms strictly separated from both the placeholders and external codes. If desired, each can reside in a separate module and each can be separately annotated⁹.
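The boolean combinations of binding expressions can be sketched as a small expression tree whose extension is a set of codes. The has_subcode network is hypothetical, and Not is interpreted as complement within the known codes, a closed-world simplification of OWL’s complementOf. The example also shows how “all subcodes without the root” follows from the same primitives.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Union

# Hypothetical has_subcode network (parent code -> direct subcodes)
HAS_SUBCODE: Dict[str, Set[str]] = {
    "Disorder_code": {"Diabetes_code", "Asthma_code"},
    "Diabetes_code": {"Diabetes_type_1_code", "Diabetes_type_2_code"},
}
ALL_CODES: FrozenSet[str] = frozenset(HAS_SUBCODE) | frozenset(
    c for subs in HAS_SUBCODE.values() for c in subs)


@dataclass(frozen=True)
class Enumerated:  # {c1_code, c2_code, ...}
    codes: FrozenSet[str]


@dataclass(frozen=True)
class OrItsSubcodes:  # C_or_its_subcodes
    root: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:  # complement, taken relative to ALL_CODES
    arg: "Expr"


Expr = Union[Enumerated, OrItsSubcodes, And, Or, Not]


def extension(e: Expr) -> FrozenSet[str]:
    """The set of codes an expression denotes."""
    if isinstance(e, Enumerated):
        return e.codes
    if isinstance(e, OrItsSubcodes):
        seen: Set[str] = set()
        stack: List[str] = [e.root]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(HAS_SUBCODE.get(c, ()))
        return frozenset(seen)
    if isinstance(e, And):
        return extension(e.left) & extension(e.right)
    if isinstance(e, Or):
        return extension(e.left) | extension(e.right)
    if isinstance(e, Not):
        return ALL_CODES - extension(e.arg)
    raise TypeError(f"unknown expression {e!r}")


# Strict subcodes of Diabetes (root excluded), or Asthma
binding = Or(And(OrItsSubcodes("Diabetes_code"),
                 Not(Enumerated(frozenset({"Diabetes_code"})))),
             Enumerated(frozenset({"Asthma_code"})))
print(sorted(extension(binding)))
# ['Asthma_code', 'Diabetes_type_1_code', 'Diabetes_type_2_code']
```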

We make the simplifying assumption that an HL7 request simply requires a SNOMED context of “Request” (request_code). Note that this approach allows the two different aspects of the SNOMED code – procedure and context – to be factored and expressed separately. Note also that using a placeholder for the internal code, Request_compatible_act_code_placeholder, allows us to define a subset of act codes for this binding that might be re-used in another submodel.

⁹ The current OWL standard does not support annotation of individual axioms, although the OWL 1.1 extension currently in preparation will do so.

RequestMedicationAdministrate ⊑ Substance_administration_act_class
  has_participant EXACTLY 1 Medication_consumable
  has_act_id VALUE hl7IIGlobal_code
  has_mood_code EXACTLY 1 RQO_code_or_its_subcodes
  has_status_code EXACTLY 1 Request_compatible_act_code_placeholder
  …
  has_code SOMEANDONLY Req_for_Med_admin_placeholder

Figure 3a: An extract of the CMET class for Request for Medication Administration

Request_medication_binding_axiom_snct ≡ Req_for_Med_admin_placeholder
  ≡ (SNCT_medication_admin_act_code_or_its_subcodes AND SNCT_request_context_code_or_its_subcodes)

SNCT_request_context_code_or_its_subcodes ≡ SNCT_code AND has_qualifier SOME
  (Qualifier AND has_name_code VALUE procedure_request_code AND has_value_code VALUE request_code)

Request_compatible_act_code_placeholder ≡
  (Act_aborted_code_or_its_subcodes OR Act_active_code_or_its_subcodes OR Act_completed_code_or_its_subcodes)

Figure 3b: Binding axioms and part of the specification for a SNOMED code.

OWL abstract syntax    Simplified syntax
someValuesFrom         SOME
allValuesFrom          ONLY
minCardinality         MIN
maxCardinality         MAX
cardinality            EXACTLY
intersectionOf         AND
unionOf                OR
equivalentClasses      ≡
subClassOf             ⊑ (“implies”)
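The factoring of procedure and context can be sketched with a closed-world check, assuming a code carries its concept plus a list of (name code, value code) qualifier pairs; MEDICATION_ADMIN_ACT_CODES and the concept names are hypothetical. The existential has_qualifier SOME (…) becomes an any() over the qualifiers, and the binding of Req_for_Med_admin_placeholder is the conjunction of two independent tests.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical extension of SNCT_medication_admin_act_code_or_its_subcodes
MEDICATION_ADMIN_ACT_CODES: Set[str] = {"medication_admin_act_code", "oral_medication_admin_code"}


@dataclass
class CodeInstance:
    concept: str
    system: str = "SNCT"
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)  # (name_code, value_code)


def is_snct_request_context(code: CodeInstance) -> bool:
    """SNCT_code AND has_qualifier SOME (Qualifier AND has_name_code VALUE
    procedure_request_code AND has_value_code VALUE request_code)."""
    return code.system == "SNCT" and any(
        name == "procedure_request_code" and value == "request_code"
        for name, value in code.qualifiers)


def satisfies_request_binding(code: CodeInstance) -> bool:
    """Procedure aspect AND context aspect, checked separately."""
    return code.concept in MEDICATION_ADMIN_ACT_CODES and is_snct_request_context(code)


requested = CodeInstance("oral_medication_admin_code",
                         qualifiers=[("procedure_request_code", "request_code")])
unqualified = CodeInstance("oral_medication_admin_code")
print(satisfies_request_binding(requested), satisfies_request_binding(unqualified))  # True False
```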

The reasoner can be applied to the above to determine if the classes are self-consistent and to unite the placeholders with their bindings. All additional consequences are inferred from the resulting classified version.

To test that a message instance is consistent with a submodel, we must take account of the fact that OWL is an open-world system, i.e. that information not stated is taken to be unknown rather than absent. This requires adding a ‘closure axiom’ to each individual stating that the filler items explicitly represented are the only fillers for the parent property has_info_item. Since has_info_item subsumes all other relevant properties, there can be no other fillers to any of the subproperties (assuming disjoint ranges). Likewise, if the intent is that the submodel should allow only the associations and attributes given, then the submodel too must be closed. Once closed, the reasoner can be used to test each message against the closed submodel.
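The effect of closure can be sketched as a closed-world approximation of what the reasoner concludes once closure axioms are added; the submodel contents and messages are hypothetical. Closing the submodel lets stated but unexpected fillers be rejected, while closing the instance is what allows a missing required filler to count as absent rather than unknown.

```python
from typing import Dict, List, Set

# Hypothetical closed submodel: the only permitted subproperties of has_info_item,
# with minimum cardinalities (property names from Figure 2b)
ALLOWED: Set[str] = {"has_participation", "has_code"}
MIN_CARD: Dict[str, int] = {"has_code": 1}

Message = Dict[str, List[str]]  # property -> stated fillers


def violations(message: Message, submodel_closed: bool, instance_closed: bool) -> List[str]:
    problems: List[str] = []
    if submodel_closed:
        # A stated filler for a property outside the closed submodel is a contradiction.
        problems += [f"{p}: not allowed by the closed submodel"
                     for p in sorted(message) if p not in ALLOWED]
    if instance_closed:
        # Only a closed instance lets a missing filler violate a minimum cardinality.
        problems += [f"{p}: fewer than {n} filler(s)"
                     for p, n in MIN_CARD.items() if len(message.get(p, [])) < n]
    return problems


incomplete: Message = {"has_participation": ["participation_1"]}
print(violations(incomplete, submodel_closed=True, instance_closed=False))  # []
print(violations(incomplete, submodel_closed=True, instance_closed=True))
# ['has_code: fewer than 1 filler(s)']
```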

Discussion

This paper focuses on the second of the two goals set out under “Objectives of the Representation”: to express the binding between the model of meaning and the information model. It demonstrates how two of the criteria set out there can be met:

• There is a defined interface between the model of meaning and the information model represented by the binding axioms and placeholder code classes. The interface is clearly separated from each model and serves the same function as an API for programming modules.

• The binding can capture all Boolean combinations of codes and classes of codes.

The methodology also meets the third criterion – expressing mutual constraints – as hinted at by the definition of SNCT_request_context_code_or_its_subcodes. However, a full exposition of the handling of constraints must await a longer paper.

Of the other goals, one example of factoring into re-usable submodels has been indicated with Request_compatible_act_code_placeholder, but the many further opportunities for factoring the information models themselves are deferred to a longer paper. Likewise, this paper does not discuss the translation from the original model of meaning to the meta-model used in the information model, which is currently done by scripting. A declarative mechanism is clearly desirable and is a subject of research.

We take it as a strong argument in favour of this approach that constraints to the level of individual ‘codes’, as well as classes of codes, follow naturally without special mechanisms. A further benefit is that the approach can be used even with terminology models that are not based on strict logical criteria, e.g. ICD-9/10 or MeSH, although care must be taken when interpreting the results. For example, not all patients with heart disease will be found by asking for all patients with codes under heart disease in ICD, since many heart diseases are coded under other headings – congenital disease, infectious disease, etc.

Acknowledgements

This work was supported in part by the UK Department of Health “Connecting for Health” project, the UK MRC CLEF project (G0100852), the JISC and UK EPSRC projects CO-ODE and HyOntUse (GR/S44686/1) and the EU-funded Semantic Mining Network of Excellence. The HL7 Terminfo working group stimulated and contributed to many of the ideas presented here.

References

1. Rector, A.L. The Interface between Information, Terminology, and Inference Models. In Tenth World Conference on Medical and Health Informatics: Medinfo-2001. 2001. London, England, pp. 246-250.

2. Rector, A.L., et al. Interface of inference models with concept and medical record models. In Artificial Intelligence in Medicine Europe (AIME). 2001. Cascais, Portugal: Springer Verlag, pp. 314-323.

3. Rector, A., Taweel, A. and Rogers, J. Models and inference methods for clinical systems: A principled approach. In Medinfo 2004. San Francisco: North Holland, pp. 79-83.

4. Baader, F., et al., eds. The Description Logic Handbook. 2003, Cambridge University Press: Cambridge, England.

5. Horridge, M., et al. A practical guide to building OWL ontologies using the Protege-OWL plugin and CO-ODE tools. 2004, University of Manchester. http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf

6. Qamar, R. and Rector, A. Automating term binding of clinical data model contents to SNOMED-CT using semantic and syntactic procedures. In AMIA 2006. 2006. Washington, DC (submitted for publication).

7. Pan, J.Z., Horrocks, I. and Schreiber, G. OWL FA: A metamodeling extension of OWL DL. In Proc. OWL-ED 2005. http://www.mindswap.org/OWLWorkshop/sub15.pdf

8. Noy, N. Representing classes as property values on the semantic web. 2005, W3C. http://www.w3.org/TR/swbp-classes-as-values/

9. IBM and Sandpiper Software Inc. Ontology Definition Metamodel: Third revised submission to OMG. 2005, OMG. http://www.omg.org/docs/ad/05-08-01.pdf