City, University of London Institutional Repository
Citation
:
Harrow, I., Balakrishnan, R., Jimenez-Ruiz, E. ORCID: 0000-0002-9083-4599,
Jupp, S., Lomax, J., Reed, J., Romacker, M., Senger, C., Splendiani, A., Wilson, J. and
Woollard, P. (2019). Ontology mapping for semantically enabled applications. Drug
Discovery Today, doi: 10.1016/j.drudis.2019.05.020
This is the published version of the paper.
This version of the publication may differ from the final published
version.
Permanent repository link: http://openaccess.city.ac.uk/id/eprint/22924/
Link to published version
:
http://dx.doi.org/10.1016/j.drudis.2019.05.020
Copyright and reuse:
City Research Online aims to make research
outputs of City, University of London available to a wider audience.
Copyright and Moral Rights remain with the author(s) and/or copyright
holders. URLs from City Research Online may be freely distributed and
linked to.
City Research Online:
http://openaccess.city.ac.uk/
publications@city.ac.uk
Ontology
mapping
for
semantically
enabled
applications
Ian
Harrow
1,
Rama
Balakrishnan
2,
Ernesto
Jimenez-Ruiz
3,4,
Simon
Jupp
5,
Jane
Lomax
6,
Jane
Reed
7,
Martin
Romacker
8,
Christian
Senger
9,
Andrea
Splendiani
10,
Jabe
Wilson
11and
Peter
Woollard
121IanHarrowConsultingLtd,Whitstable,UK 2GenentechInc.,SouthSanFrancisco,CA,USA 3TheAlanTuringInstitute,London,UK
4DepartmentofInformatics,UniversityofOslo,Oslo,Norway
5EuropeanMolecularBiologyLaboratory,EuropeanBioinformaticsInstitute(EMBL-EBI),Hinxton,Cambridge,UK 6SciBiteLtd,Cambridge,UK
7LinguamaticsLtd,Cambridge,UK
8RocheInnovationCenter,Basel,Switzerland 9OSTHUSGmbH,Aachen,Germany 10Novartis,Basel,Switzerland 11ElsevierRELX,London,UK 12GlaxoSmithKline,Stevenage,UK
In
this
review,
we
provide
a
summary
of
recent
progress
in
ontology
mapping
(OM)
at
a
crucial
time
when
biomedical
research
is
under
a
deluge
of
an
increasing
amount
and
variety
of
data.
This
is
particularly
important
for
realising
the
full
potential
of
semantically
enabled
or
enriched
applications
and
for
meaningful
insights,
such
as
drug
discovery,
using
machine-learning
technologies.
We
discuss
challenges
and
solutions
for
better
ontology
mappings,
as
well
as
how
to
select
ontologies
before
their
application.
In
addition,
we
describe
tools
and
algorithms
for
ontology
mapping,
including
evaluation
of
tool
capability
and
quality
of
mappings.
Finally,
we
outline
the
requirements
for
an
ontology
mapping
service
(OMS)
and
the
progress
being
made
towards
implementation
of
such
sustainable
services.
Introduction
Biomedicalresearchisunderadelugeofanincreasingamountand
varietyofdata.Diversetechnologiesenablemoregranular
mea-surementsfromthe laboratorybenchtotheclinicalbedsidefor
personalisedtreatments.Torealisethispromise,suchdataneedto bebroughttogethertobuildconsistentbiologicalknowledgebases
[1].Aspartofthisprocess,differentconcepts,terminologies,and datamodelsneedtobereconciled.Thisreconciliationissupported
byavarietyofknowledgemanagementresources,whichcovera
continuousspectrumof‘semanticexpressivity’(Fig.1).
Atoneextreme,wehavesimplelists,suchascontrolled
vocab-ularies. Integration is significantly easier when different data
sources use terms froma standardised list,insteadof freetext.
Resources that have greater semantic expressivity enable more
supportforintegrationandinteroperability,forinstance
leverag-ing synonymsor translating across languages. At the other
ex-tremeofthesemanticspectrum,wehaveontologies.Thesearea
setofconceptsinasubjectareaordomainthatshowsrelations
betweenconceptsrepresentedbyproperties.
Ontologiesgobeyondlists,thesauri,andtaxonomiestoprovidea
formal description of definitions of conceptual classes and their
relations(oneexamplebeingtheirhierarchicalstructure).‘Formal’ meansthatdefinitionsarebasedonalogicalframework,suchasthe
WebOntologyLanguage(OWL).Thisenablesarepresentationofthe
meaningofconceptsthatismachineprocessable,ultimatelyallowing reasoning,generationofnewknowledge,andautomaticdetectionof
Reviews
INFORMA
TICS
Correspondingauthor:Harrow,I. (ianharrowconsulting@gmail.com)
1359-6446/ã2019TheAuthors.PublishedbyElsevierLtd.ThisisanopenaccessarticleundertheCCBYlicense(http://creativecommons.org/licenses/by/4.0/).
inconsistencies in the semanticmodel [2]. Inaddition, Uniform ResourceIdentifiers(URI)uniquelyreferenceeachclasstosupport machineprocessingandinteroperability.
One of the strongest examples of a mature ontology in the
biomedicalsciences isGeneOntology(GO) [3,4],whichis used
extensivelybyamultitudeofapplicationsandanalyticaltools[5]. Ideally,eachdomain in thebiomedicalsciencesshould be supported by a singlereference ontology [6],anidea thatwasoriginally a
strategicobjectiveoftheOpenBiomedicalOntologies(OBO)
con-sortium [6]. However, the real situation is different and finds
numerousoverlappingontologies,eachhavingtheirowncontexts
ofapplication. Thiscreatesproblems ofreconciliation andeven
difficultiesinselectionofthemostappropriateresource[7].
Overlapbetweenontologieshappensforavariety ofreasons.
One ofthese isthat a singlereference ontologyoften provides
insufficientcoverageforaparticularapplication,whichgivesrise
tothedevelopmentofapplicationontologies,suchasthe
Experi-mental Factor Ontology (EFO). The EFO uses relevant parts of
referenceontologiesandcrossreferences(ormappings)between
them [8]. Mapping between ontologies expands the coverage
acrosslargedomains,suchasanatomy,disease,phenotype,and
laboratoryinvestigation.Mappingbetweenontologiesindifferent
domainsrequiresthediscoveryoftheevidenceforarelationship
through,forexample,dataortextmining[9,10].
Anotherreasonisthatmanyapplicationsmakeuseofclassification
systems,suchasMedicalSubjectHeadings(MeSH),Enzyme
Commis-sion(E.C.)nomenclature,AnatomicalTherapeuticChemical Classifi-cation of drugs (ATC), or Human Gene Nomenclature (HGNC), which, althoughpowerful,werenever designedas ontologies. However, it can beveryusefultomapbetweensuchclassificationsystemsand ontol-ogies,whichontology-matchingalgorithmsareabletodo[11].
Althoughapplicationontologiesandmappingtocodelistscan
bebuiltbymanualcuration,itisdesirabletoaugmentthisprocess
withontology-matchingalgorithms[12].Thesebring scalability
whilereducingthecostofmaintenance.
Application
of
ontologies
and
their
mappings
Ontology
application
Controlledvocabularieshavebeenusedformanydecades,
espe-ciallyby industry, toensure theconsistency ofmetadata while
collectingdataofanexperimentorconductinganalysis,oftenin
laboratory information-management systems [13]. Seamlessly
integratedintoapplications,controlledvocabulariesand
ontolo-giescanspeedupthe entryofdatasetsandfacilitatethe
subse-quentretrievalofdatathroughsimplesearchinterfaces.This is
becauseexperimentalmetadataforabiologicalassaycancomprise
many elements, including creation date, experimenter, batch/
sampleinformation(e.g.,tissueorcelltype,cellline,etc.),disease
ornormalstatus,andtreatment(stimulant,compound,placebo,
ortimecourse)[14].
Ontologiesalreadyhaveanimportantroleinannotatingand
organising the vast wealth of experimental, clinical, and
real-worlddataand theirday-to-dayusageiswellestablishedinthe
scientific community. Therefore, it is not surprising that the
important biomedical literature resource, PubMed, developed
and applies the MeSH taxonomy for indexing and searching
journalarticles[15].Inpharmacovigilance, adverseeventsneed
to bereported to the US Food and Drug Administration(FDA)
using the MedDRA ontologyas a system to encode regulatory
information[16].Furthermore,the FDA hasmandatedthatthe
StudyDataTabulationModel(SDTM),developedbytheClinical
DataInterchangeStandardsConsortium(CDISC),mustbeusedas
the standard for the submission of study data. The controlled
terminologies of SDTM are integrated with the NCI Thesaurus
[17].Real-worlddataprovidesafinalexample,whereWHO
clas-sificationsofdisease,ICD-N,havebeenusedforannotation[18]. Giventhatprecisionmedicine,personalisedhealthcare,and
trans-lationalmedicine areincreasinglydrivingmodernresearchand
development in the biopharmaceutical industry, it is vital to
combinedatafromthevastnumberofpublicandprivate
reposi-tories using all these different classifications and ontologies.
Therefore,ontologymappingisintrinsicallytiedtodata integra-tion,whichiscrucialforthesuccessfuldiscoveryanddevelopment
ofinnovativetreatmentsofdisease.
Ontologiesareoneofthemechanismstoencodethe
seman-tics for an area of human knowledge in a machine-readable
manner [19,20]. They arevital for capturing meaningful
rela-tionshipstoallowuserstosearchorbrowserelationshipsandto
identifypatternsfromanalysis[21–24].Considermodernsearch
engines,suchasGoogle andBing, whichuseminimalcontext
REVIEWS DrugDiscoveryTodayVolume00,Number00June2019
DRUDIS-2475;NoofPages8
Pleasecitethisarticleinpressas:Harrow,I.etal.Ontologymappingforsemanticallyenabledapplications,DrugDiscovToday(2019),https://doi.org/10.1016/j.drudis.2019.05.020
Semantic expressivity
Thesauri Lists
Classifications and taxonomies
Ontologies
Types: Sets of terms Glossaries Dictionaries Vocabularies
Types: Synonyms Related meanings Associations
Types: Catalogues Directories Hierarchies Taxonomies
Features: URIs Definitions Hierarchies Relations
Drug Discovery Today
FIGURE1
Thespectrumofsemanticexpressivityforknowledgemanagementresources.Abbreviation:URI,UniformResourceIdentifier.
2 www.drugdiscoverytoday.com
Reviews
INFORMA
and,inthecaseofWikipedia,usersarepresentedwitha disam-biguationpagetoselecttherelevantresults.Thiscontrastswith
a search ofscientific data andliterature,whichrequires more
consistentandreliableresultsbyharnessingcontrolled
vocabu-laries,classification systems,andontologies(Fig.1),especially
whenthousands,ifnotmillions,ofresultsneedtobeprocessed
automatically.Considertheexample ofbone disease,as
illus-tratedinFig.2,wherewecanseethepositionsofLegg–Calve–
PerthesdiseaseandCoxaMagnaintheMeSHhierarchy,without
any other prior knowledge. Such hierarchical structure of a
taxonomyorontology can alsohelpwiththevisualisation of
data,so thata usercanstart withabroadclass,such asbone
disease,andthenmoveontoconsidermorespecific,yetrelated,
diseases.
Ontologies and their mappings have a central role in open
semantically enabled applications, such as Open PHACTS [25]
andOpenTargets[26].Commercialexamplesofsimilar
applica-tionsareElsevier’sPathway Studio[27]and ClarivateAnalytics’
MetaCore/MetaBase[28].InthecaseofOpenTargets,thispublic
targetvalidationapplicationmakesfundamentaluseoftheEFO,
whichhasbeendevelopedandoptimisedtosupportsuch
applica-tions [29].Many ofthese powerfulapplications use automated
Drug Discovery Today
FIGURE2
Therelationalpositionbetweentwobonediseases,Legg–Calve–PerthesDiseaseandCoxaMagna,intheMedicalSubjectHeading(MeSH)hierarchy.
www.drugdiscoverytoday.com 3
Reviews
INFORMA
text-miningtechnologypoweredbyontologiestofacilitatesearch forsubject–verb–objecttripletsinscientifictexts.
Theevidenceembeddedintheseapplicationsisoftenintegrated
withgraphicalvisualisationandstatisticalanalysis,wheremapped
ontologiesarethekeycomponentsforbeingabletoexaminethe
underlyingbiologyofahypothesisoranexperiment[26,30–32].
Themappedontologiesvarybyapplication,buttypicallyinclude
GO,DiseaseOntology(DO),HumanPhenotypeOntology(HPO),
EFO, MeSH, and NCBI taxonomy. Crucially, such applications
providelinkstotheliteraturefromwhichmappingswerederived,
whichisimportanttoassesstheconfidenceinsuchinformation
[30,33].
Mapping
between
ontologies
Ontologymapping(ormatching)iscentraltoprovidingsemantic
accessacrossaggregateddatausedinknowledge-based products
andservicesconsumedbylifesciencecompanies,academic
insti-tutions,anduniversities.Whenbringingtogetherontologiesand
related resources (Fig. 1), we are faced with different scenarios reflectingdifferentusecasesformappings.
As mentioned earlier, often different ontologies are used to
annotate the same or similar domains, for example, HPO and
Mammalian Phenotype (MP) Ontology. These ontologies have
beendevelopedindependentlybydifferentcommunitiesormight
becustomisedtomeetspecificuserneeds.Inthiscase,ontology
mapping finds equivalence (exact or synonymous matches) or
relationshipsinthehierarchy,whichcanbeshownarroworbroad
semanticsimilarity.AnothersimilarexampleisDO,whichisused
widelybytheresearchcommunity,whereasSNOMEDCTisused
mostlyby healthcareworkersandclinicians,forexampleinthe
National Health Service, UK (https://digital.nhs.uk/services/
terminology-and-classifications/snomed-ct). Translational
appli-cations require interoperability by mapping betweenthese two
important ontologies, which has been approached successfully
through lexical mappings supplemented by Unified Medical
LanguageSystem(UMLS)concepts[34,35].
Anotherapplicationofontologymappingwithina domainis
the predictiveuseofphenotypeannotationsindifferent model
organisms. For example, rare human gene mutations can be
annotatedby relatinghomologous mutationstophenotypes in
modelorganismsfordiagnosisofrareinheriteddiseases[36].
Finally,matchingcanalsorelateontologytermsbetween
close-lyrelateddomains,suchasdiseaseandphenotypes[37].Inthis
case,wearelookingatestablishingmoregenericrelationsbetween
concepts,effectivelydefiningaknowledgenetwork.Thisscenario
isafrequenttaskinlifesciences,whereontologymatchingcan
bridgedifferentdomainsandsupportcomplexresearchquestions.
Challenges
and
solutions
for
better
mappings
Generating ontology mappings can provide several challenges.
Wordsinlanguagecanhaveambiguousmeaningsthatdependon
thecontext.Forexample,theEnglishword‘mole’,inanatomyitis askinfeature,inchemistryitisaunitofmeasure,forananimal
there are numerousspecies of talpid ‘true’mole or a distantly
related, marsupial mole or golden mole.Beyond the scientific
realm, mole can be a human surname, the name of various
villages, rivers, and creeks, and is also an embedded spy in
an organisation, and so on. This ambiguity means that it is
insufficient to simply match class names, terms, or labels for
successfulontologymapping.Therefore,itisimportanttomake
useofcontexttoresolveambiguity,whichincludesbackground
knowledge and relations among concepts [38]. Another major
challengetomappingontologiesismanagingtheconsequences
ofontologydynamics,whichreflectandrepresenthowscientific
understandingevolves[1,38].Thismeansthatanyderived
map-pings have to be maintained, while making sure that source
identifiersandlabelsareretained.Weexpandonthischallenge
intheOMSsectionofthisreview.
A common approach to tackling the challenge of mapping
betweendifferent ontologiesistomap allthe termstoa single
ontologyorknowledgeresource.Manysourceontologiescontain
embeddedcross-referencesthatcanbeusedascuratedmatchesto
another ontology. An Ontology of Biomedical Associations
(OBAN)isanexampleofsuchanapproach,whichwasconstructed
asalarge-scale,genericterm-associationmodelto support
con-structionofatargetvalidationknowledgebase[29].PhenomeNET
isafurtherexample,wherespecies-specificphenotypeontologies
are mapped based on the overarching, anatomy ontology,
UBERON,whichidentifiesequivalentphenotypefeaturesthrough
anatomicalconceptsacrossdifferentspecies[39,40].Similarly,the
Monarch Initiative has built a platform for mapping between
phenotypesandgenotypesacrossspecies,andincludesthe
Mon-archMergedDiseaseOntology,calledMONDO[41].
Guidance,
principles,
and
simple
rules
for
the
selection
of
ontologies
Whenseveralontologiesoverlaptocoverascientificdomain,we
arefacedwiththeproblemofhowtoselectwhichontologytouse.
In clinical sciences, the best practice is mature enough to be
governedbyauthoritiestomeet governmentregulations,as
de-scribedearlier,whereas,inpreclinicalandtranslationalresearch,
bestpracticesanddatastandardstendtobelessmatureandeven
absent.ThissituationpromisestoimprovewiththeMIRO
guide-linesforMinimumInformationforReportingofanOntology[42].
ThePistoiaGuidelinesweredevisedasapragmaticsteptosupport
theselectionofontologiesbeforetheapplicationandmappingof
ontologies. These guidelines are available on a public wiki of
Ontologies Mapping Resources, hosted by the Pistoia Alliance
(https://pistoiaalliance.atlassian.net/wiki/spaces/PUB/pages/ 43089928/Ontologies+Mapping+Resources). They comprise of threetypesofguideline:general,technical,andcontentinTable
1. This table shows how the Pistoia guidelines align with the
principles of the Open Biological and Biomedical Ontologies
(OBO) Foundry (http://www.obofoundry.org), which are under
constantdevelopment and reviewby the OBO community. In
addition,Table1alsoshowsalignmenttothepaperentitled‘Ten
SimpleRules forSelecting aBio-ontology’publishedby Malone
etal.[7].
Thesuitabilityofontologiesforaparticularapplication,suchas
geneexpressionanalysisormappingbetweenontologies,canbe
reviewedusingthe availablerulesand guidelines.The National
Center for Biomedical Ontology has developed a tool for this
purpose,calledtheOntologyRecommender2.0[43].
‘Sometimes an Ontology is Not Needed at All’ is the tenth
simpleruleofMaloneetal.[7].Thisisbecausemorelight-weight
knowledgemanagementsystemsmightbesufficient(seeFig.1for
REVIEWS DrugDiscoveryTodayVolume00,Number00June2019
DRUDIS-2475;NoofPages8
Pleasecitethisarticleinpressas:Harrow,I.etal.Ontologymappingforsemanticallyenabledapplications,DrugDiscovToday(2019),https://doi.org/10.1016/j.drudis.2019.05.020
4 www.drugdiscoverytoday.com
Reviews
INFORMA
examples).Therefore,selectionofanontologyorrelatedresource
shouldbedrivenbyunderstandingtheneedsoftheusers.
Ontologies
mapping
tool
evaluation
Tool
requirements
and
capabilities
Asetofminimalrequirementscanbeusedtocomparethenumerous
academic and commercial tools designedfor mapping between
ontologies. These functional requirements comprise of three
aspects:(i)userInterfacetoincludevisualisationofsource ontolo-giesandmappingalignmenteditor;(ii)frameworktoinclude
work-flow and ontology matching (OM) algorithm; and (iii) import
ontologies or mappingsand export of mappings(Fig. 3). These
requirementsincludeelementsoftheontologyalignmentlifecycle thathavebeendescribedbyEuzenatandShvaiko([44]Chapter3).
Such functional requirements can be used to compare and
evaluate the capabilities of public and commercial
ontology-mappingtools.Thisprocesswasundertakenin2016,andfound
that one academic tool (AML [45]) and two commercial tools
[Infotech Soft (http://infotechsoft.com) and Mondeca (http://
en.mondeca.com)] satisfied more than 80% of the functional requirementsillustratedinFig.3.
Evaluation
of
ontology
matching
algorithms
OM algorithms arecomputationaltoolsthat map betweentwo
ontologies,andhavewideapplicationbeyond lifesciences[44].
TheOntologyAlignmentEvaluationInitiative(OAEI;http://oaei.
ontologymatching.org) is a mature and open annual challenge thathasoperatedsince2004.Itprovidesacompetitiveplatformto
showcaseandevaluatetheperformanceoflatestalgorithms.
Itisusefultoconsiderthedifferentfeaturesandtechniquesused
byOMalgorithms,whichcanbeclassifiedassummarisedinTable
2 ([44]Chapter 3). They harnesslexical features (e.g.,different names,synonyms,anddefinitionsofconcepts),structural,logical,
or hierarchicalfeatures (e.g.,therelation oneconcept haswith
otherconceptswithinanontology),extendedinformationabout
the source ontologies (e.g., usage in annotations), and exploit
backgroundinformation(e.g.,UMLS)[45].
OMalgorithmsproduceasetofmatchesbetweentheclassesin
the twoontologies beingmapped.Suchmatchesmightexpress
equivalence,binary,ormultiplerelationswithascoreof similari-ty.Thequalityofthepredictedmatchesofthemappingresultswill
depend onoptimising thealgorithm parameters,whichwillbe
specificfortheontologiesbeingmapped.
TABLE1
Comparisonofguidelines,principles,andsimplerulesfortheselectionofontologies
Type PistoiaGuidelines OBOprinciples Simplerulespaper[7]
Generic License Open Open
Maintenance Maintenance Activedevelopment
Versioning Versioning Previousversionsavailable
Users Pluralityofusersdocumented;commitmenttocollaboration Developmentwiththecommunity Locusofauthority Locusofauthority
Technical Format Format
URIs URIs Persistenceofclassesandrelationships
Relations Relations
Textualdefinitions Textualdefinitions Textualdefinitionsfordomainexperts
Documentation Documentation
Namingconventions Namingconventions Textualdefinitionsfordomainexperts
ConservedURIs Persistenceofclassesandrelationships
Content Contentdelineation Contentdelineation Specificdomain
Contentcoverage Currentunderstandingreflected
Contentquality Textualdefinitionsfordomainexperts
Ontology Mapping Tool
Input ontologies or
mappings User Interface
Visualisation of ontology structure
Alignment editor
Framework
Workflow and Evaluation
Ontology Matching algorithm
1 2
3
Output mappings
Drug Discovery Today
FIGURE3
Functionalrequirementsofanontologymappingtool.
www.drugdiscoverytoday.com 5
Reviews
INFORMA
[image:6.612.42.578.80.264.2] [image:6.612.44.580.557.733.2]Numerous algorithms were tasked with matching pairs of
disease and phenotype ontologies in the OAEI 2016 challenge
(http://oaei.ontologymatching.org/2016). Predicted mappings
werecomparedtoa‘silverstandard’fromaconsensusvote,given
the absenceof‘goldstandard’ mappings,in additiontolimited
manualevaluation.Foursystems(AML[45],FCA-Map[46],
Log-Map(Bio)[47],andPhenoMF[40])gavethehighestperformance
fordetectionofequivalencematches,butallstruggledtodetect
semanticsimilarity[37].Itisclearthatacombinationof
automat-ed and manual curation is required to generate high-quality
mappings [48]. This is analogous to the workflow for protein
annotation,whereacombinationofautomatedandmanual
cura-tionisusedtoproduceandmaintaintheproteinknowledgebase,
UniProt[49].
Toward
services
for
ontology
mappings
Service
requirements
Ontologiesaredynamicentitiesthatevolveovertime.Common
changesinclude:classaddition;classdeprecation;combinationof
classes;andhierarchicalrelationships.Therefore,ontology
map-pingsarenotstaticresourcesandneedtoevolveinconcertwith
their source ontologies; it follows that any ontology mapping
needsto beprovidednotonly asa one-offprocess,butalso as
anongoingservice[1].
Whereasthemostfrequentlyusedontologiesareopenly
acces-sible,manyresearchersandorganisationsbuildtheirown
ontol-ogies,eithertoexpandonaparticularbranchofapublicontology orforareasthatarenotwellserved.Therefore,therearetwokey
usecasesforanOMS:(i)mappingamongpublicontologies;and
(ii)mappingbetweenpublicandinternalontologies.Theformer
can beachieved witha repositoryof mappingsamong popular
public ontologies, which has the benefit that it can manage
updates,utiliseexistingmappings,and generate newones. The
seconduse-casecanbeapproachedbyprovidingtoolingsuchthat
the user can generate bespoke mappings from their internal
ontologiestopublicontologiesasrequired.
Fortheseandotherusecases,anOMSshouldbeabletobe
used at all levels of an ontology, fromsingle terms toentire
branchesandontologies.Thisgivetheflexibilitythat
research-ers need indaily search and integration tasks. It is useful to
contrastanOMSwithanidentifiermappingservice,suchasthe
BridgeDb framework, which is focused on mapping between
databaseidentifiers[50].
AnOMSformappingamongpublicontologiesshouldbeableto
incorporateexistingmappingsets,suchasbyutilisingthe
cross-referencesbetweenontologiesthatarecommonlysuppliedaspart
of the source ontology. An OMS should also harness an OM
algorithm,in addition to curation, to enable mappingat scale
acrosswholeontologies.Ideally,itshouldalsoallowtheaddition
of user-curated content and validation of predicted mappings,
assistedby ‘crowd-sourcing’,whichhas been usedforontology
validation[51].
Existingstandardstorepresentalignmentsshouldbeusedbyan
OMS([44]Chapter10).Inaddition,itshouldprovidemetadatafor
mappings,which include: (i)dynamics: all ontologies and any
mappingsbetweenthemwillchangeovertime;thus,theservice
needstoreflectsuchdynamicsusingboththroughmanual
cura-tionand automationby OM algorithms. Asubset of metadata
shouldrecordsuchdynamicsforinteroperability andreuse; (ii)
provenance.Usersshould haveclearinformationonthe
prove-nanceofanymapping,includingontologysources,version
num-ber,downloaddate,andsoon.Specifically,foreachmapping,the
service should provide annotation with suitable metadata and
documentation,toenableinteroperabilityandreuse;(iii)quality.
The serviceshould providethe qualitymetrics for, and within,
mappings.Thisshould includesimilarity scoresforeach match
(expectedtorangefromexactandequivalenttoclosesimilarityto
broadlysimilar), anindication ofconfidence (e.g.,validated or
not)andglobalmetrics,suchasprecision(correctnessfrom
sam-ples) and recall (missing matches compared to standard
map-pings); (iv) license limitations. Some ontologies, for example
SNOMEDCT,have license restrictions,whichmightalso apply
toderivedproducts,suchasmappings.Theserestrictionsshould
becapturedaspartoftheOMmetadata.
Implementing
a
prototype
service
Aprototypeontologymappingservicehasbeenimplementedas
partofthePistoiaAllianceOntologiesMappingproject(https:// www.pistoiaalliance.org/projects/ontologies-mapping). The
pri-mary objective of this service is to provide mappings between
ontologies, building on existing EMBL-EBI services for the life
sciences [52]. In particular, the OM repository, OxO (https://
www.ebi.ac.uk/spot/oxo) is being developed to storemappings (orcross-references)betweentermsfromontologies,vocabularies,
and coding standards. OxO stores cross-references, which are
curatedmappings,embeddedin>200public ontologieshosted
REVIEWS DrugDiscoveryTodayVolume00,Number00June2019
DRUDIS-2475;NoofPages8
Pleasecitethisarticleinpressas:Harrow,I.etal.Ontologymappingforsemanticallyenabledapplications,DrugDiscovToday(2019),https://doi.org/10.1016/j.drudis.2019.05.020
TABLE2
FeaturesandtechniquesusedbyOMalgorithms
Level Technicalbasis Shortdescriptionoftechnique
Element Stringbased Oftenusedtomatchnames,identifiers,andnamedescriptionsofontologicalentities Languagebased Considersnamesaswordsinsomenaturallanguages,suchasEnglish
Constraintbased Dealswithinternalconstraintsappliedtodefinitionsofentities,suchastypes,multiplicityofattributes,andkeys Informalresourcebased Deducesrelationsbetweenontologyentitiesbasedonhowtheyrelatetoeachother
Formalresourcebased Makesuseofformalresources,suchasdomain-specificontologies,upperontologies.andlinkeddata
Structure Graphbased Comparessourceontologies(includingdatabaseschemasandtaxonomies)asnodesonlabelledgraphs Taxonomybased Hierarchicalclassificationsconsideronlythespecialisationrelation
Modelbased Matchessourceontologiesbasedonsemanticinterpretation Instancebased Comparessetsofinstancesofclassestodecidewhethertheymatch
Knowledge Factordatabased Exploitsfactsordatastoredinrelevantknowledgebaseordatabase
6 www.drugdiscoverytoday.com
Reviews
INFORMA
[image:7.612.37.573.85.214.2]by the Ontology LookupService(OLS). OxOmakesit easierto
combinemapping data sets that are labelled in differentways.
Evenifdifferentstandardsareusedtoannotatedatasets,theycan
stillbemadeinteroperablethroughOM.Companiescanusethe
public-domainOMsinOxOtobridgethegapbetweenpublicand
privateresearchdata.
ThePistoiaAllianceprototypeaimedtobuildonOxOthrough
developmentofanOM algorithmto predictmappingsbetween
publicontologieshostedbyOLS.Theprototypeservicefocussed
onthephenotypeanddiseaseontologydomainfortenmappings
betweenfivepublicontologies,namely:HPO,DO,OrphanetRare
Disease Ontology (ORDO), MP, and MeSH. The mappings
pre-dictedbythealgorithm,developedfortheOMS,werecompared
with silver-standard mappings from consensus voting between
top-performingalgorithmsin OAEI2017[37,53].The predicted
mappings from this prototype service are stored in the OxO
repository, along with the curated cross-references (mappings)
embeddedinallthepublicontologieshostedbyOLS.
The OMalgorithm(technicaldetailswillbedisclosedinaplanned technicalpaper),poweringtheOMSstoredinOxO,isabletodetect
matcheswithhighsimilarityscore,wherelabelsandsynonymsare
equivalentorsimilarbetweenontologies.OxOalsostoresthe
man-uallycuratedcross-references,whichcanbemissedby thesilver
standards.Thispowerfulcombinationofpredictedmappingsfrom
analgorithmandcuratedmappingsisanexampleofasolutionthat candeliverascalableandsustainablemappingservice.
Concluding
remarks
Thisreviewshowstheimpressiveprogressmadeoverrecentyears
withengineeringontologiesandtheirmappingsbyutilising
mod-erntoolsandservices.Itdescribeshowthisprogressenablesbetter
supportforsemanticallyawareapplications.Wehighlightcrucial
challengesthatmustberecognisedandovercomebypublicand
privateenterpriseworkingtogetherinsustainablewaystodeliver thenecessarytoolsandservices.Theimportantprocessof
provid-ingqualitymappingsbetweenontologiesasasustainableservice
shouldbesupportedsothatitcanmatureasastandardisedand
consolidatedactivity.
Thecurrentfloodofbigdatainthelifesciences,especiallyfrom
‘omicssources,bringsmassivechallengesfordatamanagement.
Semanticalignmentanddatastandardisationarevitaltosolveif
we aregoingtoharnessmoderntechnologies,suchasmachine
learning,forfuturedrugdiscovery.Theseimportantchallengesare
being met by the biopharma industry through the ongoing
implementation of the Findable, Accessible, Interoperable and
Reusable(FAIR)guidingprinciplesforscientificdatamanagement
andgovernance,whichmakedata‘findable,accessible,
interoper-able,andreusable’[54,55].TheinteroperabilityprinciplesofFAIR aresupportedbytheeffectiveapplicationofontologiesandtheir
mappingstounderpinintegrationbetweenmanyrelevantsources
ofdata[2,56].
Acknowledgements
WewouldliketoexpressgratitudeforfundingoftheOntologies
MappingprojectbypayingmembersofthePistoiaAllianceInc.
WewishtothankThomasLienerandHelenParkinsonforhelpful
discussions,whichhaveinfluencedthisarticle.S.J.benefitedfrom
fundingfromtheEuropeanUnion’sHorizon2020researchand
innovationprogrammeundergrantagreementno.654248
(CORBEL).E.J.R.wassupportedbytheAIDAproject,fundedbythe
UKGovernment’sDefence&SecurityProgrammeinsupportof
theAlanTuringInstitute,theBIGMEDproject(IKT259055)and
theSIRIUSCentreforScalableDataAccess(ResearchCouncilof
Norway,projectno.:237889).
References
1Rathore,A.S.etal.(2017)Roleofknowledgemanagementindevelopmentand lifecyclemanagementofbiopharmaceuticals.Pharm.Res.34,243–256
2Grob,A.etal.(2016)Evolutionofbiomedicalontologiesandmappings:overviewof recentapproaches.Comput.Struct.Biotechnol.J.14,333–340
3Ashburner,M.etal.(2000)Geneontology:toolfortheunificationofbiology.The GeneOntologyConsortium.Nat.Genet.25,25–29
4Blake,J.A.etal.(2015)GeneOntologyConsortium:goingforward.NucleicAcidsRes. 43,D1049–D1056
5TheGeneOntologyConsortium(2017)ExpansionoftheGeneOntology knowledgebaseandresource.NucleicAcidsRes.45,D331–D333
6Smith,B.etal.(2007)TheOBOFoundry:coordinatedevolutionofontologiesto supportbiomedicaldataintegration.Nat.Biotechnol.25,1251–1255
7Malone,J.etal.(2016)Tensimplerulesforselectingabio-ontology.PLoSComput. Biol.12,e1004743
8Malone,J.etal.(2010)ModelingsamplevariableswithanExperimentalFactor Ontology.Bioinformatics6,1112–1118
9Singhai,A.etal.(2016)Pressingneedsofbiomedicaltextmininginbiocurationand beyond:opportunitiesandchallenges.Database2016 baw161
10Przbyia,P.etal.(2016)Textminingresourcesforthelifesciences.Database2016 baw145
11Winnenburg,R.I.andBodenreider,O.A.(2014)Aframeworkforassessingthe consistencyofdrugclassesacrosssources.J.Biomed.Semant.5,30
12Dragisic,Z.etal.(2017)Experiencesfromtheanatomytrackintheontology alignmentevaluationinitiative.J.Biomed.Semant.8,56
13Harland,L.etal.(2011)Empoweringindustrialresearchwithsharedbiomedical vocabularies.DrugDiscov.Today16,940–947
14Perez-Riverol,Y.etal.(2014)Opensourcelibrariesandframeworksformass spectrometry-basedproteomics:adeveloper’sperspective.Biochim.Biophys.Acta1844,63–76
15Kim,S.etal.(2016)Meshable:searchingPubMedabstractsbyutilizingMeSHand MeSH-derivedtopicalterms.Bioinformatics32,3044–3046
16Wang,L.etal.(2014)Standardizingadversedrugeventreportingdata.J.Biomed. Semant.5,36
17Anzai,T.etal.(2015)ResponsestotheStandardforExchangeofNonclinicalData (SEND)innon-UScountries.J.Toxicol.Pathol.28,57–64
18Mennini,F.S.etal.(2017)Economicburdenofdiverticulardisease:anobservational analysisbasedonrealworlddatafromanItalianregion.Dig.LiverDis.49,1003– 1008
19Whetzel,P.L.etal.(2013)NCBOTechnology:poweringsemanticallyaware applications.J.Biomed.Semant.15(Suppl.1),S8
20Hoehndorf,R.etal.(2015)Theroleofontologiesinbiologicalandbiomedical research:afunctionalperspective.BriefBioinform.16,1069–1080
21Croset,S.etal.(2016)Flexibledataintegrationandcurationusingagraph-based approach.Bioinformatics32,918–925
22Gomez-Cabrero,D.etal.(2014)Dataintegrationintheeraofomics:currentand futurechallenges.BMCSyst.Biol.8(Suppl.2),I1
23Lapatas,V.etal.(2015)Dataintegrationinbiologicalresearch:anoverview.J.Biol. Res.22,9
24Zhang,H.etal.(2018)Anontology-guidedsemanticdataintegrationframeworkto supportintegrativedataanalysisofcancersurvival.BMCMed.Inform.Decis.Mak.18 (Suppl.2),4
25Williams,A.J.etal.(2012)OpenPHACTS:semanticinteroperabilityfordrug discovery.DrugDiscov.Today17,1188–1198
26Koscielny,G.etal.(2017)OpenTargets:aplatformfortherapeutictarget identificationandvalidation.NucleicAcidsRes.45,D985–D994
27Yuryev,A.etal.(2009)Ariadne’sChemEffectandPathwayStudioknowledgebase. ExpertOpin.DrugDiscov.4,1307–1318
www.drugdiscoverytoday.com 7
Reviews
INFORMA
28Shmelkov,E.etal.(2011)Assessingqualityandcompletenessofhuman transcriptionalregulatorypathwaysonagenome-widescale.Biol.Direct6,15 29Sarntivijai,S.etal.(2016)Linkingrareandcommondisease:mappingclinicaldisease–
phenotypestoontologiesintherapeutictargetvalidation.J.Biomed.Semant.7,8 30Kafkas,Ş.etal.(2017)Literatureevidenceinopentargets—atargetvalidation
platform.J.Biomed.Semant.8,20
31Maldonado,R.etal.(2018)Deeplearningmeetsbiomedicalontologies:knowledge embeddingsforepilepsy.AMIAAnnu.Symp.Proc.2017,1233–1242
32ArguelloCasteleiro,M.etal.(2018)Deeplearningmeetsontologies:experimentstoanchor thecardiovasculardiseaseontologyinthebiomedicalliterature.J.Biomed.Semant.9,13 33Garcı´a-Campos,M.A.etal.(2015)Pathwayanalysis:stateoftheart.Front.Physiol.6,383 34Raje,S.I.andBodenreider,O.(2017)Interoperabilityofdiseaseconceptsinclinical andresearchontologies:contrastingcoverageandstructureinthediseaseontology andSNOMEDCT.Stud.HealthTechnol.Inform.245,925–992
35Dhombres,F.andBodenreider,O.(2016)Interoperabilitybetweenphenotypesin researchandhealthcareterminologies—investigatingpartialmappingsbetween HPOandSNOMEDCT.J.Biomed.Semant.7,3
36Petri,V.etal.(2014)DiseasepathwaysattheRatGenomeDatabasePathwayPortal: genesincontext—anetworkapproachtounderstandingthemolecular mechanismsofdisease.Hum.Genomics8,17
37Harrow,I.etal.(2017)Matchingdiseaseandphenotypeontologiesintheontology alignmentevaluationinitiative.J.Biomed.Semant.8,55
38Kamder,M.R.etal.(2017)Asystematicanalysisoftermreuseandtermoverlap acrossbiomedicalontologies.Semant.Web8,853–871
39Mungall,C.J.etal.(2012)Uberon,anintegrativemulti-speciesanatomyontology. GenomeBiol.13,R5
40Rodrı´guez-Garcı´a,M.A´ etal.(2017)Integratingphenotypeontologieswith PhenomeNET.J.Biomed.Semant.8,58
41Mungall,C.J.etal.(2017)TheMonarchInitiative:anintegrativedataandanalyticplatform connectingphenotypestogenotypesacrossspecies.NucleicAcidsRes.45,D712–D772
42Matentzoglu,N.etal.(2017)MIRO:guidelinesforminimuminformationforthe reportingofanontology.J.Biomed.Semant.9,6
43Martı´nez-Romero,M.etal.(2017)NCBOOntologyRecommender2.0:anenhanced approachforbiomedicalontologyrecommendation.J.Biomed.Semant.8,21 44Euzenat,J.andShvaiko,P.(2013)OntologyMatching(2ndedn.),Springer 45Faria,D.etal.(2018)Tacklingthechallengesofmatchingbiomedicalontologies.J.
Biomed.Semant.9,4
46Zhao,M.etal.(2018)Matchingbiomedicalontologiesbasedonformalconcept analysis.J.Biomed.Semant.9,11
47Jimenez-Ruiz,E.andCuencaGrau,B.(2011)LogMap:logic-basedandscalable ontologymatching.Int.Semant.WebConf.2011,273–288
48Dragistic,Z.etal.(2016)Uservalidationinontologyalignment.Int.Semant.Web Conf.2016,200–217
49UniProtConsortium(2015)UniProt:ahubforproteininformation.NucleicAcids Res.43,D204–D212
50vanIersel,M.P.etal.(2010)TheBridgeDbframework:standardizedaccesstogene, proteinandmetaboliteidentifiermappingservices.BMCBioinf.11,5
51Mortensen,J.M.etal.(2016)Isthecrowdbetterasanassistantorareplacementin ontologyengineering?AnexplorationthroughthelensoftheGeneOntology.J. Biomed.Inf.60,199–209
52Perez-Riverol,Y.etal.(2017)OLSClientandOLSDialog—OpenSourceToolsto AnnotatePublicOmicsDatasets.Proteomics17(19),
53Arichi,M.etal.(2017)ResultsoftheOntologyAlignmentEvaluationInitiative 2017.CEURWorkshopProc.2032,61–113
54Wilkinson,M.D.etal.(2016)TheFAIRGuidingPrinciplesforscientificdata managementandstewardship.Sci.Data3,160018
55Wise,J.etal.(2019)ImplementationandrelevanceofFAIRdataprinciplesin biopharmaceuticalR&D.DrugDiscov.Today24,933–938
56Dhombres,F.andCharlet,J.(2017)Knowledgerepresentationandmanagement, it’stimetointegrate!Yearb.Med.Inform.26,148–151
REVIEWS DrugDiscoveryTodayVolume00,Number00June2019
DRUDIS-2475;NoofPages8
Pleasecitethisarticleinpressas:Harrow,I.etal.Ontologymappingforsemanticallyenabledapplications,DrugDiscovToday(2019),https://doi.org/10.1016/j.drudis.2019.05.020
8 www.drugdiscoverytoday.com
Reviews
INFORMA