Ontology mapping for semantically enabled applications

City, University of London Institutional Repository

Citation: Harrow, I., Balakrishnan, R., Jimenez-Ruiz, E. ORCID: 0000-0002-9083-4599, Jupp, S., Lomax, J., Reed, J., Romacker, M., Senger, C., Splendiani, A., Wilson, J. and Woollard, P. (2019). Ontology mapping for semantically enabled applications. Drug Discovery Today, doi: 10.1016/j.drudis.2019.05.020

This is the published version of the paper.

This version of the publication may differ from the final published version.

Permanent repository link: http://openaccess.city.ac.uk/id/eprint/22924/

Link to published version: http://dx.doi.org/10.1016/j.drudis.2019.05.020

Copyright and reuse: City Research Online aims to make research outputs of City, University of London available to a wider audience. Copyright and Moral Rights remain with the author(s) and/or copyright holders. URLs from City Research Online may be freely distributed and linked to.

City Research Online: http://openaccess.city.ac.uk/ publications@city.ac.uk

Ontology mapping for semantically enabled applications

Ian Harrow¹, Rama Balakrishnan², Ernesto Jimenez-Ruiz³,⁴, Simon Jupp⁵, Jane Lomax⁶, Jane Reed⁷, Martin Romacker⁸, Christian Senger⁹, Andrea Splendiani¹⁰, Jabe Wilson¹¹ and Peter Woollard¹²

1 Ian Harrow Consulting Ltd, Whitstable, UK
2 Genentech Inc., South San Francisco, CA, USA
3 The Alan Turing Institute, London, UK
4 Department of Informatics, University of Oslo, Oslo, Norway
5 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
6 SciBite Ltd, Cambridge, UK
7 Linguamatics Ltd, Cambridge, UK
8 Roche Innovation Center, Basel, Switzerland
9 OSTHUS GmbH, Aachen, Germany
10 Novartis, Basel, Switzerland
11 Elsevier RELX, London, UK
12 GlaxoSmithKline, Stevenage, UK

In this review, we provide a summary of recent progress in ontology mapping (OM) at a crucial time when biomedical research is under a deluge of an increasing amount and variety of data. This is particularly important for realising the full potential of semantically enabled or enriched applications and for meaningful insights, such as drug discovery, using machine-learning technologies. We discuss challenges and solutions for better ontology mappings, as well as how to select ontologies before their application. In addition, we describe tools and algorithms for ontology mapping, including evaluation of tool capability and quality of mappings. Finally, we outline the requirements for an ontology mapping service (OMS) and the progress being made towards implementation of such sustainable services.

Introduction

Biomedical research is under a deluge of an increasing amount and variety of data. Diverse technologies enable more granular measurements from the laboratory bench to the clinical bedside for personalised treatments. To realise this promise, such data need to be brought together to build consistent biological knowledge bases [1]. As part of this process, different concepts, terminologies, and data models need to be reconciled. This reconciliation is supported by a variety of knowledge management resources, which cover a continuous spectrum of ‘semantic expressivity’ (Fig. 1).

At one extreme, we have simple lists, such as controlled vocabularies. Integration is significantly easier when different data sources use terms from a standardised list, instead of free text. Resources that have greater semantic expressivity enable more support for integration and interoperability, for instance leveraging synonyms or translating across languages. At the other extreme of the semantic spectrum, we have ontologies. An ontology is a set of concepts in a subject area or domain, together with the relations between those concepts, which are represented by properties.
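The idea of concepts linked by relations can be sketched as a small data structure. The following is only a minimal illustration in Python: the `Concept` class, the `EX:` identifiers, and the toy hierarchy are all invented here and do not come from any real ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """One concept (class) in a toy ontology."""
    identifier: str                                    # hypothetical identifier
    label: str
    synonyms: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)   # 'is a' relations


def ancestors(identifier: str, concepts: Dict[str, Concept]) -> Set[str]:
    """Collect every concept reachable by following 'is a' relations upwards."""
    found: Set[str] = set()
    to_visit = list(concepts[identifier].parents)
    while to_visit:
        current = to_visit.pop()
        if current not in found:
            found.add(current)
            to_visit.extend(concepts[current].parents)
    return found


# Toy content: identifiers and hierarchy are invented for illustration only.
toy = {
    c.identifier: c
    for c in [
        Concept("EX:1", "disease"),
        Concept("EX:2", "bone disease", ["disorder of bone"], ["EX:1"]),
        Concept("EX:3", "hip disorder", [], ["EX:2"]),
    ]
}

print(sorted(ancestors("EX:3", toy)))  # ['EX:1', 'EX:2']
```

A plain list would only hold the labels; the `parents` field is what lets software move from a specific concept to broader ones.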

Ontologies go beyond lists, thesauri, and taxonomies to provide a formal description of definitions of conceptual classes and their relations (one example being their hierarchical structure). ‘Formal’ means that definitions are based on a logical framework, such as the Web Ontology Language (OWL). This enables a representation of the meaning of concepts that is machine processable, ultimately allowing reasoning, generation of new knowledge, and automatic detection of inconsistencies in the semantic model [2]. In addition, Uniform Resource Identifiers (URI) uniquely reference each class to support machine processing and interoperability.

Reviews: INFORMATICS

Corresponding author: Harrow, I. (ianharrowconsulting@gmail.com)

1359-6446/© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

One of the strongest examples of a mature ontology in the biomedical sciences is Gene Ontology (GO) [3,4], which is used extensively by a multitude of applications and analytical tools [5]. Ideally, each domain in the biomedical sciences should be supported by a single reference ontology [6], an idea that was originally a strategic objective of the Open Biomedical Ontologies (OBO) consortium [6]. However, in practice there are numerous overlapping ontologies, each with its own context of application. This creates problems of reconciliation and even difficulties in selection of the most appropriate resource [7].

Overlap between ontologies happens for a variety of reasons. One of these is that a single reference ontology often provides insufficient coverage for a particular application, which gives rise to the development of application ontologies, such as the Experimental Factor Ontology (EFO). The EFO uses relevant parts of reference ontologies and cross references (or mappings) between them [8]. Mapping between ontologies expands the coverage across large domains, such as anatomy, disease, phenotype, and laboratory investigation. Mapping between ontologies in different domains requires the discovery of the evidence for a relationship through, for example, data or text mining [9,10].

Another reason is that many applications make use of classification systems, such as Medical Subject Headings (MeSH), Enzyme Commission (E.C.) nomenclature, Anatomical Therapeutic Chemical Classification of drugs (ATC), or Human Gene Nomenclature (HGNC), which, although powerful, were never designed as ontologies. However, it can be very useful to map between such classification systems and ontologies, which ontology-matching algorithms are able to do [11].

Although application ontologies and mapping to code lists can be built by manual curation, it is desirable to augment this process with ontology-matching algorithms [12]. These bring scalability while reducing the cost of maintenance.

Application of ontologies and their mappings

Ontology application

Controlled vocabularies have been used for many decades, especially by industry, to ensure the consistency of metadata while collecting data of an experiment or conducting analysis, often in laboratory information-management systems [13]. Seamlessly integrated into applications, controlled vocabularies and ontologies can speed up the entry of data sets and facilitate the subsequent retrieval of data through simple search interfaces. This is because experimental metadata for a biological assay can comprise many elements, including creation date, experimenter, batch/sample information (e.g., tissue or cell type, or cell line), disease or normal status, and treatment (stimulant, compound, placebo, or time course) [14].

Ontologies already have an important role in annotating and organising the vast wealth of experimental, clinical, and real-world data, and their day-to-day usage is well established in the scientific community. Therefore, it is not surprising that the important biomedical literature resource, PubMed, developed and applies the MeSH taxonomy for indexing and searching journal articles [15]. In pharmacovigilance, adverse events need to be reported to the US Food and Drug Administration (FDA) using the MedDRA ontology as a system to encode regulatory information [16]. Furthermore, the FDA has mandated that the Study Data Tabulation Model (SDTM), developed by the Clinical Data Interchange Standards Consortium (CDISC), must be used as the standard for the submission of study data. The controlled terminologies of SDTM are integrated with the NCI Thesaurus [17]. Real-world data provides a final example, where WHO classifications of disease, ICD-N, have been used for annotation [18]. Given that precision medicine, personalised healthcare, and translational medicine are increasingly driving modern research and development in the biopharmaceutical industry, it is vital to combine data from the vast number of public and private repositories using all these different classifications and ontologies. Therefore, ontology mapping is intrinsically tied to data integration, which is crucial for the successful discovery and development of innovative treatments of disease.

Ontologies are one of the mechanisms to encode the semantics for an area of human knowledge in a machine-readable manner [19,20]. They are vital for capturing meaningful relationships to allow users to search or browse relationships and to identify patterns from analysis [21–24]. Consider modern search engines, such as Google and Bing, which use minimal context; similarly, Wikipedia presents users with a disambiguation page from which to select the relevant results. This contrasts with a search of scientific data and literature, which requires more consistent and reliable results by harnessing controlled vocabularies, classification systems, and ontologies (Fig. 1), especially when thousands, if not millions, of results need to be processed automatically. Consider the example of bone disease, as illustrated in Fig. 2, where we can see the positions of Legg–Calve–Perthes disease and Coxa Magna in the MeSH hierarchy, without any other prior knowledge. Such hierarchical structure of a taxonomy or ontology can also help with the visualisation of data, so that a user can start with a broad class, such as bone disease, and then move on to consider more specific, yet related, diseases.

REVIEWS. Drug Discovery Today, Volume 00, Number 00, June 2019
DRUDIS-2475; No of Pages 8
Please cite this article in press as: Harrow, I. et al. Ontology mapping for semantically enabled applications, Drug Discov Today (2019), https://doi.org/10.1016/j.drudis.2019.05.020

[Figure: an axis labelled ‘Semantic expressivity’ spanning four kinds of resource. Lists (types: sets of terms, glossaries, dictionaries, vocabularies); thesauri (types: synonyms, related meanings, associations); classifications and taxonomies (types: catalogues, directories, hierarchies, taxonomies); ontologies (features: URIs, definitions, hierarchies, relations).]

FIGURE 1
The spectrum of semantic expressivity for knowledge management resources. Abbreviation: URI, Uniform Resource Identifier.

Ontologies and their mappings have a central role in open semantically enabled applications, such as Open PHACTS [25] and Open Targets [26]. Commercial examples of similar applications are Elsevier’s Pathway Studio [27] and Clarivate Analytics’ MetaCore/MetaBase [28]. In the case of Open Targets, this public target validation application makes fundamental use of the EFO, which has been developed and optimised to support such applications [29]. Many of these powerful applications use automated text-mining technology powered by ontologies to facilitate search for subject–verb–object triplets in scientific texts.

FIGURE 2
The relational position between two bone diseases, Legg–Calve–Perthes Disease and Coxa Magna, in the Medical Subject Headings (MeSH) hierarchy.

The evidence embedded in these applications is often integrated with graphical visualisation and statistical analysis, where mapped ontologies are the key components for being able to examine the underlying biology of a hypothesis or an experiment [26,30–32]. The mapped ontologies vary by application, but typically include GO, Disease Ontology (DO), Human Phenotype Ontology (HPO), EFO, MeSH, and NCBI taxonomy. Crucially, such applications provide links to the literature from which mappings were derived, which is important to assess the confidence in such information [30,33].

Mapping between ontologies

Ontology mapping (or matching) is central to providing semantic access across aggregated data used in knowledge-based products and services consumed by life science companies, academic institutions, and universities. When bringing together ontologies and related resources (Fig. 1), we are faced with different scenarios reflecting different use cases for mappings.

As mentioned earlier, different ontologies are often used to annotate the same or similar domains, for example, HPO and Mammalian Phenotype (MP) Ontology. These ontologies have been developed independently by different communities or might be customised to meet specific user needs. In this case, ontology mapping finds equivalence (exact or synonymous matches) or relationships in the hierarchy, which can show narrow or broad semantic similarity. Another similar example is DO, which is used widely by the research community, whereas SNOMED CT is used mostly by healthcare workers and clinicians, for example in the National Health Service, UK (https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct). Translational applications require interoperability by mapping between these two important ontologies, which has been approached successfully through lexical mappings supplemented by Unified Medical Language System (UMLS) concepts [34,35].

Another application of ontology mapping within a domain is the predictive use of phenotype annotations in different model organisms. For example, rare human gene mutations can be annotated by relating homologous mutations to phenotypes in model organisms for diagnosis of rare inherited diseases [36].

Finally, matching can also relate ontology terms between closely related domains, such as disease and phenotypes [37]. In this case, we are looking at establishing more generic relations between concepts, effectively defining a knowledge network. This scenario is a frequent task in life sciences, where ontology matching can bridge different domains and support complex research questions.

Challenges and solutions for better mappings

Generating ontology mappings can present several challenges. Words in language can have ambiguous meanings that depend on the context. Take the English word ‘mole’: in anatomy it is a skin feature; in chemistry it is a unit of measure; and in zoology it covers numerous species of talpid ‘true’ mole as well as the distantly related marsupial moles and golden moles. Beyond the scientific realm, mole can be a human surname, the name of various villages, rivers, and creeks, or an embedded spy in an organisation, and so on. This ambiguity means that it is insufficient to simply match class names, terms, or labels for successful ontology mapping. Therefore, it is important to make use of context to resolve ambiguity, which includes background knowledge and relations among concepts [38]. Another major challenge to mapping ontologies is managing the consequences of ontology dynamics, which reflect and represent how scientific understanding evolves [1,38]. This means that any derived mappings have to be maintained, while making sure that source identifiers and labels are retained. We expand on this challenge in the OMS section of this review.
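The ‘mole’ ambiguity can be made concrete with a small sketch. It is an illustration only, not a published matching method: the two toy ontologies, their identifiers, the use of ancestor labels as ‘context’, and the 0.3 overlap threshold are all assumptions chosen for the example.

```python
def label_matches(source, target):
    """Pairs of classes whose labels are identical, ignoring case."""
    return [
        (s, t)
        for s in source
        for t in target
        if source[s]["label"].lower() == target[t]["label"].lower()
    ]


def context_overlap(a, b):
    """Jaccard overlap between the ancestor labels of two classes."""
    ctx_a, ctx_b = set(a["ancestors"]), set(b["ancestors"])
    union = ctx_a | ctx_b
    return len(ctx_a & ctx_b) / len(union) if union else 0.0


def contextual_matches(source, target, min_overlap=0.3):
    """Keep only label matches whose hierarchical context also agrees."""
    return [
        (s, t)
        for s, t in label_matches(source, target)
        if context_overlap(source[s], target[t]) >= min_overlap
    ]


# Hypothetical classes; identifiers and ancestors are invented.
anatomy = {
    "A:1": {"label": "mole", "ancestors": ["skin feature", "anatomical entity"]},
}
mixed = {
    "B:1": {"label": "Mole", "ancestors": ["unit of measure", "quantity"]},
    "B:2": {"label": "mole", "ancestors": ["skin lesion", "skin feature", "anatomical entity"]},
}

print(label_matches(anatomy, mixed))       # [('A:1', 'B:1'), ('A:1', 'B:2')]
print(contextual_matches(anatomy, mixed))  # [('A:1', 'B:2')]
```

Label matching alone pairs the skin feature with the chemical unit; requiring some shared context removes that false match.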

A common approach to tackling the challenge of mapping between different ontologies is to map all the terms to a single ontology or knowledge resource. Many source ontologies contain embedded cross-references that can be used as curated matches to another ontology. An Ontology of Biomedical Associations (OBAN) is an example of such an approach, which was constructed as a large-scale, generic term-association model to support construction of a target validation knowledge base [29]. PhenomeNET is a further example, where species-specific phenotype ontologies are mapped based on the overarching anatomy ontology, UBERON, which identifies equivalent phenotype features through anatomical concepts across different species [39,40]. Similarly, the Monarch Initiative has built a platform for mapping between phenotypes and genotypes across species, and includes the Monarch Merged Disease Ontology, called MONDO [41].

Guidance, principles, and simple rules for the selection of ontologies

When several ontologies overlap to cover a scientific domain, we are faced with the problem of how to select which ontology to use. In clinical sciences, the best practice is mature enough to be governed by authorities to meet government regulations, as described earlier, whereas, in preclinical and translational research, best practices and data standards tend to be less mature or even absent. This situation promises to improve with the MIRO guidelines for Minimum Information for Reporting of an Ontology [42]. The Pistoia Guidelines were devised as a pragmatic step to support the selection of ontologies before the application and mapping of ontologies. These guidelines are available on a public wiki of Ontologies Mapping Resources, hosted by the Pistoia Alliance (https://pistoiaalliance.atlassian.net/wiki/spaces/PUB/pages/43089928/Ontologies+Mapping+Resources). They comprise three types of guideline: general, technical, and content (Table 1). This table shows how the Pistoia Guidelines align with the principles of the Open Biological and Biomedical Ontologies (OBO) Foundry (http://www.obofoundry.org), which are under constant development and review by the OBO community. In addition, Table 1 shows alignment to the paper entitled ‘Ten Simple Rules for Selecting a Bio-ontology’ published by Malone et al. [7].

The suitability of ontologies for a particular application, such as gene expression analysis or mapping between ontologies, can be reviewed using the available rules and guidelines. The National Center for Biomedical Ontology has developed a tool for this purpose, called the Ontology Recommender 2.0 [43].

‘Sometimes an Ontology is Not Needed at All’ is the tenth simple rule of Malone et al. [7]. This is because more light-weight knowledge management systems might be sufficient (see Fig. 1 for examples). Therefore, selection of an ontology or related resource should be driven by understanding the needs of the users.

Ontologies mapping tool evaluation

Tool requirements and capabilities

A set of minimal requirements can be used to compare the numerous academic and commercial tools designed for mapping between ontologies. These functional requirements comprise three aspects: (i) a user interface, including visualisation of source ontologies and a mapping alignment editor; (ii) a framework, including workflow and an ontology matching (OM) algorithm; and (iii) import of ontologies or mappings and export of mappings (Fig. 3). These requirements include elements of the ontology alignment life cycle that have been described by Euzenat and Shvaiko ([44] Chapter 3).

Such functional requirements can be used to compare and evaluate the capabilities of public and commercial ontology-mapping tools. This process was undertaken in 2016, and found that one academic tool (AML [45]) and two commercial tools [Infotech Soft (http://infotechsoft.com) and Mondeca (http://en.mondeca.com)] satisfied more than 80% of the functional requirements illustrated in Fig. 3.

Evaluation of ontology matching algorithms

OM algorithms are computational tools that map between two ontologies, and have wide application beyond life sciences [44]. The Ontology Alignment Evaluation Initiative (OAEI; http://oaei.ontologymatching.org) is a mature and open annual challenge that has operated since 2004. It provides a competitive platform to showcase and evaluate the performance of the latest algorithms.

It is useful to consider the different features and techniques used by OM algorithms, which can be classified as summarised in Table 2 ([44] Chapter 3). They harness lexical features (e.g., different names, synonyms, and definitions of concepts), structural, logical, or hierarchical features (e.g., the relation one concept has with other concepts within an ontology), extended information about the source ontologies (e.g., usage in annotations), and background information (e.g., UMLS) [45].

OM algorithms produce a set of matches between the classes in the two ontologies being mapped. Such matches might express equivalence, binary, or multiple relations with a score of similarity. The quality of the predicted matches of the mapping results will depend on optimising the algorithm parameters, which will be specific for the ontologies being mapped.
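A string-based matcher that scores lexical features might look like the following. This is a minimal sketch rather than any of the systems cited in this review: it relies on `difflib.SequenceMatcher` from the Python standard library, and the identifiers, names, and the 0.9 threshold are assumptions. The threshold is one example of a parameter that would need tuning for each pair of ontologies.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def lexical_similarity(names_a, names_b):
    """Best similarity over all label and synonym pairs of two classes."""
    return max(
        SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        for a in names_a
        for b in names_b
    )


def match(onto_a, onto_b, threshold=0.9):
    """Return (id_a, id_b, score) for pairs scoring at or above the threshold."""
    results = []
    for id_a, names_a in onto_a.items():
        for id_b, names_b in onto_b.items():
            score = lexical_similarity(names_a, names_b)
            if score >= threshold:
                results.append((id_a, id_b, round(score, 2)))
    return sorted(results, key=lambda m: -m[2])


# Hypothetical classes: each identifier maps to its label plus synonyms.
onto_a = {"A:1": ["Bone disease", "bone disorder"], "A:2": ["Hip disorder"]}
onto_b = {"B:1": ["bone-disease"], "B:2": ["kidney disease"]}

print(match(onto_a, onto_b))  # [('A:1', 'B:1', 1.0)]
```

Comparing every class with every other class is quadratic in ontology size, so a real matcher would normally index names first; that step is omitted here for brevity.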

TABLE 1
Comparison of guidelines, principles, and simple rules for the selection of ontologies

Type | Pistoia Guidelines | OBO principles | Simple rules paper [7]
Generic | License | Open | Open
Generic | Maintenance | Maintenance | Active development
Generic | Versioning | Versioning | Previous versions available
Generic | Users | Plurality of users documented; commitment to collaboration | Development with the community
Generic | Locus of authority | Locus of authority |
Technical | Format | Format |
Technical | URIs | URIs | Persistence of classes and relationships
Technical | Relations | Relations |
Technical | Textual definitions | Textual definitions | Textual definitions for domain experts
Technical | Documentation | Documentation |
Technical | Naming conventions | Naming conventions | Textual definitions for domain experts
Technical | Conserved URIs | | Persistence of classes and relationships
Content | Content delineation | Content delineation | Specific domain
Content | Content coverage | | Current understanding reflected
Content | Content quality | | Textual definitions for domain experts

[Figure: an ontology mapping tool that takes input ontologies or mappings and produces output mappings. Its components are a user interface (visualisation of ontology structure; alignment editor) and a framework (workflow and evaluation; ontology matching algorithm). Parts of the diagram are numbered 1, 2, and 3.]

FIGURE 3
Functional requirements of an ontology mapping tool.

Numerous algorithms were tasked with matching pairs of disease and phenotype ontologies in the OAEI 2016 challenge (http://oaei.ontologymatching.org/2016). Predicted mappings were compared to a ‘silver standard’ from a consensus vote, given the absence of ‘gold standard’ mappings, in addition to limited manual evaluation. Four systems (AML [45], FCA-Map [46], LogMap(Bio) [47], and PhenoMF [40]) gave the highest performance for detection of equivalence matches, but all struggled to detect semantic similarity [37]. It is clear that a combination of automated and manual curation is required to generate high-quality mappings [48]. This is analogous to the workflow for protein annotation, where a combination of automated and manual curation is used to produce and maintain the protein knowledge base, UniProt [49].
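One way to picture a consensus-vote silver standard is as a simple vote count over the outputs of several systems. The sketch below is an assumption-laden illustration and does not reproduce the OAEI procedure: the two-vote minimum, the three toy systems, and all identifiers are invented.

```python
from collections import Counter


def silver_standard(system_outputs, min_votes=2):
    """Mappings predicted by at least `min_votes` of the systems."""
    votes = Counter(pair for output in system_outputs for pair in set(output))
    return {pair for pair, count in votes.items() if count >= min_votes}


def precision_recall(predicted, reference):
    """Precision and recall of predicted mappings against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Invented outputs from three hypothetical matching systems.
system_1 = {("A:1", "B:1"), ("A:2", "B:2")}
system_2 = {("A:1", "B:1"), ("A:3", "B:3")}
system_3 = {("A:1", "B:1"), ("A:2", "B:2"), ("A:4", "B:4")}

silver = silver_standard([system_1, system_2, system_3])
print(sorted(silver))                      # [('A:1', 'B:1'), ('A:2', 'B:2')]
print(precision_recall(system_2, silver))  # (0.5, 0.5)
```

Because the reference is built from the systems themselves, a correct mapping found by only one system counts against that system, which is one reason manual evaluation remains necessary.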

Toward services for ontology mappings

Service requirements

Ontologies are dynamic entities that evolve over time. Common changes include: class addition; class deprecation; combination of classes; and changes to hierarchical relationships. Therefore, ontology mappings are not static resources and need to evolve in concert with their source ontologies; it follows that any ontology mapping needs to be provided not only as a one-off process, but also as an ongoing service [1].
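The maintenance burden created by such changes can be illustrated with a check that runs after each new release of a target ontology. This is a hypothetical sketch: the function name, the three report categories, and the identifiers are assumptions, and only the target side of each mapping is checked.

```python
def classify_mappings(mappings, current_classes, deprecated):
    """Sort mappings into 'valid', 'deprecated', and 'missing' after a release."""
    report = {"valid": [], "deprecated": [], "missing": []}
    for source_id, target_id in mappings:
        if target_id in deprecated:
            report["deprecated"].append((source_id, target_id))
        elif target_id not in current_classes:
            report["missing"].append((source_id, target_id))
        else:
            report["valid"].append((source_id, target_id))
    return report


# Invented example: B:2 was deprecated and B:3 no longer exists.
mappings = [("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:3")]
current_classes = {"B:1", "B:2"}
deprecated = {"B:2"}

report = classify_mappings(mappings, current_classes, deprecated)
print(report["valid"])       # [('A:1', 'B:1')]
print(report["deprecated"])  # [('A:2', 'B:2')]
print(report["missing"])     # [('A:3', 'B:3')]
```

Mappings flagged as deprecated or missing are the ones a curator or a matching algorithm would need to revisit.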

Whereas the most frequently used ontologies are openly accessible, many researchers and organisations build their own ontologies, either to expand on a particular branch of a public ontology or for areas that are not well served. Therefore, there are two key use cases for an OMS: (i) mapping among public ontologies; and (ii) mapping between public and internal ontologies. The former can be achieved with a repository of mappings among popular public ontologies, which has the benefit that it can manage updates, utilise existing mappings, and generate new ones. The second use case can be approached by providing tooling such that the user can generate bespoke mappings from their internal ontologies to public ontologies as required.

For these and other use cases, an OMS should be usable at all levels of an ontology, from single terms to entire branches and ontologies. This gives the flexibility that researchers need in daily search and integration tasks. It is useful to contrast an OMS with an identifier mapping service, such as the BridgeDb framework, which is focused on mapping between database identifiers [50].

An OMS for mapping among public ontologies should be able to incorporate existing mapping sets, such as by utilising the cross-references between ontologies that are commonly supplied as part of the source ontology. An OMS should also harness an OM algorithm, in addition to curation, to enable mapping at scale across whole ontologies. Ideally, it should also allow the addition of user-curated content and validation of predicted mappings, assisted by ‘crowd-sourcing’, which has been used for ontology validation [51].

Existing standards to represent alignments should be used by an OMS ([44] Chapter 10). In addition, it should provide metadata for mappings, which include: (i) dynamics: all ontologies and any mappings between them will change over time; thus, the service needs to reflect such dynamics through both manual curation and automation by OM algorithms. A subset of metadata should record such dynamics for interoperability and reuse; (ii) provenance: users should have clear information on the provenance of any mapping, including ontology sources, version number, download date, and so on. Specifically, for each mapping, the service should provide annotation with suitable metadata and documentation, to enable interoperability and reuse; (iii) quality: the service should provide the quality metrics for, and within, mappings. This should include similarity scores for each match (expected to range from exact and equivalent to close similarity to broadly similar), an indication of confidence (e.g., validated or not), and global metrics, such as precision (correctness from samples) and recall (missing matches compared to standard mappings); (iv) license limitations: some ontologies, for example SNOMED CT, have license restrictions, which might also apply to derived products, such as mappings. These restrictions should be captured as part of the OM metadata.
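The metadata requirements above could be carried on each mapping as a small record. The sketch below is one possible shape, not a standard alignment format: the field names, the example values, and the `trusted` filter with its 0.9 cut-off are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MappingRecord:
    """One mapping with provenance, quality, and license metadata."""
    source_id: str
    target_id: str
    relation: str          # e.g. "exact", "close", "broad"
    similarity: float      # quality: score between 0 and 1
    validated: bool        # quality: confirmed by a curator or not
    method: str            # provenance: "curated" or an algorithm name
    source_version: str    # provenance and dynamics: ontology releases used
    target_version: str
    retrieved_on: str      # provenance: download date (ISO format)
    license_note: str = "" # license limitations, if any

    def __post_init__(self) -> None:
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must lie between 0 and 1")


def trusted(records: Iterable[MappingRecord], min_similarity: float = 0.9) -> List[MappingRecord]:
    """Keep mappings that are validated or score at least `min_similarity`."""
    return [r for r in records if r.validated or r.similarity >= min_similarity]


# Invented records; identifiers, versions, and dates are placeholders.
records = [
    MappingRecord("A:1", "B:1", "exact", 1.0, True, "curated", "2019-01", "v12", "2019-03-01"),
    MappingRecord("A:2", "B:7", "close", 0.72, False, "lexical matcher", "2019-01", "v12", "2019-03-01"),
    MappingRecord("A:3", "B:9", "broad", 0.95, False, "lexical matcher", "2019-01", "v12", "2019-03-01",
                  "restricted source"),
]

print([r.source_id for r in trusted(records)])  # ['A:1', 'A:3']
```

Storing the ontology versions with every record is what allows a service to detect which mappings need review when a source ontology changes.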

Implementing a prototype service

A prototype ontology mapping service has been implemented as part of the Pistoia Alliance Ontologies Mapping project (https://www.pistoiaalliance.org/projects/ontologies-mapping). The primary objective of this service is to provide mappings between ontologies, building on existing EMBL-EBI services for the life sciences [52]. In particular, the OM repository, OxO (https://www.ebi.ac.uk/spot/oxo), is being developed to store mappings (or cross-references) between terms from ontologies, vocabularies, and coding standards. OxO stores cross-references, which are curated mappings, embedded in >200 public ontologies hosted by the Ontology Lookup Service (OLS). OxO makes it easier to combine mapping data sets that are labelled in different ways. Even if different standards are used to annotate data sets, they can still be made interoperable through OM. Companies can use the public-domain OMs in OxO to bridge the gap between public and private research data.

TABLE 2
Features and techniques used by OM algorithms

Level | Technical basis | Short description of technique
Element | String based | Often used to match names, identifiers, and name descriptions of ontological entities
Element | Language based | Considers names as words in some natural languages, such as English
Element | Constraint based | Deals with internal constraints applied to definitions of entities, such as types, multiplicity of attributes, and keys
Element | Informal resource based | Deduces relations between ontology entities based on how they relate to each other
Element | Formal resource based | Makes use of formal resources, such as domain-specific ontologies, upper ontologies, and linked data
Structure | Graph based | Compares source ontologies (including database schemas and taxonomies) as nodes on labelled graphs
Structure | Taxonomy based | Hierarchical classifications consider only the specialisation relation
Structure | Model based | Matches source ontologies based on semantic interpretation
Structure | Instance based | Compares sets of instances of classes to decide whether they match
Knowledge | Fact or data based | Exploits facts or data stored in relevant knowledge base or database

The Pistoia Alliance prototype aimed to build on OxO through development of an OM algorithm to predict mappings between public ontologies hosted by OLS. The prototype service focussed on the phenotype and disease ontology domain for ten mappings between five public ontologies, namely: HPO, DO, Orphanet Rare Disease Ontology (ORDO), MP, and MeSH. The mappings predicted by the algorithm, developed for the OMS, were compared with silver-standard mappings from consensus voting between top-performing algorithms in OAEI 2017 [37,53]. The predicted mappings from this prototype service are stored in the OxO repository, along with the curated cross-references (mappings) embedded in all the public ontologies hosted by OLS.

The OM algorithm (technical details will be disclosed in a planned technical paper), powering the OMS stored in OxO, is able to detect matches with high similarity score, where labels and synonyms are equivalent or similar between ontologies. OxO also stores the manually curated cross-references, which can be missed by the silver standards. This powerful combination of predicted mappings from an algorithm and curated mappings is an example of a solution that can deliver a scalable and sustainable mapping service.

Concluding remarks

This review shows the impressive progress made over recent years with engineering ontologies and their mappings by utilising modern tools and services. It describes how this progress enables better support for semantically aware applications. We highlight crucial challenges that must be recognised and overcome by public and private enterprise working together in sustainable ways to deliver the necessary tools and services. The important process of providing quality mappings between ontologies as a sustainable service should be supported so that it can mature as a standardised and consolidated activity.

The current flood of big data in the life sciences, especially from ‘omics sources, brings massive challenges for data management. Semantic alignment and data standardisation are vital problems to solve if we are going to harness modern technologies, such as machine learning, for future drug discovery. These important challenges are being met by the biopharma industry through the ongoing implementation of the Findable, Accessible, Interoperable and Reusable (FAIR) guiding principles for scientific data management and governance [54,55]. The interoperability principles of FAIR are supported by the effective application of ontologies and their mappings to underpin integration between many relevant sources of data [2,56].

Acknowledgements

We would like to express gratitude for funding of the Ontologies Mapping project by paying members of the Pistoia Alliance Inc. We wish to thank Thomas Liener and Helen Parkinson for helpful discussions, which have influenced this article. S.J. benefited from funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 654248 (CORBEL). E.J.R. was supported by the AIDA project, funded by the UK Government’s Defence & Security Programme in support of the Alan Turing Institute, the BIGMED project (IKT 259055) and the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project no.: 237889).

References

1 Rathore, A.S. et al. (2017) Role of knowledge management in development and lifecycle management of biopharmaceuticals. Pharm. Res. 34, 243–256
2 Groß, A. et al. (2016) Evolution of biomedical ontologies and mappings: overview of recent approaches. Comput. Struct. Biotechnol. J. 14, 333–340
3 Ashburner, M. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29
4 Blake, J.A. et al. (2015) Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056
5 The Gene Ontology Consortium (2017) Expansion of the Gene Ontology knowledgebase and resource. Nucleic Acids Res. 45, D331–D333
6 Smith, B. et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25, 1251–1255
7 Malone, J. et al. (2016) Ten simple rules for selecting a bio-ontology. PLoS Comput. Biol. 12, e1004743
8 Malone, J. et al. (2010) Modeling sample variables with an Experimental Factor Ontology. Bioinformatics 26, 1112–1118
9 Singhal, A. et al. (2016) Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges. Database 2016, baw161
10 Przybyła, P. et al. (2016) Text mining resources for the life sciences. Database 2016, baw145
11 Winnenburg, R.I. and Bodenreider, O.A. (2014) A framework for assessing the consistency of drug classes across sources. J. Biomed. Semant. 5, 30
12 Dragisic, Z. et al. (2017) Experiences from the anatomy track in the ontology alignment evaluation initiative. J. Biomed. Semant. 8, 56
13 Harland, L. et al. (2011) Empowering industrial research with shared biomedical vocabularies. Drug Discov. Today 16, 940–947
14 Perez-Riverol, Y. et al. (2014) Open source libraries and frameworks for mass spectrometry-based proteomics: a developer’s perspective. Biochim. Biophys. Acta 1844, 63–76
15 Kim, S. et al. (2016) Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms. Bioinformatics 32, 3044–3046
16 Wang, L. et al. (2014) Standardizing adverse drug event reporting data. J. Biomed. Semant. 5, 36
17 Anzai, T. et al. (2015) Responses to the Standard for Exchange of Nonclinical Data (SEND) in non-US countries. J. Toxicol. Pathol. 28, 57–64
18 Mennini, F.S. et al. (2017) Economic burden of diverticular disease: an observational analysis based on real world data from an Italian region. Dig. Liver Dis. 49, 1003–1008
19 Whetzel, P.L. et al. (2013) NCBO Technology: powering semantically aware applications. J. Biomed. Semant. 15 (Suppl. 1), S8
20 Hoehndorf, R. et al. (2015) The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform. 16, 1069–1080
21 Croset, S. et al. (2016) Flexible data integration and curation using a graph-based approach. Bioinformatics 32, 918–925
22 Gomez-Cabrero, D. et al. (2014) Data integration in the era of omics: current and future challenges. BMC Syst. Biol. 8 (Suppl. 2), I1
23 Lapatas, V. et al. (2015) Data integration in biological research: an overview. J. Biol. Res. 22, 9
24 Zhang, H. et al. (2018) An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival. BMC Med. Inform. Decis. Mak. 18 (Suppl. 2), 4
25 Williams, A.J. et al. (2012) Open PHACTS: semantic interoperability for drug discovery. Drug Discov. Today 17, 1188–1198
26 Koscielny, G. et al. (2017) Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994
27 Yuryev, A. et al. (2009) Ariadne’s ChemEffect and Pathway Studio knowledge base. Expert Opin. Drug Discov. 4, 1307–1318

28 Shmelkov, E. et al. (2011) Assessing quality and completeness of human transcriptional regulatory pathways on a genome-wide scale. Biol. Direct 6, 15

29 Sarntivijai, S. et al. (2016) Linking rare and common disease: mapping clinical disease–phenotypes to ontologies in therapeutic target validation. J. Biomed. Semant. 7, 8

30 Kafkas, Ş. et al. (2017) Literature evidence in open targets - a target validation platform. J. Biomed. Semant. 8, 20

31 Maldonado, R. et al. (2018) Deep learning meets biomedical ontologies: knowledge embeddings for epilepsy. AMIA Annu. Symp. Proc. 2017, 1233–1242

32 Arguello Casteleiro, M. et al. (2018) Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature. J. Biomed. Semant. 9, 13

33 García-Campos, M.A. et al. (2015) Pathway analysis: state of the art. Front. Physiol. 6, 383

34 Raje, S.I. and Bodenreider, O. (2017) Interoperability of disease concepts in clinical and research ontologies: contrasting coverage and structure in the disease ontology and SNOMED CT. Stud. Health Technol. Inform. 245, 925–992

35 Dhombres, F. and Bodenreider, O. (2016) Interoperability between phenotypes in research and healthcare terminologies - investigating partial mappings between HPO and SNOMED CT. J. Biomed. Semant. 7, 3

36 Petri, V. et al. (2014) Disease pathways at the Rat Genome Database Pathway Portal: genes in context - a network approach to understanding the molecular mechanisms of disease. Hum. Genomics 8, 17

37 Harrow, I. et al. (2017) Matching disease and phenotype ontologies in the ontology alignment evaluation initiative. J. Biomed. Semant. 8, 55

38 Kamdar, M.R. et al. (2017) A systematic analysis of term reuse and term overlap across biomedical ontologies. Semant. Web 8, 853–871

39 Mungall, C.J. et al. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol. 13, R5

40 Rodríguez-García, M.Á. et al. (2017) Integrating phenotype ontologies with PhenomeNET. J. Biomed. Semant. 8, 58

41 Mungall, C.J. et al. (2017) The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 45, D712–D772

42 Matentzoglu, N. et al. (2017) MIRO: guidelines for minimum information for the reporting of an ontology. J. Biomed. Semant. 9, 6

43 Martínez-Romero, M. et al. (2017) NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation. J. Biomed. Semant. 8, 21

44 Euzenat, J. and Shvaiko, P. (2013) Ontology Matching (2nd edn.), Springer

45 Faria, D. et al. (2018) Tackling the challenges of matching biomedical ontologies. J. Biomed. Semant. 9, 4

46 Zhao, M. et al. (2018) Matching biomedical ontologies based on formal concept analysis. J. Biomed. Semant. 9, 11

47 Jimenez-Ruiz, E. and Cuenca Grau, B. (2011) LogMap: logic-based and scalable ontology matching. Int. Semant. Web Conf. 2011, 273–288

48 Dragisic, Z. et al. (2016) User validation in ontology alignment. Int. Semant. Web Conf. 2016, 200–217

49 UniProt Consortium (2015) UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212

50 van Iersel, M.P. et al. (2010) The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services. BMC Bioinf. 11, 5

51 Mortensen, J.M. et al. (2016) Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology. J. Biomed. Inf. 60, 199–209

52 Perez-Riverol, Y. et al. (2017) OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets. Proteomics 17 (19)

53 Achichi, M. et al. (2017) Results of the Ontology Alignment Evaluation Initiative 2017. CEUR Workshop Proc. 2032, 61–113

54 Wilkinson, M.D. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018

55 Wise, J. et al. (2019) Implementation and relevance of FAIR data principles in biopharmaceutical R&D. Drug Discov. Today 24, 933–938

56 Dhombres, F. and Charlet, J. (2017) Knowledge representation and management, it’s time to integrate! Yearb. Med. Inform. 26, 148–151

Please cite this article in press as: Harrow, I. et al. Ontology mapping for semantically enabled applications, Drug Discov Today (2019), https://doi.org/10.1016/j.drudis.2019.05.020
