5.3 Mapping between source and core ontologies
5.3.4 Rules for MFO
There is a clear delineation in most systems biology formalisms between a protein as an abstract concept and a protein as a specific interactor in a reaction. Within SBML, this is defined using the speciesType or species elements for the concept of a protein, and speciesReference for the participant in the reaction. Within BioPAX Level 2, the biological entity / participant pairing is modelled with physicalEntity and physicalEntityParticipant. Within BioGRID, interactor and participant elements are used.
A similar dichotomy exists within the telomere ontology, and when converting the information retrieved in Use Case 1 back to SBML, these considerations need to be taken into account. One solution is to link the telomere ontology’s Protein class (and its parent class PhysicalEntity) to the SBML speciesType element. However, most SBML models do not use speciesType, and future versions of SBML will have a different way of modelling such concepts [221]. Therefore, to align with future versions of SBML as well as to keep the mapping simple, the instances of the telomere ontology PhysicalEntity class are instead linked to the appropriate attributes of an SBML species element.
Biological annotation
MIRIAM annotation is added to MFO by mapping a variety of database cross-references from the telomere ontology. The rules used to create these annotations introduce two additional SWRL built-ins, swrlb:stringEqualIgnoreCase and swrlb:stringConcat. The swrlb:stringEqualIgnoreCase built-in is satisfied if and only if its two arguments match in a case-insensitive manner, while the first argument of swrlb:stringConcat is bound to the concatenation of all further arguments [123].
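As a rough illustration of the behaviour described above, the two built-ins can be mimicked in Python. This is only a sketch of their semantics, not how a SWRL engine implements them; the function names and the placeholder accession string are hypothetical.

```python
def string_equal_ignore_case(a: str, b: str) -> bool:
    """Analogue of swrlb:stringEqualIgnoreCase: true only if the two
    strings are equal once case differences are ignored."""
    return a.casefold() == b.casefold()


def string_concat(*parts: str) -> str:
    """Analogue of swrlb:stringConcat: the returned value plays the role
    of the first (bound) argument, built from all further arguments."""
    return "".join(parts)


# "ACCESSION" is a hypothetical placeholder, not a real database accession.
db_name = "UniProt"
if string_equal_ignore_case(db_name, "uniprot"):
    miriam_is = string_concat("urn:miriam:uniprot:", "ACCESSION")
```

In the rules that follow, the first built-in acts as a filter on the database name and the second builds the value later attached through mfo:bqbIs.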
In the first two rules for mapping information from the telomere ontology to MFO (TUO_MFO_00001 and TUO_MFO_00001_synonym), all Rad9 instances are enhanced with two biological annotations: firstly, a MIRIAM URI for the UniProtKB entry associated with the Rad9 protein and secondly, a link to the SBO “macromolecule” term (SBO_0000245)23. This association is double-checked by ensuring that the “rad9” species name is already present. The rule returns information on the Rad9 protein only, as required for the use cases. While it is highly specific, it can easily be generalised to run over Protein, the parent class of Rad9. In such a general case, shown later for TUO_MFO_00006 and TUO_MFO_00006_synonym, all of the references to specific MFO species instances have been removed.
TUO_MFO_00001 :
tuo:Rad9(?protein) ∧
tuo:hasDatabaseReference(?protein, ?crossref) ∧
tuo:databaseName(?crossref, ?dbname) ∧
tuo:accession(?crossref, ?accession) ∧
swrlb:stringEqualIgnoreCase(?dbname, “uniprot”) ∧
swrlb:stringConcat(?miriamIs, “urn:miriam:uniprot:”, ?accession) ∧
sbo:SBO_0000245(?term) ∧
mfo:Species(?mfoSpecies) ∧
mfo:name(?mfoSpecies, ?nameText) ∧
swrlb:stringEqualIgnoreCase(?nameText, “rad9”)
⇒
mfo:MiriamAnnotation(?crossref) ∧
mfo:miriamAnnotation(?mfoSpecies, ?crossref) ∧
mfo:bqbIs(?crossref, ?miriamIs) ∧
mfo:sboTerm(?mfoSpecies, ?term)
Additional MIRIAM annotations are added with TUO_MFO_00002, which adds any Saccharomyces Genome Database (SGD) cross-references to MFO for Rad9 instances. As with TUO_MFO_00001 and TUO_MFO_00001_synonym, this rule could easily be made more generic to allow all MFO species to be annotated with their relevant cross-references. As MIRIAM URIs require a prefix identifying the referenced database (e.g. “urn:miriam:uniprot:”), each database requires its own rule to correctly build the URI. Rules TUO_MFO_00003-4 (not shown) differ from TUO_MFO_00001-2 only in the database used to create the new MIRIAM annotation: TUO_MFO_00003 is for IntAct and TUO_MFO_00004 is for Pathway Commons.
23 For brevity, the second rule (TUO_MFO_00001_synonym) is not shown. It is identical to the first rule except that swrlb:stringEqualIgnoreCase compares against “uniprotkb” rather than “uniprot”. Different source datasets use different names for this database.
TUO_MFO_00002 :
tuo:Rad9(?protein) ∧
tuo:hasDatabaseReference(?protein, ?crossref) ∧
tuo:databaseName(?crossref, ?dbname) ∧
tuo:accession(?crossref, ?accession) ∧
swrlb:stringEqualIgnoreCase(?dbname, “sgd”) ∧
swrlb:stringConcat(?miriamIs, “urn:miriam:sgd:”, ?accession) ∧
mfo:Species(?species) ∧
mfo:name(?species, ?nameText) ∧
swrlb:stringEqualIgnoreCase(?nameText, “rad9”)
⇒
mfo:MiriamAnnotation(?crossref) ∧
mfo:miriamAnnotation(?species, ?crossref) ∧
mfo:bqbIs(?crossref, ?miriamIs)
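Taken together, the per-database rules and their synonym variants behave much like a lookup from a case-insensitive database name to a URI prefix. The Python sketch below illustrates that reading under stated assumptions: the function and table names are hypothetical, and only the prefixes quoted in the rules in this section are included. The prefixes used by TUO_MFO_00003-4 are not given here and are deliberately left out.

```python
from typing import Optional

# Prefixes quoted in TUO_MFO_00001 and TUO_MFO_00002. "uniprotkb" is the
# alternative database name handled by TUO_MFO_00001_synonym. No other
# prefixes are stated in this section, so none are assumed.
MIRIAM_PREFIXES = {
    "uniprot": "urn:miriam:uniprot:",
    "uniprotkb": "urn:miriam:uniprot:",
    "sgd": "urn:miriam:sgd:",
}


def miriam_urn(db_name: str, accession: str) -> Optional[str]:
    """Build a MIRIAM URN for a cross-reference, or return None when no
    rule (here: no table entry) exists for the named database."""
    prefix = MIRIAM_PREFIXES.get(db_name.casefold())
    if prefix is None:
        return None
    return prefix + accession
```

A cross-reference whose database name has no entry produces no annotation, which mirrors a rule antecedent that simply fails to match.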
In TUO_MFO_00006 and TUO_MFO_00006_synonym, a UniProtKB MIRIAM annotation is created for every Protein in the telomere ontology.24 In addition to these UniProtKB MIRIAM annotations within MFO, TUO_MFO_00007-9 add MIRIAM annotations for SGD, IntAct and Pathway Commons. Due to their similarity to TUO_MFO_00006, they are not shown.
TUO_MFO_00006 :
tuo:Protein(?mfoSpecies) ∧
mfo:Species(?mfoSpecies) ∧
tuo:hasDatabaseReference(?mfoSpecies, ?crossref) ∧
tuo:databaseName(?crossref, ?dbname) ∧
tuo:accession(?crossref, ?accession) ∧
swrlb:stringEqualIgnoreCase(?dbname, “uniprot”) ∧
swrlb:stringConcat(?miriamIs, “urn:miriam:uniprot:”, ?accession) ∧
sbo:SBO_0000245(?term)
⇒
mfo:MiriamAnnotation(?crossref) ∧
mfo:miriamAnnotation(?mfoSpecies, ?crossref) ∧
mfo:bqbIs(?crossref, ?miriamIs) ∧
mfo:sboTerm(?mfoSpecies, ?term)
24 TUO_MFO_00006_synonym is not shown in this section as it is identical to TUO_MFO_00006 except for the string
TUO_MFO_00011-12 provide name properties for the Species instances within MFO. As SWRL rules will not overwrite existing name values, these rules only modify the mapped name property if it is empty. This ensures that existing values are not overwritten and, if the rules are run in sequence, that names from the recommendedName property are not overridden by those from the synonym property. If pre-existing names from the MFO model should be rewritten, they can simply be deleted prior to or during conversion from SBML to MFO.
TUO_MFO_00011 :
tuo:Protein(?spec) ∧
mfo:Species(?spec) ∧
tuo:recommendedName(?spec, ?recName)
⇒
mfo:name(?spec, ?recName)
TUO_MFO_00012 :
tuo:Protein(?spec) ∧
mfo:Species(?spec) ∧
tuo:synonym(?spec, ?synonym)
⇒
mfo:name(?spec, ?synonym)
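The precedence described for these two rules (an existing name first, then recommendedName, then synonym) can be sketched in Python as follows. This models only the outcome described in the text when the rules are run in sequence. The function name is hypothetical, and taking the first synonym when several exist is an assumption made for the sketch.

```python
from typing import Optional, Sequence


def mapped_name(existing: Optional[str],
                recommended: Optional[str],
                synonyms: Sequence[str]) -> Optional[str]:
    """Return the name a Species ends up with when the name rules are
    applied in order and never overwrite a value that is already set."""
    if existing:        # a pre-existing MFO name is kept as-is
        return existing
    if recommended:     # TUO_MFO_00011: recommendedName fills an empty name
        return recommended
    # TUO_MFO_00012: fall back to a synonym (first one chosen: an assumption)
    return synonyms[0] if synonyms else None
```

Deleting the pre-existing name before conversion corresponds to calling the function with existing set to None, which lets the recommended name through.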
Rad9-Rad17 Interaction (TUO_MFO_00005)
Specific interactions relating to Rad9 were retrieved from the telomere ontology and mapped to MFO. Rad9-Rad17 is described in this section, and Rad9-Mec1 is described in the next. TUO_MFO_00005 maps the Rad9-Rad17 interaction, and is tailored to populating a single SBML model. If multiple models were stored within MFO, incorrect listOfReactions and listOfSpecies could be matched. Where necessary, this limitation can easily be resolved by extending the rule to specify the ListOf__ instances associated with a particular systems biology model.
As TUO_MFO_00005 is a long and complex rule, it is presented in sections rather than all at once. Further, in contrast to the native ordering in the OWL files themselves, portions of the rule have been reordered slightly for increased readability. Firstly, the reaction to be mapped, the reactants involved, and the proteins associated with those reactants within the telomere ontology are identified:
TUO_MFO_00005 :
tuo:ProteinComplexFormation(?reaction)∧ tuo:Rad9Rad17Interaction(?reaction)∧ tuo:hasReactant(?reaction, ?reactant)∧ tuo:playedBy(?reactant, ?protein)∧
Next, the already-extant ListOfReactions and ListOfSpecies instances from MFO are stored in SWRL variables for later use:
mfo:ListOfReactions(?listOfReaction)∧ mfo:ListOfSpecies(?listOfSpecies)∧
SWRL built-ins are then called to create eight new variables. Firstly, ?nameVariable is filled with the value “Rad9Rad17Complex” using swrlb:stringConcat; this is a string literal only, and is not an instance itself. ?nameVariable is used later in the rule to create a value for the name data property within MFO. Seven calls to swrlx:makeOWLThing are then run. These calls create the appropriate number of MFO instances for storing the information related to the reactions and proteins matched within this rule. For instance, if there are X instances of the Protein class matched within TUO_MFO_00005, then X new instances are created and bound to the variable ?speciesSId. As the Rad9-Rad17 interaction is a protein complex formation, one product (the complex itself) is created for each reaction. The newly-created instance variable ?product describes the new product of the reaction. The other new variables create additional parts of an SBML model which are not directly modelled within the telomere ontology.
swrlb:stringConcat(?nameVariable, “Rad9Rad17Complex”)∧ swrlx:makeOWLThing(?reactantList, ?reaction)∧ swrlx:makeOWLThing(?reactionSId, ?reaction)∧ swrlx:makeOWLThing(?speciesSId, ?protein)∧ swrlx:makeOWLThing(?product, ?reaction)∧ swrlx:makeOWLThing(?prodSpecSid, ?reaction)∧ swrlx:makeOWLThing(?prodRef, ?reaction)∧ swrlx:makeOWLThing(?productList, ?reaction)
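The effect of swrlx:makeOWLThing as described above (one new instance per matched value of its second argument) can be approximated in Python by a memoised factory. This is a sketch under assumptions, not the SWRLTab implementation: the class name and generated identifiers are hypothetical, and keying on the target variable as well as the argument is an assumption drawn from the way the rule above derives several distinct instances from the same ?reaction.

```python
import itertools
from typing import Dict, Hashable, Tuple


class ThingFactory:
    """Hands out one fresh identifier per (variable, arguments) pair,
    returning the same identifier if that pair is seen again."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._made: Dict[Tuple[str, Tuple[Hashable, ...]], str] = {}

    def make(self, variable: str, *args: Hashable) -> str:
        key = (variable, args)
        if key not in self._made:
            # Hypothetical naming scheme, used only for this sketch.
            self._made[key] = f"thing_{next(self._counter)}"
        return self._made[key]


factory = ThingFactory()
# Two matched proteins yield two distinct ?speciesSId instances.
sid_rad9 = factory.make("speciesSId", "rad9")
sid_rad17 = factory.make("speciesSId", "rad17")
# The same reaction yields different instances for different variables.
reactant_list = factory.make("reactantList", "reaction_1")
product_list = factory.make("productList", "reaction_1")
```

Under this reading, X matched proteins give X values of ?speciesSId, while each matched reaction gives exactly one value for each of the reaction-keyed variables.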
These SWRL built-ins mark the end of the antecedent for TUO_MFO_00005, which has created or identified all variables needed to map the reaction data to MFO. The consequent begins with assignment of all of the identifiers to SId instances in MFO.
⇒
mfo:SId(?reactionSId)∧ mfo:SId(?speciesSId)∧ mfo:SId(?prodSpecSid)∧
Then, each of the links to those SId instances is built using the id object property. Additionally, the name property is filled for the ?product instance:
mfo:id(?reaction, ?reactionSId)∧ mfo:id(?protein, ?speciesSId)∧ mfo:id(?product, ?prodSpecSid)∧ mfo:name(?product, ?nameVariable)∧
Next, each of the instances named in the first argument of the id property is mapped to its appropriate MFO class. Additionally, the Reaction instance gets a mapped list of reactants and products, and both the Species and Reaction instances are linked to their appropriate ListOf__ instances.
mfo:Reaction(?reaction)∧ mfo:Species(?protein)∧ mfo:Species(?product)∧ mfo:listOfReactants(?reaction, ?reactantList)∧ mfo:listOfProducts(?reaction, ?productList)∧ mfo:reaction(?listOfReaction, ?reaction)∧ mfo:species(?listOfSpecies, ?protein)∧ mfo:species(?listOfSpecies, ?product)∧
The ListOfSpeciesReferences class in MFO is then populated with the appropriate new instances from the swrlx:makeOWLThing built-in, and each individual SpeciesReference is mapped and added to its list:
mfo:ListOfSpeciesReferences(?reactantList)∧ mfo:ListOfSpeciesReferences(?productList)∧
mfo:ProductSpeciesReference(?prodRef)∧ mfo:ReactantSpeciesReference(?reactant)∧ mfo:speciesReference(?reactantList, ?reactant)∧ mfo:speciesReference(?productList, ?prodRef)∧
Rad9-Mec1 Interaction (TUO_MFO_00010)
There are two interactions between Rad9 and Mec1 present within the telomere ontology. In this rule, one of these interactions has been arbitrarily chosen to send to MFO. In future, this is a choice the modeller can make after looking at the information contained in each interaction. Unlike TUO_MFO_00005, there is no extra information from the data sources which allows the interaction to be defined as a protein complex formation. As such, a likely product for the reaction cannot be created. The lack of knowledge about the interaction means that TUO_MFO_00010 is a much simpler rule than TUO_MFO_00005. Except for those sections of the rule dealing with products and lists of products, TUO_MFO_00010 is the same as TUO_MFO_00005. As a result, the reaction has reactants only, and no products.
TUO_MFO_00010 :
tuo:Rad9Mec1Interaction(’psimif:interaction_143’) ∧
tuo:hasReactant(’psimif:interaction_143’, ?react) ∧
tuo:playedBy(?react, ?protein) ∧
mfo:ListOfReactions(?listOfReaction) ∧
swrlx:makeOWLThing(?reactantList, ’psimif:interaction_143’) ∧
swrlx:makeOWLThing(?reactSId, ’psimif:interaction_143’) ∧
swrlx:makeOWLThing(?specSId, ?protein)
⇒
mfo:ListOfSpeciesReferences(?reactantList) ∧
mfo:SId(?reactSId) ∧
mfo:SId(?specSId) ∧
mfo:ReactantSpeciesReference(?react) ∧
mfo:speciesReference(?reactantList, ?react) ∧
mfo:Reaction(’psimif:interaction_143’) ∧
mfo:id(’psimif:interaction_143’, ?reactSId) ∧
mfo:reaction(?listOfReaction, ’psimif:interaction_143’) ∧
mfo:Species(?protein) ∧
mfo:id(?protein, ?specSId) ∧
mfo:listOfReactants(’psimif:interaction_143’, ?reactantList)