
D.3 Impact Corpora

D.3.2 Manually Annotated Documents

The list of documents (PubMed IDs) manually annotated for impact evaluation is presented in Table 43.

Table 43: List of documents (PubMed IDs) manually annotated for impact evaluation

10074357 10544015 10955993 11265460 12205101 12604240 12650918 12702265
12746550 12890481 12902331 14996818 15026177 15103634 15178335 15206895
15276835 15311930 15625320 15882618 16000301 16129418 16139296 16242114
17581819 17952367 17969139 17974571 19398559 19653994 19674460 8706817
9705344 9784233 16889958 19290871 8706817 18571493 17343568 17420465

The list of abstracts (PubMed IDs) manually annotated for impact evaluation is presented in Table 44.

Table 44: List of abstracts (PubMed IDs) manually annotated for impact evaluation

Subtilisin:
12089022 12298003 15047911 15187224 15544324

Xylanase:
10220321 10235626 10752608 10860737 11377763 11601976 11917150 1359880
15129722 15260499 15278768 7764794 8019418 8376336 8855954 9201919
9681873 9731776 9930661

Haloalkane Dehalogenase:
10099367 11932489 14525993 9862209 8021255 10100638 11937643 3579270
9790663 8110757 10231528 12089046 7705355 9579656 8855957 10585505
12450392 7737973 9236003 10963662 12676719 7828730 9051734

Dioxygenase:
12206778 10438749 11312272 15733056 9251195 12081948 10397810 11514531
15342624 9190809 12057964

Appendix E

GATE XML Format for Impact Annotations

<?xml version='1.0' encoding='UTF-8'?>
<GateDocument>
<!-- The document's features -->
<GateDocumentFeatures>
<Feature>
  <Name className="java.lang.String">MimeType</Name>
  <Value className="java.lang.String">text/plain</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">docNewLineType</Name>
  <Value className="java.lang.String">LF</Value>
</Feature>
</GateDocumentFeatures>
<!-- The document content area with serialized nodes -->
<TextWithNodes>
<Node id="671" />Abstract
L-Xylulose reductase (XR) catalyzes the oxidoreduction between xylitol and L-xylulose in the uronate cycle. The enzyme has been shown to be identical to diacetyl reductase, an enzyme that reduces a-dicarbonyl compounds. XR belongs to the short-chain dehydrogenase/reductase family, and shows high sequence identity with mouse lung carbonyl reductase (MLCR), an enzyme that reduces 3-ketosteroids but not sugars. In this study, we have confirmed the roles of Ser136, Tyr149 and Lys153 of XR as the catalytic triad by <Node id="1196" />drastic loss of activity resulting from the mutagenesis of S136A, Y149F and K153M in rat XR.<Node id="1288" />
We have also constructed several mutant XRs, in which putative substrate binding residues from rat XR were substituted with those found in the corresponding positions of MLCR, in order to identify amino acids responsible for the different substrate recognition of the enzymes. While single mutants at positions 137, 143, 146, 190 and 191 caused little or moderate change in substrate specificity, <Node id="1686" />a double mutant (N190V and W191S) and<Node id="1723" /> triple mutant (Q137M, L143F and H146L) resulted in almost loss of activity for only the sugars.
</TextWithNodes>
<!-- Named annotation set -->
<AnnotationSet Name="Manual">
<Annotation Id="85" Type="Impact" StartNode="1196" EndNode="1288">
<Feature>
  <Name className="java.lang.String">mutation</Name>
  <Value className="java.lang.String">S136A</Value>
</Feature>
<Feature>
  <Name className="java.lang.String">organism</Name>
  <Value className="java.lang.String">Rattus norvegicus</Value>
</Feature>
<Feature>
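To illustrate how the format above can be consumed, the following is a minimal Python sketch that recovers the text span and the features of each Impact annotation. It is not part of the original system. The function name impact_annotations and the shortened SAMPLE document are illustrative assumptions, and the sketch computes node offsets from the text itself rather than relying on the numeric node ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened document in the same layout as the listing above.
SAMPLE = """<GateDocument>
<TextWithNodes>confirmed by <Node id="13" />loss of activity<Node id="29" /> in rat XR.</TextWithNodes>
<AnnotationSet Name="Manual">
<Annotation Id="85" Type="Impact" StartNode="13" EndNode="29">
<Feature>
  <Name className="java.lang.String">mutation</Name>
  <Value className="java.lang.String">S136A</Value>
</Feature>
</Annotation>
</AnnotationSet>
</GateDocument>"""


def impact_annotations(xml_text):
    """Return (covered_text, features) for every annotation of type Impact."""
    root = ET.fromstring(xml_text)
    twn = root.find("TextWithNodes")

    # Rebuild the plain text and remember the character position of each Node.
    parts = [twn.text or ""]
    position = len(parts[0])
    offsets = {}
    for node in twn.findall("Node"):
        offsets[node.get("id")] = position
        tail = node.tail or ""
        parts.append(tail)
        position += len(tail)
    text = "".join(parts)

    results = []
    for ann in root.iter("Annotation"):
        if ann.get("Type") != "Impact":
            continue
        features = {
            f.findtext("Name").strip(): f.findtext("Value").strip()
            for f in ann.findall("Feature")
        }
        start = offsets[ann.get("StartNode")]
        end = offsets[ann.get("EndNode")]
        results.append((text[start:end], features))
    return results


print(impact_annotations(SAMPLE))
```

Applied to a complete document, the same function would return one tuple per Impact annotation, with the mutation and organism features collected into a dictionary.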

Appendix F

JAPE Rules for Extraction Modules

Here, we show the developed JAPE rules.
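Most of the grammars in this appendix use the appelt control style, and organism.jape additionally assigns explicit Priority values to its rules. As a reading aid, the following Python sketch mimics how appelt selects among rules that match at the same position: the longest match fires, ties are broken by the higher priority, and remaining ties by the order of definition. The function name select_appelt and the candidate tuples are illustrative assumptions; this is not GATE code.

```python
def select_appelt(candidates):
    """Pick the rule that would fire under appelt-style matching.

    candidates: list of (rule_name, start, end, priority) tuples, given in
    grammar order and all starting at the same document offset.
    """
    if not candidates:
        return None
    ranked = [
        ((end - start, priority, -order), name)
        for order, (name, start, end, priority) in enumerate(candidates)
    ]
    # Longest span first, then highest priority, then earliest definition.
    return max(ranked)[1]


# Hypothetical candidates for a single mention.
matches = [
    ("OrganismRule2", 0, 16, 50),
    ("OrganismRule4", 0, 11, 20),
    ("OrganismRule5", 0, 16, 10),
]
print(select_appelt(matches))
```

In this hypothetical case OrganismRule2 and OrganismRule5 cover the same span, so the higher priority decides. A longer match would win regardless of its priority.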

F.1 JAPE Rules for the Organism Extraction

Main File

MultiPhase: Organism
Phases:
    fullname
    taxUnit
    abbrGen
    strain
    taxUnitNames
    acronym
    organism
    clean

fullname.jape

Phase: FullnameOrganism
Input: Lookup
Options: control = appelt

Rule: FullnameOrganism
(
    {Lookup.inst =~ "organism"}
):org
-->
{
    // Create an annotation set consisting of all the full-name annotations of organisms
    try {
        gate.AnnotationSet organismSet = (gate.AnnotationSet) bindings.get("org");
        gate.Annotation organism = organismSet.iterator().next();
        // Create the class feature
        try {
            organism.getFeatures().put("class", "Organism");
        } catch (Exception ex) {
            System.out.println("Exception: " + ex);
        }
    } catch (Exception ex) {
        System.out.println("Exception: " + ex);
    }
}

taxUnit.jape

Phase: TAXUNIT
Input: Lookup Token stopList
Options: control = appelt debug = true

Macro: GENUS
(
    {Lookup.majorType == "genus"}
)

Macro: SPECIES
(
    {Lookup.majorType == "species"}
)

Macro: UNDEFSPECIES
(
    ({Token.string == "sp"} | {Token.string == "spp"})
    {Token.string == "."}
)

Macro: STRAIN
(
    {Lookup.majorType == "strain"}
)

Rule: genRule
(
    GENUS
):gen
-->
:gen.TaxonomicUnit = {gazetteered = "true", class = "Genus"}

Rule: specRule
(
    SPECIES
):spec
-->
:spec.TaxonomicUnit = {gazetteered = "true", class = "Species"}

Rule: undefSpecRule
(
    UNDEFSPECIES
):undefspec
-->
:undefspec.TaxonomicUnit = {gazetteered = "true", class = "UndefinedSpecies"}

Rule: strRule
(
    STRAIN
):str
({!stopList})
-->
:str.TaxonomicUnit = {gazetteered = "true", class = "Strain"}

abbrGen.jape

Phase: ABBRGEN
Input: TaxonomicUnit Token Lookup
Options: control = appelt

Macro: SPECIES
(
    {TaxonomicUnit.class == "Species"}
)

(
    (
        {Token.kind == "word", Token.length == "1"}
        {Token.string == "."}
    ):abbr
    (SPECIES)
):abbrOrg
-->
{
    gate.AnnotationSet abbr = (gate.AnnotationSet) bindings.get("abbr");
    // Record that this is an abbreviation.
    gate.FeatureMap genFeatures = Factory.newFeatureMap();
    try {
        genFeatures.put("abbr", "true");
        genFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Genus");
    } catch (Exception ex) {
        System.out.println("Exception in abbrGen.jape: " + ex);
    }
    outputAS.add(abbr.firstNode(), abbr.lastNode(), "TaxonomicUnit", genFeatures);
}
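The pattern in abbrGen.jape (a one-letter word token, a period, then a known species) can be approximated outside GATE with a regular expression. The Python sketch below is only an illustration under stated assumptions: KNOWN_SPECIES is a hypothetical stand-in for the species gazetteer, and abbreviated_genera is an invented helper name. Like the JAPE rule, it marks only the letter and the period, not the species name.

```python
import re

# Hypothetical stand-in for the species gazetteer used by the real pipeline.
KNOWN_SPECIES = {"coli", "subtilis"}

# One letter as a whole word, a period, optional whitespace, a lowercase word.
ABBR_GENUS = re.compile(r"\b([A-Za-z])\.\s*([a-z]+)\b")


def abbreviated_genera(text):
    """Return (start, end) spans of genus abbreviations such as the E. in E. coli."""
    spans = []
    for match in ABBR_GENUS.finditer(text):
        if match.group(2) in KNOWN_SPECIES:
            # Cover the letter and the following period only.
            spans.append((match.start(1), match.end(1) + 1))
    return spans


sentence = "E. coli and B. subtilis but not A. Smith"
print([sentence[s:e] for s, e in abbreviated_genera(sentence)])
```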

strain.jape

Phase: STRAIN
Input: TaxonomicUnit Lookup Token possibleStrain learnedStrain stopList
Options: control = appelt debug = true

Macro: GENUS
(
    {TaxonomicUnit.class == "Genus"}
)

Macro: SPECIES
(
    {TaxonomicUnit.class == "Species"} | {TaxonomicUnit.class == "UndefinedSpecies"}
)

Macro: STRAIN
(
    {learnedStrain} |
    {TaxonomicUnit.class == "Strain"}
)

Macro: ORGANISM
(
    {Lookup.class == "Organism"}
)

Rule: StrainRule1
(
    ((ORGANISM) | (GENUS) | (SPECIES))
    (STRAIN):posStrain
)
({!stopList})
-->
{
    gate.AnnotationSet posStrain = (gate.AnnotationSet) bindings.get("posStrain");
    gate.FeatureMap strFeatures = Factory.newFeatureMap();
    try {
        strFeatures.put("gazetteered", "false");
        strFeatures.put("rule", "one");
    } catch (Exception ex) {
        System.out.println("Exception in strain.jape: " + ex);
    }
    outputAS.add(posStrain.firstNode(), posStrain.lastNode(), "TaxonomicUnit", strFeatures);
}

Rule: StrainKeywordL
(
    ((ORGANISM) | (GENUS) | (SPECIES))?
    (
        {Token.string ==~ "(?i)strain"} |
        ({Token.string == "sp"} | {Token.string == "str"}) {Token.string == "."}
    )
)
((STRAIN) | {learnedStrain} | {possibleStrain}):strain
-->
{
    gate.AnnotationSet strain = (gate.AnnotationSet) bindings.get("strain");
    gate.FeatureMap strFeatures = Factory.newFeatureMap();
    try {
        strFeatures.put("gazetteered", "false");
        strFeatures.put("rule", "two");
        strFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Strain");
    } catch (Exception ex) {
        System.out.println("Exception in strain.jape: " + ex);
    }
    outputAS.add(strain.firstNode(), strain.lastNode(), "TaxonomicUnit", strFeatures);
}

Rule: StrainKeywordR
((ORGANISM) | (GENUS) | (SPECIES))
({learnedStrain} | {possibleStrain}):strain
(
    {Token.string ==~ "(?i)strain"}
)
-->
{
    gate.AnnotationSet strain = (gate.AnnotationSet) bindings.get("strain");
    gate.FeatureMap strFeatures = Factory.newFeatureMap();
    try {
        strFeatures.put("gazetteered", "false");
        strFeatures.put("rule", "three");
        strFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Strain");
    } catch (Exception ex) {
        System.out.println("Exception in strain.jape: " + ex);
    }
    outputAS.add(strain.firstNode(), strain.lastNode(), "TaxonomicUnit", strFeatures);
}

taxUnitNames.jape

Phase: TAXUNITNAMES
Input: TaxonomicUnit
Options: control = appelt debug = true

Macro: TAXUNIT
(
    {TaxonomicUnit}
)

Rule: foo
(
    TAXUNIT
):unit
-->
{
    try {
        gate.AnnotationSet unit = (gate.AnnotationSet) bindings.get("unit");
        gate.Annotation unitAnn = (gate.Annotation) unit.iterator().next();
        gate.FeatureMap unitFeatures = unitAnn.getFeatures();
        // Store the covered text as the "name" feature
        unitAnn.getFeatures().put("name",
            doc.getContent().getContent(unit.firstNode().getOffset(),
                unit.lastNode().getOffset()).toString());
    } catch (Exception e) {
        System.out.println("Exception in TaxonomicUnit" + e);
    }
}

acronym.jape

Phase: ACRONYM
Input: Lookup Token SpaceToken
Options: control = appelt

Rule: Acronym
(
    ({SpaceToken} | {Token.kind == "punctuation"})
    ({Lookup.majorType == "acronym"}):acrOrg
    ({SpaceToken} | {Token.kind == "punctuation"})
)
-->
{
    // Getting all the acronym annotations
    try {
        gate.AnnotationSet acronymSet = (gate.AnnotationSet) bindings.get("acrOrg");
        gate.Annotation acronymAnno = acronymSet.iterator().next();
        // Create the class feature
        try {
            acronymAnno.getFeatures().put("class", "Organism");
        } catch (Exception ex) {
            System.out.println("Exception: " + ex);
        }
        gate.FeatureMap acronymFeatures = Factory.newFeatureMap();
        acronymFeatures.put("class", "Organism");
        outputAS.add(acronymSet.firstNode(), acronymSet.lastNode(), "acronym", acronymFeatures);
    } catch (Exception ex) {
        System.out.println("Exception: " + ex);
    }
}

organism.jape

Phase: ORGANISM
Input: TaxonomicUnit Genus Species Strain Token learnedStrain Lookup
Options: control = Appelt debug = true

Macro: GENUS
(
    {TaxonomicUnit.class == "Genus"}
)

Macro: SPECIES
(
    {TaxonomicUnit.class == "Species"} | {TaxonomicUnit.class == "UndefinedSpecies"}
)

Macro: STRAIN
(
    {TaxonomicUnit.class == "Strain"}
)

Macro: STRAINKEYWORD
(
    {Token.string ==~ "(?i)strain"} |
    {Token.string == "sp"} {Token.string == "."} |
    {Token.string == "str"} {Token.string == "."}
)

Macro: SUBSPECIES
(
    ({Token.string == "subsp"} {Token.string == "."}) | {Token.string == "subspecies"}

)

/* ----- RULE 1 ----- */
Rule: OrganismRule1
Priority: 70
(
    (GENUS):gen
    (SPECIES):spec
    ((SUBSPECIES))
    (SPECIES):sub
):org1
-->
{
    gate.AnnotationSet gen = (gate.AnnotationSet) bindings.get("gen");
    gate.AnnotationSet spec = (gate.AnnotationSet) bindings.get("spec");
    gate.AnnotationSet sub = (gate.AnnotationSet) bindings.get("sub");
    gate.AnnotationSet org1 = (gate.AnnotationSet) bindings.get("org1");

    // For the whole organism
    gate.FeatureMap orgFeatures = Factory.newFeatureMap();

    // Check if the genus is abbreviated.
    Iterator genIt = gen.iterator();
    String abbrFeatValue = (String) ((gate.Annotation) genIt.next()).getFeatures().get("abbr");
    Boolean genIsAbbr = new Boolean(abbrFeatValue);
    if (genIsAbbr.booleanValue() == true) {
        orgFeatures.put("abbrGenus", "true");
    }

    // Give the "class" feature for ontology-aware JAPE
    try {
        orgFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Organism");
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // Get the values into easily understandable variables
    try {
        String myGenus =
            doc.getContent().getContent(gen.firstNode().getOffset(),
                gen.lastNode().getOffset()).toString();
        String mySpecies =
            doc.getContent().getContent(spec.firstNode().getOffset(),
                spec.lastNode().getOffset()).toString();
        // Subspecies - check if present
        String mySubspecies =
            doc.getContent().getContent(sub.firstNode().getOffset(),
                sub.lastNode().getOffset()).toString();
        orgFeatures.put("Subspecies", mySubspecies);
        String name = myGenus + " " + mySpecies + " subsp. " + mySubspecies;
        orgFeatures.put("docName", name);
        orgFeatures.put("Rule", "1");
        orgFeatures.put("Genus", myGenus);
        orgFeatures.put("Species", mySpecies);
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // The organism annotation is done, add it and store its ID for the relation handling.
    Integer orgNpId = outputAS.add(org1.firstNode(), org1.lastNode(), "Organism", orgFeatures);

    /* --Relations-- */
    // See SRE/NE/relationClassLayer.jape - annotate the relation onto
    // one of the related annotations.
    gate.FeatureMap genRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap specRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap subRelFeatures = Factory.newFeatureMap();
    gate.Annotation genAnn = (gate.Annotation) gen.iterator().next();
    Integer genNpId = genAnn.getId();
    gate.Annotation specAnn = (gate.Annotation) spec.iterator().next();
    Integer specNpId = specAnn.getId();
    if (sub != null) {
        gate.Annotation subAnn = (gate.Annotation) sub.iterator().next();
        Integer subNpId = subAnn.getId();
        subRelFeatures.put("from", orgNpId);
        subRelFeatures.put("to", subNpId);
    }
    genRelFeatures.put("from", orgNpId);
    genRelFeatures.put("to", genNpId);
    specRelFeatures.put("from", orgNpId);
    specRelFeatures.put("to", specNpId);
    outputAS.add(gen.firstNode(), gen.lastNode(), "genus", genRelFeatures);
    outputAS.add(spec.firstNode(), spec.lastNode(), "species", specRelFeatures);
    if (sub != null) {
        outputAS.add(sub.firstNode(), sub.lastNode(), "subspecies", subRelFeatures);

    }
    /* --RelationsEnd-- */
}
/* ----- END RULE 1 ----- */

/* ----- RULE 2 ----- */
Rule: OrganismRule2
Priority: 50
(
    (GENUS):gen
    ({Token.string == "("} (GENUS) {Token.string == ")"})?
    (SPECIES):spec
    (
        (STRAINKEYWORD)?
        (STRAIN):str
        (STRAINKEYWORD)?
    )?
):org2
-->
{
    gate.AnnotationSet gen = (gate.AnnotationSet) bindings.get("gen");
    gate.AnnotationSet spec = (gate.AnnotationSet) bindings.get("spec");
    gate.AnnotationSet str = (gate.AnnotationSet) bindings.get("str");
    gate.AnnotationSet org2 = (gate.AnnotationSet) bindings.get("org2");

    // For the whole organism
    gate.FeatureMap orgFeatures = Factory.newFeatureMap();

    // Check if the genus is abbreviated.
    Iterator genIt = gen.iterator();
    String abbrFeatValue = (String) ((gate.Annotation) genIt.next()).getFeatures().get("abbr");
    Boolean genIsAbbr = new Boolean(abbrFeatValue);
    if (genIsAbbr.booleanValue() == true) {
        orgFeatures.put("abbrGenus", "true");
    }

    // Give the "class" feature for ontology-aware JAPE
    try {
        orgFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Organism");
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // Get the values into easily understandable variables
    try {
        String myGenus =
            doc.getContent().getContent(gen.firstNode().getOffset(),
                gen.lastNode().getOffset()).toString();
        String mySpecies =
            doc.getContent().getContent(spec.firstNode().getOffset(),
                spec.lastNode().getOffset()).toString();
        // Strain - check if present because it's optional
        String myStrain = null;
        if (str != null) {
            myStrain = doc.getContent().getContent(str.firstNode().getOffset(),
                str.lastNode().getOffset()).toString();
            orgFeatures.put("Strain", myStrain);
        }
        String name = myGenus + " " + mySpecies;
        if (str != null) {
            name = name + " " + myStrain;
        }
        orgFeatures.put("docName", name);
        orgFeatures.put("Rule", "2");
        orgFeatures.put("Genus", myGenus);
        orgFeatures.put("Species", mySpecies);
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // The organism annotation is done, add it and store its ID for the relation handling.
    Integer orgNpId = outputAS.add(org2.firstNode(), org2.lastNode(), "Organism", orgFeatures);

    /* --Relations-- */
    // See SRE/NE/relationClassLayer.jape - annotate the relation onto
    // one of the related annotations.
    gate.FeatureMap genRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap specRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap strRelFeatures = Factory.newFeatureMap();
    gate.Annotation genAnn = (gate.Annotation) gen.iterator().next();
    Integer genNpId = genAnn.getId();
    gate.Annotation specAnn = (gate.Annotation) spec.iterator().next();
    Integer specNpId = specAnn.getId();
    if (str != null) {
        gate.Annotation strAnn = (gate.Annotation) str.iterator().next();
        Integer strNpId = strAnn.getId();
        strRelFeatures.put("from", orgNpId);
        strRelFeatures.put("to", strNpId);
    }
    genRelFeatures.put("from", orgNpId);
    genRelFeatures.put("to", genNpId);
    specRelFeatures.put("from", orgNpId);
    specRelFeatures.put("to", specNpId);
    outputAS.add(gen.firstNode(), gen.lastNode(), "genus", genRelFeatures);
    outputAS.add(spec.firstNode(), spec.lastNode(), "species", specRelFeatures);
    if (str != null) {
        outputAS.add(str.firstNode(), str.lastNode(), "strain", strRelFeatures);

    }
    /* --RelationsEnd-- */
}
/* ----- END RULE 2 ----- */

/* ----- RULE 3 ----- */
Rule: OrganismRule3
Priority: 20
(
    (SPECIES):spec
    (STRAINKEYWORD)
    (STRAIN):str
):org3
-->
{
    gate.AnnotationSet spec = (gate.AnnotationSet) bindings.get("spec");
    gate.AnnotationSet str = (gate.AnnotationSet) bindings.get("str");
    gate.AnnotationSet org3 = (gate.AnnotationSet) bindings.get("org3");

    // For the whole organism
    gate.FeatureMap orgFeatures = Factory.newFeatureMap();
    gate.Annotation specAnn = (gate.Annotation) spec.iterator().next();
    gate.Annotation strAnn = (gate.Annotation) str.iterator().next();
    try {
        orgFeatures.put("Species",
            doc.getContent().getContent(spec.firstNode().getOffset(),
                spec.lastNode().getOffset()).toString());
        orgFeatures.put("Strain",
            doc.getContent().getContent(str.firstNode().getOffset(),
                str.lastNode().getOffset()).toString());
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // Give the "class" feature for ontology-aware JAPE
    try {
        orgFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Organism");
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }
    orgFeatures.put("Rule", "3");
    Integer orgNpId = outputAS.add(org3.firstNode(), org3.lastNode(), "Organism", orgFeatures);

    /* --Relations-- */
    gate.FeatureMap specRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap strRelFeatures = Factory.newFeatureMap();
    Integer specNpId = specAnn.getId();
    Integer strNpId = strAnn.getId();
    specRelFeatures.put("from", orgNpId);
    specRelFeatures.put("to", specNpId);
    strRelFeatures.put("from", orgNpId);
    strRelFeatures.put("to", strNpId);
    outputAS.add(spec.firstNode(), spec.lastNode(), "species", specRelFeatures);
    outputAS.add(str.firstNode(), str.lastNode(), "strain", strRelFeatures);

    /* --RelationsEnd-- */
}
/* ----- END RULE 3 ----- */

Rule: OrganismRule4
Priority: 20
(
    (GENUS):gen
    (STRAINKEYWORD)?
    (STRAIN):str
):org4
-->
{
    gate.AnnotationSet gen = (gate.AnnotationSet) bindings.get("gen");
    gate.AnnotationSet str = (gate.AnnotationSet) bindings.get("str");
    gate.AnnotationSet org4 = (gate.AnnotationSet) bindings.get("org4");

    // For the whole organism
    gate.FeatureMap orgFeatures = Factory.newFeatureMap();
    gate.Annotation genAnn = (gate.Annotation) gen.iterator().next();
    gate.Annotation strAnn = (gate.Annotation) str.iterator().next();
    try {
        String myGenus = doc.getContent().getContent(gen.firstNode().getOffset(),
            gen.lastNode().getOffset()).toString();
        String myStrain = doc.getContent().getContent(str.firstNode().getOffset(),
            str.lastNode().getOffset()).toString();
        orgFeatures.put("Rule", "4");
        orgFeatures.put("Genus", myGenus);
        orgFeatures.put("Strain", myStrain);
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }

    // Give the "class" feature for ontology-aware JAPE
    try {
        orgFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Organism");
    } catch (Exception ex) {
        System.out.println("Exception in organism.jape: " + ex);
    }
    Integer orgNpId = outputAS.add(org4.firstNode(), org4.lastNode(), "Organism", orgFeatures);

    /* --Relations-- */
    gate.FeatureMap genRelFeatures = Factory.newFeatureMap();
    gate.FeatureMap strRelFeatures = Factory.newFeatureMap();
    Integer genNpId = genAnn.getId();
    Integer strNpId = strAnn.getId();
    genRelFeatures.put("from", orgNpId);
    genRelFeatures.put("to", genNpId);
    strRelFeatures.put("from", orgNpId);
    strRelFeatures.put("to", strNpId);
    outputAS.add(gen.firstNode(), gen.lastNode(), "genus", genRelFeatures);
    outputAS.add(str.firstNode(), str.lastNode(), "strain", strRelFeatures);

    /* --RelationsEnd-- */
}

/* ----- Rule5 ----- */
Rule: OrganismRule5
Priority: 10
(
    ({Lookup.class == "Organism"}):org
    (
        (STRAIN):str
        (STRAINKEYWORD)?
    )?
):org5
-->
{
    gate.AnnotationSet org5 = (gate.AnnotationSet) bindings.get("org5");
    gate.AnnotationSet org = (gate.AnnotationSet) bindings.get("org");
    gate.AnnotationSet str = (gate.AnnotationSet) bindings.get("str");
    gate.Annotation orgAnno = (gate.Annotation) org.iterator().next();
    gate.FeatureMap orgFeatures = Factory.newFeatureMap();

    // Acronym - check if it is an acronym, if so give it a feature
    try {
        String type = orgAnno.getFeatures().get("majorType").toString();
        if (type != null && type.equals("acronym")) {
            try {
                orgFeatures.put("type", "Acronym");
            } catch (Exception ex) { }
        }
    } catch (Exception ex) { }

    // Strain - check if present because it's optional
    String myStrain = null;
    if (str != null) {
        try {
            myStrain =
                doc.getContent().getContent(str.firstNode().getOffset(),
                    str.lastNode().getOffset()).toString();
            orgFeatures.put("Strain", myStrain);
        } catch (Exception ex) { }
    }

    // For specifying genus and species, get all the taxonomic annotations in organism
    gate.AnnotationSet taxSet = inputAS.get("TaxonomicUnit").
        getContained(orgAnno.getStartNode().getOffset(), orgAnno.getEndNode().getOffset());
    int lastToken = orgAnno.getEndNode().getOffset().intValue() + 1;
    gate.AnnotationSet tokenSet =
        inputAS.get("Token").getContained(orgAnno.getStartNode().getOffset(),
            new Long((long) lastToken));
    List sortedtaxSet = new ArrayList(taxSet);
    List sortedtokenSet = new ArrayList(tokenSet);

    // With these variables we want to check if the first token was
    // an abbreviated genus; if so we check the species is not upperInitialed
    String specAnnoFeature = "";
    Boolean abbreviation = false;
    if (!sortedtokenSet.isEmpty()) {
        // Sort the list to check if the first token is an abbreviated genus;
        // if so then check if the third token starts with a lower case: if not discard the annotation
        Collections.sort(sortedtokenSet, new gate.util.OffsetComparator());
        Iterator tokenIterator = sortedtokenSet.iterator();
        gate.Annotation tokenAnno = (gate.Annotation) tokenIterator.next();
        if (tokenAnno.getFeatures().get("length").equals("1")) {
            try {
                gate.Annotation nexttokenAnno = (gate.Annotation) tokenIterator.next();
                if (nexttokenAnno.getFeatures().get("string").equals(".")) {
                    orgFeatures.put("abbrGenus", "true");
                    abbreviation = true;
                    try {
                        gate.Annotation thirdtokenAnno = (gate.Annotation) tokenIterator.next();
                        specAnnoFeature = thirdtokenAnno.getFeatures().get("orth").toString();
                    } catch (Exception ex) { }
                }
            } catch (Exception ex) { }
        }
    }

    if (!sortedtaxSet.isEmpty()) {
        // Sort the list as sometimes species consists of two or more words
        Collections.sort(sortedtaxSet, new gate.util.OffsetComparator());
        Iterator taxIterator = sortedtaxSet.iterator();
        String speciesName = "";
        while (taxIterator.hasNext()) {
            gate.Annotation taxAnno = (gate.Annotation) taxIterator.next();
            String taxon = "";
            taxon = (String) taxAnno.getFeatures().get("class");
            if (taxon.equals("Genus")) {
                // Check if the genus is abbreviated.
                String abbrFeatValue = "";
                abbrFeatValue = (String) taxAnno.getFeatures().get("abbr");
                orgFeatures.put("abbrGenus", abbrFeatValue);
                orgFeatures.put("Genus", taxAnno.getFeatures().get("name"));
            } else if (taxon.equals("Species")) {
                // Add all the species contained in organism annotation
                speciesName = speciesName + (String) (taxAnno.getFeatures().get("name")) + " ";
                orgFeatures.put("Species", speciesName);
            }
        }
    }

    if ((abbreviation == true) && (specAnnoFeature.equals("upperInitial"))) {
        // Abbreviated genus followed by a capitalised token: discard the candidate
    } else {
        // For the whole organism
        orgFeatures.put("class", "Organism");

        // Give the "class" feature for ontology-aware JAPE
        try {
            orgFeatures.put(gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME, "Organism");
        } catch (Exception ex) {
            System.out.println("Exception in organism.jape: " + ex);
        }

        // Get the values into easily understandable variables
        try {
            String name = doc.getContent().getContent(org.firstNode().getOffset(),
                org.lastNode().getOffset()).toString();
            orgFeatures.put("docName", name);
            orgFeatures.put("Rule", "5");
        } catch (Exception ex) { }

        // The organism annotation is done, add it and store its ID for the relation handling.
        Integer orgNpId = outputAS.add(org5.firstNode(), org5.lastNode(), "Organism", orgFeatures);
    }
}
/* ----- End of Rule5 ----- */
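The final check in OrganismRule5 discards a candidate when it starts with an abbreviated genus (a one-character token followed by a period) and the third token is capitalised, which is typical of author initials rather than organism names. The Python sketch below restates that decision on plain token strings. The helper name keep_organism is an illustrative assumption, and the capitalisation test only approximates the tokeniser's orth = upperInitial feature.

```python
def keep_organism(tokens):
    """Decide whether a candidate organism mention should be kept.

    tokens: the token strings covered by the candidate, in document order.
    """
    abbreviated = len(tokens) >= 2 and len(tokens[0]) == 1 and tokens[1] == "."
    if not abbreviated or len(tokens) < 3:
        return True
    third = tokens[2]
    # Approximation of orth == "upperInitial": capital first, rest lower case.
    upper_initial = third[:1].isupper() and third[1:] == third[1:].lower()
    return not upper_initial


print(keep_organism(["E", ".", "coli"]))   # abbreviated genus, lowercase species
print(keep_organism(["J", ".", "Smith"]))  # looks like an author initial
```

A candidate without an abbreviated genus, such as a full binomial name, passes through unchanged because the check only applies to the abbreviated form.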

clean.jape

Phase: Clean
Input: possibleStrain Lookup TaxonomicUnit genus learnedStrain species stopList strain manualStrain subspecies acronym
Options: control = all

Rule: CleanTempAnnotations
(
    {possibleStrain} | {Lookup} | {genus} | {learnedStrain} | {species} | {stopList} |
    {strain} | {manualStrain} | {acronym} | {subspecies}
):temp
-->
{
    gate.AnnotationSet temp = (gate.AnnotationSet) bindings.get("temp");
    inputAS.removeAll(temp);
}

F.2

JAPE Rules for the Mutation Series Extrac-