5.6 Impact Analysis Evaluation
5.6.2 BRENDA Analysis
We believe that the reference BRENDA data is incomplete and therefore cannot show the performance of our system clearly. Thus, we manually annotated 40 full-text PubMed documents and assessed the quality of our manually annotated data against the reference BRENDA data. Additionally, we evaluated the BRENDA data against our manually annotated corpus to demonstrate that the BRENDA data is incomplete.
Manual against reference BRENDA Data Evaluation. We assessed our 40 manually annotated documents against the reference BRENDA data; the results are summarized in Table 34. In this task, the gold standard is the BRENDA data and the test data is our manually annotated corpus.
Table 34: Manual Evaluation against BRENDA data
Precision   Recall   F-Measure
71.6%       88.9%    79.3%
As discussed earlier in Section 5.6.3, impact mentions in tables are
them. Furthermore, some erroneous entries exist in the database. These
cases are reflected in the recall reported in Table 34. A precision of 71.6% shows that some impact mentions in the manual annotations are missing from the database.
BRENDA against Manual Data Evaluation. The BRENDA data is also compared against the manually annotated documents; the results are shown in Table 35. In this task, our manually annotated corpus is considered the gold standard and the BRENDA data is the test data.
Table 35: BRENDA Data Evaluation against Manual Annotations
Precision   Recall   F-Measure
79.7%       61.6%    69.5%
The recall of 61.6% against our manually annotated data indicates that many mentions of impacts are missing from the BRENDA data, and our system can help to complete the BRENDA data. The precision reflects the erroneous entries in the BRENDA data and the entries from the graphical diagrams that we do not manually annotate.
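The F-Measure values in Tables 34 and 35 are consistent with the balanced F-measure, that is, the harmonic mean of precision and recall. The text does not state which variant was used, so the sketch below assumes the balanced form; the function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the tables report the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Tables 34 and 35, expressed as fractions.
table_34 = f_measure(0.716, 0.889)
table_35 = f_measure(0.797, 0.616)
print(f"{table_34:.1%} {table_35:.1%}")  # prints: 79.3% 69.5%
```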
5.6.3 Impact Grounding
The results generated by the system can be shown to be valuable by comparing them to a publicly available gold standard. As shown earlier in the preparation of the BRENDA data (see Section 5.1.3), each mutation is retrieved together with the grounded impact. We use this data to test our grounding task. First, the system’s output of all detected impacts with their associated mutations is collected; then, using the queried BRENDA data, we test whether an impact is detected for each specific mutation.
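The two steps described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the tuple layout, the function names, and the toy entries (including the document identifiers "doc1" and "doc2") are assumptions made for the example.

```python
from collections import defaultdict


def collect_impacts(system_output):
    """Step 1: group detected impacts by (document, mutation)."""
    impacts_by_mutation = defaultdict(list)
    for doc_id, mutation, impact in system_output:
        impacts_by_mutation[(doc_id, mutation)].append(impact)
    return impacts_by_mutation


def impact_detected(impacts_by_mutation, reference_entries):
    """Step 2: for each reference (document, mutation) pair, report
    whether the system detected at least one impact for it."""
    return {
        entry: bool(impacts_by_mutation.get(entry))
        for entry in reference_entries
    }


# Hypothetical system output and reference entries.
system_output = [
    ("doc1", "D37N", "activity decreased"),
    ("doc1", "D37N", "optimum pH shifted"),
    ("doc2", "F241L", "cold sensitive"),
]
reference_entries = [("doc1", "D37N"), ("doc2", "Q137M")]

grouped = collect_impacts(system_output)
coverage = impact_detected(grouped, reference_entries)
print(coverage)
```

In this toy case the first reference entry is covered and the second is not, because no impact was associated with Q137M in "doc2".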
In the following sections, we assess the performance of our developed system in grounding impacts against the BRENDA data and our manually annotated corpus.
A true positive link represents a correctly identified association between a correctly detected impact and its mutation. Conversely, a true negative is the correct impact associated with a false mutation. Since the grounding of impacts depends heavily on the mutations detected in the text, we check the effectiveness of the impact extraction with respect to the results of the two mutation detection systems, Mutation Miner and MutationFinder.
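As a rough illustration of a link-level evaluation, the sketch below represents each mutation-impact link as a tuple and derives precision and recall from set operations. It assumes that mutations and impacts have already been normalized to comparable labels and that a predicted link counts as correct only if the identical link occurs in the gold standard; the `Link` type and the toy data are hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    doc_id: str     # document identifier
    mutation: str   # normalized mutation, e.g. "D37N"
    impact: str     # normalized impact label (hypothetical format)


def evaluate_links(predicted: set, gold: set) -> dict:
    """Count link-level matches and derive precision and recall."""
    true_pos = len(predicted & gold)    # links found in both sets
    false_pos = len(predicted - gold)   # predicted links absent from gold
    false_neg = len(gold - predicted)   # gold links the system missed
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return {
        "tp": true_pos, "fp": false_pos, "fn": false_neg,
        "precision": precision, "recall": recall,
    }


gold = {
    Link("doc1", "D37N", "activity decreased"),
    Link("doc1", "D37N", "optimum pH shifted"),
    Link("doc2", "F241L", "cold sensitive"),
}
predicted = {
    Link("doc1", "D37N", "activity decreased"),
    # Correct impact attached to a wrongly detected mutation.
    Link("doc1", "D73N", "optimum pH shifted"),
    Link("doc2", "F241L", "cold sensitive"),
}
scores = evaluate_links(predicted, gold)
print(scores)
```

Because a link is only as good as the mutation it is attached to, an error in mutation detection produces both a spurious link and a missed one, which is why the evaluation is reported separately for each mutation detection system.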
System vs. BRENDA Results. We investigated the performance of the mutation-impact grounding against the curated BRENDA data, and summarize the results of Mutation Miner and MutationFinder in Table 36.
Table 36: Mutation-Impact Relation Evaluation against the reference BRENDA data
Mutation detection system   #Documents   Precision   Recall   F-Measure
Mutation Miner              100          57.5%       84.2%    68.3%
MutationFinder              100          57%         82.5%    67.4%
System vs. Manual Results. The evaluation of the semantic assignment of impacts to mutations has also been performed on the Xylanase corpus (Table 37). Here, the mutations are extracted by Mutation Miner.
Table 37: Impact Grounding Evaluation on the Xylanase corpus – Mutation Miner
Impact Grounding Evaluation – Mutation Miner
Accuracy   75.7%
The performance of our system on our manually annotated corpus of 40 documents is assessed and the results are summarized in Table 38.
Table 38: Impact Grounding Evaluation on 40 manually annotated documents
Impact Grounding Evaluation – Mutation Miner
Accuracy   71.7%
Impact Grounding Evaluation – MutationFinder
Discussion. As mentioned earlier, graphical diagrams are converted to indistinct textual blocks, and we do not analyse tables. Mutations whose impacts are reported in tables are therefore not grounded correctly by our system; this is reflected in the recall of our results in Table 36. For example, in the documents with PMIDs 12702265 and 15152005, the mutations are listed in tables together with their impacts on kinetic properties. There are also mutation mentions that neither Mutation Miner nor MutationFinder detects, such as K5S/K6S.
Some erroneous impacts are reported in the BRENDA data, such as
H154F in document PMID 12205101 and N275E in PMID 10544015, where
the reported mutations do not exist in the document.
False negatives of the impact grounding are mainly due to the use of pronominal and nominal references. For example, PMID 10955993 uses the nominal reference "all four R277 mutant proteins" when expressing the impact "The rate of reactivity toward oxygen was unaffected".
Also, consider this example:
Mutation of the residues Ala-200, Leu-203 or Gly-204 decreases all kinetic parameters significantly, suggesting that these amino acids are essential for the binding of the pyrophosphate moiety of the coenzyme.
Excerpt from The three zinc-containing alcohol dehydrogenases from baker’s yeast, Saccharomyces cerevisiae, PMID: 12702265
The mutations, Gly204Ala and Ala200 :Ala201Leu, are introduced
earlier in the article in tables (Ala200 refers to an insertion mutation).
Expressing mutations in natural language also adds more complexity to the grounding task:
. . . the introduction of an additional mutation of F241L to this Q137M mutant again converts it into the cold-sensitive form [16].
Excerpt from Structural determinant for cold inactivation of rodent L-xylulose reductase, PMID: 12890481
In the above example, "an additional mutation of F241L to this Q137M mutant" expresses a mutation series, F241L/Q137M, where our system fails
Document PMID 8519804 reports on the impacts of 14 mutations:
Within this collection, 14 mutants had single amino-acid changes that were divided into 4 groups: (a) amino-acid changes associated with proposed ligands to Zn2+; (b) a substitution of one of several conserved glycine residues; (c) mutations at the substrate or coenzyme binding site; (d) alterations that resulted in a change of charge near the active site.
Excerpt from Functional analysis of E. coli threonine dehydrogenase by means of mutant isolation and characterization, PMID: 8519804
Our system detects the impacts of the 14 mutants, but the BRENDA data reports only the mutation C38S, which does not even occur in the document. All these cases result in false positives, which explains our system’s low precision on the BRENDA data.
In our manually annotated corpora, some impacts cannot be grounded to a specific mutation, for example:
Moreover, disulfide bonds have usually been introduced by substituting more than two amino acid residues per monomer. These mutants generally have multivalent disulfide-bond formations resulting in decreased flexibility of the quaternary structure.
Excerpt from Stabilization of quaternary structure of water-soluble quinoprotein glucose dehydrogenase, PMID: 12746550
In the above example, "substituting more than two amino acid residues per monomer" refers to unspecified mutations that affect the flexibility of the quaternary structure. The sentence still expresses an impact, but one that cannot be grounded to a specific mutation. Such cases are counted as erroneous groundings in our evaluation.
Ambiguous mentions of impacts are another source of false positives. Consider the following example:
Six mutant proteins (H86A/E/F/K/Q/W) were produced, purified and characterized. The six mutations reduced the affinity of XlnA towards xylan without having any major effect on the catalytic constant. All these mutations also lowered the pKa of the acid-base catalyst by 0.46-1.94 pH units. The mutations decreased the enzyme stability at 60°C by up to 95% and the transition temperature by 2.2-5.8°C. Unfolding of the protein with guanidine hydrochloride (Gdn-HCl) showed that five out of six mutations decreased the concentration required to denature 50% of the XlnA, confirming the importance of H86 for the stability of the enzyme.
Excerpt from Site-directed mutagenesis study of a conserved residue in family 10 glycanases: histidine 86 of xylanase A from Streptomyces lividans, PMID: 9681873
The excerpt states that “five out of six mutations decreased the concentration”; however, it does not specify which five mutations are meant.
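To make the ambiguity concrete, the sketch below expands a shorthand mention such as H86A/E/F/K/Q/W into its individual point mutations. The helper and its regular expression are assumptions for illustration only and are not part of Mutation Miner or MutationFinder.

```python
import re

# One wild-type residue, a position, and two or more slash-separated mutant
# residues, e.g. "H86A/E/F/K/Q/W" (hypothetical pattern for this shorthand).
SHORTHAND = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])+)$")


def expand_shorthand(mention: str) -> list:
    """Expand a shorthand mention into individual point mutations.

    Mentions that do not match the shorthand pattern are returned unchanged.
    """
    match = SHORTHAND.match(mention)
    if match is None:
        return [mention]
    wild_type, position, mutants = match.groups()
    return [f"{wild_type}{position}{m}" for m in mutants.split("/")]


candidates = expand_shorthand("H86A/E/F/K/Q/W")
print(candidates)  # ['H86A', 'H86E', 'H86F', 'H86K', 'H86Q', 'H86W']
```

All six candidates remain equally plausible targets for the impact, since the sentence gives no cue for selecting five of them.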
Some mutations are not detected by the external mutation tagging systems. Consider the following example:
Asp37 of xylanase C was replaced with asparagine and other residues by site-directed mutagenesis. Analyses of the wild-type and mutant enzymes showed that Asp37 is important for high enzyme activity at low pH. In the case of the asparagine mutant, the optimum pH shifted to 5.0 and the maximum specific activity decreased to about 15% of that of the wild-type enzyme.
Excerpt from Crystallographic and mutational analyses of an extremely acidophilic and acid-stable xylanase: biased distribution of acidic residues and importance of Asp37 for catalysis at low pH, PMID: 9930661
The above mutation, D37N, was not detected by Mutation Miner or MutationFinder. Thus, our system grounded the impacts "the optimum pH shifted to 5.0" and "the maximum specific activity decreased to about 15%" incorrectly.
Mutation Miner does not check that the wild-type residue differs from the mutant residue. It therefore detects some erroneous mutations, which results in faulty grounding.
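A check of this kind could be applied as a post-filter on detected mentions. The sketch below is a hypothetical filter, not part of Mutation Miner; it assumes mutations are given in single-letter wild-type/position/mutant form such as D37N.

```python
import re

# The 20 standard amino acids in single-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(
    rf"^([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])$"
)


def is_plausible_point_mutation(mention: str) -> bool:
    """Accept a mention only if it is well formed and the wild-type
    residue differs from the mutant residue."""
    match = POINT_MUTATION.match(mention)
    if match is None:
        return False
    wild_type, _position, mutant = match.groups()
    return wild_type != mutant


detected = ["D37N", "A200A", "H86W"]
kept = [m for m in detected if is_plausible_point_mutation(m)]
print(kept)  # ['D37N', 'H86W']
```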