3.2 Evaluation strategy
3.2.2 Threshold determination
Overall strategy
Once the best semantic and functional similarity approaches and other associated parameters have been selected, they will be tested in conjunction with the FuSiGroups algorithm. As the FuSiGroups algorithm includes two variable thresholds, the “semantic threshold” (ST) and the “functional threshold” (FT), it is first necessary to determine the optimal threshold ranges for each approach.
In addition to the ROC capabilities described above, R’s ROCR library can also generate accuracy graphs (see Table 3.3 for the definition of accuracy). An accuracy graph shows the predictive ability of each cut-off, i.e. how well a given approach distinguishes between true positives and true negatives at each point of its range of values. An example of an accuracy graph is given in Figure 3.2. Note that the graph’s y-axis, representing accuracy, ranges from 0.5 to 0.8 (50% to 80% accuracy). Accuracies of less than 0.5 are rarely found, as they would represent a worse-than-random approach.
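To make the notion of an accuracy curve concrete, the following Python sketch computes accuracy in its usual sense, (TP + TN) / (P + N), at every distinct cut-off of a similarity score, treating a pair as predicted positive when its score is at or above the cut-off. It is an illustrative re-implementation under that assumption, not ROCR itself; in R, comparable values come from `performance(prediction(scores, labels), "acc")`. The function name `accuracy_curve` is hypothetical.

```python
import numpy as np

def accuracy_curve(scores, labels):
    """Return (cutoffs, accuracies) for a similarity score.

    A pair is predicted positive when its score is >= the cut-off; accuracy is
    the fraction of pairs whose prediction matches the label, (TP + TN) / (P + N).
    Cut-offs are the distinct observed scores, in descending order.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(labels, dtype=bool)  # True = true-positive pair, False = true-negative pair
    cutoffs = np.unique(scores)[::-1]
    accuracies = np.array([np.mean((scores >= c) == truth) for c in cutoffs])
    return cutoffs, accuracies


# Toy example: two related pairs (label 1) and two unrelated pairs (label 0).
cutoffs, accuracies = accuracy_curve([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
for c, a in zip(cutoffs, accuracies):
    print(f"cut-off {c:.1f}: accuracy {a:.2f}")
```

In this toy case accuracy peaks at the cut-off that separates the two classes exactly and falls towards 0.5 at either end of the range, which is the general shape of the curves in Figure 3.2.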
[Figure 3.2: Example of different possible accuracy curves. Panel “Sample accuracy curves” plots accuracy (y-axis, 0.50 to 0.80) against cut-off (x-axis, 0.0 to 1.0) for “Sample curve 1” and “Sample curve 2”.]
In order to determine a good range of thresholds, the following concepts are defined:
Definition 7. Minimum threshold: the semantic or functional similarity value that achieves the highest accuracy for a given approach. Let $\text{accuracy} = f(\text{similarity}) \Leftrightarrow \text{similarity} = f^{-1}(\text{accuracy})$. Then

$$\text{minimum threshold} = f^{-1}\bigl(\max(\text{accuracy})\bigr).$$
Definition 8. Maximum threshold: the semantic or functional similarity value that is greater than the minimum threshold and corresponds to an accuracy, rounded to the nearest 0.05, of the maximum accuracy minus 15% of the range of accuracy values for a given approach. Let $\text{accuracy} = f(\text{similarity}) \Leftrightarrow \text{similarity} = f^{-1}(\text{accuracy})$ and let $r = \max(\text{accuracy}) - \min(\text{accuracy})$. Then

$$\text{maximum threshold} = f^{-1}\bigl(\max(\text{accuracy}) - 0.15\,r\bigr).$$
The definition of the minimum threshold rests on two assumptions: that it is more desirable to exclude some true positives from the groups generated by the FuSiGroups algorithm than to include false positives, and that the maximum accuracy represents the best possible trade-off between true positives and true negatives for a given approach. The definition of the maximum threshold was derived from the need to minimise the number of false positives while obtaining a threshold that is distinct from the minimum threshold. Analysis of a number of different datasets identified 15% of the range of accuracy values as the point fulfilling both criteria. Rounding to the nearest 0.05 was included for ease of analysis. The actual accuracy values for the thresholds calculated in this work are derived in Chapter 5.
Functional thresholds
The thresholds for the functional similarity approaches can be derived directly from the functional similarity data. The 30 (3 times 10) sub-datasets from the resampling described in Section 3.2.1 are analysed in parallel, and the resulting 30 accuracy curves can be combined into a single curve using vertical threshold averaging. Unfortunately, although this yields a single curve on a graph, it does not allow a single similarity value to be extracted for the minimum and maximum thresholds, as the underlying data is not averaged. Deriving the thresholds visually from the curve would be possible, but the quality of the curves was judged too low for this. The problem can instead be solved by aggregating the 30 sub-datasets into one large dataset. This results in a single accuracy curve identical to the vertically averaged individual curves, i.e. there is no loss of precision relative to the 30 sub-datasets, but with specific x and y values available for each point on the curve. As both approaches yield the same curve but the aggregate dataset provides exact data points rather than values read off a graph, the aggregate dataset was chosen for the present analysis.
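The statement that aggregating the 30 sub-datasets reproduces the averaged curve can be checked numerically. In this Python sketch the 30 sub-datasets are synthetic stand-ins (random scores and labels, not the data used in this work), all of the same size. Under that assumption the pooled accuracy at each cut-off equals the mean of the 30 per-subset accuracies, because every subset contributes the same number of pairs; if the sub-dataset sizes differed, pooling would instead weight each subset by its size. The helper `accuracy_at` is hypothetical.

```python
import numpy as np

def accuracy_at(scores, labels, cutoffs):
    """Accuracy at each given cut-off (score >= cut-off means predicted positive)."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(labels, dtype=bool)
    return np.array([np.mean((scores >= c) == truth) for c in cutoffs])

# Hypothetical stand-in for the 3 x 10 resampling: 30 equal-sized sub-datasets.
rng = np.random.default_rng(42)
subsets = []
for _ in range(30):
    labels = rng.integers(0, 2, size=200).astype(bool)
    # Related pairs (True) get higher similarity scores on average.
    scores = np.clip(rng.normal(0.4 + 0.3 * labels, 0.15), 0.0, 1.0)
    subsets.append((scores, labels))

cutoffs = np.linspace(0.0, 1.0, 21)

# Threshold averaging: mean accuracy of the 30 curves at each fixed cut-off.
averaged = np.mean([accuracy_at(s, l, cutoffs) for s, l in subsets], axis=0)

# Aggregation: pool all 30 sub-datasets and compute a single curve.
pooled = accuracy_at(np.concatenate([s for s, _ in subsets]),
                     np.concatenate([l for _, l in subsets]),
                     cutoffs)

print("max difference:", np.max(np.abs(averaged - pooled)))
```

The printed difference is at the level of floating-point rounding, while the pooled dataset additionally exposes every individual (cut-off, accuracy) point.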
For each selected functional similarity approach, the cut-off (x-value) corresponding to the highest accuracy data point (y-value) is selected as the minimum functional threshold. Then the data point with the largest cut-off corresponding to an accuracy of the maximum accuracy minus 15% of the accuracy range is selected as the maximum functional threshold. For the two curves in Figure 3.2, for example, the accuracy values lie between about 0.5 and 0.8, i.e. a range of 0.3; 15% of this range is 0.045, giving a target accuracy of 0.8 − 0.045 = 0.755, which rounds to 0.75.
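Definitions 7 and 8 and the selection rule described above can be sketched in Python as follows, operating on the data points of a single (aggregated) accuracy curve. Two readings are assumptions rather than statements from the text: the target accuracy of Definition 8 is rounded to the nearest 0.05, and “corresponding to” that accuracy is taken to mean the largest cut-off above the minimum threshold whose accuracy is still at least the target. The function name `select_thresholds` and the toy curve are hypothetical.

```python
import numpy as np

def select_thresholds(cutoffs, accuracies, fraction=0.15, step=0.05):
    """Return (minimum threshold, maximum threshold, target accuracy).

    Minimum threshold: cut-off with the highest accuracy (Definition 7).
    Maximum threshold: largest cut-off above the minimum threshold whose
    accuracy is at least max(acc) - fraction * range, rounded to `step`
    (Definition 8, under the reading assumed above).
    """
    cutoffs = np.asarray(cutoffs, dtype=float)
    acc = np.asarray(accuracies, dtype=float)

    t_min = cutoffs[np.argmax(acc)]
    r = acc.max() - acc.min()  # range of accuracy values
    target = round((acc.max() - fraction * r) / step) * step

    # Small tolerance guards against floating-point error in the comparison.
    candidates = cutoffs[(cutoffs > t_min) & (acc >= target - 1e-9)]
    t_max = candidates.max() if candidates.size else None
    return t_min, t_max, target


# Toy curve shaped like Figure 3.2: accuracy between 0.5 and 0.8.
cutoffs = np.linspace(0.0, 1.0, 11)
accuracies = [0.50, 0.60, 0.70, 0.80, 0.78, 0.76, 0.74, 0.70, 0.60, 0.55, 0.50]
t_min, t_max, target = select_thresholds(cutoffs, accuracies)
print(f"minimum {t_min:.2f}, maximum {t_max:.2f}, target accuracy {target:.2f}")
```

For this toy curve the target accuracy is 0.75, as in the Figure 3.2 example, giving a minimum threshold of 0.3 and a maximum threshold of 0.5. Returning `None` when no cut-off qualifies makes a degenerate curve visible rather than silently reusing the minimum threshold.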
Semantic thresholds
Semantic thresholds are not as easy to establish as functional thresholds, as they cannot be derived directly from the data. The semantic threshold determines the appropriate level of semantic similarity between the GO terms that make up a group’s definition. The true positive and true negative datasets constructed for the functional similarity analysis consist of gene products, which are annotated with a range of GO terms. At present, there is no equivalent dataset of GO term pairs labelled as similar or related on the basis of a property other than semantic similarity. At best, such a dataset could be generated by a human curator using expert understanding, a laborious task for a dataset of sufficient size (1000+ term pairs). Even then, the dataset would still be based on semantic relatedness rather than on an independently verifiable property.
For this reason, semantic thresholds need to be determined indirectly. The “MAX” functional similarity approach selects the GO term pair with the highest semantic similarity from the set of term pairs formed from two gene products’ annotations. This single most similar term pair is the closest available link between the gene products in the positive and negative datasets and the GO terms on which their functional similarity is based. Assuming that two biologically related gene products are most likely to be annotated with highly similar GO terms, while two unrelated gene products are most likely to be annotated with similarly unrelated GO terms, the “MAX” functional similarity scores for the individual sub-ontologies are used to establish the semantic thresholds for each approach.
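A minimal Python sketch of the “MAX” functional similarity score: for two gene products it returns the highest semantic similarity over all pairs formed from their GO annotations within one sub-ontology. The semantic similarity measure is passed in as a function, since the actual measures are the approaches under evaluation; the toy lookup table and the term labels `t1` to `t4` are hypothetical placeholders, not real GO data.

```python
from itertools import product
from typing import Callable, Iterable

def max_functional_similarity(terms_a: Iterable[str], terms_b: Iterable[str],
                              sem_sim: Callable[[str, str], float]) -> float:
    """'MAX' functional similarity: highest semantic similarity among all
    (term from A, term from B) pairs. Assumes both annotation sets are non-empty."""
    return max(sem_sim(a, b) for a, b in product(terms_a, terms_b))


# Hypothetical symmetric semantic similarities between placeholder terms.
TOY_SIM = {
    frozenset({"t1", "t3"}): 0.2,
    frozenset({"t1", "t4"}): 0.7,
    frozenset({"t2", "t3"}): 0.5,
    frozenset({"t2", "t4"}): 0.1,
}

def toy_sem_sim(a: str, b: str) -> float:
    # Identical terms are maximally similar; unknown pairs default to 0.
    return 1.0 if a == b else TOY_SIM.get(frozenset({a, b}), 0.0)


score = max_functional_similarity(["t1", "t2"], ["t3", "t4"], toy_sem_sim)
print(score)
```

Computed separately for each sub-ontology over the positive and negative gene product pairs, such scores can then go through the same accuracy-curve analysis as the functional similarity scores.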
FuSiGroups uses a single semantic threshold for all GO term pairs, but as there are three GO ontologies, three sets of minimum and maximum thresholds can be deduced. There are two ways of addressing this issue. One option is to deduce three sets of thresholds using the method described for the functional thresholds, then to average them into a single set. This approach is justifiable, as it is fairly common for two gene products to have highly similar annotations in one or two ontological categories but not in the other(s); an average of the three thresholds would therefore present a balanced overall threshold. A second option is to use a process similar to the one used to reduce the 30 sub-datasets to a single dataset: the datasets for the three ontological scores are aggregated into a single very large dataset which, after the usual analysis, generates a single accuracy curve. The minimum and maximum semantic thresholds can then be deduced from this curve in the same way as the functional thresholds.
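The two options can be contrasted in a short Python sketch. It uses synthetic “MAX” scores for the three sub-ontologies (labelled BP, MF and CC for biological process, molecular function and cellular component), not real data, and repeats a compact version of the threshold rule with the same assumed reading of Definition 8 (the largest cut-off above the minimum threshold whose accuracy is at least the rounded target). All function names are hypothetical. The two options need not agree: averaging combines three separately derived thresholds, whereas pooling builds one curve in which each ontology counts in proportion to its number of gene product pairs.

```python
import numpy as np

def thresholds(scores, labels, fraction=0.15, step=0.05):
    """(minimum, maximum) threshold from the accuracy curve of one dataset."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(labels, dtype=bool)
    cutoffs = np.unique(scores)
    acc = np.array([np.mean((scores >= c) == truth) for c in cutoffs])
    t_min = cutoffs[np.argmax(acc)]
    target = round((acc.max() - fraction * (acc.max() - acc.min())) / step) * step
    above = cutoffs[(cutoffs > t_min) & (acc >= target - 1e-9)]
    return t_min, (above.max() if above.size else np.nan)

def option_average(per_ontology):
    """Option 1: one pair of thresholds per ontology, then the mean of each."""
    pairs = np.array([thresholds(s, l) for s, l in per_ontology.values()])
    return tuple(pairs.mean(axis=0))  # NaN propagates if an ontology lacks a maximum

def option_aggregate(per_ontology):
    """Option 2: pool the three ontologies' scores into one accuracy curve."""
    scores = np.concatenate([s for s, _ in per_ontology.values()])
    labels = np.concatenate([l for _, l in per_ontology.values()])
    return thresholds(scores, labels)

# Synthetic "MAX" scores per sub-ontology (hypothetical, for illustration only).
rng = np.random.default_rng(7)
per_ontology = {}
for name, shift in [("BP", 0.35), ("MF", 0.30), ("CC", 0.20)]:
    labels = rng.integers(0, 2, size=600).astype(bool)
    scores = np.clip(rng.normal(0.4 + shift * labels, 0.15), 0.0, 1.0)
    per_ontology[name] = (scores, labels)

print("average of thresholds:", option_average(per_ontology))
print("aggregated dataset:   ", option_aggregate(per_ontology))
```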
The threshold determinations will be discussed in Chapter 5. Once the semantic and functional thresholds have been determined, FuSiGroups can be run with combinations of semantic and functional thresholds from these ranges for the best-performing semantic and functional similarity approaches. The resulting groups can then be analysed using the strategy described in the next section.