CHAPTER IV: A GENE ONTOLOGY BASED MODEL OF THE FUNCTIONAL CHARACTERISTICS OF PERIPHERAL NEUROPATHY
4.3. Results & discussion
4.3.2. Analysis of clusters of gold standard terms
4.3.2.2. Cluster term overlap analysis
In addition to the gene overlap analysis, an ontology term overlap analysis was conducted, again to investigate the functional dependencies between the various GOC clusters. Here, we check whether two clusters share the same ontology terms. The diagram in Figure 4.3.2 illustrates the difference between the gene and term overlap analyses. The two clusters of terms in Figure 4.3.2, delimited by blue and red dashed lines, have two terms in common (shown in purple), which constitute their term overlap. As for the gene overlap, five genes are common to both clusters (shown in bold and underlined): NPY, FGF, GDNF, ATF3 and BDNF.
Figure 4.3.2. Diagram illustrating the gene and term overlap analyses between clusters of terms. Two clusters are visible in the diagram: clusters 1 and 2, delimited by dashed lines in blue and red respectively. Terms in blue correspond to cluster 1 whilst those in red correspond to cluster 2. Terms in purple are shared between the two clusters. Genes are shown below the terms that annotate them. Importantly, a gene may be annotated with two different terms from different clusters. Genes shared between clusters are likewise indicated in bold and underlined.
[Diagram labels: Cluster 1, Cluster 2; genes ATF3, BDNF, GDNF, NPY, BAX, FGF, CALCA.]
4. A Gene ontology based model of the functional characteristics of peripheral neuropathy
Importantly, an overlap in terms between two clusters implies an overlap in genes, since the shared terms bring the genes annotated with them into both clusters. The converse is not true, since the same genes could be associated with different terms in the two clusters. We used the term overlap analysis in addition to the gene overlap analysis, even though a term overlap already implies a gene overlap, because where the gene overlap between two clusters falls below the significance threshold, the existence of a common term still provides evidence for a functional association between the two clusters.
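The asymmetry between the two overlap measures can be illustrated with a minimal sketch based on set operations. The term names (`term_A` to `term_E`) and the term-to-gene mapping are hypothetical; the gene symbols are borrowed from Figure 4.3.2, but the mapping is not meant to reproduce the figure exactly.

```python
# Hypothetical term -> gene annotations. Term names are placeholders; the gene
# symbols are borrowed from Figure 4.3.2, but this mapping is an assumption.
ANNOTATIONS = {
    "term_A": {"ATF3", "BDNF"},
    "term_B": {"GDNF", "NPY"},
    "term_C": {"NPY", "BAX"},
    "term_D": {"FGF"},
    "term_E": {"FGF", "CALCA"},
}

# Two hypothetical clusters of terms sharing term_A and term_B.
CLUSTER_1 = {"term_A", "term_B", "term_D"}
CLUSTER_2 = {"term_A", "term_B", "term_C", "term_E"}


def cluster_genes(terms):
    """Collect every gene annotated with at least one term of the cluster."""
    genes = set()
    for term in terms:
        genes |= ANNOTATIONS[term]
    return genes


term_overlap = CLUSTER_1 & CLUSTER_2
gene_overlap = cluster_genes(CLUSTER_1) & cluster_genes(CLUSTER_2)

# Genes carried by the shared terms always fall inside the gene overlap.
genes_from_shared_terms = cluster_genes(term_overlap)

# A gene can also be shared through two different terms (FGF in this toy data),
# which is why a gene overlap does not imply a term overlap.
genes_shared_via_distinct_terms = gene_overlap - genes_from_shared_terms

print(sorted(term_overlap))
print(sorted(gene_overlap))
print(sorted(genes_shared_via_distinct_terms))
```

In this toy data the two shared terms account for four of the five shared genes, while the fifth is shared only through distinct terms in each cluster.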
The rationale behind using the term overlap analysis to trace functional relationships between different GOC clusters is that ontology terms from deeper levels in the GO graph are more granular, reflecting additional functional details that may uncover unanticipated links with higher-order functions. For instance, the term ‘Notch signalling pathway involved in neuron fate commitment (GO:0021880)’ describes the involvement of the Notch signalling pathway in the process of neuron fate commitment. The term in question is common to the ‘signal transduction (GO:0007165)’ and the ‘nervous system development (GO:0007399)’ GOC clusters (Fig 4.3.3) from the molecular and system classes respectively. Importantly, the term relates to the root term of the ‘signal transduction (GO:0007165)’ cluster via a chain of ‘is a’ relationships, while it links to the ‘nervous system development (GO:0007399)’ cluster via a ‘part of’ relationship. This illustrates the essence of the term overlap analysis: functional associations between GOC clusters from different levels of biological complexity (outlined in Table 4.3.4) are revealed by identifying terms in clusters from low complexity levels whose function is inherently part of the higher-order biological processes represented by clusters from higher levels.
The term overlap analysis was based on identifying gold standard terms common to pairs of clusters, but it could also have targeted the overlap in the progeny of gold standard terms from the two clusters, since child terms are semantically indicative of their parents in the gene ontology. This applies to the previous example: the term ‘Notch signalling pathway involved in neuron fate commitment (GO:0021880)’ is not itself a gold standard term, but it inherits from two gold standard parent terms, the ‘Notch signalling pathway (GO:0007219)’ and the ‘neuron differentiation (GO:0030182)’, from the ‘signal transduction (GO:0007165)’ and the ‘nervous system development (GO:0007399)’ clusters respectively (Fig 4.3.3).
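The progeny-based variant mentioned above could be sketched as an ancestor traversal: a child term links two clusters when its ancestors include gold standard terms from both. The snippet below is a minimal illustration, not the procedure used in this work. The edge list is a deliberately simplified assumption: only the two gold standard parents of GO:0021880 are taken from the text, while the edges up to the cluster roots collapse longer chains and their relationship types are assumed.

```python
# Simplified child -> [(parent, relationship)] edges. The parents of GO:0021880
# follow the text; the edges to the cluster roots collapse longer chains, and
# their relationship types are assumptions made for illustration only.
PARENTS = {
    "GO:0021880": [("GO:0007219", "is_a"), ("GO:0030182", "part_of")],
    "GO:0007219": [("GO:0007165", "is_a")],
    "GO:0030182": [("GO:0007399", "part_of")],
}

# Gold standard term -> cluster it belongs to (as stated in the text).
GOLD_STANDARD_CLUSTER = {
    "GO:0007219": "signal transduction (GO:0007165)",
    "GO:0030182": "nervous system development (GO:0007399)",
}


def ancestors(term):
    """Return all ancestors of a term, following every relationship type."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, _relationship in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def clusters_reached(term):
    """Clusters that own at least one gold standard ancestor of the term."""
    return {
        GOLD_STANDARD_CLUSTER[a]
        for a in ancestors(term)
        if a in GOLD_STANDARD_CLUSTER
    }


print(sorted(clusters_reached("GO:0021880")))
```

A term reaching two or more clusters in this way would count towards the progeny overlap between them.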
The occurrence of a common term between clusters can only arise from a functional link between them. As such, unlike in the gene overlap analysis, we did not need to infer any significance from the number of common terms. It follows that the term overlap measure is expressed in absolute rather than relative terms. The results for all cluster pairs are shown in Table 4.3.6.
Figure 4.3.3. Relationships between low and high level biological processes captured as ‘part-of’ relationships in GO. Child terms common to the ‘signal transduction (GO:0007165)’ cluster, the ‘cellular component organization & biogenesis (GO:0016043)’ cluster (both clusters marked in grey boxes) and the ‘nervous system development (GO:0007399)’ cluster from the more complex biological system class are shown. Importantly, these common child terms are associated with the higher-order nervous system development process via ‘part-of’ relationships (shown in dashed lines). Nodes in color represent the gold standard set of terms whilst transparent nodes are the ancestors of the gold standard terms. A color scheme was applied to indicate the term study occurrence for the gold standard terms (red,
                                                          0006810  0008152  0007165  0016043  0007155  0022402  0006915  0007399  0050877  0002376  0007610  0006950
transport 0006810                                              54        0        0       16        0        0        0        0        4        0        0        0
metabolic process 0008152                                              140        0       10        0        0        1        0        4        5        0        0
signal transduction 0007165                                                      62        0        0        0        8        0        0        0        0        0
cellular component organization and biogenesis 0016043                                    56        0        1        6        2        0        0        0        0
cell adhesion 0007155                                                                               9        0        0        0        0        0        0        0
cell cycle process 0022402                                                                                   8        0        0        0        0        0        0
apoptosis 0006915                                                                                                    20        0        0        0        0        0
nervous system development 0007399                                                                                            25        2        0        0        0
neurological process 0050877                                                                                                           27        0        0        0
immune system process 0002376                                                                                                                   39        0        4
behavior 0007610                                                                                                                                         17        0
inflammatory response 0006950                                                                                                                                     8
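The absolute term overlap counts for all cluster pairs could be tabulated along the following lines. This is a minimal sketch only: the cluster names and term identifiers are hypothetical, and placing each cluster's own term count on the diagonal is an assumption about how such an upper-triangular matrix might be laid out.

```python
def term_overlap_counts(clusters):
    """Count shared terms for every unordered pair of clusters.

    `clusters` maps a cluster name to its set of terms. The result maps
    (cluster_a, cluster_b) to the number of terms they share; pairs are
    reported once (upper triangle), and (a, a) holds the size of cluster a.
    """
    names = list(clusters)
    counts = {}
    for i, a in enumerate(names):
        for b in names[i:]:
            counts[(a, b)] = len(clusters[a] & clusters[b])
    return counts


# Hypothetical clusters and terms, for illustration only.
toy_clusters = {
    "cluster_X": {"t1", "t2", "t3"},
    "cluster_Y": {"t3", "t4"},
    "cluster_Z": {"t5"},
}

counts = term_overlap_counts(toy_clusters)
for pair, n_shared in counts.items():
    print(pair, n_shared)
```

Because the measure is absolute, no normalisation by cluster size and no significance test is applied to the counts.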
The results from the term and gene overlap analyses complemented each other in a variety of ways. Where both a term overlap and a significant gene overlap existed between two clusters from different biological process classes (outlined in Table 4.3.4), the ontology terms in common were examined to reveal details about the nature of the functional association between the clusters in the pair. Taking the example of the ‘signal transduction (GO:0007165)’ and the ‘nervous system development (GO:0007399)’ clusters, a significant proportion of genes is shared between them, indicating a functional interrelationship. Exactly which signalling pathways are involved in which neuronal processes is partly revealed by the terms common to both clusters. Thus, as shown in Figure 4.3.3, a number of signalling pathways seem to be involved in the process of neuron differentiation that occurs following nerve injury, including the BMP, Notch, Wnt and fibroblast growth factor signalling pathways.
Sometimes, two clusters may show an overlap in gene content that is significant enough to suggest a functional link between their encapsulated functions, yet no terms are found in common between them. In other words, the two clusters show a significant gene overlap but no term overlap. In this case, the functions of the genes in common are examined to determine the nature of the functional relationship between the clusters. The opposite scenario is where clusters show an overlap in constituent terms but no significant gene overlap. This occurs when the number of genes annotated to the common terms amounts to a minor fraction of the clusters' total gene count. Here the functional link between clusters is evident from the term overlap analysis alone.
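The scenarios described above can be summarised as a small decision function. This is a sketch only: the function name and the returned descriptions are hypothetical, the significance of the gene overlap is taken as a given input rather than computed, and the final branch (no overlap of either kind) is added for completeness rather than taken from the text.

```python
def interpret_cluster_pair(n_shared_terms, gene_overlap_significant):
    """Suggest how to interpret a cluster pair from its two overlap results.

    n_shared_terms: absolute number of terms common to both clusters.
    gene_overlap_significant: True if the gene overlap passed the
        significance threshold (the test itself is not modelled here).
    """
    has_term_overlap = n_shared_terms > 0
    if has_term_overlap and gene_overlap_significant:
        return "examine the shared terms to characterise the association"
    if gene_overlap_significant:
        return "examine the functions of the shared genes"
    if has_term_overlap:
        return "functional link supported by the term overlap alone"
    # Not discussed in the text; included so every input has an answer.
    return "no evidence of a functional link from either analysis"


print(interpret_cluster_pair(2, True))
print(interpret_cluster_pair(0, True))
print(interpret_cluster_pair(2, False))
```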