Prostate cancer case study: enriched terms

3.3 Results

3.3.6 Prostate cancer case study: enriched terms

To compare in detail the difference in the biological knowledge captured by the co-prediction and co-expression networks, the global analysis presented earlier was followed by a case study focused on a dataset characterised by a single disease, prostate cancer [170]. Particular focus was put on the specific knowledge captured by one paradigm but not the other.

Figures 3.11 and 3.12 compare the co-prediction and the Pearson co-expression networks inferred from the prostate cancer dataset. The analysis focused on GO terms and pathways enriched uniquely in one type of network. For readability, the generic GO terms (with depth <9 in the GO hierarchical structure) were filtered out. C2 was the network with the largest number of unique terms, followed by C4

Fig 3.11: Unique enriched GO terms (biological process) for each network configuration. The x-axis shows the 12 investigated networks. The y-axis shows the names of enriched terms unique to co-prediction or Pearson co-expression networks. Red terms are associated with co-expression networks, blue with co-prediction. Empty columns indicate networks with no unique terms.

networks while only 3 GO terms and 4 pathways were specific to co-expression networks. A similar disproportion in favour of the co-prediction networks was found when comparing with MIC and ARACNE networks (see Section A.3.1 of Appendix A for the complete analysis).

Several of the unique GO terms enriched in the co-prediction networks are related to prostate cancer, according to the specialised literature. The role of protein ubiquitination in prostate cancer was recently analysed and shown to have an impact on its treatment [178]. The ERK pathway is involved in the motility of prostate cancer cells [179]. Prostate cancer cells seem to alter the nature of their calcium influx to promote growth and acquire apoptotic resistance [180]. Furthermore, the role of calcium homeostasis in the majority of the cell-signalling pathways involved in carcinogenesis, prostate cancer included, has been well established [181].

Fig 3.12: Unique enriched biological pathways for each network configuration. The x-axis shows the 12 investigated networks. The y-axis shows the names of enriched terms unique to co-prediction or Pearson co-expression networks. Red terms are associated with co-expression networks, blue with co-prediction. Empty columns indicate networks with no unique terms.

Some enriched pathways specific to co-prediction networks are also highly relevant to prostate cancer. Several studies have shown the involvement of the JAK/STAT pathway in prostate cancer development [182,183]. Multiple lines of evidence suggest that one of the major ageing-associated influences on prostate carcinogenesis is oxidative stress and its cumulative impact on DNA damage [184,185]. Finally, FAS (also called Apo1 or CD95) plays a central role in the physiological regulation of programmed cell death and has been implicated in the pathogenesis of various malignancies and diseases of the immune system, including prostate cancer [186].

An additional analysis was performed on the biological terms related to the hubs (highly connected nodes) of the inferred networks. A node v was considered to be a hub if its degree was at least one standard deviation above the average network degree, that is if:

$$d(v) > \mu_d + \sigma_d$$

where $d(v)$ is the degree of the node $v$ (number of direct neighbours), and $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the network node degree distribution.
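The hub criterion can be illustrated with a minimal sketch. It assumes the network is given as an undirected edge list and uses the population standard deviation; the text does not state which estimator was used, and the function name `find_hubs` and the toy gene labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_hubs(edges):
    """Return the nodes whose degree exceeds mean degree + one standard deviation.

    Assumptions (not stated in the text): the network is undirected,
    self-loops are ignored, only nodes that appear in at least one edge are
    counted, and the population standard deviation is used.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a self-loop adds no direct neighbour
        neighbours[u].add(v)
        neighbours[v].add(u)

    # d(v): number of direct neighbours of each node
    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    if not degree:
        return set()

    threshold = mean(degree.values()) + pstdev(degree.values())
    return {node for node, d in degree.items() if d > threshold}


# Toy network with hypothetical gene labels: A is connected to every other node.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
toy_hubs = find_hubs(toy_edges)
```

In the toy network the degrees are 4, 2, 2, 1 and 1, so the mean is 2 and only node A lies above the threshold. Switching to the sample standard deviation (`statistics.stdev`) would raise the threshold slightly and could change the hub set on small networks.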

To compare the networks, the ten most frequent Gene Ontology terms among each network’s hubs were used. To make this analysis more specific, the most generic terms (which could be associated with many genes) were discarded: only the GO terms situated at level 10 or higher in the GO hierarchy were considered. Figure 3.13 shows the ten most frequent GO terms associated with the hubs of co-prediction and Pearson co-expression networks. Blue terms were found only in co-prediction networks, red terms only in co-expression networks, and green terms in both.
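The term selection and colour coding described above could be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: gene-to-term annotations and term depths are assumed to be available as plain dictionaries, "level 10 or higher" is read as depth >= 10, and the names `top_hub_terms` and `classify_terms` as well as the toy identifiers are hypothetical.

```python
from collections import Counter


def top_hub_terms(hubs, annotations, depth, min_depth=10, k=10):
    """Return the k most frequent sufficiently specific GO terms among hub genes.

    annotations: dict mapping a gene to the set of GO terms annotating it
    depth:       dict mapping a GO term to its depth in the GO hierarchy
    Terms with unknown depth are treated as generic and skipped (assumption).
    """
    counts = Counter(
        term
        for gene in hubs
        for term in annotations.get(gene, ())
        if depth.get(term, 0) >= min_depth
    )
    return [term for term, _ in counts.most_common(k)]


def classify_terms(coprediction_terms, coexpression_terms):
    """Split terms into the three colour groups used in the figures."""
    coprediction = set(coprediction_terms)
    coexpression = set(coexpression_terms)
    return {
        "blue": coprediction - coexpression,   # only in co-prediction
        "red": coexpression - coprediction,    # only in co-expression
        "green": coprediction & coexpression,  # found in both
    }


# Toy data with hypothetical gene and term identifiers.
annotations = {
    "G1": {"t_deep1", "t_generic"},
    "G2": {"t_deep1", "t_deep2"},
    "G3": {"t_deep3"},
}
depth = {"t_deep1": 11, "t_deep2": 10, "t_deep3": 12, "t_generic": 3}

coprediction_top = top_hub_terms({"G1", "G2"}, annotations, depth)
coexpression_top = top_hub_terms({"G1", "G3"}, annotations, depth)
groups = classify_terms(coprediction_top, coexpression_top)
```

Here `t_generic` is dropped by the depth filter, `t_deep1` ends up in the green group, and `t_deep2` and `t_deep3` fall in the blue and red groups respectively. How terms were aggregated across the 12 network configurations is not specified in the text, so the sketch handles a single pair of networks.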

Terms            Pearson   ARACNE   MIC
Co-prediction    16        18       16
Co-expression    19        20       19
Common           11        9        11

Table 3.18: Unique and common terms from networks’ hubs

In the comparison with the Pearson networks, 16 terms were unique to co-prediction networks, 19 were unique to co-expression networks and 11 were common. Table 3.18 summarises the number of unique and common terms for networks created with the different approaches. The plots for the comparison of FuNeL with ARACNE and MIC are shown in Figures 3.14 and 3.15. The results further highlight biological terms exclusively associated with either co-prediction or co-expression networks.

Fig 3.13: Top 10 most frequent biological processes from Gene Ontology found in the network hubs when comparing FuNeL and Pearson co-expression networks. Blue terms were found only in co-prediction networks, red terms were found only in co-expression networks, and green terms were found in both.

Fig 3.14: Top 10 most frequent biological processes from Gene Ontology found in the network hubs when comparing FuNeL and ARACNE co-expression networks. Blue terms were found only in co-prediction networks, red terms were found only in co-expression networks, and green terms were found in both.

Fig 3.15: Top 10 most frequent biological processes from Gene Ontology found in the network hubs when comparing FuNeL and MIC co-expression networks. Blue terms were found only in co-prediction networks, red terms were found only in co-expression networks, and green terms were found in both.

An analysis of term overlap was conducted using only the best performing networks in the curated G-D association analysis (namely C2 for FuNeL, SN(C3) for Pearson, SE(C4) for ARACNE and SE(C2) for MIC, see Section A.2 in Appendix A for details). The aim was to further show how the knowledge captured by networks inferred with different approaches is partially shared and often highly network-specific. Figure 3.16 shows the number of shared and unique enriched GO terms (including all three GO categories) and pathways across the different networks. In total, 122 GO terms were captured by all the networks, while only one pathway was common to all. The FuNeL network is similar to the ARACNE one: a total of 386 GO terms and 16 pathways were associated with both. Few terms were specific to the MIC and Pearson networks; in contrast, a large number of unique terms was related to C2 (143 GO terms and 17 pathways). Overall, Figure 3.16 further emphasises the complementarity between the co-prediction and co-expression approaches regarding the captured biological knowledge.

Fig 3.16: Overlap of PANTHER enriched terms between the best performing networks identified with the G-D association analysis (curated databases). The values represent the number of enriched terms that are unique or shared between different networks. On the left the overlap of GO terms (including all 3 categories: BP, CC and MF), on the right the overlap of pathways.
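The kind of overlap counts reported for Figure 3.16 can be computed from per-network sets of enriched terms. The sketch below is illustrative only: the function names `venn_regions` and `shared_by` and the toy term sets are hypothetical, and the enrichment step itself is assumed to have been done beforehand.

```python
from collections import Counter


def venn_regions(term_sets):
    """Count terms in each exclusive Venn region.

    term_sets: dict mapping a network name to its set of enriched terms.
    Returns a Counter keyed by the frozenset of networks that contain a term,
    so every term is counted in exactly one region.
    """
    membership = {}
    for name, terms in term_sets.items():
        for term in terms:
            membership.setdefault(term, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


def shared_by(regions, names):
    """Total number of terms found in all of the given networks.

    This sums every region that includes the given networks, regardless of
    which other networks also contain the term.
    """
    wanted = frozenset(names)
    return sum(count for region, count in regions.items() if wanted <= region)


# Toy term sets with hypothetical term labels.
toy_sets = {
    "FuNeL": {"a", "b", "c"},
    "Pearson": {"a"},
    "ARACNE": {"a", "b"},
    "MIC": {"a", "d"},
}
regions = venn_regions(toy_sets)
common_to_all = regions[frozenset(toy_sets)]
funel_only = regions[frozenset({"FuNeL"})]
funel_and_aracne = shared_by(regions, {"FuNeL", "ARACNE"})
```

The distinction between an exclusive region and a total matters when reading such a diagram: in the toy data, the region shared by FuNeL and ARACNE alone holds one term, while the total shared by the two networks is two because it also includes the term common to all four.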

In document Knowledge extraction from biomedical data using machine learning (Page 103-110)