Results and Discussions - Structure learning of gene interaction networks

Figure 4.1: Cross Cluster Gene Interaction Network

Nodes indicate genes and edges connect nodes that are significantly statistically dependent across clusters.

To infer the edge structure of gene-gene interaction networks, the conditional statistical dependence of each probeset was modelled by inferring its Markov Blanket, using an approach highlighted by Li et al. [29]. Candidates for inclusion in a probeset's Markov Blanket were selected from disjoint sets of clustered, putatively co-expressed probesets. These disjoint sets were calculated using Pearson Correlation-based Markov Clustering (see Materials and Methods). The underlying hypothesis was that a subset of these putatively co-expressed probesets, or genes, are the child nodes of some subset of parent nodes from another set of putatively co-expressed probesets. Thus we hypothesized that biologically relevant gene-gene interactions can be inferred under the assumption that putatively co-expressed genes have some common signal that can be measured by statistical dependence. It is therefore these signals that we attempted to infer. We used the implementation of the Markov Blanket detection algorithm in the bnlearn package in R [42, 46].
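As an illustration of the object being inferred, the Markov Blanket of a node in a directed acyclic graph consists of its parents, its children and the other parents of its children. The following Python sketch computes it for a graph whose structure is already known. It is not the detection algorithm of bnlearn, which estimates the blanket from data using conditional independence tests; the `parents` dictionary layout and the node names are hypothetical.

```python
def markov_blanket(parents, node):
    """Markov Blanket of `node` in a DAG given as {child: set_of_parents}.

    The blanket is the union of the node's parents, its children and
    the other parents of those children (co-parents).
    """
    blanket = set(parents.get(node, set()))
    for child, child_parents in parents.items():
        if node in child_parents:
            blanket.add(child)          # `child` is a child of `node`
            blanket |= child_parents    # co-parents of that child
    blanket.discard(node)               # a node is not in its own blanket
    return blanket


# Hypothetical toy graph: A -> C <- B, C -> D, D -> E <- F
toy_parents = {"C": {"A", "B"}, "D": {"C"}, "E": {"D", "F"}}
```

For the toy graph, the blanket of C contains its parents A and B and its child D, while the blanket of D additionally contains F because F is a co-parent of E.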

In order to explore the hypothesis underlying our methodology, or gather evidence to support it, we applied our method to a publicly available microarray dataset obtained from NCBI [15]. The dataset is a Saccharomyces cerevisiae time course gene expression dataset concerned with the ageing of non-dividing yeast cells without the application of caloric restrictions (see Materials and Methods) [15]. S. cerevisiae is a well-studied organism and we therefore benefit from a wealth of validated information that can be used to improve the confidence in our approach.

After normalizing the dataset using Robust Multichip Averaging (RMA) [26] and obtaining the log2 expression matrix of probesets, we calculated the all-against-all row-wise Pearson Correlation Coefficients. We then selected a subset of probesets for further analysis, based on genes that matched those in the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway for Saccharomyces cerevisiae [34]. Only row pairs that had an absolute correlation of 0.9 or higher were selected and then clustered using Markov Clustering with an inflation value of 7 [47]. Then, for each element in each cluster, the Markov Blanket of that element was estimated from the elements of every other cluster in turn. For the Markov Blankets a 0.05 level of significance was applied, with corrections for multiple hypothesis testing performed using the Benjamini-Hochberg procedure to control the false discovery rate [5]. The resultant network is depicted in Figure 4.1.
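The correlation filter and the multiple-testing correction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this study: the `expr` dictionary layout and all function names are hypothetical, and the Benjamini-Hochberg adjustment shown is the standard step-up form, not the internals of any particular package.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(expr, threshold=0.9):
    """Return (id_a, id_b, r) for pairs with |r| >= threshold.

    `expr` maps a probeset id to its log2 expression profile
    (a hypothetical layout chosen for this sketch).
    """
    pairs = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An adjusted p-value below 0.05 would then correspond to a dependence retained at the significance level used here. Pairs passing the correlation filter would be passed on to the clustering step, which is not shown.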

Figure 4.2: Annotated Cross Cluster Gene Interaction Network

Nodes are KEGG metabolic pathways (purple triangles) and genes (green circles). Edges between genes are the result of the Markov Blanket analysis, and an edge between a gene and a pathway indicates that the gene occurs in that pathway.

4.2.1 Projection on Metabolic Pathways

The directionality was removed from the network inferred by our methodology, resulting in undirected edges between particular gene pairs. We annotated the genes in the network with their respective KEGG metabolic pathways by adding a node for each pathway and joining that node to a gene if the gene is involved in the pathway [34] (see Materials and Methods). The network obtained from the KEGG metabolic pathways is given in Figure 4.2.

In order to identify meaningful edges in our originally inferred network, we identified cliques of size 3 or larger in the network after it was annotated, effectively attempting to identify triangular structures. Because a pathway node is joined only to genes, such a clique indicated that an edge inferred by our method was between two genes in the same pathway. The Fisher Exact test was then used to determine whether the number of cliques found in this annotated network was statistically significant (see Materials and Methods).
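The search for triangular structures can be sketched in Python as below. This is a minimal illustration, assuming the annotated network is available as an undirected edge list; it enumerates cliques of size exactly 3 only, and the node names are hypothetical.

```python
def triangles(edges):
    """Enumerate all 3-cliques of an undirected graph given as an edge list."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Any common neighbour of both endpoints closes a triangle.
        for w in neighbours[u] & neighbours[v]:
            found.add(frozenset((u, v, w)))
    return found


# Hypothetical annotated network: inferred gene-gene edges plus
# gene-pathway membership edges.
gene_edges = [("g1", "g2"), ("g2", "g3")]
pathway_edges = [("g1", "pathway:A"), ("g2", "pathway:A"), ("g3", "pathway:B")]
cliques = triangles(gene_edges + pathway_edges)
```

In this toy network only the edge between g1 and g2 is supported by a shared pathway, so a single triangle is reported; the edge between g2 and g3 closes no triangle because the two genes share no pathway.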

We found a total of 17 cliques of size 3 or larger, with a corresponding p-value from the Fisher Exact test of 0.026. There was therefore sufficient evidence to reject the null hypothesis of independence at a 0.05 significance level. The cliques, along with their respective elements, are given in Figure 4.3. The cliques indicated that their edges captured potential pathway effects estimated from the data.
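For readers unfamiliar with the test, a one-sided Fisher Exact test on a 2x2 contingency table can be sketched in Python as follows. The construction of the table used in this study is described in Materials and Methods and is not reproduced here, so the table layout, the one-sided alternative and the function name are assumptions made for illustration only.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher Exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left cell under the
    hypergeometric null distribution with all margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total
```

For the table [[3, 1], [1, 3]], the function returns 17/70, approximately 0.243. A hypothetical table for the present setting could cross-classify gene pairs by whether an edge was inferred and whether the two genes share a pathway.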

In the topology of the clique network structure there were several cliques that are neighbours of the glycolysis/gluconeogenesis pathway, one of the major sources of energy production in yeast metabolism [49]. The neighbouring pathways include fructose and mannose metabolism, galactose metabolism and pyruvate metabolism. Galactose is a known input to glycolysis and pyruvate a known output [48]. The effect of gluconeogenesis on the chronological ageing of yeast is investigated in [31]. This investigation suggested that a shift from glycolysis to gluconeogenesis is associated with the ageing of yeast cells. Furthermore, this ageing also involved an increase in the activity of fructose-1,6-bisphosphatase, which has been shown to be negatively affected by YMR205C [23, 31].

The clique structure given by Clique 14 is associated with glycine and serine metabolism and is also a neighbour of the glycolysis/gluconeogenesis pathway. The genes associated with the former pathway were determined to be putatively related to the starvation phenotype and may play an important role in the survival of yeast cells [38]. The same study also determined that genes associated with glycolysis/gluconeogenesis may have a similar putative relationship with the starvation phenotype. Furthermore, the significant clique-based pathways suggested that the yeast cells may be metabolically active. This agreed with the findings of [15] (unpublished), where initial expression analysis determined that the glycolytic genes were up-regulated along with their trans-acting regulators.

The involvement of the gene YHR104W, also known as GRE3, in the lifespan of yeast cells was predicted and putatively validated in [52]. It was proposed that the influence of GRE3 on the pentose phosphate pathway affects reactive oxygen species (ROS) and that ROS have a putative association with the chronological ageing of yeast cells [40, 52].

In the network, YOR374W, also known as aldehyde dehydrogenase (ALD4), has a high degree. This gene is involved in several pathways and plays an important role in ethanol and pyruvate metabolism, amongst others [7]. A yeast deletion strain for this gene showed increased survival during ageing when compared to the wild type, suggesting that ALD4 may play a role in cellular ageing [27].

4.3 Materials and Methods
