mirConnX 2.0 - An Integrated, Module-based Biomarker Discovery Framework

Although each of the methods proposed in the first two aims constitutes a standalone study, and the source code for each will be made available separately, they were all developed toward the same goal of deciphering disease complexity and should therefore be integrated. We expanded the mirConnX web environment to combine these methods into a single workflow that provides an integrated view and supports further user exploration and hypothesis generation. Figure 4.3 illustrates the envisioned end product of this effort. As before, users supply mRNA and/or miRNA expression data. In addition, they have the option of uploading clinical labels of interest that correspond to the same samples in the expression profiles. Examples include disease subtypes, normal/control labels, progression-free survival, or relapse events.

Gene expression and miRNA expression, if supplied, are clustered separately by ReKS to produce coherent clusters that serve as candidate molecular signatures, optionally taking advantage of pathway information from KEGG through the prior incorporation scheme. Next, using the group variable selection method developed in Chapter 3, molecular signatures that are predictive of disease subtypes or user-supplied labels are identified. Finally, transcriptional and post-transcriptional regulatory information is added for genes within and between the molecular signatures. In summary, a network of genes, miRNAs, and TFs is generated with the following: (1) expression correlation edges indicating the strength and sign of association; (2) dynamic cluster boundaries set by user-defined significance thresholds; (3) transcription factor regulations and their strength, between selected TFs and selected genes/miRNAs, between other TFs and selected miRNAs, and between selected TFs and other TFs; (4) miRNA regulations and their strength, between selected miRNAs and selected genes/TFs, and between selected miRNAs and co-regulators or targets of other selected miRNAs/genes; and potentially (5) protein-protein interactions or (6) pathway boundaries and interactions.
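The network described above can be pictured as a small typed graph. The following is only a minimal sketch under stated assumptions, not the mirConnX 2.0 implementation: the class names (`Network`, `Edge`), the edge-kind labels, and the toy node names and strengths are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels mirroring the node and edge types listed in the text.
NODE_KINDS = {"gene", "miRNA", "TF"}
EDGE_KINDS = {"correlation", "tf_regulation", "mirna_targeting", "ppi", "pathway"}
REGULATORY_KINDS = {"tf_regulation", "mirna_targeting"}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str        # one of EDGE_KINDS
    strength: float  # strength of association or regulation
    sign: int        # +1 (activation / positive), -1 (repression / negative)


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)  # node name -> node kind
    edges: list = field(default_factory=list)

    def add_node(self, name, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = kind

    def add_edge(self, source, target, kind, strength, sign):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        if sign not in (-1, 1):
            raise ValueError("sign must be +1 or -1")
        self.edges.append(Edge(source, target, kind, strength, sign))

    def regulators_of(self, target):
        """Names of TFs/miRNAs with a regulatory edge into `target`."""
        return sorted({e.source for e in self.edges
                       if e.target == target and e.kind in REGULATORY_KINDS})


# Toy example with invented names and values.
net = Network()
net.add_node("TF_A", "TF")
net.add_node("miR_X", "miRNA")
net.add_node("gene_1", "gene")
net.add_edge("TF_A", "gene_1", "tf_regulation", 0.8, +1)
net.add_edge("miR_X", "gene_1", "mirna_targeting", 0.6, -1)
net.add_edge("TF_A", "miR_X", "correlation", 0.4, -1)
print(net.regulators_of("gene_1"))
```

Keeping the edge kind, strength, and sign on each edge is one way to let a viewer toggle the display of individual edge types without rebuilding the network.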

Figure 4.3 Integrative analysis with mirConnX 2.0 on melanoma.

mirConnX 2.0, an integrated framework that includes clustering, feature selection, and regulatory relationship enrichment, was applied to the melanoma data detailed in Section 3.5.2. Gene and miRNA clusters that are predictive of patient survival are shown in green. Cluster membership is indicated by light purple boxes. Several oncogenes known to be involved in melanoma disease mechanisms are highlighted in gold. Genes are represented by squares, miRNAs by triangles, and TFs by circles. Blue edges indicate TF->gene regulatory interactions supported by literature or computational predictions. Red edges indicate miRNA->gene targeting supported by literature or computational predictions. For visual simplicity, the strength of association (regulatory strength plus correlation) and the sign of association (repression or activation) are omitted from this figure, but they can be displayed as an option.

As a proof of concept, we illustrate mirConnX 2.0 by revisiting the melanoma mRNA and miRNA expression datasets used in the case study in Section 3.5.2. We applied mirConnX 2.0 to these datasets, using no prior clustering information. As shown in Figure 4.3, the nine groups of selected miRNAs and genes appear in green, inside light purple boxes that indicate their group memberships. mirConnX 2.0 automatically extracts the biological context around these variables, including TF->gene regulatory relationships and miRNA->gene targeting.

Several preliminary observations can be made from this analysis. 1) Several known oncogenes are jointly regulated by prognostic biomarkers; for example, RAS is regulated by hsa-miR-659 and LMO2, and a few others by both hsa-miR-219 and hsa-miR-659. 2) POU2F1 (also known as OCT1) acts as the master regulator in this snapshot of the network, regulating both hub clusters (hsa-miR-219/hsa-miR-659 and LMO2/RPS56KL1). It binds to an octamer DNA sequence and is known to have cell type-specific effects on differentiation, but it has no clear association with melanoma. However, another protein in the same family, POU3F2 (also known as BRN2/OCT5), has been linked to melanoma proliferation through its participation in Wnt signaling as an early factor in melanoblasts that negatively regulates differentiation [230,231]. This observation presents POU2F1, and reinforces POU3F2, as interesting targets for investigation, and one can already begin to generate hypotheses about possible feedback loops they may form with the two hub regulators. 3) ISL1 and CTDSPL2, on the other hand, are regulated by both selected hub clusters. Neither has an apparent association with melanoma, but ISL1 is phosphorylated by Rho kinase, whose dysregulation contributes to the metastatic behavior of many tumor types, including melanoma [232,233], and this pathway has been targeted by several clinical studies of anticancer therapeutics. Thus, we may want to extend our network to include interaction partners of these genes.

These simple, preliminary observations demonstrate that placing the selected biomarkers in their biological context and linking them through regulatory interactions yields a much simpler, cleaner organization of the biomarkers and their regulatory partners. In addition, the relative biological importance of the biomarkers is immediately clear in a network context: hub clusters are of primary interest because they are ideal candidates for understanding the dynamics and mechanisms of disease formation, and they present possible points of attack for potential intervention. Even for a cluster with hundreds of members, as in this example, only a subset will be of immediate interest to bench scientists, and a user's failure to define appropriate computational thresholds for cluster selection can be remedied in large part by this type of context-based “filtering”. Finally, an advantage of mirConnX 2.0 over its predecessor is that the final network is now of manageable size, because it focuses on the part of the network that mainly contains prognostic clusters.
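One simple way to make the notion of hubs and jointly regulated targets concrete is to count regulatory edges. The sketch below is an illustration only, not the procedure used to produce Figure 4.3: the function names `rank_hubs` and `shared_targets`, the use of out-degree as a proxy for hub importance, and the toy edge list are all assumptions.

```python
from collections import defaultdict


def rank_hubs(edges):
    """Order regulators by the number of distinct targets (ties broken by name)."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return sorted(targets, key=lambda r: (-len(targets[r]), r))


def shared_targets(edges, min_regulators=2):
    """Map each target with at least `min_regulators` regulators to those regulators."""
    regulators = defaultdict(set)
    for regulator, target in edges:
        regulators[target].add(regulator)
    return {t: sorted(r) for t, r in regulators.items() if len(r) >= min_regulators}


# Toy (regulator, target) pairs with invented names.
edges = [
    ("reg_A", "t1"), ("reg_A", "t2"), ("reg_A", "t3"),
    ("reg_B", "t2"), ("reg_B", "t3"),
    ("reg_C", "t3"),
]
print(rank_hubs(edges))
print(shared_targets(edges))
```

Other centrality measures could be substituted for out-degree; the point is only that a network view makes such rankings straightforward to compute.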

We expect that users will be able to adjust the size of the network by adjusting the significance thresholds (p-values for the conditional independence tests) described in Section 3.3.2.2. The basic set of selected genes and miRNAs will remain unchanged. However, the clusters will grow or shrink, and cluster members and their regulatory relationships will be shown or hidden accordingly.
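The intended behavior can be sketched as a simple filter. This is a hypothetical illustration rather than the actual mirConnX 2.0 logic: it assumes each non-core cluster member carries a single p-value and is displayed when that p-value is at or below the user's threshold `alpha`. The real inclusion rule from Section 3.3.2.2 may differ, and all names and numbers below are invented.

```python
def visible_nodes(core, candidate_pvalues, alpha):
    """Core selections are always shown; other members only if p <= alpha (assumed rule)."""
    shown = set(core)
    shown.update(name for name, p in candidate_pvalues.items() if p <= alpha)
    return shown


def visible_edges(edges, shown):
    """Keep an edge only when both of its endpoints are currently displayed."""
    return [(u, v) for (u, v) in edges if u in shown and v in shown]


# Toy data with invented names and p-values.
core = {"gene_1", "miR_X"}
candidate_pvalues = {"gene_2": 0.001, "gene_3": 0.03, "gene_4": 0.2}
edges = [("miR_X", "gene_1"), ("miR_X", "gene_2"), ("gene_3", "gene_4")]

strict = visible_nodes(core, candidate_pvalues, alpha=0.01)
loose = visible_nodes(core, candidate_pvalues, alpha=0.05)
print(sorted(strict), visible_edges(edges, strict))
print(sorted(loose), visible_edges(edges, loose))
```

Under this assumed rule, the core set is unaffected by `alpha`, while the displayed cluster members and edges grow monotonically as the threshold is relaxed.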

Additionally, users will be allowed to upload prior grouping information to be used in the prior incorporation scheme described in Section 2.5. Examples of prior information that a user may want to supply include pathway information (from KEGG [88] or Ingenuity [234], for example), protein-protein interactions, regulatory modules or co-regulation information, and domain expertise. Not all prior information will necessarily be suitable; the choice is left to the user's discretion.
