4. Chapter IV – Generation and characterisation of transcriptomic and proteomic

4.1.1 Summary of commonly used gene expression analysis methods for the generation of

The activation of a gene results in its expression in the form of messenger RNA (mRNA), and the abundance of an mRNA can indicate the activity of the corresponding gene. Comparing gene expression between groups, such as diseased and healthy samples, can highlight underlying patterns and subtypes relevant to the condition under study, such as cancer. Gene expression profiles can be generated from in vitro or in vivo models, or from patient material, and can be utilised for the discovery and validation of novel markers associated with biological processes or disease states.
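A comparison between two groups is often summarised per gene as a log2 fold change. The following is a minimal sketch of that idea, not the analysis used in this study: the function name, the pseudocount of 1 and the expression values are all illustrative assumptions.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Return log2 of the ratio of mean expression in group_a over group_b.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that are not detected in one of the groups.
    """
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


# Hypothetical expression values for one gene in three samples per group.
diseased = [31.0, 35.0, 27.0]
healthy = [7.0, 8.0, 6.0]

# A positive value means higher expression in the diseased group.
print(log2_fold_change(diseased, healthy))
```

A value of 0 would indicate no difference between the groups, and each unit corresponds to a doubling or halving of the mean expression.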

Two methods are routinely used to study the whole transcriptome: microarray profiling (Baldi, Hatfield 2011) and RNA-sequencing (Wang, Z., Gerstein et al. 2009). Both methods enable the analysis of coding and non-coding RNA and are successfully applied in the field of cancer research. An example of the successful application of microarray analysis is the study of Lapointe et al., which profiled 225 prostate tumours for the identification of clinically relevant subtypes of prostate cancer (PCa) patients (Lapointe, Li et al. 2004). RNA-sequencing analysis, in turn, was used to generate 585 patient-derived gene expression profiles, which resulted in the identification of PCAT14 as a significant predictor of the development of metastasis, as well as of biochemical progression-free survival (Shukla, Sudhanshu, Zhang et al. 2016). Both examples show that the study of transcriptomic profiles, independent of the platform used to generate them, can produce meaningful outputs that may translate into clinically useful information in the future.

For both methods, RNA is extracted from a sample of interest, such as cell line material or tissue sections, and cDNA is generated and tagged with either a fluorescent label or a sequencing adapter. In a microarray analysis, the cDNA is then hybridised onto an array covered with thousands of pre-defined DNA spots and incubated. During this step, the fluorescently tagged cDNA can bind to complementary strands of DNA on the chip. Unbound and non-specifically bound cDNA molecules are then removed in a washing step, so that only specifically bound cDNA is analysed further. Finally, the array is excited with a laser and scanned, and a fluorescence intensity is recorded for each DNA spot, each of which represents a single probe ID (Schulze, Downward 2001). In silico processing enables the normalisation and quantification of mRNA for each gene of interest. The resulting expression intensities can then be analysed, for example by comparing genes or sample groups.
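The text does not state which normalisation was applied. As an illustration only, quantile normalisation is one widely used option for microarray intensities: it forces every array to share the same intensity distribution. The sketch below assumes one list of probe intensities per array, with probes in the same order; the function name and the values are hypothetical.

```python
def quantile_normalise(arrays):
    """Quantile-normalise a list of arrays (one list of intensities per array).

    All arrays must contain the same probes in the same order. Ties within an
    array are broken by probe order, which is a simplification of this sketch.
    """
    n_arrays = len(arrays)
    n_probes = len(arrays[0])

    # Mean intensity at each rank, taken across all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / n_arrays for rank in range(n_probes)
    ]

    normalised = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Each probe receives the cross-array mean for its rank.
            out[probe_index] = rank_means[rank]
        normalised.append(out)
    return normalised


# Hypothetical raw intensities for four probes on three arrays.
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
for row in quantile_normalise(raw):
    print(row)
```

After normalisation, every array contains the same set of values, so remaining differences between arrays reflect the rank of each probe rather than array-wide intensity shifts.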

The history of RNA-sequencing begins with the chain-termination sequencing method developed by Dr Frederick Sanger, which is therefore also called Sanger sequencing (Sanger, Coulson 1975). This method is the gold standard for determining the sequence of single genes and is still commonly used for this purpose. The second generation of sequencing methods, mainly known as next-generation sequencing (NGS), enables the massively parallel analysis and quantitation of thousands of genes. The most commonly used approach is a process called “Sequencing by synthesis” (SBS). Various companies, such as Illumina (Bentley, Balasubramanian et al. 2008) and Applied Biosystems (Voelkerding, Dames et al. 2009), offer this type of sequencing. In this study, RNA-sequencing was performed on an Illumina NextSeq500; the sequencing method is therefore described based on that company’s approach.

SBS can be divided into two major steps: cluster generation and the actual sequencing. Initially, libraries of cDNA are generated. Adapter regions are added to both ends of each fragment, and the cDNA is then transferred onto a flow cell. This flow cell is a glass slide coated with two types of oligos, each corresponding to one of the two adapter regions previously added to the cDNA. Initial copies of the bound cDNA fragments are generated, and the original template is removed. Each copy is then used to create a cluster of identical template molecules by bridge amplification (Buermans, Den Dunnen 2014). After this, sequencing of the generated strands begins. During cluster generation, copies of both strands are produced; for the first sequencing read, one strand type is removed and the remaining strands are copied by the sequential addition of fluorescently tagged nucleotides. Each of the four nucleotides carries a different fluorescent tag, and each nucleotide also carries a reversible terminator. In every cycle, one nucleotide is incorporated into the strand being synthesised on the flow cell, the fluorescence is detected, and the terminator is removed. This enables the incorporation of a new nucleotide in the next cycle (Buermans, Den Dunnen 2014, Bentley, Balasubramanian et al. 2008). After a predefined number of cycles, the generated strand is removed, and a complete complementary sequence is generated. This sequence is then used as a template for a second round of sequencing (Buermans, Den Dunnen 2014). The sequencing process described above is repeated, resulting in so-called paired-end sequencing products. In silico processing of the generated reads enables the identification and quantification of RNA molecules in the analysed sample material.
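The relationship between the two reads of a paired-end product can be shown with a toy model. The sketch below is a deliberate simplification and not a model of the instrument chemistry: it assumes an error-free fragment, reports one base call per cycle, and takes the second read from the reverse complement of the fragment. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def read_strand(strand, cycles):
    """Report one base call per sequencing cycle, starting at the strand's 5' end."""
    calls = []
    for cycle in range(min(cycles, len(strand))):
        # One nucleotide is incorporated and detected in each cycle.
        calls.append(strand[cycle])
    return "".join(calls)


def paired_end_reads(fragment, cycles):
    """Return (read 1, read 2) for a fragment sequenced from both ends."""
    read1 = read_strand(fragment, cycles)
    # The second round uses the complementary strand as its starting point.
    read2 = read_strand(reverse_complement(fragment), cycles)
    return read1, read2


# Hypothetical 15-base cDNA fragment sequenced for 5 cycles per read.
print(paired_end_reads("ATGGCGTACGTTAGC", 5))
```

The number of cycles sets the read length, so for fragments longer than twice the read length the middle of the fragment is not sequenced; its two ends are still linked as a pair during in silico processing.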

Despite the successful application of both methods, RNA-sequencing offers strong advantages over microarray profiling. These include the unbiased screening of RNA present within the sample, which in microarray analysis is limited by the use of predefined probe sequences (Kukurba, Montgomery 2015). Furthermore, novel transcripts and gene variants at lower abundances can be routinely detected using RNA-sequencing. RNA-sequencing also offers a broader dynamic range, which can provide more accurate detection of strongly differentially expressed genes (Zhao, S., Fung-Leung et al. 2014, Nookaew, Papini et al. 2012). Microarray analysis, by contrast, is limited in the accurate quantification of genes and transcripts expressed at very low or very high levels (Kukurba, Montgomery 2015). In addition, microarray analyses generate expression values for multiple probe IDs per gene. These probes cover different sequence segments of each gene, and their binding affinities can vary. As a result, probes for the same gene commonly differ in their statistical significance and in their associations, which can limit the discovery of markers.
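One common way to handle several probe IDs per gene is to collapse them into a single gene-level value before further analysis. The sketch below averages the probes of each gene per sample. This is only one of several possible strategies and is not stated to be the approach used in this study; the function name, probe IDs, gene names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average probe-level expression values into one value per gene and sample.

    probe_values maps a probe ID to a list of values (one per sample);
    probe_to_gene maps a probe ID to the gene it targets.
    """
    rows_by_gene = defaultdict(list)
    for probe_id, values in probe_values.items():
        rows_by_gene[probe_to_gene[probe_id]].append(values)

    # zip(*rows) groups the values of all probes of a gene sample by sample.
    return {
        gene: [mean(sample_values) for sample_values in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# Hypothetical data: two probes for GENE_A, one for GENE_B, two samples each.
probe_values = {
    "probe_1": [8.0, 10.0],
    "probe_2": [6.0, 12.0],
    "probe_3": [5.0, 5.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

print(collapse_probes(probe_values, probe_to_gene))
```

Averaging hides disagreement between probes of the same gene, so an alternative is to keep only one probe per gene, for example the one with the highest mean intensity.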

4.1.2 Summary of commonly used protein expression analysis methods

In document Biomarker discovery for disease progression and metastasis in prostate cancer: a multi-omic approach (Page 136-138)