Nearly all individual cells within a multicellular organism contain the same genome. However, within each cell, different genes are transcriptionally active, resulting in cells and tissues displaying different gene expression patterns. This produces a myriad of structural, biochemical, functional and phenotypic variations amongst cells and tissues that might play a role in the differences observed between health and morbidity. The complete set of transcribed genes expressed as mRNA within an individual is known as the transcriptome (Su et al. 2002). Gene expression profiles not only have the potential to explain cellular functions, regulation and biochemical pathways but, when contrasted between cases and controls (e.g. diseased vs healthy), may also offer insight into disease pathology and identify new therapeutic points of intervention, enhancing diagnosis and improving prognosis (Van’t Veer et al. 2002; Xiong et al. 2013).
Transcriptomic changes are an important biological aspect of ageing (López-Otín et al. 2013; Glass et al. 2013). Indeed, variation in the regulation of gene expression, more so than sequence variation, has long been postulated to offer a more sensitive approach to studying ageing (King & Wilson 1975). Profiling technologies, combined with machine learning methods applied to global RNA profiles, have already yielded sensitive and specific diagnostic and prognostic tools for cancer using sets of gene expression values of limited size (Patnaik et al. 2010; Shedden et al. 2008; Menden et al. 2013). While it is intuitive that an RNA profile obtained from a tumor demonstrates prognostic ability, the idea that a global RNA profile obtained from a non-diseased tissue sample can also produce an accurate and sensitive diagnostic that informs about future disease has not been demonstrated.
2.2.1 Next-Generation Sequencing and Microarrays
The development of transcriptome profiling technologies has allowed us unprecedented access to the world of RNA, with an ever-growing number of studies changing our view of its extent and complexity. Advances in molecular biology have brought microarrays and next-generation sequencing (NGS) technologies to the forefront of transcriptomics. Each of these technologies possesses a set of distinct features suitable for different applications and research goals. Current sequencing methods depend on the reconstruction of transcripts from sequenced fragments that generally do not exceed a few hundred nucleotides. These methods inevitably result in uneven coverage across the transcript (due to technical biases in the fragmentation and sequencing technologies), with the 5’ and 3’ ends often being the most problematic areas. With microarrays, RNA expression is measured through the amount of cDNA that hybridizes to pre-designed short DNA fragments, known as probes, immobilized on a chip. This limits the quantification of expression to areas of the genome that are matched by the probes. In addition to the need for the correct type of probes, the distribution of probes must also be uniform (to an appropriate extent) across the transcripts’ untranslated regions. Thus, arrays have a fundamental design bias, i.e., one can only explore and analyze the transcriptomic regions for which probes have been designed. Arrays are also highly dependent on the reference databases from which they are designed. In contrast, with NGS, reads are generated without any a priori knowledge of the transcriptome, thus permitting analysis of novel transcripts, splice junctions and noncoding RNAs that are not restricted to those defined by current genome knowledge. Because NGS technologies can provide a more detailed look at the transcriptome, researchers have been keen to use them for gene expression studies (Mutz et al. 2013).
Despite the methodological benefits of RNA-seq, microarrays have several potential advantages over sequencing, particularly for detecting low-abundance transcripts. Microarray hybridization typically uses higher concentrations of cDNA than RNA-seq assays, but the detection of each unique cDNA (or cRNA) is independent, thereby avoiding the competitive detection scenario encountered with NGS data. With sequencing, the inability to detect a large proportion of low-abundance transcripts is caused by a few highly abundant RNA transcripts accounting for a very large proportion of a cDNA library (Lei et al. 2015). This inability to robustly detect low-abundance transcripts leads to high variability in the quantitative measurement of transcript expression. Microarrays, on the other hand, provide consistent and accurate gene expression quantitation irrespective of transcript abundance. Use of microarrays for research remains prevalent, as the technology has proven successful in consistently providing genomic insight for the past two decades (Harrington et al. 2000; Trevino et al. 2006; Yan & Gu 2009). Microarrays are also generally considered easier to use, as protocols for sample labeling, array handling and data analysis are less intensive. Moreover, general agreement has emerged on the major methods for processing the data and a wealth of good tools exists to analyze them, while the same cannot yet be said for RNA-seq. Further, despite NGS advancements and a recent drop in the associated cost, expression arrays remain more economical and easier to use when processing large numbers of samples (e.g., hundreds to thousands), and they yield higher throughput.
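The competitive detection effect described above can be illustrated with a short calculation. The sketch below is not part of the analysis in this thesis; it assumes an idealised model in which reads are drawn strictly in proportion to each transcript's share of the library, and the transcript names, copy numbers and sequencing depth are hypothetical values chosen only for illustration.

```python
def expected_reads(abundances, depth):
    """Expected read count per transcript when `depth` reads are drawn
    in proportion to each transcript's share of the cDNA library."""
    total = sum(abundances.values())
    return {name: depth * copies / total for name, copies in abundances.items()}


# Hypothetical libraries: copies per transcript (illustrative numbers only).
balanced = {"rare": 10, "mid_a": 1_000, "mid_b": 1_000}
# Same library with one highly abundant transcript added.
dominated = {**balanced, "abundant": 1_000_000}

depth = 1_000_000  # assumed sequencing depth, in reads

rare_balanced = expected_reads(balanced, depth)["rare"]
rare_dominated = expected_reads(dominated, depth)["rare"]

print(f"rare transcript, balanced library:  {rare_balanced:.1f} expected reads")
print(f"rare transcript, dominated library: {rare_dominated:.1f} expected reads")
```

Under these assumed numbers, the expected read count for the rare transcript falls from roughly 5,000 to roughly 10 at the same depth, purely because the abundant transcript consumes most of the reads. At such low counts, sampling noise becomes large relative to the signal, which is consistent with the high variability noted for low-abundance transcripts. Real libraries are also affected by fragmentation and amplification biases that this proportional model ignores.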
There are also pragmatic reasons for using microarray technology in a study such as ours. The primary research objective of this thesis was to find a biomarker or diagnostic tool for healthy ageing that had prognostic ability for a clinical outcome. Many pre-existing datasets profiling a variety of tissues in young and old individuals are available on a variety of microarray platforms. Therefore, from a validation perspective, microarray was a sensible choice (Figure 2.1). The datasets used in our study were profiled on various microarray platforms, including Affymetrix HGU133Plus2, Affymetrix HuEx-1.0 ST, HTA-2.0, Illumina HT-12 V3 beadchip and Illumina HT-12 V4 beadchip (for details of the datasets see Appendix 1).