Chapter 2. Materials and Methods
2.7 Differential Gene Expression Experiments
Differential gene expression (DGE) experiments were performed using RNA extracted from FFPE and cell line material. Two different gene expression profiling platforms were used to determine differential gene expression profiles between OPMD that underwent malignant transformation (MT) and those that had not undergone malignant transformation (NT):
• whole transcriptome sequencing using Illumina’s Next Generation Sequencing RNASeq platform (Illumina, USA)
• targeted transcriptome profiling using the NanoString nCounter platform (NanoString Technologies, Seattle, USA)
RNA extraction and purification from FFPE tissue for differential gene expression analysis was performed at the Institute for Genetic Medicine, Newcastle University. The RNA sequencing was performed by biomedical scientists at the Genome Centre, Queen Mary University of London. NanoString sample processing was performed with the kind assistance of Ms. Anastasia Resteu from the Human Dendritic Cell Laboratory, Institute of Cellular Medicine, Newcastle University.
2.7.1 Total RNA extraction from formalin-fixed paraffin-embedded (FFPE) tissue
After trimming excess paraffin from the sample block and discarding the first two sections, 10 μm sections were cut from the FFPE blocks and placed in 2 ml microcentrifuge tubes. Whole sections that included both epithelium and underlying connective tissue were used. The number of sections per sample depended on the size of the tissue: 4 sections for small samples, 3 to 4 sections for medium samples, 2 to 3 sections for large samples and 1 to 2 sections for very large samples (Table 2.1). RNA extraction and purification were performed using the QIAGEN RNeasy FFPE kit following the manufacturer's protocol (QIAGEN, Manchester, UK). A brief outline of the protocol is listed in Appendix C. Following RNA extraction, the concentration and quality of the isolated RNA were measured using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, UK). The samples were then stored in a -80°C freezer prior to use in downstream experiments.
2.7.2 Whole transcriptome sequencing - RNA sequencing (RNASeq)
Total RNA sequencing (RNASeq) was performed using RNA extracted from 20 FFPE samples (10 MT vs 10 NT). RNA samples were assessed for quantity and integrity using the NanoDrop 8000 spectrophotometer V2.0 (Thermo Fisher Scientific, USA) and Agilent 2100 Bioanalyser (Agilent Technologies, Waldbronn, Germany). From each sample, 100 ng of total RNA was used to prepare RNA libraries using the KAPA Stranded RNASeq Kit with RiboErase (KAPA Biosystems, Massachusetts, USA). Prior to first strand cDNA synthesis, fragmentation was carried out using the incubation conditions recommended by the manufacturer for degraded samples (65°C for 1 minute), and 14 cycles of PCR were performed for final library amplification. The libraries produced were quantified using the Qubit 2.0 Fluorometer (Life Technologies, California, USA) and the average fragment size was assessed using the Agilent 2200 Tapestation (Agilent Technologies, Waldbronn, Germany). The Illumina NextSeq®500 (Illumina Inc., Cambridge, UK) was used to generate 75bp paired-end reads for each library.
2.7.3 Bioinformatic analysis of RNASeq data
Bioinformatic analysis was performed by Mr. John Casement from the Bioinformatics Support Unit of Newcastle University. FastQ files generated from the sequencing runs were downloaded from the Illumina server using BaseMount, the command line interface for Illumina BaseSpace. Read quality of the FastQ files was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), and MultiQC (http://multiqc.info) was used to obtain summary statistics for the quality control tests. Reads were quantified against transcripts using “Kallisto” (Bray et al., 2016). Kallisto is a program for quantifying abundances of transcripts from RNASeq data, which determines the compatibility of reads with targets without the need for alignment. The summarised Kallisto workflow is as follows:
• (1) Build an index from the Gencode transcript FASTA files (Gencode version 24: http://www.gencodegenes.org/releases/24.html).
• (2) Run the quantification algorithm “kallisto quant” for each pair of forward (R1) and reverse (R2) FastQ files against the index.
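The two Kallisto steps can be sketched as command lines assembled in Python. This is only an illustration: the helper names and all file names are hypothetical, and the options actually used in this study are not recorded here. Only the basic `kallisto index -i` and `kallisto quant -i ... -o ...` invocations are taken from standard Kallisto usage.

```python
def index_command(transcript_fasta: str, index_path: str) -> list[str]:
    """Step 1: build a Kallisto index from a transcript FASTA file."""
    return ["kallisto", "index", "-i", index_path, transcript_fasta]


def quant_command(index_path: str, r1: str, r2: str, out_dir: str) -> list[str]:
    """Step 2: quantify one pair of forward (R1) and reverse (R2) FastQ files."""
    return ["kallisto", "quant", "-i", index_path, "-o", out_dir, r1, r2]


# Hypothetical file names, for illustration only.
example_index = index_command("gencode.v24.transcripts.fa.gz", "gencode_v24.idx")
example_quant = quant_command(
    "gencode_v24.idx",
    "sample01_R1.fastq.gz",
    "sample01_R2.fastq.gz",
    "kallisto_out/sample01",
)
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a machine where Kallisto is installed; the quantification command would be repeated once per sample.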
To obtain gene-level counts from the Kallisto transcript-level quantification, the package “tximport” for the R statistical programming language (R Foundation for Statistical Computing, Vienna, Austria) was used. Gene annotation was obtained from Ensembl transcript IDs using the R package “biomaRt” (Durinck et al., 2005). The R package DESeq2 was used for normalisation and for testing for differential gene expression using negative binomial generalised linear models (Love et al., 2014). Genes were considered to be significantly differentially expressed when the p-value corrected for the False Discovery Rate (FDR) using the Benjamini-Hochberg method was less than 0.05.
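The Benjamini-Hochberg correction applied to the p-values can be illustrated with a minimal Python sketch. It shows the textbook procedure only; it is not the DESeq2 implementation (which was run in R and may apply additional filtering before adjustment), and the example p-values are made up.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
```

A gene is then called differentially expressed when its adjusted value falls below the chosen threshold (0.05 for the RNASeq experiment).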
2.7.4 NanoString experiments
The NanoString nCounter system (NanoString Technologies, Seattle, USA) uses hybridisation of short probes (35- to 50-base sequences) that are subsequently immobilised on a streptavidin-coated cartridge, which is then digitally imaged and counted to quantify mRNA expression. In-depth details regarding NanoString technology can be obtained from Geiss et al. (2008). NanoString sample processing was carried out at the Human Dendritic Cell Laboratory, Institute of Cellular Medicine, Newcastle University using the nCounter MAX/FLEX system (NanoString Technologies, Seattle, USA) with the kind assistance of Ms. Anastasia Resteu. The experiment involved two stages:
• Stage 1: Differential gene expression experiment using the PanCancer Pathways Panel Plus of target genes
• Stage 2: Differential gene expression experiment using a customised list of target genes
Experiments were performed using previously extracted RNA from selected FFPE blocks as described in section 2.7.1. Each assay comes with engineered External RNA Controls Consortium (ERCC) synthetic internal negative and positive control probes.
For the PanCancer Pathways Panel Plus, an additional ten probes targeting specific mRNA (Appendix D) were added to the pre-existing 770 gene list available in the PanCancer Pathways Panel. The additional targets were chosen from statistically significant differentially expressed genes from the earlier RNASeq experiment (FDR < 0.05 and fold change greater than 2). The selection of these additional candidate genes was based upon biological relevance and review of the relevant literature. This was discussed with and finalised through consensus by members of the Newcastle University Oral Cancer Research Group (OCRG). This experiment was performed using RNA extracted from 48 FFPE samples (25 NT and 23 MT cases). RNA samples were assessed for quantity and quality using the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, UK). Samples were considered suitable for the NanoString experiment if the A260/A280 ratio was between 1.7 and 2.3 and the A260/A230 ratio was between 1.8 and 2.3 (NanoString, 2016). RNA content for all samples was normalised to 30 ng/μl and 5 μl (150 ng of total RNA) per sample was used for the experiment [Dr Jim White, Senior Field Application Specialist, NanoString Technologies; personal communication]. The summarised laboratory workflow for the NanoString nCounter assay using the PanCancer Pathways Panel Plus according to the manufacturer's protocol is listed in Appendix E (NanoString, 2016).
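The sample suitability criteria and the input normalisation amount to simple arithmetic, sketched below in Python. The function names are hypothetical, and treating the ratio limits as inclusive is an assumption; the thresholds and the 30 ng/μl target are those stated above.

```python
def passes_purity(a260_a280: float, a260_a230: float) -> bool:
    """Check both absorbance ratios against the stated ranges (inclusive: assumption)."""
    return 1.7 <= a260_a280 <= 2.3 and 1.8 <= a260_a230 <= 2.3


def dilution_volumes(
    stock_ng_per_ul: float, final_ul: float, target_ng_per_ul: float = 30.0
) -> tuple[float, float]:
    """Volumes (RNA stock, diluent) giving final_ul at the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    rna_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return rna_ul, final_ul - rna_ul


# 5 μl of a 30 ng/μl dilution carries 150 ng of total RNA into the assay.
input_ng = 30.0 * 5.0
```

For example, a 60 ng/μl stock diluted to a final volume of 10 μl needs 5 μl of RNA and 5 μl of diluent.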
For the Customised CodeSet Panel experiment, a list of target genes was compiled based on the results from the RNASeq experiment, the NanoString PanCancer Pathways Panel Plus experiment and a review of the relevant scientific literature. The selection of candidate genes for this customised panel was discussed and finalised through consensus by members of the Newcastle University Oral Cancer Research Group (OCRG) and the gene list is shown in Appendix F. This experiment was performed using RNA extracted from 44 FFPE samples (24 NT and 20 MT) and four OED cell lines. RNA samples were assessed for quantity and quality using the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, UK). Samples were considered suitable for the NanoString experiment if the A260/A280 ratio was between 1.7 and 2.3 and the A260/A230 ratio was between 1.8 and 2.3 (NanoString, 2016). RNA content for all samples was normalised to 30 ng/μl and 5 μl (150 ng of total RNA) per sample was used for the experiment. The summarised laboratory workflow for the Customised CodeSet Panel gene expression assay according to the manufacturer's protocol is listed in Appendix G (NanoString, 2016).
Output from the nCounter Platform was quality assured using the quality control (QC) function in the nSolver analysis software 3.0 (NanoString Technologies, Seattle, USA). The following parameters were assessed during the QC function for each sample:
• Imaging QC: measure of the percentage of requested fields of view successfully scanned in each cartridge lane (75% cut-off)
• Binding Density QC: measure of reporter probe density on the cartridge surface within each sample lane (range between 0.05 – 2.25)
• Positive Control Linearity QC: measure of correlation between the counts observed for the ERCC synthetic positive control probes and the concentrations of the spike-in synthetic target nucleic acids (0.95 cut-off)
• Positive Control Limit of Detection QC: measures the limit of detection by comparing results from positive control probes and negative control probes (0.5fM positive control probe should produce raw counts of > 2 standard deviations higher than the mean of the negative control probes)
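The four QC criteria listed above can be expressed as a set of pass/fail checks. The Python sketch below is illustrative only and does not reproduce nSolver itself: the function and parameter names are hypothetical, the cut-offs are treated as inclusive lower bounds, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def qc_flags(
    imaging_fraction: float,
    binding_density: float,
    pos_linearity: float,
    pos_0_5fm_count: float,
    negative_counts: list[float],
) -> dict[str, bool]:
    """Return a pass (True) / fail (False) flag for each QC parameter of one lane."""
    # Limit of detection: mean of the negative controls plus two standard deviations.
    lod_threshold = mean(negative_counts) + 2 * stdev(negative_counts)
    return {
        "imaging": imaging_fraction >= 0.75,
        "binding_density": 0.05 <= binding_density <= 2.25,
        "positive_control_linearity": pos_linearity >= 0.95,
        "limit_of_detection": pos_0_5fm_count > lod_threshold,
    }


# Made-up values for a single lane.
flags = qc_flags(0.98, 0.6, 0.99, 20.0, [4.0, 6.0, 8.0, 10.0])
lane_ok = all(flags.values())
```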
2.7.5 Differential gene expression analysis of NanoString data
Differential gene expression data analysis between MT and NT groups was performed using the nSolver Analysis Software 3.0 (NanoString Technologies, Seattle, USA). Prior to the DGE analysis, the raw data were normalised in a two-step manner. Firstly, the raw counts were background subtracted using the geometric mean of the internal negative controls, followed by technical normalisation using the geometric mean of the internal positive controls. Subsequently, the data were normalised using the geNorm algorithm, which chooses only the most stable housekeeping genes in the analysed dataset (Vandesompele et al., 2002).
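The first normalisation step (background subtraction followed by positive control scaling) can be sketched in Python as below. This is a simplified illustration rather than the nSolver implementation: flooring subtracted counts at zero and using the arithmetic mean of the per-lane geometric means as the reference are assumptions, all names are hypothetical, and the geNorm housekeeping gene selection is not shown.

```python
from math import exp, log


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def background_subtract(counts: dict[str, float], negatives: list[float]) -> dict[str, float]:
    """Subtract the geometric mean of the negative controls from every count."""
    background = geometric_mean(negatives)
    # Counts below background are floored at zero (assumption).
    return {gene: max(c - background, 0.0) for gene, c in counts.items()}


def positive_control_factors(pos_by_lane: dict[str, list[float]]) -> dict[str, float]:
    """Per-lane scaling factors from the positive control geometric means."""
    lane_gm = {lane: geometric_mean(v) for lane, v in pos_by_lane.items()}
    reference = sum(lane_gm.values()) / len(lane_gm)
    return {lane: reference / gm for lane, gm in lane_gm.items()}
```

Each lane's background-subtracted counts would then be multiplied by that lane's factor, so that lanes with weaker positive control signal are scaled up and lanes with stronger signal are scaled down.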
For the PanCancer Pathways Panel Plus experiment, genes were considered to be significantly differentially expressed when the p-value corrected for the False Discovery Rate (FDR) using the Benjamini-Hochberg method was < 0.1. An FDR threshold of < 0.1 was chosen for this experiment because it was an exploratory experiment to find genes with altered expression between OPMD that undergo MT and those that do not. Setting the FDR threshold too stringently could exclude key genes that might have reached statistical significance had the cohort been larger. A hypergeometric test was carried out to identify Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways in which differentially expressed genes were over-represented. KEGG pathways were rendered using Pathview (Luo and Brouwer, 2013).
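The hypergeometric over-representation test can be illustrated with a short Python sketch using only the standard library. The function name and the numbers in the example are hypothetical, and the choice of gene universe (for instance, all genes on the panel) is an assumption that is not specified above.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int, n_de: int, n_overlap: int) -> float:
    """P(X >= n_overlap): chance of seeing at least n_overlap pathway genes
    among n_de differentially expressed genes drawn from n_universe genes."""
    upper = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, 4 in the pathway, 3 differentially
# expressed, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

When many pathways are tested, the resulting p-values would normally also be corrected for multiple testing.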
For the Customised CodeSet Panel experiment, genes were considered to be significantly differentially expressed when the p-value corrected for the False Discovery Rate (FDR) using the Benjamini-Hochberg method was < 0.05. Raw and log2 normalised counts of the significantly differentially expressed genes were then exported in a CSV file to be used for statistical analysis and model building. Log2 normalised counts of the genes significantly differentially expressed between MT and NT cases were then dichotomised at their respective medians into low-expression and high-expression sub-groups for further analysis [Dr Kim Pearce and Dr Syed Haider; personal communication].
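Median dichotomisation of a single gene can be sketched as follows. The function name and example values are hypothetical, and assigning samples exactly at the median to the low-expression group is an assumption, since the handling of ties is not stated.

```python
from statistics import median


def dichotomise_by_median(log2_counts: list[float]) -> list[str]:
    """Label each sample 'high' or 'low' relative to the gene's median."""
    cut = median(log2_counts)
    # Values equal to the median are labelled 'low' (assumption).
    return ["high" if value > cut else "low" for value in log2_counts]


# Made-up log2 normalised counts for one gene across four samples.
labels = dichotomise_by_median([5.0, 7.0, 6.0, 9.0])
```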
2.7.6 Development of gene-signature for clinical outcome in OPMD
A gene-signature was developed by fitting the dichotomised (low-expression; high-expression) log2 normalised expression of the genes significantly differentially expressed between MT and NT cases in the Customised CodeSet experiment into a Cox regression (proportional hazards regression) model using a stepwise method. The b-coefficients from the gene-signature model were used to obtain risk scores and the resultant risk scores were then dichotomised into risk groups (low-risk and high-risk) using the median of the risk scores [Dr Kim Pearce and Dr Syed Haider; personal communication].
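Once the Cox model has been fitted, the risk score calculation can be sketched in Python as a weighted sum of the dichotomised expression values. Fitting the model itself is not shown. The gene names and coefficient values are hypothetical, coding low and high expression as 0 and 1 is an assumption, and scores equal to the median are placed in the low-risk group by assumption.

```python
from statistics import median


def risk_scores(samples: list[dict[str, int]], coefficients: dict[str, float]) -> list[float]:
    """Linear predictor per sample: sum of coefficient x expression (0 = low, 1 = high)."""
    return [sum(b * sample[gene] for gene, b in coefficients.items()) for sample in samples]


def risk_groups(scores: list[float]) -> list[str]:
    """Split samples into low-risk and high-risk groups at the median score."""
    cut = median(scores)
    return ["high-risk" if s > cut else "low-risk" for s in scores]


# Hypothetical coefficients and dichotomised expression for four samples.
coefs = {"GENE_A": 1.0, "GENE_B": -0.5}
cohort = [
    {"GENE_A": 1, "GENE_B": 0},
    {"GENE_A": 0, "GENE_B": 1},
    {"GENE_A": 1, "GENE_B": 1},
    {"GENE_A": 0, "GENE_B": 0},
]
scores = risk_scores(cohort, coefs)
groups = risk_groups(scores)
```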