1 Introduction

1.5 Nucleic-acid based techniques to study microbial ecology

1.5.4 High-throughput sequencing

1.5.4.2 RNA-based high-throughput sequencing

Transcriptomics is the study of the total messenger RNA (mRNA) molecules, or transcripts, produced by a single microorganism or a whole population of microbial cells at a specific developmental stage or physiological condition (Wang et al. 2009, Zhang et al. 2010). The mRNA of a cell or an organism is the template for protein synthesis, and the transcriptome reflects the genes that are actively expressed (Horgan and Kenny 2011). RNA sequencing technologies (RNA-seq) have transformed the study of bacterial transcriptomes (Croucher and Thomson 2010). Transcriptomics delivers information on which genes are expressed and, as RNA-seq is a quantitative tool, it can be applied to determine gene expression levels under different conditions. By comparing control and treatment datasets, genes that are up- or down-regulated in response to a specific treatment can be identified. Furthermore, by grouping genes into gene ontology (GO) functional categories, specific groups of molecular processes that have been affected by a particular treatment can be identified (De Wit et al. 2012).
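The comparison of control and treatment data described above can be illustrated with a minimal sketch. It assumes already normalized counts per gene and uses a simple log2 fold-change cut-off; the gene names, the counts, the pseudocount of 1 and the threshold of 1 (a two-fold change) are all hypothetical choices for illustration. Real studies would rely on statistical tests across biological replicates rather than a bare threshold.

```python
import math

def log2_fold_change(control, treatment, pseudocount=1.0):
    """Return log2(treatment / control) after adding a pseudocount to both.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes without any reads in one condition.
    """
    return math.log2((treatment + pseudocount) / (control + pseudocount))

def classify_genes(control_counts, treatment_counts, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' by its log2 fold change."""
    labels = {}
    for gene, control in control_counts.items():
        lfc = log2_fold_change(control, treatment_counts[gene])
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels

# Hypothetical normalized counts for three illustrative genes.
control = {"geneA": 99, "geneB": 399, "geneC": 200}
treatment = {"geneA": 399, "geneB": 99, "geneC": 210}
print(classify_genes(control, treatment))
```

In this toy input, geneA is classified as up-regulated, geneB as down-regulated and geneC as unchanged.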

However, working with bacterial RNA can be challenging. As summarized by van Vliet (2010), this is due to (i) the lack of a poly-A tail in bacterial mRNA, which prevents its specific capture from the total cellular RNA; (ii) bacterial RNA preparations comprising up to 80% rRNA and transfer RNA (tRNA) (Condon 2007); and (iii) the short half-life and instability of bacterial mRNA.

Transcriptomics includes the following steps: (i) RNA extraction and purification, (ii) library preparation and sequencing of complementary DNA (cDNA) and (iii) bioinformatic analysis.

(i) RNA extraction and purification

RNA can be extracted using organic solvents or commercially available kits. During cell lysis, RNase enzymes are released. These enzymes are also found in the air, on clothing and on hands, and can rapidly degrade RNA (Peirson and Butler 2007). It is therefore important to work efficiently and to apply good laboratory practice during RNA extraction and purification. RNA extraction can be completed with kits such as Qiagen’s RNeasy extraction kit, Roche’s High Pure RNA Isolation kit or Invitrogen’s Trizol extraction method. If genomic DNA or protein is to be extracted from the same sample alongside the RNA, the Trizol extraction method may be recommended (De Wit et al. 2012).

Since the extracted RNA will comprise up to 80% rRNA and tRNA (Condon 2007), these molecules should be depleted in order to increase the coverage of mRNA during sequencing. Depletion of rRNA can be achieved with kits such as MICROBExpress (Ambion Inc., USA) or Ribo-Zero (Epicentre Technologies Corporation, USA), which deplete rRNA through hybridization to magnetic bead-linked complementary oligonucleotides (Croucher and Thomson 2010). rRNA can also be depleted using terminator exonucleases that degrade transcripts with a 5’-monophosphate group, e.g. Terminator™ 5’-phosphate-dependent exonuclease (Epicentre). Following rRNA depletion, the purified and enriched mRNA can be checked on a Bioanalyzer (e.g. Agilent 2100 Bioanalyzer, Agilent Technologies, Inc., USA) using an mRNA Pico or Nano chip (Agilent) to standardize the total RNA concentration across samples before cDNA library preparation.
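The effect of depletion on mRNA coverage can be made concrete with a small calculation. The sketch below is a simplification that treats rRNA and tRNA as a single non-mRNA pool and assumes that a fixed fraction of this pool is removed. The 80% figure is the upper bound cited above; the 90% depletion efficiency is a hypothetical value, not a specification of any kit.

```python
def mrna_fraction_after_depletion(non_mrna_fraction, depletion_efficiency):
    """Fraction of the remaining RNA that is mRNA after depletion.

    non_mrna_fraction: share of rRNA plus tRNA in the extract (0 to 1).
    depletion_efficiency: share of that pool removed by depletion (0 to 1).
    """
    for name, value in (("non_mrna_fraction", non_mrna_fraction),
                        ("depletion_efficiency", depletion_efficiency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    mrna = 1.0 - non_mrna_fraction
    remaining_non_mrna = non_mrna_fraction * (1.0 - depletion_efficiency)
    total = mrna + remaining_non_mrna
    if total == 0.0:
        raise ValueError("no RNA left after depletion")
    return mrna / total

# 80% rRNA/tRNA is the upper bound cited in the text; the 90% depletion
# efficiency is a hypothetical value chosen only for illustration.
before = mrna_fraction_after_depletion(0.8, 0.0)
after = mrna_fraction_after_depletion(0.8, 0.9)
print(f"mRNA share before depletion: {before:.0%}, after: {after:.0%}")
```

Under these assumptions the mRNA share of the sample rises from about 20% to about 71%, so a correspondingly larger share of the sequencing reads would be expected to derive from mRNA.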

(ii) Library preparation and sequencing of cDNA

mRNA is converted into cDNA by reverse transcription. Commercially available kits for cDNA library preparation from mRNA for NGS include Illumina’s TruSeq preparation kit, Invitrogen’s cDNA library construction kit and Clontech’s SMART cDNA synthesis method (Clontech Laboratories, Inc., USA). A straightforward method is to construct Illumina libraries from first-strand cDNA (Croucher and Thomson 2010). The final choice of cDNA library synthesis method depends on the sequencing technology selected. Sequencing platforms that have been used in bacterial RNA-sequencing studies include Illumina MiSeq and HiSeq and 454 Life Sciences (Roche) (Beaume et al. 2011, Giannoukos et al. 2012, Sharma et al. 2010, Tjaden 2015). More details about library amplification and the sequencing process can be found in section 1.5.4.1 DNA-based high-throughput sequencing under (ii) choosing a sequencing platform (page 37) and (iv) library amplification and sequencing (page 40).
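At the sequence level, reverse transcription yields a first-strand cDNA that is the reverse complement of the mRNA template, with thymine in place of uracil. The following sketch only illustrates this base-pairing relationship; the function name and the example sequence are hypothetical, and the code does not model any particular kit or library protocol.

```python
# Watson-Crick pairing of RNA template bases with DNA bases.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'."""
    mrna = mrna.upper()
    unknown = set(mrna) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected RNA bases: {sorted(unknown)}")
    # The cDNA strand is antiparallel to the template, so the template is
    # read in reverse to write the product in 5'->3' orientation.
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

print(first_strand_cdna("AUGGCC"))
```

For the illustrative template 5'-AUGGCC-3' this gives 5'-GGCCAT-3'.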

(iii) Bioinformatic analysis

After sequencing, reads can be aligned to a reference genome or, if there is none, assembled de novo. Several computational tools exist for reference-based transcriptome assembly, such as Bowtie (Langmead and Salzberg 2012), PerM (Chen et al. 2009) and SOAP (Li et al. 2008), as well as for de novo transcriptome assembly, such as ABySS (Birol et al. 2009), Trinity (Grabherr et al. 2011) and Oases (Schulz et al. 2012). However, the majority of these tools have been designed mainly for eukaryotic transcriptomics. Bacterial transcriptome assembly faces different challenges from eukaryotic transcriptome assembly. These include overlapping bacterial transcripts, which make it difficult to distinguish the boundaries of adjacent transcripts, the activation of different promoters of an operon under different conditions, and polycistronic messages (Tjaden 2015). More recently, bioinformatic tools have been developed to overcome these challenges in bacterial transcriptomics, such as Rockhopper for reference-based bacterial transcriptomics (McClure et al. 2013) and Rockhopper 2 for de novo assembly of bacterial transcriptomes (Tjaden 2015). The Rockhopper tool for reference-based bacterial transcriptome analysis comprises the following steps: (i) alignment of reads to a reference genome, (ii) normalization in order to allow for data comparison between different samples and experiments, (iii) assembly of transcripts and identification of transcript boundaries, (iv) quantification of transcript abundance, (v) test for differential gene expression, (vi) prediction of operons and (vii) visualization of results.
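The normalization and quantification steps, (ii) and (iv), can be sketched with a generic abundance measure. The example below uses reads per kilobase of transcript per million mapped reads (RPKM), a widely used measure that corrects for gene length and sequencing depth. It is chosen here purely for illustration and is not claimed to be the procedure implemented in Rockhopper; the gene names, counts and lengths are hypothetical.

```python
def rpkm(read_counts, gene_lengths):
    """Compute RPKM per gene.

    read_counts: mapping of gene name to number of mapped reads.
    gene_lengths: mapping of gene name to gene length in base pairs.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    values = {}
    for gene, count in read_counts.items():
        length = gene_lengths[gene]
        if length <= 0:
            raise ValueError(f"invalid length for {gene}")
        # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads).
        values[gene] = count * 1e9 / (length * total_reads)
    return values

# Hypothetical sample: geneB has three times the reads of geneA but is
# also three times as long, so both end up with the same RPKM.
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1000, "geneB": 3000}
print(rpkm(counts, lengths))
```

Because the raw counts are divided by both gene length and the total number of mapped reads, values of this kind can be compared between genes of different length and between samples sequenced to different depths.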

As with similar methods, RNA-seq requires biological replicates for robust quantification of differential expression (Croucher and Thomson 2010). RNA-seq datasets have nevertheless proven highly reproducible when comparing either biological or technical replicates (Croucher et al. 2009, Marioni et al. 2008), making RNA-seq an appropriate method for expression studies (Croucher and Thomson 2010).
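One simple way to examine agreement between replicates is to correlate their per-gene read counts. The sketch below computes the Pearson correlation of log2-transformed counts for two replicates. The counts are hypothetical, and both the log transformation with a pseudocount of 1 and the choice of Pearson correlation are assumptions of this example rather than the procedure used in the cited studies.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def replicate_correlation(counts_a, counts_b):
    """Correlate two replicates on log2(count + 1) transformed counts."""
    log_a = [math.log2(c + 1) for c in counts_a]
    log_b = [math.log2(c + 1) for c in counts_b]
    return pearson(log_a, log_b)

# Hypothetical read counts for the same four genes in two replicates.
replicate_1 = [10, 200, 3500, 48]
replicate_2 = [12, 190, 3600, 51]
print(round(replicate_correlation(replicate_1, replicate_2), 3))
```

A coefficient close to 1 indicates that the two replicates rank and scale gene expression similarly, while lower values would point to technical or biological variability that needs to be investigated.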

In document Environmental genomics and proteomics of plant associated microbial dimethylsulfide degradation in a coastal salt marsh (Page 70-74)