Studying bacterial transcriptomes using RNA-seq

(1)

Studying bacterial transcriptomes using RNA-seq

Nicholas J Croucher and Nicholas R Thomson

Genome-wide studies of bacterial gene expression are shifting from microarray technology to second generation sequencing platforms. RNA-seq has a number of advantages over hybridization-based techniques, such as annotation-independent detection of transcription, improved sensitivity and increased dynamic range. Early studies have uncovered a wealth of novel coding sequences and non-coding RNA, and are revealing a transcriptional landscape that increasingly mirrors that of eukaryotes. Already basic RNA-seq protocols have been improved and adapted to looking at particular aspects of RNA biology, often with an emphasis on non-coding RNAs, and further refinements to current techniques will improve our understanding of gene expression, and genome content, in the future.

Address

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, Cambridgeshire, CB10 1SA, UK

Corresponding author: Thomson, Nicholas R ([email protected])

Current Opinion in Microbiology 2010, 13:619–624

This review comes from a themed issue on Genomics

Edited by George Weinstock and Gordon Dougan

Available online 29th September 2010

1369-5274 # 2010 Elsevier Ltd.

DOI10.1016/j.mib.2010.09.009

Introduction

The advent of second generation sequencing technol-ogies has created many opportunities to improve func-tional genomics experiments, including quantitative gene expression studies. Most previous transcriptional analysis methods have relied on hybridization of targeted oligonucleotides to particular loci for their sequence specificity: either primers binding to target cDNA in quantitative reverse transcription polymerase chain reac-tion (qRT-PCR), labeled probes binding to RNA in Northern blotting or hybridization of cDNA to probes on microarray chips. RNA-seq is different in principle in that data are matched to genes by sequence alignment instead.

This has intrinsic advantages: first, because no probe sequences are specified, all transcription is studied in an unbiased manner, and experimental design does not need to be altered in accordance with differences in

genome sequence. This promises to be a particular advantage in the study of bacteria with large amounts of genetic variation between strains [1]. It also allows the discovery of novel genetic features, as well as permitting the delineation of operons and untranslated regions, allowing the improvement and extension of sequence annotation.

Second, mapping of sequence data is more precise than hybridization between oligonucleotides. This allows tran-scription to be studied at a much higher resolution by sequencing, thereby also permitting the study of more repetitive regions of the genome. Additionally, it means quantification of gene expression by RNA-seq does not suffer from the issues of interference between genes due to non-specific hybridization of cDNA to probes [2,3].

Third, whereas hybridization-based methods measure gene expression levels through detection of fluorescence or radioactivity, RNA-seq uses the amount of data match-ing a given codmatch-ing sequence (CDS), typically quantified as reads per kilobase CDS length per million reads analyzed (RPKM) [4]. This measure cannot be saturated in the way the detection of light or radioactivity can, hence RNA-seq has a much greater dynamic range for measuring variability in expression levels. Consequently, it can also be much more discriminatory both at high levels of gene expression and more sensitive at very low levels of expression, given sufficient sequencing depth.

Preparation of cDNA

RNA is typically extracted using organic solvents or commercially available kits; however, care should be taken to ensure the method does not bias the sampling of the transcriptome [5] and is capable of harvesting sufficient starting material needed to construct a sequen-cing library, as more RNA is typically needed than for microarray experiments. Furthermore, the exclusion of highly expressed transcripts, which risk saturating the dataset, is also more difficult than with microarray exper-iments, where probes can be omitted from the chip design as required. As ribosomal RNA comprises the vast majority of the extracted RNA population, depletion of these molecules through hybridization to magnetic bead-linked complementary oligonucleotides [5–10,11], or the use of terminator exonucleases that specifically degrade transcripts with a 50 monophosphate group [12], has been used in efforts to increase the coverage of mRNA and ncRNA. However, the rapid increase in the pro-ductivity of the second generation sequencing technol-ogies renders the expensive depletion processes largely unnecessary, especially given the opportunity for sample Open access under CC BY license.

(2)

degradation and bias it presents [10]. Nevertheless, satur-ation of sequence data by abundant transcripts will remain an issue in some cases; for instance, when analyz-ing bacterial gene expression within host tissues, where eukaryotic RNA will be far more abundant than that of the prokaryote.

In the original RNA-seq protocols, following extensive DNase treatment, RNA was typically converted into cDNA through random hexamer-primed reverse tran-scription followed by second DNA strand synthesis [5– 9,13]. However, using double stranded cDNA for

mak-ing sequencmak-ing libraries results in equal levels of signal on both the sense and antisense strands, thereby losing information regarding the direction of transcription. A simple method for maintaining the directional signal in RNA-seq data is to construct Illumina libraries from only first strand cDNA [10]. Alternative techniques used to maintain directional fidelity involve sequentially ligating adapters onto RNA molecules in an orientation-specific manner [14,15], with one approach implemented in stu-dies of Mycoplasma pneumoniae and Pseudomonas syringae transcriptomes [16,17] and another used for RNA-seq in Helicobacter pylori and Salmonella enterica Typhimurium

Figure 1

Methods for preparation of cDNA. All methods require the extraction of nucleic acids from a sample of cells, followed by the enzymatic removal of DNA. Ribosomal RNA may then be depleted to increase the sequence coverage of other transcripts. To identify putative transcriptional start sites, samples are first treated with tobacco acid pyrophosphatase (TAP), which converts the triphosphate group at the 50_{end of intact transcripts to a}

monophosphate. This is required for the ligation reaction to attach an adapter to the 50_{end; polyadenylation or oxidation of the 3}0_{end of the RNA is}

used to ensure the specificity in the orientation of this reaction. This allows the 30_{part of the cDNA, corresponding to the extreme 5}0_{end of the original}

transcript, to be targeted for sequencing. In order to obtain sequence data covering the entire transcriptome, small cDNA molecules must be randomly generated from throughout the RNA sample. This has frequently been achieved through random hexamer-primed reverse transcription; using only the first strand for sequencing library construction allows information on the direction of transcription to be maintained. Alternatively, the RNA may be fragmented, and information on the template strand for transcription retained through orientation-specific, stepwise attachment of adapters. One method involves dephosphorylating the 50_{end so the first adapter can only be ligated to the 3}0_{end of the transcript; the complementary approach is to}

polyadenylate the 30_{end such that the first adapter is only found attached the 5}0_{end of the RNA. One technique not shown is the use of fragmented}

RNA as a template for random hexamer-primed reverse transcription, as performed by Oliver et al. A wider range of methods has been applied in obtaining similar information from eukaryotic transcriptomes (see text).

(3)

[12,18] (Figure 1). Other methods for maintaining directional information pioneered in studies of eukar-yotes include the use of template switching PCR [19], bisulfite-induced conversion of cytosine to uracil in tran-scripts before reverse transcription [20], addition of sequence tags into the primers used for reverse transcrip-tion [21] and incorporation of deoxyuridine into the second strand of cDNA, which can subsequently be degraded using uracil-N-glycosylase [22]. The import-ance of this information in characterizing ncRNA and observing antisense transcription is becoming increas-ingly evident.

Alternative applications of RNA-seq

As well as surveying the entire transcriptomes of bacterial strains, RNA-seq can be adapted to other experiments as well. For instance, techniques have been developed to specifically sequence the 50 region of RNA molecules, allowing the identification of putative transcriptional start

sites and helping to define operons and ncRNA [12,13] (Figure 1). In S. Typhimurium, coimmunoprecipitation of RNA molecules with Hfq, a chaperone that facilitates hybridization between ncRNA and mRNA, was used to enrich a sample for transcripts participating in such inter-actions [18], while in Vibrio cholerae, a very stringent depletion and size-selection process was used to specifi-cally sequence small ncRNA [23]. RNA-seq has also been applied to whole environments, leading to the develop-ment of techniques for sampling the metatranscriptomes of marine [24,25] and soil communities [26].

Analysis of sequence data

Illumina, 454 and SOLiD sequencing platforms have been used in bacterial RNA-seq studies [27–29]. Each offers a different compromise between the length of reads, which determines what proportion of the genome data can be uniquely mapped to, and depth of coverage,

Figure 2

Display of RNA-seq data. Data from a Salmonella bongori transcriptome, prepared as described in Ref. [9], displayed using Artemis. Using BamView, the total coverage is shown displayed as a plot (a), as raw reads aligned against the reference sequence (b) and as reads assigned separately to the two strands of the genome (c). A strand-specific coverage plot is also shown (d) and the genome annotation is displayed underneath.

(4)

which determines the dynamic range over which gene expression can be quantified.

However, above a certain threshold, obtaining longer reads results in a relatively small increase in the amount of the genome that can be studied, hence read depth will be the more important consideration in almost all cases. After sequencing, reads can be assembled using software either based on overlap graphs, such as EDENA [30], or de Bruijn graphs, for instance ABySS [31], ALLPATHS [32] or Velvet [33], which features a strand-specific assem-bly mode. Alternatively, the reads can be mapped onto a reference sequence. Some studies have used BLAST-based or nucmer-BLAST-based algorithms [34,35] to align sequence reads to the genome, but a number of programs have been developed specifically for mapping short read data [36–39], which often have the advantages of con-sidering base quality and read pair information when performing alignments. The results of mapping analyses have commonly been visualized as a graph of sequence read coverage across a genome, displayed using software such as the Integrated Genome Browser [40] or Artemis [41]. With the introduction of specialist tools such as BamView [42], raw sequence data can be visualized as well as coverage graphs, allowing a more intuitive un-derstanding of the transcriptional landscape (Figure 2).

RNA-seq, as with comparable methods, requires biological replicates for robust quantification of differential expres-sion. However, the greater cost of sequencing relative to microarray hybridization makes such repetition expensive, so statistical methods have been developed to overcome this by modeling the expected distributions of sequence reads mapping to a locus in different samples. DEGseq [43] uses a Poisson distribution to model the variation between datasets [44], whereas the approaches of edgeR [45] and DEseq [46] are based on the negative binomial distribution, which is suggested to be more appropriate for modeling the variation inherent between biological repli-cates [47].

Characteristics of bacterial transcriptomes

The results of bacterial RNA-seq studies have done much to refine our understanding of bacterial gene expression. One initial insight was that genome-wide CDS expression levels appear to be continuously distributed, with no obvious division between actively expressed genes and a ‘background’ transcription level [6,7]. By contrast, marine metatranscriptome studies have found that gene sequences that are most highly represented in cDNA samples are often rare, or absent, from the corresponding genomic DNA samples, suggesting some bacteria may be transcribing a set of uncharacterized genes at an unusually high level [24,25].

Annotation of CDSs has been significantly improved using RNA-seq data. Novel CDSs have been identified

in most studies [7–9,11,13,17], including that of M. pneumoniae, which has a genome just 816 kb in size [16]. Existing gene models have been refined, often involving correcting the choice of start codon, and associated with one another into operons, which can include the identi-fication of untranslated regions.

However, in both M. pneumoniae and H. pylori, annotation of transcriptional units was complicated by an unexpect-edly high level of flexibility in the structure of operons [12,16]. Evidence from both tiling microarray and RNA-seq data indicated different promoters appeared to be driving expression of the same genes under different conditions, leading to the division of genes into ‘suboper-ons’. The level of such alternative transcript forms in M. pneumoniae was estimated to be similar to that in some eukaryotes [16].

All these amendments to genome annotation are aided by having information on the 50ends of transcripts; in Sulfo-lobus solfataricus, mapping these ends was also used to detect putative transcript degradation products. Enrich-ment of such sites was found to inversely correlate with the half life of the RNA molecule, suggesting an endor-ibonucleolytic cleavage mechanism may be important in gene regulation [13].

Bacterial whole transcriptome studies have thus far had a very high success rate of ncRNA discovery. Such tran-scripts have even been identified and mapped to genomes from marine metatranscriptome data, where certain putative ncRNA showed distinct spatial distributions throughout the water column [48]. Validation using RT-PCR and Northern blots has been largely successful [6,12,16,18,23], and work has even begun on func-tionally characterizing these targets. In H. pylori, both in silico analysis and mutational inactivation suggested that one novel ncRNA uncovered by RNA-seq regulated a chemotaxis receptor as an antisense RNA [12], and a similar mechanism was posited for a novel ncRNA in V. cholerae, which was found to down regulate mannitol metabolism [23].

Directional RNA-seq data are particularly helpful in annotating ncRNA, as it allows reads to be assigned to a particular strand. Furthermore, it has allowed the detec-tion of large amounts of cis antisense ncRNA: regions of CDSs that are bidirectionally transcribed, and suggested to act to block expression of the encoded protein [12,13,16,17]. Such transcripts, identified from both whole genome RNA-seq and on the basis of transcrip-tional start site identification, have been detected and characterized before [49], but the genome-wide scale of their prevalence is only now being appreciated.

Overall, bacterial transcription is starting to appear to more closely mirror that of eukaryotes. Rather than

(5)

operons being fixed polycistronic transcriptional units, they may represent one way, of several, of transcribing a particular gene, with CDSs having a greater than expected level of independence from their neighbors. Additionally, antisense RNAs, acting either in cis or trans, may prove to be much more important than previously appreciated.

Limitations, problems and future directions

RNA-seq datasets have proved to be highly consistent, when comparing either technical or biological replicates, making them appropriate for expression studies [10,50]. However, there are technical issues still awaiting resol-ution, such as the highly variable nature of the coverage across genes and operons, thought to be the combined result of transcript secondary structure and biases intro-duced through random hexamer priming of reverse tran-scription and second strand synthesis [4,51]. This variability, which is generally reproducible between repli-cate experiments [11], introduces uncertainty into the quantification of RNA abundance. More even coverage has been achieved in eukaryotic datasets through redu-cing transcript secondary structure by using metal ion-catalyzed hydrolysis to fragment the RNA before reverse transcription [4], and it has been demonstrated that bacterial RNA can be fragmented in a similar manner [5,12,15,17]. There is also the issue of the PCR ampli-fication stages of sequence library construction for all three second generation sequencing platforms, which result in redundant sequence reads and bias the final dataset.

To circumvent such issues, techniques such as direct RNA sequencing [52] and FRT-seq [53], which sequence RNA directly without cDNA intermediates, have been developed. These promise to eventually replace current methods, but suffer from the disadvan-tage of requiring ribonuclease-free sequencing environ-ments, difficult to maintain in a high throughput sequencing facility. Efforts are also being made to reduce the quantity of starting material required for RNA-seq, with the aim of characterizing the transcriptomes of individual cells [54].

Conclusions

RNA-seq promises to gradually replace microarrays in most, if not all, genome-wide gene expression studies. Both technologies have their own limitations, but the opportunity to quantitatively study transcription to single nucleotide resolution makes RNA-seq increasingly attractive as sequencing become cheaper and easier. The use of protocols that sequence RNA in a strand-specific manner, and identify transcriptional start sites, will prove especially useful in the identification of ncRNA and defining the operons to which genes belong. Hence there is the potential for this technique to greatly refine our understanding of bacterial gene regulation.

Acknowledgements

We thank Adam Reid, Maria Fookes and Julian Parkhill for their comments on this manuscript. The authors are funded by the Wellcome Trust.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

of special interest of outstanding interest

1. Tettelin H, Riley D, Cattuto C, Medini D: Comparative

genomics: the bacterial pan-genome. Curr Opin Microbiol 2008, 11:472-477.

2. Cloonan N, Grimmond SM: Transcriptome content and dynamics at single-nucleotide resolution. Genome Biol 2008, 9:234.

3. Kane MD, Jatkoe TA, Stumpf CR, Lu J, Thomas JD, Madore SJ: Assessment of the sensitivity and specificity of

oligonucleotide (50mer) microarrays. Nucleic Acids Res 2000, 28:4552-4557.

4. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621-628.

5. Oliver HF, Orsi RH, Ponnala L, Keich U, Wang W, Sun Q, Cartinhour SW, Filiatrault MJ, Wiedmann M, Boor KJ: Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs. BMC Genomics 2009, 10:641.

6. Yoder-Himes DR, Chain PS, Zhu Y, Wurtzel O, Rubin EM, Tiedje JM, Sorek R: Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing.

Proc Natl Acad Sci U S A 2009, 106:3976-3981.

7. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol 2009, 191:3203-3211.

8. Camarena L, Bruno V, Euskirchen G, Poggio S, Snyder M: Molecular mechanisms of ethanol-induced pathogenesis revealed by RNA-sequencing. PLoS Pathog 2010, 6:e1000834.

9. Mao C, Evans C, Jensen RV, Sobral BW: Identification of new genes in Sinorhizobium meliloti using the Genome Sequencer FLX system. BMC Microbiol 2008, 8:72.

10. Croucher NJ, Fookes MC, Perkins TT, Turner DJ, Marguerat SB, Keane T, Quail MA, He M, Assefa S, Bahler J et al.: A simple method for directional transcriptome sequencing using Illumina technology. Nucleic Acids Res 2009, 37:e148.

11.

Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ et al.: A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 2009, 5:e1000569. An interesting analysis of the transcription of the accessory genome and non-coding RNAs of Salmonella enterica Typhi.

12.

Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, Chabas S, Reiche K, Hackermuller J, Reinhardt R et al.: The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 2010, 464:250-255.

A highly informative study, particularly with regard to the unexpectedly high levels of non-coding RNA expression and antisense transcription.

13.

Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons BA, Sorek R: A single-base resolution map of an archaeal transcriptome. Genome Res 2010, 20:133-141.

This description of transcription in an archeal organism makes an inter-esting comparison to those performed in eubacteria.

14. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR: Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 2008, 133:523-536.

15. Vivancos AP, Guell M, Dohm JC, Serrano L, Himmelbauer H: Strand-specific deep sequencing of the transcriptome. Genome Res 2010, 20:989-999.

(6)

16.

Guell M, van Noort V, Yus E, Chen WH, Leigh-Bell J,

Michalodimitrakis K, Yamada T, Arumugam M, Doerks T, Kuhner S et al.: Transcriptome complexity in a genome-reduced bacterium. Science 2009, 326:1268-1271.

This study is of particular interest because of the analysis of antisense RNA and alternative transcript forms observed across the genome.

17.

Filiatrault MJ, Stodghill PV, Bronstein PA, Moll S, Lindeberg M, Grills G, Schweitzer P, Wang W, Schroth GP, Luo S et al.: Transcriptome analysis of Pseudomonas syringae identifies new genes, noncoding RNAs, and antisense activity. J Bacteriol 2010, 192:2359-2372.

An interesting discussion of non-coding RNAs, antisense expression and transcriptional read-through of adjacent genes.

18.

Sittka A, Lucchini S, Papenfort K, Sharma CM, Rolle K, Binnewies TT, Hinton JC, Vogel J: Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS Genet 2008, 4:e1000163. A good example of the use of RNA-seq for a specific application, targeted at studying Hfq-mediated interactions between transcripts.

19. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G et al.: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 2008, 5:613-619.

20. He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW: The antisense transcriptomes of human cells. Science 2008, 322:1855-1857.

21. Armour CD, Castle JC, Chen R, Babak T, Loerch P, Jackson S, Shah JK, Dey J, Rohl CA, Johnson JM et al.: Digital

transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat Methods 2009, 6:647-649.

22. Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A: Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res 2009, 37:e123.

23. Liu JM, Livny J, Lawrence MS, Kimball MD, Waldor MK, Camilli A: Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing. Nucleic Acids Res 2009, 37:e46.

24. Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW, Delong EF: Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci U S A 2008, 105:3805-3810.

25. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I: Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One 2008, 3:e3042.

26. Urich T, Lanzen A, Qi J, Huson DH, Schleper C, Schuster SC: Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS One 2008, 3:e2527.

27. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456:53-59.

28. Ronaghi M, Uhlen M, Nyren P: A sequencing method based on real-time pyrophosphate. Science 1998, 281: 363, 365.

29. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate multiplex polony sequencing of an evolved bacterial genome. Science 2005, 309:1728-1732.

30. Hernandez D, Francois P, Farinelli L, Osteras M, Schrenzel J: De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res 2008, 18:802-809.

31. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res 2009, 19:1117-1123.

32. Maccallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP et al.: ALLPATHS 2: small genomes assembled accurately and with

high continuity from short paired reads. Genome Biol 2009, 10:R103.

33. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18:821-829.

34. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410.

35. Delcher AL, Phillippy A, Carlton J, Salzberg SL: Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 2002, 30:2478-2483.

36. Li H, Homer N: A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 2010doi: 10.1093/ bib/bbq015.

37. Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA databases. Genome Res 2001, 11:1725-1729.

38. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18:1851-1858.

39. Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010, 26:589-595.

40. Nicol JW, Helt GA, Blanchard SG Jr, Raja A, Loraine AE: The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 2009, 25:2730-2731.

41. Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Rajandream MA: Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics 2008, 24:2672-2676.

42. Carver T, Bohme U, Otto TD, Parkhill J, Berriman M: BamView: viewing mapped read alignment data in the context of the reference sequence. Bioinformatics 2010, 26:676-677.

43. Wang L, Feng Z, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 2010, 26:136-138.

44. Jiang H, Wong WH: Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 2009, 25:1026-1032.

45. Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26:139-140.

46. Anders S, Huber W: Differential expression analysis for sequence count data. Nat Precedings 2010doi: 10.1038/ npre.2010.4282.1.

47. Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 2010, 11:R25.

48. Shi Y, Tyson GW, DeLong EF: Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column. Nature 2009, 459:266-269.

49. Brantl S: Regulatory mechanisms employed by cis-encoded antisense RNAs. Curr Opin Microbiol 2007, 10:102-109.

50. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008, 18:1509-1517.

51. Hansen KD, Brenner SE, Dudoit S: Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 2010, 38(12):e131.

52. Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM: Direct RNA sequencing. Nature 2009, 461:814-818.

53. Mamanova L, Andrews RM, James KD, Sheridan EM, Ellis PD, Langford CF, Ost TW, Collins JE, Turner DJ: FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat Methods 2010, 7:130-132.

54. Tang F, Barbacioru C, Wang Y, Nordman E, Li C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA: mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 2009, 6(5):377-382.