RNA isolation, sequencing and read processing

Chapter 6: Genome sequencing of beer-spoiling organism Lactobacillus brevis BSO 464 and transcriptomic analysis for growth in degassed and gassed beer

6.4. RESULTS and DISCUSSION

6.4.3. RNA isolation, sequencing and read processing

Isolation of Lb464 mRNA from the experimental samples was deemed successful on the basis of Experion, Qubit quantification and qPCR assessments. Library preparation and Illumina paired-end sequencing of the mRNA at NRC PBI were also successful: a total of 170,794,473 reads were obtained from one lane with 12 samples multiplexed, whereas roughly 150 million reads are generally expected (22). After quality processing to discard reads shorter than 20 nt and/or with a Phred score < 30, the reads were mapped to the Lb464 genome with Bowtie 2, and a high percentage (91 to 99%) of total reads mapped to the genome (Table 6.2). Of the aligned,

Table 6.2. Bowtie 2 alignment of RNA sequencing reads for Lb464

| Sample^a | Total paired reads, unfiltered^b | Total paired reads, QC^c | % aligned reads^d, unfiltered | % aligned reads^d, QC | % rRNA reads^e, unfiltered | % rRNA reads^e, QC | % annotated CDS^f, unfiltered | % annotated CDS^f, QC | # single reads mapping to CDS^g, QC |
|---|---|---|---|---|---|---|---|---|---|
| L-deg-I | 14,536,859 | 14,418,996 | 98.2 | 98.7 | 15.8 | 15.9 | 74.9 | 75.4 | 21,748,136 |
| L-deg-II | 13,954,174 | 13,954,174 | 99.1 | 99.1 | 18.0 | 18.2 | 74.5 | 75.1 | 20,985,400 |
| L-gas-I | 13,775,070 | 13,658,680 | 98.4 | 99.0 | 0.3 | 0.3 | 88.8 | 89.4 | 24,425,970 |
| L-gas-II | 14,354,212 | 14,240,958 | 97.9 | 98.5 | 0.8 | 0.8 | 85.9 | 90.1 | 24,620,116 |
| L-mM-I | 14,482,096 | 13,652,921 | 89.4 | 94.9 | 76.4 | 81.1 | 11.5 | 12.2 | 3,324,704 |
| L-mM-II | 14,465,266 | 14,027,270 | 88.9 | 91.6 | 75.0 | 77.3 | 12.2 | 12.6 | 3,537,528 |

a Samples are denoted “L-” for Lb464; “deg” and “gas” are degassed beer and gassed beer, respectively; “mM” is mMRS pH 5.5; and “I” and “II” denote replicates.

b Values obtained from raw sequencing files with no prior quality processing or trimming.

c Values obtained following processing of raw sequencing reads to remove low quality reads via FastQC.

d Percentage of all reads aligned to the Lb464 genome using Bowtie 2 alignment.

e Percentage of aligned paired-end reads corresponding to rRNA genes.

f Percentage of non-rRNA aligned paired-end reads corresponding to annotated CDS regions.

g Total number of high-quality, single read fragments aligning to CDS regions.


quality-controlled (QC) reads, between 0.3 and 81% mapped to rRNA regions across samples. Such levels of rRNA reads, even after physical rRNA removal, are not unexpected, although the wide range in values across experimental samples is surprising given that all samples were processed identically (21, 22). This large disparity in rRNA removal efficiency is likely because more rapidly growing cells, such as Lb464 cells in the L-mM controls compared with those in beer, require more ribosomes to meet the demand for increased protein synthesis. Further, the efficiency of rRNA removal may be affected by the total RNA extraction efficiency of these samples. For instance, the extraction of quality RNA from “L-gas” samples (Table 6.2) was extremely difficult given the physiological adaptation of the Lb464 cells to the harsh beer environment, with fewer cells grown in this medium (10; Chapter 2). In fact, following rRNA removal, only the absolute minimum of mRNA allowable for library preparation remained for “L-gas” samples. In contrast, RNA extraction from Lb464 cells grown in the control MRS medium was comparatively easy and yielded more total RNA. Higher amounts of total RNA in some samples mean that more rRNA molecules are present and enter the rRNA removal procedure, which decreases the efficiency with which the rRNA can be removed (Table 6.2).
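The read-level quality filter described at the start of this section (discarding reads shorter than 20 nt and/or with a Phred score < 30) can be illustrated with a minimal Python sketch. This is not the pipeline used in this work. The function names, the use of a per-read mean Phred score, and the Phred+33 quality encoding are all assumptions made for illustration, since the text does not specify how the score threshold was applied.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH = 20    # nt, threshold stated in the text
MIN_PHRED = 30     # Phred threshold stated in the text
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_phred(quality: str, offset: int = PHRED_OFFSET) -> float:
    """Mean Phred score of a non-empty quality string."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(seq: str, quality: str) -> bool:
    """Apply the length check first, then the (assumed) mean-quality check."""
    if len(seq) < MIN_LENGTH:
        return False
    return mean_phred(quality) >= MIN_PHRED


def filter_reads(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only the records that pass both thresholds."""
    for header, seq, quality in records:
        if passes_qc(seq, quality):
            yield header, seq, quality


# Hypothetical reads: "I" encodes Phred 40 and "5" encodes Phred 20.
reads = [
    ("@r1", "A" * 25, "I" * 25),  # long enough, high quality
    ("@r2", "A" * 10, "I" * 10),  # too short
    ("@r3", "A" * 25, "5" * 25),  # long enough, low quality
]
kept_headers = [header for header, _, _ in filter_reads(reads)]
```

In this toy input only `@r1` is retained. A filter based on per-base trimming rather than a per-read mean would behave differently, and would also be consistent with the description in the text.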

rRNA accounts for 80 to 95% of total bacterial RNA, thus representing a large percentage of available reads, even with rRNA-removal steps performed during the mRNA preparation (21, 22). Any rRNA reads that remain can skew analysis away from recognizing significantly expressed small or rare RNA transcripts. Therefore, rRNA and tRNA genes were removed from the Lb464 annotation files prior to counting the number of reads that mapped to CDS features and undertaking downstream differential expression analysis (36).
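Removing rRNA and tRNA genes from an annotation file before read counting can be sketched as follows. This is only an illustration under stated assumptions: it assumes a tab-delimited GFF-style annotation in which the third column holds the feature type, and the function name and example records are hypothetical. The format of the actual Lb464 annotation files is not given in the text.

```python
from typing import Iterable, Iterator

# Feature types excluded before counting reads on CDS features.
EXCLUDED_TYPES = frozenset({"rRNA", "tRNA"})


def drop_rna_features(
    gff_lines: Iterable[str],
    excluded: frozenset = EXCLUDED_TYPES,
) -> Iterator[str]:
    """Yield annotation lines whose feature type is not in `excluded`."""
    for line in gff_lines:
        # Keep comment, directive and blank lines unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is the feature type in GFF-style files.
        if len(fields) >= 3 and fields[2] in excluded:
            continue
        yield line


# Hypothetical annotation records, for illustration only.
example = [
    "##gff-version 3",
    "chr\tsrc\tCDS\t1\t900\t.\t+\t0\tID=cds1",
    "chr\tsrc\trRNA\t1000\t2500\t.\t+\t.\tID=rrna1",
    "chr\tsrc\ttRNA\t2600\t2675\t.\t-\t.\tID=trna1",
]
kept = list(drop_rna_features(example))
```

Here `kept` contains the directive line and the CDS record. Annotations that also carry parent `gene` records for rRNA and tRNA loci would need an additional check on the attributes column, which this sketch does not attempt.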

The disparity in, and relatively low rate of, mapping of the remaining non-rRNA reads to annotated CDS regions across samples initially appears concerning. For each sample, a large proportion of reads align to the genome but fall in “no feature” regions, i.e., they map to intergenic regions (see section 6.4.4). This indicates potential genomic DNA (gDNA) contamination of these RNA samples, which is highly surprising given that assessment of the samples prior to sequencing by both Qubit quantification and qPCR indicated negligible levels of DNA. BLAST analysis of these reads revealed that they do belong to Lb464, which, together with the high level of initial alignment to the genome, confirms that this contamination is not from an outside source. Instead, either DNase treatment was not as efficient as the quantitative readings indicated or, more likely, the great sequencing depth of these samples allowed detection of very low levels of genomic DNA (1, 21). Once more, the effect of high total RNA yield is observed in relation to increased rRNA and potential gDNA contamination levels, with mMRS samples having higher proportions of gDNA-mapping reads and thus a lower proportion of reads belonging to CDS regions.

When “no feature” reads are added to the number of reads mapping to CDS regions, there is still a proportion of QC reads that do not align to the genome (between 1 and 4% of reads). This small proportion of non-aligning reads is similar to that reported in previous studies, where roughly 5% of reads did not align to the genome (37), and these reads are likely artificial sequencing chimeras (29).

Regardless of inefficient rRNA removal and the presence of gDNA, the number of quality, non-rRNA read pairs that map to Lb464 CDS loci is still sufficient for detection in all samples, as previous studies have found that between 5 and 10 million non-rRNA fragments allow detection of all but a few of the most lowly expressed genes in diverse bacteria growing under a variety of conditions (21). The same study also found that the use of biological replicates allows differential expression analysis of genes with high statistical significance, even when the number of reads per sample is reduced to 2 to 3 million, as was the case for the mMRS samples (“L-mM”; Table 6.2) (21).
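The depth comparison above can be made explicit with a short Python sketch that sorts the QC read counts from the last column of Table 6.2 against the thresholds cited from (21). The category labels and cut-off constants are assumptions introduced for illustration, and treating the single-read counts in Table 6.2 as directly comparable to the fragment counts in (21) is also an assumption.

```python
# QC single reads mapping to CDS regions, copied from Table 6.2.
CDS_READS = {
    "L-deg-I": 21_748_136,
    "L-deg-II": 20_985_400,
    "L-gas-I": 24_425_970,
    "L-gas-II": 24_620_116,
    "L-mM-I": 3_324_704,
    "L-mM-II": 3_537_528,
}

FULL_DEPTH = 5_000_000     # lower bound of the 5 to 10 million range cited from (21)
REDUCED_DEPTH = 2_000_000  # lower bound of the 2 to 3 million range cited from (21)


def depth_category(n_reads: int) -> str:
    """Classify a read count against the two cited depth thresholds."""
    if n_reads >= FULL_DEPTH:
        return "full"
    if n_reads >= REDUCED_DEPTH:
        return "reduced"
    return "low"


categories = {sample: depth_category(n) for sample, n in CDS_READS.items()}
```

Under these assumptions the beer samples fall in the "full" category and both mMRS samples fall in the "reduced" category, in line with the reliance on biological replicates described above.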

In document Plasmid analysis, comparative genomics and transcriptomics of beer-spoilage lactic acid bacteria emphasizing the role of dissolved carbon dioxide and traditional beer-spoilage markers (Page 187-190)