
5 DISCUSSION

5.1 Optimization of small RNA deep sequencing

Deep sequencing of small RNAs is a very powerful method for discovering new RNAs and for quantifying small RNA expression profiles. Because the cost of next-generation sequencing is still considerable, efficient strategies for sequencing pooled libraries are essential. Continuous technical advances are steadily increasing the number of reads obtained per experiment, so sequencing multiplexed libraries can now yield sufficient depth for most applications. Multiplexing relies on specific sequence tags, called bar codes, that are introduced into each library and allow the reads to be sorted bioinformatically afterwards. These bar code sequences can be introduced either during the ligation steps or during PCR amplification.
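As a minimal illustration of how inline bar codes allow pooled reads to be sorted, the Python sketch below assigns each read to a sample by an exact match of its first few nucleotides and trims the bar code off. The 4-nt bar codes, the sample names and the function `demultiplex` are hypothetical. The sketch assumes the bar code is read at the 5’ start of each read, as in a ligation-based design, and it ignores sequencing errors, which practical demultiplexing tools commonly tolerate by allowing mismatches.

```python
from collections import defaultdict

# Hypothetical 4-nt inline bar codes mapped to hypothetical sample names.
BARCODES = {
    "ACGT": "G1",
    "TGCA": "S",
    "GATC": "G2/M",
}


def demultiplex(reads, barcodes, bc_len=4):
    """Sort reads into samples by an exact-match bar code at the 5' end.

    The bar code is trimmed from each assigned read; reads whose prefix
    matches no bar code are kept untrimmed under the key None.
    """
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:bc_len])
        # Trim the bar code only when it was recognised.
        insert = read[bc_len:] if sample is not None else read
        bins[sample].append(insert)
    return dict(bins)


reads = ["ACGTTTGGCAAGAT", "TGCAGCTAGCTAAC", "NNNNACGTACGT"]
for sample, inserts in demultiplex(reads, BARCODES).items():
    print(sample, inserts)
```

Exact matching keeps the sketch short; with short bar codes a single sequencing error can move a read into the wrong sample, which is one reason bar code sets are usually designed with a minimum pairwise distance.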

Our first multiplexed libraries, prepared to examine small RNAs during the cell cycle, were constructed by introducing bar codes within the linker appended to the 5’ end of small RNAs. We observed large biases caused both by the different adapter oligonucleotides used in the 3’ ligation and by the 5’ bar codes (Table 4.1, Figure 4.3). The abundance of the miRNA bantam and its length profile agreed when the same bar code was compared across different cell cycle phases, but not when different bar codes were compared within the same phase (Figure 4.3). Consistent with our results, miRNA profiles generated with the same 5’ ligation bar code from two different biological conditions (normal and diseased mouse heart) agreed more closely than profiles generated with different bar codes from the same tissue (Alon, Vigneault et al. 2011). Furthermore, by comparing libraries with different 3’ ligation adapters but the same 5’ ligation bar code, we concluded that the miR-184 bias in the G1 cell cycle phase is caused by the 3’ ligation adapter. Together, these results indicate that adapter ligation to small RNAs shows a sequence preference, or is possibly sequence-dependent. Small RNA libraries from 293T and mES cells, generated with a pool of various 5’ and 3’ adapters, showed that each miRNA seems to have a favored adapter pair, supporting this hypothesis (Jayaprakash, Jabado et al. 2011).
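One simple way to quantify the kind of agreement between libraries discussed above is to correlate their normalized miRNA profiles. The sketch below converts raw counts to log2(reads per million + 1) and computes a Pearson correlation over the miRNAs present in both libraries. The library names and counts are invented for illustration and do not come from Table 4.1 or Figure 4.3, and this is not necessarily the comparison performed in this work or by Alon et al.

```python
import math


def log_rpm(counts):
    """Convert raw miRNA read counts to log2(reads per million + 1)."""
    total = sum(counts.values())
    return {mirna: math.log2(c / total * 1e6 + 1) for mirna, c in counts.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length lists (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_correlation(lib_a, lib_b):
    """Correlate log-RPM profiles over the miRNAs detected in both libraries."""
    a, b = log_rpm(lib_a), log_rpm(lib_b)
    shared = sorted(set(a) & set(b))
    return pearson([a[m] for m in shared], [b[m] for m in shared])


# Hypothetical counts: bar code A in two phases, bar code B in one phase.
bcA_G1 = {"bantam": 5000, "miR-184": 300, "miR-8": 800, "miR-1": 100}
bcA_S = {"bantam": 4800, "miR-184": 350, "miR-8": 750, "miR-1": 120}
bcB_G1 = {"bantam": 600, "miR-184": 4000, "miR-8": 200, "miR-1": 900}

print("same bar code, different phase:", profile_correlation(bcA_G1, bcA_S))
print("different bar code, same phase:", profile_correlation(bcA_G1, bcB_G1))
```

Normalizing to reads per million removes differences in sequencing depth between pooled libraries, so the remaining disagreement reflects composition, which is where ligation bias shows up.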

Our first sequencing round consisted of four pooled libraries and resulted in an overrepresentation of bantam, miR-184 and miR-8 to different extents (Table 4.1A). Lowering the number of PCR amplification cycles did not reduce these artifacts, indicating that PCR is not the main source of these biases. Supporting our results, no significant improvement was observed in total RNA libraries from 293T cells after the number of PCR cycles was reduced from 25 to 18 (Jayaprakash, Jabado et al. 2011).
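The reasoning for excluding PCR can be made explicit with a simple exponential model. If two templates start at equal abundance and amplify with constant per-cycle efficiencies e_A and e_B, their ratio after n cycles is ((1 + e_A)/(1 + e_B))^n, so a PCR-driven bias compounds with every cycle and should shrink noticeably when n is reduced. The efficiencies below are hypothetical, and the model ignores the plateau phase and other non-ideal behaviour of real PCR.

```python
def pcr_bias_ratio(eff_a, eff_b, cycles):
    """Fold over-representation of template A relative to B after PCR.

    Assumes equal starting copy numbers and a constant per-cycle
    efficiency (0..1) for each template, so copies grow as (1 + eff)**cycles.
    """
    return ((1 + eff_a) / (1 + eff_b)) ** cycles


# Hypothetical efficiencies: 95% vs 90% per cycle.
for n in (25, 18):
    print(n, "cycles:", round(pcr_bias_ratio(0.95, 0.90, n), 2))
```

Under this model, a bias that persists unchanged when the cycle number drops is more consistent with an origin upstream of PCR, for example at the ligation steps.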

In addition, others have shown that neither reverse transcription nor the sequencing technology introduces prominent biases, and concluded that the biases in read distribution are caused primarily by the T4 RNA ligases (Hafner, Renwick et al. 2011). These enzymatic reactions are sensitive, to varying degrees, to the sequence and structure of their substrates. Indeed, RNA secondary structure was shown to affect the efficiency of both the 3’ and the 5’ adapter ligation steps (Hafner, Renwick et al. 2011; Sorefan, Pais et al. 2012). Thus, RNAs with distinct structures differ in their reactivity during adapter ligation, which substantially affects read frequencies, and small RNAs locked in a stable, nonreactive secondary structure are at risk of being excluded from the libraries. The two families of RNA end-joining enzymes contribute differently to the ligation bias: Rnl2 favors single-stranded (ss) nucleotides downstream of the ligation site and double-stranded (ds) nucleotides upstream of it, whereas Rnl1 strongly prefers a single-stranded ligation site (Sorefan, Pais et al. 2012). The thermodynamic stability of secondary structure also depends on modifications of the nucleic acid backbone. To reduce the effect of secondary structure to some extent, we used a modified DNA 3’ adapter. Chemical pre-adenylation of 5’-phosphorylated donor molecules was shown to extend the range of substrates amenable to RNA ligation (England, Gumport et al. 1977). Other modifications of small RNAs, for example 2’-O-methylation of the 3’-terminal nucleotide, were shown to reduce ligation efficiency and thereby their representation in the library (Munafo and Robb 2010). Recently, a pooled adapter approach was proposed to overcome the limitations of RNA ligase bias, using various 5’ and 3’ adapters with added nucleotides at the ligating 3’ end of the 5’ adapter and the 5’ end of the 3’ adapter (Jayaprakash, Jabado et al. 2011).
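The structural argument above can be illustrated with a crude accessibility check. Secondary structure predictions, for example from the RNAfold program of the ViennaRNA package, are commonly written in dot-bracket notation, where "." marks an unpaired nucleotide and matching brackets mark base pairs. The sketch below parses such a string and asks whether the 3’-terminal nucleotides, where the 3’ adapter would be ligated, are mostly single-stranded. The window size, the 0.5 threshold and the function names are hypothetical choices; this does not model the distinct Rnl1 and Rnl2 preferences described by Sorefan et al.

```python
def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def unpaired_fraction(structure, start, end):
    """Fraction of positions in [start, end) that are single-stranded."""
    pairs = pair_table(structure)
    window = range(start, end)
    return sum(1 for i in window if i not in pairs) / len(window)


def three_prime_end_accessible(structure, window=3, threshold=0.5):
    """Heuristic: is the 3'-terminal window of the RNA mostly unpaired?"""
    n = len(structure)
    return unpaired_fraction(structure, n - window, n) >= threshold


# Hypothetical structures: a blunt hairpin vs. a hairpin with a 3' overhang.
print(three_prime_end_accessible("((((....))))"))
print(three_prime_end_accessible("((((....))))..")) 
```

A small RNA whose 3’ end is buried in a stem, as in the first structure, is the kind of substrate expected to be underrepresented after 3’ adapter ligation.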

In the second part of the thesis, the bar codes were introduced during the PCR step. This greatly reduced the variability between different bar codes, but certain artifacts remained. We observed a strikingly large proportion of reads originating from only four transposable elements (roo, 297, TNFB and blood; Table 4.4), which raises the question of whether their abundance reflects the biological situation. The roo transposon gives rise to the most abundant ovarian piRNAs (Li, Vagin et al. 2009). We observed that roo also generates highly abundant small RNAs in the soma, but their preference for the sense orientation suggests that degraded roo mRNA might contribute to this signal. We concluded that blood represents a technical artifact, since one specific sequence that mapped to a defined position was present exclusively in the homozygous r2d2 mutant library. These four mobile elements were therefore filtered out, and the remaining reads were analyzed further. Taken together, the overrepresentation of specific small RNAs was substantially lower when bar codes were introduced during PCR rather than during 5’ ligation. Another study that introduced bar codes by PCR almost completely suppressed bar code bias (Alon, Vigneault et al. 2011). In conclusion, introducing bar codes during PCR allows more reliable detection of differentially expressed small RNAs.
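The two checks described in this paragraph, the strand bias of reads from a transposable element and the removal of the four elements before further analysis, can be sketched as follows. The record format (read identifier, element name, strand, with "+" meaning sense relative to the element annotation) and the function names are hypothetical; in practice such records would be derived from the read alignments.

```python
from collections import Counter

# The four elements filtered out before further analysis.
EXCLUDED = {"roo", "297", "TNFB", "blood"}


def sense_fraction(alignments, element):
    """Fraction of reads mapping to `element` in sense ("+") orientation."""
    strands = [strand for _, te, strand in alignments if te == element]
    if not strands:
        return float("nan")  # element not observed
    return strands.count("+") / len(strands)


def filter_elements(alignments, excluded=EXCLUDED):
    """Drop alignments to excluded elements; return kept records and removed counts."""
    kept, removed = [], Counter()
    for record in alignments:
        if record[1] in excluded:
            removed[record[1]] += 1
        else:
            kept.append(record)
    return kept, removed


# Hypothetical alignment records: (read_id, element, strand).
alns = [("r1", "roo", "+"), ("r2", "roo", "+"), ("r3", "roo", "-"),
        ("r4", "copia", "-"), ("r5", "blood", "+")]
print("roo sense fraction:", sense_fraction(alns, "roo"))
kept, removed = filter_elements(alns)
print(len(kept), "records kept; removed:", dict(removed))
```

A sense fraction close to 1 for a given element would point to mRNA degradation products rather than antisense-biased piRNAs, which is the distinction drawn above for roo.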