An introduction to the RNA-seq protocol

The use of high throughput sequencing techniques for analysing RNA has provided another way in which sequencing can probe the mechanisms within the cell through which the genetic code is involved in the control of almost every aspect of the cell's functioning.

There are a variety of different types of RNA in the cell, and variants of the RNA sequencing protocol have been developed that are tailored to each type of RNA species. For example, the growing interest in the role of microRNAs [80] has resulted in the increased use of protocols tailored to the sequencing of these microRNAs [31]. The lengths of these RNAs mean that the fragmentation stage that is necessary in DNA sequencing is not required, as each microRNA is short enough to be sequenced in its entirety.

However, the first and still the dominant RNA based protocol is RNA-seq, in which the RNA is extracted from the cell, purified and then fragmented. By sequencing the ends of the resulting fragments, a picture can be constructed of the nucleotide sequences of the RNA molecules found in the cell and of their relative abundance. The primary purpose of this procedure is to study messenger RNA (mRNA), where it can be used to measure the expression levels of genes and to investigate the different transcript variants [73, 98]. The results can then be used to refine the genome annotation and to provide more detail on differential transcript expression.

RNA sequencing protocols share many elements with ChIP-seq-like protocols, but there are a number of specific and important differences. In particular, high throughput sequencing protocols have been designed around the processing and sequencing of DNA, and many stages of the procedure, such as amplification by PCR, are currently only possible with DNA. Consequently, a necessary step in the sequencing of RNA fragments is the conversion of the RNA into its complementary DNA sequence using the reverse transcriptase enzyme, so that it can then be amplified and sequenced [98]. Reverse transcriptase can only initiate transcription at a location on the RNA where a complementary strand of DNA is already bound, which acts as a primer for the process.
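The base pairing that underlies this conversion can be sketched in a few lines of Python. This is a minimal illustration only: the function name `first_strand_cdna` and the example sequence are assumptions made for the sketch, and it idealises the process by assuming that the whole RNA fragment is copied, which is not generally the case in practice.

```python
# Watson-Crick pairing from an RNA template base to the DNA base that pairs with it.
RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(rna_5to3: str) -> str:
    """Return the DNA strand complementary to an RNA sequence, written 5'->3'.

    Hypothetical helper for illustration: assumes the entire RNA is copied.
    """
    # The two strands are antiparallel, so the template is read in reverse
    # while each base is replaced by its pairing partner.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3))


if __name__ == "__main__":
    rna = "ACGGGAAUUACGGAAUUACCCGUUUCG"  # illustrative 27-nucleotide fragment
    cdna = first_strand_cdna(rna)
    print("RNA  5'", rna, "3'")
    print("cDNA 5'", cdna, "3'")
```

The sketch only captures sequence complementarity; it says nothing about where the enzyme is able to start, which is the subject of the priming step.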

The problem to be solved when converting the RNA fragments to DNA is that the sequences of the fragments are all very different, making it difficult to design a primer, or set of primers, that will bind to the RNA and allow all of the fragments to be converted to DNA in an unbiased way.

The solution that has been widely adopted is to use a set of DNA primers that are six nucleotides long and are a random mix of the 4096 (4^6) different hexamer sequences [32] (footnote 1). A six nucleotide primer is long enough to enable reverse transcriptase to bind and start the reverse transcription process, and the mix will contain a hexamer that can bind at any position in an RNA fragment.

1. The primary subject of this journal article was the hypomethylation of cancer genes. However, the article now has almost 1000 citations, almost all of which refer to the random primer method that was developed for the experiment and is incidental to the main purpose of the article.
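The count of 4096 hexamers, and the claim that some hexamer in the mix can pair with any position in an RNA fragment, can be checked with a short Python sketch. The function name `priming_hexamer` and the example RNA sequence are assumptions introduced for illustration; the sketch considers exact base pairing only and ignores any differences in binding efficiency between hexamers.

```python
from itertools import product

# Every DNA sequence of length six: 4 choices at each of 6 positions.
HEXAMERS = {"".join(p) for p in product("ACGT", repeat=6)}

# Translation table from an RNA base to the DNA base that pairs with it.
_PAIR = str.maketrans("ACGU", "TGCA")


def priming_hexamer(rna_5to3: str, start: int) -> str:
    """Return the DNA hexamer (5'->3') that pairs with rna[start:start+6].

    Hypothetical helper; `start` is a 0-based position on the RNA.
    """
    site = rna_5to3[start:start + 6]
    if len(site) != 6:
        raise ValueError("priming site must lie fully inside the fragment")
    # Complement each base, then reverse because the strands are antiparallel.
    return site.translate(_PAIR)[::-1]


if __name__ == "__main__":
    rna = "ACGGGAAUUACGGAAUUACCCGUUUCG"  # illustrative fragment
    print(len(HEXAMERS))  # number of distinct hexamers
    # Every six-nucleotide window of the fragment has a partner in the mix.
    print(all(priming_hexamer(rna, i) in HEXAMERS for i in range(len(rna) - 5)))
```

Because the set contains every possible six-base sequence, the second check succeeds for any RNA sequence over A, C, G and U.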

[Figure 1-7: diagram in seven panels, a) to g), showing random DNA hexamers binding to an example RNA fragment (5' ACGGGAAUUACGGAAUUACCCGUUUCG 3'), first strand synthesis by reverse transcriptase, second strand synthesis by DNA polymerase, and the double stranded DNA after the overhangs are removed.]

Figure 1-7 RNA fragment is converted to a slightly shorter double stranded DNA fragment during reverse transcription. a) Random DNA hexamers bind to RNA. b) Reverse transcriptase completes a complementary DNA strand. c) Completed first DNA strand. d) RNA removed with RNase and random hexamer binds to DNA. e) DNA polymerase completes second strand. f) Second strand completed. g) Overhangs removed with T4 DNA polymerase and Klenow DNA polymerase. The DNA fragment is slightly shorter than the original RNA, the degree of shortening being determined by the positions where the hexamers bind.

The conversion is a two stage process. In the first stage, complementary DNA is added to the single stranded RNA fragments in order to create a double stranded polynucleotide that is a pairing of DNA and RNA. These are then separated, and the single stranded DNA converted to double stranded DNA using DNA polymerase (Figure 1-7).

Both of these processes require a DNA primer to be bound to the RNA or DNA template so that the enzyme can bind and reverse transcription or polymerisation can proceed. In both cases primers will bind at multiple locations, and synthesis will proceed from each one until the enzyme reaches the next primer, so that the new strand is completed right up to that primer. In both cases, however, synthesis can only start from the position where the first primer bound, and this may not be right at the start of the template RNA or DNA fragment. A consequence is that the ends of the final DNA fragment that is sequenced will not correspond to the ends of the fragment originally formed when the RNA was fragmented.
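The effect of primer position on the fragment ends can be sketched as a simple calculation on coordinates. The function `converted_interval`, its arguments and the primer positions in the example are all assumptions made for illustration. The sketch further assumes that the pieces synthesised from neighbouring primers join into one continuous strand, and that positions are 0-based and measured along the original RNA in its 5' to 3' direction.

```python
def converted_interval(fragment_len, first_starts, second_starts, k=6):
    """Return the half-open (start, end) span of the RNA fragment that
    survives as double stranded DNA. Hypothetical model for illustration.

    first_starts  -- RNA positions where hexamers bind for the first strand
    second_starts -- positions (same coordinates) where hexamers bind to the
                     first DNA strand for second strand synthesis
    k             -- primer length (six for random hexamers)
    """
    if not first_starts or not second_starts:
        raise ValueError("each stage needs at least one bound primer")
    for s in list(first_starts) + list(second_starts):
        if s < 0 or s + k > fragment_len:
            raise ValueError("primer must lie fully inside the fragment")

    # First strand: synthesis runs towards the RNA 5' end, so nothing beyond
    # the primer nearest the RNA 3' end is copied.
    first_end = max(first_starts) + k

    # Second strand: primers can only bind where first strand DNA exists.
    usable = [s for s in second_starts if s + k <= first_end]
    if not usable:
        raise ValueError("no second strand primer lies on the first strand")

    # Synthesis runs in the opposite direction, so nothing before the primer
    # nearest the RNA 5' end is copied.
    return min(usable), first_end


if __name__ == "__main__":
    start, end = converted_interval(27, first_starts=[5, 19], second_starts=[1])
    print(start, end, end - start)  # span and length of the final fragment
```

With these illustrative positions a 27 nucleotide fragment yields a 24 base pair product, losing one nucleotide at one end and two at the other. The amount lost at each end depends only on where the outermost primers happen to bind.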

In document Informative sequence based models for fragment distributions in ChIP seq, RNA seq and ChIP chip data (Page 36-39)