1.4 Next Generation Sequencing

1.4.1. NGS technologies

Pyrosequencing: Roche 454

Roche 454 was the first commercially successful NGS system. It is based on pyrosequencing, which relies on detecting the pyrophosphate released during nucleotide incorporation. The DNA library is prepared with specific adaptors, denatured into single strands and captured on amplification beads, followed by emulsion PCR [156]. On a picotiter plate, one type of dNTP is then supplied at a time; where it complements the next bases of the template strand, it is incorporated and pyrophosphate is released in an amount equal to the number of incorporated nucleotides [156]. The ATP generated from this pyrophosphate drives the conversion of luciferin into oxyluciferin, which emits visible light, while the unmatched nucleotides are degraded. Another dNTP is then added to the reaction system and the pyrosequencing reaction is repeated [156]. The main advantages of the Roche system are its speed, its long read length and its amenability to automation. However, the high cost of reagents remains a challenge, and the error rate is relatively high in homopolymer stretches longer than 6 bp [156].
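The flow-by-flow logic described above can be illustrated with a minimal sketch. It assumes an idealized, noise-free reaction in which the light signal equals the number of nucleotides incorporated in a flow; the function names and the cyclic flow order "TACG" are hypothetical choices made for illustration, not taken from the instrument software.

```python
# Idealized pyrosequencing model: one dNTP is supplied per flow and the light
# signal equals the number of bases incorporated (assumption: no noise).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flowgram(template, flow_order="TACG", max_flows=200):
    """Return a list of (nucleotide, signal) pairs, one per flow.

    `template` is given in the order in which the polymerase reads it.
    `flow_order` and `max_flows` are hypothetical illustration parameters.
    """
    # Bases that must be incorporated, in order, to copy the template.
    to_incorporate = [COMPLEMENT[base] for base in template]
    flows = []
    position = 0
    flow_index = 0
    while position < len(to_incorporate) and flow_index < max_flows:
        nucleotide = flow_order[flow_index % len(flow_order)]
        signal = 0
        # A homopolymer stretch is incorporated within a single flow.
        while position < len(to_incorporate) and to_incorporate[position] == nucleotide:
            signal += 1
            position += 1
        flows.append((nucleotide, signal))
        flow_index += 1
    return flows


def flowgram_to_read(flows):
    """Rebuild the synthesized strand from an idealized flowgram."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


if __name__ == "__main__":
    flowgram = simulate_flowgram("AACGT")
    print(flowgram)
    print(flowgram_to_read(flowgram))
```

In this model a homopolymer stretch produces a single, proportionally larger signal, which suggests why long stretches are harder to call once real signals are noisy.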

Sequencing-by-ligation: SOLiD

The Applied Biosystems SOLiD (Sequencing by Oligo Ligation Detection) sequencer is based on sequencing by ligation. The libraries are sequenced on a flowcell using 8-base ligation probes, each of which contains a ligation site (1st base), a cleavage site (5th base) and one of 4 different fluorescent dyes (linked to the last base) [156]. The fluorescent signal is recorded when a probe ligates to its complementary position on the template strand and is then removed by cleaving off the probe’s last 3 bases. After 5 rounds of sequencing, the sequence of the fragment can be deduced using ladder primer sets [156]. The main advantage of this method is its high accuracy, while its principal drawbacks are the short read length and its unsuitability for de novo sequencing [156]. A complete run can be finished within 7 days, and library preparation can be automated. Its applications include WGS, targeted resequencing, transcriptome research and epigenome analysis [156].
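Why five rounds with ladder primers are enough can be seen from a deliberately simplified sketch. It assumes that each ligation advances the read by 5 bases (the probe is cleaved after its 5th base) and that each round starts one base further along than the previous one. It tracks only one interrogated position per ligation and ignores how the dyes encode the bases; the function name and the 0-based offsets are hypothetical.

```python
# Simplified view of ladder primers in sequencing by ligation.
# Assumptions: 5-base step per ligation, one interrogated position per
# ligation, primer offsets 0..4 (hypothetical indexing).

def interrogated_positions(read_length, step=5, primer_rounds=5):
    """Map each primer round (offset) to the template positions it reads."""
    return {
        offset: list(range(offset, read_length, step))
        for offset in range(primer_rounds)
    }


if __name__ == "__main__":
    rounds = interrogated_positions(read_length=10)
    for offset, positions in rounds.items():
        print(f"primer offset {offset}: positions {positions}")
    covered = sorted(p for positions in rounds.values() for p in positions)
    print("all positions covered:", covered == list(range(10)))
```

A single round leaves gaps of 4 bases between interrogated positions; the shifted primers of the remaining rounds fill those gaps.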

Sequencing by synthesis: Illumina

Illumina systems are based on sequencing by synthesis. The company's most widely used sequencer is the HiSeq, which is comparable to the aforementioned systems. In addition, there are other platforms with different scales, including NextSeq and MiSeq, the latter a compact sequencer with fast turnaround but limited data throughput. All of Illumina’s instruments are based on the same principles. Given that the NGS platform used in this thesis was the HiSeq 2000, this technology is shown in Figure 1.7 and is explained in more depth below.

First, the DNA library (a collection of DNA fragments for sequencing) is typically prepared by fragmenting a genomic DNA sample and ligating specialized adapters to both fragment ends. An alternative, called tagmentation, combines the fragmentation and ligation reactions into a single step. Adapter-ligated fragments are then PCR amplified and gel purified. The details of library preparation depend on the NGS application [157]. The DNA library prepared with fixed adaptors is then denatured to single strands and grafted to the flowcell, followed by bridge amplification to form clusters that contain clonal DNA fragments [156]. Before sequencing, the clusters are converted into single strands with the help of a linearization enzyme. Four kinds of nucleotides, each carrying a different cleavable fluorescent dye and a removable blocking group, then complement the template one base at a time. Finally, the emitted signals are captured by a charge-coupled device [156].

Compared with Roche 454 and SOLiD, the HiSeq 2000 has the lowest reagent cost and the largest output, while the SOLiD system has the highest accuracy and the Roche 454 system has the longest read length [156].
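Because the blocking group allows only one base to be added per cycle, each cluster yields one fluorescent signal per cycle. The following minimal sketch shows how a read could be assembled from such signals. It assumes four already-separated intensity values per cycle and simply picks the brightest channel; the function name and the intensity values are hypothetical, and the instrument's actual base caller is considerably more involved.

```python
# Naive base calling for one cluster: in each cycle, call the base whose
# channel is brightest (assumption: channels are already separated).

def call_cluster(cycles):
    """Return the read for one cluster.

    `cycles` is a list with one dict per sequencing cycle, mapping each
    base ("A", "C", "G", "T") to its measured intensity.
    """
    read = []
    for channel_intensities in cycles:
        # max() over the dict keys, ranked by their intensity values.
        brightest = max(channel_intensities, key=channel_intensities.get)
        read.append(brightest)
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical intensities for three cycles of a single cluster.
    example_cycles = [
        {"A": 0.91, "C": 0.12, "G": 0.05, "T": 0.03},
        {"A": 0.08, "C": 0.10, "G": 0.87, "T": 0.15},
        {"A": 0.04, "C": 0.09, "G": 0.11, "T": 0.93},
    ]
    print(call_cluster(example_cycles))
```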

Regarding data analysis, the HiSeq control system and real-time analyzer calculate the number and position of clusters from their first 20 bases, and these determine the output and quality of each sequencing run [156]. The HiSeq 2000 uses two lasers and four filters to detect the four types of nucleotide. Because the emission spectra of the dyes show cross-talk, the resulting images are not independent, and sequencing quality is affected by the distribution of bases [156]. The standard sequencing output files of the HiSeq 2000 are *.bcl files, which contain the base calls and quality scores for each cycle. These files are then converted into *_qseq.txt files by the BCL Converter [156].
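For downstream work, *_qseq.txt records are often turned into FASTQ. The sketch below shows one way this could be done. It assumes the commonly described qseq layout of 11 tab-separated fields (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), with "." marking uncalled bases and qualities encoded with an ASCII offset of 64. The function name and the header format are illustrative choices, and the layout should be checked against the actual files before use.

```python
# Convert one qseq record to a FASTQ record.
# Assumptions: 11 tab-separated fields, "." for uncalled bases,
# Phred+64 quality encoding (converted here to Phred+33).

def qseq_line_to_fastq(line):
    """Return (fastq_record, passed_filter) for a single qseq line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, found {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, flag = fields

    # Uncalled bases are written as "." in qseq and as "N" in FASTQ.
    seq = seq.replace(".", "N")
    # Shift each quality character from offset 64 to offset 33.
    qual = "".join(chr(ord(char) - 31) for char in qual)

    header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    record = f"{header}\n{seq}\n+\n{qual}\n"
    return record, flag == "1"


if __name__ == "__main__":
    # Hypothetical record, made up for illustration.
    example = "HWI-ST123\t45\t1\t1101\t1234\t5678\tACGTAC\t1\tACG.T\thhhBB\t1"
    record, passed = qseq_line_to_fastq(example)
    print(record, end="")
    print("passed filter:", passed)
```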

Ion semiconductor sequencing: Ion Torrent

Ion Torrent offers two platforms of different capacity based on semiconductor sequencing technology: the compact Ion Personal Genome Machine (PGM) and the larger Ion Proton. In these sequencers, a proton is released whenever the polymerase incorporates a nucleotide into the DNA molecule, and the instrument recognizes whether a nucleotide has been added by detecting the change in pH [156]. The chip is flooded with one nucleotide after another; no voltage is detected if the nucleotide is not the correct one, whereas if 2 nucleotides are added, the detected voltage doubles [156]. The PGM was the first commercial sequencing machine that did not require fluorescence and camera scanning, resulting in higher speed, lower cost and smaller instrument size [156]. Ion Torrent shows stable quality along sequencing reads and good mismatch accuracy, but is biased in the detection of indels [156].
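The relation between voltage and the number of incorporated nucleotides can be sketched as follows. This is a minimal illustration that assumes a perfectly linear response, in which one incorporation produces one unit of voltage; the function name, the flow order "TACG" and the voltage values are hypothetical.

```python
# Decode per-flow voltages into bases, assuming the voltage is proportional
# to the number of nucleotides incorporated in that flow.

def call_flows(voltages, flow_order="TACG", unit_voltage=1.0):
    """Return the read implied by a series of per-flow voltages.

    `flow_order` and `unit_voltage` are hypothetical illustration parameters.
    """
    bases = []
    for flow_index, voltage in enumerate(voltages):
        nucleotide = flow_order[flow_index % len(flow_order)]
        # Round to the nearest whole number of incorporations.
        count = max(round(voltage / unit_voltage), 0)
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    # One T, no A, two C, no G, no T, one A.
    print(call_flows([1.05, 0.02, 2.1, 0.0, 0.0, 0.95]))
    # A noisy signal of 5.4 units is called as five bases, whatever the
    # true stretch length was.
    print(call_flows([5.4]))
```

Under this model a small relative error in a large signal changes the called length of a homopolymer stretch, which is one way to picture how indel errors can arise while mismatches remain rare.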

Figure 1.7. Sequencing-by-synthesis NGS technology used by the Illumina platforms. In the first place, the DNA library is prepared by fragmentation and ligation of specific adapters (1). Then, the DNA fragments are attached to the flowcell through those adapters (2) and the clusters are generated after several consecutive bridge amplifications (3 and 4). Finally, single strands are sequenced by synthesis of the complementary strand using fluorescent dNTPs (5). The fluorescence is detected after each round to obtain the sequence of each cluster (6), which is later aligned against a reference genome (7). Adapted from seqanswers.com.

Third generation sequencers:

Third-generation sequencing has two main characteristics. First, PCR is not needed before sequencing, which shortens DNA preparation time [156]. Second, the signal is captured in real time: whether it is fluorescent (PacBio) or an electric current (Nanopore), it is monitored during the enzymatic reaction that adds nucleotides to the complementary strand [156].