Introduction to next-generation sequencing data

Academic year: 2021

Centre for Experimental Medicine, Queen's University Belfast

http://www.qub.ac.uk/research-centres/CEM/

David Simpson

(2)

Outline

• History of DNA sequencing

• NGS or ‘massively parallel’ sequencing

– How it works: Illumina ‘sequencing by synthesis’

– Library preparation

– Clonal amplification (future: single molecule)

• Characteristics of the data: Quality control

– Base calling and quality (FastQ format)

• Phasing and homopolymers

– Trimming

– Implications of PCR

• Duplicates and bias

– Contamination

(3)

Sequencing time-line

Andy Vierstraete

2014: Illumina HiSeq X Ten, the $1,000 genome?

(4)

Conventional DNA sequencing

• Dideoxy terminator

– Sanger method

• Fluorescent dyes

• Gel electrophoresis

• 1 lane = 1 sequence

• Capillary electrophoresis

http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Obenrader/sanger_method_page.htm

"G" tube:

All four dNTPs, ddGTP and DNA polymerase

"A" tube:

All four dNTPs, ddATP and DNA polymerase

"T" tube:

All four dNTPs, ddTTP and DNA polymerase

"C" tube:

All four dNTPs, ddCTP and DNA polymerase

Electropherogram

Primer

(5)

Next Generation Sequencing (NGS)

• Process millions of sequencing reads in parallel

• Common concept is the analysis of millions of sequences associated with a solid surface (or in wells)

– Contrast with traditional gel electrophoresis

• Range of platforms available

– Illustrate with Illumina

– Ion Torrent (Life Technologies/Thermo Fisher)

(6)

NGS workflow

• Library preparation

– RNA or DNA

– Fragmentation/size selection

– Addition of adaptors

• Template preparation

– Single molecule

– ‘Clonal’ amplification: bridge PCR on a slide (cluster generation) or emulsion PCR

• Sequencing

– Reversible terminator (Illumina)

– Semiconductor (Ion Torrent)

– Single molecule (Nanopore)

(7)

Overview of DNA-Seq and RNA-Seq

RNA-Seq: extract RNA (mRNA with poly-A tail, AAAAAAA) → cDNA library

DNA-Seq: genomic DNA → fragmented DNA library

Massively parallel sequencing: >10 million reads, for example

TACATTTGGGAAAAGTAAATTTGCTGAAAATAATCCCGGT
AAGAAAGAAACACTTTTCATGTAATTAGCTTTTTTACATC
AAACTTCAGAACCCAAAGTCATTGAGAATATTAGGGATCA
CAGAACCACATGAGTCAGAATCATCAGAATATCCCACCAA
AGGAGAAGGAAGGAGCAGAGGATTCAAAAGGAAATGGAAT
GATGAATATGAAGAAATGTCAGAAATGAAAGAAGGGAAAG
GAAATTGAATTCGATGAAATAAATGATACTTGCTTATCTG
...

Align reads to the reference sequence (Exon 1, Exon 2)

(8)

Library preparation

http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf

(9)

Illumina: Cluster generation

Clonal amplification achieved by generating clusters on the surface of a flow cell (slide)

See SBS technology video at www.illumina.com/

(10)

Massively parallel sequencing

Glowing dots on a glass slide mark cloned DNA being sequenced

(11)

Reading the sequence

• Wash over all 4 nucleotides, each with a fluorescent dye

• Only the one complementary nucleotide is incorporated

(12)

Illumina: Sequencing by synthesis

• Prepare libraries with different index sequences

• Pool and sequence together – ‘multiplexing’
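
As a rough illustration of multiplexing, pooled reads can be assigned back to their samples by their index sequence. The Python sketch below is only a minimal example: the index sequences, sample names and the `demultiplex` helper are hypothetical, and real demultiplexing software typically also tolerates mismatches in the index read.

```python
from collections import defaultdict

# Hypothetical index-to-sample mapping (illustrative sequences only).
SAMPLE_INDEXES = {"ATCACG": "sample_1", "CGATGT": "sample_2"}


def demultiplex(reads, indexes=SAMPLE_INDEXES):
    """Group (index, sequence) pairs by sample.

    Reads whose index is not in the mapping are collected under
    'undetermined' rather than being discarded.
    """
    by_sample = defaultdict(list)
    for index, sequence in reads:
        by_sample[indexes.get(index, "undetermined")].append(sequence)
    return dict(by_sample)


# A small pooled run: two known indexes and one unreadable index.
pooled = [
    ("ATCACG", "GAGCAAAATT"),
    ("CGATGT", "TACATTTGGG"),
    ("NNNNNN", "AAGAAAGAAA"),
]
result = demultiplex(pooled)
```

Keeping an 'undetermined' bin makes index problems visible, which is relevant to the sample mix-ups discussed under contamination.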

(13)

Platforms

• Illumina has several instruments

– Desktop-sized MiSeq that can complete smaller runs in under a day

– NextSeq 500

– High throughput HiSeq 2500

• Ion Torrent ‘semi-conductor’ sequencing (Life Technologies)

– Fast, cheap entry level, output increasing rapidly

– Personal Genome Machine (PGM)

– Proton

Platform             HiSeq 2500       PGM 314 chip    Proton P1 chip
Total output         600/120 Gb       up to 100 Mb    10 Gb
Run time             11 days/27 hrs   2-4 hrs         2-4 hrs
Output/day           55 Gb            up to 200 Mb    ~20 Gb
Read length          2 x 100/150 bp   up to 400 bp    up to 200 bp
# of single reads    3/0.6 billion    up to 0.6 M     up to 82 million

(14)

Ion Torrent ‘semiconductor sequencing’

• No optics required!

Beads with template attached (prepared by emulsion PCR)

Incorporation of a nucleotide changes pH

Detected on a semiconductor sequencing chip

(15)

Signal processing to optimise base calling

• Signal Decay

• Phase correction

– Phasing is the rate at which single molecules within a cluster lose sync with each other.

– Incomplete Extension

• Limit read length
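
To see why phasing limits read length, consider a deliberately simplified model: assume each molecule has a fixed, independent chance of falling out of sync at every cycle and never recovers. The per-cycle rate used below (0.5%) is a hypothetical value chosen for illustration, not a measured figure for any platform.

```python
def fraction_in_phase(cycles, phasing_rate):
    """Fraction of molecules in a cluster still in sync after `cycles` cycles.

    Simplified model: each molecule independently falls out of sync with
    probability `phasing_rate` per cycle and never recovers.
    """
    return (1.0 - phasing_rate) ** cycles


# The in-phase signal shrinks with every additional cycle.
for n in (50, 100, 150):
    print(n, round(fraction_in_phase(n, 0.005), 3))
```

Under this model the in-phase signal decays geometrically with cycle number, so base calls towards the end of a read rest on a weaker, noisier signal.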

Further discussion

Ion Torrent: http://biolektures.wordpress.com/2011/08/10/fundamentals-of-base-calling-part-1/

Illumina: http://pathogenomics.bham.ac.uk/blog/2013/11/diagnosing-problems-with-phasing-and-pre-phasing-on-illumina-platforms/

(16)

Read length and quality

• Per base sequence quality

• Phred quality score: Q is an integer mapping of p, the probability that the corresponding base call is incorrect

Damien Gregory: http://www.somewhereville.com/?p=1508
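
The standard Phred mapping is Q = -10 log10(p), so Q20 corresponds to a 1 in 100 chance of an incorrect call and Q30 to 1 in 1000. A minimal Python sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def phred_from_error_prob(p):
    """Phred quality Q for an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


def error_prob_from_phred(q):
    """Probability that a base call with Phred quality q is incorrect."""
    return 10.0 ** (-q / 10.0)


# Error probability for a few commonly quoted quality values.
for q in (10, 20, 30, 40):
    print(q, error_prob_from_phred(q))
```

In FASTQ files Q is stored as an integer, so the value computed from p is rounded.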

(17)

FASTQ format

Nucleotide sequence and associated quality score (represented by ASCII characters)

@PSI179204_0007:4:1:1025:10482#0/1
GAGCAAAATTGTAGAAGAATTCAGGATCTCGTATGCCGTC
+PSI179204_0007:4:1:1025:10482#0/1
C-:AC:?5:C-AAA-5>-,A5A>5:A?-DD?5A::>;><B

Flowcell lane and tile

‘X’ and ‘Y’ co-ordinates of the cluster

Index of multiplex sample

Illumina:

P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer and P. M. Rice, “The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.” Nucleic Acids Research, 2010, Vol. 38, No. 6, 1767–1771 doi:10.1093/nar/gkp1137
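
The record above can be taken apart with a few lines of Python. This is a sketch under stated assumptions rather than a general parser: the regular expression only covers the header layout shown on this slide, the field names are illustrative, and the ASCII offset of 33 is an assumption (with an offset of 64 several scores in this record would be negative).

```python
import re

RECORD = """@PSI179204_0007:4:1:1025:10482#0/1
GAGCAAAATTGTAGAAGAATTCAGGATCTCGTATGCCGTC
+PSI179204_0007:4:1:1025:10482#0/1
C-:AC:?5:C-AAA-5>-,A5A>5:A?-DD?5A::>;><B"""

# Header layout as annotated on the slide:
# @instrument:lane:tile:x:y#index/read
HEADER = re.compile(
    r"@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<read>\d)"
)


def parse_fastq_record(text, offset=33):
    """Split one four-line FASTQ record into header fields, sequence and scores.

    `offset` is the ASCII offset of the quality encoding; 33 is assumed here.
    """
    header, sequence, _plus, quality = text.strip().splitlines()
    fields = HEADER.match(header).groupdict()
    scores = [ord(char) - offset for char in quality]
    return fields, sequence, scores


fields, sequence, scores = parse_fastq_record(RECORD)
print(fields["lane"], fields["tile"], fields["x"], fields["y"])
print(min(scores), max(scores))
```

Each quality character maps to exactly one base, so the sequence and score lists always have the same length.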

(18)

Homopolymers (runs of the same nucleotide)

• Illumina: Flow all 4 nucleotides, incorporate single one

• Ion Torrent: Sequential flows of individual unmodified nucleotides

Ionogram (Ion Torrent)

EBI
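
The difference matters for homopolymers: with sequential flows of unmodified nucleotides, every base in a run of the same nucleotide is incorporated in a single flow, so the run length has to be inferred from the size of the signal. The Python sketch below produces an idealised, noise-free ionogram; the flow order `TACG` and the function name are assumptions made for illustration.

```python
def simulate_flows(read, flow_order="TACG", n_flows=12):
    """Return (nucleotide, bases incorporated) for each flow.

    `read` is the sequence being synthesised. Idealised model: a flow
    incorporates every consecutive base that matches the flowed
    nucleotide, so a homopolymer yields one proportionally larger signal.
    """
    signals = []
    position = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        while position < len(read) and read[position] == base:
            count += 1
            position += 1
        signals.append((base, count))
    return signals


# 'TT' and 'CCC' each appear as a single flow with a signal of 2 and 3.
print(simulate_flows("TTACCCG", n_flows=4))
```

In real data the signal is noisy, so distinguishing, say, 6 from 7 incorporations is harder than distinguishing 1 from 2, which is why homopolymer length errors are characteristic of flow-based platforms.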

(19)

Trimming

• Quality

– Ends

• Adaptors

– Clip adaptors (fastx_clipper)

[Figure: library fragment with Adaptor A and Adaptor B flanking the insert]

FASTX-toolkit by Assaf Gordon
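
Both kinds of trimming can be sketched in a few lines of Python. This is not how the FASTX-toolkit implements them: the functions, the quality threshold of 20 and the adaptor sequence in the example are hypothetical, and the adaptor is found by exact match only, whereas real clippers allow mismatches and partial adaptors.

```python
def trim_low_quality_3prime(sequence, scores, min_q=20):
    """Remove bases from the 3' end while their quality is below `min_q`."""
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], scores[:end]


def clip_adaptor(sequence, adaptor):
    """Cut the read at the first exact occurrence of `adaptor`, if any."""
    position = sequence.find(adaptor)
    return sequence if position == -1 else sequence[:position]


# Quality trimming: the last two bases fall below the threshold.
print(trim_low_quality_3prime("ACGTAC", [30, 30, 30, 30, 10, 5]))

# Adaptor clipping: the insert is shorter than the read, so the read
# runs into the (hypothetical) adaptor sequence AGATCGG.
print(clip_adaptor("ACGTTTAGATCGG", "AGATCGG"))
```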

(20)

Implications of PCR

• Duplicate reads

– Erroneous quantification or variant detection

• Uneven coverage

– Additional sequencing required to achieve minimal coverage
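
A first impression of the duplication level can be had by counting reads with identical sequence. This Python sketch is a simplification: duplicate-marking tools generally work from alignment positions rather than raw sequence identity, and identical reads are not always PCR artefacts (for example in highly expressed transcripts).

```python
from collections import Counter


def duplicate_fraction(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    # Every occurrence beyond the first of each sequence is a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(sequences)


reads = ["AAA", "AAA", "CCC", "AAA"]
print(duplicate_fraction(reads))
```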

(21)

Single nucleotide resolution

• High specificity

• Show ZEB1 mutation

Mutation:

c.1920G>T p.Gln640His

ZEB1 exon 7

CAG = Gln

CAT = His
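
The notation can be checked with simple arithmetic: coding position 1920 is the third base of codon 640, and changing the third base of CAG from G to T gives CAT. A small Python sketch (the helper names are illustrative, and the codon table holds only the two codons shown on the slide):

```python
# Only the two codons from the slide; not a full genetic code table.
CODONS = {"CAG": "Gln", "CAT": "His"}


def codon_number(cdna_position):
    """1-based codon number containing a 1-based coding position."""
    return (cdna_position - 1) // 3 + 1


def position_in_codon(cdna_position):
    """Position (1, 2 or 3) of a coding position within its codon."""
    return (cdna_position - 1) % 3 + 1


def apply_substitution(codon, pos_in_codon, new_base):
    """Return the codon with the base at `pos_in_codon` replaced."""
    return codon[:pos_in_codon - 1] + new_base + codon[pos_in_codon:]


mutant = apply_substitution("CAG", position_in_codon(1920), "T")
print(codon_number(1920), CODONS["CAG"], "->", CODONS[mutant])
```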

(22)

Contamination

• Sample mix-ups (!): indexing

• Carry-over from previous run

• FastQ Screen

(23)

Single molecule sequencing: Nanopore

https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing

• Single-stranded DNA polymer is passed through a protein nanopore

• Individual DNA bases on the strand are identified in sequence as the DNA molecule passes through

Oxford Nanopore

(24)

Summary

• NGS works by sequencing millions of reads in parallel

• Library preparation

– Add adaptors to DNA of interest

– Requires clonal amplification (template preparation)

• Sequence data presented in FastQ format

– Quality control critical

• Errors inherent in the technology, e.g. phasing and homopolymers, PCR

• Trimming

– Contamination

(25)

‘To analyze NGS data effectively you need to understand the technology’
