NGS: The Basics
Sample Prep & Sequencing for Metagenomics
Stefan J. Green, PhD, Sequencing Core, Research Resources Center, University of Illinois at Chicago
UIC SQC
© ESCMID
eLibrary
by author
Disclosure of speaker’s interests
(Potential) conflict of interest Potentially relevant company relationships in
connection with event 1
Sponsorship or research funding2
Fee or other (financial) payment3
Shareholder4
Other relationship, i.e. …5
None
None
None
Please note that the use of images of specific products in this presentation does NOT represent an endorsement of that product. There are many competing instruments and reagents of high quality that are available.
Sample Prep & Sequencing for Metagenomics
• DNA-based protocols
• Shotgun metagenome sequencing
• RNA-based protocols
• Whole transcriptome sequencing
The process…
• Define question
• Acquire samples
• Preserve samples for future, if possible
• Label samples properly; use barcoded tubes
• Extract nucleic acids and perform QC
• Prepare NGS libraries and perform QC
• Sequence and perform QC
• Data analysis
Nucleic Acid Extraction
• Adapt protocol to tissue type, target organism, nucleic acid type, and sequencing type
• High molecular weight DNA is needed for long-read applications; gentle enzymatic lysis helps preserve it
• High-energy lysis (e.g., bead beating) is best for microbes, but it shears DNA and so is not suited to long-read sequencing
• Selective lysis to limit host DNA
• Automation is available
How much DNA/RNA do I need?
• Application dependent
• Short-read technologies need much less than long-read applications
• More nucleic acid provides a wider range of possible protocols
• Realistically:
• Short-read Illumina sequencing: minimum DNA – 1 ng
• Long-read Nanopore and PacBio sequencing: minimum DNA – 250 ng to 1 microgram
• For RNA, single-cell RNAseq is possible (sub-ng amounts), but poly(A) capture and rRNA depletion protocols have different input requirements
• Molecular weight critical for long-read applications
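As a rough illustration of the minimums listed above, the sketch below checks whether an extract holds enough DNA for a given platform class. The threshold values come from this slide; the function names, the dictionary keys, and the use of the lower 250 ng bound for long reads are assumptions made for the example.

```python
# Minimum DNA input per platform class, in nanograms (values from the slide).
# For long reads the slide gives a range (250 ng to 1 microgram); the lower
# bound is used here as an assumption.
MIN_INPUT_NG = {
    "illumina_short_read": 1.0,
    "long_read": 250.0,
}


def total_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in the tube, in nanograms."""
    return concentration_ng_per_ul * volume_ul


def has_enough_dna(platform: str, concentration_ng_per_ul: float, volume_ul: float) -> bool:
    """True if the extract meets the minimum input for the platform class."""
    mass = total_mass_ng(concentration_ng_per_ul, volume_ul)
    return mass >= MIN_INPUT_NG[platform]


# Hypothetical extract: 2.5 ng/uL in 20 uL, i.e. 50 ng in total.
print(has_enough_dna("illumina_short_read", 2.5, 20))  # True
print(has_enough_dna("long_read", 2.5, 20))            # False
```

Mass alone is not sufficient for long-read work: as the last bullet notes, molecular weight also has to be checked.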
Library Preparation
• Preparing nucleic acids for sequencing is called library preparation.
• Sequencing reactions are initiated from identical regions of DNA
• Identical regions are artificially introduced using reverse transcription, PCR, ligation, or transposons.
How is NGS achieved? (DNA)
• DNA is fragmented into small pieces (short-read only)
• Multiple options available
• Fragments are enzymatically “cleaned” up
• Sequencing adapters are ligated to fragments
• Adapters are custom sequences, unique to each sequencing platform
• Adapters serve multiple purposes
• Identical initiation site for primer binding
• Aid in clonal amplification
• Adapters include a sample-specific sequence known as a barcode
• Adapters MAY also include a unique molecule identifier (UMI)
• Final size selection may be performed
• Library undergoes QC and quantification
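To make the role of the sample-specific barcode concrete, here is a minimal sketch of demultiplexing, in which reads are assigned to samples by matching a barcode sequence. The barcode sequences, the sample names, and the placement of the barcode at the start of the read are all hypothetical. On Illumina instruments, barcodes are normally read in separate index reads and demultiplexed by vendor software.

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode sequence -> sample name.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
BARCODE_LEN = 6


def demultiplex(reads):
    """Group reads by the barcode found in their first BARCODE_LEN bases.

    Returns a dict of sample name -> list of reads with the barcode removed.
    Reads whose barcode is not in the sample sheet go to "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = BARCODES.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACTTTGGG", "NNNNNNAAAAAA"]
result = demultiplex(reads)
print(result)
```

This sketch requires exact barcode matches; production tools usually tolerate one mismatch, which is one reason barcode sets are designed to differ at several positions.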
Shotgun sequencing approach for genome sequencing
[Figure: shotgun library preparation workflow. Genomic DNA fragments (typically >10 kb) → shearing (acoustic or enzymatic) → end repair + A-tailing (sometimes) → sequencing adapter ligation (NGS-platform-specific), giving fragments flanked by Adapter 1 and Adapter 2 with barcodes (BC) adjacent to the adapters.]
Shearing of gDNA
Images of nucleic acids
Genomic DNA Extracts
Workflow
Neiman et al. "Library preparation and multiplex capture for massive parallel sequencing applications made efficient and easy." PLoS One 7.11 (2012): e48616.
Enzymes labeled in the figure: T4 DNA polymerase, Taq DNA polymerase, T4 polynucleotide kinase, T4 DNA ligase
DNA Repair
End repair prepares DNA for ligation by ensuring that each molecule is free of overhangs and carries 5′ phosphate and 3′ hydroxyl groups.
Step 1: Add 5 microliters of End Repair Mix to 10 microliters of sample. Mix, spin and place on ice.
Step 2: Place tubes in thermocycler – 25˚C for 30 min; 70˚C for 10 min; hold at 10˚C
Step 3: Spin tubes and place on ice
DNA Ligation
Step 4: Add 6 microliters of sample-specific adapter mix to each well.
Step 5: Add 9 microliters of ligase mastermix to each well. Mix and spin.
Step 6: Place tubes in thermocycler – 25˚C for 30 min; 70˚C for 10 min; hold at 10˚C
Step 7: Spin tubes and place on ice
PCR Amplification
Step 8: Add 70 microliters of amplification mix to each well.
Step 9: Perform 5-10 cycles of PCR, depending on input DNA concentration.
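The volumes in Steps 1 to 8 can be tallied to see how the reaction grows. The sketch below is only an illustration and assumes that nothing is removed between steps (no cleanup), which the slides do not state either way. The 10% overage used when scaling a shared mix is also an assumption and not part of the protocol.

```python
# Volumes in microliters, taken from Steps 1, 4, 5 and 8 of the protocol.
ADDITIONS_UL = [
    ("sample", 10),
    ("end repair mix", 5),
    ("adapter mix", 6),
    ("ligase mastermix", 9),
    ("amplification mix", 70),
]


def cumulative_volumes(additions):
    """Return (reagent, running total in uL) after each addition."""
    total = 0
    running = []
    for name, volume in additions:
        total += volume
        running.append((name, total))
    return running


def mastermix_volume_ul(per_well_ul, n_wells, overage=0.10):
    """Volume of a shared mix to prepare for n_wells.

    The default 10% overage for pipetting loss is an assumption.
    """
    return per_well_ul * n_wells * (1 + overage)


for name, total in cumulative_volumes(ADDITIONS_UL):
    print(f"after {name}: {total} uL")

# Ligase mastermix needed for a hypothetical batch of 24 wells.
print(round(mastermix_volume_ul(9, 24), 1))
```

Under these assumptions the reaction reaches 15 µL after end repair, 30 µL after ligation, and 100 µL once the amplification mix is added.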
Transposon-based DNA fragmentation
• Illumina Nextera
• Illumina Nextera XT
• Illumina Nextera DNA Flex
“Transposases catalyze the random insertion of excised transposons into DNA targets with high efficiency.”
Sequencer-Ready DNA
Final Steps
• Quality analysis of samples and libraries:
• Quantification of library – fluorimetry, qPCR
• Quality analysis – electrophoresis
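Fluorimetry reports a mass concentration, while electrophoresis gives the fragment size; together they allow an estimate of library molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The function name and the example values are hypothetical.

```python
# Approximate average mass of one base pair of double-stranded DNA, in g/mol.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    ng/uL equals 1e-3 g/L, so nM = conc * 1e6 / (660 * fragment length).
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


# Hypothetical library: 2 ng/uL by fluorimetry, 500 bp mean size by electrophoresis.
print(round(library_molarity_nm(2.0, 500), 2))  # 6.06
```

This estimate counts all double-stranded DNA, including fragments without adapters, which is why qPCR is often used as a complementary measurement.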
Final Steps
• Size selection
• Increase distance between paired-end reads
• Select for shorter fragments to allow paired reads to overlap
• Remove unwanted fragments
• Decrease variability in size distribution between samples
• Fragment size, or distance between adapters, is known as the insert size.
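The relationship between insert size and paired-read overlap can be written out as simple arithmetic. The sketch below assumes both reads have the same length and that adapters are excluded from the insert size; the function name and example values are hypothetical.

```python
def paired_read_overlap(insert_size_bp: int, read_length_bp: int) -> int:
    """Overlap between the two reads of a pair, in bases.

    A positive value is the number of insert bases covered by both reads.
    A negative value is the size of the unsequenced gap between the reads.
    The overlap cannot exceed the insert itself: when reads are longer than
    the insert, both reads cover the whole insert and run into the adapter.
    """
    return min(insert_size_bp, 2 * read_length_bp - insert_size_bp)


print(paired_read_overlap(250, 150))  # 50: reads overlap and can be merged
print(paired_read_overlap(500, 150))  # -200: 200 bases between the reads are not sequenced
print(paired_read_overlap(100, 150))  # 100: reads are longer than the insert
```

Selecting shorter fragments moves a library toward the first case, while selecting longer fragments increases the distance between the paired reads.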
Size Selection
Final Steps
• Combine samples into a final ‘Pool’
• Perform quantitative PCR to measure the number of prepared molecules with adapters
www.neb.com
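Combining samples into a pool usually aims for equal representation of each library. A minimal sketch of equimolar pooling follows, assuming each library's molarity has already been measured; the library names, concentrations, and the 20 fmol target are hypothetical.

```python
def pooling_volumes_ul(library_nm, fmol_per_library=20.0):
    """Volume of each library (uL) to add so that all contribute equal moles.

    Uses the identity 1 nM == 1 fmol/uL, so volume = target fmol / concentration.
    """
    return {name: fmol_per_library / conc for name, conc in library_nm.items()}


# Hypothetical library concentrations in nM.
libraries = {"lib1": 10.0, "lib2": 4.0, "lib3": 2.0}
volumes = pooling_volumes_ul(libraries)
print(volumes)  # {'lib1': 2.0, 'lib2': 5.0, 'lib3': 10.0}
```

More dilute libraries need larger volumes, so very low concentration libraries can limit the amount of each sample that the pool can hold.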
Sequencing Choices to Make
• How much data do I need? [Depth of sequencing]
• Typically, 1 M to 20 M clusters of 2x150 (300 Mb to 6 Gb)
• What sequencing platform should I use?
• What read-length should I use?
• What insert size should I use?
• What kind of barcodes do I need?
• Should I use single-end or paired-end data (Illumina)?
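The yield figures quoted above follow directly from cluster count, read length, and whether the run is paired-end. The sketch below reproduces that arithmetic; the coverage helper and the 5 Mb genome size are assumptions added for illustration.

```python
def yield_bases(clusters: int, read_length_bp: int, paired: bool = True) -> int:
    """Total sequenced bases for a given number of clusters."""
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_length_bp


def mean_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average depth of coverage over a genome of the given size."""
    return total_bases / genome_size_bp


low = yield_bases(1_000_000, 150)    # 2x150 on 1 M clusters
high = yield_bases(20_000_000, 150)  # 2x150 on 20 M clusters
print(low, high)  # 300000000 6000000000, i.e. 300 Mb and 6 Gb

# Hypothetical 5 Mb bacterial genome sequenced with 1 M clusters of 2x150.
print(mean_coverage(low, 5_000_000))  # 60.0
```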
Paired-end sequencing
Sequencing Run Quality Assessment (Illumina)
• Run Metrics
• Number of clusters
• Clusters passing filter
• Total yield
• % of bases with quality score ≥Q30
• Error rate
• % phiX detected
• % of clusters by sample
• Caveats
• Not every library type is the same
• Metrics expected for one type of library may not be achievable for others
• PF >80%
• %Q30 >75%
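A Phred quality score Q corresponds to an error probability of 10^(-Q/10), so Q30 means roughly one error per 1,000 bases. The sketch below converts scores to error probabilities and checks a run against the PF and %Q30 values listed above. Treating those two values as pass/fail cutoffs is an assumption, and the caveats above still apply: some library types will not reach them.

```python
def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def run_passes(pct_pf: float, pct_q30: float,
               min_pf: float = 80.0, min_q30: float = 75.0) -> bool:
    """True if clusters passing filter and %Q30 both exceed the thresholds.

    Default thresholds are the values from the slide; using them as hard
    cutoffs is an assumption made for this example.
    """
    return pct_pf > min_pf and pct_q30 > min_q30


print(phred_to_error_probability(30))  # about 0.001
print(run_passes(pct_pf=88.5, pct_q30=91.2))  # True
print(run_passes(pct_pf=72.0, pct_q30=91.2))  # False
```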
Sequencing Run Quality Assessment (Illumina)
https://blog.horizondiscovery.com/diagnostics/the-5-ngs-qc-metrics-you-should-know
How is NGS achieved? (RNA)
• Many different protocols
• Poly(A) Capture – eukaryotic organisms only
• Ribosomal RNA removal – custom and pre-designed
• Small RNA (e.g., miRNA) protocols
• More challenging than DNA-based protocols
• Most protocols require conversion to ds-cDNA
• RNA is more readily degraded
• Prokaryotic mRNAs lack the stable poly(A) tails found on eukaryotic mRNAs
• Microorganisms rapidly change their expression profiles
• RNAseq can be confounded by residual gDNA
• Ribosomal RNA is the dominant RNA species
• Microbial mRNA-seq in the presence of host RNA is tricky
RNA Quality Analysis
• RNA quality determines which RNAseq protocol can be used
• For bacteria and archaea, poly(A) capture cannot be used
• Thus, ribosomal RNA depletion must be performed (or excess sequence data generated)
https://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-22/CB22.html
Ribosomal RNA Depletion
• Multiple techniques for removal of ribosomal RNA
• Hybridization of DNA probes, followed by RNase H digestion (shown right)
• Hybridization of biotinylated DNA probes, followed by streptavidin capture
• Hybridization of probes, followed by selective restriction digest
• Double depletion needed for samples with host RNA
www.neb.com
Sample mRNA workflow
• RNA may or may not be fragmented
• Reverse transcription with random primers – incorporate artificial sequence at 5’ end
• Molecular tricks to incorporate artificial sequence at 3’ end
• PCR amplify to incorporate sequencing adapters and barcode
Microbial RNAseq
• Depth of sequencing needed
• Generally, 5-10 M clusters for a single organism after ribosomal RNA depletion
• 50 M clusters or more may be needed for complex microbial communities
• Paired-end sequencing is not absolutely necessary because microbial genes contain few introns
Ofek-Lalzar, Maya, et al. "Niche and host-associated functional signatures of the root surface microbiome." Nature communications 5 (2014): 4950.
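The depth guidance above can be turned into a rough planning calculation. The sketch below estimates how many raw clusters are needed when some fraction of reads is still ribosomal, and how many samples fit on a run. The 90% rRNA figure and the 400 M cluster run size are hypothetical inputs, not values from the slides.

```python
def raw_clusters_needed(target_non_rrna_clusters: int, rrna_percent: int) -> int:
    """Raw clusters required to obtain the target number of non-rRNA clusters.

    rrna_percent is the share of reads expected to be ribosomal (0-99).
    Integer arithmetic with ceiling division avoids floating-point rounding.
    """
    if not 0 <= rrna_percent < 100:
        raise ValueError("rrna_percent must be between 0 and 99")
    usable_percent = 100 - rrna_percent
    return -(-target_non_rrna_clusters * 100 // usable_percent)


def samples_per_run(run_clusters: int, clusters_per_sample: int) -> int:
    """Number of samples that fit on a run at the requested depth."""
    return run_clusters // clusters_per_sample


# Hypothetical: 5 M informative clusters wanted, 90% of the library is rRNA.
print(raw_clusters_needed(5_000_000, 90))  # 50000000

# Hypothetical run of 400 M clusters at 10 M clusters per sample.
print(samples_per_run(400_000_000, 10_000_000))  # 40
```

The first result shows why skipping depletion is costly: a tenfold increase in sequencing is needed to recover the same number of informative reads.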