Analysis of DNA methylation:
An easy view of the bisulfite approach
Methylated C (CH3): genome TAGTACGTTGAT → read TAGTACGTTGAT
Unmethylated C: genome TAGTACGTTGAT → read TAGTATGTTGAT
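The conversion rule in the figure can be written as a short function. This is an illustrative model only: `bisulfite_read` and its `methylated` set of 0-based positions are hypothetical names, and it assumes 100% conversion efficiency, which real treatments do not reach.

```python
def bisulfite_read(genome: str, methylated: set[int]) -> str:
    """Sequence read after bisulfite treatment and PCR (idealized model).

    Unmethylated C is deaminated to U and read as T; a methylated C
    (0-based position listed in `methylated`) is protected and read as C.
    """
    out = []
    for i, base in enumerate(genome):
        if base == "C" and i not in methylated:
            out.append("T")  # unmethylated: C -> U -> read as T
        else:
            out.append(base)  # other bases and methylated C are unchanged
    return "".join(out)


genome = "TAGTACGTTGAT"
print(bisulfite_read(genome, methylated={5}))    # TAGTACGTTGAT
print(bisulfite_read(genome, methylated=set()))  # TAGTATGTTGAT
```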
Three main problems
1. We need software specifically designed to align bisulfite reads
2. Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
3. Need for special strategies for making the shotgun libraries
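Problem 2 can be made concrete with a back-of-the-envelope calculation, assuming uniform, independent base composition (real genomes are not random) and a genome of about 3.1 × 10^9 bases. The sketch models a standard alignment as a search over 2 strands with 4 letters, and a bisulfite alignment as a search over 4 converted strands (C→T and G→A references, each in both orientations) with 3 letters; these strand counts are modelling assumptions. `min_unique_length` is a hypothetical helper giving the shortest read for which fewer than one random exact hit is expected.

```python
import math


def expected_random_hits(read_len: int, genome_len: float, alphabet: int, strands: int) -> float:
    """Expected number of exact random matches of a read (uniform base model)."""
    return strands * genome_len / alphabet ** read_len


def min_unique_length(genome_len: float, alphabet: int, strands: int) -> int:
    """Shortest read length with fewer than one expected random hit."""
    # strands * G / alphabet**L < 1  <=>  L > log_alphabet(strands * G)
    return math.floor(math.log(strands * genome_len, alphabet)) + 1


GENOME = 3.1e9  # approximate human genome size (assumption)
print(min_unique_length(GENOME, alphabet=4, strands=2))  # 17
print(min_unique_length(GENOME, alphabet=3, strands=4))  # 22
```

Under these assumptions a bisulfite read needs roughly 5 more bases than a normal read to be placed unambiguously.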
Before bisulfite treatment:
5'ATGCTGCACTGACACGTGAT3'
3'TACGACGTGACTGTGCACTA5'
After (every unmethylated C converted to U):
5'ATGUTGUAUTGAUAUGTGAT3'
3'TAUGAUGTGAUTGTGUAUTA5'
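The key consequence of the figure, that the two strands are no longer complementary after treatment, can be checked directly. A minimal sketch assuming every C is unmethylated, as in the figure; `complement` is a hypothetical helper that pairs U with A.

```python
def complement(seq: str) -> str:
    """Base-wise complement (no reversal); U pairs with A."""
    return seq.translate(str.maketrans("ACGTU", "TGCAA"))


top = "ATGCTGCACTGACACGTGAT"            # written 5'->3'
bottom = complement(top)                # written 3'->5', as in the figure
conv_top = top.replace("C", "U")        # ATGUTGUAUTGAUAUGTGAT
conv_bottom = bottom.replace("C", "U")  # TAUGAUGTGAUTGTGUAUTA
print(complement(conv_top) == conv_bottom)  # False: the strands no longer pair
```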
Need for special strategies for making the shotgun libraries
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L,
Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at
[Figure: library preparation workflow; steps shown: cells, nuclei, DNA, bisulfite treatment, adaptor ligation, PCR, sequencing]
CRIBI method for bisulfite library preparation: MeSS (Methylome Solid Sequencing)
Optimization of adaptor ligation
Compared with other Bis-seq methods, MeSS requires ten times less starting genomic DNA, avoids intermediate purification steps between enzymatic reactions, and allows efficient amplification with fewer PCR cycles.
Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
Directional cloning would halve the mapping complexity
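Why directional cloning halves the search space can be sketched by listing the strands a bisulfite library can yield after PCR, once U has been copied as T. The labels OT/OB/CTOT/CTOB are borrowed from common bisulfite-aligner terminology (e.g. Bismark) and are not from the slides; the sketch assumes a fully unmethylated fragment.

```python
def revcomp(seq: str) -> str:
    """Reverse complement (5'->3') of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


top = "ATGCTGCACTGACACGTGAT"  # original top strand, 5'->3'
bottom = revcomp(top)          # original bottom strand, 5'->3'

# After bisulfite treatment and PCR every unmethylated C reads as T.
ot = top.replace("C", "T")     # original top, converted
ob = bottom.replace("C", "T")  # original bottom, converted
strands = {
    "OT": ot,
    "OB": ob,
    "CTOT": revcomp(ot),  # PCR copy complementary to OT
    "CTOB": revcomp(ob),  # PCR copy complementary to OB
}

non_directional = set(strands.values())
directional = {strands["OT"], strands["OB"]}  # only the original strands are kept
print(len(non_directional), len(directional))  # 4 2
```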
SOLiD color space maintains the full
set of 4 colors after C/U conversion
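The color-space claim can be checked with the standard SOLiD 2-base encoding, in which the color of a dinucleotide is the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). The sketch uses the converted top strand from the figure, with U written as T (as it reads after PCR); `encode_colors` is a hypothetical helper name.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colors(seq: str) -> list[int]:
    """SOLiD 2-base encoding: one color per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


converted = "ATGTTGTATTGATATGTGAT"  # top strand after C->T conversion
print(sorted(set(encode_colors(converted))))  # [0, 1, 2, 3]

# Every dinucleotide over the reduced alphabet {A, G, T}:
three_letter_colors = {CODE[a] ^ CODE[b] for a, b in product("AGT", repeat=2)}
print(sorted(three_letter_colors))  # [0, 1, 2, 3]
```

Even without C, the dinucleotides AA, TG, AG and AT cover all four colors.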
>882_4_710_F3
T12303201320002311102023132033102120101
>882_4_840_F3
T30132200013022300130131231321021133033
>882_4_1657_F3
T33213100102312210311012322012203112333
>882_5_1275_F3
T31201000021203112332021200212201223112
>882_6_553_F3
T31321031020123002032223323001301333313
...
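The reads above are in SOLiD csfasta format: each sequence starts with the last primer base (T) followed by one color per base transition. A minimal parser and decoder, assuming there are no missing calls ('.') in the colors; the function names are hypothetical. Decoding propagates a single color error to every downstream base, which is why alignment is better extended in color space.

```python
from typing import Iterable, Iterator

BASES = "ACGT"


def parse_csfasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_name, color_string) pairs from csfasta lines."""
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
        elif line and name is not None:
            yield name, line
            name = None  # header comments ('#') before a name are ignored


def decode(colors: str) -> str:
    """Translate a color read into bases; the leading primer base is dropped."""
    base = colors[0]
    bases = []
    for c in colors[1:]:
        base = BASES[BASES.index(base) ^ int(c)]  # next base = previous XOR color
        bases.append(base)
    return "".join(bases)


example = [">882_4_710_F3", "T12303201320002311102023132033102120101"]
for name, cs in parse_csfasta(example):
    print(name, decode(cs))
```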
STEP 1
Virtual bisulfite conversion of the genome
Genome
...ATGCTGCACTGACACGTGATGTCGTA...
↓
Converted AGT genome
...ATGTTGTATTGATATGTGATGTTGTA...
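Step 1 amounts to a string substitution over the reference. A sketch that also builds a G→A converted copy for reads coming from the opposite strand: the slide shows only the C→T genome, so this second copy is an assumption about how both strands would be covered. `convert_genome` is a hypothetical name.

```python
def convert_genome(genome: str) -> dict[str, str]:
    """In-silico bisulfite conversion of a reference sequence."""
    return {
        "CT": genome.replace("C", "T"),  # forward-strand conversion (as on the slide)
        "GA": genome.replace("G", "A"),  # reverse-strand conversion (assumed)
    }


genome = "ATGCTGCACTGACACGTGATGTCGTA"
print(convert_genome(genome)["CT"])  # ATGTTGTATTGATATGTGATGTTGTA
```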
STEP 2
Virtual bisulfite conversion of every C in the reads, keeping a copy of the original read
Read #1
TGTTGTATTG → TGTTGTATTG
Read #2
TGATGTCGTA → TGATGTTGTA
…
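Step 2 keeps each read in two forms, one for alignment and one for methylation calling. A minimal sketch; `ConvertedRead` and `convert_reads` are hypothetical names, not types from any particular aligner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvertedRead:
    original: str   # read as sequenced, kept for methylation calling
    converted: str  # every C replaced by T, used for alignment


def convert_reads(reads: list[str]) -> list[ConvertedRead]:
    return [ConvertedRead(r, r.replace("C", "T")) for r in reads]


for r in convert_reads(["TGTTGTATTG", "TGATGTCGTA"]):
    print(r.original, "->", r.converted)
```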
STEP 3
Alignment of the three-letter sequences
Converted genome
...ATGTTGTATTGATATGTGATGTTGTA...
Converted reads
TGTTGTATTG TGATGTTGTA
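Step 3 is ordinary alignment on the reduced alphabet. For illustration only, exact matching with `str.find` stands in for a real aligner, which would tolerate mismatches and use an index; `find_all` is a hypothetical helper.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) 0-based start positions of pattern in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


converted_genome = "ATGTTGTATTGATATGTGATGTTGTA"
for read in ["TGTTGTATTG", "TGATGTTGTA"]:
    print(read, find_all(converted_genome, read))  # [1], then [16]
```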
STEP 4/5
If the original read has a C, check that the genome also has a C at that position and label it as methylated (Met)
Original genome
...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome
...ATGTTGTATTGATATGTGATGTTGTA...
Converted read
TGATGTTGTA
Original read
TGATGTCGTA
CH3 (methylated C)
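Steps 4 and 5 compare the original read with the original genome at every reference C covered by the alignment. A sketch assuming a forward-strand, gap-free alignment at a known 0-based position; `call_methylation` is a hypothetical name. Following the conversion rule, a reference C read as T is reported as unmethylated.

```python
def call_methylation(genome: str, original_read: str, pos: int) -> dict[int, bool]:
    """Map genome position -> True (methylated) / False (unmethylated)."""
    calls = {}
    for i, read_base in enumerate(original_read):
        g = pos + i
        if genome[g] != "C":
            continue  # only reference cytosines carry methylation information
        if read_base == "C":
            calls[g] = True   # C protected from conversion: methylated
        elif read_base == "T":
            calls[g] = False  # C read as T: unmethylated
    return calls


genome = "ATGCTGCACTGACACGTGATGTCGTA"
print(call_methylation(genome, "TGATGTCGTA", 16))  # {22: True}
print(call_methylation(genome, "TGTTGTATTG", 1))   # {3: False, 6: False, 8: False}
```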
Simulated test set
Starting from three simulated hg19 reference genomes in which cytosines were randomly methylated on both DNA strands to obtain three cytosine methylation levels (0%, 50% and 100%), we generated six test sets of 1 million reads each (three for color-space and three for base-space data) using the dwgsim-0.1.8 program (ref.). The same procedure was applied to obtain the simulated test sets for non-bisulfite-treated DNA, except that the unmodified hg19 reference genome was used as input to dwgsim-0.1.8.
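The slides do not say how the methylated references were encoded before being passed to dwgsim. One plausible sketch, assuming each reference C (top strand) and each reference G (a C on the bottom strand) is independently methylated with probability `level`, and that unmethylated cytosines are pre-converted; `simulate_bisulfite_reference` is a hypothetical name.

```python
import random


def simulate_bisulfite_reference(ref: str, level: float, seed: int = 0) -> tuple[str, str]:
    """Return (top, bottom) converted copies of `ref`, in top-strand coordinates.

    Each C (top strand) and each G (i.e. a C on the bottom strand) is methylated
    with probability `level`; unmethylated ones are converted (C->T, G->A).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible test sets
    top, bottom = [], []
    for base in ref:
        if base == "C" and rng.random() >= level:
            top.append("T")  # unmethylated top-strand C
        else:
            top.append(base)
        if base == "G" and rng.random() >= level:
            bottom.append("A")  # unmethylated bottom-strand C
        else:
            bottom.append(base)
    return "".join(top), "".join(bottom)


for level in (0.0, 0.5, 1.0):
    print(level, simulate_bisulfite_reference("ATGCTGCACTGACACGTGAT", level))
```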
Used parameters:
[ -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000 ]
The per-base/color/flow error rate and the mutation rate were left at their default values (0.02 and 0.001, respectively). All simulated test sets were produced with the same random seed, so they are comparable in number of reads, position and strand with respect to the human reference genome (hg19).
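The parameter list above maps onto a dwgsim command line as follows. The flags are copied verbatim from the slide; the reference and output-prefix file names are placeholders, and the positional order (`<in.ref.fa> <out.prefix>`) is assumed from dwgsim's usage message. `dwgsim_command` is a hypothetical helper.

```python
import shlex


def dwgsim_command(reference: str, out_prefix: str, solid: bool) -> list[str]:
    """Build the dwgsim-0.1.8 argument list used for the simulated test sets."""
    return [
        "dwgsim",
        "-y", "0",
        "-z", "0",                    # random seed, identical for every test set
        "-d", "100",                  # outer distance between the two ends
        "-S", "2",
        "-c", "1" if solid else "0",  # 0 = Illumina, 1 = SOLiD
        "-1", "50", "-2", "50",       # lengths of the first and second read
        "-C", "-1",                   # coverage target disabled...
        "-N", "1000000",              # ...in favour of a fixed read count
        reference, out_prefix,
    ]


print(shlex.join(dwgsim_command("hg19_meth50.fa", "sim_meth50_solid", solid=True)))
# To run it: subprocess.run(dwgsim_command(...), check=True)
```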
General strategy
1. Find seeds in base space
2. Extend alignment in color space
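The two-step strategy can be illustrated with a toy seed-and-extend routine: an exact k-mer seed is located in base space, then each candidate window is scored by counting color mismatches against the read's colors. This is a hypothetical sketch (the names, `k` and the mismatch threshold are illustrative), not the actual aligner; it reuses the converted genome shown in the steps above.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def colors(seq: str) -> list[int]:
    """SOLiD 2-base encoding of a base sequence."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def seed_and_extend(ref: str, read: str, read_colors: list[int],
                    k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (position, color_mismatches) for every seeded, accepted hit."""
    hits = []
    seed = read[:k]  # 1. exact seed in base space
    start = ref.find(seed)
    while start != -1:
        window = ref[start:start + len(read)]
        if len(window) == len(read):
            # 2. extend: compare the whole read with the reference in color space
            mismatches = sum(a != b for a, b in zip(colors(window), read_colors))
            if mismatches <= max_mismatches:
                hits.append((start, mismatches))
        start = ref.find(seed, start + 1)
    return hits


ref = "ATGTTGTATTGATATGTGATGTTGTA"
read = "TGATGTTGTA"
noisy = colors(read)
noisy[5] = 2  # simulate one color-call error downstream of the seed
print(seed_and_extend(ref, read, noisy, k=4, max_mismatches=2))  # [(16, 1)]
```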
SOLiD chemistry: ligation probes
• Ligation probes are octamers
– N = degenerate bases, Z = universal bases
– 4^5 = 1024 probes (256 probes per color)
[Figure: probe 3' A T n n n z z z; the 3' ligation site, the cleavage site and the dye are spatially separated; the fluorescent dye interrogates the bases at the 1st and 2nd positions. Color matrix: 1st base (A, C, G, T) × 2nd base (A, C, G, T).]
2-base encoding is based on sequencing by ligation rather than sequencing by synthesis. It uses fluorescently labeled 8-mer probes that discriminate the two 3'-most bases (AT in the figure). To achieve full coverage, repeated …
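The color matrix in the figure follows a compact rule: with 2-bit codes A=0, C=1, G=2, T=3, the color of a dinucleotide is the XOR of the two codes. A sketch assuming that standard encoding; colors are numbered 0-3 rather than named by dye, since the figure's dye assignment is not recoverable from the text.

```python
BASES = "ACGT"


def color(first: str, second: str) -> int:
    """SOLiD 2-base color: XOR of the 2-bit codes A=0, C=1, G=2, T=3."""
    return BASES.index(first) ^ BASES.index(second)


# Rows: 1st base; columns: 2nd base (A, C, G, T)
matrix = [[color(a, b) for b in BASES] for a in BASES]
for base, row in zip(BASES, matrix):
    print(base, row)

# Each color labels 4 of the 16 dinucleotides, so 1024 probes split into 256 per color.
per_color = {c: sum(row.count(c) for row in matrix) for c in range(4)}
print(per_color)  # {0: 4, 1: 4, 2: 4, 3: 4}
```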
SOLiD 4-color ligation
[Figure series: template sequence bound to a 1 µm bead through the P1 primer (adapter oligo sequence), with a universal sequencing primer annealed next to it; four dye-labeled probe sets (Y-, B-, G- and R-probe, each XXnnnzzz) compete for ligation.]
1. Ligation reaction: ligase joins the matching probe to the universal sequencing primer.
2. Visualization: the dye (Y in the figure) reports the color of bases 1-2; the bead is imaged and unextended strands are capped.
3. Cleavage: the dye-bearing end of the probe is cleaved off, leaving a 5' phosphate (p5') for the next ligation.
4. Ligation, visualization and cleavage (2nd cycle): the next probe reports bases 6-7 (R).
5. Repeated cycles interrogate every 4th-5th base: 1-2 (Y), 6-7 (R), 11-12 (R), 16-17 (B), 21-22 (G).
6. Reset: the extended primer is removed and a new universal sequencing primer (n-1), one base shorter, is annealed.
7. 1st cycle after reset: bases 0-1 (R). 2nd round: 0-1 (R), 5-6 (R), 10-11 (R), 15-16 (B), 20-21 (G).
Sequential rounds of sequencing (multiple cycles per round, one primer offset per round):
universal seq primer n: 1-2, 6-7, 11-12, 16-17, 21-22
universal seq primer n-1: 0-1, 5-6, 10-11, 15-16, 20-21
further rounds: 4-5, 9-10, 14-15, 19-20 / 3-4, 8-9, 13-14, 18-19, 23-24 / 2-3, 7-8, 12-13, 17-18, 22-23
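The interrogation pattern follows from two features of the figures: each ligation cycle advances 5 bases, and each new round starts from a primer shifted back by one base. A sketch under those assumptions (the function name and the choice of 5 cycles per round are illustrative) reproduces the position pairs listed above and shows that 5 rounds read every base twice.

```python
from collections import Counter


def interrogated_pairs(rounds: int = 5, cycles: int = 5,
                       step: int = 5) -> dict[int, list[tuple[int, int]]]:
    """Base pairs read in each round; round k uses a primer shifted k bases back (n-k)."""
    schedule = {}
    for k in range(rounds):
        starts = [1 - k + step * j for j in range(cycles)]
        # pairs starting inside the primer (position < 0) carry no template information
        schedule[k] = [(s, s + 1) for s in starts if s >= 0]
    return schedule


schedule = interrogated_pairs()
for k, pairs in schedule.items():
    print(f"primer n-{k}:", pairs)

coverage = Counter(p for pairs in schedule.values() for pair in pairs for p in pair)
print(all(coverage[p] == 2 for p in range(1, 22)))  # True
```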