Analysis of DNA methylation: bisulfite libraries and SOLiD sequencing

Academic year: 2021


(1)

Analysis of DNA methylation:

(2)
(3)

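An easy view of the bisulfite approach

genome   TAGTACGTTGAT   TAGTACGTTGAT
read     TAGTACGTTGAT   TAGTATGTTGAT

[Figure: CH3 marks the methylated cytosine. After bisulfite treatment the methylated C is still read as C, while the unmethylated C is read as T.]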

(4)
(5)

Three main problems

1. We need software specifically designed to align bisulfite reads

2. Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference

3. Need for special strategies for making the shotgun libraries

(6)


(7)

Need for special strategies for making the shotgun libraries

Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at

(8)

CRIBI method for bisulfite library preparation: MeSS (Methylome SOLiD Sequencing)

[Figure: workflow diagram with the labels Cells, Nuclei, DNA, Adaptor ligation, Bisulfite treatment, PCR and Sequencing.]

(9)
(10)

Optimization of adaptor ligation

Compared with other Bis-seq methods, MeSS requires ten times less starting genomic DNA, avoids intermediate purification steps between enzymatic reactions, and allows efficient amplification with fewer PCR cycles.

(11)

Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference

Before
5'-ATGCTGCACTGACACGTGAT-3'
3'-TACGACGTGACTGTGCACTA-5'

After
5'-ATGUTGUAUTGAUAUGTGAT-3'
3'-TAUGAUGTGAUTGTGUAUTA-5'

Directional cloning would halve the mapping complexity

SOLiD color space maintains the full set of 4 colors after C/U conversion

>882_4_710_F3

T12303201320002311102023132033102120101

>882_4_840_F3

T30132200013022300130131231321021133033

>882_4_1657_F3

T33213100102312210311012322012203112333

>882_5_1275_F3

T31201000021203112332021200212201223112

>882_6_553_F3

T31321031020123002032223323001301333313

...
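The claim that color space keeps all 4 colors after C/U conversion can be checked with a minimal sketch. It assumes the usual 2-bit view of the SOLiD code (A=0, C=1, G=2, T=3, color = XOR of adjacent bases); the function names are illustrative and not taken from any SOLiD software.

```python
# Assumed 2-bit base codes; the XOR of two adjacent bases gives the color (0-3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_color_space(seq: str) -> str:
    """Encode a base-space sequence as one color per adjacent base pair."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def bisulfite_convert(seq: str) -> str:
    """Full in-silico conversion of one strand: every C becomes T (U is read as T)."""
    return seq.upper().replace("C", "T")

original = "ATGCTGCACTGACACGTGAT"        # top strand shown on the slide
converted = bisulfite_convert(original)  # three-letter (A, G, T) sequence
colors = to_color_space(converted)

print(converted)            # ATGTTGTATTGATATGTGAT
print(colors)               # 3110113301233311123
print(sorted(set(colors)))  # ['0', '1', '2', '3']
```

Although the converted strand uses only three letters, the dinucleotides AA/GG/TT, GT/TG, AG/GA and AT/TA still map to four different colors.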

(12)
(13)

STEP 1

Virtual bisulfite conversion of the genome

Genome                ...ATGCTGCACTGACACGTGATGTCGTA...
Converted AGT genome  ...ATGTTGTATTGATATGTGATGTTGTA...

STEP 2

Virtual bisulfite conversion of any C in the reads, remembering the original

Read #1   TGTTGTATTG  ->  TGTTGTATTG
Read #2   TGATGTCGTA  ->  TGATGTTGTA

STEP 3

Alignment of three-base sequences

Converted genome  ...ATGTTGTATTGATATGTGATGTTGTA...
Converted reads       TGTTGTATTG     TGATGTTGTA

STEP 4/5

If the original read had any C, check that the genome also had a C at that position and label it as Met

Original genome   ...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome  ...ATGTTGTATTGATATGTGATGTTGTA...
Converted read                       TGATGTTGTA
Original read                        TGATGTCGTA

[Figure: the C kept in the original read lies over a C in the original genome and is marked CH3.]
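The five steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the PASS implementation: it handles only the forward strand, uses exact matching with `str.find` in place of a real aligner, and reports only the first hit. The function names are hypothetical.

```python
def three_letter(seq: str) -> str:
    """STEP 1/2: virtual bisulfite conversion (every C becomes T)."""
    return seq.upper().replace("C", "T")

def call_methylation(genome: str, read: str):
    """Return (position, methylated, unmethylated) for the first exact hit, else None.

    Positions are 0-based genome coordinates (an assumption of this sketch).
    """
    # STEP 3: align the converted read against the converted genome.
    pos = three_letter(genome).find(three_letter(read))
    if pos < 0:
        return None
    methylated, unmethylated = [], []
    # STEP 4/5: compare the original read with the original genome.
    for offset, (g, r) in enumerate(zip(genome[pos:pos + len(read)], read)):
        if r == "C" and g != "C":
            return None                          # a read C over a genomic T is a false hit
        if g == "C" and r == "C":
            methylated.append(pos + offset)      # C survived conversion: label as Met
        elif g == "C" and r == "T":
            unmethylated.append(pos + offset)    # C was converted: unmethylated
    return pos, methylated, unmethylated

genome = "ATGCTGCACTGACACGTGATGTCGTA"
print(call_methylation(genome, "TGTTGTATTG"))  # (1, [], [3, 6, 8])
print(call_methylation(genome, "TGATGTCGTA"))  # (16, [22], [])
```

Read #2 keeps its C over the genomic C at position 22, so that cytosine is called methylated; Read #1 shows T over three genomic cytosines, which are called unmethylated.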

(14)

Simulated test set

Starting from 3 simulated hg19 reference genomes whose cytosines were randomly methylated on both DNA strands to obtain 3 cytosine methylation levels (0%, 50% and 100%), we generated 6 test sets of 1 million reads each (3 for color space and 3 for base space data) using the dwgsim-0.1.8 (ref.) program. The same procedure was applied to obtain the simulated test sets for DNA not treated with bisulfite, except that the unmodified hg19 reference genome was used as input to dwgsim-0.1.8.

Parameters used:

[ -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000 ]

The per base/color/flow error rate and the mutation rate were left at their default values (0.02 and 0.001, respectively). All simulated test sets were produced using the same seed, so they are comparable in number of reads, position and strand with respect to the human reference genome (hg19).
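As an illustration of how a reference with a chosen methylation level might be prepared before read simulation, here is a minimal sketch. It is an assumption about the general idea, not the script used for these test sets: each cytosine is independently kept (methylated) with probability `methylation_level` and converted to T otherwise, and the bottom strand is handled by reverse-complementing first. All names are hypothetical.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the bottom strand written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def simulate_bisulfite(seq: str, methylation_level: float, seed: int = 0) -> str:
    """Keep each C with probability methylation_level, otherwise convert it to T."""
    rng = random.Random(seed)  # fixed seed makes the simulated strand reproducible
    out = []
    for base in seq.upper():
        if base == "C" and rng.random() >= methylation_level:
            out.append("T")    # unmethylated cytosine, read as T after conversion
        else:
            out.append(base)   # other bases and methylated cytosines are unchanged
    return "".join(out)

reference = "ATGCTGCACTGACACGTGAT"
for level in (0.0, 0.5, 1.0):
    top = simulate_bisulfite(reference, level, seed=1)
    bottom = simulate_bisulfite(reverse_complement(reference), level, seed=2)
    print(level, top, bottom)
```

At level 0.0 every C is converted, and at level 1.0 the strands are returned unchanged, which corresponds to the 0% and 100% test sets.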

(15)

General strategy

1. Find seeds in base space

2. Extend alignment in color space

(16)

SOLiD chemistry: ligation probes

Ligation probes are octamers

N = degenerate bases, Z = universal bases

4^5 = 1024 probes (256 probes per color)

[Figure: octamer probe "A T n n n z z z". The 3' ligation site, the cleavage site and the dye are spatially separated. The fluorescent dye interrogates the bases at the 1st + 2nd position. A 4 x 4 color matrix (1st base A, C, G, T by 2nd base A, C, G, T) assigns a dye to each dinucleotide.]

2-base encoding is based on ligation sequencing rather than sequencing by synthesis. It takes advantage of fluorescently labeled 8-mer probes that distinguish the two 3'-most bases (AT in the figure). To have a full coverage, repeated

(17)

SOLiD 4-color ligation

Ligation reaction

[Figure: template sequence attached through the P1 primer to a 1 µm bead, with the universal seq primer annealed. Ligase and the four dye-labeled probes (Y, B, G and R, each drawn as X X n n n z z z) are shown.]

(18)

SOLiD 4-color ligation

Ligation reaction

[Figure labels: template sequence, P1 primer, 1 µm bead, universal seq primer with its p5' end, ligase, the four probes (Y, B, G and R, each drawn as X X n n n z z z), and the probe bases x x.]

(19)

SOLiD 4-color ligation

Visualization

[Figure: bead-bound template with P1 primer and universal seq primer; a Y signal is recorded for bases 1-2.]

(20)

SOLiD ligation-based sequencing chemistry (2)

Image

Cap unextended strands

(21)

SOLiD 4-color ligation

Cleavage

[Figure: bead-bound template with P1 primer and universal seq primer; bases 1-2 (x x) are recorded as Y.]

(22)

SOLiD 4-color ligation

Ligation (2nd cycle)

[Figure labels: template sequence, Adapter Oligo Sequence, 1 µm bead, universal seq primer, ligase, the four probes (Y, B, G and R, each drawn as X X n n n z z z); bases 1-2 (x x) already recorded as Y.]

(23)

SOLiD 4-color ligation

Visualization (2nd cycle)

[Figure: bead-bound template with Adapter Oligo Sequence and universal seq primer; recorded colors are Y (1-2) and R (6-7).]

(24)

SOLiD 4-color ligation

Cleavage (2nd cycle)

[Figure: bead-bound template with Adapter Oligo Sequence and universal seq primer; a p5' end is marked, and the recorded colors are Y (1-2) and R (6-7).]

(25)

SOLiD 4-color ligation

interrogates every 4th-5th base

[Figure: bead-bound template with Adapter Oligo Sequence and universal seq primer; recorded colors are Y (1-2), R (6-7), R (11-12), B (16-17) and G (21-22).]

(26)

SOLiD 4-color ligation

Reset

[Figure: bead-bound template with the Adapter Oligo Sequence, with no primer annealed.]

(27)

SOLiD 4-color ligation (1st cycle after reset)

[Figure labels: template sequence, Adapter Oligo Sequence, 1 µm bead, universal seq primer n-1 with its p5' end, ligase, the four probes (Y, B, G and R, each drawn as X X n n n z z z), and the probe bases x x.]

(28)

SOLiD 4-color ligation (1st cycle after reset)

[Figure: bead-bound template with Adapter Oligo Sequence and universal seq primer n-1; an R signal is recorded for bases 0-1 (x x).]

(29)

SOLiD 4-color ligation (2nd round)

[Figure: bead-bound template with Adapter Oligo Sequence and universal seq primer n-1; recorded colors are R (0-1), R (5-6), R (10-11), B (15-16) and G (20-21).]

(30)

Sequential rounds of sequencing

Multiple cycles per round

[Figure: bead-bound template with Adapter Oligo Sequence. After each round the extended primer is removed (reset) and a new round starts from a different primer: universal seq primer, universal seq primer n-1, and universal seq primers n+1, n+2 and n+3 with spacer. Base positions interrogated in each round, as listed on the slide:]

4-5   9-10   14-15   19-20
3-4   8-9    13-14   18-19   23-24
2-3   7-8    12-13   17-18   22-23
1-2   6-7    11-12   16-17   21-22
0-1   5-6    10-11   15-16   20-21
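The position pattern on these slides can be reproduced with a small sketch. It assumes five ligation cycles per round and a step of five bases between interrogated dinucleotides, with the position labels used on the slides; the function name is illustrative.

```python
def interrogated_pairs(offset: int, cycles: int = 5, step: int = 5):
    """Positions (first, second) read in one round that starts at `offset`."""
    return [(offset + step * i, offset + step * i + 1) for i in range(cycles)]

# One round per primer offset, as on the slide (4-5, 3-4, 2-3, 1-2, 0-1).
rounds = {offset: interrogated_pairs(offset) for offset in (4, 3, 2, 1, 0)}

# Count how many times each base position is interrogated over all rounds.
coverage = {}
for pairs in rounds.values():
    for first, second in pairs:
        coverage[first] = coverage.get(first, 0) + 1
        coverage[second] = coverage.get(second, 0) + 1

print(rounds[1])  # [(1, 2), (6, 7), (11, 12), (16, 17), (21, 22)]
print(rounds[0])  # [(0, 1), (5, 6), (10, 11), (15, 16), (20, 21)]
print(all(coverage[p] == 2 for p in range(1, 25)))  # True
```

Under these assumptions every position from 1 to 24 is interrogated twice, once as the first and once as the second base of a dinucleotide, which is what makes the two-base encoding decodable and checkable.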

(31)


SOLiD™ Chemistry

Double Base Encoding

(32)

2 Base Pair Encoding Using 4 Dyes

[Figure: 4 x 4 dye matrix (1st base A, C, G, T by 2nd base A, C, G, T). Example probes: Red-probe "A T n n n z z z" and Blue-probe "T T n n n z z z".]

(33)

2 base pair encoding reference alignment in color space

A C G G T C G T C G T G T G C G T

Base reference

(34)

2 base pair encoding reference alignment in color space

A C G G T C G T C G T G T G C G T

[Figure: color-space rows labelled reference, expected and observed.]

To be real, a SNP must be encoded by two adjacent color changes

(35)

Advantages of 2 base pair encoding: miscall

A C G G T C G T C G T G T G C G T

[Figure: color-space rows labelled reference, expected and observed, next to the 4 x 4 dye matrix (1st base by 2nd base).]

A C G G T C G C T A C A C A T A C
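The difference between a real SNP and a color miscall can be reproduced with the slide's reference sequence. The sketch below assumes the 2-bit view of the color code (A=0, C=1, G=2, T=3, color = XOR of adjacent bases); the helper names and the chosen positions are illustrative.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def encode(seq: str) -> list:
    """Base space to color space: one color per adjacent base pair."""
    codes = [BASE_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]

def decode(first_base: str, colors: list) -> str:
    """Color space to base space, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

reference = "ACGGTCGTCGTGTGCGT"
ref_colors = encode(reference)

# A single miscalled color (index 6, 1 -> 3) corrupts every base decoded after it.
miscalled = list(ref_colors)
miscalled[6] = 3
print(decode("A", miscalled))  # ACGGTCGCTACACATAC

# A real SNP (base index 7, T -> C) changes exactly two adjacent colors.
snp = reference[:7] + "C" + reference[8:]
changed = [i for i, (a, b) in enumerate(zip(ref_colors, encode(snp))) if a != b]
print(changed)                 # [6, 7]
```

The decoded miscall matches the second base sequence on the slide, which is why an isolated single color change is treated as a measurement error rather than a variant.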

(36)

But there is more…

Consider a triplet of bases: it defines 2 colors. There are only 3 possibilities for a change in the middle base, hence only 3 possibilities for the 2 colors to change to. The other 6 possibilities for a 2-color change are not allowed and most probably represent measurement errors.

(37)

[Figure: a base triplet whose middle base is replaced by each of the three alternative bases. Panel labels: "Reverse colors" and "Other two colors (both orientations)".]

Any other transitions would require the outer two bases to change

(38)

[Figure: example triplets with a changed middle base, next to the 4 x 4 dye matrix (1st base by 2nd base).]

1/3 allowed vs 2/3 not allowed
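The 1/3 versus 2/3 split can be verified by enumeration. The sketch assumes the same 2-bit view of the color code (color = XOR of adjacent base codes); the names are illustrative.

```python
from itertools import product

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def colors(triplet: str) -> tuple:
    """The two colors defined by a triplet of bases."""
    a, b, c = (BASE_CODE[x] for x in triplet)
    return (a ^ b, b ^ c)

def allowed_changes(triplet: str) -> set:
    """Color pairs reachable by changing only the middle base."""
    first, middle, last = triplet
    return {colors(first + m + last) for m in "ACGT" if m != middle}

def all_double_changes(pair: tuple) -> set:
    """Every color pair in which both colors differ from the original pair."""
    return {(x, y) for x, y in product(range(4), repeat=2)
            if x != pair[0] and y != pair[1]}

triplet = "ATC"
print(colors(triplet))                           # (3, 2)
print(sorted(allowed_changes(triplet)))          # [(0, 1), (1, 0), (2, 3)]
print(len(all_double_changes(colors(triplet))))  # 9
```

For "ATC" the allowed results are the reversed pair (2, 3) and the other two colors in both orientations, (0, 1) and (1, 0). The remaining 6 of the 9 double changes cannot come from a single middle-base substitution.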

(39)

SOLiD Exact Call Chemistry (ECC)

ECC allows an extra run of ligations to be performed with 3-base encoding. This is used as an accuracy control, thus improving the quality of the sequence in

(40)

General strategy

1. Find seeds in base space

2. Extend alignment in color space

3. Rescue unaligned reads using a reference with the combination of methylated patterns

PASS implementation of bisulfite alignment

(Davide Campagna)

(41)
(42)
