• No results found

RNA Secondary Structure Prediction

N/A
N/A
Protected

Academic year: 2022

Share "RNA Secondary Structure Prediction"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

RNA Secondary Structure Prediction

BMI/CS 776

www.biostat.wisc.edu/bmi776/

Mark Craven

[email protected] Spring 2011

Goals for Lecture

the key concepts to understand are the following

•! RNA secondary structure

•! secondary structure features: stems, loops, bulges

•! Pseudoknots

•! the Nussinov algorithm

•! adapting Nussinov to take free energy into account

(2)

Why RNA Is Interesting

•! in addition to messenger RNA (mRNA), there are other RNA molecules that play key roles in biology

–! ribosomal RNA (rRNA)

•! ribosomes are complexes that incorporate several RNA subunits in addition to numerous protein units

–! transfer RNA (tRNA)

•! transport amino acids to the ribosome during translation –! the spliceosome, which performs intron splicing, is a

complex with several RNA units

–! microRNAs and others that play regulatory roles

–! the genomes for many viruses (e.g. HIV) are encoded in RNA

–! etc.

•! the folding of an mRNA can be involved in regulating the gene’s expression

RNA Secondary Structure

•! RNA is typically single stranded

•! folding, in large part is determined by base-pairing A-U and C-G are the canonical base pairs

other bases will sometimes pair, especially G-U

•! the base-paired structure is referred to as the secondary structure of RNA

•! related RNAs often have homologous secondary structure without significant sequence similarity

(3)

tRNA Secondary Structure

tertiary structure

Small Subunit Ribosomal RNA

Secondary Structure

(4)

6S RNA Secondary Structure

H. influenzae

B. subtilis

GUUUUGA CC - - -

--

- .

ACAUUUUUUU

UU

GAGA U - .

- - - - - - - -

- - - -.. - - -.. - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - G

GUCCUGAU GUGUU GUUGUAA CA

CCUA GA

GA

UUUGGG GCCCA GCA

UUUUUAAAUGGC GUACAU

GCCUCUUU UCA U

AA A

CAGGGUUA CACGG CAACGUU C

GAAAGGAAACAAAACU GGU GC ACCC CGGGA A GAAAAUUUAGA ACAU UC AGGAGAAA GGU C CGU

CC A

UU U

- -

C U

-. - -- - --.

- -

- -.. - - -. - -. GUAGGC- - - GUCC- - - -CUGAGCC- - -. - - - - - -..- - - .. - - - - - - - -. - - - CCUGGAGUGUUCGUCA UAU AAAUUGGUUUCCUAUCGUUGGUGUGUAGGCUUAACCUUUGACU U A

CGCCCG CAGGAACUUGG

GGGUCUCACG GCGA UCAU UUUAGUCAA C U AUGGCAAAAGUCCAA AGAAUCGGGUUACU GU

CUG

AAA UCU U AC

UA U U

G

CC A AGUUCC U

GAU CUG

E. coli

3'-

A

-.

- - -

CGC CCCU

GCG G CGUCCGA

CAUUGGGAACUUGG

CCUU GA

ACCA A GAGCC

UC

CCACUUACACA AGCGUCAA A AUUCCGAAGAGCC GU AGAGGCUCUACG- - - -. - -- - - -. - -. - - - - - - - - - - - - - - - - - - - -. - - - .- - - - - - - - C 5'-

UCUCUGAGAUGUU AAGCGGGCCAG UC AU

U GAUAUU UA

CC A

GAAUGUGG CUCCGCGGUUGGUGAGC AU

GCUCGGU C CA

A

GUU

CG- - GC

Conservation of proposed6S RNA Secondary Srtucture

Secondary Structure Features

H. influenzae

B. subtilis

GUUUUGA CC - - -

--

- .

ACAUUUUUUU

UU

GAGA U - .

- - - - - - - -

- - - -.. - - -.. - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - G

GUCCUGAU GUGUU GUUGUAA CA

CCUA GA

GA

UUUGGG GCCCA GCA

UUUUUAAAUGGC GUACAU

GCCUCUUU UCA U

AA A

CAGGGUUA CACGG CAACGUU C

GAAAGGAAACAAAACU GGU GC ACCC CGGGA A GAAAAUUUAGA ACAU UC AGGAGAAA GGU C CGU

CC A

UU U

- -

C U

-. - -- - --.

- -

- -.. - - -. - -. GUAGGC- - - GUCC- - - -CUGAGCC- - -. - - - - - -..- - - .. - - - - - - - -. - - - CCUGGAGUGUUCGUCA UAU AAAUUGGUUUCCUAUCGUUGGUGUGUAGGCUUAACCUUUGACU U A

CGCCCG CAGGAACUUGG

GGGUCUCACG GCGA UCAU UUUAGUCAA C U AUGGCAAAAGUCCAA AGAAUCGGGUUACU GU

CUG

AAA UCU U AC

UA U U

G

CC A AGUUCC U

GAU CUG

E. coli

3'-

A

-.

- - -

CGC CCCU

GCG G CGUCCGA

CAUUGGGAACUUGG

CCUU GA

ACCA A GAGCC

UC

CCACUUACACA AGCGUCAA A AUUCCGAAGAGCC GU AGAGGCUCUACG- - - -. - -- - - -. - -. - - - - - - - - - - - - - - - - - - - -. - - - .- - - - - - - - C 5'-

UCUCUGAGAUGUU AAGCGGGCCAG UC AU

U GAUAUU UA

CC A

GAAUGUGG CUCCGCGGUUGGUGAGC AU

GCUCGGU C CAA

GUU

CG- - GC

Conservation of proposed6S RNA Secondary Srtucture

bulge

internal loop

stem hairpin loop

(5)

Four Key Problems

•! predicting RNA secondary structure Given: RNA sequence

Do: predict secondary structure that sequence will fold into

•! searching for instances of a given structure

Given: an RNA sequence or its secondary structure Do: find sequences that will fold into a similar structure

•! modeling a family of RNAs

Given: a set of RNA sequences with similar secondary structure Do: construct a model that captures the secondary structure

regularities of the set

•! identifying novel RNA genes

Given: a pair of homologous DNA sequences

Do: identify subsequences that appear to have highly conserved RNA secondary structure (putative RNA genes)

RNA Folding Assumption

•! the algorithms we’ll consider assume that base pairings do not cross

•! for base-paired positions i, i’ and j, j’, with i < i’

and j < j’, we must have either

i < i’ < j < j’ or j < j’ < i < i’ (not nested)

i < j < j’ < i’ or j < i < i’ < j’ (nested)

•! can’t have i < j < i’ < j’ or j < i < j’ < i’!

i! i’! j! j’!

i! j! j’!i’!

(6)

Figure from Seliverstov et al. BMC Microbiology, 2005"

pseudoknot

Pseudoknots

•! these crossings are called pseudoknots

•! dynamic programming breaks down if pseudoknots are allowed

•! fortunately, they are not very frequent

Predicting RNA Secondary Structure:

the Nussinov Algorithm

[Nussinov et al., SIAM Journal of Applied Mathematics 1978]

key idea:

–! find maximal number of base pairings for a given sequence

–! do this using dynamic programming

•! start with small subsequences

•! progressively work to larger ones

(7)

DP in the Nussinov Algorithm

•! incrementally find the best secondary structure for subsequences [i, j]

•! consider four possibilities

i+1!

c

c u

c g c

a u g c g

j-1!

i! j!

i, j paired

c u

c g c

a u g c i+1!

i!

j!

c

i unpaired

c u

c g c

a u g c i!

j!

j-1!

c j unpaired

i!

c u

c g c

a u g c

c u

c g c

a u g c

c u j!

k! k+1!

bifurcation: combine two substructures

DP in the Nussinov Algorithm

•! let

•! initialization:

•! recursion

!"

=#

otherwise

0

ary complement are

and

if ) 1

,

( xi xj

j

$ i

!

"(i, j) = max

"(i +1, j)

"(i, j -1)

"(i +1, j -1) + #(i, j) maxi<k< j

[

"(i,k) + "(k +1, j)

]

$

%

&

&

'

&

&

L i

i i

L i

i i

to 1 for 0

) , (

to 2 for 0 ) 1 , (

=

=

=

=

!

"

"

max # of

paired bases in subsequence [i, j]

(8)

Nussinov Algorithm Traceback

break

) push(

) 1 ( push

) , ( ) , 1 ( ) , ( if : 1 to 1 for

else

) 1 , 1 ( push

pair base record

) , ( ) , ( ) 1 , 1 ( if else

) 1 , ( push ) , ( ) 1 , ( if else

) , 1 ( push ) , ( ) , 1 ( if else

continue if

) ( pop

empty is

stack until

repeat

stack onto ) 1 ( push

i,k , j k

j i j

k k

i j-

i k

j i

i, j

j i j

i j

i

j i j

i j

i

j i j

i j

i j i

i,j ,L

+

= +

+ +

=

! +

= +

! +

!

=

!

+

= +

"

#

#

#

#

$

#

#

#

#

#

Predicting RNA Secondary Structure by Energy Minimization

•! it’s naïve to predict folding just by maximizing the number of base pairs

•! however, we can generalize the key recurrence relation so that we’re minimizing free energy instead

[ ]

! !

"

! !

#

$

+ +

+

=

<

<

) , (

) 1 (

) ( min

) 1 (

) 1 ( min )

, (

j k i

j i P

j , k E k

i, E j- i, E

j , i E j

i E

case that i and j!

are base paired

(9)

Predicting RNA Secondary Structure by Energy Minimization

•! a sophisticated program, such as Mfold [Zuker et al.], can take into account free energy of the “local

environment” of [i, j]

!

P(i, j) = min

"(i, j) +LoopEnergy( j # i #1)

"(i, j)+ StackingEnergy(i, j, i +1, j #1) + E(i +1, j #1)

mink$1

[

"(i, j) +BulgeEnergy(k) + E(i + k +1, j #1)

]

mink$1

[

"(i, j) +BulgeEnergy(k) + E(i +1, j # k #1)

]

mink ,l $1

[

"(i, j) +LoopEnergy(k + l) + E(i + k +1, j # l #1)

]

%

&

' ' ' '

( ' ' ' '

Predicting RNA Secondary Structure by Energy Minimization

c u

c g c

a u

i! j!

c u c

a u g c

i!a u j!

g c i+k+1! j-1!

c u c

a u g c

i!a u j!

g c j-l-1!

i+k+1!

c u

c

a u c g g c

i! j!

i+1! j-1!

g c )

1 (

LoopEnergy )

,

(i j + j!i!

"

[

( ) BulgeEnergy( ) ( 1 1)

]

min1 + + + + !

" i, j k E i k , j

k #

!

"(i, j) + StackingEnergy(i, j, i +1, j #1)+ E(i +1, j #1)

[

( ) LoopEnergy( ) ( 1 1)

]

min, 1 + + + + + ! !

" i, j k l E i k ,j l

l

k #

References

Related documents

So-called “giant flux leaps” have been observed by a number of research groups investigating PFM [7,18,23-26], and more recently in unpublished experiments carried out in our

The main mechanisms include increased access to health insurance coverage through the expansion of Medicaid (government-sponsored health insurance for low-income children

All cases recruited for this study were considered to fol- low the included and excluded criteria. The inclusive cri- teria included a) studies that compared outcomes of par

Different conventional thermal treatments are used in the drying of biological products such as, hot-air drying, vacuum drying, sun- drying and freeze drying result in

PCAS: Post-cardiac arrest syndrome; GCS: Glasgow coma scale; GWR: Gray matter attenuation to white matter attenuation ratio; CAST score: Post-cardiac arrest syndrome for

plummerae the elongate, biseriate hairs are moderately common on the lower (abaxial) leaf surfaces, but are sparse on the upper (adaxial) surfaces often persisting only

2.5 Programska oprema Za izdelavo 3D informacijskega modela so na voljo programi, Autodesk Revit, Graphisoft ArchiCAD, Nemetschek Allplan in drugi [11], v katerih je omogočeno

CFA: Confirmatory Factor Analysis; JDR: Join Dementia Research; RMSEA: Root Mean Square Error of Approximation; RS-14: Resilience Scale – 14 items; SEMD: Self- Efficacy for