
5.3 Ambiguity

5.3.3 Efficiency

Ambiguous recurrences are recurrences specialized for one type of evaluation. It is therefore reasonable to assume that Zuker allowed ambiguity to speed up the computation or to save space, because certain structural distinctions are irrelevant for the evaluation function in question, which in our case is the minimization of free energy. It does not matter whether we evaluate the energy of an internal loop in more than one place, and in more than one way, because the structure either has minimal free energy or it does not. Worse, the end result would still be correct if one of the alternative computations gave wrong results, as long as these were always higher in value. In the next section a canonical parser will be presented for the nearest neighbor model.
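The effect of ambiguity on different evaluation functions can be illustrated with a toy search space. The following is a minimal sketch and not part of the thesis code: the candidate lists, their energy values, and the names Candidate, minEnergy, countStructures and distinctStructures are assumptions made for illustration only. A candidate that is derived twice leaves the minimum unchanged but inflates a count.

```haskell
import Data.List (nub)

-- Hypothetical candidate: a structure in dot-bracket notation and its energy.
type Candidate = (String, Double)

-- Search space as an unambiguous parser would enumerate it.
unambiguous :: [Candidate]
unambiguous = [("((...))", -3.2), ("(.....)", -1.1), (".......", 0.0)]

-- The same space as an ambiguous parser might enumerate it:
-- one structure is derived twice.
ambiguous :: [Candidate]
ambiguous = ("(.....)", -1.1) : unambiguous

-- Minimization ignores duplicates.
minEnergy :: [Candidate] -> Double
minEnergy = minimum . map snd

-- Counting derivations is sensitive to duplicates.
countStructures :: [Candidate] -> Int
countStructures = length

-- Number of distinct structures, for comparison.
distinctStructures :: [Candidate] -> Int
distinctStructures = length . nub . map fst
```

In this sketch minEnergy gives the same value for both lists, whereas countStructures differs from distinctStructures on the ambiguous list.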

6 Canonicity (Wuchty’s Algorithm)

The aim of a canonical parser for a given model is to produce all legal solutions for a given input (problem) without producing duplicates.

Definition 18 «Canonical Models and Canonical Yield Grammars» Let K be a set, the canonical model. Let k be a mapping from L(G) to K. A yield grammar (G, y) is canonical w.r.t. K and k if it is unambiguous and the mapping k is bijective. A DP algorithm is canonical w.r.t. K and k if the underlying yield grammar is canonical w.r.t. K and k.
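For a finite, explicitly enumerated language the bijectivity requirement can be checked directly. The sketch below is an illustration under assumptions, not part of the thesis code: the type Parse, the function isBijectiveOnto, and the example data are hypothetical, the list of parses is assumed to contain each parse exactly once, and the check covers only the condition on k, not the unambiguity of the grammar itself.

```haskell
import Data.List (nub, sort)

-- Hypothetical parse representation: a derivation identifier plus the
-- structure (in dot-bracket notation) that the derivation stands for.
data Parse = Parse { derivationId :: Int, structure :: String }

-- k is bijective onto the model exactly when the sorted images of all
-- parses coincide with the sorted, duplicate-free model.
isBijectiveOnto :: Ord m => (p -> m) -> [p] -> [m] -> Bool
isBijectiveOnto k parses model = sort (map k parses) == sort (nub model)

-- Toy canonical model with two elements.
model :: [String]
model = ["(...)", "....."]

canonicalParses, ambiguousParses, incompleteParses :: [Parse]
canonicalParses  = [Parse 1 "(...)", Parse 2 "....."]
ambiguousParses  = canonicalParses ++ [Parse 3 "....."]  -- k is not injective
incompleteParses = [Parse 1 "(...)"]                     -- k is not surjective
```

Here isBijectiveOnto structure canonicalParses model holds, while the same check fails for ambiguousParses and incompleteParses.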

Wuchty and others (Wuchty et al., 1999) presented DP recurrences for the nearest neighbor model because they wanted to produce statistics of free energy folding landscapes. According to Zuker (Zuker, 2000), they reinvented the recurrences of Williams and Tinoco (Williams and Tinoco, 1986) in the process. To avoid this happening again, we will base our ADP recognizer on the schema given in Wuchty’s paper. We thus gain a more general implementation of the nearest neighbor model for RNA secondary structure formation that is not restricted to free energy calculation. This implementation will form the basis on which further improvements and enhancements will be presented in this thesis.

6.1 Wuchty’s Algorithm in ADP

module Wuchty where

import RNA
import Combinators
import FreeEnergy
import EnergyTables
import Structure
import Utils
import List

wuchty99 algebra = axiom external where

(el,sadd,cons,sr,hl,bl,br,il,ml, concat,ul,addss,ssadd,nil, h,h_l,h_s) = algebra

external = el <<< q struct

The enclosing structure of an RNA secondary fold is the external loop, which consists of a chain of substructures.

where struct = listed (
        sadd <<< base     +~~ q struct |||
        cons <<< p closed ~~~ q struct |||
        nil  ><< empty                 ... h_s)

This chain may be empty or consist of any number of unpaired bases and closed substructures. The tabulation used here takes the form of a one-dimensional array (listed and q) because the right boundary of all statements is fixed to n, the 3’-end of the RNA sequence. Remember that ADP combinator parsers always have to consume the complete input to succeed. We want the complete RNA sequence folded, not only a part of it. Canonicity is ensured because every statement involved is right-recursive and enforces a unique structure element (base or closed) on its left side. Moreover, only the empty statement ends the recursion.
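The one-dimensional tabulation can be sketched outside the ADP combinator library. The following is a minimal sketch under stated assumptions: structTable and closedToy are hypothetical names, closedToy is a stand-in for the real table of closed substructures (it pretends that every interval of length 5 or more holds exactly one closed structure), and the evaluation counts chains instead of building them. Each cell depends only on the left boundary i because the right boundary is always n.

```haskell
import Data.Array

-- Hypothetical stand-in for the table of closed substructures:
-- exactly one closed structure on every interval of length >= 5.
closedToy :: Int -> Int -> Integer
closedToy i k = if k - i >= 5 then 1 else 0

-- Number of chains on the suffix (i, n), stored in a lazily
-- evaluated one-dimensional array indexed by the left boundary i.
structTable :: Int -> (Int -> Int -> Integer) -> Array Int Integer
structTable n closed = table
  where
    table = listArray (0, n) [cell i | i <- [0 .. n]]
    cell i
      | i == n    = 1                        -- nil: the empty suffix
      | otherwise =
          table ! (i + 1)                    -- sadd: one unpaired base
            + sum [ closed i k * table ! k   -- cons: closed part, then rest
                  | k <- [i + 1 .. n] ]
```

With these toy assumptions, structTable 6 closedToy ! 0 evaluates to 4: the chain of six unpaired bases, a closed structure on the first or last five bases with one unpaired base beside it, and a closed structure spanning all six bases.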

where closed = tabulated (
        ((stack ||| hairpin ||| leftB ||| rightB ||| iloop ||| multiloop)
         `with` basepairing) ... h)

  where

    stack   = sr <<< base +~~ p closed ~~+ base
    hairpin = hl <<< base +~~ (region `with` minLoopSize 3) ~~+ base
    leftB   = bl <<< base +~~ region ~~~ p closed ~~+ base
    rightB  = br <<< base +~~ p closed ~~~ region ~~+ base
    iloop   = il <<< base +~~ region ~~~ p closed ~~~ region ~~+ base

The statements stack, hairpin, leftB, rightB, and iloop are old friends: each is a one-to-one chain of its constituent elements, including the closing base pair (see Section 3.3). As such, there is no room for ambiguity.

    multiloop = ml <<< base +~~ p block ~~~ p comps ~~+ base

    comps = tabulated (
            concat <<< p block ~~~ p comps |||
                       p block             |||
            addss  <<< p block ~~~ region  ... h_l)

    block = tabulated (
            ul    <<<            p closed |||
            ssadd <<< region ~~~ p closed ... h_l)

A multiple loop consists of two or more blocks and an optional trailing region of unpaired bases at the 3’-end. A block in turn is a paired structure with an optional 5’-region of single-stranded bases. Again, uniqueness of structures is ensured by strictly disjoint statements.
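The uniqueness of this decomposition can be made concrete with a small abstraction. The sketch below is an illustration only: Item, Block and decompose are hypothetical names that do not occur in the thesis code, and the interior of a multiple loop is simplified to a flat sequence of unpaired bases (U) and closed substructures (C). Because unpaired bases are always attached to the next closed structure on their 3’-side, and only the final run may trail, every sequence has at most one decomposition.

```haskell
-- Hypothetical abstraction of a multiloop interior:
-- U is an unpaired base, C is a closed substructure.
data Item = U | C deriving (Eq, Show)

-- A block: the number of unpaired bases on the 5' side,
-- followed by exactly one closed substructure.
newtype Block = Block Int deriving (Eq, Show)

-- Split the interior into its blocks and the number of trailing
-- unpaired bases; fail if there are fewer than two blocks.
decompose :: [Item] -> Maybe ([Block], Int)
decompose items
  | length blocks >= 2 = Just (blocks, trailing)
  | otherwise          = Nothing
  where
    (blocks, trailing) = go items
    go xs = case span (== U) xs of
      (us, C : rest) -> let (bs, t) = go rest
                        in (Block (length us) : bs, t)
      (us, _)        -> ([], length us)   -- only unpaired bases remain
```

For example, decompose [U, C, C, U, U] yields Just ([Block 1, Block 0], 2), while decompose [C, U] yields Nothing because a single block does not form a multiple loop.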

The free energy algebra adds up the energies of sub-structures and evaluates the free energy of given structures by applying the energy functions given in Section 3.3.

wuchtyFreeEnergyAlg = (id,sadd,cons,sr,hl,bl,br,il,ml,
                       concat,ul,addss,ssadd,nil,
                       minimize,minimize,minimize) where

  sadd _ e = e

  cons :: Energy -> Energy -> Energy
  cons c e = e + c

  sr i e j = e + dg_sr (i,j)
  hl i _ j = dg_hl (i,j)
  bl i (_,n) e j = e + dg_bl (i,j) (n+1,j-1)
  br i e (m,_) j = e + dg_bl (i,j) (i+1,m)
  il i (_,n) e (u,_) j = e + dg_il (i,j) (n+1,u)
  ml _ b c _ = ml_init_penalty + b + c

  concat :: Energy -> Energy -> Energy
  concat b c = b + c

  ul c = ml_helix_penalty + c
  addss c r = c + ml_unpaired r
  ssadd r c = c + ml_helix_penalty + ml_unpaired r
  nil = 0.0::Energy

Counting works by multiplying sub-structure combinations and adding up structure alternatives.

wuchtyCountAlg = (id,sadd,cons,sr,hl,bl,br,il,ml,
                  concat,ul,addss,ssadd,nil,
                  addup,addup,addup) where

  sadd _ e = e
  cons c e = c * e
  sr _ e _ = e
  hl _ _ _ = 1
  bl _ _ e _ = e
  br _ e _ _ = e
  il _ _ e _ _ = e
  ml _ b c _ = b * c
  concat b c = b * c
  ul c = c
  addss c _ = c
  ssadd _ c = c
  nil = 1::Integer

The structure algebra is the direct application of the corresponding elements in the data type. In some cases functions have to be applied to perform convolved list concatenations.

wuchtyStructureAlg = (EL,sadd,(:),SR,HL,BL,BR,IL,ml,
                      (++),(:[]),addss,ssadd,[],
                      id,id,id) where

  sadd m [] = SS (m-1,m) : []
  sadd m (SS (i,j):c) = SS (m-1,j) : c
  sadd m c = SS (m-1,m) : c
  ml i b c j = ML i (b ++ c) j
  addss c r = c ++ [SS r]
  ssadd r c = [SS r, c]
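The behaviour of sadd in the structure algebra can be tried in isolation. The following is a minimal sketch with a reduced, hypothetical data type: Component and its constructor Closed are assumptions standing in for the structure type of the thesis, and the sketch assumes, as the clauses above suggest, that the base parser passes its right boundary m, so that a single base covers the region (m-1,m). The first and third clause of the original are folded into one, since both prepend a fresh one-base region.

```haskell
-- Reduced, hypothetical component type: a single-stranded region
-- or some closed substructure, each with its subword boundaries.
data Component = SS (Int, Int) | Closed (Int, Int)
  deriving (Eq, Show)

-- Prepend the unpaired base ending at position m. If the chain already
-- starts with a single-stranded region, extend that region to the left
-- instead of creating a new one.
sadd :: Int -> [Component] -> [Component]
sadd m (SS (_, j) : c) = SS (m - 1, j) : c
sadd m c               = SS (m - 1, m) : c
```

Under these assumptions, foldr sadd [] [1, 2, 3] gives [SS (0,3)], so three consecutive unpaired bases end up in one region, and sadd 1 [Closed (1,8)] gives [SS (0,1), Closed (1,8)].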
