• No results found

A zero one programming model for RNA structures with arclength ≥ 4

N/A
N/A
Protected

Academic year: 2020

Share "A zero one programming model for RNA structures with arclength ≥ 4"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

A

 

zero

­

one

 

programming

 

model

 

for

 

RNA

 

structures

 

with

 

arc

­

length

 ≥ 

4

  

G.H.SHIRDEL AND N.KAHKESHANI

Department of Mathematics, Faculty of Basic Sciences, University of Qom, Qom, Iran

(Received August 1, 2012)

A

BSTRACT

In this paper, we consider RNA structures with arc-length4. First, we represent these structures as matrix models and zero-one linearprogramming problems. Then, we obtain an optimal solution for this problemusing an implicit enumeration method. The optimal solution corresponds toan RNA structure with the maximum number of hydrogen bonds.

Keywords: RNA structure,zero-one linear programming problem, additive algorithm.

1.

I

NTRODUCTION

A problem in mathematical biology is enumeration of RNA structures. The RNAhas an important role within cells and also, its functions depend on the structure of theRNA molecules. Hence, understanding of its helical configuration is important. TheRNA molecule is a sequence of four nucleotides A, C, G and U which plays an importantrole in Biological reactions. These nucleotides are connected to each other via hydrogenbonds. The formation of these bonds stabilizes the molecule by lowering its free energy[2]. The RNA structures can be displayed in various ways such as tree, linear encodingof tree, coarse grained representation, homeomorphically irreducible tree and diagram[3, 4]. In this paper, we represent another two models for RNA structures. Accordingto [4], the diagram representation is defined as following:

Let ( , )

n n G

G

n V E

G  be a directed graph such that

1, ,

and

(, )|1

. ]

[n n E i j i j n

V

n

n G

G       

n

G V and

n

G

E are called the sets of vertices and arcs, respectively. Each directed graphcan be

(2)

hydrogen bonds, respectively. We attribute two parameters to the diagrams: the minimum arc-length, , the minimum stack-length, . In diagram representation, the length of an arc

) ,

(i j is ji and a stack of length  is a sequence of the parallel arcs like

))) 1 ( ), 1 ( ( , ), 1 , 1 ( ), ,

((i j ij i  j  , see Figure 1. We denote the number of RNA structures with 4 and  1 over [n] by S4,1(n).

Figure1. RNA Structure with 4, 2.

The remainder of this paper is organized as follows. In sections 2 and 3, we represent RNA structures with arc-length 4and stack-length 1 as matrix models and zero-one linear programming problems, respectively. Also, we express the results forenumeration of RNA structures. In section 4, we use the additive algorithm for solving linear programming problems with binary variables only [1]. The general notion of the additivealgorithm is based on testing a few number of possible solutions, m

2 (in which m is thenumber of variables), of a problem instead of all solutions. In other words, through thismethod, some of the possible solutions of the problem are left unexamined. Also,the zero-one linear programming problems can be solved using each of the generalinteger programming techniques. Finally, conclusions and future works are discussed insection 5.

2.

M

ATRIX

M

ODELS

Each RNA structure over [n], Gn, corresponds to a nn matrix. We display this matrix by M(Gn)[mij] such that

Theorem 1. Suppose Gn is an RNA structure with arc-length 4over [n]. Then we have

   

  

22 21

12 11

) (

A A

A A G

M n (1)

whereA110(n4)4, A21044, A22 04(n4) and A12is the (n4)(n4)upper triangular

matrix.

. )

, ( 0

) , ( 1

   

  

n n

G G ij

E j i if

E j i if m

1      2         3         4         5         6         7         8      9        10        11       12

(3)

 

Proof. For all (i,j), where ij,

n

G E j i, )

( . Therefore,M(Gn) is an upper triangular

matrix. Since Gn is a directed graph with arc length 4, we have(i,j)EGnand mij 0

for all (i,j), where ji3. Therefore, three diagonals above theprincipal diagonal are

zero. Then, the matrixM(Gn) is of the form (1).

We now write the upper triangular matrix A12 as follows:

Theorem 2. S4,1(n)is equal to the number of situations in which the entries aboveand on

the principal diagonal in A12can be equal to 1 such that for each 1in4and 5 jn,

at most one of the entries

mi(i4),,min,mj(j4),,mjn,m1i,,m(n4)i,m1j,,m(n4)j

be 1.

Proof. LetGnis an RNA structure with arc-length 4over [n]. The degree of each vertex

n

G is at most 1. Therefore, each row and column A12 is0 or ek, where1kn4.

Suppose that there exist 1in4 and 5 jnso that mij 1. Then

n

G E j i, )

( . Since

the degree of each vertexGn is at most 1,we have

n

G E h i i

k, ),(, )

( for each kiand hi,

where hj. Similarly, we have

n

G E j k h

j, ),( , )

( for each hjand kj, where ki.

This completes our argument.

3.

Z

ERO

-

ONE

L

INEAR

P

ROGRAMMING

P

ROBLEMS

Here, we present a zero-one linear programming problem for displaying of RNA structures with arc-length 4 and stack-length 1 .Since each row of the matrixA12is 0 or ek, where

4

1kn , we have the following constraints:

    

 

    

 

n n

n n

m m m

m m

m

) 4 (

2 26

1 16

15

0 0 0

    

(4)

                                                         1 1 1 1 1 ) ( 1 1 1 1 ) ( ) 4 ( ) 5 ( ) 1 )( 5 ( ) 6 ( ) 1 )( 6 ( ) 2 )( 6 ( ) 7 ( ) 3 )( 7 ( 5 59 4 48 3 37 2 27 26 1 17 16 15 n n n n n n n n n n n n n n n n n n n n n x x x x x x x x x x b x x x x x x x x x x x a       

Similarly, since each column of matrix A12is 0 or ek, where 1kn4, we have the

following constraints:

(5)

                                                                                 1 1 1 1 1 1 1 ) ( ) 4 ( ) 4 )( 8 ( ) 4 ( 2 ) 4 ( 1 ) 5 ( ) 1 )( 5 ( ) 5 )( 9 ( ) 5 ( 2 ) 5 ( 1 ) 6 ( ) 1 )( 6 ( ) 2 )( 6 ( ) 6 )( 10 ( ) 6 ( 2 ) 6 ( 1 8 ) 13 ( 8 ) 12 ( 8 48 38 28 18 7 ) 12 ( 7 ) 11 ( 7 37 27 17 6 ) 11 ( 6 ) 10 ( 6 26 16 5 ) 10 ( 5 59 15 n n n n n n n n n n n n n n n n n n n n n n n n n n n n x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x e          

Based on the following lemma, some of the constraints are extra and they can beomitted.

Lemma 1. Let S

x|Axb,x0

, where A is a mn matrix with rank m and m

bR .

Let two constraints

i j k k k i

k b x x x x b

x

x1  and 1  1

belong to the set of constraints Axb. Then the constraintx1xkbiis extra.

Proof. Suppose that S be the feasible region after deleting the constraint x1xkbi. Let the constraintx1xkbi isn't extra. ThenSS. Let

S x x, , n) 

( 1  and(x1,,xn)S. Therefore,

.

and 1 1

1 xk bi x xk xk xk j bi

x       

Introducing the slack variables y0and z0, we have the following constraints in standard form:

.

and 1 1

1 xk z bi x xk xk xk j y bi

x             Therefore,

(3)

On the other hand,

, 0 1     

x y z

xk k j (4) .

0 1     

(6)

Since xk1,,xkj,y0and z0. But, this is a contradiction. Then SS.

According to Lemma 1, the constraints (b)and (c)are extra. Therefore, we can delete them. Now, we define the Problem A as follows:

Problem A:

E j i x

n j

x x

n n n n j x

i x

t s

x Max

ij

n

j i

ji j

i ij j

i ij n

i j

ij E j i

ij

  

 

 

    

 

  

 

  

) , ( 1

, 0

4 , , 6 , 5 1

, 1 , 2 , 3 1

4 , 3 , 2 , 1 1

. .

4 4

1 4

1 4 ) , (

where E

(i,j)| ji4

.

The number of variables and constraints of the Problem A has been presented inTable 1.

Theorem 3. S4,1(n)is equal to the number of feasible solutions of the Problem A. Also,

thenumber of optimal solutions of the Problem A is equal to the number of RNA structures with arc length 4 and maximum number of arcs over [n].

Proof. The Problem A is written on the basis of the matrix model (1). Therefore, Theorem 2 guarantees that S4,1(n) is equal to the number of feasible solutions. Sincethe objective

function is equal to the sum of the variables and the Problem A is themaximization problem, then among the feasible solutions, the optimal solution belongs to the one in which the maximum number of xijvariables would be equal to 1. So, the optimal solution

has maximum number of arcs over [n].

4.

U

SING

A

DDITIVE

A

LGORITHM FOR

S

OLVING THE

P

ROBLEM

A

(7)

 

1. Its objective function should be in the form of minimization. 2. The coefficients of the objective function should be nonnegative. 3. All the constraints must be of the  type.

Table 1.The Number of Variables and Constraints of the Problem A.

n 6 7 8 9 10 11 n12

The number of variables 3 6 10 15 21 28 8n60 The number of

constraints

2 4 6 9 10 11 n

Bysetting xij 1xij, the Problem A is converted into the following problem:

Problem B:

E j i x

n j

n x

x

n n n n j j

x

i n

i x t

s

x Min

ij

n

j i

ji j

i ij j

i ij n

i j

ij E j i

ij

  

 

     

    

  

 

   

  

 

   

) , ( 1

, 0

4 , , 6 , 5 8

, 1 , 2 , 3 5

4 , 3 , 2 , 1 4

. .

4 4

1 4

1 4 ) , (

Now, we can apply the additive algorithm for solving the Problem B. In Table 2, we list the optimal solutions for n6,,10.

Example 1. For n = 7, the problems A is as follows:

0 , , , , ,

1 1

1 1 .

.

37 27 26 17 16 15

37 27 17

26 16

27 26

17 16 15

37 27 26 17 16 15

 

 

 

 

  

     

x x x x x x

x x x

x x

x x

x x x t

s

x x x x x x z Max

 

(8)

. 3 , 1 ,

0 15 26 37 *

27 17

16 xxxxxz

x

This solution is corresponding to the RNA structure which is shown in Figure 2.  

 

 

Figure 2. Optimal Structure for n7.

The matrix model of this optimal structure is as follows:

Table2. The Optimal Solutions of the Problem A, for n6,,10.

n Optimal solution Optimal solution

value

6 1, 0

16 26

15 xx

x 2

7 1, 0

27 17 16 37

26

15 xxxxx

x 3 8 0 , 1 38 28 27 18 17 16 48 37 26 15           x x x x x x x x x x 4 9 0 , 0 , 1 49 39 38 29 28 27 26 19 18 17 15 59 48 37 16                x x x x x x x x x x x x x x x 4 10 0 , 0 , 0 , 1 ) 10 ( 5 ) 10 ( 4 ) 10 ( 3 ) 10 ( 2 ) 10 ( 1 59 48 39 37 29 28 26 19 18 17 16 ) 10 ( 6 49 38 27 15                      x x x x x x x x x x x x x x x x x x x x x 5                       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0

 1         2         3         4      5         6         7   

(9)

 

5.

C

ONCLUSION

Poolsap et al. in [5] represented an integer programming problem for the RNA structures. Their model is complicated with many variables. But, in here, we represented another programming problem for the RNAstructures such that the number of variables is lessthan the number of variables in [5]. The optimal RNA structure obtained by solving this problem is of arc-length 4 and has the maximum number of hydrogen bonds. Inother words, formation of these bonds stabilizes the structure by lowering its free energy over [n].If the enumeration of the feasible solutions of the zero-one linear programming problem is possible, then we are able to enumerate the RNA structures with arc-length 4over

]

[n .also, in this case, the number of optimal solutions will be equal to the number of optimal RNA structures. Therefore, a future work can be the enumeration of the RNA structures using matrix and linear programming models.

R

EFERENCES

1. E. Balas, An additive algorithm for solving linear programs with zero-one variables, Operations Research, Vol. 13, No. 4 (1965) 517546.

2. R. T. Batey, R. P. Rambo, J. A. Doudna, Tertiary motifs in RNA structure and folding, Angew. Chem. Int. Ed. 38 (1999) 23262343.

3. I. L. Hofacker, P. schuster, P. F. Stadler, Combinatorics of RNA secondary structures, Discrete Appl. Math. 88 (1998) 207237.

4. E. Y. Jin, C. M. Reidys, Combinatorial design of pseudo knot RNA, Adv. Appl. Math. 42 (2009) 135151.

Figure

Table 1.The Number of Variables and Constraints of the Problem A.

References

Related documents

Such a collegiate cul- ture, like honors cultures everywhere, is best achieved by open and trusting relationships of the students with each other and the instructor, discussions

Marketing Theory under a Service Dominant Logic Perspective”, Naples Forum on Service, June 9-12, (2015) Naples, Italy.. troviamo di fronte ad un più alto livello di

The second input parameter timeout is a Python integer which is measured in 1/10s, and represents the time of waiting for the string to be sent to the AT command interface,

It improved the concept of relations between fuzzy sets, soft sets and hybrid models of these two types of sets because the relations have been constructed based on the Q-NSSs

No studies to our knowledge have examined all three health measures (MVPA, sedentary behavior, components of sleep quality) using accelerometers to 1) compare health behaviors

For the feature ranking and selection procedures, it is not surprising that CAD score is the most relevant fea- ture for the combined strategy, as it is a complete screening strategy

Gundry ,Reading Ebook The Plant Paradox: The Hidden Dangers in "Healthy" Foods That Cause Disease and Weight Gain By Steven R., M.D.. Gundry ,Reading Ebook The Plant

Para asegurar que todos los alumnos reciben una educación matemática de calidad, “todas las partes interesadas tienen que cooperar para tener clases de matemáticas donde