
International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)


siRNA Efficiency Prediction Using Support Vector Machine

Reena Murali1, David Peter S.2

1 Assistant Professor, Rajiv Gandhi Institute of Technology

2 Principal, School of Engineering, Cochin University of Science and Technology

Abstract – RNA interference (RNAi) is a selective gene silencing mechanism initiated by double-stranded RNA (dsRNA). Short RNA species called siRNAs are formed from dsRNA and can degrade messenger RNA (mRNA). This knockdown prevents the mRNA from producing the amino acid sequences responsible for gene expression. siRNA thus alters the regulatory role of mRNA during gene expression by translational inhibition. Recent studies show that up-regulation of mRNA can cause serious diseases such as cancer, so designing effective siRNAs with good knockdown effects plays an important role in cancer detection and diagnosis. In this work we develop a tool for predicting effective siRNAs using a machine learning technique, the Support Vector Machine.

Keywords – messenger RNA (mRNA), RNA interference (RNAi), short interfering RNA (siRNA), double-stranded RNA (dsRNA), Support Vector Machine (SVM).

I. INTRODUCTION

During gene expression, DNA is first transcribed into messenger RNA (mRNA). The information for a particular gene is encoded in the mRNA, which then acts as a template for the production of proteins. When a gene is expressed, the information encoded in it is converted into amino acid sequences. mRNA thus plays a regulatory role in gene expression. In some cases, however, this normal regulatory role is altered, causing up- or down-regulation, which may lead to diseases such as cancer. Gene silencing is a mechanism for controlling the regulatory role of mRNA. Many studies show that non-protein-coding RNAs such as microRNA (miRNA) and short interfering RNA (siRNA) play an important role in gene silencing, cancer diagnosis and therapy.

RNAi is an important biological process in which selective gene silencing is initiated by double-stranded RNA (dsRNA). Selective gene silencing is widely used in gene expression analysis and functional genomic studies.

The short RNA species called siRNAs are either formed from double-stranded RNA (dsRNA) or synthesized externally and then introduced into the cell. When activated by the RNA-induced silencing complex (RISC), siRNA degrades complementary messenger RNA sequences. This is called mRNA knockdown by siRNA. The knockdown prevents the mRNA from producing the amino acid sequences responsible for gene expression. Gene expression can therefore be altered by siRNAs that are efficient enough to cause translational inhibition, so designing siRNAs with good knockdown efficacy plays an important role in cancer detection and diagnosis.

A. siRNA Efficacy

Numerous siRNA design tools have been introduced to synthesize possible siRNAs targeting a given mRNA. Different studies indicate, however, that of the possible siRNAs that can be synthesized against a particular target, only a fraction cause any degradation, and not all siRNAs produce equal knockdown effects [1]. The efficacy of siRNAs differs among target sites within the same target mRNA. It is therefore important to select effective siRNA sequences, that is, sequences that cause more than a certain percentage of the target mRNA to degrade. In most studies, siRNAs causing knockdown of more than 75% of the target mRNA are considered highly efficient, but the threshold varies with the level of silencing required. The goal of siRNA efficacy prediction is thus to aid in designing siRNA sequences that are highly efficient against their target mRNA sequences.

B. RNAi Pathway

Genes causing diseases can be controlled during gene expression by transcriptional, post-transcriptional, and post-translational intervention. Drugs for disease control have mostly been targeted at proteins, which corresponds to the post-translational phase. RNAi mainly targets the protein-producing mRNA and can thereby control disease earlier, at the post-transcriptional stage. RNAi has been used successfully in mice to target diseases such as AIDS [5], neurodegenerative diseases [6], high cholesterol [7] and cancer [8], with the hope of extending these approaches to treat humans.

C. siRNA Biogenesis

The RNA molecule is formed from a sequence of nucleotides (nt). A nucleotide is a subunit of DNA or RNA and consists of a base, a phosphate group and a sugar molecule. The base is one of adenine (A), guanine (G), cytosine (C) and uracil (U) in RNA, and one of adenine (A), guanine (G), cytosine (C) and thymine (T) in DNA. The complementary nucleotides of A, C, G and U are U, G, C and A respectively. When long dsRNA from an external source is introduced into the cell, it is recognized by Dicer, a member of the RNase III family of dsRNA-specific ribonucleases. Dicer cleaves the dsRNA to produce siRNA duplexes 19-21 nt long [9]. Each siRNA strand has a 5' phosphate group and a 3' hydroxyl group, with a 2 nt overhang at the 3' end [10].

The siRNAs created by Dicer-initiated cleavage then bind to the RNA-induced silencing complex (RISC). On RISC activation, the siRNA duplexes are unwound and separate into sense and antisense strands. Both the sense and the antisense strand of the siRNA are capable of directing RNAi, but specificity depends on the antisense strand, which is taken up by RISC [11]. The active RISC-siRNA complex then targets mRNA transcripts that have sequence complementarity with the siRNA sequence. The targeted mRNA sequences are cleaved into smaller fragments, which are then degraded [11]. This results in sequence-specific removal of the mRNA of the targeted genes, which are then not expressed at the protein level.
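The pairing rule stated above (A with U, C with G) is enough to derive the antisense strand for a given target site. The following is a minimal sketch of that step; the function names and the example site are illustrative assumptions, not part of the tool described in this paper.

```python
# Watson-Crick pairing for RNA, as stated in the text: A-U, C-G, G-C, U-A.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def complement(rna):
    """Return the base-by-base complement of an RNA sequence (same orientation)."""
    return "".join(COMPLEMENT[base] for base in rna.upper())


def antisense_strand(target_site):
    """Return the strand that pairs with target_site, written 5' to 3'.

    Paired strands run antiparallel, so the complement is reversed.
    """
    return complement(target_site)[::-1]


if __name__ == "__main__":
    site = "GAUUACAGGCAUCGAUCCA"  # hypothetical 19 nt target site, 5' to 3'
    print(antisense_strand(site))
```

Reversing the complement keeps both strands written in the conventional 5' to 3' direction, which matters for the position-based rules discussed in the next section.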

II. EXISTING RULES FOR SIRNA DESIGN

Features such as the positional, thermodynamic and secondary-structure properties of siRNAs are important in predicting efficiency [1],[9],[11],[12],[13]. The following sections briefly summarize the results of several studies.

A. Tuschl Rules

This technique is widely used for designing effective siRNAs. According to this algorithm [12], siRNA duplexes 21 nt long, with a 19 nt base-paired sequence and a 2 nt 3' overhang at both ends, mediate efficient cleavage of the target mRNA. The results of the study are summarized below.

• Select the target region from an mRNA sequence beginning 50-100 nt downstream of the start codon.

• First search for the 23 nt sequence motif AA(N19)TT.

• Search for the 23 nt sequence motif NA(N21) and convert the 3' end of the sense siRNA to TT.

• Finally search for NAR(N17)YNN, where R = A or G and Y = C or T.

• The target sequence should have a GC content of around 50%.

Fig. 1. siRNA
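The motif search in the list above can be sketched with regular expressions. This is a minimal illustration, not the authors' implementation: the function names are assumptions, the input is taken to be a DNA-alphabet (cDNA) sequence, motifs are tried in the listed order, and the 50-100 nt offset from the start codon and the TT conversion are left out.

```python
import re

# The three 23 nt motifs from the list above, over the DNA alphabet.
# N = any base, R = A or G, Y = C or T.
MOTIFS = [
    ("AA(N19)TT", r"AA[ACGT]{19}TT"),
    ("NA(N21)", r"[ACGT]A[ACGT]{21}"),
    ("NAR(N17)YNN", r"[ACGT]A[AG][ACGT]{17}[CT][ACGT]{2}"),
]


def gc_content(seq):
    """Return the GC content of seq as a percentage."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_targets(cdna):
    """Return (motif name, 0-based start, 23 nt site) for the first motif that matches."""
    cdna = cdna.upper()
    for name, pattern in MOTIFS:
        # A lookahead is used so that overlapping sites are all reported.
        hits = [(name, m.start(), m.group(1))
                for m in re.finditer(f"(?=({pattern}))", cdna)]
        if hits:
            return hits
    return []
```

Candidate sites returned by `find_targets` could then be filtered with `gc_content`, keeping those close to the 50% suggested in the last rule.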


Fig. 2. Biogenesis of siRNA

B. Reynolds Rules

Reynolds et al. [13] analyzed a set of 180 siRNAs. They divided the siRNAs into groups based on their functionality, in order to find properties that correlate strongly with functionality.

• <F50: knockdown of less than 50%

• >F50: knockdown of 50% or more

• >F80: knockdown of 80% or more

• >F95: knockdown of 95% or more

They described a set of eight rules on the siRNA sequence that are highly indicative of the extent of mRNA knockdown. These rules include:

• GC content between 30% and 52%

• Presence of nucleotide A at positions 3 and 19

• Presence of U at position 10

• Absence of G or C at position 19

• Absence of G at position 13

This algorithm assigns a score based on the number of rules satisfied, and siRNAs satisfying 6 or more rules are predicted to be functional.
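A rule-counting score of this kind is straightforward to express in code. The sketch below encodes only the rules that appear in the list above (treating "A at positions 3 and 19" as two rules), so it does not cover the full set of eight and the cutoff of 6 cannot be carried over unchanged. The function name and the choice of 1-based positions along a 19 nt sense strand are assumptions.

```python
def reynolds_score(sense):
    """Count how many of the rules listed above a 19 nt sense strand satisfies.

    Positions are 1-based along the sense strand, read 5' to 3'.
    """
    s = sense.upper().replace("T", "U")
    if len(s) != 19:
        raise ValueError("expected a 19 nt sense strand")
    gc = 100.0 * sum(base in "GC" for base in s) / len(s)
    rules = [
        30.0 <= gc <= 52.0,   # GC content between 30% and 52%
        s[2] == "A",          # A at position 3
        s[18] == "A",         # A at position 19
        s[9] == "U",          # U at position 10
        s[18] not in "GC",    # no G or C at position 19
        s[12] != "G",         # no G at position 13
    ]
    return sum(rules)
```

With the remaining rules of the published set added to `rules`, the returned count could be compared against the cutoff of 6 quoted above.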

C. Amarzguioui Method

Another study, by Amarzguioui et al. [9], follows a similar scoring method but identifies a different set of rules. They studied 46 siRNAs and identified the following features of the 19 nt siRNA that correlate with a knockdown of more than 70%:

• Difference in the number of A and U

• Presence of G or C at position 1

• Presence of A at position 6

• Absence of U at position 1

• Absence of G at position 19

• Presence of A/U at position 19

Each rule either adds or subtracts a point when it is satisfied. siRNAs with a score of 3 or more are considered efficient. In this study, functionality is indicated by a knockdown of 70%.

D. Stockholm Rules

This prediction algorithm by Chalk et al. [14] incorporates the thermodynamic properties of the siRNA. The rules, called the Stockholm rules, are summarized below.

• Total hairpin energy < 1

• Antisense 5' end binding energy < 9

• Sense 5' end binding energy in the range 5-9 (exclusive)

• GC content between 36% and 53%

• Middle (7-12) binding energy < 13

• Energy difference < 0

• Energy difference between -1 and 0

Using a scoring scheme that adds 1 for each rule satisfied, and a cutoff score of 6, efficient siRNAs can be detected. They further analyzed the siRNAs using the regression tree technique, but the energy parameters which were found to be statistically significant in their study did not get chosen as important features by this method.

E. Ui-Tei Rules

Ui-Tei et al. [15] analyzed 62 targets in mammalian cells and Drosophila cells and arrived at four features that siRNAs should satisfy simultaneously to cause efficient silencing:

• A/U at the 5' end of the antisense strand

• G/C at the 5' end of the sense strand

• At least five A/U bases in the 5' terminal one-third of the antisense strand

• Absence of any GC stretch of more than 9 nt in length

These rules were found applicable to mammalian cells but did not apply to Drosophila cells.
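Because all four features must hold at once, the check reduces to a conjunction of simple tests on the two strands. The following sketch assumes the sense strand is supplied 5' to 3', derives the antisense strand as its reverse complement, and takes the "5' terminal one-third" to be the first 7 nt of the antisense strand; these choices and the function names are assumptions made for illustration.

```python
import re

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def uitei_features(sense):
    """Evaluate the four features listed above for a sense strand given 5' to 3'."""
    sense = sense.upper().replace("T", "U")
    antisense = "".join(PAIR[base] for base in reversed(sense))
    third = antisense[:7]  # assumption: one-third of the strand taken as 7 nt
    return {
        "antisense_5p_AU": antisense[0] in "AU",
        "sense_5p_GC": sense[0] in "GC",
        "AU_rich_antisense_5p": sum(base in "AU" for base in third) >= 5,
        # "more than 9 nt" means a run of 10 or more G/C bases
        "no_long_GC_stretch": re.search(r"[GC]{10,}", sense) is None,
    }


def passes_uitei(sense):
    """Return True only when all four features are satisfied."""
    return all(uitei_features(sense).values())
```

Returning the individual features as a dictionary makes it easy to see which condition a rejected candidate fails.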

F. Hsieh Rules

Hsieh et al. [16] identified the following features, which distinguish effective from ineffective RNAi.

• Target sequences in the middle of the coding sequence resulted in significantly less silencing.

• Silencing by duplexes targeting the 3' untranslated region (UTR) is comparable with that of duplexes targeting the coding sequence.

• Pooling of four or five duplexes per gene results in highly efficient silencing.

• siRNA sequences seen to produce more than 70% knockdown had G or C in position 11 and T in position 19.

III. MATERIALS AND METHODS

Melting temperature is one of the important thermodynamic and stability factors of siRNAs. The input parameters, Gibbs free energy change (delta G) and melting temperature (Tm), were therefore calculated according to the nearest-neighbor model [17]. NetBeans IDE 6.9.1, the open-source integrated development environment, with the GlassFish Application Server was used to develop our tool. Apache Derby Network Server is used for the implementation of servlets and JSP. LIBSVM [18], a publicly available SVM program written in Java, is used to solve the classification problem. It contains implementations of the linear, polynomial, radial basis function, and sigmoid kernels.
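For reference, the nearest-neighbor model is commonly written in the form below. The text does not state which parameter set, strand concentration or salt correction was used, so the notation here is an assumption: x_i is the i-th base of an n nt strand, the sums run over adjacent base-pair stacks with tabulated values, R is the gas constant, C_T is the total strand concentration for a non-self-complementary duplex, and T_m is in degrees Celsius.

```latex
\begin{align}
\Delta H^\circ &= \Delta H^\circ_{\mathrm{init}} + \sum_{i=1}^{n-1} \Delta H^\circ(x_i x_{i+1}) \\
\Delta S^\circ &= \Delta S^\circ_{\mathrm{init}} + \sum_{i=1}^{n-1} \Delta S^\circ(x_i x_{i+1}) \\
\Delta G^\circ_T &= \Delta H^\circ - T\,\Delta S^\circ \\
T_m &= \frac{\Delta H^\circ}{\Delta S^\circ + R \ln (C_T / 4)} - 273.15
\end{align}
```

Both delta G and Tm therefore follow from the same two sums, which is why they can be computed together for each candidate siRNA.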

A. Training and Testing with SVM

Thus we could manually separate 359 siRNAs out of the 653 from [19]. For each siRNA the input parameters were calculated and the SVM was trained on them. The values are scaled to the range [0, 1] so that features with large numerical ranges do not dominate those with small numerical ranges. Testing is then done with the required mRNA sequences.
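Scaling to [0, 1] and preparing the training file can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the helper names are hypothetical, and the only external convention relied on is LIBSVM's sparse text format, in which each line is a label followed by 1-based `index:value` pairs.

```python
def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def to_libsvm_line(label, features):
    """Format one example as '<label> 1:<v1> 2:<v2> ...' for a LIBSVM input file."""
    pairs = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(features, start=1))
    return f"{label:+d} {pairs}"
```

The minimum and maximum of each feature would need to be stored and reused when scaling the test sequences, so that training and test data share the same scale.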

IV. RESULTS

The input is an mRNA sequence (Fig. 3). The SVM then classifies all possible siRNAs for the specified mRNA sequence as efficient or inefficient (Fig. 4). The siRNAs predicted by our tool were compared and analyzed against those of the Ambion siRNA Target Finder tool.
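Producing every candidate for an input mRNA amounts to sliding a fixed-length window along the sequence and classifying each window. The sketch below assumes 19 nt candidates and takes the classifier as a caller-supplied function standing in for the trained SVM; all names are illustrative.

```python
def candidate_sites(mrna, length=19):
    """Yield (1-based start, target site) for every window of the given length."""
    mrna = mrna.upper().replace("T", "U")
    for start in range(len(mrna) - length + 1):
        yield start + 1, mrna[start:start + length]


def gc_percent(seq):
    """Return the GC content of seq as a percentage."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def predict_all(mrna, classify, length=19):
    """Apply a caller-supplied classifier (a stand-in for the SVM) to each site."""
    return [(pos, site, classify(site))
            for pos, site in candidate_sites(mrna, length)]
```

A sequence of n nucleotides yields n - 18 candidates of 19 nt, each of which receives one label from the classifier.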


Fig 4. siRNA efficiency Prediction

V. DISCUSSION AND CONCLUSION

From the results, siRNA antisense strands with a GC content of 50-70% and a melting temperature of 60-75 were found efficient in most cases. On detailed analysis, all siRNA antisense strands with a GC content between 66% and 75% and a melting temperature between 70 and 75 were found efficient. Since melting temperature is one of the important thermodynamic and stability factors of siRNAs, we conclude that the efficiency of an siRNA is connected with its melting temperature. Thus, by using a Support Vector Machine, we could design efficient siRNAs with good knockdown effects, with a melting temperature greater than 70 and a GC content above 66%.

References

[1] T Holen, M Amarzguioui, MT Wiiger, E Babaie, and H Prydz, "Positional effects of short interfering RNAs targeting the human coagulation trigger tissue factor", Nucleic Acids Res, vol.30(8), pp. 1757-1766, 2002.

[2] A Fire, S Xu, MK Montgomery, SA Kostas, SE Driver, and CC Mello, "Potent and specific genetic interference by double stranded RNA in C. elegans", Nature, vol.391, pp. 806-811, 1998.

[3] C Cogoni and G Macino, "Post-transcriptional gene silencing across kingdoms", Genes Dev, vol.10, pp. 638-643, 2000.

[4] H Shi, A Djikeng, T Mark, E Wirtz, C Tschudi, and E Ullu,

[5] MA Martinez, A Gutierrez, M Armand-Ugon, J Blanco, M Parera, J Gomez, B Clotet, and JA Este, "Suppression of chemokine receptor expression by RNA interference allows for inhibition of HIV-1 replication", AIDS, vol.16(18), pp. 2385-2390, 2002.

[6] H Xia, Q Mao, SL Eliason, SQ Harper, IH Martins, HT Orr, HL Paulson, L Yang, RM Kotin, and BL Davidson, "RNAi suppresses polyglutamine-induced neurodegeneration in a model of spinocerebellar ataxia", Nature Medicine, vol.10, pp. 816-820, 2004.

[7] J Soutschek, A Akinc, B Bramlage, K Charisse, R Constien, M Donoghue, S Elbashir, A Geick, P Hadwiger, J Harborth, M John, V Kesavan, G Lavine, RK Pandey, T Racie, KG Rajeev, I Rohl, I Toudjarska, G Wang, S Wuschko, D Bumcrot, V Koteliansky, S Limmer, M Manoharan, and HP Vornlocher, "Therapeutic silencing of an endogenous gene by systemic administration of modified siRNAs", Nature, vol.432, pp. 173-178, 2004.

[8] A Borkhardt, "Blocking oncogenes in malignant cells by RNA interference - new hope for a highly specific cancer treatment?", Cancer Cell, vol.2(3), pp. 167-168, 2002.

[9] M Amarzguioui and H Prydz, "An algorithm for selection of functional siRNA sequences", Biochem Biophys Res Commun, vol.316(4), pp. 1050-1058, 2004.

[10] T Tuschl, "RNA interference and small interfering RNAs", Chembiochem, vol.2(4), pp. 239-245, 2001.

[11] J Martinez, A Patkaniowska, H Urlaub, R Luhrmann, and T Tuschl, "Single-stranded antisense siRNAs guide target RNA cleavage in RNAi", Cell, vol.110(5), pp. 563-574, 2002.

[12] SM Elbashir, W Lendeckel, and T Tuschl, "RNA interference is mediated by 21- and 22-nucleotide RNAs", Genes and Development, vol.15, pp. 188-200, 2001.

[13] A Reynolds, D Leake, Q Boese, S Scaringe, W Marshall, and A Khvorova, "Rational siRNA design for RNA interference", Nature Biotechnology, vol.22(3), pp. 326-330, 2004.

[14] AM Chalk, C Wahlestedt, and EL Sonnhammer, "Improved and automated prediction of effective siRNA", Biochem Biophys Res Commun, vol.319, pp. 264-274, 2004.

[15] K Ui-Tei, Y Naito, F Takahashi, T Haraguchi, H Ohki-Hamazaki, A Juni, R Ueda, and K Saigo, "Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference", Nucleic Acids Res, vol.32, pp. 936-948, 2004.

[16] AC Hsieh, R Bo, J Manola, F Vazquez, O Bare, A Khvorova, S Scaringe, and WR Sellers, "A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens", Nucleic Acids Res, vol.32(3), pp. 893-901, February 2004.

[17] A Panjkovich and F Melo, "Comparison of different melting temperature calculation methods for short DNA sequences", Bioinformatics, vol.21, pp. 711-722, 2005.

[18] CC Chang and CJ Lin, "LIBSVM: a library for support vector machines", http://www.csie.ntu.edu.tw/cjlin/libsvm/, 2001.

[19] Svetlana A Shabalina, Alexey N Spiridonov and Aleksey Y
