• No results found

Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network

N/A
N/A
Protected

Academic year: 2022

Share "Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network

Dilpreet Kaur

Department of Computer Science and Engineering PEC University of Technology

Chandigarh, India [email protected]

Dr. Shailendra Singh

Associate Professor, Department of Computer Science and Engineering PEC University of Technology

Chandigarh, India [email protected]

Abstract- Proteins form a very important part of a living cell. The biological functions are carried out by the proteins within the cell by interacting with other proteins in other cells. This is called protein-protein interaction. Protein-Protein Interactions are very important in understanding the diseases and finding their cause. It can also provide the basis for new therapeutic approaches. A number of classifiers have been developed till date to classify protein-protein interactions namely SVM, SVM-KNN, Back- propagation Neural Network (BPNN). In this work Jordan Recurrent Neural Network (JRNN) is used to classify the protein-protein interactions. The classifier developed for this work uses amino acid composition of proteins as input to classify the percentage of interacting and non-interacting proteins.

The results obtained were best at the threshold value of zero. The classifier gives an accuracy of 97.25%

which is 8.7% more than BPNN. The overall accuracy of JRNN for threshold ranging from -1 to +1 with a difference of 0.1 comes out to be 80.1%.

Keywords: Protein-Protein Interaction; Jordan Recurrent Neural Network (JRNN); Amino acid composition; Back-Propagation Neural Network

I. INTRODUCTION

Bioinformatics is a conceptualization of biology in terms of molecules i.e. in sense of physical-chemistry and then applying informatics techniques, derived from math, computer science and statistics, to understand and organize the information associated with these molecules on a large scale. Bioinformatics is more of a tool than a discipline, the tools for analysis of Biological Data. The primary goal of bioinformatics is focus on developing and applying computationally intensive techniques (e.g. pattern recognition, data mining, machine learning algorithms and visualization) to increase the understanding of biological processes. The bioinformatics is extremely broad and is rapidly changing, particularly in recent years. The current scope of bioinformatics is mainly at bimolecular level particularly on macromolecules such as DNA, RNA. Proteomics one of the fields of bioinformatics deals with the study of proteins especially its structure and function. Proteins work in collaboration with other proteins so the main goal of proteomics is predict the proteins that interact [14].

Prediction and classification of interacting and non-interacting proteins is helpful in improving the understanding of diseases. Protein-Protein Interaction has become a very important research area now a days.

Protein-Protein Interactions occur when proteins bind together to carry out some biological function. The most important molecular process in a cell such as DNA replication is carried out by large number of protein components organizes by their protein-protein interactions [15]. Protein interactions are studied in the aspect of biochemistry, quantum chemistry, molecular dynamics, chemical biology, signal transduction and other metabolic or genetic/epigenetic networks. Most of the biological functions are performed due to the protein- protein interactions. For example, signals from the exterior of a cell are mediated to the inside of that cell by protein–protein interactions of the signaling molecules. This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases.

The classification of interacting and non-interacting proteins has been done using various classifiers till date namely SVM [1], SVM-KNN [2], BPNN [3] but no classifier gave better accuracy. In this work a classifier is developed using Jordan Recurrent Neural Network (JRNN). The JRNN classifier takes amino acid composition of proteins as input. Amino acid composition has been calculated for different purposes in [4] [5]. In [4] the

(2)

local com the amino This pap classifier Protein–p identify a materials tells abou network.

A. Datase Proteins proteins.

protein d work a d DSSP [1 positive interactin dataset is patterns a B. Amino The mos can conta this kind known pr The simp formulate where R1

according

Pre- Proce

•Data desig alrea datab intera non i prote

mposition or c o acid compos er is divided r, results given protein interac and catalog p s and methods ut the data ma The phases in

et of Interacti perform their A number of databases deve dataset is dev

3]. The datase patterns wer ng residues in s used becaus are equal. The o Acid Compo t typical sequ ain its most co d of approach

roteins. Thus, plest discrete ed as follows.

1 represents th g to the amino

essing

set was gned using ady existing

bases of acting and interacting ein pairs

composition p sition.

into different n by the classi ction predictio physical intera s used to dev aterial used, am

nvolved in the

ing and Non-In r functions w f protein data eloped includ veloped from et developed re randomly n its center w

e machine-lea e dataset deve

sition Calcula uential represe omplete inform

failed to wor various discr e model is us Given a prote

he 1st residue o acid compos

Amino A Compos Calcula

•Amino a composi the datas created w calculate a perl pr already designed GPSR gr

profile of patt t sections that ifier and at the II. MATER on is a field co actions betwe

elop the class mino acid cal e development

Figure 1. Pha

Interacting Pro within a cell b

abases has be des UniProt [6

already existi have equal nu picked from while negative

arning techniq loped include ation

entation for a mation. This i rk when a que rete models we sing the amin ein sequence P

e of the protei sition (AAC) m

Acid sition ation

acid ition of set was ed using rogram d by

roup

ters is calculat t include the m e end the conc

RIALS AND ombining bioi een pairs or g sifier for prote

culation of pr t of the classif

ases of Classifier

oteins by interacting

en developed 6], SwissProt

ing databases umber of inter

the pool of e patterns con ques are more

s 753 positive protein samp is an obvious ery protein di ere proposed.

no acid comp P with L amin

n P, R2 repres model, the pro

Network Design

•Jordan Recurrent Neural Netw was designe and trained using CRAN software to classify interacting a non-interact protein pair

ted by the aut material and m clusion and fu

METHODS informatics an

roups of prot ein-protein in roteins, and ex

fier are shown

r Development

with each ot d in past years

[7], PDB [8]

s namely Pfam racting and no f interacting ntain non-inte e efficient in le e patterns and ple is its entire

advantage of id not have si position (AAC no acid residu

sents the 1st r otein P of Eq.

work ed N R

and ting rs

Tes Ne

•Af tri ne ne tes da sh ca pr int no pa

thors i.e. a pa method that a ture scope of nd structural b teins. This sec nteraction clas

xplanation of n in Figure 1.

ther and by p s by various r ], HPRD [9]

m [10], 3DID on-interacting proteins. Pos eracting resid

earning when 656 negative e amino acid the sequential ignificant hom C) to represe

es, I [16].

residue of the 1 [16] can be (

sting etwork

fter the iaining of the etwork the etwork was

sted on the te ata created to how its apability to redict

teracting and on-interacting airs

attern is repre are used to de the work is di biology in an a

ction describe ssification. Th

Jordan recurr

passing signal researchers. T

and Pfam [10 D [11], Negato g proteins, in

sitive pattern dues in its cen n negative and

patterns.

(AA) sequen l model [16].

mology to the nt protein sa

(1)

e protein P an expressed by 2)

est

d g

Post- Proce

•Predic were with t value match actual then t netwo predic accur

esented by evelop the iscussed.

attempt to es various his section ent neural

s to other The major 0]. In this ome [12], which the ns contain

nter. This d positives

nce, which However, attribute- amples, as

d so forth

essing

cted values compared the actual

s, if it hes the

l value the ork has

cted it ately

(3)

where fi (i=1,2,3,...,20) are the normalized occurrence frequencies of the 20 native amino acids in P and T the transposing operator. Accordingly, the amino acid composition of a protein can be easily derived once the protein sequencing information is known.

In this work the sequence is represented by a vector a vector of dimension 21 as used in [4] which represents twenty natural amino acids and one dummy amino acid ‘‘X’’. Amino acid composition of a pattern was computed using the following formula [4] [5]:

ܿ݋݉݌ሺ݅ሻ ൌோሺ௜ሻ (3) where comp(i) is the fraction of residue or composition of residue of type i. Ri and N are number of residues of

type i and total the number of residue in protein i (length of protein) respectively.

C. Jordan Recurrent Neural Network

The Jordan Neural Network is a simple recurrent network (SRN) developed by Michael I. Jordan [18] in 1986.

The context layer holds the previous output from the output layer and then echos that value back to the hidden layer's input. The hidden layer then always receives input from the previous iteration's output layer [17]. Jordan neural networks are generally trained using genetic, simulated annealing, or one of the propagation techniques.

Jordan neural networks are typically used for prediction. The architecture of Jordan Recurrent Neural Network is shown in Fig. 2.

In this work a Jordan Recurrent Neural Network based classifier is designed using RSNNS [19] package of CRAN R [20]. The network used five-fold cross validation to train and test the input data. The neural network used JE_BP learning function, which is a standard back-propagation training function, to train the network.

III. RESULTS

The results of the Jordan Recurrent Neural Network classifier are shown in Table I. There are a total of 1379 protein pairs that are taken out of which 753 are interacting protein pairs and 656 are non-interacting protein pairs.

Output Layer

Hidden Layer

Input Layer Context Unit

1 xt-1 xt-12 xt-13 Figure 2. Jordan Recurrent Neural Network

From the confusion matrix shown in table I the sensitivity of Jordan recurrent neural network classifier is found to be 95.09% and the specificity is 99.84%. These values show that Jordan recurrent neural network classifier can differentiate between interacting and non-interacting protein pair with high probability. The positive predictive value (PPV) and negative predictive value (NPV) are calculated to be 99.86% and 94.41%

respectively. The high values of PPV indicate that Jordan recurrent neural network classifier can correctly identify interacting protein pairs.

(4)

A. Compa The com [3] is don accuracy

The Spe Network

The Sen Network interactin

Posi

Nega

arison with Ba mparison of Jor

ne on the bas y values for JR

TABL

ecificity comp is shown in F

nsitivity comp is shown in F ng.

8 8 8 9 9 9 9 9

Percentage

itive

ative (T

ack-Propagat rdan neural n sis of sensitiv RNN and BPN

E II SPECIF

Classif Parame

BPN JRNN

parison of Jo Figure 3.

Fi

parison of Jo Figure 4. The

4 6 8 0 2 4 6 8

Positive

TP 717

FN Type II Error

37

Sensitivity=

95.09%

tion Neural Ne etwork classif vity, specificity NN.

FICITY, SENSITI

fier/

eter

Speci

NN 86 N 99.

ordan neural

igure 3. Specific

ordan neural value of sens BPNN

Clas

SP

(T

r)

S

Network fication mode y and accurac

IVITY AND AC

ficity Sens

6.0 9

84 9

network clas

ity Comparison o

network clas sitivity gives t ssifier Type

PECIFICI

Negative

FP Type I Error)

1

TN 625

Specificity=

99.84%

el is done with cy. Table II s

CURACY VALU

sitivity Ac

91.1

95.9 9

ssification mo

of BPNN and JRN

ssification mo the percentage

JRNN

TY

)

PP 99.8

NN 94.4

h Back-propa hows the spe

UES OF BPNN &

ccuracy

88.5 97.25

odel with Ba

NN

odel with Ba e of interactin SPEC

PV=

86%

NV=

41%

gation Neural cificity, sensi

& JRNN

ack-propagatio

ack-propagatio ng proteins cla

CIFICITY

l Network itivity and

on Neural

on Neural assified as

(5)

The Acc Network

B. Discus The Jord predict a classifica and non- interactin classifica The analy and non- classifica

Proteomi of living number interactio

curacy compa is shown in F

ssion dan neural net

and classify i ation model ha

-interacting p ng and non-i ation model ca ysis, interpret -interacting pr ation among in

ics is the large g organisms, a of technique ons. The tech 8 8 8 9 9 9 9 9

Percentage

84 86 88 90 92 94 96 98

Percentage

Fig

arison of Jor Fig. 5.

Fi

twork classifi interacting an as increased b protein pairs.

interacting p an correctly id tation and com rotein pairs p nteracting and IV.

e-scale study o as they are th es have been hniques devel 84

86 88 90 92 94 96 98

gure 4. Sensitivit

rdan neural n

igure 5. Accuracy

cation model nd non-intera by 8.7%. The

Jordan neura protein pairs dentify up to 9 mparison of JR prove that Jor

d non-interacti . CONCLU of proteins, pa he main com n developed

oped in past BPNN

Clas

SE

BPNN Clas

AC

ty Comparison of

network class

y Comparison of

used the ami cting protein

accuracy imp al network cl with an acc 97.25% of pro RNN with vari dan recurrent ing protein pa USION AND F

articularly the mponents of th

for the iden years are sti ssifier Type

ENSITIVI

ssifier Type

CCURAC

f BPNN and JNN

sification mo

BPNN and JNNC

ino acid comp pairs. The a provement has lassification m curacy of 97 otein pairs as p ious technique t neural netwo airs.

FUTURE SCO eir structures a

he physiologi ntification an ill far from p

JRNN

ITY

JRNN

CY

NCM

odel with Ba

CM

position of pr accuracy of J s helped to be model can cla 7.25% i.e. Jo pairs with and es for the clas ork classifier

OPE

and functions.

cal metabolic nd classificati perfect. The J

SENS

AC

ck-propagatio

rotein pairs a Jordan neural etter classify in

assify protein ordan neural

without inter sification of in

is a better m

Proteins are v c pathways of

ion of prote Jordan neural SITIVITY

CCURACY

on Neural

s input to l network nteracting n pairs as

network ractions.

nteracting method for

vital parts f cells. A in-protein l network

(6)

classification model tries to overcome this problem. The Jordan Neural Network takes amino acid composition of protein pairs to classify them interacting and non-interacting. On comparing, Jordan neural network classification model is found to have higher accuracy (97.25%) as compared to BP neural network (88.55). The percentage improvement is 8.7%.

Jordan neural network classification model outperforms the other methods for protein-protein interaction classification. Jordan neural network classification model proves to be better model with higher accuracy along with improved specificity and sensitivity than the various existing techniques.

A. Future Scope

Jordan recurrent neural network classifier the input given had almost equal positive and negative patterns. It gives the output which shows very good results nearly equal to perfect. In this model the input can be changed i.e. the input file can be altered having more negative patterns and less positive patterns as compared to the negative patterns to get better results than the results given by Jordan neural network classification model with input file having equal negative and positive patterns.

The Jordan Neural Network can also use other parameters related to proteins to predict and classify protein- protein interactions. These parameters include the six physiochemical properties of proteins namely assessable residues, buried residues, hydrophobicity, molecular weight, polarity and average area buried as used in [3].

REFERENCES

[1] Lishuang Li, Linmei Jing et. al., “Protein-Protein Interaction Extraction from Biomedical Literatures Based on Modified SVM-KNN”, IEEE International Conference on Natural Language Processing and knowledge Engineering, pp. 1-7, 2009.

[2] Hong-Wei Liu, “Protein-Protein Interaction Detection by SVM from Sequence Information”, The Third International Symposium on Optimization and Systems Biology, pp. 198-206, 2009.

[3] Zhiqiang Ma, Chunguang Zhou et. al., “Predicting Protein-Protein Interactions Based on BP Neural Network” IEEE Conference on Bioinformatics and Biomedicine Workshops, pp. 3-7, 2007.

[4] Agarwal S, Singh H et. al. “Identification of Mannose Interacting Residues Using Local Composition”, PLoS ONE, 2011.

[5] Gajendra PS Raghava, Joon H Han, “Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein”, BMC Bioinformatics, 2005.

[6] Cathy H. Wu, Rolf Apweiler, Amos Bairoch et. al., “The Universal Protein Resource (UniProt): an expanding universe of protein information”, Nucleic Acids Research,vol. 34, pp. 187–191, 2006.

[7] Amos Bairoch, Rolf Apweiler, “The SWISS-PROT protein sequence data bank and its supplement TrEMBL”, Nucleic Acids Research, vol. 25, no. 1, pp. 31–36, 1997.

[8] Helen M. Berman, John Westbrook et. al., “The Protein Data Bank”, Nucleic Acid Research, vol. 28, no.1, pp. 235-242, 2000.

[9] Suraj Peri, J. Daniel Navarro, Troels Z. Kristiansen, Ramars Amanchy et. al., “Human protein reference database as a discovery resource for proteomics”, Nucleic Acids Research, vol. 32, pp. 497-501, 2004.

[10] Robert D. Finn, John Tate et. al., “The Pfam protein families database”, Nucleic Acids Research, vol. 36, pp. 281–288, 2008.

[11] Amelie Stein, Robert B. Russell and Patrick Aloy, “3did: interacting protein domains of known three-dimensional structure”, Nucleic Acids Research, vol. 33, pp. 413–417, 2005.

[12] Pawel Smialowski, Philipp Page et. al., “The Negatome database: a reference set of non-interacting protein pairs”, Nucleic Acids Research, pp. 1–5, 2009.

[13] Kabsch W, Sander C, “Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features”, Biopolymers, pp. 2577-2637, 1983.

[14] http://en.wikipedia.org/wiki/Proteomics

[15] http://en.wikipedia.org/wiki/Protein%E2%80%93proteininteraction [16] http://en.wikipedia.org/wiki/Pseudo_amino_acid_composition [17] http://www.heatonresearch.com/wiki/Jordan_Neural_Network

[18] Jordan, M.I., “Serial Order: A parallel Distributed Processing Approach”, Tech. rep. Report, pp. 86-104, 1986.

[19] Christoph Bergmeir, José M. Benítez, “Neural Networks in R using the Stuttgart Neural Network Simulator”, Repository CRAN, 2012.

[20] W. N. Venables, D. M. Smith, “R: A Programming Environment for Data Analysis and Graphics”, Version 2.15.0, 2012.

References

Related documents

uncorrelated with the background error. Right panel, with correlated errors. the analysis error variance) is minimum. Yet, the principles and theory still hold in

related complications. Sixty percent of the neonates as- signed to early treatment received endotracheal intuba- tion and 43% received calf lung surfactant extract at a median age

In modal analyses, the dynamic characteristics of UGS tubes were investigated by taking into added mass and fluid-structure interaction models, in which natural frequencies

Therefore, this paper proposes a semi-supervised spatio-temporal convolution network for recognition of surgical workflow, taking laparoscopic cholecystectomy surgical video data as

BioMed Central World Journal of Surgical Oncology ss Open AcceCase report The Merendino procedure following preoperative imatinib mesylate for locally advanced gastrointestinal

In this review, we included 35 trials described in 45 publications because they measured the impact on test ordering behavior of CCDSSs that gave suggestions for ordering or

For example, we note that the analytical pyrolysis results of samples from object 207 [ 10 ], in which the alum content was largely ammo- nium alum, reflect a less degraded

Perumal et al BMC Pregnancy and Childbirth 2013, 13 146 http //www biomedcentral com/1471 2393/13/146 RESEARCH ARTICLE Open Access Health and nutrition knowledge, attitudes and practices