71 Copyright © 2016. Vandana Publications. All Rights Reserved.

Volume-6, Issue-6, November-December 2016

International Journal of Engineering and Management Research

Page Number: 71-77

A Review on Computational Methods Used in Construction of Protein-Protein Interaction Network

M. Mary Sujatha1, K. Srinivas2, R. Kiran Kumar3

1 Research Scholar, Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, INDIA

2 Professor, Department of CSE, VR Siddhartha Engineering College, Vijayawada, Andhra Pradesh, INDIA

3 Assistant Professor, Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, INDIA

ABSTRACT

Analysis of protein-protein interaction (PPI) networks helps in drug discovery and in predicting the function of a target protein. The in silico approach and its various computational methodologies mark a milestone in proteomics. Computational approaches reduce cost, required expertise and time when compared with traditional approaches such as in vivo and in vitro experiments. This survey discusses different computational approaches, namely genomic context based methods, homology based methods and biological mining techniques, together with their pros and cons in the area of protein-protein interactions. Altogether it gives an overview of the methods available for building an efficient protein-protein interaction network for further research.

Keywords— PPI, in silico, Amino Acids (AA), STRING, SVM

I. INTRODUCTION

Each living organism starts at the cell, and the function of a cell is monitored and governed by proteins. A protein is a large molecule consisting of amino acids, which the body and its cells need in order to function. A protein can perform its function on its own or interact with other proteins to do so. The association of protein molecules with one another is termed "protein-protein interaction" (PPI). Because its study is important in the development of novel drugs, it has become a subject of intense research in recent years [1-3]. PPI data can be visualized in the form of a network for better understanding of pathways. In other words, a protein-protein interaction network is a graph-like representation with a set of proteins as nodes and their relations as edges. An edge between two proteins can be constructed based on structure similarity, sequence similarity, functional similarity, domain similarity, or commonality in evolutionary process and history. There are many non-computational methods in this area, such as yeast two-hybrid (Y2H) [2], mass spectrometry [1], Tandem Affinity Purification (TAP) [1], protein chips and others [1,3], which belong to the in vitro and in vivo approaches. However, they are labor intensive and time consuming, and they produce experimental results of poor accuracy. They are also not applicable to all proteins in all organisms. Using PPI data in computational techniques gives a better understanding of human disease at the system level, which cannot be well explained through conventional methods. Recent developments in this area include studies of Alzheimer's disease [5], diabetes [6] and many other multigenic diseases. In this paper, computational methods based on genomic context, homology and biological mining are explained. Table 1 lists the available methods under each approach.

1. Genomic context methods
   Gene Neighborhood
   Gene Fusion
   Gene Expression
   Phylogenetic relationship

2. Homology based methods
   Sequence similarity
      Sequence based statistical methods: Mirror Tree, Co-evolutionary Divergence, PIPE
      Sequence based machine learning methods: Auto Covariance, AA Composition, Pairwise similarity, UNISPPI, AA Triad, ETB-Viterbi
   Structure similarity
      Structure based template methods: PRISM, PrePPI
      Structure based statistical methods: MEGADOCK, Meta Approach, PreSPI, DCC, PID Matrix score
      Structure based machine learning methods
   Domain similarity

3. Biological mining methods
   Random Forest and Decision Trees
   Support vector machine
   Bayesian classification
   Artificial neural networks

Table 1: List of computational methods

II. METHODOLOGY

2.1 Methods based on genomic context

The methods under this approach take gene sequences as their input.

2.1.1 Gene Neighborhood

Neighboring genes in an organism tend to encode functionally related proteins [7]. A protein encoded by one gene is inferred to be associated with the proteins encoded by neighboring genes. Gene neighborhood that is conserved across different species provides a strong signal for functional association between gene products. Fig1 shows a set of genomes with the neighboring proteins p1, p2 and p3. The method uses the idea of co-localization and is best suited to bacterial genomes, where gene neighboring is commonly observed. Despite its simplicity, it is therefore applicable mainly to prokaryotic organisms and not to eukaryotic ones. The performance of the gene neighborhood method varies with the choice of genes.

[Figure: gene order in Genome1, Genome2 and Genome3]

Fig1: PPI prediction by gene neighborhood approach
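The co-localization idea can be illustrated with a minimal sketch. It assumes each genome is given as an ordered list of gene names, and it reports pairs that are adjacent in at least a chosen number of genomes. The function name `conserved_neighbors`, the `min_genomes` parameter and the gene orders are hypothetical and are not taken from the figure.

```python
from collections import Counter


def conserved_neighbors(genomes, min_genomes=2):
    """Return gene pairs that are adjacent in at least `min_genomes` genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        # Adjacent genes, stored as unordered pairs so that orientation is ignored.
        adjacent = {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}
        counts.update(adjacent)
    return {tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders (illustrative only).
genomes = {
    "Genome1": ["p1", "p2", "p3", "p4"],
    "Genome2": ["p4", "p1", "p2", "p3"],
    "Genome3": ["p2", "p1", "p4", "p3"],
}
print(conserved_neighbors(genomes, min_genomes=3))  # {('p1', 'p2')}
```

Only the pair (p1, p2) is adjacent in all three toy genomes, so it is the only candidate at the strictest threshold. Real gene neighborhood methods also account for intergenic distance and strand, which this sketch omits.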

2.1.2 Gene Fusion

This method builds on the Rosetta stone method [8]. It proposes that the existence of a single fused polypeptide indicates a functional interaction between proteins. It is based on the concept that proteins having a single domain in one organism can fuse to form a multi-domain protein in another organism [17]. Fusion events are common among proteins in metabolic pathways. As gene fusion uses comparative genomics and evolutionary information, it can be treated as a complement to the gene neighborhood and phylogenetic profile methods. It produces reliable and informative functional relationships. Domain arrangements in different genomes can be used as information to predict protein-protein interactions; on the other hand, this is also the pitfall of the method, as it is applicable only to proteins with such domain arrangements.
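A rough sketch of the fusion idea follows, assuming each protein is already annotated with a set of domain identifiers. Two query proteins are linked when their domains co-occur in one protein of a reference organism. The function name, the protein names and the domain labels are all hypothetical.

```python
def rosetta_stone_pairs(query_proteins, reference_proteins):
    """Link query proteins whose domains co-occur in one reference protein."""
    predicted = set()
    names = sorted(query_proteins)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dom_a, dom_b = query_proteins[a], query_proteins[b]
            if dom_a & dom_b:
                continue  # shared domains make the fusion signal ambiguous
            for fused, domains in reference_proteins.items():
                # Both domain sets are contained in a single fused protein.
                if dom_a <= domains and dom_b <= domains:
                    predicted.add((a, b, fused))
    return predicted


# Hypothetical domain annotations.
query = {"A": {"d1"}, "B": {"d2"}, "C": {"d3"}}
reference = {"AB_fused": {"d1", "d2"}, "X": {"d3"}}
print(rosetta_stone_pairs(query, reference))  # {('A', 'B', 'AB_fused')}
```

The sketch makes the stated limitation visible: a protein without an annotated domain arrangement can never appear in the output.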

2.1.3 Gene Expression

Gene expression is the process of transferring functional gene information to proteins under different experimental conditions and time intervals [9]. Expression analysis allows high-throughput detection of the transcription levels of all genes in an organism. The set of genes active in a given state and the sets of genes co-regulated in other states can be determined by gene expression analysis. By applying clustering algorithms [10] to differentially expressed genes, the genes can be grouped according to their expression levels. Gene expression thus reveals functional relationships among genes. Gene co-expression is an indirect way to infer protein interaction. It suffers from cross hybridization and poor signal-to-noise ratios.
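As an illustration of co-expression as indirect evidence, the sketch below scores gene pairs by the Pearson correlation of their expression profiles and keeps pairs above a threshold. Pearson correlation, the threshold of 0.9 and the expression values are assumptions made for the example; the source does not prescribe a particular similarity measure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose expression profiles correlate above threshold."""
    genes = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if pearson(profiles[g], profiles[h]) >= threshold
    ]


# Hypothetical expression levels over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
print(coexpressed_pairs(profiles))  # [('geneA', 'geneB')]
```

A clustering algorithm would group genes rather than score pairs; the pairwise form is used here only because it maps directly onto candidate network edges.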

2.1.4 Phylogenetic relationship

This method detects the presence or absence of functionally linked proteins during the evolution process [11,12]. Functionally linked genes tend to occur together across distant species, and this linkage is inherited by their proteins during evolution. A phylogenetic profile describes the occurrence of a specific protein in a set of genomes as a vector; if two proteins share the same phylogenetic profile, it indicates that they are functionally linked. To construct the phylogenetic profile, a predetermined threshold is used to decide the presence or absence of homologous proteins in the target genome with respect to the source genomes. In Table 2 the presence or absence of a given protein in a given genome is indicated as "1" or "0" at each position of a profile. On the other hand, a phylogenetic tree [13,14] gives the evolutionary history of a protein. Its predictions are based on the belief that interacting proteins show similar trees because of co-evolution through the interaction.

Proteins   Genome1   Genome2   Genome3
P1         1         0         1
P2         1         0         0
P3         1         0         1
P4         0         1         0

Table 2: Phylogenetic profiles showing functional linkage between P1 and P3.
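The profile comparison in Table 2 can be reproduced with a few lines. The sketch encodes the four profiles exactly as tabulated and reports pairs with identical vectors. The function names are illustrative, and the Hamming distance helper is an added assumption about how near-identical profiles might be scored.

```python
# Presence (1) or absence (0) in Genome1, Genome2, Genome3, as in Table 2.
profiles = {
    "P1": (1, 0, 1),
    "P2": (1, 0, 0),
    "P3": (1, 0, 1),
    "P4": (0, 1, 0),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_by_profile(profiles, max_distance=0):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


print(linked_by_profile(profiles))  # [('P1', 'P3')]
```

With `max_distance=0` only identical profiles are linked, which matches the table caption.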

The following drawbacks should be noted. The method has a high computational cost, as it is based on full genome sequences. It is not suitable for proteins whose phylogenetic profiles show no difference. Two proteins that merely happen to share a phylogenetic profile may produce a false correlation. Events during co-evolution, such as gene duplication, contribute noise. Gene functions may be lost in the course of evolution, which can corrupt the phylogenetic profile of single genes. Phylogenetic relation [16] based methods showed satisfactory performance only on prokaryotes but not on eukaryotes.

2.2 Methods based on Homology

The homologous nature of a protein indicates its similarity to other proteins at different levels (sequence, structure and domain) in the species of origin.

2.2.1 Sequence Similarity

The sequence based prediction approach [18] rests on the concept that an interaction found in one species can be used to infer interactions in other species. Amino acid sequence information alone can be sufficient to predict protein-protein interactions. This information was mostly used in threading based approaches. A recent development is a learning algorithm that combines a support vector machine with a kernel function and a conjoint triad feature for describing amino acids. The prediction process starts by comparing a protein with annotated proteins in other species. If the protein has high sequence similarity to a protein with known function in another species, it can be assumed that the protein has the same function or similar properties. Computational methods used in protein-protein interaction prediction are broadly classified into two categories: statistical sequence based methods and machine learning based methods [19]. All these methodologies take amino acid sequences as their input.

2.2.1.1 Sequence based statistical methods

i. Mirror Tree

The mirror tree method compares the evolutionary distances between sequences of associated protein families, using phylogenetic trees, to predict protein interactions. It differs from other tree construction methods in that tree construction does not require sequences from all genomes to be present. The approach can be further improved by increasing the number of possible interactions, collecting sequences from a larger number of genomes.
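One common way to compare evolutionary distances between two protein families is to correlate their inter-species distance matrices. The sketch below assumes the two matrices have already been derived from trees or alignments over the same ordered set of species, and it uses Pearson correlation as the similarity score. The function names and the distance values are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirror_tree_score(dist_a, dist_b):
    """Pearson correlation between the distance matrices of two families."""
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical distances between orthologs in the same three species.
family_a = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
family_b = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
score = mirror_tree_score(family_a, family_b)
```

Here the second matrix is a scaled copy of the first, so the score is at its maximum of about 1. Families that evolve independently would be expected to give a lower score.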

ii. Co-evolutionary Divergence

Co-evolutionary Divergence (CD) [20] is defined as the absolute value of the difference in substitution rate between two proteins. It rests on two assumptions. First, interacting protein pairs may have similar substitution rates. Second, protein interactions are more likely to be conserved across related species. CD values of interacting protein pairs are therefore expected to be smaller than those of non-interacting pairs. The CD method combines co-evolutionary information of interacting protein pairs. It takes less time when compared with other methods.
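A minimal worked example, taking CD to be the absolute difference between the substitution rates of two proteins (an assumption about how the definition above is meant). Candidate pairs are ranked so that the smallest divergence comes first. The rates and protein names are hypothetical.

```python
def coevolutionary_divergence(rate_a, rate_b):
    """Absolute difference between two substitution rates."""
    return abs(rate_a - rate_b)


def rank_pairs(rates, pairs):
    """Sort candidate pairs by increasing co-evolutionary divergence."""
    return sorted(
        pairs,
        key=lambda p: coevolutionary_divergence(rates[p[0]], rates[p[1]]),
    )


# Hypothetical substitution rates per protein.
rates = {"A": 0.12, "B": 0.10, "C": 0.45}
ranked = rank_pairs(rates, [("A", "C"), ("A", "B"), ("B", "C")])
print(ranked)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Under the stated assumptions, the pair (A, B) would be the strongest interaction candidate because its rates are most similar.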

iii. PIPE

The Protein Interaction Prediction Engine (PIPE) [21] is an algorithm developed by Pitre et al. [22] to predict high-confidence PPIs for any target pair of yeast proteins given only their primary structure data. It is based on the assumption that interactions between proteins are mediated by a finite number of short polypeptide sequences, which are observed in a database of known interacting protein pairs. PIPE uses a sliding window approach, is computationally intensive and requires a large amount of running time. It is weak at detecting novel interactions in genome-wide, large-scale datasets, where it reports a large number of false positives, and its accuracy is limited compared with other methods. To address these limitations, PIPE2 was developed as an improved and more efficient version of PIPE.

2.2.1.2 Sequence based machine learning methods

i. Auto Covariance

Guo et al. [23] proposed a sequence-based method using Auto Covariance (AC) and Support Vector Machines (SVM). The physicochemical properties of amino acids were analyzed by AC, which is based on the calculation of covariance. A protein sequence was characterized by a series of ACs that capture the interactions between each amino acid and its 30 vicinal amino acids in the sequence. Finally, an SVM model with a Radial Basis Function (RBF) kernel [24] was constructed using the vectors of AC variables as input. The optimization experiment demonstrated that the interactions between one amino acid and its 30 vicinal amino acids contribute to characterizing the PPI information. SVM has strong foundations in statistical learning theory and has been successfully applied to various classification problems. As a computational advantage, SVM optimization has no local minima.
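The AC descriptor can be sketched as follows, assuming the usual lagged covariance of a property profile along the sequence. The property scale below is a toy three-letter scale with made-up values, and the normalization of real physicochemical scales is omitted. `max_lag` is set to 30 by default to match the 30 vicinal amino acids mentioned above, and the sequence must be longer than `max_lag`.

```python
def auto_covariance(values, lag):
    """Covariance between a property profile and itself shifted by `lag`."""
    n = len(values)
    mean = sum(values) / n
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return total / (n - lag)


def ac_features(sequence, property_scales, max_lag=30):
    """One AC value per (property scale, lag) combination."""
    features = []
    for scale in property_scales:
        values = [scale[aa] for aa in sequence]
        for lag in range(1, max_lag + 1):
            features.append(auto_covariance(values, lag))
    return features


# Toy scale with made-up values for a three-letter alphabet.
toy_scale = {"A": 0.0, "C": 1.0, "D": -1.0}
features = ac_features("ACDACDAC", [toy_scale], max_lag=3)
```

With several property scales and `max_lag=30`, each protein gets a fixed-length vector regardless of its sequence length, which is what allows the vectors to be fed to an SVM.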

ii. AA Composition

Roy et al. [25] examined the role of Amino Acid Composition (AAC) in PPI prediction and compared its performance against well-known features such as domains, the tuple feature and the signature product feature. Every protein pair is represented by AAC and domain features. AAC is represented by monomer and dimer features. Monomer features capture the composition of individual amino acids, whereas dimer features capture the composition of pairs of consecutive amino acids. AAC is a simple feature, computationally cheap, applicable to any protein sequence, and usable when domain information is lacking. AAC can be used alone or combined with other features to predict new interactions and validate existing ones. The approach was examined using three machine learning classifiers (logistic regression, SVM and Naive Bayes) on PPI datasets from yeast, worm and fly.
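The monomer and dimer features can be sketched as relative frequencies over the 20 standard amino acids. Representing a protein pair by concatenating the features of both proteins is an assumption made for this example, and the two sequences are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def monomer_composition(seq):
    """Relative frequency of each of the 20 amino acids."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dimer_composition(seq):
    """Relative frequency of each of the 400 consecutive amino acid pairs."""
    dimers = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [dimers[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def pair_features(seq_a, seq_b):
    """Concatenate monomer and dimer features of both proteins."""
    return (
        monomer_composition(seq_a) + dimer_composition(seq_a)
        + monomer_composition(seq_b) + dimer_composition(seq_b)
    )


# Hypothetical short sequences.
vector = pair_features("MKVLAAGIV", "MSTNPKPQR")
```

Each pair yields 2 x (20 + 400) = 840 values, and no domain annotation is needed to compute them.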

iii. Pairwise Similarity

Zaki et al. [26] proposed a PPI predictor based on pairwise similarity and protein primary structure. Each protein sequence is represented by a vector of pairwise similarities against large amino acid subsequences created by a sliding window approach. Each coordinate of this vector is the E-value of the Smith-Waterman (SW) score. The SW alignment score provides a relevant measure of similarity between proteins, and two proteins may interact by virtue of the similarity of the scores they produce. Each sequence in the testing set is aligned against each sequence in the training set; the number of positions with identical amino acids is counted and then divided by the total length of the alignment. These vectors are then used to compute the kernel matrix, which is used in conjunction with an RBF-kernel SVM. This work can be improved by incorporating knowledge about gene ontology, inter-domain linker regions and interacting sites to achieve more accurate prediction.

iv. UNISPPI

The Universal In Silico Predictor of PPI (UNISPPI) [27] rests on the finding that amino acid frequencies alone are sufficient to predict PPIs. Advantages of this methodology include simplicity and low computational cost.

v. AA Triad

Yu et al. [28] proposed a probability-based approach for estimating triad significance to alleviate the effect of the AA distribution in nature. The Relaxed Variable Kernel Density Estimator (RVKDE) [29] is employed to predict PPIs based on AA triad information. The method is summarized as follows. Each protein sequence is represented as AA triads by considering every three continuous residues in the protein sequence as a unit. To reduce the dimensionality of the feature vector, the 20 AA types are categorized into seven groups based on their dipole strength and side chain volumes. The method then scans the triads one by one along the sequence, and each scanned triad is counted in an occurrence vector, O. Subsequently, a significance vector, S, is proposed to represent a protein sequence by estimating the probability of observing fewer occurrences of each triad than the number actually observed in O. Each PPI pair is then encoded as a feature vector by concatenating the significance vectors of the two individual proteins. Finally, the feature vector is used to train an RVKDE PPI predictor.
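The scanning step that fills the occurrence vector O can be sketched as below. The text does not list the seven groups, so the grouping used here is a commonly cited seven-class scheme and should be read as an assumption. The significance vector S and the RVKDE classifier are not implemented.

```python
# Assumed seven-class grouping of the 20 amino acids (not given in the text).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def triad_occurrences(sequence):
    """Count every triad of consecutive residues, by class, in a 7^3 vector."""
    occurrences = [0] * (7 ** 3)
    for i in range(len(sequence) - 2):
        a, b, c = (CLASS_OF[aa] for aa in sequence[i:i + 3])
        occurrences[a * 49 + b * 7 + c] += 1
    return occurrences


def encode_pair(seq_a, seq_b):
    """Concatenate the occurrence vectors of the two proteins in a pair."""
    return triad_occurrences(seq_a) + triad_occurrences(seq_b)


# "AGVC" contains the triads AGV (classes 0,0,0) and GVC (classes 0,0,6).
O = triad_occurrences("AGVC")
```

Grouping reduces the triad space from 20^3 = 8000 to 7^3 = 343 entries per protein, which is the dimensionality reduction the paragraph refers to.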

vi. ETB-Viterbi

ETB-Viterbi is a decoding algorithm with an early trace back (ETB) mechanism. The Viterbi algorithm [30] is a dynamic programming algorithm for finding the most likely sequence of hidden states. The repeated iterations of Viterbi make it slow, and it is expensive in terms of memory and time.
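For reference, a plain Viterbi decoder is sketched below. It stores one table entry per state and time step, which illustrates the memory cost noted above. The early trace back mechanism itself is not implemented, and the two-state model with states "N" and "I" and symbols "a", "b", "c" is a toy example with made-up probabilities.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            layer[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(layer)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy hidden Markov model (all values are illustrative).
states = ["N", "I"]
start_p = {"N": 0.6, "I": 0.4}
trans_p = {"N": {"N": 0.7, "I": 0.3}, "I": {"N": 0.4, "I": 0.6}}
emit_p = {
    "N": {"a": 0.5, "b": 0.4, "c": 0.1},
    "I": {"a": 0.1, "b": 0.3, "c": 0.6},
}
path, prob = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
print(path)  # ['N', 'N', 'I']
```

The table grows with the sequence length times the number of states, and the path is only available after the full sequence has been processed; an early trace back variant aims to emit parts of the path sooner.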

2.2.2 Structure Similarity

and polar surface locations [31]. It determines not only whether two proteins interact with each other but also the physical characteristics behind that interaction. In the structure based approach, the sequence is taken as input. A common limitation of structure based PPI prediction is low coverage, as the number of known protein structures is smaller than the number of known protein sequences. The structure based interaction prediction technique threads the sequence onto all the protein complexes in the Protein Data Bank (PDB) [32] and then chooses the best potential match. Based on this match, it uses machine learning techniques to predict the interaction between two proteins. Structural context based methods are subclassified as template structure based approaches, statistical structure based approaches and machine learning approaches [19].

2.2.2.1 Structure based template methods

i. PRISM

Tuncbag et al. [33] developed PRISM [34] as a template-based PPI prediction method. The two sides of a template interface are compared with the surfaces of two target monomers by structural alignment. If regions of the target surfaces are similar to the complementary sides of the template interface, then the two targets are predicted to interact with each other through the template interface architecture. Whether a pair of proteins interacts is determined on the basis of predefined threshold values. Because PRISM is a template-based method, its prediction accuracy depends on the template dataset prepared. Only PPIs whose interaction surface structures are conserved are expected to be predicted.

ii. PrePPI

Zhang et al. [35] proposed PrePPI (Predicting Protein-Protein Interactions) as a structural alignment PPI predictor based on geometric relationships between secondary structure information. Given a pair of query proteins A and B, representative structures for the individual subunits (MA, MB) are taken from the Protein Data Bank (PDB). Close and remote structural neighbors are found for each subunit. A model is constructed by superposing the individual subunits, MA and MB, on their corresponding structural neighbors. The likelihood that each model represents a true interaction is then calculated using a Bayesian network. Finally, the structure-derived score is combined with non-structural information, including co-expression and functional similarity, in a naive Bayes classifier. A common limitation of all structure based PPI prediction approaches is low coverage, as the number of known protein structures is much smaller than the number of known protein sequences; such approaches therefore fail when no structural template is available for the queried protein pair.

2.2.2.2 Structure based statistical methods

i. MEGADOCK

Ohue et al. [36] developed MEGADOCK as a protein-protein docking software package using the real Pairwise Shape Complementarity (rPSC) score. First, they conducted rigid body docking calculations based on a simplified energy function considering shape complementarity, electrostatics and hydrophobic interactions for all possible binary combinations of proteins in the target set. Through this process, a group of high-scoring docking complexes was obtained for each pair of proteins. One limitation of this approach is the possibility of generating false positives in cases where no similar structures are found in known complex structure databases.

ii. Meta Approach

Ohue et al. [37] proposed a PPI prediction approach that combines template-based (PRISM) and docking (MEGADOCK) methods. A protein pair is considered to be interacting only if both PRISM and MEGADOCK predict that the pair interacts. Meta approaches have already been used in the field of protein tertiary structure prediction. The approach is limited by its need for consensus between two prediction methods that have different characteristics.

iii. PreSPI

PreSPI is a domain combination based method [38] that considers domain interactions as the basic units of protein interactions. It considers the possibility of domain combinations appearing in both interacting and non-interacting sets of protein pairs. PreSPI suffers from the following limitations. First, it ignores other domain-domain interaction information between protein pairs. Second, it assumes that one domain combination is independent of another. Third, the method is computationally expensive, as all possible domain combinations are considered.

iv. DCC

The Domain Cohesion and Coupling (DCC) [39] based PPI prediction method uses intra-protein domain interactions and inter-protein domain interactions. Domains cause proteins to interact regardless of the number of participating domains. Protein pairs are characterized by the domains present in each protein.

v. PID Matrix Score

The Potentially Interacting Domain (PID) [40] pair matrix is a domain based PPI prediction algorithm. The PID matrix score was constructed as a measure of interactability between domains. The PID matrix can also be used in mapping genome-wide interaction networks.

2.2.2.3 Structure based machine learning methods

i. Struct2Net

Singh et al. [41] introduced Struct2Net [42] as a structure based PPI predictor. It is a web server that makes structure-based computational predictions of protein-protein interactions. The input to Struct2Net is either one or two amino acid sequences in FASTA format. Given two protein sequences (or one sequence against all sequences of a species), Struct2Net threads the sequences onto all the protein complexes in the Protein Data Bank (PDB) [32] and then chooses the best potential match. Based on this match, it uses logistic regression to predict whether the two proteins interact. If one sequence is provided, the output is a list of interactors; if two sequences are provided, the structure based threading approach is used to predict their interaction.

2.2.3. Domain similarity

The functional unit of a protein is the domain, a distinct region of the protein sequence that is largely conserved in the process of evolution. Proteins can be characterized by their combination of domains, and proteins interact with each other through their domains to carry out biological functions. A protein may contain a single domain or multiple domains, each typically associated with a specific function. Domain-domain interactions (DDIs) [43] are used to predict interactions between

The maximum likelihood estimation method provides information on interacting domains that have consistent interactions.

2.3. Methods based on biological mining

Various binary classification methods such as the Bayesian classifier, Random Forest, Logistic Regression, Support Vector Machines and Decision Trees have been applied to protein-protein interaction (PPI) prediction [45]. When training a classifier, various data sources are used to distinguish positive examples of truly interacting protein pairs from negative examples of non-interacting pairs. The classifiers combine the available evidence of known interacting proteins (for labeling the training data) with indirect information such as Gene Ontology annotation, gene expression correlation and sequence homology. Comparison of different classifiers has shown that Random Forest Decision (RFD) consistently ranks as a top classifier, followed by Support Vector Machines. Recently, active machine learning algorithms [46] have come to play a key role in the selection of pairs of proteins for future experimental characterization, in order to achieve accurate prediction of human protein interactions.

2.3.1 Random Forest and Decision Trees

Random forest (RF) [47,48] is a technique that improves classification accuracy. RF is a practical classifier and is well suited to large datasets with many features, with no need for feature selection or feature deletion. The random forest classifier has several advantages: it is fast, simple and robust to outliers and noise. RF benefits from the randomization of decision trees, as individual trees have low bias and high variance. Decision trees [26], in turn, are flowchart-like tree structures in which each node denotes a test on an attribute value, each branch represents an outcome of the test, and the leaves represent classes or class distributions. Decision trees and random forests are widely used in bioinformatics and computational biology for biological data classification, especially in the prediction of protein-protein interactions. RFD builds decision trees from the domain composition of interacting and non-interacting proteins. It predicts whether two proteins interact by exploring all possible combinations of interacting domains, discriminating between the two classes of interacting and non-interacting pairs. The method stops growing a tree as soon as all pairs at a given node are well separated into the two classes. Traversing the tree provides a classification for an unknown protein pair. The method suffers from generalization error if forest coverage is large. Qi et al. [47] suggested that the randomization and ensemble strategy applied in random forests enables them to handle noise better. However, the computational cost of RF increases with the number of generated trees.
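The flowchart-like traversal and the ensemble vote can be illustrated with hand-written trees. The trees below are not learned from data; their attributes (`has_domain_pair`, `coexpressed`, `shared_go_term`) and the sample are hypothetical, and tree induction with random feature subsets is not shown.

```python
from collections import Counter


def classify(tree, sample):
    """Walk from the root to a leaf; leaves are class labels (strings)."""
    while isinstance(tree, dict):
        # Each internal node tests one boolean attribute of the protein pair.
        tree = tree["yes"] if sample[tree["attribute"]] else tree["no"]
    return tree


def forest_vote(trees, sample):
    """Majority vote over the classifications of all trees."""
    votes = Counter(classify(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hand-written trees standing in for a trained forest.
trees = [
    {"attribute": "has_domain_pair", "yes": "interacting", "no": "non-interacting"},
    {"attribute": "coexpressed", "yes": "interacting", "no": "non-interacting"},
    {
        "attribute": "shared_go_term",
        "yes": {"attribute": "has_domain_pair", "yes": "interacting", "no": "non-interacting"},
        "no": "non-interacting",
    },
]
sample = {"has_domain_pair": True, "coexpressed": False, "shared_go_term": True}
print(forest_vote(trees, sample))  # interacting
```

Two of the three trees vote "interacting", so the single dissenting tree is outvoted. This is the sense in which the ensemble is more tolerant of noise than any one tree.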

2.3.2 Support Vector Machines (SVM)

SVM is a supervised learning method, mainly applied to classification and regression problems. The main idea of an SVM is to separate the classes with a decision boundary that maximizes the margin between them. Objects classified with high certainty have large margins, and objects with uncertain classification have small margins. SVM is also called a kernel method [49]. It classifies both linear and non-linear data. It is widely used in classifying biological data as well as in the prediction of protein-protein interactions, with the help of the feed forward back propagation neural network method [50]. SVM is powerful in classifying arbitrarily complicated problems. On the negative side, it is complex, has large memory requirements, and is slow to train and evaluate. Another drawback is that its results depend strongly on its parameters.

2.3.3 Bayesian classification

Bayesian classifiers are statistical classifiers. A Bayesian classifier merges data from a wide variety of sources, including experimental results. Before a computational prediction is made, the likelihood that a particular potential protein interaction is a true positive is determined. Computational methods can only provide circumstantial evidence that a particular pair of proteins might interact; by combining multiple noisy datasets with Bayesian classifiers, reliable protein interactions can be predicted [51]. Many association methods do not consider missing and incorrect interaction data, which can be handled by Bayesian classifiers. The Maximum Likelihood Estimation (MLE) method can be used to estimate the parameters of a Bayesian model. The simplest Bayesian classifier is known as the naïve Bayesian classifier. It relies on the class conditional independence assumption, that is, it assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. It has been widely used in PPI prediction. It is a probabilistic classifier and a popular algorithm because of its simplicity, which lies in its computational efficiency and ease of interpretation. It is quite suitable for problems involving normal distributions, which are common in real-world problems. It is not suited to more complex classification problems.
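Under the independence assumption, evidence from several noisy sources can be combined by multiplying likelihood ratios onto the prior odds. The sketch below shows this calculation. The prior and the likelihood ratios are hypothetical numbers chosen for the arithmetic, not estimates from any dataset.

```python
def posterior_interaction_probability(prior, likelihood_ratios):
    """Combine independent evidence sources under the naive Bayes assumption.

    Each likelihood ratio is P(evidence | interacting) / P(evidence | non-interacting).
    """
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # independence lets the ratios simply multiply
    return odds / (1 + odds)


# Hypothetical values: a 1% prior and two supporting evidence sources.
p = posterior_interaction_probability(0.01, [10.0, 5.0])
```

With these numbers the posterior odds are (0.01 / 0.99) x 50 = 50/99, which gives a probability of 50/149, roughly 0.34. Two individually weak sources thus raise a 1% prior substantially, while correlated sources would violate the assumption and overstate the result.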

2.3.4 Artificial Neural Networks

Artificial Neural Networks (ANN) [52], or simply NNs, are based on the idea of simulating human intellectual abilities in engineering designs. One of the popular NN models used for modeling PPI is the Multi Layer Perceptron (MLP) [53]. An MLP is a feed forward artificial neural network that consists of multiple layers, each connected to the next layer by weighted edges. These edge weights are learned from the training dataset. Though artificial neural networks and support vector machines are more efficient in determining protein-protein interaction networks than other sequence based methods, they suffer from poor interpretability. A drawback of the MLP is that its model parameters are difficult to determine.
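The layered, weighted structure of an MLP can be shown with a forward pass alone. The sketch assumes sigmoid activations and fixed weights; the weights, biases and input features are hypothetical, and training by back propagation is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, layers):
    """Propagate an input vector through fully connected layers.

    `layers` is a list of (weights, biases); weights holds one row per neuron.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.2, -0.7]], [0.05]),
]
output = mlp_forward([0.9, 0.1], layers)
```

The output is a single value between 0 and 1 that could be read as an interaction score. The weights themselves carry no obvious meaning, which is the difficulty in understanding the model parameters noted above.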

V. CONCLUSION

survey on existing computational approaches and methods available in the study of protein-protein interactions, for further research to invent novel methods and tools that construct PPI networks by hybridizing multiple methods into one, such that more accurate network construction is possible.

REFERENCES

[1] V. Srinivasa Rao, K. Srinivas, G. N. Sujini and G. N. Sunanda Kumar, "Protein-Protein Interaction Detection: Methods and Analysis", Hindawi Publishing Corporation, International Journal of Proteomics, Volume 2014, Article ID 147648.

[2] Enright A.J., Skrabanek L. and Bader G.D., "Computational Prediction of Protein-Protein Interactions", Mol Biotechnol. 2008 Jan;38(1):1-17. Epub 2007 Aug 14.

[3] Javad Zahiri et al., "Computational Prediction of Protein–Protein Interaction Networks: Algorithms and Resources", Current Genomics, 2013, 14, 397-414.

[4] D. Knisley, J. Knisley, "Predicting protein–protein interactions using graph invariants and a neural network", www.elsevier.com/locate/compbiolchem, doi:10.1016/j.compbiolchem.2011.03.003

[5] K. Srinivas et al., "Protein interaction network for Alzheimer's disease using computational approach", published online 2013 Dec 6. doi:10.6026/97320630009968, PMCID: PMC3867649.

[6] Dr. K. Srinivas et al., "Association Analysis of Type 2 Diabetes proteins interaction networks", 188th OMICS group conference, 4th International Conference on Proteomics & Bioinformatics, Chicago, USA, 4-6 August 2014.

[7] Rogozin BI et al., "Computational approaches for the analysis of gene neighbourhoods in prokaryotic genomes", Brief Bioinform. 2004 Jun;5(2):131-49.

[8] Date SV, "The Rosetta stone method", Methods Mol Biol. 2008;453:169-80. doi:10.1007/978-1-60327-429-6_7.

[9] Benjamin A. Shoemaker, Anna R. Panchenko, "Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners", PLoS Comput Biol 3(4): e43. doi:10.1371/journal.pcbi.0030043.

[10] Daxin Jiang, Chun Tang, "Cluster Analysis for Gene Expression Data Survey", Department of Computer Science and Engineering, State University of New York at Buffalo.

[11] Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D.Yeates, T.O.: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96(8), 4285–4288 (1999).

[12] T. W. Lin, J. W. Wu, and D. Tien-Hao Chang, “Combining phylogenetic profiling-based and machine learning-based techniques to predict functional related proteins,” PLoS ONE, vol. 8,no. 9, Article IDe75940, 2013.

[13] R. A. Craig and L. Liao, “Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices,” BMC Bioinformatics, vol. 8, article 6, 2007.

[14] K. Srinivas, A. A. Rao, G. R. Sridhar, and S. Gedela, “Methodology for phylogenetic tree construction,” Journal of Proteomics & Bioinformatics, vol. 1, pp. S005–S011, 2008. 14. S. F. Altschul, T. L. Madden, A. A. Sch¨affer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein databasesearch programs,” Nucleic Acids Research, vol. 25, no. 17, pp.3389–3402, 1997.

[15] Sato T, Yamanishi Y, Kanehisa M, “The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships,” 2005 Sep 1;21(17):3482-9. Epub 2005 Jun 30.

[16] Anton J. Enright, Ioannis Iliopoulos et al., “Protein interaction maps for complete genomes based on gene fusion events,” Nature 402, 86-90 (4 November 1999). doi:10.1038/47056

[17] Shawn M. Gomez et al., “Learning to predict protein–protein interactions from protein sequences,” Bioinformatics (2003) 19(15):1875-1881. doi:10.1093/bioinformatics/btg352

[18] Hamid R Arabnia, “Emerging Trends in Computational Biology, Bioinformatics, and Systems,” ISBN: 978-0-12-802508-6, Elsevier Inc.

[19] Hsin Liu C, Li KC, Yuan S, “Human protein-protein interaction prediction by a novel sequence-based co-evolution method: co-evolutionary divergence,” 2013 Jan 1;29(1):92-8. doi: 10.1093/bioinformatics/bts620

[20] Sylvain Pitre, Frank Dehne, Albert Chan, Jim Cheetham et al., “PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs,” BMC Bioinformatics 2006, 7:365. doi:10.1186/1471-2105-7-365.

[21] S. Pitre, F. Dehne, A. Chan, J. Cheetham, A. Duong, A. Emili, M. Gebbia, J. Greenblatt, M. Jessulat, N. Krogan, et al., “Pipe: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs,” BMC bioinformatics, vol. 7, no. 1, p. 365, 2006.

[22] Y. Guo, L. Yu, Z. Wen, and M. Li, “Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences,” Nucleic acids research, vol. 36, no. 9, pp. 3025–3030, 2008.

[23] Yuehui Chen, Jingru Xu, Bin Yang, Yaou Zhao, “A novel method for prediction of protein interaction sites based on integrated RBF neural networks,” Computers in Biology and Medicine 42 (2012) 402–407.

[24] S. Roy, D. Martinez, H. Platero, T. Lane, and M. Werner-Washburne, “Exploiting amino acid composition for predicting protein-protein interactions,” PloS one, vol. 4, no. 11, p. e7813, 2009.

[25] N. Zaki, S. Lazarova-Molnar, W. El-Hajj, and P. Campbell, “Protein protein interaction based on pairwise similarity,” BMC bioinformatics, vol. 10, no. 1, p. 150, 2009.

[26] Valente et al., “The Development of a Universal In Silico Predictor of Protein-Protein Interactions,” http://dx.doi.org/10.1371/journal.pone.0065587

[27] C.-Y. Yu, L.-C. Chou, and D. T. Chang, “Predicting protein-protein Interactions in unbalanced data using the primary structure of proteins,” BMC bioinformatics, vol. 11, no. 1, p. 167, 2010.

[28] Y.-J. Oyang, S.-C. Hwang, Y.-Y. Ou, C.-Y. Chen, and Z.-W. Chen, “Data classification with radial basis function networks based on a novel kernel density estimation algorithm,” Neural Networks, IEEE Transactions on, vol. 16, no. 1, pp. 225–236, 2005.

[29] Colin Kern, “Exploring Long-Range Features in Biosequences for Structure and Interaction Prediction,” Springer.

[31] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, “The protein data bank,” Nucleic acids research, vol. 28, no. 1, pp. 235–242, 2000.

[32] N. Tuncbag, A. Gursoy, R. Nussinov, and O. Keskin, “Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using prism,”

[33] Utkan Ogmen, Ozlem Keskin, A. Selim Aytuna, “PRISM: protein interactions by structural matching,” Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W331-6.

[34] Q. C. Zhang, D. Petrey, L. Deng, L. Qiang, Y. Shi, C. A. Thu, B. Bisikirska, C. Lefebvre, D. Accili, T. Hunter, et al., “Structure based prediction of protein-protein interactions on a genome-wide scale,” Nature, vol. 490, no. 7421, pp. 556–560, 2012.

[35] M. Ohue, Y. Matsuzaki, N. Uchikoga, T. Ishida, and Y. Akiyama, “Megadock: An all-to-all protein-protein interaction prediction system using tertiary structure data.” Protein and peptide letters, 2013.

[36] M. Ohue, Y. Matsuzaki, T. Shimoda, T. Ishida, and Y. Akiyama, “Highly precise protein-protein interaction prediction based on consensus between template-based and de novo docking methods,” in BMC Proceedings, vol. 7, no. Suppl 7. BioMed Central Ltd, 2013, p. S6.

[37] Han et al., “PreSPI: a domain combination based prediction system for protein–protein interaction,” Nucleic Acids Research, 32(21), 6312–6320. http://doi.org/10.1093/nar/gkh972

[38] Jung et al., “A Protein-Protein Interaction Prediction Method Embracing Intra-Protein Domain Cohesion Information,” www.semanticscholar.org

[39] Kim et al,“Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair.” Genome Inform. 2002;13:42-50.

[40] R. Singh, J. Xu, and B. Berger, “Struct2net: Integrating structure into protein-protein interaction prediction,” in Pacific Symposium on Biocomputing, vol. 11. Citeseer, 2006, pp. 403–414.

[41] Rohit Singh, Daniel Park, Jinbo Xu, “Struct2Net: a web service to predict protein–protein interactions using a structure-based approach,” Nucleic Acids Research, 2010, Vol. 38, Web Server issue, W508–W515. Published online 31 May 2010. doi:10.1093/nar/gkq481

[42] Andreas Schlicker, Carola Huthmacher, Fidel Ramírez et al., “Functional evaluation of domain–domain interactions and human protein interaction networks,” Vol. 23 no. 7 2007, pages 859–865. doi:10.1093/bioinformatics/btm012

[43] Norman F. Goodacre, Dietlind, “Protein Domains of Unknown Function Are Essential in Bacteria,” mBio 5(1):e00744-13. doi:10.1128/mBio.00744-13.

[44] Li, Xiao-Li, “Biological Data Mining in Protein Interaction Networks,” ISBN: 978-1-60566-398-2

[45] Javad Zahiri et al., “Computational Prediction of Protein–Protein Interaction Networks: Algorithms and Resources,” Current Genomics, 2013, 14, 397-414.

[46] Qi Y et al,“Random forest similarity for protein-protein interaction prediction from multiple sources.”, Pac Symp Biocomput. 2005:531-42.

[47] Zhu-Hong You, Keith C. C. Chan, “Predicting Protein-Protein Interactions from Primary Protein-Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest,” DOI:10.1371/journal.pone.0125811, May 6, 2015.

[48] Shao-Wu Zhang et al., “Prediction of Protein–Protein Interaction with Pairwise Kernel Support Vector Machine,” Int. J. Mol. Sci. 2014, 15, 3220-3233; doi:10.3390/ijms15023220

[49] Himansu Kumar, Swati Srivastava, “Determination of protein-protein interaction through Artificial Neural Network and Support Vector Machine,” International Journal for Computational Biology (IJCB), Vol. 3, No. 2, August 2014, pp. 37-43. ISSN: 2278-8115

[50] Ronald Jansen et al, “A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data” Vol. 302, Issue 5644, pp. 449-453 DOI: 10.1126/science.1087361

[51] D. Knisley, J. Knisley, “Predicting protein protein interactions using graph invariants and a neural network,” www.elsevier.com/locate/compbiochem doi:10.1016/j.compbiochem.2011.03.003