GENERAL CONCLUSIONS - Machine learning approaches for epitope prediction

Summary and discussion

Computational methods for reliably identifying potential vaccine candidates (i.e., epitopes that invoke a strong response from both T-cells and B-cells) are highly desirable. Unfortunately, the predictive performance of existing prediction tools is still far from ideal. Machine learning offers one of the most cost-effective and widely used approaches to developing epitope prediction tools. We have proposed several machine learning based methods for epitope prediction using only amino acid sequence information. First, we have introduced BCPred, a method for predicting linear B-cell epitopes using the subsequence kernel. Our results have shown that BCPred significantly outperforms several other SVM based classifiers and a number of existing linear B-cell epitope prediction methods.
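To make the idea of a subsequence kernel concrete, the following minimal sketch computes an unnormalized gap-weighted subsequence kernel by explicit enumeration. It is an illustration only and is not the BCPred implementation: the function names are hypothetical, the default values k=4 and decay=0.5 simply mirror the K_(4,0.5)sub notation used in the appendix tables, and the brute-force enumeration is practical only for short peptides.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(seq, k, decay):
    """Map each length-k (possibly gapped) subsequence of seq to its weight.

    Every occurrence contributes decay ** span, where span is the number of
    positions the occurrence covers in seq (last index - first index + 1).
    """
    features = defaultdict(float)
    for idx in combinations(range(len(seq)), k):
        span = idx[-1] - idx[0] + 1
        features["".join(seq[i] for i in idx)] += decay ** span
    return features


def subsequence_kernel(s, t, k=4, decay=0.5):
    """Unnormalized kernel value: dot product of the two feature maps."""
    fs = subsequence_features(s, k, decay)
    ft = subsequence_features(t, k, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)


if __name__ == "__main__":
    # Two 16-residue windows taken from Table A.5, used here only as sample input.
    a = "PFSPDGKPCTPPALNC"
    b = "WPLNDYGFYTTTGIGY"
    print(subsequence_kernel(a, a), subsequence_kernel(a, b))
```

In practice the kernel value is usually normalized and computed with dynamic programming rather than enumeration; both choices are left out here to keep the definition visible.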

One of the challenges in developing reliable linear B-cell epitope predictors is how to deal with the large variability in epitope length, which ranges from 3 to 30 amino acids. Previous machine learning based linear B-cell epitope prediction methods, including BCPred, require training and testing the classifier using sequences of fixed length. We have constructed the first flexible length linear B-cell epitope data sets and explored two different approaches for training classifiers using variable length amino acid sequences. Based on our results, we have proposed FBCPred, a novel method for predicting flexible length linear B-cell epitopes using the subsequence kernel. Unlike other linear B-cell epitope prediction methods, FBCPred can predict linear B-cell epitopes of virtually any specified length.

For predicting MHC-I binding peptides, matrix based methods are widely believed to be less effective than machine learning based methods because many matrix based methods cannot model correlations between different positions in the learned model. We have presented a comparative study in which we directly compared a large number of machine learning and matrix based MHC-I predictors. Unlike previous studies that compared different MHC-I prediction servers (note that these servers have been developed using different training data), our study provides a direct comparison of different prediction methods using a unified experimental setup (e.g., all methods were trained and evaluated using the same training and test sets, respectively). The results have shown that AOMM and SMMBin, two matrix based methods that we have proposed in this study, were highly competitive with a broad class of machine learning methods for predicting MHC-I binding peptides.
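The additive nature of matrix based scoring, and the AUC criterion that AOMM targets, can be illustrated with a short sketch. The code below only scores peptides with a given position specific matrix and measures the AUC of those scores; it does not reproduce the AOMM search procedure, and the function names and the toy matrix are hypothetical.

```python
def pssm_score(peptide, matrix):
    """Sum per-position scores; matrix[i] maps a residue to its score at position i.

    Each position contributes independently, which is why such a matrix
    cannot capture correlations between positions.
    """
    return sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def auc(pos_scores, neg_scores):
    """Fraction of (binder, non-binder) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy 3-position matrix with made-up values, for illustration only.
    toy_matrix = [{"A": 1.0, "L": -1.0}, {"K": 0.5}, {"V": 2.0, "G": -2.0}]
    binders = ["AKV", "AKG"]
    non_binders = ["LKG", "LKV"]
    pos = [pssm_score(p, toy_matrix) for p in binders]
    neg = [pssm_score(p, toy_matrix) for p in non_binders]
    print(auc(pos, neg))
```

A method that optimizes the matrix entries directly for this pairwise ranking measure, rather than for per-peptide error, is one way to read the "AUC optimized" objective.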

For predicting MHC-II binding peptides, we have shown that the performance of many MHC-II binding peptide prediction methods reported in the literature is substantially over-optimistic, because the performance of such methods had been estimated using data sets of unique peptides without applying any similarity reduction procedure to eliminate highly similar peptides. Because MHC-II peptides have lengths that vary over a broad range, similarity reduction of MHC-II peptides is not a straightforward task. We have shown that the previously reported similarity reduction methods may not eliminate highly similar peptides, i.e., peptides that share more than 80% sequence identity can still pass the similarity test. We have proposed a two-step similarity reduction procedure that is much more stringent than those currently in use for similarity reduction of MHC-II benchmark data sets. We have introduced three similarity-reduced MHC-II benchmark data sets derived from the MHCPEP (Brusic et al., 1998), MHCBN (Bhasin et al., 2003), and IEDB (Peters et al., 2005) databases and have utilized them in our experiments to show the pitfalls of these commonly used data sets for evaluating the performance of machine learning approaches to MHC-II peptide binding prediction. Finally, we have formulated the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we have introduced MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. Experiments on a benchmark data set covering 13 HLA-DR and three H2-IA alleles have shown that MHCMIR is competitive with state-of-the-art methods for predicting MHC-II binding peptides.
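The multiple instance view of a flexible length peptide can be sketched as follows: each peptide becomes a bag whose instances are its contiguous fixed-length subsequences, and the label (a binding class or an affinity value) is attached to the bag rather than to any single instance. The window length of 9 is an assumption made for this illustration, as is the handling of peptides shorter than the window; the function names are hypothetical.

```python
def peptide_to_bag(peptide, core_len=9):
    """Return all contiguous core_len-mers of a peptide as one bag of instances."""
    if len(peptide) < core_len:
        # Assumption for this sketch: a short peptide forms a single-instance bag.
        return [peptide]
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def build_bags(labelled_peptides, core_len=9):
    """Pair each bag with its peptide-level label (class label or affinity)."""
    return [(peptide_to_bag(p, core_len), y) for p, y in labelled_peptides]


if __name__ == "__main__":
    # Sample peptide taken from Table A.7; the label 1 is made up.
    for bag, label in build_bags([("RDISNVPFSPDGKPCTPPA", 1)]):
        print(len(bag), label, bag[0], bag[-1])
```

A multiple instance learner then has to decide which instance (if any) in the bag explains the bag label, which is what allows peptides of different lengths to share one model.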

It is our hope that the results of this dissertation and our freely available benchmark data sets and prediction servers will contribute to a better understanding of the dynamics of the adaptive immune system and will facilitate further advances on the epitope prediction problem, which is a major challenge in immunoinformatics.

Contributions

This dissertation has provided several contributions that can be grouped into three categories:

• Algorithmic

– AUC optimized matrix method (AOMM), an algorithm for finding a position specific scoring matrix (PSSM) that maximizes the AUC over the training data.

– Modified PSSM (MPSSM), a variant of the PSSM method (Henikoff and Henikoff, 1996) that utilizes motif and non-motif sequences in building the PSSM from the provided training data.

– Qualitative via quantitative (QVQ), an approach for building a qualitative scoring matrix using a quantitative matrix method.

– A formalization of the problem of qualitatively and quantitatively predicting flexible length major histocompatibility complex class II (MHC-II) peptides as multiple instance learning and multiple instance regression problems, respectively.

– MILESreg, an adaptation of MILES (Chen et al., 2006) for multiple instance regression over bags of amino acid sequences.

– BCPREDS, a web server for predicting linear B-cell epitopes. The current implementation supports three B-cell epitope prediction methods: (i) BCPred (EL-Manzalawy et al., 2008d); (ii) FBCPred (EL-Manzalawy et al., 2008b); (iii) AAP (Chen et al., 2007). The server is freely accessible at http://ailab.cs.iastate.edu/bcpreds/

– MHCIPREDS, a web server for predicting MHC-I peptides using a number of qualitative and quantitative MHC-I peptide prediction methods. The current implementation of the server provides predictions for 22 MHC-I HLA alleles. The server is freely accessible at http://ailab.cs.iastate.edu/mhcipreds/

– MHCMIR, a web server for predicting MHC-II binding affinities using multiple instance regression. The current version supports 13 HLA-DR alleles and three mouse H2-IA alleles. The server is freely accessible at http://ailab.cs.iastate.edu/mhcmir/

– WLSVM, a wrapper for integrating LibSVM (Chang and Lin, 2001) into the Weka framework (Witten and Frank, 2005). WLSVM has been contributed to Weka and integrated into it since version 3.5.2.

– MPSSM, a Java program implementing our proposed MPSSM method. The program is available upon request from the author.

– AOMM, a Java program implementing our proposed AOMM method. The program is available upon request from the author.

• Benchmark data sets

– BCPred data sets, 10 homology-reduced data sets used in the evaluation and implementation of the BCPred method. To the best of our knowledge, these are the first and only available similarity-reduced linear B-cell epitope data sets. The data sets can be downloaded from the BCPREDS server.

– FBCPred data sets, 2 flexible length linear B-cell epitope data sets used in the evaluation and implementation of the FBCPred method. To the best of our knowledge, these are the first flexible length linear B-cell epitope data sets. The data sets can be downloaded from the BCPREDS web server.

– MHC-I data sets, 22 HLA MHC-I allele-specific similarity-reduced data sets. The data are available in two versions, qualitative and quantitative, and can be downloaded from the MHCIPREDS web server.

– MHC-II data sets, three benchmark data sets derived from the MHCPEP (Brusic et al., 1998), MHCBN (Bhasin et al., 2003), and IEDB (Peters et al., 2005) databases using different similarity reduction methods. The data sets and the similarity reduction scripts can be downloaded from the PLoS ONE web site, http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0003268#s5, or be requested from the author.
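To illustrate what similarity reduction over variable length peptides involves, the sketch below applies a simple greedy filter with an 80% identity cutoff. It is not the two-step procedure proposed in this dissertation and it is not the distributed script: the identity measure (best ungapped placement of the shorter peptide inside the longer one), the greedy order, and all names are assumptions chosen to keep the example short.

```python
def max_ungapped_identity(a, b):
    """Best fraction of matching residues over all ungapped placements of the
    shorter peptide inside the longer one, relative to the shorter length."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(x == y for x, y in zip(short, long_[offset:]))
        best = max(best, matches)
    return best / len(short)


def similarity_reduce(peptides, threshold=0.8):
    """Greedily keep a peptide only if its identity to every peptide kept so
    far does not exceed the threshold."""
    kept = []
    for pep in peptides:
        if all(max_ungapped_identity(pep, k) <= threshold for k in kept):
            kept.append(pep)
    return kept


if __name__ == "__main__":
    # Made-up peptides: the second differs from the first in one of ten residues.
    print(similarity_reduce(["ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]))
```

Because the result of a greedy filter depends on the order of the input, any comparison between data sets reduced this way should fix that order.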

Future work

This dissertation has provided new machine learning based methods for attacking three important epitope prediction problems: predicting linear B-cell epitopes, predicting MHC-I binding peptides, and predicting MHC-II binding peptides. Our study and several related studies suggest that predicting MHC-II binding peptides and linear B-cell epitopes is more challenging than predicting MHC-I binding peptides. The following are some potential directions for future studies.

Predicting conformational B-cell epitopes

In this dissertation, we have focused on predicting linear B-cell epitopes using amino acid sequence information. Although it is believed that a large majority of B-cell epitopes are discontinuous (Walter, 1986), experimental epitope identification has focused primarily on linear B-cell epitopes (Flower, 2007). Because the number of available antigen-antibody complexes in the Protein Data Bank (PDB) is limited, only a few methods for predicting conformational B-cell epitopes using structure information have been proposed. As more data become available, the development of reliable conformational B-cell epitope prediction tools is expected to attract more interest.

Predicting sub-types of linear B-cell epitopes

One approach to improving the performance of linear B-cell epitope prediction is to develop predictors that focus on a sub-type of linear B-cell epitopes (e.g., predicting protective linear B-cell epitopes (Söllner et al., 2008; EL-Manzalawy et al., 2008c)).

Improved prediction of MHC-II binding peptides using MIL/MIR methods

Our formulation of the MHC-II binding peptide prediction problem as a multiple instance learning problem has opened up the possibility of adapting a broad range of multiple instance learning methods for classification and regression in this setting. Several avenues for further improving the performance of MHCMIR could be explored: i) Expanding the coverage to more MHC-II alleles; ii) Incorporating feature selection, feature abstraction, and dimensionality reduction methods to remove redundant and irrelevant features from the meta instance data used to build the support vector regression model; iii) Exploring other regression methods (e.g., Gaussian processes (MacKay, 1998)) for building the regression model from the meta instance data.

Exploring the application of the approaches and methods presented in this study in several other Bioinformatics problems

Examples may include: i) The application of the scoring matrix methods, MPSSM, AOMM, and SMMBin, in any Bioinformatics application where the scoring matrix approach is applicable (e.g., predicting post translational modification sites (Caragea et al., 2007; Yang, 2007; Xue et al., 2008)); ii) Representing each residue in a protein sequence as a bag of its spatially neighboring residues, where each neighboring residue could be represented using a set of structure and physicochemical features. Using this representation, multiple instance learning methods may be applied for predicting functional sites using structure and sequence information (e.g., predicting conformational B-cell epitopes (Kulkarni-Kale et al., 2005; Haste Andersen et al., 2006) or protein-RNA interface residues (Terribilini et al., 2006)).

APPENDIX SUPPLEMENTARY MATERIALS FOR CHAPTER 2

Table A.1 Performance of different methods on our BCP18 homology-reduced data set using 5-fold cross validation. The BCPred method denotes K_(4,0.5)sub.

Method         ACC(%)  Sn(%)  Sp(%)   CC     AUC
K₁spct         55.08   53.12  57.05   0.102  0.588
K₂spct         59.28   60.57  57.99   0.186  0.636
K₃spct         64.70   65.99  63.41   0.294  0.675
K_(3,1)msmtch  46.75   46.34  47.15  -0.065  0.465
K_(4,1)msmtch  57.79   57.72  57.86   0.156  0.599
K_(5,1)msmtch  64.77   61.92  67.62   0.296  0.691
K_(5,2)msmtch  55.01   56.50  53.52   0.100  0.568
LA             64.43   62.87  65.99   0.289  0.691
K_(2,0.5)sub   61.52   61.92  61.11   0.230  0.668
K_(3,0.5)sub   66.80   69.38  64.23   0.337  0.726
BCPred         69.04   65.72  72.36   0.382  0.751
RBF            56.98   57.99  55.96   0.140  0.601
AAP            66.94   56.91  76.96   0.346  0.699

Table A.2 Performance of different methods on our BCP16 homology-reduced data set using 5-fold cross validation. The BCPred method denotes K_(4,0.5)sub.

Method         ACC(%)  Sn(%)  Sp(%)   CC     AUC
K₁spct         60.01   60.22  59.81   0.200  0.652
K₂spct         58.99   59.54  58.45   0.180  0.612
K₃spct         61.17   62.53  59.81   0.224  0.645
K_(3,1)msmtch  47.55   47.00  48.09  -0.049  0.460
K_(4,1)msmtch  54.36   52.86  55.86   0.087  0.569
K_(5,1)msmtch  64.24   61.58  66.89   0.285  0.667
K_(5,2)msmtch  54.70   55.18  54.22   0.094  0.563
LA             63.15   63.49  62.81   0.263  0.686
K_(2,0.5)sub   63.76   63.62  63.90   0.275  0.681
K_(3,0.5)sub   65.53   67.98  63.08   0.311  0.718
BCPred         65.94   74.93  56.95   0.324  0.730
RBF            57.29   56.81  57.77   0.146  0.594
AAP            65.05   60.90  69.21   0.302  0.689

Table A.3 Performance of different methods on our BCP14 homology-reduced data set using 5-fold cross validation. The BCPred method denotes K_(4,0.5)sub.

Method         ACC(%)  Sn(%)  Sp(%)   CC     AUC
K₁spct         55.66   54.76  56.56   0.113  0.582
K₂spct         57.07   56.43  57.71   0.141  0.597
K₃spct         65.75   66.71  64.78   0.315  0.675
K_(3,1)msmtch  50.06   49.23  50.90   0.001  0.506
K_(4,1)msmtch  57.07   55.53  58.61   0.142  0.596
K_(5,1)msmtch  63.11   65.04  61.18   0.262  0.649
K_(5,2)msmtch  55.98   54.24  57.71   0.120  0.574
LA             62.98   62.47  63.50   0.260  0.671
K_(2,0.5)sub   60.48   60.41  60.54   0.210  0.647
K_(3,0.5)sub   63.88   66.71  61.05   0.278  0.697
BCPred         64.78   73.14  56.43   0.300  0.733
RBF            57.65   58.35  56.94   0.153  0.603
AAP            61.38   55.40  67.35   0.229  0.665

Table A.4 Performance of different methods on our BCP12 homology-reduced data set using 5-fold cross validation. The BCPred method denotes K_(4,0.5)sub.

Method         ACC(%)  Sn(%)  Sp(%)   CC     AUC
K₁spct         55.60   53.80  57.40   0.112  0.591
K₂spct         57.21   58.17  56.24   0.144  0.606
K₃spct         61.00   62.03  59.97   0.220  0.636
K_(3,1)msmtch  44.98   45.05  44.92  -0.100  0.450
K_(4,1)msmtch  53.47   53.80  53.15   0.070  0.548
K_(5,1)msmtch  56.89   65.51  48.26   0.140  0.594
K_(5,2)msmtch  53.41   52.38  54.44   0.068  0.535
LA             61.20   61.13  61.26   0.224  0.662
K_(2,0.5)sub   59.91   61.00  58.82   0.198  0.643
K_(3,0.5)sub   62.74   63.96  61.52   0.255  0.687
BCPred         65.83   53.80  77.86   0.326  0.709
RBF            59.33   60.36  58.30   0.187  0.620
AAP            64.22   69.50  58.94   0.286  0.663
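For readers who want to recompute the threshold-dependent columns of Tables A.1 to A.4, the sketch below derives accuracy (ACC), sensitivity (Sn), specificity (Sp), and a correlation coefficient (CC) from confusion matrix counts. It assumes that CC is the Matthews correlation coefficient, which is the usual convention but is not spelled out in this appendix; the function name and the counts in the example are hypothetical. AUC is threshold-independent and is not covered here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, Sn and Sp as percentages, and CC in the range [-1, 1]."""
    total = tp + fp + tn + fn
    acc = 100.0 * (tp + tn) / total
    sn = 100.0 * tp / (tp + fn)   # fraction of true epitopes that are recovered
    sp = 100.0 * tn / (tn + fp)   # fraction of non-epitopes that are rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "CC": cc}


if __name__ == "__main__":
    # Made-up counts for a balanced test set of 100 examples.
    print(binary_metrics(tp=40, fp=20, tn=30, fn=10))
```

On a data set with equal numbers of positive and negative examples, ACC reduces to the mean of Sn and Sp, which agrees with the tabulated rows (for example, (65.72 + 72.36) / 2 = 69.04 for BCPred in Table A.1).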

Table A.5 BCPred predictions on RBD of SARS-CoV S protein.

Position  Epitope           Score
142       PFSPDGKPCTPPALNC  1
159       WPLNDYGFYTTTGIGY  0.992
105       AWNTRNIDATSTGNYN  0.974
18        PSVYAWERKKISNCVA  0.946

Table A.6 AAP predictions on RBD of SARS-CoV S protein.

Position  Epitope           Score
68        DSFVVKGDDVRQIAPG  1
140       NVPFSPDGKPCTPPAL  1
17        FPSVYAWERKKISNCV  1
166       FYTTTGIGYQPYRVVV  1
114       TSTGNYNYKYRYLKHG  1

Table A.7 BepiPred predictions on RBD of SARS-CoV S protein.

No.  Start Position  End Position  Peptide              Peptide Length
1    17              17            F                    1
2    72              72            V                    1
3    75              88            DDVRQIAPGQTGVI       14
4    93              95            YKL                  3
5    110             120           NIDATSTGNYN          11
6    133             133           P                    1
7    136             154           RDISNVPFSPDGKPCTPPA  19
8    166             167           FY                   2
9    171             174           GIGY                 4

Table A.8 ABCPred predictions on RBD region of SARS-CoV S protein.

Rank  Sequence          Start position  Score
1     MGCVLAWNTRNIDATS  100             0.93
2     TTTGIGYQPYRVVVLS  168             0.87
3     TSTGNYNYKYRYLKHG  114             0.86
4     DVRQIAPGQTGVIADY  76              0.83
4     PALNCYWPLNDYGFYT  153             0.83
5     TNLCPFGEVFNATKFP  3               0.82
5     DGKPCTPPALNCYWPL  146             0.82
5     DISNVPFSPDGKPCTP  137             0.82
5     TRNIDATSTGNYNYKY  108             0.82
6     KFPSVYAWERKKISNC  16              0.8
7     TGVIADYNYKLPDDFM  85              0.79
7     CFSNVYADSFVVKGDD  61              0.79
8     YRYLKHGKLRPFERDI  123             0.78
9     NCVADYSVLYNSTFFS  30              0.77
10    FSTFKCYGVSATKLND  44              0.76
11    ATKLNDLCFSNVYADS  54              0.74
12    FVVKGDDVRQIAPGQT  70              0.73
13    LNDYGFYTTTGIGYQP  161             0.72
14    SVLYNSTFFSTFKCYG  36              0.71
15    GEVFNATKFPSVYAWE  9               0.7
16    RVVVLSFELLNAPATV  178             0.65

Figure A.1 Analysis of RBD of SARS-CoV S protein using Parker’s hydrophilic scale.

Figure A.3 BCPred predictions over the entire SARS-CoV S protein. “E” indicates that the corresponding amino acid residue lies in a predicted linear B-cell epitope. The shaded region represents the RBD region.

BIBLIOGRAPHY

Alix, A. (1999). Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine, 18:311–4.

Andrews, S., Tsochantaridis, I., and Hofmann, T. (2003). Support vector machines for multiple-instance learning. In Adv. Neural. Inf. Process. Syst. 15, pages 561–568. Cambridge, MA: MIT Press.

Bailey, T. and Elkan, C. (1995). Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learn., 21:51–80.

Bairoch, A. and Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28:45–48.

Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A., and Nielsen, H. (2000). Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16:412–424.

Barlow, D., Edwards, M., Thornton, J., et al. (1986). Continuous and discontinuous protein antigenic determinants. Nature, 322:747–748.

Beniac, D., Andonov, A., Grudeski, E., and Booth, T. (2006). Architecture of the SARS coronavirus prefusion spike. Nat. Struct. Mol. Biol., 13:751–752.

Bennett, K. and Mangasarian, O. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw., 1:23–34.

Bhasin, M. and Raghava, G. (2004). SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence. Bioinformatics, 20:3.

Bhasin, M., Singh, H., and Raghava, G. (2003). MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics, 19:665–666.

Björklund, Å., Soeria-Atmadja, D., Zorzet, A., Hammerling, U., and Gustafsson, M. (2005). Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics, 21:39–50.

Blythe, M. and Flower, D. (2005). Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci., 14:246–248.

Brefeld, U. and Scheffer, T. (2005). AUC maximizing support vector learning. Proceedings of ICML 2005 workshop on ROC Analysis in Machine Learning.

Breiman, L. (1996). Bagging predictors. Mach. Learn., 24:123–140.

Brusic, V., Rudy, G., and Harrison, L. (1998). MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Res., 26:368–371.

Bui, H., Sidney, J., Peters, B., Sathiamurthy, M., Sinichi, A., Purton, K., Mothé, B., Chisari, F., Watkins, D., and Sette, A. (2005). Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics, 57:304–314.

Bulashevska, A. and Eils, R. (2006). Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains. BMC Bioinformatics, 7:298.

Burden, F. and Winkler, D. (2006). Predictive Bayesian neural network models of MHC class II peptide binding. J. Mol. Graph. Model., 2005:481–9.

Buus, S., Lauemoller, S., Worning, P., Kesmir, C., Frimurer, T., Corbet, S., Fomsgaard, A., Hilden, J., Holm, A., and Brunak, S. (2003). Sensitive quantitative predictions of peptide-MHC binding by a ‘Query by Committee’ artificial neural network approach. Tissue Antigens, 62:378–384.

Cai, C., Han, L., Ji, Z., Chen, X., and Chen, Y. (2003). SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res., 31:3692–3697.

Caragea, C., Sinapov, J., Silvescu, A., Dobbs, D., and Honavar, V. (2007). Glycosylation site prediction using ensembles of support vector machine classifiers. BMC Bioinformatics, 8:438.

Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Chang, S., Ghosh, D., Kirschner, D., and Linderman, J. (2006). Peptide length-based prediction of peptide-MHC class II binding. Bioinformatics, 22:2761.

Chen, J., Liu, H., Yang, J., and Chou, K. (2007). Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids, 33:423–428.

Chen, Y., Bi, J., and Wang, J. (2006). MILES: multiple-instance learning via embedded instance selection. IEEE Trans. Pattern Anal. Mach. Intell., 28:1931–1947.

Chinnasamy, A., Sung, W., and Mittal, A. (2004). Protein structure and fold prediction using tree-augmented naive Bayesian classifier. Pac. Symp. Biocomput., 387:98.

Clark, A., Florencio, C., Watkins, C., and Serayet, M. (2006). Planar languages and learnability. International Colloquium on Grammatical Inference (ICGI06), pages 148–160.

Cui, J., Han, L., Lin, H., Tan, Z., Jiang, L., Cao, Z., and Chen, Y. (2006a). MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics, 58:607–613.

Cui, J., Han, L., Lin, H., Tang, Z., Jiang, L., Cao, Z., and Chen, Y. (2006b). MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics, 58:607–13.

Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30.

Dietterich, T. G., Lathrop, R. H., and Lozano-Perez, T. (1997). Solving the multiple-instance problem with axis parallel rectangles. Artif. Intell., 89:31–71.

Dimitrov, D. (2003). The secret life of ACE2 as a receptor for the SARS virus. Cell, 115:652–653.

Dobson, P. and Doig, A. (2003). Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330:771–783.

Donnes, P. and Kohlbacher, O. (2006). SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res., 34:W194.

Doytchinova, I. and Flower, D. (2003). Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self-consistent algorithm for affinity prediction. Bioinformatics, 19:2263–2270.

Drosten, C., Gunther, S., Preiser, W., van der Werf, S., Brodt, H., Becker, S., Rabenau, H., Panning, M., Kolesnikova, L., Fouchier, R., et al. (2003). Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N. Engl. J. Med., 348:1967.

Eisenhaber, F., Frommel, C., and Argos, P. (1996). Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class. Proteins, 25:169–79.

EL-Manzalawy, Y., Dobbs, D., and Honavar, V. (2008a). On Evaluating MHC-II Binding Peptide Prediction Methods. PLoS ONE, 3.

EL-Manzalawy, Y., Dobbs, D., and Honavar, V. (2008b). Predicting flexible length linear B-cell epitopes. 7th International Conference on Computational Systems Bioinformatics, pages 121–131.

EL-Manzalawy, Y., Dobbs, D., and Honavar, V. (2008c). Predicting linear B-cell epitopes using evolutionary information. IEEE International Conference on Bioinformatics and Biomedicine.

EL-Manzalawy, Y., Dobbs, D., and Honavar, V. (2008d). Predicting linear B-cell epitopes using string kernels. J. Mol. Recognit., 21:243–255.

Emini, E., Hughes, J., Perlow, D., and Boger, J. (1985). Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J. Virol., 55:836–839.

Fisher, R. (1973). Statistical methods and scientific inference. Hafner Press, New York.

Flower, D. (2007). Immunoinformatics: predicting immunogenicity in silico. Quantum distributor, 1st edition.

Fonseca, C. and Fleming, P. (1993). Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. Proceedings of the Fifth International Conference on Genetic Algorithms, 423:416–423.

Fouchier, R., Kuiken, T., Schutten, M., van Amerongen, G., van Doornum, G., van den Hoogen, B., Peiris, M., Lim, W., Stöhr, K., and Osterhaus, A. (2003). Koch’s postulates fulfilled for SARS virus. Nature, 423:240.

Frank, E., Wang, Y., Inglis, S., Holmes, G., and Witten, I. (1998). Using model trees for classification. Mach. Learn., 32:63–76.

Freund, Y. and Mason, L. (1999). The alternating decision tree learning algorithm. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 124–133. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of 13th International Conference in Machine Learning, pages 148–156.

Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat., 11:86–92.

Fung, G., Dundar, M., Krishnapuram, B., and Rao, R. (2007). Multiple instance learning for computer aided diagnosis. In Adv. Neural Inf. Process. Syst. 19, pages 425–432. MIT Press.

Garcia, J., Fierro, R., Puentes, A., Cortes, J., Bermudez, A., Cifuentes, G., Vanegas, M., and Patarroyo, M. (2007). Monosaccharides modulate HCV E2 protein-derived peptide biological properties. Biochem. Biophys. Res. Commun., 355:409–418.

Gartner, T., Flach, P., Kowalczyk, A., and Smola, A. (2002). Multi-instance kernels. Proceedings of the 19th International Conference on Machine Learning, pages 179–186.

Gärtner, T., Flach, P., and Wrobel, S. (2003). On graph kernels: Hardness results and efficient alternatives. Lect. Notes Comput. Sci., 2777:129–143.

Goldman, S. and Scott, S. (2003). Multiple-instance learning of real-valued geometric patterns. Ann. Math. Artif. Intell., 39:259–290.

Gowthaman, U. and Agrewala, J. (2008). In silico tools for predicting peptides binding to HLA-class II molecules: more confusion than conclusion. J. Proteome Res., 7:154–63.

Greenbaum, J., Andersen, P., Blythe, M., Bui, H., Cachau, R., Crowe, J., Davies, M., Kolaskar, A., Lund, O., Morrison, S., et al. (2007). Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J. Mol. Recognit., 20:75–82.

Haste Andersen, P., Nielsen, M., and Lund, O. (2006). Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci., 15:2558.

Hattotuwagama, C., Guan, P., Doytchinova, I., Zygouri, C., and Flower, D. (2004). Quantitative online prediction of peptide binding to the major histocompatibility complex. J. Mol. Graph. Model., 22:195–207.

Haussler, D. (1999). Convolution kernels on discrete structures. UC Santa Cruz Technical Report UCS-CRL-99-10.

Henikoff, J. and Henikoff, S. (1996). Using substitution probabilities to improve position-specific scoring matrices. Bioinformatics, 12:135–143.

Henikoff, S. and Henikoff, J. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 89:10915–
