Perspectives - Optimization algorithms for SVM classification

In this thesis, we have addressed, for the first time, the problem of classifying chromosome mating types with automatic classification methods. The problem is challenging for several reasons:

• The spatial organization of the chromosome is complex; so far, however, technology only permits the use of three loci. The chromosome is therefore mapped onto a triangle (6 features: 3 distances and 3 angles). Do these features carry sufficient information to discriminate between the two mating types? The results of this thesis suggest that they carry some information, but probably not enough to reach a high level of prediction accuracy. Combinations of these features have also been tried but did not achieve better results.

• There are several sources of underlying uncertainty when acquiring chromosome data. The main ones are the microscope resolution and the non-static behavior of the chromosome, which makes precise measurements difficult. In this thesis, as a first approach, we have tried to build more robust prediction models. Unfortunately, the results are not yet convincing. We suspect that the robust and safe models we have used are not able to handle the extra nonlinearity introduced by the worst-case approach we have applied.
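As an illustration of the triangle mapping mentioned in the first point above, the sketch below computes 3 distances and 3 angles from three locus positions. It is only a minimal example: the function name `triangle_features`, the use of 3D coordinates, and the ordering of the features are assumptions made here and are not taken from the thesis.

```python
import math


def triangle_features(p1, p2, p3):
    """Return (d12, d23, d31, a1, a2, a3) for three locus positions.

    Hypothetical helper: distances are Euclidean, angles are in radians,
    and a_k is the interior angle of the triangle at locus k.
    The three loci are assumed to be pairwise distinct.
    """
    d12 = math.dist(p1, p2)
    d23 = math.dist(p2, p3)
    d31 = math.dist(p3, p1)

    def angle(opposite, side_a, side_b):
        # Law of cosines; the cosine is clamped to [-1, 1] to absorb rounding.
        c = (side_a ** 2 + side_b ** 2 - opposite ** 2) / (2.0 * side_a * side_b)
        return math.acos(max(-1.0, min(1.0, c)))

    a1 = angle(d23, d12, d31)  # angle at p1 (opposite side is p2-p3)
    a2 = angle(d31, d12, d23)  # angle at p2 (opposite side is p3-p1)
    a3 = angle(d12, d23, d31)  # angle at p3 (opposite side is p1-p2)
    return d12, d23, d31, a1, a2, a3


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(triangle_features((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)))
```

In this sketch the angles are obtained from the distances through the law of cosines, so the six features are nonlinear functions of three independent lengths.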

Further investigations are therefore needed to improve both data and prediction models. From the point of view of data, preliminary experiments conducted with simulated data have shown that the dynamics of chromosomes may actually carry more relevant information than static data. Beyond the technical challenge it may raise, the acquisition of dynamical chromosome data, i.e. measurements of distances and angles over several time periods, should be one of the main directions of investigation. Static conformation data could probably be improved as well by marking a fourth locus on the chromosome. However, this also raises technical issues, as marking and measuring 4 loci may be difficult if the wavelengths are very close and partially overlap.

The design of nonlinear robust models is also an important investigation perspective. Using kernels to map the input data into a Reproducing Kernel Hilbert Space (RKHS) is one common solution. However, to formulate the robust counterpart of the SVM problem under data uncertainty, we need to bound the uncertainties in the RKHS, and kernels do not provide such information. An alternative is to approximate the kernel function with an explicit mapping such as Random Fourier Features (RFF) [125]. The main idea of this approach is to construct a randomized low-dimensional feature space by randomly selecting D sinusoids from the Fourier transform of the shift-invariant kernel that we would like to approximate. The explicit knowledge of the mapping (sinusoids) from the input space to the RFF space could help in bounding the image of the perturbations in the RFF space. Additionally, the technique works in the D-dimensional space rather than the kernel space and avoids the expensive management of a large and dense kernel matrix.
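The following sketch illustrates the RFF idea for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), in the spirit of [125]. The function names, the choice of kernel and the parameter values are assumptions made for this example. The helper `perturbation_bound` is not a result from the thesis: it only shows one simple, possibly loose, way in which the explicit mapping could be used, relying on the fact that the cosine is 1-Lipschitz.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Draw D = n_features random sinusoids for exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # std of the Gaussian spectral density
    W = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return W, b


def rff_map(x, W, b):
    """Explicit map z(x) such that z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)]


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel, used only to check the approximation."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def perturbation_bound(W, delta_norm):
    """Upper bound on ||z(x + delta) - z(x)|| when ||delta|| <= delta_norm.

    Uses |cos(u) - cos(v)| <= |u - v| and the Cauchy-Schwarz inequality,
    which gives sqrt(2 / D) * ||W||_F * ||delta||.
    """
    frob = math.sqrt(sum(wi * wi for w in W for wi in w))
    return math.sqrt(2.0 / len(W)) * frob * delta_norm


if __name__ == "__main__":
    W, b = make_rff(dim=6, n_features=2000, gamma=0.5, seed=1)
    x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    y = [0.2, 0.1, 0.4, 0.3, 0.6, 0.5]
    approx = sum(zx * zy for zx, zy in zip(rff_map(x, W, b), rff_map(y, W, b)))
    print(approx, rbf_kernel(x, y, 0.5))
```

A linear model trained on `rff_map(x, W, b)` only manipulates D-dimensional vectors, so no kernel matrix has to be stored.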

There are also further possibilities to extend the convergence analysis of the bi-level stochastic technique we have proposed. In the proof, we have assumed that the inner problem is solved to optimality at each iteration and have used its optimal value to compute the gradient of the outer objective function. In practice, as mentioned, we have only computed one step of the inner optimization and have shown results confirming that this variant also works well. The convergence results could therefore be extended to establish whether a stationary point is still reached when only one step of the inner optimization is carried out. It would also be interesting to prove the convergence of the bi-level stochastic procedure for nondifferentiable outer and inner objective functions, using subgradients instead of gradients.
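The one-step variant discussed above can be pictured on a toy problem. The sketch below is not the algorithm of the thesis: the scalar objectives, the function name `one_step_bilevel`, the step sizes and the projection onto nonnegative values are all assumptions chosen so that the example stays small. The inner problem is a ridge-type fit in w, the outer problem tunes the regularization weight lam on validation targets, and each iteration performs a single stochastic gradient step on each level.

```python
import random


def one_step_bilevel(train, valid, n_iter=2000, eta_in=0.4, eta_out=0.1, seed=0):
    """Toy bi-level stochastic gradient with one inner step per iteration.

    Inner problem (in w):   min_w   mean_i 0.5 * (w - a_i)^2 + 0.5 * lam * w^2
    Outer problem (in lam): min_lam mean_j 0.5 * (w*(lam) - t_j)^2
    where w*(lam) denotes the inner minimizer.
    """
    rng = random.Random(seed)
    w, lam = 0.0, 0.1
    for _ in range(n_iter):
        a = rng.choice(train)  # one training sample
        t = rng.choice(valid)  # one validation sample
        # Single stochastic gradient step on the inner objective
        # (the inner problem is NOT solved to optimality).
        w -= eta_in * ((w - a) + lam * w)
        # Implicit-function estimate of d w*/d lam, evaluated at the
        # current, inexact w: -(d2g/dw dlam) / (d2g/dw2) = -w / (1 + lam).
        dw_dlam = -w / (1.0 + lam)
        # Single stochastic gradient step on the outer objective.
        lam -= eta_out * (w - t) * dw_dlam
        lam = max(lam, 0.0)  # keep the regularization weight nonnegative
    return w, lam


if __name__ == "__main__":
    # Noise-free toy data: the exact bi-level solution is w = 1, lam = 1.
    print(one_step_bilevel([2.0], [1.0]))
```

With noisy samples and constant step sizes, the iterates of such a scheme would only be expected to fluctuate around a solution; decreasing step sizes are the usual remedy in stochastic approximation.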

Bibliography

[1] H. Abdi, L. J. Williams, Principal component analysis, Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, pp. 433-459, 2010.

[2] D. Aha, D. Kibler, and M. Albert, Instance-based learning algorithms, Machine learning, 6(1):37-66, 1991.

[3] D. Aha, Lazy learning: Special issue editorial. Artificial Intelligence Review, 11:7-10, 1997.

[4] E. Aiyoshi, K. Shimizu, Hierarchical decentralized systems and its new solution by a barrier method, IEEE Transactions on Systems, Man, and Cybernetics, 11:444-449, 1981.

[5] E. Aiyoshi, K. Shimizu, A solution method for the static constrained Stackelberg problem via penalty method, IEEE Transactions on Automatic Control, 29:1111-1114, 1984.

[6] M. Akgül, Topics in Relaxation and Ellipsoidal Methods, volume 97 of Research Notes in Mathematics, Pitman, 1984.

[7] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming, 95(1):3-51, 2003.

[8] F. A. Al-Khayyal, R. Horst, P. M. Pardalos, Global optimization of concave functions subject to quadratic constraints: an application in nonlinear bilevel programming, Annals of Operations Research, 34, 125-147, 1992.

[9] F. Bach, R. Jenatton, J. Mairal, G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends in Machine Learning, vol. 4, no. 1, pp. 1-106, 2011.

[10] J. Bard, A grid search algorithm for the linear bi-level programming problem, In Proceedings of the 14th Annual Meeting of the American Institute for Decision Science, pages 256-258, 1982.

[11] J. Bard, J. Falk, An explicit solution to the multi-level programming problem, Computers and Operations Research, 9:77-100, 1982.

[12] J. F. Bard, Convex two-level optimization. Mathematical Programming, 40, 15-27, 1988.

[13] J. Bard, J. Moore, A branch and bound algorithm for the bilevel programming problem, SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 2, pp. 281-292, March 1990.

[14] A. Beck, M. Teboulle, Gradient-based algorithms with applications to signal recovery problems, in Convex Optimization in Signal Processing and Communications, (D. Palomar and Y. Eldar, eds.), pp. 42-88, Cambridge University Press, 2010.

[15] J. Belton, B. Lajoie, S. Audibert, I. Lassadi, I. Goiffon, D. Bau, M. Marti-Renom, K. Bystricky, J. Dekker, The Conformation of Yeast Chromosome III Is Mating Type Dependent and Controlled by the Recombination Enhancer, Cell Reports, 2015.

[16] O. Ben-Ayed, C. Blair, Computational difficulties of bilevel linear programming, Operations Research, 38:556-560, 1990.

[17] K. Bennett, J. Hu, G. Kunapuli, J. Pang, Model selection via bilevel optimization, Neural Networks, IJCNN ’06. International Joint Conference, 2006.

[18] A. Ben-Tal, A. Nemirovski, Robust convex optimization, Math. Oper. Res., 23, pp. 769-805, 1998.

[19] A. Ben-Tal, S. Boyd, A. Nemirovski, Extending scope of robust optimization: comprehensive robust counterparts of uncertain problems, Math. Program., Ser. B 107, 63-89, 2006.

[20] A. Ben-Tal, L. Ghaoui, A. Nemirovski, Robust optimization, Princeton university press, 2009.

[21] A. Ben-Tal, S. Bhadra, C. Bhattacharyya, J. Saketha Nath, Chance constrained uncertain classification via robust optimization, Math. Program., Ser. B 127: 145-173, 2011.

[22] A. Ben-Tal, D. Hertog, J. Vial, Deriving robust counterparts of nonlinear uncertain inequalities, Math. Program., Ser. A 149: 265-299, 2015.

[23] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer Verlag, Berlin, New York, 1990.

[24] D. P. Bertsekas, J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Athena Scientific, Belmont, MA, 1997.

[25] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, second edition, 1999.

[26] D. Bertsimas, M. Sim, Tractable approximations to robust conic optimization problems, Math. Program., 107, pp. 5-36, 2006.

[27] D. Bertsimas, D. Brown, C. Caramanis, Theory and applications of robust optimization, SIAM Review, vol. 53, no.3, pp. 464-501, 2011.

[28] Z. Bi, P. Calamai, A. Conn, An exact penalty function approach for the linear bilevel programming problem. Technical Report No.167-O-310789, Department of Systems Design Engineering, University of Waterloo, 1989.

[29] Z. Bi, P. Calamai, A. Conn, An exact penalty function approach for the nonlinear bilevel programming problem. Technical Report No.180-O-170591, Department of Systems Design Engineering, University of Waterloo, 1991.

[30] W. Bialas, M. Karwan, Multilevel linear programming, Technical Report 78-1, Operations Research Program, State University of New York at Buffalo, 1978.

[31] W. Bialas, M. Karwan, J. Shaw, A parametric complementary pivot approach for two-level linear programming, Technical Report 80-2, Operations Research Program, State University of New York at Buffalo, 1980.

[32] W. Bialas, M. Karwan, Two-level linear programming. Management Science, 30, 1004-1020, 1984.

[33] J. Birge, F. Louveaux, Introduction to Stochastic Programming, Springer Verlag, New York, 1997.

[34] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1996.

[35] C. Bishop, Pattern recognition and machine learning, Springer, 2007.

[36] S. Bolte, F. P. Cordelières, A guided tour into subcellular colocalization analysis in light microscopy, J. Microsc. (Oxford, UK) 224: 213-232, 2006.

[37] L. Bottou, Online Algorithms and Stochastic Approximations, In Saad, D., editor, Online Learning and Neural Networks, Cambridge University Press, Cambridge, UK, 1998.

[38] L. Bottou, Y. LeCun, Large Scale Online Learning, Advances in Neural Information Processing Systems 16, Edited by Sebastian Thrun, Lawrence Saul and Bernhard Schölkopf, MIT Press, Cambridge, MA, 2004.

[39] L. Bottou, Y. LeCun, On-line Learning for Very Large Datasets, Applied Stochastic Models in Business and Industry, 21(2):137-151, 2005.

[40] L. Bottou, Curiously fast convergence of some stochastic gradient descent algorithms, Unpublished open problem offered to the attendance of the SLDS 2009 conference, 2009.

[41] L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), 177-187, 2010.

[42] L. Bottou, Online learning and stochastic approximations.

[43] L. Bottou, Stochastic Gradient Tricks, Neural Networks, Tricks of the Trade, Reloaded, 430-445, Edited by Grégoire Montavon, Genevieve B. Orr and Klaus-Robert Müller, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.

[44] O. Bousquet, S. Boucheron, G. Lugosi, Introduction to statistical learning theory.

[45] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2003.

[46] S. Boyd, L. Xiao, A. Mutapcic, Subgradient methods, Notes for lecture, Stanford University, 2003.

[47] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.

[48] J. Bracken, J. McGill, Mathematical programs with optimization problems in the constraints, Operations Research, 21:37-44, 1973.

[49] L. Breiman, Random Forests, Machine Learning, vol. 45(1), pp. 5-32, 2001.

[50] L. Breiman, Classification and regression trees. CRC Press, 1993.

[51] D. A. Bressan, J. Vazquez, J. E. Haber, Mating type-dependent constraints on the mobility of the left arm of yeast chromosome III, J. Cell Biol. 164 (3), 361-371, 2004.

[52] K. Bystricky, T. Laroche, G. van Houwe, M. Blaszczyk, S.M. Gasser, Chromosome looping in yeast: telomere pairing and coordinated movement reflect anchoring efficiency and territorial organization. J. Cell Biol. 168, 375-387, 2005.

[53] K. Bystricky, H. Van Attikum, M.-D. Montiel, V. Dion, L. Gehlen, S.M. Gasser, Regulation of nuclear positioning and dynamics of the silent mating type loci by the yeast Ku70/Ku80 complex. Mol. Cell. Biol. 29 (3), 835-848, 2009.

[54] K. Bystricky, Chromosome dynamics and folding in eukaryotes: Insights from live cell microscopy, FEBS Lett. 2015.

[55] W. Candler, R. Norton, Multilevel programming, Technical Report 20, World Bank Development Research Center, Washington D.C., 1977.

[56] W. Candler, R. Townsley, A linear two-level programming problem, Computers and Operations Research, 9, 59-76, 1982.

[57] O. Chapelle, V. Vapnik, Model selection for support vector machines, Advances in Neural Information Processing Systems, 12, ed. S.A. Solla, T.K. Leen and K.-R. Müller, MIT Press, 2000.

[58] Y. Chen, M. Florian, On the geometric structure of linear bilevel programs: a dual approach. Technical Report CRT-867, Centre de Recherche sur les Transports, Université de Montréal, Montréal, QC, Canada, 1992.

[59] Y. Chen, M. Florian, S. Wu, A descent dual approach for linear bilevel programs, Technical Report CRT-866, Centre de Recherche sur les Transports, Université de Montréal, Montréal, QC, Canada, 1992.

[60] B. Colson, P. Marcotte, G. Savard, A trust-region method for nonlinear programming: algorithm and computational experience, Computational Optimization and Applications, 30, 2005.

[61] B. Colson, P. Marcotte, G. Savard, An overview of bilevel optimization, Annals of Operations Research, vol. 153, pp. 235-256, 2007.

[62] P. L. Combettes, Solving monotone inclusions via compositions of nonexpansive averaged operators, Optimization 53, 475-504, 2004.

[63] P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting. 2005.

[64] P. Combettes, J.-C. Pesquet, Proximal splitting methods in signal processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185-212, 2011.

[65] C. Cortes, V. Vapnik. Support-vector networks. Machine Learning, 20(3): 273, 1995.

[66] N. Couellan, W. Wang, Bi-level Stochastic Gradient for Large Scale Support Vector Machine, Neurocomputing, vol. 153, pp. 300-308, 2014.

[67] N. Couellan, W. Wang, Uncertainty-safe large scale support vector machines, 2015 (submitted paper)

[68] N. Couellan, W. Wang, On the convergence of stochastic bilevel gradient methods, 2015 (submitted paper)

[69] N. Cristianini, C. Campbell, J. Shawe-Taylor, Dynamically adapting kernels in support vector machines, Advances in Neural Information Processing Systems, 11, ed. M. Kearns, S. A. Solla, and D. Cohn, MIT Press, pp. 204-210, 1999.

[70] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel based Learning Methods, Cambridge University Press, 2000.

[71] J. Cruz, On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions, 2014.

[72] G. Dantzig, Linear programming under uncertainty, Management Sci., 1, pp. 197-206, 1955.

[73] J. Dekker, K. Rippe, M. Dekker, N. Kleckner, Capturing chromosome conformation. Science 295, 1306-1311, 2002.

[74] C. Do, Q. Le, C. Foo, Proximal regularization for online and batch learning, in International Conference on Machine Learning, pp. 257-264, 2009.

[75] P. Du, J. Peng, T. Terlaky, Self-adaptive support vector machines: modelling and experiments, Computational Management Science, vol. 6, pp. 41-51, 2009.

[76] J. Duchi, Y. Singer, Efficient online and batch learning using forward backward splitting, Journal of Machine Learning Research, 10, 2899-2934, 2009.

[77] S. Durand, J. Fadili, M. Nikolova, Multiplicative noise removal using L1 fidelity on frame coefficients, J. Math. Imaging Vision 36, 201-226, 2010.

[78] T. Edmunds, J. Bard, Algorithms for nonlinear bilevel mathematical programming, IEEE Transactions on Systems, Man, and Cybernetics, 21:83-89, 1991.

[79] J. E. Falk, J. Liu, On bilevel programming, Part I: general nonlinear cases. Mathematical Programming, 70, 47-72, 1995.

[80] M. Florian, Y. Chen, A bilevel programming approach to estimating O-D matrix by traffic counts, Technical Report CRT-750, Centre de Recherche sur les Transports, 1991.

[81] J. Fortuny-Amat, B. McCarl, A representation and economic interpretation of a two-level programming problem, Journal of the Operational Research Society, 32, 783-792, 1981.

[82] B. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (5814), pp.972-976, 2007.

[83] G. Gasso, A. Pappaioannou, M. Spivak, L. Bottou, Batch and online learning algorithms for nonconvex Neyman-Pearson classification, ACM Transactions on Intelligent Systems and Technology, 2(3), 2011.

[84] J. Gauvin, R. Janin, Directional derivative of the value function in parametric optimization, Annals of operations research, vol. 27, pp. 237-252, 1990.

[85] L. Ghaoui, H. Lebret, Robust solutions to least-squares problems with uncertain data,

[86] P. Hansen, B. Jaumard, G. Savard, New branch-and-bound rules for linear bilevel programming, SIAM Journal on Scientific and Statistical Computing, 13, 1194-1217, 1992.

[87] L. Hamel, Knowledge Discovery with Support Vector Machines, Wiley, 2009.

[88] J. A. Hartigan, M. A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the Royal Statistical Society, Series C, vol. 28(1), pp. 100-108, 1979.

[89] T. Hastie, S. Rosset, The entire regularization path for the support vector machine, J. Mach. Learn. Res. 5, 1391-1415, 2004.

[90] S. Haykin, Neural Networks and Learning Machines, Prentice Hall, 2008.

[91] T. Ho, The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8): 832-844, 1998.

[92] C. Hsieh, K. Chang, C. Lin, S. Keerthi, S. Sundararajan, A dual coordinate descent method for large-scale linear SVM, In ICML, 2008.

[93] G. Infanger, Planning under Uncertainty: Solving Large-Scale Stochastic Linear Programs, Boyd and Fraser, San Francisco, CA, 1994.

[94] Y. Ishizuka, E. Aiyoshi, Double penalty method for bilevel optimization problems, Annals of Operations Research, 34:73-88, 1992.

[95] R. Jenatton, J. Mairal, G. Obozinski, F. Bach, Proximal methods for sparse hierarchical dictionary learning, in International Conference on Machine Learning, 2010.

[96] R. G. Jeroslow, The polynomial hierarchy and a simple model for competitive analysis, Mathematical Programming, 32, 146-164, 1985.

[97] J. Judice, A. Faustino, The solution of the linear bilevel programming problem by using the linear complementarity problem, Investigação Operacional, 8:77-95, 1988.

[98] J. Judice, A. Faustino, A sequential LCP method for bilevel linear programming, Annals of Operations Research, 34:89-106, 1992.

[99] J. Judice, A. Faustino, The linear-quadratic bilevel programming problem, INFOR, 32:87-98, 1994.

[100] P. Kall, S. Wallace, Stochastic Programming, John Wiley, Chichester, UK, 1994.

[101] S. Kulkarni, G. Harman, Statistical learning theory: A tutorial. 2011.

[102] G. Kunapuli, K. Bennett, J. Hu, J. Pang, Bilevel model selection for support vector machines, Centre de Recherches Mathematiques, CRM proceedings and lecture notes, volume 45, 2008.

[103] I. Lassadi, K. Bystricky, Tracking of single and multiple genomic loci in living yeast cells. Methods Mol. Biol. 745, 499-522, 2011.

[104] I. Lassadi, A. Kamgoué, I. Goiffon, N. Tanguy-le-Gac, K. Bystricky, Differential chromosome conformations as hallmarks of cellular identity revealed by mathematical polymer modeling. PLoS Comput. Biol., 1-21, 2015.

[105] S. Lee, S.J. Wright, Sparse Nonlinear Support Vector Machine via Stochastic Approximation, University of Wisconsin Report, 2010.

[106] B. Lemaire, The proximal algorithm. In: J.P. Penot (ed.) New Methods in Optimization and Their Industrial Uses, International Series of Numerical Mathematics, vol. 87, pp. 73-87. Birkhäuser, Boston, MA, 1989.

[107] E. S. Levitin, B. T. Polyak, Constrained minimization methods. U.S.S.R. Comput. Math. Math. Phys. 6, 1-50, 1966.

[108] A. Liaw, Package ’randomForest’ in R, 2015.

[109] G. Liu, J. Han, S. Wang, A trust region algorithm for bilevel programming problems, Chinese Science Bulletin, 43, 820-824, 1998.

[110] Z.-Q. Luo, J.-S. Pang, S. Wu, Exact penalty functions for mathematical programs and bilevel programs with analytic constraints, 1993. Preprint from the Department of Electrical and Computer Engineering, McMaster University.

[111] K. Marti, Stochastic Optimization Methods, Springer, 2005.

[112] A. K. Menon, Large-Scale Support Vector Machines: Algorithms and Theory. Research Exam, University of California, San Diego, 2009.

[113] D. Meyer, E. Dimitriadou, K. Hornik, A. Weingessel, F. Leisch, C. Chang, C. Lin, Package ’e1071’ in R, 2015.

[114] A. Miele, K. Bystricky, J. Dekker, Yeast silent mating type loci form heterochromatic clusters through silencer protein-dependent long-range interactions. PLoS Genet. 5, e1000478, 2009.

[115] L. Minchenko, A. Tarakanov, On second-order directional derivatives of value functions, Optimization journal, vol. 64, no. 2, 389-407, 2015.

[116] MOSEK-ApS. The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision 28)., 2015.

[117] K. Murphy, Machine learning: a probabilistic perspective, MIT Press, 2012.

[118] N. Murata, A statistical study of on-line learning, in On-line Learning and Neural Networks, Cambridge University Press, 1998.

[119] A. Nitanda, Stochastic proximal gradient descent acceleration techniques, Advances in Neural Information Processing Systems 27, Edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger, pp. 1574-1582, 2014.

[120] B. O’Donoghue, G. Stathopoulos, S. Boyd, A splitting method for optimal control, IEEE Transactions on Control Systems Technology, 2012.

[121] B. Polyak, Introduction to Optimization, Optimization Software, Inc., 1987.

[122] A. Prékopa, Stochastic Programming, Kluwer Academic, Dordrecht, The Netherlands, 1995.

[123] J. R. Quinlan, Induction of decision trees, Machine Learning, 1(1):81-106, 1986.

[124] J. Quinlan, C4.5: Programs for machine learning, Morgan Kaufmann Publishers, 1993.

[125] A. Rahimi, B. Recht, Random features for large-scale kernel machines, in Proceedings of the 21st Annual Conference on Advances in Neural Information Processing Systems (NIPS), 2007.

[126] B. Ripley, W. Venables, Package ’class’ in R, 2015.

[127] B. Ripley, Package ’tree’ in R, 2016.

[129] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14, 877-898, 1976.

[130] W. Rudin. Principles of mathematical analysis, third edition. McGraw-Hill, Inc., 1976.

[131] D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition, vol. I, 318-362, Bradford Books, 1986.

[132] G. Savard, J. Gauvin, The steepest descent direction for the nonlinear bilevel programming problem, Operations research letters 15, pp. 265-272, 1994.

[133] B. Schölkopf, A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Cambridge University Press, 2001.

[134] B. Schölkopf, R. Herbrich, A. Smola, A Generalized Representer Theorem, Computational Learning Theory. Lecture Notes in Computer Science. 2111: 416-426, 2001.

[135] B. Schölkopf, A. Smola, Learning with Kernels, MIT, Cambridge, 2002.

[136] S. Shalev-Shwartz, S. Ben-David, Understanding machine learning: from theory to algorithms, Cambridge University Press, 2014.

[137] K. Shimizu, Two-level decision problems and their new solution methods by a penalty method, volume 2 of Control science and technology for the progress of society, pages 1303-1308. IFAC, 1982.

[138] K. Shimizu, E. Aiyoshi, A new computational method for Stackelberg and min-max problems by use of a penalty method, IEEE Transactions on Automatic Control, 26:460-466, 1981.

[139] P. Shivaswamy, C. Bhattacharyya, A. Smola, Second order cone programming approaches for handling missing and uncertain data, Journal of Machine learning research, vol. 7, 1283-1314, 2006.

[140] N. Z. Shor, Minimization Methods for Non-differentiable functions, Springer Series in Computational Mathematics, Springer, 1985.

[141] N. Shor, Nondifferentiable Optimization and Polynomial Problems, Nonconvex Optimization and its Applications, Kluwer, 1998.

[142] P. Simon, P. Houston, J. Broach, Directional bias during mating type switching in Saccharomyces is independent of chromosomal architecture, 21(9), 2002.

[143] C. Sorzano, P. Thévenaz, M. Unser, Elastic registration of biological images using vector spline regularization, IEEE Transactions on Biomedical Engineering, vol. 52, no. 4, 2005.

[144] H. Stackelberg, The theory of the market economy, Oxford University Press, 1952.

[145] I. Steinwart, A. Christmann, Support Vector Machines, Springer, 2008.

[146] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11-12:625-653, 1999.

[147] T. Suzuki, Dual averaging and proximal gradient descent for online alternating direction multiplier method, ICML, 2013.

[148] K. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems, 2009.

[149] T. B. Trafalis, R.C. Gilbert, Robust classification and regression using support vector machines, European Journal of Operational Research, Vol. 173, pp. 893-909, 2006.

[150] T. B. Trafalis and R. Gilbert. Robust support vector machines for classification and computational issues. Optimization Methods and Software, 22(1):187-198, 2007.

[151] H. Tuy, A. Migdalas, P. Värbrand, A global optimization approach for the linear two-level program, Journal of Global Optimization, 3, 1-23, 1993.

[152] V. Vapnik, A. Chervonenkis, A note on one class of perceptrons. Automation and Remote Control, 25, 1964.

[153] V. Vapnik, Estimation of dependencies based on empirical data, Springer-Verlag, New York, 1982.

[154] V. Vapnik, A. Chervonenkis, The necessary and sufficient conditions for consistency of the method of empirical risk minimization. Pattern Recognition and Image Analysis, 1(3), pp. 284-305, 1991 (English version).

[155] V. Vapnik, Principles of risk minimization for learning theory, Advances in Neural Information Processing Systems 3. Denver, CO: Morgan Kaufmann, 1992.

[156] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.

[157] V. Vapnik, Statistical learning theory. Wiley-Interscience, 1998.

[158] V. Vapnik, An overview of statistical learning theory, IEEE Transactions on Neural Networks, 10(5), pp. 988-999, 1999.

1994.

[160] L. Vicente, P. Calamai, Geometry and local optimality conditions for bilevel programs with quadratic strictly convex lower levels, Technical Report No.198-O-150294, Department of Systems Design Engineering, University of Waterloo, 1994.

[161] L. Vicente, P. Calamai, Bilevel and multilevel programming: A bibliography review, Journal of Global Optimization, 5, 291-306, 1994.

[162] W. Wang, N. Couellan, Robust classification of large uncertain data using first order methods, 2016 (in preparation).

[163] W. Wang, K. Bystricky, A. Garivier, and N. Couellan, Automatic classification of
