
B. Automatic Regularization for the NLDR Inverse Problem Solution

B.0.2. Regularized Solution

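Now, we reconsider the very common situation in which the Gram matrix $G$ is ill-conditioned, either because it is rank-deficient or because some of its singular values are too close to zero. In this case, some kind of regularization procedure has to be implemented. In this work, we avoid solving the original problem

$$A x = b, \tag{B-8}$$

where

$$A = \begin{bmatrix} 2G & \mathbf{1} \\ \mathbf{1}^{\top} & 0 \end{bmatrix}, \qquad x = \begin{bmatrix} w \\ -\lambda \end{bmatrix}, \qquad b = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix}.$$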

Instead, we replace it with the regularized version

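$$A_{\alpha} x = b, \tag{B-9}$$

with

$$A_{\alpha} = \begin{bmatrix} 2G + 2\alpha^{2} I & \mathbf{1} \\ \mathbf{1}^{\top} & 0 \end{bmatrix}. \tag{B-10}$$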

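For $\alpha > 0$, the matrix $G + \alpha^{2} I$ is positive definite (a Gram matrix is positive semidefinite), so $A_{\alpha}$ is nonsingular and the unique solution to this last problem is

$$\lambda_{\alpha} = \frac{2}{\mathbf{1}^{\top} (G + \alpha^{2} I)^{-1} \mathbf{1}}, \qquad w_{\alpha} = \frac{\lambda_{\alpha}}{2} (G + \alpha^{2} I)^{-1} \mathbf{1}. \tag{B-11}$$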
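As a minimal numerical sketch (NumPy assumed; the helper names `bordered_matrix` and `regularized_solution`, the random Gram matrix, and the value of $\alpha$ are hypothetical and not part of the original method), the snippet below builds the bordered matrix of (B-8) for a rank-deficient $G$, shows that it is numerically singular, and checks that the closed form (B-11) agrees with a direct solve of (B-9) with the matrix (B-10).

```python
import numpy as np


def bordered_matrix(G, alpha=0.0):
    """A_alpha of (B-10); alpha = 0 recovers A of (B-8)."""
    k = G.shape[0]
    ones = np.ones((k, 1))
    return np.block([[2.0 * G + 2.0 * alpha**2 * np.eye(k), ones],
                     [ones.T, np.zeros((1, 1))]])


def regularized_solution(G, alpha):
    """Closed form (B-11) for alpha > 0: returns (w_alpha, lambda_alpha)."""
    k = G.shape[0]
    u = np.linalg.solve(G + alpha**2 * np.eye(k), np.ones(k))  # (G + alpha^2 I)^{-1} 1
    lam = 2.0 / u.sum()                                        # 2 / (1^T u)
    return 0.5 * lam * u, lam


# Hypothetical local Gram matrix: K = 5 neighbours living in a 2-D space,
# so G = Z Z^T has rank 2 and (B-8) has no unique solution.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 2))
G = Z @ Z.T

b = np.zeros(6)
b[-1] = 1.0                                     # b = [0; 1]

print(np.linalg.cond(bordered_matrix(G)))       # enormous: A is (numerically) singular

alpha = 1e-2                                    # arbitrary illustrative value
w, lam = regularized_solution(G, alpha)
x = np.linalg.solve(bordered_matrix(G, alpha), b)     # direct solve of (B-9)
print(np.allclose(x, np.concatenate([w, [-lam]])))    # closed form vs. direct solve
print(np.isclose(w.sum(), 1.0))                       # sum-to-one constraint
```

The value $\alpha = 10^{-2}$ is only a placeholder here; how to choose it is the question addressed next.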

At this point, two important questions arise. First, what is the relationship between (B-11) and the least-squares optimal solution of (B-8)? Second, how can a suitable value for the parameter $\alpha$ be chosen?

Therefore, to implement this kind of regularization automatically under realistic conditions, special attention has to be paid to the size of the solution. This is also the case for other regularization procedures [35]. With this in mind, we propose to choose the regularization parameter as

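$$\alpha_{\mathrm{opt}} = \arg\min_{\alpha}\, g(\alpha), \tag{B-12}$$

where

$$g(\alpha) = \lVert x_{\alpha} \rVert^{2} = \lVert w_{\alpha} \rVert^{2} + \lvert \lambda_{\alpha} \rvert^{2}. \tag{B-13}$$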
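The selection rule (B-12) can be sketched as a search over a finite, log-spaced grid of candidate values. This is only an assumed, simple way of approximating the arg min (the text does not specify an optimizer); the helper names `g` and `choose_alpha`, the grid bounds, and the test matrix are hypothetical.

```python
import numpy as np


def g(G, alpha):
    """g(alpha) = ||w_alpha||^2 + |lambda_alpha|^2, using (B-11) and (B-13)."""
    k = G.shape[0]
    u = np.linalg.solve(G + alpha**2 * np.eye(k), np.ones(k))
    lam = 2.0 / u.sum()
    w = 0.5 * lam * u
    return float(w @ w + lam**2)


def choose_alpha(G, alphas):
    """Approximate (B-12) by taking the grid point with the smallest g."""
    values = np.array([g(G, a) for a in alphas])
    return alphas[np.argmin(values)], values


# Hypothetical ill-conditioned Gram matrix (6 neighbours in a 3-D space,
# plus a tiny diagonal shift so that it is only nearly singular).
rng = np.random.default_rng(1)
Z = rng.standard_normal((6, 3))
G = Z @ Z.T + 1e-8 * np.eye(6)

alphas = np.logspace(-4, 1, 60)      # assumed search range, log-spaced
alpha_opt, values = choose_alpha(G, alphas)
print(f"alpha_opt = {alpha_opt:.3g}, g(alpha_opt) = {values.min():.4g}")
```

A bounded one-dimensional minimizer over $\log \alpha$ (for instance SciPy's `minimize_scalar` with `method="bounded"`) could replace the grid; the grid is used here because it makes no assumption about the shape of $g$.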

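Substituting (B-11) into (B-13) gives $g(\alpha) = \lambda_{\alpha}^{2}\left(1 + \tfrac{1}{4}\lVert (G + \alpha^{2} I)^{-1}\mathbf{1} \rVert^{2}\right)$. In summary, $g(\alpha)$ is the product of the increasing function $\lambda_{\alpha}^{2}$ and the non-increasing function $1 + \tfrac{1}{4}\lVert (G + \alpha^{2} I)^{-1}\mathbf{1} \rVert^{2}$.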
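The behaviour of the two factors can be made explicit with a short derivation. Assume, as holds for any Gram matrix, that $G$ is symmetric positive semidefinite with eigendecomposition $G = V \operatorname{diag}(\sigma_1, \ldots, \sigma_k) V^{\top}$, and write $c = V^{\top}\mathbf{1}$; the eigenvalues are called $\sigma_i$ here only to avoid a clash with the multiplier $\lambda_{\alpha}$.

```latex
\begin{aligned}
\mathbf{1}^{\top}(G+\alpha^{2}I)^{-1}\mathbf{1}
  &= \sum_{i=1}^{k} \frac{c_i^{2}}{\sigma_i+\alpha^{2}},
  &\lambda_{\alpha} &= \frac{2}{\sum_{i=1}^{k} c_i^{2}/(\sigma_i+\alpha^{2})}, \\
\lVert (G+\alpha^{2}I)^{-1}\mathbf{1} \rVert^{2}
  &= \sum_{i=1}^{k} \frac{c_i^{2}}{(\sigma_i+\alpha^{2})^{2}},
  &\lVert w_{\alpha} \rVert^{2} &= \frac{\lambda_{\alpha}^{2}}{4}\,\lVert (G+\alpha^{2}I)^{-1}\mathbf{1} \rVert^{2}, \\
g(\alpha) &= \lVert w_{\alpha} \rVert^{2} + \lambda_{\alpha}^{2}
  = \lambda_{\alpha}^{2}\Bigl(1 + \tfrac{1}{4}\lVert (G+\alpha^{2}I)^{-1}\mathbf{1} \rVert^{2}\Bigr).
\end{aligned}
```

With $\sigma_i \ge 0$, every term of both sums is non-increasing in $\alpha > 0$. Because $c \neq 0$, the first sum is positive and strictly decreasing, so $\lambda_{\alpha} > 0$ and both $\lambda_{\alpha}$ and $\lambda_{\alpha}^{2}$ are increasing, while $\lVert (G+\alpha^{2}I)^{-1}\mathbf{1} \rVert^{2}$ is non-increasing.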

C. Data Synthesis based on Direct Interpolation Methods

In the real world, it is almost impossible to collect all the states of a particular phenomenon. For example, given a set of images of a rotating object, it is difficult and expensive to capture the view from every angle. In this sense, interpolation between samples can be used to infer an unknown state [49, 88]. The learning problem can then be viewed as the approximation of an unknown function

$$X = \xi(z), \tag{C-1}$$

which maps the parameter space $z$ to the sample space $X$, given a set of $n$ training samples $(x_i, z_i)$ of the function $\xi(z)$.

Then, given a set of training samples, a new sample $x_{\mathrm{new}}$ at position $z_{\mathrm{new}}$ in the imposed (reference) parameter space is synthesized by learning the function $\xi(\cdot)$ and evaluating it at $z_{\mathrm{new}}$. For this purpose, a strong interpolation algorithm can be used, such as those based on radial basis functions [88], neural networks and statistical learning theory [2], or splines [48], among others.
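As one possible instance of the interpolation approach above, the following sketch synthesizes $x_{\mathrm{new}} = \xi(z_{\mathrm{new}})$ with a Gaussian radial basis function interpolant. The Gaussian kernel, its width, the helper `fit_gaussian_rbf`, and the toy data are assumptions for illustration, not the specific method of [88].

```python
import numpy as np


def fit_gaussian_rbf(z_train, X_train, width):
    """Interpolate xi: z -> x with Gaussian radial basis functions.

    xi(z) = sum_j c_j * exp(-(z - z_j)^2 / (2 * width^2)), where the
    coefficient rows c_j are chosen so that xi(z_j) = x_j exactly.
    """
    def kernel(za, zb):
        return np.exp(-(za[:, None] - zb[None, :]) ** 2 / (2.0 * width**2))

    C = np.linalg.solve(kernel(z_train, z_train), X_train)  # n x D coefficients

    def xi(z_new):
        return kernel(np.atleast_1d(z_new), z_train) @ C     # one row per new z
    return xi


# Hypothetical training set: n = 9 rotation angles z_i and a 3-D sample x_i each.
z_train = np.linspace(0.0, np.pi, 9)
X_train = np.column_stack([np.cos(z_train), np.sin(z_train), np.cos(2 * z_train)])

xi = fit_gaussian_rbf(z_train, X_train, width=0.4)
x_new = xi(np.pi / 5)           # synthesized sample at an unseen angle, shape (1, 3)
```

All $D$ coordinates of the sample space share the same kernel matrix, so a single linear solve handles them together.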

In particular, spline methods are commonly used in fields such as computer-aided design and computer graphics because of the simplicity of their construction, their ease and accuracy of evaluation, and their capacity to approximate complex shapes through curve fitting. Moreover, these methods have been employed to estimate and synthesize high-dimensional data [49].

The definition of a classical interpolating spline is as follows. The interval $[a, b]$ on which the function $\xi(\cdot)$ is defined is divided into subintervals by introducing knots, which form an increasing sequence $l_i$, $i = 0, 1, \ldots, g+1$, with $l_0 = a$ and $l_{g+1} = b$. The function $\xi(z)$ is then defined as a polynomial of degree $r$ on each subinterval $[l_i, l_{i+1}]$, that is,

$$\left\{ \xi(z) \;\middle|\; \xi|_{[l_i, l_{i+1}]} \in P_r,\ i = 0, 1, \ldots, g \right\}, \tag{C-2}$$

where $P_r$ denotes the space of polynomials of degree at most $r$. In addition, $\xi(z)$ and its derivatives up to order $r - 1$ are required to be continuous on $[a, b]$. Hence, the spline methods of (C-2) can be employed directly in the sample space $X$ to solve a synthesis problem, that is, to find an appropriate solution to (C-1).
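A minimal sketch of applying (C-2) directly in the sample space, assuming SciPy's `CubicSpline` (so $r = 3$, with the training positions $z_i$ as knots and SciPy's default not-a-knot end conditions) and a hypothetical toy data set in which each sample is a short vector depending smoothly on an angle.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical training set: n = 25 angles z_i (used as the knots) and a
# D = 4 dimensional sample x_i for each, stored row-wise (shape n x D).
z_train = np.linspace(0.0, np.pi, 25)
X_train = np.column_stack([np.cos(z_train), np.sin(z_train),
                           np.cos(2 * z_train), np.sin(2 * z_train)])

# Cubic spline (r = 3): a cubic polynomial on each [l_i, l_{i+1}] with
# continuous first and second derivatives. axis=0 fits every coordinate
# of the sample space independently, against the same knots.
xi = CubicSpline(z_train, X_train, axis=0)

z_new = np.array([0.4, 1.3, 2.9])     # unseen positions in the parameter space
X_new = xi(z_new)                     # synthesized samples, shape (3, 4)
```

Treating each sample-space coordinate as an independent one-dimensional spline in $z$ is the simplest reading of "employed directly in the sample space"; with a one-dimensional parameter space it only requires the $z_i$ to be distinct and sorted in increasing order.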

Bibliography

[1] Anil K. Jain, Robert P. W. Duin, and Jianchang Mao. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, 2000.

[2] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, USA, 2002.

[3] Genaro Daza-Santacoloma. Functional Data Representation and Discrimination Employing Locally Linear Embedding. PhD thesis, Universidad Nacional de Colombia sede Manizales, Manizales, Caldas, Colombia, 2010.

[4] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. McGraw-Hill, 2000.

[5] Andrew R. Webb. Statistical Pattern Recognition. John Wiley & Sons, Ltd, Indianapolis, IN, USA, second edition, 2002. ISBN 0-470-84513-9.

[6] I. T. Jolliffe. Principal Component Analysis. Springer series in statistics. Springer, New York, NY, USA, second edition, 2002. ISBN 0-387-95442-2.

[7] T. F. Cox and M. A. Cox. Multidimensional Scaling. Chapman & Hall, 1994.

[8] Germán Castellanos, Diana Marín, Edwin A. Cerquera, and Edilson Delgado. Análisis computarizado de registros fonocardiográficos para la detección de soplos cardiacos. Revista Colombiana de Cardiología, 13(3):171–179, 2006.

[9] Genaro Daza-Santacoloma, Julián D. Arias-Londoño, Juan I. Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruíz, and Germán Castellanos-Domínguez. Dynamic feature extraction: An application to voice pathology detection. Intelligent Automation and Soft Computing, 2009.

[10] Genaro Daza-Santacoloma, Luis Gonzalo Sánchez Giraldo, Franklin A. Sepúlveda, and Germán Castellanos-Domínguez. Acoustic feature analysis for hypernasality detection in children. In Nilmini Wickramasinghe and Eliezer Geisler, editors, Encyclopedia of Healthcare Information Systems. Idea Group, Inc., 2008.

[11] Germán Castellanos, Genaro Daza-Santacoloma, Luis Sánchez, Omar Castrillón, and Julio Suárez. Acoustic speech analysis for hypernasality detection in children. In 28th IEEE EMBS Annual International Conference, pages 5507–5510, New York, NY, USA, 2006.

[12] Augusto Salazar, Genaro Daza-Santacoloma, Luis Sánchez, Flavio Prieto, Germán Castellanos, and Colombia Quintero. Feature extraction and lips posture detection oriented to the treatment of CLP children. In 28th IEEE EMBS Annual International Conference, pages 5747–5750, New York, NY, USA, 2006.

[13] A. Rodríguez-Sánchez, Edilson Delgado-Trejos, Álvaro Orozco-Gutiérrez, Germán Castellanos-Domínguez, and Enrique Guijarro-Estell. Nonlinear dynamics techniques for the detection of the brain areas using MER signals. In International Conference on BioMedical Engineering and Informatics, pages 198–202, 2008.

[14] Liang Wang, Guoying Zhao, Li Cheng, and Matti Pietikainen. Machine Learning for Vision-Based Motion Analysis. Springer, first edition, 2011.

[15] Lin Yang, Wenjin Chen, Peter Meer, Gratian Salaru, Michael D. Feldman, and David J. Foran. High throughput analysis of breast cancer specimens on the grid. In 10th International Conference on Medical Image Computing and Computer Assisted Intervention, 2007.

[16] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2322, 2000.

[17] Kilian Q. Weinberger and Lawrence K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In 21st National Conference on Artificial Intelligence, 2006.

[18] Lawrence K. Saul and Sam T. Roweis. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4:119–155, 2003.

[19] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[20] Changshui Zhang, Jun Wang, Nanyuan Zhao, and David Zhang. Reconstruction and analysis of multi-pose face images based on nonlinear dimensionality reduction. Pattern Recognition, 37(3):325–336, 2004.

[21] Samuel Kadoury and Martin D. Levine. Face detection in gray scale images using locally linear embeddings. Computer Vision and Image Understanding, 105:1–20, 2007.

[22] Yaozhang Pan, Shuzhi Sam Ge, and Abdullah Al Mamun. Weighted locally linear embedding for dimension reduction. Pattern Recognition, 42:798–811, 2009.

[23] Genaro Daza-Santacoloma, Carlos D. Acosta-Medina, and Germán Castellanos-Domínguez. Regularization parameter choice in locally linear embedding. Neurocomputing, 73, 2010.

[24] Olga Kouropteva, Oleg Okun, and Matti Pietikainen. Supervised locally linear embedding algorithm for pattern recognition. In IbPRIA, LNCS 2652, pages 386–394, 2003.

[25] Dick de Ridder, Olga Kouropteva, Oleg Okun, Matti Pietikainen, and Robert P. W. Duin. Supervised locally linear embedding. In International Conference on Artificial Neural Networks, 2003.

[26] Abhinav Gupta, Francine Chen, Don Kimber, and Larry S. Davis. Context and observation driven latent variable model for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[27] Raquel Urtasun, David J. Fleet, Andreas Geiger, Jovan Popovic, Trevor J. Darrell, and Neil D. Lawrence. Topologically-constrained latent variable models. In Proceedings of the 25th International Conference on Machine Learning, pages 1080–1087, 2008.

[28] M. Aharon and R. Kimmel. Representation analysis and synthesis of lip images using dimensionality reduction. International Journal of Computer Vision, 67:297–312, 2006.

[29] Ahmed Elgammal and Chan-Su Lee. Separating style and content on a nonlinear manifold. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 478–485, 2004.

[30] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, 2002.

[31] Gene H. Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.

[32] Per Christian Hansen, James G. Nagy, and Dianne P. O'Leary. Deblurring Images: Matrices, Spectra, and Filtering. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2006.

[33] Alain Rakotomamonjy, Francis R. Bach, Stéphane Canu, and Yves Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008.

[34] M. Torki, A. Elgammal, and Chan-Su Lee. Learning a joint manifold representation from multiple data sets. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), pages 1068–1071, 2010.

[35] Per Christian Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, 2000.

[36] Daniel Peña. Análisis de datos multivariantes. McGraw-Hill, Madrid, Spain, 2002.

[37] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.

[38] Raquel Urtasun, David J. Fleet, and Neil D. Lawrence. Modeling human locomotion with topologically constrained latent variable models. In Proceedings of the 2nd Conference on Human Motion: Understanding, Modeling, Capture and Animation, pages 104–118, 2007.

[39] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.

[40] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14, pages 585–591, 2001.

[41] Lawrence K. Saul and Sam T. Roweis. An introduction to locally linear embedding. Technical report, AT&T Labs and Gatsby Computational Neuroscience Unit, 2000.

[42] Marzia Polito and Pietro Perona. Grouping and dimensionality reduction by locally linear embedding. In NIPS, 2001.

[43] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. Statistics, 19:1–31, 2008.

[44] Dick de Ridder and Robert P. W. Duin. Locally linear embedding for classification. Technical report, Pattern Recognition Group, Delft University of Technology, Delft, The Netherlands, 2002.

[45] Per Christian Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34(4):561–580, 1992.

[46] Andrés Álvarez-Meza, Juliana Valencia-Aguirre, Genaro Daza-Santacoloma, and Germán Castellanos-Domínguez. Global and local choice of the number of nearest neighbors in locally linear embedding. Pattern Recognition Letters (in press), 2011.

[47] Juliana Valencia-Aguirre, Andrés Álvarez-Mesa, Genaro Daza-Santacoloma, and Germán Castellanos-Domínguez. Automatic choice of the number of nearest neighbors in locally linear embedding. In CIARP’09, pages 77–84, 2009.

[49] Frank Y. Shih, Camel Y. Fu, and Kai Zhang. Multi-view face identification and pose estimation using B-spline interpolation. Information Sciences, 169:189–204, 2005.

[50] K. Methaprayoon, W. J. Lee, S. Rasmiddatta, J. Liao, and R. Ross. Multi-stage artificial neural network short-term load forecasting engine with front-end weather forecast. IEEE Trans. Sys. Ind. Appl., pages 1410–1416, 2007.

[51] H. Maier and G. Dandy. Neural networks for the prediction and forecasting of water resources variables: a review of modeling issues and applications. Environmental Modelling & Software, 15:101–124, 2000.

[52] X. Leng and H. Miller. Input dimension reduction for load forecasting based on support vector machines. In Proceedings of the IEEE International Conference on Electric Utility Deregulation, Restructuring and Power Technologies, 2004.

[53] V. Cherkassky and Y. Ma. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17:113–126, 2004.

[54] L. Zhou, H. Yang, and C. Liu. QPSO-based hyper-parameters selection for LS-SVM regression. In Fourth International Conference on Natural Computation, 2008.

[55] J. Kennedy and R. Eberhart. Particle swarm optimization. In IEEE International Conference on Neural Networks, 1995.

[56] J. Liu, W. Xu, and J. Sun. Optimization with mutation operator. In Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, 2005.

[57] Peiliang Xu. Iterative generalized cross-validation for fusing heteroscedastic data of inverse ill-posed problems. Geophysical Journal International, 179:182–200, 2009.

[58] Nhat Nguyen, Peyman Milanfar, and Gene Golub. Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Transactions on Image Processing, 10(9), 2001.

[59] P. T. Boggs and J. W. Tolle. Sequential quadratic programming for large-scale nonlinear optimization. Journal of Computational and Applied Mathematics, 124:123–137, 2000.

[60] M. J. D. Powell. A fast algorithm for nonlinearly constrained optimization calculations. Lecture Notes in Mathematics, 630, 1978.

[61] S. J. Sheather. Density estimation. Statistical Science, 19, 2004.

[62] Weifeng Liu, Puskal P. Pokharel, and Jose C. Principe. Correntropy: Properties and applications in non-gaussian signal processing. IEEE Transactions on Signal Processing, 55(11):5286–5298, November 2007.

[63] Jose C. Principe, Dongxin Xu, and John W. Fisher III. Information theoretic learning. In S. Haykin, editor, Unsupervised Adaptive Filtering, chapter 7. John Wiley & Sons, New York, 2000.

[64] I-Cheng Yeh. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28:1797–1808, 1998.

[65] A. M. G. Klein Tank. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. International Journal of Climatology, 22:1441–1453, 2002.

[66] R. Gross and J. Shi. The CMU motion of body database. Technical report, Carnegie Mellon University, 2001.

[67] Sameer A. Nene, Shree K. Nayar, and Hiroshi Murase. Columbia Object Image Library (COIL-100). Technical report, Department of Computer Science, Columbia University, New York, 1996.

[68] Chieh Wang. CMU/VASC database. Technical report, Carnegie Mellon University, 2006.

[69] David Gering. Linear and nonlinear data dimensionality reduction. Technical report, Massachusetts Institute of Technology, 2002.

[70] M. Lewandowski, J. Martinez-del Rincon, D. Makris, and J.-C. Nebel. Temporal extension of Laplacian eigenmaps for unsupervised dimensionality reduction of time series. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), pages 161–164, 2010.

[71] Bo Long et al. A general model for multiple view unsupervised learning. In Proceedings of the 8th SIAM International Conference on Data Mining, pages 822–833, 2008.

[72] Tian Xia, Dacheng Tao, Tao Mei, and Yongdong Zhang. Multiview spectral embedding. Trans. Sys. Man Cyber. Part B, 40:1438–1446, 2010. ISSN 1083-4419.

[73] Marilena Pillati and Cinzia Viroli. Supervised locally linear embedding for classification: An application to gene expression data analysis. In Book of Short Papers, CLADAG 2005, pages 147–150, Parma, Italy, 2005.

[74] Marco Loog and Dick de Ridder. Local discriminant analysis. In The 18th International Conference on Pattern Recognition, 2006.

[75] Junping Zhang, Huanxing Shen, and Zhi-Hua Zhou. Unified locally linear embedding and linear discriminant analysis algorithm for face recognition. In Advances in Biometric Personal Authentication. LNCS, pages 209–307, 2004.

[76] Haitao Zhao, Shaoyuan Sun, Zhongliang Jing, and Jingyu Yang. Local structure based supervised feature extraction. Pattern Recognition, 39(8):1546–1550, 2006.

[77] Quansheng Jiang, Minping Jia, Jianzhong Hu, and Feiyun Xu. Machinery fault diagnosis using supervised manifold learning. Mechanical Systems and Signal Processing, 23:2301–2311, 2009.

[78] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626–2635, 2004.

[79] Mehmet Gonen and Ethem Alpaydin. Localized multiple kernel regression. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), pages 1425–1428, 2010.

[80] Christian Schuldt, Ivan Laptev, and Barbara Caputo. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), ICPR ’04, pages 32–36. IEEE Computer Society, 2004. ISBN 0-7695-2128-2.

[81] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits.

[82] R. P. Gorman and T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1:75–89, 1988.

[83] Ferdinando Samaria and Andy Harter. Parameterisation of a Stochastic Model for Human Face Identification. In Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, 1994.

[84] Juan Rafael Orozco, Santiago Murillo Rendón, Andrés Marino Álvarez Meza, Julián David Arias Londoño, Edilson Delgado Trejos, Jesús Francisco Vargas Bonilla, and Germán Castellanos Domínguez. Automatic selection of acoustic and non-linear dynamic features in voice. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2011.

[85] Luis David Avendaño, Germán Castellanos Domínguez, and Juan Ignacio Godino Llorente. Feature extraction from parametric time frequency representations for heart murmur detection. In Proceedings of the Annals of Biomedical Engineering, 2010.

[86] Douglas C. Montgomery and George C. Runger. Applied Statistics and Probability for Engineers. John Wiley & Sons, third edition, 2003.

[87] Alvin C. Rencher. Methods of multivariate analysis. Wiley-Interscience, Hoboken, NJ, USA, second edition, 2002. ISBN 0-471-41889-7.

[88] Tony Ezzat and Tomaso Poggio. Facial analysis and synthesis using image-based models. In Second International Conference on Automatic Face and Gesture Recognition, 1996.

In document Nonlinear dimensionality reduction frameworks to support machine learning systems = esquemas de reducción de dimensión no lineal para apoyar sistemas de aprendizaje de máquina (Page 97-106)