IERI Procedia 2 (2012) 873 – 879
2212-6678 © 2012 Published by Elsevier B.V. Selection and peer review under responsibility of Information Engineering Research Institute doi:10.1016/j.ieri.2012.06.185 Abst Re super scatt of ve tradit samp varia (FLD effec mem defin can effec © 20 Sele Keyw
2012
Fac
tract ecently, linear rvised method, er matrix and a ectors is high a tional LDA is t ple size (SSS) p ations in illumin DA) algorithm cts to obtain th mbership degree nition of the La be obtained b ctiveness of the 012 Published ection and pee words: LDA; FaceInternation
ce Reco
D
bZhejian discriminant an aims to find th at the same tim and the numberthat it fails to w problems. In th nation conditio is proposed, in he correct loca e matrix is fir aplacian scatter by solving a g proposed meth d by Elsevier B r review unde e Recognition; F
nal Conferen
ognitio
Discrim
aWu ng University of M E nalysis (LDA) he optimal set o e minimise the of observation work when the whe real-world ap ns and differen n which the fu al distribution rstly calculated matrix to obta generalised eig hod. B.V. er responsibilit KNN; FLDA;
nce on Futu
on base
minant
Genyuan
uhan University o Media and CommEmail:zgy871 was proposed of projection ve determinant of ns is small,usua within-class sca pplications, the nt facial express uzzy k-nearest information to d using FKNN ain the fuzzy La
genfunction. E ty of Informat
ure Compute
ed on F
t Anal
n Zhang
a,b of Technology munications, Hang [email protected] to manifold le ectors that maxf the within-cla ally tens or hun atter matrix beco e performances
sions. In this st neighbor (FKN o persuit good
, then the mem aplacian scatter Experimental re tion Engineer
er Supported
Fuzzy
ysis
gzhou, P.R. China earning and pa ximize the deter ass scatter matr ndreds of sampl omes singular, of face recogn tudy, the fuzzy NN) is implem performance. mbership degr r matrix. The o esults on ORL ing Researchd Education
Linea
a attern classifica rminant of the b rix. But, since tles, an intrinsic which is known nition are alway
linear discrimi mented to reduc In the propose ree is incorpor ptimal projecti L face databas Institute
n
ar
ation. LDA, a between-class the dimension c limitation of n as the small ys affected by inant analysis ce these outer ed method, a ated into the ons of FLDA ses show the© 2012 Published by Elsevier B.V.
Selection and peer review under responsibility of Information Engineering Research Institute Open access under CC BY-NC-ND license.
1. Introduction
Techniques for dimensionality reduction in linear and nonlinear learning tasks have attracted much attention in the areas of pattern recognition and computer vision. Linear dimensionality reduction seeks to find a meaningful low dimensional subspace from a high-dimensional input space.
The subspace can provide a compact representation of the input data. Two of the most fundamental linear dimensionality reduction methods are principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2].PCA aims to find a linear mapping, which preserves the total variance by maximising the trace of feature covariance matrix.The optimal projections of PCA are corresponding to the first k-largest eigenvalues of the data’s total covariance matrix.But, PCA may probably discard much useful information and weaken the recognition accuracy since the label information is not used. However, LDA, a supervised method, aims to find the optimal set of projection vectors that maximise the determinant of the between-class scatter matrix and at the same time minimise the determinant of the within-class scatter matrix. But, since the dimension of vectors is high and the number of observations is small,usually tens or hundreds of samples, an intrinsic limitation of traditional LDA is that it fails to work when the within-class scatter matrix becomes singular, which is known as the small sample size (SSS) problems. Recently, many manifold learning-based algorithms or locality preserving methods have been presented. Among them, isometric feature mapping [19], locally linear embedding [20, 21], Laplacian eigenmap (LE) [22, 23] and local tangent space alignment [24] are widely used. Recently, He and Niyogi [25] and He et al. [26] proposed locality preserving projections (LPP), which is a linear subspace learning method derived from LE. LPP can preserve the local information and best detect the essential face manifold structure. The optimal projection axes best preserve the local structure of the underlying distribution in the Euclidean space. From analysis they found that LPP is connected with PCA and LDA, LPP can find an embedding space that preserves local information; however, it is unsupervised method. Local discriminant embedding (LDE) was developed by Chen et al. [27], but the underlying ideas of LDE are almost the same as LPP except for using the label information. Focusing on manifold learning, LDE achieves good discriminating performance by integrating the information of neighbour and class relations between data points. LDE incorporates the class information into the construction of embedding and derives the embedding for nearest-neighbour classification in a low-dimensional space which learns the embedding for the sub-manifold of each class by solving an optimisation problem. Recently, Yan et al. [28] proposed a newly general framework called graph embedding for dimensionality reduction.
In this paper, the fuzzy linear discriminant analysis (FLDA) algorithm is proposed, in which the fuzzy k-nearest neighbour (FKNN) is implemented to achieve the local distribution information of original samples. In the proposed method, a membership degree matrix is calculated using FKNN, then the membership degree is incorporated into the definition of the Laplacian scatter matrix to obtain the fuzzy Laplacian scatter matrices. Significantly, differing from the existing graph-based algorithms that two novel fuzzy neighbour graphs are constructed in FLDA, where it is important to maintain the original neighbour relations for neighbouring data points of the same class and also crucial to keep away neighbouring data points of different classes after the FLDA. Through the fuzzy neighbour graphs, FLDA algorithm has lower sensitivities to the sample variations caused by varying illumination, expression, viewing conditions and shapes. So, the class of a new test point can be more reliably predicted by the nearest-neighbour criterion, owing to the locally discriminating nature. The rest of the paper is structured as follows: In Section 2 we introduce LDA and FKNN. In Section 3, we propose the idea and describe FLDA in detail. Finally, we give conclusions in Section 5.
2. Outline of LDA and FKNN
Fuzzy K-nearest neighbour [31]
With FKNN algorithm, the computations of the membership degree can be realised through the following three steps:
Step1: Compute the Euclidean distance matrix between pairs of feature vectors in training set. Step2: Set diagonal elements of this Euclidean distance matrix to infinity.
Step3: Sort the distance matrix (treat each of its columns separately) in an ascending order. Collect the corresponding class labels of the patterns located in the closest neighbourhood of the pattern under consideration (as we are concerned with ‘k’ neighbours, this returns a list of ‘k’ integers). Compute the membership degree to class ‘i ’ for jth pattern using the expression proposed in the literature
୧୨ = ൝ͲǤͷͳ ͲǤͶͻ ൈ ቀ ୬ౠ
୩ቁ ǡ א
ାሺሻ
ͲǤͶͻ ൈ ሺ୧୨Ȁሻǡ ב ାሺሻ
where ୧୨ stands for the number of the neighbours of the jth data (pattern) that belong to the ith class. As usual, ୧୨ satisfies two obvious properties
σୡ୧ୀଵ୧୨ =1
0<σୡ ୧୨ ୧ୀଵ <N
Therefore the fuzzy membership matrix U can be achieved with the result of FKNN U = [୧୨], i = 1, 2, . . ., c, j = 1, 2, . . ., N
3. Proposed FLDA
Basic idea
Let G ={X, W} be a complete undirected weighted graph with vertex set X and similarity matrix Wൈ.
Let y = y1,y2,…,yN)T be the low-dimensional data points. A reasonable criterion for choosing a ‘good’ map is
to minimise the following fuzzy objective function
ଵ
ଶσ ሺ୧୨ ୧െ୨ሻ
ଶ
୧୨୧୨
under appropriate constraints. Suppose v is a transformation vector, that is, y = X, where the ith column
vector of X is ୧. By simple algebra formulation, the objective function can be reduced to
ଵ ଶσ ሺ୧୨ ୧െ୨ሻ ଶ ୧୨୧୨ =ଵ ଶσ ሺ ୧െ ୧୨ ୨ሻଶ୧୨୧୨ = σ ୧୧୨୧୨୧ െ ୧୨ σ ୧୨ ୧୧୨୧୨୨ = െ ሺǤכ ሻ = െ ୶ =ሺെ ሻ =
where X = [x1, x2,…,xN], and ‘.и’ denotes the matrix element wise multiplication, the fuzzy diagonal matrix and the fuzzy Laplacian matrix of a graph G are defined as
ൌ െ , ୧୧
ൌ σ
୧୨୧୨
Fuzzy linear discriminant analysis
Suppose there are c known pattern classes w1,w2,…,wc, where m is the total number of training samples, and
mi is the number of training samples in class i. In class i, the jth training sample is denoted by ୧୨, and class label ୧ of ୧ (i= 1, . . . , N). Then, the proposed FLDA can be realised by the following four steps:
1. Construct fuzzy neighbourhood graphs. Let ୳୷and୳୷ᇱ denote two (undirected) graphs both over all
data points. To construct ୳୷:
୳୷:ifቐ ୧ൌ ୨ ሺǡ ሻ א ୧ א ାሺሻ א ାሺሻ For୳୷ᇱ :ifቐ ୧
്
୨ ሺǡ ሻ ב ୧ א ାሺሻ א ାሺሻ ାሺሻ ାሺሻ indicates the index set of the K nearest neighbours of the sample x i or xj.
2. Constructing the intraclass compactness fuzzy graph and separability fuzzy graph.
୳୷:୧୨ୋ =ቐͲǤͷͳ ͲǤͶͻ ൈ ቀ ୬ౠ ୩ቁ ǡ ୧ ൌ ୨ א ାሺሻ ͲǤͶͻ ൈ ቀ୧୨ൗ ቁ ǡ ୳୷:୧୨ୋᇱ =ቐͲǤͷͳ ͲǤͶͻ ൈ ቀ ୬ౠ ୩ቁ ǡ ୧ൌ ୨ א ାሺሻ ͲǤͶͻ ൈ ቀ୧୨ൗ ቁ ǡ ଵ ଶσ ሺ୧୨ ୧െ୨ሻ ଶ ୧୨ୋ୧୨ =ଵ ଶσ ሺ ୧െ ୧୨ ୨ሻଶ୧୨୧୨ = σ ୧୧୨ୋ୧୨୧ െ ୧୨ σ ୧୨ ୧୧୨ୋ୧୨୨ = ୳୷ ୋ െ ሺ ୧୨ ୋǤכ ሻ = ୳୷ ୋ െ ୶୳୷ୋ =ሺ ୳୷ ୋ െ ୳୷ୋ ሻ = ୳୷ ୋ
fuzzy and the fuzzy Laplacian matrix LG fuzzy of the graph Gfuzzy are defined as The other fuzzy affinity matrix WG fuzzy, the fuzzy diagonal matrix DG fuzzy and the fuzzy Laplacian matrix LG can be computed in the same way. Find the generalized eigenvectors v1, v2, . . . , vd that correspond to the d largest eigenvalues.
4. Experiments and results
To evaluate the proposed FLDE algorithm, we compare it with the PCA, LDA and LDE algorithm in four databases: Wine dataset from UCI, ORL, Yale and AR face database.The Wine is synthetic data sets taken from UCI. It used to show the effectiveness of the proposed fuzzy neighbourhood graphs and test the
clustering performance of the proposed algorithm as a toy example. The ORL database was used to evaluate the performance of FLDE under conditions where the pose and sample size are varied.The Yale database was used to examine the system performance when both facial expressions and illumination are varied. The AR database was employed to test the performance of the system under conditions where there is a variation over time, in facial expressions, and in lighting conditions. After the projection matrix was computed from the training part, all the images including training part and the test part were projected to feature space. Euclidean distance and nearest neighbourhood classifier were used in all the experiments.The ORL database (http://www.uk.research.att.com/facedatabase.html) contains images from 40 individuals, each providing ten different images. The facial expressions and facial details (glasses or no glasses) also vary. The images were taken with a tolerance for some tilting and rotation of the face of up to 208. Moreover, there is also some variation in the scale of up to about 10%. All images normalised to a resolution of 56 × 46. Now, we test the recognition performances of the four methods: PCA, LDA, LDE and FLDE. In the experiments, l images (l varies from 2 to 6) are randomly selected from the image gallery of each individual to form the training sample set. The remaining 10 2 l images are used for testing. In the PCA phase of LDA, LDE and FLDE, we keep 90% image energy. For each l, we independently run 50 times. The top average recognition accuracy and the average CPU time(s) consumed of the four methods, which corresponds to different number of images per person used for training. The performance of FLDE is better than PCA, LDA and LDE. The values in parentheses denote the number of eigenvectors for the best recognition accuracy. From the ORL database, the performance of FLDE under conditions is better than other methods where the pose and sample size are varied. It is because that two novel fuzzy neighbour graphs help to maintain the original neighbour relations for neighbouring data points.The variation of accuracy along different number of eigenvectors used and the recognition accuracy when the four images per class are randomly selected for training. We can see that FLDE performs always better than the other three methods. The figures also demonstrate that the performance of the proposed method outperforms the other methods under the same conditions, it further shows that the proposed method can extract more discriminative features than the other methods.
5. Conclusions
In this paper, we develop a supervised discriminant technique for feature extraction and recognition, called the FLDA, in which the FKNN is to achieve the local distribution information of original samples. We redefined the fuzzy Laplacian scatter matrix according to FKNN. This reduces the sensitivity of LDA to substantial variations between face images caused by varying illumination, viewing conditions and facial expression. By using the fuzzy membership, the influence of the outliers on feature extraction and final classification can be effectively reduced and the more effective discriminative features can be obtained than those by other algorithm (PCA, LDA and LDE). Experimental results on ORL face databases show the effectiveness of the proposed method. In the future, we will make more tests on other types of datasets and further improved the objective function of the proposed algorithm with kernel methods in the non-linear case.
References
[1]Turk, M., Pentland, A.P.P.: ‘Face recognition using eigenfaces’. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR’91), Maui, Hawaii, 3–6 June 1991, pp. 586–591
[2]Belhumeur, P.N., Hespanha, J.P., Kriengman, D.J.: ‘Eigenfaces vs.Fisherfaces: recognition using class specific linear projection’, IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, (7), pp. 711–720
[3]Li, H., Jiang, T., Zhang, K.: ‘Efficient and robust feature extraction by maximum margin criterion’, IEEE Trans. Neural Netw., 2006, 17, (1), pp. 1157–1165
generalized discriminant analysis’, Pattern Recognit., 2006, 39, pp. 277–287
[5]Zheng, W., Zhao, L., Zou, C.: ‘Foley–Sammon optimal discriminant vectors using kernel approach’, IEEE Trans. Neural Netw., 2005, 16, (1), pp. 1–9
[6]Zheng, W., Zhao, L., Zou, C.: ‘An efficient algorithm to solve the small sample size problem for LDA’, Pattern Recognit., 2004, 37, pp. 1077–1079
[7]Friedman, J.H.: ‘Regularized discriminant analysis’, J. Am. Stat. Assoc., 1989, 84, (405), pp. 165–175 Golub, G.H., Van Loan, C.F.: ‘Matrix computations’ (The Johns Hopkins University Press, Baltimore, MD, 1996, 3rd edn.)
[8]9 Belhumeour, P.N., Hespanha, J.P., Kriegman, D.J.: ‘Eigenfaces vs. Fisherfaces: recognition using class specific linear projection’, IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, (7), pp. 711–720
Swets, D.L., Weng, J.: ‘Using discriminant eigenfeatures for image retrieval’, IEEE Trans. Pattern Anal. Mach. Intell., 1996, 18, (8), pp. 831–836
[9]Howland, P., Jeon, M., Park, H.: ‘Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition’, SIAM J. Matrix Anal. Appl., 2003, 25, (1), pp. 165–179 Ye, J., Janardan, R., Park, C.H., Park, H.: ‘An optimization criterion for generalized discriminant analysis on undersampled problems’, IEEE Trans. Pattern Anal. Mach. Intell., 2004, 26, (8), pp. 982–994
[10]Ye, J., Li, Q.: ‘A two-stage linear discriminant analysis via QRdecomposition’, IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, (6), pp. 929–941
[11]Chen, L.-F., Hong-Yuan, X., Liao, M., Ko, M.-T., Lin, J.-C., Yu, G.-J.: þA new LDA-based face recognition system which can solve the small sample size problem’, Pattern Recognit., 2000, 33, pp. 1713– 1726
[12]Kirby, M., Sirovich, L.: ‘Application of the KL procedure for the characterization of human faces’, IEEE Trans. Pattern Anal. Mach. Intell., 1990, 12, (1), pp. 103–108
[13]Lee, J.M.: ‘Riemannian manifolds: an introduction to curvature’ (Springer, Berlin, 1997)
[14Scholkopf, B., Smola, A., Muller, K.R.: ‘Nonlinear component analysis as a kernel eigenvalue problem’, Neural Comput., 1998, 10, (5), pp. 1299–1319
[15]Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Smola, A., Muller, K.-R.: ‘Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces’, IEEE Trans. Pattern Anal. Mach. Intell., 2003, 25, (5), pp. 623–628
[16]Tenenbaum, J.B., de Silva, V., Langford, J.C.: ‘A global geometric framework for nonlinear dimensionality reduction’, Science, 2000, 290, (5500), pp. 2319–2323
[1]Roweis, S.T., Saul, L.K.: ‘Nonlinear dimensionality reduction by locally linear embedding’, Science, 2000, 290, (5500), pp. 2323–2326
[17]Saul, L.K., Roweis, S.T.: ‘Think globally, fit locally: unsupervised learning of low dimensional manifolds’, J. Mach. Learn. Res., 2003, 12, (4), pp. 119–155
[18] Belkin, M., Niyogi, P.: ‘Laplacian eigenmaps and spectral techniques for embedding and clustering’. Proc. Conf. Advances in Neural Information Processing System, Vancouver, Canada, 3–6 December 2001, vol. 15
[19]Belkin,M., Niyogi, P.: ‘Laplacian eigenmaps for dimensionality reduction and data representation’, Neural Comput., 2003, 15, (6), pp. 1373–1396
[20]Zhang, Z ., Zha, H.: ‘Principal manifolds and nonlinear dimensionality reduction via tangent space alignment’, SIAM J. Sci. Comput., 2004, 26, (1), pp. 313–338
[21]He, X., Niyogi, P.: ‘Locality preserving projections’. Proc. 17th Annual Conf. Neural Information Processing Systems, Vancouver and Whistler, Canada, 8–13 December 2003
[22]He, X., Yan, S., Hu, Y., Niyogi, P., Zhang, H.-J.: ‘Face recognition using Laplacianfaces’, IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, (3), pp. 328–340
[23]Chen, H., Chang, H., Liu, T.: ‘Local discriminant embedding and its variants’ (CVPR, 2005)
framework for dimensionality reduction’, IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), 2007, 29, (1), pp. 40–51
[25]Zadeh, L.A.: ‘Fuzzy sets’, Inf. Control, 1965, 8, pp. 338–353
[26]Bezdek, J.C., Keller, J., Krishnapuram, R.: ‘Fuzzy models and algorithms for pattern recognition and image processing’ (Kluwer Academic Publishers, Dordrecht, 1999)
[27]Kw, K.C., Pedry, W.: ‘Face recognition using a fuzzy fisher classifier’, Pattern Recognit., 2005, 38, (10), pp. 1717–1732