AN UNSUPERVISED ACTIVE CLASSIFICATION TECHNIQUE FOR FACE RECOGNITION
Dr.S. ARUNA MASTANI.
Professor, Dept. of ECE, Intell Engineering College, Anantapur, Andhra Pradesh, India.
ABSTRACT
Pattern recognition systems that deal with high-dimensional data but have very few training samples generally face the problem of over-fitting. This is the case with appearance-based Face Recognition (FR) systems, where the pixels form the high-dimensional feature vector representing a face image and only a few sample face images are available for training. Over-fitting is a condition in which a pattern recognition system recognizes/classifies the training samples perfectly but performs poorly on unseen samples (testing samples that were not used for training). The reason is that the small number of training samples cannot cover all the variations of the testing data that arise from changes in illumination, expression, pose and viewpoint of the face images. There are two ways to overcome this problem: either reduce the dimensionality of the samples by extracting the most discriminant features, or provide a classifier with enhanced generalization capability. Thus the development of an effective and reliable Face Recognition system boils down either to representing the patterns (faces) with a minimum number of features carrying the most discriminatory information, or to having a strong classification technique that best categorizes the patterns into different classes.
In this paper a method called active classification through clustering is proposed. It combines the advantage of feature extraction with a novel approach to classification that uses information about the distribution of the testing samples along with the training samples. The proposed technique is based on this basic idea of letting the testing samples participate actively in the classifier. Treating this as a new approach to face recognition, experiments are performed on the well-known ORL, UMIST and Yale databases, and the results are compared with those of existing methods to assess its efficiency.
Key Words: Face Recognition, Over-fitting, Unsupervised Classification, Clustering, Feature Extraction
1. INTRODUCTION
(KFDA)[10], Kernel Direct Discriminant Analysis (KDDA)[11] are all generalizations of LDA. As stated in [12], an ideal feature extractor would yield a representation that makes the job of the classifier trivial; conversely, an omnipotent classifier would not need the help of a sophisticated feature extractor. This means that there is scope to improve an FR system by developing strong classification techniques for which the role of the feature extractor is trivial. Hence, in this paper a novel method called Active Classification through Clustering (ACC) is proposed, which enhances the generalization capability of the classifier by means of a new unsupervised classification approach.
2. A NEW APPROACH TO UNSUPERVISED CLASSIFICATION
In unsupervised classification approaches, the class labels of the training samples are generally unknown, and decision boundaries are constructed from the unlabeled training data. This is made possible through a process called clustering. Clustering can be described as finding natural groupings in a given dataset, where each group is called a cluster. A natural grouping here means that, under some chosen similarity criterion, the samples in one cluster are more like one another than like samples in other clusters. Based on this similarity measure among samples, category labels and other information about the source of the data are derived from the interpretation of the clustering for different applications.
The two important challenges of cluster analysis are, first, to select a suitable similarity measure to form the clusters and, second, to evaluate the quality of the clusters formed. Put another way, the second challenge is to define a criterion function that measures the clustering quality of any partition of the data; the problem is then one of finding the partition that optimizes the criterion function.
When applied to a face recognition system, the clusters are taken to be the given classes of face images. In the traditional unsupervised classification approach, classifying a test sample means finding, with some similarity measure, the cluster to which it is most similar and assigning it to the class from which that cluster was formed. In this paper a new concept is introduced in which the second challenge, evaluating the cluster, is molded into a classification technique. A test sample is assigned to the cluster (class) for which the criterion function chosen to evaluate cluster quality is optimized when the test sample is also counted as one of the samples of that cluster. This simple idea is used to develop a robust face recognition/classification system that is relieved of the problem of over-fitting and gives recognition rates as high as those of the computationally complex kernel methods.
2.1. Problem Statement
For a face recognition system, classification is defined in the following manner.
Given a training data set $X = \{x_{im}\}$ consisting of $N$ images from $C$ classes, with $N_i$ face images per class specified with their class label, and $x_{im}$ denoting the m-th sample from the i-th class, the problem is to estimate a classification function f(x) from the given training set X such that f(x) assigns a test sample x to its correct class.
2.2 Proposed Method: Active Classification through Clustering (ACC)
Consider a training set $X = \{x_{im}\}$ composed of $C$ classes with $N_i$ samples per class, where $x_{im}$ denotes the m-th sample from the i-th class and a total of $N = \sum_{i=1}^{C} N_i$ sample face images are available in the set. For computational convenience, each image is represented as a column vector of length $n = I_w \times I_h$ by lexicographic ordering of the pixel elements, i.e. $x_{im} \in \mathbb{R}^n$, where $I_w \times I_h$ is the image size.
The first step is to apply PCA to the training set, using the total scatter matrix
$$S_t = \sum_{i=1}^{C} \sum_{m=1}^{N_i} (x_{im} - \bar{x})(x_{im} - \bar{x})^T,$$
where $\bar{x}$ is the average of the training set. The transformation matrix is formed from the N-1 normalized eigenvectors corresponding to the largest eigenvalues (principal components). These form a low-dimensional subspace onto which the samples are projected to reduce their dimension to N-1. Here PCA provides two advantages at once: it reduces the feature space, and it acts as a preprocessing technique for achieving invariance of the clusters to scale changes and rotation. This is essential for the proposed method, which is based on within-class clustering.
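The PCA step described above can be sketched as follows. This is a minimal illustration with NumPy on synthetic data, not the author's implementation; the function name `pca_project`, the use of an economy SVD in place of an explicit eigendecomposition of the scatter matrix, and the random input are all assumptions made for the example.

```python
import numpy as np

def pca_project(X, k=None):
    """Project the rows of X (one flattened image per row) onto k principal components.

    k defaults to N-1, the value used in the text. (Hypothetical helper.)
    """
    N = X.shape[0]
    if k is None:
        k = N - 1
    mean = X.mean(axis=0)            # average of the training set
    Xc = X - mean                    # centered samples
    # The right singular vectors of the centered data are the eigenvectors of
    # the total scatter matrix; the economy SVD avoids forming an n x n matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                     # n x k transformation matrix, orthonormal columns
    return Xc @ W, W, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))        # 10 synthetic "images" of 50 pixels each
Y, W, mean = pca_project(X)
print(Y.shape)                       # (10, 9)
```

Because N centered samples span at most N-1 directions, keeping N-1 components loses no information about the training set in this sketch.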
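The second step is to project the PCA-transformed samples, i.e. the transformed training set $Y = \{y_{im}\}$, onto the subspace that minimizes the within-class scatter matrix $S_w$, i.e. to project onto the null space $Q$ of $S_w$, which is formed from the eigenvectors corresponding to the smallest C-1 eigenvalues of $S_w$. Here $S_w$ and $S_i$ are defined as
$$S_w = \sum_{i=1}^{C} S_i, \qquad S_i = \sum_{m=1}^{N_i} (y_{im} - \mu_i)(y_{im} - \mu_i)^T,$$
where $\mu_i = \frac{1}{N_i} \sum_{m=1}^{N_i} y_{im}$, for i = 1, 2, …, C.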
This can be interpreted as a clustering step in which the within-class scatter matrix (intra-class covariance matrix) serves as the criterion function to be minimized, so that the samples of different classes form separate clusters, each cluster corresponding to a specific class.
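A minimal NumPy sketch of this second step is given below, on synthetic data of dimension N-1. The helper names `within_class_scatter` and `null_space_basis`, and the choice of an unnormalized sum for the scatter matrix, are assumptions made for illustration.

```python
import numpy as np

def within_class_scatter(Y, labels):
    """Sum over classes of the scatter of each class around its own mean."""
    d = Y.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(labels):
        D = Y[labels == c] - Y[labels == c].mean(axis=0)
        Sw += D.T @ D
    return Sw

def null_space_basis(Sw, k):
    """Eigenvectors of Sw for its k smallest eigenvalues (eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(Sw)
    return vecs[:, :k]

rng = np.random.default_rng(1)
C, Ni = 3, 4                          # 3 classes, 4 samples each, N = 12
d = C * Ni - 1                        # dimension after PCA, N - 1
labels = np.repeat(np.arange(C), Ni)
Y = rng.normal(size=(C * Ni, d))      # stand-in for PCA-transformed samples
Sw = within_class_scatter(Y, labels)
Q = null_space_basis(Sw, C - 1)       # d x (C-1) projection matrix
Z = Y @ Q                             # projected training samples
```

In this setting $S_w$ has rank at most N-C, so C-1 of its eigenvalues are numerically zero, and every training sample of a class is mapped to the same point of the subspace.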
The first two steps perform feature extraction and reduce the feature dimension. This matters because, as the number of dimensions in a dataset increases, distance measures become increasingly meaningless and almost all samples become nearly equidistant from each other. The remaining part of the algorithm applies a classifier that is free from the problem of over-fitting. This is made possible by directly involving the information about the test sample x in the classifier design. The basic idea rests on the principle that the trace of the within-class scatter matrix of a class measures the square of the scattering radius of that class cluster. This in turn measures the similarity of the samples in the cluster (class): the samples that are most similar give the smallest scattering radius.
Let x be the testing sample to be classified, and project it onto the subspace along with the training samples. Since samples of the same class cluster together, the testing sample falls in close proximity to the class to which it belongs. Thus the trace of the within-class scatter matrix of each class, calculated with the test sample included as one of the samples of that class, measures the scattering radius of the class cluster, which must be minimum for the best cluster. This is taken as the classification rule: the test sample x is assigned to the class with the minimum scattering radius. That is, given an input pattern $x \in \mathbb{R}^n$, its class label is assumed to be
$$c(x) = \arg\min_{i} \operatorname{tr}(\tilde{S}_i),$$
where $\tilde{S}_i$ is defined for each class i as
$$\tilde{S}_i = \sum_{z \in Z_i \cup \{z_x\}} (z - \tilde{\mu}_i)(z - \tilde{\mu}_i)^T,$$
and $Z_i$ is the set of projected training samples of class i, $z_x$ is the projection of x in the subspace, and $\tilde{\mu}_i = \frac{1}{N_i + 1} \sum_{z \in Z_i \cup \{z_x\}} z$.
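The classification rule can be sketched as below, assuming the samples have already been projected into the subspace. The names `scatter_trace` and `acc_classify` and the toy two-dimensional data are hypothetical, and the scatter is taken as an unnormalized sum of squared distances to the cluster mean.

```python
import numpy as np

def scatter_trace(S):
    """Trace of the scatter matrix of the rows of S: summed squared distance to the mean."""
    D = S - S.mean(axis=0)
    return float(np.sum(D * D))

def acc_classify(Z, labels, z):
    """Assign z to the class whose cluster has the smallest scatter once z is added to it."""
    classes = np.unique(labels)
    radii = [scatter_trace(np.vstack([Z[labels == c], z])) for c in classes]
    return classes[int(np.argmin(radii))], radii

# Toy projected training samples: two well-separated classes of three points each.
Z = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 3.0], [3.2, 3.0], [3.0, 3.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
z = np.array([2.6, 2.9])              # projected test sample
pred, radii = acc_classify(Z, labels, z)
print(pred)                           # 1
```

With this unnormalized definition, adding a sample z to a cluster of $N_i$ points with mean $\mu_i$ raises the trace by exactly $\frac{N_i}{N_i+1}\lVert z - \mu_i \rVert^2$, so the rule trades off the distance to each class mean against the scatter the class already has.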
3. EXPERIMENTAL RESULTS AND DISCUSSION
Treating ACC as a new face recognition method with a novel classification technique, experiments are performed on three different databases, namely ORL [13], UMIST [14] and Yale [15], and the method is compared with the existing methods PCA, LDA, DLDA, NLDA, KPCA, KLDA and KDDA to judge its efficiency.
The Olivetti Research Ltd. (ORL) database contains images of 40 individuals, each providing 10 different images. Each image mainly contains the face area. All the images are grayscale at a resolution of 112 × 92 pixels. The 10 images of each person include variations of pose (some tilting and rotation of the face of up to 20 degrees) and of facial expression, and they are not aligned; however, the illumination is almost constant. UMIST is a multi-view database consisting of 565 images of 20 individuals. The images mainly contain the face area. All the images are grayscale and are scaled to 112 × 92 pixels. The images of each individual cover a wide range of poses from profile to frontal views (rotation of the face of up to 90 degrees), and the individuals cover a range of race/sex/appearance; however, the illumination and expression remain constant. The Yale face database consists of images of 15 different individuals with 11 images per person, for a total of 165 images. These images are preprocessed by aligning and scaling them so that the distance between the eyes is the same for all images and the eyes occur at the same coordinates of the image; each image is then cropped to the face area and resized to a resolution of 112 × 92.
For the FR experiments, each of the three databases is randomly partitioned into a training set and a test set with no overlap between the two. The partitions are made as follows. From the 10 images per class in the ORL database, 5 images are randomly chosen for training and the remaining 5 for testing; thus different training sets of 200 images and testing sets of 200 images are formed. Similarly, for the Yale database, which consists of 11 images per class of which 10 are considered for experimentation, 5 images are randomly selected for training and the remaining images for testing, giving training and testing sets of 75 images each. For the UMIST database, 5 images per class are randomly selected to form training sets of 100 images, and the remaining 465 images are used to form the testing sets. In all the experiments, the recognition rate averaged over 5 runs is taken as the measure for comparison, so as to make the performance measurement more reliable; the results are tabulated in Table.1.
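The partitioning and averaging protocol can be sketched as follows. The helper names `random_split` and `mean_recognition_rate`, the `evaluate` callback and the seeds are assumptions for illustration; the label array only mimics the ORL layout of 40 classes with 10 images each.

```python
import numpy as np

def random_split(labels, n_train, rng):
    """Pick n_train random samples per class for training; the rest go to testing."""
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

def mean_recognition_rate(evaluate, labels, n_train=5, runs=5, seed=0):
    """Average evaluate(train_idx, test_idx) over several random partitions."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(runs):
        tr, te = random_split(labels, n_train, rng)
        rates.append(evaluate(tr, te))
    return float(np.mean(rates))

labels = np.repeat(np.arange(40), 10)   # ORL-like layout: 40 classes, 10 images each
tr, te = random_split(labels, 5, np.random.default_rng(0))
print(len(tr), len(te))                 # 200 200
```

Here `evaluate` stands for any routine that trains on the training indices and returns the recognition rate on the test indices.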
consuming computations of kernel methods due to their operation in higher dimensional space will remain inferior to the faster clustering techniques with respect to time. Very keenly observing the results, it can be said that this method remain superior to kernel methods in achieving approximately the same recognition rates as that of the kernel methods like KPCA, KLDA, KDDA. This is mainly due to two reasons, one is LDA based algorithms do not generalize well and tend to over-fit when they are applied to database with few numbers of training samples per class. Instead, the proposed algorithm which is an unsupervised learning method which does not pay any attention to the distribution of training set. In addition though the training samples per class are less as distribution of the testing sample is involved in the classifier makes the method robust to small sample size problems.
4. CONCLUSIONS
Through the simple idea of involving the testing information in the classifier design, a new method named ACC is proposed for face recognition. Being a method based on subspace clustering, ACC remains fast compared with the existing LDA-based methods, and it proves to be almost as efficient as the kernel methods in terms of recognition rates while being computationally superior to them.
TABLE.1: Comparison of average percentage face recognition rates (FRR) on the ORL, UMIST and Yale databases for ACC and for PCA, LDA, DLDA, NLDA, KPCA, KLDA and KDDA
REFERENCES
[1] M. Turk and A. Pentland. “Eigenfaces for recognition”. Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, PP. 71–86.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997, PP. 711–720.
[3] P. C. Yuen and J. H. Lai. “Face representation using independent component analysis”. Pattern Recognition, Vol. 35, 2002, PP. 1247–1257.
[4] N. Vaswani and R. Chellappa. “Principal component null space analysis for image and video analysis”. IEEE Transactions on Image Processing, Vol. 15, No. 7, July 2006, PP. 1816–1830.
[5] M. Yang, N. Ahuja, and D. Kriegman. “Face recognition using kernel eigenfaces”. Proc. Int’l Conf. Image Processing, Vol. 1, 2000, PP. 37–40.
[6] C. Liu and H. Wechsler. “Enhanced Fisher linear discriminant models for face recognition”. Proc. Int’l Conf. Pattern Recognition, Vol. 2, 1998, PP. 1368–1372.
[7] H. Yu and J. Yang. “A direct LDA algorithm for high dimensional data with application to face recognition”. Pattern Recognition, Vol. 34, No. 10, 2001, PP. 2067–2070.
[8] R. Huang, Q. S. Liu, H. Q. Lu and S. D. Ma. “Solving the small sample size problem of LDA”. Proc. Int’l Conf. Pattern Recognition, Quebec, Canada, August 2002, Vol. 3, PP. 29–32.
[9] Li-Fen Chen, Hong-Yuan Mark Liao, Ming-Tat Ko, Ja-Chen Lin, and Gwo-Jong Yu. “A new LDA-based face recognition system which can solve the small sample size problem”. Pattern Recognition, Vol. 33, 2000, PP. 1713–1726.
[10] G. Baudat and F. Anouar. “Generalized discriminant analysis using a kernel approach”. Neural Computation, Vol. 12, No. 10, 2000, PP. 2385–2404.
[11] Juwei Lu, K. N. Plataniotis and A. N. Venetsanopoulos. “Face recognition using kernel direct discriminant analysis algorithms”. IEEE Transactions on Neural Networks, Vol. 14, No. 1, Jan 2003, PP. 117–126.
[12] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, Wiley, 2001.
[13] ORL face database, http://www.uk.research.att.com/facedatabase.html
[14] UMIST face database http://images.ee.umist.ac.uk/danny/database.html
[15] Yale University face database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html
Database   No. of samples per class   PCA    LDA   DLDA   NLDA   KPCA (Gaussian kernel)   KLDA (Gaussian kernel)   KDDA (Gaussian kernel)   ACC
ORL        5                          94.6   92    93.2   92.8   93.08                    80.23                    82.12                    95.2
UMIST      5                          62.8   73    78.2   80.6   86.2                     84.15                    87.90                    88.9