
2017 2nd International Conference on Computer Science and Technology (CST 2017) ISBN: 978-1-60595-461-5

Face Hierarchical Clustering with SIFT-Based Similarities

Wan ZHANG^a, Xiao-fu WU^b, Suo-fei ZHANG and Jun YAN

Nanjing University of Posts and Telecommunications, Nanjing 210003, China

^a zhangwanxlyx@sina.com, ^b xfuwu@njupt.edu.cn

Keywords: Unsupervised image clustering, SIFT, Hierarchical clustering.

Abstract. In this paper, a face image hierarchical clustering method is proposed, which employs the scale invariant feature transform (SIFT) for extracting image features and further defines a novel measure of similarities between pairs of face images. Experiments show that the proposed hierarchical clustering method performs better than the other reported SIFT-based clustering approaches.

Introduction

Face image clustering is an important subject in computer vision, data mining, and related fields. Its goal is to assign face images into several clusters, such that the face images in the same cluster are more similar to each other than to those in the other clusters.

Commonly, a clustering method requires computing similarities or “distances” (dissimilarities) between pairs of images. However, how to define a suitable measure of similarity constitutes a major challenge in practice. A traditional approach is to first extract manually-designed image features in a space of much lower dimension than the original image space and then derive similarities by matching features from pairs of images [1]. A modern approach employs deep-learning methods that often work in a supervised manner. Although various deep-learning methods might outperform traditional approaches, manually-designed features remain attractive, due to their low complexity and their possible use in an unsupervised pretraining stage for deep-learning networks.

In [2], the authors proposed a process for refining image retrieval results by exploiting and fusing the unsupervised feature technique of principal component analysis (PCA) with spectral clustering. In [3], the authors derived a measure of dissimilarity between any pair of images by matching scale invariant feature transform (SIFT) [4] features; a hierarchical clustering algorithm was then employed to cluster the face images. In our recent work [1], a novel nonmetric similarity between pairs of images was proposed based on the idea of soft matching of SIFT features, and affinity propagation was then employed to cluster face images, with better performance than the hard-matching counterpart.


Agglomerative Hierarchical Clustering

Agglomerative clustering algorithms [5], [6] start from the partition of the data set into singleton nodes and merge, step by step, the currently closest pair of nodes into a new node, until one final node is left that comprises the entire data set. Various clustering algorithms share this common procedure, but differ in the way in which the measure of inter-cluster dissimilarity is updated after each step.

The input to a hierarchical clustering algorithm is always a finite data set $X = \{x_1, x_2, \dots, x_N\}$ together with a dissimilarity index defined as follows.

Definition 1: A dissimilarity index on a set $X$ is a map $d: X \times X \to [0, \infty)$ which is reflexive and symmetric, i.e., we have $d(x, x) = 0$ and $d(x, y) = d(y, x)$ for all $x, y \in X$. By collecting all pairwise dissimilarities, one can form a dissimilarity matrix $D = [D_{ij}]_{N \times N}$, where $D_{ij} = d(x_i, x_j)$.

The output of any agglomerative clustering algorithm is a dendrogram, for which the measure of inter-cluster dissimilarity plays a major role. According to the employed inter-cluster dissimilarities, there are typically three types of agglomerative clustering schemes, namely, single-linkage, complete-linkage, and average-linkage. For the average linkage, the inter-cluster dissimilarity between any pair of clusters (R, Q) is defined as [3]

$d(R, Q) = \frac{1}{|R| \cdot |Q|} \sum_{x \in R} \sum_{y \in Q} d(x, y),$

where $|R|$ denotes the number of data points in R. In the experiments, only the average-linkage scheme is considered.
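As a concrete illustration (not part of the original paper), the following Python sketch runs average-linkage agglomerative clustering on a precomputed dissimilarity matrix using SciPy; the function name and the toy matrix are illustrative assumptions.

```python
# Minimal sketch: average-linkage agglomerative clustering on a precomputed
# dissimilarity matrix, using SciPy (illustrative, not from the paper).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def average_linkage_clusters(D, num_clusters):
    """Cluster items given a symmetric N x N dissimilarity matrix D."""
    condensed = squareform(D, checks=False)    # condensed upper-triangular form
    Z = linkage(condensed, method="average")   # build the dendrogram
    return fcluster(Z, t=num_clusters, criterion="maxclust")

# Toy example with a 4 x 4 dissimilarity matrix: items {0, 1} and {2, 3}
# are mutually close, so two clusters are recovered.
D = np.array([[0.0, 0.1, 0.9, 0.8],
              [0.1, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.2],
              [0.8, 0.9, 0.2, 0.0]])
print(average_linkage_clusters(D, num_clusters=2))
```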

SIFT-Based Dissimilarities

SIFT

SIFT is a popular method that identifies local appearance features from images, which are invariant to scale and rotation, and partially invariant to changes in illumination and 3D camera viewpoint. Essentially, it requires locating the keypoints and computing their descriptors [7].

The keypoints are detected by a cascade filtering approach, which is invariant to scale and orientation. A keypoint descriptor is created by first computing the gradient magnitude and orientation at each sample point in a region around the keypoint location. In the literature, the most commonly used configuration is a 4 × 4 grid of subregions with 8 orientation bins [4]. These samples are then accumulated into orientation histograms summarizing the contents of the 4 × 4 subregions. The resulting descriptors are therefore 128-dimensional vectors (4 × 4 × 8).
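As an illustration (assuming OpenCV's built-in SIFT implementation, cv2.SIFT_create, available in recent opencv-python builds), keypoints and their 128-dimensional descriptors can be extracted as follows; extract_sift is a hypothetical helper name.

```python
# Minimal sketch: extract SIFT keypoints and 128-D descriptors with OpenCV.
import cv2

def extract_sift(image_path):
    """Return the SIFT keypoints and descriptors of a grey-scale image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # descriptors has shape (n_i, 128): one 128-D vector per keypoint.
    return keypoints, descriptors
```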


Dissimilarities with Hard-Matching of Features

With the SIFT algorithm, each image $i$ can be described by an $(n_i \times 128)$-dimensional feature matrix, for all $i \in [1, N]$. Note that $n_i$, the number of keypoints of image $i$, might differ from $n_j$, the number of keypoints of image $j$, when $i \neq j$.

After obtaining the keypoints and their descriptors, matching is accomplished by finding candidate matching keypoints based on the Euclidean distance between their feature vectors, as proposed in [4]. The keypoint matching procedure is implemented for every pair of images (i, j): for each local feature (keypoint) of image i, identify its nearest and second-nearest neighbors among the keypoints of image j and compute the distances from the keypoint to these two neighbors. If the ratio of the two distances (to the nearest and second-nearest neighbors) is less than a fixed threshold of δ = 0.8, the match is considered significant. The result is the number of significant keypoint matches found for this pair of images.
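The ratio test above can be sketched as follows (a NumPy-only illustration under the stated threshold δ = 0.8; hard_match_count is a hypothetical helper name).

```python
# Minimal sketch of the hard-matching ratio test: count the keypoints of
# image i whose nearest/second-nearest distance ratio in image j is below delta.
import numpy as np

def hard_match_count(desc_i, desc_j, delta=0.8):
    """m(i, j): number of significant matches from image i to image j."""
    count = 0
    for d in desc_i:
        dists = np.linalg.norm(desc_j - d, axis=1)    # Euclidean distances
        nearest, second = np.partition(dists, 1)[:2]  # two smallest distances
        if second > 0 and nearest / second < delta:
            count += 1
    return count
```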

1) Dissimilarity proposed in [3]: In [3], the authors derived a dissimilarity index between any pair (i, j) of face images by using hard-matching of SIFT features:

$D_{ij} = 1 - \frac{\max\big(m(i,j),\, m(j,i)\big)}{\min(n_i, n_j)},$  (1)

where m(i, j) is the number of significant feature matches from image i to image j. Note that m(i, j) ≠ m(j, i) in general.

2) A modified version of [3]: However, the dissimilarity index of Eq. 1 may be less than zero, which is not desirable in general. Therefore, we propose a modified version as

$D_{ij} = 1 - \frac{m(i,j) + m(j,i)}{n_i + n_j}.$  (2)

Fortunately, this minor modification brings a considerable performance improvement, as shown later in the experiments.
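Using the match counts from the earlier sketch, Eq. 1 and Eq. 2 can be written as follows (an illustrative sketch, with n_i taken as the number of descriptor rows of image i).

```python
# Minimal sketch of the hard-matching dissimilarities of Eq. 1 and Eq. 2,
# reusing hard_match_count() from the previous sketch.
def dissimilarity_eq1(desc_i, desc_j):
    m_ij, m_ji = hard_match_count(desc_i, desc_j), hard_match_count(desc_j, desc_i)
    return 1.0 - max(m_ij, m_ji) / min(len(desc_i), len(desc_j))  # may drop below 0

def dissimilarity_eq2(desc_i, desc_j):
    m_ij, m_ji = hard_match_count(desc_i, desc_j), hard_match_count(desc_j, desc_i)
    return 1.0 - (m_ij + m_ji) / (len(desc_i) + len(desc_j))      # stays in [0, 1]
```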

3) A modified version of [8]: In [8], the authors derived a non-metric measure of similarity with SIFT matching, which takes the form of

$s(i,j) = m(i,j) - \frac{1}{m(i,j)} - \frac{1}{m(j,i)}.$  (3)

Inspired by this definition, and further considering that the input dissimilarity index of hierarchical clustering must be symmetric, we propose a new symmetric similarity index $s(i,j)$ as a function of $\mu(i,j)$,  (4)

where $\mu(i,j)$ is defined as a normalized version of $m(i,j) + m(j,i)$.

As $\mu(i,j) = \mu(j,i)$, we have $s(i,j) = s(j,i)$, namely, a symmetric similarity. Then, the dissimilarity index can be simply calculated as

$D_{ij} = 1 - s(i,j).$  (5)

Dissimilarities with Soft-Matching of SIFT Features

As shown in [4], the decision of keypoint match by hard-thresholding is only meaningful in the sense of high probability. When the threshold is set to δ = 0.8, this hard-matching method can eliminate roughly 90% of the false matches while discarding less than 5% of the correct matches. This observation stimulates us to introduce a soft-matching mechanism in [1].

Instead of making hard binary decisions, we proposed to employ soft decisions for matching the k-th keypoint of image i to image j, namely,

$p_k(i,j) = \sigma_\beta\big(\delta - r_k(i,j)\big), \quad k = 1, \dots, n_i,$

where $\sigma_\beta(x) = \frac{1}{1 + e^{-\beta x}}$, $\beta > 0$, is a class of sigmoid functions with the parameter β controlling the shape of the sigmoid, δ denotes a fixed threshold, and $r_k(i,j)$ is the ratio of the two distances from the k-th keypoint of image i to the nearest and second-nearest neighbors identified among the keypoints of image j. Clearly, $p_k(i,j) \in (0, 1)$ represents the soft-decision reliability of matching the k-th keypoint of image i to the keypoints of image j.

Then, we can define a “soft” measure of significant feature matches (not necessarily an integer) as $\tilde{m}(i,j) = \sum_{k=1}^{n_i} p_k(i,j)$. With this soft measure of significant feature matches, we can derive the soft versions of the dissimilarity indexes of Eq. 1, Eq. 2, and Eq. 5 by replacing $m(i,j)$ with $\tilde{m}(i,j)$, respectively.
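A sketch of this soft-matching count is given below (illustrative only; it uses the sigmoid with the parameters β = 3 and δ = 0.6 reported later in the experiments, and soft_match_count is a hypothetical helper name).

```python
# Minimal sketch of the soft matching count: each keypoint of image i
# contributes a sigmoid weight in (0, 1) instead of a hard 0/1 decision.
import numpy as np

def soft_match_count(desc_i, desc_j, delta=0.6, beta=3.0):
    """Soft measure of significant matches from image i to image j."""
    total = 0.0
    for d in desc_i:
        dists = np.linalg.norm(desc_j - d, axis=1)
        nearest, second = np.partition(dists, 1)[:2]
        ratio = nearest / second if second > 0 else 1.0
        total += 1.0 / (1.0 + np.exp(-beta * (delta - ratio)))  # sigmoid weight
    return total
```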

Experiments

In this section, we report some experimental results for clustering of face images extracted from the Olivetti database. The Olivetti face database consists of 400 grey-scale images of size 112 × 92 for 40 individuals, where each individual has 10 images and appears with a range of in- and out-of-plane pose variations. We computed the dissimilarity value $D_{ij}$ between all possible image pairs and ran the agglomerative hierarchical clustering algorithm for a series of subsets taken from the Olivetti database.

The resulting dissimilarity matrix D is taken as the input of the agglomerative hierarchical clustering algorithm, which produces the final clustering results.
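Putting the pieces together, an end-to-end sketch of this pipeline (illustrative only, reusing the hypothetical helpers from the earlier sketches) might look like this.

```python
# Minimal end-to-end sketch: build the N x N dissimilarity matrix with Eq. 2
# and feed it to average-linkage agglomerative clustering.
import numpy as np

def cluster_faces(descriptor_list, num_clusters):
    n = len(descriptor_list)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = dissimilarity_eq2(descriptor_list[i], descriptor_list[j])
            D[i, j] = D[j, i] = d                 # symmetric dissimilarity matrix
    return average_linkage_clusters(D, num_clusters)
```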

In the experiments, each dataset of K classes is taken as a subset of the Olivetti database, where a total of 10 × K images covering K persons are picked in order from the Olivetti database.

In this paper, the clustering performance is evaluated by the classification accuracy, which is usually defined as the number of correctly clustered images divided by the total number of images [9]. For a given dataset of K categories, the total number of images equals 10K.
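One common way to count correctly clustered images is to first map clusters to classes with the Hungarian algorithm; the sketch below follows that convention, which is our assumption since the definition in [9] is not reproduced here.

```python
# Minimal sketch of clustering accuracy: map clusters to ground-truth classes
# with the Hungarian algorithm, then count the matched images.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    true_ids, cluster_ids = np.unique(true_labels), np.unique(cluster_labels)
    # Contingency table: rows are clusters, columns are true classes.
    table = np.zeros((len(cluster_ids), len(true_ids)), dtype=int)
    for ci, c in enumerate(cluster_ids):
        for ti, t in enumerate(true_ids):
            table[ci, ti] = np.sum((cluster_labels == c) & (true_labels == t))
    rows, cols = linear_sum_assignment(-table)   # maximize correctly matched images
    return table[rows, cols].sum() / len(true_labels)
```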

Figure 1: Classification accuracy comparison among various hard-matching-oriented dissimilarities (Eq. 1, Eq. 2 and Eq. 5).

Firstly, the clustering performance of the proposed dissimilarities Eq. 2 and Eq. 5 is compared to that of Eq. 1 with hard matching in Fig. 1. Clearly, the proposed method Eq. 2 performs better than the method Eq. 1 in [3], and the proposed method Eq. 5 performs the best among the various schemes.

Secondly, we plot the classification accuracy of the various schemes with both hard matching and soft matching in one figure. Note that the soft-matching agglomerative hierarchical clustering uses β = 3, δ = 0.6, while the hard-matching version uses δ = 0.8 as in [1]. As shown in Fig. 2, the soft-matching method [1] performs almost surely better than the hard-matching method [4], and the proposed method Eq. 5 performs the best among the various schemes, under both hard matching and soft matching.

Figure 2: Classification accuracy comparison between hard-matching and soft-matching based approaches.

(In both figures, the classification accuracy, ranging from 0.5 to 1, is plotted against K, the number of clusters, ranging from 10 to 40.)


Conclusion

Agglomerative hierarchical clustering has been widely employed in various applications, where the clustering performance depends heavily on the measure of dissimilarity between pairs of input samples. For face image clustering, it was shown in [3] that SIFT features can be well employed to define a dissimilarity ratio between any pair of images by matching keypoints. In this paper, we propose a novel series of SIFT-based dissimilarity indexes and show significant performance gains with agglomerative hierarchical clustering.

As a popular feature extraction technique, SIFT with feature matching has been extensively applied to various object recognition tasks. We believe the proposed dissimilarity measures can be combined with SIFT to further improve system performance in traditional object recognition applications.

References

[1] Zhang, W., Wu, X., Zhu, W. P., et al. Unsupervised Image Clustering with SIFT-Based Soft-Matching Affinity Propagation[J]. IEEE Signal Processing Letters, 2017, 24(4): 461-464.

[2] Memon, M. H., Shaikh, R. A., Li, J. P., et al. Unsupervised feature approach for content based image retrieval using principal component analysis[C]//Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 2014 11th International Computer Conference on. IEEE, 2014: 271-275.

[3] Antonopoulos, P., Nikolaidis, N., Pitas, I. Hierarchical face clustering using SIFT image features[C]//Computational Intelligence in Image and Signal Processing, 2007. CIISP 2007. IEEE Symposium on. IEEE, 2007: 325-329.

[4] Lowe, D. G. Distinctive image features from scale-invariant keypoints[J]. International journal of computer vision, 2004, 60(2): 91-110.

[5] Sasirekha, K., Baby, P. Agglomerative hierarchical clustering algorithm–A Review [J]. International Journal of Scientific and Research Publications, 2013, 3(3): 1.

[6] Jain, A. K., Dubes, R. C. Algorithms for clustering data[M]. Prentice-Hall, Inc., 1988.

[7] Zhang, Y., Zhang, H. Image clustering based on SIFT-affinity propagation [C]//Fuzzy Systems and Knowledge Discovery (FSKD), 2014 11th International Conference on. IEEE, 2014: 358-362.

[8] Dueck, D., Frey, B. J. Non-metric affinity propagation for unsupervised image categorization[C]//Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, 2007: 1-8.
