Feature description based on Center-Symmetric Local Mapped Patterns
Carolina Toledo Ferraz, Osmando Pereira Jr., Adilson Gonzaga
Department of Electrical Engineering, University of São Paulo Av. Trabalhador São Carlense, 400 - 13566-590
São Carlos, SP - Brazil
[email protected], [email protected], [email protected] ABSTRACT
Local feature description has gained a lot of interest in many applications, such as texture recognition, image retrieval and face recognition. This paper presents a novel method for local feature description based on gray-level difference map- ping, called Center-Symmetric Local Mapped Pattern (CS- LMP). The proposed descriptor is invariant to image scale, rotation, illumination and partial viewpoint changes. Fur- thermore, this descriptor more effectively captures the nu- ances of the image pixels. The training set is composed of rotated and scaled images, with changes in illumination and view points. The test set is composed of rotated and scaled images. In our experiments, the descriptor is compared to the Center-Symmetric Local Binary Pattern (CS-LBP). The results show that our descriptor performs favorably com- pared to the CS-LBP.
Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology—fea- ture evaluation and selection
Keywords
region description, image matching, region detection, feature description
1. INTRODUCTION
In image processing, the local feature description plays an important role in applications such as texture recognition, content-based image retrieval and face recognition. The ob- jective is to build a descriptor histogram vector providing a representation that allows efficient matching of local struc- tures between images.
A good descriptor has some important features including invariance to illumination, rotation and scale. Furthermore, it should be robust to errors in detection, geometric distor- tions from different points of view and photometric defor- mations caused by different lighting conditions [13].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC’14 March 24-28, 2014, Gyeongju, Korea.
Copyright 2014 ACM 978-1-4503-2469-4/14/03 ...$15.00.
One of the most widely known key feature descriptors pu- blished in the literature is the Scale-Invariant Feature Trans- form (SIFT) [12]. It was introduced by Lowe in 1999 [11]. It is characterized by a 3D histogram of gradient locations and orientations and stores the bins in a vector of 128 positions.
Many variations of SIFT have been proposed, such as Prin- cipal Components Analysis SIFT (PCA-SIFT) [9], Gradient Location and Orientation Histogram (GLOH) [10], Speeded Up Robust Features (SURF) [3], Colored SIFT (CSIFT) [1], Center-Symmetric Local Binary Pattern (CS-LBP) [8] and Kernel Projection Based SIFT (KPB-SIFT) [17].
PCA-SIFT applies the standard Principal Components Analysis (PCA) technique to the normalized gradient patches extracted around local features. PCA is able to linearly project high-dimensional descriptors onto a low-dimensional feature space and reduce the high frequency noise in the de- scriptors. However, the PCA-based feature descriptors need an offline stage to train and estimate the covariance matrix used for PCA projection. Thus, the application of PCA- based features is limited. The GLOH descriptor also uses PCA to reduce the descriptor size. It is very similar to SIFT, however, it uses a Log-Polar location grid instead of a Cartesian one. SURF approximates SIFT using the Haar wavelet response and using integral images to compute the histograms bins. It uses a descriptor vector of 64 positions, providing a better processing speed. CSIFT tries to solve the problem of color image descriptors, improving on SIFT, which is designed mainly for gray images. It uses the color invariance model to build the descriptor. CS-LBP [8] com- bines the strengths of the well-known SIFT descriptor and the Local Binary Pattern (LBP) texture operator. In [8], it is shown that the construction of the CS-LBP descriptor is simpler than SIFT, but it generates a feature vector of 256 positions, which is twice the SIFT vector size. KBP-SIFT [17] reduced the dimension of the descriptor from 128 to only 36 bins by applying a kernel projection on the orientation gradient patches rather than using smoothed weighted his- tograms. The approach is tolerant to geometric distortions and does not require a training stage; however, the perfor- mance evaluation is not superior in all cases with respect to SIFT.
In a recent work, Vieira, Chierici, Ferraz and Gonzaga
[15] present a new descriptor based on fuzzy numbers that
they called the Local Fuzzy Pattern (LFP). This method
interprets the gray-level values of an image neighborhood as
a fuzzy set and each pixel level as a fuzzy number. A mem-
bership function is used to describe the membership degree
of the central pixel to a particular neighborhood. Moreover,
http://dx.doi.org/10.1145/2554850.2554895
the LFP methodology is a generalization of previously pub- lished techniques such as the LBP, the Texture Unit [6], the Fuzzy Number Edge Detector (FUNED) [4] and the Cen- sus Transform [16]. One evolution of the LFP is the Local Mapped Pattern (LMP) [5], where the authors consider the sum of the differences of each gray-level of a given neigh- borhood to the central pixel as a local pattern that can be mapped to a histogram bin using a mapping function.
In this paper, we propose a new interest region descrip- tors, denoted as the Center-Symmetric Local Mapped Pat- tern (CS-LMP). CS-LMP combines the desirable properties of the CS-LBP using the LMP methodology. This new de- scriptor captures the small transitions of pixels in the image more accurately, resulting in a greater number of correct matches than with CS-LBP. Furthermore, they do not use PCA to decrease the size of the feature vector, but instead quantize the histogram into a fixed number of bins.
The rest of this paper is organized as follows. In Section 2, we briefly describe the methods of CS-LBP and LMP.
Section 3 gives details on the proposed approach CS-LMP.
The experimental evaluation and the results are presented in Section 4. Finally, we conclude the paper in Section 5.
2. LOCAL DESCRIPTORS
In computer vision, local descriptors are based on image features extracted from compact image regions using his- tograms to generate a feature vector.
In this section, two local descriptors are presented: the CS-LBP and the LMP.
2.1 Center-Symmetric Local Binary Pattern (CS-LBP)
The CS-LBP is a modified version of the LBP texture fea- ture and SIFT descriptor, inheriting the desirable properties of both texture features and gradient-based features. The features are extracted by an operator similar to the LBP and the constrution of the descriptor is similar to SIFT.
This methodology has many properties such as tolerance to illumination changes, robustness on flat image areas, and computational simplicity [7]. In [8], the authors state that the LBP produces a rather long histogram and therefore is difficult to use in the context of an image descriptor. To solve this problem, they modified the scheme of how to com- pare the pixels in the neighborhood. Instead of comparing each pixel with the center one, the method compares center- symmetric pairs of pixels as illustrated in Figure 1. We can see that for 8 neighbors, LBP produces 256 different binary patterns (pattern
LBP∈ [0, 255]), whereas for CS-LBP, this number is only 16 (pattern
CS−LBP∈ [0, 15])).
The interest region detectors, such as Hessian-Affine, Harris- Affine, Hessian-Laplace and Harris-Laplace [14, 13], provide the regions that are used to compute the descriptors. After computing the regions, they are mapped to a circular region of a constant radius to obtain the scale and affine invariance.
These regions are fixed at 41 × 41 pixels and the pixel val- ues are in the range [0,1]. Rotation invariance is obtained by rotating the normalized regions in the direction of the dominant gradient orientation.
For the construction of the descriptor, the image regions are divided into cells with a location grid, and for each cell, a CS-LBP histogram is built. The CS-LBP feature extraction for each pixel of the region is accomplished according to Equation 1:
Figure 1: LBP and CS-LBP features for a neighbor- hood of 8 pixels
CS − LBP
R,N,T(x, y) =
(N/2)−1
i=0
s(n
i− n
i+(N/2))2
i, (1) where n
iand n
i+(N/2)correspond to the gray values of center-symmetric pairs of pixels of N equally spaced pixels on a circle of radius R and
s(x) =
1, if x > T, 0, otherwise.
To avoid boundary effects in which the descriptor abruptly changes as a feature shifts from one cell to another, bilin- ear interpolation over the x and y dimensions is used to share the weight of the feature between the four nearest cells. The share for a cell is determined by the bilinear in- terpolation weights [8]. Thus, the resulting descriptor is a 3D histogram of the CS-LBP feature locations and values.
The final descriptor is built by concatenating the feature his- tograms computed for the cells and the last normalization to obtain the maximum unit for each bin. To eliminate the influence of very large descriptor elements, the bins of the histogram are limited to 0.2 and the histogram is normalized again. For a 4 × 4 Cartesian grid, the final feature vector contains 256 positions.
2.2 Local Mapped Pattern (LMP)
The LMP methodology assumes that each gray-level dis- tribution within an image neighborhood is a local pattern.
This pattern can be represented by the gray-level differences around the central pixel. Figure 2 shows an example of a 3 × 3 neighborhood and a 3D graphic of the respective local pattern.
Figure 2: Gray-level differences as a local pattern
Each pattern defined by a W × W neighborhood will be
mapped to a histogram bin h
busing Equation 2, where f
g(i,j)is the mapping function, P (k, l) is a weighting matrix of pre-
defined values for each pixel position within the neighbor-
hood, and B is the number of the histogram bins. This equa-
tion represents the weighted sum of each gray-level difference
between the neighboring pixels and the central one, mapped onto the [0,1] interval by a mapping function and rounded to B possible bins.
h
b= round
W k=1 W
l=1
(f
g(i,j)P (k, l))
Wk=1
Wl=1
P (k, l)
(B − 1), (2) Through Equation 2, it is possible to derive some pre- viously published approaches for local pattern analysis. The Local Binary Pattern (LBP) can be derived from Equation 2 using the Heaviside step function shown in Equation 3:
f
g(i,j)= H[A(k, l) − g(i, j)] (3) where
H[A(k, l) − g(i, j)] =
0 if [A(k, l) − g(i, j)] < 0;
1 if [A(k, l) − g(i, j)] ≥ 0.
Taking into account the basic LBP with a 3 × 3 neighbor- hood of pixels, the weighting matrix will be:
P (k, l) =
⎡
⎣ 1 2 4
128 0 8
64 32 16
⎤
⎦
The LBP is thus a particular case of the LMP approach.
Using different mapping functions, the LMP methodology could extract specific features from the image using local pattern processing.
For texture analysis, the LMP approach uses a smooth approximation to the Heaviside step function by a logistic function, a logistic curve, or a common sigmoid curve as in Equation 4.
f
g(i,j)= 1 1 + e
−[A(k,l)−g(i,j)]β
, (4)
where β is the curve slope and [A(k, l)−g(i, j)] are the gray- level differences within the neighborhood centered in g(i, j).
The proposed weighting matrix is:
P (k, l) =
⎡
⎣ 1 1 1 1 1 1 1 1 1
⎤
⎦
When using the sigmoid curve, the method is called LMP- s.
Sigmoid functions, whose graphs are “S-shaped” curves, appear in a great variety of contexts, such as the transfer functions of many neural networks. Their ubiquity is no ac- cident; these curves are among the simplest nonlinear curves, striking a graceful balance between linear and nonlinear be- havior. Figure 3 shows the mapping functions f
g(i,j)used in Equation 2 for the LMP-s (sigmoid) and LBP (Heaviside step) approaches.
3. CENTER-SYMMETRIC LOCAL MAPPED PATTERN (CS-LMP)
The CS-LBP descriptor described in the previous section has the following disadvantage: the difference between the symmetrical pairs and the center are thresholded to 0 or 1, and do not capture the nuances that can occur in the image.
(a) Mapping function to the
LMP-s method (b) Mapping function to the LBP method
Figure 3: Gray-level differences mapped by Equa- tion 2. a) Local Mapped Pattern. b) Local Binary Pattern
Analyzing the advantage of flexibility of the LMP method to be malleable by changing the mapping function and the weight matrix in the Equation 2, and the disadvantage of the CS-LBP descriptor (stiffness in using the step function), we propose a new descriptor named CS-LMP based on the CS-LBP, but using a mapping function suitable for the cal- culation of characteristics in the LMP method.
Equation 5 shows the feature extraction for each pixel of the interest region, defined in a neighborhood W × W :
CS −LMP
R,N,f(x, y) = round
N/2−1
i=0
f (i)P (i))
N/2−1i=0
P (i)
b (5)
where b is the number of histogram bins minus one, P (i) is the ith element of the weight matrix P defined in the Equation 6, and f (i) is the sigmoid mapping function defined in Equation 7.
P = [p
0p
1p
2...p
N/2−1]
t(6) where
p
i= 2
if (i) = 1
1 + e
−[ni−ni+(N/2)]β
(7)
where n
iand n
i+(N/2)correspond to the gray level values of the center-symmetric pairs of N equally spaced pixels on a circle of radius R and β is the sigmoid curve slope. If the co- ordinates of n
ion the circle do not correspond to the image coordinates, the gray-value of this point is interpolated.
Figure 4 illustrates the operation of the CS-LMP after setting b to 15 (16 bins minus one).
We adopted the Hessian-Affine detector because it presents better results for detecting interest regions in images [8].
Afterwards, the regions are mapped to a circular region of constant radius and rotated in the direction of the dominant gradient, as described in [8] and [12]. Each region is divided into cells with a grid size of 4 × 4. The features are ex- tracted for each pixel of the interest region using a CS-LMP descriptor and a histogram is built for each cell.
The parameters R, N and β were empirically established, the best results were obtained and these will be discussed in Section 4.
As it was presented in [7], a bilinear interpolation is made
over the x and y dimensions to share the weight of the fea-
ture between the four nearest cells. Thus, a 3D histogram of
Figure 4: CS-LMP features for a neighborhood of 8 pixels
the CS-LMP feature locations and values is built as shown in Figure 5. The descriptor is made by concatenating the fea- ture histograms computed for the cells, obtaining a vector of 256 positions(4× 4 × 16). The descriptor is then normalized as in the CS-LBP presented in Section 2.1.
(a) Input
region (b) 4×4
cells’ divi- sion
(c) 4×4×16=256 bins of concate- nated histogram
Figure 5: CS-LMP descriptor.
4. EXPERIMENTAL EVALUATION
In this section, we present the performance of the CS-LMP and CS-LBP descriptors applied to the matching of image pairs using nearest-neighbor similarity.
The image database is available on the internet
1 2. It has 8 sets of images with 6 different types of transforma- tions: viewpoint change, rotation change, scale, ilumination, JPEG compression and blur. The training images used in this paper are shown in Figure 6 and the test images are shown in Figure 7.
Table 1 presents the evaluation of the parameter β in the method CS-LMP for the image “bark”. We tested some values of the β parameter and we generated the recall × precision curves, measuring its respective area under curve (AUC). For most of the test images, the best value of β was near 0.6 for CS-LMP.
The evaluation criteria is explained in the next subsection.
4.1 Evaluation criteria
The performance of the CS-LMP descriptor was evaluated and compared with CS-LBP, using an evaluation criterion based on Heikkil¨a et al. [7, 8]. This criterion consists of the number of correct and false matches for a couple of images, knowing a priori the set of true matches (i.e., the ground truth). Heikkil¨a et. al. [8] determine the ground truth
1
http://www.robots.ox.ac.uk/ vgg/research/affine/
2
http://lear.inrialpes.fr/people/mikolajczyk/
(a) graf1 (b) graf4 (c) bark1 (d) bark4
(e) leu-
ven1 (f) leu-
ven4 (g) trees1 (h) trees4
(i) ubc1 (j) ubc4 (k) boat1 (l) boat4
(m) bike1 (n) bike4 (o) wall1 (p) wall4
Figure 6: Training Images: graf (viewpoint change), bark (scale and rotation change), trees (blur image), leuven (ilumination change), ubc (JPEG compres- sion), boat (scale and rotation change), bike (blur image), wall (viewpoint change)
(a) resid1 (b) resid4 (c) east
park1 (d) east park4
(e) east
south1 (f) east south4 (g)
zmars1 (h) zmars4
(i) zmonet1
(j) zmonet4
(k) ax- terix1
(l) ax- terix4
(m) belle-
done1 (n) belle-
done4 (o) bip1 (p) bip4
Figure 7: Test Images: resid, east park, east south (scale and rotation), zmarks, zmonet (rota- tion change), axterix, belledone, bip (scale change)
by the overlap error of the two regions. The overlap error determines how a region A of image 1 overlaps region B of image 2. Image 2 is projected on image 1 by a homography matrix H.
The overlap error is defined by the ratio of the intersection
Table 1: β evaluation example for the image “bark”
β AUC
0.1 0.8993 0.2 0.9073 0.3 0.9100 0.4 0.9173 0.5 0.9173 0.6 0.9196 0.7 0.9163 0.8 0.8960
and the union of the regions given by Equation 8:
S
= 1 − (A ∩ H
TBH)/(A ∪ H
TBH) (8) A match is assumed to be correct if
S< 0.5.
To evaluate the performance of our descriptors, we present the results with recall × precision curves [2] and their respec- tive AUC.
Precision represents the number of correct matches over the total number of possible matches returned (falses+corrects) and is defined by Equation 9.
precision = n
cn (9)
where n is the number of possible matches returned and the Euclidean distance between their descriptors is below a threshold t
d. The number of correct matches n
cincludes the matched pairs whose Euclidean distance between their descriptors is below a threshold t
dand
S< 0.5
Recall represents the ratio between the correct matches and the total number of true matches (the ground truth) and is defined by Equation 10.
recall = n
cM , (10)
where M is the number of elements in the set of true matches.
To construct the recall × precision curve, each point is defined by choosing n ∈ [1, N], where N represents the total number of possible matches. Analyzing this curve, a perfect descriptor would give a precision equal to 1 for any recall.
4.2 Matching results
The interest region detector used to compute the descrip- tors was the Hessian-Affine detector. As discussed in the previous subsection, the parameters set for the CS-LMP were β = 0.6, b = 15, R = 2, and N = 8. For the CS- LBP, we used the parameters R = 2, N = 8, and T = 0.01 [7].
We generated the recall × precision curves for all images of the test set and calculated the AUC for each of them.
However, for reasons of clarity, we present only four of them in Figure 8. Note that there is a good precision for low recall for the four descriptors, i.e., the first region pairs returned are correct matches; however, as the number of region pairs returned increases, the precision decreases. We can see that the reduction of precision is more accentuated by CS-LBP.
As the number of returned matches increases (n), CS-LMP achieves a greater number of correct matches (n
c) compared to CS-LBP.
The AUC results are shown in Table 2 for the CS-LMP and CS-LBP descriptors. Note that for all images, CS-LMP has an AUC value greater than CS-LBP.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
recall
precision CS−LMP
CS−LBP
(a) resid
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
recall
precision CS−LMP
CS−LBP
(b) east park
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
recall
precision CS−LMP
CS−LBP
(c) east south
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
recall
precision CS−LMP
CS−LBP
(d) zmars
Figure 8: Results of the recall × precision curves for CS-LMP and CS-LBP descriptors.
Table 2: Area under curve (AUC) Image CS-LMP CS-LBP
resid 0.9043 0.8332 east park 0.8758 0.7842 east south 0.9027 0.7816 zmars 0.9659 0.9499 zmonet 0.9659 0.9364 axterix 0.9577 0.9237 belledonne 0.7878 0.6704 bip 0.8889 0.8495
Table 3 presents the values for recall, precision and the n best returned correspondences by the two descriptors. For example, for the image named “resid”, taking into account the first 1027 possible matches returned, we have 580 correct matches for the CS-LMP and 545 for the CS-LBP descrip- tor. The image “east south” presents the difference of 60 correct matches favorable to the CS-LMP and for the image
“east park”, our proposed approach outstands by 52 correct matches over the CS-LBP.
Figure 9 presents the correct matches achieved between the image pair called “resid”. It was performed to check the images test results when applying the descriptors CS-LMP and CS-LBP.
5. CONCLUSIONS
In this paper, we proposed a novel method for feature des- cription based on local pattern analysis. The method com- bines the LMP operator and the CS-LBP descriptor. It uses the CS-LBP-like descriptor and replaces the step function by the sigmoid function.
CS-LMP features has many properties that make it well-
suited for this task: a relatively short histogram and tole-
rance to rotation and scale change. Furthermore, it does
not require setting many parameters. The performance of
the CS-LMP descriptor was compared to the CS-LBP des-
criptor in terms of recall × precision curves. As CS-LMP
Table 3: Correct Matches
Image Recall Precision Correct Matches
CS-LMP resid 0.9265 0.5648 580/1027
CS-LBP 0.8706 0.5307 545/1027
CS-LMP east park 0.9079 0.4904 641/1307
CS-LBP 0.8343 0.4507 589/1307
CS-LMP east south 0.9266 0.495 745/1505
CS-LBP 0.8520 0.4551 685/1505
CS-LMP axterix 0.9718 0.7169 276/385
CS-LBP 0.9366 0.6909 266/385
CS-LMP belledonne 1.0 0.1293 15/116
CS-LBP 0.8667 0.1121 13/116
CS-LMP bip 0.9032 0.6538 476/728
CS-LBP 0.8596 0.6223 453/728
CS-LMP zmars 0.9689 0.0764 1681/22001
CS-LBP 0.9568 0.0755 1660/22001
CS-LMP zmonet 0.9714 0.7827 407/520
CS-LBP 0.9427 0.7596 395/520
200 400 600 800 1000 1200 1400 1600
100
200
300
400
500
600
(a) CS-LMP - 356 correct matches
200 400 600 800 1000 1200 1400 1600
100
200
300
400
500
600