Statistical Texture Methods - A Taxonomy of Texture Representations

2.5 A Taxonomy of Texture Representations

2.5.1 Statistical Texture Methods

Texture analysis methods often need to be applied to real-world, natural images. In the simplest case, this means images of a single, uniform, (a priori) textured surface. To be practical, a texture analysis method usually needs to be robust, where robustness is often taken as a proxy for invariance to properties irrelevant to the task at hand. These properties manifest primarily as grey-level variations within and between images of the same texture (class), caused by changes in perspective, rotation, illumination and scale. The human visual system is robust to such changes: when we are presented with a texture, changing the angle, lighting and size of the presentation will, in most cases, not affect our conviction that we are looking at the same texture (class).

Many statistical methods describe an image region as a histogram of the features found in that region. The standard process for this type of statistical method is as follows:

1. Choosing a set of filters to apply to an image;

2. Choosing a set of features from the filter responses;

3. Choosing how to represent the features as a histogram.

Methods also differ in whether or not the histograms for all textures are defined over the same alphabet. Methods based on feature distributions are called bag-of-visual-words or bag-of-textons methods.
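The three choices listed above can be illustrated with a deliberately small sketch. The filters (horizontal and vertical finite differences), the feature (the sign pattern of the two responses) and the four-symbol alphabet are all assumptions made for illustration; none of them is taken from a particular method discussed in this section, and the function names are hypothetical.

```python
def filter_responses(image):
    """Step 1: apply two illustrative filters (horizontal and vertical differences)."""
    rows, cols = len(image), len(image[0])
    responses = []
    for y in range(rows - 1):
        for x in range(cols - 1):
            dx = image[y][x + 1] - image[y][x]
            dy = image[y + 1][x] - image[y][x]
            responses.append((dx, dy))
    return responses


def to_feature(response):
    """Step 2: map a response pair to one of four symbols via the signs of dx and dy."""
    dx, dy = response
    return 2 * int(dx > 0) + int(dy > 0)


def histogram(features, alphabet_size=4):
    """Step 3: represent the region as a normalised histogram over a fixed alphabet."""
    counts = [0] * alphabet_size
    for feature in features:
        counts[feature] += 1
    total = sum(counts)
    return [count / total for count in counts]


# A small diagonal ramp: every difference is positive, so all mass falls in symbol 3.
ramp = [[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4]]
print(histogram([to_feature(r) for r in filter_responses(ramp)]))
```

Because the alphabet here is fixed in advance, histograms from different textures are directly comparable bin by bin; a method that learns its alphabet per texture would need an extra matching step.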

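Ojala et al. (1996) and Ojala and Pietikäinen (1999) introduce the concept of local binary patterns (LBP) as a grey-level-difference-based simplification of the texture-spectrum local neighbourhood labelling method of Wang and He (1990). An LBP is computed over a circular neighbourhood of radius $r$ around a central pixel $g(0)$: the grey levels $g(i)$, $i = 1, \dots, n_P$, of the $n_P$ neighbouring pixels are thresholded against that of the central pixel to give a binary pattern. The neighbours are the pixels with coordinates $\left(-r\sin\frac{2\pi i}{n_P},\ r\cos\frac{2\pi i}{n_P}\right)$, given that the coordinates of the central pixel $g(0)$ are $(0, 0)$. In their later paper (Ojala et al., 2002b), they also introduce a measure of the contrast of local texture regions, given by the variance of the local pixels: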

$$\mathrm{VAR}_{n_P,r} = \frac{1}{n_P} \sum_{i=1}^{n_P} \left(g(i) - \bar{g}\right)^2, \tag{2.3}$$

where $\bar{g}$ is the mean of the $\{g(i)\}_{i=1}^{n_P}$.
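As a rough illustration of the neighbourhood sampling and the local variance described above, the following sketch computes an LBP code and the variance for a single pixel. Several details are assumptions rather than statements about the cited papers: neighbours are sampled by rounding to the nearest pixel instead of interpolating, the first coordinate is treated as a column offset and the second as a row offset, a neighbour contributes a 1 bit when its grey level is at least that of the centre, and bit $i$ is weighted by $2^{i-1}$. The function names are hypothetical.

```python
import math


def neighbour_values(image, y, x, n_p=8, r=1):
    """Grey levels g(1), ..., g(n_P) on a circle of radius r around pixel (y, x).

    Assumption: nearest-pixel sampling instead of interpolation.
    """
    values = []
    for i in range(1, n_p + 1):
        angle = 2 * math.pi * i / n_p
        dx = round(-r * math.sin(angle))  # column offset (assumed axis convention)
        dy = round(r * math.cos(angle))   # row offset (assumed axis convention)
        values.append(image[y + dy][x + dx])
    return values


def lbp_code(image, y, x, n_p=8, r=1):
    """Binary pattern from thresholding the neighbours against the centre g(0)."""
    centre = image[y][x]
    neighbours = neighbour_values(image, y, x, n_p, r)
    return sum(2 ** i for i, g in enumerate(neighbours) if g >= centre)


def local_variance(image, y, x, n_p=8, r=1):
    """Variance of the neighbouring grey levels around their own mean."""
    neighbours = neighbour_values(image, y, x, n_p, r)
    mean = sum(neighbours) / n_p
    return sum((g - mean) ** 2 for g in neighbours) / n_p


patch = [[1, 3, 1],
         [3, 0, 3],
         [1, 3, 1]]
print(lbp_code(patch, 1, 1), local_variance(patch, 1, 1))
```

The code depends only on the signs of the grey-level differences, while the variance measures their magnitude, which is why the two are treated as complementary descriptors of pattern and contrast.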

Ojala et al. demonstrate the rotation invariance and discriminative power of $\mathrm{LBP}_{n_P,r}$, $\mathrm{VAR}_{n_P,r}$ and the ratio $\mathrm{LBP}_{n_P,r}/\mathrm{VAR}_{n_P,r}$ via a series of classification tasks on the Outex database. The best classification performance, 97.9%, is obtained using $\mathrm{LBP}_{n_P,r}/\mathrm{VAR}_{n_P,r}$.

Ylioinas et al. (2013) propose a novel method for sampling LBPs, which they call dense LBPs. Their sampling method takes internal pixel corners in an image as $g(0)$ (the centre of an LBP sample), as well as pixel centres, as in standard LBP. For an $m \times n$ image, the standard $\mathrm{LBP}_{8,r}$ representation, say, would contain $(m - r)(n - r)$ points, whereas the equivalent $\mathrm{LBP}_{8,r}$ variant of Ylioinas et al. (2013) would contain $(m - r)(n - r) + (m - r - 1)(n - r - 1)$ points. This increase in density leads to slightly improved classification performance (≈ 2–8%) compared to standard LBP.

Liu et al. (2012) propose another extension to standard LBP. In their method, they calculate four LBP variants at every pixel centre. The first two variants capture the intensities of the central and neighbouring pixels separately. The other two capture pixel value differences in the angular and radial directions. They report a significant improvement (≥ 8%) in classification performance compared to standard LBP.
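The point counts quoted above for standard and dense LBP sampling are easy to check numerically. The sketch below simply evaluates the two expressions as they are stated in the text; the function names and the example image size are hypothetical.

```python
def standard_lbp_points(m, n, r):
    """Number of sample points for standard LBP, as stated in the text."""
    return (m - r) * (n - r)


def dense_lbp_points(m, n, r):
    """Number of sample points for dense LBP: pixel centres plus internal corners."""
    return (m - r) * (n - r) + (m - r - 1) * (n - r - 1)


m, n, r = 10, 10, 1  # illustrative image size and radius
print(standard_lbp_points(m, n, r), dense_lbp_points(m, n, r))
```

For images much larger than $r$, the two terms are nearly equal, so dense sampling roughly doubles the number of LBP samples.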

Lowe (1999, 2004) introduces the scale-invariant feature transform (SIFT) for keypoint detection and representation. He uses a difference-of-Gaussians (DoG) detector to identify local extrema, which serve as keypoints. He then filters the keypoint sample, first deleting low-contrast points and then deleting keypoints for which the ratio of principal curvatures is above a pre-set threshold, which he argues correspond to edges. Next, he defines an image representation based on the identified keypoints, using local gradients. He shows that this representation performs well in object identification tasks. Ke and Sukthankar (2004) propose a method, PCA-SIFT, to reduce the dimensionality of the SIFT representation and show that performance in a retrieval task improves compared to the standard SIFT method. They also show that the computational cost of performing PCA on the SIFT representation is small compared to the saving in matching cost in retrieval tasks.
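The two keypoint filtering steps can be sketched as follows, assuming the DoG response is available as a 2D list of floats. The edge test uses the trace and determinant of a finite-difference Hessian, which avoids computing the principal curvatures explicitly; the default thresholds (0.03 and 10) are illustrative values for intensities scaled to [0, 1], and the function names are hypothetical. This is a simplified sketch, not Lowe's full procedure, which also refines keypoint locations before filtering.

```python
def is_edge_like(dog, y, x, ratio=10.0):
    """True if the principal-curvature ratio at (y, x) exceeds `ratio`."""
    # Finite-difference Hessian of the DoG response.
    dxx = dog[y][x + 1] - 2 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign, or one of them zero: not a stable extremum.
        return True
    return trace * trace / det >= (ratio + 1) ** 2 / ratio


def keep_keypoint(dog, y, x, contrast_threshold=0.03, ratio=10.0):
    """Apply the low-contrast test first, then the edge test."""
    if abs(dog[y][x]) < contrast_threshold:
        return False
    return not is_edge_like(dog, y, x, ratio)


blob = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ridge = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(keep_keypoint(blob, 1, 1), keep_keypoint(ridge, 1, 1))
```

An isolated blob has two curvatures of similar size and is kept, whereas a ridge has one large and one near-zero curvature and is rejected as edge-like.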

Bay et al. (2006) introduce a set of features called speeded-up robust features (SURF). Just like SIFT, the features are based on local extrema keypoints. They introduce an extrema detector that uses the Hessian of the image convolved with a Gaussian kernel and show that, for an image of 800 × 640 pixels, their method finds a comparable number of keypoints to the DoG detector (1418 vs 1520) in under one third of the time. They then introduce a descriptor based on Gaussian-weighted wavelet responses in the horizontal and vertical directions. They show that their joint detector-descriptor is up to 4× faster than SIFT and that recall performance improves by up to 10%.

Leung and Malik (2001) propose a method for obtaining textons from textures. They apply a set of 48 DtG kernels to multiple images of the same surface taken under different illumination and viewpoint conditions. They argue that this captures both the geometric and the photometric properties of the texture, and they therefore call their vocabulary 3D textons. They then concatenate the responses for all images and all filters, and obtain their feature vocabulary by clustering the filter response space (using k-means) and then joining tightly packed clusters for which there is likely to be little data.
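The clustering step can be sketched as plain k-means over filter-response vectors, with the resulting cluster centres playing the role of textons. This is a minimal sketch under stated assumptions: the response vectors are toy two-dimensional points rather than concatenated responses to 48 kernels, the centres are initialised deterministically from the first `k` points, and the subsequent joining of tightly packed clusters is not shown. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iterations=10):
    """Cluster response vectors; the returned centres act as the texton vocabulary."""
    centres = [list(p) for p in points[:k]]  # assumed deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centres[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def texton_label(response, centres):
    """Index of the texton closest to a filter-response vector."""
    return min(range(len(centres)), key=lambda c: squared_distance(response, centres[c]))


responses = [(0, 0), (10, 10), (0, 1), (10, 11)]  # toy response vectors
textons = kmeans(responses, k=2)
print(textons, [texton_label(r, textons) for r in responses])
```

Once every pixel is labelled with its nearest texton, a texture can be represented by the histogram of those labels, which ties this vocabulary back to the histogram-based pipeline described at the start of the section.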

Zhang et al. (2007) use the affine-invariant versions of the Harris-Laplace and Laplacian detectors (which capture corner-like and blob-like regions, respectively) together with the SIFT, RIFT and SPIN descriptors to construct their keypoint space. They learn a feature space by clustering their keypoints using a support vector machine (SVM).

Zhang et al. (2015) introduce a scale-invariant texture representation based on frequency decomposition and gradient orientation. It consists of a 2D histogram of the joint distribution of orientation-decomposed image intensities (responses to wedge filters) and a texture gradient. They compare its classification performance on two texture databases with that of other texture representations. They find that their algorithm marginally outperforms the state of the art (BIFs) on one of the datasets (≈ +0.3%), but performs considerably worse than BIFs on the other (≈ −4%).

Other statistical methods include co-occurrence matrix methods (Clausi, 2002; Gotlieb and Kreyszig, 1990) and autocorrelation methods (Haralick, 1979; Ulaby et al., 1986).

In document Quantifying Texture Scale in Accordance With Human Perception (Page 47-50)