Chapter 1 Introduction
3.5 Textons
In the early texture modelling literature, many approaches drew inspiration from models of how the human visual system interprets texture. This field has been greatly influenced by the work of Julesz, which involved the study of human texture discrimination in the visual cortex.
Julesz and Miller (1962) first hypothesized that texture discrimination in the human visual system occurs across the whole visual field, and that it is governed by higher-order statistical relationships. Later, this was followed by the conjecture that two textures with the same second-order statistics are indistinguishable to the human eye (Julesz et al., 1973). This conjecture was disproved for textures with identical second- and third-order statistics in further work (Caelli & Julesz, 1978). In 1981, Julesz coined the term “texton”, which remains popular even today, although it is being used in a rather different context. Textons are primitive, local texture descriptors, consisting of prominent features such as blobs, edges, line terminators and line crossings. The texton theory was originally developed and tested on binary images of synthetically generated textures, and therefore
Chapter 3 – Texture analysis 51
textons were only defined in this context. The lack of an operational definition for greyscale images caused the theory to fall into disfavour at the time, while filtering approaches gained popularity. In the late 20th century, many texture analysis algorithms involved convolution with a bank of linear, two-dimensional filters as the first step (Knutsson et al., 1983; Koenderink & van Doorn, 1987; Perona & Malik, 1990). It was during this time that the Fourier transform, and eventually wavelets, found their applications in image analysis, while fundamental modelling approaches remained on the periphery.
In a novel merging of filtering approaches with the fundamental texton theory, Leung and Malik (2001) redefined textons as cluster centres in a filter response space. Since then, several studies have focused on the optimisation of various aspects of the algorithm, such as the choice of filters or the choice of a classifier (Schmid, 2001; Cula & Dana, 2004; Varma & Zisserman, 2005).
Since its redefinition, the texton approach has seen a burst of applications in many fields, such as medical image analysis (Yang et al., 2007) and remote sensing (Zeki Yalniz & Aksoy, 2010). Popular tasks for textons include segmentation (Malik et al., 2001), defect detection (Behravan et al., 2009) and classification. There have been several successful studies on texture classification using popular texture data sets such as the Brodatz (scanned from Brodatz, 1966) and VisTex (Pickard et al., 1995) databases (Zhang et al., 2007; Van der Maaten & Postma, 2007; Umarani et al., 2008).
3.5.1 A texton algorithm
Many adaptations have been made to the original texton algorithm proposed by Leung and Malik (2001). The algorithm described here follows the work of Varma and Zisserman (2005), as this version is still similar to the original algorithm, but achieves improved classification results (when tested on textures from popular databases). The algorithm consists of three main steps:
1. multivariate representation,
2. texton dictionary building, and
3. histogram computation.
To obtain a multivariate representation, images are convolved with a filter bank containing $N_F$ user-specified filters, so that each pixel is represented by $N_F$ filter responses. The choice of a filter bank is important for the overall performance of the algorithm, and is discussed in section 3.5.2.
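The multivariate representation step can be sketched as follows. This is a minimal illustration and not the implementation used in the cited studies: the function name `filter_responses`, the reflective border handling and the assumption of odd-sized kernels are all choices made here for the example.

```python
import numpy as np

def filter_responses(image, filters):
    """Return an (H, W, N_F) array holding one response per pixel per filter.

    Assumptions (illustrative only): kernels are 2-D arrays with odd side
    lengths, and image borders are handled by reflective padding.
    """
    image = np.asarray(image, dtype=float)
    responses = []
    for kernel in filters:
        kh, kw = kernel.shape
        padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)),
                        mode="reflect")
        # All (kh, kw) neighbourhoods, one per pixel: shape (H, W, kh, kw).
        windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
        # Flip the kernel so that this is a convolution, not a correlation.
        flipped = kernel[::-1, ::-1]
        responses.append(np.einsum("ijkl,kl->ij", windows, flipped))
    return np.stack(responses, axis=-1)
```

Each pixel of the returned array is then a vector with one entry per filter, which is the form in which the clustering step receives its input.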
Especially in textural images, one would expect many of the filter responses to be similar, and thus it is expected that the pixels can be grouped into clusters of similar pixels. Textons ($\mathcal{T}$) are then defined as the $N_F$-dimensional centres of these clusters. The K-means clustering method (MacQueen, 1967) was originally used for clustering (Leung & Malik, 2001), but alternative methods have been proposed (Georgescu et al., 2003; Gangeh et al., 2011). When K-means clustering is used, the number of textons is equal to $K_T$, the number of clusters specified in K-means clustering (unless some of the clusters have become empty during clustering).
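A minimal sketch of texton dictionary building with plain K-means (Lloyd's iterations) is given below. The function name `build_texton_dictionary`, the random initialisation from the data, the fixed iteration count and the seed are assumptions made for illustration; published implementations may differ in all of these.

```python
import numpy as np

def build_texton_dictionary(pixels, k, n_iter=20, seed=0):
    """Cluster (n_pixels, N_F) filter responses and return the textons.

    Returns the centres of the non-empty clusters, so fewer than k textons
    may be returned if a cluster becomes empty.
    """
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct randomly chosen pixels.
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centre.
        d2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    # Keep only the centres that still have pixels assigned to them.
    return centres[np.unique(labels)]
```

In practice the `pixels` array would be obtained by stacking the filter responses of the training images into rows, one row per pixel.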
Finally, once $\mathcal{T}$ has been calculated, each pixel in each image is assigned to the cluster centre or texton in $\mathcal{T}$ that is closest to it in the filter response space, usually based on a Euclidean distance metric.
By counting the number of pixels in an image that were assigned to each texton, a texton count histogram for the image can be calculated. These $K_T$ texton counts in the texton histogram of an image become the features that are extracted.
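The assignment and counting steps described above can be sketched as follows, assuming Euclidean distance and a texton array with one row per texton. The function name `texton_histogram` is hypothetical.

```python
import numpy as np

def texton_histogram(responses, textons):
    """Count how many pixels of one image lie nearest to each texton.

    responses: (H, W, N_F) filter responses of a single image.
    textons:   (n_textons, N_F) array of cluster centres.
    """
    textons = np.asarray(textons, dtype=float)
    pixels = np.asarray(responses, dtype=float).reshape(-1, textons.shape[1])
    # Squared Euclidean distance from every pixel to every texton.
    d2 = ((pixels[:, None, :] - textons[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # One count per texton, including textons with no assigned pixels.
    return np.bincount(labels, minlength=len(textons))
```

The returned vector of counts is the feature vector of the image; dividing it by the number of pixels would make images of different sizes comparable, which is a design choice left open here.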
It should be noted that the K-means clustering step in the texton algorithm requires long computer running times, because every iteration involves the computationally expensive operation of calculating the distances between each pixel and the cluster centres in order to find the closest centre.
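As a rough illustration of this cost, assume a plain K-means implementation that compares every pixel with every cluster centre in each iteration. The operation count then grows with the product of the number of iterations $I$, the number of pixels $n$, the number of clusters $K_T$ and the number of filter responses per pixel $N_F$. The symbols $I$ and $n$ and the example values below are assumptions introduced only for this estimate.

```latex
C \approx I \cdot n \cdot K_T \cdot N_F,
\qquad \text{e.g. } I = 50,\; n = 10^{6},\; K_T = 100,\; N_F = 8
\;\Rightarrow\; C \approx 4 \times 10^{10}.
```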
3.5.2 Filter bank for the texton algorithm
The selection of an appropriate filter bank is vital to the overall performance of any filter-based texture analysis algorithm. The dimensionality of the filter set has to be balanced against its discriminative capacity, its invariance properties, and its sensitivity to inconsistent image conditions and to the prominent features to be extracted.
Typical filter banks include various filter types with different orientations and spatial frequencies, which ensures that a variety of features (such as edges or blobs), with any size and orientation, can be detected. The two-dimensional forms of Gabor transforms (Gabor, 1946), Laplacians of Gaussians and low-pass Gaussians are popular filter choices.
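The three filter types named above can be generated with a few lines of NumPy. This is a hedged sketch: the function names, the odd kernel `size`, the isotropic Gaussian envelope of the Gabor kernel and the zero-mean normalisation are illustrative assumptions and do not reproduce any particular published filter bank.

```python
import numpy as np

def _grid(size):
    """Integer pixel coordinates centred on zero (size assumed odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return y, x

def gaussian_kernel(sigma, size):
    """Low-pass Gaussian, normalised to unit sum."""
    y, x = _grid(size)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return g / g.sum()

def log_kernel(sigma, size):
    """Laplacian of Gaussian (up to a constant factor), shifted to zero mean."""
    y, x = _grid(size)
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()

def gabor_kernel(sigma, theta, wavelength, size):
    """Even (cosine) Gabor kernel oriented at angle theta, zero mean."""
    y, x = _grid(size)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    k = envelope * np.cos(2 * np.pi * x_rot / wavelength)
    return k - k.mean()
```

Varying `sigma` changes the scale, `wavelength` the spatial frequency and `theta` the orientation, which is how a bank covering several sizes and orientations would be assembled.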
The literature on selecting and designing filters is expansive, and researchers using the texton algorithm frequently adapted existing filter banks to suit their requirements. Three well-known filter banks are presented here.
In their original texton algorithm, Leung and Malik (2001) used a filter bank abbreviated here as the “LM” filter bank. This set consists of 36 oriented filters (two types, edges and bars, each at six orientations and three scales), eight rotationally invariant filters (Laplacians of Gaussians), and four low-pass Gaussian filters. Due to the sensitivity of the various filters to frequency and rotation, this filter bank is highly discriminative, but lacks robustness in cases where textures are slightly distorted.
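The filter counts given above add up as follows. The second line is only an illustration of the resulting data volume and assumes a hypothetical image of 512 × 512 pixels.

```latex
\underbrace{2 \times 6 \times 3}_{\text{oriented}}
+ \underbrace{8}_{\text{Laplacians of Gaussians}}
+ \underbrace{4}_{\text{Gaussians}} = 48 \text{ filters}
\\
512 \times 512 \times 48 = 12\,582\,912 \text{ filter responses per image}
```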
Another well-known filter bank is that of Schmid (2001), which is referred to here as the “S” filter bank. These filters are similar to Gabor filters in some respects, but the entire set is rotationally invariant. Thirteen different scale and frequency combinations were chosen. The filters were normalised to have zero mean so that the filter responses would not be as adversely affected by varying lighting conditions. The rotational symmetry ensures a better representation of textures with slightly rotated features, but reduces selectivity for anisotropic textures.
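A rotationally symmetric, zero-mean filter of this kind can be sketched as a radial cosine under a Gaussian envelope. The functional form below follows the commonly quoted description of these filters, but it should be read as an assumption; the function name and the parameter names `sigma` (scale) and `tau` (number of cycles) are introduced here for illustration, and the thirteen parameter combinations are not reproduced.

```python
import numpy as np

def schmid_like_kernel(sigma, tau, size):
    """Isotropic kernel cos(pi * tau * r / sigma) * exp(-r^2 / (2 sigma^2)).

    The kernel depends only on the radius r, so it is rotationally
    symmetric; subtracting the mean makes it insensitive to a constant
    brightness offset. The size is assumed to be odd.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y)
    k = np.cos(np.pi * tau * r / sigma) * np.exp(-r**2 / (2 * sigma**2))
    return k - k.mean()
```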
Varma and Zisserman (2005) proposed a filter bank design method that balances dimensionality, discriminative power and sensitivity to varying image conditions. The method starts with a root set of 38 filters: 36 oriented filters (as in the LM filter bank), a Laplacian of Gaussian and a low-pass Gaussian. A first subset called MR8 is derived by retaining only the maximum filter responses across
all orientations, as well as the rotationally symmetric filters. By retaining only the scaled filter with the maximum response for each of the two types (edge and bar) this subset is further reduced to MRS4, with only four responses. A different way to reduce the MR8 set is by only considering filters at a single, fixed scale, also resulting in four responses (MR4).
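The three reductions described above can be sketched on an array of root-set responses. The array layout (filter type, scale, orientation as the last three axes), the function name `reduce_root_responses` and the `fixed_scale` index are assumptions made for this example.

```python
import numpy as np

def reduce_root_responses(oriented, isotropic, fixed_scale=0):
    """Derive MR8, MRS4 and MR4 responses from the 38 root responses.

    oriented:  (H, W, 2, 3, 6) responses, indexed by filter type
               (edge, bar), scale and orientation.
    isotropic: (H, W, 2) responses of the two rotationally symmetric
               filters.
    """
    oriented = np.asarray(oriented, dtype=float)
    isotropic = np.asarray(isotropic, dtype=float)
    h, w = oriented.shape[:2]
    # Maximum over the six orientations: 2 types x 3 scales remain.
    max_orient = oriented.max(axis=-1)
    mr8 = np.concatenate([max_orient.reshape(h, w, 6), isotropic], axis=-1)
    # Additionally take the maximum over the three scales: 2 types remain.
    mrs4 = np.concatenate([max_orient.max(axis=-1), isotropic], axis=-1)
    # Alternatively keep a single, fixed scale: 2 types remain.
    mr4 = np.concatenate([max_orient[:, :, :, fixed_scale], isotropic],
                         axis=-1)
    return mr8, mrs4, mr4
```

Taking the maximum over orientations is what makes the reduced responses insensitive to rotation while the oriented filters themselves remain selective.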
3.5.3 Applications in the process industries
The term “texton” has become commonplace in texture analysis parlance. Many studies describe the use of “textons”, but it was found that the term has been loosely applied to almost any texture analysis method that follows some form of structural approach.
One application to online defect detection in textile products has been found, in which LBPs were used to detect and localise defects, and texton features were used in a classifier for the types of defects (Behravan et al., 2009). In an application to particles on conveyor belts, Jemwa and Aldrich (2012) used the texton approach to determine the fraction of fines (particles passing a 6 mm sieve mesh) in coal. A total of 280 images were classified with K-NN and SVMs into seven fines fraction categories, with up to 74% accuracy for the best hyperparameter combination.
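To illustrate how texton histograms can feed a K-NN classifier of the kind mentioned above, a minimal sketch follows. It does not reproduce the cited study: the function name `knn_classify`, the normalisation of the histograms, the Euclidean distance and the class labels in the usage note are all assumptions.

```python
import numpy as np

def knn_classify(train_hists, train_labels, query_hist, k=3):
    """Label a texton histogram by majority vote among its k nearest
    training histograms."""
    train = np.asarray(train_hists, dtype=float)
    # Normalise the counts so that image size does not affect the distance.
    train = train / train.sum(axis=1, keepdims=True)
    query = np.asarray(query_hist, dtype=float)
    query = query / query.sum()
    dist = np.linalg.norm(train - query, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest],
                               return_counts=True)
    return labels[counts.argmax()]
```

For example, with four hypothetical training histograms `[[9, 1], [8, 2], [1, 9], [2, 8]]` labelled `["coarse", "coarse", "fine", "fine"]`, the query histogram `[7, 3]` has two "coarse" histograms among its three nearest neighbours and is therefore labelled "coarse".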