2.2.3 Texture analysis problems

The texture features described above can be extracted for every pixel or every interest point and pooled into a global descriptor for a texture region or an entire image, depending on the task. Note that feature selection [12, 87] and dimensionality reduction [78, 106, 8] (e.g. PCA) can be used to reduce redundancy and to mitigate the curse of dimensionality and overfitting to the training data. This section introduces four major texture analysis problems, namely texture classification, segmentation, synthesis and shape from texture.

[Figure 2.10 diagram. Training: training images, feature extraction, training (with training labels), classifier. Testing: test image, feature extraction, testing (with the trained classifier), class prediction.]

Figure 2.10: Training and testing phases in a texture classification framework. Note that the feature extraction is learned from the training data in optimised filters and dictionary learning methods, as represented by the dashed arrow.

Texture classification

In texture classification, a set of training images is used to train a model in a supervised manner, as shown in Figure 2.10. The trained model is then used to classify unknown texture images. The feature extraction methods introduced in the previous section build an N-dimensional feature vector (e.g. statistical/structural measures or a histogram) for a given image. Texture classification methods extract these texture features for all the training samples. Samples from the same class ideally form clusters in the feature space. A decision rule (classifier) is then learned to assign an unknown test image to one of the classes based on its projection into the feature space. Several distance measures and classifiers have been used in texture classification. Distance measures estimate the similarity of descriptors in the feature space (e.g. Euclidean, Mahalanobis, Manhattan and chi-square distances). A distance measure is typically used in a K-NN classifier, which assigns a test image to the class that receives the majority vote among its K nearest training samples in the feature space.
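As a minimal sketch of the K-NN decision rule described above, the following Python code classifies a histogram descriptor by majority vote using the chi-square distance. The function names, the `eps` smoothing term and the toy three-bin histograms are illustrative assumptions and do not come from any of the cited methods.

```python
from collections import Counter


def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms of equal length.

    eps (an assumption of this sketch) avoids division by zero on empty bins.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def knn_classify(train_features, train_labels, test_feature, k=3,
                 distance=chi_square_distance):
    """Label test_feature by majority vote among its k nearest training samples."""
    # Rank the training samples by their distance to the test descriptor.
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: distance(pair[0], test_feature),
    )
    # Count the class labels of the k nearest samples.
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, most_common keeps the label encountered first (the nearest one).
    return votes.most_common(1)[0][0]


# Toy example: hypothetical three-bin texture histograms for two classes.
train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]
labels = ["brick", "brick", "grass", "grass"]
predicted = knn_classify(train, labels, [0.85, 0.15, 0.0], k=3)
```

Any of the other distances mentioned above (e.g. Euclidean or Manhattan) could be passed through the `distance` argument without changing the voting logic.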

A linear classifier makes a decision for a given image based on a linear combination of the components of its feature vector. A linear SVM is a binary classifier which constructs a hyperplane in the feature space to separate two classes represented by multiple training samples. The hyperplane maximises the margin, i.e. the distance from the hyperplane to the nearest samples on either side (the support vectors). While linear classifiers have the advantage of low complexity, the data is often not linearly separable, in which case a non-linear SVM is more appropriate. The non-linear SVM maps the feature space into a higher-dimensional space in which the data can become linearly separable. Several kernel functions are used to perform this mapping, including polynomial and radial basis functions. Texture classification problems generally involve more than two classes. A multi-class SVM is generally constructed by combining multiple binary classification problems, using methods such as one-against-all, one-against-one and Directed Acyclic Graph SVM (DAGSVM) [109]. Alternatively, SVMs can be trained with gradient descent (introduced in Appendix A.2.3) [110]. Neural networks such as the MLP introduced in Appendix A can also be trained for classification with backpropagation, using feature vectors as inputs. Other classification methods used in texture classification include naive Bayes [101, 111] and Adaptive Boosting (AdaBoost) [38].
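The following Python sketch illustrates two of the ideas above: a binary linear SVM trained by gradient descent, and a one-against-all combination for more than two classes. It assumes a subgradient update on the regularised hinge loss, which is one common formulation and not necessarily the one used in [110]. All function names and the hyperparameters `lr`, `reg` and `epochs` are illustrative assumptions.

```python
def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Binary linear SVM via subgradient descent on the regularised hinge loss.

    X is a list of feature vectors; y holds labels in {-1, +1}.
    Returns the hyperplane parameters (w, b).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # The sample is inside the margin: hinge term and regulariser act.
                w = [wj - lr * (reg * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # The margin is satisfied: only the regulariser shrinks w.
                w = [wj - lr * reg * wj for wj in w]
    return w, b


def decision_value(model, x):
    """Signed score of x with respect to the hyperplane (w, b)."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_against_all(X, labels):
    """Train one binary SVM per class, each separating that class from the rest."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_linear_svm(X, y)
    return models


def predict_one_against_all(models, x):
    """Assign x to the class whose binary SVM returns the largest score."""
    return max(models, key=lambda c: decision_value(models[c], x))
```

A one-against-one scheme would instead train one SVM per pair of classes and let the pairwise decisions vote; the binary trainer above could be reused unchanged.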

Most texture classification datasets evaluate the recognition of materials, including kth-tips-2b [22, 112], Kylberg [31], CUReT [113], and UIUC [108]. These datasets contain texture images with ground truth class labels. It is common practice to split a dataset into training (with ground truth) and testing (unknown) sets to evaluate the performance of a classification algorithm. Note that on very large image classification datasets like ImageNet [50], training, validation and test sets are predefined. Several methods exist to evaluate and compare the performance of classification algorithms while avoiding overfitting to a single test set. The dataset can be randomly split into training and testing sets (e.g. 80% and 20% respectively), with the split repeated multiple times in order to average the test accuracy and report its standard deviation. This approach, also referred to as Monte Carlo cross-validation, does not ensure that each sample is used for testing. Other cross-validation methods ensure that each sample is used once for testing, providing a more robust measurement of performance. K-fold cross-validation is a non-exhaustive method in which the dataset is partitioned into K folds of equal size (typically K = 10). Each fold is used once for testing and the remaining folds for training. Leave-N-out cross-validation is an exhaustive cross-validation method, commonly used with N = 1 (leave-one-out), in which each sample is used once for testing and the rest for training. Leave-one-out is thus a particular case of K-fold where K is equal to the number of samples.
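The K-fold partitioning described above can be sketched in a few lines of Python. This is a minimal illustration: the function name `k_fold_splits`, the fixed `seed` and the interleaved fold assignment are assumptions of the sketch. When the number of samples is not a multiple of K, fold sizes differ by at most one.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Shuffle once so that folds do not follow the original sample order.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # All the other folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Leave-one-out is obtained by setting k to the number of samples.
leave_one_out = list(k_fold_splits(7, k=7))
```

The accuracy obtained on each test fold would then be averaged, and its standard deviation reported, as described in the text.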

The performance of a texture classification algorithm is typically measured by the average accuracy and standard deviation over the various splits described above. A confusion matrix can also provide meaningful information about the performance. The confusion matrix of an N-class problem is an N×N matrix which reports the correct and incorrect classifications for each class. Other performance measures can be used for datasets with unbalanced numbers of samples per class.
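The sketch below builds a confusion matrix and derives two summary measures from it. The convention that rows index the true class and columns the predicted class is an assumption, as the text does not fix one. The mean per-class recall is shown only as one example of a measure that is less sensitive to class imbalance; the text does not name a specific measure.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """N x N matrix of counts: rows are true classes, columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def mean_per_class_recall(matrix):
    """Average of the per-class recalls, so every class weighs the same."""
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Toy example with three hypothetical classes and unbalanced sample counts.
true = ["a", "a", "a", "b", "b", "c"]
pred = ["a", "a", "b", "b", "b", "a"]
cm = confusion_matrix(true, pred, classes=["a", "b", "c"])
```

In this toy example the single sample of class "c" is misclassified, which lowers the mean per-class recall much more than the overall accuracy.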

Texture segmentation

Texture segmentation aims at partitioning an image into regions of homogeneous texture properties. It therefore requires the classification or clustering of every pixel in the image. It can be supervised (providing training samples of the textures to segment) [26, 114, 115] or unsupervised [21, 26, 57, 65, 68, 116, 117]. In both cases, the texture descriptors introduced in Section 2.2.2 are typically obtained for every pixel or superpixel and either classified (supervised) or clustered (unsupervised) in the feature space. Local spectral histograms of the responses of Gabor and other filters are commonly used features in texture segmentation [116–118]. Model-based segmentation methods such as MRF and GMRF are also well suited for this task [119, 120]. The segmentation of texture descriptors can be performed by, among others, curve evolution with level-set optimisation [114], region growing and merging [121] and functional minimisation (Mumford-Shah functional) [116, 117]. Other basic methods to cluster texture descriptors in the feature space include K-means, mean-shift, GMM, region splitting and watershed [38].
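As an illustration of unsupervised clustering in the feature space, the following Python sketch applies K-means to a list of per-pixel feature vectors. It is a bare-bones version under stated assumptions: the centroids are initialised from evenly spaced samples and the number of iterations is fixed. Both choices are made for brevity and are not taken from the cited methods.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Cluster feature vectors into k groups; returns (labels, centroids)."""
    # Assumed initialisation: k evenly spaced samples from the input list.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Toy example: hypothetical 2-D texture descriptors of six pixels.
features = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
pixel_labels, centres = k_means(features, k=2)
```

Reshaping the returned labels to the image grid would give the segmentation map; in practice the descriptors would come from the features of Section 2.2.2 rather than from toy values.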

Texture segmentation benchmarks can be mosaics, i.e. artificially created from segments of multiple texture images, or real images with multiple texture regions. The Brodatz texture dataset is commonly used to create mosaics in the literature [21, 65, 68, 78]. Note that using mosaics automatically provides precise ground truth. The Prague texture segmentation benchmark [26] enables the testing and comparison of algorithms on a range of supervised and unsupervised segmentation tasks with various texture mosaics.

Commonly used performance metrics of segmentation algorithms can be grouped into pixel-wise, region-based, consistency and clustering measures. Pixel-wise measures are based on counts of wrongly interpreted pixels (e.g. classified as class i but with a different ground truth class) and wrongly assigned pixels (e.g. ground truth class i but classified as another class). These metrics include the Omission (O) error (a ratio of wrongly interpreted pixels), the Commission (C) error (a ratio of wrongly assigned pixels), the weighted average Class Accuracy (CA), recall (CO, the average correct assignment) and precision (CC, overall accuracy). Note that, in the unsupervised case, the ground truth classes of pixels are matched to the segmented results using the Munkres algorithm. Region-based measures compare segmented regions $R_i$, $i = 1, \dots, M$, and ground truth regions $\bar{R}_j$, $j = 1, \dots, N$, where $M$ and $N$ are the numbers of segmented and ground truth regions respectively. These metrics include Correct-, Over- and Under-Segmentation (CS, OS and US) as well as Missed and Noise Error (ME and NE). A region $R_m$ is considered CS if and only if $|R_m \cap \bar{R}_n| \geq k\,|\bar{R}_n|$, where $k$ is a threshold parameter (e.g. 0.75). OS is a count of regions $\bar{R}_n$ split into smaller regions $R_m$ (and vice versa for US). ME and NE are counts of regions $\bar{R}_n$ and $R_m$ respectively that do not belong to CS, OS or US. Other metrics include the Global and Local Consistency Errors (GCE and LCE) and clustering measures (Mirkin metric, Van Dongen metric and variation of information). A detailed description of these performance measures is provided in [26].
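In the unsupervised case, cluster labels are arbitrary and must be matched to ground truth classes before pixel-wise measures can be computed. The sketch below finds the label assignment that maximises pixel agreement. Note that it uses an exhaustive search over permutations as a simple stand-in for the Munkres algorithm: both return an optimal assignment, but the exhaustive search is only practical for a handful of labels. The function name and the flat label lists are assumptions of the sketch.

```python
from itertools import permutations


def best_label_matching(segmented, ground_truth):
    """Match cluster labels to ground truth classes by maximising agreement.

    segmented and ground_truth are flat sequences with one label per pixel.
    Returns (mapping, agreement) where agreement is the matched pixel ratio.
    """
    seg_labels = sorted(set(segmented))
    gt_labels = sorted(set(ground_truth))
    if len(seg_labels) > len(gt_labels):
        raise ValueError("this sketch assumes no more clusters than classes")
    # Count how many pixels share each (cluster, class) pair.
    overlap = {}
    for s, g in zip(segmented, ground_truth):
        overlap[(s, g)] = overlap.get((s, g), 0) + 1
    best_mapping, best_count = None, -1
    # Try every one-to-one assignment of clusters to classes.
    for perm in permutations(gt_labels, len(seg_labels)):
        count = sum(overlap.get((s, g), 0) for s, g in zip(seg_labels, perm))
        if count > best_count:
            best_mapping, best_count = dict(zip(seg_labels, perm)), count
    return best_mapping, best_count / len(ground_truth)


# Toy example: nine pixels, three clusters, three hypothetical classes.
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2]
truth = ["b", "b", "b", "a", "a", "a", "c", "c", "c"]
mapping, agreement = best_label_matching(clusters, truth)
```

For larger label sets, an implementation of the Munkres (Hungarian) algorithm would replace the permutation loop while leaving the overlap counting unchanged.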

Texture synthesis

Texture synthesis involves the generation of a texture image or region from a texture sample. It is commonly used for image inpainting, computer graphics, and image compression. Model-based methods are well suited to texture synthesis: they build a parametric model which captures the statistical properties of a texture image and can then generate visually similar images with the same statistical properties. Typical models used for texture synthesis include MRF [27] and fractals [67], introduced in Section 2.2.2. Filter banks and wavelets have also been used for texture synthesis, modelling images by statistics on filter responses and wavelet coefficients [28]. Finally, recent deep learning texture synthesis methods are introduced in Section 2.2.4.

Shape from texture

Shape from texture, originating from [29], is used to reconstruct the shape of a 3D object from a 2D image. The distortion of a generally regular surface texture is analysed to infer the orientation and shape of a surface. Effects of the visual geometry on the 2D texture appearance include foreshortening, compression, scaling and changes in area and density. Common analysis methods include measures of gradients of texture appearances in the 2D plane and model-based (e.g. isotropy texture model) methods [30].
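The foreshortening effect can be illustrated with a simple worked case. Under two assumptions that belong to this sketch and not to [29, 30], namely orthographic projection and a texture made of circular elements, a circle lying on a plane slanted by an angle σ is imaged as an ellipse whose minor-to-major axis ratio equals cos σ. The slant can then be recovered from the measured axes. The function name is hypothetical.

```python
import math


def slant_from_foreshortening(major_axis, minor_axis):
    """Estimate the surface slant in degrees from an imaged circular element.

    Assumes orthographic projection, so that minor / major = cos(slant).
    """
    if not 0 < minor_axis <= major_axis:
        raise ValueError("expected 0 < minor_axis <= major_axis")
    return math.degrees(math.acos(minor_axis / major_axis))


# An ellipse half as wide as it is long corresponds to a slant of about 60 degrees.
slant = slant_from_foreshortening(2.0, 1.0)
```

Under perspective projection, scaling and density changes also come into play, which is why practical methods analyse gradients of texture appearance across the image and not a single element.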