2.4 Summary and Conclusions
3.1.1 Texture and Existing Databases
Below are two early definitions of texture compiled by Coggins [Cog82] and also given in [TJ98].
1. “We may regard texture as what constitutes a macroscopic region. Its structure is simply attributed to the repetitive patterns in which elements or primitives are arranged according to a placement rule.” [TMY78]
2. “A region in an image has a constant texture if a set of local statistics or other local properties of the picture function are constant, slowly varying, or approximately periodic.” [Skl78]
The above definitions correspond to the structural and stochastic views of texture, respectively. Structural models decompose texture into two components: the underlying texture elements and their arrangement [Har79, BA88, VP88]. Stochastic models focus on the random rather than the deterministic properties of the texture, and they describe local properties sufficient to characterize it [HB95, GS05]. Both definitions leverage the important texture property of spatial homogeneity at a particular scale, which allows texture to be identified by local statistics. Julesz analyzed the set of local statistics used in preattentive human texture perception, the process that makes textures “effortlessly distinguishable” [Jul81, BJ83]. He introduced the term “texton” to describe the basic texture primitives recognized in preattentive perception. Textons, which are analogous to phonemes in speech recognition, are described as “elongated blobs (of given orientation, width and aspect ratios) and their terminators” [Jul81]. Julesz hypothesized that the human preattentive system analyzes the frequency of textons and does not perform any higher order statistical analysis of spatial interactions in the texture. Both the structural and stochastic models can be decomposed into and thought of in terms of such textons.
There is a large variety in the types of textures considered in the texture analysis community. Textures can be generated or imaged through various devices such as cameras, aerial satellites, microscopes, sonar, computed tomography (CT), magnetic resonance (MR), and ultrasound (US). There are purely structural and purely stochastic textures, as well as textures with both structural and stochastic aspects, such as natural textures and imaged materials. This chapter focuses on natural and material textures imaged using standard cameras. Imaged textures are divided according to their surface properties into either two-dimensional (2D) or three-dimensional (3D) textures [DvGNK99, LM01, CHM05]. 2D textures have smooth, locally planar surfaces whose texture is caused primarily by local variation in surface spectral reflectance. 3D textures have rough surfaces whose texture is related to local height variations. Even in the restricted domain of imaged 2D and 3D textures, the texture databases used in the community have varied over the years.
As mentioned in the beginning of Section 3.1, the relevant information and desired invariances in a texture description depend on the specific set of textures being examined. This highlights one difficulty in texture analysis: the generalizability of methods beyond the examined database. However, specific types of texture variation are of interest in the community, and methods and databases can generally be discussed in terms of the types of variation they handle or express.
Early texture classification work used databases such as the Brodatz collection [Bro66]. Typical experiments acquired multiple training and target images per class by partitioning the single image per class supplied by the database. Therefore, the texture variation within a class is limited and only includes sampling variation caused by large-scale features or deformations in the physical material being imaged. Later texture databases, such as the MIT Vision [vis] and MeasTex Image [mea] texture databases, introduced variability that would be expected in less constrained, “real world” situations, such as variation due to lighting and viewpoint angle. However, the variation included in these databases is not comprehensive because only a small number of lighting conditions or viewpoint angles are given.
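The partitioning protocol described above can be sketched as follows. This is only an illustrative sketch: the image size, the tile size, the alternating assignment of tiles, and the function name partition_into_tiles are all assumptions made for the example, not details taken from any of the cited experiments.

```python
import numpy as np

def partition_into_tiles(image, tile_size):
    """Cut a 2D grey-scale image into non-overlapping square tiles."""
    rows, cols = image.shape
    tiles = []
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            tiles.append(image[r:r + tile_size, c:c + tile_size])
    return tiles

# Hypothetical stand-in for the single image supplied for one class.
rng = np.random.default_rng(0)
class_image = rng.random((512, 512))

# Assumed tile size of 128 pixels, giving a 4 x 4 grid of tiles.
tiles = partition_into_tiles(class_image, 128)

# Assumed assignment: alternate tiles between training and target sets.
train_tiles, target_tiles = tiles[0::2], tiles[1::2]
```

Because every tile comes from the same photograph, the training and target sets share lighting and viewpoint, which is why the intra-class variation in such experiments is limited.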
More recently, the CUReT [DvGNK99] and the KTH-TIPS2 [CHM05] databases have been introduced, which supply a much more comprehensive collection of images. CUReT, which is described in detail below, supplies 205 images of each of 61 materials taken in a controlled environment under varying viewing and illumination angles. Such a database allows texture models to be constructed and analyzed in terms of these specific variations. KTH-TIPS2 supplies images similar to those in CUReT but with two additional forms of variation. First, changes in scale, i.e., camera zoom, are included for each material. Second, multiple example materials are defined for each class. Multiple examples allow the classification of true texture categories, instead of the identification of specific examples in a category.
Figure 3.1: The 61 materials in the CUReT database. Image taken from [VZ02].
The Columbia-Utrecht Reflectance and Texture Database
CUReT was collected by researchers at Columbia University and Utrecht University [DvGNK99]. Figure 3.1 shows each of the 61 materials at a frontal viewing angle. Each class contains images from one material that exhibit 3D effects such as specularities, inter-reflections, and shadowing, as shown in Figure 3.2. This large intra-class variability makes correct classification of the database a challenging task. The limitations of CUReT include a lack of significant scale change, limited in-plane rotation, and small-scale texture features. Small-scale features tend to simplify classification tasks by allowing more compact and better sampled texture measurements.
Figure 3.2: Thirty images from the “Zoomed Plaster B” material (number 30) illustrating the large intra-class variability present in CUReT.
In Sections 3.2 and 3.3, I follow an experiment on CUReT designed by Varma & Zisserman [VZ02] and followed in [VZ03, HCFE04] (and also, roughly, in [PNMT04]). The experiment uses 92 of the 205 images per class, those with the largest minimum number of valid pixels across the samples. Each of these 61×92 = 5612 images is cropped to a resolution of 200×200, converted to grey scale, and processed to have zero mean and unit variance. I use exactly the same images as Varma & Zisserman, which Varma supplied [VZ02]. In Section 3.3, I discuss the necessity of the grey-scale intensity normalization, which is done to achieve partial invariance to linear intensity variation across images.
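A minimal sketch of this preprocessing is given here to make the invariance concrete. It is not the code used to produce the images supplied by Varma: the centre crop, the unweighted channel average used for grey-scale conversion, the 480×640 input size, and the function name preprocess are all assumptions made for illustration.

```python
import numpy as np

def preprocess(image_rgb, crop_size=200):
    """Crop, convert to grey scale, and normalize to zero mean, unit variance."""
    rows, cols = image_rgb.shape[:2]
    # Assumption: a centre crop stands in for the valid-pixel region.
    top = (rows - crop_size) // 2
    left = (cols - crop_size) // 2
    crop = image_rgb[top:top + crop_size, left:left + crop_size]
    # Assumption: grey scale as the unweighted mean of the colour channels.
    grey = crop.astype(np.float64).mean(axis=2)
    # Undefined for a constant image, where the standard deviation is zero.
    return (grey - grey.mean()) / grey.std()

# Hypothetical stand-in for one raw colour image.
rng = np.random.default_rng(0)
raw = rng.random((480, 640, 3))
out = preprocess(raw)

# A global linear intensity change a * I + b with a > 0 leaves the result unchanged.
same = np.allclose(out, preprocess(2.5 * raw + 10.0))
```

The invariance is only partial because the normalization removes a single gain and offset per image; it does not compensate for intensity changes that vary across the image, such as shadowing.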
An experiment must also split the 92 images per class into disjoint training and test sets. Varma & Zisserman typically reported results for two cases, each with 46 training and 46 target images per class. The first case alternates training and target assignment in the order the images are given. The second case gives results averaged over a small number of random splits. Each split yields a total of 61×46 = 2806 training images and 2806 test images. In Sections 3.2 and 3.3, unless otherwise specified, I report results averaged over 100 random splits with equally sized training and test sets. For consistency, the test set is not modified when smaller training sets are examined.
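The splitting protocol can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the random seed, the choice that the alternating split starts with a training image, and the way a smaller training set is drawn (the first n_train entries of each class's shuffled training indices) are not specified in the cited experiments.

```python
import numpy as np

N_CLASSES = 61
IMAGES_PER_CLASS = 92
N_TRAIN = 46

def alternating_split():
    """Alternate assignment in image order (assumed to start with training)."""
    idx = np.arange(IMAGES_PER_CLASS)
    return idx[0::2], idx[1::2]

def random_split(rng):
    """Return (train, test) index arrays of shape (61, 46) for one random split."""
    train, test = [], []
    for _ in range(N_CLASSES):
        order = rng.permutation(IMAGES_PER_CLASS)
        train.append(order[:N_TRAIN])
        test.append(order[N_TRAIN:])
    return np.array(train), np.array(test)

def shrink_training(train_idx, n_train):
    """Keep n_train training images per class; the test set is left untouched."""
    return train_idx[:, :n_train]

rng = np.random.default_rng(0)
splits = [random_split(rng) for _ in range(100)]
train_idx, test_idx = splits[0]
small_train_idx = shrink_training(train_idx, 10)
```

A reported result would then be the mean of a classifier's accuracy over the 100 entries of splits, with each accuracy measured on that split's full test set.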
One of the 5612 CUReT images considered has a corrupted file: image 60-101 (sample 60, view 02-62). The effects of this corrupted image are not discussed further, beyond noting that parametric classifiers, such as QDA, are more sensitive to outliers than non-parametric classifiers, such as nearest neighbor.