2.2 Image Description
2.2.2 Image Features
In order to create an image description, one has to extract features from the image. As discussed previously, features can be global, describing a characteristic of the entire image, or they can be local, describing a characteristic of a segmented or salient region.
There are also pseudo-global descriptors that describe the whole of an image but are built from the specific arrangement of regions and their descriptors within the image. Features can also be classified as general or domain-specific. General features include things such as colour and texture, whilst domain-specific features may describe such things as faces or fingerprints. From a retrieval standpoint, it is often better to combine multiple features to generate a more robust image description. Some common image features used in content-based image retrieval are described below.
2.2.2.1 Colour Features
Colour is perhaps the most widely used of all visual features in image retrieval. Most colour feature representations are relatively robust to changes in image size and orientation. Colour is most often indexed in the RGB or HSV colour spaces; however, other perceptual colour spaces have also been suggested. Finlayson et al. (1998) discuss colour-normalisation techniques for indexing.
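As a minimal illustration of indexing colour in HSV, the sketch below maps an 8-bit RGB pixel to a quantised HSV bin using Python's standard `colorsys` module and builds a normalised histogram from those bins. The function names `hsv_bin` and `hsv_histogram`, and the bin counts (8 hue, 2 saturation and 2 value levels), are illustrative assumptions rather than values taken from the cited literature.

```python
import colorsys

def hsv_bin(r, g, b, h_bins=8, s_bins=2, v_bins=2):
    """Map an 8-bit RGB pixel to a quantised HSV bin index.

    The bin counts are illustrative assumptions, not published values.
    """
    # colorsys works on floats in [0, 1]; hue is returned in [0, 1)
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Clamp so that s == 1.0 or v == 1.0 falls into the top bin
    qh = min(int(h * h_bins), h_bins - 1)
    qs = min(int(s * s_bins), s_bins - 1)
    qv = min(int(v * v_bins), v_bins - 1)
    return (qh * s_bins + qs) * v_bins + qv

def hsv_histogram(pixels, h_bins=8, s_bins=2, v_bins=2):
    """Normalised histogram over the h_bins * s_bins * v_bins HSV bins."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        hist[hsv_bin(r, g, b, h_bins, s_bins, v_bins)] += 1
    return [count / len(pixels) for count in hist]

print(hsv_bin(255, 0, 0), hsv_bin(100, 0, 0), hsv_bin(0, 255, 0))  # prints: 3 2 11
```

Because hue is separated from brightness, pure red and a darker red share the same hue bin and differ only in their value bin (indices 3 and 2 here).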
By far the most common colour descriptor (used both globally and locally) is the colour histogram, first proposed for use in retrieval by Swain and Ballard (1991). Stricker and Orengo (1995) noted that most colour histograms are sparse and sensitive to noise, and suggested using the cumulative colour histogram instead, which they showed to be insensitive to the choice of quantisation parameter. Stricker and Orengo also proposed a second technique in which only the dominant features of the colour distribution are indexed, in the form of colour moments: the first three moments (mean, variance and skewness) of the colour distribution. Sebe et al. (2003) used local colour moment descriptors together with salient points for retrieval.
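The sketch below illustrates the three descriptors discussed above on a list of 8-bit RGB pixels: a joint colour histogram compared by histogram intersection (the matching measure of Swain and Ballard), a cumulative histogram, and per-channel colour moments. The helper names and default bin counts are hypothetical. The cumulative histogram is computed per channel as a one-dimensional simplification, and the second and third moments are reported as a standard deviation and a signed cube root so that all three share the units of the pixel values; this is one common convention, assumed here.

```python
import math

def quantise(c, levels):
    """Map an 8-bit channel value (0..255) to a bin in range(levels)."""
    return c * levels // 256

def colour_histogram(pixels, levels=4):
    """Normalised joint RGB histogram with levels**3 bins."""
    hist = [0.0] * levels ** 3
    for r, g, b in pixels:
        idx = (quantise(r, levels) * levels + quantise(g, levels)) * levels + quantise(b, levels)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def histogram_intersection(h1, h2):
    """Match score for normalised histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def cumulative_histogram(values, levels=16):
    """Cumulative histogram of a single channel (1-D simplification)."""
    counts = [0] * levels
    for v in values:
        counts[quantise(v, levels)] += 1
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running / len(values))
    return cumulative

def colour_moments(values):
    """Mean, standard deviation and signed cube-root skewness of one channel."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return mean, math.sqrt(var), skew

def colour_moment_vector(pixels):
    """Concatenate the moments of R, G and B into a 9-D descriptor."""
    return [m for channel in zip(*pixels) for m in colour_moments(channel)]
```

The moment vector has only nine components regardless of image size, which illustrates why colour moments are far more compact than a histogram.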
Smith and Chang (1995) proposed the Colour Set feature, formed by selecting a set of colours from a quantised colour space. Because Colour Sets are binary, they allow a binary search tree to be constructed for fast search (Smith and Chang, 1996).
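A Colour Set can be sketched as a bit vector over a quantised colour space. In this hypothetical version a colour is included when it covers at least `min_fraction` of the pixels (an assumed threshold), the set is stored as a Python integer bitmask, and two sets are compared with a Hamming distance. Smith and Chang's own colour space, selection procedure, matching measure and search tree are not reproduced here.

```python
def colour_set(pixels, levels=4, min_fraction=0.05):
    """Binary colour set stored as an int bitmask over levels**3 RGB colours.

    Bit m is set when quantised colour m covers at least `min_fraction`
    of the pixels; the threshold is a hypothetical choice.
    """
    def q(c):
        return c * levels // 256  # 8-bit value -> 0..levels-1

    counts = {}
    for r, g, b in pixels:
        m = (q(r) * levels + q(g)) * levels + q(b)
        counts[m] = counts.get(m, 0) + 1
    mask = 0
    for m, n in counts.items():
        if n / len(pixels) >= min_fraction:
            mask |= 1 << m
    return mask

def colour_set_distance(a, b):
    """Hamming distance: number of colours present in exactly one set."""
    return bin(a ^ b).count("1")
```

Storing the set as a bitmask reduces membership tests, unions and intersections to bitwise operations, which hints at why binary colour sets are cheap to index and search.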
Pass et al. (1996) take a two-stage approach to indexing in which the image is first segmented by reducing the number of colours. For each colour, the number of pixels belonging to large segmented regions is then stored in a coherent vector, and the number belonging to small regions in an incoherent vector. Their results showed that this approach outperformed the simple colour histogram.
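The coherent/incoherent split of Pass et al. can be sketched as follows, assuming the image is a 2-D list of 8-bit RGB tuples. Colours are quantised, 8-connected regions of equal colour are found by flood fill, and a region's pixels are counted as coherent when the region has more than `tau` pixels. The quantisation level, the value of `tau` and the omission of any pre-smoothing are assumptions made for brevity, and the absolute-difference distance is included as one plausible way to compare two vectors.

```python
from collections import deque

def colour_coherence_vector(image, levels=2, tau=3):
    """Coherence vector for a 2-D list of 8-bit (r, g, b) tuples.

    Returns {colour_index: (coherent, incoherent)} pixel counts. A pixel
    is coherent if its 8-connected region of the same quantised colour
    has more than `tau` pixels (levels and tau are illustrative).
    """
    h, w = len(image), len(image[0])

    def q(c):
        return c * levels // 256

    colours = [[(q(r) * levels + q(g)) * levels + q(b) for r, g, b in row]
               for row in image]
    seen = [[False] * w for _ in range(h)]
    ccv = {}
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            colour = colours[y][x]
            # Breadth-first flood fill over same-coloured 8-neighbours
            seen[y][x] = True
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and colours[ny][nx] == colour):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            coherent, incoherent = ccv.get(colour, (0, 0))
            if size > tau:
                coherent += size
            else:
                incoherent += size
            ccv[colour] = (coherent, incoherent)
    return ccv

def ccv_distance(a, b):
    """Sum of absolute differences of coherent and incoherent counts."""
    total = 0
    for colour in set(a) | set(b):
        ca, ia = a.get(colour, (0, 0))
        cb, ib = b.get(colour, (0, 0))
        total += abs(ca - cb) + abs(ia - ib)
    return total
```

Two images with identical colour histograms, for example one large blob of a colour versus the same number of scattered pixels, can produce very different coherence vectors; that spatial information is what the method adds over a plain histogram.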
2.2.2.2 Texture
Texture in an image refers to homogeneous visual patterns within the image that are not due to a single colour or intensity. Haralick et al. (1973) were perhaps the first to suggest the use of texture as a feature, with a co-occurrence representation capturing the spatial relationships between the grey levels of neighbouring pixels. Tamura et al. (1978) investigated computational approximations of texture properties found to be important in psychological studies. These Tamura textures are attractive for image retrieval because they are visually meaningful. The Tamura textures were exploited in both the MARS (Huang et al., 1996) and QBIC (Niblack et al., 1993) retrieval systems. Howarth and Rüger (2004) carried out a detailed evaluation of the use of textures in a query-by-example image retrieval task.
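To make the co-occurrence idea concrete, the sketch below builds a normalised grey-level co-occurrence matrix for a single pixel offset and derives two statistics from it: contrast and the angular second moment (often called energy). It also computes Tamura's contrast measure, σ / (μ4 / σ⁴)^(1/4). The function names are hypothetical, and the matrix counts only the ordered pair (pixel, pixel + offset), whereas Haralick's formulation counts pairs in both directions and is usually evaluated at several distances and angles.

```python
import math

def glcm(grey, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix P[i][j] for the offset (dx, dy).

    grey is a 2-D list of integer grey levels in range(levels). Only the
    ordered pair (pixel, pixel + offset) is counted (not symmetrised).
    """
    h, w = len(grey), len(grey[0])
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[grey[y][x]][grey[ny][nx]] += 1
                pairs += 1
    return [[c / pairs for c in row] for row in counts]

def glcm_contrast(p):
    """Sum of (i - j)^2 P[i][j]: large when neighbouring levels differ."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

def glcm_energy(p):
    """Angular second moment, sum of P[i][j]^2: large for uniform texture."""
    return sum(v * v for row in p for v in row)

def tamura_contrast(grey):
    """Tamura contrast sigma / kurtosis**0.25, with kurtosis = mu4 / sigma**4."""
    values = [v for row in grey for v in row]
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # a flat patch has no contrast
    mu4 = sum((v - mean) ** 4 for v in values) / n
    return math.sqrt(var) / (mu4 / var ** 2) ** 0.25
```

For vertical stripes, a horizontal offset gives a high co-occurrence contrast while a vertical offset gives zero, which shows how the choice of offset makes the representation sensitive to texture direction.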
Textures have also been represented using the wavelet transform (e.g. Smith and Chang, 1994; Laine and Fan, 1993). In particular, Ma and Manjunath (1995) showed that the Gabor wavelet transform performed well in a texture annotation task.
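A Gabor texture descriptor can be sketched as a bank of filters at several frequencies and orientations, with the mean and standard deviation of each filter's response magnitude forming the feature vector; this is a commonly used formulation, assumed here rather than taken from the cited paper. The frequencies, number of orientations, Gaussian width and kernel size are illustrative assumptions. The kernel has its mean subtracted so that flat regions give no response, and responses are evaluated only where the kernel fits inside the image.

```python
import cmath
import math

def gabor_kernel(freq, theta, sigma, half):
    """Zero-mean complex Gabor kernel as {(dy, dx): weight}.

    A Gaussian envelope of width sigma multiplies a complex sinusoid
    with `freq` cycles per pixel along direction theta; the mean is
    subtracted so that a constant image gives zero response.
    """
    k = {}
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            along = dx * math.cos(theta) + dy * math.sin(theta)
            envelope = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            k[(dy, dx)] = envelope * cmath.exp(2j * math.pi * freq * along)
    mean = sum(k.values()) / len(k)
    return {pos: w - mean for pos, w in k.items()}

def gabor_features(grey, freqs=(0.1, 0.25), n_orient=4, sigma=2.0, half=4):
    """[mean, std] of |response| for each (frequency, orientation) filter.

    Responses are computed only where the whole kernel lies inside the
    image. All default parameters are illustrative assumptions.
    """
    h, w = len(grey), len(grey[0])
    features = []
    for freq in freqs:
        for o in range(n_orient):
            kernel = gabor_kernel(freq, math.pi * o / n_orient, sigma, half)
            mags = [abs(sum(wt * grey[y + dy][x + dx]
                            for (dy, dx), wt in kernel.items()))
                    for y in range(half, h - half)
                    for x in range(half, w - half)]
            mean = sum(mags) / len(mags)
            std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
            features += [mean, std]
    return features
```

For a pattern of vertical stripes with a period of four pixels, the filter tuned to 0.25 cycles per pixel across the stripes responds far more strongly than the same-frequency filter oriented along them, so the feature vector encodes both the scale and the direction of the texture.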
2.2.2.3 Shape
Shape is important in some retrieval scenarios, such as trademark retrieval (Eakins et al., 1998). Eakins (1993) discusses some design requirements for a shape retrieval system. Shape-based retrieval does, however, suffer from the drawback that it requires an initial segmentation to isolate the shapes in the image.
In general, shape descriptors can be separated into two categories: region-based and boundary-based. Perhaps the most successful region-based descriptors are the moment invariants introduced by Hu (1962). The characteristic boundary-based descriptor is the Fourier descriptor (Zahn and Roskies, 1972).
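The two families can be sketched as follows. The first function computes Hu's first two moment invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11², from the normalised central moments of a binary region. The second computes Fourier descriptors from the complex boundary coordinates x + iy, discarding the DC term and dividing by the magnitude of the first harmonic. This complex-coordinate formulation is a widely used variant and is not Zahn and Roskies' original formulation, which is based on the boundary's tangent angle; the function names and the number of coefficients are hypothetical.

```python
import cmath
import math

def hu_first_two(mask):
    """Hu's invariants phi1 and phi2 of a binary region (2-D list of 0/1)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pts)
    xbar = sum(x for x, _ in pts) / m00
    ybar = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment mu_pq, normalised by m00 ** (1 + (p + q) / 2)
        mu = sum((x - xbar) ** p * (y - ybar) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

def fourier_descriptor(boundary, n_coeffs=4):
    """Magnitudes |Z_u| / |Z_1| for u = 1..n_coeffs of a closed boundary.

    boundary: ordered (x, y) points. Z_0 (the centroid term) is never
    used, which removes translation; taking magnitudes removes rotation
    and the choice of start point; dividing by |Z_1| removes scale.
    """
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    coeffs = [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
              for u in range(1, n_coeffs + 1)]
    return [abs(c) / abs(coeffs[0]) for c in coeffs]
```

A translated or transposed rectangle yields the same pair of Hu invariants, and a square boundary that has been scaled, shifted, rotated by 90 degrees or traced from a different starting point yields the same Fourier descriptor.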
2.2.2.4 Robust Local Descriptors - SIFT
A large number of different feature descriptors have been suggested for describing the local image content within a salient region, for example colour moments and Gabor texture descriptors (Sebe et al., 2003; Stricker and Orengo, 1995; Ma and Manjunath, 1995). However, many of these descriptors are not robust to poor imaging conditions. A study by Mikolajczyk and Schmid (2003) showed that the Scale Invariant Feature Transform (SIFT) descriptor, designed by Lowe (2004), was superior to other descriptors found in the literature, such as the responses of steerable filters or orthogonal filters. The SIFT descriptor performs well partly because it was designed to be insensitive to small shifts in the position of the sampling region, as might occur in the presence of imaging noise.
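The SIFT descriptor is a three-dimensional histogram of gradient location and orientation. Lowe suggests that gradient location be quantised into a 4×4 location grid and gradient angle into 8 orientation bins, giving a descriptor with 4 × 4 × 8 = 128 dimensions. Invariance to changes in illumination is obtained by normalising the descriptor to unit length, that is, dividing each component by the square root of the sum of the squared components: a uniform brightness offset does not alter the gradients, and a change in contrast scales all gradients by the same factor, which the normalisation cancels.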
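The sketch below builds a SIFT-style 4 × 4 × 8 histogram from an assumed 16 × 16 patch of grey values and normalises it to unit length. It is deliberately simplified: Lowe's descriptor additionally applies Gaussian weighting, interpolates each sample into neighbouring bins, expresses orientations relative to the keypoint's dominant orientation and clips large components before renormalising, none of which is reproduced here. The function name is hypothetical.

```python
import math

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 grey-level patch (list of lists).

    Simplified: no Gaussian weighting, no interpolation between bins,
    no rotation to a dominant orientation and no clipping step.
    """
    size, cells, n_orient = 16, 4, 8
    cell_size = size // cells  # 4x4 pixels per spatial cell
    hist = [0.0] * (cells * cells * n_orient)
    for y in range(size):
        for x in range(size):
            # Central-difference gradient, clamped at the patch border
            gx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * n_orient) % n_orient
            cell = (y // cell_size) * cells + (x // cell_size)
            hist[cell * n_orient + o] += magnitude
    # Normalise to unit Euclidean length
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

In this sketch, `[[3 * v + 10 for v in row] for row in patch]` produces the same descriptor as `patch`, because the offset of 10 leaves the gradients unchanged and the factor of 3 is removed by the unit-length normalisation.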