Feature Quantisation - Automatic image annotation and object detection

Feature descriptors generated by the first two steps of image description, i.e. region choosing and feature extraction, can be processed directly by some applications. A very simple example is CBIR using global image features such as colour histograms, which are represented as vectors. The similarity of two images is measured by the similarity of the corresponding vectors, which can be calculated in a number of ways, such as cosine distance or Euclidean distance. Given a query image, all the images in the database are ranked according to their distances to the query.
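
The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`euclidean_distance`, `cosine_distance`, `rank_images`) and the toy 3-bin histograms are hypothetical and do not come from any particular CBIR system.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_images(query, database, distance=euclidean_distance):
    """Return database indices ordered from most to least similar to the query."""
    distances = [distance(query, vec) for vec in database]
    return [int(i) for i in np.argsort(distances, kind="stable")]

# Toy 3-bin colour histograms (made-up values, one row per database image).
database = np.array([[0.8, 0.1, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.5, 0.3, 0.2]])
query = np.array([0.7, 0.2, 0.1])

ranking = rank_images(query, database)
print(ranking)
```

Passing `distance=cosine_distance` to `rank_images` switches the measure without changing the ranking logic.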

Chapter 3 Image Description 33

However, for some other applications the third step, feature quantisation, needs to be applied. One example is applications where saliency is used for image description. The number of salient points found in images can be very large. For example, the number of salient points found in the Washington set images (University of Washington, 2004), which have an average resolution of 640 × 480, is in general several thousand per image when the “difference-of-Gaussian pyramid” approach is used. It is not convenient for image retrieval or auto-annotation algorithms to process so many salient points per image directly, especially when each point is represented by a high dimensional feature descriptor. Feature quantisation is the process of grouping similar image feature descriptors into the same class and dissimilar ones into different classes. As a result, each descriptor can be represented by its class membership, a single number, instead of its actual high dimensional values. Feature quantisation can also be regarded as a classification problem in which the membership of each feature is to be determined. In the following, two clustering techniques are discussed, namely k-Means Clustering and the Self-Organizing Map (SOM).
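
As a minimal sketch of what replacing a descriptor by its membership means, the snippet below assigns each descriptor to the nearest of a set of class representatives. The `quantise` function, the `codebook` array and the toy 2-D values are assumptions made for illustration; real descriptors would have many more dimensions, and the class representatives would come from a clustering step such as those discussed below.

```python
import numpy as np

def quantise(descriptors, codebook):
    """Map each descriptor (one per row) to the index of its nearest codebook vector."""
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_classes).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # The membership of a descriptor is the index of the closest class.
    return np.argmin(sq_dists, axis=1)

# Hypothetical class representatives and descriptors (2-D for readability).
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 0.5], [9.0, 11.0], [0.5, 8.0], [-1.0, 1.0]])

labels = quantise(descriptors, codebook)
print(labels)
```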

3.3.1 The Self-Organizing Map (SOM)

The Self-Organizing Map (SOM) is a neural network-based data visualization tool invented by Professor Teuvo Kohonen. “It converts complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. The SOM usually consists of a two-dimensional regular grid of nodes.” [Kohonen]. Similar data items are placed closer together on the map than dissimilar ones. Reducing dimensions and displaying similarities are the two valuable characteristics of this technique.

3.3.1.1 The SOM Toolbox

The SOM toolbox¹ is a function package developed for Matlab to implement the Self-Organizing Map algorithm. A SOM consists of neurons organized on a regular low-dimensional grid [Vesanto et al. (2000)]. Each neuron holds a weight vector which has the same dimensions as the samples of the data set to be visualized (i.e. the input data set). The final SOM for visualizing the high dimensional data set is obtained by iterative training (possibly several hundred times). In each training step, an input sample is picked from the input data set and the Best-Matching Unit (BMU), i.e. the neuron whose weight vector is closest to that sample, is found. Not only the BMU but also its neighbors are then updated: the region around the BMU is stretched towards the training sample, as illustrated in Figure 3.6 [Vesanto et al. (2000)]. In the end, neurons on the map become ordered: neighboring ones have similar weight vectors.
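
The training step described above can be sketched as follows. This is not the SOM Toolbox code (which is written for Matlab); it is a small Python illustration that assumes a rectangular grid, a Gaussian neighbourhood function and linearly decaying learning rate and radius. The names `som_train_step` and `train_som` and all parameter values are hypothetical.

```python
import numpy as np

def som_train_step(weights, sample, learning_rate, radius):
    """One SOM update, in place. weights is a float array of shape (rows, cols, dim)."""
    rows, cols, _ = weights.shape
    # Best-Matching Unit: the neuron whose weight vector is closest to the sample.
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Squared grid distance of every neuron to the BMU.
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_sq = (rr - bmu[0]) ** 2 + (cc - bmu[1]) ** 2
    # Gaussian neighbourhood (an assumption): 1 at the BMU, decaying with grid distance.
    h = np.exp(-grid_sq / (2.0 * radius ** 2))
    # Pull the BMU and its neighbours towards the sample.
    weights += learning_rate * h[:, :, None] * (sample - weights)
    return bmu

def train_som(data, rows, cols, n_steps=500, seed=0):
    """Train a SOM by repeating the update with shrinking learning rate and radius."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    for t in range(n_steps):
        frac = 1.0 - t / n_steps
        sample = data[rng.integers(len(data))]
        som_train_step(weights, sample, 0.5 * frac, max(rows, cols) / 2.0 * frac + 0.5)
    return weights
```

After training, a sample can be assigned to a cell of the map by finding its BMU in the same way as in the first lines of `som_train_step`.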

Figure 3.6: Updating the best matching unit (BMU) and its neighbors towards the input sample marked with x. The solid and dashed lines correspond to the situation before and after updating, respectively [Vesanto et al. (2000)]

3.3.1.2 Shape Clustering Using CSS and SOM

The Curvature Scale-Space (CSS) (3.2.2.2) descriptors of 1100 marine creature shape images², which are used in the SQUID system³, are extracted. These 1100 descriptors are then clustered by the SOM, using an 11 × 15 map, into 155 clusters. Figure 3.7 shows three random cells (clusters) from the SOM, each of which is represented by 6 sample shapes from it. It shows that the shapes are well clustered. In addition, one shape from each cell of the SOM is taken to give a visual overview of the whole SOM, as shown in Figure 3.8. Thanks to the clustering characteristics of the SOM, as described in section 3.3.1, shapes which are similar but not similar enough to be clustered within the same cell are placed in neighbouring cells.

Figure 3.7: Three random cells from the SOM of 1100 marine creature shapes

² Available at ftp://ftp.ee.surrey.ac.uk/pub/vision/misc/fish contours.tar.Z


3.3.2 k-Means Clustering

k-Means is an algorithm that clusters objects, or data points, into k partitions, or clusters. The clusters are discovered through a refinement process that updates the positions of the cluster centroids iteratively. During each iteration, every training point is assigned to the closest cluster based on its distance to the cluster centroid. Then, the centroid of each cluster is replaced by the centroid of all the points that now belong to it. The process is repeated until the points no longer switch clusters, or until a pre-defined number of iterations has been reached. The choice of k and of the starting cluster centroids is essential to the performance of k-Means. A common choice for the initial centroids is to pick k sample points at random.
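
The procedure above can be sketched as follows. This is a minimal illustration, not an optimised implementation: the function name `k_means`, the default `max_iter` and the handling of empty clusters (the old centroid is kept) are assumptions made here, and Euclidean distance is assumed as the distance measure.

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Cluster the rows of points into k clusters; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct sample points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign every point to the closest centroid (squared Euclidean distance).
        sq_dists = np.sum((points[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(sq_dists, axis=1)
        # Stop once no point switches cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data: two well separated groups of 2-D points (made-up values).
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = k_means(points, k=2)
```

Because the initial centroids are random, different seeds can give different cluster numbering and, on less clearly separated data, different clusterings.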

k-Means is one of the most popular techniques for multi-dimensional vector quantisation in image description. For example, Duygulu et al. (2002) and Jeon et al. (2003) use it to quantise the global feature descriptors of image segments and then represent each segment by the cluster membership of its descriptor. Hare and Lewis (2004) use it to quantise SIFT descriptors of salient regions found in images and then represent each image as a histogram of the memberships of its descriptors.
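
The histogram representation mentioned above can be illustrated with a short sketch. It assumes that the descriptors of one image have already been quantised into integer cluster memberships in the range 0 to k-1; the function name `membership_histogram`, the optional normalisation and the toy labels are hypothetical and are not taken from the cited work.

```python
import numpy as np

def membership_histogram(labels, k, normalise=True):
    """Count how many descriptors of one image fall into each of the k clusters."""
    counts = np.bincount(labels, minlength=k).astype(float)
    # Optional normalisation (an assumption here) so that images with
    # different numbers of salient regions remain comparable.
    if normalise and counts.sum() > 0:
        counts /= counts.sum()
    return counts

# Hypothetical memberships of six descriptors from one image, with k = 4.
labels = np.array([0, 2, 2, 1, 2, 0])
hist = membership_histogram(labels, k=4)
print(hist)
```

The resulting vector has a fixed length k regardless of how many salient regions the image contains, which is what makes it convenient for retrieval and annotation algorithms.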
