Summary and Conclusions - Saliency for Image Description and Retrieval


7.1 Summary and Conclusions

Image retrieval is a wide and varied field encompassing many techniques inspired by other disciplines. This diversity is reflected within each chapter of this thesis. Chapters 3 to 6 describe an array of tools and techniques that can be used for content-based retrieval.

The foundation of content-based image retrieval is the set of computer vision techniques that produce the low-level feature descriptions used to describe and compare image content. Chapter 3 discussed some of the issues related to generating consistent image descriptions in the presence of noise and other image transformations. The techniques described in the chapter used the concept of saliency in order to generate robust descriptions. Chapter 3 described two evaluations of saliency detectors in which the difference-of-Gaussian detector described by Lowe (2004) was compared to a range of other detectors. All of the detectors tested had strengths and weaknesses in different areas; however, the difference-of-Gaussian detector performed well under most of the distortions it was subjected to. On the basis of these results, the difference-of-Gaussian detector was adopted for creating the image descriptions used in the experiments elsewhere in this thesis. However, as discussed in Chapter 4 and by Mikolajczyk et al. (2005), better image descriptions would likely be created by combining the results from multiple detectors. The final section of Chapter 3 discussed a simple scheme for describing the pixel content of a salient region by its dominant colours. This was achieved by clustering the pixels in RGB space using the mean-shift algorithm. The colour descriptors were used with some success in later chapters, although it was found that whether colour information actually helped retrieval was highly dependent on the data-set.
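The dominant-colour scheme mentioned above can be illustrated with a minimal sketch. It assumes a flat (uniform) kernel, a hand-picked bandwidth and a simple merging rule for converged modes; the function name `dominant_colours` and all parameter values are hypothetical and are not taken from Chapter 3.

```python
import numpy as np

def dominant_colours(pixels, bandwidth=30.0, max_iter=50, tol=1e-3):
    """Flat-kernel mean shift over RGB pixels (illustrative, O(N^2) memory).

    Returns the cluster centres and the fraction of pixels attracted to each,
    sorted from most to least dominant.
    """
    pts = np.asarray(pixels, dtype=float)
    modes = pts.copy()  # one mode estimate per pixel, started at the pixel
    for _ in range(max_iter):
        # Distance from every current mode estimate to every pixel.
        d = np.linalg.norm(modes[:, None, :] - pts[None, :, :], axis=2)
        within = d <= bandwidth
        # Move each estimate to the mean of the pixels inside its window.
        new_modes = (within[:, :, None] * pts[None, :, :]).sum(axis=1)
        new_modes /= within.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    # Merge estimates that converged to (nearly) the same point.
    centres, counts = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if np.linalg.norm(m - c) < bandwidth / 2:
                counts[i] += 1
                break
        else:
            centres.append(m)
            counts.append(1)
    order = np.argsort(counts)[::-1]
    return np.array(centres)[order], np.array(counts)[order] / len(pts)
```

Each returned centre is a candidate dominant colour, and its weight is the share of the region's pixels that converged to it. The bandwidth controls how many distinct colours emerge and would need tuning for real salient regions.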

Techniques for exploring the query-by-example retrieval paradigm are discussed in Chapter 4. The first section of the chapter develops a technique for measuring the content-based retrieval performance of annotated image-sets. The technique attempts to estimate the relevance of retrieved images based on the idea that retrieval algorithms should retrieve semantically similar images, that is, images with similar annotations. The section also verifies the result of Sebe et al. (2003) that image description using salient regions can produce better retrieval than global descriptors.
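The idea of estimating relevance from annotation similarity can be sketched as follows. This summary does not restate the measure developed in Chapter 4, so the Jaccard overlap used here is only an assumed stand-in, and the function names are hypothetical.

```python
def annotation_overlap(a, b):
    """Jaccard overlap between two keyword sets (assumed relevance proxy)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def mean_relevance_at_k(query_keywords, ranked_keywords, k):
    """Average annotation overlap between the query and its top-k results."""
    top = ranked_keywords[:k]
    if not top:
        return 0.0
    return sum(annotation_overlap(query_keywords, kw) for kw in top) / len(top)
```

Under this assumption, a retrieval algorithm scores highly when the images it ranks first carry annotations similar to those of the query image, without requiring manual relevance judgements.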

The second half of Chapter 4 discusses and develops the idea of using text retrieval techniques in combination with salient regions and their descriptors. The technique consisted of quantising the descriptors of each salient region into a ‘visual’ term and then representing each image by a vector of term occurrences. These term occurrence vectors were then used within a vector-space and Latent Semantic Indexing framework. The results from experiments using these techniques showed generally good performance, although they did highlight a few problems. On the whole, the LSI technique produced better maximum precision (at low recall) than the vector-space model, but performed worse overall. The need to combine different salient region detectors was illustrated in the case of the low-resolution Corel data-set, which, in contrast to the Washington data-set, was poorly represented when using difference-of-Gaussian salient regions.

Chapter 5 described an application of the retrieval techniques described in the latter parts of Chapter 4. The query-by-example paradigm was extended to work on a mobile device in such a manner that the query image was captured by a camera incorporated into the device. Retrieval performance was demonstrated using images from the National Gallery. In order to ensure a correct match, a re-ranking algorithm was developed that ensured geometric consistency of matching salient regions within the constraints of a planar homography.
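The ‘visual’ term representation from Chapter 4 summarised above can be sketched in a few lines. The sketch assumes that a codebook of representative descriptors already exists (for example, from clustering training descriptors), uses nearest-neighbour assignment for quantisation, and uses a truncated SVD for the LSI step. All function names are hypothetical and the details may differ from those in Chapter 4.

```python
import numpy as np

def quantise(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codebook vector."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def term_vector(descriptors, codebook):
    """Vector of 'visual' term occurrences for one image."""
    terms = quantise(descriptors, codebook)
    return np.bincount(terms, minlength=len(codebook)).astype(float)

def cosine(a, b):
    """Cosine similarity, as used for ranking in the vector-space model."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsi_project(term_doc, k):
    """Rank-k LSI of a terms x images matrix.

    Returns the term basis U_k and one row of coordinates per image.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], (np.diag(s[:k]) @ Vt[:k]).T

def fold_in(query_vec, U_k):
    """Project a query term vector into the same rank-k space."""
    return U_k.T @ query_vec
```

In the vector-space model, a query image is ranked against the collection by `cosine` on the raw term vectors; in the LSI variant, the same comparison is made between `fold_in(query, U_k)` and the rows returned by `lsi_project`.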

Finally, Chapter 6 discussed two approaches that attempt to bridge the semantic gap. The first approach proposed simply propagating annotations from similar images. This approach works well if the images are well represented by the low-level features that are used to describe and compare their similarity, such as when using SIFT ‘visual’ terms with the Washington data-set. However, a common problem of all hard auto-annotators such as this one was brought to light: images are often mislabelled with keywords that have a similar visual appearance to the true keywords, such as mislabelling images of ‘horses’ with ‘foal’. In fact, this problem occurs not only with automatic annotators, but also with annotations created manually by humans. This mislabelling can create problems for image retrieval. The problem can be assuaged somewhat by methods involving clustering of keywords (Duygulu et al., 2002) or by use of thesauri.

Alternative approaches exist that avoid this mislabelling problem. In the past, probabilistic annotations have been used for ranked retrieval and shown to outperform retrieval using hard annotations (Jeon et al., 2003). The second half of Chapter 6 of this thesis suggests another alternative in which an elegant linear-algebraic manipulation of a matrix of keyword and image-feature observations is shown to produce a semantic space. The semantic space that this factorisation technique creates represents the underlying structure and links between the keywords and visual features. Un-annotated images can be projected into this semantic space and then searched by keyword. Initial experiments using this approach have shown promise; even when using only simple global features, the technique outperforms the machine translation approach described by Duygulu et al. (2002) for a number of search terms.
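One way to picture the semantic-space idea is the sketch below. It is not the formulation of Chapter 6: it assumes that keyword and feature observations of annotated images are stacked into a single matrix, uses a truncated SVD as the factorisation, and projects un-annotated images by least squares on their features alone. The function names and the choice of rank are hypothetical.

```python
import numpy as np

def build_semantic_space(keyword_obs, feature_obs, k):
    """Factorise stacked [keywords; features] x images observations.

    Returns the rank-k basis split into its keyword rows and feature rows.
    """
    observations = np.vstack([keyword_obs, feature_obs])
    U, s, Vt = np.linalg.svd(observations, full_matrices=False)
    n_keywords = keyword_obs.shape[0]
    return U[:n_keywords, :k], U[n_keywords:, :k]

def project_image(features, feature_basis):
    """Coordinates of an un-annotated image, estimated from features only."""
    z, *_ = np.linalg.lstsq(feature_basis, features, rcond=None)
    return z

def keyword_scores(z, keyword_basis):
    """Score every keyword for an image at position z in the space."""
    return keyword_basis @ z
```

Searching by keyword then amounts to projecting every un-annotated image with `project_image` and ranking the images by the score of the requested keyword. The scores are soft rather than hard labels, which is what allows ranked retrieval.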

7.1.1 Novel work in this Thesis

A full list of contributions to the image retrieval community made by this thesis was outlined in the introduction. Not all of those contributions represent novel aspects of the research, and so the contributions with novel value associated with them are reaffirmed here.

• Development of a technique for assessing the content-based retrieval performance of a query-by-example style algorithm when using annotated image-sets.

• The extension of the query-by-example paradigm to a mobile device.

• Development of a novel retrieval strategy using quantised local descriptors of salient regions within a vector-space framework.

• Demonstration of a simple technique for auto-annotation by propagating semantics.

• Development of a linear-algebraic technique for building a searchable semantic space with un-annotated images in an attempt to bridge the semantic gap.
