Conclusions - Saliency for Image Description and Retrieval

4.5 Conclusions

This section has presented a way to link methods from the information retrieval community with image description through salient regions to form powerful image retrieval techniques. We have shown how local descriptors from salient regions can be quantised into ‘visual’ terms and these terms used as a basis for indexing through the vector-space and Latent Semantic Indexing retrieval models.
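As a rough illustration of the pipeline summarised above, the following minimal sketch quantises local descriptors into ‘visual’ terms by nearest-centroid assignment and ranks images with a tf-idf weighted vector-space model. It is only a sketch under stated assumptions: the codebook is taken as given, the tf-idf weighting and cosine similarity are common choices rather than necessarily the exact scheme used in our experiments, and every function name is hypothetical.

```python
import math
from collections import Counter


def nearest_term(descriptor, codebook):
    """Return the index of the codebook centroid closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )


def to_terms(descriptors, codebook):
    """Quantise descriptors into 'visual' terms and count their occurrences."""
    return Counter(nearest_term(d, codebook) for d in descriptors)


def tfidf(counts, doc_freq, n_docs):
    """Weight term counts by inverse document frequency (assumed weighting)."""
    return {
        t: tf * math.log(n_docs / doc_freq[t])
        for t, tf in counts.items()
        if doc_freq.get(t)  # ignore terms never seen in the database
    }


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_descriptors, database, codebook):
    """Rank database images (name -> descriptors) against a query image."""
    counts = {name: to_terms(descs, codebook) for name, descs in database.items()}
    doc_freq = Counter(t for c in counts.values() for t in c)
    n_docs = len(counts)
    vectors = {name: tfidf(c, doc_freq, n_docs) for name, c in counts.items()}
    query = tfidf(to_terms(query_descriptors, codebook), doc_freq, n_docs)
    scored = ((cosine(query, vec), name) for name, vec in vectors.items())
    return sorted(scored, reverse=True)


# Toy data: 2-D "descriptors" stand in for real 128-D SIFT vectors.
codebook = [(0, 0), (10, 10), (0, 10)]
database = {
    "a": [(0.1, 0.2), (0.3, -0.1), (0.2, 9.8)],
    "b": [(9.9, 10.1), (10.2, 9.7), (0.1, 10.1)],
    "c": [(9.8, 10.0), (0.0, 0.1)],
}
ranked = rank([(0.0, 0.3), (0.2, 0.1), (0.1, 9.9)], database, codebook)
```

In this toy example the query shares its term counts with image "a", so "a" is ranked first. A real system would learn the codebook by clustering descriptors from a training set and would use an inverted index rather than comparing against every image.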

Evaluation of the two techniques on the Washington data-set has shown that, with well-chosen parameters, the LSI technique performs slightly better than the vector-space technique at low values of recall, but worse as recall increases. Both techniques vastly outperform retrieval by global grayscale histogram matching. Experiments with the thumbnail images from the Corel data-set showed less promising results, but subsequent investigation has shown this to be due to the lack of high-frequency information within the images from which to select salient regions. This lack of salient regions causes the ‘visual’ term-occurrence vector-space to be poorly defined.

4.6 Summary

This chapter has described how image retrieval can be performed using image representations from local descriptors of salient regions, as described in the previous chapter. The main contribution of the chapter has been to investigate how techniques from the text retrieval community can be exploited for use with these image descriptions. The chapter has also introduced a technique for assessing the content-based retrieval performance of annotated image collections in query-by-image-content type tasks. The chapter concluded with a discussion about the relative performance of the two text retrieval approaches investigated.

Query By Mobile Device

“...when you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

Sir Arthur Conan Doyle, Sherlock Holmes

This chapter aims to demonstrate the robustness of the vector-space retrieval approach discussed in the previous chapter. Image descriptions from SIFT features of difference-of-Gaussian salient regions applied to the vector-space model are shown to outperform other retrieval approaches in the retrieval scenario described here.

The chapter introduces a new paradigm for content-based image retrieval, in which a mobile device is used to capture the query image and display the results. The system consists of a client-server architecture in which query images are captured on a mobile device and then transferred to a server for further processing. The server then returns the results of the query to the mobile device. There are a number of possible user scenarios for such a device. These scenarios generally fall into two categories, depending on what kind of query result the system would be expected to provide. The first category is very much like previous research on the “physical hyper-link” carried out at HP labs (Barton and Kindberg, 2001), where a user can ‘click’ on real world objects as if they were hyper-links, using a mobile device as the interface. In this case, the objective of the system is to find an exact representation of the query image in the database and to return metadata corresponding to the object represented in the query image. For example, consider using the device in a museum or art gallery. The device could be pointed at various exhibits or paintings and would return metadata about that particular object. Another possible example would be in a bookshop. In this case the device could be pointed at a book cover, and the returned metadata could be, for example, reviews of that particular book.

The second category is much more like classical content-based image retrieval. In this case, the objective is not necessarily to find an exact match, but rather to find a ranked set of similar images, either visually similar (e.g. in terms of colour) or similar in terms of the semantics of the content.

This chapter examines the first category in detail, although the retrieval algorithms presented are equally applicable to the second category. The chapter is split into several sections. The first section discusses some of the problems and requirements of retrieval from a mobile device. The second section shows how the vector-space retrieval model from the previous chapter has been augmented to fulfil the requirements. The third section shows how the retrieval model has been implemented in a client-server architecture. The fourth section illustrates some results of our system in a mock museum scenario. Finally, the last section provides an executive summary of the chapter.

5.1 Requirements

The aim of the system described in this chapter was foremost to demonstrate the power of the retrieval approach described in Chapter 4. The scope of the system was limited to cover image retrieval of paintings from a mobile device within an art gallery. The idea was that the mobile device could be used to query a painting hanging on the wall, and that the device would show metadata about the artwork, perhaps in the form of a web-page. Figures 5.1 and 5.2 illustrate the idea with montages of screen-shots from the second of our demonstration implementations.

It was decided that the system should be able to work with current mobile hardware technology. State-of-the-art mobile devices, such as camera phones, have built-in cameras for image capture, and the ability to connect to the internet through systems such as GPRS. What most current mobile devices lack, however, is computational power; for example, most current devices are unable to natively perform floating-point maths. These constraints meant that the system had to be designed in a client-server fashion, with the mobile client handing off the majority of processing to the server.
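The division of labour between client and server can be sketched as a simple request/response exchange. The following is a minimal, transport-agnostic illustration and not our actual implementation: the JSON message format, the function names, the metadata store and the placeholder matching step are all assumptions made for the example, and the network transport (for instance HTTP over GPRS) is deliberately left out.

```python
import base64
import json

# Hypothetical metadata store: database image id -> metadata shown to the user.
METADATA = {
    "painting-001": {
        "title": "Example painting",
        "url": "http://example.org/painting-001",
    },
}


def find_best_match(image_bytes):
    """Placeholder for the server-side retrieval step.

    A real server would extract salient-region descriptors from the image
    and run the retrieval model; this stub returns a fixed id for any
    non-empty image so that the message flow can be exercised.
    """
    return "painting-001" if image_bytes else None


def client_build_request(image_bytes):
    """Client side: wrap the captured image in a request (no heavy maths)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps(payload).encode("utf-8")


def server_handle_request(request_bytes):
    """Server side: decode the image, match it, and return its metadata."""
    request = json.loads(request_bytes.decode("utf-8"))
    image_bytes = base64.b64decode(request["image"])
    match = find_best_match(image_bytes)
    response = {"match": match, "metadata": METADATA.get(match)}
    return json.dumps(response).encode("utf-8")


def client_parse_response(response_bytes):
    """Client side: decode the server reply for display to the user."""
    return json.loads(response_bytes.decode("utf-8"))


# Round trip with a stand-in for the captured image data.
reply = client_parse_response(
    server_handle_request(client_build_request(b"\x89fake-image-bytes"))
)
```

The point of the sketch is that the client only encodes, sends and displays; all feature extraction and matching stays on the server, which suits devices without native floating-point support.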

Constraining the system to work only in an art gallery scenario with paintings simplifies the retrieval somewhat. The fine-art paintings we dealt with were flat surfaces, which meant that the retrieval algorithm would only have to deal with planar homographic transformations between the query image and the images in the database (there are some other geometric imaging issues such as warping due to the camera lens, but these can be removed through calibration if necessary). The difference-of-Gaussian salient regions described in Chapter 3 were shown to be quite robust to this kind of transform, certainly within the range of viewpoints from which we envisaged the query images being captured.
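To make the notion of a planar homographic transformation concrete, the short sketch below maps a point on the painting to its position in the query image using a 3×3 homography matrix in homogeneous coordinates. The matrices and the function name are illustrative assumptions only; they are not taken from our system.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (row-major nested lists)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (xh / w, yh / w)


# A pure translation is a special case of a homography.
H_translate = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
moved = apply_homography(H_translate, (3, 4))

# A non-zero bottom row introduces perspective foreshortening.
H_perspective = [[1, 0, 0], [0, 1, 0], [0.001, 0, 1]]
foreshortened = apply_homography(H_perspective, (100, 50))
```

Because every view of a flat painting is related to every other view by a transformation of this form, eight parameters (the matrix is defined up to scale) are enough to describe the geometric change between a database image and a query image, ignoring lens distortion.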

Figure 5.1: Montage showing a screen-shot from the software demonstrator in capture mode and the artwork being captured. Images Copyright © 2005, National Gallery,

Figure 5.2: Montage showing various parts of the metadata shown to a user by the software demonstrator as they scroll through it. Images and Metadata Copyright ©
