7.4.5 Future Prospects

Information retrieval benchmarking is generally considered to be an ongoing and incremental process, and thus the documentation of ImageCLEFphoto 2006 would not be complete without reporting on its influence on its successor event, ImageCLEFphoto 2007. Based on the experience gained and the feedback received in 2006, and on discussions with participants after CLEF, the plans for 2007 include the following.

Document Collection

The IAPR TC-12 Benchmark will again form the basis for the VIR experiments, although only the realistic parts of the logical image representations will be released in 2007: the title, notes, location and date fields (i.e. the descriptions that typical users might add to their own photographs). In addition to the English and German representations, we also plan to generate a set of Spanish image representations, as well as one subset that uses a randomly selected language for each image. Evaluation of ad-hoc retrieval from lightly annotated images is expected to address several novel research questions, including:

• how significant is the choice of the retrieval language?

• how does retrieval performance compare to retrieval from fully annotated images in 2006?

Since visual retrieval techniques will become more important, we aim to attract more visually oriented methods in addition to the currently predominant concept-based approaches, in order to narrow the semantic gap from both sides, TBIR and CBIR.

Query Topics

According to the participants’ feedback from 2006, the query topics in 2007 will:

• again be based on the updated viventura log file (to create realistic topics);

• reuse some of the topics from 2006 (for a comparison with retrieval using the description field, and to investigate how much improvement can be gained one year on);

• be controlled by the topic difficulty measure;

• be created against a number of dimensions, such as the estimated size of the target set, geographic constraints, or how “visual” they appear.

Participants will receive only the topic titles and three sample images, but no narrative descriptions, to avoid confusion. The sample CBIR systems FIRE and GIFT will again be available, and we might even provide the output of visual baseline runs. Visual topics will be part of the standard ad-hoc set. Translations will again be provided, but only for the topic languages that were also used in 2006: English, German, Spanish, Italian, French, Portuguese, Russian, Polish, Japanese, Simplified Chinese and Traditional Chinese. Should participants wish to investigate any other language, they will have to provide their own translation.

Further novel ideas include that participants could choose their own sample images for QBE and that participants could be asked to submit a number of topic candidates themselves.

Relevance Assessments and Performance Measures

Both relevance assessments and performance measures will remain unchanged for 2007: the pooling method combined with ISJ will again be used, together with the same set of measures (MAP, P(20), GMAP and bpref). New ideas include ranking systems by their average rank across these measures and involving the participating groups further in the relevance assessment process.
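The idea of ranking systems by their average rank across the four measures can be illustrated with a short sketch. This is only one possible reading of the proposal, not the procedure used by the organisers: the run names and scores are invented for illustration, the function names `rank_by_measure` and `average_rank` are hypothetical, and the handling of ties (tied systems share the mean of the positions they occupy) is an assumption.

```python
from statistics import mean


def rank_by_measure(scores):
    """Rank systems on one measure: higher score is better, rank 1 is best.

    Tied systems share the mean of the positions they occupy
    (an assumption; the source does not specify tie handling).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the group of systems tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def average_rank(results, measures=("MAP", "P20", "GMAP", "bpref")):
    """Order systems by their mean rank over the given measures (best first)."""
    per_measure = [
        rank_by_measure({system: row[m] for system, row in results.items()})
        for m in measures
    ]
    averaged = {s: mean(ranks[s] for ranks in per_measure) for s in results}
    return sorted(averaged.items(), key=lambda kv: kv[1])


# Hypothetical scores for three runs; not taken from ImageCLEFphoto results.
results = {
    "run_A": {"MAP": 0.30, "P20": 0.40, "GMAP": 0.10, "bpref": 0.28},
    "run_B": {"MAP": 0.25, "P20": 0.45, "GMAP": 0.12, "bpref": 0.29},
    "run_C": {"MAP": 0.20, "P20": 0.30, "GMAP": 0.05, "bpref": 0.20},
}

for system, avg in average_rank(results):
    print(system, avg)
```

In this made-up example, run_A has the best MAP but is second on the other three measures, so it falls behind run_B once the ranks are averaged. Such an aggregate rewards consistency across measures rather than strength on any single one.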

7.5 Summary

This chapter reported on ImageCLEFphoto 2006, the first evaluation effort for (multilingual) VIR from a generic photographic collection (e.g. photographs of holidays and events).

First, after a general introduction to ad-hoc retrieval tasks at ImageCLEF, the motivation for providing ImageCLEF with the resources and functionality of the IAPR TC-12 Image Benchmark was presented, followed by a chronological description of the organisation and realisation of the evaluation event from January to December 2006. In particular, it was highlighted how the individual benchmark components were generated and used for ImageCLEFphoto: this included the image collection and the query topics, as well as the relevance judgments and the choice of a particular set of performance measures.

ImageCLEFphoto 2006 saw the submission of 157 system runs by 12 participating groups from 10 different countries. A description of each of the systems used in the evaluation was provided, together with an analysis of their retrieval performance with respect to several submission parameters and topic dimensions. Some of the findings include:

• a combination of visual and textual features generally improves retrieval effectiveness;

• visual features often work well for more visual queries;

• feedback and query expansion can help to improve retrieval effectiveness.

Although some of these trends had been shown for other domains before, ImageCLEFphoto 2006 was the first large-scale evaluation event to investigate them for the domain of multilingual retrieval from a generic photographic collection. Finally, an analysis of the event was provided, including the evaluation of the task difficulty, the choice of performance measures, the feedback of participating groups and, based on that feedback, the future prospects for ImageCLEFphoto 2007 and onwards.

In response to calls from the image retrieval community for resources similar to those used by TREC in the document retrieval domain, ImageCLEF has begun to provide such resources within the context of VIR in order to facilitate standardised laboratory-style testing of (predominantly concept-based) image retrieval systems. By running evaluation tasks which are modelled on scenarios found in multimedia use today, ImageCLEF has also addressed the barriers between research interests and real-world needs.

These resources now also include a benchmark suite for retrieval from generic photographic collections, a domain that had long lacked such resources for retrieval evaluation, although it had been expected to become of increasing interest to researchers with the growth of the desktop search market (and the popularity of tools such as Flickr). Joining the IAPR TC-12 Image Benchmark with the ImageCLEFphoto ad-hoc retrieval task has now satisfied the need for evaluation events in this domain and finally filled this gap.
