our superpixel algorithm performed better than the other algorithms.
7.2 Some reflections and future research
We took a first-principles approach based on the spatial frequency content of saliency maps, which led us to develop a couple of simple but very effective algorithms for salient object detection using only color and intensity features. Our algorithms consistently proved more effective than more complicated algorithms that use additional features or more sophisticated models. We found that it is better to use simple features for saliency detection, since very little can be guessed about the form and features of an unknown object. An evident course of future work is saliency detection in videos. It would require taking into account object motion, which is a very significant cue for visual attention.
Human visual attention is also guided by aspects of saliency other than color and intensity contrast, namely shape, orientation, closure, symmetry, and, in the case of moving scenes, direction and speed of motion. Some researchers have tried taking some of these aspects into account, but without resounding success except on very specific images. This is due both to the limitations of the features they used and to the lack of a robust mechanism for combining various features. This therefore remains a future opportunity to improve saliency detection.
A lot of what we know about vision is from psycho-visual studies and neuroscience research. Usually the proposed models explain only some of the observed visual capabilities, but newer discoveries are being made in the areas of psychophysics and neuroscience. We should update our models with these discoveries and improve saliency detection.
Saliency detection can be used for several simple but useful applications. Some of the works in the offing are non-photo-realistic rendering, saliency-based image filtering, and saliency-based contrast stretching and color correction. Saliency detection may also aid in the clever use of printing ink: more dots per inch for salient regions than for non-salient ones. This warrants further investigation.
Not everything that is salient is important to us. Saliency may direct visual attention, but our visual system quickly ignores it for what is more important, even if it is less salient. While we assume that saliency alone can direct visual attention, this would be true only in a completely alien environment. Humans, however, are very familiar with most of the environments they encounter in their daily lives. So, in most cases we are able to ignore ‘salient noise’, i.e. salient regions or objects that grab our involuntary attention for a very brief moment, in order to find the things that matter to us. To widen the range of real-world applications for everyday use, the next step after task-independent saliency detection is task-dependent object detection, particularly multi-category object detection. As a research area, this is a vast and difficult topic. We can make a small headway, at least in domain-specific object detection, using novel feature extraction techniques like superpixel segmentation, and using clever machine learning algorithms. However, the evident fact that the human visual system is capable of discriminating between thousands of objects in a fraction of a second does suggest emphatically that there is a highly simplified and efficient underlying abstraction scheme that is not explained by the algorithmic models we use presently. This remains an open challenge to overcome in the future.
Image database saliency is an interesting area to delve into, given the deluge of images that is getting increasingly difficult to manage. While we take into account the content of the images using simple features, there are many other aspects of image content (e.g. the objects, the aesthetics, etc.) that can be considered when assigning an interestingness value to an image. Further research can be done in abstracting the images better, computing image similarity better, and experimenting with other schemes for defining and finding interesting images. This can surely be aided by user studies, since the notion of interesting images is highly subjective. It is also possible to extend database saliency techniques to web-mined images from popular online image databases like Flickr and Picasa.
Another aspect that needs more research in this connection is image clustering. One of the main limitations of several clustering techniques is that they are not easily scalable. Ideally, it should be possible to update clusters on the fly as images are added to or removed from the database. With ease of scalability in mind, approaches like locality-sensitive hashing could be used. The approaches should be non-iterative in nature and should support clustering in high-dimensional spaces.
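To make the idea of non-iterative, updatable grouping concrete, the following is a minimal sketch of a random-hyperplane locality-sensitive hashing index. It is an illustration under stated assumptions, not a method evaluated in this thesis: the class name HyperplaneLSHIndex, the num_bits parameter, and the choice of random hyperplanes as the hash family are all hypothetical. Each image is assumed to be already described by a fixed-length feature vector.

```python
import random
from collections import defaultdict


class HyperplaneLSHIndex:
    """Toy random-hyperplane LSH index with on-the-fly insertion and removal.

    Hypothetical illustration: names and parameters are not from the thesis.
    """

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hash bit is defined by one random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.buckets = defaultdict(set)  # hash key -> set of image ids
        self.keys = {}                   # image id -> hash key

    def _key(self, vec):
        # One bit per hyperplane: the side of the plane the vector falls on.
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0.0 else 0
            for plane in self.planes
        )

    def add(self, image_id, vec):
        # Re-adding an id replaces its previous entry.
        if image_id in self.keys:
            self.remove(image_id)
        key = self._key(vec)
        self.buckets[key].add(image_id)
        self.keys[image_id] = key

    def remove(self, image_id):
        key = self.keys.pop(image_id)
        self.buckets[key].discard(image_id)
        if not self.buckets[key]:
            del self.buckets[key]

    def candidates(self, vec):
        """Ids that share the query's bucket (approximate neighbours)."""
        return set(self.buckets.get(self._key(vec), set()))
```

Insertion and removal each cost one hash computation and touch a single bucket, so no re-clustering pass is needed when the database changes. In practice several independent hash tables would typically be combined to reduce missed neighbours; that refinement is omitted here.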
The number of pixels in images is growing at a rapid rate. This puts a heavy computational overhead on most image algorithms. Superpixels offer an excellent option for lowering the complexity by a few orders of magnitude. The abstraction performed by superpixels is preferable to sub-sampling since it is anisotropic and takes into account the local statistics. So, there is room for more applications of superpixels, including image compression, optical flow computation, and so on. We can also look into parallelization of superpixel segmentation techniques and deployment on mobile devices.
Finally, we would like to make a small case for simplicity and result-oriented, top-down research. For saliency detection, the option of using center-surround filtering while keeping the appearance of the saliency map as the objective had been surprisingly ignored. The same can be said about using color and spatial information together for superpixel segmentation with a known clustering technique instead of more complex models. In both cases, our algorithms were developed using a goal-oriented approach. Instead of choosing a model or tool and then finding an application for it, we tried to find an effective approach for dealing with the task at hand. The result was simple algorithms that outperformed more complex state-of-the-art approaches.
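As an illustration of how color and spatial information can be used together inside a known clustering technique, the sketch below shows one possible joint distance for a k-means-style assignment step. It is a simplified example under assumptions: the function names, the (l, a, b, x, y) pixel representation, and the grid_interval and compactness parameters are chosen here for illustration and are not taken from the text above.

```python
import math


def combined_distance(pixel, center, grid_interval, compactness=10.0):
    """Distance in joint color-position space.

    pixel and center are (l, a, b, x, y) tuples. grid_interval (expected
    spacing between cluster centers) and compactness (weight of the spatial
    term) are illustrative parameters, not values from the thesis.
    """
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_color = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_space = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    # Normalise the spatial term by the grid interval so that it is
    # comparable with the color term, then weight it by compactness.
    return math.sqrt(d_color ** 2
                     + (compactness * d_space / grid_interval) ** 2)


def assign(pixel, centers, grid_interval, compactness=10.0):
    """Index of the nearest center under the combined distance."""
    return min(
        range(len(centers)),
        key=lambda k: combined_distance(pixel, centers[k],
                                        grid_interval, compactness),
    )
```

A larger compactness value favours spatially compact clusters, while a smaller one lets clusters follow color boundaries more closely.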
Appendix A
Text Detection
A.1 Introduction
The availability of information about the content of an image has always helped in image classification problems. Text in different languages frequently appears in various forms in images, and conveys useful information about names of people, titles, locations, dates of events, etc. This information is potentially very useful for annotating images automatically, thereby aiding image retrieval. There can be several other applications of such a text reader, like automatic video annotation, number plate recognition, robot navigation, etc.
Detecting and reading text automatically is more difficult if it appears in a natural image (as opposed to a thresholded binary image of a scanned document).
The task of text reading requires both localization (or detection) of text regions in the image and recognition of this text. While previous works have mainly addressed the issue of text detection/localization in images, relying on third-party Optical Character Recognition (OCR) software for text recognition, we address both issues in this chapter.
In this chapter, we first detect text regions in images using an AdaBoost [49, 137] based detector, and then binarize each text region so that it is suitable for being fed to an OCR engine.
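To give a feel for the binarization step, the sketch below thresholds a detected grayscale region with Otsu's method. This is only an assumed stand-in: the introduction does not state which binarization technique is used, and the function names and the convention that bright pixels map to 1 are hypothetical. The region is assumed to be a list of rows of integer gray levels in the range 0 to 255.

```python
def otsu_threshold(pixels):
    """Gray level t in 0..255 that maximises the between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # number of pixels in the lower class
        sum0 += t * hist[t]      # sum of gray levels in the lower class
        w1 = total - w0          # number of pixels in the upper class
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(region):
    """Map a grayscale region to 0/1 (assumption: bright pixels become 1)."""
    flat = [p for row in region for p in row]
    t = otsu_threshold(flat)
    return [[1 if p > t else 0 for p in row] for row in region]
```

Whether text ends up as 1 or 0 depends on its polarity relative to the background, so an OCR front end might need to invert the result for light-on-dark text.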