A Review of Color Image Segmentation Based on Visual Attention



2017 2nd International Conference on Computer Science and Technology (CST 2017) ISBN: 978-1-60595-461-5


Ning-yu ZHANG, Qing LIU, Wen-zhu YANG*, Si-le WANG, Zhen-chao CUI, Li-pin CHEN and Xiang-yang CHEN

College of Computer Science and Technology, Hebei University, Baoding 071002, China


*Corresponding author

Keywords: Visual Attention, Color Image Segmentation, Saliency

Abstract. Color image segmentation is a key technology in the field of computer vision, and the quality of the segmentation results has a decisive influence on subsequent image analysis. In essence, computer vision uses a computer in place of human vision to extract and identify targets. Most traditional methods focus on the intrinsic characteristics of the image and ignore the role of the human observer in the analysis process. In recent years, visual saliency, which simulates how humans extract visual objects, has brought substantial progress to image segmentation. This paper first introduces visual attention computational models, then summarizes their applications in color image segmentation, and finally analyzes the current challenges in visual attention modeling.


Introduction

Color image segmentation is an important issue in the field of computer vision. The segmentation results directly affect the quality of subsequent image identification, analysis and understanding [1]. Color image segmentation divides an image into regions that each carry a specific meaning, do not intersect one another, and satisfy consistency conditions within each region [2]. Segmentation methods use basic features (such as color, texture, intensity and orientation) to divide the image into different regions [3]. So far, no segmentation method is applicable to most situations; most methods are designed for specific scenarios.

Typical color image segmentation approaches, such as those based on clustering, thresholding or support vector machines (SVM), are usually complex and slow. Since Koch and Ullman proposed the concept of saliency in 1985, many researchers have studied visual saliency. As visual saliency developed, more researchers turned to color image segmentation methods based on it. These methods not only make the segmentation results closer to the characteristics of the Human Visual System (HVS), but also better meet human expectations and requirements while improving segmentation speed [4].


Saliency originates from visual rarity, unpredictability, surprise or uniqueness, and is often generated by variations in image attributes such as gradient, boundaries, color and edges. Visual saliency detection segments the object regions that are most prominent and significant in the whole scene. The extracted saliency maps are widely used in many computer vision applications, such as object recognition, image segmentation, image retrieval and image editing [7].

Color image segmentation is a process that simulates the human visual perceptual system to separate the region of interest. Segmentation based on visual attention first locates the object of interest through a visual attention method, then applies other methods for accurate segmentation. Using the visual attention mechanism to segment color images against complex backgrounds not only conforms to the workings of the human visual perceptual system, but has also become a research hotspot in machine learning because the resulting algorithms are fast and accurate, laying a good foundation for further machine learning.

Visual Attention Computational Model

The Driving Mode of Visual Attention

Attention is a concept from psychology and a part of the cognitive process [8]. It has visual and auditory branches [9]; the role of visual attention is to quickly find a target or area of interest within a large amount of visual information. Studies report that the HVS receives on the order of 10^8 to 10^9 bits of visual data per second [10, 11].

Visual attention can be divided into two types according to how it is formed: the first is bottom-up attention, which is rapid, task-independent and driven by low-level image features (for example, color, texture, intensity and orientation); the second is top-down attention, which is slower, task-dependent and volition-controlled [12]. The two factors that guide human attention are illustrated in Fig. 1.
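To make the bottom-up mode concrete, the sketch below (an illustration, not one of the surveyed models) scores each pixel by its contrast against its local neighborhood; a pop-out item such as the unique vertical bar in Fig. 1 scores highest:

```python
import numpy as np

def local_contrast_saliency(image, radius=2):
    """Crude bottom-up saliency: each pixel's absolute difference from
    the mean of its (2*radius+1)^2 neighborhood. Illustrative only."""
    h, w = image.shape
    padded = np.pad(image.astype(float), radius, mode="edge")
    sal = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            sal[y, x] = abs(image[y, x] - window.mean())
    return sal / (sal.max() + 1e-12)

# A single bright vertical bar on a dark background pops out.
img = np.zeros((9, 9))
img[:, 4] = 1.0
sal = local_contrast_saliency(img)
```

Because the computation uses only pixel values and needs no task knowledge, it is task-independent in exactly the sense that characterizes the bottom-up mode.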

Figure 1. Examples of the two ways of visual attention. (Items in the first row that attract bottom-up attention include the vertical bar among horizontal bars and the red bar among gray bars; images are from [13]. Pedestrians in the second row attract top-down attention; images are from the "Walking1" and "Walking2" sequences [14].)


Classification of Visual Attention Computational Model

The development of visual attention computational models originates from the "feature integration theory" proposed by Treisman and Gelade [15] in 1980. This theoretical model identifies the visual characteristics important for visual attention and states how these features are integrated in the allocation of attention. Building on this theory, Koch and Ullman [16] introduced the concept of the "saliency map" in 1985, and in 1998 Itti et al. proposed the first relatively complete computational model of visual attention, the Itti model [17]. Since then, research on visual attention has entered a period of rapid development, producing many visual attention models inspired by the Itti model.

Cognitive Attention Models

Almost all attentional models are directly or indirectly inspired by the "feature integration theory" of Treisman and Gelade [15], which identifies the important visual features and describes how human attention is controlled by them. Subsequently, Koch and Ullman proposed a feedforward model to combine these features and introduced, for the first time, the concept of the "saliency map", which represents the salient regions in a scene. They also introduced a "winner-take-all" neural network to select the most salient area, together with an "inhibition of return" suppression strategy that allows the focus of attention to shift to the next most salient area.

The earliest bottom-up cognitive attention model was proposed by Itti et al. [17] in 1998; it combines properties of the human visual system with a neural network, and was further refined in 2003 [18]. Figure 2 shows the architecture of this bottom-up visual attention computational model. The Itti model is the most representative and has become the standard among bottom-up visual attention models.

[Figure 2 here: the input image passes through linear filtering into color, intensity and orientation channels; center-surround differences and normalization produce the feature maps (12 color maps, 6 intensity maps, 24 orientation maps); across-scale combination and normalization yield the conspicuity maps, which are linearly combined into the saliency map; winner-take-all selects the attended location, with inhibition of return.]


For an input image, this model extracts three primary visual features: colors, intensity and orientations. Center-surround operations at multiple scales generate feature maps that reflect the probability of saliency, and the resulting conspicuity maps for the three features are merged into the final saliency map. A biologically inspired "winner-take-all" competition mechanism then selects the most salient spatial position, which guides the location of attention, and an "inhibition of return" mechanism completes the transfer of the visual focus.

An input image is subsampled into a Gaussian pyramid, and each pyramid level is decomposed into channels for red (R), green (G), blue (B), yellow (Y), intensity (I) and orientations (O). From these channels, center-surround "feature maps" f_{l,c,s} are constructed and normalized for the different features l. In each channel, the maps are summed across scales and normalized again:

\bar{f}_l = \mathcal{N}\left( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} f_{l,c,s} \right), \quad \forall\, l \in L_I \cup L_C \cup L_O    (1)

L_I = \{ I \}, \quad L_C = \{ RG, BY \}, \quad L_O = \{ 0^\circ, 45^\circ, 90^\circ, 135^\circ \}    (2)
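The across-scale center-surround construction of Eq. (1) can be sketched in code. The following is a simplified stand-in, not the reference iNVT implementation: it uses a box-filter pyramid instead of a true Gaussian pyramid, assumes the image side is a power of two, and omits the normalization operator N(.):

```python
import numpy as np

def blur_downsample(img):
    # A 2x2 box average plus decimation stands in for one Gaussian pyramid step.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def center_surround_maps(channel, centers=(2, 3, 4), deltas=(3, 4)):
    """Eq. (1) sketch for one channel: |F(c) - F(s)| with s = c + delta,
    the surround map upsampled back to the centre scale by nearest-neighbour
    repetition. Assumes the image side is a power of two."""
    pyr = [channel.astype(float)]
    for _ in range(max(centers) + max(deltas)):
        pyr.append(blur_downsample(pyr[-1]))
    maps = []
    for c in centers:
        for d in deltas:
            s = c + d
            up = pyr[s]
            for _ in range(s - c):              # upsample surround to centre scale
                up = np.kron(up, np.ones((2, 2)))
            up = up[:pyr[c].shape[0], :pyr[c].shape[1]]
            maps.append(np.abs(pyr[c] - up))    # center-surround difference
    return maps

maps = center_surround_maps(np.random.rand(256, 256))
```

With c in {2, 3, 4} and delta in {3, 4}, each channel yields six maps, matching the counts in Figure 2 (6 intensity maps; 12 color maps from the two opponent pairs; 24 orientation maps from the four orientations).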

These maps are then linearly summed and normalized once more to produce the "conspicuity maps":

C_I = \bar{f}_I, \quad C_C = \mathcal{N}\left( \sum_{l \in L_C} \bar{f}_l \right), \quad C_O = \mathcal{N}\left( \sum_{l \in L_O} \bar{f}_l \right)    (3)



Finally, the conspicuity maps are linearly combined once again to produce the saliency map:

S = \frac{1}{3} \sum_{k \in \{I, C, O\}} C_k    (4)
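A minimal sketch of Eqs. (3) and (4), with N(.) simplified here to min-max scaling (the actual operator in [17] also promotes maps containing few strong peaks):

```python
import numpy as np

def N(m):
    # Simplified normalization operator: rescale a map to [0, 1].
    m = m - m.min()
    return m / (m.max() + 1e-12)

def conspicuity_and_saliency(f_I, f_C, f_O):
    """Eqs. (3)-(4) sketch: f_I is the single intensity map; f_C and f_O
    are lists of colour and orientation maps summed within their channel."""
    C_I = f_I
    C_C = N(sum(f_C))
    C_O = N(sum(f_O))
    return (C_I + C_C + C_O) / 3.0          # Eq. (4): equal-weight average

shape = (32, 32)
S = conspicuity_and_saliency(np.random.rand(*shape),
                             [np.random.rand(*shape) for _ in range(2)],
                             [np.random.rand(*shape) for _ in range(4)])
```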

Four implementations of this model are available: the Saliency Toolbox (STB) by Walther [19], the Matlab code by Harel [20], iNVT by Itti [17], and VOCUS by Frintrop [21]. Their code can be found on the Internet.

Most existing visual cognitive models are based on the Itti model, since attention is generally driven unconsciously by low-level features such as contrast, color and intensity. These methods generally follow three steps: first, feature extraction; second, saliency computation; and last, the production of the saliency map through winner-take-all, inhibition-of-return or other non-linear operations.
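The winner-take-all and inhibition-of-return steps can be sketched as follows; the disc-shaped suppression region and its radius are illustrative choices, not those of the original model:

```python
import numpy as np

def attend_sequence(saliency, n_fixations=3, inhibit_radius=2):
    """Winner-take-all with inhibition of return: repeatedly pick the most
    salient location, then suppress a disc around it so attention shifts
    to the next most salient area."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)  # winner-take-all
        fixations.append((int(y), int(x)))
        # Inhibition of return: suppress a disc around the winner.
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= inhibit_radius ** 2] = -np.inf
    return fixations

sal = np.zeros((10, 10))
sal[2, 2] = 1.0        # most salient location
sal[7, 7] = 0.5        # second most salient location
fix = attend_sequence(sal, n_fixations=2)
```

After the first winner at (2, 2) is suppressed, the focus shifts to (7, 7), mimicking the scan path the model produces over a saliency map.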

Cognitive attention models express bottom-up attention well and have been used successfully in computer vision tasks such as classification and the prediction of attended locations, where they have achieved high accuracy.

Spectral Analysis Models


Hou et al. [22] proposed the spectral residual (SR) saliency model based on the idea that similarities imply redundancies. They suggest that statistical singularities in the spectral domain correspond to abnormal regions of the image, where objects become salient. The log-amplitude spectrum of the image is therefore reduced by its locally averaged version; combining this spectral residual with the original phase and applying the inverse Fourier transform yields the saliency map in the spatial domain.
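A compact sketch of the SR idea, under the simplifying assumptions of a box filter for the local spectral average and no final Gaussian smoothing of the result (which [22] applies):

```python
import numpy as np

def spectral_residual_saliency(img, avg_size=3):
    """Spectral residual sketch: subtract a locally averaged log-amplitude
    spectrum from the original, keep the phase, and invert the transform."""
    F = np.fft.fft2(img.astype(float))
    log_amp = np.log(np.abs(F) + 1e-12)
    phase = np.angle(F)
    # Local average of the log amplitude via a small box filter (wrap-around,
    # since the spectrum is periodic).
    k = avg_size
    padded = np.pad(log_amp, k // 2, mode="wrap")
    avg = np.zeros_like(log_amp)
    for dy in range(k):
        for dx in range(k):
            avg += padded[dy:dy + log_amp.shape[0], dx:dx + log_amp.shape[1]]
    avg /= k * k
    residual = log_amp - avg                       # the spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()

sal = spectral_residual_saliency(np.random.rand(64, 64))
```

Everything reduces to two FFTs and a small filter, which is why SR-family methods are fast enough for real-time use, as noted below.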

Following the SR method, Guo et al. [23] proposed the phase spectrum of the quaternion Fourier transform (PQFT), which obtains better saliency predictions by discarding the amplitude spectrum and retaining only the phase information.

Achanta et al. [24] proposed the frequency-tuned method for salient region detection. This method transforms the RGB image into the CIELab color space and applies Gaussian smoothing; the saliency map is then obtained from the magnitude of the difference between each smoothed pixel and the arithmetic mean of the image feature vectors.
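The frequency-tuned computation reduces to a few lines; this sketch assumes the input is already in CIELab and substitutes a small box blur for the Gaussian:

```python
import numpy as np

def frequency_tuned_saliency(lab):
    """Frequency-tuned sketch: saliency is the distance of each (slightly
    blurred) pixel's Lab vector from the image's mean Lab vector.
    `lab` is an H x W x 3 array assumed to already be in CIELab."""
    # A 3x3 box blur per channel stands in for the small Gaussian blur.
    padded = np.pad(lab.astype(float), ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = np.zeros(lab.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            blurred += padded[dy:dy + lab.shape[0], dx:dx + lab.shape[1]]
    blurred /= 9.0
    mean_vec = lab.reshape(-1, 3).mean(axis=0)     # arithmetic mean feature vector
    sal = np.linalg.norm(blurred - mean_vec, axis=2)
    return sal / (sal.max() + 1e-12)

lab = np.random.rand(32, 32, 3) * 100.0
sal = frequency_tuned_saliency(lab)
```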

Li et al. [25] proposed the hypercomplex Fourier transform. By analyzing the frequency domain, they found that spikes in the amplitude spectrum correspond to repeated patterns in the original signal; low-pass filtering of the amplitude spectrum (spectral filtering, SF) suppresses these repetitive patterns so that the salient signal stands out.

Saliency models based on spectral analysis are simple to explain and implement, and they have achieved great success in fixation prediction and salient region detection. Because they are implemented with the fast Fourier transform, they satisfy real-time requirements; compared with models similar to iNVT, the computation speed can be increased by nearly a factor of ten. However, their biological plausibility is not very clear.

Graphical Models

A graphical model is a probabilistic framework in which a graph denotes the conditional independence structure between random variables. In this category of visual attention models, eye movements are regarded as a time series. Since many hidden variables affect the generation of eye movements, approaches such as Hidden Markov Models (HMM), Dynamic Bayesian Networks (DBN) and Conditional Random Fields (CRF) have been incorporated.

Liu et al. [26] used a set of novel features and a CRF to integrate them for salient object detection. Harel et al. [20] proposed graph-based visual saliency (GBVS), which builds a graph over pixel associations and uses a Markov chain to calculate saliency.
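A GBVS-flavored sketch of Markov-chain saliency on a small feature map follows; the fully connected graph, Gaussian proximity weighting and PageRank-style damping term are simplifications of the actual GBVS formulation:

```python
import numpy as np

def markov_chain_saliency(feat, sigma=2.0, iters=200):
    """Build a fully connected graph over pixels, weight edges by feature
    dissimilarity times spatial closeness, and take the Markov chain's
    stationary distribution as saliency (nodes unlike their surroundings
    accumulate mass)."""
    h, w = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    f = feat.ravel().astype(float)
    dissim = np.abs(f[:, None] - f[None, :])                 # feature dissimilarity
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)  # squared distances
    W = dissim * np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)           # row-stochastic
    n = h * w
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = 0.85 * (pi @ P) + 0.15 / n   # damping keeps the chain aperiodic
    return (pi / (pi.max() + 1e-12)).reshape(h, w)

feat = np.zeros((8, 8))
feat[3, 3] = 1.0                          # one odd-one-out pixel
sal = markov_chain_saliency(feat)
```

The single dissimilar pixel attracts the chain's stationary mass, so it receives the highest saliency, which is the intuition behind GBVS.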

Recently, Zhang et al. proposed a salient object detection method based on the commute-time distance to extract important objects, integrating background and convex-hull prior maps to obtain the final saliency map.

Graph-based models can represent complex attention mechanisms and therefore achieve better predictive ability. Their disadvantage is high complexity, especially in training and interpretability.

Color Image Segmentation Based on Visual Attention


Color Image Segmentation Based on Cognitive Models

Inspired by the Itti model, Yang et al. [27, 28] used low-level image features to segment images. They first extract the red, green and blue color features from the R, G and B channels respectively, then calculate saliency maps from these three color features according to the Itti model. As the purpose of this work is to detect foreign fibers in cotton, the saliency maps are weighted differently according to their color information and combined into the final saliency map. Finally, a simple threshold achieves the image segmentation.
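The weight-and-threshold fusion step can be sketched as below; the per-channel weights and the global threshold are hypothetical values for illustration, not those used in [27, 28]:

```python
import numpy as np

def fuse_and_segment(sal_maps, weights, thresh=0.5):
    """Weighted fusion of per-channel saliency maps followed by a global
    threshold, yielding a binary segmentation mask."""
    fused = sum(w * m for w, m in zip(weights, sal_maps))
    fused = fused / (fused.max() + 1e-12)
    mask = (fused >= thresh).astype(np.uint8)
    return fused, mask

h = w = 16
red_sal = np.zeros((h, w)); red_sal[4:8, 4:8] = 1.0   # salient red region
green_sal = np.zeros((h, w))
blue_sal = np.zeros((h, w))
fused, mask = fuse_and_segment([red_sal, green_sal, blue_sal],
                               [0.5, 0.3, 0.2])       # illustrative weights
```

In practice the weights would be tuned to the color of the foreign fibers being detected, so that the channel most indicative of the target dominates the fused map.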

Zhang et al. [29] combined the advantages of the Itti and SR models. They first transform the RGB space into the HSI space, then extract multiscale intensity, saturation and hue feature maps according to the Itti algorithm, and analyze these feature maps in the frequency domain according to SR, obtaining the salient object information while removing redundant information. The result is transformed back to the spatial domain to build the final saliency map, and a threshold separates object from background.

Wang et al. [30] calculate color and intensity saliency maps separately based on the Itti algorithm. The former detects colored foreign fibers, while the latter detects gray ones; combining the two saliency maps segments all foreign fiber objects.

Liu et al. [31] used a graph-cut method to segment images based on visual saliency. A saliency map is usually thresholded directly for segmentation, which generalizes poorly. This paper instead uses the saliency map produced by GBVS to guide Graph Cuts segmentation, which generalizes much better.

Color Image Segmentation Based on Spectral Analysis Models

Zhang et al. [32] proposed image segmentation based on visual saliency through frequency-domain analysis. Using low-level features, they compute visual saliency both locally and globally, then integrate the two saliency maps into a final one. Finally, the saliency map is thresholded into a binary image, which is combined with the original image to separate target from background.

Guo et al. [33] combined visual saliency with a Support Vector Machine (SVM) for image segmentation. They first find the salient region through the visual attention mechanism, then process the image with morphological operations while automatically selecting and labeling samples. After training, the SVM is used for image segmentation.

Gao et al. [34] find the salient region with SR, use a dynamic threshold to obtain the object's outline, and take this outline as the initial curve for the Chan-Vese model. The advantage of this method is that it avoids blindly searching for the location where the curve should evolve, and it reduces background interference.


Our Research Work

In this work, we first read a large number of papers and classified them into different categories: color image segmentation, visual saliency, and color image segmentation based on visual saliency. We also analyzed the advantages and disadvantages of these approaches.

Summary and Forecasting

Image segmentation is a very important and challenging problem in computer vision and image analysis. Large gaps exist between image features and image content, caused by the external environment, differences between imaging devices, illumination and occlusion. Therefore, current image segmentation algorithms generally have poor versatility across fields, and segmentation algorithms must be designed for specific scenarios and specific requirements.

As the study of the HVS deepens, applications of the visual attention mechanism to color image segmentation produce results increasingly consistent with human visual perception.

Among visual attention models, most researchers focus on the bottom-up driving mode and ignore the top-down one. To further improve the accuracy of color image segmentation, combining bottom-up and top-down attention is an important research direction.


Visual attention plays an indispensable role in the HVS. Guided by the visual attention mechanism, a target of interest can be found quickly in a complex scene, and using visual attention to detect salient objects effectively reduces the time required to find potential targets. Applying the human visual attention mechanism is therefore significant for color image segmentation and advances image segmentation technology in computer vision.

Applying the visual attention mechanism to the segmentation of color images against complex backgrounds makes the methods more similar to the human visual system and more stable. Imitating human visual cognition by fusing knowledge from related disciplines with human vision is a popular trend. With the joint development of computer vision, neurology, biology, psychology and other disciplines, color image segmentation methods that resemble the HVS and offer better robustness will be realized.


Acknowledgement

The authors thank the Natural Science Foundation of Hebei Province (F2015201033 and F2017201069) for financial support.



References

[2] Lou, X. P., Tian, J., Zhu, G. Y., et al.: A Survey on Image Segmentation Techniques [J]. Pattern Recognition and Artificial Intelligence, 1999, 12(3): 300-312.

[3] Jain, N., Lala, A.: Image Segmentation: A Short Survey [C]// Confluence 2013: Next Generation Information Technology Summit. IET, 2013: 380-384.

[4] Kohonen, T. A computational model of visual attention[C]// International Joint Conference on Neural Networks. IEEE Xplore, 2003:3238-3243 vol.4.

[5] Itti, L., Koch, C.: A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention [J]. Vision Research, 2000, 40(12):1489-1506.

[6] Borji, Ali, D. N. Sihite, and L. Itti.: Salient Object Detection: A Benchmark[J]. European Conference on Computer Vision Springer-Verlag, 2012:414-429.

[7] Cheng, M. M., Zhang, G. X., Mitra, N. J., et al.: Global Contrast Based Salient Region Detection [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 37(3):409-416.

[8] Borji, A., Itti, L.: State-of-the-Art in Visual Attention Modeling [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 185-207.

[9] Li, Wan-Yi, Wang, Peng, Qiao, Hong.: A Survey of Visual Attention Based Methods for Object Tracking [J]. Acta Automatica Sinica, 2014, 40(4): 561-576.

[10] Koch, K., Mclean, J., Segev, R., et al.: How Much the Eye Tells the Brain [J]. Current Biology, 2006, 16(14): 1428-1434.

[11] Katsuki, F., Constantinidis, C. Bottom-Up and Top-Down Attention [J]. Neuroscientist, 2013, 20(5):509-521.

[12] Frintrop, S.: Computational Visual Attention. Computer Analysis of Human Behavior. London: Springer, 2011. 69−101.

[13] Itti, L.: Visual Salience [Online]. Available: http://www.scholarpedia.org/article/Visual salience, July 24, 2013.

[14] Wu, Y., Lim, J., Yang, M. H.: Online Object Tracking: A Benchmark. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 2411-2418.

[15] Treisman, A. M., Gelade, G.: A Feature-Integration Theory of Attention [J]. Cognitive Psychology, 1980, 12(1):97-136.

[16] Koch, C., Ullman, S.: Shifts in Selective Visual Attention: Towards the Underlying Neuronal Circuitry. Human Neurobiology, 1985, 4:219−227.

[17] Itti, L., Koch, C., Niebur, E.: A Model of Saliency-Based Visual Attention for Rapid Scene Analysis [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259.

[18] Itti, L. Modelling Primate Visual Attention [J]. Computational Neuroscience, 2003:607-626.


[20] Harel, J., Koch, C., Perona, P. Graph-based visual saliency. In: Proceedings of the 20th Annual Conference on Neural Information Processing Systems, NIPS 2006. New York: Neural Information Processing System Foundation, 2007. 545−552.

[21] S. Frintrop, VOCUS.: A Visual Attention System for Object Detection and Goal-Directed Search. Springer, 2006.

[22] Hou, X., Zhang. L.: Saliency Detection: A Spectral Residual Approach. In: Proceedings of the 2007 Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007. 1−8.

[23] Guo, C. L., Ma, Q., Zhang, L. M.: Spatio-Temporal Saliency Detection Using Phase Spectrum of Quaternion Fourier Transform. In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK: IEEE, 2008. 1−8.

[24] Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency tuned Salient Region Detection. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE, 2009. 1597−1604.

[25] Li, J., Levine, M. D., An, X., Xu, X., He, H.: Visual Saliency Based on Scale-Space Analysis in the Frequency Domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(4): 996−1010.

[26] Liu, T., Sun, J., Zheng, N. N., Tang, X. O., Shum, H. Y.: Learning to Detect a Salient Object. In: Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007. 1−8.

[27] Yang, Wenzhu, et al.: Saliency-Based Color Image Segmentation in Foreign Fiber Detection [J]. Mathematical & Computer Modelling, 2013, 58(3-4): 852-858.

[28] Zhang, C., Yang, W., Liu, Z., et al.: Color Image Segmentation in RGB Color Space Based on Color Saliency[J]. Ifip Advances in Information & Communication Technology, 2014, 419:348-357.

[29] ZHANG, Huawei, ZHENG, Yafeng, ZHANG, Qiaorong.: Color image segmentation based on visual attention mechanism[J]. Computer Engineering and Applications, 2011, 47(10):154-157.

[30] Wang, Sile, Fan, Shiyong, Lu, Sukui, et al.: Visual Saliency Map Based Color Image Segmentation for Foreign Detection[J]. Computer Engineering and Design, 2013, 34(08):2783-2787.

[31] Liu, Yi, Huang, Bing, Sun, Huaijiang, et al.: Image Segmentation Based on Visual Saliency and Graph Cuts [J]. Journal of Computer-Aided Design & Computer Graphics.

[32] Zhang, Qiaorong, Jing, Li, Xiao, Huimin, et al.: Image Segmentation Based on Visual Saliency [J]. Journal of Image and Graphics, 2011, 16(05): 767-772.

[33] GUO, Wentao, WANG, Wenjian, BAI, Xuefei.: SVM Model for Segmentation of Color Image Based on Visual Attention. Computer Engineering and Applications, 2011, 47(36):174-176.



Figure 2. Architecture of Itti saliency-based visual attention model (The figure is adapted from [17])
