4.2 Automatic Image Re-targeting
5.1.1 Image clusters and database saliency
We group the images of the database into a given number of clusters. The number of clusters formed depends on how many images the user requires to create a summary of the database. Clustering is performed using the k-means algorithm (Section 2.5.1). For this we need an image signature, i.e. a compact feature representation of the image, and a distance measure to compute the similarity between two images. We propose the use of image saliency weighted color coherence vectors (ISWCCV) as the image signature and the $\chi^2$ distance as a similarity measure [80].
1 The material presented in this chapter is also available in reference [6].
(a) An example database of images
(b) Some images are more interesting than others. Here the size of the image is relative to the degree of ‘interestingness’.
Figure 5.1: The goal of database saliency detection is to automatically rank images in the order of their ‘interestingness’.
5.1. Ranking images
Figure 5.2: An artificial input image with coherent pixels labeled C and incoherent pixels labeled I. In the case shown here, coherent pixels have more than six similar neighbors while incoherent pixels have six or fewer.
Image saliency weighted color coherence vectors (ISWCCV)
Simple color histograms are poor abstractions of images, as two visually dissimilar images can have the same histogram (see Fig. 5.3(a)). Color coherence vectors (CCV) were proposed by Zabih and Pass [111] to alleviate this problem. They label each pixel as belonging to one of two classes, coherent or incoherent, as illustrated in Fig. 5.2, and create a separate histogram for each class. A coherent pixel is one that has at least a threshold number $T_n$ of similar pixels among its neighbors, while an incoherent pixel has fewer.
To obtain an ISWCCV, we use image saliency computed using the method presented in Section 3.2. Instead of simply counting a pixel belonging to a bin, we take into account the saliency value S(x, y), normalized in the interval [0, 1], to compute the height of the bin. Thus, more weight is given to the salient pixels of an image rather than considering the entire image uniformly. The height h(n) of the nth bin is computed as:
\[ h(n) = \frac{\sum_{(x,y) \in \mathrm{bin}(n)} S(x, y)}{\sum_{n=1}^{B} \sum_{(x,y) \in \mathrm{bin}(n)} S(x, y)} \tag{5.1} \]
where B is the number of bins of the histogram. The expression in the denominator is the normalizing constant. Fig. 5.3 illustrates the creation of ISWCCV using saliency maps.
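As a minimal sketch of Eq. 5.1, the following Python function accumulates saliency values per bin and then normalizes. The function name `weighted_histogram`, the flat per-pixel input lists, and the handling of an all-zero saliency map are assumptions made for illustration; they are not part of the method as described.

```python
from typing import List, Sequence


def weighted_histogram(bin_index: Sequence[int],
                       saliency: Sequence[float],
                       num_bins: int) -> List[float]:
    """Saliency-weighted histogram in the spirit of Eq. 5.1.

    bin_index[p] is the histogram bin of pixel p and saliency[p] is its
    saliency value S(x, y), assumed to be already normalized to [0, 1].
    """
    heights = [0.0] * num_bins
    for b, s in zip(bin_index, saliency):
        heights[b] += s  # add the saliency instead of counting 1 per pixel
    total = sum(heights)  # normalizing constant (denominator of Eq. 5.1)
    if total == 0.0:
        # Assumption: an all-zero saliency map yields an all-zero histogram.
        return heights
    return [h / total for h in heights]


# Four pixels: two in bin 0, one in bin 1, one in bin 2 with zero saliency.
bins = [0, 0, 1, 2]
sal = [0.5, 0.5, 1.0, 0.0]
hist = weighted_histogram(bins, sal, num_bins=3)
```

In this toy input the pixel in bin 2 has zero saliency, so it contributes nothing to the histogram even though an unweighted count would give bin 2 a quarter of the mass.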
The $\chi^2$ significance test [80] provides a measure of similarity of two histograms $h_1$ and $h_2$ as:
\[ \chi^2(h_1, h_2) = \frac{1}{2} \sum_{n=1}^{B} \frac{(h_1(n) - h_2(n))^2}{h_1(n) + h_2(n)} \tag{5.2} \]
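Eq. 5.2 can be sketched in a few lines of Python. The name `chi2_distance` is hypothetical, and skipping bins that are empty in both histograms (to avoid a 0/0 term) is an assumption, since the text does not say how such bins are treated.

```python
from typing import Sequence


def chi2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square measure between two histograms, following Eq. 5.2."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # assumption: bins empty in both histograms are skipped
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


identical = chi2_distance([0.5, 0.5], [0.5, 0.5])
disjoint = chi2_distance([1.0, 0.0], [0.0, 1.0])
```

For histograms that each sum to one, the value lies between 0 (identical histograms) and 1 (histograms with no shared bins), so smaller values mean more similar images.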
Figure 5.3: (a) The two images on the left have the same global color histogram but different CCVs. (b) Coherent and incoherent pixel histograms forming the CCVs for the two images. (c) Saliency maps corresponding to the input image. (d) Image saliency weighted color coherence vectors obtained by taking into account image saliency for creating the coherent and incoherent color histograms.
Figure 5.4: Computing database saliency as a sum of distances from all cluster centers.
The similarity of two ISWCCVs is computed as a weighted sum of the $\chi^2$ similarity measures of the coherent and incoherent pixel histograms:
\[ D(h^c_1, h^i_1, h^c_2, h^i_2) = \alpha \chi^2(h^c_1, h^c_2) + (1 - \alpha) \chi^2(h^i_1, h^i_2) \tag{5.3} \]
where the superscripts c and i represent coherent and incoherent pixel histograms, respectively, and $\alpha \in [0, 1]$ decides their relative weight. We use this similarity measure D to perform the k-means clustering of the images.
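The combined measure of Eq. 5.3 might be written as follows. This is only a sketch: the names `chi2` and `iswccv_distance` and the representation of a signature as a `(coherent, incoherent)` pair of lists are assumptions. The default of 0.3 for `alpha` is the value reported later in this section.

```python
from typing import Sequence, Tuple

# Assumed representation: (coherent histogram, incoherent histogram).
Signature = Tuple[Sequence[float], Sequence[float]]


def chi2(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Eq. 5.2; bins empty in both histograms are skipped (assumption)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def iswccv_distance(sig1: Signature, sig2: Signature,
                    alpha: float = 0.3) -> float:
    """Weighted sum of the two chi-square terms, following Eq. 5.3."""
    (c1, i1), (c2, i2) = sig1, sig2
    return alpha * chi2(c1, c2) + (1.0 - alpha) * chi2(i1, i2)


# Two signatures that differ only in their coherent histograms.
sig_a = ([1.0, 0.0], [0.5, 0.5])
sig_b = ([0.0, 1.0], [0.5, 0.5])
d_ab = iswccv_distance(sig_a, sig_b)
```

Because the incoherent histograms are equal here, only the coherent term contributes and the result equals `alpha`.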
We create the histograms on images resized to 40 × 30 pixels. We quantize the values in each channel to the interval [0, 3], i.e. into four bins. In our implementation, we choose $T_n$ to be 6, i.e. a pixel is considered coherent if it has at least 6 similar neighbors. A neighboring pixel is considered similar if its quantized value in every color channel equals the quantized value of the corresponding channel of the pixel under consideration. As a result of the quantization, each histogram has 64 (4 × 4 × 4) bins.
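The quantization and coherence labelling described above can be sketched as below. Several details are assumptions not stated in the text: 8-bit channel values, an 8-connected neighborhood, border pixels simply having fewer neighbors, and the names `quantize`, `color_bin` and `coherence_split`. The sketch produces plain pixel counts; for an ISWCCV, each increment of 1 would be replaced by the saliency of the pixel.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def quantize(value: int, levels: int = 4) -> int:
    """Map an 8-bit channel value (assumption) to a level in [0, levels - 1]."""
    return value * levels // 256


def color_bin(pixel: Pixel, levels: int = 4) -> int:
    """Combine three quantized channels into one bin index (0..63 for 4 levels)."""
    c0, c1, c2 = (quantize(v, levels) for v in pixel)
    return (c0 * levels + c1) * levels + c2


def coherence_split(image: Sequence[Sequence[Pixel]], t_n: int = 6,
                    levels: int = 4) -> Tuple[List[int], List[int]]:
    """Return (coherent, incoherent) pixel counts per color bin."""
    height, width = len(image), len(image[0])
    bins = [[color_bin(p, levels) for p in row] for row in image]
    coherent = [0] * levels ** 3
    incoherent = [0] * levels ** 3
    for y in range(height):
        for x in range(width):
            similar = 0
            # Assumption: 8-connected neighborhood, clipped at the border.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and bins[ny][nx] == bins[y][x]):
                        similar += 1
            # Coherent if at least t_n neighbors share the same quantized color.
            target = coherent if similar >= t_n else incoherent
            target[bins[y][x]] += 1
    return coherent, incoherent


# A uniform 4 x 4 image: only the four interior pixels have 8 similar neighbors.
image = [[(255, 0, 0)] * 4 for _ in range(4)]
coherent, incoherent = coherence_split(image)
```

Two pixels fall in the same bin exactly when all three quantized channels agree, so comparing bin indices implements the similarity test given in the text.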
Finally, we use α = 0.3 to set the relative weight of the coherent and incoherent histograms.
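Putting the pieces together, the clustering step could look roughly like the following k-means loop over ISWCCV signatures. This is a sketch under stated assumptions rather than the implementation used here: the random initialization from k images, the stopping rule, the handling of empty clusters, and all function names are illustrative choices. Cluster centers are taken as bin-wise averages of the member histograms.

```python
import random
from typing import List, Sequence, Tuple

Signature = Tuple[Sequence[float], Sequence[float]]  # (coherent, incoherent)


def chi2(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Eq. 5.2; bins empty in both histograms are skipped (assumption)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def distance(s1: Signature, s2: Signature, alpha: float = 0.3) -> float:
    """Eq. 5.3 with the alpha value reported in the text."""
    return alpha * chi2(s1[0], s2[0]) + (1.0 - alpha) * chi2(s1[1], s2[1])


def mean_signature(members: List[Signature]) -> Signature:
    """Bin-wise average of the coherent and incoherent histograms."""
    n = len(members)
    coherent = [sum(col) / n for col in zip(*(m[0] for m in members))]
    incoherent = [sum(col) / n for col in zip(*(m[1] for m in members))]
    return coherent, incoherent


def kmeans(signatures: List[Signature], k: int, max_iterations: int = 50,
           seed: int = 0) -> Tuple[List[int], List[Signature]]:
    rng = random.Random(seed)
    centers = rng.sample(signatures, k)  # assumption: start from k random images
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest center under the distance D.
        new_labels = [min(range(k), key=lambda m: distance(s, centers[m]))
                      for s in signatures]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: average the member histograms of each cluster.
        for m in range(k):
            members = [s for s, lab in zip(signatures, labels) if lab == m]
            if members:  # assumption: an empty cluster keeps its old center
                centers[m] = mean_signature(members)
    return labels, centers


# Two images dominated by bin 0 and two dominated by bin 1.
sigs = [([0.9, 0.1], [0.9, 0.1]), ([1.0, 0.0], [1.0, 0.0]),
        ([0.1, 0.9], [0.1, 0.9]), ([0.0, 1.0], [0.0, 1.0])]
labels, centers = kmeans(sigs, k=2)
```

Averaging histograms is the usual k-means update for Euclidean distance; using it together with D is a heuristic, so convergence to a minimum of the summed D is not guaranteed in general.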
Computing database saliency
The ‘interestingness’ value, i.e. the database saliency value, of the nth image in the database is computed as:
\[ S(n) = \sum_{m=1}^{k} D(h^c_m, h^i_m, h^c_n, h^i_n) \tag{5.4} \]
where k is the number of clusters created, and $h^c_m$ and $h^i_m$ correspond to the cluster centers. This pair of histograms (i.e. the ISWCCV of a cluster center) is computed as the average of all the coherent and all the incoherent histograms, respectively, belonging to the cluster.
The larger S(n) is, the more interesting we consider the image to be. An image is farther from its cluster center when it is less similar to the rest of the images in that cluster. By considering the sum of distances from all the cluster centers, we choose images that are most dissimilar to the rest of the images in the database. In other words, we consider an image more interesting when fewer images are similar to it. This is visually explained in Fig. 5.4. A possible drawback of this measure of database saliency is that neighboring images within a cluster can have similar values. To ensure that the images chosen are dissimilar to each other, we could also take into account, when computing the database saliency value, the mutual distances between the images ranked as most salient.
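The ranking by database saliency (Eq. 5.4) can be illustrated with a short Python sketch. The names `database_saliency` and `rank_images` and the toy three-bin signatures are assumptions for illustration only. The cluster centers are taken as given, for instance from a preceding k-means step.

```python
from typing import List, Sequence, Tuple

Signature = Tuple[Sequence[float], Sequence[float]]  # (coherent, incoherent)


def chi2(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Eq. 5.2; bins empty in both histograms are skipped (assumption)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)


def distance(s1: Signature, s2: Signature, alpha: float = 0.3) -> float:
    """Eq. 5.3."""
    return alpha * chi2(s1[0], s2[0]) + (1.0 - alpha) * chi2(s1[1], s2[1])


def database_saliency(signature: Signature, centers: List[Signature],
                      alpha: float = 0.3) -> float:
    """Sum of distances from one image to every cluster center (Eq. 5.4)."""
    return sum(distance(center, signature, alpha) for center in centers)


def rank_images(signatures: List[Signature], centers: List[Signature],
                alpha: float = 0.3) -> Tuple[List[int], List[float]]:
    """Return image indices sorted from most to least salient, plus the scores."""
    scores = [database_saliency(s, centers, alpha) for s in signatures]
    order = sorted(range(len(signatures)), key=lambda n: scores[n], reverse=True)
    return order, scores


centers = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
           ([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])]
images = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),   # coincides with the first center
          ([0.0, 0.0, 1.0], [0.0, 0.0, 1.0])]   # unlike either center
order, scores = rank_images(images, centers)
```

In this toy case the second image shares no bins with either center, so it receives the maximum distance to both and is ranked first.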
Our method can also be used for an application such as video summarization using keyframe extraction. In this case, however, we would choose the images that are closest to the cluster centers rather than those farthest away.