The saliency detection algorithm presented in Section 3.2 treats the entire image as the common surround (abstracted as the average image CIELAB color vector) for any given pixel. The premise is that there is no knowledge of the scale of the salient object and therefore it is best to pass all the low-frequency content. We base our new saliency detection algorithm on the premise that we can make assumptions about the scale of the salient object to be detected based on its position relative to the image borders. This can lead to smarter filtering bandwidth choices. This also helps overcome shortcomings of our previous algorithm.
3.3.1 The surround assumption of IGS
In Fig. 3.2 we observe that the larger the scale of the object, the smaller the low-frequency cut-off has to be to detect it (i.e., to highlight it fully in the saliency map), which means a larger surround for the center-surround filter. Since we do not usually know the scale of an object beforehand, it is prudent to assume the worst-case scenario and choose a very small low-frequency cut-off, i.e., a large surround. This allows detecting both large and small objects, which is why the method presented in Section 3.2 usually works well.
However, the method suffers from two drawbacks. First, when the salient object is quite small, the surround is unusually large (the whole image) and asymmetric, thereby potentially including a lot of noise, which affects the quality of detection. Second, if the salient object or region occupies more than half of the image pixels, or if the background is so highly varied that the largest contribution to the mean vector Iµ in Eq. 3.4 comes from the salient object, the salient object lies closer to the mean and appears less salient. Instead, it is the background that appears more salient.
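The second drawback can be illustrated with a minimal sketch. It assumes a toy three-channel image with a uniform background and a uniform object covering 64% of the pixels, and it omits the Gaussian blur of Eq. 3.4 for brevity; the function name `global_mean_saliency` is hypothetical and is not part of the original method's code.

```python
import numpy as np

def global_mean_saliency(img):
    """Distance of every pixel vector from the global mean vector.

    Simplification (assumption): the Gaussian blur of the input is skipped,
    so the raw pixel vectors are compared against the mean directly.
    """
    mean_vec = img.reshape(-1, img.shape[-1]).mean(axis=0)
    return np.linalg.norm(img - mean_vec, axis=-1)

# Toy image: background value 0, object value 100 in all three channels.
img = np.zeros((10, 10, 3))
obj = np.zeros((10, 10), dtype=bool)
obj[1:9, 1:9] = True          # object covers 64 of the 100 pixels
img[obj] = 100.0

sal = global_mean_saliency(img)
object_saliency = sal[obj].mean()
background_saliency = sal[~obj].mean()

# The mean vector is pulled towards the object, so the background ends up
# farther from the mean than the object does.
print(object_saliency < background_saliency)
```

Because the object dominates the mean, the background pixels receive the larger saliency values, which is the failure mode described above.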
3.3.2 New surround assumption
To detect a large object fully, as in Fig. 3.2, we need a small value of the low-frequency cut-off, which is achieved by using a larger surround for the center-surround filter. Ideally, we would like to choose a surround just sufficient to fully highlight the object. The question we want to address is whether it is possible to choose a compact surround sufficient to highlight the salient object without knowing its scale a priori.
3.3. Saliency detection algorithm - II 41
[Fig. 3.2 panels, top to bottom: Input, [π/2.5, π], [π/5, π], [π/10, π], [π/20, π], [π/40, π], [π/80, π], [π/160, π], [π/320, π]]
Figure 3.2: Bandpass filtering of the input image with progressively increasing bandwidth from top to bottom (values in brackets show the spatial frequency range). The first column of images is the same as the last row of Fig. 3.1. The high-frequency cut-off is kept the same while the low-frequency cut-off is reduced. We make three related observations here. One, the larger the scale of the object, the smaller the low-frequency cut-off should be. Two, this also means that the more interior pixels of a salient object need a smaller low-frequency cut-off than the ones closer to the edges of the object. Three, if we succeed in detecting a large object, the cut-off chosen usually also allows detecting smaller objects. These three observations form the basis of our improved method of saliency detection.
Taking a cue from the image borders
Fig. 3.2 shows that a large value of the low-frequency cut-off only lets us detect the pixels at the borders of a large object. As we lower the value of the low-frequency cut-off, we progressively detect more interior pixels of the object. If we know that we are performing center-surround filtering at the object border, we can use a large value of the low-frequency cut-off. Alternatively, if we are performing filtering at the center of a large object, we need to use a small value of the low-frequency cut-off. We do not possess this knowledge about the object size and location a priori. However, we observe that if a pixel belonging to a salient object is close to the image borders, then it is likely to be close to the object borders (see Fig. 3.3). This means that we can use the position of the pixel relative to the image borders as a cue to limit the low-frequency cut-off.
In other words, we can vary the surround of the center-surround filter with respect to each pixel position. We choose to vary the surround symmetrically with respect to the center pixel position. To justify the use of a symmetric surround, let us imagine that at each pixel position in the image we are at the center of a large symmetric object (shown as a dotted blue ellipse) not touching the image borders (Fig. 3.4). We need to choose a low-frequency cut-off for the center-surround filtering that is low enough to detect the innermost pixel of this assumed large object. If we succeed in detecting this fictitious object, then we can also detect any object, or part of an object, smaller than it.
To exploit the boundary-based cue, we need to assume that salient objects do not touch the image borders, i.e., they lie fully inside the image. This is a reasonable assumption, as Fig. 3.5 shows. The figure is obtained by averaging the 1000 images of our ground truth (Section 3.5), where white indicates the object and black indicates the background.
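The averaging behind Fig. 3.5 can be sketched as follows. This is only an illustration under stated assumptions: the masks are binary arrays of identical size (how the actual ground-truth images were brought to a common size is not stated here), the three masks are synthetic stand-ins rather than real ground truth, and the helper names `average_mask` and `touches_border` are hypothetical.

```python
import numpy as np

def average_mask(masks):
    """Pixel-wise mean of equally sized binary masks (1 = object, 0 = background)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in masks])
    return stack.mean(axis=0)

def touches_border(mask):
    """True if any object pixel lies on the outermost rows or columns."""
    m = np.asarray(mask, dtype=bool)
    return bool(m[0].any() or m[-1].any() or m[:, 0].any() or m[:, -1].any())

# Synthetic stand-ins for ground-truth masks (not real data).
a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1                      # small object, fully inside
b = np.zeros((6, 6), dtype=int)
b[1:5, 1:5] = 1                      # larger object, fully inside
c = np.zeros((6, 6), dtype=int)
c[0:3, 3:6] = 1                      # object touching the top and right borders
masks = [a, b, c]

avg = average_mask(masks)            # bright where objects occur often
border_fraction = sum(touches_border(m) for m in masks) / len(masks)
print(avg.round(2))
print(border_fraction)
```

On a real ground-truth set, a low `border_fraction` and an average image that is dark near the borders would support the assumption that salient objects rarely touch the image borders.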
3.3.3 Saliency computation
With the new assumption about the surround we can compute saliency by applying a position-varying bandpass filter at each pixel. The filter bandwidth should vary in such a way that the low-frequency cut-off value of each filter at each pixel reduces progressively as we move towards the center pixel from the image borders.
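To make the position dependence concrete, the sketch below computes, for every pixel, the half-widths of the largest window that is centered on that pixel and still fits inside the image. It is an illustration rather than the original implementation: the function name `surround_offsets` is hypothetical, and the code assumes 0-based pixel indices, so the distance to the far border is taken as W − 1 − x (and H − 1 − y) to keep the window inside the array.

```python
import numpy as np

def surround_offsets(width, height):
    """Half-widths (xo, yo) of the largest symmetric window centered at each pixel.

    Assumption: 0-based indices, so the far-border distance is width - 1 - x.
    Returns two arrays of shape (height, width).
    """
    x = np.arange(width)
    y = np.arange(height)
    xo = np.minimum(x, width - 1 - x)
    yo = np.minimum(y, height - 1 - y)
    xo_grid, yo_grid = np.meshgrid(xo, yo)
    return xo_grid, yo_grid

xo, yo = surround_offsets(7, 5)
area = (2 * xo + 1) * (2 * yo + 1)   # number of pixels in each surround
print(area)
```

The surround area is smallest at the image borders and grows towards the image center, so the effective low-frequency cut-off decreases as we move inwards.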
Position-variant difference-of-boxes filtering
We use box filters to perform the bandpass filtering. This allows us to use integral images, made popular by Viola and Jones [139], to perform the desired position-dependent center-surround filtering at each pixel. This is efficient in both computation and memory, albeit with the tradeoff that the bandpass filtering is not ideal. Thus, for an input image of width W and height H, the symmetric surround saliency
Figure 3.3: If a pixel belonging to a salient object lies close to the image border, then it cannot be far from the object borders (assuming the object lies inside the image). Such a pixel does not need a very low value of the low-frequency cut-off, suggesting that the low-frequency cut-off for the center-surround filter can be varied according to the position of the pixel relative to the image borders.
Figure 3.4: Images explaining the premise that we can guess the scale of the salient object based on how far from the image borders we are when performing center-surround filtering. At each position in the image, the low-frequency cut-off value, i.e., the surround of the center-surround filter, should be such that we are able to detect the center pixel of a fictitious large elliptical object (in blue dots). If we can detect this, then we can also detect any object smaller than the elliptical object.
Figure 3.5: Average of a thousand ground truth images in which white represents the salient object and black represents the non-salient background. Despite the fact that objects come in all sizes, shapes and locations, most of them lie away from the image borders, and have roughly half the extent of the image dimensions.
value at a given pixel $S_{ss}(x, y)$ is obtained as:
\[ S_{ss}(x, y) = \lVert I_{\mu}(x, y) - I_{f}(x, y) \rVert \tag{3.5} \]
where $I_f$ is the Gaussian blurred image as in Eq. 3.4 and $I_{\mu}(x, y)$ is the average CIELAB vector of the sub-image whose center pixel is at position $(x, y)$ as given by:
\[ I_{\mu}(x, y) = \frac{1}{A} \sum_{i=x-x_o}^{x+x_o} \sum_{j=y-y_o}^{y+y_o} I(i, j) \tag{3.6} \]
with offsets $x_o$, $y_o$, and area $A$ of the sub-image computed as:
\[ \begin{aligned} x_o &= \min(x, W - x) \\ y_o &= \min(y, H - y) \\ A &= (2x_o + 1)(2y_o + 1) \end{aligned} \tag{3.7} \]
The sub-image regions obtained in Eq. 3.6 using Eq. 3.7 are the maximum possible symmetric surround regions for a given pixel at the center. This is the reason we call our second saliency algorithm MSSS, for maximum symmetric surround saliency.
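The computation of Eqs. 3.5 to 3.7 with integral images can be sketched as below. This is a minimal sketch under several assumptions, not the original implementation: the input is taken to be an H × W × 3 array already converted to CIELAB; pixel indices are 0-based, so the offsets use W − 1 − x and H − 1 − y to keep each window inside the array; and a 5-tap binomial kernel stands in for the Gaussian blur of Eq. 3.4, whose exact parameters are not given in this section. All function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Per-channel summed-area table with a zero row and column prepended."""
    h, w, c = img.shape
    ii = np.zeros((h + 1, w + 1, c), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, y1, x0, x1):
    """Sum over rows y0..y1 and columns x0..x1 (inclusive); index arrays allowed."""
    return ii[y1 + 1, x1 + 1] - ii[y0, x1 + 1] - ii[y1 + 1, x0] + ii[y0, x0]

def binomial_blur(img):
    """Separable 5-tap binomial blur with edge replication.

    Assumption: used here only as a stand-in for the Gaussian blur of Eq. 3.4.
    """
    k = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
    h, w = img.shape[:2]
    padded = np.pad(img, ((2, 2), (2, 2), (0, 0)), mode="edge")
    rows = sum(k[i] * padded[i:i + h, :, :] for i in range(5))   # vertical pass
    return sum(k[j] * rows[:, j:j + w, :] for j in range(5))     # horizontal pass

def msss_saliency(lab):
    """Symmetric-surround saliency map for an H x W x 3 CIELAB image."""
    lab = np.asarray(lab, dtype=np.float64)
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xo = np.minimum(xs, w - 1 - xs)          # 0-based variant of Eq. 3.7
    yo = np.minimum(ys, h - 1 - ys)
    area = (2 * xo + 1) * (2 * yo + 1)
    ii = integral_image(lab)
    # Mean CIELAB vector of the maximum symmetric surround (Eq. 3.6).
    mean = box_sum(ii, ys - yo, ys + yo, xs - xo, xs + xo) / area[..., None]
    # Distance between the surround mean and the blurred pixel (Eq. 3.5).
    return np.linalg.norm(mean - binomial_blur(lab), axis=-1)

# Toy example: a 3 x 3 object in the middle of a uniform 9 x 9 background.
img = np.zeros((9, 9, 3))
img[3:6, 3:6] = 100.0
sal = msss_saliency(img)
print(sal[4, 4] > sal[0, 0])
```

Each surround mean costs four lookups per channel in the summed-area table regardless of the window size, which is what makes the position-dependent filtering inexpensive.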
The closer a pixel is to the image edges, the narrower its surround will be. Notice that in Fig. 3.4 we also show a pixel that falls on the background, together with the assumed extent of the plausible salient object as a blue dotted ellipse. The filtering bandwidth we use here is suited for detecting an object with the extent shown by the blue ellipse and is not sufficient for detecting the background as salient. This is because the