Basic concepts on image processing - Basic concepts of computer vision

2.1 Basic concepts of computer vision

2.1.3 Basic concepts on image processing

Approaches based on mathematical morphology form an important subset of traditional image processing techniques [28,29]. Operations are typically performed in local neighborhoods around pixels, whose sizes and shapes are defined by structuring elements, or “windows”. Basic operations such as dilation and erosion have the effect of “growing” or “shrinking” objects in a binary image, respectively. They can be combined into operations such as opening, which separates weakly connected objects, and closing, which fills holes, or into image enhancement techniques such as the top-hat and bottom-hat operations, which combine opening and closing procedures to enhance contrast and detail in the presence of shading [29].
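As a minimal sketch of these operations, the following pure-Python code implements binary dilation, erosion, opening, and closing on small nested-list images. The 3x3 square structuring element, the function names, the helper `make_image`, and the choice of treating pixels outside the image as background are assumptions made for illustration, not details taken from [28,29].

```python
# 3x3 square structuring element, written as (row, column) offsets from its center.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def dilate(img, se=SQUARE_3X3):
    """Binary dilation: set a pixel if any offset of the element hits foreground."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in se))
             for c in range(w)] for r in range(h)]


def erode(img, se=SQUARE_3X3):
    """Binary erosion: keep a pixel only if every offset lands on foreground.

    Pixels outside the image are treated as background (an assumption).
    """
    h, w = len(img), len(img[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in se))
             for c in range(w)] for r in range(h)]


def opening(img, se=SQUARE_3X3):
    """Erosion followed by dilation: removes small or weakly connected parts."""
    return dilate(erode(img, se), se)


def closing(img, se=SQUARE_3X3):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(img, se), se)


def make_image(size, foreground):
    """Hypothetical helper: square binary image from a set of (row, col) pixels."""
    return [[int((r, c) in foreground) for c in range(size)] for r in range(size)]


solid3 = {(r, c) for r in range(1, 4) for c in range(1, 4)}
solid5 = {(r, c) for r in range(1, 6) for c in range(1, 6)}

# A 3x3 object plus an isolated one-pixel speck: opening removes the speck.
speck_img = make_image(7, solid3 | {(5, 5)})
opened = opening(speck_img)

# A 5x5 object with a one-pixel hole in its center: closing fills the hole.
hole_img = make_image(7, solid5 - {(3, 3)})
closed = closing(hole_img)
```

With this structuring element, the opened image keeps only the 3x3 object and the closed image is the solid 5x5 object. Larger or differently shaped elements would change which details survive.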

Kernels and convolutional filters

Other popular operations performed in local neighborhoods involve filtering with kernels, also called convolutional filters. These range from simple strategies, such as basic Gaussian kernels for image smoothing and Laplacian kernels for edge detection, to more complex hand-engineered wavelets for the analysis of textures and other relevant patterns [29]. Analogously to signal processing operations on 1D signals, convolutional filters are applied over the whole image in a sliding-window fashion, a procedure exploited by the modern approaches described in the next sections.
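The sliding-window procedure can be sketched in a few lines of pure Python. The specific 3x3 kernel coefficients below are common textbook approximations chosen for illustration, and the function name `convolve2d` and the "valid" border handling (no padding) are assumptions rather than details from [29].

```python
# Common 3x3 approximations (illustrative choices, not taken from the source).
GAUSSIAN_3X3 = [[k / 16 for k in row] for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]
LAPLACIAN_3X3 = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def convolve2d(img, kernel):
    """'Valid' 2D convolution: slide the flipped kernel over every position
    where it fits entirely inside the image, so the output is smaller."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # Flipping the kernel indices makes this a true convolution
                    # (for symmetric kernels it equals plain correlation).
                    acc += kernel[kh - 1 - i][kw - 1 - j] * img[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


flat = [[5] * 5 for _ in range(5)]                 # constant image
step = [[0, 0, 10, 10, 10] for _ in range(5)]      # vertical step edge

smoothed_flat = convolve2d(flat, GAUSSIAN_3X3)     # smoothing leaves it unchanged
edges_flat = convolve2d(flat, LAPLACIAN_3X3)       # no edges: all zeros
edges_step = convolve2d(step, LAPLACIAN_3X3)       # nonzero response at the edge
```

The Laplacian response is zero wherever the image is locally constant and changes sign across the step, which is why it is used as an edge detector.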

Histogram transformations and thresholding

While the basic concepts behind these techniques are easier to understand using binary and gray-scale images as examples, most of them extend naturally to the analysis of color images. In this domain, image processing techniques based on histogram representations are also very common. As exemplified in Figure 2.2, histogram equalization aims at spreading the histogram components to improve image contrast, while histogram matching consists of approximating the image's distribution to the characteristic form of a pre-existing reference distribution [28]. The latter can be particularly relevant for making computer vision algorithms robust to variations in image acquisition conditions.
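Both operations can be expressed through cumulative distribution functions (CDFs). The sketch below works on flat lists of integer pixel values with a small number of gray levels; the function names, the rounding rule in `equalize`, and the nearest-CDF lookup in `match` are illustrative assumptions and not necessarily the formulation used in [28] or for Figure 2.2.

```python
def normalized_cdf(pixels, levels):
    """Cumulative histogram of integer pixel values, scaled to end at 1.0."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    return cdf


def equalize(pixels, levels):
    """Histogram equalization: remap each level through the image's own CDF
    so that the occupied levels spread over the full available range."""
    cdf = normalized_cdf(pixels, levels)
    cdf_min = min(c for c in cdf if c > 0)
    if cdf_min == 1.0:          # constant image: nothing to spread
        return list(pixels)
    lut = [max(0, round((c - cdf_min) / (1 - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[v] for v in pixels]


def match(pixels, reference, levels):
    """Histogram matching: map each source level to the lowest reference
    level whose CDF value reaches the source CDF value."""
    src_cdf = normalized_cdf(pixels, levels)
    ref_cdf = normalized_cdf(reference, levels)
    lut = [next(j for j in range(levels) if ref_cdf[j] >= c) for c in src_cdf]
    return [lut[v] for v in pixels]


LEVELS = 8
pixels = [2, 2, 3, 3, 3, 4, 4, 5]          # low-contrast toy image
reference = [v + 2 for v in pixels]        # same shape, brighter distribution

equalized = equalize(pixels, LEVELS)
matched = match(pixels, reference, LEVELS)
```

In this toy case the equalized values cover the whole range from 0 to 7, while matching shifts the source values onto the brighter reference distribution.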

Moreover, color thresholding is one of the most basic approaches for identifying objects or regions of interest: pixels are labeled according to whether their intensity values are higher or lower than predefined values called thresholds [28]. Figure 2.3 illustrates the output of a thresholding operation on the hue channel of the input image.
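A hue-based thresholding step of this kind can be sketched with Python's standard `colorsys` module, which converts RGB values in [0, 1] to HSV with hue also in [0, 1]. The tiny 2x2 image, the function names, and the threshold values (a band around green) are hypothetical choices for illustration and are not the settings used for Figure 2.3.

```python
import colorsys


def hue_channel(rgb_image):
    """Convert each (r, g, b) pixel in [0, 1] to HSV and keep only the hue."""
    return [[colorsys.rgb_to_hsv(*px)[0] for px in row] for row in rgb_image]


def threshold(channel, low, high):
    """Label a pixel 1 if its value lies between the two thresholds, else 0."""
    return [[int(low <= v <= high) for v in row] for row in channel]


RED, GREEN, BLUE, YELLOW = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)
image = [[RED, GREEN],
         [BLUE, YELLOW]]

hue = hue_channel(image)
# Hypothetical thresholds selecting hues around green (hue of pure green is 1/3).
mask = threshold(hue, 0.25, 0.42)
```

Here only the green pixel is labeled as foreground. A single lower or upper threshold is the special case where the other bound is set to the end of the hue range.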

Geometric transformations

In contrast to such operations, which alter the intensity values of pixels, another set of image processing techniques known as geometric operations focuses instead on altering the spatial relationship between pixels. Studies and techniques on geometry for computer vision constitute an important and vast field of research, with the book “Multiple View Geometry in Computer Vision” by Hartley & Zisserman [32] as a widely used reference discussing its major concepts. For this dissertation, the following concepts are of particular relevance for understanding modern state-of-the-art techniques as well as the novel approaches introduced herein.

[Figure 2.2 panels: a) Original S; b) Training Set average S; c) Matched S; d) Matched & Equalized S]

Figure 2.2: Example of histogram matching and equalization. Histogram c) is obtained by matching a) to b), while histogram d) is the result of equalizing histogram c).

Figure 2.3: Example of image thresholding. Left: input image; middle: hue channel after transforming the image to the HSV color space; right: binary image obtained by thresholding the hue channel.

As summarized in [29], geometric transformations consist of two main operations: i) a spatial transformation of coordinates, and ii) an interpolation of intensity values that defines the final values of the transformed pixels. The spatial transformations known as scaling, rotation, translation, and shearing form a set of coordinate transformations referred to as affine transformations, which can be formulated using a transformation matrix such as the one in Eq. 2.1.

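\begin{equation}
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix}
t_{11} & t_{12} & t_{13} \\
t_{21} & t_{22} & t_{23} \\
t_{31} & t_{32} & t_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix},
\tag{2.1}
\end{equation}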

where x, y, and z are the coordinates of the original image point in homogeneous form [32], t_ij are the coefficients of the transformation matrix, and x′, y′, and z′ are the coordinates of the transformed point. In general terms, affine transformations preserve linear relationships between points, straight lines, and planes, such that a given pair of parallel lines remains parallel after the transformation. Figure 2.4 illustrates each transformation, with the corresponding parameterization of the transformation matrix for each case.
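The two steps of a geometric transformation can be sketched as follows for 2D image points written in homogeneous form as (x, y, 1). The matrix parameterizations are the conventional ones, with last row (0, 0, 1), and are assumed here since the parameterizations of Figure 2.4 are not reproduced in the text. All function names are hypothetical, and the warp uses inverse mapping with bilinear interpolation as one possible interpolation choice.

```python
import math


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def shearing(kx, ky):
    return [[1, kx, 0], [ky, 1, 0], [0, 0, 1]]


def compose(*matrices):
    """Multiply 3x3 matrices left to right; the rightmost acts on the point first."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for m in matrices:
        result = [[sum(result[i][k] * m[k][j] for k in range(3)) for j in range(3)]
                  for i in range(3)]
    return result


def transform_point(t, point):
    """Apply Eq. 2.1 to (x, y, 1) and convert back from homogeneous form."""
    x, y = point
    xh, yh, zh = (row[0] * x + row[1] * y + row[2] for row in t)
    return (xh / zh, yh / zh)


def warp(img, t_inverse, fill=0.0):
    """Step i: map each output pixel back into the input image.
    Step ii: bilinear interpolation of the four surrounding input pixels."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            x, y = transform_point(t_inverse, (c, r))
            if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
                continue  # source location falls outside the input image
            x0, y0 = int(x), int(y)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = x - x0, y - y0
            top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
            bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
            out[r][c] = (1 - fy) * top + fy * bottom
    return out


# Parallel segments stay parallel under an arbitrary affine transformation.
t = compose(translation(2, -1), rotation(math.pi / 6), shearing(0.5, 0), scaling(2, 3))
a0, a1, b0, b1 = [transform_point(t, p) for p in [(0, 0), (1, 1), (0, 2), (1, 3)]]
cross = (a1[0] - a0[0]) * (b1[1] - b0[1]) - (a1[1] - a0[1]) * (b1[0] - b0[0])

# Shifting an image right by one pixel uses the inverse map (shift left by one).
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shifted = warp(img, translation(-1, 0))
half_shifted = warp(img, translation(-0.5, 0))
```

The cross product of the two transformed direction vectors is zero up to rounding, which is the parallelism property stated above. The half-pixel shift shows interpolation at work, since each output value is the average of two neighboring input pixels.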

Figure 2.4: Illustration of different affine transformations.

As described in the following sections, the concept of invariance to affine transformations has been of great importance for the development of computer vision algorithms that aim at robustness against different acquisition conditions. The intuition is that, ideally, a descriptor of an object or any other entity of interest should provide the same output regardless of whether the entity is subjected to translation, rotation, or other affine transformations.
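The notion of an invariant descriptor can be illustrated with a deliberately simple example. The sketch below uses the sorted list of pairwise distances between the points of a shape, a toy descriptor chosen here only for illustration and not one discussed in the source. It is invariant to rotation and translation but not to scaling, so it covers only part of the affine family.

```python
import math


def pairwise_distances(points):
    """Toy descriptor (illustrative assumption): sorted distances between all point pairs."""
    return sorted(math.hypot(px - qx, py - qy)
                  for i, (px, py) in enumerate(points)
                  for (qx, qy) in points[i + 1:])


def rotate_translate(points, theta, tx, ty):
    """Rotate every point by theta (radians) and then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def scale(points, factor):
    return [(factor * x, factor * y) for x, y in points]


def same_descriptor(a, b, tol=1e-9):
    """Compare two descriptors up to floating-point rounding."""
    return len(a) == len(b) and all(abs(u - v) <= tol for u, v in zip(a, b))


shape = [(0, 0), (4, 0), (4, 3), (1, 5)]
reference = pairwise_distances(shape)

moved = pairwise_distances(rotate_translate(shape, math.radians(40), 7.0, -2.0))
scaled = pairwise_distances(scale(shape, 2.0))

invariant_to_motion = same_descriptor(reference, moved)    # True
invariant_to_scale = same_descriptor(reference, scaled)    # False
```

A descriptor that is invariant to the full affine family needs quantities that scaling and shearing also preserve, such as ratios of areas, which this toy example does not attempt.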

In document Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision (Page 32-36)