2.1 Basic concepts of computer vision
2.1.3 Basic concepts on image processing
Approaches based on mathematical morphology compose an important subset of traditional image processing techniques [28,29]. Operations are typically performed in local neighborhoods around pixels, which can be of variable sizes and shapes according to designed structuring elements or “windows”. Basic operations such as erosion and dilation have the effect of “shrinking” or “growing” objects in a binary image, respectively. They can be combined into operations such as opening, which separates weakly connected objects, and closing, which fills holes, or into image enhancement techniques such as the top-hat and bottom-hat operations, which build on opening and closing procedures to enhance contrast and details in the presence of shading [29].
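The basic morphological operations described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes boolean images and a symmetric structuring element with odd side lengths, and the function names (erode, dilate, opening, closing) are chosen here for illustration only.

```python
import numpy as np

def _shifted_views(img, se, pad_value):
    """Yield one shifted copy of `img` per active structuring-element cell."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=pad_value)
    rows, cols = img.shape
    for dy in range(h):
        for dx in range(w):
            if se[dy, dx]:
                yield padded[dy:dy + rows, dx:dx + cols]

def erode(img, se):
    """Keep a pixel only if the whole structuring element fits in the object."""
    out = np.ones(img.shape, dtype=bool)
    for view in _shifted_views(img, se, True):   # outside counts as foreground
        out &= view
    return out

def dilate(img, se):
    """Set a pixel if the structuring element touches the object anywhere."""
    out = np.zeros(img.shape, dtype=bool)
    for view in _shifted_views(img, se, False):  # outside counts as background
        out |= view
    return out

def opening(img, se):
    return dilate(erode(img, se), se)

def closing(img, se):
    return erode(dilate(img, se), se)

# Toy data (assumed for illustration): a 5x5 square, with a speck or a hole.
se = np.ones((3, 3), dtype=bool)
square = np.zeros((9, 9), dtype=bool)
square[2:7, 2:7] = True
noisy = square.copy()
noisy[0, 8] = True        # isolated one-pixel speck
holed = square.copy()
holed[4, 4] = False       # one-pixel hole inside the object
```

In this toy setting, opening the noisy image removes the isolated speck and closing the holed image fills the hole, while erosion and dilation alone shrink and grow the square, respectively.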
Kernels and convolutional filters
Other popular operations performed in local neighborhoods are filtering operations using kernels or convolutional filters. They range from simple strategies, such as basic Gaussian kernels for image smoothing and Laplacian kernels for edge detection, to more complex hand-engineered wavelets for the analysis of textures and other patterns of relevance [29]. Analogously to signal processing operations on 1D signals, convolutional filters are applied over the whole image in a sliding-window fashion, a procedure exploited by the modern approaches described in the next sections.
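As a rough sketch of the sliding-window procedure, the following NumPy code applies a small kernel over a whole gray-scale image. The 3x3 Gaussian approximation and the 4-neighbor Laplacian used here are common textbook kernels chosen as assumptions for illustration; zero padding at the borders and the function name filter2d are likewise illustrative choices.

```python
import numpy as np

# Common 3x3 kernels (assumed here for illustration).
GAUSSIAN_3X3 = np.array([[1, 2, 1],
                         [2, 4, 2],
                         [1, 2, 1]], dtype=float) / 16.0
LAPLACIAN_3X3 = np.array([[0, 1, 0],
                          [1, -4, 1],
                          [0, 1, 0]], dtype=float)

def filter2d(image, kernel):
    """2D convolution with zero padding; the output has the input's shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]          # convolution flips the kernel
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    # Accumulate one shifted, weighted copy of the image per kernel cell,
    # which is equivalent to sliding the kernel over every pixel.
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

# A vertical step edge: zeros on the left, ones on the right.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
smoothed = filter2d(step, GAUSSIAN_3X3)   # edge becomes a gradual ramp
edges = filter2d(step, LAPLACIAN_3X3)     # non-zero only around the edge
```

Away from the image borders, the Laplacian response is zero in flat regions and changes sign across the step, which is what makes it useful as an edge indicator.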
Histogram transformations and thresholding
While the basic concepts behind these techniques are easier to understand using binary and gray-scale images as examples, most of them extend naturally to the analysis of color images. In this domain, image processing techniques using histogram representations are also very common. As exemplified in Figure 2.2, histogram equalization aims at spreading the histogram components to improve image contrast, while histogram matching consists in approximating the image's distribution to the characteristic form of a pre-existing reference distribution [28]. The latter can be of particular relevance in making computer vision algorithms robust to variations in image acquisition conditions.
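Both histogram operations can be expressed through cumulative distribution functions (CDFs). The sketch below assumes a single channel stored as 8-bit integers, which is a simplification made here (Figure 2.2 plots values between 0 and 1); the function names and the lookup-table formulation are illustrative and not necessarily the procedure used to produce that figure.

```python
import numpy as np

def _cdf(channel, levels):
    """Normalized cumulative histogram of an integer-valued channel."""
    hist = np.bincount(channel.ravel(), minlength=levels)
    return np.cumsum(hist) / channel.size

def equalize(channel, levels=256):
    """Spread the intensity levels so that the CDF becomes roughly linear."""
    lut = np.round(_cdf(channel, levels) * (levels - 1))
    return lut.astype(channel.dtype)[channel]

def match(channel, reference, levels=256):
    """Map each source level to the reference level with the closest CDF."""
    src_cdf = _cdf(channel, levels)
    ref_cdf = _cdf(reference, levels)
    # First reference level whose CDF reaches the source CDF value.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left")
    lut = np.clip(lut, 0, levels - 1).astype(channel.dtype)
    return lut[channel]

# Toy data (assumed): a dark image and a brighter reference.
dark = np.array([[0, 1], [2, 3]], dtype=np.uint8)
bright = np.array([[100, 150], [200, 250]], dtype=np.uint8)
matched = match(dark, bright)      # takes on the reference's levels
equalized = equalize(dark)         # stretched over the full 0..255 range
```

In practice the reference could be an average histogram computed over a training set, as in panel b) of Figure 2.2; here a single reference image stands in for it.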
Moreover, color thresholding is one of the most basic approaches for the identification of objects or regions of interest, where pixels are labeled according to whether their intensity values are higher or lower than pre-defined values called thresholds [28]. Figure 2.3 illustrates the output of a thresholding operation on the hue channel of the input image.
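A thresholding operation on the hue channel can be sketched as follows. The example assumes an RGB image with floating-point values in [0, 1], uses Python's standard colorsys module for the HSV conversion, and picks an arbitrary hue band around green; the threshold values and function names are illustrative and are not those used for Figure 2.3.

```python
import colorsys
import numpy as np

def hue_channel(rgb):
    """Hue in [0, 1) for an RGB image with float values in [0, 1]."""
    rows, cols, _ = rgb.shape
    hue = np.empty((rows, cols))
    for y in range(rows):
        for x in range(cols):
            r, g, b = rgb[y, x]
            hue[y, x] = colorsys.rgb_to_hsv(r, g, b)[0]
    return hue

def threshold_band(channel, low, high):
    """Binary mask of the pixels whose value lies between two thresholds."""
    return (channel >= low) & (channel <= high)

# Toy 1x3 image (assumed): one red, one green and one blue pixel.
image = np.array([[[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]]])
hue = hue_channel(image)
green_mask = threshold_band(hue, 0.25, 0.42)   # hypothetical band for green
```

Because hue is a circular quantity, a band for red tones would wrap around 0 and need two intervals; the single band above is enough for colors away from that wrap-around point.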
Geometric transformations
In contrast to such operations, which alter the intensity values of pixels, another set of image processing techniques known as geometric operations focuses instead on altering the spatial relationship between pixels. Studies and techniques on geometry for computer vision constitute an important and vast field of research, with the book “Multiple View Geometry in Computer Vision” by Hartley & Zisserman [32] as a widely used reference discussing its major concepts. For this dissertation, the following concepts are of particular relevance to understanding modern state-of-the-art techniques as well as the novel approaches introduced herein.
[Figure 2.2 panels: a) Original S; b) Training Set average S; c) Matched S; d) Matched & Equalized S]
Figure 2.2: Example of histogram matching and equalization. Histogram c) is obtained by matching a) to b), while histogram d) is the result of equalizing histogram c).
Figure 2.3: Example of image thresholding. Left: input image; middle: hue channel after transforming the image to the HSV color space; right: binary image obtained by thresholding the hue channel.
As summarized in [29], geometric transformations consist of two main operations: i) a spatial transformation of coordinates, and ii) an interpolation of intensity values that defines the final values of the transformed pixels. The spatial transformations known as scaling, rotation, translation and shearing form a set of coordinate transformations referred to as affine transformations, which can be formulated using an affine (or transformation) matrix such as the one in Eq. 2.1.
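\[
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix}
t_{11} & t_{12} & t_{13} \\
t_{21} & t_{22} & t_{23} \\
t_{31} & t_{32} & t_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix},
\tag{2.1}
\]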
where x, y, and z are the coordinates of the original image point in homogeneous form [32], t_ij are the coefficients of the transformation matrix, and x′, y′, and z′ are the coordinates of the transformed point. In general terms, affine transformations preserve the collinearity of points, so that straight lines and planes are mapped to straight lines and planes, and a given pair of parallel lines remains parallel after the transformation. Figure 2.4 illustrates each transformation, with the corresponding parameterization of transformation matrices for each case.
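To make the matrix formulation of Eq. 2.1 concrete, the sketch below builds the four basic affine matrices and applies them to 2D points in homogeneous form. The parameterizations follow a common column-vector convention with last row (0, 0, 1) and are an assumption made here; they may differ in layout from those shown in Figure 2.4, and the helper names are illustrative.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def shear(kx, ky):
    return np.array([[1, kx, 0], [ky, 1, 0], [0, 0, 1]], dtype=float)

def apply(T, points):
    """Transform an (N, 2) array of points with a 3x3 matrix T."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # z = 1
    out = homogeneous @ T.T
    return out[:, :2] / out[:, 2:3]       # divide by z' (1 for affine T)

# Compose several transformations; the rightmost matrix is applied first.
T = translation(3, -2) @ rotation(0.7) @ shear(0.5, 0.0) @ scaling(2, 0.5)

# Two parallel segments (assumed toy data) stay parallel after T.
seg_a = apply(T, [[0, 0], [1, 2]])
seg_b = apply(T, [[5, 1], [7, 5]])
dir_a = seg_a[1] - seg_a[0]
dir_b = seg_b[1] - seg_b[0]
cross = dir_a[0] * dir_b[1] - dir_a[1] * dir_b[0]   # ~0 for parallel lines
```

Warping a full image additionally requires the interpolation step mentioned earlier, since transformed coordinates rarely fall exactly on the pixel grid; that step is left out of this sketch.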
Figure 2.4: Illustration of different affine transformations.
As described in the following sections, the concept of invariance to affine transformations has been of great importance for the development of computer vision algorithms that aim at robustness against different acquisition conditions. The intuition in such cases is that, ideally, a descriptor of an object or any entity of interest should provide the same output regardless of whether the entity is subjected to translation, rotation or other affine transformations.
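The invariance intuition can be illustrated with a deliberately simple toy descriptor, which is not one of the descriptors discussed in this dissertation: the sorted pairwise distances of a point set, normalized by the largest distance. Under the assumptions of this sketch, the descriptor is unchanged by translation, rotation and uniform scaling, but it does change under shearing, which shows that invariance to the full affine group is a stronger requirement.

```python
import numpy as np

def distance_descriptor(points):
    """Sorted pairwise distances, normalized by the largest distance."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)   # each pair counted once
    d = np.sort(dists[upper])
    return d / d[-1]

# Toy shape (assumed for illustration).
shape = np.array([[0, 0], [2, 0], [2, 1], [0, 3]], dtype=float)

# Similarity transformation: rotate, scale uniformly, then translate.
theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
moved = 2.5 * shape @ R.T + np.array([4.0, -1.0])

# Shear along x, which is affine but not a similarity transformation.
S = np.array([[1.0, 0.8],
              [0.0, 1.0]])
sheared = shape @ S.T

same = np.allclose(distance_descriptor(shape), distance_descriptor(moved))
changed = not np.allclose(distance_descriptor(shape),
                          distance_descriptor(sheared))
```

Here both flags evaluate to true: the descriptor matches for the translated, rotated and scaled shape, and differs for the sheared one.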