2.2 Relevant Research
2.2.2 Edge Detection
Edges are one of several features that compose an image, and edge detection focuses on analysing a scene or frame to estimate the edges of objects. In a visual scene, edges are significant changes in contrast along some direction, and they typically form the boundaries between objects. Interestingly, edge detection also appears in signal processing (usually as 1-D edge detection), so much of the maths used to find edges in signals can be transferred in some capacity to the 2-D image space (for example Gaussian convolution, Laplacian operators and Gabor filters). In general, edge profiles fall into two families: step edges, which in practice appear as ramp edges, and line edges, which in practice appear as roof edges (see figure 2.1).
Fig. 2.1 One-dimensional edge profiles. [58]
In real-world signals it is rare to obtain a crisp step or line edge, because contrast boundaries are seldom that sharp. This is mostly due to the capture technology, which blurs the boundaries (suppressing their high-frequency components) and so yields ramp- or roof-style edges. Ideal step and line edges can, however, be generated in artificial test images. Judging whether edges have been detected correctly is subjective if only a visual inspection of the image is used. A better, more quantifiable method is edge counting: the edges in the scene are counted, and so are the edges in the output of the edge detector. Comparing the two yields true positives (actual edges in the scene that were detected), false positives (detected edges that do not appear in the scene) and false negatives (edges that are in the scene but were not detected). In a real-world scenario it is difficult to separate true edges from false ones because of the complexity of textures and viewing angles, so it is common practice to evaluate an edge detector against a known artificial image. In its simplest form, the gradient magnitude of an edge is the derivative of the intensity along a particular axis; along the x-axis this is:
$$G(f(x)) = \frac{df(x)}{dx} \tag{2.1}$$
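For a sampled signal, the derivative in equation (2.1) is usually approximated by a finite difference. The following is a minimal sketch, assuming a 1-D list of intensities and the forward difference f[x+1] - f[x]; the function name and the two test profiles are illustrative, not taken from the source.

```python
def forward_difference(signal):
    """Approximate df/dx on a 1-D signal with the forward difference f[x+1] - f[x]."""
    return [signal[x + 1] - signal[x] for x in range(len(signal) - 1)]

# Hypothetical profiles: an ideal step edge and a ramp edge of the same height.
step = [10, 10, 10, 100, 100, 100]
ramp = [10, 10, 10, 40, 70, 100, 100, 100]

print(forward_difference(step))  # [0, 0, 90, 0, 0]
print(forward_difference(ramp))  # [0, 0, 30, 30, 30, 0, 0]
```

The step edge produces a single sharp peak, whereas the ramp spreads the same total change over several samples at a third of the height, which makes its exact position harder to localise.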
For continuous (analogue) images it is usual to define the x and y directions relative to the direction of maximum gradient (so that the x-axis lies along the maximum gradient). This project, however, is concerned with digital imagery, so the x and y axes remain the pixel axes of the image. One of the earliest examples of using gradients to detect edges in an image is the Roberts cross operator, which applies this principle in two-dimensional space to extract gradients [121]. Roberts proposed the equation:
$$G(f(i,j)) = |f(i,j) - f(i+1,j+1)| + |f(i+1,j) - f(i,j+1)| \tag{2.2}$$
which measures intensity changes along the two diagonal directions. The equation can be expressed as the two kernels shown in figure 2.2 [58].
Fig. 2.2 Roberts Operator. [58]
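Equation (2.2) can be applied directly to a greyscale image. The following is a minimal pure-Python sketch, assuming the image is stored as a list of rows indexed f[i][j]; the function name `roberts_cross` and the test image are illustrative.

```python
def roberts_cross(image):
    """Roberts cross gradient, eq. (2.2):
    G = |f(i,j) - f(i+1,j+1)| + |f(i+1,j) - f(i,j+1)|.
    The output has one fewer row and column than the input, because each
    value belongs to the interpolated point between four input pixels.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - 1):
        row = []
        for j in range(cols - 1):
            d1 = image[i][j] - image[i + 1][j + 1]  # first diagonal difference
            d2 = image[i + 1][j] - image[i][j + 1]  # second diagonal difference
            row.append(abs(d1) + abs(d2))
        out.append(row)
    return out

# Hypothetical 3x3 image with a vertical step edge between columns 0 and 1.
img = [[0, 9, 9],
       [0, 9, 9],
       [0, 9, 9]]
print(roberts_cross(img))  # [[18, 0], [18, 0]]
```

Only the 2x2 neighbourhoods that straddle the step respond, which also shows why a single noisy pixel affects the output so directly: each value depends on just four pixels.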
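The computed gradients are located at the interpolated point $(i+\tfrac{1}{2}, j+\tfrac{1}{2})$. The Roberts operator is simple and efficient but has little tolerance to noise, and on modern computers its computational simplicity no longer offsets this weakness. Irwin Sobel [135] introduced a method that avoids the need for an interpolated point by using a 3x3 operator centred on pixel $(i,j)$. The Sobel operator is computed with the partial derivatives:
$$\begin{aligned}
s_x &= \big(f(i-1,j+1) + c\,f(i,j+1) + f(i+1,j+1)\big) - \big(f(i-1,j-1) + c\,f(i,j-1) + f(i+1,j-1)\big)\\
s_y &= \big(f(i-1,j-1) + c\,f(i-1,j) + f(i-1,j+1)\big) - \big(f(i+1,j-1) + c\,f(i+1,j) + f(i+1,j+1)\big)
\end{aligned} \tag{2.3}$$
where the constant $c$ is usually 2,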
and the gradient magnitude is calculated by:
$$G = \sqrt{s_x^2 + s_y^2} \tag{2.4}$$
As with the Roberts operator, the Sobel operator is applied to images as a pair of convolution masks:
Fig. 2.3 Sobel Operator. [58]
This operator weights the partial derivatives with a constant (usually 2) so that the pixels directly adjacent to the centre pixel of the mask receive more emphasis.
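The Sobel partial derivatives and the gradient magnitude G can be sketched in pure Python as follows. This assumes a greyscale image stored as a list of rows, with s_x taken as the weighted right column minus the weighted left column and s_y as the weighted top row minus the weighted bottom row; the function name, the parameter `c` and the choice to leave border pixels at 0 are illustrative assumptions.

```python
import math

def sobel_gradient(image, c=2):
    """Sobel gradient magnitude G = sqrt(s_x^2 + s_y^2) for interior pixels.

    c weights the pixels directly adjacent to the centre (c = 2 for Sobel).
    Border pixels are left at 0.0 to keep the sketch short.
    """
    rows, cols = len(image), len(image[0])
    f = image
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Weighted right column minus weighted left column (x direction).
            s_x = (f[i - 1][j + 1] + c * f[i][j + 1] + f[i + 1][j + 1]) - \
                  (f[i - 1][j - 1] + c * f[i][j - 1] + f[i + 1][j - 1])
            # Weighted top row minus weighted bottom row (y direction).
            s_y = (f[i - 1][j - 1] + c * f[i - 1][j] + f[i - 1][j + 1]) - \
                  (f[i + 1][j - 1] + c * f[i + 1][j] + f[i + 1][j + 1])
            out[i][j] = math.sqrt(s_x ** 2 + s_y ** 2)
    return out

# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
img = [[0, 0, 9, 9] for _ in range(4)]
print(sobel_gradient(img)[1])  # [0.0, 36.0, 36.0, 0.0]
```

Because G squares both components, flipping the sign convention of s_x or s_y does not change the magnitude. Setting `c=1` removes the extra weighting on the directly adjacent pixels, which lowers the response in this example from 36 to 27.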
Prewitt [114] developed an operator that also uses a 3x3 kernel but, unlike the Sobel operator, places no additional emphasis on the directly adjacent pixels. Figure 2.4, an excerpt from [58], compares the edge gradients extracted by the operators discussed so far.
Fig. 2.4 A comparison of Edge Detectors. a) Original image, b) Filtered image, c) Simple gradient using 1 x 2 and 2 x 1 masks, d) Gradient using 2 x 2 masks, e) Roberts cross operator, f) Sobel operator, g) Prewitt operator [58]
Further work in edge detection has used the second derivative of the image intensity. The advantage of the second derivative is that its zero crossings indicate local maxima of the gradient, and hence the locations of edges. In two dimensions, the Laplacian is used to obtain the second derivative. The Laplacian of $f(x,y)$ is
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{2.5}$$
The second-order partial derivatives can be approximated by finite differences:
$$\frac{\partial^2 f}{\partial x^2} \approx f[i,j+1] - 2f[i,j] + f[i,j-1] \tag{2.6}$$
$$\frac{\partial^2 f}{\partial y^2} \approx f[i+1,j] - 2f[i,j] + f[i-1,j] \tag{2.7}$$
Adding these two approximations yields a mask that approximates the Laplacian of equation (2.5), i.e. the divergence of the gradient; the mask is shown in figure 2.5.
Fig. 2.5 Laplacian Operator, derived as the second order differential [58]
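Adding (2.6) and (2.7) gives $\nabla^2 f \approx f[i,j+1] + f[i,j-1] + f[i+1,j] + f[i-1,j] - 4f[i,j]$, which is the four-neighbour 3x3 mask with -4 at the centre and 1 at the four direct neighbours (variants that also include the diagonals exist). The sketch below applies this mask and reports zero crossings as sign changes between horizontally or vertically adjacent pixels; the function names, the zero-valued border and the sign-change test are illustrative assumptions.

```python
def laplacian(image):
    """Apply the four-neighbour Laplacian mask [[0, 1, 0], [1, -4, 1], [0, 1, 0]].
    Border pixels are set to 0 to keep the sketch short."""
    rows, cols = len(image), len(image[0])
    f = image
    out = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (f[i][j + 1] + f[i][j - 1]      # horizontal terms, eq. (2.6)
                         + f[i + 1][j] + f[i - 1][j]    # vertical terms, eq. (2.7)
                         - 4 * f[i][j])
    return out

def zero_crossings(lap):
    """Return (i, j) positions where the Laplacian changes sign between
    pixel (i, j) and its right or lower neighbour."""
    rows, cols = len(lap), len(lap[0])
    points = []
    for i in range(rows):
        for j in range(cols):
            right = j + 1 < cols and lap[i][j] * lap[i][j + 1] < 0
            down = i + 1 < rows and lap[i][j] * lap[i + 1][j] < 0
            if right or down:
                points.append((i, j))
    return points

# Hypothetical image: a smooth rise from 0 to 9 across each row.
img = [[0, 0, 2, 7, 9, 9] for _ in range(3)]
lap = laplacian(img)
print(lap[1])               # [0, 2, 3, -3, -2, 0]
print(zero_crossings(lap))  # [(1, 2)]
```

The sign change falls between columns 2 and 3, exactly where the intensity rises most steeply (from 2 to 7), i.e. at the maximum of the first derivative.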
One limitation of the second-order Laplacian is its high sensitivity to noise: any noise artifacts present in the first-order derivatives produce spurious zero crossings in the second derivative. Marr and Hildreth [86] propose a solution to this problem by first smoothing the image with a Gaussian filter and then applying a Laplacian to obtain the zero-crossing points. The filtering suppresses the noise but also widens potential edges, which is why extracting the zero crossings (the local maxima of the gradient) is important. In practice the Gaussian and the Laplacian are combined into a single Laplacian of Gaussian (LoG) operator, which is convolved with the image; the zero crossings of the result give the edges, which should be relatively noise free. The LoG operator is:
$$\mathrm{LoG}(x,y) = -\frac{1}{\pi\sigma^4}\left(1 - \frac{x^2+y^2}{2\sigma^2}\right) e^{-\frac{x^2+y^2}{2\sigma^2}} \tag{2.8}$$
The main limitation of the Gaussian filter lies in the choice of the smoothing parameter σ. Widening the filter (increasing σ) suppresses more noise but also smooths the edge gradients, which loses spatial resolution.
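Equation (2.8) can be sampled on an integer grid to obtain a convolution kernel. The following is a minimal sketch, assuming a kernel radius of ⌈3σ⌉ (a common rule of thumb, not a value given in the source); the function name and the `radius` parameter are illustrative. Because the kernel is truncated and sampled, its entries do not sum exactly to zero, so implementations often subtract the mean before convolving.

```python
import math

def log_kernel(sigma, radius=None):
    """Sample LoG(x, y) = -1/(pi sigma^4) * (1 - r^2/(2 sigma^2)) * exp(-r^2/(2 sigma^2))
    on a (2*radius + 1) x (2*radius + 1) integer grid, eq. (2.8)."""
    if radius is None:
        radius = math.ceil(3 * sigma)  # assumption: +/- 3 sigma covers most of the kernel
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            q = (x * x + y * y) / (2 * sigma * sigma)  # r^2 / (2 sigma^2)
            row.append(-1.0 / (math.pi * sigma ** 4) * (1 - q) * math.exp(-q))
        kernel.append(row)
    return kernel

k = log_kernel(1.0)
print(len(k), len(log_kernel(2.0)))  # 7 13
```

The centre value is negative (-1/π for σ = 1) and the values change sign where x² + y² = 2σ². Doubling σ roughly doubles the kernel width, which is the noise versus resolution trade-off described above in concrete form.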
In the work by Canny [23], the loss of edge definition caused by smoothing is addressed through non-maxima suppression. As with Marr and Hildreth [86], the image is first convolved with a Gaussian to produce a smoothed image. The gradient of the smoothed image is then approximated with first-difference operators, usually the Sobel or Prewitt operator. Non-maxima suppression then thins the resulting wide gradient ridges by keeping only those pixels whose gradient magnitude is a local maximum along the gradient direction.
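The suppression step can be sketched as follows. This is a minimal sketch of non-maxima suppression only: it assumes the Gaussian smoothing and first-difference gradients have already been computed and are passed in as `gx` (along increasing column index) and `gy` (along increasing row index, i.e. downwards). Quantising the gradient direction to four orientations is a common simplification; other implementations interpolate between neighbours instead. The function name and test data are illustrative.

```python
import math

def non_maxima_suppression(gx, gy):
    """Keep a pixel only if its gradient magnitude is at least as large as both
    neighbours along the gradient direction, quantised to 0, 45, 90 or 135 degrees.
    Border pixels are set to 0.0."""
    rows, cols = len(gx), len(gx[0])
    mag = [[math.hypot(gx[i][j], gy[i][j]) for j in range(cols)] for i in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            angle = math.degrees(math.atan2(gy[i][j], gx[i][j])) % 180
            if angle < 22.5 or angle >= 157.5:
                di, dj = 0, 1    # gradient roughly horizontal
            elif angle < 67.5:
                di, dj = 1, 1    # roughly 45 degrees (down-right / up-left)
            elif angle < 112.5:
                di, dj = 1, 0    # gradient roughly vertical
            else:
                di, dj = 1, -1   # roughly 135 degrees (down-left / up-right)
            m = mag[i][j]
            if m >= mag[i + di][j + dj] and m >= mag[i - di][j - dj]:
                out[i][j] = m
    return out

# Hypothetical gradients of a blurred vertical edge: a 3-pixel-wide ridge in gx.
gx = [[0, 1, 3, 1, 0] for _ in range(3)]
gy = [[0] * 5 for _ in range(3)]
thin = non_maxima_suppression(gx, gy)
print(thin[1])  # [0.0, 0.0, 3.0, 0.0, 0.0]
```

The three-pixel-wide ridge produced by smoothing is reduced to a single pixel at its peak, which is how the sharpness lost to the Gaussian is recovered. Using `>=` keeps every pixel of a perfectly flat plateau; a stricter comparison on one side is a possible refinement.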
One limitation of these kinds of edge detectors is that they reveal little about the internal structure of objects. Early methods such as the Roberts operator [121] detect edges but are susceptible to noise, whereas later edge detectors are less susceptible to noise and define clear edges. Nevertheless, each of the methods discussed either loses information during convolution (sharp edges lose their gradient information) or remains susceptible to noise. As mentioned earlier, edge detection can be likened to signal processing. Gabor filters, applied to images by Mehrotra et al. [88], provide an optimal balance between frequency resolution and time (spatial) resolution. Convolving the filters with an image at multiple orientations extracts feature descriptors of the edges in each direction. The Gabor filter is a linear filter whose frequency and orientation representations closely resemble those of the mammalian visual cortex, which links it to human perception [85] [30].
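Orientation-selective filtering with a small bank of Gabor filters can be sketched as follows. This uses one common real-valued parameterisation (a Gaussian envelope multiplied by a cosine carrier, in coordinates rotated by θ); the parameter names (`sigma`, `theta`, `lam`, `gamma`, `psi`) and the values used are illustrative assumptions, not the settings of Mehrotra et al. [88]. With `psi = 0` the filter is even-symmetric and responds most strongly to lines, whereas an odd-symmetric choice (`psi = pi/2`) responds more to step edges.

```python
import math

def gabor_kernel(sigma, theta, lam, gamma=0.5, psi=0.0, radius=None):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier.
    theta: orientation in radians; lam: carrier wavelength in pixels;
    gamma: spatial aspect ratio of the envelope; psi: phase offset."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)    # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel

def response_at(image, kernel, i, j):
    """Correlate the kernel with the image patch centred on pixel (i, j).
    The patch must lie entirely inside the image."""
    r = len(kernel) // 2
    return sum(kernel[dy + r][dx + r] * image[i + dy][j + dx]
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))

# Hypothetical 9x9 image containing a single bright vertical line at column 4.
image = [[1.0 if j == 4 else 0.0 for j in range(9)] for _ in range(9)]
thetas = [0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
responses = {t: response_at(image, gabor_kernel(2.0, t, 4.0, radius=4), 4, 4)
             for t in thetas}
best = max(responses, key=responses.get)  # orientation with the strongest response
```

In this toy example the θ = 0 filter, whose carrier oscillates horizontally, responds far more strongly to the vertical line than the other orientations. A full detector would evaluate the bank at every pixel and combine the responses, for example by keeping the maximum per pixel.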