3. SPATIAL OPERATIONS AND TRANSFORMATIONS
3.2 Templates and Convolution
Template operations are very useful as elementary image filters. They can be used to enhance certain features, de-enhance others, smooth out noise or discover previously known shapes in an image.
Convolution
USE. Widely used in many operations. It is an essential part of the software kit for an image processor.
OPERATION. A sliding window, called the convolution window (template), centers on each pixel in an input image and generates new output pixels. The new pixel value is computed by multiplying each pixel value in the neighborhood with the corresponding weight in the convolution mask and summing these products.
The template is placed step by step over the image, at each step creating a new window in the image the same size as the template, and then associating with each element in the template a corresponding pixel in the image. Typically, each template element is multiplied by the corresponding image pixel gray level and the sum of these results, across the whole template, is recorded as a pixel gray level in a new image. This "shift, add, multiply" operation is termed the "convolution" of the template with the image.
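The "shift, add, multiply" operation can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `apply_template`, the list-of-rows image layout and the small test image are assumptions made for the example, and the template is only placed where it fits entirely inside the image.

```python
def apply_template(image, template):
    """Slide `template` over `image` (both lists of rows) and return the
    sum of products at every position where the template fits entirely."""
    n, m = len(template), len(template[0])    # template rows, columns
    rows, cols = len(image), len(image[0])    # image rows, columns
    result = []
    for y in range(rows - n + 1):             # shift down
        out_row = []
        for x in range(cols - m + 1):         # shift right
            total = 0
            for i in range(n):
                for j in range(m):
                    # multiply template element by the pixel beneath it, then add
                    total += template[i][j] * image[y + i][x + j]
            out_row.append(total)
        result.append(out_row)
    return result


# Hypothetical 3 x 3 image and 2 x 2 template, chosen only for illustration.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
template = [[1, 0],
            [0, 1]]
print(apply_template(image, template))  # [[6, 8], [12, 14]]
```

Each output value is the sum of the top-left and bottom-right pixels of the window, because those are the only non-zero template weights.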
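If T(x, y) is the template (n x m) and I(x, y) is the image (M x N), then the convolution of T with I is written as
$$T * I(X, Y) = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} T(i, j)\, I(X + i, Y + j)$$
In fact this term is the cross-correlation term rather than the convolution term, which should be accurately presented by
$$T * I(X, Y) = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} T(i, j)\, I(X - i, Y - j)$$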
Introduction to Image Processing and Computer Vision by LUONG CHI MAI http://www.netnam.vn/unescocourse/computervision/computer.htm
However, the term "convolution" is loosely interpreted to mean cross-correlation, and in most image processing literature convolution refers to the first formula rather than the second.
In the frequency domain, convolution is "real" convolution rather than cross-correlation.
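The difference between the two operations can be illustrated with a short sketch. It assumes that the only difference is the sign of the offsets, so that a "real" convolution can be obtained by rotating the template through 180 degrees before applying the usual shift, multiply and add routine. The function names and the convention of indexing the output from the first position where the template fits are choices made for this example.

```python
def correlate(image, template):
    """Shift-multiply-add (cross-correlation) over positions where the template fits."""
    n, m = len(template), len(template[0])
    return [[sum(template[i][j] * image[y + i][x + j]
                 for i in range(n) for j in range(m))
             for x in range(len(image[0]) - m + 1)]
            for y in range(len(image) - n + 1)]


def rotate_180(template):
    """Reverse the row order and the order within each row."""
    return [row[::-1] for row in template[::-1]]


def convolve(image, template):
    """Convolution expressed as correlation with the rotated template."""
    return correlate(image, rotate_180(template))


# A single bright pixel makes the difference easy to see.
impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
template = [[1, 2],
            [3, 4]]
print(correlate(impulse, template))  # [[4, 3], [2, 1]]
print(convolve(impulse, template))   # [[1, 2], [3, 4]]
```

Convolution reproduces the template around the bright pixel, whereas cross-correlation produces its rotated copy. For a template that is unchanged by a 180 degree rotation the two results are identical.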
Often the template is not allowed to shift off the edge of the image, so the resulting image will normally be smaller than the first image. For example:
where the result marked * is obtained from
(1 x 1) + (0 x 3) + (0 x 1) + (1 x 4) = 5.
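The arithmetic above, and the shrinking of the result when the template may not leave the image, can be checked with a small sketch. It assumes that the first factor in each product is a template weight and the second is the image pixel beneath it; the 4 x 5 image size and the helper name `valid_output_shape` are hypothetical and used only for illustration.

```python
def valid_output_shape(image_rows, image_cols, template_rows, template_cols):
    """Size of the result when the template must stay inside the image."""
    return (image_rows - template_rows + 1, image_cols - template_cols + 1)


# Template weights and image window read off the products in the text (assumed order).
template = [[1, 0],
            [0, 1]]
window = [[1, 3],
          [1, 4]]

star = sum(template[i][j] * window[i][j] for i in range(2) for j in range(2))
print(star)                            # 5
print(valid_output_shape(4, 5, 2, 2))  # (3, 4)
```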
Many convolution masks are separable. This means that the convolution can be performed by executing two convolutions with 1-dimensional masks. A separable function satisfies the equation:
$$f(x, y) = g(x) \times h(y)$$
Separable functions reduce the number of computations required when using large masks. This is possible due to the linear nature of the convolution. For example, a convolution using a 3 x 3 mask can be performed faster by doing two convolutions using a 3-element column vector and a 3-element row vector, provided the matrix is the product of the two vectors. The savings in this example aren't spectacular (6 multiply-accumulates versus 9) but they increase as mask sizes grow.
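A short sketch can confirm that two 1-dimensional passes give the same result as one 2-dimensional pass. The mask values below are illustrative assumptions, not necessarily those of the original example, and the helper `correlate` and the test image are likewise hypothetical.

```python
def correlate(image, mask):
    """Shift-multiply-add over every position where the mask fits."""
    n, m = len(mask), len(mask[0])
    return [[sum(mask[i][j] * image[y + i][x + j]
                 for i in range(n) for j in range(m))
             for x in range(len(image[0]) - m + 1)]
            for y in range(len(image) - n + 1)]


column = [1, 2, 1]                             # assumed 3-element column vector
row = [1, 2, 1]                                # assumed 3-element row vector
mask = [[c * r for r in row] for c in column]  # their product: a 3 x 3 mask

image = [[3, 1, 4, 1, 5],
         [9, 2, 6, 5, 3],
         [5, 8, 9, 7, 9],
         [3, 2, 3, 8, 4]]

one_pass = correlate(image, mask)                   # 9 products per output pixel
row_pass = correlate(image, [row])                  # 1 x 3 mask: 3 products
two_pass = correlate(row_pass, [[c] for c in column])  # 3 x 1 mask: 3 more
print(one_pass == two_pass)  # True
```

For a k x k separable mask the cost per output pixel falls from k x k products to 2k, which is why the saving grows with mask size.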
Common templates
Just as the moving average of a time series tends to smooth the points, so a moving average (moving up/down and left/right) smooths out any sudden changes in pixel values, removing noise at the expense of introducing some blurring of the image. The classical 3 x 3 template
$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$
does this but with little sophistication. Essentially, each resulting pixel is the sum of a square of nine original pixel values, without regard to the position of the pixels in the group of nine. Such filters are termed 'low-pass' filters since they remove the high frequencies in an image (i.e. the sudden changes in pixel values) while retaining, or passing through, the low frequencies, i.e. the gradual changes in pixel values.
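The smoothing effect can be seen on a small hypothetical image: a constant background of 1 with a single noise pixel of value 6. The sketch below assumes the template of nine equal weights of 1 described above, so each output is a plain sum; the helper name and the image are made up for the example.

```python
def correlate(image, template):
    """Shift-multiply-add over every position where the template fits."""
    n, m = len(template), len(template[0])
    return [[sum(template[i][j] * image[y + i][x + j]
                 for i in range(n) for j in range(m))
             for x in range(len(image[0]) - m + 1)]
            for y in range(len(image) - n + 1)]


box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]

noisy = [[1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1],
         [1, 1, 6, 1, 1],   # isolated noise pixel in the centre
         [1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1]]

print(correlate(noisy, box))
# [[14, 14, 14], [14, 14, 14], [14, 14, 14]]
```

Every 3 x 3 window here contains the noise pixel, so each sum rises from 9 to 14 and the sudden change is spread evenly over the result. Dividing by 9 would express the sums as averages on the original gray-level scale.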
An alternative smoothing template might be
This introduces weights such that half of the result comes from the centre pixel, 3/8 from the pixels above, below, left and right of it, and 1/8 from the corner pixels, those that are most distant from the centre pixel.
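One integer template consistent with these proportions has a centre weight of 16, edge weights of 3 and corner weights of 1. These values are an assumption derived from the stated fractions (any common multiple would fit equally well), and the sketch below simply checks them.

```python
from fractions import Fraction

# Assumed template: derived from the proportions in the text, not quoted from it.
template = [[1, 3, 1],
            [3, 16, 3],
            [1, 3, 1]]

total = sum(sum(row) for row in template)  # 32

centre = Fraction(template[1][1], total)
edges = Fraction(template[0][1] + template[1][0]
                 + template[1][2] + template[2][1], total)
corners = Fraction(template[0][0] + template[0][2]
                   + template[2][0] + template[2][2], total)

print(centre, edges, corners)  # 1/2 3/8 1/8
```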
A high-pass filter aims to remove gradual changes and enhance the sudden changes. Such a template might be (the Laplacian)
Here the template sums to zero so if it is placed over a window containing a constant set of values, the result will be zero. However, if the centre pixel differs markedly from its surroundings, then the result will be even more marked.
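Both properties can be checked numerically. The sketch assumes the common 4-neighbour form of the Laplacian template (centre weight 4, the four direct neighbours -1, corners 0); the two 3 x 3 windows are hypothetical.

```python
# Assumed 4-neighbour Laplacian template.
laplacian = [[0, -1, 0],
             [-1, 4, -1],
             [0, -1, 0]]


def response(window, template):
    """Sum of products of a template with a window of the same size."""
    return sum(t * w
               for t_row, w_row in zip(template, window)
               for t, w in zip(t_row, w_row))


flat = [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]
spike = [[1, 1, 1],
         [1, 6, 1],
         [1, 1, 1]]

print(sum(sum(row) for row in laplacian))  # 0
print(response(flat, laplacian))           # 0
print(response(spike, laplacian))          # 20
```

The centre pixel exceeds its surroundings by 5, and the response of 20 is four times that difference.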
The next table shows the operation of the following high-pass and low-pass filters on an image:
[Table: high-pass filter; image after high pass; image after low pass]
Here, after the high pass, the top half of the image has its edges noted, leaving the middle at zero, while the bottom half of the image jumps from −4 and −5 to 20, corresponding to the original noise value of 6.
After the low pass, there is a steady increase to the centre and the noise point has been shared across a number of values, so that its original existence is almost lost. Both high-pass and low-pass filters have their uses.
Edge detection
Templates such as A and B
highlight edges in an area as shown in the next example. Clearly B has identified the vertical edge and A the horizontal edge. Combining the two, say by adding the results A + B above, gives both horizontal and vertical edges.
[Example: original image and the results of templates A, B and A + B]
See next chapter for a fuller discussion of edge detectors.
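The idea of separate horizontal and vertical edge templates can be sketched with two simple 2 x 2 difference templates. These are hypothetical stand-ins chosen for illustration and are not claimed to be the templates A and B of the text; the test image, a bright square in the lower-right corner, is also made up.

```python
def correlate(image, template):
    """Shift-multiply-add over every position where the template fits."""
    n, m = len(template), len(template[0])
    return [[sum(template[i][j] * image[y + i][x + j]
                 for i in range(n) for j in range(m))
             for x in range(len(image[0]) - m + 1)]
            for y in range(len(image) - n + 1)]


# Hypothetical difference templates (assumptions, not taken from the text).
horizontal_edge = [[-1, -1],
                   [1, 1]]    # bottom row minus top row
vertical_edge = [[-1, 1],
                 [-1, 1]]     # right column minus left column

image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]

h = correlate(image, horizontal_edge)
v = correlate(image, vertical_edge)
both = [[a + b for a, b in zip(h_row, v_row)] for h_row, v_row in zip(h, v)]

print(h)     # [[0, 0, 0], [0, 5, 10], [0, 0, 0]]
print(v)     # [[0, 0, 0], [0, 5, 0], [0, 10, 0]]
print(both)  # [[0, 0, 0], [0, 10, 10], [0, 10, 0]]
```

The first result responds along the top of the square, the second along its left side, and their sum marks both. Because the responses are signed, adding them can cancel where the edges have opposite polarity, so absolute values are sometimes summed instead.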
Storing the convolution results
Results from templating normally need examination and transformation before storage. In most application packages, images are held as one array of bytes (or three arrays of bytes for color). Each entry in the array corresponds to a pixel in the image. The unsigned byte range (0-255) means that the results of an operation must be transformed to within that range if data is to be passed in the same form to further software. If the template includes fractions, the result may have to be rounded. Worse, if the template contains anything other than positive fractions less than 1/(n x m) (which is quite likely), it is possible for the result at some point to go outside the 0-255 range.
Scaling can be done as the results are produced. This requires either a prior estimation of the result range or a backwards rescaling when an out-of-range result requires that the scaling factor be changed. Alternatively, scaling can be done at the end of production, with all the results initially placed into a floating-point array. The latter option assumes that there is sufficient main memory available to hold a floating-point array. It may be that such an array will need to be written to disk, which can be very time-consuming. Floating point is preferable because, even if generous storage is allocated to the image, with each pixel represented as a 4-byte integer for example, it only needs a few peculiarly valued templates to operate on the image for the resulting pixel values to become very small or very large.
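Scaling at the end of production might look like the following sketch. It assumes a simple linear rescaling from the observed minimum and maximum onto 0-255, which is only one possible choice; the function name and the sample values are hypothetical.

```python
def rescale_to_bytes(values):
    """Linearly map a 2-D list of floats onto integers in the range 0-255."""
    lo = min(min(row) for row in values)
    hi = max(max(row) for row in values)
    if hi == lo:
        # A constant image has no range to stretch; map everything to 0.
        return [[0 for _ in row] for row in values]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in values]


# Hypothetical floating-point results, including a negative value.
results = [[-20.0, 0.0],
           [10.0, 31.0]]
print(rescale_to_bytes(results))  # [[0, 100], [150, 255]]
```

The whole array must be held before the minimum and maximum are known, which is the memory cost noted above.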
For example, a Fourier transform was applied to an image. The imaginary array contained zeros and the real array values ranged between 0 and 255. After the Fourier transformation, values in the resulting imaginary and real floating-point arrays were mostly between 0 and 1 but with some values greater than 1000. The following transformation was applied to the real and imaginary output arrays:
$$F(g) = \begin{cases} \{\log_2[\operatorname{abs}(g)] + 15\} \times 5 & \text{for all } \operatorname{abs}(g) > 2^{-15} \\ 0 & \text{otherwise} \end{cases}$$
where abs(g) is the positive value of g ignoring the sign. This brought the values into a range that enabled them to be placed back into the byte array.
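A minimal sketch of this transformation follows, assuming it is read as F(g) = (log2(abs(g)) + 15) x 5 for abs(g) greater than 2 to the power -15, and 0 otherwise. The function name `log_scale` is hypothetical, and the result would still need rounding before being stored as a byte.

```python
import math


def log_scale(g):
    """Compress a wide range of magnitudes logarithmically (assumed reading of F)."""
    magnitude = abs(g)
    if magnitude > 2.0 ** -15:
        return (math.log2(magnitude) + 15) * 5
    return 0.0


print(log_scale(1.0))     # 75.0
print(log_scale(1024.0))  # 125.0
print(log_scale(0.0))     # 0.0
```

Under this reading, values between 0 and 1 map to at most 75 and a value near 1000 maps to about 125, so both ends of the range fit comfortably within 0-255.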