Linear Filter Architectures - Vhdl Image Processing

Using spatial filter kernels for image filtering in hardware systems has been a standard route for many hardware design engineers. As a result, various spatial-domain architectures appear in company technical reports, academic journals and conference papers dedicated to FPGA-based image processing. This is not surprising given the myriad of image processing applications that incorporate image filtering techniques.

Such applications include but are not limited to image contrast enhancement/sharpening, demosaicking, restoration/noise removal/deblurring, edge detection, pattern recognition, segmentation, inpainting, etc.

Several authors have published papers on implementing spatial filtering hardware architectures for FPGA platforms, either performing different tasks on their own or used as add-ons for more complex and sophisticated processing operations.

Sample application areas in industrial processes include the detection of structural defects in manufactured products, using real-time imaging and edge detection techniques to remove damaged products from the assembly line.

Though frequency (Fourier Transform) domain filtering may be faster for larger images and optical processes, spatial filtering with relatively small kernels makes several of these processes feasible for physical, real-time applications and reduces computational costs and resources in FPGA digital hardware systems.

Figure 2.1(i) shows one of the essential components of a spatial domain filter, which is a window generator for a 5 x 5 kernel for evaluating the local region of the image.

Figure 2.1(i) – 5×5 window generator hardware architecture

The boxes represent the flip flops (FF) or delay elements, with each box providing one delay. In digital signal processing notation, a flip flop is represented in the z-domain by $z^{-1}$ and in the discrete time domain as $x[n-1]$, where x is the signal being delayed. The data comes in from the left hand side of the unit and each line is delayed by 5 cycles. For a 3 x 3 kernel, there would be three lines and each would be delayed by 3 cycles.
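The chain of unit delays in one row of the window generator can be sketched as a small behavioural model. This is only an illustration in Python, not the VHDL design itself; the class name `WindowRow` and its `clock` method are hypothetical names chosen for this sketch.

```python
from collections import deque


class WindowRow:
    """Behavioural model of one window-generator row: k chained unit delays."""

    def __init__(self, k):
        # k flip flops, all cleared to zero at reset (an assumption of this sketch)
        self.taps = deque([0] * k, maxlen=k)

    def clock(self, pixel):
        # On each clock edge the new pixel enters on the left and the
        # oldest value drops off the far end of the chain.
        self.taps.appendleft(pixel)
        # taps[i] holds the pixel that was clocked in i edges earlier
        return list(self.taps)


row = WindowRow(5)
for p in range(1, 8):
    taps = row.clock(p)
print(taps)
```

A 5 x 5 window would use five such rows, one per line, so that 25 pixels are visible to the processing elements at once.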

Figure 2.1(ii) shows the line buffer array unit which consists of long shift registers composed of several flip flops. Each line buffer is set to the length of one row of the image.

Thus, for a 128 x 128 greyscale image with 8 bits per pixel, each line buffer would be 128 wide and 8 bits deep.
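A minimal behavioural sketch of the line buffer array follows, assuming the buffers are chained so that each output carries the pixel from the same column one further row back. The chaining, the class name `LineBufferArray` and the zero reset value are assumptions made for illustration; the figure may wire the outputs differently.

```python
from collections import deque


class LineBufferArray:
    """Chain of line buffers, each as long as one image row."""

    def __init__(self, width, lines):
        self.buffers = [deque([0] * width, maxlen=width) for _ in range(lines)]

    def clock(self, data_in):
        outs = []
        sample = data_in
        for buf in self.buffers:
            oldest = buf[-1]        # pixel that entered this buffer `width` clocks ago
            buf.appendleft(sample)  # shift the new pixel in, the oldest drops out
            outs.append(oldest)
            sample = oldest         # feed the next line buffer in the chain
        return outs


# Stream a tiny 4-pixel-wide image (pixel values 1..12, three rows)
lba = LineBufferArray(width=4, lines=2)
for value in range(1, 13):
    outs = lba.clock(value)
print(outs)
```

For the 128 x 128 example in the text, `width` would be 128 and each stored value would be an 8-bit pixel.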

Figure 2.1(ii) – Line buffer array hardware architecture (diagram labels: Data_in; Line Buffer1 to Line Buffer5; Line out1 to Line out5)

The rest of the architecture would include adders, dividers, and multipliers or look up tables. These are not shown as they are much easier to understand and implement.

The main components of the spatial domain architectures are the window generator and line delay elements. The delay elements can be built from First in First out (FIFO) or shift register components for the line buffers.

The architecture of the processing elements is heavily determined by the mathematical properties of the filter kernels. For instance, the symmetric or separable nature of certain kernels is exploited in the hardware design to reduce multiply-accumulate operations. There are mainly three kinds of filter kernels, namely symmetric, separable-symmetric, and non-separable (including non-separable-symmetric) kernels. To understand the need for this classification, it is necessary to discuss the growth in mathematical operations of image processing algorithms implemented in digital hardware.

2.1.1 Generic Filter architecture

In the standard spatial filter architectures, the filter kernel is defined as is and each coefficient of the defined kernel has its own dedicated multiplier and corresponding image window coefficient. Thus, this architecture is flexible for a particular defined size of kernel and any combination of coefficient values can be loaded to this architecture without modifying the architecture in any way. However, this architecture is inefficient when a set of coefficients in the filter have the same values and redundancy grows as the number of matching coefficients increases. It also becomes computationally complex as filter kernel size increases since more processing elements will be needed to perform the full operation on a similarly sized image window. The utility of the filter is limited to small kernel sizes ranging from 3×3 to about 9×9 dimensions. Beyond this, the definition and instantiation of the architecture and its coefficients become unwieldy, especially in digital hardware description languages used to program the hardware devices. Figure 2.1.1 depicts an example of generic 5×5 filter kernel architecture.

Figure 2.1.1 – Generic 5×5 spatial filter hardware architecture

The 25 filter coefficients range from c0 to c24 and are multiplied with the values stored in the window generator grid made up of flip flops (FF). These coefficients are weights, which determine the extent of the contribution of the image pixels in the final convolution output. The partial products are then summed in the adder blocks. Not shown in the diagram is another adder block that sums the five sums of products. The final sum is divided by a constant value, which is usually chosen as a power of 2 so that the division reduces to a bit shift, as is good digital design practice.
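The multiply, sum and normalise steps described above can be sketched as follows. This is a behavioural Python illustration rather than the hardware itself; the function name `generic_filter` and the `shift` parameter are hypothetical, and the sketch assumes the normalising constant is a power of two so that the division becomes a right shift.

```python
def generic_filter(window, coeffs, shift):
    """Generic K x K filter: one multiplier per coefficient.

    window, coeffs: K x K lists of integers.
    shift: log2 of the normalising constant (assumed to be a power of two).
    """
    row_sums = []
    for w_row, c_row in zip(window, coeffs):
        # One adder block per row sums that row's partial products
        row_sums.append(sum(w * c for w, c in zip(w_row, c_row)))
    total = sum(row_sums)   # final adder combining the row sums
    return total >> shift   # divide by 2**shift


window = [[8] * 5 for _ in range(5)]
coeffs = [[1] * 5 for _ in range(5)]
print(generic_filter(window, coeffs, shift=3))
```

Because every coefficient has its own multiplier, any 5 x 5 coefficient set can be passed in without changing the structure, which is the flexibility the text describes.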

2.1.2 Separable Filter architecture

The separable filter kernel architectures are much more computationally efficient where applicable. However, they are more suited to low-pass filtering using Gaussian kernels (which have the separability property). The architecture reduces a two-dimensional N × N filter kernel to two one-dimensional filters of length N. Thus a one-dimensional convolution (which is much easier than 2-D convolution) is performed along one dimension, followed by a second one-dimensional convolution along the other. The resulting reduction in processing elements, and hence in multiply-accumulate operations, is most appreciable when designing very large convolution kernels. Spatial domain convolution is already most efficient for small kernel sizes; separable kernels extend this efficiency to larger kernels that would be costly to build with a generic filter architecture.

Figure 2.1.2 depicts an example of a separable filter kernel architecture for a 5 × 5 spatial filter. The 25 coefficients are now reduced to 5, since the row and column filter coefficients are the same, with one 1-D filter being the transpose of the other.

Figure 2.1.2 – Separable 5×5 spatial filter hardware architecture

Observing the diagram in Figure 2.1.2, it can be seen that the number of processing elements and filter coefficients has been dramatically reduced in this filter architecture.

For example, the 25 coefficients in the generic filter architecture have been reduced to just 5 coefficients which are reused.
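The reuse of one 5-coefficient set for both passes can be checked with a short behavioural sketch. The function names are hypothetical and the binomial coefficients `[1, 4, 6, 4, 1]` are only an example of a Gaussian-like separable kernel, not values taken from the figure.

```python
def filter_1d(samples, coeffs):
    """One 1-D filter: multiply each sample by its coefficient and sum."""
    return sum(s * c for s, c in zip(samples, coeffs))


def separable_filter(window, coeffs):
    """Row pass followed by column pass, reusing the same coefficients."""
    row_out = [filter_1d(row, coeffs) for row in window]  # row filter
    return filter_1d(row_out, coeffs)                     # column filter


def full_2d_filter(window, coeffs):
    """Reference: the equivalent 2-D kernel is the outer product of coeffs."""
    n = len(coeffs)
    return sum(window[i][j] * coeffs[i] * coeffs[j]
               for i in range(n) for j in range(n))


coeffs = [1, 4, 6, 4, 1]
window = [[i * 5 + j for j in range(5)] for i in range(5)]
print(separable_filter(window, coeffs) == full_2d_filter(window, coeffs))
```

The two results agree because the 2-D kernel is the outer product of the 1-D filter with its own transpose, which is exactly the separability property.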

2.1.3 Symmetric Filter Kernel architecture

Symmetric filter kernel architectures are more suited to high-pass and high-frequency emphasis (boost filtering) operations with equal weights. They reduce the number of processing elements, thereby reducing the number of multiply-accumulate operations. The pixels in the image window that share the same coefficient value are first added together, and the sum is then multiplied once by that coefficient. Figure 2.1.3(i) shows a Gaussian symmetric high-pass filter generated using the windowing method, while Figure 2.1.3(ii) depicts an example of a symmetric filter kernel architecture.

Figure 2.1.3(i) – Frequency domain response of symmetric Gaussian high-pass filter obtained from spatial domain symmetric Gaussian with windowing method

Figure 2.1.3(ii) – 5 x 5 symmetric spatial filter hardware architecture
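The add-then-multiply idea can be sketched behaviourally as below. Grouping pixels by coefficient value is one simple way to model the pre-adders; the function name `symmetric_filter` and the example kernel (centre weight 24, all other weights -1) are assumptions made for illustration and are not taken from the figure.

```python
from collections import defaultdict


def symmetric_filter(window, coeffs):
    """Sum pixels that share a coefficient, then multiply once per group.

    Returns (filter output, number of multiplications used).
    """
    groups = defaultdict(int)
    for w_row, c_row in zip(window, coeffs):
        for w, c in zip(w_row, c_row):
            groups[c] += w                      # pre-adder for this coefficient
    result = sum(c * s for c, s in groups.items())
    return result, len(groups)


# Example 5 x 5 high-pass kernel with equal outer weights (hypothetical values)
coeffs = [[-1] * 5 for _ in range(5)]
coeffs[2][2] = 24
window = [[10] * 5 for _ in range(5)]
window[2][2] = 20
print(symmetric_filter(window, coeffs))
```

In this example only two multiplications are needed instead of 25, at the cost of extra adders in front of the multipliers.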

2.1.4 Quadrant Symmetric Filter architecture

The quadrant symmetric filter is built from one quadrant (a quarter) of a circularly symmetric filter kernel, which is rotated through 360 degrees to produce the full kernel. The hardware architecture is very efficient since it occupies a quarter of the space normally used for a full filter kernel.
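A rough behavioural sketch of the quadrant idea is given below, assuming a kernel of odd size that is mirrored about both its centre row and centre column. Under that assumption the mirrored pixels can be folded onto a single quadrant before multiplication. The function names and the example quadrant values are hypothetical, and the stored quadrant here includes the centre row and column, so a 5 × 5 kernel needs 9 coefficients rather than exactly a quarter of 25.

```python
def fold_window(window):
    """Add mirrored pixels together so only one quadrant remains."""
    k = len(window)
    h = k // 2
    folded = [[0] * (h + 1) for _ in range(h + 1)]
    for i in range(k):
        for j in range(k):
            # Map each pixel to its mirror position in the top-left quadrant
            folded[min(i, k - 1 - i)][min(j, k - 1 - j)] += window[i][j]
    return folded


def quadrant_filter(window, quadrant):
    """Multiply the folded window by the stored quadrant coefficients."""
    folded = fold_window(window)
    return sum(f * q
               for f_row, q_row in zip(folded, quadrant)
               for f, q in zip(f_row, q_row))


quadrant = [[1, 2, 3],
            [2, 4, 6],
            [3, 6, 9]]                      # top-left quadrant incl. centre
window = [[1] * 5 for _ in range(5)]
print(quadrant_filter(window, quadrant))
```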

To summarize the discussion of spatial filter hardware architectures, it is necessary to present a comparison of the savings of hardware resources with regards to reduced multiply-accumulate operations.

For an N × N spatial filter kernel, N × N multiplications and (N × N) - 1 additions are required. For example, for a 3 × 3 filter, 9 multiplications and 8 additions are needed for each output pixel calculation, while for a 9 × 9 filter, 81 multiplications and 80 additions are needed per output pixel computation.
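The counts quoted above follow directly from the kernel size and can be reproduced with a small helper. The generic formula is the one stated in the text; the separable count is the usual figure for two 1-D passes of length N and is included here as an assumption, not as a value taken from Table 2.1.4. Both function names are hypothetical.

```python
def generic_ops(n):
    """(multiplications, additions) per output pixel for a generic N x N kernel."""
    return n * n, n * n - 1


def separable_ops(n):
    """Assumed count for two 1-D passes of length N (row pass + column pass)."""
    return 2 * n, 2 * (n - 1)


for n in (3, 5, 9):
    print(n, generic_ops(n), separable_ops(n))
```

The generic cost grows quadratically with N while the assumed separable cost grows linearly, which is why the savings matter most for large kernels.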

Since multiplications are costly in terms of hardware, designs are geared towards reducing the number of multiplication operations or eliminating them entirely.

Table 2.1.4 gives a summary of the number of multiplication and addition operations per image pixel required for varying filter kernel sizes using different filter architectures.

Table 2.1.4 – MAC operations and filter kernel size and type

KEY

*/pixel – Multiplications per pixel

+/pixel – Additions per pixel

GFKA – Generic Filter Kernel Architecture

SFKA – Separable Filter Kernel Architecture

Sym FKA – Circular Symmetric Filter Kernel Architecture
