
FPGA-BASED ACCELERATORS LIBRARY FOR IMAGE PROCESSING IN SPACE

4.3.2 Histogram Calculator

As mentioned above, histogram calculation is a key step in many image processing operations. An efficient implementation of it can therefore substantially accelerate many of those operations.

An image histogram, as defined at the beginning of this chapter, is a graph that reports, for each intensity value (horizontal axis), the number of pixels in the image that have that value (vertical axis). Since the available tonal values in an image are defined by the bpp resolution, the values on the horizontal axis lie in the $[0, 2^{bpp} - 1]$ range. The vertical axis, instead, can contain values up to the number of pixels composing the input image (i.e., when all the pixels in the input image have the same value).

This family contains just one IP-core, the Histogram Calculator. Its input interface accepts four pixel values in parallel, and this input parallelism speeds up the computation.

Figure 4.16: Mean Square Error - (a) Lena (b) Cameraman (c) Mandrill (d) Mars

The provided outputs are the values associated with the Histogram Bars (HBs), which can be buffered internally or externally. For this reason, the output interface provides, in addition to the HB values, the memory address where each HB value must be stored. Moreover, to ensure synchronization with other modules, one additional output bit signals the end of the histogram computation.

Thanks to this output interface, the module can be easily connected to a BRAM input port, for internal buffering, or to a Direct Memory Access (DMA) module, to store the computed histogram in an external memory.

The parallel internal architecture of the Histogram Calculator is shown in Figure 4.18.

It is composed of four BRAM Buffers, four 2-input adders, one 4-input adder, four 2-to-1 multiplexers, eight 2-to-1 multiplexers, and a Controller that manages the overall histogram computation process.

The histogram computation is performed in two steps. First, the Controller sets the reset signal to zero, so that the input pixels act as addressing signals for the 4 BRAM Buffers. The buffers are implemented as dual-port Block RAMs, provided by Xilinx Virtex FPGA architectures (Section 3.3). Each BRAM Buffer is sized to store all the bars composing the histogram.

Thus, the size of these buffers depends on the bpp resolution of the input image. Since a BRAM in space-grade Xilinx FPGAs is composed of 512 words of 32 bits each, a buffer made of a single BRAM is able to support an input image with a bpp resolution of up to 9 bits (i.e., the histogram is composed of $2^9 = 512$ bars). With a higher resolution, each buffer will be composed of more than one BRAM.

Figure 4.17: Laplacian Edge Extraction - (a) Noisy image in input ($\sigma_n^2 = 1500$) (b) Edges extracted from noisy image (c) Edges extracted from the image filtered by a static 11x11 filter (d) Edges extracted from image filtered by AIDI
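The buffer sizing rule can be illustrated with a short Python sketch. It assumes that a deeper buffer is obtained simply by chaining BRAMs of 512 words each; the function name and the default depth parameter are hypothetical and only reflect the figures quoted in the text.

```python
import math

# Depth of one BRAM as stated in the text (512 words of 32 bits each).
WORDS_PER_BRAM = 512


def brams_per_buffer(bpp, words_per_bram=WORDS_PER_BRAM):
    """Return how many BRAMs one buffer needs to hold 2**bpp histogram bars.

    Assumption: BRAMs are chained in depth, one histogram bar per word.
    """
    bars = 2 ** bpp
    return math.ceil(bars / words_per_bram)


for bpp in (8, 9, 10, 12):
    print(bpp, brams_per_buffer(bpp))
```

Under this assumption, a resolution of 8 or 9 bits needs one BRAM per buffer, which agrees with the four BRAMs used by the four buffers in the 8-bit configuration of Table 4.3.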

The input packets are split into 4 words, each representing a pixel value. Each received pixel addresses the BRAM Buffer associated with its position in the input packet (e.g., the pixel in the least significant part of the input packet addresses the BRAM Buffer 0). The value of the location addressed by the input pixel value is read, incremented, and then rewritten in the same location in a single clock cycle exploiting the dual-port nature of the buffer. In this way, each buffer row acts as a counter. When an entire image is received, each buffer row contains a partial HB value.

In the second computation step, the partial HBs are merged to compute the image histogram.

BRAM Buffers are scanned starting from location 0. At each clock cycle, the 4 partial HB values are read and summed. In this way, the final value of each HB is computed. After each location has been read, it is forced to 0. This ensures that counters are reset to the initial conditions, allowing the computation of a new histogram.

Figure 4.18: Histogram Calculator internal architecture
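The two computation steps can be summarized by the following software model. It is a behavioral sketch in Python, not the hardware description of the IP-core: the function name and the list-based buffers are illustrative assumptions, and clock-level timing is not modeled.

```python
def histogram_two_step(pixels, bpp, lanes=4):
    """Behavioral model of the two-step scheme: per-lane counting, then merging."""
    bins = 1 << bpp
    # One partial-histogram buffer per input lane (models the 4 BRAM Buffers).
    buffers = [[0] * bins for _ in range(lanes)]

    # Step 1: each packet carries `lanes` pixels; the pixel in position k
    # addresses buffer k, whose addressed word is read, incremented, rewritten.
    for start in range(0, len(pixels), lanes):
        packet = pixels[start:start + lanes]
        for lane, value in enumerate(packet):
            buffers[lane][value] += 1

    # Step 2: scan all locations from 0, sum the partial bars, and clear each
    # location so that the buffers are ready for the next image.
    histogram = []
    for address in range(bins):
        histogram.append(sum(buf[address] for buf in buffers))
        for buf in buffers:
            buf[address] = 0
    return histogram, buffers


hist, bufs = histogram_two_step([0, 1, 1, 3, 3, 3, 2, 0], bpp=2)
print(hist)
```

Keeping one buffer per lane means that no two pixels of the same packet ever update the same memory in the same step, which is what allows four pixels to be counted in parallel.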

In addition to the HB values, the Histogram Calculator outputs the HA and HD signals. The former is the index associated with the output HB value, while the latter is asserted when the histogram calculation is completed. Note that, when the histogram must be stored in an external memory, the memory base address must be added to HA to guarantee proper storage.
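The role of HA and of the base address can be sketched as follows. This is an illustrative Python model that assumes a word-addressed memory and assumes that HD is raised together with the last bar; neither detail is stated in the text, and the function name is hypothetical.

```python
def output_stream(histogram, base_address=0):
    """Yield (address, hb_value, hd) triples, one per histogram bar."""
    last = len(histogram) - 1
    for ha, hb in enumerate(histogram):
        # Assumption: word addressing, so the target address is base + HA.
        # Assumption: HD is modeled as asserted with the last bar.
        yield base_address + ha, hb, ha == last


for address, value, done in output_stream([5, 7, 0, 2], base_address=0x100):
    print(hex(address), value, done)
```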

The first computation step performed by the Histogram Calculator requires the number of clock cycles needed to receive the input image, while the second step requires only $2^{bpp}$ clock cycles to read the buffers. Thus, the total number of clock cycles required by the Histogram Calculator to compute the image histogram amounts to:

$$N_{clock} = \frac{N_{rows} \cdot N_{columns}}{4} + 2^{bpp} \qquad (4.11)$$

where $N_{rows}$ and $N_{columns}$ are the number of rows and columns composing the input image.
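Equation 4.11 can be checked numerically with a small Python helper. The function and parameter names are illustrative, and the sketch assumes that the number of pixels is a multiple of the input parallelism.

```python
def histogram_clock_cycles(n_rows, n_columns, bpp, pixels_per_cycle=4):
    """Evaluate Equation 4.11: image reception cycles plus buffer readout cycles."""
    # Step 1: four pixels are received per clock cycle.
    receive = (n_rows * n_columns) // pixels_per_cycle
    # Step 2: one buffer location is read per clock cycle.
    readout = 2 ** bpp
    return receive + readout


print(histogram_clock_cycles(1024, 1024, 8))
```

For a 1,024 x 1,024 image at 8 bpp this gives 262,144 + 256 = 262,400 clock cycles.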

For the sake of completeness, Table 4.3 shows the area occupation and the timing performance of the Histogram Calculator implemented on the space-qualified Xilinx Virtex 5-QV XQR5VFX130 (Section 3.3). The data reported in the table refer to a configuration that computes the histogram of an image composed of 1,024 x 1,024 pixels with a bpp resolution of 8 bits.

Table 4.3: Histogram Calculator performance on a Xilinx Virtex 5-QV XQR5VFX130

| Slices (FPGA area) | DSP (FPGA area) | BRAMs (FPGA area) | Max Freq. [MHz] |
|---|---|---|---|
| 265 (0.20%) | - (-) | 4 (0.67%) | 74.29 |

The total number of clock cycles to compute the image histogram in this configuration is 262,400. Thus, the maximum sustainable throughput is 283.29 fps.
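The throughput follows from dividing the maximum clock frequency by the number of clock cycles per image. The Python sketch below uses the rounded figures quoted in this section; the function name is illustrative. With these rounded inputs the result is roughly 283 fps, marginally below the reported 283.29 fps, presumably because the frequency in Table 4.3 is itself rounded.

```python
def max_frames_per_second(f_max_mhz, cycles_per_frame):
    """Frames per second sustained at the given maximum clock frequency."""
    return f_max_mhz * 1e6 / cycles_per_frame


# Figures from Table 4.3 (74.29 MHz) and from the cycle count above (262,400).
fps = max_frames_per_second(74.29, 262_400)
print(round(fps, 2))
```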