Block extraction module - HOG-SVM detector

4.2 HOG-SVM detector

4.2.5 Block extraction module

The block extraction module is rather complicated, compared to the earlier components of the pipeline. Its purpose is to directly extract HOG block feature vectors, for every newly arrived pixel. Each histogram is updated on every valid incoming gradient magnitude and orientation pixel from the CORDIC module, as the block window is being “slid” across a currently captured frame image. An extracted block vector is then directly used to compute a series of partial SVM dot-products for all detection windows, which share the block on the image, as it will be reviewed later. This module assumes parameters discussed in subsection2.3.1, and as such, its output is a 32-element vector, containing four concatenated 8-bin gradient orientation histograms of their respective cells. The naive approach to directly calculate the histograms for newly arrived pixels, is to store sufficient amount of pixels in line buffers, and then recalculate the sums of pixels for each histogram bin. This method however, requires a lot of hardware resources and produces quite a long combinatorial path.

A better approach is to update the histogram bins, by only adding the most recent data, and removing the “oldest” set of data. More specifically, the histogram update scheme involves adding an incoming column of pixel values to an accumulator for a particular cell, and subtracting its respective last column of pixels. A graphical representation of this scheme is illustrated in Fig. 4.7for a 2-by-2 block of 8-by-8 pixel cells. Here, the block appears to be “sliding” in a raster scan-like fashion along the width of a currently captured image frame, as columns of pixels in front of the cells are added to the histograms’ bins, while the cells’ last columns are subtracted. Given ap×qblock of m×mpixel cells and histogram lengthL, this operation on an incoming frame image of expected sizeW ×H, can be mathematically expressed as follows:

hi,j[k] =hi,j[k−1] +Sm[k−Kj]−Sm[k−m−Kj] (4.1) where hi,j[k] andhi,j[k−1] are the new and old values of the i-th bin of the

Figure 4.7: Illustration of a “sliding” 2-by-2 block of 8-by-8 pixel cells, along the width W of the gradient magnitude and orientation images. The red pixel refers togm[k] andgo[k], while the green pixels refer togm[k−W l] andgo[k−

W l]; 0< l≤15

j-th cell’s histogram, and Sm[k] =

m−1

l=0

di[k−W l]∗gm[k−W l],

with di[k] = 1 whengo[k] =i∈N<L ; 0 otherwise,go[k] =

j_|_θ

∇[k]|

·Lbeing the quantized unsigned gradient orientation, andgm[k] - the gradient magnitude stream. Kj is an offset which depends on the position of cell j in a block, based on its bottom right-most pixel. For example, K0 = 0, K1 = 8, K2 =

8W andK3= 8W+ 8, if assuming the cell numbering in Fig. 4.7.

Note that this recursive relation introduces blocks, which are “split” out across the image. In other words, some columns of the block are still on the other side of the image, as illustrated in Fig4.8. Additionally, the first few rows of a new frame need to be buffered, until a block is within the frame’s bounds. Thus a block is considered valid, whenk≥W(p·m−1) and k−k

W ≥q·m−1. Thevalidsignal can be used to invalidate such blocks, by keeping track of the current row and column of the gradient image frames.

The hardware design begins with an elaborate buffering topology, to allow parallel processing of the gradient magnitude and orientation pixels in a block, as defined in equation 4.1. Since a single BRAM cannot be used due to the required amount of reading ports, a combination of registers and RAM based circular buffers are used for the buffering. This topology is illustrated in Fig.

4.9, where a 16-by-16 register matrix is used to store and access pixels from the active block, and 15 circular buffers for the remaining pixels on each row.

There is a separate buffer for the gradient magnitude and orientation streams, however the gradient orientation is quantized, prior to being fed into its respective buffer. This reduces the data bit width of the buffer and therefore saves considerable amount of hardware space. The quantization function is imple- mented as a series of comparators, instead of relying on a division and multiplication. Since the expected hardware is difficult to describe as a block diagram, the function is described in Alg. 6. One can easily unroll thewhile loop and

Figure 4.8: A situation, where the block is split across the image due to buffering

implement the hardware comparators in parallel, with some additional decoding and multiplexing logic.

Figure 4.9: A buffering topology for the gradient magnitude and quantized orientation streams.

Algorithm 6Gradient orientation quantization function

functioni= binidx(θ∇)

Input: Gradient orientation

Output: Histogram bin indexi∈N<L, whereLis the total bin amount 1: Initial index: i= 0

2: Initial angle step: d= π_L

3: while d <|θ∇|andi < L−1do

4: d=d+_Lπ 5: i=i+ 1 6: end while

After buffering, the outputs of the gradient magnitude register matrix are fed into a series of adder trees. Each histogram bin has its own pair – one for Sm[k−Kj] andSm[k−m−Kj], as illustrated in Fig. 4.10. To incorporate the

Figure 4.10: Bin accumulation hardware for one 8×8 cell j, with histogram length ofL= 8.

Figure 4.11: Adder tree with selective input. The dashed lines indicate potential pipe-lining.

multiplication bydi[k], each input of an adder tree is equipped with a selector switch between 0 andgm[k], as seen in Fig. 4.11. The switches are controlled by decoded signals, coming from the bin index register matrix. The outputs of the histogram bin accumulators are all concatenated to form an output HOG feature vector, updated on each newly arrive valid pixel. This is perhaps the main advantage of this module, but it comes at a great hardware cost.

In document Implementation and Analysis of Real time Object Tracking on the Starburst MPSoC (Page 63-67)