Initialisation - The Harmony Filter - The Application of Harmony Search in Computer Vision

3.4 The Harmony Filter

3.4.1 Initialisation

The harmony search optimiser is initialised when the user chooses a target and its template histogram is generated. This is done by first converting the colour space from RGB to HSV. A two-dimensional histogram is then generated from the (H)ue and (S)aturation channels while the (V)alue channel is discarded. The V channel contains the intensity information and is sensitive to light changes [54, 85, 86]. The HSV colour space is illustrated in Figure 3.6. By ignoring the V channel and concentrating on the H and S channels, the model becomes more robust to changing light conditions between frames. Using a two-dimensional histogram instead of the full three-dimensional one also gives an overall speed increase. This histogram is then saved by the optimiser as the template histogram and is used whenever a candidate histogram is evaluated.

Figure 3.5: The harmony filter consists of two parts, namely the tracker and the harmony search optimiser. The tracker renders the optimal localisation of the target as an overlay on top of each video frame in real-time. It does this by using the harmony search optimiser to minimise an objective function that indicates the most likely localisation.

The HSV colour system is not without faults and is not perceptually the most accurate representation of the hue, saturation and value components of an image [87]. However, HSV is a simple mathematical transformation of RGB, making the transformation of the image to the HSV colour space computationally inexpensive. While not perfect, the HSV colour space is accurate enough for the purpose of tracking and, more importantly, allows for real-time performance.

Figure 3.6: This diagram illustrates the HSV colour space by modelling it as a cylinder. The cylinder’s height represents the V channel while the H and S channels are represented by the cylinder’s circumference and radius respectively.

Colour histograms, also called colour distributions, are used as the target model due to their robustness to partial occlusion, rotation and deformation. A histogram is a distribution of the colours that are present in the region of interest. Therefore, depending on how the region of interest is defined, the histogram changes very little when an object within the region of interest rotates or deforms.

The harmony filter uses a 2-dimensional histogram created from the H and S channels of the HSV colour space. The different H and S intensities are discretised into m and n bins respectively. This creates a 2-dimensional histogram with m × n bins. The choice of m and n is a trade-off between speed and accuracy. The more bins we use, the more accurately we can represent the histogram, but more bins also make the objective function more computationally expensive. It was found through experimentation that 10 H bins (m = 10) and 12 S bins (n = 12) resulted in a good compromise that is accurate enough for robust tracking without needlessly slowing down the calculation of the objective function.
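The histogram construction described above can be sketched in a few lines of Python. This is only an illustration, not the thesis implementation: the function name `hs_histogram`, the pixel format (8-bit RGB tuples) and the normalisation of the histogram to sum to 1 are assumptions made here so that the result can be treated as a distribution. The standard-library `colorsys` module is used for the RGB to HSV conversion.

```python
import colorsys


def hs_histogram(pixels, m=10, n=12):
    """Build a normalised m x n hue-saturation histogram.

    pixels: iterable of (r, g, b) tuples with 8-bit channels (0-255).
    Returns a list of m rows (hue bins), each holding n saturation-bin
    weights. Normalising to a total of 1 is an assumption of this sketch.
    """
    hist = [[0.0] * n for _ in range(m)]
    count = 0
    for r, g, b in pixels:
        # colorsys returns h, s and v in [0, 1]; the V channel is discarded.
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        h_bin = min(int(h * m), m - 1)
        s_bin = min(int(s * n), n - 1)
        hist[h_bin][s_bin] += 1.0
        count += 1
    if count:
        hist = [[c / count for c in row] for row in hist]
    return hist


# A bright red and a dark red pixel differ only in V, so they share a bin.
region = [(255, 0, 0), (128, 0, 0), (0, 0, 255)]
template = hs_histogram(region)
```

With the default m = 10 and n = 12 the histogram has 120 bins. Because V is dropped, the two red pixels contribute to the same bin, which is the robustness to lighting changes that the text describes.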

However, the template histogram is only part of the initialisation. The template histogram only has to be calculated once and then never changes, but other initialisation steps are performed at each new frame. Every time the optimiser is queried for the target location, the harmony memory (HM) is initialised using the current frame and the target’s previous location. The previous location is used to predict the state vector that describes the target’s location, velocity, and scale as a five-dimensional vector. A state vector is defined as $\mathbf{x}_i = [x, y, \dot{x}, \dot{y}, s]$, where $(x, y)$ is the target’s location in pixel coordinates, $(\dot{x}, \dot{y})$ is the target velocity, and $s$ is a scaling parameter that controls the size of the box defining the target. Notice that the optimiser not only finds the target’s most probable location but also its velocity and scale. However, once the scale and velocity have been estimated, one can calculate the current location based on the previous location.

A simple motion model that assumes steady velocity of the target between frames is used to fill the HM with estimated predictions of the target location. A random acceleration in the x and y directions, $(a_x, a_y)$, is generated and used to create a new state vector as follows:

\begin{align}
x_{t+1} &= x_t + \dot{x}_t + \tfrac{1}{2} a_x \tag{3.56} \\
y_{t+1} &= y_t + \dot{y}_t + \tfrac{1}{2} a_y \tag{3.57} \\
\dot{x}_{t+1} &= \dot{x}_t + a_x \tag{3.58} \\
\dot{y}_{t+1} &= \dot{y}_t + a_y \tag{3.59}
\end{align}
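A minimal sketch of equations (3.56) to (3.59) in Python follows. The text only says that the acceleration is random, so drawing it from a zero-mean Gaussian with a hypothetical standard deviation `accel_std` is an assumption of this sketch, as is leaving the scale `s` unchanged, since no update rule for it is given here. The function name `predict_state` is illustrative.

```python
import random


def predict_state(state, accel_std=1.0, rng=random):
    """Propagate [x, y, vx, vy, s] one frame ahead.

    A random acceleration (ax, ay) perturbs the constant-velocity
    prediction. The Gaussian distribution and accel_std are assumptions.
    """
    x, y, vx, vy, s = state
    ax = rng.gauss(0.0, accel_std)
    ay = rng.gauss(0.0, accel_std)
    return [
        x + vx + 0.5 * ax,  # (3.56)
        y + vy + 0.5 * ay,  # (3.57)
        vx + ax,            # (3.58)
        vy + ay,            # (3.59)
        s,                  # scale kept as-is (assumption of this sketch)
    ]


# Generate a handful of candidate states around the previous estimate.
previous = [100.0, 50.0, 2.0, -1.0, 1.0]
candidates = [predict_state(previous) for _ in range(5)]
```

Each call yields a slightly different candidate, so repeated calls spread the initial harmony memory around the constant-velocity prediction.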

Each new candidate state vector is weighted by creating the corresponding histogram and comparing it with the template histogram using the Bhattacharyya coefficient. The new vector is then added to the HM together with its fitness weight, and the process repeats until the HM is filled.
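For two normalised histograms p and q, the Bhattacharyya coefficient is the sum over all bins of sqrt(p · q); it equals 1 for identical distributions and 0 for distributions with no overlapping bins. The sketch below computes it for histograms stored as nested lists. The function names are illustrative, and the conversion to a distance, sqrt(1 − coefficient), is a common convention included here only as an assumption about how a similarity could be turned into a quantity to minimise; the text does not state which form is used.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_ij * q_ij) over all bins of two normalised histograms."""
    return sum(
        math.sqrt(p_val * q_val)
        for p_row, q_row in zip(p, q)
        for p_val, q_val in zip(p_row, q_row)
    )


def bhattacharyya_distance(p, q):
    """Assumed cost form: 0 for identical histograms, 1 for disjoint ones."""
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


# Two small 2 x 2 histograms used purely for illustration.
template_hist = [[0.5, 0.5], [0.0, 0.0]]
disjoint_hist = [[0.0, 0.0], [0.5, 0.5]]

same = bhattacharyya_coefficient(template_hist, template_hist)
different = bhattacharyya_coefficient(template_hist, disjoint_hist)
```

Here `same` measures a histogram against itself and `different` measures two histograms that share no populated bins, the two extremes of the coefficient.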

Once the HM has been initialised, new candidate solutions are improvised using the standard HS algorithm and the HM is updated until convergence to the optimal solution is detected. Since the predicted target position can be calculated from its velocity and previous position, only the $\dot{x}$, $\dot{y}$ and $s$ components are explored during the improvisation process. This speeds up convergence by restricting the search space to solution vectors that are possible within the motion model.
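The improvisation step over the three explored components can be sketched as a generic harmony search loop. Everything specific in this sketch is an assumption: the parameter values (`hmcr`, `par`, `bandwidth`), the fixed iteration count that stands in for convergence detection, and the toy cost function that stands in for the histogram comparison. The position helper eliminates the acceleration from (3.56) and (3.58), which gives x_{t+1} = x_t + (ẋ_t + ẋ_{t+1}) / 2; whether the thesis recovers the position in exactly this way is not stated here.

```python
import random


def improvise(memory, bounds, hmcr=0.9, par=0.3, bandwidth=0.5, rng=random):
    """Create one new [vx, vy, s] vector from the harmony memory."""
    new = []
    for k, (lo, hi) in enumerate(bounds):
        if rng.random() < hmcr:
            value = rng.choice(memory)[1][k]  # memory consideration
            if rng.random() < par:            # pitch adjustment
                value += rng.uniform(-bandwidth, bandwidth)
        else:
            value = rng.uniform(lo, hi)       # random selection
        new.append(min(max(value, lo), hi))   # keep within bounds
    return new


def harmony_search(cost, memory, bounds, iterations=200, rng=random):
    """memory: list of (cost, [vx, vy, s]) pairs; lower cost is better."""
    memory = sorted(memory, key=lambda entry: entry[0])
    for _ in range(iterations):
        candidate = improvise(memory, bounds, rng=rng)
        candidate_cost = cost(candidate)
        if candidate_cost < memory[-1][0]:    # replace the worst harmony
            memory[-1] = (candidate_cost, candidate)
            memory.sort(key=lambda entry: entry[0])
    return memory[0]


def position_from_velocity(prev_state, vx_new, vy_new):
    """Recover (x, y) from the previous state and the new velocity."""
    x, y, vx, vy, _s = prev_state
    return x + 0.5 * (vx + vx_new), y + 0.5 * (vy + vy_new)


# Toy cost with a known optimum at vx = 3, vy = -2, s = 1 (illustration only).
def toy_cost(v):
    return (v[0] - 3.0) ** 2 + (v[1] + 2.0) ** 2 + (v[2] - 1.0) ** 2


search_bounds = [(-10.0, 10.0), (-10.0, 10.0), (0.5, 2.0)]
start = [[random.uniform(lo, hi) for lo, hi in search_bounds] for _ in range(10)]
initial_memory = [(toy_cost(v), v) for v in start]
best_cost, best_vector = harmony_search(toy_cost, initial_memory, search_bounds)
```

Searching three components instead of five keeps every improvised vector consistent with the motion model, because the position is derived from the velocity and never varied independently.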
