2.9 Summary
3.1.2 Methods
The proposed method was derived from Cutler and Davis (2000), with a number of modifications. An investigation showed that a similarity metric based on silhouette shape is less effective for a target that can change orientation arbitrarily. Therefore, a coarser metric derived from the oriented bounding box fitted to the bat's 2D silhouette was proposed, both to better capture the periodicity of the motion and to reduce computational cost. Further investigation showed that the selection of the dominant frequency proposed by Cutler and Davis is often inconclusive; therefore, two techniques were proposed to replace it. The first, called diagonal selection (DS), is based on the correlation of the signal with individual components reconstructed from the peaks of the signal's frequency spectrum; the second is the self-similarity technique.
Algorithm 1: Fit an oriented bounding box on a set of 2D points (bat silhouettes)
minArea ← ∞
foreach Edge ∈ ConvexHull do
    Orientation ← compute the edge orientation
    Rotate the ConvexHull by −Orientation so that Edge is axis-aligned
    area ← compute the axis-aligned bounding box area of the rotated convex hull
    if area < minArea then
        minArea ← area
        RECTANGLE ← bounding box of the rotated hull, rotated back by Orientation
    end
end
return RECTANGLE
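A minimal NumPy sketch of Algorithm 1 follows, assuming the silhouette is given as an (N, 2) array of pixel coordinates. The convex hull is computed with Andrew's monotone-chain method, which is an assumption for illustration (the text does not say which hull algorithm was used), and the function names are hypothetical.

```python
import numpy as np


def convex_hull(points):
    """Convex hull by Andrew's monotone chain; vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=float))))
    if len(pts) <= 2:
        return np.array(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])


def min_area_rect(points):
    """Algorithm 1: try each hull-edge orientation and keep the smallest bounding box.

    Returns (corners, width, height); corners is a 4x2 array in the original frame.
    """
    hull = convex_hull(points)
    min_area = np.inf  # must start at +inf, not 0, or no box is ever accepted
    best = None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        theta = np.arctan2(edge[1], edge[0])  # edge orientation
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, s], [-s, c]])     # rotation by -theta: the edge becomes horizontal
        r = hull @ rot.T
        (xmin, ymin), (xmax, ymax) = r.min(axis=0), r.max(axis=0)
        area = (xmax - xmin) * (ymax - ymin)
        if area < min_area:
            min_area = area
            box = np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]])
            best = (box @ rot, xmax - xmin, ymax - ymin)  # rotate the corners back
    return best
```

OpenCV's `cv2.minAreaRect` returns the same minimum-area rectangle (as a centre, size and angle) and could be used instead of this sketch in a real pipeline.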
For each video, the bats' silhouettes (Figure 3.1) were extracted using the improved Gaussian mixture model (GMM) for background subtraction proposed by Zivkovic and van der Heijden (2006). This method was chosen because it has been shown (Zivkovic and van der Heijden, 2006; Bouwmans et al., 2008) to be suitable for real-time processing while also improving classification accuracy. To detect the connected components, contours were extracted from the binary image using the border-following contour algorithm of Suzuki et al. (1985). An oriented bounding box was fitted to each silhouette using Algorithm 1, and three bounding box metrics (height, width and hypotenuse) were measured.
To handle broken silhouettes, contours are merged based on the minimum perpendicular distances from the four corner points of each fitted minimum-area rectangle to the boundaries of the other bounding boxes. This is repeated for all fitted rotated bounding boxes so that the fragments of each broken silhouette are merged (Figure 3.1). Broken silhouettes are caused by noise in the video, which in some cases is resolved by the subsequent Fast Fourier Transform (FFT) analysis.
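The merging step can be sketched as follows, assuming each fitted rectangle is a 4 × 2 array of corner coordinates. The threshold `max_gap` and all function names are hypothetical, since the text does not state the distance at which fragments are merged. Boxes are grouped transitively with a small union-find; the pixels of each group would then be pooled and refitted with Algorithm 1 to obtain a single box.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Perpendicular distance from p to segment ab, clamped to the segment's end points."""
    ab, ap = b - a, p - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip((ap @ ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def corner_to_box_distance(box_a, box_b):
    """Minimum distance from any corner of box_a to any edge of box_b (4x2 corner arrays)."""
    return min(point_segment_distance(p, box_b[j], box_b[(j + 1) % 4])
               for p in box_a for j in range(4))


def merge_broken(boxes, max_gap):
    """Group boxes whose corner-to-edge distance (either direction) is below max_gap.

    max_gap is a hypothetical pixel threshold. Returns lists of box indices;
    each list is one merged silhouette.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            d = min(corner_to_box_distance(boxes[i], boxes[j]),
                    corner_to_box_distance(boxes[j], boxes[i]))
            if d < max_gap:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```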
The bounding box metrics (height, width and hypotenuse) are used to form three 1D signals that vary with time (frame by frame). These signals are denoted h(t), w(t) and d(t) respectively, with t = 0, …, M − 1, where M is the total number of frames in the video. Each 1D signal is then split into short overlapping windows, and the Fast Fourier Transform (FFT) (Equation 3.1) is computed for each metric (height, width and hypotenuse) separately.
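The signal-forming and windowing steps can be sketched as below, assuming per-frame box heights and widths are already available as arrays and taking the hypotenuse as the box diagonal, √(h² + w²). The window length, hop size and frame rate `fps` are hypothetical parameters, since the text does not give values. Because the signals are real-valued, only the non-negative-frequency half of Equation 3.1 is kept (`np.fft.rfft`); the remaining bins are complex conjugates of these.

```python
import numpy as np


def box_signals(heights, widths):
    """Return h(t), w(t) and the diagonal (hypotenuse) d(t) of the per-frame bounding boxes."""
    h = np.asarray(heights, dtype=float)
    w = np.asarray(widths, dtype=float)
    return h, w, np.hypot(h, w)


def windowed_spectra(signal, win_len, hop, fps):
    """Split a 1D signal into overlapping windows; return |FFT| per window and bin frequencies (Hz)."""
    x = np.asarray(signal, dtype=float)
    starts = range(0, len(x) - win_len + 1, hop)
    windows = np.stack([x[s:s + win_len] for s in starts])  # shape (n_windows, win_len)
    spectra = np.abs(np.fft.rfft(windows, axis=1))          # non-negative frequencies only
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fps)
    return spectra, freqs
```

With a 100-sample window and a hop of 50, consecutive windows overlap by half; the frequency resolution is `fps / win_len` Hz.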
3.1. DATASET ANDMETHODS 65
Figure 3.1: Foreground images of bats segmented using the improved GMM of Zivkovic and van der Heijden (2006). (A) An original image, which was segmented perfectly in (B). (C) A case where the algorithm of Zivkovic and van der Heijden (2006) produced a broken silhouette. (D) The two rotated bounding boxes fitted to the broken silhouette. (E) The contours merged based on the minimum perpendicular distances from the four corner points of each fitted minimum-area rectangle to the boundaries of the other bounding boxes. (F) The resulting single bounding box fitted to the bat's silhouette.
$$F(k) = \sum_{t=0}^{N-1} f(t)\, e^{-i 2\pi k t / N} \tag{3.1}$$
where f(t) is the signal in the time domain with N samples, t = 0, …, N − 1, and F(k) is its representation in the frequency domain (encoding both amplitude and phase), with k = 0, …, N − 1.
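Equation 3.1 can be checked numerically with a direct O(N²) evaluation compared against NumPy's FFT, which uses the same negative-exponent convention and places the 1/N factor on the inverse transform. The inverse below uses the standard positive exponent. This is only a sanity check of the notation; in practice the FFT would be used.

```python
import numpy as np


def dft(f):
    """Direct evaluation of F(k) = sum_t f(t) exp(-i 2 pi k t / N)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    t = np.arange(N)
    k = t.reshape(-1, 1)                     # rows index k, columns index t
    return (f * np.exp(-2j * np.pi * k * t / N)).sum(axis=1)


def idft(F):
    """Direct evaluation of f(t) = (1/N) sum_k F(k) exp(+i 2 pi k t / N)."""
    F = np.asarray(F, dtype=complex)
    N = len(F)
    k = np.arange(N)
    t = k.reshape(-1, 1)                     # rows index t, columns index k
    return (F * np.exp(2j * np.pi * k * t / N)).sum(axis=1) / N
```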
$$f(t) = \frac{1}{N} \sum_{k=0}^{N-1} F(k)\, e^{i 2\pi k t / N} \tag{3.2}$$

where f(t) and F(k) are as defined for Equation 3.1. Cutler and Davis used Equation 2.1 to determine the wing beat frequency from the FFT stem plot; this thesis replaces that technique with two alternatives.
In the first technique, the diagonal similarity technique, the dominant frequencies of each window are reconstructed into synthetic signals, which are compared with the original signal using the diagonal of their respective similarity matrices (Equation 3.3). The frequency whose synthetic signal minimises this diagonal distance, that is, the one most similar to the original, is selected; this criterion replaces the one proposed by Cutler and Davis (2000). Synthetic signals are reconstructed from the dominant frequencies by keeping each peak in the frequency domain in complex form (real and imaginary parts) and computing its inverse FFT using Equation 3.2. Each synthetic signal is compared with the original, and the frequency of the most correlated one is selected as the wing beat frequency. For example, Figure 3.2(A) shows a sample signal formed from the hypotenuse metric, d(t), over time. This signal was transformed into the frequency domain using the FFT, and the stem plot of its peaks is shown in Figure 3.2(B). For illustration, the first five peaks of the FFT were reconstructed into synthetic signals, which are superimposed on the original signal in Figure 3.2(C). When these were compared with the original signal, the synthetic signal from peak 1 (blue) was found to be the most correlated and was therefore selected as the wing beat of the bat. This matches the ground-truth wing beat frequency.
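$$f_k = \sum_{i=1}^{n} \left| t_{0i} - t_{ki} \right|, \qquad k = 1, 2, 3, \ldots, m \tag{3.3}$$

where t_{0i} and t_{ki} are the i-th samples of the original signal and of the synthetic signal reconstructed from peak k, n is the number of samples and m is the number of peaks considered.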
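The diagonal similarity selection can be sketched as follows for one window. Several details are assumptions not stated in the text: the window mean is removed first (otherwise a DC offset dominates Equation 3.3 equally for every candidate), a "peak" is taken to be a local maximum of |F(k)| strictly between 0 and N/2, and the default of five peaks mirrors the illustration in Figure 3.2. Function names are hypothetical.

```python
import numpy as np


def peak_bins(mag, n_peaks):
    """Indices of the n_peaks largest local maxima of |F(k)| for 0 < k < N/2."""
    N = len(mag)
    ks = [k for k in range(1, (N + 1) // 2)
          if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > 0]
    return sorted(ks, key=lambda k: mag[k], reverse=True)[:n_peaks]


def reconstruct_peak(F, k):
    """Synthetic signal from one peak: keep bin k and its mirror N-k, then inverse FFT (Eq. 3.2)."""
    N = len(F)
    Y = np.zeros(N, dtype=complex)
    Y[k] = F[k]
    Y[N - k] = F[N - k]  # the conjugate mirror keeps the reconstruction real-valued
    return np.fft.ifft(Y).real


def diagonal_selection(x, fps, n_peaks=5):
    """Return (frequency in Hz, score) of the peak with the smallest Eq. 3.3 distance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # assumption: remove the DC offset before comparing
    F = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fps)
    best_k, best_score = None, np.inf
    for k in peak_bins(np.abs(F), n_peaks):
        score = np.abs(x - reconstruct_peak(F, k)).sum()  # sum over the matrix diagonal
        if score < best_score:
            best_k, best_score = k, score
    if best_k is None:
        return None, best_score
    return freqs[best_k], best_score
```

For a window containing a strong 10 Hz component and a weaker 23 Hz component, the 10 Hz reconstruction leaves the smaller residual and is therefore selected.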
For comparison, in the second technique the bounding box metrics are used to form self-similarity matrices, from which the FFT is computed. Here the self-similarity matrices are computed using absolute correlation (Equation 3.4). To determine the wing beat frequency, each column of the similarity matrix is linearly detrended and a Hanning window is applied, and the power spectrum of every column of the self-similarity matrix is computed. To make the estimate robust to outlying columns, the skewness of the column spectra is measured with Pearson's second skewness coefficient, 3(mean − median)/σ, and the spectra are either averaged or their median taken, depending on the skewness (Equation 3.5). The highest peak of P(f_k) is then selected to represent the wing beat frequency.
$$S_{t_1,t_2} = \sum_{i=1}^{n} \left| t_{1i} - t_{2i} \right| \tag{3.4}$$

$$P(f_k) = \begin{cases} \operatorname{Mean}\big(P(f_k)\big) & \text{if } \left| \dfrac{3(\text{mean} - \text{median})}{\sigma} \right| < 0.5 \\[4pt] \operatorname{Median}\big(P(f_k)\big) & \text{otherwise} \end{cases} \tag{3.5}$$

This approach uses the 1D bounding box metrics, which are computationally cheaper to process than the 2D images used by Cutler and Davis (2000). The diagonal similarity method (Equation 3.3) was used to select the peak even when the signal is buried in noise, whereas Cutler and Davis used Equation 2.1 to discriminate between periodic and non-periodic motion. The following example illustrates the diagonal similarity selection method using three synthetic signals.

Figure 3.2: (A) A sample signal formed from the hypotenuse metric, d(t), over time. (B) The FFT stem plot of the signal in (A), with each peak marked with its corresponding frequency. (C) The reconstructed signals for the first five peaks superimposed on the original signal (lime). The most correlated signal is peak 1, which matches the wing beat of the bat when compared with the ground truth.
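The self-similarity technique (Equations 3.4 and 3.5) can be sketched as below, assuming the per-frame metrics are supplied as a (T, n) array (or a 1D array for a single metric). Applying the skewness rule of Equation 3.5 separately at each frequency bin, across columns, is an interpretation rather than something the text states; the function names are hypothetical.

```python
import numpy as np


def self_similarity(features):
    """Eq. 3.4: S[t1, t2] = sum_i |x_i(t1) - x_i(t2)| over the n metrics of each frame."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)


def linear_detrend(y):
    """Subtract the least-squares straight line from y."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)


def self_similarity_frequency(features, fps):
    """Return (wing beat frequency in Hz, combined spectrum P_hat)."""
    S = self_similarity(features)
    T = S.shape[0]
    win = np.hanning(T)
    # Power spectrum of every detrended, Hanning-windowed column; shape (n_bins, T).
    P = np.stack([np.abs(np.fft.rfft(linear_detrend(S[:, j]) * win)) ** 2
                  for j in range(T)], axis=1)
    mean, median, sd = P.mean(axis=1), np.median(P, axis=1), P.std(axis=1)
    # Pearson's second skewness coefficient per bin (0 where the spread is zero).
    skew = np.divide(3 * (mean - median), sd, out=np.zeros_like(mean), where=sd > 0)
    P_hat = np.where(np.abs(skew) < 0.5, mean, median)  # Eq. 3.5
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    k = 1 + np.argmax(P_hat[1:])  # skip the DC bin
    return freqs[k], P_hat
```

Taking the median when the column spectra are strongly skewed limits the influence of columns whose absolute differences fold the motion onto a harmonic.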
Consider three numerical signals, of which T0 = {1, 4, 6, 8, 6, 4, 1, 4, 6, 8, 6, 4, 1} is the original signal. To determine which of the other two signals, T1 and T2, is more closely related to T0, the diagonal similarity selection approach (Equation 3.3) is applied. The diagonal similarity of T0 and T1 is 8 (see Figure 3.3(a)), which is smaller than that of T0 and T2 (22), so T1 is selected as the more similar signal. The similarity plot of T0 and T1 in Figure 3.3(b) shows patterns that can be used to determine the peak frequency.
Figure 3.3: (a) shows the similarity table of the synthetic signal and (b) shows the similarity plot resulting from the similarity table in (a).
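Because the values of T1 and T2 are not given here, the sketch below does not reproduce Figure 3.3; instead it applies the same pointwise absolute difference to T0 against itself to show the kind of pattern a similarity table exposes. Bands of zeros parallel to the main diagonal mark lags at which the signal repeats exactly, and for T0 the first such band lies at a lag of 6 samples. The helper name is hypothetical.

```python
import numpy as np

T0 = np.array([1, 4, 6, 8, 6, 4, 1, 4, 6, 8, 6, 4, 1])

# Pointwise table |T0[i] - T0[j]|; each entry is Eq. 3.4 with a single metric (n = 1).
table = np.abs(T0[:, None] - T0[None, :])


def period_from_table(table):
    """Smallest lag whose off-diagonal band is all zeros, i.e. where the signal repeats exactly."""
    n = table.shape[0]
    for lag in range(1, n):
        if not np.diagonal(table, offset=lag).any():
            return lag
    return None
```

At a frame rate of `fps` frames per second, a period of 6 samples corresponds to a frequency of `fps / 6` Hz.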