Background Subtraction - Relevant Research

2.2 Relevant Research

2.2.6 Background Subtraction

The techniques discussed in this section are all mono-modal (pixel-wise) background subtraction techniques [37] [137] [125] [156]. Pixel-wise techniques treat each pixel independently of all the others. This may be a rash assumption (because there may be underlying pixel interdependency that pixel-wise techniques will not detect), but it does lend itself to some very fast techniques that can be optimized using multithreaded processing. All the approaches considered here perform well (produce tangible results) in static observation environments only. Dynamic observations are much more complex; background subtraction techniques do not fare well and tend to produce false detections. Extra techniques, commonly grouped as motion estimation techniques, are used to compensate for dynamic observation platforms. Other, non-pixel-wise techniques can use texture- or edge-based detection, which exploits local spatial information to extract structural information. Noriega [105] divides the scene into overlapping square patches for detection (the overlapping is a non-mono-modal approach), whereas Heikkila [52] describes a model of local texture characteristics and uses fixed circular regions of pixels for comparison. Another style of approach is sampling based, which evaluates a wide local area around each pixel to perform complex analysis. A spatial sampling mechanism is employed by Cristiani [29], using pixel-region mixing. Barnich [12] uses spatial neighbourhood sampling to refine per-pixel estimates and is loosely based on a Parzen windows process. These approaches tend to be processor intensive and do not lend themselves to efficient multithreaded implementations because of the need to compare pixels across the frame. A popular pixel-wise technique (Kernel Density Estimation), which is not a real-time technique, is introduced for comparison with the approaches discussed.

Kernel Density Estimation (KDE)

More recently, a probability density estimation technique has been proposed in Kernel Density Estimation (KDE) [37]. This technique is not real-time; however, it is an important consideration as it is a common offline method for background subtraction. It is also non-parametric once it has been initialized, which is especially important for autonomous algorithms. The technique does, however, require external input from a user or device at initialization, which limits the initial autonomous capability and opens the model to subjectivity. The KDE technique estimates the probability density function of each pixel based on a number of consecutive frames (the number of frames, or 'window', is fixed throughout the operation of the algorithm). The probability density function (PDF) of each pixel is calculated for the defined window of frames using a Gaussian kernel, as shown in eq (2.14).

$$P(x_t) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{1}{2}\frac{(x_{t,j} - x_{i,j})^2}{\sigma_j^2}\right) \tag{2.14}$$

Where $x_t$ is a $d$-dimensional colour feature, $x_i$ is the value of this colour feature in the $i$th of the $N$ frames in the window, and $\sigma_j$ is the bandwidth or standard deviation in the $j$th dimension. Each pixel in the current frame is compared with the PDF; if the pixel is sufficiently different from the mean of the probability density function, it is considered to be foreground; otherwise the pixel is considered to be background. The threshold (sigma multiple) used to determine whether a pixel is sufficiently different, and therefore foreground, must be pre-selected as part of the initialization.

An important consideration for KDE is the selection of the kernel bandwidth (scale). If the bandwidth is too narrow, false foreground detections become a problem because of the ragged density estimate for the pixel; if it is too wide, the density estimate will be overly smooth, leading to missed detections. In Elgammal et al. [37] the bandwidth is defined autonomously for each pixel and is adaptive throughout the operation. The bandwidth is estimated from the deviations between consecutive intensity values of a pixel: in most cases it can be assumed that the two values come from the same local-in-time distribution (as only very few pixel intensity pairs are expected to come from different distributions). If the local-in-time distribution is assumed to be Gaussian, $N(\mu, \sigma^2)$, the deviation distribution $(x_i - x_{i+1})$ is also Gaussian, $N(0, 2\sigma^2)$. For a symmetric distribution, the median of the absolute deviations, $m$, satisfies eq (2.15).

$$\Pr\left(N(0, 2\sigma^2) > m\right) = 0.25 \tag{2.15}$$

Thus the bandwidth of the distribution can be estimated in eq (2.16).

$$\sigma = \frac{m}{0.68\sqrt{2}} \tag{2.16}$$

Where $m$ is the median of the absolute deviations between consecutive frames in the colour space, and $\sigma$ is the bandwidth or standard deviation.

The approach can be extended to include "Probabilistic suppression of False Detections" [37], which considers pixels neighbouring the pixel currently being analysed. This increases the robustness to noise (e.g. leaf fluttering), but also increases the processing time required for each pixel. As this process requires analysing neighbouring pixels, it limits the effectiveness of multi-threaded implementations. This review focuses specifically on pixel-wise approaches, and consideration of neighbouring pixels or local region approaches is beyond the scope of the investigation.

The approach makes some assumptions about the real world. The distribution of colour (or other feature) for each pixel is modelled with a Gaussian, and this assumption increases the susceptibility of the model to false detections and noise, because real-world features are not necessarily Gaussian distributed. Another assumption made by this method is that the background is sufficiently static to avoid being considered as foreground; however, rapid illumination changes or leaves blowing in the breeze can introduce noise or false detections.

When considering real-time applications there are drawbacks to this technique. Most importantly, the model will not run in real time because of the window of frames that must be read in order to generate the probability density for each pixel. If the window is moved forward in an overlapping manner as each new frame is received, the approach can get closer to true real-time operation. The approach also has a high memory cost (because of the number of frames that must be remembered).
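The KDE steps above can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions and not the implementation of [37]: the function names, the small bandwidth floor and the decision rule (thresholding the estimated density directly, with a hypothetical `threshold` value) are choices made for this sketch only.

```python
import math
from statistics import median


def estimate_bandwidth(samples):
    """Per-channel bandwidth from eq (2.16): sigma = m / (0.68 * sqrt(2)),
    where m is the median absolute deviation between consecutive samples."""
    dims = len(samples[0])
    sigmas = []
    for j in range(dims):
        deviations = [abs(samples[i + 1][j] - samples[i][j])
                      for i in range(len(samples) - 1)]
        m = median(deviations)
        # Assumption of this sketch: floor the bandwidth so that a perfectly
        # constant pixel does not produce sigma = 0.
        sigmas.append(max(m / (0.68 * math.sqrt(2)), 1e-3))
    return sigmas


def kde_probability(x_t, samples, sigmas):
    """Eq (2.14): mean over the N window samples of a product of d
    one-dimensional Gaussian kernels."""
    total = 0.0
    for x_i in samples:
        kernel = 1.0
        for j, sigma in enumerate(sigmas):
            diff = x_t[j] - x_i[j]
            kernel *= (math.exp(-0.5 * diff * diff / (sigma * sigma))
                       / math.sqrt(2.0 * math.pi * sigma * sigma))
        total += kernel
    return total / len(samples)


def is_foreground(x_t, samples, sigmas, threshold):
    """Assumed decision rule: a low estimated density means foreground."""
    return kde_probability(x_t, samples, sigmas) < threshold


# Usage: a window of six grey-level samples for one pixel (illustrative values).
window = [(100.0,), (102.0,), (99.0,), (101.0,), (100.0,), (103.0,)]
bandwidths = estimate_bandwidth(window)
print(kde_probability((100.0,), window, bandwidths))
print(is_foreground((200.0,), window, bandwidths, threshold=1e-3))
```

Because every new pixel value is compared against all N stored samples, both the memory cost and the per-pixel work grow with the window length, which is the drawback noted above.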

Gaussian Mixture Models (GMM)

Although proposed before KDE, adaptive background mixture models allow real-time analysis of a video stream by using multiple Gaussian kernels [137] to represent the colour distribution of each pixel. Each pixel is assigned to one of the Gaussian probability density functions (the number of PDFs is defined at initialization) depending on how closely the pixel properties match the PDF. The number of functions used to describe a pixel determines how robust the technique is with busy or multi-modal scenes. Typically 3 to 5 Gaussian functions are used to describe background and foreground pixels, but generally this is problem specific (more would be defined for a motorway than for a green field, for example). As the number of functions used to represent each pixel is increased, the required processing also increases, which can affect the real-time capability of the approach. This technique is useful when there is a multimodal background, with the multiple Gaussians able to represent several different modes of a pixel. In a very busy scene the detection performance of the approach decreases because the number of Gaussians used is insufficient to represent each mode of the pixel. This can be improved by increasing the number of Gaussian representations at the expense of processing and memory requirements. Using a recursive method, the Gaussian functions are updated in real time, removing the need to remember every point of the history and a window of frames; the Gaussian function that the pixel matches most closely is updated with the current pixel value and, once updated, the pixel value is discarded. The probability of observing the current pixel value under the mixture is given in eq (2.17).

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t} \, \eta(X_t, \mu_{i,t}, \Sigma_{i,t}) \tag{2.17}$$

Where $X_t$ is the current data sample, $K$ is the number of distributions, $\omega_{i,t}$ is an estimate of the weight (what portion of the data is accounted for by this Gaussian) of the $i$th Gaussian at time $t$, $\mu_{i,t}$ is the mean value of the $i$th Gaussian in the mixture at time $t$, $\Sigma_{i,t}$ is the covariance matrix of the $i$th Gaussian at time $t$, and $\eta$ is the probability density function defined in eq (2.18).

$$\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X_t - \mu)^T \Sigma^{-1} (X_t - \mu)\right) \tag{2.18}$$

To reduce memory use and computation time, the covariance matrix is assumed to be of the form in eq (2.19).

$$\Sigma_{i,t} = \sigma_i^2 I \tag{2.19}$$

This assumes that the feature variables are independent and have the same variance. These assumptions are not necessarily valid in the real world, but the approach avoids processing-intensive matrix inversions at the expense of accuracy. The weights of the distributions are updated according to eq (2.20).

$$\omega_{i,t} = (1 - \alpha)\,\omega_{i,t-1} + \alpha\,M_{i,t} \tag{2.20}$$

Where $\alpha$ is the learning constant and $M_{i,t}$ is defined as 1 for the Gaussian that was matched and 0 for the remaining functions. The Gaussian Mixture Model (GMM) [137] is a parametric technique requiring both the learning constant and the sigma threshold to be pre-defined at initialization. The sigma threshold for assigning a match to a Gaussian distribution is (according to [137]) normally set to 2.5. Parameters for unmatched distributions are not changed. The matching distribution is updated with the new observation using eq (2.18). When a match is not found for any of the distributions, the least likely distribution is discarded and a new distribution is introduced with the current pixel value as its mean. The technique was improved by [61] to enable shadow detection, and the approach was later optimized by [160] to increase robustness.
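The matching and update cycle described above can be sketched for one grey-level pixel as follows. It is an illustrative sketch rather than the implementation of [137]. Several details are assumptions made here because the text does not specify them: the class and parameter names, the initial variance and weight given to a new distribution, the variance floor, the use of the learning constant itself as the update rate for the matched mean and variance, and ranking the "least likely" distribution by weight divided by standard deviation.

```python
import math


class PixelMixture:
    """Adaptive Gaussian mixture for a single grey-level pixel (sketch)."""

    def __init__(self, max_components=3, alpha=0.05, match_sigma=2.5,
                 initial_variance=100.0, initial_weight=0.05,
                 min_variance=1.0):
        self.max_components = max_components
        self.alpha = alpha                  # learning constant in eq (2.20)
        self.match_sigma = match_sigma      # sigma threshold, 2.5 in the text
        self.initial_variance = initial_variance  # assumed value
        self.initial_weight = initial_weight      # assumed value
        self.min_variance = min_variance          # assumed floor
        self.weights, self.means, self.variances = [], [], []

    def _find_match(self, x):
        """Index of the closest Gaussian within match_sigma, or None."""
        best, best_dist = None, self.match_sigma
        for i, (mu, var) in enumerate(zip(self.means, self.variances)):
            dist = abs(x - mu) / math.sqrt(var)
            if dist <= best_dist:
                best, best_dist = i, dist
        return best

    def update(self, x):
        """Process one pixel value; return the matched index or None."""
        match = self._find_match(x)
        # Eq (2.20): M is 1 for the matched Gaussian and 0 for the others.
        for i in range(len(self.weights)):
            m = 1.0 if i == match else 0.0
            self.weights[i] = (1.0 - self.alpha) * self.weights[i] + self.alpha * m
        if match is not None:
            # Assumed simplification: update rate rho equals alpha.
            rho = self.alpha
            mu = (1.0 - rho) * self.means[match] + rho * x
            var = (1.0 - rho) * self.variances[match] + rho * (x - mu) ** 2
            self.means[match] = mu
            self.variances[match] = max(var, self.min_variance)
        elif len(self.weights) < self.max_components:
            self.weights.append(self.initial_weight)
            self.means.append(x)
            self.variances.append(self.initial_variance)
        else:
            # Discard the least likely Gaussian (assumed: lowest weight / sigma).
            worst = min(range(len(self.weights)),
                        key=lambda i: self.weights[i] / math.sqrt(self.variances[i]))
            self.weights[worst] = self.initial_weight
            self.means[worst] = x
            self.variances[worst] = self.initial_variance
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        return match


# Usage: a stable background value followed by a sudden change.
pixel = PixelMixture()
for value in [100.0, 101.0, 99.0, 100.0, 200.0]:
    print(value, pixel.update(value))
```

Only the weights, means and variances of the K distributions are stored per pixel, so no window of frames has to be kept, in contrast to KDE.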

Recursive Density Estimation (RDE)

As a departure from the probabilistic methods, RDE introduces a new approach to background subtraction [125] [7] [5] [116]. There is no prior assumption about the underlying distribution of a pixel's feature value. The approach calculates how near (dense) a pixel value is to all the previous values of that pixel. The pixel history is stored as the mean and standard deviation of the pixel over all previous frames, both of which are updated recursively. A Cauchy type kernel is used to calculate the density of the current pixel compared with the history, as in eq (2.21) [5]:

$$D = \frac{1}{1 + \|x_t - \mu_t\|^2 + X_t - \|\mu_t\|^2} \tag{2.21}$$

Where $x_t$ is the current data sample, $\mu_t$ is the mean of the data samples read so far, and $X_t$ is the mean scalar product (squared norm) of those data samples. Both the mean and the scalar product can be updated recursively, as shown in eq (2.22) and (2.23) [5].

$$\mu_t = \frac{t-1}{t}\,\mu_{t-1} + \frac{1}{t}\,x_t; \quad \mu_1 = x_1 \tag{2.22}$$

$$X_t = \frac{t-1}{t}\,X_{t-1} + \frac{1}{t}\,\|x_t\|^2; \quad X_1 = \|x_1\|^2 \tag{2.23}$$

Where $t$ is the number of frames read, including the current frame. If there is no change in the scene, the pixel density does not change, and the pixel is therefore considered to be background. When there is a change in the scene, the proximity of the pixel value in the current frame to all previous frames (mean and standard deviation) changes. If this change is significant enough (a large enough difference in value), the pixel is considered to be foreground. The threshold for the difference is defined using the standard deviation (sigma) of all previous frames. Usually a threshold of 2 or 3 sigma is used; increasing the sigma multiple reduces the sensitivity to change in the scene and so reduces the number of false detections, but if the value is too high the system will start to miss detections. RDE is a real-time, recursive technique which is highly computationally efficient. As an aside, the accuracy of RDE (given the variable nature of real-world environments) could be improved by using a semi-supervised approach in which the sigma value is updated on an ad hoc basis.
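The recursive updates in eq (2.22) and (2.23) and the density in eq (2.21) can be sketched for a single pixel as follows. This is a minimal illustration, not the implementation of [5]. The class name and the exact foreground test are assumptions: here a pixel is flagged when its distance from the mean exceeds `k_sigma` standard deviations, with the variance taken as $X_t - \|\mu_t\|^2$, which is one possible reading of the sigma threshold described above.

```python
import math


class PixelRDE:
    """Recursive density estimation for one pixel (sketch)."""

    def __init__(self, k_sigma=3.0):
        self.k_sigma = k_sigma   # sigma multiple; 2 or 3 in the text
        self.t = 0               # number of frames read so far
        self.mean = []           # mu_t, eq (2.22)
        self.scalar = 0.0        # X_t, eq (2.23)

    def update(self, x):
        """Read one sample; return (density, is_foreground)."""
        self.t += 1
        t = self.t
        sq_norm = sum(v * v for v in x)
        if t == 1:
            self.mean = list(x)
            self.scalar = sq_norm
        else:
            self.mean = [(t - 1) / t * m + v / t
                         for m, v in zip(self.mean, x)]
            self.scalar = (t - 1) / t * self.scalar + sq_norm / t
        dist_sq = sum((v - m) ** 2 for v, m in zip(x, self.mean))
        # X_t - ||mu_t||^2 is the variance of the history; clamped at zero
        # here to guard against floating-point round-off.
        variance = max(self.scalar - sum(m * m for m in self.mean), 0.0)
        density = 1.0 / (1.0 + dist_sq + variance)   # eq (2.21)
        # Assumed decision rule for this sketch (see lead-in).
        is_foreground = math.sqrt(dist_sq) > self.k_sigma * math.sqrt(variance)
        return density, is_foreground


# Usage: a steady grey level with small noise, then a sudden change.
pixel = PixelRDE(k_sigma=3.0)
for i in range(30):
    pixel.update((99.0,) if i % 2 == 0 else (101.0,))
print(pixel.update((100.0,)))
print(pixel.update((150.0,)))
```

Only the frame count, the mean and the scalar product are stored per pixel, which is why the technique needs no window of frames and is inexpensive to compute.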

In document Autonomous real time object detection and identification (Page 44-48)