Appearance-based Object Classification - Automatic object classification for surveillance video

After analysing the challenges inherent in appearance-based object classification, two techniques were proposed to address the existing flaws: (i) the optimal multi-feature fusion algorithm and (ii) the biologically inspired classifier (refer to Sections 4.2.2 and 4.2.3, respectively). As a result, two appearance-based object classification (AOC) approaches are presented: (i) the Multi-Feature Appearance-based Object Classifier and (ii) the Biologically inspired Appearance-based Object Classifier. The former (refer to Section 4.4.1) analyses the benefits of constructing multi-feature descriptors that preserve the non-linearity of their different feature spaces to represent the semantic objects present in surveillance scenarios. The combined visual models are created using the Multi-Objective Optimisation technique, while the classification is based on SVM. The latter (refer to Section 4.4.2) studies the advantages offered by biologically inspired optimisation techniques for classifying objects based on multi-feature visual models and the neural structures provided by SOMs.

The remainder of this Section presents the two proposed Appearance-based Object Classifiers.

4.4.1 Multi-Feature Appearance-based Object Classifier (Multi-Feature AOC)

Typically, appearance has been studied by analysing individual features or descriptors, their advantages and disadvantages, and their performance under different circumstances and scenarios. The proposed approach is built upon the idea that multi-feature descriptors covering different and complementary visual properties provide a more sophisticated and robust object representation. Although authors often neglect the non-linearity of the different descriptor spaces and combine features in a linear manner, our approach builds object visual models by fusing multiple appearance descriptors, which exhibit non-linear behaviour and typically rely on different similarity metrics. The proposed Multi-Feature Appearance-based Object Classifier determines an optimal metric for fusing appearance features extracted in different feature spaces, considering their non-linearity and studying their influence on the object classification.

The Multi-Feature Appearance-based Object Classifier consists of two stages, namely online classification and offline training (refer to Figure 4.15). Moreover, it assumes that its input is the set of blobs extracted by the Motion Analysis and Object Extraction Component presented in Section 3.2.

The offline training mode is built upon the Multi-Objective Optimisation technique discussed in Section 4.2.3. Its aim is to train the system according to the surveillance object taxonomy, composed of two semantic concepts, Vehicle and Person⁸, and the four selected appearance features. Considering the features' different behaviour, metrics and nature, a weighted linear combination of the feature descriptor distances is proposed, resulting in the distance matrix and objective functions:

$$\mathrm{DistanceMatrix} = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{21} & d_{22} & d_{23} & d_{24} \end{pmatrix} \tag{4.32}$$

$$D^{(k)}(V^{(k)}, \bar{V}, A) = \sum_{l=1}^{2} \alpha_l \, d_l^{(k)}(\bar{v}_l, v_l^{(k)}), \tag{4.33}$$

where $d_l^{(k)}(\bar{v}_l, v_l^{(k)})$ is the distance between the blob's low-level feature descriptors and the semantic class centroid, and $A = \{\alpha_1, \alpha_2, \ldots, \alpha_L\}$ is the set of weighting coefficients to optimise.

⁸ The proposed surveillance taxonomy is built upon two semantic concepts, Vehicle and Person.
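The weighted combination in Eq. (4.33) can be illustrated with a minimal Python sketch. The function names, the toy descriptors and the choice of city-block and Euclidean distances are assumptions made for illustration only; the actual descriptors use their own similarity metrics, which are not reproduced here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def l1_distance(a: Vector, b: Vector) -> float:
    """City-block distance; a stand-in for one descriptor's own metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance; a stand-in for another descriptor's metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fused_distance(
    centroids: Sequence[Vector],
    blob: Sequence[Vector],
    metrics: Sequence[Metric],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-feature distances between a blob and one class centroid.

    Each feature keeps its own metric, so the descriptors are never
    concatenated into a single vector space; only the distances are combined.
    """
    if not (len(centroids) == len(blob) == len(metrics) == len(weights)):
        raise ValueError("one centroid, descriptor, metric and weight per feature")
    return sum(w * d(c, v) for w, d, c, v in zip(weights, metrics, centroids, blob))


# Hypothetical two-feature example with made-up descriptor values.
centroids = [[1.0, 1.0], [0.0, 0.0]]
blob = [[0.0, 1.0], [3.0, 4.0]]
score = fused_distance(centroids, blob, [l1_distance, l2_distance], [0.5, 0.5])
# 0.5 * 1.0 (city-block) + 0.5 * 5.0 (Euclidean) = 3.0
```

Evaluating `fused_distance` once per semantic class would give one objective value per class for a given weight set.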

Consequently, the weights $\alpha_l$ of the objective functions must be optimised in order to obtain a “trade-off solution” for the multi-feature descriptor. This optimisation consists of analysing a set of compromise solutions, the Pareto-optimal solutions, as explained in Section 4.2.3. Finally, the offline training stage produces a set of visual models for each semantic concept included in the surveillance taxonomy using the training dataset.
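As a rough illustration of what selecting Pareto-optimal solutions involves, the sketch below filters a list of candidate weight sets down to the non-dominated ones. It assumes that every objective is to be minimised and that each candidate carries one objective value per semantic concept; the candidate values are invented, and the optimisation procedure of Section 4.2.3 itself is not reproduced.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[Sequence[float], Sequence[float]]  # (weights, objective values)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates whose objective values no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# Hypothetical weight sets with made-up objective values (lower is better).
candidates = [
    ((0.1, 0.9), (1.0, 4.0)),
    ((0.5, 0.5), (2.0, 2.0)),
    ((0.9, 0.1), (4.0, 1.0)),
    ((0.3, 0.7), (3.0, 3.0)),  # dominated by the (2.0, 2.0) candidate
]
front = pareto_front(candidates)
```

The surviving candidates are the compromise solutions; choosing a single weight set from them is a separate decision that this sketch does not attempt.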

The online classification stage builds each blob's visual model as an appearance multi-feature descriptor. Each appearance descriptor is composed of four appearance features: DCD, CLD, CSD and EHD. These features were selected based on the results obtained in Section 4.3.1, and for their robustness, compact representation and significance for human perception. In order to preserve the non-linearity of each individual appearance feature, their fusion is achieved by linearly combining their distances to each semantic object's visual models, thereby respecting the individual feature spaces. This combination is optimised by applying different weights, $A^* = \{\alpha_1^*, \alpha_2^*, \alpha_3^*, \alpha_4^*\}$, to each term of the equation depending on its significance to the scenario under analysis, in our case, surveillance scenarios. The appearance multi-feature descriptors are built in the Optimised Object Classifier, where the differences between each appearance feature computed for a blob and the visual models computed in the offline training stage are calculated and scaled by the optimised weights obtained using the Multi-Objective Optimisation technique (refer to Figure 4.15). Finally, the Optimised Object Classifier performs categorisation using SVMs.
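One possible reading of this stage is sketched below: for each blob, the per-feature distances to every class model are scaled by the optimised weights and arranged as the rows of the distance matrix in Eq. (4.32), then flattened into a vector that a classifier such as an SVM could consume. The single shared city-block metric, the one-dimensional toy descriptors, the weight values and the flattening step are all assumptions for illustration; the SVM itself is omitted.

```python
from typing import Callable, Dict, List, Sequence

FEATURES = ("DCD", "CLD", "CSD", "EHD")
Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Placeholder metric; each real descriptor would use its own measure."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance_vector(
    blob: Dict[str, Vector],
    models: Dict[str, Dict[str, Vector]],
    metrics: Dict[str, Metric],
    weights: Dict[str, float],
) -> List[float]:
    """Flatten the weighted blob-to-model distances, one row per semantic class."""
    vector: List[float] = []
    for class_name in sorted(models):  # fixed class order: Person, then Vehicle
        for name in FEATURES:
            distance = metrics[name](models[class_name][name], blob[name])
            vector.append(weights[name] * distance)
    return vector


# Made-up descriptors, models and weights, purely for illustration.
blob = {"DCD": [1.0], "CLD": [2.0], "CSD": [3.0], "EHD": [4.0]}
models = {
    "Person": {"DCD": [0.0], "CLD": [0.0], "CSD": [0.0], "EHD": [0.0]},
    "Vehicle": {"DCD": [1.0], "CLD": [2.0], "CSD": [3.0], "EHD": [5.0]},
}
metrics = {name: l1_distance for name in FEATURES}
weights = {"DCD": 0.5, "CLD": 0.25, "CSD": 0.125, "EHD": 0.125}
features = weighted_distance_vector(blob, models, metrics, weights)
```

In this toy case the blob lies much closer to the Vehicle model, which shows up as near-zero entries in the second half of the vector.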

4.4.2 Biologically inspired Appearance-based Object Classifier (Biologically inspired AOC)

The Biologically inspired Appearance-based Object Classifier focuses on two objectives. The first is to achieve an optimal combination of appearance features suitable for surveillance object classification. Like the previous approach, it is built on the premise that multi-feature descriptors enable a more complex and robust representation, providing a higher level of distinguishability while preserving the non-linearity attached to each feature space. The second is to study recent developments in optimisation techniques based on the problem-solving abilities of biological organisms, such as bird flocks or fish schools, for surveillance object classification.

The Biologically inspired Appearance-based Object Classifier, presented in Figure 4.16, classifies the blobs provided by the Motion Analysis and Object Extraction component, which extracts the moving objects from the surveillance videos under analysis (refer to Section 3.2). The proposed approach consists of three stages, namely Training Phase, Appearance Feature Extraction and Multi-Feature Particle Swarm Classifier.

The Training Phase is based on the MOO technique (refer to Section 4.2.3) and addresses the challenge of optimally combining appearance features while preserving their individual properties and non-linearity. The training stage produces a set of visual models for each semantic concept included in the surveillance taxonomy using the training dataset. The visual models are calculated by applying MOO, which linearly combines the distances between the visual descriptors defining the object to classify and the visual models built during the training phase. Considering that the surveillance taxonomy includes two semantic objects, Vehicle/Car and Person, the objective functions calculated in the MOO are defined as:

$$D^{(k)}(V^{(k)}, \bar{V}, A) = \sum_{l=1}^{2} \alpha_l \, d_l^{(k)}(\bar{v}_l, v_l^{(k)}), \tag{4.34}$$

where $k$ indicates the blob under analysis, $d_l^{(k)}$ is the distance between the blob's low-level feature descriptor and the centroids, $\bar{V} = \{\bar{v}_1, \bar{v}_2, \bar{v}_3, \bar{v}_4\}$, and $A = \{\alpha_1, \alpha_2, \ldots, \alpha_L\}$ is the set of weighting coefficients to optimise. A set of compromise solutions, known as Pareto-optimal solutions, is generated to calculate the weights that optimise the linear combination of appearance features, $A^* = \{\alpha_1^*, \alpha_2^*, \alpha_3^*, \alpha_4^*\}$.

The Appearance Feature Extraction stage, based on the experimental results obtained in the visual analysis study presented in Section 4.3.1, extracts a set of appearance features, composed of DCD, CLD, CSD and EHD, to build visual models based on the optimal combination of appearance features. In order to preserve the non-linearity of each individual appearance feature, their fusion is achieved by linearly combining them. This combination is optimised by applying different weights, previously calculated in the training phase, $A^* = \{\alpha_1^*, \alpha_2^*, \alpha_3^*, \alpha_4^*\}$, to each term of the equation depending on its significance to the scenario under analysis, in our case, surveillance scenarios.

Finally, the Multi-Feature Particle Swarm Classifier is based on evolutionary computation models that mimic the behaviour of fish schools or bird flocks. This stage is built upon the Particle Swarm Classifier (refer to Section 4.2.2). The proposed classifier is implemented for a multi-descriptor space, and its performance is influenced by the weights derived for the non-linear optimal combination of appearance feature spaces. The proposed classifier exploits SOM neural structures to represent high-dimensional patterns and the abilities of biological algorithms to optimise the classification performance.
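For readers unfamiliar with particle swarm optimisation, the following is a minimal, generic sketch of the swarm update rule. It is not the Multi-Feature Particle Swarm Classifier itself, whose coupling with SOMs is described in Section 4.2.2 and not reproduced here. The parameter values, the search bounds and the toy objective are assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimise(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    inertia: float = 0.7,
    cognitive: float = 1.5,
    social: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Generic global-best PSO: returns the best position found and its score."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle towards its own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i] = positions[i][:]
                personal_score[i] = score
                if score < global_score:
                    global_best = positions[i][:]
                    global_score = score
    return global_best, global_score


def toy_objective(p: Sequence[float]) -> float:
    """Squared distance to an arbitrary target point (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best_position, best_score = pso_minimise(toy_objective, dim=2)
```

In a classification setting the objective would instead measure how well a particle's position matches the weighted multi-feature descriptors, a design choice left to Section 4.2.2.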
