From Hand-crafted Features to Automatic Features

5.2 The Starting Point: Offline Learning

5.2.1 From Hand-crafted Features to Automatic Features

The computation of image features is an integral part of learning and classifying gestures from the segmented gesture images (see Figure A.1). If swarm robotic systems are to learn gestures in real time (i.e., instantly), efficient strategies for online feature computation need to be in place. We investigate techniques that effectively compute image features for gesture recognition and efficiently represent the commands defined in the gesture vocabulary: spatial pointing gestures (see Figure 2.4(a)), gestures for performing potential SAR tasks (see Figure 2.4(b)), and finger count gestures (see Figure 2.6).

Figure 5.7. Hand-crafted properties computed from a segmented hand mask. (a) Properties such as length, height, centroid, orientation, and major and minor axes. (b) Properties such as the convex hull, convexity defects and perimeter.

5.2.1.1 Hand-crafted Features

Although many feature computation methods for vision-based gesture recognition exist, we adopt the most familiar approaches that compute meaningful and discriminative properties from the segmented hand masks (see Figure A.1). As features need to robustly represent all the commands in the gesture vocabulary (see Section 2.2.1), we compute hand-crafted features from the segmented hand masks (gestures): shape and blob properties [MATLAB R2014b documentation; Linan, 2007], geometrical characteristics and image moments [OpenCV 3.0.0-dev documentation]. These hand-crafted features have gained importance due to their powerful gesture classification performance [Savaris and Wangenheim, 2008; Das et al., 2010; Trigo and Pellegrino, 2010; Cox and Budhu, 2008; van der Werff and van der Meer, 2008].
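To make the link between image moments and shape properties concrete, the following is a minimal sketch, not the implementation used in this work. It assumes the hand mask is given as a list of rows of 0/1 values, and the function name `mask_properties` and the returned keys are hypothetical. The axis lengths are those of the ellipse with the same normalised second central moments as the mask, which is one common convention among several.

```python
import math


def mask_properties(mask):
    """Moment-based properties of a binary mask (list of rows, 1 = hand pixel).

    Illustrative only: names and conventions are assumptions, not the
    thesis implementation.
    """
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    area = len(pixels)  # zeroth-order moment of a binary image
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid from the first-order moments.
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area

    # Second-order central moments, normalised by the area.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / area
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / area
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / area

    # Orientation of the major axis, in radians, in image coordinates
    # (x to the right, y downwards).
    orientation = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

    # Eigenvalues of the 2x2 matrix [[mu20, mu11], [mu11, mu02]].
    spread = math.sqrt(4.0 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_major = (mu20 + mu02 + spread) / 2.0
    lam_minor = max((mu20 + mu02 - spread) / 2.0, 0.0)

    # Axis-aligned bounding box and the fraction of it covered by the mask.
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    length = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1

    return {
        "area": area,
        "centroid": (cx, cy),
        "orientation": orientation,
        "major_axis": 4.0 * math.sqrt(lam_major),
        "minor_axis": 4.0 * math.sqrt(lam_minor),
        "length": length,
        "height": height,
        "extent_bounding_box": area / (length * height),
    }
```

Properties such as the convex hull and convexity defects require contour processing and are not covered by this sketch.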

A few hand-crafted properties computed from a segmented hand mask are visualized in Figure 5.7. Image moments {u00, u01, ..., u20, u02} are used to calculate properties such as the length, height, area, centroid, orientation, and major and minor axes, as shown in Figure 5.7(a). Figure 5.7(b) illustrates properties such as the convex hull, convexity defects and perimeter. From the above-mentioned hand-crafted properties (i.e., shape, blob and geometrical characteristics), we select a set of F = 110 features that can be efficiently computed from segmented hand masks. However, using a relatively large number of features is redundant and counter-productive, because:

• Classifiers yield better accuracy (and faster predictions) if only a few, highly discriminative features are used. This is especially true for datasets with a high-dimensional feature space, namely too many features.

• Computing fewer features is faster than techniques that require training by building a large dataset (e.g., offline methods).

• With fewer features per sample, more training samples can be disseminated (spread) throughout the robot swarm in bandwidth-limited scenarios.

Feature selection strategies aim to reduce the dimensionality of the feature space by selecting the subset of features that have the highest importance in a supervised classification problem. To select an optimal subset of features from the given set of F = 110 features, we adopt the Principal Component Analysis (PCA) technique and the Ranker method in WEKA [Hall et al., 2009] (see Section 6.7.3), which assess the quality of the features and rank them with respect to their individual and mutual discriminative powers and their contribution to the multi-class classification problem. The top 20 hand-crafted feature properties (i.e., the 20 features with the highest ranks) are reported in Table 5.1 together with their measured PCA scores.
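As a rough illustration of PCA-based feature ranking, the sketch below scores each feature by the magnitude of its loading on the first principal component, found with power iteration on the sample covariance matrix. This is a swapped-in simplification and an assumption on our part: it is not WEKA's PCA and Ranker procedure, and it does not reproduce the scores in Table 5.1. The function name `rank_features_by_first_pc` and its parameters are hypothetical.

```python
import math


def rank_features_by_first_pc(samples, iterations=200):
    """Rank features by |loading| on the first principal component.

    samples: list of equal-length feature lists (one list per sample).
    Returns (order, scores), where order lists feature indices from the
    highest to the lowest score. Illustrative only; not WEKA's method.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    f = len(samples[0])

    # Mean-centre every feature.
    means = [sum(s[j] for s in samples) / n for j in range(f)]
    centred = [[s[j] - means[j] for j in range(f)] for s in samples]

    # Sample covariance matrix (f x f).
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1)
            for b in range(f)] for a in range(f)]

    # Power iteration for the leading eigenvector. The non-uniform start
    # vector is an arbitrary choice that makes an exactly orthogonal start
    # unlikely, but does not rule it out.
    v = [1.0 / (j + 1) for j in range(f)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(f)) for a in range(f)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all features are constant
        v = [x / norm for x in w]

    scores = [abs(x) for x in v]
    order = sorted(range(f), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Keeping the first 20 entries of `order` would correspond to retaining a top-20 subset. In practice the features would usually be standardised first so that a feature with a large numerical range does not dominate the ranking.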

After feature selection is performed, the reduced subset of F < 110 features is termed a feature vector x̄, also known as a feature descriptor. As the gesture vocabulary comprises one- and two-handed gesture commands (see Figure 2.4), for one-handed gestures a one-dimensional feature vector is computed which consists of F numerical elements. In the case of two-handed gestures, a single feature vector of F elements is computed from each of the two segmented hand masks, after which the resulting two feature vectors are concatenated end-to-end to form a feature vector of F × 2 elements.
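The construction of the descriptor for one- and two-handed gestures can be sketched as follows. The function name `gesture_descriptor` is hypothetical, and the order in which the two hands are concatenated is an assumption, since the text only states that the vectors are joined end-to-end.

```python
def gesture_descriptor(first_hand, second_hand=None):
    """Build the feature vector for a one- or two-handed gesture.

    first_hand, second_hand: per-hand feature lists of F elements each.
    Returns F elements for one hand, or 2F elements for two hands.
    The hand order is an illustrative assumption.
    """
    if second_hand is None:
        return list(first_hand)
    if len(first_hand) != len(second_hand):
        raise ValueError("both hands must yield the same number of features")
    # Concatenate the two per-hand vectors end-to-end.
    return list(first_hand) + list(second_hand)
```

For example, with F = 20 selected features per hand, a two-handed gesture yields a descriptor of 40 elements.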

Table 5.1. The top 20 hand-crafted feature properties selected using the PCA and Ranker methods in WEKA [Hall et al., 2009].

Rank  Feature                                                   PCA Score
1     Solidity                                                  0.594946
2     Extent of minimum enclosing circle                        0.482678
3     Extent of minimum area rectangle                          0.399841
4     Extent of enclosing bounding box                          0.328516
5     Minimum y-axis co-ordinate                                0.272821
6     Extent of ellipse                                         0.236379
7     Formfactor                                                0.210368
8     Roundness                                                 0.186468
9     Compactness                                               0.165338
10    Solidity of circle                                        0.151234
11    Y-axis centroid co-ordinate of minimum enclosing ellipse  0.137960
12    Y-axis centroid co-ordinate of minimum enclosing circle   0.126662
13    Y-axis centroid co-ordinate of minimum area rectangle     0.115440
14    Y-axis centroid co-ordinate                               0.105492
15    Y-axis centroid co-ordinate of enclosing bounding box     0.096098
16    Solidity of ellipse                                       0.086963
17    Minimum y-axis co-ordinate at maximum x-axis co-ordinate  0.078208
18    Minimum y-axis co-ordinate at minimum x-axis co-ordinate  0.070038
19    Sphericity                                                0.062963
20    Solidity of rectangle                                     0.056212

5.2.1.2 Automatic Online Feature Computation

The main disadvantage of hand-crafted methods is that hand-crafted features cannot be guaranteed to best represent the segmented hand masks (gestures). To overcome this critical issue, in [Nagi et al., 2014d] we introduce Convolutional Max-Pooling (CMP), a novel approach for automatic online computation of features, as illustrated in Figure 5.8. Inspired by the alternating convolution and max-pooling layers of the MPCNN [Nagi et al., 2011], the CMP is a two-layer feedforward network which does not make use of any training mechanism (i.e., features are computed independently and irrespective of the gesture class). Refer to [Nagi et al., 2014d] for details of the CMP feature computation method.

Compared to hand-crafted features, the online features computed using the CMP provide a better numerical representation of the segmented hand masks and improve classification performance. This is because, for binary (black and white) images, hand-crafted features use only the segmented mask (i.e., only the white pixels in the image) for feature computation, while the CMP uses all pixels (i.e., black and white pixels) and performs convolution and pooling operations.
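The idea of an untrained convolution layer followed by a max-pooling layer can be sketched as below. This is a simplified illustration under stated assumptions, not the CMP implementation of [Nagi et al., 2014d]: the kernel values, the window sizes and the function names are hypothetical, pooling windows are assumed not to overlap, and the Gabor filtering and resizing steps shown in Figure 5.8 are omitted.

```python
def convolve2d_valid(image, kernel):
    """2-D convolution without padding on a list-of-rows image."""
    kh, kw = len(kernel), len(kernel[0])
    # Flip the kernel in both directions (convolution, not correlation).
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * flipped[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(image, mx, my):
    """Non-overlapping max-pooling with a window of mx columns by my rows."""
    pooled = []
    for i in range(0, len(image) - my + 1, my):
        row = []
        for j in range(0, len(image[0]) - mx + 1, mx):
            row.append(max(image[i + a][j + b]
                           for a in range(my) for b in range(mx)))
        pooled.append(row)
    return pooled


def cmp_descriptor(mask, kernel, mx, my):
    """Fixed-kernel convolution, then max-pooling, flattened to a vector.

    No weights are learned: the kernel is supplied by the caller.
    """
    convolved = convolve2d_valid(mask, kernel)
    pooled = max_pool(convolved, mx, my)
    return [value for row in pooled for value in row]


# Example: a 6x6 mask with a 3x3 white block, a 3x3 all-ones kernel
# (an arbitrary choice) and 2x2 pooling give a 4-element descriptor.
example_mask = [[1 if (r < 3 and c < 3) else 0 for c in range(6)]
                for r in range(6)]
example_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
example_descriptor = cmp_descriptor(example_mask, example_kernel, 2, 2)
```

Because every window position is visited, black pixels contribute to the convolved values alongside white ones, in contrast to properties computed from the white region alone.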

[Figure 5.8 diagram. Recoverable labels: Segmented Image; Gabor Filters; Resize; Input Layer; Layer 1; Layer 2; Output Layer; Convolutional Kernel (Cx, Cy); Convolved Image; Max-Pooling Kernel (Mx, My); Pooled Image; Feature Descriptor (xi) with elements 1 ... F×F; Online Incremental Multi-class Classification; Repeat for all sub-regions.]

Figure 5.8. Online feature computation using the Convolutional Max-Pooling (CMP) approach, a two-layer feedforward network [Nagi et al., 2014d].

In document Symbiotic interaction between humans and robot swarms (Page 122-125)