
5.4 Training and Testing Phases

5.4.1 Training Phase

The training set consisted of two individuals, one male and one female, performing the 14 SASL signs described in Table 5.6. These are the same signs used by Achmed.

The procedure used to train the SVM is depicted in Figure 5.9.

Each frame of each video in the training set was processed using the full feature extraction procedure described previously. The procedure yields an image of 40 × 30 pixels for each frame, and a data file is created from the pixel values of that image. Each pixel in the image is taken as a feature vector and is assigned an index, as illustrated in Figure 5.10. An image of 40 × 30 pixels therefore contains a total of 1200 feature vectors. In Figure 5.10, the first feature vector has an index of 1 and a value of 0, and the last feature vector has an index of 1200 and a value of 1.
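The indexing scheme described above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names `image_to_features` and `format_features`, the row-by-row ordering of pixels, and the `index:value` text layout (borrowed from LibSVM's sparse data format) are assumptions. The source states only that each pixel receives an index from 1 to 1200.

```python
WIDTH, HEIGHT = 40, 30  # image size stated in the text


def image_to_features(image):
    """Flatten a HEIGHT x WIDTH image (a list of rows) into (index, value) pairs.

    Indices start at 1 and advance row by row (an assumed ordering).
    """
    if len(image) != HEIGHT or any(len(row) != WIDTH for row in image):
        raise ValueError("expected an image of 30 rows by 40 columns")
    features = []
    index = 1
    for row in image:
        for value in row:
            features.append((index, value))
            index += 1
    return features


def format_features(features):
    """Render the pairs as 'index:value' tokens separated by spaces (assumed layout)."""
    return " ".join(f"{i}:{v}" for i, v in features)


# Hypothetical image: all pixels 0 except the last one, mirroring the
# first and last entries described for Figure 5.10.
image = [[0] * WIDTH for _ in range(HEIGHT)]
image[-1][-1] = 1

features = image_to_features(image)
line = format_features(features)
```

With this hypothetical image, the first pair is index 1 with value 0 and the last pair is index 1200 with value 1, matching the description of Figure 5.10.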

Chapter 5. Design and Implementation of the Upper Body Pose Recognition and Estimation System

Sign        Description
--------    ------------------------------------------------------------
Away        Move the right hand to and fro away from right side of the body.
Bye         Waving with the right hand inwards to the left and outwards to the right above right shoulder.
Cracker     Moving hands from the chest away from each other to the sides.
Curtains    Moving both hands towards the face and outwards again above respective shoulders.
Dress       Moving hands from the chest downwards. When reaching below the hips, move hands away from the body.
Eat         Moving both hands towards the mouth and mimic eating using chopsticks.
Left        Raise left hand away from the left side of the body.
Light       Raise right hand above right shoulder just above the head.
Love        Cross arms in the middle of the upper chest.
Right       Raise right hand away from the right side of the body.
Run         Moving both hands on the side of the chest imitating a running movement.
We          Move right hand to the left side of the chest and across to the right shoulder.
Why         Move right hand to the left side of the chest and tap twice against chest.
Wide        Raise right and left hand away from the sides of the body.

Table 5.6: The 14 SASL signs used in training and testing.

A label is assigned to each set of 1200 feature vectors. The label groups the features of that frame into a specific training class and indicates the position of both wrists in that frame, so the positions of both wrists need to be identified. A structured method of assigning a label to the position of each wrist is to superimpose a grid on the training image. The grid consists of 168 equally sized squares and covers the entire pose space, as illustrated in Figure 5.11.

Each square is a quarter of the size of the face. The number of blocks is limited so that the grid covers only the pose space. Each block corresponds to a class in the SVM, and a set of feature vectors is assigned to that class if the wrist is observed in that block. This yields a total of 168 classes. The top-left block is assigned the label 1, with labels increasing towards the right and downwards, and the bottom-right block is assigned the label 168. In Figure 5.11, the wrist of the right hand is observed in block 32, as indicated by the cross, and is therefore assigned label 32. Both wrists are assigned a label.
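The labelling rule can be illustrated with a short Python sketch that maps a wrist position to a block label. The text gives the total of 168 blocks but not the number of rows and columns, so the 14 × 12 layout below is purely an assumption, as are the function name `block_label`, the grid origin, and the block side length of 10 pixels used in the example.

```python
# Assumed layout: only the product (168 blocks) comes from the text.
GRID_COLS, GRID_ROWS = 14, 12


def block_label(x, y, origin_x, origin_y, block_size):
    """Return the 1-based label of the grid block containing the point (x, y).

    (origin_x, origin_y) is the top-left corner of the grid and block_size
    is the side length of one square, all in pixels (hypothetical parameters).
    Labels increase to the right and then downwards, as described in the text.
    """
    col = int((x - origin_x) // block_size)
    row = int((y - origin_y) // block_size)
    if not (0 <= col < GRID_COLS and 0 <= row < GRID_ROWS):
        raise ValueError("wrist lies outside the pose space grid")
    return row * GRID_COLS + col + 1


# Hypothetical wrist positions on a grid with 10-pixel blocks at origin (0, 0).
top_left = block_label(0, 0, 0, 0, 10)
bottom_right = block_label(139, 119, 0, 0, 10)
example = block_label(35, 25, 0, 0, 10)  # third row, fourth column
```

Under the assumed 14-column layout, a wrist in the fourth column of the third row receives label 32. The same function would be applied once per wrist to obtain both labels for a frame.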

Figure 5.9: Procedure used to train the system.

Figure 5.10: Data file without labels.

Figure 5.11: Superimposed grid in pose space.

Data scaling is another form of data preparation for the SVM. Scaling prevents features with a greater numeric range from dominating features with a lower numeric range [36]. Thus, a pixel with a value of 255 is converted to 1 and a pixel with a value of 0 is left unchanged. This limits the range of feature vectors to [0,1].
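The scaling step amounts to a linear mapping of pixel values onto [0,1]. The sketch below assumes 8-bit input values in the range 0 to 255; the function name `scale_pixels` is hypothetical. LibSVM also ships an `svm-scale` utility for this purpose, although the text does not state which tool was used here.

```python
def scale_pixels(pixels, lo=0, hi=255):
    """Linearly map values from [lo, hi] onto [0, 1].

    The default range assumes 8-bit pixel values (an assumption).
    """
    span = hi - lo
    return [(p - lo) / span for p in pixels]


# A binary image stored as 0/255 becomes 0.0/1.0 after scaling.
scaled = scale_pixels([0, 255, 255, 0])
```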

The SVM can be trained to predict test data more effectively by determining the optimal C and γ parameters of the RBF kernel for the given problem. A brute-force approach is to try each combination of C and γ, where the values of each parameter form an exponentially growing sequence. A structured alternative is the grid-search function in LibSVM, which uses cross-validation. The cross-validation method divides the training set into n equally sized subsets; for each parameter combination, the classifier is trained on n − 1 subsets and tested on the remaining subset [35]. This identifies the combination of parameters with the best cross-validation accuracy. The result of running the LibSVM grid-search function on the training data is depicted in Figure 5.12. The optimum parameters obtained were C = 512 and γ = 0.000122. The accuracy improved from 88% before optimization to 91% after optimization. The small difference between the two accuracy rates indicates that a high accuracy can be achieved with the RBF kernel even without optimization.

The final format of the training data file is illustrated in Figure 5.13. Each line of the file consists of the class representing the right-hand wrist, the class representing the left-hand wrist, and the list of feature vectors. Figure 5.13 depicts two lists of feature vectors. The first list represents the right-hand wrist in block 134 and the left-hand wrist in block 132. The second list represents the right-hand wrist in block 132 and the left-hand wrist in block 138.
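A line of the labelled training file, as described for Figure 5.13, could be assembled along the following lines. The exact separators used in Figure 5.13 cannot be recovered from the text, so the space-separated layout, the `index:value` tokens, and the function name `format_training_line` are assumptions.

```python
NUM_CLASSES = 168  # number of grid blocks stated in the text


def format_training_line(right_label, left_label, features):
    """Build one line: right-wrist class, left-wrist class, then the features.

    Features are written as 1-based 'index:value' tokens (assumed layout).
    """
    for label in (right_label, left_label):
        if not 1 <= label <= NUM_CLASSES:
            raise ValueError(f"label {label} is outside 1..{NUM_CLASSES}")
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{right_label} {left_label} {pairs}"


# Block labels taken from the first list described for Figure 5.13;
# the three feature values are placeholders, not data from the source.
line = format_training_line(134, 132, [0, 1, 1])
```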


Figure 5.12: Grid-search optimization results.
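The grid search with cross-validation can be sketched in plain Python as below. This is an illustration of the technique, not LibSVM's code. The exponent ranges mirror the defaults of LibSVM's `grid.py` tool (C from 2^-5 to 2^15 and γ from 2^-15 to 2^3, in steps of 2 in the exponent); whether these ranges were used here is not stated. The callback `fold_accuracy` and the stand-in scorer `toy_accuracy` are hypothetical: a real run would train an RBF-kernel SVM on the training indices and return its accuracy on the test indices.

```python
import math


def exponential_grid(start, stop, step):
    """Return [2**start, 2**(start + step), ..., 2**stop], inclusive."""
    count = (stop - start) // step + 1
    return [2.0 ** (start + i * step) for i in range(count)]


C_VALUES = exponential_grid(-5, 15, 2)      # assumed range
GAMMA_VALUES = exponential_grid(-15, 3, 2)  # assumed range


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists, one pair per fold."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def grid_search(n_samples, n_folds, fold_accuracy):
    """Return (C, gamma, mean accuracy) with the best cross-validation score."""
    best = (None, None, -1.0)
    for c in C_VALUES:
        for gamma in GAMMA_VALUES:
            scores = [fold_accuracy(c, gamma, train, test)
                      for train, test in k_fold_indices(n_samples, n_folds)]
            mean = sum(scores) / len(scores)
            if mean > best[2]:
                best = (c, gamma, mean)
    return best


def toy_accuracy(c, gamma, train, test):
    """Stand-in scorer, NOT real results: peaks at C = 2**9, gamma = 2**-13."""
    return 1.0 / (1.0 + abs(math.log2(c) - 9) + abs(math.log2(gamma) + 13))


best_c, best_gamma, best_score = grid_search(100, 5, toy_accuracy)
```

The reported optimum lies on such a grid: C = 512 is 2^9, and γ = 0.000122 is 2^-13 rounded to six decimal places.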

Figure 5.13: Data file with labels in the training phase.

In document Faster upper body pose recognition and estimation using compute unified device architecture (Page 77-81)