Chapter 3 Biometric Data and Systems
3.5 Testing
3.5.2 Modality testing
To assess the performance of our biometric algorithms before collecting a large new dataset, we evaluated them on previously collected data that approximated the data we would collect. In this section we discuss the performance of each modality individually. The performance of all of the modalities is summarised in Table 3-3.
In assessing the face recognition algorithm we used a subset of the XM2VTS database of frontal face images [89] without any occlusion of the face. Using the inbuilt OmniPerception model for face data, together with appropriate eye spacing information, we allowed the SDK to perform automated face location, feature
Appendix B), from these images we cross-compared all images of the same subject to form a client set of 1182 comparisons (six comparisons per subject, excluding self-comparisons of images). We then compared each subject with six randomly selected images not from the same subject; this formed our impostor set of 1182 comparisons. We were granted access to reconstructed feature vectors that we could use with our probabilistic framework; however, because of the way these are constructed, these vectors perform much more poorly than the direct method described in section 3.2.1. Looking at the distribution of match scores from the direct method shown in Figure 3-8, we see that these still meet our requirements for scale and regularity (that they are well distributed across the full range of zero to one) and so may still be used for fusion in Chapter 4.
Figure 3-8 Distribution of Client and Impostor Scores for the OmniPerception Face Recognition Algorithm and Matcher
To assess the performance of our gait algorithms we used the Southampton HiD database [83]. From its 1,079 sequences of 115 subjects walking to the left we were able to construct training, gallery, client and impostor sets; these sets were converted to the dynamic and both static feature vectors as described in section 3.2.2. The training set consisted of 145 sequences of 15 subjects that could be used to estimate the intra- and inter-class mean and variance; the gallery consisted of single sequences from 100 subjects; the client set consisted of 834 sequences, each matched to a subject in the gallery set; the impostor set consisted of 834 sequences that were not matched to a subject in the gallery.
For verification we used the probabilistic framework described in Chapter 2; we also performed verification using a simple Euclidean distance classifier in order to verify the performance improvements expected from our method. For comparison, the EER for the dynamic method using a Euclidean distance classifier is 5.7%, against 5.2% for our framework (Table 3-3); using McNemar's test we can see that the improvement due to our framework is statistically significant at the 95% confidence level. More importantly, the match scores for the Euclidean distance classifier span five orders of magnitude and are extremely poorly distributed (making setting a verification threshold extremely difficult). By contrast, the distribution of match scores based on the probabilistic framework is shown in Figure 3-9, and we can see that these clearly fulfil our requirements set out in Chapter 2: that they span the full range and are well distributed.
Figure 3-9 Client and Impostor Distributions for Dynamic and Static Gait Distributions Using the Probabilistic Classifier
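As a minimal sketch of how such a significance check can be carried out, assume that both classifiers are run on the same verification trials and that each decision is recorded as correct or incorrect. McNemar's test then uses only the trials on which the two classifiers disagree. The function name, the continuity-corrected form of the statistic and the toy counts below are illustrative assumptions; they are not the exact procedure or data behind the result quoted above.

```python
import math


def mcnemar_test(correct_a, correct_b):
    """McNemar's test (with continuity correction) on paired outcomes.

    correct_a, correct_b: sequences of booleans, one entry per trial,
    stating whether classifier A / classifier B decided that trial correctly.
    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same trials")

    # Discordant trials: exactly one of the two classifiers was correct.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # the classifiers never disagree

    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy outcomes (hypothetical): A alone correct on 15 trials, B alone on 5,
# both correct on the remaining 80.
framework = [True] * 15 + [False] * 5 + [True] * 80
euclidean = [False] * 15 + [True] * 5 + [True] * 80
statistic, p_value = mcnemar_test(framework, euclidean)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A p-value below 0.05 corresponds to significance at the 95% confidence level; for small numbers of discordant trials an exact binomial form of the test is usually preferred.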
To evaluate the automated extraction and verification from our ear recognition algorithm we again used the XM2VTS database, this time with the leftmost head rotation image. Using four images each of 114 subjects (listed in Appendix B) we compile a client set of 684 comparisons, and by comparing client images to random images selected from other clients in the dataset we produce an impostor set of 684 comparisons. The remainder of the clients are used for training. Again the
and impostor scores are shown in Figure 3-10. There is clearly a concern over both the performance of the algorithm and the resultant distribution of client scores; the effect of this will be considered in Chapter 4 to inform whether further work is expended on this modality. For the moment it is sufficient to note that, after manual inspection of the extracted ear images, the cropping seems to be the primary difficulty in gaining acceptable performance levels. The PCA technique is (as noted in section 3.2.1) particularly sensitive to proper centring, masking and rotation, and it therefore seems sensible to consider either a better extraction technique or a less sensitive algorithm.
Figure 3-10 Distribution of Client and Impostor Scores for the Ear Modality
Given the novel nature of the footfall sensor, there did not exist a suitably large database for initial evaluation. For this reason we recorded a small initial database of fifteen subjects with eight records each. We use five of these subjects for training and the remaining ten as test data. As with the other modalities we compare all records of a subject with their other records to produce 280 client comparisons; we then compare each client record with randomly selected non-client records to produce an impostor set of 280 comparisons. Again the probabilistic framework described in Chapter 2 is employed, and the distribution of scores is shown in Figure 3-11. The results of the footfall sensor are promising given such a small training population and limited feature vector.
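The construction of the client and impostor sets can be sketched as follows. This is an illustrative sketch rather than the code used for the experiments: the record identifiers and function name are hypothetical, and drawing one random non-client record per client comparison is an assumption made so that the two sets have equal size. With ten test subjects and eight records each, every subject yields 8 × 7 / 2 = 28 unordered pairs, which gives the 280 client comparisons reported above; the same counting gives 114 × 6 = 684 for the ear data.

```python
import itertools
import random


def build_comparison_sets(records_by_subject, seed=0):
    """Return (client_pairs, impostor_pairs) for a set of labelled records.

    records_by_subject maps a subject identifier to a list of that
    subject's records. Client pairs are all unordered pairs of records
    from the same subject, so self-comparisons are excluded. Each client
    pair is matched by one impostor pair whose second record is drawn at
    random from a different subject (an assumption of this sketch).
    """
    rng = random.Random(seed)
    client_pairs, impostor_pairs = [], []
    for subject, records in records_by_subject.items():
        # Pool of every record that does not belong to this subject.
        others = [r for s, recs in records_by_subject.items()
                  if s != subject for r in recs]
        for probe, reference in itertools.combinations(records, 2):
            client_pairs.append((probe, reference))
            impostor_pairs.append((probe, rng.choice(others)))
    return client_pairs, impostor_pairs


# Hypothetical identifiers: ten test subjects with eight records each.
footfall = {s: [(s, k) for k in range(8)] for s in range(10)}
clients, impostors = build_comparison_sets(footfall)
print(len(clients), len(impostors))  # 280 280
```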
Modality          EER (%)   Decidability
Face                2.9        4.47
Gait (Dynamic)      5.2        3.40
Gait (Static 1)    14.2        1.86
Gait (Static 2)    21.6        1.61
Ear                35.4        0.87
Footfall           22.3        1.49
Table 3-3 Equal Error Rates and Decidability Indices for Modalities Under Test
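The two summary statistics in Table 3-3 can be computed directly from the client and impostor score lists. The sketch below makes three assumptions: that scores lie between zero and one with higher values indicating a client; that the EER is read off at the threshold where the false accept and false reject rates are closest; and that decidability follows the usual d' definition, the absolute difference of the two means divided by the square root of the average of the two variances. These conventions and the toy scores are assumptions for illustration and are not taken from the implementation used to produce the table.

```python
import statistics


def equal_error_rate(client_scores, impostor_scores):
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= threshold. The EER is taken
    as the mean of the false accept and false reject rates at the
    threshold where the two rates are closest.
    """
    best = None
    for t in sorted(set(client_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in client_scores) / len(client_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def decidability(client_scores, impostor_scores):
    """Decidability index d' = |mu_c - mu_i| / sqrt((var_c + var_i) / 2)."""
    mu_c = statistics.fmean(client_scores)
    mu_i = statistics.fmean(impostor_scores)
    var_c = statistics.pvariance(client_scores)
    var_i = statistics.pvariance(impostor_scores)
    return abs(mu_c - mu_i) / ((var_c + var_i) / 2) ** 0.5


# Toy scores (hypothetical), with one overlapping pair of scores.
clients = [0.9, 0.8, 0.7, 0.4]
impostors = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(clients, impostors)
print(f"EER = {100 * eer:.1f}% at threshold {threshold}")
print(f"decidability = {decidability(clients, impostors):.2f}")
```

A lower EER and a higher decidability both indicate better separation of the client and impostor distributions, which is consistent with the ordering of the modalities in the table.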
Figure 3-11 Distribution of Client and Impostor Scores for the Footfall Data
As we can see, there is a great deal of difference in the performance of the various modalities; the effect of this will be fully explored in Chapter 4.