Biometrics Performance Metrics - A framework for continuous, transparent authentication on mobi

2.6 Biometrics

2.6.5 Biometrics Performance Metrics

There is currently no widely accepted method for reporting the results of biometric studies. Crawford [133] took a step towards such formalization by reviewing keystroke dynamics research on mobile and non-mobile platforms and proposing a list of reportable statistics. Nevertheless, many studies in this area continue to report a dizzying array of error rates and performance curves. This makes it difficult to compare study results and can lead to incorrect conclusions.

Pattern classifier output is sensitive to many factors, including the choice of algorithm, the amount of training data, the features chosen for the feature vector, and within-participant variation. These factors affect the performance metrics computed for each classifier. Within the current literature, there are several widely used methods to report the quality of a particular pattern classifier; those described here are used to report the results of the biometrics studies in this research.

Table 2.3 shows the possible outcomes for any pattern classifier in a two-class problem, with the class decisions made by the classifier in the columns and the true, known classes in the rows. The diagonal from top left to bottom right holds the correctly classified patterns: a true accept or true reject occurs when the classifier produces the same result as the known classification for the pattern. A false accept or false reject occurs when the classifier produces the opposite result to the known classification. Many studies report both false accept and false reject rates but do not report the true accept and true reject rates. Many research papers call these values false (or true) positives and false (or true) negatives, but the terms false (or true) accept and reject are used in this research.

                      Predicted Class
Actual Class      Positive        Negative
Positive          True Accept     False Reject
Negative          False Accept    True Reject

Table 2.3: A generic confusion matrix for a two-class decision problem.
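The four cells of Table 2.3 can be tallied directly from paired lists of known and predicted classes. The following is a minimal sketch, not the method used in this research; the function name `confusion_counts` and the sample data are hypothetical, and `True` is assumed to mark the positive (authorized) class.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally the four outcomes of a two-class confusion matrix.

    actual, predicted: equal-length sequences of booleans,
    where True is assumed to mean the positive (authorized) class.
    """
    counts = Counter()
    for is_positive, accepted in zip(actual, predicted):
        if is_positive and accepted:
            counts["true_accept"] += 1
        elif is_positive and not accepted:
            counts["false_reject"] += 1
        elif not is_positive and accepted:
            counts["false_accept"] += 1
        else:
            counts["true_reject"] += 1
    return counts

# Hypothetical data: four authorized attempts followed by four impostor attempts.
actual    = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
counts = confusion_counts(actual, predicted)
print(dict(counts))
```

The two diagonal cells (`true_accept`, `true_reject`) are the correct classifications; the other two are the errors discussed in the remainder of this section.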

Several different types of error rates are commonly reported in biometrics studies. There is some disagreement in the research community as to which rates are important [133], but the generally accepted errors are as follows (see Figure 2.2):

Crude Accuracy (CA): Also called misclassification error, this standard method of reporting results is simply the number of incorrect classifications made when comparing the classifier output to the known true class for the pattern [134]. As it is a combination of the next two metrics, it delivers minimal value on its own.
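A small sketch may help show why this figure says little on its own. The function name `misclassification` and both sets of counts are hypothetical; the example only illustrates that two classifiers with very different mixes of false accepts and false rejects can share the same misclassification total.

```python
def misclassification(false_accepts, false_rejects, total_patterns):
    """Return (count, rate) of incorrect classifications."""
    errors = false_accepts + false_rejects
    return errors, errors / total_patterns

# Two hypothetical classifiers evaluated on the same 100 patterns.
lenient = misclassification(false_accepts=9, false_rejects=1, total_patterns=100)
strict = misclassification(false_accepts=1, false_rejects=9, total_patterns=100)

# Both report the same total even though one mostly admits impostors
# and the other mostly rejects authorized users.
print(lenient, strict)
```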

[Figure 2.2 plot. Axes: Rate (%) against Error Threshold, from Small (Tight) to Large (Slack). Curves: False Accept Rate (FAR) and False Reject Rate (FRR), crossing at the Equal Error Rate (EER). Marked regions: High-security Applications, Commercial Applications, Low-security Applications.]

Figure 2.2: Generic classifier performance metrics, with threshold levels for secure, insecure, and unknown security levels, showing the relationship between EER, FAR and FRR. These curves do not represent results from this or any research. Adapted from [12].

False Accept Rate (FAR): Also called Type I error or false positive. FAR represents the likelihood that an unauthorized user (i.e., an impostor) will be granted access to the protected resource. High FAR values are often seen as a significant problem because they represent an intrusion into a protected system, although the choice of an acceptable threshold is left to particular implementations. Let FA be the number of false accepts and NI be the number of impostor patterns. FAR is calculated as in Equation 2.1 [135]:

\[ \mathrm{FAR} = \frac{\mathrm{FA}}{\mathrm{NI}} \tag{2.1} \]

False Reject Rate (FRR): Also called Type II error or false negative. FRR represents the likelihood that an authorized user will be denied access to the protected resource. It can be seen as an annoyance to the authorized user since it means that they will have to attempt to reauthenticate, perhaps more than once. Let FR represent the number of false rejects from the classifier output and NA be the number of authorized user patterns. Then, FRR is calculated using Equation 2.2.

\[ \mathrm{FRR} = \frac{\mathrm{FR}}{\mathrm{NA}} \tag{2.2} \]
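Equations 2.1 and 2.2 share the same form, an error count divided by the number of patterns of the relevant class. The sketch below works through both with made-up counts; the helper name `error_rate` and all numbers are hypothetical and are not results from this or any study.

```python
def error_rate(errors, attempts):
    """Return errors / attempts, the common form of Equations 2.1 and 2.2."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return errors / attempts

# Hypothetical test run: 200 impostor patterns and 50 authorized-user patterns.
FA, NI = 3, 200   # false accepts, impostor patterns
FR, NA = 5, 50    # false rejects, authorized-user patterns

far = error_rate(FA, NI)   # Equation 2.1
frr = error_rate(FR, NA)   # Equation 2.2
print(f"FAR = {far:.2%}, FRR = {frr:.2%}")
```

Note that the two rates use different denominators: FAR is measured over impostor patterns only, and FRR over authorized-user patterns only.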

The relationship between FAR and FRR has been described as mutually exclusive since it is impossible to both reject and accept the same authentication attempt [18, 136]. While this statement is true, care must be taken when using such a description for these two related error rates. FAR and FRR also share an inverse relationship, so the term “mutually exclusive” should not be taken to mean that no relationship exists between the error rates. The proof of such a relationship lies in the definition of Equal Error Rate (EER).

Equal Error Rate (EER): EER is defined as the point at which the plotted curves of FAR and FRR values cross [12], as seen in Figure 2.2. In this figure, a large or “slack” error threshold means that the value above which an authentication attempt is granted access is low. In other words, more authentication attempts will be accepted than with a small, or “tight”, error threshold. The terms “small” and “large” in this context refer to the range in which accepted attempts reside. With small error thresholds, the range of values that are accepted is small, and the reverse for a large error threshold. EER can also be determined by plotting the ROC curve for the classifier, as detailed below, and determining its abscissa by plotting a diagonal line from the upper left to the lower right corners and observing where the two lines cross.
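The crossing point can be approximated by sweeping the threshold and finding where FAR and FRR are closest. The sketch below assumes each attempt yields a similarity score, that an attempt is accepted when its score is at or above the threshold, and that the scores shown are hypothetical. With finite data the two curves may not cross exactly, so the result is an approximation.

```python
def far_frr_at(threshold, genuine_scores, impostor_scores):
    """Return (FAR, FRR) when attempts scoring >= threshold are accepted."""
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

def approximate_eer(genuine_scores, impostor_scores):
    """Return (EER estimate, threshold) where |FAR - FRR| is smallest."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))

    def gap(threshold):
        far, frr = far_frr_at(threshold, genuine_scores, impostor_scores)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = far_frr_at(best, genuine_scores, impostor_scores)
    return (far + frr) / 2, best

# Hypothetical similarity scores for authorized-user and impostor attempts.
genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor = [0.65, 0.5, 0.3, 0.2, 0.1]

eer, threshold = approximate_eer(genuine, impostor)
print(f"EER is about {eer:.0%} at threshold {threshold}")
```

Lowering the threshold in this sketch accepts more attempts, which raises FAR and lowers FRR; raising it does the opposite. This is the inverse relationship that the EER summarizes.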

ROC Curve: A Receiver Operating Characteristic (ROC) curve, as seen in Figure 2.3, shows the relationship between FAR and True Accept Rate (TAR), which is the proportion of patterns belonging to the positive class that the classifier correctly accepts [134]; equivalently, TAR = 1 - FRR. The ROC curve shows the overall usefulness of the results of the pattern classification. The closer the line comes to the upper left corner of the graph, the better the method is at correctly identifying or verifying users. Furthermore, since this curve is computed over all thresholds, it can be used to select a viable threshold at which the classifier in question is most accurate.
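The points of an ROC curve can be generated by evaluating the classifier at every distinct score as a threshold. This is a minimal sketch under the same assumptions as before: similarity scores, acceptance when the score is at or above the threshold, and hypothetical data. The function name `roc_points` is illustrative only.

```python
def roc_points(genuine_scores, impostor_scores):
    """Return ROC points as (FAR, TAR) pairs, from strictest to loosest threshold."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is accepted
    for t in thresholds:
        tar = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((far, tar))
    return points

# Hypothetical similarity scores.
genuine = [0.9, 0.8, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.1]

points = roc_points(genuine, impostor)
for far, tar in points:
    print(f"FAR = {far:.2f}  TAR = {tar:.2f}")
```

Each printed pair corresponds to one threshold, so choosing an operating point on the curve amounts to choosing a threshold.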

Area Under Curve (AUC): AUC is a measurement of the area under the ROC curve [137] for a given classifier and a given user. It can be read as the probability that the classifier ranks a randomly chosen positive pattern above a randomly chosen negative one: a random classifier will have an AUC value of 0.5 (50%) and an ideal classifier will have an AUC of 1.0 (100%). AUC is a summary that attempts to represent the entire ROC curve in one value. As such, it loses some of the information and nuance of the original curve, since the individual tradeoff values that make up the curve are discarded.
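One common way to compute the area, shown here as a sketch and not as the procedure used in [137], is the trapezoidal rule over the ROC points. The function name `auc_trapezoid` and the example points are hypothetical; points are assumed to be (FAR, TAR) pairs sorted by increasing FAR.

```python
def auc_trapezoid(points):
    """Area under a curve given as (FAR, TAR) points sorted by increasing FAR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area

# Reference cases described in the text.
random_classifier = [(0.0, 0.0), (1.0, 1.0)]             # the diagonal
ideal_classifier = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hugs the upper left corner

# Hypothetical ROC points for an imperfect classifier.
example = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
           (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]

print(auc_trapezoid(random_classifier))
print(auc_trapezoid(ideal_classifier))
print(auc_trapezoid(example))
```

The single number returned for `example` hides where along the curve the classifier performs well or poorly, which is the loss of information noted above.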

The European Standard for Access Control Systems (EN 50133-1) states that a biometric authentication system must have a False Accept Rate (FAR) of less than 0.001% and a False Reject Rate (FRR) of less than 1% in order to be used in production systems [138]. However, the error rates suggested in EN 50133-1 are not specific to behavioral biometrics, which are known to be less distinctive than physiological biometrics [65]. Therefore, the values stated in EN 50133-1 may not provide a suitable benchmark to use in determining the applicability of any behavioral biometric. Instead, the error rates of related work in the particular biometric field may be used as a benchmark.
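For illustration only, the limits quoted above can be expressed as a simple check on measured rates. This sketch encodes only the two figures given in the text, with rates expressed as fractions; the names `MAX_FAR`, `MAX_FRR`, and `meets_quoted_limits` are hypothetical, and the sample rates are invented.

```python
# Limits as quoted in the text for EN 50133-1, converted from percent to fractions.
MAX_FAR = 0.001 / 100   # 0.001%
MAX_FRR = 1 / 100       # 1%

def meets_quoted_limits(far, frr):
    """Return True when both rates fall below the quoted limits."""
    return far < MAX_FAR and frr < MAX_FRR

# Hypothetical measured rates (fractions, not percentages).
physiological_like = meets_quoted_limits(far=0.000005, frr=0.005)
behavioral_like = meets_quoted_limits(far=0.02, frr=0.005)
print(physiological_like, behavioral_like)
```

A FAR of 2% misses the quoted limit by more than three orders of magnitude, which gives a sense of why error rates from related work may be the more practical benchmark for behavioral biometrics.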

The research presented to this point has examined biometrics for use on mobile devices, including user acceptance of them and the methods of reporting their errors. Their use in the mobile device environment has the potential to provide a continuous, transparent authentication method. This concept has been studied by several researchers, both for mobile device

