Unfortunately, keystroke systems, like all biometric authentication systems, sometimes make mistakes in the authentication decision. The accuracy of a system depends on factors such as the classification method chosen, the features included in the timing vector, and the quantity of training data [4].
An imposter may be mistakenly identified as the legitimate user if, by chance, the typing patterns of the two individuals are so close that the classification method fails to distinguish between them. Conversely, if one of the legitimate user’s fingers slips on the keyboard and changes the typing pattern slightly, the user may fail to authenticate. Thus, it is important to have metrics that measure the error rate precisely; they help to identify the performance level that can be expected and tolerated by the system’s users [87].
Any pattern classification system that solves a two-class problem, such as keystroke dynamics authentication, has four possible results that should be taken into consideration [25]. The classifier produces either a positive classification, if it judges the user to be the legitimate user, or a negative one, if it judges the user to be an imposter. If the actual class is positive, meaning that the user under study is the legitimate user, the result is a true accept when the classifier predicts a positive classification and a false reject otherwise. In the same manner, if the actual class is negative, meaning that the user under study is an imposter, the result is a false accept when the classifier predicts a positive classification and a true reject otherwise. Table 2.3 lists all the possible result combinations in a more concise manner. True accepts and true rejects count the correctly classified users, whereas false accepts and false rejects are used to compute the system’s error rate [25].
Table 2.3: Two-class decision result combinations.

                      Predicted class
Actual class      Positive        Negative
Positive          True accept     False reject
Negative          False accept    True reject
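The four outcomes of Table 2.3 can be tallied with a few lines of code. The following is a minimal sketch, not taken from any of the cited studies; the function name `tally_outcomes` and the Boolean encoding of the two classes (True for the legitimate user or an accept decision) are assumptions made for illustration.

```python
def tally_outcomes(actual, predicted):
    """Count the four outcomes of a two-class decision.

    actual[i]    -- True if attempt i really came from the legitimate user
    predicted[i] -- True if the classifier accepted attempt i
    (Both encodings are illustrative assumptions.)
    """
    counts = {"true_accept": 0, "false_reject": 0,
              "false_accept": 0, "true_reject": 0}
    for is_legitimate, accepted in zip(actual, predicted):
        if is_legitimate:
            counts["true_accept" if accepted else "false_reject"] += 1
        else:
            counts["false_accept" if accepted else "true_reject"] += 1
    return counts


# Three legitimate attempts followed by three imposter attempts.
actual = [True, True, True, False, False, False]
predicted = [True, False, True, True, False, False]
print(tally_outcomes(actual, predicted))
# {'true_accept': 2, 'false_reject': 1, 'false_accept': 1, 'true_reject': 2}
```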
There are many methods to evaluate the performance of a keystroke dynamics system; consequently, a variety of error rates can be found in the current literature.
Earlier studies used a very simple measure, accuracy, which is the percentage of successfully classified attempts out of the total number of completed attempts; it was adopted in [33, 71, 68]. Its complement, the misclassification error, is the percentage of incorrect classifications out of the total number of attempts; it was applied in [88]. Both rates are computed using the following equations:
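$$\text{Accuracy} = \frac{\text{number of correctly classified attempts}}{\text{total number of attempts}} \times 100\% \tag{2.5}$$

$$\text{Misclassification error} = \frac{\text{number of incorrectly classified attempts}}{\text{total number of attempts}} \times 100\% \tag{2.6}$$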
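As a small worked example, both rates can be computed from the four outcome counts of Table 2.3. This is a sketch under stated assumptions: the function name `accuracy_and_error` and the sample counts are hypothetical and are not results from the cited studies.

```python
def accuracy_and_error(true_accept, false_reject, false_accept, true_reject):
    """Return (accuracy, misclassification error), both in percent."""
    total = true_accept + false_reject + false_accept + true_reject
    if total == 0:
        raise ValueError("no attempts recorded")
    correct = true_accept + true_reject
    accuracy = 100.0 * correct / total
    # The two rates are complementary: every attempt is either
    # correctly or incorrectly classified.
    return accuracy, 100.0 - accuracy


# Hypothetical counts: 100 attempts, 85 of them classified correctly.
print(accuracy_and_error(45, 5, 10, 40))  # (85.0, 15.0)
```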
The most frequently used error rates for determining the performance of an authentication system are the False Accept Rate (FAR), also referred to as the Imposter Pass Rate (IPR), and the False Reject Rate (FRR), also called the False Alarm Rate (sometimes also abbreviated FAR; in this text, FAR always denotes the False Accept Rate). The FAR is the percentage of imposters who have successfully gained access to the system, while the FRR is the percentage of legitimate users who have been denied access to the system. These two error rates were used by the majority of free-text keystroke systems, including [8, 50, 80]. They are computed using the following two formulas:
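$$\text{FAR} = \frac{\text{number of false accepts}}{\text{total number of imposter attempts}} \times 100\% \tag{2.7}$$

$$\text{FRR} = \frac{\text{number of false rejects}}{\text{total number of legitimate user attempts}} \times 100\% \tag{2.8}$$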
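The two rates follow directly from the outcome counts of Table 2.3: the FAR is normalised by the imposter attempts only, and the FRR by the legitimate attempts only. The sketch below assumes a hypothetical helper `far_frr` and made-up counts.

```python
def far_frr(true_accept, false_reject, false_accept, true_reject):
    """Return (FAR, FRR) in percent from the four outcome counts."""
    imposter_attempts = false_accept + true_reject
    legitimate_attempts = true_accept + false_reject
    if imposter_attempts == 0 or legitimate_attempts == 0:
        raise ValueError("need at least one attempt of each class")
    far = 100.0 * false_accept / imposter_attempts
    frr = 100.0 * false_reject / legitimate_attempts
    return far, frr


# Hypothetical counts: 50 legitimate attempts and 50 imposter attempts.
print(far_frr(45, 5, 10, 40))  # (20.0, 10.0)
```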
The smaller the FAR and FRR values are, the better the system under study performs: a low FAR indicates stronger security, and a low FRR indicates better usability. As shown in Figure 2.4, there is a trade-off between the FAR and the FRR, which can be controlled according to the level of security required [4]. Highly secure applications require the FAR to be as low as possible, even at the cost of a higher FRR. Conversely, a higher FAR is more acceptable in systems where usability has a higher priority than security.
Figure 2.4: An example showing the relationship between FAR, FRR, and EER. The error rate is plotted against the threshold, with regions marked for high-security and high-usability applications.
The other commonly used error rate is the Equal Error Rate (EER), also referred to as the Crossover Error Rate (CER), which is the value at which the FAR and the FRR are equal. It was used in many studies, such as [68, 69, 89], where lower EER values indicate a more secure system. Figure 2.4 demonstrates the relationship between FAR, FRR, and EER with respect to different threshold values, and how this relationship reflects the security level of the application in use.
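The threshold sweep behind the EER can be sketched as follows. The sketch assumes that the classifier outputs a distance score, where a lower score means a closer match to the legitimate user's profile, and that an attempt is accepted when its score does not exceed the threshold; this convention, the function names, and the sample scores are illustrative assumptions. With a finite set of scores the FAR and FRR rarely coincide exactly, so the sketch reports the point where they are closest.

```python
def rates_at(threshold, genuine_scores, imposter_scores):
    """Return (FAR, FRR) as fractions for one threshold.

    Assumed convention: an attempt is accepted if score <= threshold.
    """
    far = sum(s <= threshold for s in imposter_scores) / len(imposter_scores)
    frr = sum(s > threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def approximate_eer(genuine_scores, imposter_scores):
    """Sweep all observed scores as thresholds and return
    (threshold, EER estimate) where |FAR - FRR| is smallest."""
    best_threshold, best_gap, best_eer = None, None, None
    for t in sorted(set(genuine_scores) | set(imposter_scores)):
        far, frr = rates_at(t, genuine_scores, imposter_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_threshold, best_gap, best_eer = t, gap, (far + frr) / 2
    return best_threshold, best_eer


# Made-up distance scores for illustration only.
genuine = [0.1, 0.2, 0.3, 0.6]
imposter = [0.4, 0.7, 0.8, 0.9]
print(approximate_eer(genuine, imposter))  # (0.4, 0.25)
```

Lowering the threshold in this sketch reduces the FAR and raises the FRR, which corresponds to moving towards the high-security side of the trade-off.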
ZeroFAR and ZeroFRR are two other error rates, used in [12, 66]. ZeroFAR is the FRR value when the FAR is equal to zero and, likewise, ZeroFRR is the FAR value when the FRR is equal to zero. They are obtained by setting the threshold so that the FAR becomes zero in the case of ZeroFAR and, similarly, so that the FRR becomes zero in the case of ZeroFRR.
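Under the assumption that the classifier produces a distance score (lower means closer to the legitimate user's profile) and that the threshold is chosen as loosely as the zero-error condition allows, both rates can be read off the score lists directly. The function name `zero_far_and_zero_frr` and the sample scores below are hypothetical.

```python
def zero_far_and_zero_frr(genuine_scores, imposter_scores):
    """Return (ZeroFAR, ZeroFRR) as fractions.

    Assumed convention: lower score = closer match to the legitimate user.
    """
    # FAR is zero only if every imposter is rejected, so the accepted
    # region must lie strictly below the smallest imposter score.
    lowest_imposter = min(imposter_scores)
    zero_far = (sum(s >= lowest_imposter for s in genuine_scores)
                / len(genuine_scores))

    # FRR is zero only if every legitimate attempt is accepted, so the
    # accepted region must reach up to the largest genuine score.
    highest_genuine = max(genuine_scores)
    zero_frr = (sum(s <= highest_genuine for s in imposter_scores)
                / len(imposter_scores))
    return zero_far, zero_frr


# Made-up distance scores for illustration only.
genuine = [0.1, 0.2, 0.5, 0.6]
imposter = [0.4, 0.7, 0.8, 0.9, 1.0]
print(zero_far_and_zero_frr(genuine, imposter))  # (0.5, 0.2)
```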
The Receiver Operating Characteristic (ROC) curve is another method of measuring performance. It plots the True Accept Rate (TAR), which is the percentage of legitimate users correctly classified, against the FAR. A highly accurate system has a curve that lies close to the upper left corner of the diagram, as shown in Figure 2.5. This measure was used in studies such as the one conducted in [67].
Figure 2.5: An example of an ROC curve (True Accept Rate plotted against False Accept Rate).
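The points of an ROC curve can be generated by sweeping the decision threshold and recording the (FAR, TAR) pair at each step. The sketch below assumes distance scores where a lower score means a closer match and an attempt is accepted when its score does not exceed the threshold; the function name `roc_points` and the sample scores are illustrative assumptions.

```python
def roc_points(genuine_scores, imposter_scores):
    """Return a list of (FAR, TAR) fractions, one per threshold.

    Assumed convention: an attempt is accepted if score <= threshold.
    """
    # Start below every score so the curve begins at (0, 0).
    thresholds = [float("-inf")]
    thresholds += sorted(set(genuine_scores) | set(imposter_scores))
    points = []
    for t in thresholds:
        tar = sum(s <= t for s in genuine_scores) / len(genuine_scores)
        far = sum(s <= t for s in imposter_scores) / len(imposter_scores)
        points.append((far, tar))
    return points


# Made-up distance scores for illustration only.
genuine = [0.1, 0.2, 0.3, 0.6]
imposter = [0.4, 0.7, 0.8, 0.9]
for far, tar in roc_points(genuine, imposter):
    print(f"FAR={far:.2f}  TAR={tar:.2f}")
```

In this example the curve reaches a TAR of 1.00 at a FAR of 0.25, so it passes near the upper left corner, as expected for well-separated scores.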
Because free-text keystroke authentication can be a continuous process, some studies proposed another metric: the time, measured in number of keystrokes, that the system takes to discover that an imposter has gained access. The aim is to detect the imposter as quickly as possible, using as few keystrokes as possible, so that an attacker is detected before doing more harm to the system. The number of keystrokes performed before the imposter was detected was used in [44, 77].
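This metric can be illustrated with a short sketch. It assumes that a continuous authentication system emits one alarm decision after every keystroke of an imposter session; the studies cited above may define the decision points differently. The function names and the sample sessions are hypothetical.

```python
def keystrokes_until_detection(alarm_flags):
    """Return the number of keystrokes typed when the first alarm is
    raised, or None if the imposter is never detected.

    alarm_flags[i] is True if the system flagged the session right
    after keystroke i + 1 (an illustrative assumption).
    """
    for count, flagged in enumerate(alarm_flags, start=1):
        if flagged:
            return count
    return None


def mean_detection_delay(sessions):
    """Return (mean keystrokes until detection, undetected sessions)."""
    delays = [keystrokes_until_detection(s) for s in sessions]
    detected = [d for d in delays if d is not None]
    missed = len(delays) - len(detected)
    mean = sum(detected) / len(detected) if detected else None
    return mean, missed


# Three made-up imposter sessions; the last one is never detected.
sessions = [
    [False, False, True],
    [False, True, True],
    [False, False, False],
]
print(mean_detection_delay(sessions))  # (2.5, 1)
```

Reporting the undetected sessions separately avoids hiding them behind an average that only covers successful detections.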
Giot et al. [90] introduced the Failure To Acquire Rate (FTAR), which concerns the acquisition process of keystroke data. This error rate measures how often users face difficulties during acquisition. A common example of such a difficulty is a typing mistake in fixed-text keystroke dynamics, which compels the user to delete the word being typed and start over. This may frustrate users and thereby degrade the usability of the method.
The lack of a single standard method for measuring performance in keystroke dynamics authentication studies makes it difficult to compare their results. This has a critical impact on the overall perception of the methods used and may lead to incorrect conclusions about them [25].