In this section we first describe a CA system in more detail, using keystroke dynamics as the example, and focus on how such a system works and how its performance can be evaluated. This is done in Section 4.2.1. In Section 4.2.2 we then describe the three evaluation scenarios that we used in our analysis.
4.2.1 System Description
As mentioned in Section 4.1, a CA system considers each and every action of the user to determine the genuineness of that user. A single action by itself will, however, not lead to a lockout of the current user; a lockout only results from an action in combination with previous actions. In biometrics, and specifically in behavioural biometrics, we know that genuine users make errors [15]. The typing behaviour of a user is never so stable that his/her typing rhythm always matches the template exactly. In Keystroke Dynamics (KD), the features used are the duration of a key (i.e. the time a specific key is held down) and the latency between two consecutive keys (i.e. the time between releasing one key and pressing down the next key). Duration and latency vary due to normal variation in behaviour as well as external factors such as mood, tiredness, or distraction. Furthermore, a specific user will be more stable in his/her typing behaviour for some keys and less stable for others, and some users are more stable in their overall typing manner than others. Generally in KD the template of a user contains the mean (µ) and standard deviation (σ) of the durations and latencies of the various keys and key pairs. A small σ value indicates stable behaviour of that user for that particular duration or latency, while a larger value implies that the user is less stable.
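The feature definitions above can be illustrated with a minimal sketch. It assumes that raw input is available as a list of (key, press time, release time) tuples in milliseconds; this event format and the function names `extract_features` and `build_template` are our own illustrative choices and are not taken from any particular KD system.

```python
from collections import defaultdict
from statistics import mean, stdev


def extract_features(events):
    """Compute KD features from (key, press_ms, release_ms) tuples in typing order.

    Returns two dicts: durations per key and latencies per key pair.
    The event format is an assumption made for this sketch.
    """
    durations = defaultdict(list)
    latencies = defaultdict(list)
    for i, (key, press, release) in enumerate(events):
        # Duration: time the key is held down.
        durations[key].append(release - press)
        if i + 1 < len(events):
            next_key, next_press, _ = events[i + 1]
            # Latency: time between releasing this key and pressing the next.
            latencies[(key, next_key)].append(next_press - release)
    return durations, latencies


def build_template(samples):
    """Map each key (or key pair) to (mean, standard deviation).

    Entries with fewer than two samples are skipped, because the sample
    standard deviation is undefined for a single value.
    """
    return {k: (mean(v), stdev(v)) for k, v in samples.items() if len(v) >= 2}
```

Under these assumptions, a key that is always held for exactly the same time gets σ = 0, which would need special handling before it is used in any formula that divides by σ.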
Often a normal distribution of the timing value (duration or latency) is assumed. Although this is not completely true, it is a workable assumption. Under that assumption we know that 68%/95%/99.7% of all the timing values fall within 1/2/3 times σ of the mean µ. This means that, for small values of k, never 100% of the timing values (duration or latency) fall within the range [µ − k · σ, µ + k · σ]. In other words, the genuine user will not act as expected on each and every action. It is clear, however, that on most actions his/her behaviour will be considered normal and only on a minority of actions will it deviate from normal. When considering the behaviour of an imposter user, it might happen that on a number of actions his/her behaviour seems to be the same as the normal behaviour of the genuine user. However, because two people do not behave in exactly the same manner, we will see that on many actions the behaviour of the imposter deviates from the normal behaviour of the genuine user.
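The percentages quoted above follow from the normal distribution, where the fraction of values within k standard deviations of the mean equals erf(k/√2). The following short sketch computes this fraction; the function name `coverage` is our own.

```python
import math


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def expected_deviation_rate(k):
    """Fraction of genuine actions expected to fall outside [mu - k*sigma, mu + k*sigma].

    Only valid under the normality assumption discussed in the text.
    """
    return 1.0 - coverage(k)
```

For k = 2 this gives a deviation rate of roughly 4.5%, which illustrates why a genuine user must be expected to receive occasional penalties.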
The above difference between genuine and imposter behaviour is used in a CA system to differentiate between them, so that the imposter user is locked out while the genuine user can continue his/her daily business. This is done by using the concept of “trust in the genuineness” of the current user [19]. If the system's trust in the genuineness of the user is above a specific value (threshold), then the user can continue his/her activity, while if the trust becomes too low, then the user is locked out and has to use the SA system again to get access to the system.
After the user has logged on using an SA system, the trust level is set to 100. Every single action of the current user will lead to an increase or decrease of this trust. More specifically, if an action is performed in the normal manner (i.e. as described in the template of the genuine user), then the user gets a “reward” and the trust increases. Any action that is performed in a manner that deviates from the normal behaviour will lead to a decrease in trust, i.e. a “penalty”. Given the above reasoning that the genuine user will perform most actions in a “normal” manner, his/her trust will in most cases increase and in some cases decrease, but will generally stay at a high level. An imposter user, whose behaviour will in most cases deviate from the normal behaviour of the genuine user, will have a negative trend in the trust value and will, after a certain number of actions, be locked out of the system because the trust value has become too low. The goal is to minimize the number of actions that can be performed by an imposter.
The way that the trust changes, i.e. the amount of penalty and reward, is a system setting that will determine how well the system performs. Penalty and reward can be fixed values, but can also depend on the actual correctness or deviation from the normal behaviour [20]. For example, a system could be implemented in such a way that if the current timing duration value of a letter deviates more than 2 standard deviations from the expected mean value, then the user would get a penalty, i.e. a decrease in trust. This change of trust ∆T could be a fixed value (i.e. ∆T = −1) or could depend on the actual current time duration t and the mean µ and standard deviation σ as given in the template (e.g. ∆T = 2 − |t − µ|/σ).
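The two example choices for ∆T can be sketched as follows. The penalty of −1 and the formula 2 − |t − µ|/σ come from the text; the fixed reward of +1 and the function names are assumptions added only to make the sketch complete.

```python
def delta_trust_fixed(t, mu, sigma, k=2.0, reward=1.0, penalty=-1.0):
    """Fixed-step trust change.

    Penalty if t deviates more than k standard deviations from mu,
    otherwise a reward. The reward value of +1 is a hypothetical choice;
    the text only gives the penalty example of -1.
    """
    return reward if abs(t - mu) <= k * sigma else penalty


def delta_trust_scaled(t, mu, sigma):
    """Deviation-dependent trust change: 2 - |t - mu| / sigma.

    Positive within 2 sigma of the mean, zero at exactly 2 sigma,
    and increasingly negative beyond that. Requires sigma > 0.
    """
    return 2.0 - abs(t - mu) / sigma
```

With µ = 100 ms and σ = 10 ms, a duration of 100 ms yields ∆T = 2 under the scaled variant, while a duration of 130 ms yields ∆T = −1.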
The penalty and reward function plays the same role in CA as the distance metric or matching function plays in an SA system. Changing these functions will change the performance of the system. Unlike an SA system, where the distance value or matching score immediately determines the decision made by the system (either accept the user or reject him/her), there are two levels of decisions in a CA system. The first level is described above and is the change of the trust value based on the current action of the user. The second level is the decision to either let the user continue working or lock him/her out. That decision is made based on the trust value after the latest action. A trust value that drops below a specific threshold will lock out the user, while otherwise the user can continue to work. Note that a user can never be locked out based on an action performed according to the normal behaviour of the genuine user, because such an action leads to an increase in trust and hence to a trust value that must be above the lockout threshold (otherwise the user would already have been locked out on the previous action). Note also that correct behaviour will lead to an increase of trust, but this trust value can never exceed 100. If a CA system allowed the trust value to increase without limit, then an imposter hijacking a current session could profit from the built-up trust of the genuine user.
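The two decision levels can be sketched as a simple loop over the per-action trust changes. The starting value and upper limit of 100 come from the text; the lockout threshold of 90 and the function name `run_session` are hypothetical values chosen for illustration only.

```python
def run_session(deltas, threshold=90.0, start=100.0, cap=100.0):
    """Apply per-action trust changes and report when lockout occurs.

    deltas:    iterable of trust changes (first decision level).
    threshold: lockout threshold (hypothetical value, a system setting).
    Returns the 1-based index of the action that causes the lockout,
    or None if the user is never locked out.
    """
    trust = start
    for i, delta in enumerate(deltas, start=1):
        # Trust may rise after a reward, but never above the cap.
        trust = min(cap, trust + delta)
        # Second decision level: lock out once trust drops below the threshold.
        if trust < threshold:
            return i
    return None
```

Because of the cap, a long run of rewards before a session is hijacked does not delay the lockout: with these settings an imposter who only receives penalties of −1 is locked out on the 11th action, regardless of how many rewards preceded it.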
Figure 4.1: Data separation for VP-1.
4.2.2 Evaluation Scenarios
When evaluating the performance of a system we can use various scenarios. We describe three general scenarios, but variations on these are possible. These scenarios will be named “internal”, “external”, and “mixed” for reasons that will become clear shortly. All these scenarios are related to the use of Machine Learning (ML) tools. We assume that for each user of the CA system, we train a binary classifier that has the classes “genuine user” and “imposter user”. Each classifier is trained with a combination of genuine and imposter data in equal amounts, to avoid bias. If the number of data samples used from the genuine user equals n and data of k imposter users is used, then each of these imposter users will supply approximately n/k data samples for the training. Testing of the classifiers is never done with data that has already been used for training. The amount of training data of the genuine user was 50% of the total amount of data of that user, with a maximum of 20,000 actions.¹
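The balancing rule above amounts to a small calculation, sketched below. The 50% fraction and the limit of 20,000 actions come from the text; rounding n/k down with integer division and the function name `training_sizes` are our own assumptions, since the text only states "approximately n/k".

```python
def training_sizes(total_genuine, n_imposters, fraction=0.5, max_train=20000):
    """Return (n, per_imposter) for one genuine user's classifier.

    n:            number of genuine training samples, i.e. the given
                  fraction of the user's data, limited to max_train.
    per_imposter: samples taken from each of the k imposter users so that
                  genuine and imposter training data are roughly balanced.
                  Rounding down is an assumption of this sketch.
    """
    n = min(int(total_genuine * fraction), max_train)
    per_imposter = n // n_imposters
    return n, per_imposter
```

For a genuine user with 30,000 recorded actions and 24 imposter users, this gives n = 15,000 genuine samples and 625 samples per imposter.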
In the “internal” scenario, we assume that the system is used inside an organization where data of all participants is available. One may assume that using data of all imposters during training influences the performance of the system in a positive manner. For that reason the “external” and the “mixed” scenarios are also considered. For the “external” scenario we assume that the system might be attacked only by people of whom no data is available for training the classifier. The “mixed” scenario is a combination of the “internal” and the “external” scenarios. As in the “external” scenario, the classifier is trained with the data of the genuine user and the data of a subset of the imposter users.
4.2.2.1 Verification Process 1 (VP-1)
This verification process implies that data of all participants can be used to train the binary classifiers. If we assume M participants within the organization, then each binary classifier is trained with the data of one genuine user and M − 1 imposter users. The data used to test the performance of the system is the data of each participant that has not yet been used for training. Figure 4.1 explains this separation process for the first user, where |Tr_1| ≈ |IMP_1| to avoid classifier bias. This verification process can be considered as the “internal” scenario.
4.2.2.2 Verification Process 2 (VP-2)
In this verification process we assume that the M_1 imposter users whose data is used during the training are part of the organization that will employ the CA system. The remaining M_2 = M − 1 − M_1 imposter users are considered to be external to the organization, or to have been added to the organization after the training phase, and no data of these users is assumed to be available for training purposes. Alternatively, one could assume a large organization where there are too many participants to train the classifier of a genuine user with the data of all other participants. In this scenario the testing data comes from all participants, i.e. all imposter users are included in the testing. However, testing is never done with data that has already been used for training. Figure 4.2 explains this separation process for the first user, where again |Tr_1| ≈ |IMP_1| to avoid classifier bias and M_1 = (M − 1)/2. This verification process can be considered as the “mixed” scenario.
Figure 4.2: Data separation for VP-2.
¹ We have used this maximum limit of 20,000 actions primarily for two reasons: first, to keep a significant amount of data for the testing of all imposters, and second, to reduce the classifier's training time. We found that due to this maximum limit the classifier's training accuracy also improved.
4.2.2.3 Verification Process 3 (VP-3)
In this verification process we assume that a set of standard imposter data is supplied by the organization selling the CA system and that each user trains a classifier based on the supplied imposter training data and his/her own training data. This scenario can be tested by splitting the set of imposters into two sets of M_1 and M_2 = M − 1 − M_1 users, respectively. The binary classifier can then be trained using the data of the M_1 imposters plus the data of the genuine user (again ensuring that the numbers of genuine and imposter data samples for training are approximately the same). In this case, however, the testing is done using the data of the genuine user that has not been used for training and the data of the remaining M_2 imposter users. This procedure can be repeated multiple times for different splits of the set of imposter users.
Figure 4.3 explains this separation process for the first user, where again |Tr_1| ≈ |IMP_1^1| ≈ |IMP_1^2| to avoid classifier bias, and M_1 = (M − 1)/2. First, we trained the classifiers with the training data of the genuine user and the training data of the first set of (M − 1)/2 imposter users, exactly as we have done in VP-2 (see Figure 4.3(a)). Next we tested this system with the testing data of the genuine user and all of the data of the second set of imposter users. This process is then repeated with the second set of training data, with the two sets of imposter users swapping roles (see Figure 4.3(b)). In this verification process the imposter users encountered during testing are not known to the system. This verification process can be considered as the “external” scenario.
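The three verification processes differ only in which imposter users contribute to training and which to testing. The sketch below summarizes this for one genuine user. The function name `imposter_split`, splitting the imposter list by position, and rounding M_1 down when M − 1 is odd are assumptions of this sketch; the text only fixes M_1 = (M − 1)/2 and notes that different splits can be used.

```python
def imposter_split(process, genuine, users):
    """Return a list of (train_imposters, test_imposters) folds for one genuine user.

    For imposters that appear in both lists (VP-1 and VP-2), only data not
    already used for training may be used for testing.
    """
    imposters = [u for u in users if u != genuine]
    half = len(imposters) // 2  # M_1 = (M - 1) / 2, rounded down (assumption)
    first, second = imposters[:half], imposters[half:]
    if process == "VP-1":
        # Internal: all imposters are seen in training and in testing.
        return [(imposters, imposters)]
    if process == "VP-2":
        # Mixed: train on the first set, test on all imposters.
        return [(first, imposters)]
    if process == "VP-3":
        # External: train on one set, test on the other, then swap roles.
        return [(first, second), (second, first)]
    raise ValueError(f"unknown verification process: {process}")
```

For example, with five users and user 1 as the genuine user, VP-3 yields two folds: training on imposters 2 and 3 while testing on 4 and 5, and the reverse.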