Probabilistic Neural Network for User Authentication Based on Keystroke Dynamics


Volume 2, Issue 11, November 2013

Page 342

Abstract

Computer systems and networks are increasingly used for many types of applications; as a result, the security threats to computers and networks have also increased significantly. Traditionally, password-based user authentication is widely used to authenticate legitimate users, but this method has many loopholes such as password sharing, brute-force attacks, dictionary attacks and more. The aim of this paper is to improve the password authentication method using Probabilistic Neural Networks (PNNs) with three types of distance, namely Euclidean Distance, Manhattan Distance and Euclidean Squared Distance, and four keystroke dynamics features: Dwell Time (DT), Flight Time (FT), a mixture of DT and FT, and Up-Up Time (UUT). The results illustrate that Euclidean Squared Distance with the UUT feature provides a low error rate and high accuracy compared with the other two types of distance.

Keywords: Biometrics, Keystroke Dynamics, Probabilistic Neural Network, User Authentication.

1. INTRODUCTION

Internet applications use an authentication scheme to make sure that only genuine individuals can log in to the application [1]-[2]. User authentication is the process of validating a claimed identity for the purpose of performing trusted communications between parties for computing applications [3]. Biometrics offers automated methods for authentication and identification based on physiological or behavioral characteristics [4]-[5].

Keystroke dynamics authentication is based on the idea that each user has a unique keystroke latency pattern which differs from that of others [6]. There are two types of keystroke dynamics. The first is static keystroke dynamics, in which the typed data is fixed and the timing information is captured only at login time. The second is continuous keystroke dynamics, in which individuals are authenticated independently of what they are typing on the keyboard and the typing characteristics are analyzed during the complete session [7].

This paper is organized as follows. Section 2 presents related work. Section 3 illustrates the features of keystroke dynamics. Section 4 presents probabilistic neural networks; section 5 describes the proposed approach for user authentication, while the evaluation criteria are explained in section 6. Finally, the experimental results, results analysis and conclusions are given in sections 7, 8, and 9 respectively.

2. RELATED WORK

Several previous works relate, in one way or another, to the present work.

-In [8], a comparison between ADALINE (Adaptive Linear Element), based on the single perceptron model, and the BPNN (Back Propagation Neural Network) model, using both the FT and the digraph latency time, concluded that the BPNN surpasses the ADALINE, which was not capable of classifying the patterns.

-In [9], a statistical method was used to classify users as legitimate users or impostors. The extracted FT feature was used and 63 users participated in the experiment.

-In [10], a comprehensive statistical study was carried out and the development of a keystroke dynamics-based user authentication system using a neural network with the FT feature was presented. The deduction was that neural network-based methods give better results than statistical methods in keystroke pattern classification.

-In [11], users' typing biometrics were measured using fuzzy logic. This experiment involved 29 users who provided their user ID (Identification) and a password of length eight or more. The common DT feature was used and the typing difficulty between two successive keystrokes was calculated. One main disadvantage of using typing difficulty as a keystroke feature is the need to set the categories of difficulty, as there is a wide range of possibilities for defining typing difficulty.

3. KEYSTROKE DYNAMICS FEATURES

The main purpose of the feature extraction phase is to extract vital features from the timestamps collected from raw keystroke data for template generation. The extracted features are critical for keystroke dynamics based biometrics and directly influence the performance of the classifier [11]. Extracting the right features from a dataset can reduce the computational complexity of the problem [14]. Each sample in the dataset is represented by a sequence of timing information expressing the exact time at which keys have been pressed and released. Many common types of features can be extracted from human keystrokes, as follows [11]-[12]:

1-Dwell time: It is the time during which a key is held, measured from when it is pressed (Down) until it is released (Up).


Sarab M. Hameed1, Mais M. Hobi2 and Sumaya Saad3


2-Flight time: It is the interval between a key release (Up) and the next key press (Down).

3-Down-Down: It is the interval between two successive key presses (Down).

4-Up-Up: It is the interval between two successive key releases (Up).

5-Tri-graph: It is the elapsed time between the first key press (Down) and the third key press (Down).

6-Placement of the fingers: A camera is required for this type.

7-Pressure of keystroke: A special pressure-sensitive keyboard is required for this type.
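The timing features listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the paper: the function name `extract_features`, the input layout (one `(press_time, release_time)` pair per key, in typing order) and the millisecond timestamps are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def extract_features(events: List[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Compute timing features from (press_time, release_time) pairs.

    `events` holds one pair per key, in typing order (hypothetical layout).
    """
    # Dwell time: release minus press of the same key (n values).
    dwell = [release - press for press, release in events]
    pairs = list(zip(events, events[1:]))
    # Flight time: next press minus current release (n - 1 values).
    flight = [nxt[0] - cur[1] for cur, nxt in pairs]
    # Down-Down: interval between two successive presses.
    down_down = [nxt[0] - cur[0] for cur, nxt in pairs]
    # Up-Up: interval between two successive releases.
    up_up = [nxt[1] - cur[1] for cur, nxt in pairs]
    return {"DT": dwell, "FT": flight, "DD": down_down, "UUT": up_up}


# Three keys with made-up timestamps in milliseconds.
sample = [(0, 100), (150, 230), (300, 390)]
features = extract_features(sample)
print(features)
```

With these definitions each Up-Up value equals the flight time plus the dwell time of the following key, so a password of n characters gives n dwell times and n - 1 values for each of the other three features.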

4. PROBABILISTIC NEURAL NETWORKS

The PNN consists of an input layer, a pattern layer, a summation layer, and an output layer. The number of nodes in the input layer depends on the length of the timing information vector for a specific user. The pattern layer contains one neuron (node) for each available training sample, and the neurons are split into the two classes. Each neuron in the pattern layer computes a distance measure between the presented input vector and the training example represented by that pattern neuron. The summation layer contains one neuron for each class, while the output layer contains one neuron. The PNN algorithm starts by reading the timing information vector (x) and feeding it to each Gaussian function in each class; then, for each group of hidden nodes, all Gaussian function values at the hidden nodes are computed as illustrated in equation (2) [17].

$$p_i(x) = \exp\left(-\frac{D}{2\sigma^2}\right) \qquad (2)$$

Where
$p_i$: represents the output of a pattern node.
$x$: is the timing information vector to be assigned to class $c_i$.
$\sigma$: is the smoothing factor.
$D$: represents the distance between the timing information vector $x$ and the sample vector (reference vector) $y$. It is computed in this paper using three types of distance measure, as illustrated in the following equations [15]-[16]:

1-Euclidean Distance: is one of the most popular distance metrics between two vectors x and y and is computed as in equation (3), where the sum runs over the vector components.

$$D(x, y) = \sqrt{\sum_{l} (x_l - y_l)^2} \qquad (3)$$

2-Euclidean Squared Distance: uses the same equation as the Euclidean distance metric, but does not take the square root.

3-Manhattan Distance: the Manhattan (or city block) distance between vector x and vector y is calculated as in equation (4), where the sum runs over the vector components.

$$D(x, y) = \sum_{l} |x_l - y_l| \qquad (4)$$
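As a small illustration of the three distance measures, the following Python sketch implements them for plain lists of numbers. The function names are chosen for this example only and do not come from the paper.

```python
import math
from typing import Sequence


def euclidean_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of squared component differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the Euclidean squared distance."""
    return math.sqrt(euclidean_squared(x, y))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute component differences (city block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


x, y = [0.0, 0.0], [3.0, 4.0]
print(euclidean(x, y), euclidean_squared(x, y), manhattan(x, y))
```

For the same pair of vectors the three measures generally give different values, so the smoothing factor that works well with one distance need not work well with another.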

After these steps, for each group of hidden nodes, all of its Gaussian function values are fed to the single summation node for that group, as illustrated in equation (5).

$$y_i(x) = \frac{1}{n_i} \sum_{j=1}^{n_i} p_j(x) \qquad (5)$$

Where
$y_i$: represents the output of summation node i for class i.
$n_i$: represents the number of samples in the pattern layer of class i.
$p_j$: represents the output of pattern node j of class i.

Finally, the maximum of the summed function values is found at the output node by comparing the values of $y_1(x)$ and $y_2(x)$.
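The forward pass described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pattern output is taken as exp(-D / (2σ²)), the class labels, the sample vectors and the value of `sigma` are invented for the example, and the Euclidean squared distance is used as the distance measure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean_squared(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pattern_output(x: Vector, y: Vector, sigma: float,
                   distance: Callable[[Vector, Vector], float]) -> float:
    """Pattern node: Gaussian of the distance to one training sample."""
    return math.exp(-distance(x, y) / (2.0 * sigma ** 2))


def pnn_classify(x: Vector, classes: Dict[str, List[Vector]], sigma: float,
                 distance: Callable[[Vector, Vector], float] = euclidean_squared
                 ) -> Tuple[str, Dict[str, float]]:
    """Summation node per class (mean of pattern outputs), then arg max."""
    scores = {
        label: sum(pattern_output(x, y, sigma, distance) for y in samples) / len(samples)
        for label, samples in classes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical normalized training vectors for the two classes.
training = {
    "authentic": [[0.10, 0.20], [0.15, 0.25]],
    "impostor": [[0.80, 0.90], [0.90, 0.85]],
}
label, scores = pnn_classify([0.12, 0.22], training, sigma=0.1)
print(label, scores)
```

Because every distance is non-negative, each pattern output, and therefore each class score, lies between 0 and 1.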


5. PROPOSED APPROACH FOR USER AUTHENTICATION

The aim of this section is to apply the PNN to keystroke dynamics. This section also clarifies the attributes that can be extracted from the users' typing styles and employed to maximize the subtle differences between those styles.

5.1 Keystroke Dataset Collection Phase

An essential part of any keystroke dynamics system is the acquisition of users' keystroke data from the keyboard [13]. Therefore the keyboard property is set to enable the proposed system to distinguish accurately between authentic users and impostors. The acquired dataset is a collection of keystroke timings for specific passwords, gathered over a period of three months. The users were randomly selected from the staff of Baghdad University/College of Science to participate in building the required datasets. Two datasets were created. The first is named Keystroke-1 and the second Keystroke-2.

In Keystroke-1, 17 users were asked to type the password "computer" twenty-five times, five times (i.e., five samples) per weekly session, so five weeks were required to collect the dataset for each user. The users typed the password in different sessions because typing the same password continuously, again and again, may increase the typing speed. Spreading the sessions also captures more of each user's variations and circumstances.

However, the password “computer” is considered weak because of the relative ease with which a third party can guess it or find it via dictionary attacks. Hence another dataset, Keystroke-2, was built in a similar way to Keystroke-1, but with 16 participating users (thirteen of whom also participated in Keystroke-1). Here the users were asked to type the password “comp.84-rl”, which satisfies the strong password selection criteria, i.e., it is 10 characters long and combines symbols and numbers.

5.2 Feature Extraction Phase

In this paper, the DT, FT, and UUT features are extracted from the collected dataset. Figure (1) depicts the three major features that are extracted when the users type the "computer" password. The same representation can be repeated for the second password, “comp.84-rl”.

Figure 1 Features Representation of "computer" Password

The length of the timing information vector varies and depends on the length of the password. For example, the password “computer”, which contains eight characters, results in eight DT, seven FT and seven UUT values. Generally, a password with n characters yields n DT values and n - 1 values each for FT and UUT.

5.3 Preprocessing

To enable the proposed system to distinguish between authentic users and impostors with a minimum error rate, a normalization process is performed as shown in equation (1):

$$F' = \frac{F - MinF}{MaxF - MinF} \qquad (1)$$

where

F′: is the normalized feature value,

F: is the feature value,

MinF: is the minimum value that the feature F can take, and

MaxF: is the maximum value that the feature F can take.

5.4 Training Phase

This section illustrates how the PNN is used in the proposed system. Table (1) shows the number of nodes in each layer when the features of the "computer" password are extracted. The first row gives the PNN structure when the DT feature is used. The number of input layer nodes is the length of the timing information vector for the specific feature. The number of pattern layer nodes is the total number of training samples (users).


Table 1: PNN Information Structures for "computer" Password

| Feature(s) | Input layer nodes | Pattern layer nodes | Summation layer nodes | Output layer nodes |
|---|---|---|---|---|
| DT | 8 | 17 | 2 | 1 |
| FT | 7 | 17 | 2 | 1 |
| DT+FT | 15 | 17 | 2 | 1 |
| UUT | 7 | 17 | 2 | 1 |

In this paper there are two classes: the first class is for authorized users and the second for impostors. For example, Keystroke-1 contains 17 users; when the "computer" password is used, the first 6 users are authorized and the remaining 11 users are impostors. Similarly, Keystroke-2 contains 16 users; the first 6 users are authorized and the remaining 10 users are impostors.

The main aim of the training phase of the PNN is to find a proper value of the smoothing parameter σ. In this paper, an algorithm is proposed to determine the proper range for σ, as clarified in algorithm (1). After the proper range for σ is obtained, the training samples in the pattern layer nodes are presented as input information vectors to the input layer. The value of σ is then varied within the proper range and the PNN algorithm is applied repeatedly until the lowest classification error rates are reached. This value of σ is considered the best value for good classification.

Algorithm (1): Smoothing Parameter Range Determination

Input: Set of training timing information vectors $p_{i,j}$.

Output: Smoothing parameter $\sigma$.

Step1: [Find Mean Vector]

Compute the mean (centroid) vector for each class $c_k$, $1 \le k \le 2$.

$$\mu_i = \frac{1}{N_k} \sum_{j=1}^{N_k} p_{i,j}$$

Where

$N_k$: is the total number of patterns in the specific class.

$p_{i,j}$: is feature number $i$ of pattern number $j$.

Step2: [Find Standard Deviation Vector]

Compute the standard deviation vector for each class, k.

$$std_i = \sqrt{\frac{1}{N_k} \sum_{j=1}^{N_k} \left(p_{i,j} - \mu_i\right)^2}$$

where

$\mu_i$: is the $i$th value of the mean vector.

Step3: [Find Range]

Find the minimum and maximum standard deviation values for each class to obtain the proper range of the $\sigma$ value.
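The three steps of algorithm (1) can be sketched in Python as below. This is an illustrative reading rather than the authors' code: it assumes the population standard deviation (division by N rather than N - 1), returns one (minimum, maximum) pair per class, and uses invented sample vectors; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigma_range(samples: List[Sequence[float]]) -> Tuple[float, float]:
    """Return (min, max) of the per-feature standard deviations of one class."""
    n = len(samples)
    dims = len(samples[0])
    # Step 1: mean (centroid) vector of the class.
    mean = [sum(s[i] for s in samples) / n for i in range(dims)]
    # Step 2: standard deviation vector (population form, an assumption).
    std = [
        math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / n)
        for i in range(dims)
    ]
    # Step 3: smallest and largest deviation bound the search range.
    return min(std), max(std)


def sigma_ranges(classes: Dict[str, List[Sequence[float]]]) -> Dict[str, Tuple[float, float]]:
    """Apply the range computation to every class."""
    return {label: sigma_range(samples) for label, samples in classes.items()}


# Hypothetical two-feature training vectors.
example = {
    "authentic": [[1.0, 10.0], [3.0, 10.0]],
    "impostor": [[2.0, 4.0], [2.0, 8.0]],
}
print(sigma_ranges(example))
```

A lower bound of zero, as in this toy data where one feature never varies, cannot be used directly as a smoothing factor, so a search over the range would need to start from a small positive value.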

5.5 Authentication Phase


... the output score of a pattern layer node lies in the range [0, 1]. This output score is then passed to the summation layer. Finally, the decision is made at the output layer to classify the user as either authentic or impostor.

6. EVALUATION CRITERIA

To evaluate the predictive performance of the proposed biometric authentication system, four measures are calculated: False Reject Rate (FRR), False Accept Rate (FAR), Mean Error Rate (MER) and Accuracy. The formulas for calculating these measures are given in equations (6), (7), (8) and (9) respectively [12].

FRR: is defined as the rate at which users are rejected when they should be authenticated.

$$FRR = \frac{\text{Number of rejected legitimate users}}{\text{Total number of legitimate attempts}} \qquad (6)$$

FAR: is defined as the rate at which users are accepted when they should be rejected.

$$FAR = \frac{\text{Number of accepted impostor attempts}}{\text{Total number of impostor attempts}} \qquad (7)$$

MER: is the mean of FAR and FRR.

$$MER = \frac{FAR + FRR}{2} \qquad (8)$$

Accuracy: is defined as the proportion of true results in the population.

$$Accuracy = \frac{\text{Number of correct attempts}}{\text{Total number of attempts}} \qquad (9)$$
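The four measures can be computed from simple attempt counts, as in the following sketch. The function name, the argument names and the counts in the usage line are assumptions made for illustration; the results are scaled by 100 so that they read as percentages, matching the way the tables report them.

```python
from typing import Dict


def evaluate(rejected_legitimate: int, total_legitimate: int,
             accepted_impostor: int, total_impostor: int) -> Dict[str, float]:
    """Return FRR, FAR, MER and Accuracy as percentages."""
    frr = 100.0 * rejected_legitimate / total_legitimate
    far = 100.0 * accepted_impostor / total_impostor
    mer = (far + frr) / 2.0
    # Correct attempts: accepted legitimate plus rejected impostor attempts.
    correct = (total_legitimate - rejected_legitimate) + (total_impostor - accepted_impostor)
    accuracy = 100.0 * correct / (total_legitimate + total_impostor)
    return {"FRR": frr, "FAR": far, "MER": mer, "Accuracy": accuracy}


# Made-up counts: 1 of 4 legitimate attempts rejected, 0 of 6 impostor attempts accepted.
print(evaluate(1, 4, 0, 6))
```

Note that MER weights the two error rates equally, while Accuracy weights them by the number of legitimate and impostor attempts, so the two measures need not sum to 100%.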

7. EXPERIMENTAL RESULTS

Two experiments are conducted independently. In each experiment the two passwords “computer” and “comp.84-rl” are used by the PNN. The first experiment tested the approach on the same samples as in the training datasets, i.e., Keystroke-1 and Keystroke-2. Each sample contains the mean values of the keystroke timing vectors. This experiment yielded error rates of 0% and an accuracy of 100% for both passwords, "computer" and "comp.84-rl", with all three types of distance, as shown in tables (2-3).

The second experiment deals with testing the proposed approach online. This experiment includes computing the selected feature(s) for each key the user typed. The preprocessing is then applied to the computed timing vector. Moreover, this experiment tests the proposed approach both when it uses the first trial of password typing and when it uses three trials of password typing. The results of this experiment are shown in tables (4-5).

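Table 2: Experiment1 Results of “computer” Password

| Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|
| Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Euclidean | MER% | 0 | 0 | 0 | 0 |
| Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Euclidean Squared | FRR% | 0 | 0 | 0 | 0 |
| Euclidean Squared | FAR% | 0 | 0 | 0 | 0 |
| Euclidean Squared | MER% | 0 | 0 | 0 | 0 |
| Euclidean Squared | Accuracy% | 100 | 100 | 100 | 100 |
| Manhattan | FRR% | 0 | 0 | 0 | 0 |
| Manhattan | FAR% | 0 | 0 | 0 | 0 |
| Manhattan | MER% | 0 | 0 | 0 | 0 |
| Manhattan | Accuracy% | 100 | 100 | 100 | 100 |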


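Table 3: Experiment1 Results of “comp.84-rl” Password

| Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|
| Euclidean | FRR% | 0 | 0 | 0 | 0 |
| Euclidean | FAR% | 0 | 0 | 0 | 0 |
| Euclidean | MER% | 0 | 0 | 0 | 0 |
| Euclidean | Accuracy% | 100 | 100 | 100 | 100 |
| Euclidean Squared | Accuracy% | 100 | 100 | 100 | 100 |
| Manhattan | FRR% | 0 | 0 | 0 | 0 |
| Manhattan | FAR% | 0 | 0 | 0 | 0 |
| Manhattan | MER% | 0 | 0 | 0 | 0 |
| Manhattan | Accuracy% | 100 | 100 | 100 | 100 |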

8. RESULTS ANALYSIS

The PNN is used for keystroke dynamics authentication. The results of using the PNN with different types of distance show its ability to distinguish authentic users from impostors. The results of experiment 1 reflect a good training level, which is needed to obtain high performance from the proposed approach. The results of experiment 2 indicate that DT is the worst keystroke dynamics feature. On the other hand, experiment 2 indicates that UUT produces the best or equal results compared with the features used in previous works, namely DT, FT, and the combination of DT and FT. UUT outperforms the others because it implicitly contains the two other features, DT and FT: the interval between two successive key releases equals the flight time plus the dwell time of the second key. This builds a new feature from the previous two, giving it more capability to discriminate authentic users from impostors. Furthermore, this best result is obtained with fewer network nodes than the combination of DT and FT features requires. The FAR%, FRR%, MER% and Accuracy of the proposed approach give evidence that it is possible to improve the password security mechanism by considering not just the combination of DT and FT but also the UUT feature.

9. CONCLUSIONS

From the results, one can conclude, in general, the following:

1-The DT, FT, and UUT features are candidates for distinguishing between authentic users and impostors and may be exploited to reinforce password security.

2-The keystroke dynamics features, namely DT, FT, the combination of DT and FT, and UUT, are extracted. Of these, the UUT feature gives the best results.

3-The accuracy of the presented work with the PNN comes close to meeting the acceptable error levels that would be required for a system with some degree of security.

4-When the PNN is applied with three types of distance metric (Euclidean distance, Euclidean squared distance, and Manhattan distance), the results show that the Euclidean squared distance produces the best results.

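Table 4 Experiment2 Results of “computer” Password

| Trial no. | Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|---|
| 1 | Euclidean | Accuracy% | 76 | 94 | 94 | 94 |
| 1 | Squared Euclidean | FRR% | 16 | 0 | 0 | 0 |
| 1 | Squared Euclidean | Accuracy% | 76 | 100 | 100 | 100 |
| 1 | Manhattan | FRR% | 50 | 33 | 33 | 16 |
| 1 | Manhattan | FAR% | 9 | 0 | 0 | 0 |
| 1 | Manhattan | MER% | 29 | 16 | 16 | 8 |
| 1 | Manhattan | Accuracy% | 76 | 88 | 88 | 93 |
| 3 | Euclidean | Accuracy% | 72 | 86 | 90 | 90 |
| 3 | Squared Euclidean | FRR% | 33 | 27 | 27 | 16 |
| 3 | Squared Euclidean | FAR% | 24 | 3 | 0 | 0 |
| 3 | Squared Euclidean | MER% | 28 | 15 | 13 | 8 |
| 3 | Squared Euclidean | Accuracy% | 72 | 88 | 90 | 94 |
| 3 | Manhattan | FRR% | 61 | 50 | 50 | 33 |
| 3 | Manhattan | FAR% | 9 | 0 | 0 | 0 |
| 3 | Manhattan | MER% | 35 | 25 | 25 | 16 |
| 3 | Manhattan | Accuracy% | 72 | 82 | 82 | 88 |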

Table 5 Experiment2 Results of “comp.84-rl” Password

| Trial no. | Distance | Metrics | DT | FT | DT+FT | UUT |
|---|---|---|---|---|---|---|
| 1 | Euclidean | FRR% | 50 | 0 | 16 | 0 |
| 1 | Euclidean | FAR% | 30 | 10 | 0 | 0 |
| 1 | Euclidean | MER% | 40 | 5 | 8 | 0 |
| 1 | Euclidean | Accuracy% | 62 | 93 | 93 | 100 |
| 1 | Squared Euclidean | Accuracy% | 62 | 93 | 93 | 100 |
| 1 | Manhattan | Accuracy% | 75 | 87 | 87 | 100 |
| 3 | Euclidean | FRR% | 55 | 5 | 5 | 11 |
| 3 | Euclidean | FAR% | 30 | 10 | 10 | 3 |
| 3 | Euclidean | MER% | 42 | 7 | 7 | 7 |
| 3 | Euclidean | Accuracy% | 60 | 91 | 91 | 93 |
| 3 | Squared Euclidean | Accuracy% | 60 | 89 | 89 | 93 |
| 3 | Manhattan | Accuracy% | 62 | 87 | 89 | 91 |

REFERENCES

[1] U. Dieckmann and R. W. Frischholz, “BioID: A Multimodal Biometric Identification Systems”, IEEE Computer, Vol. 33, No. 2, pp. 64-68, 2000.

[2] F. Monrose and A. D. Rubin, “Keystroke Dynamics as a Biometric for Authentication”, Future Generation Computing Systems (FGCS), Vol. 12, No. 12, pp. 351-359, 2000.

[3] L. O’Gorman, “Comparing Passwords, Tokens, and Biometrics for User Authentication”, Proceedings of the IEEE, Vol. 91, No. 12, pp. 2019-2040, 2003.

[4] Y. Chen and A. Jain, “Beyond minutiae: A fingerprint individuality model with pattern, ridge and pore features”, in ICB09, 2009.

[5] H. Mendez, C. Martin, J. Kittler, Y. Plasencia, and E. Garcia Reyes, “Face recognition with LWIR imagery using local binary patterns”, in Proceedings of the International Conference on Advances in Biometrics, 2009.

[6] L. C. F. Araujo, H. R. Luiz Sucupira Jr., Miguel G. Lizarrage, Lee L. Ling, and Joao B. T. Yabu-Uti, “User Authentication Through Typing Biometrics Features”, IEEE Transactions on Signal Processing, Vol. 53, No. 2, pp. 851-855, 2005.

[7] D. Davis and W. Price, “Security for Computer Networks”, John Wiley & Sons, Inc., 1989.

[8] N. Abdullah and A. M. Ahmad, “User Authentication via Neural Networks”, in Proceedings of the 9th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pp. 310-320, 2000.

[9] D. C. D. Souza, “Typing Dynamics Biometric Authentication”, Bachelor engineering thesis, Faculty of Engineering and Physical Sciences, University of Queensland, Australia, 2002.

[10] L. C. Change, L. W. Kin, and L. C. Peng, “Keystroke Patterns Classification Using the ARTMAP-FD Neural Network”, in Proceedings of the 3rd International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIHMSP, Vol. 2, pp. 61-64, 2007.

[11] P. S. Tee, T. S. Ong and A. B. J. Teoh, “A Multilayer Layer Fusion Approach on Keystroke Dynamics”, Springer, Vol. 14, pp. 23-36, 2011.

[12] H. Barghouthi, “Keystroke Dynamics How Typing Characteristics Differ from One Application to Another”, MSc in Information Security, Gjøvik University College, Norway, 2009.

[13] S. Steven Richard, “A New Approach to Securing Passwords Using a Probabilistic Neural Network Based on Biometric Keystroke Dynamics”, Ph.D. thesis, The Department of Electrical and Electronic Engineering, University of Newcastle upon Tyne, UK, 2003.

[14] G. Romain, M. El-Abed and R. Christophe, “Keystroke Dynamics Authentication”, International Symposium on Collaborative Technologies and Systems, France, 2009.

[15] F. Monrose and A. D. Rubin, “Keystroke Dynamics as a Biometric for Authentication”, Future Gener Compute Syst, Vol. 16, No. 4, pp. 351-359, 2000.

[16] R. Kenneth, “User Authentication via Keystroke Dynamics: an Artificial Immune System Based Approach”, in Proceedings of the 5th International Conference on Information Technology, 2011.