Pattern Classification and Machine Learning

a model for it, although they did not create a working implementation of it. They suggest two important findings: first, that both authentication security and usability are increased by using transparent authentication when compared to explicit methods; second, that while transparent authentication can be used on many devices, it is ideally suited to mobile devices because of their access to a rich source of data about the device owner [139]. They did not, however, proffer justification for either claim. Jakobsson et al. also consider data that is available from the carrier and from off the device itself. They chose to use the call frequency as the sole biometric, although they do make reference to other biometrics such as keystroke dynamics that could be used.

The model that resulted from Jakobsson et al.’s work is a further step towards realizing transparent authentication, but it leaves open issues, such as how the results of call frequency calculations relate to what may be done on the mobile device. Another issue in their work is that the strength of call frequency is not compared with that of other biometrics that can be used on a mobile device.

In 2007, Mazhelis and Puuronen [79] produced a framework for user substitution detection (USD) on mobile devices. They claim to consider the “detective” rather than the “preventive” aspect of security. The difference between these two approaches is that the former investigates who committed an intrusion after it happens, while the latter focuses on preventing the intrusion from happening. Mazhelis and Puuronen claim that USD is closely related to authentication but is not true authentication. In their view, true authentication can be assumed to end when the user is granted access to a resource, although more contemporary definitions of authentication refer to allowing or disallowing the use of specific resources, services and data. Mazhelis and Puuronen’s work focuses on a strong psychological connection between the user’s actions and their uniqueness and classification ability, rather than on implementation methods and pattern classification techniques.

Biometric techniques and the frameworks that support them require methods for comparing gathered patterns to known patterns. Typically, machine learning techniques such as statistical and neural network-based pattern classification algorithms have been used for this purpose.

2.9 Pattern Classification and Machine Learning

The pattern matching tasks in most biometrics research use standard pattern classification algorithms to make decisions. Figure 2.4 shows a typical workflow for a pattern classification task. The workflow begins with the selection of one or more classifiers for the data at hand. Next, the classifiers are trained with a subset of the gathered data to create a model to which test data and future patterns will be compared. Once training is complete, the classifier model is tested for accuracy by presenting the trained classifier with some test data that was not used in the training phase. The results of the tests are then examined and measurements such as those described in Section 2.6.5 are generated.

The next step is to simplify the model by identifying those data features that provide the most discriminatory information to the classifier and removing those that provide minimal information. This reduction of dimensionality simplifies the classifier since there are fewer features to compare during any one classification task. If all features are equally important, it may be that none can be removed.

[Flowchart steps: Select Classifier, Train (create model), Measure Classifier Accuracy, Simplify Model]

Figure 2.4: Generic workflow for a pattern classification problem.

If the error rates are not sufficiently low, then a different classifier is selected, as shown by the top dotted arrow in the figure. If the dimensionality can be reduced, then the new model is once again tested with the same data as in the previous test and the error rates are compared to see if simplifying the model had an effect. The expected outcome is that the error rates are no lower than previously, although there are some cases where simplifying the model may increase the classifier’s accuracy. Once the most accurate classifier has been chosen, the model can be used to classify new data.
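The workflow described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the nearest-centroid classifier, the synthetic two-feature data, and the drop-one-feature simplification step are all hypothetical stand-ins and are not the classifiers, data or feature selection method used in this research.

```python
import random

def nearest_centroid_train(rows, labels):
    """Create a model: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(model, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def error_rate(model, rows, labels):
    """Fraction of test patterns that the model misclassifies."""
    wrong = sum(nearest_centroid_predict(model, r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def keep_features(rows, kept):
    """Project every row onto the feature indices listed in `kept`."""
    return [[row[i] for i in kept] for row in rows]

def make_data(n, seed):
    """Synthetic data (assumption): feature 0 separates the classes, feature 1 is noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        rows.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(label)
    return rows, labels

# Train on one subset; test on data not used in the training phase.
train_rows, train_labels = make_data(200, seed=1)
test_rows, test_labels = make_data(100, seed=2)

full_model = nearest_centroid_train(train_rows, train_labels)
full_error = error_rate(full_model, test_rows, test_labels)

# Simplify: drop each feature in turn and retest with the same test data.
error_after_drop = {}
for dropped in range(2):
    kept = [i for i in range(2) if i != dropped]
    model = nearest_centroid_train(keep_features(train_rows, kept), train_labels)
    error_after_drop[dropped] = error_rate(model, keep_features(test_rows, kept), test_labels)

print("all features:", full_error)
print("error after dropping each feature:", error_after_drop)
```

In this sketch, dropping the noise feature should leave the error rate roughly unchanged, while dropping the informative feature should push it towards chance, which is the comparison the simplification step relies on.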

The classifiers used for this research are suitable for use on a mobile device platform, which has limitations in memory, battery life and processor speed. For this reason, several factors influence classifier choice, as follows:

Simplicity: the classifier should have a simple algorithm.

Speed: the classifier should make a decision within a few seconds so that device functionality is not impaired.

Accuracy: the classifier should, within the constraints of behavioral biometric accuracy, have acceptably high AUC and low EER, FAR and FRR values. The exact definition of “acceptable” relies on the specific classifier and biometric implementation.
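As a minimal sketch of how the accuracy measures named above relate to one another, the following computes FAR and FRR at a decision threshold and approximates the EER by scanning candidate thresholds. The score lists, the convention that higher scores mean "more likely the device owner", and the function names are assumptions made for illustration only.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Scores at or above the threshold are accepted as the device owner.

    FAR: fraction of impostor attempts that are wrongly accepted.
    FRR: fraction of genuine attempts that are wrongly rejected.
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return false_accepts / len(impostor_scores), false_rejects / len(genuine_scores)

def approximate_eer(genuine_scores, impostor_scores):
    """Scan observed scores as thresholds; return (threshold, EER estimate)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = far_frr(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, (far + frr) / 2)
    return best[1], best[2]

# Hypothetical classifier scores, not results from this research.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

threshold, eer = approximate_eer(genuine, impostor)
print(threshold, eer)
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the EER gives a single operating point at which the two error types can be compared across classifiers.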

The following classifiers, all of which meet the first two requirements stated above, are commonly used and are standard in some programming language libraries:

Naïve Bayes (Gaussian and Kernel Density): This probabilistic classifier uses either a Gaussian or Kernel Density Estimation, and requires data independence within each class. Data independence means that the presence of one feature in the data is unrelated to the presence of any other feature. Such a characteristic lends itself well to small datasets since using additional features does not require an exponentially larger dataset. Two variants of the Naïve Bayes classifier were tested: one that uses Gaussian distributions for estimation, and one that uses a kernel density model, since the data in this study does not follow a Gaussian distribution. A kernel density Naïve Bayes classifier does not make assumptions regarding the distribution of the data to be classified (a Gaussian model assumes a Gaussian data distribution), and is suitable for continuous rather than discrete measurements. Based on the training data, the probability density of the timing features for each class is estimated using a kernel function. When new data is presented to the classifier, it is placed in the class whose estimated density function gives the highest value for the new data [150].
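The kernel density variant can be illustrated with a short sketch. It assumes a Gaussian kernel, a fixed bandwidth, class priors taken from the training counts, and made-up timing values in seconds; none of these choices is taken from the implementation used in this research.

```python
import math

def gaussian_kernel_density(x, samples, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps
    centred on each training sample."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / len(samples)

def train(rows, labels):
    """Store, per class, its prior and the training values of each feature."""
    model = {}
    for row, label in zip(rows, labels):
        entry = model.setdefault(label, {"count": 0, "features": [[] for _ in row]})
        entry["count"] += 1
        for i, value in enumerate(row):
            entry["features"][i].append(value)
    for entry in model.values():
        entry["prior"] = entry["count"] / len(rows)
    return model

def classify(model, row, bandwidth=0.05):
    """Return the class with the highest prior times the product of the
    per-feature kernel density estimates (computed in log space)."""
    def log_score(entry):
        score = math.log(entry["prior"])
        for value, samples in zip(row, entry["features"]):
            density = gaussian_kernel_density(value, samples, bandwidth)
            score += math.log(density + 1e-300)  # guard against log(0)
        return score
    return max(model, key=lambda label: log_score(model[label]))

# Hypothetical timing features in seconds: [key hold time, inter-key latency].
rows = [[0.10, 0.30], [0.12, 0.28], [0.11, 0.33], [0.09, 0.31],
        [0.20, 0.15], [0.22, 0.18], [0.19, 0.14], [0.21, 0.17]]
labels = ["owner"] * 4 + ["other"] * 4

model = train(rows, labels)
print(classify(model, [0.11, 0.29]))
print(classify(model, [0.20, 0.16]))
```

Multiplying the per-feature densities is where the independence assumption enters: each feature is modelled separately, so adding a feature adds one more one-dimensional estimate rather than enlarging a joint distribution.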

Decision Tree (DT): This classifier is often used to map decisions used to place data into one or more classes. Each node in the tree represents a feature in the data that can be used to determine to which class it belongs. The leaf nodes of the tree represent the classes. Decision trees are a suitable classifier for the data in this research because they are fast to classify new data, and have misclassification error rates that are comparable to more complex classifiers. They also make no assumptions about the data’s distribution.

k-Nearest Neighbor (k-NN): This algorithm creates a feature space by plotting all training data on an n-dimensional graph as single points. When new data is classified, the data point is plotted on the same graph, and then assigned the majority class of its k nearest neighbors, where k is a parameter that can be adjusted by the experimenter. Smaller values of k allow classification when there is only a small amount of training data. Manhattan and Euclidean distance measures were tested with the k-NN pattern classifier to determine whether the distance measure used makes a difference to the accuracy of k-NN.
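The following sketch shows k-NN classification with both distance measures. The training points and class names are invented for illustration, and the two-point example is deliberately constructed so that the choice of distance measure changes the nearest neighbor.

```python
from collections import Counter

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_classify(train_rows, train_labels, point, k, distance):
    """Assign `point` the majority class of its k nearest training points."""
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: distance(pair[0], point))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data with two classes.
rows = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["A", "A", "A", "B", "B"]
print(knn_classify(rows, labels, (1.5, 1.5), 3, euclidean))
print(knn_classify(rows, labels, (1.5, 1.5), 3, manhattan))

# A case where the distance measure changes the result (k = 1):
# (2.1, 2.1) is nearer to the origin by Euclidean distance,
# (4.0, 0.0) is nearer by Manhattan distance.
pair_rows = [(2.1, 2.1), (4.0, 0.0)]
pair_labels = ["A", "B"]
print(knn_classify(pair_rows, pair_labels, (0.0, 0.0), 1, euclidean))
print(knn_classify(pair_rows, pair_labels, (0.0, 0.0), 1, manhattan))
```

Because every classification compares the new point against all stored training points, the cost of this simple form of k-NN grows with the size of the training set.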

The following two classification algorithms were considered for use with this research, but were discounted because they did not meet one or more of the above requirements:

Support Vector Machine (SVM): This classifier is commonly used with two-class problems that use supervised learning methods. The model represents the data as points in space that are divided by a hyperplane; one of the two classes is on each side of the hyperplane. New data is classified by plotting it in the same space and predicting its class based on which side of the hyperplane the point falls. While Support Vector Machines are well-suited to the type of data seen in this study, they are often slow to classify and require significant processor speed and memory. For these reasons, SVM was considered a poor choice for this work and was thus not tested.

Neural Network (NN): Artificial neural networks are based on the network of biological neurons that are present in the human brain. They consist of a series of artificial neurons or nodes that are interconnected in such a way that they can be used to model complex relationships between the network’s inputs and outputs. One of their uses is to find patterns in data. A basic neural network consists of at least three layers: the input, output and hidden layers. The nodes in each layer are connected to each other in that each node in each layer passes its output to each node in the next layer, as shown in Figure 2.5. The interconnections may be weighted, and the training (or learning) phase of a neural network updates the weights for each interconnection. Generally, neural networks have high accuracy but are slow to train and to classify. Furthermore, they may require large amounts of training data, depending on the application. For these reasons, they were considered unsuitable for this work and were not tested.

[Diagram: inputs pass through an input layer, a hidden layer and an output layer to produce the outputs; each interconnection is labeled with a numeric weight.]

Figure 2.5: Example of a neural network. The numbers on each interconnection are the weights for that connection. They are updated during training.
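The layer-to-layer flow described for a basic neural network can be sketched as a forward pass. This is an illustration only: the sigmoid activation, the layer sizes and the weight values are assumptions, and the weights are not those shown in Figure 2.5. Training, which updates the weights, is not shown.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    """Each node receives every input, scaled by that connection's weight.
    weights[j][i] is the weight from input i to node j."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
            for node_weights in weights]

def forward(inputs, hidden_weights, output_weights):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)

# Hypothetical network: 2 inputs, 3 hidden nodes, 2 outputs.
hidden_weights = [[0.2, 0.1], [0.3, 0.8], [0.4, 0.2]]
output_weights = [[0.1, 0.6, 0.4], [0.3, 0.1, 0.7]]

outputs = forward([0.9, 0.2], hidden_weights, output_weights)
print(outputs)
```

Every classification requires one weighted sum per node, so the cost grows with the number of nodes and interconnections.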

Pattern classifiers are used to make decisions about the class of new data based on the features and known class of previously provided data. They may be used for pattern matching in biometric implementations, but should be accurate and algorithmically simple, and should classify new data in a timely manner.

2.10 The Transparent Authentication Framework

The current research into authentication, specifically that which takes place on a mobile device, has provided a basis for further work in the field. Specifically, the research examined
