Study Design - A framework for continuous, transparent authentication on mobile devices

best deployed in implementing the Transparent Authentication Framework, within the limits of a particular device and operating system.

Answers to both of these questions required access to a corpus of keystroke dynamics mea- surements from a soft keyboard on a mobile device. Such a corpus does not yet exist for research use; thus, another goal of this study was to create a corpus of keystroke dynamics information from typists using soft keyboards. The results of this study have been used to tailor the feature vector and pattern classifier for the keystroke dynamics biometric used in the Transparent Authentication Framework.

4.2 Study Design

The study has two parts: the first part gathered the required keystroke corpus; to this end an iPhone application was created that allows the user to type text into a textbox using the standard iOS soft keyboard. The second part of the study presented the gathered patterns to five pattern classifiers in order to determine whether there was distinctive information in the keystroke patterns of mobile device typists. The latter part of the study provided information used to answer the two research questions that drive this research.

The data gathering part of this study used a between–groups experimental design [152, p. 74] in which each participant was in turn considered the owner of their device and the remaining participants made up the group rest-of-world. Each of the participants used either an iPhone or an iPod Touch during the experiment. Both iPod Touch and iPhones were included in the study because they use the same type of keyboard and operating system, and are functionally equivalent for the purposes of this study. All devices had a single user, and used the standard soft keyboard provided by Apple. The data gathering exercise resulted in eight datasets, each of which contained an owner and rest–of–world class. The gathered datasets were then presented to the pattern classifiers chosen as part of the second part of the study. The independent variable for the study was the classifier used on each dataset, which depended in turn on which participant was considered the owner. The dependent variables were the classifier error rates, which depended on the participants’ typing metrics.

4.2.1 Participants

The first part of the keystroke pattern gathering project involved eight participants. The five female and three male participants ranged in age from mid-20s to late 50s, and also ranged in experience in typing on the iPhone or iPod Touch from beginner (had never used one before this experiment) to expert (repeated daily use). All participants were native English

4.2. Study Design 65 speakers, and were instructed to type in English during the study. The participants were volunteers who provided their own mobile devices and who were not paid in any way for their participation.

4.2.2 Apparatus and Materials

There were four iPhone 3GSes and four iPod Touch 3rd Generations with a minimum of iOS 3.0 used in this study. The devices used the standard iPhone and iPod keyboards without modification of any kind. The pattern classification part of the study used MatLab release R2012b and the standard pattern classification algorithms found in the Statistics Toolbox add-on. The pattern classification exercise was performed offline (i.e., not on the mobile device itself) since this was a feasibility study to select an algorithm and determine biometric fitness, rather than an assessment of whether these algorithms are viable on the device itself.

4.2.3 Procedure

Part 1: Data Gathering

The metrics used for this work are key hold time and inter-key latency, as shown in Figure 4.1. These measures represent the dependent variables for this part of the study. Key hold time is used with keystrokes only; it is a measure of how long, in seconds, the owner tapped a particular key. Inter-key latency is a measure of the time that passes between the release of the first key and the tap of the second key. The key hold time for the keystroke key1 is

calculated by subtracting the time the key is tapped from the time the key is released, as shown in Equation 4.1. The key hold time for the keystroke key2 is calculated similarly.

Inter-key latency is used with bigrams, which are sequences of two characters. This metric measures how much time passes between releasing the first key in the bigram and tapping the second key, and is calculated as shown in Equation 4.2.

keyHoldT imekey1 = key1release− key1tap (4.1)

interKeyLatencykey1key2= key2tap− key1release (4.2)

After each participant signed the ethics Information and Consent sheets, the custom-designed application was loaded onto their device and the participant was instructed to use the application as many times per day as they wished to enter keystroke data. Participants were asked

4.2. Study Design 66

KeyTap KeyRelease KeyTap KeyRelease

Key2 Hold Time Inter-key Latency Time Key₁ Key₂ Key1 Hold Time Bigram

Figure 4.1: Keystroke metrics. Key hold time applies to keystrokes and inter-key latency applies to bigrams.

to enter at least 100 characters each time they used the application in order to ensure that sufficient data was gathered to be distinctive. The 100 character minimum was not rigidly enforced and participants could type more or fewer characters if they wished. Screenshots for the two versions of the KeystrokeData application can be seen in Figure 4.2a. The differ- ence between the two application versions was that the second version allowed participants to send their data to the experimenter via the application. In the first version, the data had to be gathered manually, which reduced the possible pool of participants to those physically near the experimenter. Allowing for remote data gathering solved this issue.

The user interface consists of a single text box, as seen in Figure 4.2a, and a counter that represents the number of keystrokes left to type to reach 100 characters. To enter text, the user tapped on the text box and the keyboard appeared. When the participant finished typing, they closed the application by pressing the Home key on the device. After three weeks the data from each device was gathered manually and the application was removed from each participant’s device.

The second iteration of the application, as seen in Figure 4.2b, added the ability to copy typed text to an email or text message. The buttons used to implement this functionality are labeled “Copy to Email” and “Copy to SMS”, respectively. The third button, labeled “Send Data” allowed the participant to email the SQLite data store containing the keystroke information to the experimenter.

Other than the ease-of-use additions mentioned above, the inner workings of the second iteration of this study were identical to the first iteration. After approximately three weeks, an email was sent to the two study participants asking them to email the SQLite data store to the experimenter using the button on the application. The participants were also given instructions on how to remove the application from their device. All data stores were sent successfully.

4.2. Study Design 67

(a) First Iteration (b) Second Iteration

Figure 4.2: Screenshots of the first and second KeystrokeData application.

soft keyboard, as seen in Figure 4.3. The predictive text function was disabled as part of this study because it skewed the user’s normal typing pattern. When predictive text changes the series of characters actually typed by the user, the resulting changed string is replaced as one chunk of text. The result is that the application gathers the timing information for the original string of characters, then gets another string of characters from the predictive text that all have the same key up and key down times, which means that the resulting key hold time is zero. The net effect of predictive text is that the timing information needed for the keystroke pattern is effectively lost. Future iterations of the keystroke gathering application must manage this problem since asking users to disable predictive text is not likely to be feasible.

Another notable change in regular typing functionality seen in the KeystrokeData application is that automatic capitalization was disabled. In normal typing patterns, in order to type an uppercase letter, the user must first tap the Shift key and then the letter they wish to type. This means that the inter-key latency for a bigram containing a uppercase letter would be different than the inter-key latency for the bigram containing the same lowercase letters. For instance, if the user typed the bigram em, we would expect that the inter-key latency would be shorter for this bigram than for the bigram eM since the user would have to press the

4.2. Study Design 68 Shift key before the M key in the second case. While it is desirable to allow device owners to enable predictive text and auto-capitalization if they wish, the limitations imposed by disabling them are acceptable for a proof-of-concept such as this experiment.

(a) Letters (b) Numbers (c) Symbols

Figure 4.3: The characters available on a standard iOS soft keyboard. The keys outlined in red are not considered keystrokes by the keystroke gathering applications. The English keyboard is shown since the participants were instructed to type in English.

The shift key is not considered a keystroke in this study. The return, space, and backspace keys were all considered keystrokes, and were also allowable parts of a bigram. The space character was included because it is an allowable and frequent character in English. Since the KeystrokeData application did not allow the use of predictive text, participants often cor- rected their typing errors by using the backspace key. This has the possibility of containing valuable information on the user’s typing habits, and thus was considered a particularly rich source of distinctive information about the participant, so it was included as a keystroke. The return character was included as a keystroke for completeness, since it is a valid character on a soft keyboard.

Part 2: Pattern Classification

Once the data had been gathered from the study participants, it was presented to five pattern classifiers in order to determine firstly whether there was enough distinguishing information in the data to justify using keystroke dynamics in the Transparent Authentication Framework, and secondly whether any of the classifiers could be considered optimal for the type of data gathered. The five pattern classification algorithms used in this study were described in detail in Chapter 2. In summary, the five classifiers are Na¨ıve Bayes with kernel density and Gaussian estimation techniques (NB (KD) and NB (Gau), respectively), Decision Tree (DT), and k–Nearest Neighbor with both Manhattan and Euclidean distance measures (k-NN (Man) and k-NN (Eucl), respectively). Each classifier was trained using supervised learning methods, which means that the classifier was trained on a combination of patterns from both the owner and rest-of-world groups and the known classes of each sample were provided during training.

4.3. Data Acquisition 69

In document A framework for continuous, transparent authentication on mobile devices (Page 78-83)