Experiment 4 - Security and usability in a hybrid property based graphical authentication system

Purpose: Comparative memorability evaluation for colour, mixed, digit, and word based models

Parameters: Login success rate

This section describes an experiment to evaluate and compare the memorability (retention rate) of four implementations of the property based authentication model. Memorability experiments are normally conducted as part of the usability analysis of a system, since memorability is itself a vital part of usability. However, memorability evaluations are normally conducted as longitudinal trials (repeated login visits over a long period of time), which makes it extremely difficult, if not impossible, to evaluate all ten prototypes of the property based system implementation, for three reasons. First, in the usability tests (experiments 1, 2 and 3) participants were simply asked to create a password and then use it to log in, so they could be permitted to assess more than one system; a participant in a memorability test, by contrast, is allowed to use only a single model for the duration of the experiment. The researcher had to ensure this so that the results were not affected by multiple password interference and by increased mental pressure on the participants. Secondly, longitudinal memorability trials are normally conducted over several weeks or even months, and with other equally significant components of the system to evaluate, devoting too much time to memorability trials may not be worthwhile. Thirdly, many participants find longitudinal trials extremely boring because of their repetitive nature and may not endure until the end of the experiment, while the experimental results need a considerable number of participants to be acceptable. Hence arrangements were made to mitigate the effect of participants dropping out of the experiment prematurely [73].

5.7.1 Main Hypotheses

The experiment hypothesises that, across the various login sessions, the memorability of at least one of the experimental conditions (models) will be significantly higher than the memorability of the other test conditions.

H0: That statistically significant variation in login success rate will not be recorded for any of the models under investigation (test conditions).

H1: That statistically significant mean variation in login success rate will be recorded for at least one of the system models under investigation in login session 1.

H2: That statistically significant mean variation in login success rate will be recorded for at least one of the system models under investigation in login session 2.

H3: That statistically significant mean variation in login success rate will be recorded for at least one of the system models under investigation in login session 3.

H4: That statistically significant mean variation in login success rate will be recorded for at least one of the system models under investigation in login session 4.

5.7.2 Research Participants

Twenty five participants were recruited for each of the models under investigation. Participants were drawn from the undergraduate student population of a university. At the end of the experiment, the records of only twenty participants per model were randomly selected for evaluation; the remainder were discarded, as the larger number recruited served only to safeguard against the premature withdrawal of participants. All participants were between 20 and 35 years of age, and each had at least one email account and one bank account; hence each participant had at least one online password and one numerical PIN and thus had experience in the use of passwords. All participants claimed to have used computers and the internet for between one and six years and were thus all experienced in the use of computers. The use of a student population ensured little disparity in computing expertise and mental capabilities among the participants.
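The random selection of twenty records out of the twenty five recruited per model can be sketched as follows. This is only an illustration, not the procedure the researcher actually used: the participant identifiers, the function name `select_records` and the fixed seed are assumptions introduced here.

```python
import random


def select_records(participant_ids, keep=20, seed=None):
    """Pick `keep` records uniformly at random, without replacement.

    `participant_ids` should hold only participants who completed all
    sessions; the identifiers and the seed are hypothetical.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(participant_ids), keep))


# Hypothetical identifiers for the 25 participants recruited for one model.
recruited = [f"P{i:02d}" for i in range(1, 26)]
retained = select_records(recruited, keep=20, seed=4)
print(len(retained), "records retained for evaluation")
```

Fixing the seed makes the selection repeatable, which is useful if the retained records need to be audited later.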

5.7.3 Experimental Design

Here, a between users design was adopted for the experiment, in which twenty five participants were recruited for each of the four test conditions and each participant was allocated exactly one condition. The test conditions were:

1. A fill (colour) based implementation of the property based scheme.

2. A mixed implementation of the property based scheme.

3. A digit based implementation of the property based scheme.

4. A word based implementation of the property based scheme.

The operational procedures and the interface layout of the prototypes are identical, the only difference being the factor with which authentication is performed. The tasks to be performed by each participant on each of the prototypes were also the same. No questionnaires were issued to participants in the conduct of this experiment.

5.7.4 Experimental Variables

The independent variable is the test condition (the four prototype property based models), while the dependent variable is the login success rate recorded by the system in each of the four authentication sessions. In this experiment, as in all previous experiments, participants were allowed to use only 2 authentication steps and a fixed grid size of 9.

5.7.5 Apparatus and Materials

• Four prototypes of the property based system, identical in every aspect of the design and the tasks/procedures the participants are expected to perform.

• An information sheet that provides the participants with information about the experiment and what they are expected to do.

5.7.6 Experimental Procedure

The following procedure was followed in the conduct of the experiment:

In the first authentication session, a participant creates a graphical password using one of the four test conditions (software prototypes). The participant then logs on to the system using the password details chosen during password creation. Only data on the success of user authentication in the different authentication sessions is needed by the researcher. The participant is then asked to return at a later date for the next authentication session. For this experiment, four login sessions were needed for each participant. The first login session takes place on the day the password is created, the second two days after the first, the third a week after the second, and the fourth two weeks after the third. The interval is increased gradually so that the variation in memorability over time can be measured effectively.
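The session schedule can be expressed as cumulative day offsets from the day the password is created. The sketch below reads the final interval as the gap between the third and fourth sessions; the function names and the example start date are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Gap in days before each session: same day, +2 days, +1 week, +2 weeks.
GAPS_IN_DAYS = [0, 2, 7, 14]


def session_offsets(gaps=GAPS_IN_DAYS):
    """Return the day offset of each session from password creation."""
    offsets, total = [], 0
    for gap in gaps:
        total += gap
        offsets.append(total)
    return offsets


def session_dates(creation_day, gaps=GAPS_IN_DAYS):
    """Return the calendar date of each login session."""
    return [creation_day + timedelta(days=d) for d in session_offsets(gaps)]


print(session_offsets())  # [0, 2, 9, 23]
# Hypothetical creation day, chosen only to show the spacing.
for number, day in enumerate(session_dates(date(2024, 1, 1)), start=1):
    print(f"login session {number}: {day.isoformat()}")
```

Under this reading the whole trial spans 23 days per participant, with each gap longer than the one before it.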

5.7.7 Experimental Results for Experiment 4

The login success rates for the four models under investigation are presented in appendix 4. The results have been summarised using Excel in table 5.7, and the contents of the table have been used to generate the histogram in figure 5.6. From the figure it can be seen that in the first login session participants using the colour based model and the digit based model recorded 100% login success, while users of the word based model and the mixed model recorded 95% and 80% success rates respectively. In the second login session, participants using the colour based and digit based models again recorded 100% success, while users of the word based and mixed models recorded 85% and 70% respectively, a decline of 10 percentage points from the previous session for each of these two models. In the third authentication session, the digit based model still maintained its 100% success rate, the colour model dropped 5 percentage points to 95%, the word based model maintained its 85% success rate from the previous session, and the mixed model dropped a further 5 percentage points to 65%. In the fourth authentication session, the digit model again maintained a success rate of 100%, the colour model and the mixed model each lost another 5 percentage points (to 90% and 60% respectively), and the word based model remained at the 85% it had reached in login session 2. The chart thus shows that only participants using the digit based model remembered their passwords throughout all four sessions of the experiment. The mixed model had the lowest success rate both at the beginning and at the end of the experiment.

Login success rate (%)

Model     1st Login   2nd Login   3rd Login   4th Login
colour          100         100          95          90
mixed            80          70          65          60
digit           100         100         100         100
word             95          85          85          85

Table 5.7: Data table for login success rates of all models
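The percentages in table 5.7 can be converted back to counts of successful logins and to session-to-session changes in percentage points. The sketch below assumes twenty evaluated participants per model, as stated in section 5.7.2; the dictionary and function names are introduced here for illustration.

```python
# Login success rates (%) per model for sessions 1 to 4, from table 5.7.
SUCCESS_RATE = {
    "colour": [100, 100, 95, 90],
    "mixed": [80, 70, 65, 60],
    "digit": [100, 100, 100, 100],
    "word": [95, 85, 85, 85],
}
PARTICIPANTS_PER_MODEL = 20  # records evaluated per model (section 5.7.2)


def successful_logins(rates, n=PARTICIPANTS_PER_MODEL):
    """Convert success percentages to numbers of successful participants."""
    return [round(rate * n / 100) for rate in rates]


def session_changes(rates):
    """Change in percentage points between consecutive sessions."""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


for model, rates in SUCCESS_RATE.items():
    print(model, successful_logins(rates), session_changes(rates))
```

With twenty participants per model, each step of 5 percentage points corresponds to exactly one participant.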

Fig. 5.6: Histogram of comparative success rates of all models

[Chart: A comparison of login success rates, 1st Login to 4th Login; axis scale 0 to 120]

To evaluate the significance of the variation in login success rate across the models, a chi-square analysis was conducted. The results of the chi-square tests are presented in appendix 4B. The case processing summary for all login entries indicates that all values used for the analysis were valid (100%). For the 1st login session, the Pearson chi-square value was 9.173 at 3 degrees of freedom, p = 0.027; Phi and Cramer’s V gave the same significance value of p = 0.027. This shows significant variation between the models in the 1st login session. For the 2nd login session, the chi-square value was 9.8 at 3 degrees of freedom, p = 0.022, also indicating significant variation between the models in login success rate. For the 3rd login session, the chi-square value was 10.196 at 3 degrees of freedom, p = 0.017, revealing significant success rate variation between the models. In the 4th login session, the Pearson chi-square value was 14.902 at 3 degrees of freedom, with p = 0.002.
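The Pearson chi-square test for the 1st login session can be re-derived from table 5.7 with a few lines of standard-library Python. This is a sketch under stated assumptions, not the statistical package output in appendix 4B: the success counts assume twenty participants per model, the function names are introduced here, and the p-value formula is the closed form that holds only for 3 degrees of freedom. Only session 1 is recomputed; the figures for the later sessions are taken from the appendix output as reported.

```python
import math


def pearson_chi_square(successes, group_size):
    """Pearson chi-square for a k x 2 table of success/failure counts."""
    failures = [group_size - s for s in successes]
    total = group_size * len(successes)
    chi2 = 0.0
    for column in (successes, failures):
        # Expected count per cell = row total * column total / grand total.
        expected = group_size * sum(column) / total
        if expected == 0:
            raise ValueError("a column with no observations cannot be tested")
        chi2 += sum((obs - expected) ** 2 / expected for obs in column)
    return chi2


def chi2_sf_df3(x):
    """Upper-tail p-value of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def cramers_v(chi2, total):
    """Cramer's V for a k x 2 table, where min(rows - 1, cols - 1) = 1."""
    return math.sqrt(chi2 / total)


# Session 1 successes out of 20 for colour, mixed, digit, word (table 5.7).
session1 = [20, 16, 20, 19]
chi2 = pearson_chi_square(session1, group_size=20)
p_value = chi2_sf_df3(chi2)
v = cramers_v(chi2, total=80)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, Cramer's V = {v:.3f}")
```

For session 1 this gives a chi-square value of about 9.173 with p of about 0.027, in agreement with the values reported above.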

5.7.8 Discussion of Results for Experiment 4

The chi-square tests, which examined whether login success rate varied significantly between the software models at each login session, indicate that significant variations do exist between the models in all four login sessions. The results also indicate a gradual strengthening of the significance of the difference between the groups from the 1st login session to the 4th (from p = 0.027 to p = 0.002). In the light of these results, the null hypothesis, that statistically significant variation in login success rate will not be recorded for any of the models under investigation (test conditions), has to be rejected, while the four alternative hypotheses, each stating that statistically significant variation in login success rate will be recorded for at least one of the models in the corresponding login session, are accepted.
