Chapter 8 - Post Implementation (Analysis of Data)

8.2.1 Analysis of the questionnaire data

The statistical methods used to analyse the questionnaire data are described in this section. The aim of the analysis is to determine whether the group’s information security awareness increased during the playing of the game. This was achieved by calculating the average group marks after each questionnaire.

The number of correct answers was determined for each survey (the pre-assessment, the first post-assessment and the second post-assessment), and the average number of correct answers for each questionnaire was calculated; the results are depicted in Figure 8-2. The first questionnaire (pre-assessment) established a baseline of the participants’ initial information security awareness levels against which the results of the other questionnaires could be measured. This made it possible to gauge the impact of the training session and of the online game. The second questionnaire (post-assessment 1) was administered after the training session, while the last questionnaire (post-assessment 2) was administered after the participants had completed the online game. Each questionnaire consisted of five questions from each of the seven categories, resulting in 35 questions per questionnaire.

The pre-assessment indicated that the participants scored an average of 21.03 correct answers out of the possible 35 correct answers. In the second questionnaire (post-assessment 1), the participants’ scores averaged 21. The last questionnaire administered after the online game resulted in an average score of 19.7 out of a possible 35 marks.

Figure 8-2: Questionnaire Results (Average) (Source: Own)

The results from the questionnaires show a decrease in the information security awareness of the participants. The average remains nearly unchanged between the baseline (21.03) and the completion of the training session (21): the decrease of 0.03 is negligible. Surprisingly, the average for the final questionnaire was 19.7. An increase in the average mark was expected after the completion of the online game, not a further decrease of 1.3 marks. The reason for this will be explored in the next section.
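The change between consecutive assessments can be checked with simple arithmetic. The sketch below uses only the three means reported in the text; the function name and dictionary labels are illustrative and not part of the original analysis.

```python
# Mean number of correct answers (out of 35) as reported in the text.
reported_means = {
    "Pre": 21.03,  # pre-assessment (baseline)
    "P1": 21.0,    # post-assessment 1 (after the training session)
    "P2": 19.7,    # post-assessment 2 (after the online game)
}


def successive_changes(means):
    """Return the change in the mean between each consecutive pair of assessments."""
    labels = list(means)
    return {
        f"{a}->{b}": round(means[b] - means[a], 2)
        for a, b in zip(labels, labels[1:])
    }


changes = successive_changes(reported_means)
print(changes)
```

A negative value denotes a drop in the average mark relative to the preceding assessment.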

Outliers could have affected the results, so they need to be removed to check whether they skew the averages. The outliers can be trimmed from the data before the mean is calculated. Each questionnaire was marked out of 35, so the maximum mark a participant could obtain is 35. The distribution of the marks obtained by all participants for each assessment is depicted in Figure 8-3 and Figure 8-4. A box plot (Figure 8-3) and a plot graph (Figure 8-4) were generated from the analysed data.

The data represents the marks received by all the participants of the pre-assessment (Pre), post-assessment 1 (P1) and post-assessment 2 (P2).

Figure 8-3: Assessment Box Plots (Source: Own)

Box plots are used to display differences between the assessments. Their advantages include displaying the full range of the data (from the minimum to the maximum value), the median (indicated by the thick black line) and the outliers (McGill, Tukey & Larsen 1978). The box plots for the different assessments are depicted in Figure 8-3, and the following observations are made:

Pre-Assessment (Pre) – Most of the marks obtained are between 20 and 24 out of 35. These marks form the baseline against which the results of the other assessments are compared.

Post Assessment 1 (P1) – The assessment after the training session shows a small spread between the minimum and maximum values. A small decrease in the median is also noticeable compared to the median of the pre-assessment (Pre).

Post Assessment 2 (P2) – A considerably larger spread between the minimum and maximum values is noticeable, as well as a further decrease in the median.
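The quantities a box plot displays can be computed directly. The following is a minimal sketch, assuming the common convention of flagging values beyond 1.5 times the interquartile range as outliers; the text does not state which quartile or outlier convention the plotting tool used, and the marks shown are hypothetical rather than the study's data.

```python
from statistics import quantiles


def box_plot_summary(marks):
    """Return the five-number summary and the 1.5 x IQR outliers for a list of marks."""
    ordered = sorted(marks)
    # Quartiles by linear interpolation over the observed range (assumed convention).
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [m for m in ordered if m < lower_fence or m > upper_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": ordered[-1],
        "outliers": outliers,
    }


# Hypothetical marks out of 35; NOT the study's data.
hypothetical_marks = [8, 19, 20, 21, 22, 22, 23, 24, 25]
summary = box_plot_summary(hypothetical_marks)
print(summary)
```

With these made-up marks, the single very low mark falls below the lower fence and is reported as an outlier, while the median is unaffected by it.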

Next, the data was displayed as a plot graph (Figure 8-4). The plot graph not only makes the outliers clearly visible but also depicts the distribution of the marks received by each participant. The labels S001 through S031 on the horizontal axis denote the individual participants, and the marks each participant obtained in the different assessments are plotted for each label. In other words, each respondent’s marks for each assessment are depicted in Figure 8-4.

The distribution of the marks is also visible, and the grouping of the marks for each assessment gives the author an opportunity to conduct a preliminary analysis of the dataset.

Figure 8-4: Distribution of Assessment Marks (Source: Own)

The identification of outliers is consequently possible. The distribution of marks per assessment also makes it possible to comment on the effect of the assessments on the participants. The first observation is that the marks for the pre-assessment (Pre) and the first post-assessment (P1) are less scattered than those for the second post-assessment (P2). In other words, the marks for the pre-assessment (Pre) and the first post-assessment (P1) are more closely grouped together than the marks for the second post-assessment (P2).

Another observation is that the marks for the pre-assessment (Pre) and the first post-assessment (P1) are mostly grouped in the 20 to 25 mark region, while the marks for the second post-assessment (P2) tend to appear in the 15 to 20 mark region. This shows that the participants’ marks decreased in the second post-assessment (P2), which indicates that the participants were negatively affected between the first post-assessment (P1) and the second post-assessment (P2).

Microsoft Excel was used for the analysis of the data, specifically the MEDIAN and TRIMMEAN functions. The MEDIAN function calculates the value in the middle of a set of numbers; in other words, half the numbers have values above the median and half have values below it (Microsoft 2014a). The TRIMMEAN function calculates the mean after excluding a percentage of data points from the top and bottom tails of a data set (Microsoft 2014c). The most extreme 20 percent of scores were removed (the highest 10 percent and the lowest 10 percent) before the mean was calculated.

Figure 8-5 depicts the comparison between the mean and the trimmed mean for each of the surveys. The graph shows a more substantial decrease in the average mark between the pre-assessment and the first post-assessment when the outliers are removed. The outliers therefore have an impact on the calculation of the mark for the pre-assessment. The trimmed means for the pre-assessment and the first post-assessment highlight that the training session had little effect on the learning of the participants. Between the first and second post-assessments, however, the mean and the trimmed mean follow the same shape, which also indicates a decrease in marks after completion of the online game. An increase in marks between the first post-assessment and the second post-assessment was expected, which would have supported the hypothesis that the game has a positive learning effect. Instead, the gaming session was found to have had a negative impact on learning. Further analysis of the gaming component is required to determine the cause of the decrease in marks.

Figure 8-5: Line Graph Comparing Results (Mean and TrimMean) (Source: Own)
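The trimming step can be illustrated outside Excel. The sketch below mimics the documented behaviour of TRIMMEAN, which rounds the number of excluded points down to the nearest multiple of two and removes half from each tail. The function name `excel_style_trimmean` and the marks are hypothetical; this is not the workbook used in the study.

```python
import math


def excel_style_trimmean(values, percent):
    """Mean after dropping `percent` of the points, split evenly between both tails.

    Mirrors Excel's TRIMMEAN rule: the number of excluded points is rounded
    down to the nearest multiple of 2 so that both tails lose the same count.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 1:
        raise ValueError("percent must be in the range [0, 1)")
    n = len(values)
    per_tail = math.floor(n * percent) // 2
    ordered = sorted(values)
    kept = ordered[per_tail : n - per_tail]
    return sum(kept) / len(kept)


# Hypothetical marks out of 35; NOT the study's data.
hypothetical_marks = [5, 18, 19, 20, 21, 21, 22, 23, 24, 35]
plain_mean = sum(hypothetical_marks) / len(hypothetical_marks)
trimmed_mean = excel_style_trimmean(hypothetical_marks, 0.2)
print(plain_mean, trimmed_mean)
```

With ten values and a 20 percent trim, one value is dropped from each tail. For a group of 31 values the same rule would drop three from each tail, because 20 percent of 31 is 6.2, which rounds down to 6.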

The mean is useful for summarising a group of numbers but is sensitive to the extreme values created by outliers. The median was therefore calculated next: it is the value in the middle of an ordered set of numbers and is regarded as a more robust measure because it is far less affected by outliers. The calculated median values for all the surveys are depicted in Figure 8-6. The values for the pre-assessment, the first post-assessment and the second post-assessment are 22, 21 and 20 respectively. These results also indicate a decline with each successive questionnaire.

Figure 8-6: Line Graph Comparing Results (Mean and Median) (Source: Own)
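The difference in robustness between the mean and the median can be seen in a small example. The marks below are hypothetical and chosen only to show how a single extreme value shifts the mean while leaving the median in place.

```python
from statistics import mean, median

# Hypothetical marks out of 35; NOT the study's data.
marks_without_outlier = [20, 21, 22, 22, 23]
marks_with_outlier = [5, 21, 22, 22, 23]  # lowest mark replaced by an extreme value

mean_before = mean(marks_without_outlier)
mean_after = mean(marks_with_outlier)
median_before = median(marks_without_outlier)
median_after = median(marks_with_outlier)

print(mean_before, mean_after)      # the mean drops noticeably
print(median_before, median_after)  # the median stays the same
```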

The variation in each questionnaire’s results is calculated next. Although the previous analyses have shown a decrease in information security awareness levels after each questionnaire, the results of the variation analysis could indicate the impact of the training session and of participating in the online game. The bigger the difference between the mean values, the higher the impact an activity had on the participants. The standard deviation of the population is depicted in Figure 8-8. It was calculated using the STDEVP function in Excel (the formula is depicted in Figure 8-7). The STDEVP function measures how far values are spread from the mean (average value) (Microsoft 2014b). In the formula, “n” denotes the number of participants, “X_i” represents the individual values and “X̄” represents the mean.

Figure 8-7: Formula for STDEVP (Microsoft 2014b)
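For reference, the standard definition of the population standard deviation, written with the symbols described above, is given below. It is assumed here that Figure 8-7 shows this usual form; the symbol σ for the result is an added label.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n}}
```

The sum runs over all n participants, and the division by n (rather than n - 1) is what distinguishes the population form from the sample form.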

The line graph therefore shows that the biggest deviation between the questionnaire results occurs between the second and third questionnaires, which are the first and second post-assessments respectively. The activity that took place between these two questionnaires is the online game that the participants played.

Figure 8-8: Line Graph Population Standard Deviation (Source: Own)
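A population standard deviation of this kind can be reproduced with a few lines of Python. The sketch below computes it by hand and compares the result with `statistics.pstdev` from the standard library; the helper name `population_std_dev` and the marks are hypothetical and not taken from the study.

```python
import math
from statistics import pstdev


def population_std_dev(values):
    """Square root of the mean squared deviation from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


# Hypothetical marks out of 35; NOT the study's data.
hypothetical_marks = [19, 21, 21, 21, 22, 22, 24, 26]

manual_result = population_std_dev(hypothetical_marks)
library_result = pstdev(hypothetical_marks)
print(manual_result, library_result)
```

Both approaches divide by the number of values, which corresponds to treating the participants as the entire population rather than as a sample.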

The population variance (VARP) function in Excel was used to calculate the variance based on the entire population (Microsoft 2014d). The formula for the VARP function is given in Figure 8-9. This was used to validate and support the results from the population standard deviation. The results from determining the variation on the population
