Language-Agnostic Test - The development and evaluation of an electronic serious game aimed at

As discussed in Chapter 3, a programming-language-agnostic test was developed to allow comparison between the two test groups, since class marks were accessible for one group but not the other. The classroom mark comparison could then corroborate the results of the language-agnostic test and, in turn, legitimise measurement of the groups without programming course marks, for whom only the results of the language-agnostic test would be available. Data was received as an anonymised set of test scores, with each entry representing the test scores of an individual respondent. While an initial 68 students completed the first test at the start of the semester, only 22 of those 68 also completed the second test. This is too few respondents from which to draw meaningful insight, and the problem was compounded by the group being split into two: those who played the game and those who did not. These groups were sized at 45 and 28 respondents at the start of the semester, and 18 and 4 respondents respectively at the end of the semester.

46 CHAPTER 5. MEASUREMENT AND TEST RESULTS

Figure 5.1: Mean comparison considering group and time differences.

The initial unequal group sizes were acceptable because of the longitudinal nature of the study and the expectation that the number of students who volunteer to write the tests and to play the game would certainly decrease as the semester progressed. Thus, although this would hinder statistical comparison because the sizes of the groups could differ drastically, it increased the likelihood of a larger number of final respondents.

Table 5.1: Fixed Effect Test for language-agnostic test results.

Effect   Num. DF   Den. DF   F          p
A        1         20        0.389594   0.539568
B        1         20        9.611227   0.005642
A+B      1         20        1.275571   0.272089

** A - Group effect; B - Time effect; A+B - Group and Time effect
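As a sanity check on Table 5.1, the p-values can be recomputed from the F statistics. The following is a minimal Python sketch, not the procedure used in the study: it assumes each effect is tested with 1 numerator and 20 denominator degrees of freedom, uses the identity that an F(1, n) statistic is the square of a t(n) statistic, and evaluates the t distribution with a standard finite series that only holds for even degrees of freedom. The F value for effect B is taken to be 9.611227 (decimal point assumed), and all function names are illustrative.

```python
import math


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t, valid only for even df >= 2.

    With theta = atan(|t| / sqrt(df)), the probability P(|T| < |t|) is
    sin(theta) * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...), where the
    series stops at the cos^(df - 2) term.
    """
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form only covers even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term = 1.0
    total = 1.0
    # Each term is the previous one times (2k - 1) / (2k) * cos^2(theta).
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return 1.0 - math.sin(theta) * total


def f_test_p(f_stat: float, den_df: int) -> float:
    """p-value of an F(1, den_df) statistic, using F(1, n) = t(n)^2."""
    return t_two_sided_p(math.sqrt(f_stat), den_df)


# F statistics as read from Table 5.1 (value for B assumes a missing decimal point).
F_STATS = {"A": 0.389594, "B": 9.611227, "A+B": 1.275571}

if __name__ == "__main__":
    for effect, f_stat in F_STATS.items():
        p = f_test_p(f_stat, den_df=20)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"{effect}: F = {f_stat:.6f}, p = {p:.6f} ({verdict})")
```

Under these assumptions the three results should land close to the tabulated p-values, with only effect B (time) falling below 0.05.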

5.2. LANGUAGE-AGNOSTIC TEST 47

Table 5.2: The p-value for each group compared with every other group.

         P+T1       P+T2       NP+T1      NP+T2
P+T1     -          0.031621   0.264972   0.241885
P+T2     0.031621   -          0.036357   0.913789
NP+T1    0.264972   0.036357   -          0.02989
NP+T2    0.241885   0.913789   0.02989    -

** P+T1 - Played game and did Test one; P+T2 - Played game and did Test two; NP+T1 - Control and did Test one; NP+T2 - Control and did Test two

Table 5.3: Least Significant Difference (LSD) test between groups.

1st Mean   2nd Mean   Mean Diff.   Std. Error   p-value
P+T1       P+T2       -3.6111      1.562608     0.031621
P+T1       NP+T1      3.77778      3.293673     0.264927
P+T1       NP+T2      -3.97222     3.293673     0.241885
P+T2       NP+T1      7.38889      3.293673     0.036357
P+T2       NP+T2      -0.36111     3.293673     0.913789
NP+T1      NP+T2      -7.75000     3.314792     0.029890

** P+T1 - Played game and did Test one; P+T2 - Played game and did Test two; NP+T1 - Control and did Test one; NP+T2 - Control and did Test two

Figure 5.1 shows the comparison between the test groups, split by the sampling process as well as over time, referencing the two separate tests written by each group. The characters at the top of the four bars denote whether any two groups can be considered statistically different at a confidence level of 95%. If two bars share a character, the corresponding groups are not considered different at a significance level of 0.05.

From this it is determined that both groups scored significantly better in their second test than in their first test. However, the two groups did not vary significantly from each other for either of the tests.
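The pairwise reading behind this conclusion can be sketched from the p-values in Table 5.2. The snippet below is an illustrative Python check, not part of the original analysis; the names `ALPHA`, `P_VALUES`, `significant_pairs`, `same_group` and `same_time` are hypothetical, and the 0.05 threshold is the one stated in the text.

```python
from itertools import combinations

ALPHA = 0.05
GROUPS = ["P+T1", "P+T2", "NP+T1", "NP+T2"]

# Upper triangle of Table 5.2 (the table is symmetric).
P_VALUES = {
    ("P+T1", "P+T2"): 0.031621,
    ("P+T1", "NP+T1"): 0.264972,
    ("P+T1", "NP+T2"): 0.241885,
    ("P+T2", "NP+T1"): 0.036357,
    ("P+T2", "NP+T2"): 0.913789,
    ("NP+T1", "NP+T2"): 0.029890,
}


def significant_pairs(p_values, alpha=ALPHA):
    """Return the group pairs whose p-value is below alpha."""
    return [pair for pair in combinations(GROUPS, 2) if p_values[pair] < alpha]


def same_group(pair):
    """True when both labels share the P / NP part, e.g. P+T1 and P+T2."""
    return pair[0].split("+")[0] == pair[1].split("+")[0]


def same_time(pair):
    """True when both labels share the T1 / T2 part, e.g. P+T1 and NP+T1."""
    return pair[0].split("+")[1] == pair[1].split("+")[1]


if __name__ == "__main__":
    distinct = significant_pairs(P_VALUES)
    # Same group, different test: a change over time within one group.
    print("over time:", [p for p in distinct if same_group(p)])
    # Same test, different group: a difference between game and control.
    print("between groups:", [p for p in distinct if same_time(p)])
```

Pairs that share a group but differ in test correspond to improvement over time, while pairs that share a test but differ in group would indicate a difference between the game and control groups.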

As can be seen in Table 5.1, the only effect to carry a significant p-value, that is, one less than 0.05, is the time effect. The p-value for the combined group and time effect is 0.272089 (27.21%), much too high to be reported as significant. Table 5.2 shows the p-value of each group when compared to every other group. The p-values below 0.05 mark the combinations of groups that can be considered statistically distinct from one another. Each combination is displayed twice because the table is symmetric. Tests one and two of the game group differ significantly from one another. Similarly, the results of tests one and two of the control group are distinct from one another. Lastly, the results of the first test of the control group and the second test of the game group were distinctly different. This last comparison, however, does not lead to any logical insight into the data, as neither the time nor the group variable matches between the two.

Table 5.4: Descriptive statistics for language-agnostic test.

Group    N    Mean       Std. Dev.   Std. Error   -95% CI    +95% CI
Total    44   16.27273   6.24449     0.941392     14.3742    18.1713
P        36   16.58333   6.48680     1.081133     14.3885    18.7782
NP       8    14.87500   5.13914     1.816959     10.5786    19.1714
T1       22   14.09091   4.83941     1.031766     11.9452    16.2366
T2       22   18.45455   6.81544     1.453057     15.4327    21.4763
P+T1     18   14.77778   4.82098     1.139315     12.3804    17.1752
P+T2     18   18.38889   7.51578     1.771486     14.6514    22.4763
NP+T1    4    11.00000   4.08248     2.041241     4.5039     17.4961
NP+T2    4    18.75000   2.21736     1.108678     15.2217    22.2783

** P - Played game; NP - Control; T1 - Did test one; T2 - Did test two; P+T1 - Played game and did Test one; P+T2 - Played game and did Test two; NP+T1 - Control and did Test one; NP+T2 - Control and did Test two
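The rows of Table 5.4 are tied together by simple arithmetic, which gives a quick consistency check on the descriptive statistics. The sketch below is illustrative Python rather than the study's own tooling: it recomputes the marginal means as size-weighted averages of the four cell means, and the mean differences of Table 5.3 as differences of cell means. It takes the NP+T2 mean to be 18.75, the value implied by the NP row mean and by the mean differences in Table 5.3; all names are hypothetical.

```python
# Cell sizes and means: label -> (N, mean).
# The NP+T2 mean of 18.75 is an assumption, chosen because it is the value
# consistent with the NP row mean (14.875) and with Table 5.3.
CELLS = {
    "P+T1": (18, 14.77778),
    "P+T2": (18, 18.38889),
    "NP+T1": (4, 11.0),
    "NP+T2": (4, 18.75),
}


def pooled_mean(labels):
    """Size-weighted mean of the listed cells."""
    total_n = sum(CELLS[label][0] for label in labels)
    weighted_sum = sum(CELLS[label][0] * CELLS[label][1] for label in labels)
    return weighted_sum / total_n


def mean_difference(first, second):
    """Difference of cell means, first minus second, as laid out in Table 5.3."""
    return CELLS[first][1] - CELLS[second][1]


if __name__ == "__main__":
    marginals = {
        "Total": list(CELLS),
        "P": ["P+T1", "P+T2"],
        "NP": ["NP+T1", "NP+T2"],
        "T1": ["P+T1", "NP+T1"],
        "T2": ["P+T2", "NP+T2"],
    }
    for name, labels in marginals.items():
        print(f"{name}: {pooled_mean(labels):.5f}")
    print(f"NP+T1 - NP+T2: {mean_difference('NP+T1', 'NP+T2'):.5f}")
```

Comparing the printed values with the Mean column of Table 5.4 and the Mean Diff. column of Table 5.3 is a cheap way to spot transcription slips in either table.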

In document The development and evaluation of an electronic serious game aimed at the education of core programming skills (Page 61-64)