

6.5 Programming Behaviour Results

The practical assessment consisted of three tasks:

• Task 1 assessed debugging skills through the removal of syntactic and logical errors from a given program;

• Task 2 assessed program composition skills through the creation of a novel solution to a given problem; and

• Task 3 assessed modification and composition skills through the completion of a shell program containing only placeholder code.

Programming behaviour was determined by examining various metrics for each of the tasks, as well as the compile event pairs (Jadud 2004). Summaries of the t-test and χ²-test results are shown in Table 6.11 and Table 6.12 respectively. For the complete set of results, consult Appendix G. The programming behaviour metrics examined were the:

• number of times programs were compiled;

• number of times the help function was invoked;

• total number of errors reported during compilation;

• number of keypresses between compilations;

• average time between TT compile event pair compilations;

• average time between TF compile event pair compilations;

• average time between FT compile event pair compilations; and

• average time between FF compile event pair compilations.
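The compile event pair labels used in the list above can be illustrated with a short sketch. It assumes, following the way this section later describes the pairs, that T marks a compile that reported no syntax errors and F marks one that did, and that each pair joins two consecutive compiles. The function names and the sample session are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter

PAIR_TYPES = ("TT", "TF", "FT", "FF")

def compile_event_pairs(outcomes):
    """Label each pair of consecutive compiles as TT, TF, FT or FF.

    outcomes is a list of booleans in chronological order, where True is
    assumed to mean the compile reported no syntax errors.
    """
    labels = ["T" if ok else "F" for ok in outcomes]
    return [first + second for first, second in zip(labels, labels[1:])]

def pair_proportions(outcomes):
    """Return the share of each pair type among all pairs in one session."""
    pairs = compile_event_pairs(outcomes)
    counts = Counter(pairs)
    total = len(pairs)
    return {kind: (counts[kind] / total if total else 0.0) for kind in PAIR_TYPES}

# Hypothetical session of six compiles, NOT data from the study.
session = [False, False, True, True, False, True]
print(compile_event_pairs(session))
print(pair_proportions(session))
```

A session of n compiles yields n - 1 pairs, so the proportions in a table such as Table 6.12 describe transitions between compiles rather than individual compiles.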

| Practical Assessment Task | Metric | Control Mean (Full) | Treatment Mean (Full) | p (Full) | Control Mean (High to Medium) | Treatment Mean (High to Medium) | p (High to Medium) |
|---|---|---|---|---|---|---|---|
| Task 1 | # Help | 0.0 | 0.8 | 0.013 | 0.1 | 1.0 | 0.048 |
| Task 3 | # Compiles | 28.6 | 39.6 | 0.033 | 26.5 | 41.2 | 0.057 |
| Task 3 | # Help | 0.1 | 0.5 | 0.010 | 0.1 | 0.7 | 0.018 |

Full = Full Complement, High to Medium = High to Medium Risk Stratum

ncontrol = 32, ntreatment = 47

#Help = number of times the help function was invoked

#Compiles = number of times programs were compiled

Table 6.11: t-test Computed Values for Programming Metrics of the Practical Assessment Tasks (Summary of Table G.1)

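| Practical Assessment Task | S | Control TT | Control TF | Control FT | Control FF | Treatment TT | Treatment TF | Treatment FT | Treatment FF | χ² | p |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Task 1 | F | 23% | 21% | 23% | 33% | 26% | 19% | 20% | 34% | 3.333 | 0.343 |
| Task 1 | H | 13% | 22% | 23% | 42% | 25% | 19% | 20% | 36% | 14.626 | 0.002 |
| Task 1 | L | 31% | 20% | 22% | 27% | 29% | 18% | 21% | 32% | 2.134 | 0.545 |
| Task 2 | F | 32% | 19% | 18% | 31% | 32% | 21% | 21% | 26% | 2.646 | 0.450 |
| Task 2 | H | 23% | 17% | 15% | 45% | 30% | 22% | 22% | 26% | 21.147 | 0.000 |
| Task 2 | L | 40% | 22% | 21% | 17% | 35% | 19% | 19% | 27% | 5.892 | 0.117 |
| Task 3 | F | 17% | 26% | 25% | 32% | 32% | 21% | 20% | 28% | 31.824 | 0.000 |
| Task 3 | H | 11% | 24% | 21% | 43% | 25% | 22% | 21% | 32% | 19.337 | 0.000 |
| Task 3 | L | 23% | 28% | 29% | 21% | 46% | 18% | 17% | 18% | 33.012 | 0.000 |

S = Stratum, F = Full complement, H = High to Medium Risk Stratum, L = Low Risk Stratum

ncontrol = 32, ntreatment = 47

Table 6.12: χ²-test Computed Values for Compile Event Pairs of the Practical Assessment Tasks (Summary of Table G.2)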

Full details of the t-tests, conducted at the 95% confidence level, can be found in Appendix G. The proportions of compile event pairs give an indication of the programming behaviour of subjects and are shown in Table 6.12. The full results of applying the χ²-test at the 95% confidence level to these proportions can be found in Table G.2 in Appendix G.
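Table 6.12 reports only the χ² statistics and p-values, not the underlying pair counts. Each comparison involves two groups and four pair types, so a standard Pearson χ²-test on that 2 × 4 table would have (2 − 1)(4 − 1) = 3 degrees of freedom. That reading is an assumption here, not something the section states. Under that assumption, the sketch below recomputes some of the reported p-values from the reported statistics, using the closed-form upper tail for 3 degrees of freedom, and shows how the statistic itself could be computed from counts. The function names and the example counts are hypothetical.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def pearson_chi2(observed):
    """Pearson chi-square statistic for a table of observed counts (a list of rows).

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

# (chi-square, p) pairs copied from Table 6.12.
reported = [(3.333, 0.343), (14.626, 0.002), (2.134, 0.545), (5.892, 0.117)]
for stat, p in reported:
    print(f"chi2 = {stat:6.3f}  reported p = {p:.3f}  recomputed p = {chi2_sf_df3(stat):.3f}")

# Invented counts of TT, TF, FT and FF pairs for two groups, NOT study data.
example_counts = [[17, 26, 25, 32], [64, 42, 40, 56]]
print(round(pearson_chi2(example_counts), 3))
```

With 3 degrees of freedom the 5% critical value is roughly 7.81, which agrees with the pattern in Table 6.12: a statistic of 5.892 is not significant, while 14.626 is.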

Full Complement

In the debugging and program modification tasks (Task 1 and Task 3 respectively), a significant difference was detected in the number of times that the help functionality was used. The treatment group averaged 0.8 and 0.5 uses, versus 0.0 and 0.1 for the control group (p = 0.013 and p = 0.010 respectively), indicating that subjects in the treatment group made use of the help functionality more often than those in the control group.

In the program modification task (Task 3), subjects in the treatment group compiled their programs 39.6 times on average, while those in the control group compiled their programs only 28.6 times (p = 0.033). Subjects in the control group compiled every 129 seconds on average, while the treatment group compiled every 108 seconds on average. The difference in time between compiles was not significant.
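The section does not state which variant of the t-test was used, and it does not list the per-subject values behind the means in Table 6.11. As a minimal sketch only, assuming Welch's unequal-variance form, a comparison of two group means could be computed as follows. The function name `welch_t` and both samples are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test of mean(b) against mean(a).

    Assumes each sample has at least two values and that the samples are
    not both constant (otherwise the standard error is zero).
    """
    se2_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

# Invented per-subject counts, NOT data from the study.
control = [0, 0, 0, 1, 0]
treatment = [1, 0, 2, 1, 0]
t, df = welch_t(control, treatment)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The resulting t and df would then be looked up against the t distribution to obtain a p-value such as those reported in Table 6.11.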

The proportions of compile event pairs are not significantly different for Task 1 or Task 2. Task 3, however, showed a significant difference between the control and treatment groups (Figure 6.3).

[Figure 6.3 panels. Control: TT 17%, TF 26%, FT 25%, FF 32%. Treatment: TT 32%, TF 21%, FT 20%, FF 28%.]

Figure 6.3: Compile Event Pair Proportions for Practical Assessment Task 3, Full Complement

Few of the control group’s compile event pairs were TT pairs, in which a compile free of syntax errors followed another (17%), while the treatment group produced almost double that proportion (32%). FF pairs, in which a compile containing syntax errors was followed by one that still contained syntax errors, made up 32% of the control group’s pairs versus 28% of the treatment group’s. For the control group, syntax errors were introduced or not removed in more than half of the compiles (TF + FF = 58%), while for the treatment group this was less than half (TF + FF = 49%).

The null-hypotheses H0.3.2, H0.3.3 and H0.3.5 were rejected and alternate hypotheses H1.3.2, H1.3.3 and H1.3.5 were accepted. In other words, the number of uses of the help feature, the number of compiles and the proportion of compile event pairs are dependent on the PDE used, depending on the programming task performed.

High to Medium Risk Stratum

Similar to the full complement, the high to medium risk stratum showed significant differences in the number of uses of the help feature for Task 1 and Task 3 (0.1 and 0.1 for the control group versus 1.0 and 0.7 for the treatment group, for Task 1 and Task 3 respectively). The number of compiles for Task 3 was also significantly different (26.5 and 41.2 for the control group and treatment group respectively).

Therefore, null-hypotheses H0.3.2 and H0.3.5 can be rejected and alternate hypotheses H1.3.2 and H1.3.5 can be accepted for the high to medium risk stratum. In other words, the number of compilations and number of uses of help features are dependent on the PDE used for the low ability subjects.

There were no differences for any other programming metrics, except for the proportions of compile event pairs, which were significantly different for each of the tasks (Figure 6.4). In each of the tasks, the proportion of TT pairs (a compile free of syntax errors followed by another) was significantly higher for the treatment group:

• Task 1: 13% versus 25% for the control and treatment group respectively (Figure 6.4(a));

• Task 2: 23% versus 30% for the control and treatment group respectively (Figure 6.4(b)); and

• Task 3: 11% versus 25% for the control and treatment group respectively (Figure 6.4(c)).

In each of the tasks, FF pairs (a compile containing syntax errors followed by one that still contained syntax errors) made up a larger proportion for the control group than for the treatment group. Compiles in which syntax errors were introduced or not removed (TF + FF) were:

• Task 1: 64% versus 55% for the control and treatment group respectively (Figure 6.4(a));

• Task 2: 62% versus 48% for the control and treatment group respectively (Figure 6.4(b)); and

• Task 3: 67% versus 54% for the control and treatment group respectively (Figure 6.4(c)).
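The TF + FF totals listed above can be re-derived from the High to Medium Risk Stratum rows of Table 6.12. The short sketch below does that arithmetic; the percentages are copied from the table, while the dictionary layout and the name `error_share` are illustrative only.

```python
# Percentages copied from the H (High to Medium Risk Stratum) rows of Table 6.12.
high_medium = {
    "Task 1": {"control": {"TT": 13, "TF": 22, "FT": 23, "FF": 42},
               "treatment": {"TT": 25, "TF": 19, "FT": 20, "FF": 36}},
    "Task 2": {"control": {"TT": 23, "TF": 17, "FT": 15, "FF": 45},
               "treatment": {"TT": 30, "TF": 22, "FT": 22, "FF": 26}},
    "Task 3": {"control": {"TT": 11, "TF": 24, "FT": 21, "FF": 43},
               "treatment": {"TT": 25, "TF": 22, "FT": 21, "FF": 32}},
}

def error_share(pairs):
    """Share of pairs ending in a compile with syntax errors (TF + FF), in percent."""
    return pairs["TF"] + pairs["FF"]

for task, groups in high_medium.items():
    control = error_share(groups["control"])
    treatment = error_share(groups["treatment"])
    # The gap is a difference between two percentages, i.e. percentage points.
    print(f"{task}: control {control}%, treatment {treatment}%, gap {control - treatment} points")
```

The gaps are differences between two percentages, so they are best read as percentage points rather than as relative percentages.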

[Figure 6.4 panels, control versus treatment:
(a) Task 1. Control: TT 13%, TF 22%, FT 23%, FF 42%. Treatment: TT 25%, TF 19%, FT 20%, FF 36%.
(b) Task 2. Control: TT 23%, TF 17%, FT 15%, FF 45%. Treatment: TT 30%, TF 22%, FT 22%, FF 26%.
(c) Task 3. Control: TT 11%, TF 24%, FT 21%, FF 43%. Treatment: TT 25%, TF 22%, FT 21%, FF 32%.]

Figure 6.4: Compile Event Pair Distributions for Practical Assessment Tasks (High to Medium Risk Stratum)

That is, for every task, the control group spent at least 9 percentage points more of their compiles introducing or failing to remove syntax errors than the treatment group. Therefore null-hypothesis H0.3.3 can be rejected and alternate hypothesis H1.3.3 can be accepted. In other words, the proportion of compile event pairs is dependent on the PDE used for the low ability subjects.

Low Risk Stratum

The low risk stratum had the fewest significant differences between the control and treatment groups. The only significant difference detected was in the program modification task, namely Task 3, where the proportions of compile event pair occurrences differed significantly (Figure 6.5).

Subjects in the treatment group produced double the proportion of compile event pairs in which a compile free of syntax errors was followed by another (TT): 46% of the treatment group's pairs were TT pairs, compared to 23% of the control group's. Syntax errors were either introduced or not removed (TF + FF) in 49% of compiles by the control group, compared to 36% by the treatment group. New syntax errors were introduced (TF) in 28% of compiles by the control group and 18% of compiles by the treatment group.

[Figure 6.5 panels. Control: TT 23%, TF 28%, FT 29%, FF 21%. Treatment: TT 46%, TF 18%, FT 17%, FF 18%.]

Figure 6.5: Compile Event Pair Distributions for Practical Assessment Task 3 (Low Risk Stratum)

Therefore, null-hypothesis H0.3.3 can be rejected and H1.3.3 accepted instead for high ability subjects. In other words, for the high ability subjects, the proportion of compile event pairs is dependent on the PDE used for Task 3.

In document The evaluation of a pedagogical-program development environment for Novice programmers : a comparative study (Page 173-178)