Post-test results (participant learning) - Designing and Developing the Learning Materials

5.2.3 Post-test results (participant learning)

The following sections present the results from the post-tests conducted in Experiment 1. They show the participants’ subjective ratings from the post-test questionnaire and the level of learning achieved by participants after they had studied the learning materials. Descriptive statistics are presented first, followed by Analysis of Variance (ANOVA) tests with Scheffe post-hoc comparisons for a closer analysis of differences between groups.

5.2.3.1 Test 1 – Post-test questionnaire

The initial questions in the post-test questionnaire asked participants for their subjective rating of how the web-based applications compared to learning from a traditional textbook. These questions were answered on a 7-point Likert scale with 1 = strongly disagree and 7 = strongly agree. The results presented below in Table 5.7 are the averaged results from the questions asked in the post-test questionnaire, based on the two constructs from the Technology Acceptance Model (TAM) (Davis, 1989). Although these constructs have previously been evaluated in over 100 studies, Cronbach’s Alpha (α) was still calculated to confirm that each of the two sets of six questions measured a single construct.

For the construct of perceived usefulness, an α result of 0.8708 was recorded (with an item deleted, alpha ranged from 0.8453 (U3) to 0.8543 (U6); these values are all close to one another and high). This indicates that all of the questions measure the same overall construct. According to George and Mallery (2003) this result is considered to be ‘Good’.

result indicated that all of the questions are asking the same overall construct. According to George and Mallery (2003) this is a ‘Poor’ result, although there is no lower limit to the coefficient. The question that some participants rated poorly was “EOU4: The human information process application that I have just viewed is flexible to interact with”. The result for this question was lower than the other ease-of-use results because the application was designed to require the user to follow the 12 procedural steps in order, which reduced the interactivity and flexibility of the application so that the focus was only on learning. This result was therefore not unexpected.
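As an illustration of the reliability check described above, the following is a minimal sketch of how Cronbach’s Alpha can be computed from raw item scores. The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study, whose raw questionnaire responses are not reproduced here. The sketch uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point Likert responses (4 respondents x 3 items); NOT study data.
example_responses = [
    [5, 6, 5],
    [6, 6, 7],
    [4, 5, 4],
    [7, 7, 6],
]
alpha = cronbach_alpha(example_responses)
print(round(alpha, 4))
```

An "alpha with an item deleted" value, as reported for U3 and U6, could be obtained in the same way by dropping one column from the responses before calling the function.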

Table 5.7 shows the averages of the two constructs of perceived usefulness and perceived ease-of-use. A study by Nielsen and Levy (1994) gives 5.6 as a recommended benchmark for the mean rating of a good quality system. Based on this benchmark, both ‘Integrated layering with previous text displayed’ used by Group 2 and ‘Integrated layering with the current step highlighted’ used by Group 3 were rated by participants above the benchmark on both constructs. The application used by Group 1 was rated above the benchmark for perceived ease-of-use (5.67) and marginally below it for perceived usefulness (5.58).

Table 5.7: Experiment 1 participant results perceived usefulness and ease-of-use

Group   Mean   S.D.

Perceived Usefulness
G1      5.58   0.69
G2      5.63   0.61
G3      5.77   0.45

Perceived Ease-of-Use
G1      5.67   0.65
G2      6.03   0.60
G3      5.88   0.30

The ANOVA results for perceived usefulness, F(2, 27) = 0.258, p = 0.775, and for perceived ease-of-use, F(2, 27) = 1.156, p = 0.330, indicate that there is no statistically significant difference between the three applications in terms of perceived usefulness or ease-of-use. The three groups therefore held similar opinions about the usability of their respective applications.

5.2.3.2 Test 2 – Learning quiz

The second phase in the post-test evaluation was designed to test participants on the information that they learnt from their instructional materials.

Table 5.8: Experiment 1 participant results post-test quiz

Group   Mean    S.D.

Construction (recall)
G1      82.66    7.16
G2      85.33   15.33
G3      89.33    7.16

Automation (transfer)
G1      57.00   13.98
G2      79.00   13.49
G3      56.00   15.77

General Total
G1      68.00    9.01
G2      81.71   12.35
G3      70.28   11.04

Table 5.8 shows the averages achieved by the different groups as percentages for the questions asked in the Learning quiz.

5.2.3.3 Analysis of Variance

In order to identify which of the three ‘layered integrated instructional design applications’ led to increased participant learning outcomes, the results of the learning quiz were compared across the three groups. The performance of the students was recorded, and an ANOVA was carried out on the mean scores of the different groups of participants. After the ANOVA, Scheffe post-hoc comparisons were conducted in order to ascertain where any differences between groups existed. The Scheffe post-hoc comparison is known to be a very conservative test and was therefore suitable for the purposes of the study (Rockloff, 2007). The following are the results of the ANOVA and Scheffe post-hoc comparison tests on the mean scores for correct answers in the experiment.

For Construction (recall): F(2, 27) = 1.000, p = 0.381
For Automation (transfer): F(2, 27) = 8.090, p = 0.002
For the total Learning quiz: F(2, 27) = 4.550, p = 0.020
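The F statistics listed above can be approximately checked against the group means and standard deviations in Table 5.8. The sketch below is an illustration only and is not the analysis procedure used in the study. It assumes 10 participants per group, which is inferred from the degrees of freedom (2, 27) under equal group sizes and is not stated in the text. The function names are hypothetical. The p-value uses a closed form of the F distribution that is valid only when the numerator degrees of freedom equal 2, as is the case with three groups.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA F statistic from group means and SDs, equal group size n."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because every group has the same n
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k  # pooled variance for equal n
    return ms_between / ms_within, k - 1, k * (n - 1)


def p_value_df1_is_2(f_stat, df_within):
    """Upper-tail p-value of F(2, df_within); closed form holds only for df1 == 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


N_PER_GROUP = 10  # ASSUMPTION inferred from df = (2, 27); not stated in the text

# Means and standard deviations for G1, G2, G3 as given in Table 5.8.
TABLE_5_8 = {
    "Construction (recall)": ([82.66, 85.33, 89.33], [7.16, 15.33, 7.16]),
    "Automation (transfer)": ([57.00, 79.00, 56.00], [13.98, 13.49, 15.77]),
    "General Total": ([68.00, 81.71, 70.28], [9.01, 12.35, 11.04]),
}

results = {}
for label, (means, sds) in TABLE_5_8.items():
    f_stat, df_b, df_w = anova_from_summary(means, sds, N_PER_GROUP)
    results[label] = (f_stat, p_value_df1_is_2(f_stat, df_w))
    print(f"{label}: F({df_b}, {df_w}) = {f_stat:.3f}, p = {results[label][1]:.3f}")
```

Because the table values are rounded to two decimal places, the recomputed statistics can be expected to differ slightly from the reported ones.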

For construction (recall), the ANOVA result of F(2, 27) = 1.000 gave a p value of 0.381, which is higher than the α value of 0.05. This means there was no significant difference among the mean scores of the three groups on the construction (recall) questions, and so there was no need to conduct a post-hoc test to identify where differences lay. All three groups therefore recorded a similar level of learning for construction (recall) questions. As indicated in Table 5.8, all three groups performed well on the recall items, with mean marks above 80%, which could indicate a ceiling effect. Of more interest to the current study was performance on transfer tasks.

The automation (transfer) scores of the three groups were subjected to a one-way ANOVA. This produced a significant result, F(2, 27) = 8.090, with a p value of 0.002, which is less than the α value of 0.05, implying that there was at least one significant difference among the three groups. The Scheffe post-hoc test indicated a significant difference between the mean scores of the education students in G1 and G2, as well as between G2 and G3. It did not indicate a difference between the mean scores of students in G1 and G3. Of the three learning applications, the one used by G2 therefore led to better performance on transfer items than those used by G1 and G3. Analysis of the groups’ total mean scores also showed a significant difference.

The total mean scores of the groups were 68.00, 81.71 and 70.28 for G1, G2 and G3 respectively. The ANOVA result of F(2, 27) = 4.550 gave a p value of 0.020, which is less than the α value of 0.05. This means that there was at least one significant difference between the means of the groups’ scores. The Scheffe post-hoc comparison indicated a significant difference between G1 and G2 only. It did not indicate a significant difference between G1 and G3 or between G2 and G3. The data from the overall scores were of limited use to the study, as it was clear from the previous analyses that the significant differences only occurred in the transfer questions.
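The pattern of Scheffe post-hoc results reported for the transfer and total scores can be approximately checked from the summary statistics in Table 5.8. The following is a sketch under stated assumptions and is not the procedure used in the study: it assumes 10 participants per group (inferred from the degrees of freedom (2, 27), not stated in the text), and the function name is hypothetical. For each pair of groups, the pairwise contrast F is divided by (k − 1), as Scheffe’s method requires, and then referred to the F(2, 27) distribution.

```python
from itertools import combinations


def scheffe_from_summary(means, sds, n):
    """Scheffe pairwise comparisons from summary statistics, equal group size n.

    Returns {(i, j): p_value} with 0-based group indices. The closed-form
    p-value used here is exact only for three groups (numerator df = 2).
    """
    k = len(means)
    if k != 3:
        raise ValueError("closed-form p-value below requires exactly 3 groups")
    df_within = k * (n - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k  # pooled variance for equal n
    p_values = {}
    for i, j in combinations(range(k), 2):
        # Contrast F for the pair, then divided by (k - 1) per Scheffe's method.
        f_pair = (means[i] - means[j]) ** 2 / (ms_within * (2 / n))
        f_scheffe = f_pair / (k - 1)
        # Upper-tail probability of F(2, df_within) at f_scheffe.
        p_values[(i, j)] = (1 + 2 * f_scheffe / df_within) ** (-df_within / 2)
    return p_values


N_PER_GROUP = 10  # ASSUMPTION inferred from df = (2, 27); not stated in the text

# Index 0 = G1, 1 = G2, 2 = G3; values taken from Table 5.8.
transfer = scheffe_from_summary([57.00, 79.00, 56.00], [13.98, 13.49, 15.77], N_PER_GROUP)
total = scheffe_from_summary([68.00, 81.71, 70.28], [9.01, 12.35, 11.04], N_PER_GROUP)

for name, p_values in (("Automation (transfer)", transfer), ("General Total", total)):
    for (i, j), p in p_values.items():
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"{name}: G{i + 1} vs G{j + 1}: p = {p:.3f} ({verdict})")
```

Under these assumptions the sketch flags G1 vs G2 and G2 vs G3 for the transfer scores, and G1 vs G2 only for the total scores, which is consistent with the comparisons described above.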

In document Optimising layered integrated instructional design using cognitive load theory (Page 126-130)