
5.3 Study 1: Evaluation with Adults

5.3.4 Results

The following section summarizes the evaluation results of the A-BKT model with adult learners with respect to their performance during the training stage of the game (H1) and their performance in the post-test afterwards (H2).

5.3.4.1 Learning Progress During Training

To assess learners’ progress during training, the number of correct answers provided in the initial quarter of the tutoring game (first seven items) is compared to the corresponding number in the final quarter (last seven items). When an item occurred repeatedly within the initial quarter, only the first occurrence is taken into account. Similarly, when an item occurred repeatedly within the final quarter, only the last occurrence is considered. In both cases, the quarter is expanded so that exactly seven distinct items are included.
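The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial log format (a list of `(item, correct)` pairs in presentation order) and the function names `first_distinct`, `last_distinct`, and `phase_scores` are hypothetical and do not come from the study's own tooling.

```python
from typing import List, Tuple

# Assumed log format: (item shown, whether the answer was correct).
Trial = Tuple[str, bool]


def first_distinct(trials: List[Trial], n: int = 7) -> List[Trial]:
    """Return the first occurrence of each of the first n distinct items."""
    seen = {}
    for item, correct in trials:
        if item not in seen:
            seen[item] = correct  # later repetitions are ignored
            if len(seen) == n:
                break  # the window grows until n distinct items are found
    return list(seen.items())


def last_distinct(trials: List[Trial], n: int = 7) -> List[Trial]:
    """Return the last occurrence of each of the last n distinct items."""
    # Walking the log backwards turns "last occurrence" into "first seen".
    return first_distinct(trials[::-1], n)[::-1]


def phase_scores(trials: List[Trial], n: int = 7) -> Tuple[int, int]:
    """Number of correct answers in the initial and final phase."""
    initial = sum(correct for _, correct in first_distinct(trials, n))
    final = sum(correct for _, correct in last_distinct(trials, n))
    return initial, final
```

Because the loop keeps reading trials until `n` distinct items have been collected, repeated items automatically widen the window, which corresponds to expanding the quarter as described in the text.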

A two-way ANOVA with training phase (initial, final) as a within-subjects factor and training type (adaptive, control) as a between-subjects factor was conducted. The results are summarized in Table 5.2 and Figure 5.5. Not surprisingly, there is a significant main effect of training phase (F(1,38) = 66.85, p < .001, η² = .64), showing that learners’ performance was significantly better in the final phase than in the initial phase. Participants achieved an average of 3.88 (SD = 1.27) correct answers in the first quarter of training, as opposed to an average of 6.03 (SD = 1.49) items correctly selected in the final quarter. More interestingly, there is also a main effect of training type (F(1,38) = [...]): learners in the adaptive condition achieved a significantly higher average score of correct answers (M = 5.33, SD = .69) than learners in the control condition (M = 4.58, SD = 1.12). Finally, the interaction between training phase and training type is also significant (F(1,38) = 14.46, p = .001, η² = .28), indicating that the benefit of A-BKT-based training develops over time (see Figure 5.5). While participants’ response behavior in the first quarter of training was similar across conditions, a benefit of training with the A-BKT model becomes evident in the final quarter. Participants in the A-BKT model condition achieved an average of M = 6.90 (SD = .31) correct answers for the last seven distinct skills, whereas participants in the control condition achieved an average of only M = 5.15 (SD = 1.69) correct answers. In summary, the results fully support hypothesis H1 that participants will perform better during training when they learn with the adaptive system based on the A-BKT model.

Figure 5.5: Mean numbers of correct answers at the beginning (first 7) and end (last 7) of the interaction in the different conditions (taken and redesigned with permission from Schodde et al. (2017a)). [Plot: correct answer count for the first 7 and last 7 answers, by condition (Adaptive, Random).]
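As a plausibility check on the effect sizes reported above, they can be recomputed from the F values and degrees of freedom. The sketch below assumes that the reported η² is the partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is chosen here for illustration.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in the text.
reported = {
    "training phase": (66.85, 1, 38),
    "phase x type interaction": (14.46, 1, 38),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    print(f"{effect}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
# training phase: 0.64
# phase x type interaction: 0.28
```

Under this assumption, both recomputed values agree with the reported η² = .64 and η² = .28.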

5.3.4.2 Post-Test

Next, the participants’ vocabulary learning performance after the learning interaction is analyzed; it was measured with two different translation tests. Paired-sample t-tests were conducted to compare the number of correctly recalled words after training with the A-BKT model to that resulting from the randomized teaching strategy in the control condition. For the German-to-Vimmi translation, no significant effect could be observed. The participants who learned with the A-BKT model correctly recalled an average of 3.95 (SD = 2.56) words out of ten, while participants in the control condition correctly recalled an average of 3.35 (SD = 1.98) words. Likewise, no significant effect for the Vimmi-to-German translation task could be found. Participants’ performance after learning with the adaptive model amounted to an average of 7.05 (SD = 2.56) correct items, compared to an average of 6.85 (SD = 2.48) correct answers in the control condition.

Even though no main effect of training type emerged in the post-test, some details are worth mentioning. As depicted in Figure 5.6, in the German-to-Vimmi post-test a maximum of ten correct answers was achieved by some participants in the A-BKT model condition, whereas the maximum in the control condition amounted to only six correct answers. Moreover, two participants in the control condition did not manage to perform any German-to-Vimmi translation correctly. In the A-BKT model condition, however, all participants achieved at least one correct answer.

Figure 5.6: Participant-wise number of correct answers grouped by the different conditions for the German-to-Vimmi post-test (taken and redesigned with permission from Schodde et al. (2017a)). [Plot: correct answer count per participant, by condition (Adaptive-Model, Random).]

To sum up, the observed results do not support hypothesis H2 that participants who learn with the adaptive system based on the A-BKT model will perform better in the post-test subsequent to the learning interaction.

5.3.5 Discussion

The major goal of the presented study was to evaluate the proposed A-BKT model within a language learning interaction. To this end, 40 participants were invited to play a language learning game together with the robot Nao, supported by the underlying SARTS. They played either in an adaptive condition, in which the A-BKT model was used to select the next skill and the corresponding task difficulty, or in a randomized control condition.

The results show that participants in the adaptive condition performed better during the training stage of the game than the control group. Analyzing their response behavior shows not only that participants were able to learn Vimmi words throughout the interaction but also, and more importantly, that they learned more successfully with the adaptive model than with randomized tutoring. This is due to the A-BKT model’s ability to prioritize unknown or hard-to-remember skills, which results in repeated trials for these particular words until the system’s belief state for them becomes similar to that of known words. This strategy outperformed tutoring of the same material with an equal number of repetitions (three per word) in randomized order.
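The prioritization mechanism described above can be illustrated with a minimal sketch. Note that it uses the standard Bayesian Knowledge Tracing (BKT) update, not the A-BKT model itself, and omits the selection of task difficulty. The parameter values, the `BKTParams` class, and the "lowest belief first" rule in `next_skill` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BKTParams:
    guess: float = 0.2   # P(correct | word not known); illustrative value
    slip: float = 0.1    # P(incorrect | word known); illustrative value
    learn: float = 0.15  # P(not known -> known) per trial; illustrative value


def update_belief(p_known: float, correct: bool, p: BKTParams) -> float:
    """Standard BKT: Bayesian posterior given the answer, then a learning step."""
    if correct:
        num = p_known * (1 - p.slip)
        den = num + (1 - p_known) * p.guess
    else:
        num = p_known * p.slip
        den = num + (1 - p_known) * (1 - p.guess)
    posterior = num / den
    return posterior + (1 - posterior) * p.learn


def next_skill(beliefs: Dict[str, float]) -> str:
    """Pick the word whose mastery belief is currently lowest (assumed rule)."""
    return min(beliefs, key=beliefs.get)
```

With these illustrative parameters, an incorrect answer lowers the belief for a word below its starting value, so the same word is selected again until its belief catches up with the words that were answered correctly. A randomized strategy with a fixed number of repetitions per word has no such feedback loop.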

In the post-test, however, no significant differences between experimental conditions were found, although a trend towards better performance in the adaptive model condition compared to the control condition could be observed. Different explanations may account for this inconsistent finding. First, the way responses were requested from the learner was not identical in the training sessions and the post-test. During training, pictures reflecting the meaning of the target words were shown, whereas in the post-test the participants merely received a linguistic cue from the experimenter in the form of a word they had to translate. Consequently, the training with the SARTS might have led to stronger associations between linguistic and figurative materials, in particular for words that were difficult to remember, since they were repeatedly presented in the adaptive condition. This might have triggered a stronger decline of correct answers for participants who trained with the adaptive model as opposed to those in the control condition. Second, the post-test results measured immediately after the training session could be governed by strong inter-individual differences among learners, e.g., in the time they need to internalize the knowledge. Other studies in the literature try to cope with this problem by introducing a second, delayed post-test, hereafter called retention-test, which can be conducted a few days or even weeks later (e.g., Khoshsima et al., 2015; Singer and Gerrits, 2015). In fact, their results demonstrate that significant differences can still appear after a longer period of time. Consequently, introducing a retention-test might reveal results that match learners’ performance during the training.

In summary, the A-BKT model revealed promising results for its application in a SARTS within a language learning interaction. Although the post-test results were not entirely conclusive, the A-BKT model demonstrated its ability to support the participants by increasing their learning performance during training and, thus, can be evaluated with children in the next step. However, the study design requires some small but important changes beforehand. First, the post-test style has to be adapted to be similar to the training session and, second, a retention-test needs to be introduced. Finally, the tutoring game, as well as the robot’s dialog, have to be adjusted to be suitable and understandable for young kindergarten children.

In document Integrating Socially Assistive Robots into Language Tutoring Systems. A Computational Model for Scaffolding Young Children's Foreign Language Learning (Page 85-88)