
Examples of the Oscar ACITS tutoring style adaptation can be seen in the dialogue snippets logged during the experiments (Table 10.5 and Table 10.6).

Table 10.5. Dialogue Snippet Logged During the Experiments: Adapting to a Sequential Learner.

Utterance Dialogue

87. Oscar> Now let's see if you can remember how to write a query which chooses values between a range. Look at the employees table. We need a query which displays the first and last names of all employees hired between 1 January 1994 and 1 January 1995. The first step in writing a query is to state the information you need. For our query to display the first and last names of all employees hired between 1 January 1994 and 1 January 1995, which columns do we need?

88. Student> dont know

89. Oscar> Look again at the columns in the EMPLOYEES table. If we want to display the first and last names of all employees hired between 1 January 1994 and 1 January 1995, which columns do we need?

90. Student> first_name, last_name, hire_date

91. Oscar> That's right, we need FIRST_NAME and LAST_NAME, so the first part of our query is shown in the picture. The next step in writing a query is to state which table the information is retrieved from. For our query to list the first and last names of all employees hired between 1 January 1994 and 1 January 1995, type in the name of the table our information will be taken from.

92. Student> employees

93. Oscar> That's right, the table is EMPLOYEES. The last part of writing a query is the WHERE clause, which lets us restrict the rows selected to only those satisfying one or more conditions. We want to restrict the employees displayed to only those hired between 1 January 1994 and 1 January 1995. There are a number of ways to write this restriction in ORACLE SQL as dates can be represented in different formats. Type in the WHERE clause which limits the employees in this way.

94. Student> where hire_date in('01-01-1994' && '01-01-1995')

95. Oscar> One way to tackle this query is to use the BETWEEN keyword to specify a range. Complete the last line of the query shown - type in the WHERE clause to limit the employees displayed to those hired between 1 January 1994 and 1 January 1995.

96. Student> '01-01-1994' and '01-01-1995'

97. Oscar> OK, excluding date conversion functions, ORACLE recognises dates in 3 basic formats: '01-JAN-94', '01-January-94' and '01-January-1994'. The picture shows two different ways in which the WHERE clause could have been written to give the right results.

Both example dialogues (Table 10.5 and Table 10.6) are taken from the same tutorial question where students are asked to write an SQL query, and both students were assigned to the Adapt experimental group (i.e. the adaptation matched their learning style). The student dialogue in Table 10.5 is an example of the tutoring style adapted to a Sequential learner, where Oscar guides the student through the steps of the query. Table 10.6 in contrast shows a student dialogue with tutoring adapted for a Global learner, where the student writes a query without initial guidance and learns from the feedback given about errors in their solution.

Table 10.6. Dialogue Snippet Logged During the Experiments: Adapting to a Global Learner.

Utterance Dialogue

64. Oscar> Now let's see if you can remember how to write a query which chooses values between a range. Look at the employees table. Please type in a query which displays the first and last names of all employees hired between 1 January 1994 and 1 January 1995.

65. Student> select First_name, last_name from Employees where Hire_date between 1-JAN-1944 and 1-Jan-1995;

66. Oscar> I'm afraid your WHERE clause is not quite right. Check your spelling and type in the last part of our query which displays the first and last names of all employees hired between 1 January 1994 and 1 January 1995.

67. Student> select First_name, last_name from Employees where Hire_date between 1-JAN-1994 and 1-Jan-1995;

68. Oscar> No. Excluding date conversion functions, ORACLE recognises dates in 3 basic formats: '01-JAN-94', '01-January-94' and '01-January-1994'. The picture shows two different ways in which the WHERE clause could have been written to give the right results.

3 Experimental Design

In order to validate the methodology and architecture proposed in Chapter 9, an empirical study was undertaken in a real-world natural learning environment. The study evaluated the success of Oscar ACITS in adapting its tutoring to individuals’ learning styles. This section describes the design of the experiment, including the hypotheses to be tested and the method of evaluation.

3.1 Hypotheses to be Tested

There are two hypotheses to be tested, which relate to the success of the Oscar ACITS adaptation, as follows:

H1: it is possible to improve learning from an automated online conversational tutorial by presenting tutor material adapted to a student’s learning style.

Can any evidence be found to support the learning styles theory in suggesting that adapting teaching material to match preferred learning styles improves learning? A common measure of learning is learning gain (Kelly and Tangney, 2006; Graesser et al., 2003; Lee et al., 2004). Learning gain could be measured in a number of ways, for example an improvement in test scores or the number of tutorial questions a learner answers correctly. To test this hypothesis it will be necessary to compare learning gain for a group of learners who experience a tutorial adapted to suit their learning styles with a control group.

H2: it is possible to improve the efficiency of an automated online conversational tutorial by presenting tutor material adapted to a student’s learning style.

Is there any evidence that adapting teaching material to match preferred learning styles improves the efficiency of learning? Efficiency may be measured in a number of ways, for example by comparing the duration of a conversational tutoring session or the amount of discussion taking place.

3.2 Evaluation Criteria

In addition to evaluating the effect of the Oscar ACITS adaptation to learning styles by testing the hypotheses stated in Section 3.1, Oscar ACITS’ ability to tutor effectively will be investigated. Evaluation of Oscar ACITS will therefore take place on three levels:

1. Adaptation: Can Oscar ACITS successfully adapt its tutoring to individuals’ learning styles? Does the Oscar ACITS adaptation to learning styles improve the learning gain or efficiency of the tutoring?

2. User evaluation: How successful do learners believe Oscar ACITS is and would they use Oscar ACITS in practice?

3. Learning gain: Does Oscar ACITS successfully tutor learners, i.e. do they learn anything?

3.2.1 Adaptation to Learning Styles

This criterion evaluates the second main research question (stated in Chapter 1, Section 1), ‘Does adapting to a student’s learning style during a two-way tutoring discourse with a conversational agent tutor improve learning?’. In order to evaluate whether the Oscar ACITS adaptation to learning styles has a positive effect on the tutoring, it is necessary to split participants into different experimental groups. A match/mismatch approach was adopted (Tsianos 2008), whereby participants are randomly assigned to follow a tutorial either matched or mismatched to their learning styles. The match/mismatch approach was considered to be a better test of the adaptation than an approach where one control group experiences a basic tutorial, as it was concluded that any group experiencing additional learning material would be likely to show improved learning.

Each participant will be asked to complete the ILS questionnaire, and depending on their learning styles will be unknowingly assigned to one of three experimental groups, as follows:

Learners whose learning styles are at the centre of all three FS scales (i.e. there is no strong preference, their ILS scores being 1 or 3) will be assigned to the Neutral-Adapt group. These learners will follow the neutral adaptation learning path which contains a mixture of styles.

Learners with at least one preferred learning style will be randomly assigned to either the Adapt or Mismatch group according to a 2:1 ratio. These learners will follow an adaptive learning path assigned by the algorithm. Learners in the Mismatch group will be deliberately presented with learning material unsuited to their learning styles.
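The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Oscar ACITS implementation: the function name, the dictionary layout and the dimension names are hypothetical, each ILS result is assumed to be summarised as a strength value per dimension (1 or 3 meaning no strong preference), and simple weighted random choice is assumed as the way of obtaining the 2:1 ratio.

```python
import random

NEUTRAL_MAX = 3       # ILS strengths of 1 or 3 are treated as "no strong preference"
ADAPT_WEIGHT = 2      # 2:1 Adapt:Mismatch ratio (assumed to be a weighted random draw)
MISMATCH_WEIGHT = 1


def assign_group(ils_strengths, rng=random):
    """Return the experimental group for one learner.

    ils_strengths: hypothetical mapping of FS dimension name -> ILS strength.
    rng: the random module or a random.Random instance (for repeatable runs).
    """
    # Learners in the centre of all three scales follow the neutral path.
    if all(strength <= NEUTRAL_MAX for strength in ils_strengths.values()):
        return "Neutral-Adapt"
    # Everyone else is split at random between Adapt and Mismatch.
    return rng.choices(
        ["Adapt", "Mismatch"], weights=[ADAPT_WEIGHT, MISMATCH_WEIGHT], k=1
    )[0]


# Hypothetical learner with one strong preference (strength 9 on one scale).
print(assign_group({"perception": 9, "processing": 1, "understanding": 3}))
```

A weighted draw gives the 2:1 ratio only in expectation; a study needing exact group sizes would use a blocked allocation instead.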

The average learning gain and efficiency of the tutorials will be compared for each experimental group, to evaluate whether adapting to learning styles positively affects the success of the tutoring.

3.2.2 Qualitative User Evaluation

In addition to evaluating the Oscar ACITS adaptation approach, qualitative user feedback will be gathered at the end of the Oscar ACITS tutorial. The user evaluation feedback questionnaire designed for the Oscar PCITS study (and described in Chapter 8, Section 2.2.3) will be reused to collect participant feedback.

3.2.3 Learning Gain

In order to additionally investigate whether participants have increased their knowledge at the end of the tutorial, learning gain will be measured using a pre-test and post-test approach (Kelly and Tangney, 2006; Graesser et al., 2003; Lee et al., 2004). The same Multiple Choice Question (MCQ) test will be completed before and after the tutoring conversation. The MCQ test scores will be compared to establish whether there is any improvement as follows:

Eq. 1. Learning gain = post-test score - pre-test score

4 Experimental Methodology

This section describes the experimental methodology followed to test whether Oscar ACITS can deliver an effective conversational tutorial, and whether adapting to learning styles has a positive effect on the tutoring. As described in Section 2, the Oscar ACITS prototype was implemented to deliver an adaptive conversational tutorial for SQL revision. Oscar ACITS was integrated into a final year undergraduate module within the Department of Computing and Mathematics at Manchester Metropolitan University. An uncontrolled, real-world experiment was undertaken in a natural learning environment.

4.1 Description of Participants

There were 72 participants who were final year undergraduate students studying for a computer science degree. The participants had previously been taught SQL, although most would not have used SQL for at least six months. No participant had any previous experience using Oscar ACITS.

4.2 Methodology

The Oscar ACITS SQL Revision tutorial was integrated into a final year undergraduate module. During timetabled laboratory classes, participants were asked to refresh their SQL knowledge by completing the revision tutorial. In order to promote the completion of the tutorial, participants who completed it were awarded 2% of the module mark in recognition of engagement. Participants started the SQL revision tutorial during the laboratories, and those who did not complete the tutorial in a single session were able to continue the tutorial via the Internet at a convenient time.

Each participant was unknowingly assigned to one of three experimental groups (Neutral-Adapt, Adapt and Mismatch) as follows:

Participants with no strong preference on any of the three FS dimensions (i.e. their ILS scores were 1 or 3) were assigned to the Neutral-Adapt group. These participants followed the neutral adaptation learning path, which contains a mixture of styles.

Participants with at least one preferred learning style were randomly assigned to either the Adapt or Mismatch group according to a 2:1 ratio. These participants followed an adaptive learning path assigned by the algorithm and were given tutor material favouring particular learning styles (e.g. containing explanations of theory rather than practical examples). Participants in the Mismatch group were deliberately presented with learning material unsuited to their learning styles by reversing their learning style scores for all FS dimensions. For example, a participant with learning style scores of Active 9 and Reflective 2 was presented with learning material adapted to the scores Active 2 and Reflective 9.
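The score reversal used for the Mismatch group can be illustrated with a short Python sketch. The data layout (a mapping from each dimension to its two pole scores) and the names `reverse_scores` and `processing` are assumptions made for this example; the source states only that the two scores of every FS dimension were swapped.

```python
def reverse_scores(scores):
    """Swap the two pole scores of every FS dimension.

    scores: hypothetical mapping of dimension -> {pole name: score},
            with exactly two poles per dimension.
    """
    reversed_scores = {}
    for dimension, poles in scores.items():
        (pole_a, score_a), (pole_b, score_b) = poles.items()
        # Each pole keeps its name but takes the other pole's score.
        reversed_scores[dimension] = {pole_a: score_b, pole_b: score_a}
    return reversed_scores


# The example from the text: Active 9 / Reflective 2 becomes Active 2 / Reflective 9.
learner = {"processing": {"Active": 9, "Reflective": 2}}
print(reverse_scores(learner))
# {'processing': {'Active': 2, 'Reflective': 9}}
```

Because the reversal is applied before the adaptation algorithm runs, the same algorithm can serve both the Adapt and Mismatch groups; only its input differs.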

Each participant followed an individual learning path depending on their experimental group, learning styles, dialogue and existing knowledge. The participant interaction with Oscar ACITS during the experiment will be described in Section 4.3. During the SQL Revision tutoring session, ten questions were posed, requiring eighteen answers (as some questions incorporated multiple steps or questions). Each participant’s tutoring dialogue, adaptations, timings, knowledge and other behaviour factors were recorded in log files as described in Section 2.3.

Following the study, the data gathered was analysed and the experimental group averages were compared to assess the success of the adaptation mechanism, as will be detailed in Section 4.4. In addition, the tutoring success was evaluated in terms of participant learning gain and participant experiences reported in the feedback questionnaires.

4.3 Participant Interaction

Figure 10.4 illustrates the stages involved in the participant interaction with Oscar ACITS during the study. As shown in Figure 10.4, after registering, participants completed the formal ILS questionnaire before beginning the tutorial. Next, students completed a pre-tutorial multiple choice question (MCQ) test, known as the pre-test, to assess existing knowledge before starting the conversational tutorial. The conversational SQL revision tutorial took approximately 43 minutes on average, with each participant following an individual learning path depending on their knowledge, learning styles and experimental group. After completing the tutorial conversation, students repeated the same MCQ test, known as the post-test, and were then presented with some tutor feedback and a comparison of their test results (indicating their learning gain). Finally, students were asked to complete a user evaluation questionnaire.

[Flowchart: START → Anonymous Registration → Formal ILS Questionnaire → Pre-tutorial MCQ Test (‘pre-test’) → Conversational Tutoring Session → Post-tutorial MCQ Test (‘post-test’) → Test Results Comparison and Oscar’s Feedback → User Evaluation Questionnaire → END]

Figure 10.4. Stages in the Experimental Oscar ACITS Tutorial Interaction

4.4 Experimental Analysis

The data gathered from the participant interactions was analysed to explore whether the Oscar ACITS adaptation to learning styles improved the tutoring. Seven experiments were designed to test the two hypotheses (Section 3.1) and the results of each experimental group were compared to see if adapting to learning styles affected the tutoring. The analysis performed for each of the seven experiments will now be described.

Experiment 1 – Correct Tutorial Answers

This experiment tests hypothesis H1 by considering the performance of participants during the tutoring session. Ten questions were posed, for which eighteen answers were required as some questions incorporated multiple steps or questions. The number of correct answers given to tutoring questions was counted, and a score out of 18 assigned for each participant. Next, the average percentage score was calculated for each experimental group. The experimental group averages were then compared to determine whether there was any difference in performance related to the adaptation style.
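As an illustration of the Experiment 1 calculation, the sketch below converts each participant's count of correct answers into a percentage of 18 and averages the percentages within each group. The record layout and the function name are hypothetical, and the sample values are invented purely to show the call; they are not study data.

```python
from collections import defaultdict
from statistics import mean

ANSWERS_REQUIRED = 18  # the tutorial required eighteen answers in total


def group_mean_percent(records):
    """Average percentage score per experimental group.

    records: iterable of (group name, number of correct answers) pairs
             (a hypothetical layout for the logged results).
    """
    by_group = defaultdict(list)
    for group, correct in records:
        by_group[group].append(100.0 * correct / ANSWERS_REQUIRED)
    return {group: mean(values) for group, values in by_group.items()}


# Invented example values, for illustration only.
sample = [("Adapt", 18), ("Adapt", 9), ("Mismatch", 9)]
print(group_mean_percent(sample))
```

The same grouping-and-averaging pattern applies to any per-participant measure, such as session duration or number of interactions.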

Experiment 2 – MCQ Test Score Improvement

This experiment tests hypothesis H1 by considering the actual improvement in test scores from the pre-test to the post-test (defined in Eq. 1 (Section 3.2.3) as learning gain). Average test score improvements were calculated for each experimental group and then compared.

Experiment 3 – MCQ Test Score Improvement/Opportunity

Experiment 3 extends experiment 2, also testing hypothesis H1, by considering the average improvement in test scores as a percentage of the possible improvement. This measure is more accurate than experiment 2 as it also considers the opportunity for improvement, i.e. excludes those participants who achieved 12/12 in the pre-test. Improvements were calculated using the formula:

Eq. 2. learning gain / (questionCount - preTestScore)

Average improvements for each experimental group were then compared.
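A minimal Python sketch of the Experiment 3 measure follows, assuming each participant is represented by a (pre-test, post-test) pair of scores out of 12. The function names are hypothetical. Participants who scored 12/12 in the pre-test have no opportunity to improve, so they are skipped, which also avoids a division by zero.

```python
QUESTION_COUNT = 12  # the MCQ test is scored out of 12


def relative_gain(pre, post, question_count=QUESTION_COUNT):
    """Learning gain as a fraction of the possible improvement (Eq. 2).

    Returns None when the pre-test score is already the maximum,
    because there is then no opportunity for improvement.
    """
    opportunity = question_count - pre
    if opportunity == 0:
        return None
    return (post - pre) / opportunity


def mean_relative_gain(scores):
    """Average Eq. 2 over (pre, post) pairs, excluding 12/12 pre-tests."""
    gains = [relative_gain(pre, post) for pre, post in scores]
    gains = [gain for gain in gains if gain is not None]
    return sum(gains) / len(gains) if gains else None


print(relative_gain(8, 10))  # 2 of 4 possible marks gained
# 0.5
```

A participant moving from 8 to 10 has gained half of the four marks that were available, whereas the raw gain of Eq. 1 would report 2 regardless of the starting score.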

Experiment 4 – MCQ Test Questions Worse

Experiment 4 tests hypothesis H1 by considering participants’ performance in individual MCQ test questions rather than their overall scores. It is possible that in some cases the Oscar ACITS adaptive tutoring had a negative impact on learning. This experiment investigates whether there is any difference between experimental groups in the number of times, following the tutoring, participants performed worse in test questions. Questions where participants selected the correct answer in the pre-test but the incorrect answer in the post-test were counted. The averages were then calculated for each experimental group using the formula:

Eq. 3. (worseCount/12)

The results for each experimental group were then compared.

Experiment 5 – MCQ Test Questions Better

This experiment also tests hypothesis H1 by considering individual test questions, counting the number of questions in which a participant’s performance improved, i.e. questions that were answered incorrectly in the pre-test but correctly in the post-test. The averages were calculated for each experimental group using the formula:

Eq. 4. (betterCount/12) / groupSize

The results for each experimental group were then compared.
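The per-question counts used in Experiments 4 and 5 can be sketched as follows. The representation of each test as a list of 12 booleans (True meaning the question was answered correctly) and the function names are assumptions for this example. The group figure is computed here as the mean of count/12 over the group's participants, which is one plausible reading of the formulas given for these two experiments rather than a confirmed one.

```python
QUESTION_COUNT = 12  # number of MCQ test questions


def changed_counts(pre_correct, post_correct):
    """Return (worseCount, betterCount) for one participant.

    pre_correct, post_correct: lists of booleans, one per test question
    (a hypothetical layout for the logged answers).
    """
    pairs = list(zip(pre_correct, post_correct))
    worse = sum(1 for pre, post in pairs if pre and not post)   # right, then wrong
    better = sum(1 for pre, post in pairs if post and not pre)  # wrong, then right
    return worse, better


def group_average(counts, question_count=QUESTION_COUNT):
    """Mean of count/12 across a group (assumed reading of the formulas)."""
    return sum(count / question_count for count in counts) / len(counts)


# Invented answers, for illustration only.
pre = [True] * 6 + [False] * 6
post = [True] * 5 + [False] + [True] * 3 + [False] * 3
print(changed_counts(pre, post))
# (1, 3)
```

Counting the two directions separately matters because an unchanged overall score can hide one question lost and one gained.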

Experiment 6 – Session Duration

Experiment 6 tests hypothesis H2 by considering the duration of the tutoring sessions. The average duration of the conversational tutoring sessions (in seconds) was calculated for each experimental group and then compared.

Experiment 7 – Number of Interactions

This experiment tests hypothesis H2 by considering the number of interactions (i.e. participant dialogue acts) during the tutorial. The average number of interactions during the tutorial was calculated for each experimental group and then compared.
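Each of the seven experiments ends with a comparison between the three experimental groups, and Section 5.2 reports that the Kruskal-Wallis test was used for this. The sketch below is a from-scratch illustration of that test for exactly three groups, not the analysis code used in the study; the function name is hypothetical. It ranks the pooled values (tied values share their mean rank), computes the H statistic with the usual tie correction, and uses the chi-square approximation, which for two degrees of freedom reduces to p = exp(-H/2).

```python
import math
from collections import Counter


def kruskal_wallis_three_groups(group_a, group_b, group_c):
    """Return (H, p) for three independent, non-empty samples.

    The p-value uses the chi-square approximation with 2 degrees of
    freedom, for which the upper tail probability is exactly exp(-H / 2).
    """
    groups = [group_a, group_b, group_c]
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)

    # Give every distinct value the mean of the ranks it occupies.
    tie_counts = Counter(pooled)
    rank_of = {}
    ranks_used = 0
    for value in sorted(tie_counts):
        count = tie_counts[value]
        rank_of[value] = ranks_used + (count + 1) / 2
        ranks_used += count

    # Tie correction; it is zero only when every pooled value is identical.
    tie_correction = 1 - sum(c**3 - c for c in tie_counts.values()) / (
        n_total**3 - n_total
    )
    if tie_correction == 0:
        return 0.0, 1.0

    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = (12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)) / tie_correction
    return h, math.exp(-h / 2)


# Invented values, for illustration only.
h_stat, p_value = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h_stat, p_value)
```

In practice a library routine such as `scipy.stats.kruskal` would normally be preferred; the hand-written version is shown only to make the ranking and tie handling explicit.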

5 Results and Discussion

This section will present the results of the study to validate the Oscar ACITS methodology and architecture presented in Chapter 9.

5.1 Overall Results

63 of the 72 participants fully completed the tutoring session; incomplete tutorial sessions were disregarded. Of the 63 complete tutorial sessions, one was disregarded as the participant had not engaged with the tutorial, answering ‘no’ to all questions and selecting the same answer for all multiple choice test questions. Table 10.7 shows the distribution of the 62 participants across experimental groups and the average (mean) test scores.

Table 10.7. Experimental Groups

Experimental Group   Number of Participants   Average Pre-test Score (/12)   Average Post-test Score (/12)
Neutral-Adapt        10                       8.7                            10.7
Adapt                32                       8.6                            10.8
Mismatch             20                       8.1                            10.8
Total                62                       8.5                            10.8

In Table 10.7, the ten Neutral-Adapt participants had learning style results that showed no strong preference for a particular learning style (i.e. their styles were balanced in the centre of the scale), and followed a neutral adaptation learning path containing a mixture of styles. The Adapt group contained 32 participants who followed a learning path containing teaching material in a style adapted to their individual learning styles. The Mismatch group of 20 participants followed an adaptive learning path of teaching material that was mismatched to their learning styles. The Mismatch group had a slightly lower average pre-test score (out of 12), but the average post-test scores were approximately the same for all participants across the sample.

On the whole, the Oscar ACITS tutorial made a positive improvement in participant test scores, with an average learning gain (calculated using Eq. 1) of 19% over the sample.
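The reported figures can be checked with a few lines of arithmetic on the values in Table 10.7. The sketch assumes that the 19% is the mean learning gain of Eq. 1 expressed as a share of the 12 available marks; the variable names are illustrative. Because the table gives means rounded to one decimal place, the recomputed values agree with the reported ones only after rounding.

```python
MAX_SCORE = 12  # MCQ test scores are out of 12

# (participants, mean pre-test score, mean post-test score) from Table 10.7
groups = {
    "Neutral-Adapt": (10, 8.7, 10.7),
    "Adapt": (32, 8.6, 10.8),
    "Mismatch": (20, 8.1, 10.8),
}

total_n = sum(n for n, _, _ in groups.values())

# Sample means are the participant-weighted averages of the group means.
overall_pre = sum(n * pre for n, pre, _ in groups.values()) / total_n
overall_post = sum(n * post for n, _, post in groups.values()) / total_n

# Eq. 1 applied to the sample means, as a percentage of the maximum score.
overall_gain_pct = 100 * (overall_post - overall_pre) / MAX_SCORE

print(total_n, round(overall_pre, 1), round(overall_post, 1), round(overall_gain_pct))
# 62 8.5 10.8 19
```

The weighted means reproduce the Total row of the table, and the resulting gain of a little over two marks out of 12 corresponds to the reported 19%.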

Table 10.8 shows the distribution of learning styles across the Adapt and Mismatch groups. Two learning styles (Intuitive and Global) are not represented in the Mismatch group, which is sometimes unavoidable with random assignment to experimental groups and where not all learning style dimensions are evenly balanced.

Table 10.8. Learning Style Distribution

Learning Style   Adapt Group n   Adapt Group % of Total   Mismatch Group n   Mismatch Group % of Total
Sensory          17              49                       18                 51
Intuitive        4               100                      0                  0
Active           11              69                       5                  31
Reflective       7               70                       3                  30
Sequential       9               56                       7                  44
Global           5               100                      0                  0

5.2 Experimental Results

Table 10.9 reports the results of the seven experiments. All results were tested for difference between the experimental groups using the Kruskal-Wallis test (Kruskal and Wallis, 1952) at the 95% confidence level. The Kruskal-Wallis test is non-parametric so it does not require normality and works with data represented as percentages (unlike the more common ANOVA). Each experiment will now be