E-learning evaluation: A cross-technique comparison

Rosa Lanzilotti¹, Carmelo Ardito¹, Maria F. Costabile¹, Antonella De Angeli²

¹ Dipartimento di Informatica – Università di Bari – Italy
{lanzilotti, ardito, costabile}@di.uniba.it

² School of Informatics - University of Manchester, PO Box 88, M60 1QD - UK
[email protected]

Abstract. The evaluation of e-learning applications deserves special attention, and evaluators need effective methodologies and appropriate guidelines to perform their task. In this paper, we compare different evaluation techniques and describe the methodology of an experiment aimed at revealing the strengths and weaknesses of Heuristic Evaluation, Direct Observation and SUE Inspection. The experiment was designed to compare the performance of evaluators applying different techniques to the same e-learning software.

1 Introduction

Nowadays, it is becoming common to learn through electronic media, such as a simple CD-ROM or the Web. A major challenge for designers and Human-Computer Interaction (HCI) researchers is therefore to develop software tools able to engage novice learners and to support learning even at a distance. Designers have to consider not only the different characteristics of learners, such as cultural background, technical experience, technological equipment and physical or cognitive abilities, but also have to ensure that the student’s interaction with the software is as natural and intuitive as possible. Indeed, learners often prefer traditional instruction to web-based training because the interaction is cognitively demanding (confusing menus, unclear buttons or illogical links). The interaction between learner and computer is still a neglected topic in e-learning design. The interface of an e-learning application can become a barrier to effective learning and information retention: if it is not well designed, the user can feel lost, confused or frustrated.

Squires and Preece [15] argue that researchers have so far not given enough importance to the implications of the usability features of an educational application for achieving educational goals. To this end, the authors assert that “there is a need to help evaluators consider the way in which usability and learning interact”. A consolidated methodology for evaluating e-learning applications is not yet available. Some authors propose using traditional usability techniques to evaluate e-learning systems [6, 12]. Squires and Preece instead highlight the need to adapt evaluation techniques to the context of learning and propose a list of heuristics, called “learning with software”, which are guidelines adapted to that context [15].

Researchers in our groups have experience in evaluating specific types of applications in various domains. This work led to the SUE (Systematic Usability Evaluation) methodology, which systematically combines inspection with user testing [4, 10]. The SUE inspection aims to help usability inspectors share and transfer their evaluation expertise, to simplify the inspection process for newcomers, and to achieve more effective and efficient results. It is based on evaluation patterns, called Abstract Tasks (ATs), which precisely describe the activities evaluators perform during the inspection. SUE has been applied to different classes of applications, such as hypermedia [10] and web sites [3]. We are currently adapting it to the context of e-learning. So far, we have identified a first set of usability attributes and guidelines able to capture the peculiar features of e-learning applications [1]. The ATs specific to e-learning were derived from these usability attributes and guidelines [2]. To validate this approach, we have carried out a controlled experiment, whose design is described in this paper. It compares the SUE approach with other well-known techniques such as user testing and heuristic evaluation.

2 Usability Evaluation Techniques

Different methods can be used for evaluating the usability of interactive systems. Among them, the most commonly adopted are user-based methods and inspection methods. User-based methods mainly consist of user testing, in which usability properties are assessed by observing how the system is actually used by some representatives of real users [5]. Usability inspection methods involve expert evaluators only, who inspect the application and provide judgments based on their knowledge [11].

User-based evaluation currently provides the most complete form of evaluation, because it assesses usability through samples of real users. However, this technique has a number of drawbacks, such as the difficulty of selecting adequate user samples and of training these users to handle the advanced functions of an interactive system. Furthermore, it is difficult to reproduce ecologically valid settings of use in a limited amount of time. Failure to create real-life situations may lead to “artificial” conclusions rather than realistic results. The effort and time needed to set up reliable user testing is often considerable. A valuable user-based technique is thinking aloud, in which users are asked to verbalise their thoughts and actions while they use the system or prototype. In this way, evaluators can detect the user’s mental model of an application, recognising possible misconceptions and the system elements that cause them.

Compared with user-based evaluation, usability inspection methods are more subjective, since they depend heavily on the inspector’s skills. Their main advantage is, however, cost saving: they “save users” [8] and require neither special equipment nor lab facilities. In addition, experts can detect a wide range of problems and possible faults of a complex system in a limited amount of time. Among the inspection methods, the most commonly used is heuristic evaluation [11]. It involves a small set of experts inspecting the system and evaluating the interface against a list of recognized usability principles: the heuristics. Heuristic evaluation is one of the “discount usability” methods [11]. Indeed, some studies have shown that it is a very efficient usability engineering method, with a high benefit-cost ratio [11]. It is especially valuable when time and resources are short, because skilled evaluators can produce high-quality results in a limited amount of time without involving representative users [9].

In order to provide a more robust evaluation method, SUE (Systematic Usability Evaluation) inspection has been introduced [10]. SUE proposes a set of evaluation patterns, called Abstract Tasks (ATs). ATs guide the inspector’s activities, precisely describing which objects of the application to look for and which actions to perform during the inspection in order to analyze such objects. In this way, even less experienced evaluators are given the opportunity to achieve more complete and precise results.

The aim of the experiment reported in this paper is to understand the relative strengths and potential problems of different usability techniques in the e-learning context. In detail, we concentrated on Heuristic Evaluation (HE), Direct Observation (DO), and SUE Inspection (SI), as these techniques cover the range of alternatives currently available. The experimental method is described in the following sections.

2.1 Method

Following the methodology adopted in [4], the comparison metric was defined along the three major dimensions of effectiveness, efficiency, and user satisfaction, which correspond to the principal usability factors as defined in [7]. In the adopted metric, effectiveness refers to the number of usability problems discovered by an average inspector. Efficiency refers to the time expended in relation to the effectiveness of the evaluation. Satisfaction refers to a number of subjective parameters, such as perceived usefulness, difficulty, acceptability and confidence with respect to the evaluation technique. For each dimension, a specific hypothesis was tested. Since it is well known that user testing is accurate in terms of the number of problems revealed, but requires a well-crafted evaluation setting and is costly in terms of resources and time, DO was tested as an evaluation baseline.
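The effectiveness and efficiency dimensions can be illustrated with a minimal sketch. The paper does not give an exact formula for efficiency, so the code below assumes one plausible reading (problems found per hour, averaged over inspectors). The `InspectorResult` type, the function names and the sample figures are hypothetical and are not data from the experiment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InspectorResult:
    """One evaluator's outcome (hypothetical record layout)."""
    technique: str        # "HE", "DO" or "SI"
    problems_found: int   # usability problems reported by this evaluator
    hours_spent: float    # time spent on the evaluation


def effectiveness(results, technique):
    """Mean number of problems found by an inspector using `technique`."""
    return mean(r.problems_found for r in results if r.technique == technique)


def efficiency(results, technique):
    """Mean problems found per hour (an assumed operationalisation)."""
    return mean(
        r.problems_found / r.hours_spent
        for r in results
        if r.technique == technique
    )


# Illustrative figures only, not results from the study.
sample = [
    InspectorResult("HE", 6, 2.0),
    InspectorResult("HE", 4, 2.0),
    InspectorResult("SI", 9, 3.0),
]
```

With these made-up figures, the SI inspector finds more problems but also takes longer, which is the trade-off the efficiency dimension is meant to capture.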

Effectiveness Hypothesis. As a general hypothesis, we predicted that SI should increase evaluation effectiveness as compared to HE. The advantage is related to two factors: (a) the systematic nature of the SI technique, which helps the evaluator to identify basic application constituents; and (b) the use of ATs, which suggest the activities to be conducted over such application constituents. These factors are likely to reduce the subjectivity that characterizes HE.

Efficiency Hypothesis. We predicted that applying SI should take longer than HE. This prediction follows from the fact that a rigorous application of several ATs is time demanding: the inspectors need to evaluate the application by performing each step reported in the AT activity description. HE, on the other hand, is less structured and faster than SI. However, we expected that SI should not compromise inspection efficiency, since the higher effectiveness of the technique should compensate for the longer time required by its application. Thus, we expected no difference in efficiency between the two techniques.

Satisfaction Hypothesis. The results reported in [4] showed that the SUE inspection enhances the inspectors’ control over the inspection process and their confidence in the obtained results. Thus, we expected the satisfaction of the inspectors who used SI to be higher than that of the evaluators who used HE and DO, as they may feel more in control.

Participants

The study involved 75 senior students of a Human-Computer Interaction (HCI) class at the University of Bari in Italy. They participated in the experiment as part of their credits for the HCI course. All participants had a basic knowledge of the usability of interactive systems and of usability evaluation techniques, because they had some previous experience of evaluating software systems. In addition, we recruited 25 students from another computer science class, who used the e-learning system and thus acted as users for the DO technique.

Design

The three usability evaluation techniques were manipulated between-subjects. Initially, the evaluators were randomly divided into three groups of 25. Each group was assigned to one of the three experimental conditions. The HE (Heuristic Evaluation) group had to perform a heuristic evaluation using the “learning with software” heuristics [15]. In the DO (Direct Observation) group, every evaluator observed a student interacting with the e-learning application while using the thinking aloud technique. Finally, the SI (SUE Inspection) group used the inspection technique with ATs proposed by the SUE methodology [1].
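The random division of evaluators into equal-sized groups can be sketched as follows. This is only an illustration of a balanced between-subjects assignment; the paper does not describe how the randomisation was actually carried out, and the function name, the integer evaluator identifiers and the `seed` parameter are assumptions.

```python
import random


def assign_groups(evaluator_ids, conditions=("HE", "DO", "SI"), seed=None):
    """Randomly split evaluators into equal-sized groups, one per condition."""
    shuffled = list(evaluator_ids)
    if len(shuffled) % len(conditions) != 0:
        raise ValueError("evaluators cannot be split into equal groups")
    # A dedicated generator keeps the assignment reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // len(conditions)
    return {
        condition: shuffled[i * size:(i + 1) * size]
        for i, condition in enumerate(conditions)
    }


# 75 evaluators, identified here by hypothetical integer ids 0..74.
groups = assign_groups(range(75), seed=0)
```

Each evaluator appears in exactly one group, so every participant experiences a single technique, as a between-subjects design requires.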

Procedure

A week before the experiment, all participants were given a short demonstration of the application to be evaluated, lasting almost one hour. A few summary indications about the application content and the main functions were given, without providing too many details. A couple of days before the experiment, a training session of about one hour introduced participants to the conceptual tools to be used during the experiment. Each group attended its own training session, in which the usability technique adapted to the context of e-learning was illustrated.

The experiment consisted of two experimental sessions lasting three hours each. During the first session, participants evaluated the e-learning system applying the technique they were assigned to. The HE group was provided with a list of the ten Squires and Preece heuristics, summarizing the usability guidelines for e-learning systems. The DO group observed a student interacting with the e-learning system during the execution of seven predefined tasks. The SI group was given the list of eight ATs to be applied during inspection. The number of ATs was limited because of the limited time of the experimental session. We selected the most basic ATs, which could guide SI inspectors in the analysis of the main application constituents.

Working individually, participants of the three groups had to find usability problems in the application and to record them in a report booklet, which differed according to the experimental condition. For the HE group, the booklet included ten forms, one for each e-learning heuristic. Each form required information about where that heuristic was violated and a short description of the problem. The SI group was instead provided with a report booklet including eight forms, each one corresponding to an AT. Again, the forms required information about the violations detected through that AT and where they occurred. In the DO group, the booklet included seven forms, one for each predefined task, in which the observer enumerated and described the problems that the user encountered while performing that task. At the end of the experimental session, all the completed forms were collected.

The following day, there was another session in which each evaluator was asked to type each detected problem into an electronic form. This was required in order to avoid legibility problems in the data analysis. In the electronic booklet, each participant wrote again the description of the problem, where it occurred, how it was found and how many times it occurred. Finally, the evaluator gave a severity rating to the problem on a scale from 1 (I do not agree that this is a usability problem at all) to 5 (usability catastrophe).

Table 1: An example of the electronic booklet provided to the experimental groups.

Problem description | Where | How | How Many Times | Severity Rating
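One entry of the electronic booklet can be modelled as a small record with the five columns of Table 1 and the 1 to 5 severity scale described in the procedure. This is a sketch under assumptions: the class name, field names and example values are hypothetical, and the paper does not say how the electronic form was implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemReport:
    """One row of the electronic booklet (hypothetical field names)."""
    description: str   # "Problem description" column
    where: str         # "Where" column
    how: str           # "How" column: how the problem was found
    occurrences: int   # "How Many Times" column
    severity: int      # 1 = not a usability problem at all, 5 = catastrophe

    def __post_init__(self):
        # Reject values outside the scale used in the experiment.
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be between 1 and 5")
        if self.occurrences < 1:
            raise ValueError("a reported problem must occur at least once")


# Invented example entry, not taken from the collected booklets.
report = ProblemReport(
    description="Navigation buttons have no labels",
    where="Lesson index page",
    how="While looking for the next lesson",
    occurrences=3,
    severity=4,
)
```

Validating the severity at construction time keeps malformed entries out of the later analysis.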

At the end of the second session, participants were invited to fill in the evaluator-satisfaction questionnaire proposed in [4]. It combined several items to measure three main dimensions: user-satisfaction with the evaluated application, evaluator-satisfaction with the inspection technique, and evaluator-satisfaction with the results achieved. The psychometric instrument was organized in two parts. The first was concerned with the application; the second included the questions about the adopted evaluation technique. Two final questions asked participants to specify how satisfied they felt with their performance as evaluators.

Data coding

The report booklets have been analyzed by four expert evaluators with a strong HCI background to assess the effectiveness and efficiency of the applied evaluation techniques. Evaluator satisfaction was instead measured by analyzing the self-administered post-experiment questionnaires. We are currently analyzing the data. Thus, the discussion of the results cannot be included in this paper, but it will be available at the time of the workshop.

3 Conclusion

One current goal of Human-Computer Interaction researchers and developers is to design software tools that help people learn the material available online in an educationally effective manner. In this paper, after briefly describing our approach to evaluating the usability of e-learning applications, we have illustrated the design and execution of a controlled experiment. This experiment aims at answering important questions about the effectiveness, efficiency, and satisfaction of the SUE inspection technique in the context of e-learning, in comparison with other evaluation techniques. Our research continues with the aim of identifying a wider and more exhaustive set of guidelines and techniques that yield more reliable results when investigating the different aspects of an e-learning application.

References

1. Ardito C., De Marsico M., Lanzilotti R., Levialdi S., Roselli T., Rossano V., Tersigni M.: Usability of E-Learning Tools, Proceedings of the International Conference Advanced Visual Interfaces 2004 (AVI 2004), Gallipoli, Italy, May 25-28, (2004) 80-84.

2. Ardito C., Costabile M.F., De Marsico M., Lanzilotti R., Levialdi S., Roselli T., Rossano V.: An Approach to Usability Evaluation of e-Learning Applications, Special Issue “User-Centred Interaction Paradigms for Universal Access in the Information Society” of the International Journal Universal Access in the Information Society, Vol. 4/5, (2005).

3. Cantoni L., Di Blas N., Bolchini D.: Comunicazione, qualità, usabilità, Edizioni Apogeo, (2003).

4. De Angeli A., Matera M., Costabile M., Garzotto F., Paolini P.: On the Advantages of Systematic Inspection for Evaluating Hypermedia Usability. International Journal of Human-Computer Interaction, ISSN: 1044-7318., Vol. 15, N. 3, (2003) 315-355.

5. Dix A., Finlay J., Abowd G., and Beale R.: Human-Computer Interaction (3rd Edition). London: Prentice Hall Europe, (2003).

6. Dringus L., An Iterative Usability Evaluation Procedure for Interactive Online Courses, Journal of Interactive Instruction Development, Vol. 7, N. 4, (1995) 10-14.

7. International Organization for Standardization, ISO 9241: Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs), (1997).

8. Jeffries R. and Desurvire H.W.: Usability Testing vs. Heuristic Evaluation: Was There a Contest?, ACM SIGCHI Bulletin, Vol. 24, N. 4, (October 1992) 39-41.

9. Kantner L. and Rosenbaum S.: Usability Studies of WWW Sites: Heuristic Evaluation vs. Laboratory Testing, Proceedings of SIGDOC ’97, Snowbird, UT, USA, (1997) 153-160, ACM Press.

10. Matera, M., Costabile, M.F., Garzotto, F., and Paolini, P.: SUE Inspection: an Effective Method for Systematic Usability Evaluation of Hypermedia. IEEE Transactions on Systems, Man and Cybernetics - Part A, Vol. 32, N. 1, (2002) 93-103.

11. Nielsen J.: Usability Engineering. Academic Press. Cambridge, MA. (1993).

12. Parlangeli O., Marchigiani E., and Bagnara S.: Multimedia System in Distance Education: Effects on Usability, Interacting with Computers, Elsevier Science Ltd, Great Britain, Vol. 12, (1999) 37-49.

14. Squires D. and Preece J.: Usability and Learning: Evaluating the Potential of Educational Software. Computers & Education, Elsevier Science Ltd, Great Britain, 27(1), (1996) 15-22.

15. Squires D. and Preece J.: Predicting Quality in Educational Software: Evaluating for Learning, Usability, and the Synergy between Them. Interacting with Computers, Elsevier Science Ltd, Great Britain, Vol. 11, N. 5, (1999) 467-483.