
Chapter 5 – Evaluation

5.3 Methodology

5.3.3 User Evaluation

For learning materials to provide any educational benefit, learners must engage with them, as the discussion in Chapter 2 shows. Given the key role that LAVA plays in the delivery of learning materials, it is important that users feel willing and able to interact with the system, as these interactions are essential for learning to occur. The user evaluation process therefore considers the educational value and usability of LAVA by soliciting user opinion and observing user behaviour whilst the system is in use.

When the subjective nature of the user data collected is coupled with the lack of easily defined metrics against which to evaluate the system, accurately measuring and recording user perceptions becomes increasingly important. As opinions at the extremes of the spectrum can easily skew results and distort findings, it is important that any subjective data is analysed in context, and that evaluator bias, which can affect the data collection process, is minimised wherever possible.

As discussed in section 5.3.2, several investigative approaches are adopted in order to bring balance to the process of evaluating user perceptions. Some are obvious to evaluation participants owing to their direct involvement (for example, structured interviews and questionnaires), whilst others are less intrusive, with participants’ behaviour monitored from a distance (for example, group observation and discourse analysis). As previously discussed, the variety of data collection methods makes it possible to obtain a range of qualitative data points against which to evaluate LAVA, with each approach focusing on different aspects of the system.

User evaluation sessions were undertaken over the course of three academic years. During the first and second years, group evaluation sessions were undertaken in order to obtain a wide spread of data. In the third year, individual evaluation sessions were used to focus on specific aspects of LAVA and provide additional targeted user responses. The structure of each type of session was standardised to provide a similar evaluation environment over consecutive academic years.

Group Sessions

For most participants, group sessions were the first opportunity for them to engage with LAVA. The main focus of each group session was to gain an understanding of the ways in which users made use of the system whilst attempting to undertake their virtual excavation work. In order to investigate how the users interacted with the system, a six stage process was undertaken as shown in Table 10. To ensure that each group session provided similar opportunities to explore the system, a standardised timetable was adopted, with strict timings enforced for each stage of the evaluation process. This standardisation ensured that participants were given opportunities to familiarise themselves with LAVA at the beginning of the session and ask questions relating to the evaluation process at the end of the session. At the start of each group session, participants were given printed instructions to guide their exploration of the LAVA software. They were also given a detailed description of the evaluation process before being asked to complete a short 13 question questionnaire, as shown in Appendix C, which was designed to obtain some basic demographic information about the evaluation participants. The majority of the questions within the demographic information questionnaire focused on evaluating participants’ educational background, archaeology experience and computer literacy.

Following this, participants were introduced to the members of the evaluation and demonstration teams. During this introduction, the roles of the evaluation and demonstration teams were outlined in a bid to ensure that participants knew who they could ask for help when using LAVA, and who would be observing their interactions with the system. Participants were then given an introduction to the system and encouraged to follow a brief familiarisation exercise which was presented by the main session demonstrator. This approach was adopted to ensure familiarity with the main components of the LAVA software that the participants would need to use during the evaluation session.

Table 10

| Stage | Timing | Activity |
|---|---|---|
| 1 | 5 minutes | Check that all participants can log in to the PC Classroom system and access the MMS resources required. Demonstrators available to diagnose and resolve any issues that arise. |
| 2 | 10 minutes | Gather participant demographic information through the electronic questionnaire system. |
| 3 | 5 minutes | Participants shown a walk through of the system by a demonstrator who remains available throughout the evaluation session to answer questions and troubleshoot system problems. |
| 4 | 75 minutes | Participants separated into groups of 2 (or possibly 3 where groups of 2 are not possible). Each group is provided with a worksheet which outlines a series of objectives that each group should aim to achieve within the session. During the session, a team of two evaluators monitor user interaction with the system and mingle with the groups to obtain more detailed user feedback. |
| 5 | 10 minutes | The participants are issued with a set of guidelines and asked to review how well they feel the system conforms to these guidelines. This feedback is gathered electronically, with the questionnaire asking participants to categorise their responses using a 5 point Likert scale. Participants are also given the opportunity to provide more in depth feedback in an open question section at the end of the questionnaire. |
| 6 | 5 minutes | A post evaluation session briefing is given to inform participants of the ongoing evaluation work. During the briefing, the aims and objectives of the evaluation work are reiterated, with participants made aware of the data processing that will be undertaken on the data collected during the evaluation session. This briefing is used to provide an opportunity for queries from participants to be addressed by the evaluators, with longer queries being answered offline after the session. |

During the initial stages of each session, participant groups were closely monitored by the evaluation team. As problems arose, the demonstrators were called in to ensure that all groups had access to their own excavation simulation as quickly as possible, thereby reducing the amount of time wasted during the session. Within the main working phase during stage 4 of the evaluation session, participants were given free access to explore the virtual excavation site as they wished whilst completing a series of objectives provided on the information sheets. Participants were given opportunities to ask questions throughout the session, with evaluators and demonstrators available to assist as required.

As each session progressed, demonstrators continued to handle queries relating to LAVA whilst evaluators only answered questions relating to the evaluation session itself. This approach ensured that throughout each session, evaluation staff would not be seen as stakeholders in the LAVA system by those participating in the evaluation process. To emphasise the importance of impartiality, if evaluators were asked questions relating to the operation of LAVA, demonstrators were called in to respond directly to the participant’s query, with the exchange between the participant and demonstrator observed and recorded by the evaluator.

Whilst interacting with LAVA, participants were organised to work in pairs (or groups of three). In each session there were several groups working simultaneously, with each accessing an isolated instance of the Sparta basilica excavation simulation. This approach was adopted for two reasons:

1. Pair/group working allowed the evaluators to assess the working dynamics of each group by listening to conversations between subject pairs/groups who were asked to follow a talk aloud protocol [249, 250] during the session.

2. As other research has suggested [54, 256, 257], collaboration can facilitate successful performance and encourage reflection on learning objectives:

a. Groups can often solve more interesting and complex problems than individuals working alone [257].

b. Students working in groups need to articulate designs, critiques and arguments to other group members. This encourages the kind of reflection that leads to meaningful deep learning [256].

To obtain data relating to usage patterns, group observations were made by the evaluation team, with a single evaluator randomly choosing a group to observe throughout stage 4. To maximise the value of data obtained, a constructive interaction evaluation methodology [168, 258, 259] was adopted by the group under observation. In this form of user testing, the following principles were applied:

• Both users within the group were provided with a scenario containing several tasks which needed to be met.

• Participant A was asked to lead the scenario whilst collaborating with participant B (and possibly C if working in a group of 3).

• The participants were asked to think aloud and verbalise their thought processes as they tackled each scenario. Evaluators monitored progress from a distance, noting down the approaches adopted and thought processes verbalised by the group.

• It was assumed that the users had no prior knowledge of, or experience using, LAVA. In addition, no domain specific knowledge, other than that delivered during the AN3020 module, was assumed.

• Where possible, the interaction between the evaluator and participants was minimised whilst participants worked to complete their objectives. Following the completion of each task, evaluators briefly discussed progress with the group in order to solicit user opinion to provide context to the observation data already obtained.

The approach adopted enabled evaluators to record the actions and verbal feedback of participants as they engaged with the system. This provided information on the way the overall scenario was approached, as well as the individual tasks within it. By focusing on the team dynamic, the constructive interaction methodology allowed the evaluation process to concentrate on the collaborative aspects of the system and enabled the evaluators to analyse the ways in which teams used the collaboration tools within the system. Whilst recording the verbal feedback provided by users, evaluators were asked to consider a number of questions concerning the interactions under review. Listed below, these questions were designed to prompt the evaluator to focus on user activities whilst encouraging them to record contextual information surrounding the interactions between the users and the system:

1. How did users navigate through a scenario?

2. How quickly did users complete each task within a scenario?

3. How did the users cooperate within their teams?

4. Did the users identify areas in which LAVA hindered their efforts to meet their objectives?

5. Did users approach each task presented to them in LAVA in a uniform manner?

This approach made it possible to develop an overview of participant engagement throughout the entire evaluation session. Other evaluators, who moved between groups, were then used to obtain more detailed participant responses whenever an objective was met, an error encountered, or a milestone completed. When combined, these two methods provided both detailed data relating to the entire evaluation session and multiple snapshot reports detailing significant events encountered by several groups during each session.

Following the hands-on session with LAVA, the subjects were asked to break from their groups to individually complete a post-session questionnaire in stage 5 of the evaluation process. Containing 29 questions spread over three sections, as shown in Appendix D, the questionnaire solicited user opinion with regard to the following areas:

Section A – System Usability Scale: Consisting of ten standardised questions as defined in the Digital Equipment Corporation System Usability Scale, a full discussion of which can be found in [241].

Section B – Educational Considerations: Consisting of fifteen multiple choice questions in the same format as those in section A:

1. I feel that I have learned something by using this system.

2. The excavation simulation reveals believable information.

3. I found it difficult to find out information about the archaeological site.

4. The quality of the material presented was consistent.

5. I believed that all the artefacts I discovered could have been located within the region of the excavation.

6. I feel that using this system helps develop my understanding of fieldwork methods and techniques.

7. I found the system educationally stimulating.

8. I was able to easily identify material culture.

9. The tools provided by the system allowed me to practice the theory that I have learned relating to managing an excavation.

10. Working in a group helped me understand the excavation process.

11. I found it useful to be able to identify where finds were located within the site.

12. The descriptions of the artefacts I found were reasonable.

13. The flow of the excavation made sense to me.

14. I was able to find the tools and information I needed to maintain my context sheets.

15. I would have preferred to work individually using the system.

These questions focused on the educational motivations behind the system and were designed to elicit participants’ perceptions of the simulation they engaged with in relation to educational value (questions 1, 6, 7, 9), realism (questions 2, 4, 5, 12, 13) and the value of groupwork (questions 10, 15). Students were asked to indicate their support for the above statements on a five point Likert scale. Using an approach similar to the one used to analyse the SUS results, a weighted sum of answers was calculated, giving a result ranging from 0 to 100, with 50 indicating a neutral response.

Section C – Free Form Questions: Consisting of four open questions designed to allow participants the opportunity to provide feedback on aspects of the system not covered in sections A and B.
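The scoring of sections A and B can be illustrated with a short sketch. The SUS calculation follows the standard published procedure (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5). The section B calculation is only one possible reading of the weighted sum described above: the text does not give the weights, so equal weights are assumed here, and the choice of statements 3 and 15 as negatively worded (and therefore reverse-scored) is likewise an assumption. The function and constant names are hypothetical.

```python
# Hypothetical sketch of questionnaire scoring; not the evaluation's own code.

# Assumption: statements 3 and 15 are negatively worded and are reverse-scored.
ASSUMED_REVERSED_ITEMS = frozenset({3, 15})

# Question groupings as stated in the text (1-based statement numbers).
SUBSCALES = {
    "educational_value": (1, 6, 7, 9),
    "realism": (2, 4, 5, 12, 13),
    "group_work": (10, 15),
}


def _check(responses, expected_length):
    """Validate that responses are Likert codes 1..5 of the expected length."""
    if len(responses) != expected_length:
        raise ValueError(f"expected {expected_length} responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")


def sus_score(responses):
    """Standard SUS score (0 to 100) from ten responses coded 1 to 5."""
    _check(responses, 10)
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def likert_score(responses, items=None, reversed_items=ASSUMED_REVERSED_ITEMS):
    """Equal-weight 0 to 100 score over the chosen section B statements.

    Assumed scheme: each statement contributes 0 to 4 points, and the total
    is rescaled so that an all-neutral set of answers gives 50.
    """
    _check(responses, 15)
    items = tuple(items) if items is not None else tuple(range(1, 16))
    contributions = [
        (5 - responses[i - 1]) if i in reversed_items else (responses[i - 1] - 1)
        for i in items
    ]
    return 100.0 * sum(contributions) / (4 * len(items))


if __name__ == "__main__":
    neutral = [3] * 15
    print(sus_score([3] * 10))          # all-neutral SUS answers
    print(likert_score(neutral))        # all-neutral section B answers
    print(likert_score(neutral, SUBSCALES["realism"]))
```

Under these assumptions an all-neutral set of answers maps to 50 on both scales, which matches the neutral point described in the text. Passing one of the `SUBSCALES` tuples restricts the score to the statements grouped under educational value, realism, or groupwork.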

By obtaining data relating to participants’ perception of the usability of the system and the perceived educational value of LAVA, the final questionnaire allowed for a review of LAVA’s comparative performance to be undertaken. The metrics were also useful in inferring properties of system attributes which would otherwise be difficult to measure, user engagement and perceived educational value being two examples.

Of course, as the questionnaires were not primarily designed to solicit data relating to user engagement, the inferences drawn from the questionnaire data were only used to corroborate existing findings from interview and group observation activities. However, as participants are more likely to be aware of these activities, there is a strong possibility of introducing experimenter effects, with users attempting to pre-empt the outcome of the evaluation process by indicating levels of engagement higher than would normally be achieved by the system. Hence, using user responses relating to usability and educational value provided a valuable counter to these possible experimenter effects, with the combination of direct observation and interview data with secondary questionnaire data strengthening the findings of the evaluation process by highlighting cases where participants were reporting unduly high levels of engagement.

In terms of inferring levels of engagement from the usability and educational value results, an assumption was made that participants who actively engaged with the scenarios presented by LAVA were far more likely to report high levels of educational value than those who did not take an active role within the system. The rationale for this is that those who do not engage with the system are less likely to enjoy using it, and are thus less likely to progress to a stage where they are able to experience the more advanced, and thus educationally stimulating, elements of the excavation simulation. In contrast, those who do take the time to engage with the system are far more likely to become accustomed to the user interface and to develop familiarity with the system, which in turn may lead to more positive usability feedback than that given by participants who engaged less fully and were thus unable to develop the same level of familiarity.

Individual Sessions

Within the evaluation process, individual sessions were used to gather detailed data relating to problems highlighted during the group sessions. Given that the participants of individual sessions were volunteers obtained from the group evaluation sessions, most were familiar with LAVA and had previously engaged with the system. This made the individual sessions a good opportunity to trial updates to LAVA prior to them being rolled out for general use.

Each individual session was scheduled to last for between 30 and 45 minutes depending on the aspect of LAVA being investigated, with the timescale of each session agreed with the participant in advance. Given the prior exposure to the evaluation process, a minimal amount of time was spent on familiarisation exercises at the start of each session, with participants briefed as to the objectives of the session and asked to complete a demographic questionnaire, as shown in Appendix C.

Once the preliminary questionnaire had been completed and the participant briefed as to the nature of the evaluation session, individual sessions were run using a mixture of observation and structured interviews. As with the group sessions, participants were provided with a worksheet to guide their progress throughout the session. Owing to the tailored nature of the individual sessions, these worksheets were not standardised as in the group sessions, but instead listed a customised set of objectives based on the area of LAVA under investigation. Participants’ interactions with LAVA were observed throughout the session by an evaluator. When each objective was met, the evaluator engaged the participant in a semi-structured interview in order to examine the approach adopted by the participant and any problems that they encountered.

Throughout the session, participants were encouraged to describe their thought processes as they progressed through the system, engaging in dialogue with the evaluator as and when they wished. If participants were naturally vocal, then the evaluator took a predominantly passive role, allowing the participant to lead the discussion. In cases where specific issues needed to be discussed, or when the participant was not forthcoming in beginning discussions, the evaluator took a more active role, introducing cues to encourage dialogue.

Unlike the group based sessions, only a single evaluator was present within the individual evaluation sessions. In this arrangement, the roles of demonstrator and evaluator were undertaken by the same person. Whilst this approach offered less protection against experimenter bias distorting the findings, it was felt that presenting the evaluator as someone able to assist with both technical issues related to LAVA and procedural issues relating to the evaluation process was of greater benefit – not only encouraging participants to report problems, but also encouraging them to explore possible workarounds with the evaluator, thereby revealing more details of the thought processes guiding their interactions. This arrangement also reduced the problems associated with overcrowding the evaluation session; with fewer people present it was less likely that the participant would feel overwhelmed by the evaluation process, thereby encouraging them to voice their thoughts and opinions freely.

After the completion of the objectives listed on the session worksheet, participants were asked some